Big Data With Pyspark (Paperback)
Pythquill Publishing
Sold by CitiRetail, Stevenage, United Kingdom
AbeBooks seller since 29 June 2022
New - Softcover
Condition: New
Quantity: 1 available
Paperback. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability.
Seller inventory number: 9798290030715
You'll Learn
Understand the Foundations of Big Data and Distributed Computing: Gain a solid grasp of Big Data concepts, including the 5 Vs, the challenges of traditional systems, and the fundamental principles of distributed computing like parallelism, fault tolerance, and scalability.
Master the PySpark Ecosystem: Learn the architecture of Apache Spark, its core components (Spark SQL, Structured Streaming, MLlib, GraphFrames), and how the PySpark API seamlessly integrates with Python.
Set Up Your PySpark Environment: Get hands-on experience setting up a complete development environment on your local machine and learn how to run applications in various cloud platforms like Databricks, AWS EMR, and Google Cloud Dataproc.
Process Data with RDDs and DataFrames: Master Spark's core data structures, from the low-level RDDs to the powerful and optimized DataFrames. Learn to apply a wide range of transformations and actions for data manipulation.
Perform Advanced Data Wrangling and Feature Engineering: Acquire skills in data cleaning, handling missing values and duplicates, and performing complex transformations using Spark SQL, Window Functions, and User-Defined Functions (UDFs), including high-performance Pandas UDFs.
Connect to Diverse Data Sources: Read and write data from various formats (CSV, JSON, Parquet) and connect to external systems like relational databases (JDBC), NoSQL stores (Cassandra, MongoDB), and cloud storage (S3, ADLS).
Build Real-Time Data Pipelines: Implement modern, fault-tolerant data ingestion with Structured Streaming, including handling event time, watermarking, and performing stateful transformations for real-time analytics.
Apply Machine Learning at Scale with MLlib: Learn to build and evaluate distributed machine learning pipelines for classification, regression, and clustering tasks using Spark's MLlib library.
Analyze Graph-Structured Data: Explore the power of GraphFrames to model and analyze complex relationships, run graph algorithms like PageRank, and find patterns in network data.
Optimize PySpark Applications for Performance: Dive deep into performance tuning, including understanding DAGs and shuffles, managing partitioning, optimizing joins, and configuring memory settings to make your code run faster and more efficiently.
Monitor, Debug, and Deploy Applications: Utilize the Spark UI to monitor your jobs, troubleshoot common errors, and learn to package and deploy your PySpark applications to different cluster managers like YARN and Kubernetes.
Solve Real-World Big Data Problems: Apply your knowledge through practical case studies, including building a recommendation engine, a real-time fraud detection system, and an ETL pipeline, to solidify your skills and build a portfolio.
"About this title" may refer to a different edition of this title.
Orders can be returned within 30 days of receipt.
Please note that titles are dispatched from our US, Canadian or Australian warehouses. Delivery times are specified in the shipping terms. Orders ship within 2 business days. Delivery to your door then takes 7-14 days.
Order quantity | 7 to 60 business days | 7 to 14 business days |
---|---|---|
First item | EUR 42.61 | EUR 42.61 |
Shipping times are set by the sellers. They vary by carrier and location. Shipments that pass through customs may be delayed. Any duties or fees that apply are the buyer's responsibility. The seller may contact you about additional shipping costs to cover a possible increase in the cost of shipping your items.