Paperback. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability.

Seller inventory number 9798290030715

Bibliographic Details

Title

Big Data With Pyspark (Paperback)

Publisher

Independently Published

Publication year

2025

Language

English

ISBN-13

9798290030715

Binding

Paperback

Condition

New

About this title

Synopsis

You'll Learn

Understand the Foundations of Big Data and Distributed Computing: Gain a solid grasp of Big Data concepts, including the 5 Vs, the challenges of traditional systems, and the fundamental principles of distributed computing like parallelism, fault tolerance, and scalability.
Master the PySpark Ecosystem: Learn the architecture of Apache Spark, its core components (Spark SQL, Structured Streaming, MLlib, GraphFrames), and how the PySpark API seamlessly integrates with Python.
Set Up Your PySpark Environment: Get hands-on experience setting up a complete development environment on your local machine and learn how to run applications on various cloud platforms like Databricks, AWS EMR, and Google Cloud Dataproc.
Process Data with RDDs and DataFrames: Master Spark's core data structures, from the low-level RDDs to the powerful and optimized DataFrames. Learn to apply a wide range of transformations and actions for data manipulation.
Perform Advanced Data Wrangling and Feature Engineering: Acquire skills in data cleaning, handling missing values and duplicates, and performing complex transformations using Spark SQL, Window Functions, and User-Defined Functions (UDFs), including high-performance Pandas UDFs.
Connect to Diverse Data Sources: Read and write data from various formats (CSV, JSON, Parquet) and connect to external systems like relational databases (JDBC), NoSQL stores (Cassandra, MongoDB), and cloud storage (S3, ADLS).
Build Real-Time Data Pipelines: Implement modern, fault-tolerant data ingestion with Structured Streaming, including handling event time, watermarking, and performing stateful transformations for real-time analytics.
Apply Machine Learning at Scale with MLlib: Learn to build and evaluate distributed machine learning pipelines for classification, regression, and clustering tasks using Spark's MLlib library.
Analyze Graph-Structured Data: Explore the power of GraphFrames to model and analyze complex relationships, run graph algorithms like PageRank, and find patterns in network data.
Optimize PySpark Applications for Performance: Dive deep into performance tuning, including understanding DAGs and shuffles, managing partitioning, optimizing joins, and configuring memory settings to make your code run faster and more efficiently.
Monitor, Debug, and Deploy Applications: Utilize the Spark UI to monitor your jobs, troubleshoot common errors, and learn to package and deploy your PySpark applications to different cluster managers like YARN and Kubernetes.
Solve Real-World Big Data Problems: Apply your knowledge through practical case studies, including building a recommendation engine, a real-time fraud detection system, and an ETL pipeline, to solidify your skills and build a portfolio.

"About this title" may refer to a different edition of this title.

Seller Information

Online business

Seller business details

ABC BOOKS LIMITED
10 John Street, London, WC1N 2EB, United Kingdom

Terms and Conditions and Shipping Information

Terms of sale

Orders can be returned within 30 days of receipt.

Shipping terms

Please note that titles are dispatched from our US, Canadian or Australian warehouses. Delivery times specified in shipping terms. Orders ship within 2 business days. Delivery to your door then takes 7-14 days.

Shipping costs from United Kingdom to USA

Order quantity	7 to 60 business days	7 to 14 business days
First item	EUR 42.61	EUR 42.61

Shipping times are set by the sellers. They vary depending on the carrier and location. Shipments that pass through customs may be subject to delays. Any duties or fees incurred are the buyer's responsibility. The seller may contact you about additional shipping costs to offset a possible increase in the shipping costs for your items.