Language: English. Pages: xxvi, 576 (illustrations). Year: 2018.
Table of contents :
Part 1. Gentle overview of big data and Spark. What is Apache Spark? --
A gentle introduction to Spark --
A tour of Spark's toolset --
Part 2. Structured APIs : DataFrames, SQL, and datasets. Structured API overview --
Basic structured operations --
Working with different types of data --
Aggregations --
Joins --
Data sources --
Spark SQL --
Datasets --
Part 3. Low-level APIs. Resilient distributed datasets (RDDs) --
Advanced RDDs --
Distributed shared variables --
Part 4. Production applications. How Spark runs on a cluster --
Developing Spark applications --
Deploying Spark --
Monitoring and debugging --
Performance tuning --
Part 5. Streaming. Stream processing fundamentals --
Structured streaming basics --
Event-time and stateful processing --
Structured streaming in production --
Part 6. Advanced analytics and machine learning. Advanced analytics and machine learning overview --
Preprocessing and feature engineering --
Classification --
Regression --
Recommendation --
Unsupervised learning --
Graph analytics --
Deep learning --
Part 7. Ecosystem. Language specifics : Python (PySpark) and R (SparkR and sparklyr) --
Ecosystem and community.