Spark: the definitive guide: big data processing made simple [First edition] 9781491912218, 9781491912201, 1491912200, 1491912219, 9781491912294, 1491912294, 9781491912300, 1491912308

Language: English. Extent: xxvi, 576 pages; illustrations. Year: 2018.

Table of contents:
Part 1. Gentle overview of big data and Spark.
What is Apache Spark? --
A gentle introduction to Spark --
A tour of Spark's toolset --
Part 2. Structured APIs: DataFrames, SQL, and Datasets.
Structured API overview --
Basic structured operations --
Working with different types of data --
Aggregations --
Joins --
Data sources --
Spark SQL --
Datasets --
Part 3. Low-level APIs.
Resilient distributed datasets (RDDs) --
Advanced RDDs --
Distributed shared variables --
Part 4. Production applications.
How Spark runs on a cluster --
Developing Spark applications --
Deploying Spark --
Monitoring and debugging --
Performance tuning --
Part 5. Streaming.
Stream processing fundamentals --
Structured Streaming basics --
Event-time and stateful processing --
Structured Streaming in production --
Part 6. Advanced analytics and machine learning.
Advanced analytics and machine learning overview --
Preprocessing and feature engineering --
Classification --
Regression --
Recommendation --
Unsupervised learning --
Graph analytics --
Deep learning --
Part 7. Ecosystem.
Language specifics: Python (PySpark) and R (SparkR and sparklyr) --
Ecosystem and community.
