Hands-On Big Data Analytics with PySpark: Analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
ISBN: 9781838644130, 9781786463708, 9781788835367, 183864413X

Use PySpark to easily crush messy data at scale and discover proven techniques to create testable, immutable, and easily


Language: English. Pages: 182. Year: 2019.



Table of contents:
Installing Pyspark and Setting up Your Development Environment
Getting Your Big Data into the Spark Environment Using RDDs
Big Data Cleaning and Wrangling with Spark Notebooks
Aggregating and Summarizing Data into Useful Reports
Powerful Exploratory Data Analysis with MLlib
Putting Structure on Your Big Data with SparkSQL
Transformations and Actions
Immutable Design
Avoiding Shuffle and Reducing Operational Expenses
Saving Data in the Correct Format
Working with the Spark Key/Value API
Testing Apache Spark Jobs
Leveraging the Spark GraphX API
