Practical Web Scraping for Data Science: Best Practices and Examples with Python

ISBNs: 9781484235812, 9781484235829, 1484235819, 1484235827

This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossi…

Language: English. Pages: XVI, 306 (35 illustrations) [313]. Year: 2018

Table of contents:
Table of Contents
About the Authors
About the Technical Reviewer
Introduction
Part I: Web Scraping Basics
1.1 What Is Web Scraping?
1.1.1 Why Web Scraping for Data Science?
1.1.2 Who Is Using Web Scraping?
1.2.1 Setting Up
1.2.2 A Quick Python Primer
2.1 The Magic of Networking
2.2 The HyperText Transfer Protocol: HTTP
2.3 HTTP in Python: The Requests Library
2.4 Query Strings: URLs with Parameters
3.1 Hypertext Markup Language: HTML
3.2 Using Your Browser as a Development Tool
3.3 Cascading Style Sheets: CSS
3.4 The Beautiful Soup Library
3.5 More on Beautiful Soup
Part II: Advanced Web Scraping
4.1 Working with Forms and POST Requests
4.2 Other HTTP Request Methods
4.3 More on Headers
4.4 Dealing with Cookies
4.5 Using Sessions with Requests
4.6 Binary, JSON, and Other Forms of Content
5.1 What Is JavaScript?
5.2 Scraping JavaScript
5.3 Scraping with Selenium
5.4 More on Selenium
6.1 What Is Web Crawling?
6.2 Web Crawling in Python
6.3 Storing Results in a Database
Part III: Managerial Concerns and Best Practices
7.1 The Data Science Process
7.2 Where Does Web Scraping Fit In?
7.3 Legal Concerns
8.1.1 Alternative Python Libraries
8.1.3 Caching
8.1.4 Proxy Servers
8.1.5 Scraping in Other Programming Languages
8.1.7 Graphical Scraping Tools
8.2 Best Practices and Tips
Chapter 9: Examples
9.1 Scraping Hacker News
9.2 Using the Hacker News API
9.3 Quotes to Scrape
9.4 Books to Scrape
9.5 Scraping GitHub Stars
9.6 Scraping Mortgage Rates
9.7 Scraping and Visualizing IMDB Ratings
9.8 Scraping IATA Airline Information
9.9 Scraping and Analyzing Web Forum Interactions
9.10 Collecting and Clustering a Fashion Data Set
9.11 Sentiment Analysis of Scraped Amazon Reviews
9.12 Scraping and Analyzing News Articles
9.13 Scraping and Analyzing a Wikipedia Graph
9.14 Scraping and Visualizing a Board Members Graph
9.15 Breaking CAPTCHA’s Using Deep Learning
Index

Polecaj historie