Hands-On Web Scraping with Python: Extract quality data from the web using effective Python techniques (2nd edition), ISBN 9781837636211

Work through practical examples to unlock the full potential of web scraping with Python and gain valuable insights from

Language: English, Pages: 395, Year: 2023

Table of contents:
Hands-On Web Scraping with Python
Contributors
About the author
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
Part 1: Python and Web Scraping
1. Web Scraping Fundamentals
Technical requirements
What is web scraping?
Understanding the latest web technologies
HTTP
HTML
XML
JavaScript
CSS
Data-finding techniques used in web pages
HTML source page
Developer tools
Summary
Further reading
2. Python Programming for Data and Web
Technical requirements
Why Python (for web scraping)?
Accessing the WWW with Python
Setting things up
Creating a virtual environment
Installing libraries
Loading URLs
URL handling and operations
requests – Python library
Implementing HTTP methods
GET
POST
Summary
Further reading
Part 2: Beginning Web Scraping
3. Searching and Processing Web Documents
Technical requirements
Introducing XPath and CSS selectors to process markup documents
The Document Object Model (DOM)
XPath
CSS selectors
Using web browser DevTools to access web content
HTML elements and DOM navigation
XPath and CSS selectors using DevTools
Scraping using lxml – a Python library
lxml by example
Web scraping using lxml
Parsing robots.txt and sitemap.xml
The robots.txt file
Sitemaps
Summary
Further reading
4. Scraping Using PyQuery, a jQuery-Like Library for Python
Technical requirements
PyQuery overview
Introducing jQuery
Exploring PyQuery
Installing PyQuery
Loading a web URL
Element traversing, attributes, and pseudo-classes
Iterating using PyQuery
Web scraping using PyQuery
Example 1 – scraping book details
Example 2 – sitemap to CSV
Example 3 – scraping quotes with author details
Summary
Further reading
5. Scraping the Web with Scrapy and Beautiful Soup
Technical requirements
Web parsing using Python
Introducing Beautiful Soup
Installing Beautiful Soup
Exploring Beautiful Soup
Web scraping using Beautiful Soup
Web scraping using Scrapy
Setting up a project
Creating an item
Implementing the spider
Exporting data
Deploying a web crawler
Summary
Further reading
Part 3: Advanced Scraping Concepts
6. Working with the Secure Web
Technical requirements
Exploring secure web content
Form processing
Cookies and sessions
User authentication
HTML processing using Python
User authentication and cookies
Using proxies
Summary
Further reading
7. Data Extraction Using Web APIs
Technical requirements
Introduction to web APIs
Types of API
Benefits of web APIs
Data formats and patterns in APIs
Example 1 – sunrise and sunset
Example 2 – GitHub emojis
Example 3 – Open Library
Web scraping using APIs
Example 1 – holidays from the US calendar
Example 2 – Open Library book details
Example 3 – US cities and time zones
Summary
Further reading
8. Using Selenium to Scrape the Web
Technical requirements
Introduction to Selenium
Advantages and disadvantages of Selenium
Use cases of Selenium
Components of Selenium
Using Selenium WebDriver
Setting things up
Exploring Selenium
Scraping using Selenium
Example 1 – book information
Example 2 – forms and searching
Summary
Further reading
9. Using Regular Expressions and PDFs
Technical requirements
Overview of regex
Regex with Python
re (search, match, and findall)
re.split
re.sub
re.compile
Regex flags
Using regex to extract data
Example 1 – Yamaha dealer information
Example 2 – data from sitemap
Example 3 – Godfrey’s dealer
Data extraction from a PDF
The PyPDF2 library
Extraction using PyPDF2
Summary
Further reading
Part 4: Advanced Data-Related Concepts
10. Data Mining, Analysis, and Visualization
Technical requirements
Introduction to data mining
Predictive data mining
Descriptive data mining
Handling collected data
Basic file handling
JSON
CSV
SQLite
Data analysis and visualization
Exploratory Data Analysis using ydata_profiling
pandas and plotly
Summary
Further reading
11. Machine Learning and Web Scraping
Technical requirements
Introduction to ML
ML and Python programming
Types of ML
ML using scikit-learn
Simple linear regression
Multiple linear regression
Sentiment analysis
Summary
Further reading
Part 5: Conclusion
12. After Scraping – Next Steps and Data Analysis
Technical requirements
What happens after scraping?
Web requests
pycurl
Proxies
Data processing
PySpark
polars
Jobs and careers
Summary
Further reading
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Download a free PDF copy of this book