Python Web Scraping [2nd ed] 9781786464293, 1786464292

Chapter 5: Dynamic Content; An example dynamic web page; Reverse engineering a dynamic web page; Edge cases; Rendering

English, 215 pages, 2017

Authors
Richard Lawson
Katharine Jarmul

Table of contents:
Cover
Credits
Copyright
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Introduction to Web Scraping
When is web scraping useful?
Is web scraping legal?
Python 3
Background research
Checking robots.txt
Examining the Sitemap
Estimating the size of a website
Identifying the technology used by a website
Finding the owner of a website
Crawling your first website
Scraping versus crawling
Downloading a web page
Retrying downloads
Setting a user agent
Sitemap crawler
ID iteration crawler
Link crawlers
Advanced features
Parsing robots.txt
Supporting proxies
Throttling downloads
Avoiding spider traps
Final version
Using the requests library
Summary
Chapter 2: Scraping the Data
Analyzing a web page
Three approaches to scrape a web page
Regular expressions
Beautiful Soup
Lxml
CSS selectors and your Browser Console
XPath Selectors
LXML and Family Trees
Comparing performance
Scraping results
Overview of Scraping
Adding a scrape callback to the link crawler
Summary
Chapter 3: Caching Downloads
When to use caching?
Adding cache support to the link crawler
Disk Cache
Implementing DiskCache
Testing the cache
Saving disk space
Expiring stale data
Drawbacks of DiskCache
Key-value storage cache
What is key-value storage?
Installing Redis
Overview of Redis
Redis cache implementation
Compression
Testing the cache
Exploring requests-cache
Summary
Chapter 4: Concurrent Downloading
One million web pages
Parsing the Alexa list
Sequential crawler
Threaded crawler
How threads and processes work
Implementing a multithreaded crawler
Multiprocessing crawler
Performance
Python multiprocessing and the GIL
Optical character recognition
Further improvements
Solving complex CAPTCHAs
Using a CAPTCHA solving service
Getting started with 9kw
The 9kw CAPTCHA API
Reporting errors
Integrating with registration
CAPTCHAs and machine learning
Summary
Chapter 8: Scrapy
Installing Scrapy
Starting a project
Defining a model
Creating a spider
Tuning settings
Testing the spider
Different Spider Types
Scraping with the shell command
Checking results
Interrupting and resuming a crawl
Scrapy Performance Tuning
Visual scraping with Portia
Installation
Annotation
Running the Spider