Python Web Scraping Projects

Learn
  • Model your web scraper for distributed scraping
  • Automate your web browser with the help of Selenium, Puppeteer, and Splash Lua scripting
  • Scrape the content behind authentication gates and combine values from multiple web pages
  • Discover industry trends and edge cases such as AI usage in crawler detection and crawler scaling
  • Replicate the behavior of your web apps in Python
  • Understand the processes of dealing with proxies, handling errors and bad responses, and managing big data storage
  • Integrate browser automation with a Python web scraper
About

Web scrapers are programmed to navigate through multiple web pages to extract data as per your needs. This book will cover core web scraping ideas in Python with the help of 10 interesting projects, which utilize real-world examples and varied datasets.

The book starts with an introduction to web scraping and guides you through creating a basic submission scraper. Each chapter addresses one end-to-end project that scrapes and crawls a unique set of data. With every new project, you’ll develop skills you can apply to web scraping at work or in your own projects. You’ll also learn about synchronous and asynchronous HTTP scraping, HTML parsing, and web crawler modeling and scaling. Moving ahead, you’ll cover related techniques such as reverse engineering websites and their JavaScript behavior. Later, you’ll get to grips with advanced projects in domains such as employment, sports, and eCommerce. To build on your skills, the book helps you handle difficult AJAX requests and scrape JavaScript-heavy pages, and guides you through automated web browser scraping. Finally, you’ll learn to work with unstructured data by creating powerful scrapers and crawlers.

By the end of this book, you’ll have learned how to build automated web scrapers to perform a wide range of complex tasks.

Features
  • Implement 10 interesting web scraping projects using modern Python libraries such as extruct, NLTK, spaCy, and requests
  • Perform advanced scraping operations using real-world examples and NLP techniques
  • Learn how to reverse engineer the websites you want and reproduce their results in Python
Page Count 103
Course Length 3 hours 5 minutes
ISBN 9781838648671
Date Of Publication 12 Dec 2020

Authors

Bernardas Ališauskas

Bernard is a Python developer with over 5 years of experience in data crawling, ranging from small crawlers for free open-source apps to big data and AI crawling for major companies. He is also a major contributor to web crawling topics on the Stack Exchange forums, as well as a contributor to many web crawling FLOSS projects such as parsel, Scrapy, and its extensions.