Instant Web Scraping with Java

Build simple scrapers or vast armies of Java-based bots to untangle and capture the Web

Ryan Mitchell

Book Details

ISBN-13: 9781849696883
Paperback: 72 pages

Book Description

Java is often thought of as a stuffy enterprise language, while web scraping is usually seen as the murky domain of scripting languages. By combining the robustness and extensibility of Java with the flexibility and power of web scraping, we can create immensely useful tools that solve very difficult problems.

Instant Web Scraping with Java will guide you, step by step, through setting up your Java environment. You will also learn how to write simple web scrapers and distributed networks of crawlers. Throughout the book, we will provide useful tips, out-of-the-box working code, and additional resources to build expert knowledge.

Instant Web Scraping with Java will teach you how to build your own web scrapers using real-world scraping examples that collect and store data from Wikipedia, public records data sites, IP address geolocation services, and more. You will learn how to run scrapers across multiple servers, run them in parallel, and subvert common methods of anti-scraper security used on modern websites. This book will also provide you with detailed step-by-step instructions, out-of-the-box working code, and expert pointers to further resources on key topics.

Instant Web Scraping with Java will show you how to view and collect any Internet data at the speed of your processor!

Table of Contents

Chapter 1: Instant Web Scraping with Java
How is this legal?
Setting up your Java Environment (Simple)
Writing and executing HelloWorld.java (Simple)
Writing a simple scraper (Simple)
Writing more complicated scraper (Intermediate)
Handling errors (Simple)
Writing robust, scalable code (Advanced)
Persisting data (Advanced)
Writing tests (Intermediate)
Going undercover (Intermediate)
Submitting a basic form (Advanced)
Scraping Ajax Pages (Advanced)
Faster scraping through threading (Intermediate)
Faster scraping with RMI (Advanced)

What You Will Learn

  • Set up your Java environment and work with the Eclipse IDE
  • Execute complicated web crawlers that run without intervention
  • Handle errors, document your code, and write robust code
  • Log scraped data for later retrieval and analysis
  • Write code to test website content and functionality with the JUnit framework
  • Learn techniques for getting around website security measures designed to prevent automated scraping
  • Fill and submit forms automatically
  • Use threading to run scrapers in parallel
  • Use Java’s Remote Method Invocation (RMI) to create multi-server distributed scrapers
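
To give a rough feel for two of the items above (pulling data out of a page and using threading to run scrapers in parallel), here is a minimal sketch. It is not taken from the book: the class name `ParallelTitleScraper` and the methods `extractTitle` and `extractTitles` are hypothetical, and the sample pages are in-memory strings rather than live downloads so the example runs without a network connection. The regular expression is a deliberate simplification; a real scraper would more likely use a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not code from the book.
public class ParallelTitleScraper {

    // Matches the contents of the first <title> element, ignoring case and line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed page title, or an empty string if the page has none.
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Extracts the title of every page using a fixed-size thread pool.
    // Results come back in the same order as the input pages.
    public static List<String> extractTitles(List<String> pages, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String page : pages) {
                tasks.add(() -> extractTitle(page));
            }
            List<String> titles = new ArrayList<>();
            // invokeAll blocks until all tasks finish and preserves task order.
            for (Future<String> result : pool.invokeAll(tasks)) {
                titles.add(result.get());
            }
            return titles;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scraping", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A scraping task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for downloaded pages (assumed sample data).
        List<String> pages = Arrays.asList(
                "<html><head><title>Page One</title></head></html>",
                "<html><head><TITLE> Page Two </TITLE></head></html>");
        System.out.println(extractTitles(pages, 2)); // prints [Page One, Page Two]
    }
}
```

In a real scraper each task would fetch its page over HTTP before parsing it, which is where a thread pool tends to pay off, since most of the time is spent waiting on the network.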
