You're reading from Go Web Scraping Quick Start Guide

Product type: Book
Published in: Jan 2019
Reading Level: Intermediate
Publisher: Packt
ISBN-13: 9781789615708
Edition: 1st Edition
Author (1)
Vincent Smith

Vincent Smith has been a software engineer for 10 years, having worked in fields ranging from health and IT to machine learning and large-scale web scrapers. He has worked for both Fortune 500 companies and start-ups and has sharpened his skills with the best of both worlds. While obtaining a degree in electrical engineering, he learned the foundations of writing good code through his Java courses. These basics helped spur his move into software development early in his professional career, in order to provide support for his team. He fell in love with the process of teaching computers how to behave, which set him on the path he still walks today.

Preface

The internet is a place full of interesting information and insights just waiting to be gleaned. Much like golden nuggets, these fragmented pieces of data can be collected, filtered, combined, and refined to produce extremely valuable products. Armed with the right knowledge, skills, and a little creativity, you can build a web scraper that can power multi-billion-dollar companies. To support this, you need to use the best tools for the job, starting with a programming language built for speed, simplicity, and safety.

The Go programming language combines the best ideas from its predecessors and cutting-edge ideology, leaving out the unnecessary fluff, to produce a razor-sharp set of tools and clean architecture. With the Go standard library and projects from open source contributors, you have everything you need to build a web scraper of any size.

Who this book is for

This book is for anyone with a little coding experience who is curious about how to build a web scraper that is fast and efficient.

What this book covers

Chapter 1, Introducing Web Scraping and Go, explains what web scraping is and how to install the Go programming language and tools.

Chapter 2, The Request/Response Cycle, outlines the structure of HTTP requests and responses, and explains how to use Go to make and process them.

Chapter 3, Web Scraping Etiquette, explains how to build a web scraper that uses best practices and recommendations for crawling the web efficiently, while respecting others.

Chapter 4, Parsing HTML, shows how to use various tools to parse information from HTML pages.

Chapter 5, Web Scraping Navigation, demonstrates the best ways to navigate websites efficiently.

Chapter 6, Protecting Your Web Scraper, explains how to use various tools to navigate through the internet safely and securely.

Chapter 7, Scraping with Concurrency, introduces the Go concurrency model and explains how to build a productive web scraper.

Chapter 8, Scraping at 100x, provides a blueprint for building a large-scale web scraper, along with some examples from the open source community.

To get the most out of this book

To get the most from this book, you should familiarize yourself with your Terminal or Command Prompt, which you will need in order to run the examples, among other tasks. Ensure you have a good internet connection, and read each chapter, even if you think you already know the material. Keep an open mind as to how a web scraper should act, and learn the current best practices and proper etiquette. This book also covers the Go programming language, including installation, basic commands, the standard library, and package management. It covers the language in a broad sense and only goes into the depth that is needed for web scraping, so some familiarity with Go will be helpful.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com
  2. Select the SUPPORT tab
  3. Click on Code Downloads & Errata
  4. Enter the name of the book in the Search box and follow the onscreen instructions

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Go-Web-Scraping-Quick-Start-Guide. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "This is using the net/http package's default HTTP client to request the index.html resource."

A block of code is set as follows:

POST /login HTTP/1.1
Host: myprotectedsite.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 38

username=myuser&password=supersecretpw
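
As a rough illustration only, the sample request above could be built in Go with the standard net/http package. This is a minimal sketch, not the book's own code: the http:// scheme and the buildLoginRequest helper name are assumptions made for the example, and the host and credentials are the dummy values from the sample. The request is constructed but never sent.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// buildLoginRequest constructs (but does not send) a POST request that
// mirrors the sample block. The helper name and the http:// scheme are
// assumptions for illustration; the host and form values are dummy data.
func buildLoginRequest() (*http.Request, error) {
	body := "username=myuser&password=supersecretpw"
	req, err := http.NewRequest(
		http.MethodPost,
		"http://myprotectedsite.com/login",
		strings.NewReader(body),
	)
	if err != nil {
		return nil, err
	}
	// Tell the server the body is URL-encoded form data.
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := buildLoginRequest()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the parts of the request that correspond to the sample block.
	fmt.Println(req.Method, req.URL.Path, req.Proto)
	fmt.Println("Host:", req.Host)
	fmt.Println("Content-Type:", req.Header.Get("Content-Type"))
	fmt.Println("Content-Length:", req.ContentLength)
}
```

Because the body is passed as a strings.Reader, http.NewRequest fills in the content length automatically, so it does not need to be set by hand.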

Any command-line input or output is written as follows:

go run main.go

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "In this case, you will receive a status code of 500 Internal Server Error."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

