Go Web Scraping Quick Start Guide

Product type: Book

Published: January 2019

Publisher: Packt

ISBN-13: 9781789615708

Pages: 132

Edition: 1st Edition

Author: Vincent Smith

Table of Contents (10 chapters)

Preface

1. Introducing Web Scraping and Go

2. The Request/Response Cycle

3. Web Scraping Etiquette

4. Parsing HTML

5. Web Scraping Navigation

6. Protecting Your Web Scraper

7. Scraping with Concurrency

8. Scraping at 100x

9. Other Books You May Enjoy

Leave a review - let other readers know what you think

Scraping HTML pages with colly

colly is one of the open source projects on GitHub that covers most of the systems discussed earlier. It is built to run on a single machine because it relies on a local cache and a local queuing system.
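To make the single-machine limitation concrete, here is a minimal sketch of a local cache and queue, assuming the simplest possible design: an in-memory set of visited URLs plus a first-in, first-out slice of pending ones. The `frontier` type and its `push` and `pop` methods are hypothetical names chosen for illustration; they are not colly's internals. Because all of this state lives in one process's memory, a second machine has no way to know which URLs were already visited, so spreading the crawl across machines would need a shared store instead.

```go
package main

import (
	"fmt"
	"net/url"
)

// frontier is a HYPOTHETICAL in-memory crawl frontier: a FIFO queue of URLs
// still to visit plus a "visited" set acting as a local cache. It only
// illustrates the idea and is not how colly is implemented internally.
type frontier struct {
	queue   []string
	visited map[string]bool
}

func newFrontier() *frontier {
	return &frontier{visited: make(map[string]bool)}
}

// push normalizes rawURL and enqueues it unless it has been seen before.
// It returns false for duplicates and for URLs that fail to parse.
func (f *frontier) push(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	u.Fragment = "" // "#section" links point at the same document
	key := u.String()
	if f.visited[key] {
		return false
	}
	f.visited[key] = true
	f.queue = append(f.queue, key)
	return true
}

// pop removes and returns the oldest queued URL, if any.
func (f *frontier) pop() (string, bool) {
	if len(f.queue) == 0 {
		return "", false
	}
	next := f.queue[0]
	f.queue = f.queue[1:]
	return next, true
}

func main() {
	f := newFrontier()
	f.push("http://example.com/")
	f.push("http://example.com/#top") // same page once the fragment is dropped
	f.push("http://example.com/about")
	for u, ok := f.pop(); ok; u, ok = f.pop() {
		fmt.Println("visit", u)
	}
}
```

The second URL is skipped as a duplicate, so only two pages are visited. Swapping the map and slice for a networked database or message queue is the kind of change a multi-machine design would require.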

The main worker object in colly, the Collector, is built to run in its own goroutine, which allows you to run multiple Collectors simultaneously. This design lets you scrape multiple sites at the same time, each with its own parameters, such as crawl delays, domain whitelists and blacklists, and proxies.
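The following sketch illustrates this pattern with only the standard library: several workers run in their own goroutines, each with its own crawl delay, whitelist, and blacklist. The `crawlConfig` type and the `permits` and `crawl` functions are hypothetical; in colly itself you would express the same settings through options such as `colly.AllowedDomains`, `colly.DisallowedDomains`, `Collector.Limit` with a `colly.LimitRule` delay, and `Collector.SetProxy`. Real HTTP fetching and proxying are left out so that the example stays self-contained.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// crawlConfig is a HYPOTHETICAL stand-in for the settings you would give one
// colly Collector; it is not part of colly's API.
type crawlConfig struct {
	name       string
	delay      time.Duration // crawl delay applied after each visit
	allowed    []string      // whitelist; empty means "any domain"
	disallowed []string      // blacklist; checked first, so it always wins
}

// permits reports whether this configuration may visit host.
func (c crawlConfig) permits(host string) bool {
	for _, d := range c.disallowed {
		if host == d {
			return false
		}
	}
	if len(c.allowed) == 0 {
		return true
	}
	for _, d := range c.allowed {
		if host == d {
			return true
		}
	}
	return false
}

// crawl simulates one worker: it skips hosts its config forbids, records a
// visit for each permitted host, and sleeps for the crawl delay after each
// visit. No network requests are made.
func crawl(cfg crawlConfig, hosts []string, out chan<- string) {
	for _, h := range hosts {
		if !cfg.permits(h) {
			continue
		}
		out <- cfg.name + " -> " + h
		time.Sleep(cfg.delay)
	}
}

func main() {
	hosts := []string{"a.example", "b.example", "ads.example"}
	configs := []crawlConfig{
		{name: "fast", delay: 10 * time.Millisecond, disallowed: []string{"ads.example"}},
		{name: "polite", delay: 50 * time.Millisecond, allowed: []string{"b.example"}},
	}

	// Buffer every possible visit so no worker blocks before wg.Wait returns.
	out := make(chan string, len(hosts)*len(configs))
	var wg sync.WaitGroup
	for _, cfg := range configs {
		wg.Add(1)
		go func(cfg crawlConfig) { // each worker runs in its own goroutine
			defer wg.Done()
			crawl(cfg, hosts, out)
		}(cfg)
	}
	wg.Wait()
	close(out)

	var visits []string
	for v := range out {
		visits = append(visits, v)
	}
	sort.Strings(visits) // goroutine scheduling order is not deterministic
	for _, v := range visits {
		fmt.Println(v)
	}
}
```

Running it prints `fast -> a.example`, `fast -> b.example`, and `polite -> b.example`; the output is sorted because the two goroutines can interleave in any order. Checking the blacklist before the whitelist is one reasonable choice for a domain that appears on both lists.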

colly is built to process only HTML and XML documents; it cannot execute JavaScript. However, you may be surprised by how much information you can collect from pure HTML. The following example is adapted from the GitHub README:

package main

import (
  "fmt"

  "github.com/gocolly/colly"
)

func main() {
  c ...
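To see what the lack of JavaScript support means in practice, the following sketch uses only Go's standard library (not colly) to fetch a page the way any HTML-only scraper does. The page contents and the `fetchRaw` and `checkRawHTML` helpers are made up for illustration. A browser would display both "Price: 10 USD" and "Stock: 3", but the second line only exists after the script runs, so it never appears in the raw HTML that the scraper receives.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// page contains one static paragraph and a script that would add a second
// paragraph in a browser. The stock value is computed by the script, so the
// text "Stock: 3" never appears literally in the served markup.
const page = `<html><body>
<p>Price: 10 USD</p>
<script>
  document.body.insertAdjacentHTML("beforeend", "<p>Stock: " + (1 + 2) + "</p>");
</script>
</body></html>`

// fetchRaw returns the response body exactly as an HTTP client receives it,
// which is all an HTML-only scraper has to work with.
func fetchRaw(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// checkRawHTML serves page from a local test server, fetches it, and reports
// whether the static and the script-generated text appear in the raw body.
func checkRawHTML() (hasStatic, hasDynamic bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html")
		io.WriteString(w, page)
	}))
	defer srv.Close()

	body, err := fetchRaw(srv.URL)
	if err != nil {
		return false, false, err
	}
	return strings.Contains(body, "Price: 10 USD"), strings.Contains(body, "Stock: 3"), nil
}

func main() {
	hasStatic, hasDynamic, err := checkRawHTML()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("static paragraph found:", hasStatic)
	fmt.Println("script-generated paragraph found:", hasDynamic)
}
```

The static price is found and the stock count is not. When the data you need is already present in the served markup, an HTML-only tool such as colly is enough.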

Vincent Smith

Vincent Smith has been a software engineer for 10 years, having worked in various fields, from health and IT to machine learning and large-scale web scrapers. He has worked for both Fortune 500 companies and start-ups, and has sharpened his skills by drawing on the best of both worlds. While obtaining a degree in electrical engineering, he learned the foundations of writing good code through his Java courses. These basics helped launch his software development career early on, when he began writing code to support his team. He fell in love with the process of teaching computers how to behave, and that set him on the path he still walks today.

Related to this chapter

R Web Scraping Quick Start Guide

Web scraping is a technique to extract data from websites. It simulates the behavior of a website user to turn the website itself into a web service to retrieve or introduce new data. This book gives you all you need to get started with scraping web pages using R programming.

Oct 2018 3 hours 48 minutes

Python Web Scraping

This book is the ultimate guide to using the latest features of Python 3.x to scrape data from websites. Learn everything from extracting data from static web pages to creating class-based scrapers with the Scrapy library. This book will also help you build crawlers and determine how to scrape data from JavaScript-dependent websites using PyQt and Selenium. You will also explore testing websites with scrapers, remote scraping, best practices, working with images, and much more.

May 2017 7 hours 20 minutes

Go Standard Library Cookbook

Google’s Golang is the talk of the town, with amazing features and a powerful standard library. This book will gear you up by taking you through recipes that teach you how to leverage the standard library to implement particular solutions. This will enable Go developers to take advantage of a rock-solid standard library instead of third-party frameworks.

Feb 2018 11 hours 20 minutes

Security with Go

Go has become enormously popular, and its obvious advantages, such as stability, speed, and simplicity, make it a first-class choice for developing security-oriented scripts and applications. Security with Go is a classic title for security developers, with its emphasis on Go. Drawing on John Leon's first-mover experience, it starts out with basic forensics and intrusion detection, and then switches tack from defense to attack, covering topics such as brute-force attacks and host discovery. In all, this title enables you to use Go for all your security-related tasks.

Jan 2018 11 hours 20 minutes

Hands-On Web Scraping with Python

Web scraping is an essential technique used in many organizations to scrape valuable data from web pages. This book will help you master web scraping techniques and methodologies using Python libraries and other popular tools such as Selenium. By the end of this book, you will have learned how to efficiently scrape different websites.

Jul 2019 11 hours 40 minutes

The Go Workshop

The Go Workshop takes you from being a novice Go programmer to a confident developer who can leverage the key features of the language to build real-world applications. This book helps you cut through excessive theory and delve into the practical features and techniques that are commonly applied to design performant, scalable applications.

Dec 2019 27 hours 28 minutes

Distributed Computing with Go

To learn all of Go, a developer has to be conversant with Go concurrency and parallelism in theory and practice. Distributed Computing with Go takes the reader from concurrency using goroutines and channels to the full range of web and cloud environments where Go applications are usually deployed. Concurrency achieves scalability and resiliency, and Golang not only enables both but also frames development in such a way as to give the developer a natural path towards them.

Feb 2018 8 hours 12 minutes
