
You're reading from Go Web Scraping Quick Start Guide

Product type: Book
Published in: Jan 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789615708
Edition: 1st Edition
Author: Vincent Smith

Vincent Smith has been a software engineer for 10 years, working in fields ranging from health and IT to machine learning and large-scale web scrapers. He has worked for both Fortune 500 companies and start-ups, and has sharpened his skills with the best of both worlds. While earning a degree in electrical engineering, he learned the foundations of writing good code in his Java courses. Early in his professional career, these basics helped him move into software development to support his team. He fell in love with the process of teaching computers how to behave, which set him on the path he still walks today.

Avoiding loops

If you are building a web scraper that follows links, you need to keep track of which pages you have already visited. A page you are visiting may well link back to a page you have already seen, and following such links blindly sends the scraper into an infinite loop. It is therefore important to build a tracking system into your scraper that records the pages it has already visited so that it can skip them.
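The loop problem can be illustrated with a small, self-contained sketch. It does not fetch anything over HTTP: the hypothetical `links` map stands in for real pages and deliberately contains a cycle (/a links to /b, /b links to /c and back to /a, and /c links to /a). The hypothetical `crawl` function records each URL in a history map before following its links, so every page is processed once and the crawl terminates.

```go
package main

import "fmt"

// links is a hypothetical, hard-coded link graph standing in for real HTTP
// fetches: each key is a page URL and each value lists the URLs it links to.
// Note the cycle: /a -> /b -> /c -> /a.
var links = map[string][]string{
	"https://example.com/a": {"https://example.com/b"},
	"https://example.com/b": {"https://example.com/c", "https://example.com/a"},
	"https://example.com/c": {"https://example.com/a"},
}

// crawl follows links breadth-first from start and returns the pages in the
// order they were visited. The visited map records the scraper's history, so
// a page that links back to an earlier page is never processed twice.
func crawl(start string) []string {
	visited := map[string]interface{}{}
	queue := []string{start}
	var order []string

	for len(queue) > 0 {
		url := queue[0]
		queue = queue[1:]

		// Skip URLs that have already been processed.
		if _, seen := visited[url]; seen {
			continue
		}
		visited[url] = nil
		order = append(order, url)

		// Queue every outgoing link; repeats are filtered when dequeued.
		queue = append(queue, links[url]...)
	}
	return order
}

func main() {
	fmt.Println(crawl("https://example.com/a"))
}
```

Removing the `visited` check would make this loop run forever, because /a, /b and /c keep re-queuing each other.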

The simplest data structure for storing a collection of unique items is a set. The Go standard library does not provide a set type, but one can be emulated with a map of type map[string]interface{}, where the keys are the set's members.

interface{} in Go is the empty interface. Every type satisfies it, so a value of type interface{} can hold anything, much like java.lang.Object in Java.
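A minimal sketch of the map-as-set idea, assuming a hypothetical `stringSet` type and `dedupe` helper (neither comes from the standard library or from the book): members are stored as keys, every value is nil, and adding an existing member leaves the set unchanged.

```go
package main

import "fmt"

// stringSet (hypothetical) emulates a set of strings with a map. Only the
// keys matter; the interface{} values are always nil.
type stringSet map[string]interface{}

// Add inserts item into the set. Adding an existing member has no effect.
func (s stringSet) Add(item string) {
	s[item] = nil
}

// Has reports whether item is in the set.
func (s stringSet) Has(item string) bool {
	_, ok := s[item]
	return ok
}

// dedupe (hypothetical) returns the unique URLs from urls, keeping the order
// in which each URL first appears.
func dedupe(urls []string) []string {
	seen := stringSet{}
	var unique []string
	for _, u := range urls {
		if seen.Has(u) {
			continue
		}
		seen.Add(u)
		unique = append(unique, u)
	}
	return unique
}

func main() {
	urls := []string{"/a", "/b", "/a", "/c", "/b"}
	fmt.Println(dedupe(urls)) // [/a /b /c]
}
```

Because only the keys matter, many Go programs use map[string]struct{} instead: the empty struct takes zero bytes, whereas an interface{} value occupies two machine words even when it is nil. The interface{} version is kept here to match the text.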

In Go, you can define a map as follows:

visitedMap := map[string]interface{}{}

In this case, we use the visited URL as the key and can store anything we like as the value. We will just use nil, because as long as the key is present...
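Storing nil has one consequence worth spelling out: indexing a Go map with a missing key also returns the value type's zero value, which for interface{} is nil. The following sketch, built around a hypothetical `isVisited` helper, shows why membership should be checked with the two-value "comma ok" lookup rather than by comparing the value to nil.

```go
package main

import "fmt"

// isVisited (hypothetical) reports whether url has been recorded in
// visitedMap. It uses the comma-ok form because the stored value is nil.
func isVisited(visitedMap map[string]interface{}, url string) bool {
	_, ok := visitedMap[url]
	return ok
}

func main() {
	visitedMap := map[string]interface{}{}
	visitedMap["https://example.com/"] = nil // mark as visited

	// A plain lookup returns nil for a missing key as well, so these
	// comparisons cannot tell visited and unvisited URLs apart.
	fmt.Println(visitedMap["https://example.com/"] == nil)      // true
	fmt.Println(visitedMap["https://example.com/other"] == nil) // true

	// The comma-ok form does distinguish them.
	fmt.Println(isVisited(visitedMap, "https://example.com/"))      // true
	fmt.Println(isVisited(visitedMap, "https://example.com/other")) // false
}
```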
