You're reading from Go Web Scraping Quick Start Guide

Product type: Book
Published in: Jan 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789615708
Edition: 1st Edition
Author (1)
Vincent Smith

Vincent Smith has been a software engineer for 10 years, having worked in fields ranging from health and IT to machine learning and large-scale web scrapers. He has worked for both Fortune 500 companies and start-ups, and has sharpened his skills with the best of both worlds. While obtaining a degree in electrical engineering, he learned the foundations of writing good code through his Java courses. These basics helped spur his move into software development early in his career, in order to provide support for his team. He fell in love with the process of teaching computers how to behave, which set him on the path he still walks today.

What is a robots.txt file?

Most of the pages on a website are free to be accessed by web scrapers and bots. Sites allow this so that their pages can be indexed by search engines or discovered by content curators. Googlebot is one crawler that most websites are more than happy to give access to their content. However, some sites may not want everything to show up in a Google search result. Imagine if you could google a person and instantly obtain all of their social media profiles, complete with contact information and address. This would be bad news for the person, and certainly not a good privacy policy for the company hosting the site. To control access to different parts of a website, the site owner configures a robots.txt file.

The robots.txt file is typically hosted at the root of the website in the /robots.txt...
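As a rough illustration of the idea, the sketch below derives the conventional /robots.txt location from any page URL on a site and checks a path against the Disallow rules listed for all user agents. It is not taken from the book: the helper names robotsURL, disallowedPrefixes, and allowed are hypothetical, the sample rules and example.com URLs are made up, and the parser is deliberately simplified (it ignores Allow rules, wildcards, and groups that name several user agents).

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// robotsURL (hypothetical helper) returns the conventional robots.txt
// location for the site that hosts pageURL.
func robotsURL(pageURL string) (string, error) {
	u, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", pageURL)
	}
	return u.Scheme + "://" + u.Host + "/robots.txt", nil
}

// disallowedPrefixes (hypothetical helper) collects the Disallow paths that
// appear under "User-agent: *". Simplified on purpose: Allow rules,
// wildcards, and multi-agent groups are not handled.
func disallowedPrefixes(robots string) []string {
	var prefixes []string
	applies := false
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		// Drop trailing comments.
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		key := strings.ToLower(strings.TrimSpace(parts[0]))
		value := strings.TrimSpace(parts[1])
		switch key {
		case "user-agent":
			applies = value == "*"
		case "disallow":
			if applies && value != "" {
				prefixes = append(prefixes, value)
			}
		}
	}
	return prefixes
}

// allowed (hypothetical helper) reports whether path is free of every
// disallowed prefix.
func allowed(path string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(path, p) {
			return false
		}
	}
	return true
}

func main() {
	// Made-up rules for illustration only.
	sample := "User-agent: *\nDisallow: /private/\nDisallow: /tmp/ # scratch\n\n" +
		"User-agent: Googlebot\nDisallow: /nogoogle/\n"

	loc, err := robotsURL("https://example.com/blog/post?id=1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("robots.txt location:", loc)

	rules := disallowedPrefixes(sample)
	fmt.Println("may fetch /private/profile:", allowed("/private/profile", rules))
	fmt.Println("may fetch /blog/:", allowed("/blog/", rules))
}
```

A real scraper would download the file from the derived location before parsing it, and would probably rely on a dedicated robots.txt parsing package rather than a hand-rolled prefix check like this one.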
