Scraping the web and collecting files
In this recipe, we will learn how to collect data from the web by scraping it. We will write a small shell script that downloads a site and extracts the links it contains.
Getting ready
Besides having a Terminal open, you need to have basic knowledge of the grep and wget commands.
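If you want to confirm that both commands are available before you start, a quick check like the following will do (this is just a convenience sketch; if either tool is missing, install it with your distribution's package manager):
$ command -v wget || echo "wget is not installed"
$ command -v grep || echo "grep is not installed"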
How to do it…
Now, we will write a script to scrape content from imdb.com. We will use the grep and wget commands in the script to get the content. Create a script named scrap_contents.sh and write the following code in it:
#!/bin/bash
# Download the site recursively and collect the links it contains.
mkdir -p data
cd data
wget -q -r -l 5 -x https://imdb.com
cd ..
# Extract every href attribute value from the downloaded pages.
grep -r -Po -h '(?<=href=")[^"]*' data/ > links.csv
# Keep only absolute links that start with http.
grep "^http" links.csv > links_filtered.csv
# Sort the links and remove duplicates.
sort -u links_filtered.csv > links_final.csv
# Remove the downloaded data and the intermediate files.
rm -rf data links.csv links_filtered.csv
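Save the script, make it executable, and run it. The crawl can take a while depending on the recursion depth; when it finishes, the unique links are left in links_final.csv. The head command below is just one way to inspect the result:
$ chmod +x scrap_contents.sh
$ ./scrap_contents.sh
$ head -5 links_final.csv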
How it works…
In the preceding script, we have written code to get content from a website. The wget utility is used for retrieving files from the web using the HTTP, HTTPS, and FTP protocols. In this example, we are getting data from imdb.com, and therefore we specified https://imdb.com as the URL. The -q option suppresses wget's normal output, -r downloads the site recursively, -l 5 limits the recursion to five levels of depth, and -x forces wget to recreate the site's directory structure under the data directory.
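The grep command then does the link extraction: -r searches the downloaded tree recursively, -P enables Perl-compatible regular expressions, -o prints only the matched text, and -h omits the file-name prefix from each match. The pattern '(?<=href=")[^"]*' uses a lookbehind, so it matches everything between href=" and the closing quote. Here is a minimal sketch you can try in isolation (the HTML snippet is made up for illustration):
$ echo '<a href="https://imdb.com/chart/top">Top Movies</a>' | grep -Po '(?<=href=")[^"]*'
https://imdb.com/chart/top
The second grep keeps only the lines starting with http, which discards relative links, and sort -u removes duplicates before the final list is written to links_final.csv.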