You're reading from Python Web Scraping - Second Edition

Product type: Book
Published in: May 2017
Reading level: Intermediate
ISBN-13: 9781786462589
Edition: 2nd Edition
Author: Katharine Jarmul

Katharine Jarmul is a data scientist and Pythonista based in Berlin, Germany. She runs a data science consulting company, Kjamistan, that provides services such as data extraction, acquisition, and modelling for small and large companies. She has been writing Python since 2008 and scraping the web with Python since 2010, and has worked at both small and large start-ups that use web scraping for data analysis and machine learning. When she's not scraping the web, you can follow her thoughts and activities via Twitter (@kjam).

Performance

To see how increasing the number of threads and processes affects download time, here is a table of results for crawling 500 web pages:

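| Script     | Number of threads | Number of processes | Time (s) | Speedup vs. sequential | Errors seen? |
|------------|-------------------|---------------------|----------|------------------------|--------------|
| Sequential | 1                 | 1                   | 1349.798 | 1                      | N            |
| Threaded   | 5                 | 1                   | 361.504  | 3.73                   | N            |
| Threaded   | 10                | 1                   | 275.492  | 4.9                    | N            |
| Threaded   | 20                | 1                   | 298.168  | 4.53                   | Y            |
| Processes  | 2                 | 2                   | 726.899  | 1.86                   | N            |
| Processes  | 2                 | 4                   | 559.93   | 2.41                   | N            |
| Processes  | 2                 | 8                   | 451.772  | 2.99                   | Y            |
| Processes  | 5                 | 2                   | 383.438  | 3.52                   | N            |
| Processes  | 5                 | 4                   | 156.389  | 8.63                   | Y            |
| Processes  | 5                 | 8                   | 296.610  | 4.55                   | Y            |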

The fifth column shows the speedup relative to the base case of sequential downloading, that is, the sequential time divided by the script's time. We can see that performance does not increase in linear proportion to the number of threads and processes; the gain appears closer to logarithmic, up to the point where adding more threads actually decreases performance. For example, one process...
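The speedup figures can be reproduced from the timings with a few lines of Python. The following is a minimal sketch, not code from the book: the names `TIMES`, `BASELINE`, `speedup`, and `speedup_table` are made up for illustration, and the timings are copied by hand from the table above.

```python
# Wall-clock times in seconds, copied from the results table.
# Keys are (script, threads, processes); these names are illustrative only.
TIMES = {
    ("Sequential", 1, 1): 1349.798,
    ("Threaded", 5, 1): 361.504,
    ("Threaded", 10, 1): 275.492,
    ("Threaded", 20, 1): 298.168,
    ("Processes", 2, 2): 726.899,
    ("Processes", 2, 4): 559.93,
    ("Processes", 2, 8): 451.772,
    ("Processes", 5, 2): 383.438,
    ("Processes", 5, 4): 156.389,
    ("Processes", 5, 8): 296.610,
}

# The sequential run is the base case every other run is compared against.
BASELINE = TIMES[("Sequential", 1, 1)]


def speedup(seconds, baseline=BASELINE):
    """Return how many times faster a run is than the sequential baseline."""
    return baseline / seconds


def speedup_table(times=TIMES):
    """Map each configuration to its speedup, rounded to two decimals."""
    return {config: round(speedup(t), 2) for config, t in times.items()}


if __name__ == "__main__":
    for (script, threads, procs), factor in speedup_table().items():
        print(f"{script:<10} threads={threads:<2} processes={procs}  {factor:.2f}x")
```

Read this way, the diminishing returns are easy to spot: going from 10 to 20 threads lowers the speedup from about 4.9 to 4.53, and going from 4 to 8 processes with 5 threads lowers it from 8.63 to 4.55.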
