Speeding up web scraping
Most of the time spent downloading information from web pages is usually spent waiting. A request goes from our computer to the remote server to process it, and until the response is composed and comes back to our computer, we cannot do much about it.
During the execution of the recipes in the book, you'll notice there's a wait involved in requests calls, normally of around one or two seconds. But computers can do other stuff while waiting, including making more requests at the same time. In this recipe, we will see how to download a list of pages in parallel and wait until they are all ready. We will use an intentionally slow server to show why it's worth getting this right.
Getting ready
We'll get code to crawl and search for keywords, making use of the futures capabilities of Python 3 to download multiple pages at the same time.
A future is an object that represents the promise of a value. This means that you immediately...
 
                                             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
             
     
         
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                