Python Requests Essentials

By Rakesh Vidya Chandra , Bala Subrahmanyam Varanasi

About this book

Python is one of the most popular programming languages of our era; the Python Requests library is one of the world's best clients, with the highest number of downloads. It allows hassle-free interactions with web applications using simple procedures.

You will be shown how to mock HTTP requests using HTTPretty, and will learn to interact with social media using Requests. This book will help you grasp the art of web scraping with the BeautifulSoup and Python Requests libraries, and will then guide you through Requests' impressive ability to interact with APIs. It will equip you with the best practices for seamlessly drawing data from web apps. Last but not least, you will get the chance to polish your skills by implementing a RESTful Web API with Python and Flask!

Publication date:
June 2015
Publisher
Packt
Pages
134
ISBN
9781784395414

 

Chapter 1. Interacting with the Web Using Requests

Reading data and obtaining information from web services is a crucial task these days. Everyone knows how an Application Programming Interface (API) allowed Facebook to spread the use of the Like button all over the Web and dominate the field of social communication. APIs have a flair of their own for influencing business development, product development, and supply chain management. At this stage, learning an efficient way to deal with APIs and open web URLs is the need of the hour. This will greatly affect many processes of web development.

 

Introduction to HTTP request


Whenever our web browser communicates with a web server, it does so using the Hypertext Transfer Protocol (HTTP), which functions as a request-response protocol. In this process of communication, we send a request to the web server and expect a response in return. Take the example of downloading a PDF from a website. We send a request saying "Get me this specific file", and we get a response from the web server saying "Here is the file", followed by the file itself. The HTTP request we send carries a lot of interesting information. Let us dig inside it.

Here is the raw information of an HTTP request that I sent from my device. We can grasp the important parts of the request by looking at the following example:

* Connected to google.com (74.125.236.35) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: google.com
> Accept: */*
>
< HTTP/1.1 302 Found
< Cache-Control: private
< Content-Type: text/html; charset=UTF-8
< Location: http://www.google.co.in/?gfe_rd=cr&ei=_qMUVKLCIa3M8gewuoCYBQ
< Content-Length: 261
< Date: Sat, 13 Sep 2014 20:07:26 GMT
* Server GFE/2.0 is not blacklisted
< Server: GFE/2.0
< Alternate-Protocol: 80:quic,p=0.002

Let us look at the parts of this HTTP request:

  • Method: GET, the first token of the request line GET / HTTP/1.1 in the preceding example, is the HTTP method, which is case sensitive. Here are some of the HTTP request methods:

    • GET: This fetches information from the given server using the given URI.

    • HEAD: This is similar to GET, except that it delivers only the status line and header section.

    • POST: This submits data to the server for processing.

    • PUT: This creates the target resource at the given URI, or replaces all of its current representations if it already exists.

    • DELETE: This removes the resource identified by the given Request-URI.

    • OPTIONS: This asks for the communication options available in a request/response cycle. It lets the client discover the options associated with a resource.

  • Request URI: A Uniform Resource Identifier (URI) identifies the resource that the request applies to. In the previous example, the Request-URI is /, and the host is named separately in the Host header.

  • Request header fields: If we want to add more information about the request, we can use the request header fields. They are colon-separated key-value pairs. Some of the request header fields are:

    • Accept-Charset: This is used to indicate the character sets that are acceptable for the response.

    • Authorization: This contains the value of the credentials which has the authentication information of the user agent.

    • Host: This identifies the Internet host and port number of the resource that has been requested, using the original URI given by the user.

    • User-Agent: This contains information about the user agent that originates the request. It can be used for statistical purposes and for tracing protocol violations.
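The parts listed above can be picked out of the raw request with a few lines of code. The following is a minimal sketch, assuming Python 3 and using only the standard library; the helper name parse_request is hypothetical and is not part of Requests or any other library. The request text is copied from the curl output shown earlier.

```python
# A minimal sketch: split a raw HTTP request into the parts described above.
# The helper name parse_request is hypothetical, not part of any library.

RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "User-Agent: curl/7.35.0\r\n"
    "Host: google.com\r\n"
    "Accept: */*\r\n"
    "\r\n"
)


def parse_request(raw):
    """Return (method, request_uri, version, headers) for a raw request."""
    # The head ends at the first empty line; anything after it is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line holds the method, the Request-URI, and the version.
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Header fields are colon-separated name/value pairs.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, version, headers


method, request_uri, version, headers = parse_request(RAW_REQUEST)
print(method, request_uri, version)
print(headers["Host"])
```

Note that the Request-URI here is just /, while the host name travels in the Host header.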

 

Python modules


There are some extensively used Python modules which help in opening URLs. Let us have a look at them:

  • httplib2: This is a comprehensive HTTP client library. It supports many features that are left out of other HTTP libraries. It supports features like caching, keep-alive, compression, redirects and many kinds of authentication.

  • urllib2: This is an extensively used module for fetching HTTP URLs in a complex world. It defines functions and classes that help with URL actions such as basic and digest authentication, redirections, cookies, and so on.

  • Requests: This is an Apache2 licensed HTTP library written in Python, with many capabilities that make us more productive.

 

Requests versus urllib2


Let's compare urllib2 and Requests. urllib2.urlopen() can be used to open a URL (which can be a string or a request object), but many other tasks become a burden while interacting with the web. At this point, a simple HTTP library that makes interaction with the web smooth is the need of the hour, and Requests is one of a kind.

The following example fetches data from a web service with urllib2 and then with Requests, and gives us a clear picture of how easy it is to work with Requests.

The following code gives an example of urllib2:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
import urllib2
 
gh_url = 'https://api.github.com'
 
req = urllib2.Request(gh_url)
 
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, 'user', 'pass')
 
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)
 
urllib2.install_opener(opener)
 
handler = urllib2.urlopen(req)
 
print handler.getcode()
print handler.headers.getheader('content-type')
 
# ------
# 200
# 'application/json'

The same example implemented with Requests:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
import requests
 
r = requests.get('https://api.github.com', auth=('user', 'pass'))
 
print r.status_code
print r.headers['content-type']
 
# ------
# 200
# 'application/json'

These examples can be found at https://gist.github.com/kennethreitz/973705.

At this initial stage, the example may look rather complicated. Don't go deep into its details. Just see the beauty of Requests, which allowed us to log in to GitHub with very few lines of code. The code with Requests is much simpler and shorter than the urllib2 example, which helps us to be more productive.

 

Essence of Requests


Compared with HTTP/1.0, HTTP/1.1 has a lot of perks and added features, such as reusing a connection multiple times (which considerably decreases overhead), a keep-alive mechanism, and so on. Fortunately, Requests is built on it, giving us the benefit of interacting with the web smoothly and seamlessly. There is no need to manually add query strings to our URLs, or to encode our POST data. Keep-alive and HTTP connection pooling are 100 percent automatic, powered by urllib3, which is embedded within Requests. With Requests, we can forget about encoding parameters again and again, irrespective of whether the method is GET or POST.

Requests also offers features such as connection pooling with keep-alive, sessions with cookie persistence, Basic/Digest authentication, browser-style SSL verification, connection timeouts, multipart file uploads, and so on.

 

Making a simple request


Now let us create our first request for getting a web page, which is very simple. The process includes importing the requests module, and then getting the web page with the get method. Let us look into an example:

>>> import requests
>>> r = requests.get('http://google.com')

Voila! We are done.

In the preceding example, we get the Google web page using requests.get and save it in the variable r, which turns out to be the response object. The response object r contains a lot of information about the response, such as header information, content, type of encoding, status code, URL information, and many more sophisticated details.

In the same way, we can use all the HTTP request methods like GET, POST, PUT, DELETE, HEAD with requests.

Now let us learn how to pass parameters in URLs. We can add parameters to a request using the params keyword.

The following is the syntax used for passing parameters:

parameters = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('url', params=parameters)
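To see what the params keyword does to the URL, we can build a request without sending it. The following is a minimal sketch, assuming Python 3 and an installed Requests package; the endpoint http://httpbin.org/get is an assumption chosen only for illustration, and nothing is sent over the network.

```python
import requests

# Hypothetical endpoint, used only to illustrate URL building; nothing is sent.
parameters = {'key1': 'value1', 'key2': 'value2'}
request = requests.Request('GET', 'http://httpbin.org/get', params=parameters)

# prepare() produces the PreparedRequest that Requests would actually send.
prepared = request.prepare()

# The dictionary has been encoded into the query string of the URL.
print(prepared.url)
```

Preparing a request in this way is handy for checking the final URL before any network traffic happens.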

To see a complete request in action, let us get a GitHub user's details by logging in to GitHub using Requests, as shown in the following code:

>>> r = requests.get('https://api.github.com/user', auth=('myemailid.mail.com', 'password'))
>>> r.status_code
200
>>> r.url
u'https://api.github.com/user'
>>> r.request
<PreparedRequest [GET]>

We have used the auth parameter, which enables Basic/Digest/Custom authentication (a plain tuple is treated as Basic authentication), to log in to GitHub and get the user details. The r.status_code result indicates that we have successfully got the user details, r.url shows the URL that we accessed, and r.request shows the type of request.

 

Response content


Response content is the data that the server sends back to us when we send a request.

While interacting with the web, it's necessary to decode the response of the server. While working on an application, there are many cases in which we may have to deal with raw, JSON, or even binary responses. For this, Requests has the capability to automatically decode the content from the server. Requests can smoothly decode many of the Unicode charsets. In addition, Requests makes informed guesses about the encoding of the response, based mainly on the headers.

If we access the value of r.content, we get the response content in a raw string format. If we access r.text, the Requests library decodes the response (the r.content value) using r.encoding and returns a Unicode string. If the value of r.encoding is None, Requests guesses the encoding using r.apparent_encoding, which is provided by the chardet library.

We can access the server's response content in the following way:

>>> import requests
>>> r = requests.get('https://google.com')
>>> r.content
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" …..'
>>> type(r.content)
<type 'str'>
>>> r.text
u'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="........
>>> type(r.text)
<type 'unicode'>

In the preceding lines, we get the Google homepage using requests.get() and assign it to the variable r. The r variable is a response object here, and we can access the raw content using r.content and the decoded response content using r.text.

If we wish to find what encoding Requests is using, or if we desire to change the encoding, we can use the property r.encoding as shown in the following example:

>>> r.encoding
'ISO-8859-1'
>>> r.encoding = 'utf-8'

In the first line of the code, we access the encoding that Requests is using, which turns out to be 'ISO-8859-1'. In the next line, we change the encoding to 'utf-8' by assigning it to r.encoding. Once we change the encoding in this way, Requests uses the latest value assigned to r.encoding whenever we call r.text.
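Why the choice of encoding matters can be illustrated without Requests at all. The following is a minimal sketch, assuming Python 3 and only the standard library; the sample bytes are made up for illustration and stand in for r.content, while the two decoded strings stand in for what r.text would return under each value of r.encoding.

```python
# Bytes as a server might send them: the text 'caf\u00e9' encoded as UTF-8.
# These bytes are an assumed sample standing in for r.content.
raw_content = 'caf\u00e9'.encode('utf-8')

# Decoding with the wrong charset garbles the non-ASCII character...
as_latin1 = raw_content.decode('ISO-8859-1')
# ...while decoding with the right charset recovers the original text.
as_utf8 = raw_content.decode('utf-8')

# ascii() escapes non-ASCII characters, so the output is safe on any console.
print(ascii(as_latin1))
print(ascii(as_utf8))
```

The same bytes produce two different strings, which is why setting r.encoding before reading r.text changes the text we get.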

For instance, if the value of r.encoding is None, Requests uses the value of r.apparent_encoding. The following example explains the case:

>>> r.encoding = None
>>> r.apparent_encoding
'ascii'

Generally, the value of the apparent encoding is determined by the chardet library. If we attempt to assign a new encoding to r.apparent_encoding, an AttributeError is raised, as its value can't be altered.

>>> r.apparent_encoding = 'ISO-8859-1'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

Requests can also work with custom encodings. Take a case in which we have created an encoding of our own and registered it with the codecs module. We can use our custom codec with ease: we simply assign its name to r.encoding, and Requests will take care of the decoding.

 

Different types of request contents


Requests has the facility to deal with different types of content, such as binary response content, JSON response content, and raw response content. To give a clear picture of the different types of content, the details are listed in the following sections. The examples used here are developed using Python 2.7.x.

Custom headers

We can send custom headers with a request. For that, we just need to create a dictionary with our headers and pass it as the headers parameter of the get or post method. In the dictionary, the key is the name of the header and the value is, well, the value of the pair. Let us pass an HTTP header to a request:

>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> headers = {'Content-Type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)

This example has been taken from the Requests documentation found at http://docs.python-requests.org/en/latest/user/quickstart/#custom-headers.

In this example, we have sent a header content-type with a value application/json, as a parameter to the request.

In the same way, we can send a request with any custom header. Say we need to send a request with an Authorization header whose value is some token. We can create a dictionary with the key 'Authorization' and the token as its value, which would look like the following:

>>> url = 'some url'
>>> headers = {'Authorization': 'some token'}
>>> r = requests.post(url, headers=headers)
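To check which headers and body a request would carry, we can prepare it without sending it. The following is a minimal sketch, assuming Python 3 and an installed Requests package; it reuses the placeholder URL from the earlier example, and no request is actually made.

```python
import json

import requests

url = 'https://api.github.com/some/endpoint'  # placeholder URL from the text
payload = {'some': 'data'}
headers = {'Content-Type': 'application/json'}

# Build the request without sending it, so that it can be inspected.
prepared = requests.Request(
    'POST', url, data=json.dumps(payload), headers=headers
).prepare()

# Header lookups are case-insensitive, so either spelling works.
print(prepared.headers['content-type'])
# The body is the JSON string that we passed in as data.
print(prepared.body)
```

Because data is already a string here, Requests leaves it untouched and keeps the Content-Type header that we supplied.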

Sending form-encoded data

We can send form-encoded data, like an HTML form, using Requests. Passing a simple dictionary to the data argument gets this done. The dictionary is form-encoded automatically when the request is made.

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("some_url/post", data=payload)
>>> print(r.text)
{
   …
   "form": {
       "key2": "value2",
       "key1": "value1"
   },
   …
}

In the preceding example, we tried sending data that is form-encoded. While dealing with data that is not form-encoded, we should send a string in the place of a dictionary.
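What form encoding does to the dictionary can be seen by preparing the request instead of sending it. The following is a minimal sketch, assuming Python 3 and an installed Requests package; the endpoint http://httpbin.org/post is an assumption used only for illustration.

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}

# Assumed endpoint; the request is only prepared, never sent.
prepared = requests.Request(
    'POST', 'http://httpbin.org/post', data=payload
).prepare()

# Requests sets the form content type for a dictionary passed as data...
print(prepared.headers['Content-Type'])
# ...and encodes the dictionary as key=value pairs joined by '&'.
print(prepared.body)
```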

Posting multipart encoded files

We often upload multipart data, such as images or files, through POST. We can achieve this in Requests using the files argument, which is a dictionary mapping a 'name' to a file-like object. The value can also be a tuple of 'filename' and fileobj, as follows:

{'name' : file-like-objects} or
{'name': ('filename',  fileobj)}

The example is as follows:

>>> url = 'some api endpoint'
>>> files = {'file': open('plan.csv', 'rb')}
>>> r = requests.post(url, files=files)

We can access the response using r.text:

>>> r.text
{
   …
   "files": {
       "file": "< some data … >"
       },
   ….
}

In the former example, we didn't specify the content type or headers. In addition, we have the capability to set the filename, content type, and headers explicitly for the file we are uploading:

>>> url = 'some url'
>>> files = {'file': ('plan.csv', open('plan.csv', 'rb'), 'application/csv', {'Expires': '0'})}
>>> r = requests.post(url, files=files)
>>> r.text
{
   …
   "files": {
       "file": "< data...>"
       },
   …
}

We can also send strings to be received as files in the following way:

>>> url = 'some url'
>>> files = {'file': ('plan.csv', 'some, strings, to, send')}
>>> r = requests.post(url, files=files)
>>> r.text
{
   …
   "files": {
       "file": "some, strings, to, send"
    },
   …
}
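The multipart body that Requests builds from the files argument can be inspected in the same offline way. The following is a minimal sketch, assuming Python 3 and an installed Requests package; the endpoint http://httpbin.org/post is an assumption, and the request is prepared but never sent.

```python
import requests

# A string sent as if it were the contents of a file named plan.csv.
files = {'file': ('plan.csv', 'some, strings, to, send')}

# Assumed endpoint; preparing the request does not touch the network.
prepared = requests.Request(
    'POST', 'http://httpbin.org/post', files=files
).prepare()

# The content type names the multipart format and a generated boundary.
content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])

# The body is a bytes object that carries the filename and the data.
print(b'filename="plan.csv"' in prepared.body)
print(b'some, strings, to, send' in prepared.body)
```

The boundary value differs on every run, which is why the sketch prints only the part of the header before the semicolon.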
 

Looking up built-in response status codes


Status codes are helpful in letting us know the result, once a request is sent. To know about this, we can use status_code:

>>> r = requests.get('http://google.com')
>>> r.status_code
200

To make it much easier to deal with status codes, Requests has a built-in status code lookup object, which serves as an easy reference. To use it, we compare requests.codes.ok with r.status_code. If the result is True, the status code is 200; if it is False, it is not. We can also compare r.status_code with aliases of the same code, such as requests.codes.all_good.

>>> r = requests.get('http://google.com')
>>> r.status_code == requests.codes.ok
True

Now, let's try checking with a URL that is non-existent.

>>> r = requests.get('http://google.com/404')
>>> r.status_code == requests.codes.ok
False
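The lookup object can be explored on its own, without making any request. The following is a minimal sketch, assuming Python 3 and an installed Requests package.

```python
import requests

# requests.codes maps readable names to numeric status codes.
print(requests.codes.ok)
print(requests.codes.not_found)

# Several names can refer to the same code; all_good is an alias of ok.
print(requests.codes.ok == requests.codes.all_good)
```

Comparing against a named code such as requests.codes.not_found reads more clearly than comparing against a bare number.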

Requests also lets us deal with bad requests, such as 4XX and 5XX errors, by raising an exception for the error code. This can be accomplished by using Response.raise_for_status().

Let us try this by sending a bad request first:

>>> bad_request = requests.get('http://google.com/404')
>>> bad_request.status_code
404
>>> bad_request.raise_for_status()
---------------------------------------------------------------------------
HTTPError                              Traceback (most recent call last)
----> bad_request.raise_for_status()

File "requests/models.py",  in raise_for_status(self)
   771
   772         if http_error_msg:
--> 773             raise HTTPError(http_error_msg, response=self)
   774
   775     def close(self):

HTTPError: 404 Client Error: Not Found

Now if we try a working URL, we get nothing in response, which is a sign of success:

>>> bad_request = requests.get('http://google.com')
>>> bad_request.status_code
200
>>> bad_request.raise_for_status()
>>>
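In a script, raise_for_status() is usually wrapped in a try/except block. The following is a minimal sketch, assuming Python 3 and an installed Requests package. The helper name check is hypothetical, and the Response object is built by hand purely so that the sketch runs without a network; in normal use, Requests creates the response for us.

```python
import requests


def check(status_code):
    """Return 'ok', or 'HTTPError' if raise_for_status() raises it."""
    # A Response built by hand stands in for a real server reply
    # (assumption: normally requests.get() returns this object).
    response = requests.Response()
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return 'HTTPError'
    return 'ok'


print(check(200))  # a successful status raises nothing
print(check(404))  # a 4XX status raises HTTPError
print(check(500))  # a 5XX status raises HTTPError
```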


 

Viewing response headers


The response headers tell us about the server's response, including the software used by the origin server to handle the request. We can access the server response headers using r.headers:

>>> r = requests.get('http://google.com')
>>> r.headers
CaseInsensitiveDict({'alternate-protocol': '80:quic', 'x-xss-protection': '1; mode=block', 'transfer-encoding': 'chunked', 'set-cookie': 'PREF=ID=3c5de2786273fce1:FF=0:TM=1410378309:LM=1410378309:S=DirRRD4dRAxp2Q_3; …..

Request for Comments (RFC) 7230 says that HTTP header names are not case-sensitive. This allows us to access the headers using either capital or lower-case letters.

>>> r.headers['Content-Type']
'text/html; charset=ISO-8859-1'

>>> r.headers.get('content-type')
'text/html; charset=ISO-8859-1'
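The case-insensitive behavior comes from the dictionary type that Requests uses for headers, which we can also try out directly. The following is a minimal sketch, assuming Python 3 and an installed Requests package; the header value is copied from the preceding example.

```python
from requests.structures import CaseInsensitiveDict

# The same type that r.headers uses, filled in by hand for illustration.
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=ISO-8859-1'})

# Lookups succeed regardless of the capitalization of the header name.
print(headers['content-type'])
print(headers.get('CONTENT-TYPE'))
print('Content-type' in headers)
```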
 

Accessing cookies with Requests


We can access cookies from the response, if they exist:

>>> url = 'http://somewebsite/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['some_cookie_name']
'some_cookie_value'

We can send our own cookies, as shown in the following example:

>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
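How the cookies dictionary ends up in the request can be checked by preparing the request rather than sending it. The following is a minimal sketch, assuming Python 3 and an installed Requests package; it reuses the httpbin URL from the preceding example, but no request is actually made.

```python
import requests

cookies = dict(cookies_are='working')

# The request is only prepared, so the network is never touched.
prepared = requests.Request(
    'GET', 'http://httpbin.org/cookies', cookies=cookies
).prepare()

# The dictionary is turned into a Cookie request header.
print(prepared.headers['Cookie'])
```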
 

Tracking redirection of the request using request history


Sometimes the URL that we are accessing may have been moved, or it might get redirected to some other location. We can track such redirections using Requests. The response object's history property can be used for this. Requests follows location redirections automatically for every verb except HEAD. The Response.history list contains the Response objects that were created in order to complete the request.

>>> r = requests.get('http://google.com')
>>> r.url
u'http://www.google.co.in/?gfe_rd=cr&ei=rgMSVOjiFKnV8ge37YGgCA'
>>> r.status_code
200
>>> r.history
(<Response [302]>,)

In the preceding example, when we sent a request to 'http://google.com', r.history contained a response with the status code 302, which means that the URL has been redirected to some other location. r.url shows us the proof here, with the redirection URL.

If we are using GET, POST, PUT, PATCH, OPTIONS, or DELETE and don't want Requests to handle redirections, we can set allow_redirects=False so that redirection handling gets disabled.

>>> r = requests.get('http://google.com', allow_redirects=False)
>>> r.url
u'http://google.com/'
>>> r.status_code
302
>>> r.history
[ ]

In the preceding example, we used the parameter allow_redirects=False, which left r.url without any redirection and r.history empty.

If we are using HEAD to access the URL, we can enable redirection with allow_redirects=True.

>>> r = requests.head('http://google.com', allow_redirects=True)
>>> r.url
u'http://www.google.co.in/?gfe_rd=cr&ei=RggSVMbIKajV8gfxzID4Ag'
>>> r.history
(<Response [302]>,)

In this example, we accessed the URL with HEAD and with the allow_redirects parameter enabled, which gave us the redirected URL.

 

Using timeout to keep productive usage in check


Take a case in which we are trying to access a response that is taking too much time. If we don't want the process to keep waiting, and would rather have an exception raised when a specific amount of time is exceeded, we can use the timeout parameter.

When we use the timeout parameter, we are telling Requests not to wait for a response beyond a specific time period. Note that timeout is not a time limit on the whole response download. Rather, an exception is raised if no bytes have been received on the underlying socket for the stated timeout in seconds.

>>> requests.get('http://google.com', timeout=0.03)
---------------------------------------------------------------------------
Timeout                                   Traceback (most recent call last)
…….
……..
Timeout: HTTPConnectionPool(host='google.com', port=80): Read timed out. (read timeout=0.03)

In this example, we specified a timeout value of 0.03 seconds. The server did not respond within that time, so a Timeout exception was raised. The timeout may occur in two different cases:

  • The request times out while attempting to connect to the remote server.

  • The request times out while waiting for the server to send data within the allocated time period.

 

Errors and exceptions


Different types of errors and exceptions will be raised when something goes wrong in the process of sending a request and getting back a response. Some of them are as follows:

  • HTTPError: When there are invalid HTTP responses, Requests will raise an HTTPError exception

  • ConnectionError: If there is a network problem, such as refused connection and DNS failure, Requests will raise a ConnectionError exception

  • Timeout: If the request gets timed out, this exception will be raised

  • TooManyRedirects: If the request surpasses the configured number of maximum redirections, this type of exception is raised

Other types of exceptions that come into the picture are MissingSchema, InvalidURL, ChunkedEncodingError, ContentDecodingError, and so on.
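All of the exceptions listed above inherit from requests.exceptions.RequestException, so one except clause can catch any of them. The following is a minimal sketch, assuming Python 3 and an installed Requests package; the helper name fetch_status and its default timeout of 5 seconds are hypothetical choices made for illustration. The call at the end uses a URL without a scheme, which Requests rejects before any network access.

```python
import requests


def fetch_status(url, timeout=5):
    """Return the status code, or the name of the Requests exception raised."""
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4XX and 5XX status codes into HTTPError exceptions.
        response.raise_for_status()
        return response.status_code
    except requests.exceptions.RequestException as error:
        # HTTPError, ConnectionError, Timeout, TooManyRedirects, and
        # MissingSchema all inherit from RequestException.
        return type(error).__name__


# A URL without a scheme fails while the request is being prepared.
print(fetch_status('google.com'))
```

Catching the base class keeps the handler short; where different failures need different treatment, the more specific classes can be caught first.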

This list has been taken from the Requests documentation available at http://docs.python-requests.org/en/latest/user/quickstart/#errors-and-exceptions.

 

Summary


In this chapter, we covered a few basic topics. We learned why Requests is better than urllib2, how to make a simple request, the different types of response content, how to add custom headers to our requests, how to deal with form-encoded data, how to use the status code lookups, how to track request redirections, and how to use timeouts.

In the next chapter, we will learn the advanced concepts in Requests, in depth, which will help us to use the Requests library flexibly, according to the requirements.

About the Authors

  • Rakesh Vidya Chandra

    Rakesh Vidya Chandra has been in the field of software development for the last 3 years. His love for programming first sparked when he was introduced to LOGO in his school. After obtaining his bachelor's degree in Information Technology, he worked with Agiliq Info Solutions and built several web applications using Python. Rakesh is passionate about writing technical blogs on various open source technologies. When not coding, he loves to dance to hip-hop and listens to EDM.

  • Bala Subrahmanyam Varanasi

    Bala Subrahmanyam Varanasi loves hacking and building web applications. He has a bachelor's degree in Information Technology. He has been in the software industry for the last three and a half years, where he worked with Agiliq Info Solutions and Crypsis Technologies. Bala has also built different web applications using Python, Ruby, and JavaScript. Apart from coding, he is interested in entrepreneurship and is the founder of Firebolt Labs. Currently, he is working as a software engineer at TinyOwl Technology.
