Chapter 6. Building a Simple Web Client
Hypertext Transfer Protocol (HTTP) is the application protocol that powers the World Wide Web (WWW). Whenever you fire up your web browser to do an internet search, browse Wikipedia, or make a post on social media, you are using HTTP. Many mobile apps also use HTTP behind the scenes. It's safe to say that HTTP is one of the most widely used protocols on the internet.
In this chapter, we will look at the HTTP message format. We will then implement a C program that can request and receive web pages.
The following topics are covered in this chapter:
- The HTTP message format
- HTTP request types
- Common HTTP headers
- HTTP response code
- HTTP message parsing
- Implementing an HTTP client
- Encoding form data (POST)
- HTTP file uploads
The example programs from this chapter can be compiled with any modern C compiler. We recommend MinGW on Windows and GCC on Linux and macOS. See appendices B, C, and D for compiler setup.
The code for this book can be found at https://github.com/codeplea/Hands-On-Network-Programming-with-C.
From the command line, you can download the code for this chapter with the following command:
git clone https://github.com/codeplea/Hands-On-Network-Programming-with-C
cd Hands-On-Network-Programming-with-C/chap06
Each example program in this chapter runs on Windows, Linux, and macOS. When compiling on Windows, each example program requires linking with the Winsock library. This is accomplished by passing the -lws2_32 option to gcc.
We provide the exact commands needed to compile each example as they are introduced.
All of the example programs in this chapter require the same header files and C macros that we developed in Chapter 2, Getting to Grips with Socket APIs. For brevity, we put...
HTTP is a text-based client-server protocol that runs over TCP. Plain HTTP runs over TCP port 80.
It should be noted that plain HTTP is mostly deprecated for security reasons. Today, sites should use HTTPS, the secure version of HTTP. HTTPS secures HTTP by merely running the HTTP protocol through a Transport Layer Security (TLS) layer. Therefore, everything we cover in this chapter regarding HTTP also applies to HTTPS. See Chapter 9, Loading Secure Web Pages with HTTPS and OpenSSL, for more information about HTTPS.
HTTP works by first having the web client send an HTTP request to the web server. Then, the web server responds with an HTTP response. Generally, the HTTP request indicates which resource the client is interested in, and the HTTP response delivers the requested resource.
Visually, the transaction is illustrated in the following graphic:
The preceding graphic illustrates a GET request. A GET request is used when the Web Client simply wants the Web Server to send it...
Uniform Resource Locators (URLs), also known as web addresses, provide a convenient way to specify particular web resources. You can navigate to a URL by typing it into your web browser's address bar. Alternatively, if you're browsing a web page and click on a link, that link is indicated with a URL.
Consider the http://www.example.com:80/res/page1.php?user=bob#account URL. Visually, the URL can be broken down like this:
The URL can indicate the protocol, the host, the port number, the document path, and hash. However, the host is the only required part. The other parts can be implied.
We can parse the example URL from the preceding diagram:
- http://: The part before the first :// indicates the protocol. In this example, the protocol is http, but it could be a different protocol such as ftp:// or https://. If the protocol is omitted, the application will generally make an assumption. For example, your web browser would assume the protocol to be http.
- www.example.com: This specifies...
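The breakdown above can be sketched in C. The following is only a minimal illustration and is not the parse_url() function this chapter relies on; the struct url_parts type and the split_url() and copy_span() names are hypothetical. It assumes that a missing protocol means http, that a missing port means 80, and that the query string is simply kept as part of the document path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the pieces of a URL (not from the chapter). */
struct url_parts {
    char protocol[16];
    char host[128];
    char port[8];
    char path[256];  /* document path plus any query string, without the leading '/' */
    char hash[64];
};

/* Copy the bytes in [begin, end) into dst, truncating to fit, and terminate. */
static void copy_span(char *dst, size_t dst_size,
                      const char *begin, const char *end) {
    size_t len = (size_t)(end - begin);
    if (len >= dst_size) len = dst_size - 1;
    memcpy(dst, begin, len);
    dst[len] = '\0';
}

/* Split url into its parts. Returns 0 on success, -1 if no host is present. */
int split_url(const char *url, struct url_parts *out) {
    /* Protocol: everything before the first "://", or "http" if absent. */
    const char *p = strstr(url, "://");
    if (p) {
        copy_span(out->protocol, sizeof out->protocol, url, p);
        p += 3;
    } else {
        strcpy(out->protocol, "http");
        p = url;
    }

    /* Host: runs until ':', '/', '#', or the end of the string. */
    const char *host_end = p + strcspn(p, ":/#");
    if (host_end == p) return -1;
    copy_span(out->host, sizeof out->host, p, host_end);
    p = host_end;

    /* Port: optional, introduced by ':'. Assumed to be 80 when omitted. */
    strcpy(out->port, "80");
    if (*p == ':') {
        ++p;
        const char *port_end = p + strcspn(p, "/#");
        copy_span(out->port, sizeof out->port, p, port_end);
        p = port_end;
    }

    /* Path: optional, everything after the first '/' up to any '#'. */
    out->path[0] = '\0';
    if (*p == '/') {
        ++p;
        const char *path_end = p + strcspn(p, "#");
        copy_span(out->path, sizeof out->path, p, path_end);
        p = path_end;
    }

    /* Hash: optional, everything after '#'. */
    out->hash[0] = '\0';
    if (*p == '#')
        copy_span(out->hash, sizeof out->hash, p + 1, p + strlen(p));

    return 0;
}
```

Calling split_url() on the example URL should yield the protocol http, the host www.example.com, the port 80, the path res/page1.php?user=bob, and the hash account. Note that the hash is never sent to the web server; it only has meaning to the client.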
Implementing a web client
We will now implement an HTTP web client. This client takes as input a URL. It then attempts to connect to the host and retrieve the resource given by the URL. The program displays the HTTP headers that are sent and received, and it attempts to parse out the requested resource content from the HTTP response.
Our program begins by including the chapter header, chap06.h:
/*web_get.c*/
#include "chap06.h"
We then define a constant, TIMEOUT. Later in our program, if an HTTP response is taking more than TIMEOUT seconds to complete, then our program abandons the request. You can define TIMEOUT as you like, but we give it a value of five seconds here:
/*web_get.c continued*/
#define TIMEOUT 5.0
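To make the role of TIMEOUT concrete, here is one possible way such a check could be written. This is a sketch under assumptions rather than the code web_get.c actually uses: the response_timed_out() helper is hypothetical, and it measures elapsed wall-clock time with time() and difftime(), which gives only whole-second resolution on most systems.

```c
#include <time.h>

#define TIMEOUT 5.0

/* Hypothetical helper: returns nonzero once more than `timeout` seconds
   have elapsed between `start` and `now`. */
int response_timed_out(time_t start, time_t now, double timeout) {
    return difftime(now, start) > timeout;
}
```

A client could record time_t start = time(NULL) just after sending its request and then call response_timed_out(start, time(NULL), TIMEOUT) on each pass through its receive loop, abandoning the request when the call returns nonzero.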
Now, please include the entire parse_url() function as given in the previous section. Our client needs parse_url() to find the hostname, port number, and document path from a given URL.
Another helper function is used to format and send the HTTP request. We call it send_request(), and...
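As a rough illustration of the formatting half of that job, the following sketch builds a GET request into a caller-supplied buffer. It is not the chapter's send_request() function: the format_get_request() name, its parameters, and the User-Agent value are all hypothetical, and the path is assumed to be given without its leading slash.

```c
#include <stdio.h>

/* Hypothetical helper: writes a GET request for /path on host:port into buf.
   Returns the request length, or -1 if buf is too small. */
int format_get_request(char *buf, size_t buf_size,
                       const char *host, const char *port, const char *path) {
    int n = snprintf(buf, buf_size,
        "GET /%s HTTP/1.1\r\n"
        "Host: %s:%s\r\n"
        "Connection: close\r\n"
        "User-Agent: example-client 1.0\r\n"  /* placeholder value */
        "\r\n",                               /* blank line ends the header */
        path, host, port);

    /* snprintf reports the length it wanted to write; detect truncation. */
    if (n < 0 || (size_t)n >= buf_size) return -1;
    return n;
}
```

Every header line ends with a carriage return and newline, and an empty line marks the end of the HTTP header. Once the buffer is filled in, the returned length is the number of bytes to pass to send() on the connected socket.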
An HTTP POST request sends data from the web client to the web server. Unlike an HTTP GET request, a POST request includes a body containing data (although this body could be zero-length).
The POST body format can vary, and it should be identified by a Content-Type header. Many modern, web-based APIs expect a POST body to be JSON encoded.
Consider the following HTTP POST request:
POST /orders HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0)
Content-Type: application/json
Content-Length: 56
Connection: close

{"symbol":"VOO","qty":"10","side":"buy","type":"market"}
In the preceding example, you can see that the HTTP POST request is similar to an HTTP GET request. Notable differences are as follows: the request starts with POST instead of GET; a Content-Type header field is included; a Content-Length header field is present; and an HTTP message body is included. In that example, the HTTP message body is in JSON format, as specified by...
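A request like the one above could be assembled in C along the following lines. This is a minimal sketch, not code from this chapter: the format_post_request() name and its parameters are hypothetical, the path is assumed to be given without its leading slash, and the body is assumed to be a NUL-terminated JSON string. The key point is that Content-Length is computed from the body rather than typed in by hand.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a JSON POST request for /path into buf.
   Returns the request length, or -1 if buf is too small. */
int format_post_request(char *buf, size_t buf_size, const char *host,
                        const char *path, const char *json_body) {
    int n = snprintf(buf, buf_size,
        "POST /%s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: %lu\r\n"   /* body size in bytes */
        "Connection: close\r\n"
        "\r\n"                      /* blank line separates header and body */
        "%s",
        path, host, (unsigned long)strlen(json_body), json_body);

    /* snprintf reports the length it wanted to write; detect truncation. */
    if (n < 0 || (size_t)n >= buf_size) return -1;
    return n;
}
```

Passing the JSON body from the example request produces a Content-Length of 56, matching the header shown earlier.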
HTTP is the protocol that powers the World Wide Web. It is behind every web page, every link click, every graphic loaded, and every form submitted. In this chapter, we saw that HTTP is a text-based protocol that runs over a TCP connection. We learned the HTTP formats for both client requests and server responses.
In this chapter, we also implemented a simple HTTP client in C. This client had a few non-trivial tasks: parsing a URL, formatting a GET request HTTP header, waiting for a response, and parsing the received data out of the HTTP response. In particular, we looked at handling two different methods of parsing out the HTTP body. The first, and easiest, method was Content-Length, where the entire body length is explicitly specified. The second method was chunked encoding, where the body is sent as separate chunks, which our program had to delineate between.
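As a reminder of what delineating chunks involves, here is a compact sketch of a chunked-body decoder. It is not the parsing code from our client: the decode_chunked() name is hypothetical, and it assumes the complete chunked body is already in memory as a NUL-terminated string, which rules out binary content. Each chunk is a hexadecimal length on its own line, followed by that many bytes of data and a line ending; a length of zero marks the end of the body.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes a complete chunked body `in` into `out`.
   Returns the decoded length, or -1 on malformed input or if out is too small. */
long decode_chunked(const char *in, char *out, size_t out_size) {
    size_t total = 0;
    if (out_size == 0) return -1;

    for (;;) {
        /* Each chunk starts with its length written in hexadecimal. */
        char *end;
        unsigned long chunk_len = strtoul(in, &end, 16);
        if (end == in) return -1;              /* no hex digits found */

        /* Skip the rest of the length line (it may carry extensions). */
        const char *eol = strstr(end, "\r\n");
        if (!eol) return -1;
        in = eol + 2;

        if (chunk_len == 0) break;             /* zero-length chunk: done */

        /* The chunk data plus its trailing line ending must be present. */
        size_t avail = strlen(in);
        if (avail < 2 || chunk_len > avail - 2) return -1;
        if (chunk_len >= out_size - total) return -1;  /* keep room for NUL */

        memcpy(out + total, in, chunk_len);
        total += chunk_len;
        in += chunk_len;

        if (in[0] != '\r' || in[1] != '\n') return -1;
        in += 2;
    }

    out[total] = '\0';
    return (long)total;
}
```

For example, decoding the body "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" yields the nine-byte string Wikipedia. A real client usually has to decode incrementally as data arrives, which is why this task is harder than the Content-Length case.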
We also briefly looked at POST requests and the content formats associated with them.
In the next chapter, Chapter 7...
Try these questions to test your knowledge from this chapter:
- Does HTTP use TCP or UDP?
- What types of resources can be sent over HTTP?
- What are the common HTTP request types?
- What HTTP request type is typically used to send data from the server to the client?
- What HTTP request type is typically used to send data from the client to the server?
- What are the two common methods used to determine an HTTP response body length?
- How is the HTTP request body formatted for a POST-type HTTP request?
The answers to these questions can be found in Appendix A, Answers to Questions.
For more information about HTTP and HTML, please refer to the following resources: