Learning Python Networking - Second Edition

By José Manuel Ortega , Dr. M. O. Faruque Sarker , Sam Washington

About this book

Network programming has always been a demanding task. With full-featured and well-documented libraries all the way up the stack, Python makes network programming the enjoyable experience it should be.

Starting with a walkthrough of today's major networking protocols, this book shows you how to employ Python for network programming, how to request and retrieve web resources, and how to extract data in major formats over the web. You will utilize Python for emailing using different protocols, and you'll interact with remote systems and IP and DNS networking. You will cover the connection of networking devices and configuration using Python 3.7, along with cloud-based network management tasks using Python.

As the book progresses, socket programming will be covered, followed by how to design servers, and the pros and cons of multithreaded and event-driven architectures. You'll develop practical clientside applications, including web API clients, email clients, SSH, and FTP. These applications will also be implemented through existing web application frameworks.

Publication date:
March 2019
Publisher
Packt
Pages
490
ISBN
9781789958096

 

Chapter 1. Network Programming with Python

This book will focus on writing programs for networks that use the Internet Protocol (IP) suite. Why have we chosen to do this? Well, out of the sets of protocols that are supported by the Python standard library, the Transmission Control Protocol (TCP)/IP suite is by far the most widely used. It contains the principal protocols that are used by the internet. By learning to program for TCP/IP, you'll be learning how to potentially communicate with just about every device that is connected to this great tangle of network cables and electromagnetic waves.

The following topics will be covered in this chapter:

  • An introduction to TCP/IP networking
  • Protocol concepts and the problems that protocols solve
  • Addressing
  • Creating RESTful web applications and working with Flask and HTTP requests
  • Integrating Flask with the SQLAlchemy database

In this chapter, we will be looking at some concepts and methods related to networks and network programming in Python, which we'll be using throughout this book.

This chapter has two sections. The first section, An introduction to TCP/IP networking, offers an introduction to essential networking concepts, with a strong focus on the TCP/IP stack. We'll be looking at what comprises a network, how the IP allows data transfer across and between networks, and how TCP/IP provides us with services that help us to develop network applications. This section is intended to provide a grounding in these essential areas and to act as a point of reference for them. If you're already comfortable with concepts such as IP addresses, routing, TCP and User Datagram Protocol (UDP), and protocol stack layers, then you may wish to skip to the second section, Python network programming through libraries.

In the second part, we'll look at the way in which network programming is approached with Python. This chapter provides a review of basic network elements and principles, as well as a discussion of how Python supports network programming with an overview of key libraries. Finally, we will introduce you to Wireshark, a protocol exploration and network programming diagnostic tool. We will also look at how we can interact with Wireshark from Python with the pyshark module.

 

Technical requirements


Before you start reading this book, you should already know the basics of Python programming, such as the basic syntax, variable types, data types such as tuples, lists, and dictionaries, functions, strings, and methods. At the time of writing this book, versions 3.7.2 and 2.7.15 are available at python.org/downloads. In this book, we will work with version 3.7 for code examples and installing packages.

The examples and source code for this chapter are available in this book's GitHub repository in the Chapter01 folder: https://github.com/PacktPublishing/Learning-Python-Networking-Second-Edition.

 

An introduction to TCP/IP networking


This first section offers an introduction to essential networking concepts, with a strong focus on the TCP/IP stack.

The following discussion is based on Internet Protocol version 4 (IPv4). Since the internet has run out of IPv4 addresses, a new version, IPv6, has been developed, which is intended to resolve this situation. However, although IPv6 is being used in a few areas, its deployment is progressing slowly and the majority of the internet will likely be using IPv4 for a while longer. We'll focus on IPv4 in this section, and then we will discuss the relevant changes in the IPv6 section of this chapter.

Introduction to TCP/IP

TCP/IP is a set of protocols that were designed to work together to provide an end-to-end transmission of messages across interconnected networks. TCP provides transparent data transfers between end systems using the services of the lower network layer to move the packets between the two communicating systems. TCP is a protocol that works at the transport layer, while IP works at the network layer.

TCP is responsible for creating connections over which a flow of data can be exchanged. It guarantees that the data is delivered to the destination without errors and in the same order in which it was sent. Through port numbers, it is also used to distinguish different applications on the same device.

IP is responsible for sending and receiving data in blocks. It tries to move each block along the best available route, but it does not guarantee that the block reaches the destination.

Both protocols are used to solve the problem of transmitting data across a network, either internally or externally. Combined, they allow information to be routed toward its destination and to arrive there complete and in the correct order.

The protocol stack, layer by layer

A protocol stack is organized in such a way that the highest level of communication resides in the top layer. Each layer in the stack is built on the services of the immediate lower layer.

The TCP/IP protocol stack has four layers, as follows:

  • Application layer: This layer manages the high-level protocols, including representation, coding, and dialogue control issues. It handles everything related to applications, and the data is packed appropriately for the next layer. It is a user process that cooperates with other processes on the same host or a different one. Examples of protocols at this layer are TELNET, File Transfer Protocol (FTP), and Simple Mail Transfer Protocol (SMTP).
  • Transport layer: This layer handles quality of service, reliability, flow control, and error correction. One of its protocols is the TCP, which provides reliable network communications that are oriented to the connection, unlike UDP, which is not connection oriented. It also provides data transfer. Example protocols include TCP (connection oriented) and UDP (non-connection oriented).
  • Network layer: Also called the internet layer, its purpose is to send packets from a source on any network and make them reach their destination, regardless of the route they take to get there.
  • Network access layer: This is also called the host-to-network layer. It includes the LAN and WAN protocols, and covers the details of the physical and data link layers of the OSI model. Also known as the link layer or network interface layer, it is the interface to the actual network hardware.

The following diagram represents the TCP/IP protocol stack:

The IP is the most important protocol of the network layer. It is a non-connection oriented protocol that does not assume reliability of the lower layers. IP does not provide reliability, flow control, or error recovery. These functions must be provided by the upper level, in the transport layer with TCP as the transport protocol, or in the application layer if UDP is being used as the transport protocol. The message unit in an IP network is called an IP datagram. This is the basic unit of information that is transmitted from one side of the TCP/IP network to the other.

The application layer is where all of the user interaction with the computer and services occurs. As an example of this, any browser can work, even without the TCP/IP stack installed. Usually, we use browsers such as Google Chrome, Mozilla Firefox, Internet Explorer, and Opera for communicating with this layer.

When initiating a query for a remote document, the HTTP protocol is used. Each time we request a communication of this type, the browser interacts with the application layer, which, in turn, serves as an interface between the user's applications and the protocol stack, which will provide communication with the help of the lower layers.

The responsibilities of the application layer are to identify and establish the communication availability of the target destination, as well as to determine the resources for that communication to exist. Some of the protocols of the application layer are as follows:

  • FTP
  • HTTP
  • Post Office Protocol version 3 (POP3)
  • Internet Message Access Protocol (IMAP)
  • SMTP
  • Simple Network Management Protocol (SNMP)
  • TELNET—TCP/IP Terminal Emulation Protocol

UDP

UDP is a connectionless protocol. That is, when machine A sends packets to machine B, the flow is unidirectional. The data is transferred without warning machine B beforehand, and machine B receives the data without sending a confirmation back to machine A.

This is because UDP keeps no connection state and carries very little information about the sender. The recipient learns little about the sender beyond its IP address and source port. Let's have a look at some properties of the UDP protocol:

  • Unreliable: In UDP, there is no concept of packet retransmission. Therefore, when a UDP packet is sent, it is not possible to know whether the packet has reached its destination, since there is no error-correction mechanism.
  • Not ordered: The order in which packets are received is not guaranteed to match the order in which they were sent.
  • Datagrams: Each packet is delivered individually, and its integrity is only checked on arrival.
  • Lightweight and fast: The UDP protocol does not provide error recovery services, so it offers a direct way to send and receive datagrams through an IP network. It is used when speed is an important factor in the transmission of information, for example, when streaming audio or video.
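The properties above can be observed with a minimal sketch that sends one datagram over the loopback interface. The function name udp_send_and_receive is made up for this illustration, and the example assumes that the loopback address 127.0.0.1 is available. Note that no connection is opened before the data is sent:

```python
import socket


def udp_send_and_receive(payload):
    """Send one UDP datagram to a local receiver and return what arrived.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # Port 0 asks the operating system to pick any free port.
        receiver.bind(("127.0.0.1", 0))
        # Avoid waiting forever if the datagram is lost.
        receiver.settimeout(5)
        # No handshake: the datagram is simply sent to the receiver's address.
        sender.sendto(payload, receiver.getsockname())
        # recvfrom() returns the data and the (IP address, port) of the sender.
        data, sender_address = receiver.recvfrom(4096)
    return data, sender_address


data, sender_address = udp_send_and_receive(b"ping")
print(data, sender_address)
```

On a real network the datagram could be lost, which is why the sketch sets a timeout instead of assuming that recvfrom() will always return.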

TCP

The TCP protocol, unlike the UDP protocol, is connection oriented. When machine A sends data to machine B, machine B is informed of the arrival of this data and acknowledges its receipt.

Here, a checksum comes into play, which allows the receiver to verify the integrity of the transmitted data. If the received data is corrupted, it is discarded and not acknowledged, which causes the sender to send it again.

This protocol is one of the main protocols of the transport layer of the TCP/IP model, since it allows the application level to work with the data coming from the lower levels of the model as a reliable stream.

When TCP passes data to the IP protocol, IP wraps it in IP datagrams and sets the protocol field to 6, so that the receiver knows in advance that the protocol is TCP. Because TCP is connection oriented, it allows the two communicating machines to control the status of the transmission.
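As a quick check of the protocol numbers, Python's socket module exposes them as constants. This short sketch only prints the values; the variable names are chosen for this illustration:

```python
import socket

# Protocol numbers as exposed by Python's socket module.
TCP_PROTOCOL_NUMBER = int(socket.IPPROTO_TCP)
UDP_PROTOCOL_NUMBER = int(socket.IPPROTO_UDP)

print("TCP protocol number:", TCP_PROTOCOL_NUMBER)  # 6
print("UDP protocol number:", UDP_PROTOCOL_NUMBER)  # 17
```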

Programs running on different computers in a network can use TCP to create connections between them, over which they can send a flow of data. The protocol guarantees that the data will be delivered to its destination and, most importantly, that it arrives without errors and in the order in which it was transmitted.

On the basis of the preceding discussion, we can derive the properties of TCP:

  • Reliable: TCP keeps track of the data it has sent and, if a packet is lost, resends the segments that were not delivered on the first attempt.
  • Ordered: The messages are delivered in the order in which they were sent.
  • Heavyweight: TCP verifies that a connection can be established before any data is sent, for which it uses a three-way handshake made up of three packets, called SYN, SYN-ACK, and ACK.
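To see the connection-oriented and ordered behavior in practice, the following sketch opens a TCP connection over the loopback interface, sends several chunks, and reads them back as one ordered stream. The function name tcp_roundtrip is invented for this example, and the sketch assumes small payloads that fit in the socket buffers. The handshake itself is carried out by the operating system when the client connects:

```python
import socket


def tcp_roundtrip(chunks):
    """Send chunks over a local TCP connection and return the bytes received.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        server.listen(1)
        server.settimeout(5)
        # create_connection() triggers the SYN, SYN-ACK, ACK handshake.
        with socket.create_connection(server.getsockname(), timeout=5) as client:
            connection, _ = server.accept()
            with connection:
                connection.settimeout(5)
                for chunk in chunks:
                    client.sendall(chunk)
                # Signal that the client has nothing more to send.
                client.shutdown(socket.SHUT_WR)
                received = bytearray()
                while True:
                    data = connection.recv(4096)
                    if not data:  # empty bytes: the sender closed its side
                        break
                    received.extend(data)
    return bytes(received)


print(tcp_roundtrip([b"one ", b"two ", b"three"]))
```

TCP is a byte stream, so the receiver sees the chunks joined together in sending order rather than as three separate messages.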
 

Protocol concepts and the problems that protocols solve


This section explains concepts regarding IP addresses and ports, network interfaces in a local machine, and other concepts related to protocols, such as Dynamic Host Configuration Protocol (DHCP) and DNS.

IP addresses and ports

IP addresses are addresses that help to uniquely identify a device over the internet. A port is an endpoint for communication in an operating system.

When you connect to the internet, your device is assigned a public IP address, and each website you visit also has a public IP address. So far, we have used IPv4 as an addressing system. The main problem with this is that the internet is running out of IPv4 public address space and so it is necessary to introduce IPv6, which provides a larger address space. 

The total sizes of the IPv4 and IPv6 address spaces are as follows:

  • Total IPv4 space: 4,294,967,296 addresses
  • Total IPv6 space: 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses

Ports are numerical values (between 0 and 65,535) that are used to identify the processes that are communicating. At each end, each process that takes part in the communication uses a single port to send and receive data.

Taken together, the two pairs of IP address and port identify the two processes that are communicating in a TCP/IP network. A system might be running thousands of services, but to uniquely identify a service on a system, the application requires a port number.
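The idea that a connection is identified by two (IP address, port) pairs can be illustrated with a small loopback sketch. The function name connection_endpoints is an assumption made for this example; getsockname() and getpeername() are standard socket methods that return the local and remote endpoint of a socket:

```python
import socket


def connection_endpoints():
    """Open a local TCP connection and report both endpoints from each side.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server.settimeout(5)
        with socket.create_connection(server.getsockname(), timeout=5) as client:
            connection, _ = server.accept()
            with connection:
                return {
                    "client_local": client.getsockname(),
                    "client_remote": client.getpeername(),
                    "server_local": connection.getsockname(),
                    "server_remote": connection.getpeername(),
                }


endpoints = connection_endpoints()
for name, (ip_address, port) in endpoints.items():
    print(name, ip_address, port)
```

The local endpoint seen by the client is the remote endpoint seen by the server, and vice versa.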

Port numbers are sometimes seen on the web or other URLs as well. By default, HTTP uses port 80, and HTTPS uses port 443, but a URL like http://www.domain.com:8080/path/ specifies that the web browser, instead of using default port 80, is connecting to port 8080 of the HTTP server.
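As a small sketch of how a port can be read from a URL, the standard urllib.parse module can split the URL into its parts. The DEFAULT_PORTS table and the effective_port function are names introduced for this illustration only:

```python
from urllib.parse import urlsplit

# Default ports for the two schemes mentioned in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a client would connect to for the given URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port such as :8080
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.domain.com:8080/path/"))
print(effective_port("https://www.domain.com/path/"))
```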

Some common ports are as follows:

  • 22: Secure Shell (SSH)
  • 23: Telnet remote login service
  • 25: SMTP
  • 53: Domain Name System (DNS) service
  • 80: HTTP

Regarding IP addresses, we can differentiate two types, depending on whether they belong to a public range or to a private range used for the internal network of an organization:

  • Private IP address: Ranges from 192.168.0.0 to 192.168.255.255, 172.16.0.0 to 172.31.255.255, or 10.0.0.0 to 10.255.255.255
  • Public IP address: A public IP address is an IP address that your home or business router receives from your Internet Service Provider (ISP)
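The private ranges listed above can be checked with the standard ipaddress module. This is a minimal sketch; the names PRIVATE_NETWORKS and in_private_range are chosen for this example, and the three ranges are written in CIDR notation:

```python
import ipaddress

# The three private ranges from the list above, in CIDR notation.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0 to 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 to 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0 to 192.168.255.255
]


def in_private_range(address):
    """Return True if the IPv4 address falls in one of the private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


for candidate in ("192.168.1.10", "172.31.255.255", "172.32.0.1", "8.8.8.8"):
    print(candidate, in_private_range(candidate))
```

The ipaddress module also offers an is_private attribute, but it covers additional reserved ranges, so the explicit list is used here to match the text.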

Network interfaces

You can find out what IP addresses have been assigned to your computer by running ip addr in a Terminal on Linux, or ipconfig /all on Windows systems.

If we run one of these commands, we will see that the IP addresses are assigned to our device's network interfaces. On Linux, these will have names, such as eth0; on Windows, these will have phrases, such as Ethernet adapter Local Area Connection.

You will get the following output when you run the ip addr command on Linux:

You will get the following options when you run the ipconfig command on Windows:

You will get IP addresses for the interfaces in your local machine when you run the ip addr command:

Every device has a virtual interface called the loopback interface, which you can see in the preceding listing as interface 1. This interface doesn't actually connect to anything outside the device, and only the device itself can communicate with it. While this may sound a little redundant, it's actually very useful when it comes to local network application testing, and it can also be used as a means of inter-process communication. The loopback interface is often referred to as localhost, and it is almost always assigned the IP address 127.0.0.1.

UDP versus TCP

The main difference between TCP and UDP is that TCP is oriented to connections, where once the connection is established, the data can be transmitted in both directions, while UDP is a simpler internet protocol, without the need for connections.

Now, we have to analyze the differences according to certain features:

  • Differences in data transfer: TCP ensures the orderly and reliable delivery of a series of data from the user to the server and vice versa. UDP is not dedicated to point-to-point connections and does not verify the availability of whoever receives the data.
  • Reliability: TCP is more reliable because it acknowledges that each message was received and retransmits the packets that have been lost. UDP does not verify that the data has arrived, because it has no mechanism for tracking the connection or retransmitting packets.
  • Connection: TCP is a protocol that's oriented toward the congestion control of the network and the reliability of the frames, while UDP is a non-connection oriented protocol that's designed to establish a rapid exchange of packets without the need to know whether the packets are arriving correctly.
  • Transfer method: TCP reads data as a sequence and the message is transmitted in defined segments. UDP messages are data packets that are sent individually and their integrity is verified upon arrival.
  • How TCP and UDP work: A TCP connection is established through the process of starting and verifying a connection. Once the connection has been established, it is possible to start the data transfer, and once the transfer is complete, the connection is completed by closing the established virtual circuits. UDP provides an unreliable service and the data may arrive unordered, duplicated, or incomplete, and it doesn't notify either the sender or receiver. UDP assumes that corrections and error checking are not necessary, avoiding the use of resources in the network interface.
  • TCP and UDP applications: TCP is used mainly when you need reliable delivery with error correction, while UDP is mainly used in applications based on small requests from a large number of clients, for example, DNS and Voice Over IP (VoIP).

DHCP

IP addresses can be assigned to a device by a network administrator in one of two ways: statically, where the device's operating system is manually configured with the IP address, or dynamically, where the device's operating system is configured by using the DHCP.

When using DHCP, as soon as the device first connects to a network, it is automatically allocated an address by a DHCP server from a predefined pool. Some network devices, such as home broadband routers, provide a DHCP server service out of the box; otherwise, a DHCP server must be set up by a network administrator. DHCP is widely deployed, and it is particularly useful for networks where different devices may frequently connect and disconnect, such as public Wi-Fi hotspots or mobile networks.

DHCP environments require a DHCP server that's been configured with the appropriate parameters for the proposed network. The main DHCP parameters include the range or pool of available IP addresses, the correct subnet masks, and the gateway and server name addresses.

A DHCP server dynamically allocates IP addresses instead of having to depend on the static IP address and is responsible for assigning, leasing, reallocating, and renewing IP addresses. The protocol will assign an address that is available in a subnet or pool. This means that a new device can be added to a network without you having to manually assign it a unique IP address. DHCP can also combine static and dynamic IPs, and also determines how long an IP address is assigned to a device.

When a computer in a network wants to obtain a valid network configuration, usually when starting up the machine, it issues a DHCP Discover request. When this request—which is made through a UDP broadcast packet—reaches a DHCP server, a negotiation is established whereby the server grants the use of an IP, and other network parameters, to the client for a certain time.

It is important to take note of the following:

  • The client does not need to have the network interface configured to issue a DHCP Discover request.
  • The DHCP server can be on the same or a different subnet as the client will be on. If the client does not have network configuration, it cannot reach other subnets.
  • When the DHCP server receives the DHCP Discover request, it obtains the MAC address of the client, which may affect the IP address assigned to the client.
  • The DHCP server grants network configuration to the client for a certain time, known as a lease. Before reaching the deadline, the client may try to renew the lease. If the lease expires without being renewed, the client must stop using the network configuration.

To make a DHCP request, you can use a client such as dhclient (native GNU/Linux) or the ipconfig /renew command (in the case of Windows). When a network configuration is obtained, the client uses it:

DNS

DNS allows for the association of domain names with IP addresses, which greatly facilitates access to the machines on the network. Without DNS, referring to a machine implies remembering its IP address. Working directly with IP addresses is not comfortable, because they are difficult to remember and because the IP address of a machine can vary for different reasons. Whoever uses the domain name does not need to worry about these changes (although the DNS server must know the real IP in each case).

The domain name system is a distributed and hierarchical database, and although its main function is to associate domain names with IP addresses, it can also store other information. The DNS service is one of the pillars of the network, so its availability must be absolute. To achieve this, redundant servers are used and extensive caching is used to improve their performance.

The nslookup tool comes with most Linux and Windows systems and lets us query DNS on the command line, as follows:

We can use this command to request the IP address for the packtpub.com domain:

With this command, we determined that the packtpub.com host has the IP address 83.166.169.231. DNS distributes the work of looking up hostnames by using a hierarchical system of caching servers. Internet DNS services are a set of databases that are scattered on servers around the world. These databases indicate the IP address that is associated with the name of a website. When we enter an address in the browser, for example, packtpub.com, the computer asks the DNS servers of the internet provider to find the IP address associated with packtpub.com. If the servers do not have that information, a search is made with other servers that may have it.
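A similar lookup can be sketched from Python with socket.getaddrinfo(), which asks the system resolver for the addresses of a name. The resolve function is a name invented for this example. The demonstration call uses a numeric address so that it works without network access; calling resolve("packtpub.com") would need a working DNS configuration, and the address returned may differ from the one shown above:

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for a hostname.

    Hypothetical helper for illustration only.
    """
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the IP address is the first element of sockaddr.
    return sorted({sockaddr[0] for *_, sockaddr in results})


# A numeric address is returned unchanged and needs no DNS query.
print(resolve("127.0.0.1"))
```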

When we run our preferred browser and write a web address in its address bar to access the content that's hosted on the site, the DNS service will translate these names into elements that can be understood and used for the equipment and systems that make up the internet.

On Windows computers, this system is configured by default to automatically use the DNS server of our internet service provider. We can also choose alternative DNS providers such as OpenDNS, UltraDNS, or Google DNS, but we must always check that these providers offer at least minimum security guarantees for browsing. More information about configuration using Google DNS can be found at the following URL: https://developers.google.com/speed/public-dns/.

 

Addressing


This section explains concepts regarding the Network Address Translation (NAT) protocol and introduces the differences between the IPv4 and IPv6 formats.

NAT

This mechanism makes the traffic from the private network appear to be coming from a single valid public internet address, which effectively hides the private addresses from the internet. If you inspect the output of the ip addr or ipconfig /all commands, then you will find that your devices are using private range addresses, which would have been assigned to them by your DHCP server or by your router through dynamic DHCP address assignment.

The private address ranges that are usually assigned are as follows:

  • 10.0.0.0 to 10.255.255.255
  • 172.16.0.0 to 172.31.255.255
  • 192.168.0.0 to 192.168.255.255

The idea is simple: make computer networks use a range of private IP addresses and connect to the internet using a single public IP address. Thanks to this workaround, large companies can use just one public IP address instead of as many public addresses as there are machines in the company. It is also used to connect home networks to the internet.

There are two types of operations with NAT:

  • Static: A private IP address is always translated into the same public IP address. This mode of operation would allow a host within the network to be visible from the internet.
  • Dynamic: The router is assigned several public IP addresses so that each private IP address is mapped using one of the public IP addresses that the router has assigned. This is done so that each private IP address corresponds to at least one public IP address.

In dynamic mode, each time a host requires an internet connection, the router will assign a public IP address that is not being used. This increases security, because it is more difficult for an external host to enter the network when the public IP addresses are constantly changing.
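The dynamic mode can be pictured with a toy model of the translation table. This is only a sketch of the idea and not how a real router is implemented; the DynamicNat class, its methods, and the example addresses are all hypothetical (203.0.113.0/24 is a range reserved for documentation):

```python
class DynamicNat:
    """Toy model of dynamic NAT: private addresses borrow unused public ones."""

    def __init__(self, public_pool):
        self.free_addresses = list(public_pool)  # public addresses not in use
        self.table = {}                          # private IP -> public IP

    def translate(self, private_ip):
        """Return the public address for a host, assigning one if needed."""
        if private_ip not in self.table:
            if not self.free_addresses:
                raise RuntimeError("no public address available")
            self.table[private_ip] = self.free_addresses.pop(0)
        return self.table[private_ip]

    def release(self, private_ip):
        """Return a host's public address to the pool when it disconnects."""
        public_ip = self.table.pop(private_ip, None)
        if public_ip is not None:
            self.free_addresses.append(public_ip)


nat = DynamicNat(["203.0.113.1", "203.0.113.2"])
print(nat.translate("192.168.0.10"))
print(nat.translate("192.168.0.11"))
nat.release("192.168.0.10")
print(nat.translate("192.168.0.12"))
```

A static mapping would correspond to a table that is filled in once and never changes.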

IPv4

IPv4 is the technology that allows computers to connect to the internet, whatever device we use. Each of these devices, at the moment it connects to the internet, gets a unique address so that it can send and receive data over its connections.

As we already know, the IPv4 protocol uses addresses that are 32 bits in length. With this type of architecture, it can manage approximately 4.3 billion IPs around the world, but the explosion of internet users in recent years has meant that the system has reached its maximum capacity and cannot support more IP addresses.

The IPv4 address space is limited to 4.3 billion addresses. To obtain this number, we can decompose an IPv4 address, which is a 32-bit number, into four groups of 8 bits. Each group of 8 bits, or octet, has 256 different combinations. This means that the possible values of an octet in an IP address are in the range of 0 to 255.

To obtain the total number of IPv4 addresses, it is enough to multiply 256 * 256 * 256 * 256, since an IPv4 address is composed of four sections with 256 possibilities in each section. In total, we have 4,294,967,296 addresses. In IPv4, the universe of addresses is divided into ranges or classes, as follows:

  • CLASS A: 1.0.0.0-126.255.255.255
  • CLASS B: 128.0.0.0-191.255.255.255
  • CLASS C: 192.0.0.0-223.255.255.255
  • CLASS D: 224.0.0.0-239.255.255.255 (Multicast)
  • CLASS E: 240.0.0.0-254.255.255.255 (Experimental)

By definition, multicast and experimental addresses cannot be used as source addresses, so 520,093,696 addresses must be subtracted from the previous number. Within the different classes, we also have network 0.0.0.0 (the identifier of all IPv4 networks), network 127.0.0.0 (used for loopback addresses in network equipment), and network 255.0.0.0 (which includes the broadcast addresses of all networks). Each of these networks contains 16,777,216 addresses, which must also be removed from the total.
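The arithmetic in this section can be reproduced with a few lines of Python. The variable names are chosen for this sketch, and the Class E range is taken as listed above (first octet from 240 to 254):

```python
import ipaddress

# Four octets with 256 possible values each.
total_ipv4 = 256 * 256 * 256 * 256
assert total_ipv4 == 2 ** 32

# A network identified only by its first octet holds 2**24 addresses.
addresses_per_first_octet = 2 ** 24

# Class D (multicast) covers first octets 224 to 239, which is 224.0.0.0/4.
class_d = ipaddress.ip_network("224.0.0.0/4").num_addresses

# Class E (experimental) as listed above covers first octets 240 to 254.
class_e = (254 - 240 + 1) * addresses_per_first_octet

multicast_and_experimental = class_d + class_e

print(f"{total_ipv4:,}")                  # 4,294,967,296
print(f"{addresses_per_first_octet:,}")   # 16,777,216
print(f"{multicast_and_experimental:,}")  # 520,093,696
```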

Due to this, the need to find a replacement was palpable, and it fell to the IPv6 protocol, the sixth revision of IP and the natural successor of IPv4, to create more addresses.

IPv6

IPv6 addresses have a length of 128 bits, and so the total number of addresses is 2 raised to the power of 128. Each IPv6 address consists of eight groups of 16 bits, separated by colons (:), and expressed in hexadecimal notation.

Unlike IPv4, in which addresses consist of four groups of decimal digits ranging from 0 to 255, IPv6 addresses contain eight groups of four hexadecimal digits. fe80::e53f:e43b:ad07:9cab is an example of an IPv6 address, where the double colon (::) abbreviates a run of groups that are all zero.
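The standard ipaddress module can be used to inspect the example address. This short sketch expands the abbreviated form into its eight groups and confirms the size of the IPv6 address space; the variable names are chosen for this illustration:

```python
import ipaddress

address = ipaddress.ip_address("fe80::e53f:e43b:ad07:9cab")

# The exploded form writes out all eight groups of four hexadecimal digits.
expanded = address.exploded
groups = expanded.split(":")

# The whole IPv6 address space, ::/0, contains 2**128 addresses.
total_ipv6 = ipaddress.ip_network("::/0").num_addresses

print(address.version)  # 6
print(expanded)         # fe80:0000:0000:0000:e53f:e43b:ad07:9cab
print(len(groups))      # 8
print(f"{total_ipv6:,}")
```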

With the ipconfig command on a Windows machine, we can see an example configuration:

 

Python network programming through libraries


In this section, we're going to look at a general approach to network programming in Python. We'll be introducing the main standard library modules and look at some examples to see how they relate to the TCP/IP stack.

An introduction to the PyPI Python repository

The Python Package Index, or PyPI, which can be found at https://pypi.python.org, is the official software repository for third-party applications in the Python programming language. Python developers want it to be a comprehensive catalog of all Python packages written in open source code.

To download packages from the PyPI repository, you can use several tools, but in this section, we will explain how to use the pip command to do so. pip is the official package installer that comes already installed when you install Python on your local machine.

You can find all of the Python networking libraries in the Python PyPI repository, such as requests (https://pypi.org/project/requests) and urllib3 (https://pypi.org/project/urllib3).

Installing a package using pip is very simple—just execute pip install <package_name>; for example, pip install requests. We can also install pip using the package manager of a Linux distribution. For example, in a Debian or Ubuntu distribution, we can use the apt-get command:

$ sudo apt-get install python3-pip

Alternatives to pip for installing packages

We can use alternatives such as conda and Pipenv for the installation of packages in Python. Other tools, such as virtualenv, exist to isolate the packages that are installed for each project.

Conda

Conda is another way in which you can install Python packages, though its development and maintenance is provided by a separate company, Anaconda. An advantage of the Anaconda distribution is that it comes with over 100 very popular Python packages, so you can start working in Python straight away. You can download conda from the following link: https://www.anaconda.com/download/.

Installing packages with conda is just as easy as with pip—just run conda install <package_name>; for example, conda install requests.

The conda repository is independent of the official Python repository and does not contain all of the Python packages that are in PyPI, but you will find all of the Python networking libraries such as requests (https://anaconda.org/anaconda/requests), urllib, and socket.

Virtualenv

virtualenv is a Python tool for creating virtual environments. To install it, you just have to run pip install virtualenv. With this, you can start creating virtual environments, for example, virtualenv ENV. Here, ENV is the directory in which the virtual environment will be created, and it includes a separate Python installation. For more information, see the complete guide, which includes information on how to activate the environments: https://virtualenv.pypa.io.

Pipenv

Pipenv is a relatively new tool that modernizes the way Python manages dependencies, and includes a complete dependency resolver in the same way conda does for handling virtual environments, locking files, and more. Pipenv is an official Python program, so you just have to run pip install pipenv to install it. You can find an excellent guide for Pipenv in English here: https://realpython.com/pipenv-guide.

An introduction to libraries for network programming with Python

Python provides modules for interfacing with protocols at different levels in the network stack, and modules that support higher-layer protocols follow the aforementioned principle by using the interfaces that are supplied by the lower-level protocols.

Introduction to sockets

The socket module is Python's standard interface for the transport layer, and it provides functions for interacting with TCP and UDP, as well as for looking up hostnames through DNS. In this section, we will introduce you to this module. We'll learn much more about this in Chapter 10, Programming with Sockets.
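As a small illustration of the name lookup mentioned above, the following sketch resolves a hostname with socket.getaddrinfo(). The helper name resolve is hypothetical, and localhost is used only so that the example does not depend on an external DNS server.

```python
import socket


def resolve(hostname, port=80):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    # getaddrinfo() returns tuples of (family, type, proto, canonname, sockaddr);
    # the IP address is the first element of sockaddr for both IPv4 and IPv6.
    results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})


if __name__ == '__main__':
    print(resolve('localhost'))
```

Passing type=socket.SOCK_STREAM simply avoids getting one result per socket type for the same address.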

A socket is defined by the IP address of the machine, the port on which it listens, and the protocol it uses. In Python, the types and functions that are needed to work with sockets are found in the socket module.

Sockets are classified into stream sockets, socket.SOCK_STREAM, or datagram sockets, socket.SOCK_DGRAM, depending on whether the service uses TCP, which is connection oriented and reliable, or UDP, which is connectionless and does not guarantee delivery.
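To make the datagram case concrete, here is a minimal sketch that sends a single UDP datagram between two sockets on the loopback interface. The function name udp_roundtrip is an assumption made for illustration; note that there is no connect(), listen(), or accept() step, because UDP has no connection.

```python
import socket


def udp_roundtrip(message):
    """Send one datagram to a receiver on the loopback interface and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # do not wait forever if the datagram is lost
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == '__main__':
    print(udp_roundtrip(b'ping'))
```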

Sockets can also be classified according to their family. We have Unix sockets, socket.AF_UNIX, which were created before network sockets and are based on files; socket.AF_INET sockets, which are based on IPv4 network connections; and sockets related to connections with IPv6, socket.AF_INET6.

Socket module in Python

To create a socket, the socket.socket() constructor is used, which can take the family, type, and protocol as optional parameters. By default, the AF_INET family and the SOCK_STREAM type are used.

The general syntax is socket.socket(socket_family, socket_type, protocol=0), where the parameters are as follows:

  • socket_family: This is usually AF_INET (IPv4), AF_INET6 (IPv6), or AF_UNIX
  • socket_type: This is either SOCK_STREAM or SOCK_DGRAM
  • protocol: This is usually left out, defaulting to 0
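The defaults described above can be checked directly by creating sockets and inspecting their family and type attributes. This is only a sketch; the variable names are chosen for illustration.

```python
import socket

# With no arguments, the constructor falls back to AF_INET and SOCK_STREAM (TCP over IPv4)
with socket.socket() as default_sock:
    default_family = default_sock.family
    default_type = default_sock.type

# Passing the type explicitly gives a datagram (UDP) socket instead
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_sock:
    udp_type = udp_sock.type

print(default_family.name, default_type.name, udp_type.name)
```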

Client socket methods

To connect to a remote socket, we can use the connect() method, which takes the address as a single (host, port) tuple:

import socket

# a socket object is created for communication
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# now connect to the web server on port 80
client_socket.connect(("www.packtpub.com", 80))

Server socket methods

The following are some server socket methods; bind() and listen() are also shown in the following code:

  • bind(): With this method, we define the address and port on which our server will listen for connections
  • listen(backlog): This method puts the socket into listening mode; backlog is the maximum number of pending connections that can be queued before they are accepted
  • accept(): This method blocks until a client connects and then returns a new socket for that connection, along with the client's address
import socket

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# bind the socket to localhost on port 80
# (ports below 1024 usually require administrator privileges)
serversocket.bind(('localhost', 80))
# become a server socket and queue a maximum of 10 pending connections
serversocket.listen(10)
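To see bind(), listen(), accept(), and connect() working together, the following sketch runs a tiny echo server and a client in the same process over the loopback interface. The function names serve_once and echo_roundtrip are hypothetical, and the server binds to port 0 so that the operating system picks a free unprivileged port instead of port 80.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept a single connection and echo back whatever the client sends."""
    conn, _client_address = server_sock.accept()   # blocks until a client connects
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_roundtrip(message):
    """Start a one-shot echo server, send message to it, and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(('127.0.0.1', 0))   # port 0: any free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()
        # The client side: connect, send, and read the echoed bytes
        with socket.create_connection(('127.0.0.1', port), timeout=2) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
        return reply


if __name__ == '__main__':
    print(echo_roundtrip(b'hello'))
```

A single recv() call is enough here only because the message is tiny; for larger payloads, the reply would have to be read in a loop.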

Working with RFC

Requests for Comments, better known by the acronym RFC, are a series of publications of the Internet Engineering Task Force (IETF) that describe various aspects of the operation of the internet and other computer networks, such as protocols and procedures.

Each RFC is a monograph or memorandum that engineers or experts in the field have sent to the IETF, the most important technical collaboration consortium on the internet, so that it can be evaluated by the rest of the community.

RFCs cover a wide range of standards, and TCP/IP is just one of these. They are freely available on the IETF's website, which can be found at www.ietf.org/rfc.html. Each RFC has a number; IPv4 is documented by RFC 791, and other relevant RFCs will be mentioned as we progress throughout this book.

The most important internet protocols are defined by RFCs, such as IP, which is detailed in RFC 791, FTP in RFC 959, or HTTP in RFC 2616.

You can use the RFC Editor search service to search by RFC number or keyword. This can be found here: https://www.rfc-editor.org/search/rfc_search.php.

In the following screenshot, we can see the result of searching for RFC number 2616 for the HTTP protocol:

Extracting RFC information

The IETF landing page for RFCs is http://www.rfc-editor.org/rfc/, and reading through it tells us exactly what we want to know. We can access a text version of an RFC using a URL of the form http://www.rfc-editor.org/rfc/rfc741.txt. The RFC number in this case is 741. Therefore, we can get the text format of RFCs using HTTP.

At this point, we can build a Python script for downloading an RFC document from the IETF, and then display the information that's returned by the service. We'll make it a Python script that just accepts an RFC number, downloads the RFC in text format, and then prints it to stdout.

The main modules that we can find in Python to make HTTP requests are urllib and requests, which work at a high level. We can also use the socket module if we want to work at a low level.

Downloading an RFC with urllib

Now, we are going to write our Python script using the urllib module. For this, create a text file called RFC_download_urllib.py:

#!/usr/bin/env python3

import sys, urllib.request
try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)
template = 'http://www.rfc-editor.org/rfc/rfc{}.txt'
url = template.format(rfc_number)
rfc_raw = urllib.request.urlopen(url).read()
rfc = rfc_raw.decode()
print(rfc)

We can run the preceding code by using the following command:

$ python RFC_download_urllib.py 2324

This is the output of the previous script, where we can see the RFC description document:

First, we import our modules and check whether an RFC number has been supplied on the command line. Then, we construct our URL by substituting the supplied RFC number. Next, the main activity, the urlopen() call, will construct an HTTP request for our URL, and then it will connect to the IETF web server and download the RFC text. Next, we decode the text to Unicode, and finally we print it out to the screen.
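The script above assumes that the download always succeeds. As a hedged sketch of how the same steps might be wrapped with basic error handling, the following version separates URL construction from the download. The names build_rfc_url and fetch_rfc, and the 10-second timeout, are assumptions made for illustration and are not part of the original script.

```python
import urllib.error
import urllib.request

RFC_URL_TEMPLATE = 'http://www.rfc-editor.org/rfc/rfc{}.txt'


def build_rfc_url(rfc_number):
    """Substitute the RFC number into the URL template."""
    return RFC_URL_TEMPLATE.format(int(rfc_number))


def fetch_rfc(rfc_number, timeout=10):
    """Return the RFC text, or None if the download fails."""
    url = build_rfc_url(rfc_number)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8', errors='replace')
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example, 404)
        print('Server returned HTTP status', err.code)
    except OSError as err:
        # Covers URLError (DNS failure, refused connection) and socket timeouts
        print('Could not download the RFC:', err)
    return None
```

Calling fetch_rfc(2324) would then return the document text, or None with a short message if the server could not be reached.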

Downloading an RFC with requests

Now, we are going to create the same script but, instead of using urllib, we are going to use the requests module. For this, create a text file called RFC_download_requests.py:

#!/usr/bin/env python3

import sys, requests
try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)
template = 'http://www.rfc-editor.org/rfc/rfc{}.txt'
url = template.format(rfc_number)
rfc = requests.get(url).text
print(rfc)

With the requests module, the script becomes simpler. The main difference is that we use the get() method to make the request and access the text property of the response to get the content of the specific RFC.

Downloading an RFC with the socket module

Now, we are going to create the same script but, instead of using urllib or requests, we are going to use the socket module for working at a low level. For this, create a text file called RFC_download_socket.py:

#!/usr/bin/env python3

import sys, socket
try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)

host = 'www.rfc-editor.org'
port = 80
sock = socket.create_connection((host, port))

# build the HTTP request; the blank line at the end terminates the headers
req = ('GET /rfc/rfc{rfcnum}.txt HTTP/1.1\r\n'
       'Host: {host}:{port}\r\n'
       'User-Agent: Python {version}\r\n'
       'Connection: close\r\n'
       '\r\n')

req = req.format(rfcnum=rfc_number, host=host, port=port, version=sys.version_info[0])
sock.sendall(req.encode('ascii'))
rfc_bytes = bytearray()

# read until the server closes the connection
while True:
    buf = sock.recv(4096)
    if not buf:
        break
    rfc_bytes += buf
sock.close()
rfc = rfc_bytes.decode('utf-8')
print(rfc)

The main difference here is that we are using the socket module instead of urllib or requests. The socket module is Python's interface for the operating system's TCP and UDP implementation. We have to tell socket which transport layer protocol we want to use. We do this by using the socket.create_connection() convenience function. This function will always create a TCP connection. For establishing the connection, we are using port 80, which is the standard port number for web services over HTTP.

Next, we deal with the network communication over the TCP connection. We send the entire request string to the server by using the sendall() call. The data that's sent through TCP must be in raw bytes, so we have to encode the request text as ASCII before sending it.

Then, we piece together the server's response as it arrives in the while loop. Bytes that are sent to us through a TCP socket are presented to our application in a continuous stream. So, like any stream of unknown length, we have to read it iteratively. The recv() call will return an empty bytes object after the server sends all of its data and closes the connection. We use this as the condition for breaking out of the loop, after which we decode and print the response, which includes the HTTP status line and headers as well as the RFC text.
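Because the raw socket gives us the whole HTTP response, the printed text starts with the status line and headers. The following sketch shows one way the read loop and a simple header/body split could be written. The names recv_all, split_http_response, and demo are hypothetical, the sample response is invented for the demonstration, and the split assumes a plain (non-chunked) body. A connected socket pair stands in for the web server so that the example runs without network access.

```python
import socket


def recv_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection."""
    chunks = bytearray()
    while True:
        buf = sock.recv(bufsize)
        if not buf:          # empty bytes object: end of stream
            break
        chunks += buf
    return bytes(chunks)


def split_http_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # The headers end at the first blank line, that is, at CRLF CRLF
    head, _separator, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('iso-8859-1').split('\r\n')
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def demo():
    """Feed a made-up response through a local socket pair and parse it."""
    server_side, client_side = socket.socketpair()
    with server_side, client_side:
        server_side.sendall(b'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello')
        server_side.shutdown(socket.SHUT_WR)   # signal that no more data will follow
        return split_http_response(recv_all(client_side))


if __name__ == '__main__':
    print(demo())
```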

 

Interacting with Wireshark with pyshark


This section will help you refresh the basics of using Wireshark to capture packets, filter them, and inspect them. You can use Wireshark to analyze the network traffic of a suspicious program, analyze the traffic flow in your network, or solve network problems. We will also review the pyshark module for capturing packets in Python.

Introduction to Wireshark

Wireshark is a network packet analysis tool that captures packets in real time and displays them in a graphic interface. Wireshark includes filters, color coding, and other features that allow you to analyze network traffic and inspect packets individually.

Wireshark implements a wide range of filters that facilitate the definition of search criteria for the more than 1,000 protocols it currently supports. All of this happens through a simple and intuitive interface that allows each of the captured packets to be broken down into layers.

Because Wireshark understands the structure of these protocols, we can visualize the fields of each of the headers and layers that make up the packets, which gives the network administrator a wide range of possibilities when analyzing traffic.

One of the advantages of Wireshark is that we can leave it capturing data on a network for as long as we want and then store the capture so that we can perform the analysis later. It works on several platforms, such as Windows, OS X, Linux, and Unix.

Wireshark is also considered a protocol analyzer or packet sniffer, thus allowing us to observe the messages that are exchanged between applications. For example, if we capture an HTTP message, the packet analyzer must know that this message is encapsulated in a TCP segment, which, in turn, is encapsulated in an IP packet, and which, in turn, is encapsulated in an Ethernet frame.

Note

A protocol analyzer is a passive element, since it only observes the messages that are transmitted to and received from an element of the network, but never sends messages itself. Instead, a protocol analyzer receives a copy of the messages that are being received or sent by the host where it is running.

Wireshark is composed mainly of two elements: a packet capture library, which receives a copy of each data link frame that is either sent or received, and a packet analyzer, which shows the fields corresponding to each of the captured packets. To do this, the packet analyzer must know about the protocols that it is analyzing so that the information that's shown is consistent.

Wireshark installation

You can download the Wireshark tool from the official page: http://www.wireshark.org/download.html.

On Windows systems, we can install it by following the wizard in the Windows installer. On a Linux distribution based on the Debian operating system, such as Ubuntu, this is as easy as executing the apt-get command:

sudo apt-get install wireshark

One of the advantages of Wireshark is the filtering we can apply to the captured data. We can filter by protocol, source or destination IP address, range of IP addresses, port, or unicast traffic, among a long list of options. We can manually enter the filters in a box or select these filters from a default list.

Capturing packets with Wireshark

To start capturing packets, you can click on the name of an interface from the list of interfaces. For example, if you want to capture traffic on your Ethernet network, double-click on the Ethernet connection interface:

As soon as you click on the name of the interface, you will see that the packets start to appear in real time. Wireshark captures every packet that's sent to or from your system. You will see a continuous flood of data in the Wireshark dashboard. There are many ways to filter traffic:

  • To filter traffic from any specific IP address, type ip.addr == xxx.xx.xx.xx in the Apply a display filter field
  • To filter traffic for a specific protocol, such as TCP, UDP, SMTP, ARP, or DNS, just type the protocol name in lowercase (for example, tcp or dns) into the Apply a display filter field

We can use the Apply a display filter box to filter traffic from any IP address or protocol:

The graphical interface of Wireshark is mainly divided into the following sections:

  • The menu bar, where you have all the options that you can perform before and after the capture
  • The main toolbar, where you have the most frequently used options in Wireshark
  • The filter bar, where you can apply filters to the current capture quickly
  • The packet list, which shows a summary of each packet that is captured by Wireshark
  • The packet details panel that, once you have selected a packet in the packet list, shows detailed information about it
  • The packet byte panel, which shows the bytes of the selected packet, and highlights the bytes corresponding to the field that's selected in the packet details panel
  • The status bar, which shows some information about the current state of Wireshark and the capture

Network traffic in Wireshark

Network traffic or network data is the number of packets that are moving across a network at any given point in time. The following is a classical formula for obtaining the traffic volume of a network: Traffic volume = Traffic intensity (rate) × Time
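As a quick worked example of this formula, the following sketch computes the volume carried by a link that runs at a constant rate for a given time. The function name and the figures (100 Mbit/s for one minute) are invented for illustration.

```python
def traffic_volume_bits(rate_bits_per_second, seconds):
    """Traffic volume = traffic intensity (rate) * time."""
    return rate_bits_per_second * seconds


# A link carrying a steady 100 Mbit/s for 60 seconds
volume = traffic_volume_bits(100_000_000, 60)
megabytes = volume / 8 / 1_000_000   # 8 bits per byte, 10**6 bytes per megabyte
print(volume, 'bits =', megabytes, 'megabytes')
```

Under these assumptions, the link moves 6,000,000,000 bits, or 750 megabytes, in one minute.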

In the following screenshot, we can see what the network traffic looks like in Wireshark:

In the previous screenshot, we can see the information that is shown for each data packet that is sent over the network. It includes several pieces of information, including the following:

  • Time: The time at which packets are captured
  • Source: The source from which the packet originated
  • Destination: The sink where packets reach their final destination
  • Protocol: The type of protocol (or set of rules) that the packet followed during its journey, such as TCP, UDP, SMTP, or ARP
  • Info: The information that the packet contains

The Wireshark website contains samples for capture files that you can import into Wireshark. You can also inspect the packets that they contain: https://wiki.wireshark.org/SampleCaptures.

For example, we can find an HTTP section for downloading files that contains examples of HTTP requests and responses:

Color coding in Wireshark

When you start capturing packets, Wireshark uses colors to identify the types of traffic that can occur, among which we can highlight green for TCP traffic, blue for DNS traffic, and black for traffic that has errors at the packet level.

To see exactly what the color codes mean, click View | Coloring rules. You can also customize and modify the coloring rules in this screen.

If you need to change the color of one of the options, just double-click it and choose the color you want:

Working with filters in Wireshark

When we have captured a large amount of data, filters allow us to show only those packets that fit our search criteria. We can distinguish between capture filters and display filters, each of which is governed by its own syntax.

Capture filters rely directly on the libpcap library, which is also used by tools such as tcpdump or Snort, so they use its syntax to define the filters. For this reason, we can use Wireshark to open files that are generated by tcpdump or by other applications that make use of libpcap.

The most basic way to apply a filter is by typing its name into the filter box at the top of the window. For example, type dns and you will see only DNS packets.

The following is a screenshot of the dns filter:

You can also click on the Analyze menu and select Display Filters to see the filters that are created by default.

In the following screenshot, we can see the display filters that we can apply when capturing packets with Wireshark:

Filtering by protocol name

The filter box is very powerful, and you will realize its full potential when you filter by protocol. Some of the available protocol filters include tcp, http, pop, dns, arp, and ssl.

We can find out about HTTP requests by applying the http filter. In this way, we can know about all of the GET and POST requests that have been made during the capture. Wireshark displays the HTTP message that was encapsulated in a TCP segment, which was encapsulated in an IP packet and encapsulated in an Ethernet frame:

In the preceding screenshot, we can see how a GET request has been sent to the URL that was requested from the browser. After this, the web server where the page is hosted has answered successfully (200 OK), returning an HTTP message that contains the HTML code of the requested resource. It is the browser (application) that de-encapsulates the code and interprets it.

HTTP objects filter

As we can see, filters provide us with great traceability of communications and also serve as an ideal complement for analyzing a multitude of attacks. An example of this is the http.content_type filter, thanks to which we can extract the different data flows that take place in an HTTP connection (text/html, application/zip, audio/mpeg, image/gif). This is very useful for locating malware, exploits, or other types of attacks that are embedded in such a protocol:

Wireshark provides two types of filters, that is, capture filters and display filters:

  • Capture filters are set before a capture starts, so that only packets that meet the requirements indicated in the filter are captured
  • Display filters establish a filter criterion on the packets that have already been captured and that we are visualizing in the main screen of Wireshark

Capture filters

Capture filters are those that are set so that Wireshark captures only the packets that meet the requirements indicated in the filter. If we do not establish any, Wireshark will capture all of the traffic and present it on the main screen. Even so, we can set display filters to show us only the desired traffic:

Display filters

Display filters establish a filter criterion on the packets that we are capturing and visualizing in the main screen of Wireshark. When you apply a display filter on the Wireshark main screen, only the traffic that matches the filter will appear. We can also use it to filter the contents of a capture stored in a pcap file:

Analyzing networking traffic using the pyshark library

We can use the pyshark library to analyze the network traffic in Python, since everything Wireshark decodes in each packet is made available as a variable. We can find the source code of the tool in GitHub's repository: https://github.com/KimiNewt/pyshark.

In the PyPI repository, we can find the latest version of the library, that is, https://pypi.org/project/pyshark, and we can install it with the pip install pyshark command.

In the documentation of the module, we can see that the main package for opening and analyzing a pcap file is capture.file_capture:

Here's an example that was taken from pyshark's GitHub page. This shows us how, from the Python 3 command-line interpreter, we can read packets stored in a pcap file. This will give us access to attributes such as the packet number and complete information for each layer, such as its protocol, IP address, MAC address, and flags, where you can see if the packet is a fragment of another:

>>> import pyshark
>>> cap = pyshark.FileCapture('http.cap')
>>> cap
>>> print(cap[0])

In the following screenshot, we can see the execution of the previous commands, and also see where we passed the pcap file path in the FileCapture method as a parameter: 

We can apply a filter for DNS traffic only with the display_filter argument in the FileCapture method:

import pyshark
cap = pyshark.FileCapture('http.cap', display_filter="dns")
for pkt in cap:
    print(pkt.highest_layer)

In the following screenshot, we can see the execution of the previous commands:

FileCapture and LiveCapture in pyshark

As we saw previously, you can use the FileCapture method to open a previously saved trace file. You can also use pyshark to sniff from an interface in real time with the LiveCapture method, like so:

>>> import pyshark
>>> # Sniff from interface in real time
>>> capture = pyshark.LiveCapture(interface='eth0')
>>> capture.sniff(timeout=10)
>>> capture
<LiveCapture (5 packets)>

Once a capture object is created, either from a LiveCapture or FileCapture method, several methods and attributes are available at both the capture and packet level. The power of pyshark is that it has access to all of the packet decoders that are built into TShark.

Now, let's see what methods the returned capture object provides.

To check this, we can use the built-in dir() function with the capture object:

The display_filter, encryption, and input_filename attributes are used for displaying parameters that are passed into FileCapture or LiveCapture.

Both methods offer similar parameters that affect packets that are returned in the capture object. For example, we can iterate through the packets and apply a function to each. The most useful method here is the apply_on_packets() method. apply_on_packets() is the main way to iterate through the packets, passing in a function to apply to each packet:

>>> cap = pyshark.FileCapture('http.cap', keep_packets=False)
>>> def print_info_layer(packet):
...     print("[Protocol:] "+packet.highest_layer+" [Source IP:] "+packet.ip.src+" [Destination IP:]"+packet.ip.dst)
...
>>> cap.apply_on_packets(print_info_layer)

In the following screenshot, we can see the information that's returned when we are obtaining information for each packet pertaining to Protocol, Source IP, and Destination IP:

We can also use the apply_on_packets() method for adding the packets to a list for counting or other processing means. Here's a script that will append all of the packets to a list and print the count. For this, create a text file called count_packets.py:

import pyshark

packets_array = []

def counter(*args):
    # apply_on_packets() passes the packet as the first argument
    packets_array.append(args[0])

def count_packets():
    cap = pyshark.FileCapture('http.cap', keep_packets=False)
    cap.apply_on_packets(counter, timeout=10000)
    return len(packets_array)

print("Packets number:"+str(count_packets()))

for packet in packets_array:
    print(packet)
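The same callback pattern can be used for simple statistics, such as counting packets per protocol. The sketch below is an illustration rather than pyshark code: the packets are hypothetical stand-in objects that only mimic the highest_layer and ip attributes used earlier, so that the example runs without pyshark or a capture file. With a real capture, the callback would be passed to apply_on_packets() instead of being called in a loop. It also guards against packets without an IP layer (such as ARP), on the assumption that accessing packet.ip on such a packet raises AttributeError.

```python
from collections import Counter
from types import SimpleNamespace

protocol_counts = Counter()


def tally_protocol(packet):
    """One-argument callback of the kind passed to apply_on_packets()."""
    protocol_counts[packet.highest_layer] += 1


def describe(packet):
    """Return a one-line description, tolerating packets with no IP layer."""
    ip_layer = getattr(packet, 'ip', None)
    if ip_layer is None:
        return '{}: no IP layer'.format(packet.highest_layer)
    return '{}: {} -> {}'.format(packet.highest_layer, ip_layer.src, ip_layer.dst)


# Hypothetical stand-in packets (not real pyshark objects)
fake_packets = [
    SimpleNamespace(highest_layer='HTTP', ip=SimpleNamespace(src='10.0.0.1', dst='10.0.0.2')),
    SimpleNamespace(highest_layer='DNS', ip=SimpleNamespace(src='10.0.0.1', dst='10.0.0.53')),
    SimpleNamespace(highest_layer='ARP'),
    SimpleNamespace(highest_layer='HTTP', ip=SimpleNamespace(src='10.0.0.2', dst='10.0.0.1')),
]

for pkt in fake_packets:
    tally_protocol(pkt)
    print(describe(pkt))

print(protocol_counts.most_common())
```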

We can use only_summaries, which will return packets in the capture object with just the summary information of each packet:

>>> cap = pyshark.FileCapture('http.cap', only_summaries=True)
>>> print(cap[0])

This option makes capture file reading much faster, and with the built-in dir() function, we can check the attributes that are available in the object to obtain information about a specific packet.

In the following screenshot, we can see information about a specific packet and get all of the attributes that return non-null information:

The information you can see in the form of attributes is as follows:

  • destination: The IP destination address
  • source: The IP source address
  • info: A summary of the application layer
  • length: Length of the packet in bytes
  • no: Index number of the packet
  • protocol: The highest layer protocol that's recognized in the packet
  • summary_line: All of the summary attributes in one string
  • time: Time between the current packet and the first packet
 

Summary


In this chapter, we have completed an introduction to TCP/IP and how machines communicate in a network. We learned about the main protocols of the network stack and the different types of address for communicating in a network. We started with Python libraries for network programming and looked at the socket, urllib, and requests modules, and provided an example of how we can interact with and obtain information from RFC documents. We also acquired some basic knowledge so that we are able to perform a network traffic analysis with Wireshark.

Wireshark is provided with innumerable functionalities, thanks to which we will be able to identify and analyze network traffic and identify communications in our network.

In the next chapter, you will learn how to use Python as an HTTP client so that you can make requests to REST APIs and retrieve web resources with the urllib and requests modules.

 

Questions


  1. At which TCP/IP layer does user interaction with computers and services occur?
  2. Why do we need to replace IPv4 with the IPv6 protocol?
  3. What protocol allows you to dynamically configure IP addresses in the device's operating system?
  4. What mechanism makes the traffic from the private network appear to be coming from a single valid public internet address and hides the private addresses from the internet?
  5. What are the main options for installing Python packages on your localhost machine?
  6. What is the main Python tool for creating virtual environments, which also includes a separate Python installation for the packages?
  7. What are the main modules that we can find in Python to make HTTP requests at a high level?
  8. What are the main modules that we can find in Python to make HTTP requests at a low level?
  9. Which library can we use to analyze network traffic in Python that Wireshark decodes in each packet?
  10. What method from the pyshark package can we use to iterate through the packets and apply a function to each one?
 

Further reading


By going to the following links, you will find more information about the tools and the official Python documentation that was mentioned in this chapter:

About the Authors

  • José Manuel Ortega

    José Manuel Ortega is a software engineer, focusing on new technologies, open source, security, and testing. His career goal has been to specialize in Python and security testing projects. In recent years, he has developed an interest in security development, especially in pentesting with Python. Currently, he is working as a security tester engineer and his functions in the role involves the analysis and testing of the security of applications in both web and mobile environments. He has taught at university level and collaborated with the official school of computer engineers. He has also been a speaker at various conferences. He is eager to learn about new technologies and loves to share his knowledge with the community.

  • Dr. M. O. Faruque Sarker

    Dr. M. O. Faruque Sarker is a software architect based in London; he has shaped various Linux and open source software solutions mainly on cloud computing platforms for various institutions. Over the past 10 years, he has led numerous Python software development and cloud infrastructure automation projects. In 2009, he started using Python and shepherded a fleet of miniature E-puck robots at the University of South Wales, Newport, UK. Later, he was invited to work on the Google Summer of Code (2009/2010) programs to contribute to the BlueZ and Tahoe-LAFS open source projects. He is the author of Python Network Programming Cookbook, Packt Publishing and received his PhD in multirobot systems at the University of South Wales.

  • Sam Washington

    Sam Washington currently works at University College London as a systems administrator in the platform integration team of the central IT department, supporting a variety of web hosting and network services. He enjoys the daily challenges of managing the demands of full-stack enterprise web applications and looking for ways to employ new technologies to improve services and workflows. He has been using Python for professional and personal projects for over 10 years.

