This chapter will provide an introduction to Python, focusing primarily on data types, variables, expressions, and program structures that the Python programming language follows. The objective of this chapter is to familiarize the reader with the basics of Python so that they can use it in the upcoming chapters. The chapter will cover the installation of Python and its dependency manager. We will also start taking a look at scripting in Python.
In this chapter, we will cover the following topics:
- An introduction to Python (including its installation and setup)
- Basic data types
- Sequence data types – lists, dictionaries, tuples
- Variables and keywords
- Operations and expressions
Make sure you have the following setup ready before proceeding with this chapter:
- A working computer or laptop
- An Ubuntu operating system, preferably version 16.04
- Python 3.x
- A working internet connection
When we think about exploring a new programming language or technology, we often wonder about the scope of the new technology and how it might benefit us. Let's start this chapter by thinking about why we might want to use Python and what advantages it might give us.
To answer this question, we are going to think about current technology trends and not get into more language-specific features, such as the fact that it is object-oriented, functional, portable, and interpreted. We have heard these terms before. Let's try to think about why we might use Python from a strictly industrial standpoint, what the present and future landscapes of this language might look like, and how the language can serve us. We'll start by mentioning a few career options that someone involved in computer science might opt for:
- Programmer or software developer
- Web developer
- Database engineer
- Cyber security professional (penetration tester, incident responder, SOC analyst, malware analyst, security researcher, and so on)
- Data scientist
- Network engineer
There are many other roles as well, but we'll just focus on the most generic options for the time being to see how Python fits into them. Let's start off with the role of programmer or software developer. As of 2018, Python was recorded as the second most popular language listed in job adverts (https://www.codingdojo.com/blog/7-most-in-demand-programming-languages-of-2018/). The role of programmer might vary from company to company, but as a Python programmer, you might be making a software product written in Python, developing a cyber security tool written in Python (there are tons of these already in existence that can be found on GitHub and elsewhere in the cyber security community), prototyping a robot that can mimic humans, engineering a smart home automation product or utility, and so on. The scope of Python covers every dimension of software development, from typical software applications to robust hardware products. The reason for this is the ease of the language to understand, the power of the language in terms of its excellent library support, which is backed by a huge community, and, of course, the beauty of it being open source.
Let's move on to the web. In recent years, Python has matured remarkably well as a web development language. Popular web frameworks such as Django, Flask, and CherryPy have made web development with Python a seamless and clean experience, with lots of learning, customization, and flexibility on the way. My personal favorite is Django, as it provides a very clean MVC-style architecture in which the business logic and presentation layers are completely isolated, making the code much cleaner and easier to manage. With batteries included, a built-in ORM, and straightforward integration with Celery for background task processing, Django does everything that any other web framework would be capable of doing, while keeping the native code in Python. Flask and CherryPy are also excellent choices for web development and come with lots of control over the data flow and customization.
Cyber security is a field that would be incomplete without Python. Every industry within the cyber security domain is related to Python in one way or another and the majority of cyber security tools are written in Python. From penetration testing to monitoring security operations centers, Python is widely used and needed. Python aids penetration testers by providing them with excellent tools and automation support with which they can write quick and powerful scripts for a variety of penetration testing activities, from reconnaissance to exploitation. We will learn about this in great detail throughout the course of this book.
Machine learning (ML) and artificial intelligence (AI) are buzz words in the tech industry that we come across frequently nowadays. Python has excellent support for all ML and AI models. Python, by default in most cases, is the first choice for anyone who wants to learn ML and AI. The other famous language in this domain is R, but because of Python's excellent coverage across all the other technology and software development stacks, it is easier to combine machine learning solutions written in Python with existing or new products than it is to combine solutions written in R. Python has got amazing machine learning libraries and APIs such as scikit-learn, NumPy, Pandas, matplotlib, NLTK, and TensorFlow. Pandas and NumPy have made scientific computations a very easy task, giving users the flexibility to process huge datasets in memory with an excellent layer of abstraction, which allows developers and programmers to forget about the background details and get the job done neatly and efficiently.
A few years ago, a typical database engineer would have been expected to know relational databases such as MySQL, SQL Server, Oracle, PostgreSQL, and so on. Over the past few years, however, the technology landscape has completely changed. While a typical database engineer is still supposed to know and be proficient with this database technology stack, this is no longer enough. With the increasing volume of data, as we enter the era of big data, traditional databases have to work in conjunction with big data solutions such as Hadoop or Spark. Having said that, the role of the database engineer has evolved to be one that includes the skill set of a data analyst. Now, data is not to be fetched and processed from local database servers—it is to be collected from heterogeneous sources, pre-processed, processed across a distributed cluster or parallel cores, and then stored back across the distributed cluster of nodes. What we are talking about here is big data analytics and distributed computing. We mentioned the word Hadoop previously. If you are not familiar with it, Hadoop is an engine that is capable of processing huge files by spawning chunks of files across a cluster of computers and then performing an aggregation on the processed result set, something which is popularly known as a map-reduce operation. Apache Spark is a new buzzword in the domain of analytics and it claims to be 100 times faster than the Hadoop ecosystem. Apache Spark has got a Python API for Python developers called pyspark, using which we can run Apache Spark with native Python code. It is extremely powerful and having familiarity with Python makes the setup easy and seamless.
The objective of mentioning the preceding points was to highlight the significance of Python in the current technological landscape and in the coming future. ML and AI are likely to be the dominating industries, both of which are primarily powered by Python. For this reason, there will not be a better time to start reading about and exploring Python and cyber security with machine learning than now. Let's start our journey into Python by looking at a few basics.
Compilers work by converting human-readable code written in high-level programming languages into machine code, which is then run by the underlying architecture or machine. If you don't wish to run the code, the compiled version can be saved and executed later on. It should be noted that the compiler first checks for syntax errors and only creates the compiled version of the program if none are found. If you have used C, you might have come across .out files, which are examples of compiled files.
In the case of interpreters, however, the program is translated and executed at runtime, one statement at a time, rather than being compiled in advance into a standalone machine-code file. Python falls into the category of interpreted byte code. This means that the Python source is first compiled to an intermediate byte code (cached in a .pyc file for imported modules). Then, this byte code is executed instruction by instruction by the Python interpreter on the underlying architecture.
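As a rough illustration of the intermediate byte code, the standard-library dis module can disassemble a compiled snippet. The one-line program used here is a made-up example, and the exact instructions printed differ between Python versions.

```python
import dis

# Compile a one-line program to a code object (byte code) without running it.
code = compile("a = 2 + 3", "<example>", "exec")

# Show the byte code instructions that the interpreter would execute.
dis.dis(code)

# Run the compiled byte code and read back the variable it defined.
namespace = {}
exec(code, namespace)
print(namespace["a"])
```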
Over the course of this book, all of the exercises will be shown on a Linux OS. In my case, I am using Ubuntu 16.04. You can choose any variant you prefer. We will be using python3 for our exercises, which can be installed as follows:
sudo apt-get install python3
sudo apt-get install python3-pip
The second command installs pip, which is Python's package manager. All open source Python libraries that do not come as part of the standard installation can be installed with the help of pip. We will be exploring how to use pip in the upcoming sections.
Throughout the course of this book, we will aim to cover advanced and well-known industry standards in Python, cyber security, penetration testing, and the data science space. However, as they say, every remarkable journey starts with small steps. Let's go ahead and start our journey by understanding the basics of Python.
Variables, as the name suggests, are placeholders that hold a value. A Python variable is nothing but a name that can hold a user-defined value during the scope of a Python program or script. If we compare Python variables to those of other conventional languages, such as C, C++, and Java, we will see that they are a little bit different. In the other languages, we have to associate a data type with the name of the variable. For example, to declare an integer in C or Java, we have to declare it as int a=2, and the compiler will reserve memory for it: four bytes in Java and, on most modern platforms, four bytes in C as well (the C standard only guarantees at least two). It would then name the memory location as a, which is to be referenced from the program with the value 2 stored in it. Python, however, is a dynamically typed language, which means that we do not need to associate a data type with the variable that we will declare or use in our program.
A typical Python declaration of an integer might look like a=20. This simply creates a variable named a and places the value 20 in it. Even if we change the value in the next line to be a="hello world", it would associate the string hello world with the variable a. Let's see that in action on the Python Terminal, as follows:
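A minimal sketch of this rebinding, which can be typed at the interactive prompt or saved in a script, might look like the following. The helper names first_type and second_type are only there for illustration.

```python
a = 20
first_type = type(a)
print(first_type)       # <class 'int'>

a = "hello world"       # the same name now refers to a string object
second_type = type(a)
print(second_type)      # <class 'str'>
```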
To use the Python Terminal, simply type the python3 command in your Terminal prompt. Let's think about how this works. Take a look at the following diagram, which compares statically typed languages with dynamically typed languages:
As you can see in the preceding diagram, in the case of Python, the variable actually holds a reference to the actual object. Every time a new value is assigned, the variable is pointed toward the object that holds that value, which is created in memory if needed. The previous object, once nothing refers to it any longer, is reclaimed by the garbage collector.
Having discussed that Python is a dynamically typed language, we must not confuse it with a weakly typed one. Though Python is dynamically typed, it is also a strongly typed language: it does not implicitly convert between unrelated types, such as strings and integers.
In the following example, we declare a variable, a, of string type and a variable, b, of integer type:
When we carry out the operation c=a+b, what might happen in a weakly typed language is that the integer value of b would be implicitly converted to a string, and the result stored in variable c would be hello world22. However, because Python is strongly typed, each operation adheres to the types of the objects involved, and adding a string to an integer raises a TypeError. We need to make the conversion explicitly to carry out any operations of this kind.
Let's take a look at the following example to understand what it means to be a strongly typed language; we explicitly change the type of variable b and typecast it to a string type at runtime:
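The two cases can be sketched as follows, assuming a holds the string hello world and b holds the integer 22. The flag mixed_failed is a hypothetical helper that records whether the mixed-type addition was rejected.

```python
a = "hello world"
b = 22

try:
    c = a + b               # str + int is not allowed
except TypeError as exc:
    print("TypeError:", exc)
    mixed_failed = True
else:
    mixed_failed = False

c = a + str(b)              # explicit conversion makes the intent clear
print(c)                    # hello world22
```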
Having understood the basics of how variables can be declared and used, let's try to understand the naming conventions they follow. A variable name, also known as an identifier, must start with a letter (A-Z or a-z) or an underscore. This can then be followed by any number of letters, digits, or underscores. Identifiers are case-sensitive.
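One quick way to check candidate names is the built-in str.isidentifier() method. The names in the list are made up for illustration; note that this method checks only the shape of the name, so it also returns True for reserved words.

```python
candidates = ["count", "_total", "name_1", "1name", "my-var"]

for name in candidates:
    # True when the text has the shape of a legal identifier
    print(name, name.isidentifier())

valid = [name for name in candidates if name.isidentifier()]
print(valid)    # ['count', '_total', 'name_1']
```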
Keywords, as the name implies, are certain reserved words that have a predefined meaning within a particular language implementation. In many other languages, the names of the basic types, such as int, are keywords, so we cannot use them to name our variables. Python is a slightly different case: its true keywords (such as for or class) cannot be used as identifiers and cause a SyntaxError if we try, but the names of built-in types and functions, such as int and str, are ordinary names rather than keywords. Although we shouldn't name our variables or identifiers after them, even if we do, the program will not throw any errors and we will still get an output. Let's try to understand this with the help of a conventional C program and an equivalent Python script:
It should be noted that this is a simple C program in which we have declared an integer and used the int identifier to identify it, following which we simply print hello world.
When we try to compile the program, however, it throws a compilation error, as shown in the following screenshot:
Let's try to do the same in a Python shell and see what happens:
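A small sketch of such a session is given here. The values are assumptions chosen for illustration, and the final del statement is added only so that the built-in names become visible again afterwards.

```python
int = "hello world"     # the name int now refers to a string
str = 22                # the name str now refers to an integer
print(int, str)         # no error is raised

shadowed_int, shadowed_str = int, str

# Remove our module-level names so the built-ins are visible again.
del int, str
print(int("5") + 1)     # 6
```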
It can be seen that the program did not throw any errors when we declared our variables with the names int and str. Strictly speaking, int and str are not Python keywords but built-in names, which is why the assignment is allowed. In the preceding case, we saw that a variable named int held a string value and a variable named str held an int value. We also saw how a normal variable, a, was typecasted from int to string type. From this, it can be established that we can use built-in names as variables in Python. The downside of this is that if we do so, we shadow the actual functionality that these names possess. For the rest of that scope, the names follow the updated or overridden meaning, which is very dangerous, as it makes our code break Python's conventions and behave unexpectedly. This should always be avoided.
Let's extend the preceding example. We know that str() is a built-in Python function, the purpose of which is to convert a value, such as a number, into a string, as we saw for variable a. Later on, however, we overwrote it and, for the scope of our program, we assigned the name to an integer. Now, at any point in time during the scope of this program, if we try to use the str function to convert a numeric type into a string, the interpreter will raise a TypeError, saying that an 'int' object is not callable, as shown in the following screenshot:
The same would hold true for the int function, and we would no longer be able to use it to typecast a string to its equivalent integer.
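The failure can be reproduced with a sketch like the following. The variable error_message is a hypothetical helper, and the del statement at the end restores access to the built-in function.

```python
str = 5                         # shadow the built-in str with an integer

try:
    str(10)                     # we are now trying to "call" the integer 5
except TypeError as exc:
    error_message = exc.args[0]

del str                         # remove the shadowing name again
print(error_message)            # 'int' object is not callable
print(str(10))                  # 10
```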
Now, let's take a look at other types of keywords that are available in Python that we should try not to use as our variable names. There is a cool way to do this with the Python code itself, which lets us print the Python keywords in the Terminal window:
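One way to do this, sketched here with the standard-library keyword module, is to loop over keyword.kwlist. The exact list depends on the Python version in use.

```python
import keyword

# kwlist holds every reserved word of the running interpreter.
for word in keyword.kwlist:
    print(word)

print(len(keyword.kwlist), "keywords in total")
```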
The import statement is used to import the libraries in Python, just as we use imports for importing packages in Java. We will get into the details of using imports and loops in future sections. For now, we will look at what the different Python keywords mean:
- False: The Boolean false value.
- None: This is equivalent to null in other languages.
- True: The Boolean true value.
- and: The logical and that can be used with conditions and loops.
- as: This is used to assign an alias to a module that we import.
- assert: This is used with the objective of debugging code.
- break: This exits the loop.
- class: This is used to declare a class.
- continue: This is the traditional continue statement used with loops; it skips the rest of the current iteration and moves on to the next one.
- def: This is used to define a function. Every Python function needs to be preceded by the def keyword.
- del: This is used to delete objects.
- elif: The conditional else...if statement.
- else: The conditional else statement.
- except: This is used to catch exceptions.
- finally: This is used with exception handling as part of the final block of code in which we clean our resources.
- for: The traditional for loop declaration keyword.
- global: This is used to declare and use global variables.
- if: The conditional if statement.
- import: This is used to import Python libraries, packages, and modules.
- in: This is used to test membership in Python strings, lists, and other containers, and to iterate with for loops.
- is: This is used to test the identity of an object.
- lambda: This is used to define anonymous (lambda) functions.
- nonlocal: This is used to declare a variable inside a nested function that is not local to it.
- not: This is the logical negation operator.
- or: This is the logical or operator.
- pass: This is used as a placeholder in Python.
- raise: This is used to raise an exception in Python.
- return: This is used to return from a function.
- try: The traditional try keyword that's used with exception handling.
- while: This is used with the while loop.
- with: This is used with file opening and so on.
- yield: This is used with generators.
- from: This is used with import statements to import specific names from a module, including relative imports.
Throughout this book, we will learn about all the keywords mentioned in this list.
Like any other programming language, Python also comes with standard data types. In this section, we will explore the various powerful data types that Python makes available for us to use.
Numbers, as the name suggests, covers all the numeric data types, including both integer and floating data types. Earlier in this chapter, we saw that to use an integer or a float, we can simply declare the variable and assign an integer or a float value. Now, let's write a proper Python script and explore how to use numbers. Name the script numbers.py which is shown as follows:
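A minimal version of such a script might look like this. The variable names and values are assumptions made for illustration, not the exact contents of the original listing.

```python
#!/usr/bin/python3

# An integer and a float; no type declarations are needed.
a = 10
b = 2.5

# Adding an int to a float produces a float.
total = a + b

# %s is replaced by the value given after the % operator.
print("Sum of a and b is: %s" % total)
```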
The preceding screenshot shows a simple Python script that adds an integer to a float and then prints the sum. To run the script, we can type the python3 numbers.py command, as follows:
You might have noticed that the line at the beginning of the script says #!/usr/bin/python3. This line, known as a shebang, does not by itself make your code executable; it tells the operating system which interpreter to use. Once the permissions of the script have been changed and it has been made executable (for example, with chmod +x numbers.py), the line says that if an attempt is made to execute this script directly, then it should be run with python3, which is located at the /usr/bin/python3 path. This can be seen in the following example:
If we observe the print call, we can see that the format string contains the %s placeholder. To fill it in with the actual value, the value is supplied after the % operator that follows the format string:
To convert a string into its equivalent integer or float value, we can use the built-in int() and float() functions.
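As a short sketch with made-up input strings, the conversions behave as follows. Text that does not represent a number raises a ValueError.

```python
count = int("42")           # string to integer
price = float("3.14")       # string to float
print(count + 1, price * 2)

try:
    int("forty-two")        # not a valid integer literal
except ValueError as exc:
    conversion_error = str(exc)
    print("ValueError:", conversion_error)
```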
We know that a string is a collection of characters. In Python, string types come under the sequence category. Strings are really powerful and have many methods that can be used to perform string manipulation operations. Let's look at the following piece of code, which introduces us to strings in Python. Strings can be declared within both single and double quotes in Python:
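A minimal declaration might look like the following. The my_str value is the one used throughout this section, while other_str is a made-up example of single quotes.

```python
my_str = "Welcome to python strings !"    # double quotes
other_str = 'Single quotes work as well'  # single quotes

print(my_str)
print(other_str)
```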
In the preceding code, we are simply declaring a string called my_str and printing it on the console window.
It must be noted that strings can be accessed as a sequence of characters in Python. Strings can be thought of as a list of characters. Let's try to print the characters at various indices of the string, as shown in the following screenshot:
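Assuming my_str holds the value Welcome to python strings !, indexing can be sketched like this:

```python
my_str = "Welcome to python strings !"

print(my_str[0])    # W
print(my_str[10])   # a single space
print(my_str[5])    # m
```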
At index 0, the character W gets printed. At index 10, we have an empty space, while at index 5, we have the letter m. It should be noted that sequences are stored in Python with a starting index of 0, and the same holds true for the string type.
In this section, we will look at how to compare two strings, concatenate strings, copy one string to another, and perform various string manipulation operations with the help of some methods.
The replace method is used to perform string replacement. It returns a new string with the appropriate replacements. The first argument to the replace method is the string or character to be replaced within the string, while the second argument is the string or character with which it is to be replaced:
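A short sketch of replace(), using the my_str value from this section, is given here. The name new_str is introduced only for illustration.

```python
my_str = "Welcome to python strings !"

new_str = my_str.replace("!", "@")
print(new_str)  # Welcome to python strings @
print(my_str)   # Welcome to python strings !
```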
In the preceding example, we can see that the ! from the original string is replaced by @ and a new string with the replacement is returned. It should be noted that these changes were not actually made to the original string, but instead a new string was returned with the appropriate changes. This can be verified in the following line, where we print the original string and the old unchanged value, Welcome to python strings !, is printed. The reason behind this is that strings in Python are immutable, just like they are in Java. This means that once a string object is created, it can't be modified. The variable that refers to it, however, can be pointed at a different string. Let's try this and, this time, assign the result of replace back to the originally declared variable, my_str, as follows:
In the preceding code, it looks as though we modified the original string, because we assigned the newly returned string from the replace method to our earlier declared variable, my_str. This might sound contradictory to what we said previously, but the string object itself was never changed. Let's take a look at how this works by looking at what happens behind the scenes before and after we call the replace method:
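The reassignment can be sketched as follows. The extra name original is a hypothetical helper that keeps a reference to the first string object so that we can inspect it afterwards.

```python
my_str = "Welcome to python strings !"
original = my_str                   # second reference to the same object

my_str = my_str.replace("!", "@")   # rebind my_str to the new string

print(my_str)                       # Welcome to python strings @
print(original)                     # Welcome to python strings !
print(my_str is original)           # False: two distinct objects
```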
After replacing the ! with @, this will look as follows:
It can be seen in the preceding two illustrations that before the replace method was called, the my_str string reference pointed toward the actual object that contained an !. Once the replace() method returned a new string and we updated the existing string variable with the newly returned object, the older memory object was not overwritten, but instead a new one was created. The program reference now points toward the newly created object. The earlier object is in memory and doesn't have any references pointing toward it. This will be cleaned up by the garbage collector at a later stage.
Another thing we can do is try and change any character in any position of the original string. We have already seen that the string characters can be accessed by their index, but if we try to update or change a character at any specific index, an exception will be thrown and the operation will not be permitted, as shown in the following screenshot:
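A minimal sketch of the failing assignment follows. The variable assignment_error is only there to capture the message.

```python
my_str = "Welcome to python strings !"

try:
    my_str[0] = "w"     # strings do not allow item assignment
except TypeError as exc:
    assignment_error = str(exc)
    print("TypeError:", assignment_error)
```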
By default, the replace() method replaces all the occurrences of the replacement string within the target string. If we only want to replace one or two occurrences of something within the target string, however, we can pass a third argument to the replace() method and specify the number of replacement occurrences that we want to have. Let's say we have the following string:
If we just want the first occurrence of the ! character to be @ and we want the rest to be the same, this can be achieved as follows:
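Assuming a hypothetical string that ends with three exclamation marks, the optional third argument limits how many replacements are made:

```python
my_str = "Welcome to python strings !!!"    # made-up example value

first_only = my_str.replace("!", "@", 1)    # replace at most one occurrence
print(first_only)                           # Welcome to python strings @!!

all_of_them = my_str.replace("!", "@")      # default: replace every match
print(all_of_them)                          # Welcome to python strings @@@
```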
Obtaining part of the string is a common exercise that we come across frequently in day-to-day string operations. Languages such as C++ or Java provide us with dedicated methods such as substr(st_index,length) or substring(st_index,end_index). To perform the substring operation in Python, there is no dedicated method, but we can instead use slicing. For example, if we wish to get the first four characters of our original my_str string, we can achieve this by using operations such as my_str[0:4], as shown in the following screenshot:
Again, the slice operation returns a new string and the changes are not applied to the original string. Furthermore, it is worth understanding here that the slice stops at index n-1, where n is the upper limit, specified as the second parameter, which is four in our case. Thus, the actual substring operation will be performed starting from index 0 and ending at index 3, thus returning the string Welc.
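A short sketch of this slice, with prefix as an illustrative name, might be:

```python
my_str = "Welcome to python strings !"

prefix = my_str[0:4]    # characters at indices 0, 1, 2, and 3
print(prefix)           # Welc
print(my_str)           # the original string is unchanged
```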
Let's take a look at some more examples of slicing:
To get the whole string from index 4, do the following:
To get the string from the start up to index 4, do the following:
To print the whole string with slicing, do the following:
To print the characters with a step of 2, do the following:
To print the reverse of the string, do the following:
To print a part of the string in reverse order, do the following:
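The slicing variations listed above can be sketched together as follows, again assuming the my_str value used in this section. The particular indices are chosen for illustration.

```python
my_str = "Welcome to python strings !"

from_four = my_str[4:]       # from index 4 to the end
up_to_four = my_str[:4]      # from the start up to, but excluding, index 4
whole = my_str[:]            # the whole string
every_second = my_str[::2]   # every second character
reversed_str = my_str[::-1]  # the string reversed
reversed_part = my_str[6::-1]  # indices 6 down to 0, in reverse

print(from_four)       # ome to python strings !
print(up_to_four)      # Welc
print(whole)
print(every_second)
print(reversed_str)    # ! sgnirts nohtyp ot emocleW
print(reversed_part)   # emocleW
```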
+ is the concatenation operator that's used in Python to concatenate two strings. As always, the result of the concatenation is a new string, and unless we assign that result to a variable, the original string objects remain unchanged. The + operator is internally overloaded to perform concatenation of objects when it is used on string types. It is also used for the addition of two numbers when used on numeric data types, like so:
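A brief sketch with made-up values shows both uses of the operator:

```python
a = "Hello "
b = "world"

c = a + b           # concatenation produces a new string
print(c)            # Hello world
print(a)            # a itself is unchanged

total = 10 + 5      # on numbers, + performs addition
print(total)        # 15
```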
Interestingly, Python also supports another operator that gets overloaded when used with string data types. Instead of performing a conventional operation, this operator performs a variation of the original operation so that the functionality can be replicated across string data types. Here, we are talking about the multiplication operator, *. This is conventionally supposed to perform the multiplication of numeric data types, but when it is used on string data types, it performs a replication operation instead. This is shown in the following code snippet:
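Assuming the variable c holds the string Hello world, replication can be sketched like this. The padding example is a hypothetical illustration of building a long repeated payload.

```python
c = "Hello world"

repeated = c * 5            # the string repeated five times
print(repeated)

padding = "A" * 20          # twenty A characters in a row
print(padding, len(padding))
```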
In the preceding case, the multiplication operator actually replicates the Hello world string stored in the c variable five times, as we specified in the expression. This is a very handy operation and can be used to generate fuzzing payloads, which we will see in the later chapters of this book.
The strip method is actually used to strip off the white spaces from the input string. By default, the strip method will strip off the spaces from both the left and right sides of the string and will return a new string without spaces on both the leading and trailing sides, as shown in the following screenshot:
However, if we only wish to strip off the left spaces, we can use the lstrip() method. Similarly, if we just wish to strip off the right spaces, we can use the rstrip() method. This is shown as follows:
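The three methods can be compared on a made-up padded string. repr() is used so that the remaining spaces are visible in the output.

```python
raw = "   hello world   "

both = raw.strip()      # removes leading and trailing whitespace
left = raw.lstrip()     # removes leading whitespace only
right = raw.rstrip()    # removes trailing whitespace only

print(repr(both))       # 'hello world'
print(repr(left))       # 'hello world   '
print(repr(right))      # '   hello world'
```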
The split method, as the name suggests, is used to split the input string over a particular delimiter and return a list that contains the words that have been split. We will be looking at lists in more detail shortly. For now, let's take a look at the following example, where we have the name, the age, and the salary of an employee in a string separated by commas. If we wish to obtain this information separately, we can perform a split over ,. The split function takes the first argument as the delimiter on which the split operation is to be performed:
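A sketch of this, using a hypothetical employee record, might look as follows. Note that every field comes back as a string.

```python
record = "John,30,50000"        # made-up name, age, and salary

fields = record.split(",")      # split on the comma delimiter
print(fields)                   # ['John', '30', '50000']

name, age, salary = fields      # unpack the three parts
print(name, int(age), int(salary))
```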
By default, that is, if a delimiter is not specified, the split operation is performed over whitespace, and consecutive whitespace characters are treated as a single separator. This can be seen as follows:
The find() method is used to search for a character or string within our target string. It returns the index at which the first match starts if a match is found. It returns -1 if it does not find a match:
The index() method is similar to the find() method. It returns the index at which the first match starts if it finds a match, but it raises a ValueError exception if it does not find one:
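With a made-up sentence that contains uneven spacing, the default behaviour can be sketched like this:

```python
sentence = "Welcome to   python strings"

words = sentence.split()        # no delimiter given
print(words)                    # ['Welcome', 'to', 'python', 'strings']
```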
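The difference between find() and index() can be sketched as follows, using the my_str value from this section and a search term that is deliberately absent.

```python
my_str = "Welcome to python strings !"

found_at = my_str.find("python")    # index where the match starts
missing = my_str.find("java")       # -1 when there is no match
print(found_at, missing)            # 11 -1

print(my_str.index("python"))       # 11, same as find() on success

try:
    my_str.index("java")            # index() raises instead of returning -1
except ValueError as exc:
    index_failed = True
    print("ValueError:", exc)
```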
The upper() method is used to transform the input string to upper case letters and the lower() method is used to transform a given string to lowercase letters:
The built-in len() function returns the length of the given string:
The count() method returns the number of occurrences of any character or string that we wish to count within the target string:
The in and not in operators are very handy, as they let us perform a quick search on sequences. If we wish to check whether a certain character or word is present or not present in the target string, we can use the in and not in operators. The in operator returns True if the word is present and False otherwise, while not in returns the opposite:
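The case conversion, length, counting, and membership operations described above can be sketched together on the my_str value from this section:

```python
my_str = "Welcome to python strings !"

print(my_str.upper())           # WELCOME TO PYTHON STRINGS !
print(my_str.lower())           # welcome to python strings !
print(len(my_str))              # 27
print(my_str.count("o"))        # 3

has_python = "python" in my_str         # True
lacks_java = "java" not in my_str       # True
print(has_python, lacks_java)
```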
The endswith() method checks whether the given string ends with a specific character or word that we pass as an argument:
The isdigit() method checks whether the given string consists only of digits:
The isalpha() method checks whether the given string is of an alphabetic character type or not:
The islower() method checks whether the string is lowercase, while the isupper() method checks if the string is uppercase. The capitalize() method returns a copy of the string with its first character in uppercase and the rest in lowercase:
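A few made-up strings are enough to sketch these checks:

```python
my_str = "Welcome to python strings !"

print(my_str.endswith("!"))         # True
print("12345".isdigit())            # True
print("123a".isdigit())             # False
print("python".isalpha())           # True
print("python 3".isalpha())         # False: space and digit present
print("hello".islower())            # True
print("HELLO".isupper())            # True

title = "hello world".capitalize()
print(title)                        # Hello world
```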
Python does not have a built-in array type in the C sense, but instead offers the list data type. Python lists also fall under the category of sequences and offer a wide range of functionalities. Coming from a Java, C, or C++ background, you are likely to find that Python lists are slightly different from the arrays and list types offered by these languages. In C, C++, or Java, an array is a collection of elements of the same data type, and this is also the case for Java array lists. This is different in the case of Python. In Python, a list is a collection of elements that can be of either homogeneous or heterogeneous data types. This is one of the features that makes Python lists powerful, robust, and easy to use. We also don't need to specify the size of a Python list when declaring it. It can grow dynamically to match the number of elements it contains. Let's see a basic example of using lists:
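A basic sketch with a made-up list of strings might look like this:

```python
my_list = ["apple", "banana", "cherry"]    # illustrative values

print(my_list)          # the whole list
print(my_list[0])       # apple
print(my_list[2])       # cherry
print(len(my_list))     # 3
```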
Lists in Python start from index 0 and any item can be accessed on the basis of indices, as shown in the preceding screenshot. The preceding list is homogeneous, as all the elements are of string type. We can also have a heterogeneous list, as follows:
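A heterogeneous list can be sketched as follows. The elements are arbitrary examples that mix an integer, a string, a float, and a nested list.

```python
mixed = [1, "two", 3.0, [4, 5]]

for item in mixed:
    # Each element keeps its own type.
    print(item, type(item).__name__)

element_types = [type(item) for item in mixed]
```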
For now, we are printing the list elements manually. We can very easily iterate over them with loops instead, and we will explore that later on. For now, let's try to understand which operations can be performed on list structures in Python.
Slicing is an operation that allows us to extract elements from sequences and lists. We can slice lists to extract portions that we might be interested in. It must be noted again that slice indexes are 0-based and that the slice stops at index n-1, where n is the specified end index. To slice the first five and last five elements from the list, we can perform the following operation:
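Assuming a hypothetical list holding the numbers 1 to 10, the two slices can be sketched as follows:

```python
my_list = list(range(1, 11))    # [1, 2, ..., 10]

first_five = my_list[:5]        # indices 0 to 4
last_five = my_list[-5:]        # the final five elements

print(first_five)               # [1, 2, 3, 4, 5]
print(last_five)                # [6, 7, 8, 9, 10]
```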
Let's see some examples of list slicing and their results:
- To get the list from index 4 onwards, do the following:
- To get the list elements from the start up to index 4, do the following:
- To print the whole list with slicing, do the following:
- To print the list elements with a step size of 2, do the following:
- To print the reverse of the list, do the following:
- To print a portion of the list in reverse order, do the following:
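The list slicing cases above can be sketched together on a made-up list of the numbers 1 to 10. The specific indices are chosen for illustration.

```python
my_list = list(range(1, 11))    # [1, 2, ..., 10]

from_four = my_list[4:]         # [5, 6, 7, 8, 9, 10]
up_to_four = my_list[:4]        # [1, 2, 3, 4]
whole = my_list[:]              # a copy of the whole list
every_second = my_list[::2]     # [1, 3, 5, 7, 9]
reversed_list = my_list[::-1]   # [10, 9, ..., 1]
reversed_part = my_list[4::-1]  # [5, 4, 3, 2, 1]

for result in (from_four, up_to_four, whole,
               every_second, reversed_list, reversed_part):
    print(result)
```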
- Adding new elements to a list with append(): The append() method is used to add elements to the list, and the element to be added is given as an argument to the append() method. The element to be added can be of any type. As well as being a number or a string, the element can be a list in itself:
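A sketch of append(), assuming a starting list of 1 to 5, could look like this. The nested list of three characters is stored as a single element.

```python
my_list = [1, 2, 3, 4, 5]

my_list.append(6)
my_list.append(7)
my_list.append(8)
print(my_list)                      # [1, 2, 3, 4, 5, 6, 7, 8]

my_list.append(["a", "b", "c"])     # the list is added as ONE element
print(my_list)
print(my_list[8])                   # ['a', 'b', 'c']
print(my_list[8][0])                # a
```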
We can see in the preceding example that we added three elements, 6, 7, and 8, to our original list using the append() method. Then, we added another list containing three characters, which is stored intact as a single element inside the original list. Its items can be accessed by first indexing my_list to reach the nested list and then indexing into that list. In the preceding example, the new list is added intact to the original list, but is not merged.
List merging can be done in two ways in Python. First, we can use the traditional + operator, which we used previously to concatenate two strings. It does the same when used on list object types. The other way to achieve this would be by using the extend method, which takes the new list as an argument to be merged with the existing list. This is shown in the following example:
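Both approaches can be sketched with two made-up lists. The + operator builds a new list, whereas extend() modifies the existing list in place.

```python
list_a = [1, 2, 3]
list_b = [4, 5, 6]

merged = list_a + list_b    # new list; list_a and list_b are unchanged
print(merged)               # [1, 2, 3, 4, 5, 6]
print(list_a)               # [1, 2, 3]

list_a.extend(list_b)       # list_a itself now grows
print(list_a)               # [1, 2, 3, 4, 5, 6]
```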
To update an element in the list, we can access it by its index and assign the new value to it. For example, if we want to have the string Hello as the 0th element of the list, this can be achieved by assigning the Hello value to the 0th element as merged[0] = "Hello":
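A short sketch of an in-place update, assuming merged holds the sample values shown (the contents are hypothetical):

```python
merged = [1, 2, 3, "a", "b", "c"]   # hypothetical contents

merged[0] = "Hello"   # assign a new value to the element at index 0
print(merged)         # ['Hello', 2, 3, 'a', 'b', 'c']
```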
We have seen that Python variables are nothing but references to actual objects. The same holds true for lists. For this reason, manipulating lists gets a little tricky. If we copy one list variable to another by simply using the = operator, Python does not create a duplicate or local copy of the list for that variable. Instead, it creates another reference that points to the same object in memory. Thus, when we make a change through the copied variable, the same change is reflected in the original list. The following example demonstrates this shared-reference behavior:
Now, let's look at how we can create a new copy of an existing list so that changes to the new one do not cause any changes to the existing one:
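The shared-reference behavior can be seen in a minimal sketch like the following, where original and alias are hypothetical names:

```python
original = [1, 2, 3]
alias = original          # no copy is made; both names refer to one list object

alias.append(4)           # change made through the second name...
print(original)           # ...is visible through the first: [1, 2, 3, 4]
print(alias is original)  # True
```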
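The text does not say which technique it uses, so this sketch assumes two common ones: a full slice and the list() constructor. All variable names are hypothetical:

```python
original = [1, 2, 3]

copy_by_slice = original[:]            # a full slice builds a new list
copy_by_constructor = list(original)   # list() also builds a new list

copy_by_slice.append(4)
copy_by_constructor[0] = 99

print(original)             # [1, 2, 3] (unchanged)
print(copy_by_slice)        # [1, 2, 3, 4]
print(copy_by_constructor)  # [99, 2, 3]
```

Both of these produce shallow copies, which is sufficient when the list holds only immutable values such as numbers and strings.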
Another way to create an isolated copy of the original list is to use the copy() and deepcopy() functions from Python's copy module. A shallow copy constructs a new list and then inserts into it references to the same objects found in the original list. A deep copy, on the other hand, constructs a new compound object and then recursively inserts copies of the objects found in the original list:
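The difference only becomes visible when the list contains mutable elements. The following sketch assumes a hypothetical list with one nested list:

```python
import copy

original = [1, 2, [3, 4]]          # hypothetical list with a nested list

shallow = copy.copy(original)      # new outer list, shared inner objects
deep = copy.deepcopy(original)     # new outer list and new inner objects

original[2].append(5)              # mutate the nested list

print(shallow[2])   # [3, 4, 5]  (the shallow copy shares the nested list)
print(deep[2])      # [3, 4]     (the deep copy is fully independent)
```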
We can use the del statement to delete either an element from the list or the whole list. The del statement does not return anything. We can also use the pop() method to remove elements from the list. The pop() method takes the index of the element that we wish to remove as an argument and returns the removed element; if no index is given, it removes the last element:
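A minimal sketch of both removal styles, assuming a hypothetical five-element list:

```python
my_list = [10, 20, 30, 40, 50]   # hypothetical sample data

del my_list[1]                   # removes 20; nothing is returned
print(my_list)                   # [10, 30, 40, 50]

removed = my_list.pop(2)         # removes and returns the element at index 2
print(removed)                   # 40

last = my_list.pop()             # with no argument, pop() removes the last element
print(last)                      # 50
print(my_list)                   # [10, 30]
```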
The entire list structure can be deleted as follows:
The multiplication operator *, when applied to lists, causes a replication effect of the list elements. The contents of the list are repeated as many times as indicated by the number passed to the replication operator:
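For instance, with a hypothetical two-element list:

```python
pattern = [0, 1]          # hypothetical sample data

repeated = pattern * 3    # contents repeated three times in a new list
print(repeated)           # [0, 1, 0, 1, 0, 1]
```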
The built-in len() function gives the length of a Python list. The max() function returns the maximum element of the list, while the min() function returns the minimum element of the list:
We can use max() and min() on lists of characters or strings as well, but we cannot use them on a list that has mixed or heterogeneous types. If we do this in Python 3, we will get a TypeError exception stating that the comparison is not supported between numbers and strings:
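A short sketch with a hypothetical list of numbers:

```python
numbers = [4, 9, 1, 7]   # hypothetical sample data

print(len(numbers))   # 4
print(max(numbers))   # 9
print(min(numbers))   # 1
```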
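The following sketch shows both cases under Python 3. The lists are hypothetical, and the exception is caught so that the example runs to completion:

```python
letters = ["b", "z", "a"]   # hypothetical sample data
print(max(letters))         # z
print(min(letters))         # a

mixed = [1, "a", 3]         # numbers and strings together
type_error_raised = False
try:
    max(mixed)
except TypeError as error:
    type_error_raised = True
    print("TypeError:", error)
```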
The in and not in operators are essential Python operations that can be used against any sequence type. We saw how these were used previously with strings, where we used them to search for a string or character within the target string. The in operator evaluates to True if the search is successful and False if not. The opposite is the case for the not in operator. The execution is shown as follows:
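A minimal sketch of membership tests on a hypothetical list:

```python
my_list = [1, 2, 3, "a"]   # hypothetical sample data

found = 2 in my_list       # True: 2 is an element of the list
missing = 9 in my_list     # False: 9 is not present
absent = 9 not in my_list  # True: the negated test

print(found, missing, absent)   # True False True
```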
A Python tuple is very similar to a Python list. The difference is that it's a read-only structure, so once it is declared, no modification can be made to the elements of the tuple. Python tuples can be used as follows:
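A sketch of tuple access and of the failed modification, assuming a hypothetical tuple named my_tuple; the exception is caught so that the example runs to completion:

```python
my_tuple = (1, 2, 3, "a")   # hypothetical sample data

print(my_tuple[0])     # 1
print(my_tuple[1:3])   # (2, 3)

modification_failed = False
try:
    my_tuple[0] = 100  # tuples do not support item assignment
except TypeError as error:
    modification_failed = True
    print("TypeError:", error)
```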
In the preceding code, we can see that we can access tuples in the same way as we can access lists, but when we try to change any element of the tuple, Python raises an exception, as a tuple is a read-only structure. If we perform the non-modifying operations that we performed on lists, such as indexing, slicing, concatenation, replication, and membership tests, we will see that they work in exactly the same way on tuples:
If a tuple has only one element in it, it has to be declared with a trailing comma. If we do not add that comma while declaring it, it will be interpreted as a numeric or string data type, depending on the elements of the tuple. The following example explains this better:
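The effect of the trailing comma can be checked with type(). The variable names here are hypothetical:

```python
not_a_tuple = (5)       # parentheses alone only group the expression
single = (5,)           # the trailing comma makes a one-element tuple
also_a_string = ("a")   # still just a string

print(type(not_a_tuple))    # <class 'int'>
print(type(single))         # <class 'tuple'>
print(type(also_a_string))  # <class 'str'>
```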
A tuple can be converted into a list and can then be operated on as follows:
Dictionaries are very powerful structures and are widely used in Python. A dictionary is a key-value pair structure. A dictionary key must be unique and is typically a number or string (more generally, any hashable object), and the value can be any Python object. Dictionaries are mutable and can be changed in place. The following example demonstrates the basics of dictionaries in Python:
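A minimal sketch of declaring and changing a dictionary. The name dict1 and the k1/v1 pair follow the text; the remaining keys and values are assumptions for illustration:

```python
# String and numeric keys can live in the same dictionary
dict1 = {"k1": "v1", "k2": "v2", 3: "v3"}
print(dict1)

dict1["k2"] = "new v2"   # dictionaries are mutable and change in place
print(dict1["k2"])       # new v2

# A repeated key keeps only the last value given for it
duplicate = {"k1": "old", "k1": "new"}
print(duplicate)         # {'k1': 'new'}
```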
A Python dictionary can be declared within curly braces. Each key value pair is separated by a comma. It should be noted that the keys have to be unique; if we try to repeat the keys, the old key value pair is overwritten by the new one. From the preceding example, we can establish that the dictionary keys can be either string or numeric types. Let's try to perform various operations on dictionaries in Python:
- Retrieving the dictionary values with the keys: Dictionary values can be accessed through the name of the dictionary key. If the name of the key is not known, we can use loops to iterate through the whole dictionary structure. We will cover this in the next chapter of this book:
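For example, assuming dict1 holds the sample pairs shown:

```python
dict1 = {"k1": "v1", "k2": "v2"}   # hypothetical contents

value = dict1["k1"]   # look the value up by its key
print(value)          # v1
```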
This is one of the many ways to print dictionary values. However, if the key whose value we wish to print does not exist in the dictionary, Python raises a KeyError exception.
There is a better way to handle this and avoid these kinds of exceptions. We can use the get() method provided by the dictionary class. The get() method takes the key name as the first argument and, as the second argument, the default value to return if the key is not present. Then, instead of raising an exception, the default value is returned if the key is not found.
In the preceding example, when the k1 key is present in the actual dictionary, dict1, the value for the k1 key is returned, which is v1. Then, the k0 key was searched, which was not present originally. In that case, no exception was raised, but instead the False value was returned, suggesting that no such key, k0, was actually present. Remember that we can specify any placeholder as the second argument to the get() method to indicate the absence of the key we are searching for.
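The failing lookup can be reproduced with a sketch like this one, where the exception is caught so that the example runs to completion (the dictionary contents are hypothetical):

```python
dict1 = {"k1": "v1", "k2": "v2"}   # hypothetical contents

key_error_raised = False
try:
    dict1["k0"]                    # k0 is not a key in dict1
except KeyError as error:
    key_error_raised = True
    print("KeyError:", error)
```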
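A minimal sketch of get() with False as the default, assuming dict1 holds the sample pairs shown:

```python
dict1 = {"k1": "v1", "k2": "v2"}     # hypothetical contents

present = dict1.get("k1", False)     # key exists, so its value is returned
missing = dict1.get("k0", False)     # key absent, so the default is returned

print(present)   # v1
print(missing)   # False
```

If the second argument is omitted, get() returns None for a missing key.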
- Adding keys and values to the dictionary: Once a dictionary has been declared, over the course of the code, there could be many occasions in which we want to modify a dictionary key or add a new dictionary key and value. This can be achieved as follows. As mentioned earlier, a dictionary value can be any Python object, so we can have tuples, lists, and dictionary types as values inside a dictionary:
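Assigning to a key either adds it or overwrites its value. A minimal sketch, with hypothetical keys and values:

```python
dict1 = {"k1": "v1"}        # hypothetical starting contents

dict1["k2"] = "v2"          # k2 does not exist yet, so it is added
dict1["k1"] = "updated"     # k1 exists, so its value is replaced

print(dict1)                # {'k1': 'updated', 'k2': 'v2'}
```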
Now, let's add more complex types as values:
These values can be retrieved as normal values by their keys as follows:
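The following sketch stores a list, a tuple, and a nested dictionary as values and then reads them back. The key names are hypothetical:

```python
dict1 = {"k1": "v1"}                       # hypothetical starting contents

dict1["a_list"] = [1, 2, 3]                # list as a value
dict1["a_tuple"] = (4, 5)                  # tuple as a value
dict1["a_dict"] = {"inner": "value"}       # dictionary as a value

print(dict1["a_list"])            # [1, 2, 3]
print(dict1["a_list"][0])         # 1
print(dict1["a_tuple"][1])        # 5
print(dict1["a_dict"]["inner"])   # value
```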
- Expanding a dictionary with the contents of another dictionary: In the preceding example, we added a dictionary as a value to an existing dictionary. We will now see how we can merge two dictionaries into one. The update() method can be used to do this; it adds the entries of the second dictionary to the first in place, overwriting the values of any keys that exist in both:
- keys(): To get all the dictionary keys, we can use the keys() method. This returns a dict_keys view object over the keys of the dictionary:
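A minimal sketch of update(), assuming two hypothetical dictionaries that share one key:

```python
dict1 = {"k1": "v1", "k2": "v2"}        # hypothetical contents
dict2 = {"k2": "changed", "k3": "v3"}

dict1.update(dict2)   # merges dict2 into dict1 in place

print(dict1)          # {'k1': 'v1', 'k2': 'changed', 'k3': 'v3'}
```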
We can see that the keys method returns an instance of the dict_keys class, which holds the list of dictionary keys. We can type cast this as a list type as follows:
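Both the view object and its conversion to a list can be sketched as follows, assuming hypothetical contents for dict1:

```python
dict1 = {"k1": "v1", "k2": "v2"}   # hypothetical contents

keys_view = dict1.keys()
print(type(keys_view))             # <class 'dict_keys'>

key_list = list(keys_view)         # convert the view into a regular list
print(key_list)
```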
- values(): The values() method returns all the values that are present in the dictionary:
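For example, with hypothetical contents for dict1:

```python
dict1 = {"k1": "v1", "k2": "v2"}   # hypothetical contents

values_view = dict1.values()
print(type(values_view))           # <class 'dict_values'>

value_list = list(values_view)     # convert the view into a regular list
print(value_list)
```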
- items(): This method is used to iterate over the dictionary key-value pairs. It returns a dict_items view object that contains a sequence of tuples. Each tuple has two entries, the first one being the key and the second one being the value:
We can convert the returned view object into a list of tuples or a tuple of tuples as well. The ideal way to use it is to iterate over the items, which we will see later when we look at loops:
- in and not in: The in and not in operators are used to see whether a key is present in the dictionary or not. By default, the in and not in operators search the dictionary keys, not the values. Take a look at the following example:
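A minimal sketch of items() and its conversions, with hypothetical contents for dict1:

```python
dict1 = {"k1": "v1", "k2": "v2"}   # hypothetical contents

items_view = dict1.items()
print(type(items_view))            # <class 'dict_items'>

pair_list = list(items_view)       # list of (key, value) tuples
pair_tuple = tuple(items_view)     # tuple of (key, value) tuples

print(pair_list)
print(pair_tuple)
```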
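The keys-only behavior can be seen in a sketch like this one. To search the values instead, the test can be run against values():

```python
dict1 = {"k1": "v1", "k2": "v2"}         # hypothetical contents

key_found = "k1" in dict1                # True: k1 is a key
value_as_key = "v1" in dict1             # False: v1 is a value, not a key
value_found = "v1" in dict1.values()     # True: searching the values explicitly
key_absent = "k0" not in dict1           # True: k0 is not a key

print(key_found, value_as_key, value_found, key_absent)
```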
- Order of storing: In Python versions before 3.7, dictionaries are unordered, which means they are not stored internally in the same order as we define them. (From Python 3.7 onward, dictionaries are guaranteed to preserve insertion order, so the reordering described here is only visible on older interpreters.) The reason for the older behavior is that dictionaries are stored in dynamic tables called hash tables. As these tables are dynamic, they can grow and shrink in size. What happens internally is that a hash value of the key is computed and stored in the table. The key goes in the first column, while the second column holds the actual value. Let's take a look at the following example to explain this better:
In the preceding case, we declare a dictionary, a, with the first key as abc and the second key as abcd. When we print the dictionary on an interpreter that does not preserve insertion order, however, we can see that abcd is stored internally before abc. To explain this, let's assume that the dynamic table or hash table in which the dictionary is internally stored is of size 8.
As we mentioned earlier, the keys will be stored as hash values. When we compute the hash of the abc string and take it modulo 8, which is the table size, we get a result of 7 in this example. If we do the same for abcd, we get a result of 4. (The exact values depend on the interpreter; in Python 3, string hashes are randomized per process by default, so they differ from run to run.) This means that the entry for abcd will be stored at index 4, while the entry for abc will be stored at index 7. For this reason, in the listing, we get abcd listed before abc:
There may be occasions in which two keys arrive at a common value after the hash(key)%table_size operation, which is called a collision. In this case, the key to be slotted first is the one that is stored first.
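The hash(key) % table_size calculation described above can be tried directly. This is a simplified model rather than CPython's actual implementation, and no expected slot values are shown because string hashes vary between runs in Python 3. The names table_size and slots are hypothetical:

```python
table_size = 8   # assumed table size, as in the explanation above

slots = {}
for key in ("abc", "abcd"):
    # Map the key's hash onto one of the table's slots
    slots[key] = hash(key) % table_size
    print(key, "-> slot", slots[key])

# Two keys collide when they map to the same slot
print("collision:", slots["abc"] == slots["abcd"])
```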
- sorted(): If we want our dictionary to be sorted according to the keys, we can use the built-in sorted() function. Called on the dictionary itself, it returns a sorted list of keys; called on the dictionary's items, it returns a list of tuples, with each tuple having a key at the 0th index and its value at the 1st index:
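A minimal sketch of both forms, assuming a hypothetical dictionary with single-letter keys:

```python
dict1 = {"b": 2, "c": 3, "a": 1}       # hypothetical contents

sorted_keys = sorted(dict1)            # sorting a dictionary sorts its keys
sorted_pairs = sorted(dict1.items())   # (key, value) tuples ordered by key

print(sorted_keys)    # ['a', 'b', 'c']
print(sorted_pairs)   # [('a', 1), ('b', 2), ('c', 3)]
```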
- Removing elements: We can use the conventional del statement to delete any dictionary item. When we say delete, we mean delete both the key and the value. Dictionary items work in pairs, so deleting the key would remove the value as well. Another way to delete an entry is to use the pop() method and pass the key as an argument. This is shown in the following code snippet:
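Both removal styles can be sketched as follows, with hypothetical contents for dict1:

```python
dict1 = {"k1": "v1", "k2": "v2", "k3": "v3"}   # hypothetical contents

del dict1["k1"]              # removes the key and its value; returns nothing
removed = dict1.pop("k2")    # removes the entry and returns its value

print(removed)   # v2
print(dict1)     # {'k3': 'v3'}
```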
An operator in Python is something that can carry out arithmetic or logical operations on an expression. The variable on which the operator operates is called the operand. Let's try to understand the various operators that are available in Python:
- Arithmetic operators:
|Operation|Expression|
|---|---|
|Addition|a + b|
|Subtraction|a - b|
|Multiplication|a * b|
|Division|a / b|
|Modulo|a % b|
|Exponentiation|a ** b|
|Floor Division|a // b|
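The arithmetic operators can be tried with a pair of sample values. The operands a = 7 and b = 2 are assumptions chosen for illustration:

```python
a, b = 7, 2   # hypothetical operands

print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5  (true division always returns a float in Python 3)
print(a % b)    # 1    (remainder)
print(a ** b)   # 49   (7 to the power of 2)
print(a // b)   # 3    (floor division discards the fractional part)
```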
- Assignment operators:
- a = 0 evaluates to a = 0
- a += 1 evaluates to a = a + 1
- a -= 1 evaluates to a = a - 1
- a *= 2 evaluates to a = a * 2
- a /= 5 evaluates to a = a / 5
- a **= 3 evaluates to a = a ** 3
- a //= 2 evaluates to a = a // 2 (floor division by 2)
- a %= 5 evaluates to a = a % 5
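The compound assignments above can be traced step by step. The intermediate re-assignments are only there to keep the values easy to follow:

```python
results = []

a = 0
a += 1            # a = a + 1  -> 1
results.append(a)
a -= 1            # a = a - 1  -> 0
results.append(a)

a = 4
a *= 2            # a = a * 2  -> 8
results.append(a)
a /= 5            # a = a / 5  -> 1.6
results.append(a)

a = 2
a **= 3           # a = a ** 3 -> 8
results.append(a)
a //= 2           # a = a // 2 -> 4
results.append(a)
a %= 5            # a = a % 5  -> 4
results.append(a)

print(results)    # [1, 0, 8, 1.6, 8, 4, 4]
```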
- Logical operators:
- and: If both operands are true, the condition becomes true. For example, when a and b are both true, (a and b) is true.
- or: If either of the two operands is true (non-zero), the condition becomes true. For example, when at least one of a and b is true, (a or b) is true.
- not: This reverses the logical state of its operand. For example, when a and b are both true, not (a and b) is false.
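A minimal sketch of the three logical operators, using hypothetical Boolean values:

```python
a, b = True, True   # hypothetical operands

print(a and b)         # True
print(a or b)          # True
print(not (a and b))   # False

c = False
print(a and c)         # False: one operand is false
print(a or c)          # True: at least one operand is true
print(not c)           # True
```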
- Bitwise operators:
|Operation|Expression|
|---|---|
|and|a & b|
|or|a \| b|
|xor|a ^ b|
|Right Shift|a >> b|
|Left Shift|a << b|
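The bitwise operators work on the binary representation of integers. In this sketch, the operands 12 and 10 and the shift amount of 2 are assumptions chosen so that the bit patterns are easy to follow:

```python
a = 0b1100   # 12
b = 0b1010   # 10

print(a & b)    # 8   (0b1000: bits set in both operands)
print(a | b)    # 14  (0b1110: bits set in either operand)
print(a ^ b)    # 6   (0b0110: bits set in exactly one operand)
print(a >> 2)   # 3   (0b0011: bits shifted two places to the right)
print(a << 2)   # 48  (0b110000: bits shifted two places to the left)
```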
In this chapter, we discussed the basics of Python and explored the syntax of the language. This isn't very different from the languages that you may have studied in the past, such as C, C++, or Java. However, it is generally easier to use and is widely used in the cyber security domain. This chapter lays the foundation that will help us progress, as data types such as lists, dictionaries, tuples, and strings are used heavily throughout the course of this book.
In the next chapter, we will learn about conditions and loops and see how loops can be used with the data types that we have studied so far.
- Is Python open source? If so, how is it different from other open source languages?
- Who manages Python and works on further feature enhancements?
- Is Python faster than Java?
- Is Python object-oriented or functional?
- Can I learn Python quickly if I have little to no experience with any programming language?
- How is Python beneficial to me, being a cyber security engineer?
- I am a penetration tester – why do I need to understand AI and machine learning?