Using the Registry and xlswriter modules

In this article by Chapin Bryce and Preston Miller, the authors of Learning Python for Forensics, we will learn about the features offered by the Registry and xlswriter modules.

(For more resources related to this topic, see here.)

Working with the Registry module

The Registry module, developed by Willi Ballenthin, can be used to obtain keys and values from registry hives. Python provides a built-in registry module called _winreg; however, this module only works on Windows machines. The _winreg module interacts with the registry on the system running the module. It does not support opening external registry hives.

The Registry module allows us to interact with the supplied registry hives and can be run on non-Windows machines. The Registry module can be downloaded from https://github.com/williballenthin/python-registry. Click on the releases section to see a list of all the stable versions and download the latest version. For this article, we use version 1.1.0. Once the archived file is downloaded and extracted, we can run the included setup.py file to install the module. In a command prompt, execute the following code in the module's top-level directory as shown:

python setup.py install

This should install the Registry module successfully on your machine. We can confirm this by opening the Python interactive prompt and typing import Registry. We will receive an error if the module is not installed successfully. With the Registry module installed, let's begin to learn how we can leverage this module for our needs.

First, we need to import the Registry class from the Registry module. Then, we use the Registry function to open the registry object that we want to query. Next, we use the open() method to navigate to our key of interest. In this case, we are interested in the RecentDocs registry key. This key contains recent active files separated by extension as shown:

>>> from Registry import Registry
>>> reg = Registry.Registry('NTUSER.DAT')
>>> recent_docs = reg.open('SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Explorer\\RecentDocs')

If we print therecent_docs variable, we can see that it contains 11 values with five subkeys, which may contain additional values and subkeys. Additionally, we can use thetimestamp() method to see the last written time of the registry key.

>>> print recent_docs
Registry Key CMI-CreateHive{B01E557D-7818-4BA7-9885-E6592398B44E}\Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs with 11 values and 5 subkeys

>>> print recent_docs.timestamp() # Last Written Time
2012-04-23 09:34:12.099998

We can iterate over the values in the recent_docs key using the values() function in a for loop. For each value, we can access the name(), value(), raw_data(), value_type(), and value_type_str() methods. The value() and raw_data() represent the data in different ways. We will use the raw_data() function when we want to work with the underlying binary data and use the value() function to gather an interpreted result. The value_type() and value_type_str() functions display a number or string that identify the type of data, such as REG_BINARY, REG_DWORD, REG_SZ, and so on.

>>> for i, value in enumerate(recent_docs.values()):
...     print '{}) {}: {}'.format(i, value.name(), value.value())
...
0) MRUListEx: ????
1) 0: myDocument.docx
2) 4: oldArchive.zip
3) 2: Salaries.xlsx
...

Another useful feature of the Registry module is the means provided for querying for a certain subkey or value. This is provided by the subkey(), value(), or find_key() functions. A RegistryKeyNotFoundException is generated when a subkey is not present while using the subkey() function:

>>> if recent_docs.subkey('.docx'):
...     print 'Found docx subkey.'
...
Found docx subkey.
>>> if recent_docs.subkey('.1234abcd'):
...     print 'Found 1234abcd subkey.'
...
Registry.Registry.RegistryKeyNotFoundException: ...

The find_key() function takes a path and can find a subkey through multiple levels. The subkey() and value() functions only search child elements. We can use these functions to confirm that a key or value exists before trying to navigate to them.

If a particular key or value cannot be found, a custom exception from the Registry module is raised. Be sure to add error handling to catch this error and also alert the user that the key was not discovered.

With the Registry module, finding keys and their values becomes straightforward. However, when the values are not strings and are instead binary data we have to rely on another module to make sense of the mess. For all binary needs, the struct module is an excellent candidate.

Creating Spreadsheets with the xlsxwriter Module

Xlsxwriter is a useful third-party module that writes Excel output. There are a plethora of Excel-supported modules for Python, but we chose this module because it was highly robust and well-documented. As the name suggests, this module can only be used to write Excel spreadsheets. The xlsxwriter module supports cell and conditional formatting, charts, tables, filters, and macros among others.

Adding data to a spreadsheet

Let's quickly create a script called simplexlsx.v1.py for this example. On lines 1 and 2 we import the xlsxwriter and datetime modules. The data we are going to be plotting, including the header column is stored as nested lists in the school_data variable. Each list is a row of information that we want to store in the output excel sheet, with the first element containing the column names.

001 import xlsxwriter
002 from datetime import datetime
003 
004 school_data = [['Department', 'Students', 'Cumulative GPA', 'Final Date'],
005                ['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)],
006                ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)],
007                ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)],
008                ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]]

The writeXLSX() function, defined on line 11, is responsible for writing our data in to a spreadsheet. First, we must create our Excel spreadsheet using the Workbook() function supplying the desired name of the file. On line 13, we create a worksheet using the add_worksheet() function. This function can take the desired title of the worksheet or use the default name 'Sheet N', where N is the specific sheet number.

011 def writeXLSX(data):
012     workbook = xlsxwriter.Workbook('MyWorkbook.xlsx')
013     main_sheet = workbook.add_worksheet('MySheet')

The date_format variable stores a custom number format that we will use to display our datetime objects in the desired format. On line 17, we begin to enumerate through our data to write. The conditional on line 18 is used to handle the header column which is the first list encountered. We use the write() function and supply a numerical row and column. Alternatively, we can also use the Excel notation, i.e. A1.

015     date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'})
016 
017     for i, entry in enumerate(data):
018         if i == 0:
019             main_sheet.write(i, 0, entry[0])
020             main_sheet.write(i, 1, entry[1])
021             main_sheet.write(i, 2, entry[2])
022             main_sheet.write(i, 3, entry[3])

The write() method will try to write the appropriate type for an object when it can detect the type. However, we can use different write methods to specify the correct format. These specialized writers preserve the data type in Excel so that we can use the appropriate data type specific Excel functions for the object. Since we know the data types within the entry list, we can manually specify when to use the general write() function or the specific write_number() function.

023         else:
024             main_sheet.write(i, 0, entry[0])
025             main_sheet.write_number(i, 1, entry[1])
026             main_sheet.write_number(i, 2, entry[2])

For the fourth entry in the list, thedatetime object, we supply the write_datetime() function with our date_format defined on line 15. After our data is written to the workbook, we use the close() function to close and save our data. On line 32, we call the writeXLSX() function passing it to the school_data list we built earlier.

027             main_sheet.write_datetime(i, 3, entry[3], date_format)
028 
029     workbook.close()
030 
031 
032 writeXLSX(school_data)

A table of write functions and the objects they preserve is presented below.

Function

Supported Objects

write_string

str

write_number

int, float, long

write_datetime

datetime objects

write_boolean

bool

write_url

str

When the script is invoked at the Command Line, a spreadsheet called MyWorkbook.xlsx is created. When we convert this to a table, we can sort it according to any of our values. Had we failed to preserve the data types values such as our dates might be identified as non-number types and prevent us from sorting them appropriately.

Learning Python for Forensics

Building a table

Being able to write data to an Excel file and preserve the object type is a step-up over CSV, but we can do better. Often, the first thing an examiner will do with an Excel spreadsheet is convert the data into a table and begin the frenzy of sorting and filtering. We can convert our data range to a table. In fact, writing a table with xlsxwriter is arguably easier than writing each row individually. The following code will be saved into the file simplexlsx.v2.py.

For this iteration, we have removed the initial list in the school_data variable that contained the header information. Our new writeXLSX() function writes the header separately.

004 school_data = [['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)],
005                ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)],
006                ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)],
007                ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]]

Lines 10 through 14 are identical to the previous iteration of the function. Representing our table on the spreadsheet is accomplished on line 16.

010 def writeXLSX(data):
011     workbook = xlsxwriter.Workbook('MyWorkbook.xlsx')
012     main_sheet = workbook.add_worksheet('MySheet')
013 
014     date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'})

The add_table() function takes multiple arguments. First, we pass a string representing the top-left and bottom-right cells of the table in Excel notation. We use the length variable, defined on line 15, to calculate the necessary length of our table. The second argument is a little more confusing; this is a dictionary with two keys, named data and columns. The data key has a value of our data variable, which is perhaps poorly named in this case. The columns key defines each row header and, optionally, its format, as seen on line 19:

015     length = str(len(data) + 1)
016     main_sheet.add_table(('A1:D' + length), {'data': data,
017                                              'columns': [{'header': 'Department'}, {'header': 'Students'},
018                                                          {'header': 'Cumulative GPA'},
019                                                          {'header': 'Final Date', 'format': date_format}]})
020     workbook.close()

In lesser lines than the previous example, we've managed to create a more useful output built as a table. Now our spreadsheet has our specified data already converted into a table and ready to be sorted.

There are more possible keys and values that can be supplied during the construction of a table. Please consult the documentation at (http://xlsxwriter.readthedocs.org) for more details on advanced usage.

This process is simple when we are working with nested lists representing each row of a worksheet. Data structures not in the specified format require a combination of both methods demonstrated in our previous iterations to achieve the same effect. For example, we can define a table to span across a certain number of rows and columns and then use the write() function for those cells. However, to prevent unnecessary headaches we recommend keeping data in nested lists.

Creating charts with Python

Lastly, let's create a chart with xlsxwriter. The module supports a variety of different chart types including: line, scatter, bar, column, pie, and area. We use charts to summarize the data in meaningful ways. This is particularly useful when working with large data sets, allowing examiners to gain a high level of understanding of the data before getting into the weeds.

Let's modify the previous iteration yet again to display a chart. We will save this modified file as simplexlsx.v3.py. On line 21, we are going to create a variable called department_grades. This variable will be our chart object created by the add_chart()method. For this method, we pass in a dictionary specifying keys and values[SS4] . In this case, we specify the type of the chart to be a column chart.

021     department_grades = workbook.add_chart({'type':'column'})

On line 22, we use theset_title() function and again pass it in a dictionary of parameters. We set the name key equal to our desired title. At this point, we need to tell the chart what data to plot. We do this with the add_series() function. Each category key maps to the Excel notation specifying the horizontal axis data. The vertical axis is represented by the values key. With the data to plot specified, we use theinsert_chart() function to plot the data in the spreadsheet. We give this function a string of the cell to plot the top-left of the chart and then the chart object itself.

022     department_grades.set_title({'name':'Department and Grade distribution'})
023     department_grades.add_series({'categories':'=MySheet!$A$2:$A$5', 'values':'=MySheet!$C$2:$C$5'})
024     main_sheet.insert_chart('A8', department_grades)
025     workbook.close()

Running this version of the script will convert our data into a table and generate a column chart comparing departments by their grades. We can clearly see that, unsurprisingly, the Forensic Science department has the highest GPA earners in the school's program. This information is easy enough to eyeball for such a small data set. However, when working with data orders of larger magnitude, creating summarizing graphics can be particularly useful to understand the big picture.

Learning Python for Forensics

Be aware that there is a great deal of additional functionality in the xlsxwriter module that we will not use in our script. This is an extremely powerful module and we recommend it for any operation that requires writing Excel spreadsheets.

Summary

In this article, we began with introducing the Registry module and how it is used to obtain keys and values from registry hives. Next, we dealt with various aspects of spreadsheets, such as cells, tables, and charts using the xlswriter module.

Resources for Article:


Further resources on this subject:


You've been reading an excerpt of:

Learning Python for Forensics

Explore Title