
You're reading from Jupyter Cookbook

Product type: Book
Published in: Apr 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788839440
Edition: 1st Edition
Author: Dan Toomey

Dan Toomey has been developing application software for over 20 years. He has worked in a variety of industries and companies, in roles from sole contributor to VP/CTO-level. For the last few years, he has been contracting for companies in the eastern Massachusetts area. Dan has been contracting under Dan Toomey Software Corp. Dan has also written R for Data Science, Jupyter for Data Sciences, and the Jupyter Cookbook, all with Packt.

Examining big-text log file access


MonitorWare is a network monitoring solution for Windows machines. It provides sample log files that show access to different systems. I downloaded the HTTP log file sample set from http://www.monitorware.com/en/logsamples/apache.php. The log file has one entry for each HTTP request made to a server.
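Each entry in an Apache access log is a single line of space-separated fields. The following is a minimal sketch of pulling those fields apart in plain Python, assuming the sample files use the Apache Common Log Format (host, ident, user, timestamp, quoted request, status, size). The pattern, the function name `parse_clf_line`, and the sample line are illustrative assumptions, not taken from the book or the MonitorWare files.

```python
import re

# Regex for the Apache Common Log Format (CLF):
#   host ident authuser [date] "request" status bytes
# Assumption: the sample logs follow CLF; adjust the pattern for the
# Combined format or a custom LogFormat.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" size means no response body was sent; treat it as 0 bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Hypothetical line for illustration only, not taken from the sample files.
sample = '10.0.0.1 - - [07/Mar/2004:16:05:49 -0800] "GET /index.html HTTP/1.1" 200 1234'
print(parse_clf_line(sample))
```

Returning `None` for non-matching lines makes it easy to skip malformed entries rather than abort the whole scan.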

The URL downloads the apache-samples.rar file. A .rar file is a compressed archive format, an alternative to the more common .zip format that is often used for large files. This example is only 20 KB. You need to extract the log file from the .rar archive before the following code can read it.

How to do it...

We can use a script similar to the previous one to load the file, and then apply additional functions to pull out the records of interest. The code is:

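import pyspark

# Reuse the SparkContext if the notebook has already created one
if 'sc' not in globals():
    sc = pyspark.SparkContext()

# Load the extracted log file; each line becomes one record in the RDD
textFile = sc.textFile("access_log")
print(textFile.count(), "access records")

# Keep only the lines that contain "GET" and count them
gets = textFile.filter(lambda line: "GET" in line)
print(gets.count(), "GETs")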

posts...
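Note that `"GET" in line` matches any line containing those three letters anywhere, including inside a requested path. The following is a minimal plain-Python sketch of a stricter alternative that reads the method from the start of the quoted request field and tallies lines per method. It assumes CLF-style lines; `request_method`, `count_methods`, and the sample lines are hypothetical names and data for illustration. In PySpark, a comparable tally could likely be written as `textFile.map(request_method).filter(lambda m: m is not None).countByValue()`, though that variant is untested here.

```python
from collections import Counter
import re

# Captures the HTTP method at the start of the quoted request field,
# e.g. the GET in "GET /index.html HTTP/1.1".
METHOD_PATTERN = re.compile(r'"([A-Z]+) ')


def request_method(line):
    """Return the HTTP method of a log line, or None if none is found."""
    match = METHOD_PATTERN.search(line)
    return match.group(1) if match else None


def count_methods(lines):
    """Tally log lines by HTTP method, skipping lines without one."""
    return Counter(m for m in map(request_method, lines) if m is not None)


# Hypothetical lines for illustration only. The second line contains the
# text "GET" in its path, so a plain substring test counts it as a GET.
lines = [
    '10.0.0.1 - - [07/Mar/2004:16:05:49 -0800] "GET /a.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [07/Mar/2004:16:06:51 -0800] "POST /GET-form HTTP/1.1" 302 0',
    '10.0.0.3 - - [07/Mar/2004:16:10:02 -0800] "GET /b.html HTTP/1.1" 404 -',
]

substring_gets = sum(1 for line in lines if "GET" in line)
print(substring_gets, "lines contain 'GET'")
print(count_methods(lines))
```

Here the substring test reports three matches while the method-based tally counts two GETs and one POST, which shows why anchoring the match to the request field matters.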