The anatomy of an Apache log file
Before we create the regular expression that will match a line of the Apache log file, we need to understand what kind of information each line holds.
Let's take a look at a line from access.log:
127.0.0.1 - jan [30/Jun/2004:22:20:17 +0200] "GET /cgi-bin/trac.cgi/login HTTP/1.1" 302 4370 "http://saturn.solar_system/cgi-bin/trac.cgi" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040620 Galeon/1.3.15"
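As a rough illustration of why a regular expression is needed here, the following minimal sketch runs a plain whitespace split on the sample line. The variable names (line, tokens, timestamp_pieces) are chosen only for this example. It is not a parser, just a way to see where naive splitting breaks down.

```python
# Sample line copied from the access.log excerpt above.
line = (
    '127.0.0.1 - jan [30/Jun/2004:22:20:17 +0200] '
    '"GET /cgi-bin/trac.cgi/login HTTP/1.1" 302 4370 '
    '"http://saturn.solar_system/cgi-bin/trac.cgi" '
    '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) '
    'Gecko/20040620 Galeon/1.3.15"'
)

# A plain whitespace split recovers the first three fields correctly...
tokens = line.split()
host, logname, user = tokens[:3]

# ...but it cuts the bracketed timestamp into two pieces, because the
# timestamp contains a space before the time zone offset.
timestamp_pieces = tokens[3:5]

print(host, logname, user)
print(timestamp_pieces)
```

The quoted request and user-agent fields are fragmented in the same way, which is why the fields are better matched by a pattern that understands the brackets and quotes.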
The Apache access log that we are reading follows the %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" format. Let's take a look at each part:
%h: The first part of the log is the IP address of the client (127.0.0.1).
%l: The second part is the remote logname. The hyphen in the output indicates that the requested piece of information is not available.
%u: The third part is the user ID of the person requesting the document (jan).
%t: The fourth part is the time at which the request was received ([30/Jun/2004:22:20:17 +0200]). It is in the [day/month/year:hour:minute:second...