Squid Proxy Server 3.1: Beginner's Guide

Product type Book
Published in Feb 2011
Publisher Packt
ISBN-13 9781849513906
Pages 332 pages
Edition 1st Edition

Table of Contents (20) Chapters

Squid Proxy Server 3.1 Beginner's Guide
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Getting Started with Squid
2. Configuring Squid
3. Running Squid
4. Getting Started with Squid's Powerful ACLs and Access Rules
5. Understanding Log Files and Log Formats
6. Managing Squid and Monitoring Traffic
7. Protecting your Squid Proxy Server with Authentication
8. Building a Hierarchy of Squid Caches
9. Squid in Reverse Proxy Mode
10. Squid in Intercept Mode
11. Writing URL Redirectors and Rewriters
12. Troubleshooting Squid
Pop Quiz Answers
Index

Chapter 5. Understanding Log Files and Log Formats

Understanding Squid log files and log formats is pretty easy. In this chapter, we'll present a brief explanation of the log format and how we can customize it to fit our needs. We will cover the related Squid configuration options and look at how a client's privacy can be protected, by ensuring Squid is properly configured.

We will learn to interpret the different log files and to configure Squid to produce different log messages, depending on requirements or network policies.

In this chapter, we shall learn about the following:

  • Cache log

  • Access log

  • Customizing the access log

  • Selective logging or protecting clients' privacy

  • Referer log

  • User agent log

  • Emulating HTTP server-like logs

  • Log file rotation

So let's get on with it.

Log messages


Log messages are a nice way for any application to convey messages about its current actions to human users. A log message is basically a computer-generated message that can be interpreted by a human being with prior knowledge of the location of the different fields in the message. Squid also tries to log every possible action in different log files at different stages. When Squid encounters any errors before starting, it logs them to the output log which generally goes to a file named cache.log. Similarly, when clients access our proxy server, a message is logged to the file named access.log whose location is determined by the access_log directive in the Squid configuration file.

Squid uses different formats for logging messages to these files. Log files are important and we can analyze resource consumption and the performance of our proxy server by reading through the log files, or by using various log file parsers available. In this chapter, we will learn to interpret the...

Cache log or debug log


Squid logs all errors and debugging messages to the cache.log file. This log file also contains messages about the integrity checks performed by Squid, such as the availability and validity of cache directories.

Time for action – understanding the cache log


Let's go through the log messages for a test Squid run and see what each line means:

2010/09/10 23:31:10| Starting Squid Cache version 3.1.10 for i686-pc-linux-gnu...
2010/09/10 23:31:10| Process ID 14892

Looking at the preceding example, the first line represents the version of Squid we are currently running and provides some information about the platform. The next line contains the process ID for this instance of Squid.

2010/09/10 23:31:10| With 1024 file descriptors available

This line shows the number of file descriptors available to Squid in this run. If we increase or decrease the available number of file descriptors and restart the Squid process, we can check this line in the cache log to confirm the change. Please refer to the section on Configure or system check in Chapter 1, Getting Started with Squid.

2010/09/10 23:31:10| Initializing IP Cache...
2010/09/10 23:31:10| DNS Socket created at [::], FD 7
2010/09/10 23:31:10| Adding nameserver 192.0.2.86...
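
If we want to process the cache log with a script rather than read it by eye, each line can be split at the | character. The following is a minimal Python sketch, assuming the "date time| message" layout shown in the preceding excerpt. The function name parse_cache_log_line is our own and is not part of Squid.

```python
from datetime import datetime


def parse_cache_log_line(line):
    """Split one cache.log line into (timestamp, message).

    Assumes the 'YYYY/MM/DD HH:MM:SS| message' layout seen in the excerpt;
    lines in any other layout raise ValueError.
    """
    stamp, sep, message = line.partition("|")
    if not sep:
        raise ValueError("no '|' separator found: %r" % line)
    when = datetime.strptime(stamp.strip(), "%Y/%m/%d %H:%M:%S")
    return when, message.strip()


sample = "2010/09/10 23:31:10| With 1024 file descriptors available"
when, message = parse_cache_log_line(sample)
print(when.isoformat(), "->", message)
```

The timestamp in cache.log carries no time zone, so the resulting datetime object is naive and should be read as the local time of the proxy server.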

Access log


The cache.log file is important for debugging if Squid is misbehaving. But the most important log file is the access.log file, where Squid logs live information about who is accessing our proxy server, along with the status of requests and replies. The location of the access.log file is determined by the access_log directive in the Squid configuration file. By default, it is set to ${prefix}/var/logs/access.log.

Understanding the access log

The log messages in the access.log file are not as readable as messages in the cache.log file, but once we understand what the different fields mean, it's very easy to interpret them. Messages can be logged to the access.log file in several formats. The messages that we are going to see next are in the default log format, called squid.

Time for action – understanding the access log messages


Let's look at a few lines from the access.log file before we actually explore the different fields in the log message:

1284565351.509    114 127.0.0.1 TCP_MISS/302 781 GET http://www.google.com/ - FIRST_UP_PARENT/proxy.example.com text/html

1284565351.633    108 127.0.0.1 TCP_MISS/200 6526 GET http://www.google.co.in/ - FIRST_UP_PARENT/proxy.example.com text/html

1284565352.610    517 127.0.0.1 TCP_MISS/200 29963 GET http://www.google.co.in/images/srpr/nav_logo14.png - FIRST_UP_PARENT/proxy.example.com image/png

1284565354.102    147 127.0.0.1 TCP_MISS/200 1786 GET http://www.google.co.in/favicon.ico - FIRST_UP_PARENT/proxy.example.com image/x-icon

In the previous example of log messages, the first column represents the seconds elapsed since the Unix epoch (for more information on the Unix epoch, refer to http://en.wikipedia.org/wiki/Unix_epoch), which is not easy for human users to read. To quickly convert the timestamps in...
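
As an illustration of reading these lines programmatically, the following Python sketch splits a line in the squid format into fields and converts the epoch timestamp to a readable UTC time. It assumes the columns appear in the usual order of the native format (time, elapsed milliseconds, client, result code/status, size, method, URL, user, hierarchy/peer, content type). The field labels and the function name are our own choices, not names defined by Squid.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field labels are illustrative; Squid itself does not name them this way.
AccessEntry = namedtuple(
    "AccessEntry",
    "time elapsed_ms client result_code http_status size "
    "method url user hierarchy content_type",
)


def parse_native_line(line):
    """Parse one access.log line written in the default 'squid' format."""
    parts = line.split()
    if len(parts) != 10:
        raise ValueError("expected 10 columns, got %d" % len(parts))
    ts, elapsed, client, result, size, method, url, user, hier, mime = parts
    code, _, status = result.partition("/")
    return AccessEntry(
        time=datetime.fromtimestamp(float(ts), tz=timezone.utc),
        elapsed_ms=int(elapsed),
        client=client,
        result_code=code,
        http_status=int(status),
        size=int(size),
        method=method,
        url=url,
        user=user,
        hierarchy=hier,
        content_type=mime,
    )


line = ("1284565351.509    114 127.0.0.1 TCP_MISS/302 781 GET "
        "http://www.google.com/ - FIRST_UP_PARENT/proxy.example.com text/html")
entry = parse_native_line(line)
print(entry.time.strftime("%d/%b/%Y:%H:%M:%S %z"), entry.result_code, entry.url)
```

The sketch reports the time in UTC so that the result does not depend on the time zone of the machine running it.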

Time for action – analyzing a syntax to specify access log


Let's have a look at the syntax of the access_log directive:

access_log <module>:<place> [<logformat name> [acl acl ...]]

The module field is one of none, stdio, daemon, syslog, tcp, or udp, and determines how the messages will be logged to place. The place field is the absolute path to the file, or another destination, where the messages should be logged. Let's take a brief look at the meaning of the different modules:

  • none: The log messages will not be logged at all.

  • stdio: The log messages will be logged to a file immediately after the completion of each request.

  • daemon: This module is similar to the stdio module; however, the log messages are not written to disk directly and are instead passed to a daemon helper for asynchronous handling.

  • syslog: This module is used to log each message using the syslog facility. The parameter place is specified in the form of the syslog facility and the priority level for the log entries. For example...
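
To make the shape of the directive concrete, here is a small Python sketch that splits an access_log line into its module, place, log format name, and ACL names. It is only an illustration of the syntax described above, not how Squid parses its configuration. The function name and the returned dictionary keys are our own.

```python
MODULES = ("none", "stdio", "daemon", "syslog", "tcp", "udp")


def parse_access_log(line):
    """Split an access_log directive into its parts (illustrative only)."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "access_log":
        raise ValueError("not an access_log directive: %r" % line)
    target = tokens[1]
    # Split at the first colon only, so 'tcp:host:port' keeps 'host:port'.
    module, _, place = target.partition(":")
    if module not in MODULES:
        # No recognised module prefix: keep the whole token as the place.
        module, place = None, target
    return {
        "module": module,
        "place": place or None,
        # None here means no format was named on the line.
        "format": tokens[2] if len(tokens) > 2 else None,
        "acls": tokens[3:],
    }


print(parse_access_log("access_log daemon:/opt/squid/var/logs/access.log minimal"))
```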

Time for action – learning log format and format codes


Log format can be defined using the logformat directive available in the Squid configuration file. The syntax for defining logformat is as follows:

logformat <name> <format specification>

The format specification is a series of format codes, as described in the following information:

Time for action – customizing the access log with a new log format


Squid has a lot of information about every client request and reply, however it writes only the requested information to the log file, which we can customize by defining several log formats.

Now, let's define a log format in which the time will appear in a human-readable format and use it with access_log:

logformat minimal %tl %>a %Ss/%03>Hs %rm %ru
access_log daemon:/opt/squid/var/logs/access.log minimal

So, we have constructed a new log format that will log the information we are most interested in. Let's see a few log messages in the preceding format:

11/Sep/2010:23:52:33 +0530 127.0.0.1 TCP_MISS/200 GET http://en.wikipedia.org/wiki/Main_Page

11/Sep/2010:23:52:34 +0530 127.0.0.1 TCP_MISS/200 GET http://en.wikipedia.org/images/wikimedia-button.png

Now the time in the log messages is human-readable and we can therefore tell when a particular URL was accessed.
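
Lines written in the minimal format can also be read back by a script. The following Python sketch assumes exactly the six whitespace-separated columns produced by the minimal format defined above (date, zone offset, client, result code/status, method, URL). The function name and dictionary keys are our own.

```python
from datetime import datetime


def parse_minimal_line(line):
    """Parse one line written with the 'minimal' logformat shown above."""
    date_part, zone, client, result, method, url = line.split()
    # %tl is logged as dd/Mon/yyyy:HH:MM:SS followed by the UTC offset.
    when = datetime.strptime(date_part + " " + zone, "%d/%b/%Y:%H:%M:%S %z")
    code, _, status = result.partition("/")
    return {
        "time": when,
        "client": client,
        "result_code": code,
        "status": int(status),
        "method": method,
        "url": url,
    }


entry = parse_minimal_line(
    "11/Sep/2010:23:52:33 +0530 127.0.0.1 TCP_MISS/200 GET "
    "http://en.wikipedia.org/wiki/Main_Page"
)
print(entry["time"].isoformat(), entry["url"])
```

Because the zone offset is part of the log line, the parsed time is unambiguous even when the log is analyzed on a machine in a different time zone.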

We should note that if we are using custom formats for access...

Selective logging of requests


Sometimes we may not want to log requests from certain clients. This could be for several reasons. One reason may be that a team is working on a highly secret project and we don't want to leave any trace of their browsing patterns anywhere.

Logging of requests can be controlled using two directives, namely, log_access and access_log. These directive names are easy to confuse, but we can tell them apart by the order of the words in each name. The access_log directive controls the format of the log messages and the location where the messages will be logged, while the log_access directive controls whether a particular request should be logged at all.

We have already learned about the log_access directive in the Log Access section in Chapter 2, Configuring Squid. Now, we will learn about using the access_log directive to log requests selectively.

Time for action – using access_log to control logging of requests


As we have seen in a previous section of this chapter, the syntax of the access_log directive is as follows:

access_log <module>:<place> [<logformat name> [acl acl ...]]

So, here we have an option to specify ACL lists which we can use to control where the different requests will be logged, if at all. Let's consider a scenario where we don't want to log requests to Yahoo! servers and we do want to log requests to Google and Facebook servers to separate files, and all other requests go to the access log. This scenario can be realized with the following configuration:

acl yahoo dstdomain .yahoo.com
acl google dstdomain .google.com
acl facebook dstdomain .facebook.com
log_access deny yahoo
log_access allow all
access_log /opt/squid/var/logs/google.log squid google
access_log /opt/squid/var/logs/facebook.log squid facebook
access_log /opt/squid/var/logs/access.log

If we look at the configuration carefully, we are denying...
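
The effect of the two log_access lines can be modelled in a few lines of Python. This is a simplified sketch and not Squid's implementation: it assumes that log_access rules are checked in order and the first matching rule decides, and that a dstdomain value with a leading dot matches the domain itself and all of its subdomains. The function names are our own.

```python
# ACL definitions and rules copied from the configuration above.
ACLS = {
    "yahoo": ".yahoo.com",
    "google": ".google.com",
    "facebook": ".facebook.com",
}
LOG_ACCESS = [("deny", "yahoo"), ("allow", "all")]


def dstdomain_matches(pattern, host):
    """Assumed dstdomain rule: '.example.com' matches the domain and subdomains."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


def should_log(host):
    """Return True if a request to 'host' would be logged at all."""
    for action, acl in LOG_ACCESS:
        if acl == "all" or dstdomain_matches(ACLS[acl], host):
            return action == "allow"
    # Not reached with the rules above, because the last rule is 'all'.
    return True


for host in ("mail.yahoo.com", "www.google.com", "example.com"):
    print(host, "logged" if should_log(host) else "not logged")
```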

Referer log


When a client clicks a link to other.example.com on the website example.com, the website example.com is the referrer and the client is referred to the website other.example.com. When a client is referred by a website, the HTTP client sends an HTTP header named Referer. Squid has the ability to log Referer HTTP headers, which can later be used for analyzing traffic patterns.

Note

"Referer" is actually a misspelling of the word "Referrer", but it has been officially specified that way in HTTP RFCs.

Time for action – enabling the referer log


By default, there is no referer log. We can enable the referer log using the access_log directive in combination with a custom log format. To generate the referer log, first of all, we need to create a log format as shown:

logformat referer %ts.%03tu %>a %{Referer}>h %ru 

This configuration defines a new log format called referer, which contains a request timestamp, IP address of the client, the referer URL, and the request URL. Now, we need to use the access_log directive with the aforementioned constructed log format as shown:

access_log /opt/squid/var/logs/referer.log referer

Now, let's look at a few lines from the referer log file:

1284576601.898 127.0.0.1 http://en.wikipedia.org/wiki/Main_Page http://en.wikiquote.org/wiki/Main_Page

1284576607.732 127.0.0.1 http://en.wikiquote.org/wiki/Main_Page http://upload.wikimedia.org/wikiquote/en/b/bc/Wiki.png

The referer log is a bit easier to understand. The first column is the time elapsed since epoch...
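
As one example of analyzing traffic patterns with this log, the following Python sketch counts how often each host appears as a referrer. It assumes the four-column layout produced by the referer log format defined above, and that a request without a Referer header is logged with - in the third column. The function name is our own.

```python
from collections import Counter
from urllib.parse import urlsplit


def referring_hosts(lines):
    """Count referring hosts in lines of the four-column referer log."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        referer = fields[2]
        if referer == "-":
            continue  # request carried no Referer header
        counts[urlsplit(referer).hostname or "unknown"] += 1
    return counts


sample = [
    "1284576601.898 127.0.0.1 http://en.wikipedia.org/wiki/Main_Page "
    "http://en.wikiquote.org/wiki/Main_Page",
    "1284576607.732 127.0.0.1 http://en.wikiquote.org/wiki/Main_Page "
    "http://upload.wikimedia.org/wikiquote/en/b/bc/Wiki.png",
]
for host, hits in referring_hosts(sample).most_common():
    print(host, hits)
```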

Time for action – translating the referer logs to a human-readable format


We can translate a referer log to a human-readable format by using the command-line utility awk (the strftime function used here is available in GNU awk). We can convert the entire referer.log file by using the following command:

$ cat referer.log | awk '{printf("%s ", strftime("%d/%b/%Y:%H:%M:%S",$1)); print $2 " " $3 " " $4;}' > referer_human_readable.log

The log messages from referer.log, as shown, should look like the following messages after conversion:

12/Sep/2010:01:36:06 127.0.0.1 http://en.wikipedia.org/wiki/Main_Page http://en.wikiquote.org/

12/Sep/2010:01:36:12 127.0.0.1 http://en.wikiquote.org/wiki/Main_Page http://upload.wikimedia.org/wikiquote/en/b/bc/Wiki.png

The preceding command works fine for converting the entire log file, but is not useful if we want to watch the live referer log with human-readable timestamps. To achieve this, we can use the following command:

$ tail -f referer.log | awk '{printf("%s...

User agent log


All requests from clients generally contain the User-Agent HTTP header, which is basically a formatted string describing the HTTP client being used for the current request. As Squid knows everything about the requests, it can log this HTTP header field to the log file defined by the useragent_log directive in the Squid configuration file.

Time for action – enabling user agent logging


By default, the user agent log is disabled and we can enable it by using the following line in our configuration file:

useragent_log /opt/squid/var/logs/useragent.log

Once we have the user agent log enabled, Squid will start logging the User-Agent HTTP header field from the requests, depending on the availability of the field. Let's see a few lines from an example user agent log:

127.0.0.1 [12/Sep/2010:01:55:33 +0530] "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 GTB7.1"
127.0.0.1 [12/Sep/2010:01:55:33 +0530] "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 GTB7.1 GoogleToolbarFF 7.1.20100830 GTBA"

The format of this file is quite simple and only the last column, representing the user agent, is of interest here. The user agent log can be used to analyze the popular web browsers on a network.
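
A rough tally of browsers can be produced from this file with a short script. The following Python sketch assumes the three-part layout shown above (client address, bracketed timestamp, quoted user agent). The list of browser names is a small illustrative set of our own, and a real analysis would need a far more complete list.

```python
import re
from collections import Counter

# client [timestamp] "user agent string"
UA_LINE = re.compile(r'^(\S+) \[([^\]]+)\] "(.*)"$')

# Illustrative names only; order matters because some agents mention several.
KNOWN_BROWSERS = ("Firefox", "Chrome", "Opera", "MSIE", "Safari")


def browser_family(agent):
    """Return the first known browser name found in the agent string."""
    for name in KNOWN_BROWSERS:
        if name in agent:
            return name
    return "other"


def count_browsers(lines):
    """Tally browser families across user agent log lines."""
    counts = Counter()
    for line in lines:
        match = UA_LINE.match(line.strip())
        if match:
            counts[browser_family(match.group(3))] += 1
    return counts


sample = [
    '127.0.0.1 [12/Sep/2010:01:55:33 +0530] "Mozilla/5.0 (X11; U; Linux i686; '
    'en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 GTB7.1"',
]
print(count_browsers(sample))
```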

What just happened?

We learned to enable logging of the User-Agent HTTP header field from...

Emulating HTTP server-like logs


Squid has an optional feature that can help in generating log messages similar to the messages generated by most HTTP servers. We can use the access_log directive to log messages with the log format common.

Time for action – enabling HTTP server log emulation


By default, Squid will generate a native log, which contains more information than the logs generated with the HTTP log emulation on. To switch on the emulation, we can use the following line in our configuration file:

access_log daemon:/opt/squid/var/logs/access.log common

This configuration will log messages in a web server-like format. Let's have a look at a few log messages in the HTTP server-like log format:

127.0.0.1 - - [13/Sep/2010:17:38:57 +0530] "GET http://www.google.com/ HTTP/1.1" 200 6637 TCP_MISS:FIRSTUP_PARENT

127.0.0.1 - - [13/Sep/2010:17:40:11 +0530] "GET http://example.com/ HTTP/1.1" 200 1147 TCP_HIT:HIER_NONE

127.0.0.1 - - [13/Sep/2010:17:40:12 +0530] "GET http://example.com/favicon.ico HTTP/1.1" 404 717 TCP_MISS:FIRSTUP_PARENT

These log messages are similar to log messages generated by the famous open source web server Apache and many others.
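
Because this layout is close to what web server log tools expect, it is also straightforward to parse by hand. The following Python sketch assumes the exact layout of the three sample lines above, including the trailing Squid-specific result:hierarchy column. The regular expression and the field names are our own.

```python
import re

COMMON_LINE = re.compile(
    r'^(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) (?P<squid>\S+)$'
)


def parse_common_line(line):
    """Parse one line of Squid's HTTP server-like ('common') log."""
    match = COMMON_LINE.match(line.strip())
    if match is None:
        raise ValueError("line does not match the expected layout: %r" % line)
    entry = match.groupdict()
    # The quoted request holds the method, the URL, and the protocol version.
    method, url, protocol = entry.pop("request").split(" ", 2)
    # The last column is '<result code>:<hierarchy code>'.
    result, _, hierarchy = entry.pop("squid").partition(":")
    entry["status"] = int(entry["status"])
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    entry.update(method=method, url=url, protocol=protocol,
                 result=result, hierarchy=hierarchy)
    return entry


line = ('127.0.0.1 - - [13/Sep/2010:17:40:11 +0530] '
        '"GET http://example.com/ HTTP/1.1" 200 1147 TCP_HIT:HIER_NONE')
print(parse_common_line(line))
```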

What just happened?

We learned to switch on the HTTP server-like log emulation of Squid access logs. Squid...

Log file rotation


As time passes, the size of the log files increases rapidly and starts occupying more and more disk space. To overcome this problem of the accumulation of logs over time, we generally keep the logs for the previous one or two weeks. To remove old log messages and retain the recent ones, Squid has a built-in feature of log file rotation, which can move older log messages to separate files. Moreover, Squid stores the incremental copy of the storage index in a file swap.state, which is also pruned down during log rotation.

To rotate logs, we have to use the squid command as follows:

$ squid -k rotate

This command will rotate logs depending on the value specified with the directive logfile_rotate in the configuration file. The default value of logfile_rotate is 10. This means that 10 older versions of all log files will be retained.
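
To picture what a rotation does to the file names, the following Python sketch lists the renames one rotation would perform. It is a simplified model and not Squid's code: it assumes rotated copies carry numeric extensions starting at 0 (access.log.0 being the most recent), that the oldest copy is dropped once the limit is reached, and that a value of 0 means no copies are kept. The function name and its parameters are our own.

```python
def rotation_plan(name, existing, keep=10):
    """Return the (source, destination) renames for one rotation, in order.

    name     -- base log file name, for example 'access.log'
    existing -- set of file names currently present
    keep     -- number of old copies to retain (the logfile_rotate value)
    """
    plan = []
    # Shift numbered copies upwards, oldest first, so each rename lands on a
    # name that has already been vacated (or on the copy being discarded).
    for index in range(keep - 2, -1, -1):
        source = "%s.%d" % (name, index)
        if source in existing:
            plan.append((source, "%s.%d" % (name, index + 1)))
    # Finally the live log becomes copy number 0.
    if keep > 0 and name in existing:
        plan.append((name, name + ".0"))
    return plan


files = {"access.log", "access.log.0", "access.log.1"}
for source, destination in rotation_plan("access.log", files, keep=3):
    print(source, "->", destination)
```

With keep set to 3 in this model, the previous access.log.1 replaces access.log.2, so at most three old copies exist after any rotation.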

Have a go hero – rotate log files

Try to rotate log files on your proxy server and see how the log files are renamed.

Other log related features


We discussed important logging related directives in the previous sections. Squid has more directives related to logging, but they are less important and we should not have any problems in operating Squid normally, even if we are not aware of these features.

Cache store log

If we have disk caching enabled on our proxy server, Squid can log all its disk caching related activity to a separate log file whose location is determined by the directive cache_store_log. This log file contains information about the web objects being cached on the disk, stale objects being removed from the cache, and how long an object was in the cache. The information logged in this file is not particularly user-friendly. By default, logging of storage activity is disabled.

Pop quiz

  1. Consider the following configuration line:

    access_log daemon:/opt/squid/var/logs/access.log

    Which log format will be used by Squid in accordance with the previous configuration?

    1. common

    2. squid

    3. combined

    4. squidmime...

Summary


In this chapter, we have learned to interpret several log files generated by Squid. We had a detailed look at the format codes that Squid uses to construct log messages and how we can construct custom log formats depending on the requirements.

Specifically, we examined the cache log and the debugging messages generated by Squid, and had a detailed overview of the access log and format codes. We customized log messages using several log formats, selectively logged requests to various log files, and enabled the referer and user agent logs.

We also discussed rotating log files to prevent unnecessary wastage of disk space.

Now that we have learned about the various log files and log messages, we will go on to learn about using these messages to monitor our proxy server and analyze the performance of our cache, in the next chapter.


Format code   Format description
%             A literal % character.
sn            Unique sequence number per log line entry.
err_code      The ID of an error response served by Squid or a similar internal error identifier.
err_detail    Additional err_code dependent error information.
>a            Client's source IP address.
>A            Client's FQDN (Fully Qualified Domain Name).
>p            Client's source port.
<A            Server's IP address or peer name.
la            Local IP address of the Squid proxy server.
lp            Local port number on which Squid is listening.
<lp           Local port number of the last server or peer connection.
ts            Seconds since Unix epoch.
tu            Sub...