You're reading from Squid Proxy Server 3.1: Beginner's Guide

Product type Book

Published in Feb 2011

Publisher Packt

ISBN-13 9781849513906

Pages 332 pages

Edition 1st Edition

Chapter 5. Understanding Log Files and Log Formats

Understanding Squid log files and log formats is pretty easy. In this chapter, we'll present a brief explanation of the log format and how we can customize it to fit our needs. We will cover the related Squid configuration options and look at how a client's privacy can be protected by configuring Squid properly.

In this chapter, we will learn to interpret the different log files. We will also learn about configuring Squid to achieve different log messages, depending on requirements or network policies.

In this chapter, we shall learn about the following:

Cache log
Access log
Customizing the access log
Selective logging or protecting clients' privacy
Referer log
User agent log
Emulating HTTP server-like logs
Log file rotation

So let's get on with it.

Log messages

Log messages are a nice way for any application to convey messages about its current actions to human users. A log message is basically a computer-generated message that can be interpreted by a human being with prior knowledge of the location of the different fields in the message. Squid also tries to log every possible action in different log files at different stages. When Squid encounters any errors before starting, it logs them to the output log which generally goes to a file named cache.log. Similarly, when clients access our proxy server, a message is logged to the file named access.log whose location is determined by the access_log directive in the Squid configuration file.

Squid uses different formats for logging messages to these files. Log files are important and we can analyze resource consumption and the performance of our proxy server by reading through the log files, or by using various log file parsers available. In this chapter, we will learn to interpret the...

Cache log or debug log

Squid logs all the errors and debugging messages to the cache.log file. This log file also contains messages about the integrity checks that Squid performs, such as the availability and validity of cache directories.

Time for action – understanding the cache log

Let's go through the log messages for a test Squid run and see what each line means:

2010/09/10 23:31:10| Starting Squid Cache version 3.1.10 for i686-pc-linux-gnu...
2010/09/10 23:31:10| Process ID 14892

Looking at the preceding example, the first line represents the version of Squid we are currently running and provides some information about the platform. The next line contains the process ID for this instance of Squid.

2010/09/10 23:31:10| With 1024 file descriptors available

This line shows the number of file descriptors available for Squid in this run. If we increase or decrease the available number of file descriptors and restart the Squid process, we can look for a similar line in the cache log to confirm the change. Please refer to the section on Configure or system check in Chapter 1, Getting Started with Squid.

2010/09/10 23:31:10| Initializing IP Cache...
2010/09/10 23:31:10| DNS Socket created at [::], FD 7
2010/09/10 23:31:10| Adding nameserver 192.0.2.86...
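Every cache.log line shown above starts with a timestamp followed by a `|` separator and the message text. As a minimal sketch, assuming that layout holds for the lines we care about, the following Python snippet splits a line into its timestamp and message. The function name parse_cache_log_line is a hypothetical helper for illustration and is not part of Squid.

```python
from datetime import datetime


def parse_cache_log_line(line):
    """Split a cache.log line into (timestamp, message).

    Assumes the 'YYYY/MM/DD HH:MM:SS| message' layout seen in the samples.
    """
    stamp, sep, message = line.partition("| ")
    if not sep:
        # No 'timestamp| ' prefix found: return the text without a timestamp.
        return None, line.strip()
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"), message.strip()


sample = "2010/09/10 23:31:10| With 1024 file descriptors available"
when, message = parse_cache_log_line(sample)
print(when.isoformat(), "->", message)
```

A helper like this makes it easy to filter the cache log by time range or to search for specific messages, such as the file descriptor line, after a restart.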

Access log

The cache.log file is important for debugging if Squid is misbehaving. But the most important log file is the access.log file, where Squid logs live information about who is accessing our proxy server, along with the status of requests and replies. The location of the access.log file is determined by the access_log directive in the Squid configuration file. By default, it is set to ${prefix}/var/logs/access.log.

Understanding the access log

The log messages in the access.log file are not as readable as messages in the cache.log file, but once we understand what the different fields mean, it's very easy to interpret the log messages. There are multiple formats in which messages are logged in the access.log file. The messages that we are going to see next are in the default log format, called squid.

Time for action – understanding the access log messages

Let's look at a few lines from the access.log file before we actually explore the different fields in the log message:

1284565351.509    114 127.0.0.1 TCP_MISS/302 781 GET http://www.google.com/ - FIRST_UP_PARENT/proxy.example.com text/html

1284565351.633    108 127.0.0.1 TCP_MISS/200 6526 GET http://www.google.co.in/ - FIRST_UP_PARENT/proxy.example.com text/html

1284565352.610    517 127.0.0.1 TCP_MISS/200 29963 GET http://www.google.co.in/images/srpr/nav_logo14.png - FIRST_UP_PARENT/proxy.example.com image/png

1284565354.102    147 127.0.0.1 TCP_MISS/200 1786 GET http://www.google.co.in/favicon.ico - FIRST_UP_PARENT/proxy.example.com image/x-icon

In the previous example log messages, the first column represents the seconds elapsed since the Unix epoch (for more information on the Unix epoch, refer to http://en.wikipedia.org/wiki/Unix_epoch), which can't really be interpreted by human users. To quickly convert the timestamps in...
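As a rough sketch of how the sample lines above can be pulled apart, the following Python snippet splits one line in the native squid format into its ten whitespace-separated columns and converts the epoch timestamp into a readable date. The class and field names (AccessEntry, elapsed_ms, and so on) are illustrative labels chosen for this example and are not Squid terminology. The timestamp is rendered in UTC here, which is an assumption made to keep the output independent of the local time zone.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class AccessEntry(NamedTuple):
    # Field names are illustrative labels chosen for this sketch.
    time: datetime
    elapsed_ms: int
    client: str
    result_code: str
    http_status: int
    size_bytes: int
    method: str
    url: str
    user: str
    hierarchy: str
    content_type: str


def parse_native_line(line: str) -> AccessEntry:
    """Parse one line written in the default 'squid' log format."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    # The fourth column combines Squid's result code and the HTTP status.
    result_code, http_status = fields[3].split("/", 1)
    return AccessEntry(
        time=datetime.fromtimestamp(float(fields[0]), tz=timezone.utc),
        elapsed_ms=int(fields[1]),
        client=fields[2],
        result_code=result_code,
        http_status=int(http_status),
        size_bytes=int(fields[4]),
        method=fields[5],
        url=fields[6],
        user=fields[7],
        hierarchy=fields[8],
        content_type=fields[9],
    )


sample = ("1284565351.509    114 127.0.0.1 TCP_MISS/302 781 GET "
          "http://www.google.com/ - FIRST_UP_PARENT/proxy.example.com text/html")
entry = parse_native_line(sample)
print(entry.time.isoformat(), entry.client, entry.result_code,
      entry.http_status, entry.url)
```

Splitting on whitespace works for these samples because Squid writes one value per column. A custom log format with a different column layout would need a different parser.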

Time for action – analyzing a syntax to specify access log

Let's have a look at the syntax of the access_log directive:

access_log <module>:<place> [<logformat name> [acl acl ...]]

The field module is one of none, stdio, daemon, syslog, tcp, and udp, and determines how the messages will be logged. The field place is the absolute path to the file, or another destination, where the messages should be logged. Let's take a brief look at the meaning of different modules:

none: The log messages will not be logged at all.
stdio: The log messages will be logged to a file immediately after the completion of each request.
daemon: This module is similar to the stdio module; however, Squid does not write the log messages to disk itself and instead passes them to a daemon helper for asynchronous handling.
syslog: This module is used to log each message using the syslog facility. The parameter place is specified in the form of the syslog facility and the priority level for the log entries. For example...

Time for action – learning log format and format codes

Log format can be defined using the logformat directive available in the Squid configuration file. The syntax for defining logformat is as follows:

logformat <name> <format specification>

A format specification is a series of format codes, as described in the following information:

Time for action – customizing the access log with a new log format

Squid has a lot of information about every client request and reply; however, it writes only the information we ask for to the log file. We control this by defining log formats.

Now, let's define a log format in which the time will appear in a human-readable format and use it with access_log:

logformat minimal %tl %>a %Ss/%03>Hs %rm %ru
access_log daemon:/opt/squid/var/logs/access.log minimal

So, we have constructed a new log format that will log the information we are most interested in. Let's see a few log messages in the preceding format:

11/Sep/2010:23:52:33 +0530 127.0.0.1 TCP_MISS/200 GET http://en.wikipedia.org/wiki/Main_Page

11/Sep/2010:23:52:34 +0530 127.0.0.1 TCP_MISS/200 GET http://en.wikipedia.org/images/wikimedia-button.png

Now the time in the log messages is human-readable and we can therefore tell when a particular URL was accessed.
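Custom formats are also easy to process with a script, since we know exactly which columns we asked for. The following Python sketch parses lines written in the minimal format defined above. The regular expression and the function name parse_minimal_line are assumptions made for this example, and the month name parsing (`%b`) assumes an English or C locale.

```python
import re
from datetime import datetime

# Columns follow the 'minimal' logformat: %tl %>a %Ss/%03>Hs %rm %ru
MINIMAL_RE = re.compile(
    r"^(?P<time>\S+ [+-]\d{4}) (?P<client>\S+) "
    r"(?P<result>[A-Z_]+)/(?P<status>\d{3}) (?P<method>\S+) (?P<url>\S+)$"
)


def parse_minimal_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = MINIMAL_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # %tl is a local time with a numeric UTC offset, e.g. +0530.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields


sample = ("11/Sep/2010:23:52:33 +0530 127.0.0.1 TCP_MISS/200 GET "
          "http://en.wikipedia.org/wiki/Main_Page")
entry = parse_minimal_line(sample)
print(entry["time"].isoformat(), entry["client"], entry["status"], entry["url"])
```

Because the parsed time carries its UTC offset, entries from proxies in different time zones can be compared directly.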

We should note that if we are using custom formats for access...

Selective logging of requests

Sometimes we may not want to log requests from certain clients. This could be because of several reasons. One reason may be that a team is working on a highly secret project and we don't want to leave any impressions of their browsing patterns anywhere.

Logging of requests can be controlled using two directives, namely, log_access and access_log. These directives may look confusing when used in the same sentence, but we can interpret the meaning by the sequence in which the individual words appear in the directive name. The directive access_log controls the format of the log messages and the location where the messages will be logged, while the directive log_access controls whether a particular request should be logged or not.

We have already learned about the log_access directive in the Log Access section in Chapter 2, Configuring Squid. Now, we will learn about using the access_log directive to log requests selectively.

Time for action – using access_log to control logging of requests

As we have seen in a previous section of this chapter, the syntax of the access_log directive is as follows:

access_log <module>:<path> [<logformat name> [acl acl ...]]

So, here we have an option to specify a list of ACLs, which we can use to control where the different requests will be logged, if at all. Let's consider a scenario where we don't want to log requests to Yahoo! servers, we want to log requests to Google and Facebook servers to separate files, and all other requests should go to the access log. This scenario can be realized with the following configuration:

acl yahoo dstdomain .yahoo.com
acl google dstdomain .google.com
acl facebook dstdomain .facebook.com
log_access deny yahoo
log_access allow all
access_log /opt/squid/var/logs/google.log squid google
access_log /opt/squid/var/logs/facebook.log squid facebook
access_log /opt/squid/var/logs/access.log

If we look at the configuration carefully, we are denying...

Referer log

When a client clicks a link to other.example.com on the website example.com, the website example.com is the referrer and the client is referred to the website other.example.com. When a client is referred by a website, the HTTP client sends an HTTP header named Referer. Squid has the ability to log Referer HTTP headers, which can later be used for analyzing traffic patterns.

Note

"Referer" is actually a misspelling of the word "Referrer", but it has been officially specified that way in HTTP RFCs.

Time for action – enabling the referer log

By default, there is no referer log. We can enable the referer log using the access_log directive in combination with a custom log format. To generate the referer log, first of all, we need to create a log format as shown:

logformat referer %ts.%03tu %>a %{Referer}>h %ru

This configuration defines a new log format called referer, which contains a request timestamp, the IP address of the client, the referer URL, and the request URL. Now, we need to use the access_log directive with the log format we just defined, as shown:

access_log /opt/squid/var/logs/referer.log referer

Now, let's look at a few lines from the referer log file:

1284576601.898 127.0.0.1 http://en.wikipedia.org/wiki/Main_Page http://en.wikiquote.org/wiki/Main_Page

1284576607.732 127.0.0.1 http://en.wikiquote.org/wiki/Main_Page http://upload.wikimedia.org/wikiquote/en/b/bc/Wiki.png

The referer log is a bit easier to understand. The first column is the time elapsed since epoch...

Time for action – translating the referer logs to a human-readable format

We can translate a referer log to a human-readable format by using the command line utility awk. We can convert the entire referer.log file to a human-readable format by using the following command sequence:

$ cat referer.log | awk '{printf("%s ", strftime("%d/%b/%Y:%H:%M:%S",$1)); print $2 " " $3 " " $4;}' > referer_human_readable.log

The log messages from referer.log, as shown, should look like the following messages after conversion:

12/Sep/2010:01:36:06 127.0.0.1 http://en.wikipedia.org/wiki/Main_Page http://en.wikiquote.org/

12/Sep/2010:01:36:12 127.0.0.1 http://en.wikiquote.org/wiki/Main_Page http://upload.wikimedia.org/wikiquote/en/b/bc/Wiki.png
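The strftime() function used in the awk command is a GNU awk extension, so the command may not work with other awk implementations. As an alternative sketch, the same conversion can be done in Python. The function name humanize_referer_line and its tz parameter are hypothetical names introduced for this example. Passing tz=None uses the local time zone, like the awk command, while the sample call uses UTC so that the result does not depend on the machine it runs on.

```python
from datetime import datetime, timezone


def humanize_referer_line(line, tz=None):
    """Rewrite the epoch timestamp of a referer.log line as a readable date.

    Expects the four columns of the 'referer' logformat:
    timestamp, client address, referer URL, request URL.
    """
    stamp, client, referer, url = line.split()
    when = datetime.fromtimestamp(float(stamp), tz=tz)
    return f"{when:%d/%b/%Y:%H:%M:%S} {client} {referer} {url}"


sample = ("1284576601.898 127.0.0.1 http://en.wikipedia.org/wiki/Main_Page "
          "http://en.wikiquote.org/wiki/Main_Page")
print(humanize_referer_line(sample, tz=timezone.utc))
```

To convert a whole file, we could call this function for every line read from referer.log and write the results to a new file.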

The command we saw before works fine for converting the entire log file, but is not useful if we want to watch the live referer log with human-readable timestamps. To achieve this, we can use the following command:

$ tail -f referer.log | awk '{printf("%s...

User agent log

All requests from clients generally contain the User-Agent HTTP header, which is basically a formatted string describing the HTTP client being used for the current request. As Squid knows everything about the requests, it can log this HTTP header field to the log file defined by the useragent_log directive in the Squid configuration file.

Time for action – enabling user agent logging

By default, the user agent log is disabled and we can enable it by using the following line in our configuration file:

useragent_log /opt/squid/var/logs/useragent.log

Once we have the user agent log enabled, Squid will start logging the User-Agent HTTP header field from the requests, depending on the availability of the field. Let's see a few lines from an example user agent log:

127.0.0.1 [12/Sep/2010:01:55:33 +0530] "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 GTB7.1"
127.0.0.1 [12/Sep/2010:01:55:33 +0530] "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 GTB7.1 GoogleToolbarFF 7.1.20100830 GTBA"

The format of this file is quite simple and only the last column, representing the user agent, is of interest here. The user agent log can be used to analyze the popular web browsers on a network.
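As a minimal sketch of such an analysis, the following Python snippet counts how often each user agent string appears. It assumes every line has the layout shown in the sample above (client address, bracketed timestamp, quoted user agent). The regular expression and the function name count_user_agents are introduced only for this example.

```python
import re
from collections import Counter

# Layout assumed from the samples: client [timestamp] "user agent"
UA_LINE_RE = re.compile(r'^(?P<client>\S+) \[(?P<time>[^\]]+)\] "(?P<agent>.*)"$')


def count_user_agents(lines):
    """Count occurrences of each User-Agent string in the given lines."""
    counts = Counter()
    for line in lines:
        match = UA_LINE_RE.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts


sample_lines = [
    '127.0.0.1 [12/Sep/2010:01:55:33 +0530] "Mozilla/5.0 (X11; U; Linux i686; '
    'en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 GTB7.1"',
    '127.0.0.1 [12/Sep/2010:01:55:33 +0530] "Mozilla/5.0 (X11; U; Linux i686; '
    'en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 GTB7.1 GoogleToolbarFF '
    '7.1.20100830 GTBA"',
]
for agent, hits in count_user_agents(sample_lines).most_common():
    print(hits, agent)
```

In practice, we would pass an open file object for useragent.log instead of the two sample lines.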

What just happened?

We learned to enable logging of the User-Agent HTTP header field from...

Emulating HTTP server-like logs

Squid has an optional feature that can help in generating log messages similar to the messages generated by most HTTP servers. We can use the access_log directive to log messages with the log format common.

Time for action – enabling HTTP server log emulation

By default, Squid will generate a native log, which contains more information than the logs generated with the HTTP log emulation on. To enable the emulation, we can use the following line in our configuration file:

access_log daemon:/opt/squid/var/logs/access.log common

This configuration will log messages in a web server-like format. Let's have a look at a few log messages in the HTTP server-like log format:

127.0.0.1 - - [13/Sep/2010:17:38:57 +0530] "GET http://www.google.com/ HTTP/1.1" 200 6637 TCP_MISS:FIRSTUP_PARENT

127.0.0.1 - - [13/Sep/2010:17:40:11 +0530] "GET http://example.com/ HTTP/1.1" 200 1147 TCP_HIT:HIER_NONE

127.0.0.1 - - [13/Sep/2010:17:40:12 +0530] "GET http://example.com/favicon.ico HTTP/1.1" 404 717 TCP_MISS:FIRSTUP_PARENT

These log messages are similar to log messages generated by the famous open source web server Apache and many others.
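One benefit of this format is that it can be processed with the same kind of scripts that are used for web server logs. The following Python sketch parses the sample lines shown above, including the trailing Squid status and hierarchy column. The regular expression and the function name parse_common_line are assumptions made for this illustration.

```python
import re

# Layout assumed from the samples above: the usual web server columns
# followed by Squid's result code and hierarchy code joined by a colon.
COMMON_RE = re.compile(
    r'^(?P<client>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?P<squid_status>[^:\s]+):(?P<hierarchy>\S+)$'
)


def parse_common_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMMON_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


sample = ('127.0.0.1 - - [13/Sep/2010:17:40:11 +0530] '
          '"GET http://example.com/ HTTP/1.1" 200 1147 TCP_HIT:HIER_NONE')
entry = parse_common_line(sample)
print(entry["client"], entry["status"], entry["squid_status"], entry["url"])
```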

What just happened?

We learned to switch on the HTTP server-like log emulation of Squid access logs. Squid...

Log file rotation

As time passes, the size of the log files increases rapidly and starts occupying more and more disk space. To overcome this problem of the accumulation of logs over time, we generally keep the logs for the previous one or two weeks. To remove old log messages and retain the recent ones, Squid has a built-in feature of log file rotation, which can move older log messages to separate files. Moreover, Squid stores the incremental copy of the storage index in a file swap.state, which is also pruned down during log rotation.

To rotate logs, we have to use the squid command as follows:

$ squid -k rotate

This command will rotate logs depending on the value specified with the directive logfile_rotate in the configuration file. The default value of logfile_rotate is 10. This means that 10 older versions of all log files will be retained.

Have a go hero – rotate log files

Try to rotate log files on your proxy server and see how the log files are renamed.

Other log related features

We discussed important logging related directives in the previous sections. Squid has more directives related to logging, but they are less important and we should not have any problems in operating Squid normally, even if we are not aware of these features.

Cache store log

If we have disk caching enabled on our proxy server, Squid can log all of its disk caching activity to a separate log file whose location is determined by the directive cache_store_log. This log file contains information about the web objects being cached on the disk, stale objects being removed from the cache, and how long an object was in the cache. The information logged in this file is not particularly user-friendly. By default, logging of storage activity is disabled.

Pop quiz

Consider the following configuration line:
```
access_log daemon:/opt/squid/var/logs/access.log
```
Which log format will be used by Squid in accordance with the previous configuration?
1. common
2. squid
3. combined
4. squidmime...

Summary

In this chapter, we have learned to interpret several log files generated by Squid. We had a detailed look at the format codes that Squid uses to construct log messages and how we can construct custom log formats depending on the requirements.

Specifically, we understood the cache log and the debug messages generated by Squid, and had a detailed overview of the access log and format codes. We customized log messages using several log formats, selectively logged requests to various log files, and enabled the referer and user agent logs.

We also discussed rotating log files to prevent unnecessary wastage of disk space.

Now that we have learned about the various log files and log messages, we will go on to learn about using these messages to monitor our proxy server and analyze the performance of our cache, in the next chapter.

Format code	Format description
`%`	A literal % character.
`sn`	Unique sequence number per log line entry.
`err_code`	The ID of an error response served by Squid or a similar internal error identifier.
`err_detail`	Additional `err_code` dependent error information.
`>a`	Client's source IP address.
`>A`	Client's FQDN (Fully Qualified Domain Name).
`>p`	Client's source port.
`<A`	Server's IP address or peer name.
`la`	Local IP address of the Squid proxy server.
`lp`	Local port number on which Squid is listening.
`<lp`	Local port number of the last server or peer connection.
`ts`	Seconds since Unix epoch.
`tu`	Sub...