Squid proxy server enables you to cache your web content and return it quickly on subsequent requests. In this article we will learn about the different configuration options available and the transparent and accelerated modes that enable you to focus on particular areas of your network.
In this article by Kulbir Saini, author of Squid Proxy Server 3.1: Beginner's Guide, we will cover:
- Configuring Squid to use DNS servers
- A few directives related to logging
- Other important or commonly used configuration directives
DNS server configuration
For every request received from a client, Squid needs to resolve the domain name before it can contact the target web server. For this purpose, Squid can use either its built-in internal DNS client or an external DNS program to resolve hostnames. The default behavior is to use the internal DNS client. To use the external DNS program instead, we must pass the --disable-internal-dns option to the configure program before compiling Squid, as shown:
$ ./configure --disable-internal-dns
Let's have a quick look at the DNS-related configuration directives provided by Squid.
Specifying the DNS program path
The directive cache_dns_program is used to specify the path of the external DNS program built with Squid. If we have not moved the Squid-related files after installation, this directive will already have the correct value by default. However, if the DNS program is located elsewhere, we can set its path with cache_dns_program.
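As a sketch, a build that placed the helper somewhere other than the default location might be configured like this. The path below is hypothetical and only illustrates the form of the directive; the real location depends on how Squid was built and installed.

```squidconf
# Hypothetical path to the external DNS helper; adjust to the actual install location.
cache_dns_program /usr/local/squid/libexec/dnsserver
```

This directive only has an effect when Squid was compiled with --disable-internal-dns.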
Controlling the number of DNS client processes
The number of parallel instances of the DNS program specified by cache_dns_program can be controlled by using the directive dns_children. The syntax of the directive dns_children is as follows:
dns_children max startup=n idle=n
The parameter max determines the maximum number of DNS programs that can run at any one time. We should set it high enough for the expected load, because Squid has to wait for a response from a DNS program before it can proceed with a request, and too low a value will leave requests waiting for a free process. The default value is 32.
The value of the parameter startup determines the number of DNS programs that will be started when Squid starts. This can be set to zero and Squid will not start any processes by default. The first ever request to Squid will result in the creation of the first child process.
The value of the parameter idle determines the number of idle processes that Squid tries to keep available at any one time. More requests will result in the creation of more processes, but keeping this many processes free is subject to the overall limit of max processes. The minimum acceptable value for this parameter is 1.
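Putting the three parameters together, a configuration might look like the following. The numbers are illustrative rather than recommended values, and the sketch assumes a Squid release that accepts the startup and idle options described above.

```squidconf
# Allow up to 32 DNS helper processes, start 4 of them when Squid starts,
# and try to keep at least 2 idle helpers ready for new lookups.
dns_children 32 startup=4 idle=2
```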
Setting the DNS name servers
By default, Squid picks up the name servers from the file /etc/resolv.conf. However, if we want to specify a list of different name servers, we can use the directive dns_nameservers.
Time for action – adding DNS name servers
A list of IP addresses can be passed to this directive on a single line, or the addresses can be written on separate lines, as in the following:
dns_nameservers 192.0.2.25 198.51.100.25
dns_nameservers 203.0.113.25
The previous configuration lines will set the name servers to 192.0.2.25, 198.51.100.25, and 203.0.113.25.
What just happened?
We added three DNS name servers to the Squid configuration file which will be used by Squid to resolve the domain names corresponding to the requests received from the clients.
Setting the hosts file
Squid can read hostname and IP address associations from the hosts file, generally found at /etc/hosts. This file normally contains hostnames for the machines or servers in the local area network. We can specify the location of the hosts file using the directive hosts_file.
If we don't want Squid to read the hosts file, we can set the value to none.
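A minimal sketch of both settings for hosts_file, assuming the hosts file is in its usual location:

```squidconf
# Read hostname-to-address associations from the usual location.
hosts_file /etc/hosts

# Alternatively, to make Squid ignore the hosts file entirely:
# hosts_file none
```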
Default domain name for requests
Using the directive append_domain, we can append a default domain name to hostnames that contain no period (.). This is generally useful for handling local domain names. The value of append_domain must begin with a period (.).
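For instance, assuming the local domain is example.com (a placeholder domain), the directive might be set like this:

```squidconf
# Hostnames without a period get .example.com appended before resolution.
append_domain .example.com
```

With this setting, a request for a bare hostname such as intranet (a hypothetical name) would be resolved as intranet.example.com.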
Timeout for DNS queries
If the DNS servers do not respond to the query within the time specified by the directive dns_timeout, they are assumed to be unavailable. The default timeout value is two minutes. Considering the ever increasing network speeds, we can set this to a slightly lower value. For example, if there is no response within one minute, we can consider the DNS service to be unavailable.
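The one-minute timeout suggested above could be written as:

```squidconf
# Treat the DNS servers as unavailable if there is no answer within one minute.
dns_timeout 1 minute
```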
Caching the DNS responses
The IP addresses of most domains change quite rarely, so it's safe to cache the positive responses from DNS servers for a few hours. This doesn't provide much of a saving in bandwidth, but caching DNS responses may reduce the latency quite significantly because a DNS query is done for every request. For caching DNS responses while using an external DNS program, Squid provides two directives known as positive_dns_ttl and negative_dns_ttl to tune the caching of DNS responses.
The directive positive_dns_ttl determines the maximum time for which a positive DNS response will be cached while negative_dns_ttl determines the time for which a negative DNS response will be cached. The directive negative_dns_ttl also serves as a minimum time for which the positive DNS responses can be cached.
Let's see the example values for both of the directives:
positive_dns_ttl 8 hours
negative_dns_ttl 30 seconds
We should keep the time to live (TTL) for negative responses to a lower value as the negative responses may be due to problems with the DNS servers.
Setting the size of the DNS cache
Squid performs domain name to address lookups for all the MISS requests and address to domain name lookups for requests involving ACLs such as dstdomain. These lookups are cached. To control the size of these cached lookups, Squid exposes four directives—ipcache_size (number), ipcache_low (percent), ipcache_high (percent), and fqdncache_size (number). Let's see what these directives mean.
The directive ipcache_size determines the maximum number of entries that can be cached for domain name to address lookups. As these entries take really small amounts of memory and the amount of available main memory is enormous these days, we can cache tens of thousands of these entries. The default value for this directive is 1024, but we can easily push it to 15,000 on busy caches.
The directives ipcache_low (let's say 95) and ipcache_high (let's say 97) are low and high water marks for the IP cache. So, Squid will try to keep the number of entries in the cache between 95 percent and 97 percent.
Using fqdncache_size, we can simply set the maximum number of address to domain name lookups that can be in the cache at any time. These entries also take really small amounts of memory, so we can cache a large number of these. The default value is 1024, but we can easily push it to 10,000 on busy caches.
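Combining the example values mentioned above, the DNS cache settings for a busy cache might look like the following sketch. The numbers are the illustrative ones from the text, not tuned recommendations.

```squidconf
# Cache up to 15,000 name-to-address lookups.
ipcache_size 15000

# Keep the IP cache between 95 and 97 percent full.
ipcache_low 95
ipcache_high 97

# Cache up to 10,000 address-to-name lookups.
fqdncache_size 10000
```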
Squid logs all the client requests and events to files. Squid provides various directives to control the location of log files, format of log messages, and to choose which requests to log. Let's have a brief look at some of the directives.
In addition to the pre-defined log formats supplied by Squid, we can define our own log formats using the directive logformat. A log format is an arrangement of one or more pre-defined format codes. Squid provides several formats by default, such as squid, common, and combined.
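As a rough sketch, a custom format can be defined with logformat and then referenced by name from access_log. The format name minimal and the log file path are assumptions made for illustration; the format codes used are the timestamp (%ts, %tu), client address (%>a), Squid request status (%Ss), request method (%rm), and request URL (%ru).

```squidconf
# Define a short custom format named "minimal" (hypothetical name).
logformat minimal %ts.%03tu %>a %Ss %rm %ru

# Write the access log using that format (hypothetical path).
access_log /var/log/squid/access.log minimal
```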
Log file rotation or log file backups
Over a period of time, the log files grow in size. The common practice is to move the older logs to separate files as a backup or for analysis, and then continue writing the logs to the original log file. The default Squid behavior is to keep 10 backups of log files. We can change this number with the directive logfile_rotate.
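For example, to keep more backups than the default, we might use something like the following. The value 20 is only an illustration.

```squidconf
# Keep 20 rotated copies of each log file instead of the default 10.
logfile_rotate 20
```

The rotation itself is triggered externally, for example by running squid -k rotate from a scheduled job.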
By default, Squid logs requests from all the clients to the log file set by the directive access_log. If we want to prevent some client requests from being logged by Squid, we can use the log_access directive along with ACLs. An example may be that the CEO doesn't want his requests to be logged:
acl ceo_laptop src 192.0.2.21
log_access deny ceo_laptop
We should note that the requests denied for logging using this directive will not count towards performance measurements.
By default, all the log files are written without buffering any output. Buffering the logs, which is controlled by the directive buffered_logs, improves performance under heavy usage or when debugging is enabled. This directive is rarely used.
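Should buffering be wanted, a minimal sketch of the setting is:

```squidconf
# Buffer log output instead of writing each entry immediately.
buffered_logs on
```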
Strip query terms
Query terms are key-value pairs passed in the URL of an HTTP request. Sometimes, they may contain sensitive or private information about the client requesting the web resource. By default, Squid strips all the query terms from a request URL before logging it. Another reason for stripping query terms is that they are often very long and can make monitoring the access log very painful. However, we may want to disable stripping at times, especially while debugging a problem, for example, when a client is not able to access a website properly.
Setting the directive strip_query_terms to off will prevent query terms from being stripped before requests are logged. It's a good practice to leave this directive set to on to protect clients' privacy.
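While debugging, the temporary setting might look like this:

```squidconf
# Log complete URLs, including query terms. Useful while debugging;
# switch back to "on" afterwards to protect clients' privacy.
strip_query_terms off
```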
URL rewriters and redirectors
URL rewriters and redirectors are third party, independent helper programs that we can use with Squid to modify or rewrite requests from clients. In most cases, we try to redirect a client to a different web page or resource from the one that was initially requested by the client.
The interesting part is that URL rewriters can be coded in any programming language. URL rewriters are run as independent processes and communicate with Squid using standard I/O.
URL rewriters provide a totally new area of opportunity as we can redirect clients to custom error pages for different scenarios, redirect users to local mirrors of websites or software repositories, block advertisements with small blank images, and so on.
Squid doesn't ship with any URL rewriters by default. Because the possibilities are enormous, we are expected to write our own to suit our needs. It is also possible to download URL rewriters written by others and use them right away.
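Once a rewriter is available, it is hooked into Squid through the configuration file. The following is a minimal sketch; the helper path is hypothetical, and the number of child processes is only an illustration.

```squidconf
# Hypothetical path to a URL rewriter helper that talks to Squid over standard I/O.
url_rewrite_program /usr/local/bin/url_rewriter

# Number of rewriter processes to run (illustrative value).
url_rewrite_children 5
```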
Other configuration directives
Squid has hundreds of configuration directives to control it in various ways. It's not possible to discuss all of them here, so we'll try to cover the important ones.
Setting the effective user for running Squid
Although we generally start the Squid server as root, it never runs with the privileges of the root user. Right after starting, Squid changes its real UID (User ID)/GID (Group ID) to the user determined by the directive cache_effective_user. By default, it is set to nobody. We can create a separate user for running Squid and set the value of this directive accordingly. For example, on some operating systems, Squid is run as the user squid, and cache_effective_user is set to that user name.
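For the case described above, where a dedicated user named squid exists on the system, the configuration line would be along these lines:

```squidconf
# Drop root privileges and run as the dedicated "squid" user.
cache_effective_user squid
```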
Please make sure that the user specified as the value for cache_effective_user exists.
Configuring hostnames for the proxy server
Squid uses hostnames for the server when forwarding requests to other cache peers and when detecting neighbor caches. There are two different directives, visible_hostname and unique_hostname, which are used to set the hostname of the proxy server for different purposes. Let's have a quick look at these directives.
Hostname visible to everyone
The directive visible_hostname is used to set the hostname that will be visible on all the error or information pages generated by Squid.
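A minimal example, using proxy.example.com as a placeholder hostname:

```squidconf
# Hostname shown to clients on error and information pages.
visible_hostname proxy.example.com
```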
Unique hostname for the server
If we want to name all the proxy servers in our network as proxy.example.com, we can achieve it by setting visible_hostname for all of them to proxy.example.com. However, doing so will cause problems in forwarding requests among the caches and detecting forward loops. To solve this problem, Squid provides the directive unique_hostname. We should set this to a unique hostname value to get rid of forward loops.
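As a sketch of this arrangement, the first of several proxies sharing one visible name might be configured as follows. The name proxy1.example.com is an assumption for illustration; each additional server would get its own distinct value, such as proxy2.example.com.

```squidconf
# Shared name presented to clients by every proxy in the network.
visible_hostname proxy.example.com

# Name unique to this particular server (hypothetical), used for loop detection.
unique_hostname proxy1.example.com
```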
Controlling the request forwarding
If we have cache peers or neighbors in our network, Squid will try to contact them for HITs or for forwarding requests. We can control the manner in which the requests are forwarded to other caches using the directives always_direct, never_direct, hierarchy_stoplist, prefer_direct, and cache_peer_access. Next we'll have a look at a few of these directives with examples.
Sometimes we may want Squid to fetch the content directly from origin servers instead of forwarding the queries to neighboring caches. This is achieved using the directive always_direct. The syntax is similar to http_access:
always_direct allow|deny [!]ACL_NAME
This directive is very useful in forwarding requests to servers in the local area network directly because contacting cache peers may introduce an unnecessary delay.
acl lan_servers dst 192.0.2.0/24
always_direct allow lan_servers
This code will instruct Squid to send requests for the destination servers identified by lan_servers directly to the origin servers, without routing them through other cache peers.
The directive never_direct is the opposite of always_direct, but we should understand it carefully before using it. If we want to enforce the use of a proxy server for all the client requests, then this directive comes in handy.
never_direct allow all
This rule will enforce the usage of a proxy server for all the requests. However, generally, it's a good practice to allow clients to connect directly to local servers. So, we can use something similar to the following:
acl lan_servers dst 192.0.2.0/24
never_direct deny lan_servers
never_direct allow all
These rules will make sure that requests to all the servers, except those identified by lan_servers, go through another proxy server.
The directive hierarchy_stoplist is a simple directive that prevents the forwarding of client requests to neighbor caches. Let's have a look at the syntax:
hierarchy_stoplist word1 word2 word3 ...
If any of the words from the list of words is found in the request URL, the request will not be forwarded to the neighbor caches and the origin servers will be contacted directly. This directive is generally helpful for handling dynamic pages directly instead of routing them using cache peers.
hierarchy_stoplist cgi-bin jsp ?
This code will prevent the forwarding of URLs containing any of cgi-bin, jsp, or ? to cache peers.
Please note that the directive never_direct overrides hierarchy_stoplist.
Some web servers have broken implementations of the POST method (a method used to send data to the web server in the body of the request) and they expect a CRLF (new-line) pair after the POST request data. Using the broken_posts directive, we can ask Squid to send an extra CRLF pair after the POST request data.
acl bad_server dstdomain broken.example.com
broken_posts allow bad_server
The rules in this code will take care of the broken implementation of the POST method on the host broken.example.com. We should use this directive only if it's absolutely necessary.
TCP outgoing address
The directive tcp_outgoing_address is useful for sending requests out through different network interfaces, depending on the client's network. Let's have a look at the syntax for this directive:
tcp_outgoing_address ip_address [[!]ACL_NAME]
In this line, ip_address is the IP address of the outgoing interface which we want to use. The ACL name is totally optional. An example case may be when we want to route traffic for a specific network using a different network interface:
acl special_network src 192.0.2.0/24
tcp_outgoing_address 198.51.100.25 special_network
tcp_outgoing_address 198.51.100.86
The previous code will set the outgoing address for requests from clients in the network 192.0.2.0/24 to 198.51.100.25, and for all other requests the outgoing address will be set to 198.51.100.86.
Just like several other programs for Unix/Linux, Squid writes the process ID of the current process to a PID file. The directive pid_filename is used to control the location of the PID file.
If we don't want Squid to write its process ID to any file, we can set pid_filename to none instead of a file name.
Setting the path of the PID file to none will prevent regular management operations like automatic log rotation or restarting Squid. The operating system will not be able to stop Squid at the time of a shutdown or restart.
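Both forms of pid_filename are sketched below. The path /var/run/squid.pid is a common choice but is an assumption here; the right location depends on the operating system and how Squid was packaged.

```squidconf
# Write the process ID to a PID file (hypothetical but typical path).
pid_filename /var/run/squid.pid

# Alternatively, to disable the PID file (see the caveats above):
# pid_filename none
```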
By default, Squid logs the complete IP address of the client for every request. To enhance the privacy of our clients, we can use the directive client_netmask to hide the actual IP addresses of the clients. Let's see an example:
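A netmask that hides the last octet of every client address could be written as follows:

```squidconf
# Mask client addresses in logs so that only the first 24 bits are kept.
client_netmask 255.255.255.0
```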
With a netmask of 255.255.255.0, if a client with the IP address 192.0.2.21 accesses our proxy server, the address will be logged as 192.0.2.0 instead of 192.0.2.21 because Squid will set the last 8 bits of the IP address to zero. Basically, a logical AND operation is performed between the binary forms of the netmask and the IP address to be logged. The same masked IP address will also be shown in the cache manager's web interface.
In this article we covered the required DNS configuration for Squid. We learned about specifying DNS servers and optimizing the DNS cache to reduce latency.
About the Author
Kulbir Saini is an entrepreneur based in Hyderabad, India. He has had extensive experience in managing systems and network infrastructure. Apart from his work as a freelance developer, he provides services to a number of startups. Through his blogs, he has been an active contributor of documentation for various open source projects, most notable being The Fedora Project and Squid. Besides computers, which his life practically revolves around, he loves travelling to remote places with his friends. For more details, please check http://saini.co.in/.