How to Configure Squid Proxy Server

by Kulbir Saini | April 2011 | Linux Servers Open Source

In this article by Kulbir Saini, author of Squid Proxy Server 3.1: Beginner's Guide, we are going to learn to configure Squid according to the requirements of a given network. We will learn about the general syntax used for a Squid configuration file.

Specifically, we will cover the following:

  • Quick exposure to Squid
  • Syntax of the configuration file
  • HTTP port, the most important configuration directive
  • Access Control Lists (ACLs)
  • Controlling access to various components of Squid

 


Quick start

Let's have a look at the minimal configuration that you will need to get started. Get ready with the configuration file located at /opt/squid/etc/squid.conf, as we are going to make the changes and additions necessary to quickly set up a minimal proxy server.

cache_dir ufs /opt/squid/var/cache/ 500 16 256
# Replace 192.0.2.21 with your machine's IP address
acl my_machine src 192.0.2.21
http_access allow my_machine

We should add the previous lines at the top of our current configuration file (ensuring that we change the IP address accordingly). Now, we need to create the cache directories. We can do that by using the following command:

$ /opt/squid/sbin/squid -z

We are now ready to run our proxy server, and this can be done by running the following command:

$ /opt/squid/sbin/squid

Squid will start listening on port 3128 (default) on all network interfaces on our machine. Now we can configure our browser to use Squid as an HTTP proxy server with the host as the IP address of our machine and port 3128.

Once the browser is configured, try browsing to http://www.example.com/. That's it! We have configured Squid as an HTTP proxy server! Now try to browse to http://www.example.com:897/ and observe the message you receive. The message shown is an access denied message sent to you by Squid.

Now, let's move on to understanding the configuration file in detail.

Syntax of the configuration file

Squid's configuration file can normally be found at /etc/squid/squid.conf, /usr/local/squid/etc/squid.conf, or ${prefix}/etc/squid.conf where ${prefix} is the value passed to the --prefix option, which is passed to the configure command before compiling Squid.

In the newer versions of Squid, a documented version of squid.conf, known as squid.conf.documented, can be found alongside squid.conf. In this article, we'll cover some of the important directives available in the configuration file. For a detailed description of all the directives used in the configuration file, please check http://www.squid-cache.org/Doc/config/.

The syntax for Squid's documented configuration file is similar to many other programs for Linux/Unix. Generally, there are a few lines of comments containing useful related documentation before every directive used in the configuration file. This makes it easier to understand and configure directives, even for people who are not familiar with configuring applications using configuration files. Normally, we just need to read the comments and use the appropriate options available for a particular directive.

Lines beginning with the character # are treated as comments and are completely ignored by Squid while parsing the configuration file. Blank lines are ignored as well.

# Test comment. This and the above blank line will be ignored by Squid.

Let's see a snippet from the documented configuration file (squid.conf.documented):

# TAG: cache_effective_user
# If you start Squid as root, it will change its effective/real
# UID/GID to the user specified below. The default is to change
# to UID of nobody.
# see also; cache_effective_group
#Default:
# cache_effective_user nobody

In the previous snippet, the first line mentions the name of the directive, which in this case is cache_effective_user. The lines following the tag line provide brief information about the usage of the directive. The last line shows the default value that Squid uses if we don't specify one.
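
As an illustration of overriding this default, the following sketch sets both the effective user and the effective group explicitly. It assumes that an unprivileged account and group named squid already exist on the system; those names are placeholders rather than something this article specifies.

```
# Assumption: an unprivileged "squid" user and group exist on this machine.
cache_effective_user squid
cache_effective_group squid
```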

Types of directives

Now, let's have a brief look at the different types of directives and the values that can be specified.

Single valued directives

These are directives which take only one value. These directives should not be used multiple times in the configuration file because the last occurrence of the directive will override all the previous declarations. For example, logfile_rotate should be specified only once.

logfile_rotate 10
# Few lines containing other configuration directives
logfile_rotate 5

In this case, five logfile rotations will be made when we trigger Squid to rotate logfiles.

Boolean-valued or toggle directives

These are also single valued directives, but these directives are generally used to toggle features on or off.

query_icmp on
log_icp_queries off
url_rewrite_bypass off

We use these directives when we need to change the default behavior.

Multi-valued directives

Directives of this type generally take one or more values. We can either specify all the values on a single line after the directive or we can write them on multiple lines with the directive repeated every time. All the values for a directive are aggregated from the different lines:

hostname_aliases proxy.example.com squid.example.com

Optionally, we can pass them on separate lines as follows:

hostname_aliases proxy.example.com
hostname_aliases squid.example.com

Both of the preceding snippets will instruct Squid to use proxy.example.com and squid.example.com as aliases for the hostname of our proxy server.

Directives with time as a value

There are a few directives which take values with time as the unit. Squid understands the words seconds, minutes, hours, and so on, and these can be suffixed to numerical values to specify actual values. For example:

request_timeout 3 hours
persistent_request_timeout 2 minutes

Directives with file or memory size as values

The values passed to these directives are generally suffixed with file or memory size units like bytes, KB, MB, or GB. For example:

reply_body_max_size 10 MB
cache_mem 512 MB
maximum_object_size_in_memory 8192 KB

As we are familiar with the configuration file syntax now, let's open the squid.conf file and learn about the frequently used directives.

Have a go hero – categorize the directives

Open the documented Squid configuration file and find at least three directives of each type that we discussed before. Don't use the directives already used in the examples.

HTTP port

This directive is used to specify the port where Squid will listen for client connections. The default behavior is to listen on port 3128 on all the available interfaces on a machine.

Time for action – setting the HTTP port

Now, we'll see the various ways to set the HTTP port in the squid.conf file:

  • In its simplest form, we just specify the port on which we want Squid to listen:
    http_port 8080
  • We can also specify the IP address and port combination on which we want Squid to listen. We normally use this approach when we have multiple interfaces on our machine and we want Squid to listen only on the interface connected to the local area network (LAN):
    http_port 192.0.2.25:3128

    This will instruct Squid to listen on port 3128 on the interface with the IP address 192.0.2.25.

  • Another form in which we can specify http_port is by using a hostname and port combination:
    http_port myproxy.example.com:8080

    The hostname will be translated to an IP address by Squid and then Squid will listen on port 8080 on that particular IP address.

  • Another aspect of this directive is that it can take multiple values on separate lines. Let's see what the following lines will do:
    http_port 192.0.2.25:8080
    http_port lan1.example.com:3128
    http_port lan2.example.com:8081

    These lines will trigger Squid to listen on three different IP addresses and port combinations. This is generally helpful when we have clients in different LANs, which are configured to use different ports for the proxy server.

  • In the newer versions of Squid, we may also specify the mode of operation such as intercept, tproxy, accel, and so on.
    Intercept mode will support the interception of requests without needing to configure the client machines.
    http_port 3128 intercept

    tproxy mode is used to enable Linux Transparent Proxy support for spoofing outgoing connections using the client's IP address.

    http_port 8080 tproxy

    We should note that enabling intercept or tproxy mode disables any configured authentication mechanism. Also, IPv6 is supported for tproxy but requires very recent kernel versions. IPv6 is not supported in the intercept mode.

    Accelerator mode is enabled using the mode accel. It's a good idea to listen on port 80 if we are configuring Squid in accelerator mode. This mode can't be used on its own; we must specify at least one website that we want to accelerate.

    http_port 80 accel defaultsite=website.example.com

    We should set the HTTP port carefully as the standard ports like 3128 or 8080 can pose a security risk if we don't secure the port properly. If we don't want to spend time on securing the port, we can use any arbitrary port number above 10000.

What just happened?

In this section, we learned about the usage of one of the most important directives, namely, http_port. We have learned about the various ways in which we can specify the HTTP port, depending on the requirement. We can make Squid listen on multiple interfaces, and on a different port on each interface.

 


Access control lists

Access Control Lists (ACLs) are the base elements for access control and are normally used in combination with other directives such as http_access, icp_access, and so on, to control access to various Squid components and web resources. ACLs identify a web transaction, and then directives such as http_access, cache, and so on decide whether the transaction should be allowed or not. Also, we should note that the directives related to accessing resources generally end with _access.

Every access control list definition must have a name and type, followed by the values for that particular ACL type:

acl ACL_NAME ACL_TYPE value
acl ACL_NAME ACL_TYPE "/path/to/filename"

The values for any ACL name can either be specified directly after ACL_TYPE or Squid can read them from a separate file. Here we should note that the values in the file should be written as one value per line.

Time for action – constructing simple ACLs

Let's construct an access control list for the domain name example.com:

acl example_site dstdomain example.com

In this code, example_site is the name of the ACL and dstdomain is its type, which indicates that the value, example.com, is a destination domain name.

Now if we want to construct an access control list which can cover a lot of example websites, we have the following three possible ways of doing it:

  1. Values on a single line: We can specify all the possible values on a single line:
    acl example_sites dstdomain example.com example.net example.org

    This works fine as long as there are only a few values.

  2. Values on multiple lines: In case the list of values that we want to specify grows significantly, we can split the list and pass values on multiple lines:
    acl example_sites dstdomain example.com example.net
    acl example_sites dstdomain example.org
  3. Values from a file: In case the number of values we want to specify is quite large, we can put them in a dedicated file and then instruct Squid to read the values from that particular file:
    acl example_sites dstdomain "/opt/squid/etc/example_sites.txt"

    We can place the example_sites.txt file in the same directory as squid.conf so that it's easy to locate. The contents of the example_sites.txt file should be as follows:

    # This file can also have comments
    # Write one value (domain name) per line
    example.net
    # example.org (temporarily removed from the example_sites ACL)
    example.com

ACL names are case-insensitive and multi-valued, so we can use the same name multiple times and the values will aggregate:

acl NiCe_NaMe src 192.0.2.21
acl nIcE_nAmE src 192.0.2.23

This code doesn't represent two different access control lists. It's just one ACL with two addresses, namely, 192.0.2.21 and 192.0.2.23, as values.

We should carefully note that one ACL name can't be used with more than one ACL type.

acl destination dstdomain example.com
acl destination dst 192.0.2.24

The above code is invalid as it uses ACL name destination across two different ACL types.
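
One way to make the previous snippet valid is to give each ACL type its own name. The names destination_domain and destination_ip below are only illustrative:

```
# One ACL name per ACL type
acl destination_domain dstdomain example.com
acl destination_ip dst 192.0.2.24
```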

What just happened?

We have just learned to create some simple ACLs of the ACL type dstdomain, which identifies the destination domain in a request.

Have a go hero – understanding the pre-defined ACLs

Jump to the ACL section in the Squid configuration file and try to understand the ACLs that Squid provides by default.

Controlling access to the proxy server

While Squid is running on our server, it can be accessed in several ways, for example, via normal web browsing by end users, or as a parent or sibling proxy server by neighboring proxy servers. Squid provides various directives to control access to different resources. Next, we'll learn about granting or revoking access to different resources.

HTTP access control

ACLs help only in identifying requests based on different rules. ACLs are of no use by themselves; they must be combined with access control directives to allow or deny access to various resources. http_access is one such directive, which is used to grant access to perform HTTP transactions through Squid.

Let's have a look at the syntax of http_access:

http_access allow|deny [!]ACL_NAME

Using http_access, we can either allow or deny access to the HTTP transactions through Squid. The ACL_NAME in the code signifies the requests for which the access must be granted or revoked. If a bang (!) is prefixed to the ACL_NAME, the access will be granted or revoked for all the requests that are not identified by ACL_NAME.
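
As a small sketch of the negated form, the following rule denies every request that does not come from the my_machine address used in the quick start section. It is meant only to illustrate the bang prefix, not to suggest a complete policy:

```
acl my_machine src 192.0.2.21
# Deny all requests that are NOT identified by my_machine
http_access deny !my_machine
```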

Time for action – combining ACLs and HTTP access

Let's have a look at a few cases for controlling HTTP access using example ACLs. When we have multiple access rules, Squid matches a particular request against them from top to bottom and keeps doing so until a definite action (allow or deny) is determined. Please note that if we have multiple ACLs within a single access rule, then a request is matched against all the ACLs from left to right, and Squid stops processing the rule as soon as it encounters an ACL that can't identify the request. An access rule with multiple ACLs results in a definite action, only if the request is identified by all the ACLs used in the rule.

acl my_home_machine src 192.0.2.21
acl my_lab_machine src 198.51.100.86
http_access allow my_home_machine
http_access allow my_lab_machine

The ACLs and access rules in the previous code will allow hosts 192.0.2.21 and 198.51.100.86 to access the proxy server. The aforementioned access rules may also be written as:

acl my_machines src 192.0.2.21 198.51.100.86
http_access allow my_machines
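
The left-to-right matching of multiple ACLs within a single rule can be sketched as follows. The rule below results in allow only for requests that come from my_machines and are also destined for one of the example_sites domains; pairing these two ACLs is our own illustration:

```
acl my_machines src 192.0.2.21 198.51.100.86
acl example_sites dstdomain example.com example.net example.org
# Both ACLs must identify the request for this rule to apply
http_access allow my_machines example_sites
```

A request from 192.0.2.21 for any other domain is not identified by example_sites, so Squid skips this rule and moves on to the next one.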

The default behavior is to allow access to all the clients in a local area network and deny access to all the other clients. If we want clients (who are not in our local area network) to be able to use our proxy server, we must add additional access rules to allow them.

The default behavior of HTTP access control is a bit tricky when a request isn't matched by any of the access rules. In such cases, the default behavior is to do the opposite of the last access rule. If the last access rule is deny, then the action will be to allow access, and vice versa. Therefore, to avoid any confusion or undesired behavior, it's a good practice to add a deny all line after the access rules.

http_access deny all

The parameter all is a special ACL element provided by Squid and it represents all the IP addresses. This line will deny access to everything. As this goes after all other access rules, requests from unknown clients will be denied.
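
Putting the pieces together, a rule set might be ordered as in the following sketch. The blocked_sites ACL is a hypothetical name introduced for this illustration. Because Squid stops at the first rule that matches, the deny rule has to appear before the allow rule to have any effect on my_machines:

```
acl my_machines src 192.0.2.21 198.51.100.86
acl blocked_sites dstdomain example.net

# Checked first: requests for example.net are denied, even from my_machines
http_access deny blocked_sites
# Checked next: everything else from my_machines is allowed
http_access allow my_machines
# Checked last: all remaining requests are denied
http_access deny all
```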

What just happened?

We learned to combine ACLs with the http_access directive to allow or deny access to clients. We also learned how to group different ACLs of the same type and then use them to control access.

HTTP reply access

An HTTP reply is the response received from the web server corresponding to a request initiated by a client. Using the http_reply_access directive, we can control access to the replies received. The syntax of http_reply_access is similar to http_access.

http_reply_access allow|deny [!]ACL_NAME

This directive partially overrides the permissions granted by http_access. Let's see an example:

acl my_machine src 192.0.2.21
http_access allow my_machine
http_reply_access deny my_machine

We have allowed http_access to host 192.0.2.21, but it still will not be able to access websites properly as it's not allowed to receive any replies. The host can make requests to the proxy server for web documents but won't receive any reply.

This directive is normally used to deny access for content types such as audio, video, and so on, to prevent users from accessing media content.
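
A sketch of such a rule, assuming we want to block audio and video replies based on their content type, is shown below. It uses the rep_mime_type ACL type, which matches regular expressions against the MIME type of the reply; the ACL name media_content is our own:

```
# -i makes the regular expressions case-insensitive
acl media_content rep_mime_type -i ^audio/ ^video/
http_reply_access deny media_content
http_reply_access allow all
```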

We should be really careful while using the http_reply_access directive. When a request is allowed by http_access, Squid will contact the origin server, even if a rule with the http_reply_access directive denies the response. This may lead to serious security issues. For example, consider a client receiving a malicious URL, which can submit the client's critical private information using the HTTP POST method. If the client's request passes through the http_access rules but the response is denied by an http_reply_access rule, then the client will be under the impression that nothing happened, but an attacker will already have received the client's private information.

ICP access

This directive is used to control the query access by our neighboring caches using the Internet Cache Protocol (ICP). It basically allows or denies access to the ICP port. The syntax is similar to http_access and the default behavior is to deny all ICP queries.

icp_access allow|deny [!]ACL_NAME
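
For illustration, the following sketch answers ICP queries only from a range of neighboring caches. The neighbor_caches name and the 192.0.2.0/24 range are assumptions made for this example:

```
# Hypothetical network in which our neighboring caches live
acl neighbor_caches src 192.0.2.0/24
icp_access allow neighbor_caches
icp_access deny all
```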

HTCP access

Using this directive, we can control whether Squid will respond to certain HTCP requests or not. The syntax is similar to http_access and the default behavior is to deny all queries.
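
Following the same pattern, a minimal sketch for the htcp_access directive could look like this. The htcp_peers name and address range are hypothetical:

```
# Hypothetical network of caches allowed to send HTCP queries
acl htcp_peers src 192.0.2.0/24
htcp_access allow htcp_peers
htcp_access deny all
```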

HTCP CLR access

Neighboring caches can make requests to purge or remove cache objects in the form of HTCP CLR requests. The htcp_clr_access directive can be used to grant purge access to only trusted cache peers.
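
A sketch of restricting purge requests to a single trusted peer is shown below. The trusted_peer name and the address 192.0.2.10 are assumptions for this example:

```
# Hypothetical address of the only peer allowed to purge our cache objects
acl trusted_peer src 192.0.2.10
htcp_clr_access allow trusted_peer
htcp_clr_access deny all
```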

Miss access

This directive is used to specify which cache peers or clients can use our proxy server as their parent cache. When a cache peer or client tries to fetch content using our proxy server, the request may result in a MISS (not present in our cache) or a HIT (can be satisfied from our cache). Generally, a MISS is fetched by our server on behalf of the client or peer. If we don't want certain clients or peers to fetch missed content through our proxy, then we can use the miss_access directive, as shown:

acl bad_clients src 192.0.2.0/24
miss_access deny bad_clients
miss_access allow all

This code will not allow bad_clients to use our proxy server as a parent proxy. The default behavior is to allow all the clients who pass the http_access rule to use the proxy server as a parent.

Ident lookup access

This directive determines whether or not Squid should perform an ident (username) lookup for a client's TCP connections.

acl ident_aware_hosts src 192.0.2.0/24
ident_lookup_access allow ident_aware_hosts
ident_lookup_access deny all

This code will allow Squid to perform ident lookups only for ident_aware_hosts. The default behavior is not to perform ident lookups for any request.

Only TCP/IP-based ACLs are supported with this directive.

Summary

We have learned a lot in this article about configuring Squid. After this article, we should feel more comfortable in dealing with the Squid configuration file. We learned about various types of directives generally used in the configuration file and the possible types of values that they take.




Squid Proxy Server 3.1: Beginner's Guide Improve the performance of your network using the caching and access control capabilities of Squid
Published: February 2011

About the Author :


Kulbir Saini

Kulbir Saini is an entrepreneur based in Hyderabad, India. He has had extensive experience in managing systems and network infrastructure. Apart from his work as a freelance developer, he provides services to a number of startups. Through his blogs, he has been an active contributor of documentation for various open source projects, most notable being The Fedora Project and Squid. Besides computers, which his life practically revolves around, he loves travelling to remote places with his friends. For more details, please check http://saini.co.in/.
