Squid Proxy Server 3.1: Beginner's Guide

Product type Book
Published in Feb 2011
Publisher Packt
ISBN-13 9781849513906
Pages 332 pages
Edition 1st Edition

Table of Contents (20) Chapters

Squid Proxy Server 3.1 Beginner's Guide
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Getting Started with Squid 2. Configuring Squid 3. Running Squid 4. Getting Started with Squid's Powerful ACLs and Access Rules 5. Understanding Log Files and Log Formats 6. Managing Squid and Monitoring Traffic 7. Protecting your Squid Proxy Server with Authentication 8. Building a Hierarchy of Squid Caches 9. Squid in Reverse Proxy Mode 10. Squid in Intercept Mode 11. Writing URL Redirectors and Rewriters 12. Troubleshooting Squid Pop Quiz Answers Index

Chapter 11. Writing URL Redirectors and Rewriters

In the previous chapters, we learned how to install and configure the Squid proxy server for various scenarios. In this chapter, we'll learn how to write our own URL redirectors and rewriters to customize Squid's behavior. We'll also see a few examples that can help improve Squid's caching performance or enforce access control.

In this chapter, we shall learn about:

  • URL redirectors and rewriters

  • Writing our own URL helper

  • Configuring Squid

  • A special URL redirector – deny_info

  • Popular URL helpers

So let's get started.

URL redirectors and rewriters


URL redirectors are external helper processes that redirect HTTP clients to alternate URLs using HTTP redirect messages. URL rewriters are also external helper processes, but they replace the URL requested by the client with another URL. When a URL is rewritten by a helper process, Squid fetches the rewritten URL transparently and sends the response to the client as if it were the one the client originally requested.

URL redirectors can be used to send HTTP redirect messages such as 301, 302, 303, or 307 (or another 3xx code), along with an alternate URL, to HTTP clients. When an HTTP client receives a redirect message, it requests the new URL. So, the major difference between URL redirectors and URL rewriters is that the client is aware of a URL redirect, while rewritten URLs are fetched transparently by Squid and the client remains unaware of the rewrite. Let's try to understand the workings of URL redirector and rewriter helper...

Squid, URL redirectors, and rewriters


Squid and URL redirector (or rewriter) programs work closely together. Every request is passed through the specified URL redirector (or rewriter) program, and Squid then acts on the reply: it either redirects the HTTP client to the new URL or fetches the rewritten URL itself. Let's have a look at a few details about Squid and URL redirectors.

Communication interface

URL redirectors and rewriters communicate with Squid using the same simple interface, which is easy to understand and to implement. For each request, the following details are passed to the helper program on a single line:

ID URL client_IP/FQDN username method myip=IP myport=PORT [kv-pairs] 
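As a rough illustration of the line format above, the following Python sketch splits one request line into named fields. It assumes the ID field is present at the start of the line. The function name parse_request_line, the dictionary keys, and the sample addresses are assumptions made for this example and are not part of Squid.

```python
def parse_request_line(line):
    """Split one helper input line into named fields (illustrative only)."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError("expected at least 5 fields, got %d" % len(parts))
    request_id, url, client, username, method = parts[:5]
    # The third field holds the client IP and FQDN joined by a slash.
    client_ip, _, fqdn = client.partition("/")
    # Remaining fields such as myip=IP and myport=PORT are key=value pairs.
    extras = dict(p.split("=", 1) for p in parts[5:] if "=" in p)
    return {
        "id": request_id,
        "url": url,
        "client_ip": client_ip,
        "fqdn": fqdn,
        "username": username,
        "method": method,
        "extras": extras,
    }


# Hypothetical sample line using documentation-only addresses.
sample = "0 http://www.example.com/ 192.0.2.10/- - GET myip=192.0.2.1 myport=3128\n"
request = parse_request_line(sample)
print(request["url"], request["method"], request["extras"])
```

Returning a dictionary keeps the decision logic of a helper readable, because later code can refer to request["url"] instead of a positional index.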

The following table gives a brief explanation of the fields passed by Squid to the redirectors:

Time for action – exploring the message flow between Squid and redirectors


Let's try to understand the message flow between Squid and the redirector (or rewriter) programs.

  1. A line containing the fields shown previously (separated by spaces) is passed by Squid to the URL redirector program using a single line for each client request.

  2. The line containing the fields is read by the URL redirector program from the standard input.

  3. After reading the line from the standard input, the redirector (or rewriter) program can process the fields and make decisions based on the values of different fields.

  4. Once the helper program has finished processing the fields, it must write one of the following messages on the standard output. Please note that the new line (\n) at the end of the message is important and must not be omitted:

    • A line containing only the identifier (ID \n).

    • A modified URL with an HTTP redirect code followed by a new line (ID 3XX:URL \n). The HTTP redirect code and the URL should be separated...
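The reply formats listed above could be produced by a small function like the one below. This is a minimal sketch, not Squid's own code: build_reply and its parameters are hypothetical names, it assumes the redirect code and URL are joined by a colon as the ID 3XX:URL form suggests, and the plain "ID URL" rewrite form is an assumption based on the rewrite behaviour described earlier in the chapter.

```python
def build_reply(request_id, new_url=None, redirect_code=None):
    """Format one reply line for Squid (hypothetical helper function)."""
    if new_url is None:
        # Only the identifier: the helper has nothing to change.
        return "%s\n" % request_id
    if redirect_code is None:
        # Assumed rewrite form: identifier followed by the new URL.
        return "%s %s\n" % (request_id, new_url)
    if not 300 <= int(redirect_code) <= 399:
        raise ValueError("redirect code must be a 3xx status")
    # Redirect form: identifier, then code and URL joined by a colon.
    return "%s %s:%s\n" % (request_id, redirect_code, new_url)


print(repr(build_reply("7")))
print(repr(build_reply("7", "http://www.example.net/", 302)))
```

Every branch ends the reply with a new line, which matches the requirement that the trailing \n must not be omitted.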

Time for action – writing a simple URL redirector program


Let's see a very simple Perl script that can act as a URL redirector program.

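#!/usr/bin/perl
# Disable output buffering so Squid receives each reply immediately.
$| = 1;
# Read one request line at a time from the standard input.
while (<>) {
    # Prefix matching URLs with a 303 redirect to www.example.net.
    s@http://www\.example\.com@303:http://www.example.net@;
    print;
}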

The previous code is a URL redirector program in its simplest form. It redirects every request whose URL contains http://www.example.com to www.example.net using a 303 redirect, without inspecting any of the other fields passed by Squid.
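For readers more comfortable with Python, a roughly equivalent sketch of the same idea follows. It is an illustration rather than a drop-in replacement: the names OLD_PREFIX, NEW_PREFIX, redirect_line, and serve are chosen for this example, and like the Perl version it edits the line as plain text without looking at the other fields.

```python
OLD_PREFIX = "http://www.example.com"
NEW_PREFIX = "303:http://www.example.net"


def redirect_line(line):
    """Replace the first occurrence of OLD_PREFIX, leaving the rest intact."""
    return line.replace(OLD_PREFIX, NEW_PREFIX, 1)


def serve(instream, outstream):
    """Answer each input line immediately, one reply per request."""
    for line in iter(instream.readline, ""):
        outstream.write(redirect_line(line))
        # Flush after every reply so Squid is not left waiting on a buffer.
        outstream.flush()
```

To run this as a helper, the script would end with a call such as serve(sys.stdin, sys.stdout) after importing sys. The explicit flush plays the same role as $|=1 in the Perl script.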

What just happened?

We have just seen a simple Perl script that can act as a URL redirector program and can be used with Squid.

Have a go hero – modify the redirector program

Modify the previous URL redirector program so that all requests to google.co.uk are redirected to google.com.

Concurrency

We can make our URL redirector programs concurrent for better performance. When we configure Squid to use a concurrent URL redirector program, it passes an additional field, ID, on the standard input to the redirector program. This is used to achieve concurrency as we learned...

Writing our own URL redirector program


Based on the concepts we learned earlier about the URL redirector helper programs, we can write a program that can redirect/rewrite URLs conditionally. So, let's have a look at an example:

Time for action – writing our own template for a URL redirector


Now, let's have a look at an example URL redirector program in Python, which can be extended to fit any scenario:

#!/usr/bin/env python

import sys

def redirect_url(line, concurrent):
  # Split the input line into whitespace-separated fields
  fields = line.split()
  # 1st or 2nd element of the list 
  # is the URL depending on concurrency
  if concurrent:
    old_url = fields[1]
  else:
    old_url = fields[0]

  # Do remember that the new_url 
  # should contain a '\n' at the end.
  new_url = '\n'
  # Take the decision and modify the url if needed
  if old_url.endswith('.avi'):
    # Rewrite example
    new_url = 'http://example.com/' + new_url
  elif old_url.endswith('.exe'):
    # Redirect example
    new_url = '302:http://google.co.in/' + new_url
  return new_url

def main(concurrent = True):
  # the format of the line read from stdin with concurrency is
  # ID URL ip-address/fqdn ident method myip=ip myport=port
  # and with concurrency disabled is
  # URL ip-address/fqdn ident...

Configuring Squid


Once we have finished writing the redirector program, we need to configure Squid to use it. A few directives in the Squid configuration file control how Squid uses our URL redirector program. Let's have a quick look at these directives.

Specifying the URL redirector program

We can specify the absolute path to our URL redirector program using the url_rewrite_program directive. We can also specify any additional interpreter or command line arguments that the program expects. The following are a few examples:

url_rewrite_program /opt/squid/libexec/custom_rewriter

url_rewrite_program /usr/bin/python /opt/squid/libexec/my_rewriter.py

url_rewrite_program /usr/bin/python /opt/squid/libexec/another_rewriter.py --concurrent

Note

Squid can use only one URL redirector program at a time, so we should specify only one program using the url_rewrite_program directive.
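Arguments such as --concurrent in the third example are passed by Squid to the helper unchanged, so the helper itself has to interpret them. A minimal sketch of how a Python helper might read such a flag is shown below. The flag name is taken from the example above, while parse_args and the help text are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the helper's own command line (hypothetical option set)."""
    parser = argparse.ArgumentParser(description="Example URL rewriter helper")
    parser.add_argument(
        "--concurrent",
        action="store_true",
        help="expect an ID as the first field of every input line",
    )
    return parser.parse_args(argv)


# Simulate the command line from the third url_rewrite_program example.
args = parse_args(["--concurrent"])
print(args.concurrent)
```

Passing argv explicitly makes the function easy to exercise without starting Squid. In a real helper, calling parse_args() with no argument reads the actual command line.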

Controlling redirector children

Once we have specified the redirector program, we...

A special URL redirector – deny_info


The deny_info option is a directive in the Squid configuration file, which can be used to:

  • Present clients with a custom access denied page.

  • Redirect (HTTP 302) the clients to a different URL, displaying more information about why access was denied or containing help messages.

  • Reset the TCP connection.

Let's have a look at the three syntaxes of the deny_info directive:

deny_info CUSTOM_ERROR_PAGE ACL_NAME
deny_info ALTERNATE_URL ACL_NAME
deny_info TCP_RESET ACL_NAME

The syntaxes shown previously correspond to the uses we have just discussed. In the first syntax, the parameter CUSTOM_ERROR_PAGE specifies a custom error page written in HTML or plain text, which will be displayed instead of Squid's default access denied page. An error page written in English should be placed in the ${prefix}/share/errors/en-us/ directory, or in the appropriate directory for other languages. We can also place this error file in a custom location such as /etc/squid/local-errors...

Popular URL redirectors


So far, we have learned about how URL redirector programs communicate with Squid and how we can write our own URL redirector programs. Now, let's have a look at a few popular URL redirectors. For a full list of available redirector programs, please visit http://www.squid-cache.org/Misc/related-software.html.

SquidGuard

SquidGuard is a combined filter, URL rewriter, and access control plugin for Squid. Its main features are that it is fast, free, flexible, and easy to install. Below are a few use cases of SquidGuard:

  • Limiting access for some users to a list of well known web servers or URLs

  • Blocking access for some users based on blacklists

  • Redirecting blocked URLs to pages containing helpful information

  • Redirecting unregistered users to registration pages

  • And much more...

For more details on SquidGuard, please see http://www.squidguard.org/.

Squirm

Squirm is a fast and configurable URL rewriter for Squid. Please check http://squirm.foote...

Summary


In this chapter, we have learned about URL redirector and rewriter programs, which are very helpful in extending the basic Squid functionality. We have also learned about the deny_info directive, which is a good fit for sending users to clearer, more understandable error pages. We also learned how Squid communicates with URL helpers.

Specifically, we covered:

  • URL redirectors and their use

  • How Squid communicates with the URL redirector programs

  • Writing our own URL redirector program

  • Configuring Squid to use our URL redirector program

  • A few popular URL redirectors that are helpful in saving bandwidth and providing better access control

Now that we have learned about most of the components of Squid, we need to learn about troubleshooting in case a component doesn't behave appropriately, and that is the topic of our next chapter.


Field    Description

ID       The ID is used for identifying each request that Squid passes on as the standard input to the redirector program. The redirector program is supposed to pass the ID back to Squid so that it...