Squid: History and Overview

Squid is a caching proxy server. It normally sits "between" a web surfer and a web server. The surfer requests a web page from the proxy server. The proxy server either makes the request for the surfer to the web server (a proxy request) or serves the client the page directly if it's already saved in the proxy server's disk cache. Because the proxy server is between the client and the server, a number of options are available, including logging, filtering, and other access control. Squid can do all of these things and more.

Squid was originally based on the Harvest project, which is an ARPA-funded set of tools for building a standards-compliant web crawler. Squid is currently maintained by open source programmers around the world and is licensed under the GNU General Public License. For more information on Harvest, visit http://webharvest.sourceforge.net/ng, and for more information on Squid, visit the Squid home page at http://www.squid-cache.org.

You can choose to install Squid from either source or a binary included with your distribution. As of this writing, the latest recommended version of Squid is 2.5STABLE12. Unlike Apache, Squid has relatively few compile-time options that you are likely to care about, so a binary Squid package from your Linux distribution usually works fine. If, however, you are interested in discovering exactly what compile-time options are available to you, you can read about them in the Squid documentation at http://squid-docs.sourceforge.net/latest/html/x220.html.

If you've installed a Squid package from your Linux distribution, the files are probably laid out in your filesystem like this:

/etc/squid

Squid configuration directory and home to the main configuration file, squid.conf

/usr/lib/squid

Programs that Squid uses to communicate with external authentication sources

/usr/sbin/squid

The Squid binary itself

/usr/share/doc/squid-2.5/

Squid documentation

/usr/share/squid

Locale-specific errors and icons

/var/log/squid

Squid log directory

/var/spool/squid

Home of the Squid disk cache

We will go through some basic options in the squid.conf file. By default, a relatively sane set of options is chosen already. If you just want to use Squid as a basic caching proxy server, with no access control, you need to change only a few defaults. For more advanced configurations, greater tweaking of the configuration file is required.

http_port option

The default port on which Squid listens for connections is TCP port 3128. This can be anything you want, but you must make sure that each of your clients is configured to talk to Squid on this port. For example, in the Firefox web browser, select Edit → Preferences → Connection Settings. Then, assuming your Squid server is named squidserver, you set the proxy options shown in Figure 38-3. These settings will cause your Firefox browser to make all requests through squidserver at port 3128.

Figure 38-3. Setting proxy options in Firefox to use Squid proxy
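If you would rather have Squid listen somewhere other than the default, the change is a single line. The fragment below is only a sketch: port 8080 is an arbitrary choice for illustration, and the commented alternative reuses the internal address from the office example later in this section.

```
# Listen on TCP port 8080 on all interfaces (8080 is an illustrative choice)
http_port 8080

# Alternatively, listen only on one address (here, an internal interface)
# http_port 192.168.1.1:3128
```

Whatever port you pick, the proxy port configured in each client browser has to match it.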

cache_dir option

This option dictates not only where Squid stores its cache, but also what kind of storage system is used, how much disk Squid is allowed to use, and how the directory structure is set up. The format is:

cache_dir storage_type directory-name megabytes L1 L2 [options]

The default storage type is ufs, which is the original Squid storage setup. There are other options available, but they are really necessary only if you have some specific problem with ufs. For most purposes, ufs is sufficient.

The default amount of space used is 100 MB, which you almost certainly want to change. For any reasonably sized disk cache (and considering that hard drives have never been cheaper), you probably want to allocate at least a couple of gigabytes to Squid. Squid won't be able to work effectively as a cache if it doesn't have enough disk space.

L1 sets how many top-level subdirectories are created in the cache_dir directory (the default is 16) and L2 sets how many second-level subdirectories are created under each of them (the default is 256).

So a standard line would look like this:

cache_dir ufs /var/spool/squid 10000 16 256

This sets a ufs cache directory at /var/spool/squid and allocates 10 GB to it.

cache_mem option

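This option lets you designate how much memory Squid uses to hold cached objects in RAM (objects in transit and frequently requested "hot" objects). It is not a cap on the total size of the Squid process, which will always be larger than this value. The more memory you give it, the more requests can be answered without touching the disk. Do not set it so high that the machine as a whole starts swapping to disk, which decreases performance dramatically.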
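As a rough sketch, a cache_mem line takes a number followed by a unit. The 64 MB figure here is purely illustrative and should be sized to the RAM your server can actually spare:

```
# Memory set aside for cached objects (illustrative value)
# The unit is written as a separate word after the number
cache_mem 64 MB
```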

cache_access_log option

This option lets you designate a logfile that will keep a record of every request processed by Squid. The format of the logfile is configurable (you can even make it look like a standard Apache logfile), but it is recommended that you leave the format in the default Squid format. There are a number of good third-party reporting tools that can parse Squid log files and provide you with reports.
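A minimal sketch of the logging lines follows, assuming the directive names used by Squid 2.5 (later releases may name them differently). The log path is the one from the packaged layout shown earlier.

```
# Record every request in the access log
cache_access_log /var/log/squid/access.log

# "off" keeps Squid's native log format, which most reporting tools expect;
# "on" would write Apache-style lines instead
emulate_httpd_log off
```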

acl option

This is probably the most complex part of the Squid setup process. Access control lists allow you to determine not only who gets to use your proxy server (via IP address, domain name, or username and password) but also what sites they get to visit. Once you understand the concept, the implementation is relatively straightforward. The most important thing to remember is that, by default, the squid.conf file is configured with ACL lines that will deny access to everything except localhost. In order to allow anyone else to use your cache, you're going to have to create some ACLs and allow them access.

In our test office environment, let's say that you have a Linux box with two network cards installed. One network card is connected to a DSL router, and the other one to a switch. Also connected to this switch are 10 office computers. You want to set up Squid on this system to cache web requests for the office to save on bandwidth. You also want to log all access, require authentication, and block access to certain web sites. The external IP address of the DSL router is dynamic, and the internal IP address is 192.168.1.1/255.255.255.0.

Here is an example squid.conf file for this setup. This offers some default access control (all intranet users can access the cache) and enables cache access and logging.

http_port 192.168.1.1:3128
cache_mem 128 MB
cache_access_log /var/log/squid/access.log
cache_dir ufs /var/spool/squid 1000 16 256
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl intranet src 192.168.1.0/24
acl SSL_ports port 443 563
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443 563     # https, snews
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access allow intranet
http_access deny all

The three key lines related to ACLs in the file are:

acl intranet src 192.168.1.0/24
http_access allow intranet
http_access deny all

The first ACL line defines an access control list named intranet that includes all systems with a source IP address in the range 192.168.1.0 through 192.168.1.255. This defines all of our internal office machines. The next line applies that ACL to a directive, in this case the http_access directive. This allows any system that matches the intranet ACL to have HTTP access to the cache. Finally, the last line denies access to any system not already explicitly allowed. This is usually good practice when setting up an ACL list, whether it's for a proxy cache, a firewall, or a router. Always have a "default deny" rule at the end, forcing you to explicitly allow anything that you want to provide access to. If your default policy is to allow, it's too easy to make a mistake in your configuration and let more through than you intend.

Now that we have a working configuration, let's add some settings to it. The first thing you might want to do is to restrict access to certain web sites. Whatever your company Internet access policy is, there are probably some web sites that you don't want your employees visiting. This is easy to implement in Squid, using an ACL. Here is an example ACL that defines some potentially undesirable web sites:

acl blocked_sites dstdomain .espn.com espn.go.com .hotmail.com

Now that you've defined the ACL, apply it to the http_access directive:

http_access deny blocked_sites

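Directives are processed in order, and Squid stops at the first http_access line that matches a request. That has two consequences here. The deny blocked_sites line must appear before http_access allow intranet; otherwise, office machines match the allow line first and the block is never reached. And your http_access deny all directive must be last; otherwise, it matches every request and any allow directives that follow it are never evaluated.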
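As a sketch of that ordering, the end of the access rules in the example configuration might look like this, with the new deny line ahead of the allow line for intranet so that it is reached first:

```
acl intranet src 192.168.1.0/24
acl blocked_sites dstdomain .espn.com espn.go.com .hotmail.com

http_access allow localhost
# Must come before "allow intranet": the first matching line wins
http_access deny blocked_sites
http_access allow intranet
# Default deny stays last
http_access deny all
```

If the deny line were placed after http_access allow intranet, a request from an office machine would be allowed by the earlier line and the block would have no effect.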

After you've added these two lines to your squid.conf file, restart Squid and attempt to access www.espn.com. You should see a screen like Figure 38-4.

Figure 38-4. Access denied by Squid on host basis

Also, you should see a line like this in your Squid access.log:

1136490718.221    870 192.168.1.33 TCP_DENIED/403 1419 GET http://www.espn.com/ - NONE/- text/html

You can add as many domains as you wish to an ACL like this. You can also filter on IP address and strings in URLs.
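The fragment below sketches what those two other kinds of filter might look like, using the dst and url_regex ACL types. The ACL names, the address, and the pattern are all hypothetical and chosen only for illustration; 192.0.2.10 comes from an address range reserved for documentation.

```
# Block a destination by IP address (hypothetical address)
acl blocked_ips dst 192.0.2.10/32

# Block any URL containing a given string, case-insensitively (hypothetical pattern)
acl blocked_words url_regex -i casino

http_access deny blocked_ips
http_access deny blocked_words
```

As with blocked_sites, these deny lines only take effect if they appear before the http_access allow line that would otherwise match the same clients.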
