Credit: Mark Nenadov
You need to monitor how often client requests are refused by your Apache web server because the client’s cache of the page is up to date.
When a browser queries a server for a page that the browser has in its cache, the browser lets the server know about the cached data, and the server returns a special status code (rather than serving the page again) if the client’s cache is up to date. Here’s how to find the statistics for such occurrences in your server’s logs:
def ClientCachePercentage(logfile_pathname):
    Contents = open(logfile_pathname, "r").xreadlines( )
    TotalRequests = 0
    CachedRequests = 0
    for line in Contents:
        TotalRequests += 1
        if line.split(" ")[8] == "304":  # if server returned "not modified"
            CachedRequests += 1
    return (100*CachedRequests)/TotalRequests
The percentage of requests to your Apache server that are met by the client’s own cache is an important factor in the perceived performance of your server. The code in this recipe helps you get this information from the server’s log. Typical use would be:
log_path = "/usr/local/nusphere/apache/logs/access_log"
print "Percentage of requests that are client-cached: " + str(
    ClientCachePercentage(log_path)) + "%"
The recipe reads the log file via the special method xreadlines, introduced in Python 2.1, rather than via the more normal readlines. readlines must read the whole file into memory, since it returns a list of all lines, making it unsuitable for very large files, which server log files can certainly be. Trying to read a whole log file into memory at once might fail (or run too slowly due to virtual-memory thrashing effects). xreadlines returns a special object, meant to be used only in a for statement (somewhat like an iterator in Python 2.2; Python 2.1 did not have a formal concept of iterators), which can save a lot of memory. In Python 2.2, it would be simplest to iterate on the file object directly, with a for statement such as:
for line in open(logfile_pathname):
This is the simplest and fastest approach, but it does require Python 2.2 or later to work.
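For readers on a current Python 3 interpreter, where xreadlines no longer exists, the direct-iteration approach might look roughly like the sketch below. It is not the recipe's original code: the name client_cache_percentage, the skipping of lines too short to hold a ninth field, and the 0 returned for an empty log are all assumptions made for illustration.

```python
def client_cache_percentage(logfile_pathname):
    """Return the integer percentage of logged requests answered with 304."""
    total_requests = 0
    cached_requests = 0
    # Iterating on the file object yields one line at a time,
    # so the whole log never has to fit in memory.
    with open(logfile_pathname, "r") as logfile:
        for line in logfile:
            fields = line.split(" ")
            if len(fields) <= 8:
                # Assumption: ignore lines with no ninth field
                # instead of raising IndexError.
                continue
            total_requests += 1
            if fields[8] == "304":  # server returned "not modified"
                cached_requests += 1
    if total_requests == 0:
        # Assumption: report 0 for an empty log rather than dividing by zero.
        return 0
    # Multiply before dividing so the truncating division keeps the percentage.
    return (100 * cached_requests) // total_requests
```

The with statement closes the file when the loop ends, which the original one-liner left to the garbage collector.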
The body of the for loop calls the split method on each line string, with a string of a single space as the argument, to split the line into a list of its space-separated fields. Then it uses indexing ([8]) to get the ninth such field. Apache puts the status code into the ninth field of each line in the log. Code "304" means “not modified” (i.e., the client’s cache was already correctly updated). We count those cases in the CachedRequests variable and all lines in the log in the TotalRequests variable, so that, in the end, we can return the percentage of cache hits. Note that in the expression used with the return statement, it’s important to multiply by 100 before we divide, since up to Python 2.1 (and even in 2.2, by default), division between integers truncates (i.e., ignores the remainder). If we divided first, that would truncate to 0; so multiplying by 100 would still give 0, which is not a very useful result!
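The field indexing and the order of operations can be checked on a small example. The sketch below is written for Python 3, so it uses the // operator to reproduce the truncating integer division described above; the sample log line and the request counts are made up for illustration and do not come from a real server.

```python
# Hypothetical access-log line in Common Log Format (illustrative only).
sample_line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
               '"GET /index.html HTTP/1.0" 304 -')

fields = sample_line.split(" ")   # split returns a list of strings
status = fields[8]                # the ninth space-separated field

# Made-up counts: 3 "not modified" responses out of 8 requests.
cached_requests, total_requests = 3, 8

# Multiplying first keeps the percentage; dividing first truncates to 0.
multiply_first = (100 * cached_requests) // total_requests
divide_first = (cached_requests // total_requests) * 100

print(status, multiply_first, divide_first)   # prints: 304 37 0
```

Note that the timestamp itself contains a space, which is why the status code lands at index 8 rather than at index 6.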
The Apache web server is available and documented at http://httpd.apache.org.