Read a Httpd Log File in Python

Log Files

In order to effectively manage a web server, it is necessary to get feedback about the activity and performance of the server as well as any problems that may be occurring. The Apache HTTP Server provides very comprehensive and flexible logging capabilities. This document describes how to configure its logging capabilities, and how to understand what the logs contain.

Overview

The Apache HTTP Server provides a variety of different mechanisms for logging everything that happens on your server, from the initial request, through the URL mapping process, to the final resolution of the connection, including any errors that may have occurred in the process. In addition to this, third-party modules may provide logging capabilities, or inject entries into the existing log files, and applications such as CGI programs, or PHP scripts, or other handlers, may send messages to the server error log.

In this document we discuss the logging modules that are a standard part of the http server.

Security Warning

Anyone who can write to the directory where Apache httpd is writing a log file can almost certainly gain access to the uid that the server is started as, which is usually root. Do NOT give people write access to the directory the logs are stored in without being aware of the consequences; see the security tips document for details.

In addition, log files may contain information supplied directly by the client, without escaping. Therefore, it is possible for malicious clients to insert control-characters in the log files, so care must be taken in dealing with raw logs.
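
One way to take that care when reading raw logs in Python is to make control characters visible before printing a line or passing it on. The following is a minimal sketch; the function name escape_control_chars is hypothetical, and the choice to escape every character below 0x20 plus DEL is an assumption, not something the server prescribes.

```python
def escape_control_chars(line: str) -> str:
    """Return the line with ASCII control characters shown as \\xNN escapes."""
    out = []
    for ch in line:
        code = ord(ch)
        # Assumption: treat everything below 0x20, plus DEL (0x7f), as unsafe.
        if code < 0x20 or code == 0x7F:
            out.append(f"\\x{code:02x}")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A request line carrying a terminal escape sequence supplied by a client.
    raw = "GET /index.html\x1b[2J HTTP/1.0"
    print(escape_control_chars(raw))
```

Strip the trailing newline from each line before escaping it, otherwise the newline itself is rendered as an escape.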

Error Log

The server error log, whose name and location is set by the ErrorLog directive, is the most important log file. This is the place where Apache httpd will send diagnostic information and record any errors that it encounters in processing requests. It is the first place to look when a problem occurs with starting the server or with the operation of the server, since it will often contain details of what went wrong and how to fix it.

The error log is usually written to a file (typically error_log on Unix systems and error.log on Windows and OS/2). On Unix systems it is also possible to have the server send errors to syslog or pipe them to a program.

The format of the error log is defined by the ErrorLogFormat directive, with which you can customize what values are logged. A default format is defined if you don't specify one. A typical log message follows:

[Fri Sep 09 10:42:29.902022 2011] [core:error] [pid 35708:tid 4328636416] [client 72.15.99.187] File does not exist: /usr/local/apache2/htdocs/favicon.ico

The first item in the log entry is the date and time of the message. The next is the module producing the message (core, in this case) and the severity level of that message. This is followed by the process ID and, if appropriate, the thread ID, of the process that experienced the condition. Next, we have the client address that made the request. And finally is the detailed error message, which in this case indicates a request for a file that did not exist.
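
The fields described above can be pulled apart in Python with a regular expression. This is a sketch that assumes the default error log layout shown in the sample message; a custom ErrorLogFormat would need a different pattern. The names ERROR_LINE and parse_error_line are hypothetical.

```python
import re
from datetime import datetime

# Pattern for the default layout only: [time] [module:level] [pid N:tid N] [client A] message
ERROR_LINE = re.compile(
    r"\[(?P<time>[^\]]+)\] "
    r"\[(?P<module>[^:\]]*):(?P<level>[^\]]+)\] "
    r"\[pid (?P<pid>\d+)(?::tid (?P<tid>\d+))?\] "
    r"(?:\[client (?P<client>[^\]]+)\] )?"
    r"(?P<message>.*)"
)


def parse_error_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = ERROR_LINE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["pid"] = int(entry["pid"])
    try:
        # Month and weekday names are parsed according to the current locale.
        entry["time"] = datetime.strptime(entry["time"], "%a %b %d %H:%M:%S.%f %Y")
    except ValueError:
        pass  # leave the timestamp as text if it uses another format
    return entry


if __name__ == "__main__":
    sample = (
        "[Fri Sep 09 10:42:29.902022 2011] [core:error] "
        "[pid 35708:tid 4328636416] [client 72.15.99.187] "
        "File does not exist: /usr/local/apache2/htdocs/favicon.ico"
    )
    print(parse_error_line(sample))
```

The thread ID and client address are optional in the pattern because not every message carries them.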

A very wide variety of different messages can appear in the error log. Most look similar to the example above. The error log will also contain debugging output from CGI scripts. Any information written to stderr by a CGI script will be copied directly to the error log.

Putting a %L token in both the error log and the access log will produce a log entry ID with which you can correlate the entry in the error log with the entry in the access log. If mod_unique_id is loaded, its unique request ID will be used as the log entry ID, too.

During testing, it is often useful to continuously monitor the error log for any problems. On Unix systems, you can accomplish this using:

tail -f error_log
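
A rough Python counterpart to the command above is a generator that polls the file for new lines. This is a sketch under stated assumptions, not a replacement for tail: the function name follow and its parameters are hypothetical, it does not notice when the log is rotated, and a line that is still being written may be returned in two pieces.

```python
import os
import time


def follow(path, poll_interval=0.5, max_idle_polls=None, from_end=True):
    """Yield lines appended to path; stop after max_idle_polls empty reads."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        if from_end:
            f.seek(0, os.SEEK_END)  # skip what is already in the file
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            line = f.readline()
            if line:
                idle = 0
                yield line.rstrip("\n")
            else:
                idle += 1
                time.sleep(poll_interval)
```

With the defaults, a loop such as "for line in follow('error_log'): print(line)" runs until interrupted. The max_idle_polls argument exists only so that scripts and tests can stop after a quiet period.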

Per-module logging

The LogLevel directive allows you to specify a log severity level on a per-module basis. In this way, if you are troubleshooting a problem with just one particular module, you can turn up its logging volume without also getting the details of other modules that you're not interested in. This is particularly useful for modules such as mod_proxy or mod_rewrite where you want to know details about what it's trying to do.

Do this by specifying the name of the module in your LogLevel directive:

LogLevel info rewrite:trace5

This sets the main LogLevel to info, but turns it up to trace5 for mod_rewrite.

This replaces the per-module logging directives, such as RewriteLog, that were present in earlier versions of the server.

Access Log

The server access log records all requests processed by the server. The location and content of the access log are controlled by the CustomLog directive. The LogFormat directive can be used to simplify the selection of the contents of the logs. This section describes how to configure the server to record information in the access log.

Storing the information in the access log is only the start of log management. The next step is to analyze this information to produce useful statistics. Log analysis in general is beyond the scope of this document, and not really part of the job of the web server itself.

Various versions of Apache httpd have used other modules and directives to control access logging, including mod_log_referer, mod_log_agent, and the TransferLog directive. The CustomLog directive now subsumes the functionality of all the older directives.

The format of the access log is highly configurable. The format is specified using a format string that looks much like a C-style printf(1) format string. Some examples are presented in the next sections. For a complete list of the possible contents of the format string, see the mod_log_config format strings.

Common Log Format

A typical configuration for the access log might look as follows.

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common

This defines the nickname common and associates it with a particular log format string. The format string consists of percent directives, each of which tells the server to log a particular piece of information. Literal characters may also be placed in the format string and will be copied directly into the log output. The quote character (") must be escaped by placing a backslash before it to prevent it from being interpreted as the end of the format string. The format string may also contain the special control characters "\n" for new-line and "\t" for tab.

The CustomLog directive sets up a new log file using the defined nickname. The filename for the access log is relative to the ServerRoot unless it begins with a slash.

The above configuration will write log entries in a format known as the Common Log Format (CLF). This standard format can be produced by many different web servers and read by many log analysis programs. The log file entries produced in CLF will look something like this:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Each part of this log entry is described below.

127.0.0.1 (%h)
This is the IP address of the client (remote host) which made the request to the server. If HostnameLookups is set to On, then the server will try to determine the hostname and log it in place of the IP address. However, this configuration is not recommended since it can significantly slow the server. Instead, it is best to use a log post-processor such as logresolve to determine the hostnames. The IP address reported here is not necessarily the address of the machine at which the user is sitting. If a proxy server exists between the user and the server, this address will be the address of the proxy, rather than the originating machine.
- (%l)
The "hyphen" in the output indicates that the requested piece of information is not available. In this case, the information that is not available is the RFC 1413 identity of the client determined by identd on the client's machine. This information is highly unreliable and should almost never be used except on tightly controlled internal networks. Apache httpd will not even attempt to determine this information unless IdentityCheck is set to On.
frank (%u)
This is the userid of the person requesting the document as determined by HTTP authentication. The same value is typically provided to CGI scripts in the REMOTE_USER environment variable. If the status code for the request (see below) is 401, then this value should not be trusted because the user is not yet authenticated. If the document is not password protected, this part will be "-" just like the previous one.
[10/Oct/2000:13:55:36 -0700] (%t)
The time that the request was received. The format is:

[day/month/year:hour:minute:second zone]
day = 2*digit
month = 3*letter
year = 4*digit
hour = 2*digit
minute = 2*digit
second = 2*digit
zone = (`+' | `-') 4*digit

It is possible to have the time displayed in another format by specifying %{format}t in the log format string, where format is either as in strftime(3) from the C standard library, or one of the supported special tokens. For details see the mod_log_config format strings.

"GET /apache_pb.gif HTTP/1.0" (\"%r\")
The request line from the client is given in double quotes. The request line contains a great deal of useful information. First, the method used by the client is GET. Second, the client requested the resource /apache_pb.gif, and third, the client used the protocol HTTP/1.0. It is also possible to log one or more parts of the request line independently. For example, the format string "%m %U%q %H" will log the method, path, query-string, and protocol, resulting in exactly the same output as "%r".
200 (%>s)
This is the status code that the server sends back to the client. This information is very valuable, because it reveals whether the request resulted in a successful response (codes beginning in 2), a redirection (codes beginning in 3), an error caused by the client (codes beginning in 4), or an error in the server (codes beginning in 5). The full list of possible status codes can be found in the HTTP specification (RFC2616 section 10).
2326 (%b)
The last part indicates the size of the object returned to the client, not including the response headers. If no content was returned to the client, this value will be "-". To log "0" for no content, use %B instead.
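
The field-by-field description above maps onto a regular expression with one named group per directive. The following Python sketch assumes the log was written with exactly the common format string shown earlier; the names CLF_LINE and parse_clf_line are hypothetical.

```python
import re
from datetime import datetime

# One named group per directive in "%h %l %u %t \"%r\" %>s %b".
CLF_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '  # tolerates backslash-escaped quotes
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict for one Common Log Format line, or None on mismatch."""
    m = CLF_LINE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # day/month/year:hour:minute:second zone, e.g. 10/Oct/2000:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # "-" means no content was returned; store it as 0 bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    for key in ("ident", "user"):
        if entry[key] == "-":
            entry[key] = None  # "-" means the information is not available
    return entry


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(parse_clf_line(sample))
```

Mapping "-" to None or 0 is a design choice for this sketch; keep the raw strings instead if you need to tell "no content" apart from a zero-length body.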

Combined Log Format

Another commonly used format string is called the Combined Log Format. It can be used as follows.

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
CustomLog log/access_log combined

This format is exactly the same as the Common Log Format, with the addition of two more fields. Each of the additional fields uses the percent-directive %{header}i, where header can be any HTTP request header. The access log under this format will look like:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

The additional fields are:

"http://www.example.com/start.html" (\"%{Referer}i\")
The "Referer" (sic) HTTP request header. This gives the site that the client reports having been referred from. (This should be the page that links to or includes /apache_pb.gif).
"Mozilla/4.08 [en] (Win98; I ;Nav)" (\"%{User-agent}i\")
The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.
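
With the two extra quoted fields in place, a short Python script can tally referers or user agents across a log. This is a sketch that assumes the combined format string shown above; COMBINED_LINE and count_field are hypothetical names.

```python
import re
from collections import Counter

# The Common Log Format fields followed by the quoted Referer and User-agent.
COMBINED_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<agent>(?:[^"\\]|\\.)*)"'
)


def count_field(lines, field):
    """Count how often each value of the named field occurs; skip bad lines."""
    counts = Counter()
    for line in lines:
        m = COMBINED_LINE.match(line)
        if m:
            counts[m.group(field)] += 1
    return counts


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
              '"http://www.example.com/start.html" '
              '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
    print(count_field([sample], "agent").most_common(5))
```

Because count_field accepts any iterable of lines, an open file object can be passed directly, so a large log is read one line at a time.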

Multiple Access Logs

Multiple access logs can be created simply by specifying multiple CustomLog directives in the configuration file. For example, the following directives will create three access logs. The first contains the basic CLF information, while the second and third contain referer and browser information. The last two CustomLog lines show how to mimic the effects of the ReferLog and AgentLog directives.

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common
CustomLog logs/referer_log "%{Referer}i -> %U"
CustomLog logs/agent_log "%{User-agent}i"

This example also shows that it is not necessary to define a nickname with the LogFormat directive. Instead, the log format can be specified directly in the CustomLog directive.

Conditional Logs

There are times when it is convenient to exclude certain entries from the access logs based on characteristics of the client request. This is easily accomplished with the help of environment variables. First, an environment variable must be set to indicate that the request meets certain conditions. This is usually accomplished with SetEnvIf. Then the env= clause of the CustomLog directive is used to include or exclude requests where the environment variable is set. Some examples:

# Mark requests from the loop-back interface
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
# Mark requests for the robots.txt file
SetEnvIf Request_URI "^/robots\.txt$" dontlog
# Log what remains
CustomLog logs/access_log common env=!dontlog

As another example, consider logging requests from english-speakers to one log file, and non-english speakers to a different log file.

SetEnvIf Accept-Language "en" english
CustomLog logs/english_log common env=english
CustomLog logs/non_english_log common env=!english

In a caching scenario one would want to know about the efficiency of the cache. A very simple method to find this out would be:

SetEnv CACHE_MISS 1
LogFormat "%h %l %u %t \"%r\" %>s %b %{CACHE_MISS}e" common-cache
CustomLog logs/access_log common-cache

mod_cache will run before mod_env and, when successful, will deliver the content without it. In that case a cache hit will log -, while a cache miss will log 1.

In addition to the env= syntax, LogFormat supports logging values conditional upon the HTTP response code:

LogFormat "%400,501{User-agent}i" browserlog
LogFormat "%!200,304,302{Referer}i" refererlog

In the first example, the User-agent will be logged if the HTTP status code is 400 or 501. In other cases, a literal "-" will be logged instead. Likewise, in the second example, the Referer will be logged if the HTTP status code is not 200, 304, or 302. (Note the "!" before the status codes.)

Although we have just shown that conditional logging is very powerful and flexible, it is not the only way to control the contents of the logs. Log files are more useful when they contain a complete record of server activity. It is often easier to simply post-process the log files to remove requests that you do not want to consider.
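
As an illustration of the post-processing approach, the Python sketch below drops the same requests that the dontlog example excludes (loop-back clients and /robots.txt) from an already complete log. The names CLF_PREFIX, keep_line and filter_log are hypothetical, and the exact comparison against 127.0.0.1 only approximates the SetEnvIf pattern.

```python
import re

# Only the leading fields are needed: client address and the quoted request line.
CLF_PREFIX = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>(?:[^"\\]|\\.)*)"'
)


def keep_line(line):
    """Return False for loop-back requests and requests for /robots.txt."""
    m = CLF_PREFIX.match(line)
    if m is None:
        return True  # keep unparseable lines so nothing is silently lost
    if m.group("host") == "127.0.0.1":
        return False
    parts = m.group("request").split()
    # The request line is "METHOD path PROTOCOL"; ignore any query string.
    path = parts[1].split("?", 1)[0] if len(parts) >= 2 else ""
    return path != "/robots.txt"


def filter_log(lines):
    """Return the lines worth keeping, in their original order."""
    return [line for line in lines if keep_line(line)]
```

The original log stays untouched, so a different filter can be applied later without having lost any requests.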

Log Rotation

On even a moderately busy server, the quantity of information stored in the log files is very large. The access log file typically grows 1 MB or more per 10,000 requests. It will consequently be necessary to periodically rotate the log files by moving or deleting the existing logs. This cannot be done while the server is running, because Apache httpd will continue writing to the old log file as long as it holds the file open. Instead, the server must be restarted after the log files are moved or deleted so that it will open new log files.

By using a graceful restart, the server can be instructed to open new log files without losing any existing or pending connections from clients. However, in order to accomplish this, the server must continue to write to the old log files while it finishes serving old requests. It is therefore necessary to wait for some time after the restart before doing any processing on the log files. A typical scenario that simply rotates the logs and compresses the old logs to save space is:

mv access_log access_log.old
mv error_log error_log.old
apachectl graceful
sleep 600
gzip access_log.old error_log.old

Another way to perform log rotation is using piped logs as discussed in the next section.

Piped Logs

Apache httpd is capable of writing error and access log files through a pipe to another process, rather than directly to a file. This capability dramatically increases the flexibility of logging, without adding code to the main server. In order to write logs to a pipe, simply replace the filename with the pipe character "|", followed by the name of the executable which should accept log entries on its standard input. The server will start the piped-log process when the server starts, and will restart it if it crashes while the server is running. (This last feature is why we can refer to this technique as "reliable piped logging".)

Piped log processes are spawned by the parent Apache httpd process, and inherit the userid of that process. This means that piped log programs usually run as root. It is therefore very important to keep the programs simple and secure.
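
To make the idea concrete, here is a deliberately small piped-log program in Python that copies each entry from standard input to a file given on the command line. It is a sketch, not a hardened tool: the function name copy_log and the single-argument command line are assumptions made for illustration.

```python
#!/usr/bin/env python3
import sys


def copy_log(source, sink):
    """Copy log entries from source to sink line by line; return the count."""
    count = 0
    for line in source:
        sink.write(line)
        sink.flush()  # flush per entry so little is lost if the process dies
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 2:
    # Binary mode avoids decoding errors on bytes supplied by clients.
    with open(sys.argv[1], "ab") as log_file:
        copy_log(sys.stdin.buffer, log_file)
```

The loop ends when the server closes the pipe, at which point the program exits. Keeping the program this short is in line with the advice above about simplicity and security.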

One important use of piped logs is to allow log rotation without having to restart the server. The Apache HTTP Server includes a simple program called rotatelogs for this purpose. For example, to rotate the logs every 24 hours, you can use:

CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common

Notice that quotes are used to enclose the entire command that will be called for the pipe. Although these examples are for the access log, the same technique can be used for the error log.

As with conditional logging, piped logs are a very powerful tool, but they should not be used where a simpler solution like off-line post-processing is available.

By default the piped log process is spawned without invoking a shell. Use "|$" instead of "|" to spawn using a shell (usually with /bin/sh -c):

# Invoke "rotatelogs" using a shell
CustomLog "|$/usr/local/apache/bin/rotatelogs   /var/log/access_log 86400" common

This was the default behaviour for Apache 2.2. Depending on the shell specifics this might lead to an additional shell process for the lifetime of the logging pipe program and signal handling problems during restart. For compatibility reasons with Apache 2.2 the notation "||" is also supported and equivalent to using "|".

Windows note

Note that on Windows, you may run into problems when running many piped logger processes, especially when HTTPD is running as a service. This is caused by running out of desktop heap space. The desktop heap space given to each service is specified by the third argument to the SharedSection parameter in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager\SubSystems\Windows registry value. Change this value with care; the normal caveats for changing the Windows registry apply, but you might also exhaust the desktop heap pool if the number is adjusted too high.

Virtual Hosts

When running a server with many virtual hosts, there are several options for dealing with log files. First, it is possible to use logs exactly as in a single-host server. Simply by placing the logging directives outside the <VirtualHost> sections in the main server context, it is possible to log all requests in the same access log and error log. This technique does not allow for easy collection of statistics on individual virtual hosts.

If CustomLog or ErrorLog directives are placed inside a <VirtualHost> section, all requests or errors for that virtual host will be logged only to the specified file. Any virtual host which does not have logging directives will still have its requests sent to the main server logs. This technique is very useful for a small number of virtual hosts, but if the number of hosts is very large, it can be complicated to manage. In addition, it can often create problems with insufficient file descriptors.

For the access log, there is a very good compromise. By adding information on the virtual host to the log format string, it is possible to log all hosts to the same log, and later split the log into individual files. For example, consider the following directives.

LogFormat "%v %l %u %t \"%r\" %>s %b" comonvhost
CustomLog logs/access_log comonvhost

The %v is used to log the name of the virtual host that is serving the request. Then a program like split-logfile can be used to post-process the access log in order to split it into one file per virtual host.
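
The splitting step can also be sketched in Python. The function below groups lines by their first field, which is the virtual host name when the format string starts with %v. It is not a port of split-logfile: the name split_by_vhost is hypothetical, and lowercasing the host name and dropping it from each stored line are choices made for this example.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their first field, the virtual host name from %v."""
    groups = defaultdict(list)
    for line in lines:
        vhost, sep, rest = line.partition(" ")
        if not sep:
            continue  # no space at all: not a usable log line
        # Assumption: host names are compared case-insensitively.
        groups[vhost.lower()].append(rest)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        'www.example.com - frank [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
        'docs.example.com - - [10/Oct/2000:13:55:37 -0700] "GET /a HTTP/1.0" 200 10',
    ]
    for vhost, entries in split_by_vhost(sample).items():
        print(vhost, len(entries))
```

Before writing each group to its own file, validate the host name, since it becomes part of a file name.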

Other Log Files

Logging actual bytes sent and received

mod_logio adds in two additional LogFormat fields (%I and %O) that log the actual number of bytes received and sent on the network.

Forensic Logging

mod_log_forensic provides for forensic logging of client requests. Logging is done before and after processing a request, so the forensic log contains two log lines for each request. The forensic logger is very strict with no customizations. It can be an invaluable debugging and security tool.

PID File

On startup, Apache httpd saves the process id of the parent httpd process to the file logs/httpd.pid. This filename can be changed with the PidFile directive. The process-id is for use by the administrator in restarting and terminating the daemon by sending signals to the parent process; on Windows, use the -k command line option instead. For more information see the Stopping and Restarting page.

Script Log

In order to aid in debugging, the ScriptLog directive allows you to record the input to and output from CGI scripts. This should only be used in testing - not for live servers. More information is available in the mod_cgi documentation.

Source: https://httpd.apache.org/docs/2.4/logs.html