<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
When it comes to indexing, the answer is of course, "it depends". From
the perspective of a database engineer what we have here is an OLTP
(On-Line Transaction Processing) system which we also wish to use as an
OLAP (On-Line Analytical Processing) system.<br>
<br>
mod_log_sql is recording Transactions (apache hits) in an On-Line
manner (as they happen) to the database. On heavily loaded sites it is
writing upwards of 100 hits per second, 24H a day, every day. If the
option is enabled, mod_log_sql will be writing up to 20 cookies PER HIT
to the cookies table, and multiple headers per hit to each of the
header_in and header_out tables. In other words the ratio of INSERTs
to SELECTs will be HUGE, so any indexes will overall cost more time
than they save. If you wish to use mod_log_sql on a high-traffic
server AND run queries on the data, it would be best to keep a mirror
of these tables, with a script that runs once every day to move the
contents over from the first table to the second set.<br>
<br>
The set of tables being written to by mod_log_sql keeps the original
names, but the copy can be named "access_log_archive" and should
contain indexes.<br>
<blockquote><code class="literal">-- If you are using MyISAM tables, or
HEAP tables use the following line</code><code class="literal"> to
create your archive tables (and clear out the originals)</code><code
class="literal">:</code><br>
<code class="literal">LOCK TABLES access_log, cookies, headers_in,
headers_out, notes;</code><br>
<code class="literal">-- Otherwise if you are using BDB or InnoDB
tables use the following line to create your archive tables (and clear
out the originals):</code><br>
<code class="literal">BEGIN;</code><br>
<code class="literal"></code><br>
<code class="literal">CREATE TABLE access_log_archive SELECT * FROM
access_log;</code><br>
<code class="literal"></code><code class="literal">CREATE TABLE
cookies_archive SELECT * FROM cookies;</code><br>
<code class="literal"></code><code class="literal">CREATE TABLE
headers_in_archive SELECT * FROM headers_in;</code><br>
<code class="literal"></code><code class="literal">CREATE TABLE
headers_out_archive SELECT * FROM headers_out;</code><br>
<code class="literal"></code><code class="literal">CREATE TABLE
notes_archive SELECT * FROM notes;</code><br>
<code class="literal"></code><code class="literal">-- If you are
using BDB or InnoDB
tables use the following line:</code><br>
<code class="literal"></code>COMMIT;<br>
<code class="literal">-- </code><code class="literal">Otherwise </code><code
class="literal">if you are using MyISAM tables, or HEAP tables use the
following line</code><code class="literal"></code><code class="literal">:</code><br>
<code class="literal">UNLOCK TABLES </code><code class="literal">access_log,
cookies, headers_in, headers_out, notes;</code><br>
<code class="literal"></code><br>
</blockquote>
<code class="literal">To keep these tables populated, you should have
your server run these scripts on a nightly basis:
(<a class="moz-txt-link-freetext" href="http://dev.mysql.com/doc/mysql/en/insert-select.html">http://dev.mysql.com/doc/mysql/en/insert-select.html</a>)<br>
</code>
<blockquote><tt><font face="System"><font face="System"><code
class="literal">LOCK TABLES access_log, cookies, headers_in,
headers_out, notes;</code><code class="literal"> -- If you are using
MyISAM tables, or HEAP tables only</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal">BEGIN;
-- for BDB or InnoDB tables only...</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal">INSERT
INTO access_log_archive SELECT * FROM access_log;</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal"></code><code
class="literal">INSERT INTO cookies_archive SELECT * FROM cookies;</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal"></code><code
class="literal">INSERT INTO headers_in_archive SELECT * FROM
headers_in;</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal"></code><code
class="literal">INSERT INTO headers_out_archive SELECT * FROM
headers_out;</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal"></code><code
class="literal">INSERT INTO notes_archive SELECT * FROM notes;</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal"></code>DELETE
FROM access_log; -- This is why transactions, or just table
locking is important! We don't want to delete the hits which were
logged between copying them off into our archive tables and now!</font></font></tt><br>
<tt><font face="System"><font face="System">DELETE FROM headers_in;</font></font></tt><br>
<tt><font face="System"><font face="System">DELETE FROM headers_out;</font></font></tt><br>
<tt><font face="System"><font face="System">DELETE FROM cookies;</font></font></tt><br>
<tt><font face="System"><font face="System">DELETE FROM notes;</font></font></tt><br>
<br>
<tt><font face="System"><font face="System"><code class="literal"></code><code
class="literal"></code><code class="literal"></code>COMMIT; -- for BDB
or InnoDB tables only</font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal">UNLOCK
TABLES </code><code class="literal">access_log, cookies, headers_in,
headers_out, notes;</code><code class="literal"> -- If you are using
MyISAM tables, or HEAP tables only</code></font></font></tt><br>
<tt><font face="System"><font face="System"><code class="literal"></code></font><code
class="literal"></code></font></tt><font face="System"><code
class="literal"></code></font><br>
<font face="System"><code class="literal"></code></font></blockquote>
Indexes should be placed in the ARCHIVE tables only, on every column by
which the results will be SORTED or in the WHERE clause. The bare
minimum indexes (which satisfy the most common queries one could have
about their web-hits) which I use are:<br>
<blockquote>CREATE INDEX time_stamp ON access_log_archive (time_stamp);<br>
CREATE INDEX virtual_host ON access_log_archive (virtual_host);<br>
CREATE PRIMARY INDEX p_id ON access_log_archive (id); -- This index is
only neccessary if you ever wish to manipulate an individual row
(unlikely), or you are recording cookie and header data in the other
tables, or if you are using mysql to create unique IDs (see below),
otherwise it can be skipped<br>
-- <b>IF</b> you may want to do a lot of REFER[R]ER analysis, consider
adding the following index:<br>
CREATE INDEX referrer ON access_log_archive (referer);<br>
-- IF you are curious WHICH files are being accessed the most per site,
consider adding the following index:<br>
CREATE INDEX files_accessed ON access_log_archive (virtual_host,
request_uri);<br>
-- IF you are curious to track WHICH IPs are visiting each site the
most:<br>
CREATE INDEX biggest_fans ON access_log_archive (virtual_host,
remote_host);<br>
</blockquote>
-- if you are using the other tables, you definately want to create an
index on every column which references the ID column of access_log:<br>
<blockquote>CREATE INDEX id ON header_in_archive (id); -- Note: this
index should NOT be primary!<br>
CREATE INDEX id ON header_out_archive (id); -- Note: this index should
NOT be primary!<br>
CREATE INDEX id ON cookies_archive (id); -- Note: this index should NOT
be primary!<br>
CREATE INDEX id ON notes_archive (id); -- Note: this index should NOT
be primary!<br>
</blockquote>
These indexes allow for sub-millisecond response times on queries such
as:<br>
<blockquote>SELECT * FROM access_logs_archive<br>
WHERE time_stamp BETWEEN <code class="literal">UNIX_TIMESTAMP('2005-05-01')
AND UNIX_TIMESTAMP('2005-06-01')</code><br>
<code class="literal">AND (virtual_host = "<a class="moz-txt-link-abbreviated" href="http://www.myfavoritedomain.com">www.myfavoritedomain.com</a>"
OR "myfavoritedomain.com"); -- you could use virtual_host LIKE
"%myfavoritedomain.com" instead, but this wouldn't take full advantage
of the index on that column.</code><br>
</blockquote>
<br>
Or my personal favorite query:<br>
<blockquote>SELECT YEAR(FROM_UNIXTIME(time_stamp)),
MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
virtual_host,<br>
SUM(bytes_sent) AS bytes FROM access_log<br>
GROUP BY YEAR(FROM_UNIXTIME(time_stamp)),
MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
virtual_host<br>
ORDER BY YEAR(FROM_UNIXTIME(time_stamp)),
MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
virtual_host<br>
-- Alternatively: ORDER BY BYTES desc<br>
</blockquote>
I personally do not record anything to the cookies, headers, or notes
tables, however I wanted the ID column to have integer ID numbers that
always auto-increment, so I can refer to a large range of log-entries
at once rapidly. I found that mod_log_sql works flawlessly after I
changed the type of the ID column to BIGINT PRIMARY KEY
AUTO_INCREMENT. The other advantages of this over using apache's
mod_unique_id are: faster performance (mysql can figure out the next ID
number in a PRIMARY KEY column very quickly), and less space being used
(BIGINT takes 8 bytes, versus a 19 byte CHAR field being used to record
the ID number from apache in each of 5 tables. Don't forget that it
does not take long for an active server to exceed 4 billion records.).
<b>I would strongly recommend that future implementations of
mod_log_sql use mysql's AUTO_INCREMENTING integers feature, and join
the other tables to each record using MySQL's LAST_INSERT_ID()
function, or if using the C API, a faster method described here:
<a class="moz-txt-link-freetext" href="http://dev.mysql.com/doc/mysql/en/getting-unique-id.html">http://dev.mysql.com/doc/mysql/en/getting-unique-id.html</a></b><br>
<br>
For more clever ideas, I recommend the book <a
href="http://www.amazon.com/exec/obidos/tg/detail/-/0596003064/">"High
Performance MySQL" by O'Reilly publishing.<br>
</a><br>
Disclaimer: I have not personally tested any of the above code (except
my favorite query), it is intended as a guideline of some good methods
of minimizing server load while allowing fast transactional logging and
fast queries to be written. I also wrote this email in a hurry because
I'm travelling to thailand VERY soon...<br>
<br>
I'll try to follow up as soon as I have internet access again...<br>
<br>
Thanks,<br>
Matt<br>
<br>
Edward Rudd wrote:
<blockquote cite="mid1117546992.3097.35.camel@urkle.omega.home"
type="cite">
<pre wrap="">On Tue, 2005-05-31 at 05:25, Justin Finkelstein wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi guys
I've just had a look at my database and I've notice that nothing
appears to be logged for POST requests; does this need to be
explicitly turned on in the module?
</pre>
</blockquote>
<pre wrap=""><!---->
They should be logged and are so on my servers running 1.100. The
actual data posted does not and will not be logged. but the requested
URL will be.
</pre>
<blockquote type="cite">
<pre wrap="">Also, I use the log_mass_virtual_hosting option, and I've noticed that
this doesn't generate any indices on the tables - could I request an
option for creating indices on the fly for new tables? I'd be happy to
work on this if the project's open for other devs...
</pre>
</blockquote>
<pre wrap=""><!---->
The biggest issue with adding indexes (which I believe there is a bug
report on issues.outoforder.cc about this) is which columns to index,
and worrying about the performance hit for logging, as having indexes
slows down the process of inserting data. I plan to add back in INSERT
DELAYED for mysql in the next release which should relieve this issue
some, but that will only help MySQL back-ends.
If you are handy and knowledgeable w/ SQL index configuration and would
like to test indexes and find a *good* fast set for the table that would
help.
And I'm always open to contributions, just send the patches to this list
or attached to a bug in issues.outoforder.cc.
</pre>
<pre wrap=""><hr size="4" width="90%">
_______________________________________________
Download the latest version at <a class="moz-txt-link-freetext"
href="http://www.outoforder.cc/projects/apache/mod_log_sql/">http://www.outoforder.cc/projects/apache/mod_log_sql/</a>
To unsubscribe send an e-mail to
<a class="moz-txt-link-abbreviated"
href="mailto:mod_log_sql-unsubscribe@lists.outoforder.cc">mod_log_sql-unsubscribe@lists.outoforder.cc</a>
</pre>
</blockquote>
</body>
</html>