[Mod_log_sql] Re: Table indexing
Carl Edwards
edwards at vitesse.com
Tue Jun 7 12:43:40 EST 2005
Hi Matt,
Thanks for the info on indexes and archiving. Instead of using
explicit locks and emptying the whole table how about keeping a
months worth of data in the current table and using something like:
INSERT IGNORE INTO archive.access SELECT * from current.access;
To keep the archive up to date?
Thanks,
Carl at sdrawdE.net
> -----Original Message-----
>
> Today's Topics:
>
> 1. Re: Table indexing (Matt Erbst)
>
>
> ----------------------------------------------------------------------
>
> Date: Fri, 03 Jun 2005 07:47:22 -0700
> From: Matt Erbst <merbst at cox.net>
> Subject: Re: [Mod_log_sql] Table indexing
>
> When it comes to indexing, the answer is of course, "it
> depends". From
> the perspective of a database engineer what we have here is an OLTP
> (On-Line Transaction Processing) system which we also wish
> to use as an
> OLAP (On-Line Analytical Processing) system.
>
> mod_log_sql is recording Transactions (apache hits) in an
> On-Line manner
> (as they happen) to the database. On heavily loaded sites
> it is writing
> upwards of 100 hits per second, 24H a day, every day. If
> the option is
> enabled, mod_log_sql will be writing up to 20 cookies PER HIT to the
> cookies table, and multiple headers per hit to each of the
> header_in and
> header_out tables. In other words the ratio of INSERTs to
> SELECTs will
> be HUGE, so any indexes will overall cost more time than
> they save. If
> you wish to use mod_log_sql on a high-traffic server AND run
> queries on
> the data, it would be best to keep a mirror of these tables, with a
> script that runs once every day to move the contents over
> from the first
> table to the second set.
>
> The set of tables being written to by mod_log_sql keeps the original
> names, but the copy can be named "access_log_archive" and
> should contain
> indexes.
>
> |-- If you are using MyISAM tables, or HEAP tables use
> the following
> line|| to create your archive tables (and clear out the
> originals)||:|
> |LOCK TABLES access_log, cookies, headers_in,
> headers_out, notes;|
> |-- Otherwise if you are using BDB or InnoDB tables use the
> following line to create your archive tables (and clear out the
> originals):|
> |BEGIN;|
> ||
> |CREATE TABLE access_log_archive SELECT * FROM access_log;|
> |||CREATE TABLE cookies_archive SELECT * FROM cookies;|
> |||CREATE TABLE headers_in_archive SELECT * FROM headers_in;|
> |||CREATE TABLE headers_out_archive SELECT * FROM headers_out;|
> |||CREATE TABLE notes_archive SELECT * FROM notes;|
> |||-- If you are using BDB or InnoDB tables use the
> following line:|
> ||COMMIT;
> |-- ||Otherwise ||if you are using MyISAM tables, or
> HEAP tables use
> the following line||||:|
> |UNLOCK TABLES ||access_log, cookies, headers_in,
> headers_out, notes;|
> ||
>
> |To keep these tables populated, you should have your server
> run these
> scripts on a nightly basis:
> (http://dev.mysql.com/doc/mysql/en/insert-select.html)
> |
>
> |LOCK TABLES access_log, cookies, headers_in,
> headers_out, notes;||
> -- If you are using MyISAM tables, or HEAP tables only|
> |BEGIN; -- for BDB or InnoDB tables only...|
> |INSERT INTO access_log_archive SELECT * FROM access_log;|
> |||INSERT INTO cookies_archive SELECT * FROM cookies;|
> |||INSERT INTO headers_in_archive SELECT * FROM headers_in;|
> |||INSERT INTO headers_out_archive SELECT * FROM headers_out;|
> |||INSERT INTO notes_archive SELECT * FROM notes;|
> ||DELETE FROM access_log; -- This is why transactions,
> or just table
> locking is important! We don't want to delete the hits
> which were
> logged between copying them off into our archive tables and now!
> DELETE FROM headers_in;
> DELETE FROM headers_out;
> DELETE FROM cookies;
> DELETE FROM notes;
>
> ||||||COMMIT; -- for BDB or InnoDB tables only
> |UNLOCK TABLES ||access_log, cookies, headers_in, headers_out,
> notes;|| -- If you are using MyISAM tables, or HEAP tables only|
> ||||||
> ||
>
> Indexes should be placed in the ARCHIVE tables only, on
> every column by
> which the results will be SORTED or in the WHERE clause. The bare
> minimum indexes (which satisfy the most common queries one
> could have
> about their web-hits) which I use are:
>
> CREATE INDEX time_stamp ON access_log_archive (time_stamp);
> CREATE INDEX virtual_host ON access_log_archive (virtual_host);
> CREATE PRIMARY INDEX p_id ON access_log_archive (id); --
> This index
> is only neccessary if you ever wish to manipulate an
> individual row
> (unlikely), or you are recording cookie and header data
> in the other
> tables, or if you are using mysql to create unique IDs
> (see below),
> otherwise it can be skipped
> -- *IF* you may want to do a lot of REFER[R]ER analysis, consider
> adding the following index:
> CREATE INDEX referrer ON access_log_archive (referer);
> -- IF you are curious WHICH files are being accessed the most per
> site, consider adding the following index:
> CREATE INDEX files_accessed ON access_log_archive
> (virtual_host,
> request_uri);
> -- IF you are curious to track WHICH IPs are visiting
> each site the
> most:
> CREATE INDEX biggest_fans ON access_log_archive
> (virtual_host,
> remote_host);
>
> -- if you are using the other tables, you definately want to
> create an
> index on every column which references the ID column of access_log:
>
> CREATE INDEX id ON header_in_archive (id); -- Note: this index
> should NOT be primary!
> CREATE INDEX id ON header_out_archive (id); -- Note: this index
> should NOT be primary!
> CREATE INDEX id ON cookies_archive (id); -- Note: this
> index should
> NOT be primary!
> CREATE INDEX id ON notes_archive (id); -- Note: this index should
> NOT be primary!
>
> These indexes allow for sub-millisecond response times on
> queries such as:
>
> SELECT * FROM access_logs_archive
> WHERE time_stamp BETWEEN |UNIX_TIMESTAMP('2005-05-01') AND
> UNIX_TIMESTAMP('2005-06-01')|
> |AND (virtual_host = "www.myfavoritedomain.com" OR
> "myfavoritedomain.com"); -- you could use virtual_host LIKE
> "%myfavoritedomain.com" instead, but this wouldn't take full
> advantage of the index on that column.|
>
>
> Or my personal favorite query:
>
> SELECT YEAR(FROM_UNIXTIME(time_stamp)),
> MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
> virtual_host,
> SUM(bytes_sent) AS bytes FROM access_log
> GROUP BY YEAR(FROM_UNIXTIME(time_stamp)),
> MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
> virtual_host
> ORDER BY YEAR(FROM_UNIXTIME(time_stamp)),
> MONTH(FROM_UNIXTIME(time_stamp)),
> DAY(FROM_UNIXTIME(time_stamp)),
> virtual_host
> -- Alternatively: ORDER BY BYTES desc
>
> I personally do not record anything to the cookies, headers,
> or notes
> tables, however I wanted the ID column to have integer ID
> numbers that
> always auto-increment, so I can refer to a large range of
> log-entries at
> once rapidly. I found that mod_log_sql works flawlessly
> after I changed
> the type of the ID column to BIGINT PRIMARY KEY AUTO_INCREMENT. The
> other advantages of this over using apache's mod_unique_id
> are: faster
> performance (mysql can figure out the next ID number in a
> PRIMARY KEY
> column very quickly), and less space being used (BIGINT
> takes 8 bytes,
> versus a 19 byte CHAR field being used to record the ID number from
> apache in each of 5 tables. Don't forget that it does not
> take long for
> an active server to exceed 4 billion records.). *I would strongly
> recommend that future implementations of mod_log_sql use mysql's
> AUTO_INCREMENTING integers feature, and join the other
> tables to each
> record using MySQL's LAST_INSERT_ID() function, or if using
> the C API, a
> faster method described here:
> http://dev.mysql.com/doc/mysql/en/getting-unique-id.html*
>
> For more clever ideas, I recommend the book "High
> Performance MySQL" by
> O'Reilly publishing.
> <http://www.amazon.com/exec/obidos/tg/detail/-/0596003064/>
> Disclaimer: I have not personally tested any of the above
> code (except
> my favorite query), it is intended as a guideline of some
> good methods
> of minimizing server load while allowing fast transactional
> logging and
> fast queries to be written. I also wrote this email in a
> hurry because
> I'm travelling to thailand VERY soon...
>
> I'll try to follow up as soon as I have internet access again...
>
> Thanks,
> Matt
>
> Edward Rudd wrote:
>
> >On Tue, 2005-05-31 at 05:25, Justin Finkelstein wrote:
> >
> >
> >>Hi guys
> >>
> >>I've just had a look at my database and I've notice that nothing
> >>appears to be logged for POST requests; does this need to be
> >>explicitly turned on in the module?
More information about the Mod_log_sql
mailing list