[Mod_log_sql] Re: Table indexing

Carl Edwards edwards at vitesse.com
Tue Jun 7 12:43:40 EST 2005


Hi Matt,

Thanks for the info on indexes and archiving.  Instead of using
explicit locks and emptying the whole table how about keeping a
months worth of data in the current table and using something like:

INSERT IGNORE INTO archive.access SELECT * from current.access;

To keep the archive up to date?

Thanks,
Carl at sdrawdE.net

>  -----Original Message-----
>  
>  Today's Topics:
>  
>     1. Re: Table indexing (Matt Erbst)
>  
>  
>  ----------------------------------------------------------------------
>  
>  Date: Fri, 03 Jun 2005 07:47:22 -0700
>  From: Matt Erbst <merbst at cox.net>
>  Subject: Re: [Mod_log_sql] Table indexing
>  
>  When it comes to indexing, the answer is of course, "it 
>  depends".  From 
>  the perspective of a database engineer what we have here is an OLTP 
>  (On-Line Transaction Processing) system which we also wish 
>  to use as an 
>  OLAP (On-Line Analytical Processing) system.
>  
>  mod_log_sql is recording Transactions (apache hits) in an 
>  On-Line manner 
>  (as they happen) to the database.  On heavily loaded sites 
>  it is writing 
>  upwards of 100 hits per second, 24H a day, every day.  If 
>  the option is 
>  enabled, mod_log_sql will be writing up to 20 cookies PER HIT to the 
>  cookies table, and multiple headers per hit to each of the 
>  header_in and 
>  header_out tables.  In other words the ratio of INSERTs to 
>  SELECTs will 
>  be HUGE, so any indexes will overall cost more time than 
>  they save.  If 
>  you wish to use mod_log_sql on a high-traffic server AND run 
>  queries on 
>  the data, it would be best to keep a mirror of these tables, with a 
>  script that runs once every day to move the contents over 
>  from the first 
>  table to the second set.
>  
>  The set of tables being written to by mod_log_sql keeps the original 
>  names, but the copy can be named "access_log_archive" and 
>  should contain 
>  indexes.
>  
>      |-- If you are using MyISAM tables, or HEAP tables use 
>  the following
>      line|| to create your archive tables (and clear out the 
>  originals)||:|
>      |LOCK TABLES access_log, cookies, headers_in, 
>  headers_out, notes;|
>      |-- Otherwise if you are using BDB or InnoDB tables use the
>      following line to create your archive tables (and clear out the
>      originals):|
>      |BEGIN;|
>      ||
>      |CREATE TABLE access_log_archive SELECT * FROM access_log;|
>      |||CREATE TABLE cookies_archive SELECT * FROM cookies;|
>      |||CREATE TABLE headers_in_archive SELECT * FROM headers_in;|
>      |||CREATE TABLE headers_out_archive SELECT * FROM headers_out;|
>      |||CREATE TABLE notes_archive SELECT * FROM notes;|
>      |||-- If you are using BDB or InnoDB tables use the 
>  following line:|
>      ||COMMIT;
>      |-- ||Otherwise ||if you are using MyISAM tables, or 
>  HEAP tables use
>      the following line||||:|
>      |UNLOCK TABLES ||access_log, cookies, headers_in, 
>  headers_out, notes;|
>      ||
>  
>  |To keep these tables populated, you should have your server 
>  run these 
>  scripts on a nightly basis: 
>  (http://dev.mysql.com/doc/mysql/en/insert-select.html)
>  |
>  
>      |LOCK TABLES access_log, cookies, headers_in, 
>  headers_out, notes;||
>      -- If you are using MyISAM tables, or HEAP tables only|
>      |BEGIN; -- for BDB or InnoDB tables only...|
>      |INSERT INTO access_log_archive SELECT * FROM access_log;|
>      |||INSERT INTO cookies_archive SELECT * FROM cookies;|
>      |||INSERT INTO headers_in_archive SELECT * FROM headers_in;|
>      |||INSERT INTO headers_out_archive SELECT * FROM headers_out;|
>      |||INSERT INTO notes_archive SELECT * FROM notes;|
>      ||DELETE FROM access_log; -- This is why transactions, 
>  or just table
>      locking is important!  We don't want to delete the hits 
>  which were
>      logged between copying them off into our archive tables and now!
>      DELETE FROM headers_in;
>      DELETE FROM headers_out;
>      DELETE FROM cookies;
>      DELETE FROM notes;
>  
>      ||||||COMMIT; -- for BDB or InnoDB tables only
>      |UNLOCK TABLES ||access_log, cookies, headers_in, headers_out,
>      notes;|| -- If you are using MyISAM tables, or HEAP tables only|
>      ||||||
>      ||
>  
>  Indexes should be placed in the ARCHIVE tables only, on 
>  every column by 
>  which the results will be SORTED or in the WHERE clause.  The bare 
>  minimum indexes (which satisfy the most common queries one 
>  could have 
>  about their web-hits) which I use are:
>  
>      CREATE INDEX time_stamp ON access_log_archive (time_stamp);
>      CREATE INDEX virtual_host ON access_log_archive (virtual_host);
>      CREATE PRIMARY INDEX p_id ON access_log_archive (id); -- 
>  This index
>      is only neccessary if you ever wish to manipulate an 
>  individual row
>      (unlikely), or you are recording cookie and header data 
>  in the other
>      tables, or if you are using mysql to create unique IDs 
>  (see below),
>      otherwise it can be skipped
>      -- *IF* you may want to do a lot of REFER[R]ER analysis, consider
>      adding the following index:
>          CREATE INDEX referrer ON access_log_archive (referer);
>      -- IF you are curious WHICH files are being accessed the most per
>      site, consider adding the following index:
>          CREATE INDEX files_accessed ON access_log_archive 
>  (virtual_host,
>      request_uri);
>      -- IF you are curious to track WHICH IPs are visiting 
>  each site the
>      most:
>          CREATE INDEX biggest_fans ON access_log_archive 
>  (virtual_host,
>      remote_host);
>  
>  -- if you are using the other tables, you definately want to 
>  create an 
>  index on every column which references the ID column of access_log:
>  
>      CREATE INDEX id ON header_in_archive (id); -- Note: this index
>      should NOT be primary!
>      CREATE INDEX id ON header_out_archive (id); -- Note: this index
>      should NOT be primary!
>      CREATE INDEX id ON cookies_archive (id); -- Note: this 
>  index should
>      NOT be primary!
>      CREATE INDEX id ON notes_archive (id); -- Note: this index should
>      NOT be primary!
>  
>  These indexes allow for sub-millisecond response times on 
>  queries such as:
>  
>      SELECT * FROM access_logs_archive
>      WHERE time_stamp BETWEEN |UNIX_TIMESTAMP('2005-05-01') AND
>      UNIX_TIMESTAMP('2005-06-01')|
>      |AND (virtual_host = "www.myfavoritedomain.com" OR
>      "myfavoritedomain.com"); -- you could use virtual_host LIKE
>      "%myfavoritedomain.com" instead, but this wouldn't take full
>      advantage of the index on that column.|
>  
>  
>  Or my personal favorite query:
>  
>      SELECT YEAR(FROM_UNIXTIME(time_stamp)),
>      MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
>      virtual_host,
>           SUM(bytes_sent) AS bytes FROM access_log
>      GROUP BY YEAR(FROM_UNIXTIME(time_stamp)),
>      MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
>      virtual_host
>      ORDER BY YEAR(FROM_UNIXTIME(time_stamp)),
>      MONTH(FROM_UNIXTIME(time_stamp)),  
>  DAY(FROM_UNIXTIME(time_stamp)),
>      virtual_host
>      -- Alternatively: ORDER BY BYTES desc
>  
>  I personally do not record anything to the cookies, headers, 
>  or notes 
>  tables, however I wanted the ID column to have integer ID 
>  numbers that 
>  always auto-increment, so I can refer to a large range of 
>  log-entries at 
>  once rapidly.  I found that mod_log_sql works flawlessly 
>  after I changed 
>  the type of the ID column to BIGINT PRIMARY KEY AUTO_INCREMENT.  The 
>  other advantages of this over using apache's mod_unique_id 
>  are: faster 
>  performance (mysql can figure out the next ID number in a 
>  PRIMARY KEY 
>  column very quickly), and less space being used (BIGINT 
>  takes 8 bytes, 
>  versus a 19 byte CHAR field being used to record the ID number from 
>  apache in each of 5 tables.  Don't forget that it does not 
>  take long for 
>  an active server to exceed 4 billion records.).  *I would strongly 
>  recommend that future implementations of mod_log_sql use mysql's 
>  AUTO_INCREMENTING integers feature, and join the other 
>  tables to each 
>  record using MySQL's LAST_INSERT_ID() function, or if using 
>  the C API, a 
>  faster method described here: 
>  http://dev.mysql.com/doc/mysql/en/getting-unique-id.html*
>  
>  For more clever ideas, I recommend the book "High 
>  Performance MySQL" by 
>  O'Reilly publishing.
>  <http://www.amazon.com/exec/obidos/tg/detail/-/0596003064/>
>  Disclaimer: I have not personally tested any of the above 
>  code (except 
>  my favorite query), it is intended as a guideline of some 
>  good methods 
>  of minimizing server load while allowing fast transactional 
>  logging and 
>  fast queries to be written.  I also wrote this email in a 
>  hurry because 
>  I'm travelling to thailand VERY soon...
>  
>  I'll try to follow up as soon as I have internet access again...
>  
>  Thanks,
>  Matt
>  
>  Edward Rudd wrote:
>  
>  >On Tue, 2005-05-31 at 05:25, Justin Finkelstein wrote:
>  >  
>  >
>  >>Hi guys
>  >>
>  >>I've just had a look at my database and I've notice that nothing
>  >>appears to be logged for POST requests; does this need to be
>  >>explicitly turned on in the module?



More information about the Mod_log_sql mailing list