[Mod_log_sql] mod_log_mysql

Marcel Hartmann mail at marcel-hartmann.com
Sun Jun 12 13:26:21 EST 2005


Hi,

i would have some information please.

By the infos i have read on this list, my Solution for an ISP is the following:

1. I would set MassVirtualHosting Off
2. All VirtualHosts should log into one access_table
3. every night i will copy the data from each virtualhost 
   per month based in a new archive table. So i have all data in seperate new    
   archive tables, and the old table is empty after this.
4. i will create cronjob scripts whitch copys the mysql data for each vhost in
   a textfile to analyse with awstats or webalyzer.
5. I use no unique_id and no notes, cookies, headers_in and headers_out tables.

...and hope there is a auto_inkrement compile flag in the next version ;-)
Then in my cronjobs i use this statements for my needs. I set indexes on all
attributes, with must be sorted by Where clausel.
   |-- If you are using MyISAM tables, or HEAP tables use the following
    line|| to create your archive tables (and clear out the originals)||:|
    |LOCK TABLES access_log, cookies, headers_in, headers_out, notes;|
    |-- Otherwise if you are using BDB or InnoDB tables use the
    following line to create your archive tables (and clear out the
    originals):|
    |BEGIN;|
    ||
    |CREATE TABLE access_log_archive SELECT * FROM access_log;|
    |||CREATE TABLE cookies_archive SELECT * FROM cookies;|
    |||CREATE TABLE headers_in_archive SELECT * FROM headers_in;|
    |||CREATE TABLE headers_out_archive SELECT * FROM headers_out;|
    |||CREATE TABLE notes_archive SELECT * FROM notes;|
    |||-- If you are using BDB or InnoDB tables use the following line:|
    ||COMMIT;
    |-- ||Otherwise ||if you are using MyISAM tables, or HEAP tables use
    the following line||||:|
    |UNLOCK TABLES ||access_log, cookies, headers_in, headers_out, notes;|
    ||

|To keep these tables populated, you should have your server run these 
scripts on a nightly basis: 
(http://dev.mysql.com/doc/mysql/en/insert-select.html)
|

    |LOCK TABLES access_log, cookies, headers_in, headers_out, notes;||
    -- If you are using MyISAM tables, or HEAP tables only|
    |BEGIN; -- for BDB or InnoDB tables only...|
    |INSERT INTO access_log_archive SELECT * FROM access_log;|
    |||INSERT INTO cookies_archive SELECT * FROM cookies;|
    |||INSERT INTO headers_in_archive SELECT * FROM headers_in;|
    |||INSERT INTO headers_out_archive SELECT * FROM headers_out;|
    |||INSERT INTO notes_archive SELECT * FROM notes;|
    ||DELETE FROM access_log; -- This is why transactions, or just table
    locking is important!  We don't want to delete the hits which were
    logged between copying them off into our archive tables and now!
    DELETE FROM headers_in;
    DELETE FROM headers_out;
    DELETE FROM cookies;
    DELETE FROM notes;

    ||||||COMMIT; -- for BDB or InnoDB tables only
    |UNLOCK TABLES ||access_log, cookies, headers_in, headers_out,
    notes;|| -- If you are using MyISAM tables, or HEAP tables only|
    ||||||
    ||

Indexes should be placed in the ARCHIVE tables only, on every column by 
which the results will be SORTED or in the WHERE clause.  The bare 
minimum indexes (which satisfy the most common queries one could have 
about their web-hits) which I use are:

    CREATE INDEX time_stamp ON access_log_archive (time_stamp);
    CREATE INDEX virtual_host ON access_log_archive (virtual_host);
    CREATE PRIMARY INDEX p_id ON access_log_archive (id); -- This index
    is only neccessary if you ever wish to manipulate an individual row
    (unlikely), or you are recording cookie and header data in the other
    tables, or if you are using mysql to create unique IDs (see below),
    otherwise it can be skipped
    -- *IF* you may want to do a lot of REFER[R]ER analysis, consider
    adding the following index:
        CREATE INDEX referrer ON access_log_archive (referer);
    -- IF you are curious WHICH files are being accessed the most per
    site, consider adding the following index:
        CREATE INDEX files_accessed ON access_log_archive (virtual_host,
    request_uri);
    -- IF you are curious to track WHICH IPs are visiting each site the
    most:
        CREATE INDEX biggest_fans ON access_log_archive (virtual_host,
    remote_host);

-- if you are using the other tables, you definately want to create an 
index on every column which references the ID column of access_log:

    CREATE INDEX id ON header_in_archive (id); -- Note: this index
    should NOT be primary!
    CREATE INDEX id ON header_out_archive (id); -- Note: this index
    should NOT be primary!
    CREATE INDEX id ON cookies_archive (id); -- Note: this index should
    NOT be primary!
    CREATE INDEX id ON notes_archive (id); -- Note: this index should
    NOT be primary!

These indexes allow for sub-millisecond response times on queries such as:

    SELECT * FROM access_logs_archive
    WHERE time_stamp BETWEEN |UNIX_TIMESTAMP('2005-05-01') AND
    UNIX_TIMESTAMP('2005-06-01')|
    |AND (virtual_host = "www.myfavoritedomain.com" OR
    "myfavoritedomain.com"); -- you could use virtual_host LIKE
    "%myfavoritedomain.com" instead, but this wouldn't take full
    advantage of the index on that column.|


Or my personal favorite query:

    SELECT YEAR(FROM_UNIXTIME(time_stamp)),
    MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
    virtual_host,
         SUM(bytes_sent) AS bytes FROM access_log
    GROUP BY YEAR(FROM_UNIXTIME(time_stamp)),
    MONTH(FROM_UNIXTIME(time_stamp)), DAY(FROM_UNIXTIME(time_stamp)),
    virtual_host
    ORDER BY YEAR(FROM_UNIXTIME(time_stamp)),
    MONTH(FROM_UNIXTIME(time_stamp)),  DAY(FROM_UNIXTIME(time_stamp)),
    virtual_host
    -- Alternatively: ORDER BY BYTES desc

my Database Tables-Type is innodb.

Is there a better Way!?

Greets Marcel Hartmann




More information about the Mod_log_sql mailing list