Archiving data – great article

Some implementation notes:

in gentoo you must enable the storage engines in /etc/make.conf

USE="unicode big-tables extraengine

In debian

This article is so valuable to me so I will copy/paste the last part of it here…

Data Archiving Techniques

Data archiving, a subset of Information Lifecycle Management (ILM), typically involves horizontally partitioning data so that a set of current information resides in one or more objects, while the historical/archived data is placed in separate objects. Oftentimes, historical data can be transferred to lower-cost storage devices (like EMC Centera) that allow easy disk access, but provide significant financial savings. Of course, the idea behind data archiving is to lessen response times by reducing the sheer amount of data needed for current access. However, for analytic purposes, there are times when both the current and archive data needs to be accessed together.

For data archiving environments, Archive tables can easily be accessed in standalone fashion or be set up in union views, which is the primary vehicle used to reference both current and archived data in one object. By combining MySQL’s Archive storage engine with the new 5.0 Federated table support, a DBA can smartly transfer historical data to cheapter storage devices and then reference both current and archived data when needed.

For example, let’s say you have a set of recent information regarding customer transactions in a transactional InnoDB table called CLIENT_TRANSACTION, and a set of historical transactions stored in an Archive table called CLIENT_TRANSACTION_HIST. To lessen load on the active server, you can

  1. Transfer the history table to another MySQL server that utilizes cheaper storage devices.
  2. Set it up to use the Archive storage engine.
  3. Create a federated table link to the historical data from the current database.
  4. Build a union view to reference both together when needed.

On the historical MySQL Server:

On the MySQL Server that contains active data:

Note: rather than use a SELECT * in the view definition above, the actual columns should be listed out; the above was only done to save space for this article.

The targets for Archive tables in data warehousing or data archiving environments are fact and summary tables as they usually contain the bulk of the information. Dimension tables are rarely large enough to warrant a switch to Archive.

Miscellaneous Archive Engine Notes

We talked earlier about the Archive engine being a good choice for storing data auditing results as only INSERT and SELECT are supported. Let’s check to be sure:

We also talked about how Archive doesn’t support indexes at this time:

One special thing to note about Archive tables and indexes is that, if you want to alter another storage engine table to be an Archive table, you must first drop any indexes that have been defined on the table.

Archive tables are supported by the new sql_mode data integrity enhancements in MySQL 5.0, so you can ensure only proper data finds its way into your Archive tables:


DBAs facing the problem of corporate data explosion have an excellent new tool to help them in the MySQL 5.0 Archive storage engine. Whether it’s a data warehousing, data archiving, or data auditing situation, MySQL Archive tables can be just what the doctor ordered when it comes to maintaining large amounts of standard or sensitive information, while keeping storage costs at a bare-bones minimum.

Updated: 2005-09-16