Nearline and archiving in the data warehouse: what's the difference?

Author:Ritchie, Arthur
Position:Storage Expo 2008

In recent years, data warehouses have begun to increase radically in size. To maintain acceptable performance in the face of this "data explosion", several techniques have been introduced. These include pre-building aggregates and Key Performance Indicators (KPI's) from large amounts of detailed transaction data, and indexing as many columns as possible in order to speed up query processing. As data warehouses continue to grow, however, the time required to do all the necessary preprocessing of data increases to the point where these tasks can no longer be performed in the available "batch windows" when the warehouse is not being accessed by users. So, trade-offs need to be made. Doing less preprocessing work reduces the required time, but also means that queries that depend on aggregates, KPIs or additional indexes may take an inordinately long time to run, and may also severely degrade performance for other users as the system attempts to do the processing "on the fly". This impasse leads to two possible choices: either stop providing the analytic functionality--making the system less valuable, and users more frustrated,--or "put the database on a diet" by moving some of the data it contains to another location.

Putting the Database "on a Diet"

Both Nearline and Archiving solutions can help trim down an over-expanded database: the database can be made much smaller by implementing an Information Lifecycle Management (ILM) approach, removing unused or infrequently used detailed transactional data from the online database and storing it elsewhere. When the database is smaller, it will perform better and be capable of supporting a wider variety of user needs. Aggregates and KPI's will be built from a much smaller amount of detailed transaction data. Additionally, column indexing will be more practicable as there will be fewer rows per column to be indexed.

The Key Differences between Archiving and Nearlining in a Data Warehouse

It is important to stress the differences between archiving warehouse data (using products from Open Text, Princeton Softech and so on) and storing it nearline (using SAND/DNA). Since both types of product are used to hold data that has been moved out of the main "online" system, it is unclear to some why one would need to be implemented if the other is in place. To clarify this question and make it easier to discuss why one or the other type of system (or both) might be required in a given situation, the major...

To continue reading