DW 2.0 Workshop - Bill Inmon - Archival Sector

ALTERNATE STORAGE

a presentation by W H Inmon

© Copyright Inmon Consulting Services, 2008

data warehouses grow large rapidly

data warehouses age over time

Actively used data

Dormant data

Performance is greatly hurt by keeping a lot of dormant data on disk storage

[Figure: an artery with lots of cholesterol next to an artery with not much cholesterol, an analogy for lots of dormant data]

[Figure: a data warehouse with lots of dormant data next to one with not much dormant data]

Removing dormant data from the data warehouse is the single most important thing the designer can do to improve performance

It is much less expensive to spread data across different forms of storage than to keep all of it on high performance disk storage

There are three storage media that data can be sent to:
- High performance disk storage
- Near line storage
- Archival storage

- High performance disk storage: data that has a very high probability of access
- Near line storage: data that has a low probability of access
- Archival storage: data that needs to be kept regardless of probability of access
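The three kinds of data above can be sketched as a small placement rule. This is only an illustration: the `DataSet` class, the `retention_only` flag, and the 0.5 threshold are assumptions made for the example, since the slides give no numbers.

```python
from dataclasses import dataclass

# Hypothetical cut-off between "very high" and "low" probability of access.
HIGH_ACCESS_PROBABILITY = 0.5


@dataclass
class DataSet:
    name: str
    access_probability: float     # estimated chance of access, 0.0 to 1.0
    retention_only: bool = False  # kept regardless of probability of access


def choose_medium(ds: DataSet) -> str:
    """Pick one of the three storage media for a data set."""
    if ds.retention_only:
        return "archival storage"
    if ds.access_probability >= HIGH_ACCESS_PROBABILITY:
        return "high performance disk storage"
    return "near line storage"
```

For example, `choose_medium(DataSet("orders_2008", 0.9))` selects high performance disk storage, while a data set flagged `retention_only=True` goes to archival storage whatever its probability of access.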

- High performance disk storage: can be accessed in online time
- Near line storage: can be accessed in near online time
- Archival storage: cannot be accessed in online time

Near line storage can be tightly coupled or loosely coupled with disk storage

[Figure: a separate query and a separate result set for each storage environment]

When the near line storage environment and the disk storage environment are not tightly coupled, they must be queried separately

Of course, the result sets can be merged independently if desired
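A minimal sketch of the loosely coupled case, assuming two in-memory SQLite databases stand in for the disk and near line environments. The `sales` table, its columns, and the sample rows are invented for the illustration. Each environment is queried on its own and the two result sets are merged afterward in application code.

```python
import sqlite3


def make_store(rows):
    """Create a stand-in storage environment holding a small sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, year INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


# Two independent environments with no shared catalog.
disk = make_store([(3, 2008, 75.0), (4, 2008, 20.0)])
near_line = make_store([(1, 2005, 10.0), (2, 2006, 40.0)])

QUERY = "SELECT id, year, amount FROM sales WHERE amount >= ?"


def query_both(min_amount):
    """Run the query once per environment, then merge the result sets."""
    disk_rows = disk.execute(QUERY, (min_amount,)).fetchall()
    near_line_rows = near_line.execute(QUERY, (min_amount,)).fetchall()
    return sorted(disk_rows + near_line_rows)


merged = query_both(30.0)
```

The same query text works on both sides here only because the example gives them the same table layout; in a loosely coupled setup each environment could need its own query.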


When the disk storage and the near line environment are not tightly coupled, the database design can be independent and the data can be managed separately

When the disk storage environment and the near line storage environment are tightly coupled, a single query can access both sets of data without knowing where the data is
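The tightly coupled case can be sketched with SQLite's `ATTACH DATABASE`, assuming one connection whose main database stands in for disk storage and whose attached database stands in for near line storage. The `sales` table, the `all_sales` view, and the sample rows are hypothetical. The caller issues a single query against the view and never names a storage environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                       # stands in for disk storage
conn.execute("ATTACH DATABASE ':memory:' AS near_line")  # stands in for near line storage

# The same table design is used on both sides.
DDL = "CREATE TABLE {schema}.sales (id INTEGER, year INTEGER, amount REAL)"
for schema in ("main", "near_line"):
    conn.execute(DDL.format(schema=schema))

conn.executemany("INSERT INTO main.sales VALUES (?, ?, ?)",
                 [(3, 2008, 75.0), (4, 2008, 20.0)])
conn.executemany("INSERT INTO near_line.sales VALUES (?, ?, ?)",
                 [(1, 2005, 10.0), (2, 2006, 40.0)])

# A temporary view may span attached databases; it hides where rows live.
conn.execute("""
    CREATE TEMP VIEW all_sales AS
    SELECT id, year, amount FROM main.sales
    UNION ALL
    SELECT id, year, amount FROM near_line.sales
""")

rows = conn.execute(
    "SELECT id, year, amount FROM all_sales WHERE amount >= ? ORDER BY id",
    (30.0,),
).fetchall()
```

The `UNION ALL` view only works because both tables share one layout, which is the price of letting a single query span both environments.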

When the disk storage and the near line storage environments are managed in a tightly coupled manner, the database design must be identical and the units of storage must be managed together

Archival storage is always loosely coupled with other storage media

Archival assumptions

Over time:
- data will degrade
- metadata not stored directly with the data will be lost or otherwise corrupted
- related data (key/foreign key) will be lost and/or corrupted

Copy over

Occasionally it is good practice to copy over archived data to ensure the longevity and integrity of the data
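One way to make a copy-over protect integrity as well as longevity is to verify a checksum before and after the copy. The sketch below assumes a SHA-256 digest was recorded when the file was first archived; the function names and that recorded digest are assumptions for the example, not something the slides specify.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_over(source: Path, target: Path, recorded_digest: str) -> None:
    """Copy archived data to a new location, refusing to carry damage forward."""
    if sha256_of(source) != recorded_digest:
        raise ValueError(f"{source} no longer matches its recorded checksum")
    shutil.copyfile(source, target)
    if sha256_of(target) != recorded_digest:
        target.unlink()  # do not leave a bad copy behind
        raise OSError(f"copy of {source} to {target} failed verification")
```

If the source has already degraded, the first check fails and the copy is never made, so the problem surfaces at copy-over time rather than when the data is finally needed.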

While archival data is sitting around waiting to be used, use a spare machine to create passive indexes in anticipation of future needs

One issue relating to passive indexes is that they can grow to be larger than the archived data
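A rough sketch of a passive index, assuming archived records are simple dictionaries and an index maps each field value to the positions of the records that contain it. The record layout and function names are made up for the illustration. Because future needs are unknown, the sketch indexes every field, which is also what makes the index size comparable to the data itself.

```python
from collections import defaultdict


def build_passive_index(records, field):
    """Map each value of one field to the positions of matching records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return dict(index)


def build_all_indexes(records):
    """Index every field, since it is not known which one will be queried."""
    fields = list(records[0].keys()) if records else []
    return {field: build_passive_index(records, field) for field in fields}


archive = [
    {"customer": "A17", "region": "west", "year": 2001},
    {"customer": "B02", "region": "east", "year": 2001},
    {"customer": "A17", "region": "west", "year": 2003},
]

indexes = build_all_indexes(archive)

# A later lookup uses the index instead of scanning the archive.
hits = [archive[i] for i in indexes["region"]["west"]]

# One index entry exists per stored value, before counting the keys themselves.
index_entries = sum(len(p) for idx in indexes.values() for p in idx.values())
stored_values = sum(len(record) for record in archive)
```

With one index per field, `index_entries` equals `stored_values`, and the index keys add further space on top, which hints at how passive indexes can outgrow the archived data.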

Cross media storage manager (CMSM)

The software that manages the movement of data between disk storage and archival/near line storage

The CMSM determines when data is ready to be placed in archival/near line storage:
- by age
- by probability of access
- by usage patterns
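The three criteria above can be sketched as a simple policy check. Everything concrete here is an assumption for illustration: the `Partition` fields, the two-year age limit, the 5% probability floor, and the 90-day usage window are not taken from the slides or from any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy settings.
MAX_AGE = timedelta(days=730)
MIN_ACCESS_PROBABILITY = 0.05
MIN_RECENT_ACCESSES = 1


@dataclass
class Partition:
    name: str
    created: date
    access_probability: float   # estimated chance of future access
    accesses_last_90_days: int  # observed usage pattern


def reasons_to_move(p: Partition, today: date) -> list:
    """Return the criteria under which a partition is ready to leave disk."""
    reasons = []
    if today - p.created > MAX_AGE:
        reasons.append("age")
    if p.access_probability < MIN_ACCESS_PROBABILITY:
        reasons.append("probability of access")
    if p.accesses_last_90_days < MIN_RECENT_ACCESSES:
        reasons.append("usage patterns")
    return reasons
```

In this sketch a partition is a candidate for archival/near line storage as soon as the returned list is non-empty; a stricter policy could require all three criteria instead.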

Data can flow from the archival environment back to the disk environment if needed, through the cross media storage manager

Archival and near line data can also flow into the data mining/exploration warehouse environment

Granularity of data

In both the near line and the archival environment, data needs to be kept at the granular level. It is optional and sometimes useful to keep the data at the summary level.
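A small sketch of why the granular level is the one that must be kept: a summary can always be rebuilt from granular rows, but not the other way around. The record layout and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

# Granular rows: one per transaction (invented sample data).
granular = [
    {"account": "A", "month": "2008-01", "amount": 10.0},
    {"account": "A", "month": "2008-01", "amount": 5.5},
    {"account": "B", "month": "2008-01", "amount": 7.0},
]


def summarize(rows):
    """Derive an optional summary level: total amount per account and month."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["account"], row["month"])] += row["amount"]
    return dict(totals)


summary = summarize(granular)
```

Storing `summary` alongside the granular rows is a convenience for common questions; dropping the granular rows would make any question the summary did not anticipate unanswerable.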

Near line storage needs to be kept current/compatible with the operating system/DBMS

Archival data usually is not current with the operating system/DBMS

One of the most important uses of archival and near line data is that of passive security

In passive security, we look at the records of events to determine the extent of the damage or to find out how to prevent a disaster next time

Building passive indexes in the archival environment


Don't just let archival data sit there and wait for activity. Take an idle processor and constantly build indexes in anticipation of future, unknown needs. Then, when it comes time to use the archival environment, access will be fast and easy.
