DW 2.0 Workshop - Bill Inmon - Archival Sector
Dormant data
Performance is greatly hurt by keeping a lot of dormant data on disk storage
Removing dormant data from the data warehouse is the single most important thing the designer can do to improve performance
- Data that has a very high probability of access
- Data that has a low probability of access
- Data that needs to be kept regardless of probability of access
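The three categories above can be sketched as a small classifier. This is only an illustration: the `DataUnit` type, the `hot_threshold` value, and the mapping of each category to a storage tier are assumptions, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class DataUnit:
    """Hypothetical unit of warehouse data (a table, partition, or file)."""
    name: str
    access_probability: float  # estimated chance of access, 0.0 to 1.0
    must_retain: bool = False  # kept regardless of probability of access


def placement(unit: DataUnit, hot_threshold: float = 0.5) -> str:
    """Return an assumed storage tier for one unit of data."""
    if unit.access_probability >= hot_threshold:
        return "disk"       # very high probability of access
    if unit.must_retain:
        return "archival"   # kept regardless of probability of access
    return "near line"      # low probability of access
```

The threshold would normally come from observed query activity rather than a fixed constant.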
[Diagram: loosely coupled and tightly coupled near line storage]
Near line storage can be tightly coupled or loosely coupled with disk storage
[Diagram: two queries, two result sets]
When the near line storage environment and the disk storage environment are not tightly coupled, they must be queried separately
When the disk storage and the near line environment are not tightly coupled, the data base design can be independent and the data can be managed separately
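A minimal sketch of the loosely coupled case, assuming two independent SQLite databases stand in for the disk and near line environments. The `sales` table, its columns, and the `make_store` helper are hypothetical. Each environment is queried on its own and the application combines the two result sets.

```python
import sqlite3


def make_store(rows):
    """Create an independent in-memory store (stand-in for one environment)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_year INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


disk = make_store([(2008, 120.0), (2008, 80.0)])
near_line = make_store([(2001, 50.0), (2002, 70.0)])

# The same question has to be asked twice, once per environment.
sql = "SELECT sale_year, amount FROM sales WHERE amount >= ?"
disk_rows = disk.execute(sql, (60.0,)).fetchall()
near_line_rows = near_line.execute(sql, (60.0,)).fetchall()

# Merging the two result sets is the caller's job.
combined = sorted(disk_rows + near_line_rows)
```

Because the two stores share nothing, the near line schema could differ from the disk schema; only the merge step would need to change.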
[Diagram: a single query]
When the disk storage environment and the near line storage environment are tightly coupled, a single query can access both sets of data without knowing where the data is
© Copyright Inmon Consulting Services, 2008
When the disk storage and the near line storage environments are managed in a tightly coupled manner, the data base design must be identical and the units of storage must be managed together
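A rough sketch of the tightly coupled case, using SQLite's `ATTACH DATABASE` as a stand-in for the coupling layer (an assumption; the slides do not name a product). Both environments use an identical `sales` design, and a temporary view named `all_sales` (hypothetical) lets a single query read both without stating where the rows live.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # "main" plays the disk environment
conn.execute("ATTACH DATABASE ':memory:' AS near_line")

# Tight coupling: the table design is identical in both environments.
ddl = "CREATE TABLE {schema}.sales (sale_year INTEGER, amount REAL)"
conn.execute(ddl.format(schema="main"))
conn.execute(ddl.format(schema="near_line"))

conn.executemany("INSERT INTO main.sales VALUES (?, ?)", [(2008, 120.0)])
conn.executemany("INSERT INTO near_line.sales VALUES (?, ?)",
                 [(2001, 50.0), (2002, 70.0)])

# A TEMP view may span attached databases; callers query only the view.
conn.execute("""
    CREATE TEMP VIEW all_sales AS
    SELECT sale_year, amount FROM main.sales
    UNION ALL
    SELECT sale_year, amount FROM near_line.sales
""")

total = conn.execute("SELECT SUM(amount) FROM all_sales").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM all_sales").fetchone()[0]
```

The view only works because both tables have the same columns, which mirrors the requirement that the design be identical.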
Archival assumptions
Over time:
- data will degrade
- metadata not stored directly with data will be lost or otherwise corrupted
- related data (key/foreign key) will be lost and/or corrupted
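One way to guard against the last two assumptions is to archive each record as a self-describing bundle. The sketch below is only an illustration: the `archive_record` function, the JSON layout, and the sample order and customer rows are all hypothetical.

```python
import json


def archive_record(order, customer, column_meta):
    """Bundle a row with its metadata and the row its foreign key refers to."""
    return json.dumps(
        {
            "metadata": column_meta,            # stored directly with the data
            "data": order,
            "related": {"customer": customer},  # foreign key target kept alongside
        },
        sort_keys=True,
    )


column_meta = {
    "order_id": "integer key",
    "customer_id": "foreign key to customer",
    "amount": "decimal, USD",
}
order = {"order_id": 17, "customer_id": 4, "amount": "99.50"}
customer = {"customer_id": 4, "name": "Acme"}

record = archive_record(order, customer, column_meta)
restored = json.loads(record)
```

A reader who finds this record years later needs no external catalog or second table to interpret it.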
Copy over
Occasionally it is a good practice to copy over archived data to ensure the longevity and integrity of the data
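A copy over is only useful if the new copy is verified. The following is a minimal sketch, assuming archived data lives in ordinary files; `sha256_of` and `copy_over` are hypothetical helper names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_over(src: Path, dst: Path) -> str:
    """Copy an archive file and confirm the copy matches the original."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected:
        raise IOError(f"copy of {src} to {dst} failed verification")
    return expected
```

Keeping the returned digest with the archive gives the next copy over something to check the source against before copying.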
While archival data is sitting around waiting to be used, use a spare machine to create passive indexes in anticipation of future needs
One issue with passive indexes is that they can grow to be larger than the archived data itself
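The idea of a passive index, and the need to watch its size, can be sketched as follows. This is an assumed design, not the author's: it indexes every field of every record because future needs are unknown, and it uses serialized JSON length as a crude size measure. The names `build_passive_index` and `size_in_bytes` are hypothetical.

```python
import json
from collections import defaultdict


def build_passive_index(records):
    """Index every field=value pair, since future queries are unknown."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        for field, value in record.items():
            index[f"{field}={value}"].append(position)
    return dict(index)


def size_in_bytes(obj):
    """Crude size estimate: length of the JSON serialization."""
    return len(json.dumps(obj).encode("utf-8"))


records = [
    {"customer": "acme", "region": "west", "year": 2001},
    {"customer": "zenith", "region": "west", "year": 2002},
    {"customer": "acme", "region": "east", "year": 2002},
]

index = build_passive_index(records)

# Ratio of index size to data size; a value above 1.0 means the
# index has outgrown the archived data it describes.
ratio = size_in_bytes(index) / size_in_bytes(records)
```

Tracking the ratio over time is one way to decide when an index is no longer worth keeping.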
Cross media storage manager (CMSM)
The software that manages the movement of data between disk storage and archival/near line storage
The CMSM determines when data is ready to be placed in archival/near line storage:
- by age
- by probability of access
- by usage patterns
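The three criteria can be expressed as a simple rule check. The sketch below is an assumption about how such a rule might look: the `Partition` type, the default thresholds, and the 90-day usage window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Partition:
    """Hypothetical unit of storage the manager can move."""
    name: str
    created: date
    access_probability: float   # estimated, 0.0 to 1.0
    accesses_last_90_days: int  # observed usage


def ready_for_archive(p, today, max_age_days=730,
                      min_probability=0.05, min_accesses=1):
    """Return the list of criteria under which the partition qualifies."""
    reasons = []
    if (today - p.created).days > max_age_days:
        reasons.append("age")
    if p.access_probability < min_probability:
        reasons.append("probability of access")
    if p.accesses_last_90_days < min_accesses:
        reasons.append("usage pattern")
    return reasons
```

Returning the reasons rather than a plain yes or no leaves the policy (any criterion, or all of them) to the caller.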
Data can flow from the archival environment back to the disk environment if needed
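A minimal sketch of data flowing from the archival environment back to disk, again using attached SQLite databases as stand-ins. The `recall` function and the `sales` table are hypothetical, and copying rather than moving the rows is an assumption: the slides do not say whether the archive keeps its copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # "main" plays the disk environment
conn.execute("ATTACH DATABASE ':memory:' AS archive")
conn.execute("CREATE TABLE main.sales (sale_year INTEGER, amount REAL)")
conn.execute("CREATE TABLE archive.sales (sale_year INTEGER, amount REAL)")
conn.executemany("INSERT INTO archive.sales VALUES (?, ?)",
                 [(2001, 50.0), (2002, 70.0), (2002, 30.0)])
conn.commit()


def recall(conn, year):
    """Copy one year's rows from the archive back to disk; return the count."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO main.sales "
            "SELECT sale_year, amount FROM archive.sales WHERE sale_year = ?",
            (year,),
        )
    return conn.execute(
        "SELECT COUNT(*) FROM main.sales WHERE sale_year = ?", (year,)
    ).fetchone()[0]


recalled = recall(conn, 2002)
```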
Archival and near line data can also flow into the data mining/exploration warehouse environment
Granularity of data
In both the near line and the archival environment, data needs to be kept at the granular level. It is optional, and sometimes useful, to also keep the data at the summary level.
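The reason granular data is mandatory and summaries are optional is that a summary can always be rebuilt from the detail, but not the other way around. A small illustration follows, with hypothetical sample rows and a hypothetical `summarize_by_year` helper.

```python
from collections import defaultdict

# Granular level: one row per sale (date, region, amount).
granular = [
    ("2001-03-04", "west", 50.0),
    ("2001-07-19", "west", 25.0),
    ("2002-01-02", "east", 70.0),
]


def summarize_by_year(rows):
    """Derive a yearly summary from granular rows."""
    totals = defaultdict(float)
    for sale_date, _region, amount in rows:
        totals[sale_date[:4]] += amount
    return dict(totals)


by_year = summarize_by_year(granular)
```

The yearly totals cannot answer a question about regions or individual dates, which is why the granular rows have to be the ones that are kept.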
Near line storage needs to be kept current and compatible with the operating system and DBMS
One of the most important uses of archival and near line data is that of passive security
In passive security we examine the records of events to determine the extent of the damage, or to learn how to prevent a similar disaster next time
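As a sketch of the first use, determining the extent of the damage, the following scans archived event records after an incident. The event layout, the sample users, and the `extent_of_damage` function are hypothetical.

```python
from datetime import datetime

# Hypothetical archived event records.
events = [
    {"time": datetime(2008, 3, 1, 9, 0), "user": "alice",
     "action": "read", "table": "customer"},
    {"time": datetime(2008, 3, 1, 23, 40), "user": "mallory",
     "action": "read", "table": "customer"},
    {"time": datetime(2008, 3, 1, 23, 55), "user": "mallory",
     "action": "export", "table": "payment"},
]


def extent_of_damage(events, user, since):
    """Return the sorted tables a user touched at or after `since`."""
    return sorted(
        {e["table"] for e in events if e["user"] == user and e["time"] >= since}
    )


touched = extent_of_damage(events, "mallory", datetime(2008, 3, 1, 23, 0))
```

This kind of after-the-fact question is only answerable if the event records were archived at the granular level.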
Don't just let archival data sit there and wait for activity. Take an idle processor and constantly build indexes in anticipation of future, unknown needs. Then, when it comes time to use the archival environment, it will be fast and easy to access.