
Checklist for Sizing Your Oracle or SQL Server Data Mart

by

Charles Bartlett
Compaq Engineering
Manager - ActiveAnswers and Tools

One of the essential tasks in building a data warehouse or data mart infrastructure is
sizing and configuring the environment.
Compaq has invested extensively in engineering and testing Oracle and Microsoft SQL
Server sizers for data marts and data warehouses. The engineering thought process that
Compaq has used in building our sizing tools, which are available free on the Internet, is
detailed in the following article.
GENERAL REQUIREMENTS FOR A DATA MART IMPLEMENTATION
When evaluating architectures to support a data mart implementation you must consider
several factors that dictate the size and power of the platform required for the data mart.
Those factors that go into the decision are as follows:
1. The number, speed, and configuration of the disk storage required to satisfy the
following:
- Current disk space requirement to initially load the data mart from a data
warehouse. Data redundancy must be considered when calculating current
disk storage requirements. When building fact tables, dimension and factual
values will be duplicated in many fact tables. It is better to plan for this type
of data redundancy than to try to suppress it.
- How much growth will the data mart have in one year? The data in a data mart
typically covers a time period that is meaningful to the organization. Daily,
weekly, or monthly loads can increase the size of the data mart substantially
over time.
- How much space will be required for indexes? Fact tables with more than 3
or 4 dimensions could have 2 to 5 indexes for quick access. In many cases,
index space requirements may be as large as 1 to 3 times the space required
for the data.
- How much space is required for transaction log and archive log storage?
Transaction logs and archive logs are essential when restoring a
damaged database. Consider how large these logs should be allowed to grow
before they are archived to tape.
- How much disk space is required for temporary storage used for large sorts or
joins?
- How much disk space is required in case of hardware failure? For many data
marts, the data that was used to build them resides in a data warehouse, so they
can be rebuilt easily. For large data marts, the time it takes to rebuild from the
data warehouse might not be acceptable. Disk space for mirroring or a RAID
system might double the amount of space the data mart needs.
2. The amount of memory required to efficiently run the following:
- The Oracle or Microsoft SQL Server database management system,
- The operating system for the hardware platform,
- The load processing required to load the data mart,
- Any Extract, Transformation, and Load (ETL) tools that may be required to
perform the load,
- The number of simultaneous users accessing the data mart, and
- Any access tool products that run on the data mart platform, such as an application
server or web server.
3. Backup and recovery strategies for the data mart, transaction and archive logs.
4. Load requirements (parallel or sequential, snapshot updates or append). Some data
marts only require the use of a database load utility for loading the data mart. Many
data marts require the use of an ETL product to perform the summarization and/or
aggregations prior to loading the data mart. Not all ETL tools are required to reside
on the same platform as the data mart, but network issues may dictate that they do.
5. Many access tools require the existence of an application server or web server.
Although most access tools do not require the application or web server to reside on
the data mart platform, in some cases they do. If these application or web servers are
going to reside on the data mart platform, their resource requirements need to be
considered.
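
The disk storage factors in item 1 can be combined into a rough first estimate. The following is a minimal sketch only, not the method used by the Compaq sizers: the class name, the function name, and every ratio are hypothetical planning inputs, and the index ratio simply takes a value from the 1 to 3 times range quoted above.

```python
from dataclasses import dataclass


@dataclass
class DiskSizingInputs:
    """Hypothetical planning inputs; none of these values come from a sizing tool."""
    initial_data_gb: float   # initial load from the warehouse, including planned redundancy
    yearly_growth_gb: float  # growth expected from periodic loads over one year
    index_ratio: float       # index space as a multiple of data space (article: 1 to 3)
    log_ratio: float         # transaction/archive log space as a fraction of data (assumed)
    temp_ratio: float        # sort/join temporary space as a fraction of data (assumed)
    mirrored: bool           # True if mirroring doubles the space needed


def estimate_disk_gb(inputs: DiskSizingInputs) -> float:
    """Return a rough total disk requirement in gigabytes."""
    data_gb = inputs.initial_data_gb + inputs.yearly_growth_gb
    index_gb = data_gb * inputs.index_ratio
    log_gb = data_gb * inputs.log_ratio
    temp_gb = data_gb * inputs.temp_ratio
    total_gb = data_gb + index_gb + log_gb + temp_gb
    if inputs.mirrored:
        total_gb *= 2  # mirroring might double the space, as noted in item 1
    return total_gb


example = DiskSizingInputs(
    initial_data_gb=100.0,
    yearly_growth_gb=50.0,
    index_ratio=2.0,
    log_ratio=0.25,
    temp_ratio=0.5,
    mirrored=True,
)
print(estimate_disk_gb(example))  # 1125.0
```

In this example, 150 GB of data after one year grows to 1,125 GB of disk once indexes, logs, temporary space, and mirroring are included, which illustrates why each factor should be estimated separately.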

Specific Checklist for Sizing Your Oracle or Microsoft SQL Server Data Marts
Disk Storage and Memory Requirements
Several factors are considered when calculating the number and size of disk drives
required to support the data mart environment. Each factor should be addressed
individually when calculating disk space requirements. The factors that go into
calculating the number, size and configuration of the required disk space are as follows:
Data Staging Space - For many medium to very large data marts, temporary
disk storage is required for storing data before it is loaded into the data mart. The
sizing tools also provide the option to eliminate redundancy for this disk space when
using a RAID system. When using ETL tools, space for the tool and the load
files needs to be calculated into the system sizing.
Size of Fact and Dimensional Tables - The size in gigabytes of the fact and
dimensional tables should also be included when calculating the system
sizing. The disk space allocated for fact and dimensional tables should have
room for future growth.
Size of Indexes - Space for indexes should be calculated separately from the
data space requirements.
Temporary Tables Space - This space calculation needs to ensure that there is
enough sort space during data loads and large query extracts, and it should be in excess
of the source data allocations.

Transaction Log Space - Transaction log space needs to be calculated as
a percentage of the size of the source data. Data marts should also be
configured so that transaction logs reside on separate disks from those used by
database data. SQL Server data marts should be configured so that both
transaction logs and TempDB reside on separate disks from those used by
database data.
Storage Availability - Data mart storage availability options should be
chosen based on the requirements of your organization. For those
organizations that consider their data mart to be business critical or require
7x24 availability, a fault-tolerant option that uses RAID 1 or RAID 5 for data
redundancy, limiting outages caused by faulty storage devices, may be
preferred.
Network Interface Card Options - Communication to and from the data
mart platform is very important. Selection of either a 10/100 Ethernet
connection or an FDDI connection will depend on the size of the loads and the
number of users accessing the data mart.
Memory Requirements - All of the factors described above should be used
when calculating memory requirements.
Data Back-up and Restore Time - Restoration time for the data should be
taken into account when sizing Data Marts and Data Warehouses.
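
To turn a disk space figure into a drive count under the fault-tolerant options mentioned above, the following minimal sketch may help. It assumes mirrored pairs for RAID 1 and a single array with one drive's worth of parity for RAID 5. The function name and the drive capacity in the example are hypothetical, and a real configuration would usually split a large RAID 5 set across several arrays.

```python
import math


def drives_needed(required_gb: float, drive_gb: float, raid_level: int) -> int:
    """Return the number of drives needed to hold required_gb of usable space.

    Assumptions (not taken from any sizing tool):
      - RAID 1: every data drive has one mirror, so usable space is half the raw space.
      - RAID 5: one array, with capacity equal to one drive lost to parity,
        and a minimum of three drives.
    """
    if required_gb <= 0 or drive_gb <= 0:
        raise ValueError("required_gb and drive_gb must be positive")
    data_drives = math.ceil(required_gb / drive_gb)
    if raid_level == 1:
        return 2 * data_drives
    if raid_level == 5:
        return max(data_drives + 1, 3)
    raise ValueError("only RAID 1 and RAID 5 are handled in this sketch")


# Example: 500 GB of usable space on hypothetical 36 GB drives.
print(drives_needed(500, 36, raid_level=1))  # 28
print(drives_needed(500, 36, raid_level=5))  # 15
```

The gap between the two results shows the trade-off: RAID 1 needs nearly twice as many drives as RAID 5 for the same usable space.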
User Access to the Data Mart
The number of users and the type of queries will impact the hardware
platform requirements. It is critical to make sure that data mart users get the
query response time they expect.
The number of concurrent users of the data mart will play a big part in deciding
on the appropriate server or servers. Factors to consider in order to gain a better
understanding of user access requirements include:
- Number of concurrent users of the data mart,
- Average time in seconds for each query,
- Think time in seconds between queries,
- Estimated queries per hour for the system, and
- Estimated queries per hour per user.
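
The user access factors listed above are related by simple arithmetic. The sketch below assumes that each user repeatedly runs one query and then pauses for the think time, and that query time does not degrade as load increases. The function names are hypothetical, and this simplification is not necessarily how the sizing tools model user load.

```python
def queries_per_hour_per_user(avg_query_seconds: float, think_seconds: float) -> float:
    """Queries one user can issue per hour when alternating query and think time."""
    cycle_seconds = avg_query_seconds + think_seconds
    if cycle_seconds <= 0:
        raise ValueError("query time plus think time must be positive")
    return 3600 / cycle_seconds


def queries_per_hour_system(concurrent_users: int,
                            avg_query_seconds: float,
                            think_seconds: float) -> float:
    """Total queries per hour across all concurrent users."""
    return concurrent_users * queries_per_hour_per_user(avg_query_seconds, think_seconds)


# Example: 50 concurrent users, 30-second queries, 90 seconds of think time.
print(queries_per_hour_per_user(30, 90))    # 30.0
print(queries_per_hour_system(50, 30, 90))  # 1500.0
```

If measured response times rise under load, the per-user rate falls, so estimates like these are best treated as an upper bound.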

Conclusion
Correctly sizing a Data Mart or Data Warehouse is a complicated process that requires
many interdependent factors to be considered and properly weighted. The process can be
simplified and the risk reduced by applying knowledge of Data Mart and Data Warehouse
system performance gained through extensive laboratory testing and real-world customer
experience.
Compaq provides this type of knowledge and experience to customers and partners at no
charge through our Internet based ActiveAnswers sizing tools available at:
www.Compaq.com/ActiveAnswers
Simply click on the above address, spend a minute registering for this free service, and go
to the CRM / Business Intelligence subject area to use the Microsoft SQL Server and
Oracle Data Mart sizing tools.
