
Tasks you need to do to perform data warehouse performance testing

There are essentially three steps to carry out performance testing on the ETL processes: configure the test environment, run the bulk load processes, and run the incremental processes. Let's go through these three steps one by one:

1. First you need to configure the data warehouse test servers as per the production configuration. You create the data stores and set up the ETL processes; that is, you migrate the SSIS packages from development to the test server. Then you need to connect the test server to the test version of the source system (sometimes known as QA) and set up all necessary connectivity and access. You load the test data into the QA environment. You run the whole ETL process end to end using a small load to verify that the test environment has been configured properly.

2. You run the initial load ETL processes (be they T-SQL scripts or SSIS packages) to populate the warehouse and measure their performance, that is, when each task in the initial/bulk load ETL processes started and when each one completed. One way to measure the performance is by utilizing the ETL log and the timestamp column of the loaded table. Let me explain what I mean by the ETL log and the timestamp column. The ETL log is a table or file that the ETL system writes to. It is sometimes known as the audit table or file. In each ETL task or process, after each major step the ETL routine writes a row (if it is a table) or a line (if it is a file) in this ETL log, containing the essential statistics of the task, such as the data volume (number of records affected by the operation), the task name, and the current time. The timestamp column contains the time the row was loaded or modified.

Both the ETL log and the timestamp columns are useful to measure the performance of the incremental ETL and bulk load because you can determine the duration of each ETL task. They are also useful for troubleshooting the root cause or finding the bottleneck of the performance problem as well as for fine-tuning the data warehouse performance.
You can do this by finding the long-running/long-performing tasks and then trying to make these tasks more efficient. Both take a little bit of time to implement (probably 30 minutes to an hour for each task) because you need to program each ETL task to write into the log, but they are extremely useful because they help identify slow-running tasks (I have seen a 10 percent increase in performance), so they are worth the effort.

Another way to measure the performance is using SQL Server Profiler. You set up SQL Server Profiler to log the time each SQL statement is executed. You can also utilize SSIS logging. SSIS logs runtime events as they are executed in SSIS. For this you need to enable logging on each task in all SSIS packages in the bulk load processes.

3. Then you run the ETL batches (whether daily, hourly, or weekly) in a particular sequence according to the prepared test scenario and measure their performance (speed). In incremental ETL, you can also use the same methods/instruments as the bulk load to measure the performance: ETL log, timestamp, SQL Server Profiler, and SSIS logging. For real-time ETL, you need to measure the time lag as well; that is, how many minutes after the transaction was made in the source system did the data arrive in the data warehouse?

The previous steps are about performance testing for ETL. For applications, to measure performance you can script the actions, or you can measure them manually. Scripting the actions means you use software to record the actions that a user does when using the applications. You then play the recording back to simulate many people using the system simultaneously while recording the system performance. Measuring them manually means you use the system interactively. In application performance testing, you use the applications, whether they're reports, mining models, cubes, or BI applications, using both normal scenarios and exception scenarios. Normal scenarios are steps that the users perform in regular daily activities. Exception scenarios are steps the users perform that result in errors.
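The ETL log measurement described in step 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the log layout (task name, step, rows affected, time written), the task names, and the sample values are all hypothetical, and a real log would be read from the audit table or file rather than from a list in memory.

```python
from datetime import datetime

# Hypothetical ETL log rows: (task name, step, rows affected, time written).
# A real ETL routine would write these to an audit table or file.
ETL_LOG = [
    ("load_customer", "start", 0, "2024-01-01 01:00:00"),
    ("load_customer", "end", 50000, "2024-01-01 01:04:30"),
    ("load_sales", "start", 0, "2024-01-01 01:04:30"),
    ("load_sales", "end", 2000000, "2024-01-01 01:49:30"),
    ("load_product", "start", 0, "2024-01-01 01:49:30"),
    ("load_product", "end", 8000, "2024-01-01 01:50:00"),
]


def task_durations(log_rows):
    """Return {task: (seconds, rows)} from the first and last log entry of each task."""
    first, last, rows = {}, {}, {}
    for task, _step, affected, written in log_rows:
        ts = datetime.strptime(written, "%Y-%m-%d %H:%M:%S")
        first[task] = min(first.get(task, ts), ts)
        last[task] = max(last.get(task, ts), ts)
        rows[task] = rows.get(task, 0) + affected
    return {t: ((last[t] - first[t]).total_seconds(), rows[t]) for t in first}


def slowest_first(durations):
    """Sort tasks by duration, longest first, to surface tuning candidates."""
    return sorted(durations.items(), key=lambda kv: kv[1][0], reverse=True)


if __name__ == "__main__":
    for task, (secs, n) in slowest_first(task_durations(ETL_LOG)):
        print(f"{task}: {secs:.0f} s, {n} rows")
```

Sorting by duration puts the long-running tasks at the top of the list, which is where tuning effort is most likely to pay off.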

Performance should be separately tested for:
• Database: requires the definition of a reference workload in terms of number of concurrent users, types of queries, and data volume
• Front-end: requires the definition of a reference workload in terms of number of concurrent users, types of queries, and data volume
• ETL procedures: requires the definition of a reference data volume for the operational data to be loaded

Stress test: simulates an extraordinary workload due to a significantly larger amount of data and queries. Variables that should be considered to stress the system are:
• Database: number of concurrent users, types of queries, and data volume
• Front-end: number of concurrent users, types of queries, and data volume
• ETL procedures: data volume

The expected quality level can be expressed in terms of response time.
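As a rough sketch of how a stress test might check response time against an expected quality level, the Python below simulates a number of concurrent users with a thread pool. Everything here is an assumption made for illustration: `run_query` is a stand-in that only sleeps, and the user counts and the response-time threshold are hypothetical values, not figures from the text. In a real test, `run_query` would issue an actual query against the warehouse or front-end.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query_id):
    """Stand-in for a real warehouse query; replace with an actual database call."""
    time.sleep(0.01)  # hypothetical work
    return query_id


def timed_call(query_id):
    """Run one query and return its response time in seconds."""
    start = time.perf_counter()
    run_query(query_id)
    return time.perf_counter() - start


def stress_test(concurrent_users, queries_per_user, max_response_s):
    """Run the workload with one worker per simulated user and summarize response times."""
    total = concurrent_users * queries_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        times = list(pool.map(timed_call, range(total)))
    worst = max(times)
    return {
        "queries": total,
        "worst_s": worst,
        "average_s": sum(times) / len(times),
        "passed": worst <= max_response_s,  # expected quality level as a response-time limit
    }


if __name__ == "__main__":
    print(stress_test(concurrent_users=20, queries_per_user=5, max_response_s=2.0))
```

Raising `concurrent_users` or swapping in heavier queries varies the same stress variables listed above; data volume would be varied on the database side.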
