Practical Tips To Improve Data Load Performance and Efficiency
Joe Darlak
Comerit
In This Session ... Continued
• Find out how to enable version history to track code changes and
how to create reusable ETL logic to improve throughput and
reduce data load time.
• Get tips on when and how to use customer exits in DataSources
and variables to manage risk and reduce maintenance costs.
• Identify the challenges and benefits of semantic partitioning and
the importance of efficient data models.
• Take home a checklist to ensure your data models are optimally
designed.
What We'll Cover …
• How to leverage the BW architecture
• Data Modeling
• ETL – Extraction
• ETL – Transformation
• ETL – Load (Process Chains)
• Wrap-up
How to Leverage BW ETL Architecture 1
• Implement a Layered Scalable Architecture (LSA):
Create multiple data warehouse layers (e.g., a DSO layer)
How to Leverage BW ETL Architecture 2
• Illustration: Sample Dataflow Diagram
What We'll Cover …
• How to leverage the BW architecture
• Data Modeling
• ETL – Extraction
• ETL – Transformation
• ETL – Load (Process Chains)
• Wrap-up
Data Modeling 1: Overview
• Data modeling is still important!!!
BWA does not give license to design poorly
• Manage granularity
Do not add free text fields to cubes
• Think ahead
Semantic partitioning
Data Modeling 2: Defining Dimensions
• Use as many dimensions as possible
Separate common filter characteristics into their own dimension
Data Modeling 3: Semantic Partitioning
• What is it?
An architectural design to enable parallel data loading and
query execution
Partitioning criteria: Year, Region or Actual/Plan
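The routing principle behind semantic partitioning can be sketched outside BW: records are directed to separate targets by the partitioning criterion, so each partition can be loaded (and later queried) in parallel. A minimal Python sketch, with hypothetical record fields:

```python
def partition_by_year(records):
    """Route each record to the partition (cube) for its year."""
    partitions = {}
    for rec in records:
        partitions.setdefault(rec["year"], []).append(rec)
    return partitions

loads = [
    {"year": 2009, "amount": 100},
    {"year": 2010, "amount": 250},
    {"year": 2010, "amount": 50},
]
cubes = partition_by_year(loads)
# each partition can now be loaded in parallel; a MultiProvider
# unions them again at query time
print(sorted(cubes))     # [2009, 2010]
print(len(cubes[2010]))  # 2
```

In BW the partitions are separate InfoCubes combined under a MultiProvider; the sketch only illustrates the routing idea, not any SAP API.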
Data Modeling 4: Semantic Partitioning
• Benefits of Semantic Partitioning:
Reduction in BWA footprint (when partitioned by year)
Easier DB maintenance
Data Modeling 5: Semantic Partitioning
• Example: Semantic partitioning by year
Diagram: a MultiProvider spans a summarized History cube (all years) and five cubes partitioned by year (Current Year – 3 through Current Year + 1); the underlying inbound layer uses write-optimized objects (no SIDs) partitioned the same way.
Data Modeling 6: Data Retention Policy
• Develop and implement a data retention strategy to effectively
manage data as it ages
• Use a combination of approaches:
Aggregated history cubes
Near-line storage
Traditional archiving
Data deletion
What We'll Cover …
• How to leverage the BW architecture
• Data Modeling
• ETL – Extraction
• ETL – Transformation
• ETL – Load (Process Chains)
• Wrap-up
Extraction 1: Overview
• Focus on R/3 extraction
• SAP delivers over 1,000 pre-developed DataSources
Still doesn't cover all SAP extraction requirements
Extraction 2: To Enhance Or Not To Enhance?
• Enhance business content (create user exit) if:
DataSource is delta-enabled
Extraction 3: Coding Tips – Dynamic Calls
• Code the extractor user exits so that they call a dynamic
program per DataSource
Isolate the code per DataSource in a self-contained
program
Minimize the risk that a syntax error in the code for one
DataSource impacts extraction from all other DataSources
• Example
Program name = 'YBW' + <DataSource name>
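The dispatch pattern itself is language-neutral (in ABAP it is typically done with PERFORM ... IN PROGRAM ... IF FOUND, following the naming convention above). A hedged Python sketch of the same idea, where a registry stands in for the program lookup and the handler name and fields are hypothetical:

```python
def ybw_0fi_gl_4(data_package):
    """Enhancement logic isolated for DataSource 0FI_GL_4 (hypothetical)."""
    for row in data_package:
        row["enhanced"] = True
    return data_package

# registry stands in for the 'YBW' + <DataSource name> program lookup
HANDLERS = {"0FI_GL_4": ybw_0fi_gl_4}

def user_exit(datasource, data_package):
    """Single entry point: dispatch to the per-DataSource routine."""
    handler = HANDLERS.get(datasource)
    if handler is None:       # no enhancement program for this DataSource
        return data_package   # pass the data through unchanged
    return handler(data_package)
```

Because each DataSource's logic lives in its own unit, a defect in one handler cannot break extraction for the others.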
Extraction 4: User Exit: Program Calls
• Illustration: Sample dynamic program call
Extraction 5: Coding Tips – Field Symbols
• Performance consideration: where possible, use field symbols to
populate fields in the data package
The move costs of a LOOP ... INTO statement depend on the
size of a table line. The larger the line size, the longer the move
will take
By applying a LOOP ... ASSIGNING statement you can attach a
field-symbol to the table lines and operate directly on the line
contents
This is a much faster way to access the internal table lines
because their contents are never moved
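A Python analogy (not ABAP) of the difference: LOOP ... INTO copies every line into a work area and must write it back, while LOOP ... ASSIGNING attaches a field symbol so changes apply directly to the stored line.

```python
def enrich_with_copy(table):
    """'LOOP ... INTO' style: copy each line, change the copy, write it back."""
    for i, line in enumerate(table):
        work_area = dict(line)   # move cost grows with the line width
        work_area["flag"] = "X"
        table[i] = work_area     # explicit write-back (like MODIFY)

def enrich_in_place(table):
    """'LOOP ... ASSIGNING' style: operate directly on the line contents."""
    for line in table:           # 'line' references the stored dict
        line["flag"] = "X"       # no copy, no write-back

rows = [{"doc": 1}, {"doc": 2}]
enrich_in_place(rows)
```

Both produce the same result; the in-place version simply avoids per-line copy costs, which is the point of field symbols for wide data-package lines.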
Extraction 6: User Exit: Field Symbols
• Illustration: Sample use of field symbols
User Exit (without field-symbols):

    REPORT YBWZDS_AGR_USER.
    *********************************************************************
    * Form called dynamically must start with DOYBW + <DataSource>     *
    *********************************************************************
    FORM DOYBWZDS_AGR_USER
      TABLES C_T_DATA STRUCTURE ZOXBWD0001.
      ...
    ENDFORM.

User Exit (with field-symbols):

    REPORT YBWZDS_AGR_USER.
    *********************************************************************
    * Form called dynamically must start with DOYBW + <DataSource>     *
    *********************************************************************
    FORM DOYBWZDS_AGR_USER
      TABLES C_T_DATA STRUCTURE ZOXBWD0001.
      ...
    ENDFORM.
Extraction 7: Generic DataSources
• Improve extraction performance by creating delta-enabled generic
DataSources
• Simple:
By date
By timestamp
• Complex:
Pointers – ABAP techniques can be used to record an array of
pointers to identify new and changed records
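The simple case above can be sketched in a few lines, assuming the source table carries a change timestamp: each run extracts only rows changed after the stored delta pointer, then advances the pointer (field names here are hypothetical, not SAP structures).

```python
def extract_delta(source_rows, last_pointer):
    """Return rows changed since last_pointer, plus the new pointer value."""
    delta = [r for r in source_rows if r["changed_at"] > last_pointer]
    new_pointer = max((r["changed_at"] for r in delta), default=last_pointer)
    return delta, new_pointer

rows = [{"id": 1, "changed_at": 100}, {"id": 2, "changed_at": 205}]
delta, pointer = extract_delta(rows, 100)  # picks up only id 2
```

A real generic DataSource does this with the delta field (date or timestamp) plus a safety interval, so records committed late are not lost between runs.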
Extraction 8: Generic DataSources
• Illustration: Delta enabling a generic DataSource
Extraction 9: Architecture Tip
• Need to update an ODS or master data from multiple
sources?
Rather than enhancing business content, consider using
multiple DataSources to load a single BW object, as long as
the ODS or master data key is available in both DataSources
Decreases regression testing effort
What We'll Cover …
• How to leverage the BW architecture
• Data Modeling
• ETL – Extraction
• ETL – Transformation
• ETL – Load (Process Chains)
• Wrap-up
Transformation 1: Overview
• Common needs for transforming data:
Aggregation
Conversion
Validation
Filtering/deletion
Lookups/merging
Transformation 2: Use 3.x or 7.x Technology?
• Architecture decision:
Transfer Rules and Update Rules (3.x)?
Or Transformations (7.x)?
Transformation 3: Transfer Rules
• Architecture:
Only one InfoSource per DataSource
Transformation 4: Master Data Transfer Rules
• If an InfoObject requires a common transformation across the
warehouse, code it in the InfoObject definition
• The transfer routine will now be available in all transfer rules
where the InfoObject is used
You need to re-activate pre-existing transfer rules for a newly
added InfoObject routine to be recognized
• Allows for global conversion and/or validation of master data
Transformation 5: Master Data Transfer Rules
• Illustration: InfoObject Definition with Transfer Routine
Transformation 6: Architecture Tips
• Consider designing Level 1 ODS Objects to contain all possible
fields from source (if not LIS DataSource)
Minimize maintenance and downtime later to add fields and
populate in live environment
ODS Level 1 objects can then become the source for lookups
from other updates, thereby reducing redundant reads of
source tables in R/3
• Master data to multiple targets? Use flexible update rules
Default communication structures for InfoObjects are the
attribute tables—here you can define custom ones and use
update rules from them to multiple data targets
Transformation 7: Lookups
• Do not use single selects for lookups!
• For better performance:
Use start routines to read lookup data to an internal table
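The pattern in language-neutral form: one set-based read into an internal table in the start routine, then in-memory lookups per record. The names and fields below are hypothetical illustrations, not real BW structures.

```python
def start_routine(data_package, customer_table):
    """Resolve regions with one bulk read instead of one SELECT per record."""
    keys = {row["customer"] for row in data_package}
    # one ranged read (like SELECT ... FOR ALL ENTRIES) replaces
    # thousands of single selects; results land in an internal table
    lookup = {k: v for k, v in customer_table.items() if k in keys}
    for row in data_package:
        row["region"] = lookup.get(row["customer"], "")
    return data_package
```

The dictionary plays the role of a sorted/hashed internal table: each per-record lookup is a memory access rather than a database round trip.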
Transformation 8: Program Includes
• Use includes for all complex routine logic
• Access logic by using "perform" statements
• Increase portability of transformation logic
Use same read statements for multiple lookups
Transformation 9: Program Includes
• Illustration – Select into internal table
Start routine:

    FORM startup
      TABLES   MONITOR STRUCTURE RSMONITOR        "user defined monitoring
               MONITOR_RECNO STRUCTURE RSMONITORS "monitoring with record n
               DATA_PACKAGE STRUCTURE DATA_PACKAGE
      USING    RECORD_ALL LIKE SY-TABIX
               SOURCE_SYSTEM LIKE RSUPDSIMULH-LOGSYS
      CHANGING ABORT LIKE SY-SUBRC.               "set ABORT <> 0 to cancel update
    *
    *$*$ begin of routine - insert your code only below this line        *-*
    * fill the internal tables "MONITOR" and/or "MONITOR_RECNO",
    * to make monitor entries

      perform READ_USR02_TO_MEMORY_FOR_0BWTC_C02
        TABLES   MONITOR
                 DATA_PACKAGE
        USING    RECORD_ALL
                 SOURCE_SYSTEM
        CHANGING ABORT.

    * if abort is not equal zero, the update process will be canceled
    * ABORT = 0.
    *$*$ end of routine - insert your code only before this line         *-*
    ENDFORM.

Update include:

    ************************************************************************
    * INITIALIZATION (ONE-TIME PER DATA PACKET)
    * TO READ FROM DATABASE (ALL RECORDS FOR DATA PACKAGE)
    ************************************************************************
    *       FORM READ_USR02_TO_MEMORY_FOR_0BWTC_C02
    *-----------------------------------------------------------------------
    Form READ_USR02_TO_MEMORY_FOR_0BWTC_C02
      TABLES   MONITOR STRUCTURE RSMONITOR        "user defined monitoring
               DATA_PACKAGE STRUCTURE /BIC/CS80BWTC_C02
      USING    RECORD_ALL LIKE SY-TABIX
               SOURCE_SYSTEM LIKE RSUPDSIMULH-LOGSYS
      CHANGING ABORT LIKE SY-SUBRC.               "ABORT<>0 cancels update

    * REFRESH ALL INTERNAL TABLES.
      REFRESH: GT_USR02.
    * READ USR02 user data to memory
      select * into corresponding fields of table GT_USR02
        from USR02
        FOR ALL ENTRIES IN DATA_PACKAGE
        where BNAME = DATA_PACKAGE-TCTUSERNM
        order by primary key.

    * if abort is not equal zero, the update process will be canceled
      ABORT = 0.
    ENDFORM.                    "READ_USR02_TO_MEMORY_FOR_0BWTC_C02
Transformation 10: Program Includes
• Illustration – Include perform statements
Transformation 11: Update Rules - Results Tables
• Need to "create" data based on business logic
• Beware of hard-coding based on fields like document types
New doc types can require enhancements/corrections to hard-coded
logic
Such dependencies need to be communicated to business and
changes to logic need to become part of business process for
creating doc types
What We'll Cover …
• How to leverage the BW architecture
• Data Modeling
• ETL – Extraction
• ETL – Transformation
• ETL – Load (Process Chains)
• Wrap-up
Load 1: Process Chain Strategy
• Split loads by frequency and criticality
Separate daily loads from weekly, monthly, annual and ad-hoc
loads
Within each frequency group, identify the critical path, and
remove non-essential loads
• Design chains based on Dataflow dependencies
Remember the dataflow diagram?
Load 2: Process Chain Tips
• Process chains require explicit scheduling of all load events
previously handled by the InfoPackage
Use "Only to PSA" and "Subsequent Update" to reduce the number
of dialog processes spawned during loads
• If possible, schedule loads when users are off system
Can then delete indexes prior to loads and re-create after
Load 3: Use Decision Variants
• Decision variants allow flexibility in chain logic
• For example, if you need to load a cube only on a specific day of
the month, or month of the year:
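What the decision variant evaluates amounts to a plain calendar check that picks one branch of the chain. A hedged sketch (branch names are hypothetical):

```python
import datetime

def branch_for(run_date):
    """Pick the chain branch: load the monthly cube only on the 1st."""
    if run_date.day == 1:
        return "LOAD_MONTHLY_CUBE"
    return "SKIP_TO_NEXT_STEP"

print(branch_for(datetime.date(2010, 3, 1)))   # LOAD_MONTHLY_CUBE
print(branch_for(datetime.date(2010, 3, 15)))  # SKIP_TO_NEXT_STEP
```

In BW the condition is defined declaratively in the decision process type rather than coded, but the logic it expresses is exactly this kind of date test.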
Load 4: Performance Tips
• Reduce data packet transfer size if there is extensive use of
lookups in transfer/update rules
• Use multiple loads with non-overlapping selection conditions instead
of single loads
Some R/3 DataSources are neither delta-capable nor ODS-compatible,
so they support only full loads
Separate InfoPackages for actual and plan data, split by current and
future years, reduce the full load size
Set number of background processes accordingly
• Turn off consistency check for proven loads from proven sources
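Splitting one full load into parallel InfoPackages can be sketched as generating non-overlapping selection conditions, for example by version and year (the values below are illustrative):

```python
def build_infopackages(versions, years):
    """One InfoPackage selection per (version, year) pair - no overlap."""
    return [{"version": v, "year": y} for v in versions for y in years]

packages = build_infopackages(("ACTUAL", "PLAN"), (2009, 2010))
# four smaller full loads that can run in parallel
print(len(packages))  # 4
```

Because the selection conditions partition the data, the parallel loads never extract the same record twice; the number of background processes is then sized to match the number of packages.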
Load 5: Error Handling
• If source data is frequently problematic, use error handling
Splits error records into a separate PSA request or DTP to be
processed later without impacting the current load
Completes processing of the correct records
Load 6: Partitioning
• Define your partitioning strategy before go-live
Cubes must be empty before they are partitioned by transport
Load 7: Compression
• Compression should be scheduled regularly
• SAP recommendation for the number of partitions:
30-50 partitions (requests) per F-fact table
Load 8: Data Load Scheduling Strategy
• Will the loads be scheduled by external software?
Does the R/3 batch process use an external tool such as
AutoSys?
A consistent approach to batch scheduling could reduce overall
support and maintenance costs
• Will BW load success be monitored in BW or via the external tool?
If using an external tool, need to develop a mechanism to report
success/failure back to the tool
If using BW, consider adding text message notification steps to
process chains upon success/failure
Load 9: Data Load Scheduling Strategy
• Illustration: External scheduling process
Diagram: in BW, program ZBW_PC_LOAD triggers the process chain.
What We'll Cover …
• How to leverage the BW architecture
• Data Modeling
• ETL – Extraction
• ETL – Transformation
• ETL – Load (Process Chains)
• Wrap-up
7 Key Points to Take Home
• Intelligently managing data model granularity is critical to
performance—even with BW Accelerator!
• Implement Semantic Partitioning on every data model
• Define a data retention strategy early on to lower TCO
• Use dynamic programming for customer exits to simplify
maintenance and reduce risk of production impact
• Use field symbols in the start routine to transform data to achieve
optimal performance
• Use program includes to enable portability and version history for
your complex transformations
• Define process chains based on frequency and the critical path
Use decision variants to improve flexibility
Resources
• Jens Doerpmund, "Introducing the Layered, Scalable Architecture (LSA)
Approach to Data Warehouse Design for Improved Reporting and Analytic
Performance" (BI and Portals 2009)
• Jens Doerpmund, "Beyond the Basics of SAP NetWeaver Business Intelligence
Accelerator" (BI and Portals 2009)
• Ron Silberstein, "Data Modeling, Management, and Architectural Techniques
for High Data Volumes with SAP NetWeaver Business Intelligence" (BI and
Portals 2008)
• Joe Darlak, "Maximize the Capabilities, Efficiency and Performance of ETL
Logic in BW" (ASUG Forums, October 2004)
• Ralph Kimball, The Data Warehouse Toolkit (Wiley Publishing, 2002)
• Rajiv Kalra, "Conditional Execution" (BI Expert, March 2008)
• John Kurgen, "Use a New Process Type to Create Dynamic Process Chains" (BI
Expert, January 2008)
Your Turn!