Business Warehouse - Data Modeling


4-1

4-2
4-3
DSO for direct update: only the active data table exists

4-4
Changes within SAP NetWeaver 7.x

Instead of a key field request GUID, a key field request SID (domain RSSID) is now used within the activation
queue.
 Because of this structure change, no expensive join to the request SID table is necessary.

A package fetch is used instead of single-record fetches during the activation process (in addition, only one
loop over the activation queue is implemented now; for a restart no further loop is necessary because the
packages are stored temporarily in cluster tables).

The rollback of data packages is implemented in a different way: instead of rolling back serially and in one
transaction, the rollback is now done in parallel, with a single task for each data package.

4-5
4-6
Balance between package size and parallelism, runtime and memory consumption

 Check whether the flag for unique records can be set

 Check usage of 'attribute only' InfoObjects

Creating SIDs

 The creation of SIDs is time-consuming and may be avoided in the following cases:
 You should not set the indicator if you are using the DSO object as a pure data store; otherwise, SIDs are
created for all new characteristic values.
 If you are using line items (for example, document number, time stamp and so on)
as characteristics in the DSO object, you should mark these as 'Attribute only' in the characteristics
maintenance.
 SIDs are created at the same time if parallel activation is activated.
They are then created using the same number of parallel processes
as those set for the activation.

4-7
4-8
Be aware: in contrast to PSA tables, write-optimized DSO objects are not partitioned on database level.
Please see note 742243 for details on how to partition a table using NetWeaver basis technology.
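Purely as an illustration (the supported procedure and naming conventions are described in note 742243, and all table and column names below are hypothetical), a manually range-partitioned active table of a write-optimized DSO could look roughly like this on Oracle:

-- Hypothetical active table of a write-optimized DSO, manually
-- range-partitioned by calendar month (Oracle syntax).
CREATE TABLE "/BIC/AZWODSO00" (
  request    VARCHAR2(30)  NOT NULL,   -- technical key: request
  datapakid  NUMBER(6)     NOT NULL,   -- technical key: data package
  record     NUMBER(10)    NOT NULL,   -- technical key: data record
  calmonth   VARCHAR2(6)   NOT NULL,   -- partitioning column
  amount     NUMBER(17,2)
)
PARTITION BY RANGE (calmonth) (
  PARTITION p201001 VALUES LESS THAN ('201002'),
  PARTITION p201002 VALUES LESS THAN ('201003'),
  PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);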

4-9
If you set the 'attribute only' flag after you have already activated a DSO object that uses that specific
InfoObject, you have to reactivate the DSO once more; otherwise, SIDs are still determined for that InfoObject.

Internal measurements have shown a decrease of approximately 40% in activation runtime if the SID generation
flag is not set.

If the DSO object is used to store historical data without the need to access it online, the SID generation flag
should be disabled.

Reporting on DataStore objects is always possible – even if SID values aren't created during activation.
In this case they are created during query runtime for the requested records (slower query performance).

4 - 10
If you load unique data records (that is, data records which do not lead to updates)
into a standard DSO object, the load performance will improve
if you set the 'Unique data record' indicator in the DSO object maintenance.

If this flag is selected in BW 7.0, the flag "SIDs Generation upon Activation" is automatically selected as well
and cannot be changed by the user. If SID values are not desired, the DataStore object type "write-optimized"
has to be used.
This behavior was changed in 7.01.

Please read note 1144998 for issues related to the flag

4 - 11
This indicator is only relevant for write-optimized DataStore objects. The technical key for these objects in the
active table always consists of the fields Request, Data Package and Data Record. The InfoObjects that
appear in the maintenance dialog in the folder Key Fields form the semantic key of the write-optimized
DataStore object.

If this indicator is not set, the uniqueness is checked and a unique index with the technical name "KEY" is
generated for the InfoObjects in the semantic key.

If this indicator is set, the active table of the DataStore object can contain several records with the same key.

The primary key (index '0') then does not exist on the database.
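To make the key handling concrete, here is a minimal sketch with purely hypothetical table, field and index names (the real objects are generated by BW):

-- Hypothetical active table of a write-optimized DSO: REQUEST, DATAPAKID and
-- RECORD form the technical key; DOC_NUMBER and CALDAY stand for the semantic
-- key defined in the maintenance dialog.
CREATE TABLE "/BIC/AZWODSO01" (
  request    VARCHAR2(30) NOT NULL,
  datapakid  NUMBER(6)    NOT NULL,
  record     NUMBER(10)   NOT NULL,
  doc_number VARCHAR2(10) NOT NULL,
  calday     VARCHAR2(8)  NOT NULL,
  amount     NUMBER(17,2)
);

-- Generated only while the uniqueness-check indicator is NOT set: a unique
-- index with the technical name "KEY" on the fields of the semantic key.
CREATE UNIQUE INDEX "/BIC/AZWODSO01~KEY"
  ON "/BIC/AZWODSO01" (doc_number, calday);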

4 - 12
This indicator is only relevant for write-optimized DataStore objects and is new since BW 7.01

4 - 13
4 - 14
4 - 15
4 - 16
4 - 17
4 - 18
4 - 19
4 - 20
4 - 21
The BI extended star schema is different from the basic star schema. It is subdivided into a solution-dependent
part (InfoCube) and a solution-independent part (attribute tables, text tables, and hierarchy tables) that is also
shared among other InfoCubes.

The dimension attributes of the dimension tables are called characteristics.

The dimension attributes located in the master data table of a characteristic are called the attributes of the
characteristic.

The great challenge when designing a solution is to decide whether to store a dimension attribute in a
dimension table (and therefore in the InfoCube) or in a master data table.

Data is loaded separately into the master data tables (attribute tables), text tables, and hierarchy tables.

The SID table is the link between the master data and the dimension tables.

4 - 22
Cardinality refers to the number of distinct values in the column of a table.

Example:
 Column Sex can have only two values (M or F) – this is low cardinality
 A column holding a unique document number has a high cardinality
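A quick way to check the cardinality of a column is to compare the number of distinct values with the total number of rows; the table and column names below are only an example:

SELECT COUNT(DISTINCT sex) AS distinct_values,
       COUNT(*)            AS total_rows
FROM   customers;
-- A very small ratio of distinct_values to total_rows means low cardinality;
-- a ratio close to 1 (e.g. for a document number) means high cardinality.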

4 - 23
4 - 24
4 - 25
4 - 26
4 - 27
The flag for high cardinality should be set only in special cases, because the resulting index cannot be used for
a star join.

Such a special case exists when the cardinality of the characteristic is so large that the maintenance time
(dropping and rebuilding) of the corresponding bitmap index becomes too long.
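For illustration only (fact table, column and index names are hypothetical), the difference boils down to which type of Oracle index is created on the affected dimension-key column of the fact table:

-- Default (high-cardinality flag not set): a bitmap index, which can be
-- combined with the other bitmap indexes in a star join.
CREATE BITMAP INDEX "/BIC/FZSALES~010" ON "/BIC/FZSALES" (key_zsales1);

-- High-cardinality flag set: a B-tree index instead. It is cheaper to
-- maintain for very many distinct values, but cannot be used for the star join.
CREATE INDEX "/BIC/FZSALES~011" ON "/BIC/FZSALES" (key_zsales2);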

4 - 28
To reduce the probability of deadlocks, the RSADMIN parameter ORA_CUBEINDEX_INITRANS can help
(see SAP Notes 750033 and 1044110).

Please see note 1013912 for a FAQ regarding the high-cardinality flag.

4 - 29
4 - 30
4 - 31
4 - 32
4 - 33
See note 1287382 for details.

In case you change an InfoCube, please compress it completely before applying metadata changes.
Otherwise you might lose data (see note 647512).

4 - 34
See notes 647512 and 1287382 for details

If objects contain very large amounts of data and several fields have been added at the same time,
this may lead to very long runtimes if your BW system runs on the ORACLE database system
with a version lower than ORACLE 11.x.

4 - 35
4 - 36
4 - 37
4 - 38
4 - 39
See note 1172175 for details regarding runtime information of a DTP.

See SDN Blog https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/6428 for details

4 - 40
4 - 41
4 - 42
4 - 43
4 - 44
4 - 45
4 - 46
4 - 47
4 - 48
4 - 49
4 - 50
4 - 51
4 - 52
4 - 53
4 - 54
See notes 1137017, 1056259, 1063768 and 1246472 for details

4 - 55
4 - 56
4 - 57
A) For a user, the F4 help can occur in different situations in the reporting environment:
 A query is defined with input variables. When executing the query, a variable input screen appears, in
which the characteristic values can be chosen via the F4 help.
 After executing a query, you can select filter values in the 'Navigation Area' for the involved
characteristics (and navigational attributes). This so-called dynamic filter value selection is done via the F4
help.
 Already when defining a query in the Query Designer, a characteristic/navigational attribute can be
restricted. This so-called fixed filter value selection is done via the F4 help.
 As of BW release 3.x, the values for navigational attributes are retrieved in a special way to get the
correct values, i.e. the attribute's master data table has to be matched with the so-called X table of the
characteristic to which the attribute is defined (further explanation – especially concerning performance –
on one of the following slides).

B) How the F4 help works in the different situations can be defined by settings specified in:
 the InfoObject maintenance (transaction RSD1) or
 the InfoProvider maintenance (transaction RSDCUBE -> tab 'Characteristics' -> main menu:
'Extras' – 'Structure-Specific InfoObject Properties').

There are differences (regarding which F4 settings are taken into account) depending on whether the Web or
the BEx Analyzer is used (see the following slides).

4 - 58
In the InfoObject maintenance (transaction RSD1) on the Business Explorer tab strip, the setting Query Def.
Filter Value Selection decides which values are offered in the F4 help when restricting a characteristic in
the Query Designer during query definition. The setting Values in Master Data Table is the default.

Values in Master Data Table considers all master data values which are visible at the Master data/texts tab
strip in transaction RSD1 of this characteristic.

Only values in InfoProvider considers all key figures of the InfoProvider and not only those of one query.
These values can then be found in the relevant dimension table of the InfoProvider (in case the InfoProvider
has a dimension table!).

4 - 59
In the InfoObject maintenance (transaction RSD1) on the Business Explorer tab strip, the setting Query
Execution Filter Val. Selectn decides which values are offered in the F4 help when selecting filter values
for a characteristic in the navigation area after a query was executed. The setting Only Posted Values for
Navigation is the default, which means that the F4 help of an executed query offers those values that are
relevant for this query's definition (e.g. only posted characteristic values of the key figures chosen in the
Query Designer for this query).

Only values in InfoProvider considers all key figures of the InfoProvider and not only those of one query.
These values can then be found in the relevant dimension table of the InfoProvider (in case the InfoProvider
has a dimension table!).

Values in Master Data Table considers all master data values which are visible at the Master data/texts tab
strip in transaction RSD1 of this characteristic.

4 - 60
With the provider-specific properties you can overrule the settings of the InfoObject maintenance (RSD1).
Attention: this setting is only relevant for variable selection screens, but NOT for filter value selection
after query execution and not for the characteristic restriction during query definition in the Query Designer.

The settings can be defined as follows:

Administrator Workbench (RSA1) – choose 'InfoProvider' -> Change
Choose the characteristic – 'Provider-Specific Properties'

4 - 61
Performance of the F4 help might be tuned by aggregates. Using transaction RSRT (with HTML display),
the 'Execute+Debug' function with the 'Show aggregates used' flag can be used to find the best possible
aggregate – also for the F4 help.

4 - 62
Performance of the F4 help might be tuned by aggregates. Using transaction RSRT (with HTML display),
the 'Execute+Debug' function with the 'Show aggregates used' flag can be used to find the best possible
aggregate – also for the F4 help.

4 - 63
For more detailed information, please check note 581079 directly.

4 - 64
4 - 65
• This slide shows the indexing scheme of the standard InfoCube fact tables.
Both fact tables do not have primary indexes. However, the E fact table has a so-called "P-index" (e.g.
/BIC/E<Cubename>~P). It comprises all dimension columns of the fact table, but it does not enforce a
unique constraint. This is the only difference from the originally existing primary index.
• On the E fact table, the P-index is specifically designed to support the InfoCube compression process.
During compression, BW has to check the E fact table for existing dimension-ID combinations in order to
decide whether the currently processed record (which originates from the F fact table) is to be inserted
(when its dimension-ID combination does not yet exist in the E fact table) or updated (otherwise). A missing
P-index can therefore cause a very considerable decrease in compression performance (a conceptual sketch
follows below).
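Conceptually (this is not the SQL that BW actually generates, and all object names are hypothetical), the decision the compression step has to make per record resembles a merge, and the P-index over all dimension columns is what makes the lookup side of it fast:

MERGE INTO "/BIC/EZSALES" e
USING (SELECT key_zsalest, key_zsales1, amount
       FROM   "/BIC/FZSALES") f
ON    (    e.key_zsalesp = 0                -- package dimension collapsed to 0
       AND e.key_zsalest = f.key_zsalest
       AND e.key_zsales1 = f.key_zsales1)
WHEN MATCHED THEN
  UPDATE SET e.amount = e.amount + f.amount   -- dimension-ID combination exists
WHEN NOT MATCHED THEN
  INSERT (key_zsalesp, key_zsalest, key_zsales1, amount)
  VALUES (0, f.key_zsalest, f.key_zsales1, f.amount);   -- new combination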

4 - 66
4 - 67
• Additional bitmap index on the F fact table
• Applies also to transactional InfoCubes
• Named: /BIC/F...~900

• This slide shows the indexing scheme of the fact tables of a partitioned InfoCube. The significant point here
is an additional bitmap index on the column of the F fact table that corresponds to the partitioning column
(the time characteristic the InfoCube is partitioned by) of the E fact table (see red box). This index's name is
"900", i.e. "/BIC/F...~900" on the database (see the sketch after this list).
• The reason behind that index is to support restrictions that are likely to exist on this column. In the case of a
partitioned E fact table, the BW SQL generator translates time restrictions into restrictions on the
partitioning column whenever possible. This allows the Oracle query optimizer to prune the query to the
relevant partitions of the E fact table. The "900" index allows the F fact table to benefit from those
additional (and redundant) restrictions too.
• There is no difference between standard and transactional InfoCubes. However, in the case of
transactional InfoCubes it is assumed that this is the only bitmap index on the F fact table; otherwise,
transactional write accesses would result in deadlock situations. Please refer to the discussion of
deadlocking on the transactional InfoCube slide.
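As a sketch with hypothetical names, and assuming the InfoCube is partitioned by 0CALMONTH so that its SID column appears in both fact tables, the "900" index is simply a bitmap index on that column of the F fact table:

CREATE BITMAP INDEX "/BIC/FZSALES~900"
  ON "/BIC/FZSALES" (sid_0calmonth);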

4 - 68
DB2/400
 No index on every SID column (additional indexes can be helpful, though)
MSS
Individual index on all SID columns
 No common index over all SID columns

Oracle
Individual index on all SID columns

A dimension table has the primary index and the following secondary indexes:
 Index over all SID columns, supporting loading
 Index on each individual SID column except the first one, supporting querying. The first SID column does not
need a single-column index because it is the first column in the index over all SID columns.
The order of the fields in the dimensions is determined by the order of the InfoObjects in the InfoCube, which
depends on the transactional data load.
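A minimal sketch of this scheme for a hypothetical dimension table with three characteristics (all names and index numbers are invented for illustration):

CREATE TABLE "/BIC/DZSALES1" (
  dimid        NUMBER(10) NOT NULL PRIMARY KEY,
  sid_material NUMBER(10) NOT NULL,
  sid_plant    NUMBER(10) NOT NULL,
  sid_batch    NUMBER(10) NOT NULL
);

-- Index over all SID columns: supports the DIMID lookup during loading.
CREATE INDEX "/BIC/DZSALES1~010"
  ON "/BIC/DZSALES1" (sid_material, sid_plant, sid_batch);

-- Single-column indexes on every SID column except the first one: support
-- querying; the first column is already the leading column of the index above.
CREATE INDEX "/BIC/DZSALES1~020" ON "/BIC/DZSALES1" (sid_plant);
CREATE INDEX "/BIC/DZSALES1~030" ON "/BIC/DZSALES1" (sid_batch);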

4 - 69
4 - 70
You can also change the index type in SE14 (see note 383325).

An index can improve the following operations:
 Select … where <table fields> = <values>
 Update … where <table fields> = <values>
 Delete … where <table fields> = <values>
 Table joins on <table1.field1> = <table2.field2>

4 - 71
4 - 72
4 - 73
4 - 74
4 - 75
Here a typical star query execution plan for a BW query is shown. The single column bitmap indexes are used
to propagate query restrictions to the fact table, thereby minimizing the amount of relevant facts at a very early
stage of query processing. This is the reason why those operations are found in the lower part of the
execution tree.

The execution plan on this slide is a typical example. Almost every BW query should show this kind of
execution pattern. Obviously, it highly depends on using bitmap indexes. If those indexes are not employed for
whatever reason (they do not exist, there are no restrictions, Oracle's query optimizer chooses an execution
plan without bitmap indexes etc.) then you will typically experience very high run times.

In the case of transactional InfoCubes and single column B-tree indexes, Oracle is supposed to transform the
relevant B-tree indexes into bitmap indexes at run time ("B-tree to bitmap conversion"). This is an additional
computational step which consumes resources, thus star queries on B-tree indexed fact tables are less
efficient than similar queries supported by bitmap indexes.
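To make the shape of such a star query tangible, here is a heavily simplified example with invented table and column names; the restrictions on the time dimension and on the material SID are what the bitmap indexes on the fact table's dimension-key columns get applied to:

SELECT   t.sid_0calmonth,
         s."/BIC/ZMATERIAL",
         SUM(f.amount) AS amount
FROM     "/BIC/FZSALES"    f,
         "/BIC/DZSALEST"   t,
         "/BIC/DZSALES1"   d,
         "/BIC/SZMATERIAL" s
WHERE    f.key_zsalest   = t.dimid
AND      f.key_zsales1   = d.dimid
AND      d.sid_zmaterial = s.sid
AND      t.sid_0calmonth BETWEEN 201001 AND 201012     -- time restriction
AND      s."/BIC/ZMATERIAL" IN ('MAT-01', 'MAT-02')    -- value restriction
GROUP BY t.sid_0calmonth, s."/BIC/ZMATERIAL";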

4 - 76
4 - 77
This is a simple example of how a reporting scenario can be partitioned using both partitioning concepts.

Usage of MultiProvider (Logical Partitioning): You might want to report your sales data using one InfoCube.
That InfoCube could be built as a MultiProvider which is based on two identical basic InfoCubes. The latter
contain disjoint sets of data, for example, one from southern sales regions and another from northern regions
(as shown on this slide). This scenario results in denser and therefore more efficient InfoCubes, as opposed to
combining the southern and northern regions into one sparse InfoCube.

When the reporting scenario is to be extended, use a MultiProvider as central interface between query
definition and basic InfoProviders. When another InfoProvider is added to the MultiProvider definition, the
technical name of a query based on the MultiProvider remains unchanged.

Use a MultiProvider to reduce the size of the basic InfoProviders.

Further advantages: parallel access to underlying basic InfoProviders, load balancing, resource utilization,
query tuning.

Table Partitioning: Each of the two basic InfoCubes could be partitioned on the database level. That means
that the fact tables inside the respective star schema (which physically represents an InfoCube) are
partitioned. This is indicated by those horizontal lines splitting the fact tables into various partitions/fragments.
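On the physical side, and purely as a sketch with hypothetical names (BW generates the actual DDL), the range partitioning of one of those E fact tables by 0CALMONTH could look like this:

CREATE TABLE "/BIC/EZSALES_SOUTH" (
  key_zsalesp   NUMBER(10) NOT NULL,
  key_zsalest   NUMBER(10) NOT NULL,
  key_zsales1   NUMBER(10) NOT NULL,
  sid_0calmonth NUMBER(10) NOT NULL,   -- partitioning column
  amount        NUMBER(17,2)
)
PARTITION BY RANGE (sid_0calmonth) (
  PARTITION p201006 VALUES LESS THAN (201007),
  PARTITION p201012 VALUES LESS THAN (201101),
  PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);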

4 - 78
4 - 79
Organizational aspects could relate to the quality of data, the importance of data, or the organization itself.
Typical partitioning criteria are organizational units, source systems, regions, etc.

4 - 80
Advantages:

If a query filters on a constant that was set in the provider-specific properties, the OLAP processor can identify
at an early stage which InfoCubes have to be accessed.

During rollup, aggregates only need to be maintained for those aggregates that belong to the current period.

Loading the data into the right InfoCube can be done in the start routine by checking the entries in table
RSDICHAPRO.

Disadvantage:

Inflexible, as the metadata model has to be adjusted whenever a new provider has to be added (rolling window).

4 - 81
4 - 82
4 - 83
For details about RSDDK_AGGRCOMP_COPY, see SAP Note 608814.

4 - 84
4 - 85
 Partition early and often. Partition InfoCubes in the development system at the time of initial design

 Aggregates are also partitioned like the InfoCube if they contain the partitioning characteristic. Even if the
InfoCube is never compressed and the aggregates are, partitioning can improve performance.

 Partitioning must be done before loading any data

 Use the time characteristic that corresponds with most of the reporting scenarios for an InfoCube

 Don't worry about space in the database: partitioning does not allocate empty space.
 Instead, it determines a way of grouping data as it is loaded into the fact table.

4 - 86
4 - 87
4 - 88
4 - 89
If you have partitioned an InfoCube by an unsupported characteristic such as 0CALWEEK or 0CALDAY, the
repartitioning toolbox does not work.

We offer a consulting solution which at least makes it possible to add partitions.

Regularly compress your InfoCube even if there is a BWA in place. See note 1335666 for details.

4 - 90
In addition, an MDC Advisor tool is integrated into the DBA Cockpit (transaction DBACOCKPIT).

If the RSADMIN parameter DB6_MDC_FOR_PSA is set to YES, MDC is activated for the following tables:

PSA table

Activation queue table

Change log table
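For orientation, multidimensional clustering on DB2 for LUW is declared directly in the table DDL; the following is only a rough sketch with an invented PSA-like table, not the DDL that BW generates:

CREATE TABLE "/BIC/B0001234000" (
  request   VARCHAR(30)   NOT NULL,
  datapakid INTEGER       NOT NULL,
  record    INTEGER       NOT NULL,
  calday    VARCHAR(8),
  amount    DECIMAL(17,2)
)
ORGANIZE BY DIMENSIONS (request);   -- rows are clustered in blocks per request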

4 - 91
Collect SQL statements on the table of interest and let the MDC Advisor make suggestions.
 MDC suggestions are only made for fact tables and active tables of DSOs
 Queries on these tables have to be collected
 The MDC Advisor assumes a maximum growth of 33% of the underlying disk space for the analyzed table
 The partitioning column is also taken into account

4 - 92
We strongly recommend using partitioning if your DBMS supports it. Queries can access partitions in parallel,
loading into a partitioned PSA is faster, and maintenance is improved (fast deletion of partitions). Be sure to
compress regularly to make use of the partitioning of the E fact table. See SAP Note 385163 (Partitioning on
ORACLE since BW 2.0) for partitioning information on ORACLE. Use report SAP_DROP_EMPTY_FPARTITIONS on
ORACLE to delete unused F fact table partitions. See SAP Notes 430486 (Overview/repair of F fact table of
a BW InfoCube) and 590370 (Too many uncompressed requests in F fact table) for more information.

(Range) partitioning is available for ORACLE, DB2/OS390 and Informix. DB2/UDB supports hash partitioning.
Avoid too many partitions (usually more than a few thousand) in one table, i.e. avoid too many requests in your
F fact table and too granular time characteristics for the E fact table partition criterion.

Check note 609164 and the entries in table RSDRDREQDELTAB.
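A quick way to see how many partitions (roughly, uncompressed requests) an F fact table currently has on ORACLE – the table name below is only an example:

SELECT table_name, COUNT(*) AS partition_count
FROM   user_tab_partitions
WHERE  table_name = '/BIC/FZSALES'
GROUP BY table_name;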

4 - 93
4 - 94
Especially in Integrated Planning (IP), using hierarchies has a huge effect on overall performance.

4 - 95
Furthermore, the following SAP Notes deal with hierarchies and performance:
 738098 Performance problems with hierarchies: deactivation of the function that adds the "remaining node"
to a hierarchy
 654243 Hierarchy maintenance
 520684 Runtime error EXPORT_BUFFER_NO_MEMORY

Further performance problems may be caused by degenerated indexes on the inclusion tables
/BI0/I<InfoObject> or /BIC/I<InfoObject>. The efficient evaluation of these tables during query or aggregate
processing relies on proper indexing of these tables.
See SAP Note 323090 for further details: Performance problems due to degenerated indexes.

For RSDRHLRUBUFFER, see SAP Note 584216.

4 - 96
4 - 97
Model the Calculated Key Figure in the data model and load the data during load time (persistent), or define it
in the query and calculate the "fact" at query runtime (dynamic). The former option requires the Calculated Key
Figure as part of the data model and staging engine, with the calculation accomplished by a routine in the
update rules; this makes the fact table larger over time. The latter option requires the definition of the
Calculated Key Figure as part of the query definition; calculation during query runtime degrades query
performance depending on the complexity of the Calculated Key Figure and how often the query is run. The
decision lies in balancing query performance against fact table size.

When currencies are converted during data update, the converted values are stored physically in the chosen
currency, resulting in faster processing of reports. However, the original value and currency are lost. When
currencies are converted during reporting, analysis can be done for various target currencies and the original
transaction currency is not lost. However, the conversion has to be done repeatedly, resulting in lower query
performance.

Usually it is quite obvious how to distinguish attributes and facts, but some attributes can be confusing. Prices
are a good example. From one perspective, price describes the article just as the manufacturer attribute does,
and therefore it seems that it should be in the master data table. In this case, model "price" as an attribute in a
master data table. Using a formula variable with "replacement path", the price navigational attribute can be
used to calculate, for example, "net sales = price * number of pieces sold". This allows calculations within
queries using this formula variable.

From another perspective, the price changes continuously over time, which means it does not make sense to
calculate discounts on the basis of sales amount and quantity in a fact record using the current price from the
master data table for fact records that are, for example, one year old. In this case the discount has to be
calculated during load time in an update rule, using a lookup of the current price from the master data table.

A time-dependent navigational attribute would be acceptable if the price changed only occasionally. Also
consider a categorical dimension ("high", "medium" or "low" price) if it is to be used for analysis. If promotion
analysis is a necessity, then do not model price as a navigational attribute.

4 - 98
The expected result set of the query should be kept as small as possible. It is significantly quicker when the
query first shows highly aggregated results that can then be drilled down with free characteristics. Move as
few characteristics into the columns and rows as possible, so that the number of cells sent to the frontend is
kept as small as possible.

Define calculated and restricted key figures on the InfoProvider (globally) instead of in the query (locally) to
enable reusability.

A Web application returns query results quicker than the BEx Analyzer. In the BEx Analyzer the transfer time
also increases much faster with a growing data set than in the Web application.

InfoCubes and MultiProviders are optimized for aggregated requests. A user should only report in a very
restricted way on ODS objects and InfoSets; in other words, only very specific records should be read, with
little aggregation and navigation.

4 - 99
4 - 100
The query is split into cumulative and non-cumulative "subqueries".

Cumulative query:
1. Filtering of relevant key figures and selection conditions (preprocessor)
2. Reading of cumulative key figures from the database and return of the values to the OLAP processor
3. Can be processed in parallel within the SAP server when executed on a MultiProvider

Non-cumulative query:
1. Filtering of relevant key figures and selection conditions (preprocessor)
2. Reading of the deltas for the non-cumulative key figures
3. Filtering of relevant key figures and selection conditions (preprocessor)
4. Reading of the reference point for the non-cumulative key figures
   If there are requests which are not compressed, the computation of the reference point has to consider
   these requests; the actual value is then not a single value (more expensive!)
5. For every entry in the validity table: computation of the values for the specified points in time
   (postprocessor) and return of the values to the OLAP processor
6. Cannot be processed in parallel within the SAP server when executed on a MultiProvider
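As a simplified view of steps 4 and 5 above (ignoring the validity table and uncompressed requests), the value of a non-cumulative key figure at a requested point in time t can be read as the reference point minus all delta movements recorded after t:

\text{stock}(t) \;=\; \text{reference point} \;-\; \sum_{t \,<\, t_i \,\le\, t_{\text{now}}} \Delta(t_i)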

4 - 101
The query is split into cumulative and non-cumulative "subqueries".

Cumulative query:
1. Filtering of relevant key figures and selection conditions (preprocessor)
2. Reading of cumulative key figures from the database and return of the values to the OLAP processor
3. Can be processed in parallel within the SAP server when executed on a MultiProvider

Non-cumulative query:
1. Filtering of relevant key figures and selection conditions (preprocessor)
2. Reading of the deltas for the non-cumulative key figures
3. Filtering of relevant key figures and selection conditions (preprocessor)
4. Reading of the reference point for the non-cumulative key figures
   If there are requests which are not compressed, the computation of the reference point has to consider
   these requests; the actual value is then not a single value (more expensive!)
5. For every entry in the validity table: computation of the values for the specified points in time
   (postprocessor) and return of the values to the OLAP processor
6. Cannot be processed in parallel within the SAP server when executed on a MultiProvider

4 - 102
4 - 103
4 - 104
4 - 105
4 - 106
4 - 107
4 - 108
4 - 109
4 - 110
