Datawarehouse Intrviewqstions

c

A Data warehouse is a repository of integrated information, available for queries and analysis.
Data and information are extracted from heterogeneous sources and stored in a database for easy
and more efficient way to run queries and create reports.
A data warehouse is a logical collection of information gathered from many different operational
databases used to create business intelligence that supports business analysis activities and
decision-making tasks, primarily, a record of an enterprise¶s past transactional and operational
information, stored in a database designed to favour efficient data analysis and reporting (especially
OLAP)´.
c
Data mining is a process of extracting hidden trends within a datawarehouse. For example an
insurance data warehouse can be used to mine data for the most high risk people to insure in a
certain geographial area.
c

Data Marts are subset of the corporate-wide data that is of value to a specific group of users.
There are two types of Data Marts:
1.Independent data marts ± sources from data captured form OLTP system, external providers or
from data generated locally within a particular department or geographic area.
2.Dependent data mart ± sources directly from enterprise data warehouses.
c
OnLine Transactional Processing.
c
OnLineAnalatical Processing.
c

Main Differences between OLTP and OLAP are:-
1. User and System Orientation
OLTP: customer-oriented, used for data analysis and querying by clerks, clients and IT
professionals.
OLAP: market-oriented, used for data analysis by knowledge workers( managers, executives,
analysis).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and
aggregation, stores information at different levels of granularity to support decision making process.
3. Database Design
OLTP: adopts an entity relationship(ER) model and an application-oriented database design.
OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an
organization; integrates information from many organizational locations and data stores
c

Real-time data warehousing captures business activity data as it occurs. As soon as the business
activity is complete and there is data about it, the completed activity data flows into the data
warehouse and becomes available instantly. In other words, real-time data warehousing is a
framework for deriving information from data as the data becomes available.
c

u Gathering business requirements.
u Identifying Sources
u Identifying Facts
u Defining Dimensions
u Define Attributes
u Redefine Dimensions & Attributes
u Organize Attribute Hierarchy & Define Relationship
u Assign Unique Identifiers
u Additional conventions:Cardinality/Adding ratios
c !"

Cubes are logical representation of multidimensional data.The edge of the cube contains
dimension members and the body of the cube contains data values.
c #
A: Linked cube is a cube, in which a sub-set of the data can be analysed into greater detail. The
linking ensures that the data in the cubes remain consistent.
c
$%

Both differed in the concept of building the datawarehosue.
Kimball views data warehousing as a constituency of Data marts. Data marts are focused on
delivering business objectives for departments in the organization. And the data warehouse is a
conformed dimension of the data marts. Hence a unified view of the enterprise can be obtain from
the dimension modeling on a local departmental level.
Inmon beliefs in creating a data warehouse on a subject-by-subject area basis. Hence the
development of the data warehouse can start with data from the online store. Other subject areas
can be added to the data warehouse as their needs arise.
i.e.,
Kimball±First DataMarts±Combined way ²Datawarehouse
Inmon²First Datawarehouse±Later²-Datamarts
c &

'

Hierarchies are logical structures that use ordered levels as a means of organizing data. A
hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy
might aggregate data from the month level to the quarter level to the year level. A hierarchy can also
be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at
lower levels aggregate into the data values at higher levels. A dimension can be composed of more
than one hierarchy. For example, in the product dimension, there might be two hierarchies±one for
product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to
enable you to drill down into your data to view different levels of granularity. This is one of the key
benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For
example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at
the next higher level is its parent, and values at the next lower level are its children. These familial
relationships enable analysts to access data quickly.
c

(!)

(!))
* Used for OLTP systems
* Highly Normalized
* Difficult to understand and navigate
* Difficult to extract and solve complex problems
&)
* Used for OLAP systems
* De-normalized
* Easy to understand and navigate
* Relatively easier in extracting the data and solving complex problems
c '*

Meta data is the data about data; Business Analyst or data modeler usually capture information
about data ± the source (where and how the data is originated), nature of data (char, varchar,
nullable, existance, valid values etc) and behavior of data (how it is modified / derived and the life
cycle ) in data dictionary a.k.a metadata. Metadata is also presented at the Datamart level, subsets,
fact and dimensions, ODS etc. For a DW user, metadata provides vital information for analysis /
DSS.

c+
' *
'
+ B-Tree and Bit Map indexes.
c+!*

. The main advantage of using bitmap index is for the columns in which the ratio of the number of
distinct values to the number of rows in the table is very low. This ratio referred as degree of
cardinality. Example: A gender column, which has only two distinct values (male and female), which
is ideal for a bitmap index.
c+

!
*
A. B-tree indexes can be used when a typical query refers to the indexed column and retrieves a few
rows. In these queries, it is faster to find the rows by looking at the index.
What is a dimension table?
A dimensional table is a collection of hierarchies and categories along which the user can drill down
and drill up. it contains only the textual attributes.
What is Fact table?
Fact Table contains the measurements or metrics or facts of business process.
If your business process is ³Sales´ , then a measurement of this business process such as ³monthly
sales number´ is captured in the Fact table.
Fact table also contains the foriegn keys for the dimension tables.
What is conformed fact?
Conformed dimensions are the dimensions which can be used across multiple Data Marts in
combination with multiple facts tables accordingly
What is ODS?
1. ODS means Operational Data Store.
2. A collection of operation or bases data that is extracted from operation databases and
standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture.
An ODS is used to support data mining of operational data, or as the store for base data that is
summarized for a data warehouse. The ODS may also be used to audit the data warehouse to
assure summarized and derived data is calculated properly. The ODS may further become the
enterprise shared operational database, allowing operational systems that are being reengineered to
use the ODS as there operation databases.
What is a lookup table?
A lookUp table is the one which is used when updating a warehouse. When the lookup is placed on
the target table (fact table / warehouse) based upon the primary key of the target, it just updates the
table by allowing only new records or updated records based on the lookup condition.
What are Aggregate tables?
Aggregate table contains the summary of existing warehouse data which is grouped to certain levels
of dimensions.Retrieving the required data from the actual table, which have millions of records will
take more time and also affects the server performance.To avoid this we can aggregate the table to
certain required level and can use it.This tables reduces the load in the database server and
increases the performance of the query and can retrieve the result very fastly.
What are conformed dimensions?
Answer1:
Conformed dimensions mean the exact same thing with every possible fact table to which they are
joined Ex:Date Dimensions is connected all facts like Sales facts,Inventory facts..etc
Answer2:
Conformed dimentions are dimensions which are common to the cubes.(cubes are the schemas
contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are the Facts and
Dimensions here D1,D2 are the Conformed Dimensions
What is a level of Granularity of a fact table?
Level of granularity means level of detail that you put into the fact table in a data warehouse. For
example: Based on design you can decide to put the sales data in each transaction. Now, level of
granularity would mean what detail are you willing to put for each transactional fact. Product sales
with respect to each minute or you want to aggregate it upto minute and put that data.
How are the Dimension tables designed?
Most dimension tables are designed using Normalization principles upto 2NF. In some instances
they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next section).
What are non-additive facts?
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions
present in the fact table.
What are slowly changing dimensions?
SCD stands for Slowly changing dimensions. Slowly changing dimensions are of three types
SCD1: only maintained updated values.
Ex: a customer address modified we update existing record with new address.
SCD2: maintaining historical information and current information by using
A) Effective Date
B) Versions
C) Flags
or combination of these
scd3: by adding new columns to target table we maintain historical information and current
information
What are Semi-additive and factless facts and in which scenario will you use such kinds of fact
tables?
Snapshot facts are semi-additive, while we maintain aggregated facts we go for semi-additive.
EX: Average daily balance
A fact table without numeric fact columns is called factless fact table.
Ex: Promotion Facts
While maintain the promotion values of the transaction (ex: product samples) because this table
doesn¶t contain any measures.
Why fact table is in normal form?
Basically the fact table consists of the Index keys of the dimension/ook up tables and the measures.
sowhen ever we have the keys in a table .that itself implies that the table is in the normal form.
What is degenerate dimension table?
Degenerate Dimensions : If a table contains the values, which are neither dimesion nor measures is
called degenerate dimensions.Ex : invoice id,empno
what is junk dimension? what is the difference between junk dimension and degenerated
dimension?
Junk dimension: Grouping of Random flags and text Attributes in a dimension and moving them to a
separate sub dimension.
Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension
table with fields like order number and order line number and have 1:1 relationship with Fact table, In
this case this dimension is removed and the order information will be directly stored in a Fact table
inorder eliminate unneccessary joins while retrieving order information..
4.what is aggregate fact table?
Aggregate table contains the [measure] values ,aggregated /grouped/summed up to some level of
hirarchy.
What is fact less fact table? where you have used it in your project?
Factless table means only the key available in the Fact there is no mesuresavailalabl
c+
Dimension tables contain attributes that describe fact records in the fact table. Dimension tables
contain hierarchies of attributes that aid in summarization. Example: Time Dimension, Product
Dimension
c+,
+Fact Table contains the measurements or metrics or facts of business process.
c+

+Dimension tables contain attributes that describe fact records in the fact table. Dimension tables
contain hierarchies of attributes that aid in summarization. Example: Time Dimension.
c+)
+ODS means Operational Data Store. A collection of operation or bases data that is extracted
from operation databases and standardized, cleansed, consolidated, transformed, and loaded into
an enterprise data architecture. An ODS is used to support data mining of operational data, or as the
store for base data that is summarized for a data warehouse.
c+#
+A lookup table is used to display information from one table based on the value of a foreign-key
field in another table.
c+

+Aggregate table contains the summary of existing warehouse data which is grouped to certain
levels of dimensions. Aggregate table gives better performance for many reporting needs. Aggregate
table contains the [measure] values, aggregated/grouped/summed up to some level of hierarchy.
c+-.

'
+Level of granularity means level of detail that you put into the fact table in a data warehouse. For
example: Based on design you can decide to put the sales data in each transaction. Now, level of
granularity would mean what detail are you willing to put for each transactional fact. Product sales
with respect to each minute or you want to aggregate it upto minute and put that data.
c+&

+Most dimension tables are designed using Normalization principles upto 2NF. In some instances
they are further normalized to 3NF.
c+
-
+Non-additive facts are facts that cannot be summed up for any of the dimensions present in the
fact table.
c+
'
SCD stands for Slowly changing dimensions. Slowly changing dimensions are of three types
SCD1: Only maintain updated/Current values.
Ex: a customer address modified we update existing record with new address.
SCD2: Maintain historical and current information by using A) Effective Date B) Versions C)
Flags or combination of all three of those.
SCD3: By adding new columns to target table to maintain both historical and current information
c+
)-
' #

A. Generally most of the Snapshot fact tables are semi-additive.
A fact table without numeric fact columns is called factless fact table.
c+

+If a table contains the values, which are neither dimesion nor measures is called degenerate
dimensions.Ex : invoice id,EmpId.
c+/ #
/ #

+Junk dimension: Grouping of Random flags and text Attributes in a dimension and moving them
to a separate sub dimension.
Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension
table with fields like order number and order line number and have 1:1 relationship with Fact table, In
this case this dimension is removed and the order information will be directly stored in a Fact table
inorder eliminate unneccessary joins while retrieving order information.
c+
+Factless table means only the keys are available in the Fact while there are
no measures available.

Datawarehouse Intrviewqstions

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Datawarehouse Intrviewqstions

Uploaded by

Copyright:

Available Formats

c

SCD1: Only maintain updated/Current values.

A. Generally most of the Snapshot fact tables are semi-additive.

You might also like

Datawarehouse Intrviewqstions

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Datawarehouse Intrviewqstions

Uploaded by

Copyright:

Available Formats

c 

SCD1: Only maintain updated/Current values.

A. Generally most of the Snapshot fact tables are semi-additive.

You might also like

c