
9.) Types of testing in ETL?

1. Unit Testing –
This type of testing is performed at the developer's end. In unit testing, each unit/component of the modules is tested separately. Each module of the whole data warehouse, i.e. program, SQL script, procedure, Unix shell script, is validated and tested.
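For example, a typical unit test for a single ETL load script is a source-to-target row-count reconciliation. A minimal SQL sketch, assuming hypothetical staging.orders (source extract) and dw.fact_orders (load target) tables:

-- Row-count check for a single mapping; 'PASS' means the counts match.
SELECT CASE WHEN src.cnt = tgt.cnt THEN 'PASS' ELSE 'FAIL' END AS row_count_check
FROM (SELECT COUNT(*) AS cnt FROM staging.orders) src,
     (SELECT COUNT(*) AS cnt FROM dw.fact_orders) tgt;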

2. Integration Testing –
Integration testing is the process of testing the interface between two software units or modules. It focuses on determining the correctness of the interface. The purpose of integration testing is to expose faults in the interaction between integrated units. Once all the modules have been unit tested, integration testing is performed.
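In an ETL context, a common integration check between two load modules is verifying that every foreign key loaded into a fact table resolves to a row in its dimension table. A minimal sketch, assuming hypothetical fact_sales and dim_product tables:

-- Orphan-key check: any row returned indicates a broken interface
-- between the fact load and the dimension load; an empty result passes.
SELECT f.product_id
FROM fact_sales f
LEFT JOIN dim_product d ON f.product_id = d.product_id
WHERE d.product_id IS NULL;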

Integration test approaches –


There are four integration testing approaches:
1. Big-Bang Integration Testing –
It is the simplest integration testing approach: all the modules are combined and the functionality is verified after the individual modules have been tested. In simple words, all the modules of the system are simply put together and tested. This approach is practicable only for very small systems. Once an error is found during integration testing, it is very difficult to localize, as it may potentially belong to any of the modules being integrated. So, errors reported during big-bang integration testing are very expensive to fix.
Advantages:
 It is convenient for small systems.
Disadvantages:
 There will be quite a lot of delay, because you have to wait for all the modules to be integrated.
 High-risk critical modules are not isolated and tested on priority, since all modules are tested at once.

2. Bottom-Up Integration Testing –
In bottom-up testing, each module at the lower levels is tested with higher modules until all modules are tested. The primary purpose of this integration testing is to test the interfaces among the modules that make up each subsystem. This integration testing uses test drivers to drive and pass appropriate data to the lower-level modules.
Advantages:
 In bottom-up testing, no stubs are required.
 A principal advantage of this integration testing is that several disjoint subsystems can be tested simultaneously.
Disadvantages:
 Driver modules must be produced.
 Complexity increases when the system is made up of a large number of small subsystems.
3. Top-Down Integration Testing –
In top-down integration testing, testing takes place from top to bottom. High-level modules are tested first, then low-level modules, and finally the low-level modules are integrated with the high-level ones to ensure the system works as intended. Stubs are used to simulate the behaviour of the lower-level modules that are not yet integrated.
Advantages:
 Modules are debugged separately.
 Few or no drivers are needed.
 It is more stable and accurate at the aggregate level.
Disadvantages:
 Needs many stubs.
 Modules at the lower level are tested inadequately.
4. Mixed Integration Testing –
Mixed integration testing is also called sandwiched integration testing. It follows a combination of the top-down and bottom-up testing approaches. In the top-down approach, testing can start only after the top-level modules have been coded and unit tested. In the bottom-up approach, testing can start only after the bottom-level modules are ready. The sandwich or mixed approach overcomes this shortcoming of the top-down and bottom-up approaches.
Advantages:
 The mixed approach is useful for very large projects having several sub-projects.
 The sandwich approach overcomes the shortcomings of both the top-down and bottom-up approaches.
Disadvantages:
 Mixed integration testing has a very high cost, because one part follows the top-down approach while another part follows bottom-up.
 This integration testing cannot be used for smaller systems with huge interdependence between different modules.
3) Smoke Testing is also known as Confidence Testing or Build Verification Testing.

In other words, we verify whether the important features are working and there are no showstoppers in the build that is under testing.
It is a mini and quick regression test of major functionality. Smoke testing shows whether the product is ready for testing. It helps in determining if the build is so flawed as to make any further testing a waste of time and resources.

Characteristics of Smoke Testing:


Following are the characteristics of the smoke testing:
 Smoke testing is documented.
 Smoke testing may be performed on stable as well as unstable builds.
 Smoke testing is scripted.
 Smoke testing is a type of regression testing.
Smoke Testing is usually carried out by the quality assurance engineers.
Goal of Smoke Testing:
The aim of smoke testing is:
1. To detect any early defects in the software product.
2. To demonstrate system stability.
3. To demonstrate conformance to requirements.
4. To assure that the critical functionalities of the program are working fine.
5. To measure the stability of the software product by performing testing.
6. To test the overall functioning of the software product.
Types of Smoke Testing:
There are 2 types of smoke testing: manual and automated.
Advantages of Smoke Testing:
1. Smoke testing is easy to perform.
2. It helps in identifying defects in the early stages.
3. It improves the quality of the system.
4. Smoke testing reduces the risk of failure.
5. Smoke testing makes progress easier to assess.
6. It saves test effort and time.
7. It makes it easy to detect critical errors and helps in the correction of errors.
8. It runs quickly.
9. It minimizes integration risks.
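As an illustration, a smoke check on an ETL build can be a single quick query confirming that the latest load actually populated the target before deeper testing begins. A sketch with hypothetical names (dw.fact_orders, load_date):

-- Build verification: did the latest batch load any rows at all?
-- A zero count means the build is too flawed for further testing.
SELECT COUNT(*) AS rows_loaded
FROM dw.fact_orders
WHERE load_date = CURRENT_DATE;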

Alpha Testing is a type of software testing performed to identify bugs before releasing the product to real users or to the public. Alpha testing is one type of user acceptance testing.
Beta Testing is performed by real users of the software application in a real environment. Beta testing is one type of user acceptance testing.
Acceptance Testing is the last phase of software testing, performed after system testing and before making the system available for actual use.
Types of Acceptance Testing:
1. User Acceptance Testing (UAT):
User acceptance testing is used to determine whether the product is working
for the user correctly. Specific requirements which are quite often used by
the customers are primarily picked for the testing purpose. This is also
termed as End-User Testing.
2. Business Acceptance Testing (BAT):
BAT is used to determine whether the product meets the business goals and purposes or not. BAT mainly focuses on business profits, which are quite challenging due to changing market conditions and new technologies, so the current implementation may have to be changed, resulting in extra budget.
3. Contract Acceptance Testing (CAT):
CAT is a contract which specifies that once the product goes live, within a predetermined period, the acceptance test must be performed and it should pass all the acceptance use cases.
There is a contract, termed a Service Level Agreement (SLA), which includes terms stating that payment will be made only if the product services are in line with all the requirements, which means the contract is fulfilled. Sometimes, this contract happens before the product goes live. There should be a well-defined contract in terms of the period of testing, areas of testing, conditions on issues encountered at later stages, payments, etc.
4. Regulations Acceptance Testing (RAT):
RAT is used to determine whether the product violates the rules and regulations defined by the government of the country where it is being released. This may be unintentional but will impact the business negatively.
Generally, a product or application that is to be released in the market has to go under RAT, as different countries or regions have different rules and regulations defined by their governing bodies. If any rules or regulations are violated for a country or region, the product will not be released in that country or region. If the product is released even though there is a violation, then only the vendors of the product will be directly responsible.
5. Operational Acceptance Testing (OAT):
OAT is used to determine the operational readiness of the product and is a non-functional type of testing. It mainly includes testing of recovery, compatibility, maintainability, reliability, etc.
OAT assures the stability of the product before it is released to production.
6. Alpha Testing:
Alpha testing is used to assess the product in the development/testing environment by a specialized team of testers, usually called alpha testers.
7. Beta Testing:
Beta testing is used to assess the product by exposing it to the real end-
users, usually called beta testers in their environment. Feedback is collected
from the users and the defects are fixed. Also, this helps in enhancing the
product to give a rich user experience.
Use of Acceptance Testing:
 To find the defects missed during the functional testing phase.
 To check how well the product is developed.
 To verify that the product is what the customers actually need.
 Feedback helps in improving the product performance and user experience.
 To minimize or eliminate the issues arising in production.
 White Box Testing: White Box Testing is a type of Software Testing in
which the internal structure, design and implementation of the software
application that is being tested is fully known to the tester.
 Gray Box Testing: Gray Box Testing is a software testing technique which
is a combination of Black Box Testing technique and White Box Testing
technique. The internal structure, design and implementation is partially
known in Gray Box Testing.
1. Black Box Testing is a software testing method in which the internal structure/design/implementation of the item being tested is not known to the tester.

2. White Box Testing is a software testing method in which the internal structure/design/implementation of the item being tested is known to the tester.

Differences between Black Box Testing vs White Box Testing:

Black Box Testing | White Box Testing
It is a way of software testing in which the internal structure or the program or the code is hidden and nothing is known about it. | It is a way of testing the software in which the tester has knowledge about the internal structure or the code or the program of the software.
It is mostly done by software testers. | It is mostly done by software developers.
No knowledge of implementation is needed. | Knowledge of implementation is required.
It can be referred to as outer or external software testing. | It is the inner or internal software testing.
It is a functional test of the software. | It is a structural test of the software.
This testing can be initiated on the basis of the requirement specifications document. | This type of testing is started after the detailed design document.
No knowledge of programming is required. | It is mandatory to have knowledge of programming.
It is the behavior testing of the software. | It is the logic testing of the software.
It is applicable to the higher levels of software testing. | It is generally applicable to the lower levels of software testing.
It is also called closed testing. | It is also called clear box testing.
It is the least time consuming. | It is the most time consuming.
It is not suitable or preferred for algorithm testing. | It is suitable for algorithm testing.
Can be done by trial-and-error ways and methods. | Data domains along with inner or internal boundaries can be better tested.
Example: searching something on Google by using keywords. | Example: checking and verifying loops by input.
Types: A. Functional Testing, B. Non-functional Testing, C. Regression Testing. | Types: A. Path Testing, B. Loop Testing, C. Condition Testing.


 Differences between White Box Testing and Gray Box Testing:

White Box Testing | Gray Box Testing
It is a type of software testing in which the internal structure and design of the software application is fully known to the tester. | It is a type of software testing in which the internal structure and design of the software application is partially known to the tester.
It is also known as clear box testing or transparent testing. | It is known as translucent testing.
It is performed by testers and developers. | It is performed by end users, testers and developers.
Full knowledge of the implementation is required. | Small knowledge of the implementation is enough.
High programming skills are required to perform white box testing. | Basic programming skills are enough to perform this testing.
It is a time-consuming testing. | It is a less time-consuming testing.
It is used for algorithm testing. | It is not used for algorithm testing.

The difference between Alpha and Beta Testing is as follows:

Alpha Testing | Beta Testing
Alpha testing involves both white box and black box testing. | Beta testing commonly uses black box testing.
Alpha testing is performed by testers who are usually internal employees of the organization. | Beta testing is performed by clients who are not part of the organization.
Alpha testing is performed at the developer's site. | Beta testing is performed at the end-user's site.
Reliability and security testing are not checked in alpha testing. | Reliability, security and robustness are checked during beta testing.
Alpha testing ensures the quality of the product before forwarding it to beta testing. | Beta testing also concentrates on the quality of the product, but collects users' input on the product and ensures that the product is ready for real-time users.
Alpha testing requires a testing environment or a lab. | Beta testing doesn't require a testing environment or lab.
Alpha testing may require a long execution cycle. | Beta testing requires only a few weeks of execution.
Developers can immediately address critical issues or fixes in alpha testing. | Most of the issues or feedback collected from beta testing will be implemented in future versions of the product.

 Difference between Unit and Integration Testing:

Unit Testing | Integration Testing
In unit testing, each module of the software is tested separately. | In integration testing, all modules of the software are tested combined.
In unit testing, the tester knows the internal design of the software. | In integration testing, the tester doesn't know the internal design of the software.
Unit testing is performed first of all the testing processes. | Integration testing is performed after unit testing and before system testing.
Unit testing is white box testing. | Integration testing is black box testing.
Unit testing is basically performed by the developer. | Integration testing is performed by the tester.
Detection of defects in unit testing is easy. | Detection of defects in integration testing is difficult.
It tests parts of the project without waiting for others to be completed. | It tests only after the completion of all parts.
Unit testing is less costly. | Integration testing is more costly.

Difference between Sanity Testing and Regression Testing:

S.No. | Sanity Testing | Regression Testing
01. | Sanity testing is performed to check the stability of new functionality or code changes in the existing build. | Regression testing is performed to check the stability of all areas impacted by any functionality change or code change.
02. | Sanity testing is part of regression testing. | Regression testing is independent testing.
03. | It is executed before regression testing and after smoke testing. | It is executed based on the project and the availability of resources, manpower and time.
04. | Sanity testing is considered surface-level testing. | Regression testing is not considered surface-level testing.
05. | It examines a few functionalities of the software. | It examines mostly all functionality of the software.
06. | Sanity testing does not use scripts. | Regression testing uses scripts.
07. | Sanity testing is often carried out manually. | Regression testing is often preferred to continue with automation.
08. | Performing sanity testing increases the product cost/budget. | Performing regression testing increases the product cost/budget.
09. | Complete test cases are not executed in the product during sanity testing. | Complete test cases are executed in the product during regression testing.

8.) What is ODS vs data warehouse?

Operational Data Store | Data Warehouse
An ODS is meant for operational reporting and supports current or near real-time reporting requirements. | A data warehouse is intended for historical analysis.
An ODS consists of only a short window of data. | A data warehouse includes the entire history of data.
It typically holds detailed data only. | It contains summarized and detailed data.
It is used for detailed decision making and operational reporting. | It is used for long-term decision making.
It is used at the operational level. | It is used at the managerial level.
It serves as a conduit for data between operational and analytics systems. | It serves as a repository for cleansed and consolidated data.
It is updated often as the transaction systems generate new data. | It is usually updated in batch processing mode.
7.) What is a fact and types of facts?
A fact table is the central table in a star schema of a data warehouse. The fact table consists of two types of columns: the foreign key columns allow joins with dimension tables, and the measure columns contain the data that is being analyzed.

There are three types of facts:

 Additive: Additive facts are facts that can be summed up through all of
the dimensions in the fact table.
 Semi-Additive: Semi-additive facts are facts that can be summed up for
some of the dimensions in the fact table, but not the others.
 Non-Additive: Non-additive facts are facts that cannot be summed up
for any of the dimensions present in the fact table.
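The distinction shows up directly in SQL. A sketch with hypothetical tables: sales_amount in fact_sales is additive, while account_balance in a daily snapshot table is only semi-additive, because summing balances across dates is meaningless:

-- Additive fact: safe to sum across every dimension.
SELECT store_id, SUM(sales_amount) AS total_sales
FROM fact_sales
GROUP BY store_id;

-- Semi-additive fact: summing across accounts per day is fine,
-- but summing the same balance across dates would double-count it.
SELECT snapshot_date, SUM(account_balance) AS total_balance
FROM fact_account_snapshot
GROUP BY snapshot_date;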

6.) What is SCD and its types?

Slowly Changing Dimensions (SCD) – dimensions that change slowly over time, rather than on a regular, time-based schedule.
SCD1: It never maintains history in the target table. It keeps only the most recently updated record in the database.
SCD2: It maintains full history in the target. It maintains history by inserting a new record for each change and updating the previous record.
SCD3: It keeps both the current and previous values in the target.
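A minimal SQL sketch of the three types, assuming a hypothetical dim_customer table with city, previous_city, start_date, end_date and is_current columns:

-- SCD1: overwrite in place; no history survives.
UPDATE dim_customer SET city = 'Pune' WHERE customer_id = 101;

-- SCD2: expire the current row, then insert a new version row.
UPDATE dim_customer
SET end_date = CURRENT_DATE, is_current = 0
WHERE customer_id = 101 AND is_current = 1;
INSERT INTO dim_customer (customer_id, city, start_date, end_date, is_current)
VALUES (101, 'Pune', CURRENT_DATE, NULL, 1);

-- SCD3: keep the previous value in a dedicated column.
UPDATE dim_customer SET previous_city = city, city = 'Pune'
WHERE customer_id = 101;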

5.) Data Warehouse Schema


In a data warehouse, a schema is used to define the way to organize the system with all the
database entities (fact tables, dimension tables) and their logical association.

Here are the different types of Schemas in DW:


1. Star Schema
2. Snowflake Schema
3. Galaxy Schema
4. Star Cluster Schema
#1) Star Schema
This is the simplest and most effective schema in a data warehouse. A fact table in the
center surrounded by multiple dimension tables resembles a star in the Star Schema model.
The fact table maintains one-to-many relations with all the dimension tables. Every row in a
fact table is associated with its dimension table rows with a foreign key reference.

Due to the above reason, navigation among the tables in this model is easy for querying
aggregated data. An end-user can easily understand this structure. Hence all the Business
Intelligence (BI) tools greatly support the Star schema model.

While designing star schemas the dimension tables are purposefully de-normalized. They
are wide with many attributes to store the contextual data for better analysis and reporting.

Benefits Of Star Schema


 Queries use very simple joins while retrieving the data and thereby query
performance is increased.
 It is simple to retrieve data for reporting, at any point of time for any period.
Disadvantages Of Star Schema
 If there are many changes in the requirements, it is not recommended to modify and reuse the existing star schema in the long run.
 Data redundancy is higher, as tables are not hierarchically divided.
An example of a Star Schema is given below.

Querying A Star Schema


An end-user can request a report using Business Intelligence tools. All such requests will be
processed by creating a chain of “SELECT queries” internally. The performance of these
queries will have an impact on the report execution time.

From the above Star schema example, if a business user wants to know how many Novels and DVDs have been sold in the state of Kerala in January 2018, then you can apply the query as follows on the Star schema tables:
SELECT pdim.Name Product_Name,
       Sum(sfact.sales_units) Quantity_Sold
FROM Product pdim,
     Sales sfact,
     Store sdim,
     Date ddim
WHERE sfact.product_id = pdim.product_id
  AND sfact.store_id = sdim.store_id
  AND sfact.date_id = ddim.date_id
  AND sdim.state = 'Kerala'
  AND ddim.month = 1
  AND ddim.year = 2018
  AND pdim.Name IN ('Novels', 'DVDs')
GROUP BY pdim.Name
Results:
Product_Name Quantity_Sold

Novels 12,702

DVDs 32,919
Hope you understood how easy it is to query a Star Schema.

#2) SnowFlake Schema


Star schema acts as an input to design a SnowFlake schema. Snow flaking is a process
that completely normalizes all the dimension tables from a star schema.

The arrangement of a fact table in the center surrounded by multiple hierarchies of dimension tables looks like a snowflake in the SnowFlake schema model. Every fact table row is associated with its dimension table rows with a foreign key reference.

While designing SnowFlake schemas the dimension tables are purposefully normalized.
Foreign keys will be added to each level of the dimension tables to link to its parent
attribute. The complexity of the SnowFlake schema is directly proportional to the hierarchy
levels of the dimension tables.

Benefits of SnowFlake Schema:


 Data redundancy is completely removed by creating new dimension tables.
 When compared with star schema, less storage space is used by the Snow Flaking
dimension tables.
 It is easy to update (or) maintain the Snow Flaking tables.
Disadvantages of SnowFlake Schema:
 Due to normalized dimension tables, the ETL system has to load a larger number of tables.
 You may need complex joins to perform a query due to the number of tables added. Hence query performance will be degraded.
An example of a SnowFlake Schema is given below.
The Dimension Tables in the above SnowFlake Diagram are normalized as explained
below:
 Date dimension is normalized into Quarterly, Monthly and Weekly tables by leaving
foreign key ids in the Date table.
 The store dimension is normalized to comprise the table for State.
 The product dimension is normalized into Brand.
 In the Customer dimension, the attributes connected to the city are moved into the
new City table by leaving a foreign key id in the Customer table.
In the same way, a single dimension can maintain multiple levels of hierarchy.

Different levels of hierarchies from the above diagram can be referred to as follows:
 Quarterly id, Monthly id, and Weekly ids are the new surrogate keys that are created
for Date dimension hierarchies and those have been added as foreign keys in the
Date dimension table.
 State id is the new surrogate key created for Store dimension hierarchy and it has
been added as the foreign key in the Store dimension table.
 Brand id is the new surrogate key created for the Product dimension hierarchy and it
has been added as the foreign key in the Product dimension table.
 City id is the new surrogate key created for Customer dimension hierarchy and it has
been added as the foreign key in the Customer dimension table.
Querying A Snowflake Schema
We can generate the same kind of reports for end-users as that of star schema structures
with SnowFlake schemas as well. But the queries are a bit complicated here.

From the above SnowFlake schema example, we are going to generate the same query
that we have designed during the Star schema query example.

That is, if a business user wants to know how many Novels and DVDs have been sold in the state of Kerala in January 2018, you can apply the query as follows on the SnowFlake schema tables.

SELECT pdim.Name Product_Name,
       Sum(sfact.sales_units) Quantity_Sold
FROM Sales sfact
INNER JOIN Product pdim ON sfact.product_id = pdim.product_id
INNER JOIN Store sdim ON sfact.store_id = sdim.store_id
INNER JOIN State stdim ON sdim.state_id = stdim.state_id
INNER JOIN Date ddim ON sfact.date_id = ddim.date_id
INNER JOIN Month mdim ON ddim.month_id = mdim.month_id
WHERE stdim.state = 'Kerala'
  AND mdim.month = 1
  AND ddim.year = 2018
  AND pdim.Name IN ('Novels', 'DVDs')
GROUP BY pdim.Name
Results:
Product_Name Quantity_Sold

Novels 12,702

DVDs 32,919
Points To Remember While Querying Star (or) SnowFlake Schema Tables
Any query can be designed with the below structure:

SELECT Clause:
 The attributes specified in the select clause are shown in the query results.
 The select statement also uses grouping to find aggregated values, and hence we must use a GROUP BY clause after the WHERE clause.
FROM Clause:
 All the essential fact tables and dimension tables have to be chosen as per the
context.
WHERE Clause:
 Appropriate dimension attributes are mentioned in the where clause by joining with
the fact table attributes. Surrogate keys from the dimension tables are joined with the
respective foreign keys from the fact tables to fix the range of data to be queried.
Please refer to the above-written star schema query example to understand this.
You can also filter data in the from clause itself if you are using inner/outer joins there, as written in the SnowFlake schema example.
 Dimension attributes are also mentioned as constraints on data in the where clause.
 By filtering the data with all the above steps, appropriate data is returned for the
reports.
As per the business needs, you can add (or) remove the facts, dimensions, attributes, and
constraints to a star schema (or) SnowFlake schema query by following the above
structure. You can also add sub-queries (or) merge different query results to generate data
for any complex reports.

#3) Galaxy Schema


A galaxy schema is also known as Fact Constellation Schema. In this schema, multiple fact
tables share the same dimension tables. The arrangement of fact tables and dimension
tables looks like a collection of stars in the Galaxy schema model.

The shared dimensions in this model are known as Conformed dimensions.

This type of schema is used for sophisticated requirements and for aggregated fact tables
that are more complex to be supported by the Star schema (or) SnowFlake schema. This
schema is difficult to maintain due to its complexity.

An example of Galaxy Schema is given below.
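A minimal DDL sketch of two fact tables sharing a conformed date dimension (hypothetical names) illustrates the structure:

-- dim_date is the conformed dimension shared by both fact tables.
CREATE TABLE dim_date (
    date_id   INT PRIMARY KEY,
    full_date DATE,
    month     INT,
    year      INT
);

CREATE TABLE fact_sales (
    date_id     INT REFERENCES dim_date (date_id),
    product_id  INT,
    sales_units INT
);

CREATE TABLE fact_shipments (
    date_id       INT REFERENCES dim_date (date_id),
    product_id    INT,
    shipped_units INT
);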

#4) Star Cluster Schema


A SnowFlake schema with many dimension tables may need more complex joins while
querying. A star schema with fewer dimension tables may have more redundancy. Hence, a
star cluster schema came into the picture by combining the features of the above two
schemas.

Star schema is the base to design a star cluster schema and few essential dimension tables
from the star schema are snowflaked and this, in turn, forms a more stable schema
structure.

An example of a Star Cluster Schema is given below.

Which Is Better Snowflake Schema Or Star Schema?


The data warehouse platform and the BI tools used in your DW system will play a vital role
in deciding the suitable schema to be designed. Star and SnowFlake are the most
frequently used schemas in DW.
Star schema is preferred if BI tools allow business users to easily interact with the table
structures with simple queries. The SnowFlake schema is preferred if BI tools are more
complicated for the business users to interact directly with the table structures due to more
joins and complex queries.

You can go ahead with the SnowFlake schema either if you want to save some storage
space or if your DW system has optimized tools to design this schema.

Star Schema Vs Snowflake Schema

Given below are the key differences between Star schema and SnowFlake schema.

S.No | Star Schema | Snow Flake Schema
1 | Data redundancy is more. | Data redundancy is less.
2 | Storage space for dimension tables is more. | Storage space for dimension tables is comparatively less.
3 | Contains de-normalized dimension tables. | Contains normalized dimension tables.
4 | A single fact table is surrounded by multiple dimension tables. | A single fact table is surrounded by multiple hierarchies of dimension tables.
5 | Queries use direct joins between fact and dimensions to fetch the data. | Queries use complex joins between fact and dimensions to fetch the data.
6 | Query execution time is less. | Query execution time is more.
7 | Anyone can easily understand and design the schema. | It is tough to understand and design the schema.
8 | Uses a top-down approach. | Uses a bottom-up approach.

4.) Popular ETL tools:

1. Xplenty –
Xplenty is a cloud-based ETL solution which requires no coding and provides a simple visual interface for performing ETL activities. It also connects with a large variety of data sources.

2. IBM DataStage –
It is a business intelligence tool for integrating data across various enterprise systems. It is part of the IBM Information Platforms Solutions suite and uses visual notation to build ETL processes. It is a powerful data integration tool.

3. Informatica –
Informatica is a market leader in data integration. Informatica's suite of data integration software includes PowerCenter, which is known for its strong automation capabilities. Informatica PowerCenter is developed by Informatica Corporation and can connect to many sources for fetching data for data integration.
Informatica PowerCenter has four client tools which are used in the development process:
 PowerCenter Designer
 Workflow Manager
 Workflow Monitor
 Repository Manager
4. Microsoft SQL Server SSIS –
Microsoft offers SSIS, a graphical interface for managing ETL using MS SQL Server. SSIS has a user-friendly interface, allowing users to deploy integrated data warehousing solutions without having to write lots of code. SSIS is a fast and flexible data warehousing tool. The graphical interface allows for easy drag-and-drop ETL for multiple data types and warehouse destinations.

5. Talend –
Talend is open-source software which integrates, cleanses and profiles data, and helps you get business insights easily. Talend has a GUI that enables managing a large number of source systems. This tool has Master Data Management (MDM) functionality. It also provides a metadata repository, using which a user can easily re-use work.

6. Azure Data Factory –
Microsoft Azure Data Factory is a cloud-based data integration service that automates the ETL process. We can say it is SSIS in the cloud, because they share the same idea, but SSIS provides a more powerful GUI, debugging and intelligence tools.

7. Oracle Data Integrator –
Oracle Data Integrator is based on an Extract, Load and Transform (ELT) architecture, which means it performs the load first and then transforms the data. This tool is produced by Oracle, offers a graphical environment, and is also very cost-effective.
8. Data Junction
9. Warehouse Builder
3.) What is a surrogate key?

SURROGATE KEY:
A surrogate key is an artificial primary key which is generated automatically by the system. The value of a surrogate key is numeric, and it is automatically incremented for each new row.
Generally, a DBMS designer needs a surrogate key when the primary key is used inappropriately.
Features of the surrogate key:
 It is automatically generated by the system.
 It holds an anonymous integer.
 It contains a unique value for all records of the table.
 The value can never be modified by the user or application.
 The surrogate key is called a factless key, as it is added just for our ease of identification of unique values and contains no relevant fact (or information) that is useful for the table.
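A minimal DDL sketch, assuming a hypothetical dim_customer table and SQL Server style IDENTITY syntax (auto-increment syntax varies by database):

CREATE TABLE dim_customer (
    customer_key  INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key, system-generated
    customer_id   VARCHAR(20),                   -- natural/business key from the source
    customer_name VARCHAR(100),
    city          VARCHAR(50)
);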

2.) Data Warehouse and Data Mining

Figure 1: Data warehouse process
Figure 2: Data mining process

Comparison between data mining and data warehousing:

Data Warehousing | Data Mining
A data warehouse is a database system designed for analytical work instead of transactional work. | Data mining is the process of analyzing data patterns.
Data is stored periodically. | Data is analyzed regularly.
Data warehousing is the process of extracting and storing data to allow easier reporting. | Data mining is the use of pattern recognition logic to identify patterns.
Data warehousing is solely carried out by engineers. | Data mining is carried out by business users with the help of engineers.
Data warehousing is the process of pooling all relevant data together. | Data mining is considered a process of extracting data from large data sets.
1.) Fact Table and Dimension Table:

Dimension table: non-measurable attributes, primary keys.
Fact table: measurable facts and foreign keys.

Difference between Fact Table and Dimension Table:

S.No | Fact Table | Dimension Table
1. | Fact table contains measurements against the attributes of a dimension table. | Dimension table contains the attributes on which the fact table calculates the metric.
2. | In a fact table, there are fewer attributes than in a dimension table. | In a dimension table, there are more attributes than in a fact table.
3. | In a fact table, there are more records than in a dimension table. | In a dimension table, there are fewer records than in a fact table.
4. | Fact table forms a vertical table. | Dimension table forms a horizontal table.
5. | The attribute format of a fact table is numerical and text. | The attribute format of a dimension table is text.
6. | It comes after the dimension table. | It comes before the fact table.
7. | The number of fact tables is less than the number of dimension tables in a schema. | The number of dimension tables is more than the number of fact tables in a schema.
8. | It is used for analysis purposes and decision making. | The main task of a dimension table is to store information about a business and its processes.
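A minimal DDL sketch with hypothetical names shows the contrast: the dimension table carries a primary key and descriptive attributes, while the fact table carries foreign keys and numeric measures:

CREATE TABLE dim_product (
    product_id   INT PRIMARY KEY,  -- primary key of the dimension
    product_name VARCHAR(100),     -- non-measurable, descriptive attributes
    brand        VARCHAR(50)
);

CREATE TABLE fact_sales (
    product_id   INT REFERENCES dim_product (product_id), -- foreign key
    date_id      INT,
    sales_units  INT,              -- measurable facts
    sales_amount DECIMAL(12,2)
);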
