Etl Intro Question
A list of frequently asked ETL Testing interview questions and answers is given
below.
Loading: In this task, the data is added to the database tables in the warehouse.
Staging Layer: The staging layer stores the data extracted from the different
source systems.
Data Integration Layer: The integration layer transforms the data from the staging
layer and moves it to a database. In the database, the data is arranged into
hierarchical groups, often called dimensions, and into facts and aggregate facts.
The combination of fact and dimension tables in a data warehouse system is called
a schema.
Access Layer: The access layer is used by end-users to retrieve the data for
analytical reporting.
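The layers above can be sketched as a toy pipeline. This is a minimal illustration, not a real warehouse: the in-memory lists, the product/amount fields, and the two sources (crm, erp) are all hypothetical stand-ins for actual systems and tables.

```python
def extract(sources):
    """Staging layer: collect raw rows from every source system."""
    staging = []
    for source in sources:
        staging.extend(source)
    return staging

def transform(staging):
    """Integration layer: reshape staged rows into a dimension and facts."""
    dim_product = {}      # dimension: product_id -> product name
    facts = []            # fact rows: (product_id, amount)
    for row in staging:
        dim_product[row["product_id"]] = row["product_name"]
        facts.append((row["product_id"], row["amount"]))
    return dim_product, facts

def load(dim_product, facts, warehouse):
    """Loading: add the transformed data to the warehouse 'tables'."""
    warehouse["dim_product"] = dim_product
    warehouse["fact_sales"] = facts
    return warehouse

# Hypothetical source systems feeding the pipeline.
crm = [{"product_id": 1, "product_name": "Widget", "amount": 9.5}]
erp = [{"product_id": 2, "product_name": "Gadget", "amount": 4.0}]
warehouse = load(*transform(extract([crm, erp])), {})
print(warehouse["fact_sales"])  # the access layer queries these tables
```

End-users would sit on top of the final dictionary (the access layer), querying the fact and dimension tables for reporting.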
4) What is BI?
Business Intelligence is the process of collecting raw business data and
transforming it into meaningful insights that are more useful for the business.
ETL TOOLS :
The ETL tools are used to extract the data from different data sources, transform
the data, and load it into a data warehouse system.
The most commonly used ETL tools are Informatica, SAP BO Data Services, Microsoft
SSIS, Oracle Data Integrator (ODI), Clover ETL (open source), etc.
BI TOOLS :
BI tools are used to generate interactive and ad-hoc reports for end-users, and
data visualizations for monthly, quarterly, and annual board meetings.
The most commonly used BI tools are SAP Lumira, IBM Cognos, the Microsoft BI
platform, Tableau, Oracle Business Intelligence Enterprise Edition, etc.
6) What are the ETL tools available in the market?
The popular ETL tools available in the market are:
8) What is the difference between a data warehouse and data mining?
Data warehousing is a broader concept than data mining. Data mining involves
extracting hidden information from the data and interpreting it for future
forecasting. In contrast, data warehousing includes operations such as analytical
reporting to generate detailed and ad-hoc reports, and information processing to
generate interactive dashboards and charts.
9) What are the differences between OLTP and OLAP?
OLTP :
OLTP stands for Online Transaction Processing. It is a relational database system
used to manage day-to-day transactions.
OLAP :
OLAP stands for Online Analytical Processing. It is a multidimensional system,
also called a data warehouse.
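The contrast can be illustrated with one small dataset touched in both styles. The sales rows, region/year columns, and values below are made up for the example; a real OLTP system would be a transactional database and a real OLAP query would run against a warehouse.

```python
# Hypothetical transactional sales data.
sales = [
    {"region": "East", "year": 2023, "amount": 100},
    {"region": "East", "year": 2024, "amount": 150},
    {"region": "West", "year": 2024, "amount": 200},
]

# OLTP-style operation: record one day-to-day transaction.
sales.append({"region": "West", "year": 2024, "amount": 50})

# OLAP-style operation: aggregate across a dimension (region) for reporting.
totals = {}
for row in sales:
    totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]

print(totals)  # {'East': 250, 'West': 250}
```

The OLTP step works on one row at a time; the OLAP step scans and summarizes many rows at once, which is the access pattern a dimensional warehouse is tuned for.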
10) What is a dimension table and how is it different from the fact table?
Here is an example to describe how a dimension table is distinguished from a fact
table.
Suppose a company sells its products to its customers. Every sale is a fact that
occurs within the company, and the fact table is used to record these facts. Each
fact table stores the keys that join the fact table with the dimension tables,
along with the measures/facts.
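A minimal sketch of that sales example, with an in-memory dimension and fact table joined on a key; the customer names, keys, and amounts are invented for illustration.

```python
# Dimension table: surrogate key -> descriptive attributes (hypothetical data).
dim_customer = {
    1: {"name": "Acme Corp", "city": "Pune"},
    2: {"name": "Globex", "city": "Delhi"},
}

# Fact table: a key into the dimension plus the measure for each sale.
fact_sales = [
    {"customer_key": 1, "amount": 500},
    {"customer_key": 2, "amount": 300},
    {"customer_key": 1, "amount": 200},
]

# Resolve each fact row against the dimension, as a reporting query would.
report = [
    (dim_customer[f["customer_key"]]["name"], f["amount"]) for f in fact_sales
]
print(report)  # [('Acme Corp', 500), ('Globex', 300), ('Acme Corp', 200)]
```

The fact table holds only keys and measures; all the descriptive detail lives in the dimension, which is the essence of a star schema.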
For example: In an organization, data marts may exist for marketing, finance,
human resources, and other individual departments, each storing the data related
to its specific functions.
12) What is the difference between Manual Testing and ETL Testing?
The differences between Manual testing and ETL testing are:
Manual testing focuses on the functionality of the program, while ETL testing is
related to the database and its counts.
ETL testing is an automated process, so we do not need deep technical knowledge to
run it. It is much faster, systematic, and gives the assurance of the results the
business requires.
Manual testing is a time-consuming process in which we need technical knowledge to
write the test cases and scripts. It is slow, highly prone to errors, and also
needs more effort.
Stage Tables
Business Logic Transformation
Target table loading from the staging tables, once the transformation is applied.
The responsibilities of an ETL tester are:
ETL testing is used to keep an eye on the data being transferred from one system
to another.
ETL testing is needed to keep track of the efficiency and speed of the process.
ETL testing is also needed so that we are familiar with the ETL process before we
implement it in our business and production.
ETL is used in the data warehousing concept. Here, we need to fetch data from
multiple different systems and load it into the data warehouse database. The ETL
concept is used to extract the data from the source, transform it, and load it
into the target system.
Data migration is a difficult task if we are using PL/SQL. If we want to migrate
the data in a simpler way, we use ETL tools.
Nowadays, many companies are merging into different MNCs. To move the data from
one company to another, the need for the ETL concept arises.
19) What is the difference between ETL Testing and Database Testing?
The differences between ETL testing and Database testing are:
ETL Testing: the goal is reporting for business intelligence. Database Testing:
the goal is to integrate the data.
ETL Testing: the business flow is based on historical data. Database Testing: it
applies to the business flow systems only.
ETL Testing: tools such as Informatica, QuerySurge, and Cognos are used. Database
Testing: tools such as QTP and Selenium are used.
ETL Testing: a dimensional model is used. Database Testing: a relational model is
used.
ETL Testing: analytics are processed. Database Testing: transactions are
processed.
ETL Testing: denormalized data is used. Database Testing: normalized data is used.
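One of the simplest ETL-testing checks implied above, comparing record counts and key coverage between a source table and the loaded target, can be sketched as follows; the in-memory tuples are placeholders for real query results.

```python
# Hypothetical rows as they would come back from source and target queries.
source_rows = [(1, "A"), (2, "B"), (3, "C")]
target_rows = [(1, "A"), (2, "B"), (3, "C")]

def validate_load(source, target):
    """Return a list of discrepancies; an empty list means the load passed."""
    issues = []
    # Count check: every source record should have reached the target.
    if len(source) != len(target):
        issues.append(f"count mismatch: {len(source)} vs {len(target)}")
    # Key check: no source key should be missing from the target.
    missing = {r[0] for r in source} - {r[0] for r in target}
    if missing:
        issues.append(f"keys missing in target: {sorted(missing)}")
    return issues

print(validate_load(source_rows, target_rows))  # [] -> load is consistent
```

Real ETL testing adds many more checks (data types, transformations, duplicates), but count and key validation are usually the first gate.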
This step is based on validation and test estimation. In this step, the ETL
environment is planned according to the inputs used in the test estimation, and
the work proceeds accordingly.
As per the test plan, data is prepared and executed as per the requirements.
On completion of the test run, a summary report is prepared for conclusions and
improvements.
24) What are the steps followed to choose the ETL process?
Choosing an ETL tool is a very difficult task. To select the correct ETL tool, we
need to consider many factors specific to the project. Choosing the ETL tool for a
particular project is a strategic move, even when we need it only for a small
project.
Here are some points that will help us choose the ETL tool.
Data Connectivity
When choosing an ETL tool, we focus on how well the tool communicates with any
source of data, no matter where the data comes from. Data connectivity is very
critical.
Performance
Moving and changing data requires serious processing power, so we need to check
the performance factor.
Transformation Flexibility
Merging, matching, and changing data are very critical operations. An ETL tool
should provide all of these operations along with many transformation packages,
allowing the data to be modified in the transformation phase with simple drag and
drop.
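A tool would do this visually, but the three operations named above can be sketched in a few lines. The customer lists, the email field used for matching, and the lower-casing rule are all invented for this illustration.

```python
# Two hypothetical source lists to be merged and matched on email.
crm = [{"email": "A@X.COM", "name": "Ann"}]
web = [{"email": "a@x.com", "name": "Ann B."}, {"email": "c@y.com", "name": "Cy"}]

merged = {}
for row in crm + web:                # merge: combine both sources
    key = row["email"].lower()       # change: normalise the match key
    # match: rows with the same key collapse into one record (last write wins)
    merged[key] = {**merged.get(key, {}), **row}

print(sorted(merged))  # ['a@x.com', 'c@y.com']
```

The "last write wins" rule here is only one possible survivorship policy; real tools let you configure which source takes precedence on a match.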
Data Quality
We can take advantage of the data only when the data is clean and consistent.
Flexible data action option
When the ETL is ready, we need to check that it works on previous data as well as
newly arriving data.
Committed ETL vendor
We work with the organization's data while doing the ETL process, so we have to
choose a vendor who is aware of the industry and whose support will be beneficial.
Source Bugs
Load Condition Bugs
Calculation Bugs
ECP (Equivalence Class Partitioning) related Bugs
User-Interface Bugs
Full Extraction: All the data is extracted from the operational or source system
and loaded into the staging area.
Partial Extraction: Sometimes we get a notification from the source system to
update only specific data; this is called a Delta Load.
Source System Performance: The data extraction strategies should not affect the
performance of the source system.
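The full vs. partial (delta) extraction strategies can be sketched side by side, assuming each source row carries an updated_at timestamp; the table, column names, and dates are hypothetical.

```python
# Hypothetical source table with a change timestamp per row.
source_table = [
    {"id": 1, "value": "a", "updated_at": "2024-01-01"},
    {"id": 2, "value": "b", "updated_at": "2024-03-10"},
    {"id": 3, "value": "c", "updated_at": "2024-03-15"},
]

def full_extraction(table):
    """Full Extraction: copy every row to the staging area."""
    return list(table)

def delta_extraction(table, last_run):
    """Partial Extraction: copy only rows changed since the last run (Delta Load)."""
    return [row for row in table if row["updated_at"] > last_run]

staged_full = full_extraction(source_table)
staged_delta = delta_extraction(source_table, last_run="2024-03-01")
print(len(staged_full), len(staged_delta))  # 3 2
```

The delta variant touches far fewer rows, which is exactly why it is preferred when source system performance must not be affected.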
1. Enterprise ETL tools
Informatica
Talend
IBM DataStage
Ab Initio
MS SQL Server Integration Services (SSIS)
Clover ETL
2. Open Source ETL tools
Pentaho
Kettle
35) What is the use of dynamic cache and static cache in transformation?
The dynamic cache is used when the dimension or master table is updated slowly.
The static cache is used for flat files.
Incremental Load: In this, we apply the ongoing changes to one or more tables
based on a predefined schedule.
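An incremental load can be sketched as an upsert of only the changed rows into the target table; the id/value columns and the change set below are hypothetical.

```python
# Hypothetical target table, keyed by id.
target = {1: "old-a", 2: "old-b"}

# Ongoing changes captured since the last scheduled run.
changed_rows = [
    {"id": 2, "value": "new-b"},   # update to an existing row
    {"id": 3, "value": "new-c"},   # brand-new row
]

def incremental_load(target, changes):
    """Apply inserts and updates without reloading the whole table."""
    for row in changes:
        target[row["id"]] = row["value"]
    return target

incremental_load(target, changed_rows)
print(target)  # {1: 'old-a', 2: 'new-b', 3: 'new-c'}
```

Unchanged rows (id 1 here) are never touched, which is what distinguishes an incremental load from a full reload.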
Lookup is used to check and compare the data between the source table and the
target table.
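A lookup-style comparison can be sketched as a key-by-key check of source values against the target; the id/value pairs are invented for the example.

```python
# Hypothetical source and target tables, keyed by id.
source = {101: "alice", 102: "bob"}
target = {101: "alice", 102: "BOB"}

# For each source key, look up the target row and compare the values.
mismatches = [
    key for key, value in source.items() if target.get(key) != value
]
print(mismatches)  # [102]
```

Unlike a pure count check, this catches rows that arrived in the target but were transformed incorrectly.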
OLAP Tools: These are used for reporting purposes on OLAP data available in a
multidimensional model. We can write a simple query to extract the data from the
database.