
ETL (extract, transform, load): as the name implies, it extracts data from
homogeneous or heterogeneous data sources, then transforms the data into a
format or structure suited to querying and analysis, and finally loads it into
the final target, which can be a database or, more specifically, an operational
data store, data mart, or data warehouse. ETL systems commonly integrate data
from multiple applications, typically developed and supported by different
vendors or hosted on separate computer hardware.
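To make the three steps concrete, here is a minimal hand-written sketch in
Python. It is only an illustration, not how Informatica works internally: the
sample CSV text, the `sales` table, and the function names are all assumptions
made up for this example, and an in-memory SQLite database stands in for the
final target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read files or application exports.
SOURCE_CSV = "id,name,amount\n1, alice ,10.50\n2,BOB,3\n"


def extract(text):
    """Extract: read raw rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy the names and convert types so the data is queryable."""
    return [
        (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run extract, transform, load in order and return the loaded rows."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` runs the whole flow against a
throwaway database.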
Here you can see an ETL process including a staging area, which is a common
practice in the industry. As we discussed in the previous video, data is first
collected from multiple sources, brought into a common staging area, then
cleansed and loaded into the final data warehouse. Generally, data is simply
extracted and loaded into the staging area first without any transformations;
that is why you can see the big red E under the first ETL job. This is also
referred to as data being dumped into the staging area, which can be thought of
as an intermediate, temporary database. From there, the data is transformed,
cleansed, and loaded into the final target, hence the big red T and L under the
second ETL job. From the data warehouse, data is transferred to data marts.
Then BI and reporting tools come into the picture and provide reporting for the
end users. These movements of data can be scheduled on a regular basis using an
ETL tool such as Informatica, which will be our main focus for this tutorial
series. Its benefits and working will be discussed later on.
Now let’s see what kind of projects
Informatica caters to. Informatica can be used for a wide variety of projects,
as it is one of the most diverse and flexible tools of our age. Some of these
projects can be: creating data marts or data warehouses for business-oriented
people, where transactional data is gathered up for their use; migrating data
from old systems to new ones; integrating data from interrelated companies; and
integrating data from third-party suppliers. All this work can be done by
scripting, so why use an ETL tool such as Informatica? Imagine hand-coding all
the tedious ETL jobs using scripts. Well, I guarantee it would take you no less
than a month to manage all of the data movement in a small data mart
environment. To speed up this process, we use an ETL tool. And as most of the
business is critical, you don’t want to mess things up and make your life hell.
Informatica plays a major role in making your life easier.
Now let’s look at a few advantages of Informatica. It is a GUI tool;
coding in a graphical tool is generally faster than hand-coding scripts. It can
communicate with all major data sources, for example mainframe, RDBMS, flat
files, XML, VSAM, SAP, etc. It can handle very large amounts of data very
effectively. Reusability of objects is promoted by Informatica; for example,
transformation rules can be used over and over again in multiple mappings.
Informatica can be run on Windows and Unix environments. It increases agility
in delivering critical data and reports to the business. Informatica boosts
productivity with metadata management. It tests changes and upgrades up to 10
times faster and increases test coverage with the Informatica Data Validation
Option. Any changes in one object will have minimum impact on other objects. It
strengthens real-time operations. Informatica proactively identifies data
integration risks. And finally, it takes advantage of and harnesses the power
of big data.
The main advantage of Informatica is that it provides complete data lineage.
Well, what is data lineage? Data lineage is defined as a data lifecycle that
includes the data’s origins and where it moves over time. It describes what
happens to data as it goes through diverse processes. It helps provide
visibility into the analytics and simplifies tracing errors back to their
sources. It also enables replaying specific portions of input to the data flow
for stepwise debugging or regenerating lost output.
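The idea of lineage can be sketched in a few lines of Python. This is only a
toy illustration under assumed names (`run_with_lineage`, the `crm_export.csv`
origin, and the step names are hypothetical) and says nothing about how
Informatica stores lineage. Each value keeps a trail of where it came from and
what it looked like after every step, which is what makes tracing an error back
to its source possible.

```python
def run_with_lineage(value, origin, steps):
    """Apply named steps in order, recording the value after each one.

    `steps` is a list of (step_name, function) pairs. The returned trail
    starts with the extracted value and gains one entry per applied step.
    """
    trail = [{"step": "extract", "origin": origin, "value": value}]
    for name, func in steps:
        value = func(value)
        trail.append({"step": name, "origin": origin, "value": value})
    return value, trail


# Hypothetical cleansing steps applied to one field from one source file.
result, trail = run_with_lineage(
    "  acme ltd ",
    "crm_export.csv",
    [("strip", str.strip), ("upper", str.upper)],
)
```

If the final value looks wrong, walking `trail` from the end shows the first
step where it went bad, and re-running only the steps after a known-good entry
is a small-scale version of replaying part of the flow.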
--> Why an ETL tool? What is the need for an ETL tool? The problem comes with
traditional programming languages, where you need to connect to multiple
sources and handle errors yourself. For this you have to write complex code.
ETL tools provide a ready-made solution for this. You don’t need to worry about
handling these things and can concentrate only on coding the requirement part.
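To give a feel for the plumbing that hand-coding involves, here is a small
Python sketch that reads from two kinds of sources and collects errors instead
of crashing. The reader functions, the source names, and the payloads are
assumptions made up for this example; a real job would also need connection
handling, retries, and logging on top of this.

```python
import csv
import io
import json


def read_csv_source(text):
    """Read rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_source(text):
    """Read rows from a JSON array of objects."""
    return json.loads(text)


def extract_all(sources):
    """Try every source; return the rows that loaded and the errors that did not.

    `sources` is a list of (name, reader_function, payload) tuples.
    """
    rows, errors = [], []
    for name, reader, payload in sources:
        try:
            rows.extend(reader(payload))
        except (ValueError, csv.Error) as exc:
            # json.JSONDecodeError is a subclass of ValueError.
            errors.append((name, str(exc)))
    return rows, errors
```

Even this toy version needs a reader per source type and its own error
bookkeeping, which is the part an ETL tool provides ready-made.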
Informatica can be used for a wide variety of projects, as it is one of the
most diverse and flexible tools of our age. Some projects can be creating data
marts or data warehouses for business-oriented people, where transactional data
is gathered up for their use. For example, a big data management firm needs its
data to be organized and stored in a centralized warehouse where everyone can
access it. The data sources can be diverse, such as flat files containing new
stats, files containing old client data, or a relational database containing
the recent data. All these sources need to be integrated and brought into the
data warehouse for further reporting and processing. Now, this data needs to be
accessed by all the departments, such as the HR department, the payroll
department, management, etc. So we need to create data marts on top of the
warehouse, which will be accessible to different departments and hold the
relevant data for them. For these purposes we can use Informatica to create
jobs and processes that extract data from the initial diverse sources,
transform and cleanse it, load it into the warehouse, then further filter it
and load it into the data marts.
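The last step, carving department-specific data marts out of the warehouse, can
be sketched with SQLite from Python. Everything here is assumed for
illustration: the table names (`warehouse_employees`, `mart_hr`,
`mart_payroll`), the columns, and the sample rows are hypothetical, and in
Informatica this would be built as mappings rather than hand-written SQL.

```python
import sqlite3

# Hypothetical warehouse rows: (name, department, salary).
SAMPLE_ROWS = [
    ("Asha", "HR", 52000.0),
    ("Ben", "Payroll", 48000.0),
    ("Chen", "HR", 61000.0),
]


def build_marts(conn, rows=SAMPLE_ROWS):
    """Load a tiny warehouse table, then derive one mart per department need."""
    conn.execute(
        "CREATE TABLE warehouse_employees "
        "(name TEXT, department TEXT, salary REAL)"
    )
    conn.executemany("INSERT INTO warehouse_employees VALUES (?, ?, ?)", rows)
    # HR mart: who works where, without salary details.
    conn.execute(
        "CREATE TABLE mart_hr AS "
        "SELECT name, department FROM warehouse_employees"
    )
    # Payroll mart: total salary per department.
    conn.execute(
        "CREATE TABLE mart_payroll AS "
        "SELECT department, SUM(salary) AS total_salary "
        "FROM warehouse_employees GROUP BY department"
    )
    conn.commit()
```

Each mart holds only the slice of the warehouse that its department needs, so
HR never has to see salary figures and payroll works from totals.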
The second type of project can be migrating data from old systems to new ones.
For example, a legacy system needs to be migrated to a new system: data is
extracted from the old system, cleansed, transformed to conform to the new
system, and loaded into it. Compatibility needs to be ensured, hence we need an
ETL tool such as Informatica to do the job; otherwise, hand-coding the process
would be tedious and prone to errors.
Another example of a project in which Informatica can be used is integrating
data from interrelated companies. For example, a construction company acquires
a smaller construction company. All the employees now have to work as a single
entity under a single name. This merger entails the merger of data from both
companies. The HR department needs to know how many people are working for
them. To do this, we need an ETL tool such as Informatica to extract data from
the two separate sources and bring it into a single centralized data warehouse.
We will have to cleanse the data and bring some uniformity to it so that it can
be loaded into a common place. This challenge can be met efficiently using
Informatica.
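Bringing uniformity to two sources mostly means mapping each company's layout
onto one shared layout before loading. A rough Python sketch follows; the two
input layouts (`emp_name`/`dept` for company A, `full_name`/`division` for
company B) and all function names are invented for this example.

```python
def from_company_a(row):
    """Company A (hypothetical layout): {'emp_name': 'Jane Doe', 'dept': 'HR'}."""
    return {
        "name": row["emp_name"].strip().title(),
        "department": row["dept"].strip().upper(),
        "source": "A",
    }


def from_company_b(row):
    """Company B (hypothetical layout): {'full_name': 'Doe, John', 'division': 'hr'}."""
    last, first = [part.strip() for part in row["full_name"].split(",", 1)]
    return {
        "name": f"{first} {last}".title(),
        "department": row["division"].strip().upper(),
        "source": "B",
    }


def merge_employees(rows_a, rows_b):
    """Map both sources onto the shared layout and return one combined list."""
    return [from_company_a(r) for r in rows_a] + [from_company_b(r) for r in rows_b]
```

Once both sources share one layout, the headcount HR asked for is simply the
length of the merged list, and the `source` field keeps track of which company
each record came from.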
Another example of a project where Informatica can be used is integrating data
from third-party suppliers. Let’s take the example of a construction company
which has hired a third-party supplier for the woodwork. The construction
company needs to know how much wood is available, while the wood company needs
to know how much wood it needs to supply. To make things simpler, we can create
a centralized data warehouse which can be accessed by both companies, so that
they have clear visibility of the inventory. This can be done by Informatica in
an efficient manner.
Now that we know what Informatica is and what its real-world applications are
--> Hey guys, hope you are good. Today I will be your instructor for this
series of Informatica tutorials. I am an ETL and DWH expert myself and have
done multiple projects throughout my industrial experience of 10 years. I
started off as a fresh ETL developer, where I used to use different
methodologies for the process, but as the industry evolved a number of ETL
tools emerged, one of them being Informatica PowerCenter. Amongst the wide
variety of tools available, Informatica is my personal favorite, as it is
powerful, efficient, and really easy to learn. So
let’s take a look at what we will be learning in these lectures. We will start
off with the basic concepts of data warehousing, so if you are not from a
warehousing background, no need to worry: this lecture set is all you need to
master the skill. Then we will move on to the concept of ETL and a few
real-life examples of ETL and data warehousing projects. After that, we start
the Informatica tutorials, which include its basic architecture, the components
which we will be dealing with, and the installation guide. We will set up the
environment from scratch, the server side as well as the client modules. After
that, we will learn how to create simple one-to-one mappings, sessions, and
their workflows. With each lecture the level of difficulty will rise, but I
hope you won’t have any trouble grasping the idea. We will take a look at all
the transformations, such as the expression transformation, filter
transformation, the router and joiner transformations, lookup transformations,
union transformations, sorter, aggregator, and rank transformations, sequence
generator, and stored procedure transformation. Then we take a look at the
concept of slowly changing dimensions, the properties of tasks, sessions, and
