Why Do We Need Staging Area During ETL Load
Written by DWBIConcepts Team
Last Updated: 31 December 2014
"We have a simple data warehouse that takes data from a few RDBMS source systems and load the data in
dimension and fact tables of the warehouse. I wonder why we have a staging layer in between. Why can't
we process everything on the fly and push them in the data warehouse?"
Last night, I received this question from one of the members of the DWBIConcepts community over email and
thought of discussing the pros and cons of having a staging layer in this article.
Strictly speaking, a staging area is not a necessity if we can handle everything on the fly. But can we?
Here are a few reasons why you can't avoid a staging area:
1. Source systems are only available for extraction during a specific time slot, which is generally shorter
than your overall data loading time. It's a good idea to extract the data and keep it at your end before
you lose the connection to the source systems.
2. You want to extract data based on conditions that require you to join two or more different
systems together. E.g. you want to extract only those customers who also exist in some other
system. In general, you will not be able to run a single SQL query that joins two tables from two
physically different databases, unless something like a database link or a federation layer is available.
3. Various source systems have different allotted timings for data extraction.
4. The data warehouse's data loading frequency does not match the refresh frequencies of the source
systems.
5. Extracted data from the same set of source systems is going to be used in multiple places (data
warehouse loading, ODS loading, third-party applications, etc.).
6. The ETL process involves complex data transformations that require extra space to temporarily stage the
data.
7. There is a specific data reconciliation or debugging requirement which warrants the use of a staging area
for pre-load, during-load or post-load data validations.
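
Reasons 2 and 7 can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a prescribed design: two in-memory SQLite databases stand in for two physically separate source systems, and the table names (customers, accounts, stg_crm_customers, stg_billing_accounts), the sample rows and the helper functions are all hypothetical. The data is first landed unchanged in a staging database, the source connections are released, a row count is reconciled, and only then is the cross-system join run locally.

```python
import sqlite3


def extract_to_staging(staging, table, rows):
    """Land rows from one source system in a staging table, unchanged."""
    staging.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (customer_id INTEGER, name TEXT)"
    )
    staging.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return len(rows)


def run_demo():
    # Two in-memory databases stand in for two separate source systems
    # (hypothetical "CRM" and "billing" systems with made-up rows).
    crm = sqlite3.connect(":memory:")
    billing = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    crm.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Asha"), (2, "Ben"), (3, "Chen")],
    )
    billing.execute("CREATE TABLE accounts (customer_id INTEGER, name TEXT)")
    billing.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [(2, "Ben"), (3, "Chen"), (4, "Dana")],
    )

    # Extract while the sources are available, then release them.
    crm_rows = crm.execute("SELECT customer_id, name FROM customers").fetchall()
    billing_rows = billing.execute(
        "SELECT customer_id, name FROM accounts"
    ).fetchall()
    crm.close()
    billing.close()

    # Land both extracts in one staging database.
    staging = sqlite3.connect(":memory:")
    extracted = extract_to_staging(staging, "stg_crm_customers", crm_rows)
    extract_to_staging(staging, "stg_billing_accounts", billing_rows)

    # Reconciliation: the staged row count must match the extracted count.
    staged = staging.execute(
        "SELECT COUNT(*) FROM stg_crm_customers"
    ).fetchone()[0]
    if staged != extracted:
        raise RuntimeError(f"expected {extracted} staged rows, found {staged}")

    # The cross-system condition is now an ordinary local join.
    return staging.execute(
        """
        SELECT c.customer_id, c.name
        FROM stg_crm_customers AS c
        JOIN stg_billing_accounts AS b ON b.customer_id = c.customer_id
        ORDER BY c.customer_id
        """
    ).fetchall()


if __name__ == "__main__":
    print(run_demo())
```

In this sketch only customers 2 and 3 exist in both systems, so only they would move on to the warehouse load. The staged copies also remain available for debugging after the source connections are closed.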
Clearly, a staging area gives a lot of flexibility during data loading. Shouldn't we always have a separate
staging area, then? Is there any downside to having a staging area? Yes, there are a few.
1. A staging area increases latency, that is, the time required for a change in the source system to take
effect in the data warehouse. In a lot of real-time or near-real-time applications, a staging area is
therefore avoided.
2. Data in the staging area occupies extra space.
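
The latency cost can be made concrete with a rough calculation. The sketch below assumes a simplified batch model that is not taken from the article: extraction into staging runs once per fixed interval, and the warehouse load starts as soon as the extract finishes. The function name and the sample durations are hypothetical.

```python
from datetime import timedelta


def worst_case_latency(batch_interval, extract_duration, load_duration):
    """Upper bound on source-to-warehouse delay for a staged batch load.

    A change made just after an extract has started misses that run, waits
    one full interval for the next extract, and then needs the extract and
    load durations on top of that before it is visible in the warehouse.
    """
    return batch_interval + extract_duration + load_duration


# Hypothetical nightly batch: 24 h interval, 1 h extract, 2 h load.
latency = worst_case_latency(
    timedelta(hours=24), timedelta(hours=1), timedelta(hours=2)
)
print(latency)
```

With these made-up numbers the bound comes to 27 hours, which is why the same design is usually a poor fit for near-real-time requirements.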
To me, in all practical senses, the benefits of having a staging area outweigh its problems. Hence, in
general, I suggest designating a specific staging area in data warehousing projects.
Copyright
Except where otherwise noted, contents of DWBI.ORG by Intellip LLP (http://intellip.com) is licensed under
a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
(https://creativecommons.org/licenses/by-nc-sa/4.0/).