
OLTP (On-Line Transaction Processing)

The major task of an OLTP system is to perform on-line transaction and query processing. It covers most of the day-to-day operations of an organization, such as purchasing, inventory, banking, payroll, registration and accounting.

OLTP is designed to get data in quickly and to analyze current events. It is characterized by:
- Process oriented
- Normalized data
- Current data
- Volatile data
- Updated in real time

Data Warehousing is a concept and not a technology!

In layman's words: data warehousing is a dump or collection of historical data from different sources and different databases, formatted and stored in a common target in order to get an intelligent output.

According to Bill Inmon, known as the "Father of Data Warehousing", a data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management decisions.

Subject Oriented
- Data is organized around the major subjects of the enterprise
- Applications are typically designed around processes/functions, whereas a data warehouse has a subject/data-driven orientation
- It only includes data used for decision making
- Data warehouse data spans time and allows more complex relations

Integrated
- Data in the data warehouse is integrated so that each item is referred to in only one way, unlike the many ways found in legacy systems
- Data is in the same format
- Data is in the same units in which attributes are measured

Non-volatile
- Data is not updated regularly on a record-by-record basis
- Data in the data warehouse is refreshed at certain intervals
- Data in the data warehouse can be accessed as and when required

Time Variant
- Data warehouse data can be accurate as of some moment in time, but not necessarily "right now"
- A data warehouse key always contains a unit of time (day, week etc.)
- Correctly recorded data warehouse data cannot be updated

A data warehouse is designed to get data out and analyze it quickly. It is mainly characterized by:
- Subject oriented rather than process oriented
- Integrated across subjects and the entire enterprise
- De-normalized data
- Time variant
- Historical
- Non-volatile
- Summary data

Difference between Data Warehouse and OLTP

Data Warehouse                             OLTP
Works with enterprise-wide information     Works with small pieces of information
Large to very large database               Small to large database
Non-volatile data                          Volatile data
De-normalized data                         Normalized data
Fewer joins                                More joins
Multidimensional data structures           Complex data structures
Updated on a schedule                      Updated in real time
Read queries                               Insert/update queries
Analyzes the business                      Runs the business

Drawbacks of Data Warehousing
- Handling large volumes of data
- Data warehouse solutions complicate business processes
- Data warehouse solutions may have too long a learning curve
- Cost factors:
  - In getting a professional
  - In getting licensed data warehousing tools
  - In cleaning, capturing and delivering data

Benefits of Data Warehouse
- The ability to scale to large volumes of data and large numbers of concurrent users
- Consistent, fast query response times that allow for iterative, speed-of-thought analysis
- Integrated metadata that seamlessly links the OLAP server and the data warehouse relational database
- The ability to automatically drill from summary calculated data, which is managed by the OLAP server, to detail data stored in the data warehouse relational database
- A calculation engine that includes robust mathematical functions for computing derived data (aggregations, matrix calculations, cross-dimensional calculations, OLAP-aware formulas and procedural calculations)
- Seamless integration of historical, projected and derived data
- A multi-user read/write environment to support users' what-if analysis, modeling and planning requirements
- The ability to be deployed quickly, adopted easily and maintained cost-effectively
- Robust data-access security and user management
- Availability of a wide variety of viewing and analysis tools to support different user communities

Goals of Data Warehousing
- To provide a reliable, single, integrated source of key corporate information
- To give end users access to their data without relying on reports produced by the information systems department
- To allow analysts to analyze corporate data and even produce predictive "What if" models from that data

The data warehouse is simply one component of modern reporting architectures. The real goal is decision support, or its modern equivalent "Business Intelligence": helping people make better, more intelligent decisions.

History of Data Warehousing

Data warehousing philosophy falls into two schools:
- William H. Inmon philosophy: he is known as the "Father of Data Warehousing" for his early work in the field
- Ralph Kimball philosophy: based on Bill Inmon's philosophy, but with a different approach to building the data warehouse

Different Approaches to Build Data Warehouse

The two approaches to building a data warehouse are:
- Top-Down approach [William H. Inmon]: an enterprise has one data warehouse, and the data marts source their information from the data warehouse. Information is stored in 3rd normal form.
- Bottom-Up approach [Ralph Kimball]: an enterprise has one data warehouse, where the information is sourced from data marts. Information is stored in a dimensional model.

Figure: Top-Down Approach Architecture

Figure: Bottom-Up Approach

Data Warehousing important terminologies

Data warehouses most often use dimensional data modeling. Some of the frequently used terms are:

- Dimension table: a category by which summarized data can be viewed. E.g., the Time dimension.
- Dimension: a unique level within a dimension table. E.g., month is a dimension in the Time dimension.
- Hierarchy: the specification of levels that represents the relationship between different dimensions within a hierarchy. E.g., one possible hierarchy in the Time dimension is Year-Quarter-Month-Day.
- Data mart: a focused subset of a data warehouse that is organized for quick analysis.
- Metadata: "data about data". It is a description of what kind of information is stored where, how it is encoded, how it is related to other information, where it comes from and how it is related to ...
- Surrogate key: a system-generated key that acts as the primary key in dimension tables. Surrogate keys are used:
  - To preserve the history of changes instead of updating
  - When there is a high possibility of restructuring the business keys
  - To increase join performance
- Fact table: a table that contains facts and foreign keys that reference the primary keys of the related dimension tables.
- Fact: a collection of related data items, consisting of measures and context data.
- Measure: a numeric attribute of a fact. E.g., a sales fact table contains profit as a measure, representing the profit on each sale.
- Aggregates: pre-calculated numeric data. They are key to providing fast query performance.
- Cubes: data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional views of data, querying and analytical capabilities to clients.

Types of Dimensions
- Causal dimension: a dimension which explains the existence of a record in the fact table. E.g., a transaction like hiring or termination will warrant an additional row in the fact table.
- Degenerate dimension: a dimension key generated in the fact table that doesn't refer to any dimension table. It acts as a primary key for the fact table.

- Conformed dimension: a dimension key that is shared by more than one fact table.
- Hierarchical dimension: a dimension which has hierarchies. E.g., Geography.

Types of Dimension table:
- Mini dimension: a group of attributes which change frequently, or a group of attributes which are queried frequently.
- Junk dimension: a group of static attributes, like random flags.
- Aggregate dimension: a group of aggregated attributes.

Types of facts:
- Additive: facts that can be summed up through all of the dimensions in the fact table.
- Semi-additive: facts that can be summed up for some of the dimensions in the fact table, but not the others.
- Non-additive: facts that cannot be summed up for any of the dimensions present in the fact table.
- Factless fact table: a fact table that doesn't have any additive or semi-additive facts. It contains only foreign key values from dimension tables.

Data Warehousing is done in 2 stages:
- ETL: data is pulled from different sources and different databases into a common area called the staging area, and then transformed into warehouse tables in the desired way and format.
- OLAP: the warehouse data is processed to generate cubes and to provide a multidimensional view, either to generate reports, to analyze, or to mine the data.

Why data modeling?
- Helps in managing complex data relationships
- A data model helps keep track of a complex environment like data warehousing
- Many complex relationships exist, with the ability to change over time
- Transformations and integrations from various systems of record need to be worked out and maintained
- Provides the means of supplying users with a "road map" through the data and relationships
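The additive and semi-additive fact types listed above can be illustrated with a small sketch. The rows and column names below (deposit amount, closing balance) are hypothetical and not taken from the source; deposits behave as an additive fact, while balances are semi-additive because they can be summed across accounts but not across time.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, account, deposit_amount, closing_balance)
rows = [
    ("2024-01-01", "A", 100, 100),
    ("2024-01-01", "B", 50, 50),
    ("2024-01-02", "A", 20, 120),
    ("2024-01-02", "B", 0, 50),
]


def sum_by(rows, key_index, value_index):
    """Sum one measure grouped by one dimension column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


# Additive: deposits can be summed across any dimension.
deposits_by_account = sum_by(rows, 1, 2)
deposits_by_day = sum_by(rows, 0, 2)

# Semi-additive: balances can be summed across accounts for a single day...
balance_by_day = sum_by(rows, 0, 3)

# ...but summing balances over time is meaningless, so take the latest
# balance per account instead (rows sort by day first).
latest_balance = {}
for day, account, _deposit, balance in sorted(rows):
    latest_balance[account] = balance
```

Here `deposits_by_account` and `deposits_by_day` are both valid totals, whereas for the balance only the per-day total and the latest value per account are meaningful.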

Steps in data modeling

Step 1: Conceptual Data Model
- Includes the important entities and the relationships among them
- No attribute is specified
- No primary key is specified
- At this level, the data modeler attempts to identify the highest-level relationships among the different entities

Step 2: Logical Data Model
- Includes all entities and the relationships among them
- All attributes for each entity are specified
- The primary key for each entity is specified
- Foreign keys are specified
- Normalization occurs at this level
- At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how it will be physically implemented in the database

Step 3: Physical Data Model
- Specification of all tables and columns
- Foreign keys are used to identify relationships between tables
- De-normalization may occur based on user requirements
- Physical considerations may cause the physical data model to be quite different from the logical data model
- At this level, the data modeler specifies how the logical data model will be realized in the database schema

Schemas available are:
- Star schema: consists of a central fact table surrounded by de-normalized dimensions
- Snowflake schema: consists of a central fact table surrounded by partly or completely normalized dimensions
- Constellation schema: consists of more than one fact table surrounded by de-normalized dimensions

Star schema

Features of the star schema are:
- Dimension tables are separated from the fact table
- De-normalized dimension tables
- Attribute information is stored within the dimension table
- De-normalized fact table
- Each dimension table has a key in the fact table

Figure: Typical Star Schema
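As a rough illustration of the star schema features above, the following sketch builds a tiny star schema in an in-memory SQLite database. The table and column names (`dim_time`, `dim_product`, `fact_sales`, `profit`) and the sample rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- De-normalized dimension tables: all attributes live in the dimension itself.
CREATE TABLE dim_time (
    time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT
);
-- Central fact table: one foreign key per dimension plus the measure.
CREATE TABLE fact_sales (
    time_key    INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    profit      REAL
);
INSERT INTO dim_time VALUES (1, 2024, 'Q1', 'Jan'), (2, 2024, 'Q1', 'Feb');
INSERT INTO dim_product VALUES (10, 'Pen', 'Stationery'), (11, 'Lamp', 'Home');
INSERT INTO fact_sales VALUES (1, 10, 2.5), (1, 11, 10.0), (2, 10, 3.5);
""")

# A typical star query: join the fact table to one dimension and aggregate.
profit_by_category = conn.execute("""
    SELECT p.category, SUM(f.profit)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Because the dimension is de-normalized, the category is reached with a single join from the fact table.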

Snowflake schema

Features of the snowflake schema are:
- Dimensions are separated from facts
- At least one normalized dimension table
- Attribute information of a normalized dimension table is stored in outrigger tables
- Attribute information of de-normalized dimension tables is stored in the dimension tables
- De-normalized fact table
- Each dimension table has a key into the fact table

Figure: Typical Snowflake Schema
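The outrigger idea can be sketched in the same way. In this hypothetical example the product dimension is normalized: its category attribute is moved into a separate outrigger table (`dim_category`, a name invented here), so reaching the category takes one more join than in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outrigger table holding the attribute split out of the product dimension.
CREATE TABLE dim_category (
    category_key INTEGER PRIMARY KEY, category_name TEXT
);
-- Normalized dimension: stores a key to the outrigger, not the category text.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    profit      REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Home');
INSERT INTO dim_product VALUES (10, 'Pen', 1), (11, 'Pencil', 1), (12, 'Lamp', 2);
INSERT INTO fact_sales VALUES (10, 2.0), (11, 1.0), (12, 10.0);
""")

# Fact -> dimension -> outrigger: two joins instead of one.
profit_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.profit)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```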

Constellation Schema

Features of the constellation schema are:
- Dimension tables are separated from the fact tables
- At least one normalized dimension table (usually the largest)
- Attributes of normalized dimension tables are stored in outrigger tables
- Attributes of de-normalized dimension tables are stored in the dimension tables
- Normalized fact tables
- Each dimension table is keyed into one or more fact tables

Figure: Typical Constellation Schema

What is ETL?

ETL stands for Extraction, Transformation, Loading:
- Extraction of data from different sources and different databases
- Transformation of the extracted data into a desired common format
- Loading of the transformed data into staging/warehouse tables

Figure: ETL Architecture

Why do we need ETL?
- Migrates data from one database to another, or within the same database
- Cleanses the data
- Eliminates duplicates
- Organizes the data
- Makes data handling easier
- Reformats the data for the target repository
- Captures data changes
- Maintains historical data

How to implement ETL?

Using PL/SQL scripts:
- Coding is tedious and cumbersome
- Needs more resources for coding
- Difficult to implement
- Takes more time to implement
- Needs no additional cost for implementing
- Data retrieval is faster

Using data warehouse tools:
- ETL tools are user friendly and easy to handle
- Needs fewer resources for implementing
- Easy to implement
- Takes less time to implement
- Needs a licensed copy of the tool
- Data retrieval is slower when compared to PL/SQL scripting
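A hand-coded ETL flow of the kind described above might look like the following minimal sketch. It uses Python and SQLite rather than PL/SQL purely for illustration; the source rows, the `stg_customer` table and the cleansing rules are all made up for this example.

```python
import sqlite3


def extract():
    """Stand-in for pulling rows from two differently formatted sources."""
    source_a = [{"id": 1, "name": " alice ", "amount": "10.50"}]
    source_b = [
        {"ID": 2, "NAME": "BOB", "AMOUNT": "7"},
        {"ID": 1, "NAME": "Alice", "AMOUNT": "10.50"},  # duplicate of source A
    ]
    return source_a, source_b


def transform(source_a, source_b):
    """Bring both sources to one format, cleanse names, drop duplicate ids."""
    unified = {}
    for row in source_a + source_b:
        lowered = {key.lower(): value for key, value in row.items()}
        record = (
            int(lowered["id"]),
            lowered["name"].strip().title(),
            float(lowered["amount"]),
        )
        unified.setdefault(record[0], record)  # first occurrence wins
    return sorted(unified.values())


def load(conn, records):
    """Write the transformed records into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_customer "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
loaded = conn.execute(
    "SELECT id, name, amount FROM stg_customer ORDER BY id"
).fetchall()
```

Even this toy version shows why hand coding takes effort: format unification, cleansing and de-duplication each have to be written and maintained by hand.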

Available ETL tools in market

Some of the ETL tools on the market are:
- Informatica PowerCenter
- Ab Initio
- Ascential DataStage
- Oracle Warehouse Builder
- BusinessObjects Data Integrator
- Cognos DecisionStream
- Microsoft DTS
- Pervasive Data Junction
- Hummingbird Genio

How to capture data change?

Data changes in the dimension tables can be captured using the following methods:
- Method 1: Simple Pass Through Dimension - truncate and load the data
- Method 2: Slowly Changing Dimension - new data gets inserted as a new row, and changed data gets either updated or inserted as a new row or a new column, depending on the type of SCD
- Method 3: Slowly Growing Target Dimension - new and changed data gets inserted as a new row

Method 1: Simple Pass Through Dimension
- Used to load current data after truncating the target table
- Doesn't filter out the existing rows; loads all the source rows
- Data flow for all existing rows in the source

Method 2: Slowly Changing Dimension (SCD)

History of data can be maintained in three different types:
- Type 1: changed data overwrites the existing data, and history is not maintained
- Type 2:
  - Flag: changed data is added as a new row, and history is maintained using a flag
  - Version: changed data is added as a new row, and history is maintained using version numbers
  - Time variant: changed data is added as a new row, and history is maintained using a time stamp
- Type 3: changed data is added as a new column, and history is restricted to the previous data and the current data
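The difference between Type 1 and Type 3 in the list above can be shown with a small in-memory sketch. The dimension is represented here as a plain dictionary keyed by a hypothetical customer key, and the `city` and `previous_city` column names are assumptions for the example.

```python
def apply_type1(target, key, new_city):
    """Type 1: overwrite in place, so the old value is lost."""
    target[key]["city"] = new_city


def apply_type3(target, key, new_city):
    """Type 3: move the existing value to a 'previous' column, then overwrite."""
    row = target[key]
    row["previous_city"] = row["city"]
    row["city"] = new_city


type1_dim = {101: {"city": "Pune"}}
type3_dim = {101: {"city": "Pune", "previous_city": None}}

apply_type1(type1_dim, 101, "Delhi")

apply_type3(type3_dim, 101, "Delhi")
apply_type3(type3_dim, 101, "Mumbai")  # "Pune" is no longer recoverable
```

After the second Type 3 change only the current and the immediately previous value remain, which is the "history is restricted" behaviour described above.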

SCD Type 1
- Used to maintain current data without a historical log
- Filters source rows based on user-defined comparisons and writes only those found to be new data to the target
- Data flow for new and changed data:
  - For each changed row in the source, this data flow marks the row for update and overwrites the corresponding row in the target
  - Updates changed data in the target, overwriting existing data

SCD Type 2 - Flag
- Used to maintain the full history of data in the dimension table, with the most current data flagged
- Filters source rows based on user-defined comparisons and inserts changed data into the target
- The designer uses two instances of the same target definition to enable the two separate data flows to write to the same target table; only one target table is generated in the target database
- Data flow for changed data:
  - Increments the existing primary key by 1
  - Sets the current flag to 1 for changed data and inserts it into the target
- Data flow to update existing rows:
  - Updates the rows in the target that correspond to the changed rows in the source
  - Resets the current flag to 0 to indicate that the row is no longer current

SCD Type 2 - Version
- Used to maintain the full history of data in the dimension table
- Filters source rows based on user-defined comparisons and inserts changed data into the target
- The current version of a row has the highest version number
- Data flow for changed data:
  - Increments the primary key and version number for changed rows
  - Inserts changed data into the target

SCD Type 2 - Time Variant
- Used to maintain the full history of data in the dimension table
- An effective date range tracks the chronological history of changes for each dimension
- Filters source rows based on user-defined comparisons and inserts changed data into the target
- The current data has a begin date with no corresponding end date
- The designer uses two instances of the same target definition to enable the two separate data flows to write to the same target table; only one target table is generated in the target database
- Data flow for changed data:
  - Increments the existing primary key by 1
  - Generates the beginning of the effective date range for changed rows and inserts them into the target
- Data flow to update existing rows:
  - Updates the existing row of changed data in the target
  - Generates the end of the effective date range

SCD Type 3
- Used to maintain only the current and previous versions of changed data in the dimension table
- Filters source rows based on user-defined comparisons and inserts only those found to be new data into the target
- Rows containing changes to existing data are updated in the target
- While updating, the existing data is saved into a different column of the same row and is then replaced with the new data
- Data flow to update existing rows:
  - Writes the previous values for each changed row into the previous columns
  - Replaces the previous values with the updated values
  - Updates the changed data in the target

Method 3: Slowly Growing Target Dimension
- Used to maintain current and historical data
- Filters source data based on user-defined comparisons and writes only those found to be new data to the target
- Data flow for new and updated data:
  - For each changed row in the source, this data flow inserts a new row into the target
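The two data flows of SCD Type 2 - Flag described above can be sketched as follows. This is only an illustration in Python with SQLite, not the way any particular ETL tool implements it; the `dim_customer` table, its columns and the helper `apply_scd2_flag` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key
        customer_id  TEXT,                 -- business key from the source
        city         TEXT,
        current_flag INTEGER               -- 1 = current row, 0 = history
    )
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'C001', 'Pune', 1)")


def apply_scd2_flag(conn, customer_id, city):
    """Insert a new flagged row when the tracked attribute has changed."""
    current = conn.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND current_flag = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == city:
        return  # unchanged row: filtered out by the comparison
    if current is not None:
        # Update flow: reset the flag on the row that is no longer current.
        conn.execute(
            "UPDATE dim_customer SET current_flag = 0 WHERE customer_key = ?",
            (current[0],),
        )
    # Insert flow: highest existing key + 1, flagged as current.
    next_key = conn.execute(
        "SELECT COALESCE(MAX(customer_key), 0) + 1 FROM dim_customer"
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO dim_customer VALUES (?, ?, ?, 1)",
        (next_key, customer_id, city),
    )
    conn.commit()


apply_scd2_flag(conn, "C001", "Delhi")  # change: new row, old row unflagged
apply_scd2_flag(conn, "C001", "Delhi")  # no change: nothing happens
history = conn.execute(
    "SELECT customer_key, customer_id, city, current_flag "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
```

The Version and Time Variant variants would follow the same shape, replacing the flag column with a version number or an effective date range.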

What is OLAP?

OLAP stands for On-Line Analytical Processing. It processes the warehouse data to generate cubes and provides a multidimensional view for generating reports. OLAP tools can rapidly analyze multiple simultaneous factors, something that relational databases on their own are not well suited to.

What is Data Mining?

Data mining is the running of automated routines that search through data organized in a warehouse. They look for patterns in order to point us to areas that we should be addressing.

Data mining deals with five kinds of data:
- Associations (things done together)
- Sequences (events over time)
- Classifications (pattern recognition)
- Clusters (define new groups)
- Forecasting (predictions from time series)

Why is it required?
- Makes the organization's information easily accessible
- Presents information consistently
- Adaptive and resilient to change
- A secure bastion that protects information
- A foundation for improved decision making
- The business community must accept it for it to be deemed successful

Types of OLAP

The different types of OLAP are:

MOLAP (Multidimensional OLAP)
- This methodology is the traditional way of OLAP analysis
- The data is stored in a multidimensional cube
- The storage is not in the relational database, but in proprietary formats

ROLAP (Relational OLAP)
- This methodology relies on manipulating the data stored in the relational database
- Gives the appearance of traditional OLAP's slicing and dicing functionality
- Each action of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement

HOLAP (Hybrid OLAP)
- This methodology attempts to combine the MOLAP and ROLAP technologies

DOLAP (Desktop/Database OLAP)
- This methodology provides multidimensional analysis locally on the client machine, on data collected from relational or multidimensional database servers

How to implement?

It can be implemented using some of the user-friendly tools:
- For MOLAP: Cognos tool
- For ROLAP: Business Objects or MicroStrategy
- For HOLAP: Relational Access Manager
- For DOLAP: Business Objects

Available OLAP tools in market

Some of the OLAP tools on the market are:
- Business Objects: BusinessObjects
- Business Objects: Crystal Reports
- Cognos: Cognos
- Hyperion: Hyperion
- MicroStrategy: MicroStrategy
- Microsoft: Microsoft Reporting Services
- BRIO
- Relational Access Manager
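The ROLAP statement above, that each slice or dice corresponds to adding a "WHERE" clause, can be sketched with a small relational table. The `sales` table, its columns and the helper `total` are assumptions invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES
    (2023, 'East', 'Pen', 5.0),
    (2023, 'West', 'Pen', 7.0),
    (2024, 'East', 'Pen', 6.0),
    (2024, 'East', 'Lamp', 20.0),
    (2024, 'West', 'Lamp', 15.0);
""")

ALLOWED_COLUMNS = {"year", "region", "product"}


def total(conn, **filters):
    """Sum the amount, adding one WHERE condition per slicing filter."""
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    sql = "SELECT COALESCE(SUM(amount), 0) FROM sales"
    if filters:
        # Column names come from the whitelist; values are bound parameters.
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
    return conn.execute(sql, list(filters.values())).fetchone()[0]


everything = total(conn)                           # no slicing
slice_2024 = total(conn, year=2024)                # slice on one dimension
dice_2024_east = total(conn, year=2024, region="East")  # dice on two
```

Each additional keyword argument narrows the result by one more condition, which is all a ROLAP slice amounts to at the SQL level.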
