Data Warehousing: Special Thanks To: Liem Tran, Robert Turan, and Miguel Delgado
Chapter 29
Special thanks to: Liem Tran, Robert Turan, and Miguel Delgado
Outline
I. Introduction to Data Warehousing
    I. Definition of Data Warehouse
    II. Data Warehouse Development
    III. Benefits and Advantages
    IV. Database vs Data Warehouse
    V. Datamart
II. Data Warehouse Structure
    I. Star & Snowflake Schema
    II. Fact Table & Dimension Table
    III. OLTP
    IV. OLAP
    V. ROLAP
    VI. MOLAP
    VII. HOLAP
III. Real Life Implementation of Data Warehouse
    I. Data Warehouse Providers
    II. What Data Warehouse Do Big Companies Use?
    III. Job Opportunities
What is a data warehouse?
Star vs Snowflake
Star Schema
• Simplest and most commonly used design
• Measurable attributes are usually numeric values that provide data useful
for business and data analysts
• For example:
• The fact table for a retail store will contain key attributes, and the fact
attributes can be Sales, Price, Time, and City.
Dimension Tables
• Each table describes a specific dimension
• Not too many rows but can grow large in some cases
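The fact and dimension tables described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (fact_sales, dim_time, dim_city), the column names, and the sample rows are assumptions, and Time and City are modeled here as keys into dimension tables, which is one common way to lay out a star schema.

```python
import sqlite3

# In-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
-- The fact table holds keys into the dimensions plus numeric measures.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    city_id INTEGER REFERENCES dim_city(city_id),
    sales   INTEGER,  -- units sold
    price   REAL      -- unit price
);
""")

conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                 [(1, 2018, "Q1"), (2, 2018, "Q2")])
conn.executemany("INSERT INTO dim_city VALUES (?, ?, ?)",
                 [(1, "Seattle", "WA"), (2, "Austin", "TX")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 10, 2.0), (1, 2, 5, 4.0), (2, 1, 3, 2.0)])


def revenue_by_city(connection):
    """Join the fact table to one dimension and aggregate a measure."""
    rows = connection.execute("""
        SELECT c.city, SUM(f.sales * f.price) AS revenue
        FROM fact_sales AS f
        JOIN dim_city AS c ON f.city_id = c.city_id
        GROUP BY c.city
        ORDER BY c.city
    """).fetchall()
    return dict(rows)


result = revenue_by_city(conn)
print(result)
```

The dimension tables stay small and descriptive, while the fact table grows with every sale; analytical queries join the two and aggregate the measures.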
• Requires more processing time and disk space to perform some of the tasks that
multidimensional databases are designed for
• Teradata Active Enterprise Data Warehouse is the platform that runs the Teradata Database, with added data
management tools and data mining software.
• The data warehouse differentiates between “hot” and “cold” data: frequently used data is hot, while data that is not
often used is cold and is placed in a slower storage section.
• As of October 2010, Teradata used Xeon 5600 processors for the server nodes.
• Teradata Database 13.10 was announced in 2010 as the company’s database software for storing and processing
data.
• Teradata Database 14 was sold as the upgrade to 13.10 in 2011 and runs multiple data warehouse workloads at the
same time. It includes column-store analyses.
• Teradata Integrated Analytics is a set of tools for data analysis that resides inside the data warehouse.
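The hot/cold distinction mentioned in the list above can be illustrated with a small sketch. It is not Teradata's actual placement algorithm: the access-count rule, the threshold value, and the names assign_tiers and HOT_THRESHOLD are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical cutoff: a block accessed at least this often counts as "hot".
HOT_THRESHOLD = 3


def assign_tiers(access_log, threshold=HOT_THRESHOLD):
    """Label each data block 'hot' (fast storage) or 'cold' (slower storage)
    based on how many times it appears in the access log."""
    counts = Counter(access_log)
    return {
        block: ("hot" if n >= threshold else "cold")
        for block, n in counts.items()
    }


# Sample access log; the block names are made up.
tiers = assign_tiers(
    ["orders", "orders", "orders", "archive_2009", "orders", "customers"]
)
print(tiers)
```

A real system would likely weigh recency as well as frequency; the counter above only shows the basic idea of routing rarely used data to slower storage.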
Teradata https://www.teradata.com/
Industries they serve https://www.teradata.com/Solutions
• Oracle specializes in database software and technology, cloud engineered systems and enterprise software products,
particularly its own brands of database management systems
• In 2015 Oracle was the second-largest software maker by revenue, after Microsoft
• Oracle positions Oracle Database 18c as the industry standard for high-performance, scalable, optimized data warehousing.
• The company’s specialized platform for the data warehousing side is the Oracle Exadata Machine.
• There are an estimated 390,000 Oracle DBMS customers worldwide, and about 4,000 Exadata data warehousing
appliances have been sold.
Database Data Warehousing Guide
Oracle Warehouse in the Cloud
• Amazon is an American electronic commerce and cloud computing company, founded on July 5, 1994, by Jeff Bezos and
based in Seattle, Washington
• The shift of data storage and warehousing to the cloud over the last several years has been momentous, and
Amazon has been a market leader in that shift.
• Amazon offers a whole ecosystem of data storage tools and resources that complement its cloud services platform.
• For example, there is Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse cloud solution; AWS Data
Pipeline, a web service designed for transporting data between existing AWS data services; and Elastic MapReduce, which
provides an easily managed Hadoop solution on top of the AWS services platform.
• Big AWS customers: Airbnb, Adobe Systems, Johnson & Johnson, GE, BMW, Canon, NASA, and others
• AWS pulls in about $11 billion in revenue each year for Amazon.
• Three engineers from Google, Yahoo and Facebook (Christophe Bisciglia, Amr Awadallah and Jeff Hammerbacher,
respectively) joined with a former Oracle executive (Mike Olson) to form Cloudera in 2008
• Cloudera has emerged in recent years as a major enterprise provider of Hadoop-based data storage and processing
solutions.
• Cloudera offers an Enterprise Data Hub (EDH) as its version of an operational data store or data warehouse.
• The EDH is Cloudera’s proprietary framework for the “information-driven enterprise” and focuses on “batch
processing, interactive SQL, enterprise search, and advanced analytics—together with the robust security, governance,
data protection, and management that enterprises require.”
• Cloudera’s data warehouse is based on CDH, Cloudera’s distribution of Apache Hadoop, which is described as the
world’s largest Hadoop distribution.
• In 2013 MarkLogic released a new semantics platform that can store billions of RDF triples, which
can be queried with SPARQL (a semantic query language for RDF data).
• Customers: Aetna, NBC, BBC, Boeing, U.S. Navy, and U.S. Army.
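To make the idea of RDF triples concrete, here is a minimal sketch of a triple store as a list of (subject, predicate, object) tuples with wildcard matching. It is plain Python, not SPARQL and not MarkLogic's API; the match helper and the ex: identifiers are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard,
    loosely like a variable in a SPARQL triple pattern."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Sample data: each fact is one (subject, predicate, object) triple.
triples = [
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
    ("ex:Bob", "ex:worksFor", "ex:Acme"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
]

# Roughly: SELECT ?person WHERE { ?person ex:worksFor ex:Acme }
employees = [s for (s, _, _) in match(triples, predicate="ex:worksFor", obj="ex:Acme")]
print(employees)
```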
What Data Warehouse Do Big Companies Use?
Teradata Customers:
• Apple: operates a multi-petabyte Teradata system and was Teradata’s “fastest ever
customer to a petabyte.”
• eBay: its primary data warehouse is over 9.2 petabytes; its “singularity system” that
stores web clicks and other “big” data is more than 40 petabytes.
• Harrah’s (Caesars Entertainment): understands how much money particular
gamblers can afford to lose in a day before they won’t come back the next day.
• Disney: new bracelet tickets equipped with GPS and NFC that track everything visitors do
while inside Disney’s amusement parks.
• Continental Airlines: to keep its customers happy, began assessing them by lifetime
value and began making alternative arrangements for them as soon as the airline
realized flights would be delayed
• Facebook: Hive
Data Warehouse Developer
• The Data Warehouse Developer is responsible for the successful delivery of business intelligence information to the
entire organization and is experienced in BI development and implementations, data architecture and data
warehousing.
A. Creating star schema data models, performing ETLs and validating results with business representatives
B. Supporting implemented BI solutions by: monitoring and tuning queries and data loads, addressing user
questions concerning data integrity, monitoring performance and communicating functional and technical issues.
Data Warehouse Analyst
• A data warehouse analyst collects, analyzes, mines and helps the business leverage the information stored in data
warehouses.
• Professionals in this role research and recommend technology solutions related to data storage, reporting, importing
and other business concerns; they also define the user interfaces for managing the interaction between data.
• A data warehouse analyst is often expected to collaborate with business intelligence analysts and developers to
translate data requirements into logical data models.
ETL Developer
• Write scripts to extract data from the source systems
• Analyze data
• Familiar with SQL, Java, XML and several data warehouse architecture techniques such as EDW, ODS, DM,
ROLAP and MOLAP.
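The extract step listed above, together with the transform and load steps that usually follow it, can be sketched as a tiny ETL script using only the Python standard library. The source data, the orders table, and the cleaning rules (skip rows with no amount, normalize city capitalization) are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a source-system export; contents are made up.
SOURCE_CSV = """order_id,city,amount
1,seattle,20.50
2,Austin,
3,SEATTLE,5.25
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and normalize types and casing."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        clean.append(
            (int(row["order_id"]), row["city"].strip().title(), float(row["amount"]))
        )
    return clean


def load(rows, connection):
    """Load: write the cleaned rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, city, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes each one easy to test and to swap out, for example replacing the CSV reader with a query against a real source system.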
https://www.indeed.com/jobs?q=etl%20developer&l
Thank You