Collect Process Analyze
@Lynn Langit
AWS Marketplace
Enterprise software store for business users who need simplified procurement
world
• Pay-as-you-go pricing
• to use on demand
Data Enablement
Building a Data Warehouse on AWS
AWS Marketplace Partners
Matillion, Yellowfin
Setup
Our Scenario and Source Files
“In this scenario we will use Matillion ETL for Redshift to prepare two separate data sources ready for analysis.
The sample data is US airport flight information from 1995 -> 2008. Every flight to or from a US airport (and whether it left on time or not) is included.
The second data set is weather data, taken from NOAA, including the daily weather readings for each US Airport.”
File Types
-- Text - .csv
-- Compressed - .gz
File Categories
Details / Events
-- Flights
-- Weather
Metadata
-- Airports
-- Carriers
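As a minimal sketch of how the two file types above can be inspected before loading, the Python below parses CSV content whether it arrives as plain text or gzip-compressed. The helper name `read_rows` and the sample columns are assumptions for illustration; the real flight and weather files have their own layouts.

```python
import csv
import gzip
import io


def read_rows(raw: bytes, compressed: bool) -> list[dict]:
    """Parse CSV bytes into a list of dicts, decompressing .gz content first."""
    if compressed:
        raw = gzip.decompress(raw)
    text = io.StringIO(raw.decode("utf-8"))
    return list(csv.DictReader(text))


# Hypothetical sample rows, not taken from the actual data set.
sample = b"year,carrier,origin\n1995,AA,SFO\n2008,UA,ORD\n"

plain_rows = read_rows(sample, compressed=False)
gz_rows = read_rows(gzip.compress(sample), compressed=True)
```

Both calls yield the same rows, so the choice between .csv and .gz affects only storage and transfer size, not the parsed content.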
Loading data from S3 into Redshift
Using Matillion ETL for Redshift
• Create Instance (AMI/EC2) of Matillion/AWS Marketplace
• Connect Matillion to Redshift
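Redshift's own bulk-load path from S3 is the SQL COPY command. The slides do not show what Matillion runs under the hood, so the Python below is only a sketch of what an equivalent hand-written load could look like. The function name `build_copy_statement`, the table name, the bucket, and the IAM role ARN are all hypothetical.

```python
def build_copy_statement(table: str, s3_path: str, iam_role: str,
                         gzip_compressed: bool = False,
                         ignore_header_rows: int = 1) -> str:
    """Return a Redshift COPY statement for CSV files stored in S3."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role}'",
        "CSV",
    ]
    if gzip_compressed:
        # Tells COPY the source files are gzip-compressed (.gz).
        parts.append("GZIP")
    if ignore_header_rows > 0:
        # Skips the header line(s) at the top of each file.
        parts.append(f"IGNOREHEADER {ignore_header_rows}")
    return "\n".join(parts) + ";"


# Hypothetical names: replace with your own table, bucket, and role.
statement = build_copy_statement(
    table="flights",
    s3_path="s3://example-bucket/flights/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    gzip_compressed=True,
)
print(statement)
```

The resulting text can be run from any SQL client connected to the cluster. The function only builds a string and does not connect to AWS.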
Loading Data in Redshift
Table distribution styles
Distribution Key: same key to same location
All: all data on every node
Even: round robin distribution
(Diagram: rows with keys key1 to key4 placed on slices under each distribution style.)
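The three distribution styles can be illustrated with a small simulation. This is a simplified model, not Redshift's implementation: `distribute` is a hypothetical helper, `zlib.crc32` stands in for Redshift's internal hash function, and each slice is treated as one location even though Redshift keeps the ALL copy per node.

```python
import zlib
from collections import defaultdict


def distribute(rows, style, slices=4, key=None):
    """Assign rows to slices under a KEY, ALL, or EVEN style (simplified model)."""
    placement = defaultdict(list)
    for i, row in enumerate(rows):
        if style == "KEY":
            # Same key value always hashes to the same slice.
            # crc32 is a stand-in; Redshift's real hash function is internal.
            target = zlib.crc32(str(row[key]).encode()) % slices
            placement[target].append(row)
        elif style == "EVEN":
            # Round robin, regardless of row content.
            placement[i % slices].append(row)
        elif style == "ALL":
            # Full copy of the table in every location.
            for s in range(slices):
                placement[s].append(row)
        else:
            raise ValueError(f"unknown style: {style}")
    return dict(placement)


# Hypothetical rows keyed by airport code.
codes = ["SFO", "ORD", "SFO", "JFK", "ORD", "SFO", "LAX", "JFK"]
rows = [{"airport": code, "n": n} for n, code in enumerate(codes)]

by_key = distribute(rows, "KEY", key="airport")
even = distribute(rows, "EVEN")
everywhere = distribute(rows, "ALL")
```

In Redshift DDL these styles correspond to `DISTSTYLE KEY` (with a `DISTKEY` column), `DISTSTYLE EVEN`, and `DISTSTYLE ALL`. KEY keeps matching join values together but can skew if one key dominates, EVEN balances row counts, and ALL trades storage for local joins on small tables.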
(Diagram: a table's sorted and unsorted regions. Labels: Sort Unsorted Region, Merge, Append in Sort Key Order.)
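The sorted and unsorted regions can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Redshift's exact rules: `append_rows` and `vacuum_sort` are hypothetical helpers, and a single integer (a year) stands in for the sort key.

```python
import heapq


def append_rows(sorted_region, unsorted_region, new_rows):
    """Append rows; they extend the sorted region only if they arrive in sort key order."""
    in_order = all(a <= b for a, b in zip(new_rows, new_rows[1:]))
    follows_existing = (
        not sorted_region or not new_rows or new_rows[0] >= sorted_region[-1]
    )
    if in_order and follows_existing and not unsorted_region:
        return sorted_region + new_rows, unsorted_region
    # Out-of-order rows accumulate in the unsorted region.
    return sorted_region, unsorted_region + new_rows


def vacuum_sort(sorted_region, unsorted_region):
    """Sort the unsorted region, then merge it with the already sorted region."""
    newly_sorted = sorted(unsorted_region)
    return list(heapq.merge(sorted_region, newly_sorted))


# Rows appended in sort key order keep the table fully sorted.
s, u = append_rows([1995, 1996, 1997], [], [1998, 1999])

# Rows appended out of order land in the unsorted region.
s, u = append_rows(s, u, [1996, 1995])

# Sorting the unsorted region and merging restores a single sorted region.
merged = vacuum_sort(s, u)
```

The model shows why loading in sort key order is attractive: the first append needs no later sort and merge step, while the second leaves work behind for a maintenance pass.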
Visualizing with Yellowfin
Automate – https://github.com/lynnlangit/AWSDataWarehouse