Project Guidelines - Analytics Engineering
Main challenges
● The source datasets are largely unstructured and undocumented, so you will need to explore
them to understand their contents.
Main Tasks
● Choose a Dataset and design a Database Schema in third normal form (3NF), a.k.a.
Snowflake Schema (Facts + Dimensions), containing as much data from the original dataset as
possible.
● 1NF :
○ Each table cell should contain a single value (No lists)
○ No duplicate rows
● 2NF :
○ The table is in 1NF and every non-key field depends on the whole Primary Key, not on just part of it (no partial dependencies).
● 3NF :
○ The table is in 2NF and no non-key field depends on another non-key field (no transitive dependencies).
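As an illustration of the normal forms above, here is a minimal sketch using Python's built-in `sqlite3` module. The flat rows, table names, and column names are all hypothetical (loosely modelled on a games dataset) and are not part of the project data. The genre list is split into a separate table (1NF), and publisher attributes live in their own table so they are not repeated for every game (3NF).

```python
import sqlite3

# Hypothetical flat rows, invented for illustration:
# (title, publisher, publisher_country, semicolon-separated genres)
RAW_ROWS = [
    ("Game A", "Publisher X", "US", "Action;Indie"),
    ("Game B", "Publisher X", "US", "Strategy"),
    ("Game C", "Publisher Y", "FR", "Action"),
]

SCHEMA = """
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    country      TEXT              -- depends only on the publisher, so it lives here
);
CREATE TABLE genre (
    genre_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE game (
    game_id      INTEGER PRIMARY KEY,
    title        TEXT NOT NULL UNIQUE,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
-- Bridge table: one row per (game, genre) pair instead of a list in one cell.
CREATE TABLE game_genre (
    game_id  INTEGER NOT NULL REFERENCES game(game_id),
    genre_id INTEGER NOT NULL REFERENCES genre(genre_id),
    PRIMARY KEY (game_id, genre_id)
);
"""


def _get_id(conn, table, id_col, key_col, key):
    """Look up the surrogate key of an existing row."""
    sql = f"SELECT {id_col} FROM {table} WHERE {key_col} = ?"
    return conn.execute(sql, (key,)).fetchone()[0]


def build_normalized_db(rows):
    """Create the hypothetical schema in memory and load the flat rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for title, publisher, country, genres in rows:
        conn.execute(
            "INSERT OR IGNORE INTO publisher (name, country) VALUES (?, ?)",
            (publisher, country),
        )
        publisher_id = _get_id(conn, "publisher", "publisher_id", "name", publisher)
        conn.execute(
            "INSERT OR IGNORE INTO game (title, publisher_id) VALUES (?, ?)",
            (title, publisher_id),
        )
        game_id = _get_id(conn, "game", "game_id", "title", title)
        for genre in genres.split(";"):
            conn.execute("INSERT OR IGNORE INTO genre (name) VALUES (?)", (genre,))
            genre_id = _get_id(conn, "genre", "genre_id", "name", genre)
            conn.execute(
                "INSERT OR IGNORE INTO game_genre (game_id, genre_id) VALUES (?, ?)",
                (game_id, genre_id),
            )
    conn.commit()
    return conn
```

Calling `build_normalized_db(RAW_ROWS)` returns an open connection whose tables can be queried directly. A real dataset will need its own keys and tables; the point is only the shape of the decomposition.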
Build ETL Pipelines for Data Warehousing
Main challenges
● ETL processes must be fully automated and driven by the folder and file structure,
since there are hundreds of files.
● Specific libraries might be needed for loading uncommon data types.
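One possible way to handle both challenges is to walk the folder tree and dispatch each file to a loader chosen by its extension. The sketch below uses only the standard library; the `LOADERS` registry and the function names are assumptions made for illustration. An uncommon format would get its own entry in the registry, backed by whichever specialised library applies.

```python
import csv
import json
from pathlib import Path


def load_csv(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_json(path):
    """Read a JSON file into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical registry: map a file extension to the function that loads it.
# A loader for an uncommon format would be registered here as well.
LOADERS = {".csv": load_csv, ".json": load_json}


def extract_all(root):
    """Walk `root` recursively and load every file that has a known loader.

    Returns (loaded, skipped): `loaded` maps each relative path to its
    content, `skipped` lists the files with no registered loader.
    """
    root = Path(root)
    loaded, skipped = {}, []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        loader = LOADERS.get(path.suffix.lower())
        if loader is None:
            skipped.append(path)
        else:
            loaded[path.relative_to(root).as_posix()] = loader(path)
    return loaded, skipped
```

Returning the skipped files, rather than silently ignoring them, makes it easy to see which formats still need a loader.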
Main Tasks
● Build ETL pipelines to populate the Database:
○ Extract : Load the data from the original sources (DO NOT DOWNLOAD THE DATASET MANUALLY:
download it from the link and unzip it programmatically)
○ Transform : Merging, Encoding, Granularity
○ Load : Normalize and Load the data in the target system (SQL Database)
● Run the pipeline to integrate as much data from the original dataset as possible
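The three steps above could be wired together roughly as follows. This is a minimal sketch under stated assumptions: the archive is a zip of CSV files with hypothetical `title` and `publisher` columns, the target is a SQLite database, and all function and table names are invented for illustration. The download URL is a parameter, not a real link.

```python
import csv
import shutil
import sqlite3
import urllib.request
import zipfile
from pathlib import Path


def extract(url, workdir):
    """Download the archive at `url` and unzip it under `workdir`/raw."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    archive = workdir / "dataset.zip"
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)
    raw_dir = workdir / "raw"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(raw_dir)
    return raw_dir


def transform(raw_dir):
    """Merge every CSV under `raw_dir`, trim whitespace, drop duplicate rows."""
    rows = set()
    for path in sorted(Path(raw_dir).rglob("*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                # Column names are assumptions about the hypothetical files.
                rows.add((record["title"].strip(), record["publisher"].strip()))
    return sorted(rows)


def load(rows, conn):
    """Normalize (title, publisher) pairs into two tables."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS publisher (
            publisher_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS game (
            game_id INTEGER PRIMARY KEY,
            title TEXT NOT NULL UNIQUE,
            publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
        );
    """)
    for title, publisher in rows:
        conn.execute("INSERT OR IGNORE INTO publisher (name) VALUES (?)", (publisher,))
        conn.execute(
            "INSERT OR IGNORE INTO game (title, publisher_id) "
            "SELECT ?, publisher_id FROM publisher WHERE name = ?",
            (title, publisher),
        )
    conn.commit()


def run_pipeline(url, workdir, conn):
    """Run Extract, Transform and Load end to end."""
    load(transform(extract(url, workdir)), conn)
```

Because `load` uses `CREATE TABLE IF NOT EXISTS` and `INSERT OR IGNORE`, running the pipeline twice against the same database does not create duplicate rows.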
Create your own Analytics Application
Main challenges
● The Dataset will need to be denormalized to increase the efficiency of SQL queries (Star
Schema 2NF)
Main Tasks
● Build a BI Dashboard (PowerBI / Streamlit) or an ML model (Streamlit optional)
● The dataset used by the app must be a View built from the normalized dataset (a single SQL query should be enough to build it)
● Examples for Chess:
○ Openings Explorer (BI)
○ Player Repertoires and KPIs (BI)
● Examples for Steam Games:
○ Predict Sales for each game (ML)
○ Games/Publisher/Genre KPIs (BI)
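The View requirement above can be sketched with `sqlite3` as shown below. The schema, the sample rows, and the view name `app_games` are hypothetical and chosen only for illustration. The single `CREATE VIEW` statement joins the normalized tables back into one flat, denormalized dataset that a dashboard or model could read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized tables with a few invented rows.
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE game (
    game_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
CREATE TABLE game_genre (
    game_id INTEGER NOT NULL REFERENCES game(game_id),
    genre_id INTEGER NOT NULL REFERENCES genre(genre_id),
    PRIMARY KEY (game_id, genre_id)
);
INSERT INTO publisher VALUES (1, 'Publisher X'), (2, 'Publisher Y');
INSERT INTO genre VALUES (1, 'Action'), (2, 'Indie'), (3, 'Strategy');
INSERT INTO game VALUES (1, 'Game A', 1), (2, 'Game B', 1), (3, 'Game C', 2);
INSERT INTO game_genre VALUES (1, 1), (1, 2), (2, 3), (3, 1);

-- The single query that builds the denormalized dataset used by the app.
CREATE VIEW app_games AS
SELECT g.game_id,
       g.title,
       p.name  AS publisher,
       ge.name AS genre
FROM game AS g
JOIN publisher  AS p  ON p.publisher_id = g.publisher_id
JOIN game_genre AS gg ON gg.game_id = g.game_id
JOIN genre      AS ge ON ge.genre_id = gg.genre_id;
""")


def games_per_genre(conn):
    """Example KPI computed directly on the view: number of games per genre."""
    sql = "SELECT genre, COUNT(*) FROM app_games GROUP BY genre ORDER BY genre"
    return conn.execute(sql).fetchall()
```

Because the app only reads from `app_games`, the normalized tables can keep evolving as long as the view continues to expose the same columns.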
Deadlines & Deliverables