Intro Slides


Data Preprocessing and Feature Engineering Techniques
@ AIMS Cameroon
25 Sept - 14 Oct, 2023

Rockefeller,
Stellenbosch University, South Africa
Who am I?

• My name is Rockefeller.
• You can call me Tonton Rock if you like.
• I was born in Douala, Cameroon.

• Data Scientist Consultant and Trainer
• PhD Candidate in A.I., Stellenbosch University, South Africa.
• Research focuses on Deep Learning methods applied to Dynamical Systems.

rockefeller@aims.ac.za
FACTS

• Fitting models with raw data is (often) the guarantee of building biased models.

• Data literacy on the African continent is still quite low.
Data Science Project Life Cycle

Simple! Right?

Problem Statement → Data Collection → Data preprocessing → Modeling → Evaluation → Deployment

Data Science Project Life Cycle

Well, life is not that simple!

Problem Statement → Data Collection → Data preprocessing → Modeling → Evaluation → Deployment → Feedback → back to Problem Statement

Data preprocessing: Data Reading, Data visualization, Data cleaning, Data normalization
on

• i.i.d. Data
• Time Series
• Image Data
• Text Data
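
To make the reading, cleaning and normalization steps concrete, here is a minimal sketch on i.i.d. data. It is only an illustration, not the course material: the toy table, the column names `height_cm` and `weight_kg`, the choice of mean imputation and the helper `min_max` are all assumptions made for this example, and it uses only the Python standard library.

```python
import csv
import io

# Hypothetical toy data, invented for illustration only.
# The second observation has a missing weight.
raw_csv = io.StringIO("height_cm,weight_kg\n170,65\n180,\n160,55\n")

# Data reading: parse each row into a dict; empty cells become None.
rows = [
    {key: (float(value) if value else None) for key, value in row.items()}
    for row in csv.DictReader(raw_csv)
]

# Data cleaning: replace missing weights with the mean of the observed ones
# (mean imputation is one possible choice among many).
observed = [r["weight_kg"] for r in rows if r["weight_kg"] is not None]
mean_weight = sum(observed) / len(observed)
for r in rows:
    if r["weight_kg"] is None:
        r["weight_kg"] = mean_weight


# Data normalization: min-max scale a feature to the range [0, 1].
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


heights = min_max([r["height_cm"] for r in rows])
weights = min_max([r["weight_kg"] for r in rows])
print(heights, weights)
```

After scaling, both features live on the same [0, 1] range, so neither dominates a model simply because of its units.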
Part 0 : The Data Science Ecosystem

1. The Data Science Ecosystem

2. Getting started with Jupyter and Colab

3. Introduction to Python for Data Science


Part 1 : Dealing with i.i.d. data
1. Working with Series and DataFrames

2. Data Reading Methods

3. Introducing Features and Observations

4. Handling Text Data

5. Grouping the Data

6. Basic Data Explorations

7. Data Organization Methods

8. Customizing Functions
Part 2 : Dealing with Time Series

1. Working with Time Data

2. Basic Data Manipulation on Time Series

3. Advanced Manipulation on Time Series

4. Framing Time Series for Machine Learning


Part 3 : Dealing with image data

1. Introduction to Image Data

2. Image Pre-processing operations

3. Advanced Image Pre-processing operations

4. Feature Extraction from Image

5. Preparing Image Data for Model Training


Part 4 : Dealing with Text Data

1. Introduction to Text Data

2. Text Mining Operations

3. Feature Extraction from Text Data

4. Word Embeddings
Some tips!!!

1. It is a practical data analysis course, not a programming course!!!

2. Focus on building your data literacy, not on copy-pasting code.

3. Do not code while I am teaching; you will have plenty of time for that.
Tips for success

1. Ask Questions

2. Ask Questions again

3. Ask Questions again and again


Course Outline

Tuesday Wednesday Thursday Friday Saturday


Lectures Lectures • Quiz 1 Lectures
• Lectures Lectures
• Assignment 1
(release)

Lectures • Lectures Lectures • Quiz 3


• Quiz 2 • Lectures • Lectures
• Assignment 2
(release)

• Lectures Lectures • Lectures Lectures Group


• Group Assignment • Quiz 4 Presentations
(release)
