
Data Science

Revision
• What is Data Science?
• What is Big Data?
• What’s driving the Data Deluge?
• 6 Vs of Big Data – Volume, Variety, Velocity, Veracity & Validity, Value and Vulnerability
• Facets of Big Data – Structured, Semi-Structured, Quasi-Structured and Unstructured Data
• Emerging Big Data Ecosystem – Data Devices, Data Collectors, Data Aggregators, Data Users/Buyers
Data Science Lifecycle

Lifecycle Phases –
1. Discovery
2. Data Preparation
3. Model Planning
4. Model Building
5. Communicate Results
6. Operationalize
Phase 1 – Discovery
1. Learning the business domain
2. Resources
3. Framing the problem
4. Identifying the key stakeholders
• Business User
• Project Sponsor
• Project Manager
• Business Intelligence Analyst
• DBA
• Data Engineer
• Data Scientist
5. Identifying the potential data sources
Phase 2 – Data Preparation
1. Preparing the Analytic sandbox
2. Data Cleansing
3. Combining Data
4. Data Transformation
5. Common tools –
• Hadoop
• OpenRefine
• Alpine Miner
• Data Wrangler
Data Cleansing – Common Errors

[Figure: outliers marked in the data]
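Outliers like those flagged on this slide can be detected programmatically. The sketch below uses the interquartile range (IQR) rule, which is one common convention and not necessarily the method the slide had in mind. The function name `find_outliers`, the multiplier `k=1.5`, and the sample `ages` list are illustrative assumptions.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: 240 is a likely data-entry error for an age column.
ages = [23, 25, 27, 29, 31, 33, 35, 240]
print(find_outliers(ages))  # prints [240]
```

Whether a flagged value is an error or a genuine extreme is a judgment for the analyst; the rule only points at candidates.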
Data Cleansing – Techniques for handling missing values
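The slide's original list of techniques is not preserved here, so the sketch below shows two widely used options as an assumption: dropping records with a missing value, and imputing the missing value with the mean or median of the observed ones. The helper names and the `customers` sample are hypothetical.

```python
from statistics import mean, median


def drop_missing(rows, field):
    """Keep only the rows where `field` has a value."""
    return [r for r in rows if r.get(field) is not None]


def impute(rows, field, strategy=mean):
    """Return copies of the rows with missing `field` values filled in.

    `strategy` is any function of the observed values, e.g. mean or median.
    """
    observed = [r[field] for r in rows if r.get(field) is not None]
    fill = strategy(observed)
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in rows]


# Hypothetical sample with one missing income.
customers = [
    {"id": 1, "income": 40000},
    {"id": 2, "income": None},
    {"id": 3, "income": 60000},
]

print(drop_missing(customers, "income"))    # two rows remain
print(impute(customers, "income"))          # missing income filled with the mean
print(impute(customers, "income", median))  # or with the median
```

Dropping rows is simple but loses data; imputation keeps every row at the cost of introducing estimated values.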
Combining Data

Example – Joining 2 tables
Example – Appending 2 tables
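The two ways of combining data named above can be sketched in plain Python, treating each table as a list of dictionaries. Joining matches rows from two tables on a shared key and widens them; appending stacks tables that have the same columns. The table contents and function names are hypothetical, and an inner join is assumed since the slide does not say which join type it showed.

```python
def inner_join(left, right, key):
    """Combine rows from both tables that share the same `key` value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def append_tables(top, bottom):
    """Stack two tables that have the same columns."""
    return top + bottom


# Joining: customer details enriched with their orders.
customers = [{"cust_id": 1, "name": "Ana"}, {"cust_id": 2, "name": "Ben"}]
orders = [
    {"cust_id": 1, "amount": 30},
    {"cust_id": 1, "amount": 45},
    {"cust_id": 3, "amount": 10},  # no matching customer, dropped by the join
]
joined = inner_join(customers, orders, "cust_id")

# Appending: monthly extracts with identical columns.
january = [{"cust_id": 1, "amount": 30}]
february = [{"cust_id": 2, "amount": 20}]
appended = append_tables(january, february)

print(joined)
print(appended)
```

A join adds columns to matching rows, while an append adds rows and leaves the columns unchanged.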


Example - Data Transformation
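The slide's own transformation example did not survive extraction. As a stand-in, the sketch below shows min-max scaling, one common transformation that rescales a numeric variable to the range 0 to 1 so that variables measured in different units become comparable. The function name and sample values are assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample: monthly spend in some currency.
spend = [10, 20, 30]
print(min_max_scale(spend))  # prints [0.0, 0.5, 1.0]
```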
Phase 3 – Model Planning

1. Data Exploration
2. Model Selection
3. Common Tools for Model Planning Phase
• R
• Matlab
• SAS
Example - Data Exploration
[Figures: Bar Chart, Line Chart, Distribution Chart, Animated Visualization]
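Data exploration usually starts with summary statistics and simple charts like those named above. The sketch below computes a basic numeric summary and draws a text-only bar chart of category counts, so it runs without a plotting library. The function names and sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize(values):
    """Return basic descriptive statistics for a numeric variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }


def text_bar_chart(categories):
    """Return one line per category, most frequent first, with a bar of '#'."""
    counts = Counter(categories)
    return [f"{label:<8} {'#' * n} ({n})" for label, n in counts.most_common()]


# Hypothetical sample data.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
regions = ["north", "south", "north", "east", "north", "south"]

print(summarize(scores))
for line in text_bar_chart(regions):
    print(line)
```

In practice a plotting tool would replace the text chart; the point is to look at the shape of the data before choosing a model.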


Phase 4 – Model Building

1. Common Tools for Model Building Phase
• SPSS Modeler
• Matlab
• Statistica and Mathematica
• Alpine Miner
• R and PL/R
• Python
• Octave
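To make the model building phase concrete, here is a minimal sketch in Python, one of the tools listed above. It fits a straight line to a training set by ordinary least squares and checks the error on a separate test set. The choice of simple linear regression, the train/test split, the function names, and the data are all assumptions for illustration; the slides do not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given data."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical data, split into a training set and a held-out test set.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
test_x, test_y = [5, 6], [11, 13]

slope, intercept = fit_line(train_x, train_y)
print(slope, intercept)
print(rmse(test_x, test_y, slope, intercept))
```

Evaluating on data the model has not seen gives a more honest picture of how it may behave once operationalized.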
Phase 5 – Communicating Results
Choose an effective medium and channel
• Standalone graphics or narrated?
• Static, interactive, animated, or combined graphics?
• If narrated: recorded, live or both?
• If live: remote, in person, or both?

The famous Gapminder video by Hans Rosling:
200 Countries, 200 Years, 4 Minutes
https://www.youtube.com/watch?feature=player_embedded&v=jbkSRLYSojo
OR
200 years that changed the world – Gapminder
Phase 6 – Operationalize

1. Small-scope pilot deployment
2. Full deployment
