Class 1 - Intro To Data Science


Alejandra Endara

Introduction to Data Science

2022
Google Digital Academy Facilitator /
Founder Loud Consulting

● Digital / Brand strategist, consultant and entrepreneur.
● Alejandra is the only Latin American and
Ecuadorian woman in the Google Digital
Academy team of facilitators, in charge of
leading digital transformation programs
across Europe, the Middle East and Africa, with
more than 100 workshops taught on 5 continents.
● Alejandra comes from the Fast Moving
Consumer Goods industry, with more than 8 years of
experience in brand marketing,
innovation and digital transformation across
LATAM, Europe and the US.
● Alejandra is also a professor and
practitioner at schools in Europe, the US
and LATAM such as ISDI, GBSB, General
Assembly and Colectivo 23.

BARCELONA | MADRID | MALTA | ONLINE


weareloud.group
Topics: Privacy · Test and Learn · Omnichannel · Content · Video Campaigns · Digital Marketing Maturity · Data Activation · Grow with Apps · Activate Insights · Automation · Design Thinking
Ruta 01 (Route 01)

Ground rules:
• Parking lot
• Be present
• Don't be Charlie
• Be OK if it's not perfect



Agenda

• What is Data Science?
• The importance and power of Data
• Data collecting and computing
• Scientific method
• Algorithms and Data Types
• Data Science Software
• Data Extraction Software
• Data Analysis Software
• Data Visualization Software

What is Data Science?

• New age
• New terminology
• New technology
• Interdisciplinary field

Figure: Data Science draws on Data Engineering, the Scientific Method, Visualization, Advanced Math, Computing and Statistics.
Every Industry Needs Data Scientists

• Data is powerful
The Power of Data

Figure: the Data → Information → Knowledge → Wisdom pyramid, contrasting the age of knowledge with the age of data science.
The Importance of Data Science

Figure: the Data → Information → Knowledge → Wisdom pyramid.
Data Collecting and Computing

Two main reasons for the rise of data science:

• Massive amounts of data
• Computer processing power
Scientific Method

Figure: the scientific method as a cycle: Observation / Questions → Hypotheses → Performing experiments → Analyzing → Conclusions.
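As a minimal sketch of how the cycle maps onto data work, the Python below turns a question ("does the treatment group score higher?") into a hypothesis, treats the collected numbers as the experiment, analyzes them with an exact one-sided permutation test, and returns a p-value for the conclusion step. The permutation test is a technique chosen here for illustration, not one named on the slide; the function name `exact_permutation_test` and the sample numbers are hypothetical. Enumerating every split is only practical for small samples.

```python
from itertools import combinations


def exact_permutation_test(group_a, group_b):
    """One-sided exact permutation test for H1: mean(group_a) > mean(group_b).

    Returns (observed mean difference, p-value).
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed_sum = sum(group_a)
    observed_diff = observed_sum / n_a - (sum(pooled) - observed_sum) / n_b

    # Under the null hypothesis the group labels do not matter, so every way
    # of choosing n_a values as "group A" is equally likely.
    # With the pooled total fixed, mean(A) - mean(B) grows with sum(A),
    # so comparing sums is equivalent and avoids floating-point ties.
    extreme = 0
    total = 0
    for chosen in combinations(pooled, n_a):
        total += 1
        if sum(chosen) >= observed_sum:
            extreme += 1
    return observed_diff, extreme / total


# Hypothetical experiment: scores with (treatment) and without (control) a change.
treatment = [12, 14, 15]
control = [9, 10, 11]
diff, p_value = exact_permutation_test(treatment, control)
print(diff, p_value)
```

In the conclusion step, a small p-value counts as evidence against the null hypothesis; the threshold should be fixed before the experiment is run.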
Data Scientific Method

Figure: The Gaussian Data Scientific Method
Algorithms

• A finite sequence of well-defined instructions
  • to solve a problem
  • to perform a computation

Input → Algorithm (run by a brain or a CPU) → Output

An algorithm is a set of rules that precisely defines a sequence of operations.
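A classic illustration of the definition above is Euclid's algorithm for the greatest common divisor, sketched here in Python. It takes an input, applies one precisely defined rule repeatedly, is guaranteed to finish after a finite number of steps, and produces an output. The example is chosen for illustration and is not taken from the slides.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    # Input: two integers. Each pass applies one well-defined rule:
    # replace (a, b) with (b, a mod b).
    while b != 0:
        a, b = b, a % b
    # The loop always terminates because a % b < b, so b shrinks to 0.
    # Output: the last non-zero value.
    return a


print(gcd(48, 18))  # 6
```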

Data Types

• “Data are individual facts, statistics, or items of information, often numeric.” (Wikipedia)
• “A datum (singular of data) is a single value of a single variable”. (Wikipedia)
• Structured data
• Unstructured data
• Semi-structured data
• External data
• Metadata
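To make the categories concrete, here is a small Python sketch using only the standard library. The CSV string stands for structured data (fixed columns), the JSON string for semi-structured data (self-describing keys, records that can differ in shape), the sentence for unstructured text, and the final dictionary for metadata (data about the data). All values and field names are hypothetical toy examples; external data is not shown.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (here a CSV string).
structured = "customer_id,country,spend\n1,ES,120.5\n2,EC,80.0\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but records may differ (here JSON).
semi_structured = '[{"id": 1, "tags": ["sale"]}, {"id": 2, "note": "no tags"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no predefined schema.
unstructured = "Great service, but delivery was slow."
words = unstructured.lower().split()

# Metadata: data about data, e.g. a description of the structured table.
metadata = {
    "source": "example CSV",
    "columns": list(rows[0].keys()),
    "row_count": len(rows),
}
print(metadata)
```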

Designing a Program

Three steps in designing a program:

1. Understand the task that the program is to perform.
• Learn and analyze each step.
2. Determine the steps that must be taken to perform the task.
• Create an algorithm: step-by-step directions that solve the problem.
3. Implement the software so that it meets the requirements.
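The three steps can be sketched on a small, hypothetical task: computing the average sale amount per country. The first two steps live in the comments as a plain-language task statement and algorithm; the function `average_by_country` is the third step, the implementation. The task and names are illustrative assumptions, not from the course.

```python
from collections import defaultdict

# Step 1 (understand the task): given (country, amount) sales records,
# report the average amount per country.

# Step 2 (algorithm):
#   1. For each record, add the amount to a running total for its country
#      and increment that country's count.
#   2. Divide each country's total by its count.


# Step 3 (implement):
def average_by_country(sales):
    totals = defaultdict(float)
    counts = defaultdict(int)
    for country, amount in sales:
        totals[country] += amount
        counts[country] += 1
    return {country: totals[country] / counts[country] for country in totals}


print(average_by_country([("ES", 100.0), ("ES", 50.0), ("EC", 80.0)]))
```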

Data Science Software
Extract Knowledge, Analyze Data and Generate Insights

• Data Extraction Software:
  • SAP: enterprise software that companies use to manage their business operations and customer relations.
  • SQL stands for Structured Query Language.
  • NoSQL ("not only SQL") covers non-relational databases.

• Data Analysis Software:
  • Python and R are both open-source programming languages with large communities.
  • SPSS is statistical software for data management, advanced analytics, multivariate analysis and business intelligence.
  • MATLAB is a proprietary multi-paradigm programming language and numeric computing environment.
  • Minitab is statistical software often used in statistics education.
  • Stata is an integrated statistical software package that provides data manipulation, statistics and automated reporting.

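• Big Data Analysis Software:
  • Hadoop uses distributed processing to run analyses on large datasets.
  • Spark also uses distributed processing, but it keeps data in memory and supports stream processing, which makes it generally faster than Hadoop MapReduce and better suited to near-real-time analysis.
  • Hive is a SQL-to-Hadoop engine: it lets users query data stored in Hadoop with a SQL-like language.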

• Data Visualization and Reporting Software:
  • Tableau was created with data visualization in mind.
  • Power BI aims to support informed, confident business decisions by putting data-driven insights into everyone's hands.

Data Extraction Software

• SQL stands for Structured Query Language.
  • The classical approach to storing and operating on data
  • Still efficient for querying structured data
• NoSQL ("not only SQL") refers to non-relational databases.
  • Designed for unstructured data platforms and Big Data applications
• NewSQL refers to relational (SQL) databases that add the horizontal scalability of NoSQL systems.
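A minimal sketch of extracting structured data with SQL, using Python's built-in `sqlite3` module and an in-memory database so it runs without a server. The `orders` table and its rows are hypothetical. The query only declares *what* to retrieve (orders counted and summed per country); the database engine decides *how*.

```python
import sqlite3

# An in-memory SQLite database stands in for a relational (SQL) data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (country, amount) VALUES (?, ?)",
    [("ES", 120.5), ("EC", 80.0), ("ES", 30.0)],
)

# Extract: aggregate the structured rows per country.
query = """
    SELECT country, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY country
    ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
conn.close()
print(result)
```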

Data Analysis Software

Software     Cost                   Analysis capability   Field of expertise
Python & R   Free                   Strong                Any
Tableau      Free (Tableau Public)  Strong                Any
SPSS         $                      Moderate              Social Science / Medicine
MATLAB       $$                     Strong                Life Sciences
Minitab      $$$                    Moderate              Business
Stata        $                      Moderate              Academia
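As a small taste of the "Python" row of the table, the sketch below computes descriptive statistics with Python's standard-library `statistics` module, with no extra packages. The sales figures are hypothetical; in practice analysts often reach for libraries such as pandas for the same work.

```python
import statistics

# Hypothetical daily sales figures; any numeric sample works.
sales = [120, 135, 150, 110, 160, 145, 130]

summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation (n - 1)
    "quartiles": statistics.quantiles(sales, n=4),  # cut points Q1, Q2, Q3
}
print(summary)
```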

Big Data Analysis Software

• Hadoop uses distributed processing to run analyses on large datasets.
• Spark also uses distributed processing, but it keeps data in memory and supports stream processing, which makes it generally faster than Hadoop MapReduce and better suited to near-real-time analysis.
• Hive is a SQL-to-Hadoop engine: it lets users query data stored in Hadoop with a SQL-like language (HiveQL).
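The idea behind Hadoop's distributed processing is the MapReduce model: a map step runs independently on each chunk of data, a shuffle step groups intermediate results by key, and a reduce step combines each group. The sketch below simulates a word count in a single Python process; it is not Hadoop's API, and the function names and input strings are hypothetical. On a real cluster, each split would live on a different machine and the map calls would run in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}


# Each string plays the role of a data split stored on a different node.
splits = ["big data needs big clusters", "spark and hadoop process big data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```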

Data Visualization and Reporting Software

• Tableau was created with data visualization in mind.
• Power BI aims to support informed, confident business decisions by putting data-driven insights into everyone's hands.
• Other software: D3, Sisense, Datawrapper, …

Data Scientist Jobs

Roles:
• Data Scientist
• Data Analyst
• Market Analyst
• Business Analyst
• Statistician
• Researcher
• …

Key skills:
• Data analysis and machine learning
• Databases
• Data visualization

Summary

• What is Data Science?
• The importance and power of Data
• Data collecting and computing
• Scientific method
• Algorithms and Data Types
• Data Science Software
• Data Extraction Software
• Data Analysis Software
• Data Visualization Software

Madrid Campus: C/Numancia 6, 28039 Madrid info@global-business-school.org


Barcelona Campus: Carrer d'Aragó, 179 08011 Barcelona +34 930 086 588
Class 2
Data Types and Analytic Tools
• Data Science Process
• Data Types
• Data Analysis Software
• Intro to Data Wrangling
• Displaying Quantitative Data
• Overview about Tableau

Thank You
