Data Science - Introduction - 1

Introduction To Data Science

Introduction

“It is a capital mistake to theorize before one has data. Insensibly, one begins
to twist the facts to suit theories, instead of theories to suit facts.”

— Sherlock Holmes

- What is data science?

- It is a multidisciplinary blend of data inference, algorithm development, and
technology used to solve analytically complex problems.

- A field of study and practice that involves the collection, storage, and
processing of data in order to derive important insights into a problem or a
phenomenon.

- Data science includes defining, storing, cleaning, retrieving, and analyzing data.

- The goal is to derive meaningful insights that support making decisions and solving problems.



- Why is data science so important now?

- We have a lot of data, and we continue to generate a staggering amount of it
at an unprecedented and ever-increasing speed.
- Analyzing data wisely requires competent, well-trained practitioners.
- Analyzing such data can provide actionable insights.

- 3V model

1. Velocity: The speed at which data is accumulated.

2. Volume: The size and scope of the data.

3. Variety: The massive array of data and types (structured and unstructured).

- Data is everywhere!
Humans and machines are constantly creating new data!

Data volume has grown 50-fold compared with what was available at the beginning of 2010.

- If your computer has a 1-terabyte (TB) hard drive, 40 zettabytes (ZB) is 40 billion times that.
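
The comparison above is plain unit arithmetic. A minimal sketch, assuming decimal (SI) units where 1 TB is 10^12 bytes and 1 ZB is 10^21 bytes; the constant names are illustrative only:

```python
# Decimal (SI) byte units; these definitions are an assumption of this sketch.
TB = 10**12  # bytes in one terabyte
ZB = 10**21  # bytes in one zettabyte

# How many 1 TB hard drives fit into 40 ZB?
ratio = (40 * ZB) // TB
print(f"{ratio:,} drives")  # 40,000,000,000 drives
```

With binary units (powers of 1024) the ratio would differ slightly, but the order of magnitude stays the same.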

- A data scientist should have at least three basic skills:

1. A strong knowledge of basic statistics and machine learning, or at least enough
to avoid mistaking correlation for causation or extrapolating too much from a
small sample size.

2. The computer science skills to take an unruly dataset and use a programming
language (like R or Python) to make it easy to analyze.

3. The ability to visualize and express their data and analysis in a way that is
meaningful to somebody less conversant in data.

- Skills that employers expect from data scientists:

1. Willingness to experiment: Intellectual curiosity and the ability to experiment. Employers are seeking
applicants who can ask questions to define intelligent hypotheses and explore the data using
basic statistical methods and models.

2. Proficiency in mathematical reasoning: Mathematical and statistical knowledge is a critical skill for
a potential applicant seeking a job in data science.

3. Data literacy: The ability to extract meaningful information from a dataset. A skilled data scientist
plays an integral role for businesses through an ability to assess a dataset for relevance and suitability
for the purpose of interpretation, to perform analysis, and to create meaningful visualizations that tell
valuable data stories.

- What is present in the dataset?
- What are the benefits of this dataset?

- What are your questions?

- The data is sorted.

- Height ranges from 58 to 72 inches.

- Weight ranges from 115 to 164 pounds.

- Average height: 65 inches
- Average weight: about 136.7 pounds (taken as 136 in the calculation that follows)
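
These first observations can be reproduced in a few lines. A minimal sketch: the values are copied from the 15 height and weight observations of this dataset, and the helper name `summarize` is hypothetical:

```python
# Height (inches) and weight (pounds) for the 15 observations in this dataset.
heights = list(range(58, 73))
weights = [115, 117, 120, 123, 126, 129, 132, 135,
           139, 142, 146, 150, 154, 159, 164]

def summarize(values):
    """Return (minimum, maximum, mean) of a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)

h_min, h_max, h_mean = summarize(heights)
w_min, w_max, w_mean = summarize(weights)

# Check that both columns are already in ascending order.
is_sorted = heights == sorted(heights) and weights == sorted(weights)

print(f"sorted: {is_sorted}")
print(f"height: {h_min} to {h_max}, mean {h_mean:.1f}")
print(f"weight: {w_min} to {w_max}, mean {w_mean:.1f}")
```

The mean weight comes out slightly below 137 pounds, so a figure of 136 is that value with the fraction dropped.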

- What else?

- An increase in height correlates with an increase in weight.
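
One way to put a number on this relationship is the Pearson correlation coefficient. The slides do not name a specific measure, so treat the choice as an assumption of this sketch; `pearson_r` is a hypothetical helper written out by hand:

```python
import math

# Height (inches) and weight (pounds) for the 15 observations in this dataset.
heights = list(range(58, 73))
weights = [115, 117, 120, 123, 126, 129, 132, 135,
           139, 142, 146, 150, 154, 159, 164]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")  # r = 0.995
```

A value this close to 1 indicates a strong positive linear association. It says nothing about causation.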

- On average, how much increase can we expect in weight with an increase of one inch in height?
1. 72 – 58 = 14 inches
2. 164 – 115 = 49 pounds
3. 49 / 14 = 3.5 pounds per inch

- Dig deeper!

- The weight change with respect to the height change is not that uniform.
From the minimum to the average height and weight:

1. 65 – 58 = 7 inches
2. 136 – 115 = 21 pounds
3. 21 / 7 = 3 pounds per inch

- For heights greater than 65 inches, weight increases more rapidly (mostly by
4 pounds per inch up to 70 inches, and by 5 pounds per inch beyond 70 inches).
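
The same reasoning can be checked by computing the overall rate and the change between consecutive observations. A minimal sketch, with values copied from this dataset and illustrative variable names:

```python
# Height (inches) and weight (pounds) for the 15 observations in this dataset.
heights = list(range(58, 73))
weights = [115, 117, 120, 123, 126, 129, 132, 135,
           139, 142, 146, 150, 154, 159, 164]

# Overall rate: total weight change divided by total height change.
overall_rate = (weights[-1] - weights[0]) / (heights[-1] - heights[0])

# Change between each observation and the next one.
dh = [b - a for a, b in zip(heights, heights[1:])]
dw = [b - a for a, b in zip(weights, weights[1:])]

print(overall_rate)  # 3.5
print(dw)            # [2, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 4, 5, 5]
```

The per-inch changes start at 2 to 3 pounds and end at 5, which is why a single average of 3.5 hides part of the picture.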

H: height (inches), W: weight (pounds), DH and DW: change in height and weight to the next observation.

Obs.  H    W    DH   DW
1     58   115  1    2
2     59   117  1    3
3     60   120  1    3
4     61   123  1    3
5     62   126  1    3
6     63   129  1    3
7     64   132  1    3
8     65   135  1    4
9     66   139  1    3
10    67   142  1    4
11    68   146  1    4
12    69   150  1    4
13    70   154  1    5
14    71   159  1    5
15    72   164

[Figure: weight change with respect to the height change, observations 1 to 14; legend: Series1, Series2]

- What is the weight of a woman who is 73 inches tall?
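
The slides leave this question open. Two simple ways to estimate an answer are sketched below: a least-squares straight line through all 15 observations, and continuing the last observed step. Both are assumptions of this example rather than the method intended by the slides, and `fit_line` is a hypothetical helper:

```python
# Height (inches) and weight (pounds) for the 15 observations in this dataset.
heights = list(range(58, 73))
weights = [115, 117, 120, 123, 126, 129, 132, 135,
           139, 142, 146, 150, 154, 159, 164]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

target = 73  # inches, one inch beyond the observed range

# Estimate 1: straight line fitted to all observations.
slope, intercept = fit_line(heights, weights)
linear_estimate = intercept + slope * target

# Estimate 2: assume the last observed step (pounds per inch) continues.
last_step = weights[-1] - weights[-2]
step_estimate = weights[-1] + last_step * (target - heights[-1])

print(f"straight-line estimate: {linear_estimate:.1f} pounds")  # 164.3
print(f"last-step estimate: {step_estimate} pounds")            # 169
```

The two estimates differ by several pounds because weight rises faster at greater heights, so a single straight line underestimates at the top of the range. Either way, 73 inches lies outside the observed data, so the result is an extrapolation and should be read with caution.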

- Tools for data science:
- Python
- R
- SQL

- Issues of ethics, bias, and privacy in data science:

- Many of the issues related to privacy, bias, and ethics can be traced back
to the origin of the data.
- Ask: how, where, and why was the data collected? Who collected it?
What did they intend to use it for?

- More important, if the data was collected from people, did these people know:

- that such data was being collected about them
- how the data would be used

- As the old saying goes, “there is no free lunch.”

- When you are getting an email service or a social media account for “free,” ask why.

- As it is often put, “if you are not paying for it, you are the product.”
Sure enough, for Facebook, each user is worth $158. Equivalent values for other
major companies are $182/user for Google and $733/user for Amazon.

- What we are often not aware of is how even ethically collected data could be
highly biased. If a data scientist is not careful, such inherent bias in the data
could show up in the analysis and the insights developed, often without anyone
actively noticing it.

- A community called Fairness, Accountability, and Transparency (FAT) has emerged
in recent years. It is trying to address some of these issues, or at least is
shedding light on them. This community, thankfully, has scholars from the
fields of data science, machine learning, artificial intelligence, education,
information science, and several branches of the social sciences.
