Data Science - Introduction - 1
Introduction
“It is a capital mistake to theorize before one has data. Insensibly, one begins
to twist the facts to suit theories, instead of theories to suit
facts.”
— Sherlock Holmes
- What is data science?
- A field of study and practice that involves the collection, storage, and
processing of data in order to derive important insights into a problem or a
phenomenon.
- Data science includes: defining, storing, cleaning, retrieving, and analyzing data
3. Variety: the wide range of data sources and types (structured and unstructured).
- Data is everywhere!
Humans and machines are constantly creating new data!
Data is growing 50-fold in volume compared with what was available at the beginning of 2010.
- If your computer has a 1-terabyte (TB) hard drive, 40 zettabytes (ZB) is 40 billion times that.
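The comparison above is plain unit arithmetic. A minimal Python sketch, assuming decimal (SI) prefixes as drive makers use them (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes); with binary units (TiB, ZiB) the ratio would instead be about 42.9 billion. The helper name `times_larger` is hypothetical.

```python
# Decimal (SI) storage units, as used for drive capacities (an assumption):
# 1 TB = 10**12 bytes, 1 ZB = 10**21 bytes.
TERABYTE = 10**12
ZETTABYTE = 10**21


def times_larger(amount_zb: int, drive_tb: int) -> int:
    """Hypothetical helper: how many `drive_tb`-terabyte drives fit in `amount_zb` zettabytes."""
    return (amount_zb * ZETTABYTE) // (drive_tb * TERABYTE)


ratio = times_larger(40, 1)
print(f"{ratio:,}")  # 40,000,000,000
```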
- Skills a data scientist needs:
1. A strong knowledge of basic statistics and machine learning, or at least enough
to avoid mistaking correlation for causation or extrapolating too much from a
small sample.
2. The computer science skills to take an unruly dataset and use a programming
language (like R or Python) to make it easy to analyze.
3. The ability to visualize and express the data and analysis in a way that is
meaningful to someone less conversant in data.
- Skills that employers expect from data scientists:
1. Willingness to experiment: intellectual curiosity and the ability to experiment. Employers are seeking
applicants who can ask questions to define intelligent hypotheses and explore the data using
basic statistical methods and models.
2. Proficiency in mathematical reasoning: mathematical and statistical knowledge is a critical skill for
anyone seeking a job in data science.
3. Data literacy: the ability to extract meaningful information from a dataset. A skilled data scientist
plays an essential role in a business by assessing a dataset for relevance and suitability
for interpretation, performing analysis, and creating meaningful visualizations that tell
valuable data stories.
- What is present in the dataset?
- What are the benefits of this dataset?
Summary (observations sorted by height):
Average height: 65 inches
Average weight: 136 pounds
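The two averages can be checked directly from the height and weight columns of the table in this section. A short Python sketch using the standard `statistics` module (the variable names are illustrative); note that the exact mean weight is about 136.7 pounds, which the slide truncates to 136.

```python
from statistics import mean

# Height (inches) and weight (pounds) of the 15 observations, sorted by height.
heights = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
weights = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]

avg_height = mean(heights)  # exact integer mean of the heights
avg_weight = mean(weights)  # 2051 / 15, a float

print(f"Average height: {avg_height}")      # Average height: 65
print(f"Average weight: {avg_weight:.1f}")  # Average weight: 136.7
```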
- What else?
- Dig deeper!
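1. Average height minus minimum height: 65 - 58 = 7 inches
2. Average weight minus minimum weight: 136 - 115 = 21 pounds
3. 21 / 7 = 3 pounds of weight per inch of height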
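The three steps above estimate how much weight goes with each extra inch of height. A sketch of the same arithmetic in Python, reusing the slide's rounded figures (average weight 136 rather than the exact 136.7); the variable names are illustrative.

```python
# Inputs taken from the slide (inches and pounds); the average weight
# is the slide's truncated value of 136, not the exact 136.7.
min_height, avg_height = 58, 65
min_weight, avg_weight = 115, 136

height_gain = avg_height - min_height        # step 1: 7 inches
weight_gain = avg_weight - min_weight        # step 2: 21 pounds
pounds_per_inch = weight_gain / height_gain  # step 3: 3.0 pounds per inch

print(height_gain, weight_gain, pounds_per_inch)  # 7 21 3.0
```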
Obs.  H (in)  W (lb)  DH  DW
1     58      115     1   2
2     59      117     1   3
3     60      120     1   3
4     61      123     1   3
5     62      126     1   3
6     63      129     1   3
7     64      132     1   3
8     65      135     1   4
9     66      139     1   3
10    67      142     1   4
11    68      146     1   4
12    69      150     1   4
13    70      154     1   5
14    71      159     1   5
15    72      164
H = height in inches, W = weight in pounds; DH and DW are the changes in height and weight from each observation to the next.
[Chart: Weight change with respect to the height change, plotting the DH and DW series for observations 1 to 14]
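The DH and DW columns are differences between consecutive rows. A Python sketch that rebuilds them from the H and W columns and averages the weight change per inch over the whole range; the names `dh`, `dw`, and `rate` are illustrative.

```python
# Height (inches) and weight (pounds) columns from the table, sorted by height.
heights = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
weights = [115, 117, 120, 123, 126, 129, 132, 135, 139, 142, 146, 150, 154, 159, 164]

# DH and DW: change from each row to the next (15 rows give 14 differences).
dh = [nxt - cur for cur, nxt in zip(heights, heights[1:])]
dw = [nxt - cur for cur, nxt in zip(weights, weights[1:])]

# Average weight change per inch across the full range: sum(dw) / sum(dh).
rate = sum(dw) / sum(dh)

print("DH:", dh)
print("DW:", dw)                 # DW: [2, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 4, 5, 5]
print("pounds per inch:", rate)  # pounds per inch: 3.5
```

Because DW drifts from 2 up to 5 pounds per inch as height increases, a single rate (3 from the averages, 3.5 over the full range) is only a rough summary of the relationship.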
- As the saying goes, "if you are not paying for it, you are the product."
Sure enough, each Facebook user is valued at $158. Equivalent values for other
major companies are $182 per user for Google and $733 per user for Amazon.
- What we are often not aware of is that even ethically collected data can be
highly biased. If a data scientist is not careful, that inherent bias in the data
can carry over into the analysis and the insights drawn from it, often without anyone
noticing.