Introduction To Data Science and Probability
Data Science is the science of extracting hidden patterns from large data sets.
Hidden patterns can appear in the data in the form of trends, cycles, associations, rules,
groups, etc.
Data sets usually refer to large volumes of cleansed, structured data prepared for
analysis.
Science refers to the statistical tools and techniques employed to understand the data
and to assess the reliability of the identified patterns.
o The part of statistics that is used to understand the data is called descriptive
statistics. Descriptive statistics give vital insights into the data in terms of
central values, spread, and distribution shape.
o The part of statistics that is used to establish the reliability of the potential
patterns identified is called inferential statistics.
This can easily be represented pictorially by the following Venn diagram.
pawangs@gmail.com
ZN4L9ICF3G
What is Data?
What is Information?
Information is processed data: the data itself, plus the meaning of what it was collected
for, minus the noise that was collected unintentionally.
Example:
● Sales report by region and venue: tells us which venue is most profitable.
● Survey reports and results: survey data is summarized into reports (information)
to present to the company's management.
Key Differences:
We live in a world that’s drowning in data. Websites track every user’s every click. Your
smartphone is building up a record of your location and speed every second of every day.
Probability
Probability is simply how likely something is to happen. Whenever we’re unsure about the
outcome of an event, we can talk about the probabilities of certain outcomes—how likely they
are. The analysis of events governed by probability is called statistics.
Let's call the probability of a coin landing on heads P(H). You might intuitively know that
the likelihood is half and half, or 50%. But how do we work that out?
PROBABILITY OF AN EVENT = (# OF WAYS IT CAN HAPPEN) / (TOTAL NUMBER OF OUTCOMES)
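So, in this case, the number of ways the coin can land on heads is 1 and the total number of
possible outcomes is 2 (heads or tails), so P(H) = 1/2 = 50%. This counting rule assumes that
all outcomes are equally likely, as they are for a fair coin.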
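The counting rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `probability` and the sample spaces are assumptions made for the example, and the approach is valid only when every outcome is equally likely.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Return (# of ways the event can happen) / (total number of outcomes).

    Assumes every outcome in sample_space is equally likely.
    """
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))


# Fair coin: one favourable outcome (H) out of two possible outcomes.
coin = ["H", "T"]
p_heads = probability({"H"}, coin)
print(p_heads, float(p_heads))  # 1/2 0.5

# Fair six-sided die: three even faces out of six.
die = [1, 2, 3, 4, 5, 6]
p_even = probability({2, 4, 6}, die)
print(p_even)  # 1/2
```

Using `Fraction` keeps the result exact, so 1/2 stays 1/2 rather than becoming a rounded decimal.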
The probability of an event always lies between 0 and 1, inclusive, and can also be written
as a percentage between 0% and 100%.
The probability of event A is often written as P(A).
If P(A) > P(B), then event A has a higher chance of occurring than event B.
If P(A) = P(B), then events A and B are equally likely to occur.
Conditional Probability
Conditional probability is a measure of the probability of an event given that (by assumption,
presumption, assertion or evidence) another event has already occurred. If the event of interest
is A and the event B is known or assumed to have occurred, "the conditional probability of A
given B" is usually written as P(A|B).
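As a rough illustration, the text above gives only the notation, so this sketch relies on the standard definition P(A|B) = P(A and B) / P(B), which requires P(B) > 0. The function name `conditional_probability` and the die example are assumptions chosen for the demonstration, and outcomes are taken to be equally likely.

```python
from fractions import Fraction


def conditional_probability(event_a, event_b, sample_space):
    """Return P(A|B) by counting, assuming equally likely outcomes.

    Knowing that B occurred shrinks the sample space to the outcomes in B;
    P(A|B) is the share of those outcomes that are also in A.
    """
    outcomes_b = [o for o in sample_space if o in event_b]
    if not outcomes_b:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    outcomes_a_and_b = [o for o in outcomes_b if o in event_a]
    return Fraction(len(outcomes_a_and_b), len(outcomes_b))


die = [1, 2, 3, 4, 5, 6]
rolled_six = {6}
rolled_even = {2, 4, 6}

# Without any extra information, P(six) is 1/6. Once we know the roll
# was even, only three outcomes remain, so P(six | even) is 1/3.
p_six_given_even = conditional_probability(rolled_six, rolled_even, die)
print(p_six_given_even)  # 1/3
```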
Independence
Two events are said to be independent of each other if the occurrence of one event in no way
affects the probability of the other event occurring, or in other words if we have
P(A|B) = P(A) and P(B|A) = P(B).
Example
Let's say you roll a die and flip a coin. The probability of getting any particular number on
the die in no way influences the probability of getting heads or tails on the coin.
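The die-and-coin example can be checked by listing every combined outcome. The sketch below uses the standard product test for independence, P(A and B) = P(A) × P(B); the variable names and the choice of "roll a 4" as the die event are assumptions made for illustration, with a fair die and a fair coin.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]

# All 12 equally likely (die face, coin side) pairs.
sample_space = list(product(die, coin))


def probability(predicate):
    """Share of combined outcomes for which predicate(outcome) is true."""
    hits = [o for o in sample_space if predicate(o)]
    return Fraction(len(hits), len(sample_space))


p_four = probability(lambda o: o[0] == 4)       # die shows 4
p_heads = probability(lambda o: o[1] == "H")    # coin shows heads
p_both = probability(lambda o: o[0] == 4 and o[1] == "H")

print(p_four, p_heads, p_both)      # 1/6 1/2 1/12
print(p_both == p_four * p_heads)   # True, so the events are independent
```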