INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

CSN-373: Probability Theory for Computer Engineers

Lecture 8: Distributions of Data, Measures of Position

Dr. Sudip Roy (a.k.a., SR)


Department of Computer Science & Engineering
Outline of Module 1:

● Concept of probability
● Random variables
● Distribution functions: discrete and continuous
● Moments and moment generating functions

Chebyshev’s Theorem:

• Chebyshev’s Theorem: As stated previously, the variance and standard deviation of a
variable can be used to determine the spread, or dispersion, of a variable. That is, the
larger the variance or standard deviation, the more the data values are dispersed.

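• Chebyshev’s theorem gives the minimum proportion of data values that fall within k
standard deviations of the mean: at least 1 − 1/k², for any k > 1. For example, at least
75% of the values lie within 2 standard deviations of the mean, and at least about 89%
within 3. The theorem can be applied to any distribution regardless of its shape.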
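
A quick numerical check, assuming the usual statement of the theorem (at least 1 − 1/k² of the values lie within k standard deviations of the mean, for k > 1). The function names and the sample are made up for illustration; this is a sketch, not part of the lecture material.

```python
import statistics

def chebyshev_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

# Made-up, deliberately skewed sample (illustrative only).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]

for k in (1.5, 2, 3):
    print(f"k={k}: bound >= {chebyshev_bound(k):.3f}, "
          f"observed = {fraction_within(sample, k):.3f}")
```

Even for this skewed sample, the observed fraction never drops below the bound, which is the point of the theorem: it holds without any assumption about shape.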

Chebyshev’s Theorem:

• The Empirical (Normal) Rule

• Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a
distribution is bell-shaped (or what is called normal), the following statements, which
make up the empirical rule, are true:
• Approximately 68% of the data values will fall within 1 standard deviation of the mean.
• Approximately 95% of the data values will fall within 2 standard deviations of the mean.
• Approximately 99.7% of the data values will fall within 3 standard deviations of the
mean.
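
For an exactly normal distribution these percentages can be computed rather than memorized. A minimal sketch using the standard library's `statistics.NormalDist`; the helper name `prob_within` is an assumption introduced here for illustration.

```python
from statistics import NormalDist

def prob_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The results round to the 68%, 95%, and 99.7% figures quoted in the rule. Real data that is only roughly bell-shaped will match these values only approximately.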

Measures of Position: z-score
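
A z-score (standard score) is conventionally defined as the number of standard deviations a value lies above or below the mean: z = (X − mean) / s. The sketch below assumes that standard definition; the test scores are hypothetical.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev

# Hypothetical scores: 65 on a test with mean 50 and s = 10,
# versus 30 on a test with mean 25 and s = 5.
print(z_score(65, 50, 10))  # 1.5
print(z_score(30, 25, 5))   # 1.0
```

Because z-scores are unit-free, they allow values from different distributions to be compared: here the first score has the higher relative position.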

Example:

Measures of Position: Percentiles

• Percentiles divide the data set into 100 equal groups.
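
One common textbook convention for the percentile rank of a value X is (number of values below X + 0.5) / n × 100. Other conventions exist and can give slightly different answers, so treat the sketch below as one assumed definition; the scores are made up.

```python
def percentile_rank(data, x):
    """Percentile rank of x: (count of values below x + 0.5) / n * 100."""
    below = sum(v < x for v in data)
    return (below + 0.5) / len(data) * 100

# Made-up test scores (illustrative only).
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(round(percentile_rank(scores, 12)))  # 65
```

Under this convention a score of 12 sits at the 65th percentile, meaning it did better than roughly 65% of the group.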

Example:

Measures of Position: Quartiles and Deciles
• Quartiles divide the distribution into four groups, separated by Q1, Q2, and Q3.
• Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile, or
the median; and Q3 corresponds to the 75th percentile.

• Deciles divide the distribution into 10 groups. They are denoted by D1, D2, ..., D9.

Example:

• Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
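
One way to work this example is the median-of-halves convention: sort the data, take the median as Q2, then take the medians of the lower and upper halves as Q1 and Q3. This convention is an assumption here; interpolation-based methods used by many software packages can give different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]      # values below the median position
    upper = values[n - half:]  # values above it (middle value excluded if n is odd)
    return median(lower), median(values), median(upper)

data = [15, 13, 6, 5, 12, 50, 22, 18]
print(quartiles(data))  # (9.0, 14.0, 20.0)
```

Sorted, the data is 5, 6, 12, 13, 15, 18, 22, 50, so Q2 is the average of 13 and 15, Q1 the average of 6 and 12, and Q3 the average of 18 and 22.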

Interquartile Range (IQR):

• In addition to dividing the data set into four groups, quartiles can be used as a rough
measurement of variability. The interquartile range (IQR) is defined as the difference
Q3 − Q1 and is the range of the middle 50% of the data.

• The interquartile range is used to identify outliers, and it is also used as a measure of
variability in exploratory data analysis.
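
As a worked illustration, take the data set 15, 13, 6, 5, 12, 50, 22, 18 and assume the median-of-halves quartiles Q1 = 9 and Q3 = 20 (other quartile conventions would give slightly different numbers):

```latex
\mathrm{IQR} = Q_3 - Q_1 = 20 - 9 = 11
```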

Measures of Position: Outliers

• Outliers: A data set should be checked for extremely high or extremely low values. These
values are called outliers.

• An outlier can strongly affect the mean and standard deviation of a variable. For example,
suppose a researcher mistakenly recorded an extremely high data value. This value would
then make the mean and standard deviation of the variable much larger than they really
were. Outliers can have an effect on other statistics as well.
• There are several ways to check a data set for outliers. One method is shown here:
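
A widely used check, assumed here for illustration, flags any value below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The sketch applies it to the data set 15, 13, 6, 5, 12, 50, 22, 18 with median-of-halves quartiles; both the 1.5 multiplier and the quartile convention are assumptions, and the function name is hypothetical.

```python
from statistics import median

def find_outliers(data, multiplier=1.5):
    """Return values outside [Q1 - m*IQR, Q3 + m*IQR] (median-of-halves quartiles)."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    q3 = median(values[len(values) - half:])
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if v < low or v > high]

print(find_outliers([15, 13, 6, 5, 12, 50, 22, 18]))  # [50]
```

With Q1 = 9 and Q3 = 20, the IQR is 11 and the cut-offs are −7.5 and 36.5, so only 50 is flagged.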

Example:

Discrete Probability Distributions:

• A random variable is a variable whose values are determined by chance.
• A variable was defined as a characteristic or attribute that can assume
different values. Various letters of the alphabet, such as X, Y, or Z, are
used to represent variables. Since the variables we will consider here are
associated with probability, they are called random variables.
• Discrete variables have a finite number of possible values or an infinite
number of values that can be counted.
• Variables that can assume all values in the interval between any two
given values are called continuous variables.
• A discrete probability distribution consists of the values a random
variable can assume and the corresponding probabilities of the values.
The probabilities are determined theoretically or by observation.
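
Any discrete probability distribution must satisfy two standard requirements: each probability lies between 0 and 1, and the probabilities sum to 1. A minimal sketch of that check follows; the function name and the die example are hypothetical, and `Fraction` is used so the sum can be compared exactly.

```python
from fractions import Fraction

def is_valid_distribution(dist):
    """Check that every probability is in [0, 1] and that they sum to exactly 1."""
    probs = list(dist.values())
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# Hypothetical distribution for a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(is_valid_distribution(die))                                     # True
print(is_valid_distribution({0: Fraction(1, 2), 1: Fraction(1, 3)}))  # False
```

With floating-point probabilities the exact comparison would need a small tolerance instead.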

Example:

• The procedure shown here for constructing a probability distribution for a discrete
random variable uses the probability experiment of tossing three coins. Recall that when
three coins are tossed, the sample space is represented as TTT, TTH, THT, HTT, HHT, HTH,
THH, HHH; and if X is the random variable for the number of heads, then X assumes the
value 0, 1, 2, or 3.
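
The construction can be sketched by enumerating the sample space and counting heads. This assumes fair coins, so that all eight outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of tossing three fair coins.
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in sample_space)

# P(X = x) = (outcomes with x heads) / (total outcomes).
distribution = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

This yields P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, and P(X = 3) = 1/8, which sum to 1.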

Examples: Illustrations of Theoretical Probability Distributions
• For theoretical distributions, you do not need to actually perform the experiments to
compute the probabilities. In contrast, to construct actual (observed) probability
distributions, you must observe the variable over a period of time.
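
An observed distribution is typically built from relative frequencies: the number of times each value occurred, divided by the total number of observations. A minimal sketch follows; the recorded values are made up for illustration and the function name is hypothetical.

```python
from collections import Counter

def empirical_distribution(observations):
    """Relative frequency of each observed value."""
    counts = Counter(observations)
    total = len(observations)
    return {value: counts[value] / total for value in sorted(counts)}

# Made-up record: number of heads seen in each of 20 repetitions of a three-coin toss.
observed = [1, 2, 1, 0, 3, 2, 2, 1, 1, 2, 0, 1, 2, 3, 1, 2, 2, 1, 0, 2]
print(empirical_distribution(observed))
```

With only 20 observations the relative frequencies differ from the theoretical values; they tend to move closer as more observations are collected.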

Example:

Next Class…
