Lecture (1) - Statistics

Applied Statistics
By
Dr. Hany Gomaa Ahmed
Associate professor
Irrigation and Hydraulics Department
Faculty of Engineering, Cairo University
Academic Year 2023-2024
Course Outline
Chapter 1: Introduction
Chapter 2: Organizing and Graphing Data
Chapter 3: Basic Probability Concepts
Chapter 4: Random Variables, Probability Distributions
Chapter 5: Common Discrete Probability Distributions
Chapter 6: Common Continuous Probability Distributions
Chapter 7: Sampling Distributions
Chapter 8: Confidence Intervals
Chapter 9: Fundamentals of Hypothesis: Part I
Chapter 10: Fundamentals of Hypothesis: Part II
Chapter 11: Linear Regression Analysis
Chapter 12: Linear Regression and Correlation Analysis
1- Introduction
Why an Engineer Needs to Know about Statistics
• To know how to properly present information
• To know how to properly interpret information
• To know how to draw conclusions about
populations based on sample information
• To know how to optimize the use of limited
resources (sampling)
• To know how to obtain reliable forecasts
Key Definitions
• A population (universe) is the collection of things under consideration (e.g. the grades of 100 students)
• A sample is a portion of the population selected for analysis (e.g. the grades of 10 students out of the 100)
• A parameter is a summary measure computed to describe a characteristic of the population (e.g. the average grade of all 100 students; it is a constant)
• A statistic is a summary measure computed to describe a characteristic of the sample (e.g. the mean grade of a sample of 10 students; it varies from sample to sample)
Population and Sample
• Sample: use statistics to summarize features (descriptive statistics)
• Population: use parameters to summarize features
• Inferential statistics: inference on the population from the sample
Statistical Methods
• Descriptive statistics
– Collecting and describing data
• Inferential statistics
– Drawing conclusions and/or making decisions
concerning a population based only on sample
data
Descriptive Statistics
• Collect data
– e.g., rain depth, temperature, river flow,
compressive strength, … etc.
• Present data
– e.g., Tables and graphs
• Characterize data
– e.g., sample mean $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
• Estimation
– e.g.: Estimate the population mean
weight using the sample mean
weight
• Hypothesis testing
– e.g.: Test the claim that the
population mean weight is 120
pounds
Drawing conclusions and/or making decisions concerning
a population based on sample results
2. Sampling Concepts
Definitions
Population: the total set of elements of interest for a given problem
1) Finite population: described by the actual distribution of its values
2) Infinite population: described by the corresponding probability distribution or probability density
Sample
• A subset of the population's elements that is representative of the population, so that inferences about the population can be drawn from it
OR
• A group of units selected from a larger group (the population). By studying the sample, one hopes to draw valid conclusions about the larger group
Population and Sample
[Figure: a population from which Sample 1, Sample 2, and Sample 3 are drawn]
Random sample: a sample in which all population elements have an equal probability (chance) of being included
Thus, samples 1, 2, and 3 have the same chance of being selected
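The distinction between a parameter and a statistic can be illustrated with a small sketch. The population below is synthetic (randomly generated grades, not data from the lecture), and the names `population`, `sample`, and the seed are illustrative assumptions.

```python
import random

# Hypothetical population: grades (0 to 20) of 100 students, generated at random.
rng = random.Random(42)
population = [rng.randint(0, 20) for _ in range(100)]

# Parameter: computed from the whole population, so it is a fixed constant.
population_mean = sum(population) / len(population)

# Simple random sample without replacement: every element has the same
# chance of being selected.
sample = rng.sample(population, 10)

# Statistic: computed from the sample, so it changes from sample to sample.
sample_mean = sum(sample) / len(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

Drawing the sample again with a different seed gives a different sample mean, while the population mean stays the same.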
Reasons for Sampling

• More economical
• Time saving
• Inaccessible population
• Infinite population
3. Presentation and Analysis of Data
Presentation of Data
• Topics
– Organizing numerical data
• The ordered array
– Tabulating and graphing numerical data
• Grouping of Data
• Frequency distributions: tables, histograms, polygons
• Cumulative distributions: tables, diagrams
– Graphing bivariate numerical data
• Scatter plots
– Numerical Descriptive Measure
Organizing Numerical Data
• Data in raw form (as collected):
41, 24, 32, 26, 27, 27, 30, 24, 38, 21
• Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
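As a minimal sketch, the ordered array can be produced with Python's built-in `sorted`; the variable names are illustrative.

```python
# Raw data as collected (values from the slide).
raw = [41, 24, 32, 26, 27, 27, 30, 24, 38, 21]

# Ordered array: the same values from smallest to largest.
ordered = sorted(raw)

print(ordered)
print("smallest:", ordered[0], "largest:", ordered[-1])
```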
Tabulating and Graphing Numerical Data: Grouping of Data
• When the number of data points is very large, it may be advantageous to group or classify the data
• Grouping condenses the data and makes it easier to extract information (though some information is lost)
Tabulating and Graphing Numerical Data
[Figure: numerical data (41, 24, 32, 26, 27, 27, 30, 24, 38, 21) are sorted into an ordered array (21, 24, 24, 26, 27, 27, 30, 32, 38, 41), then summarized as frequency distributions and cumulative distributions, presented as tables, histograms, and polygons]
Describing Numerical Data with Tables
• Frequency tables
– Simple
– Multiple
• Relative frequency tables
– Fraction
– Percentage
• Cumulative frequency tables
– More than
– Less than
Steps to Create Frequency Tables

• Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
– The smaller the number of classes, the greater the loss
of information
• Compute class interval (width): 10 (46/5 then round up)
• Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
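The steps above can be sketched in a few lines of Python. The rounding unit of 10 (used to round the class width up and the first boundary down) is an assumption chosen to reproduce the numbers on the slide; the helper names are illustrative.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

ROUNDING_UNIT = 10  # assumed "convenient" unit for widths and boundaries
num_classes = 5

ordered = sorted(data)
data_range = ordered[-1] - ordered[0]                # 58 - 12 = 46

# Class width: range / number of classes, rounded up to the rounding unit.
width = math.ceil(data_range / num_classes / ROUNDING_UNIT) * ROUNDING_UNIT

# First boundary: smallest value rounded down to the rounding unit.
start = math.floor(ordered[0] / ROUNDING_UNIT) * ROUNDING_UNIT

boundaries = [start + j * width for j in range(num_classes + 1)]
classes = list(zip(boundaries, boundaries[1:]))
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# A value belongs to a class if lower limit <= value < upper limit.
frequencies = [sum(lo <= x < hi for x in ordered) for lo, hi in classes]

for (lo, hi), mid, f in zip(classes, midpoints, frequencies):
    print(f"{lo} but under {hi}: midpoint {mid}, frequency {f}")
```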

Example
The following are the midterm exam grades for a section of 50 students. Organize the data using tables.
18 19 9 3 12 13 8 17 19 15 7 16 13
13 4 14 18 17 12 11 16 15 17 12 11
12 12 15 16 14 5 17 15 18 19 13 11
9 13 17 12 13 9 18 19 11 6 15 12 9
Data in ordered array:
3 4 5 6 7 8 9 9 9 9 11 11 11 11
12 12 12 12 12 12 12 13 13 13 13 13 13 14
14 15 15 15 15 15 16 16 16 17 17 17 17 17
18 18 18 18 19 19 19 19
1) Range = Max. – Min. = 19 – 3 = 16
2) Select the number of classes (between 5 and 15): take 5 classes
3) Class interval = Range / number of classes = 16/5 = 3.2, rounded up to 4
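Simple Frequency Table
Each class contains the values greater than or equal to its lower limit and less than its upper limit. In the tally column, a group ending in "/" counts five.

Class      Tally                      Frequency (f)
0 – 4      |                          1
4 – 8      ||||                       4
8 – 12     ||||/ ||||                 9
12 – 16    ||||/ ||||/ ||||/ ||||/    20
16 – 20    ||||/ ||||/ ||||/ |        16
Σ                                     50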
Relative Frequency Table

Class      Midpoint   Frequency (f)   Relative frequency (fraction)   Relative frequency (%)
0 – 4      2          1               0.02                            2
4 – 8      6          4               0.08                            8
8 – 12     10         9               0.18                            18
12 – 16    14         20              0.40                            40
16 – 20    18         16              0.32                            32
Σ                     50              1.00                            100
Cumulative Frequency Table (More Than)
Counts the observations at or above the lower limit of each class.

Class      Frequency (f)   Class lower limit   Cumulative frequency   Cumulative % frequency
0 – 4      1               ≥ 0                 50                     100
4 – 8      4               ≥ 4                 49                     98
8 – 12     9               ≥ 8                 45                     90
12 – 16    20              ≥ 12                36                     72
16 – 20    16              ≥ 16                16                     32
20 – 24    0               ≥ 20                0                      0
Σ          50
Cumulative Frequency Table (Less Than)
Counts the observations below the upper limit of each class.

Class      Frequency (f)   Class upper limit   Cumulative frequency   Cumulative % frequency
0 – 4      1               < 4                 1                      2
4 – 8      4               < 8                 5                      10
8 – 12     9               < 12                14                     28
12 – 16    20              < 16                34                     68
16 – 20    16              < 20                50                     100
Σ          50
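Both cumulative tables follow directly from the class frequencies. The sketch below recomputes them for the grades example; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for the classes 0-4, 4-8, 8-12, 12-16, 16-20.
frequencies = [1, 4, 9, 20, 16]
n = sum(frequencies)

# "Less than" the upper limit of each class: running total of frequencies.
less_than = list(accumulate(frequencies))

# "More than" (at or above) the lower limit of each class:
# n minus everything that falls below that lower limit.
more_than = [n - below for below in [0] + less_than[:-1]]

less_than_pct = [100 * c / n for c in less_than]
more_than_pct = [100 * c / n for c in more_than]

print("less than:", less_than, less_than_pct)
print("more than:", more_than, more_than_pct)
```

The more-than table on the slide adds one extra row for the limit 20, where the cumulative frequency is 0.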
2- Organizing and Graphing Data
Describing Numerical Data With Graphs
• Histogram
• Frequency Polygon
• Frequency Curve
• Less-than and more-than ogives (cumulative frequency polygons)
Frequency Table
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class j            Frequency f_j   Relative frequency rf_j   Percentage
10 but under 20    3               0.15                      15
20 but under 30    6               0.30                      30
30 but under 40    5               0.25                      25
40 but under 50    4               0.20                      20
50 but under 60    2               0.10                      10
Total              20              1.00                      100
Graphing Numerical Data: The Histogram
[Figure: histogram of the data above; frequency plotted against class midpoints (15, 25, 35, 45, 55), with bar edges at the class boundaries and no gaps between bars]

The Frequency Polygon
[Figure: frequency polygon of the same data; frequency plotted against class midpoints]
The Frequency Curve
[Figure: frequency curve (smoothed polygon) of the same data; frequency plotted against class midpoints]
Cumulative Frequency Curve
Create the cumulative frequency table first:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class    Cumulative frequency   Cumulative relative frequency   Cumulative % frequency
< 20     3                      0.15                            15
< 30     9                      0.45                            45
< 40     14                     0.70                            70
< 50     18                     0.90                            90
< 60     20                     1.00                            100

Can we do the same for "more than" (>)?
Tabulating and Graphing Numerical Data: Example
• Consider the mean flow of a river for the month of May over the period 1922 to 1971 (see the table below; discharge in m³/s)

Year   Discharge   Year   Discharge   Year   Discharge   Year   Discharge   Year   Discharge
1922   3532        1932   2338        1942   1608        1952   1949        1962   2568
1923   2071        1933   1873        1943   1456        1953   1396        1963   1944
1924   4188        1934   1243        1944   1570        1954   1344        1964   2062
1925   2080        1935   2849        1945   2301        1955   1886        1965   3919
1926   2036        1936   2359        1946   1460        1956   1786        1966   2944
1927   2685        1937   3070        1947   1584        1957   1455        1967   2175
1928   1832        1938   1222        1948   1410        1958   3025        1968   2877
1929   1500        1939   2841        1949   1490        1959   1828        1969   3208
1930   2856        1940   2110        1950   1959        1960   1401        1970   4750
1931   3043        1941   2058        1951   1981        1961   2427        1971   1475

Tabulating Numerical Data: Example (Continued)
• Sort raw data in ascending order: 1222, 1243, …, 4750
• Number of observations: n = 50
• Minimum discharge: 1222 m³/s
• Maximum discharge: 4750 m³/s
• Find range: 4750 - 1222 = 3528 m³/s
• Select number of classes: 6 (usually between 5 and 15)
• Compute class interval (width): 600 (3528/6 = 588, then round up)
• Determine class boundaries (limits): 1200, 1800, 2400, 3000, 3600, 4200, 4800
• Compute class midpoints: 1500, 2100, 2700, 3300, 3900, 4500
• Count observations and assign to classes
Example (Continued)

Class No. j   Class interval I_j (m³/s)   Description            Frequency f_j   Relative frequency rf_j
1             [1200, 1800)                1200 but under 1800    16              0.32
2             [1800, 2400)                1800 but under 2400    18              0.36
3             [2400, 3000)                2400 but under 3000    8               0.16
4             [3000, 3600)                3000 but under 3600    5               0.10
5             [3600, 4200)                3600 but under 4200    2               0.04
6             [4200, 4800)                4200 but under 4800    1               0.02
Total                                                            50              1.00
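The class frequencies in this table can be checked against the raw discharges. Because all classes have the same width, the sketch below assigns each observation to a class by integer division instead of testing every interval; the variable names are illustrative.

```python
# May mean discharge (m3/s), 1922 to 1971, in year order.
discharge = [
    3532, 2071, 4188, 2080, 2036, 2685, 1832, 1500, 2856, 3043,
    2338, 1873, 1243, 2849, 2359, 3070, 1222, 2841, 2110, 2058,
    1608, 1456, 1570, 2301, 1460, 1584, 1410, 1490, 1959, 1981,
    1949, 1396, 1344, 1886, 1786, 1455, 3025, 1828, 1401, 2427,
    2568, 1944, 2062, 3919, 2944, 2175, 2877, 3208, 4750, 1475,
]

start, width, num_classes = 1200, 600, 6

# Class index of a value q is floor((q - start) / width).
frequencies = [0] * num_classes
for q in discharge:
    frequencies[(q - start) // width] += 1

n = len(discharge)
relative = [f / n for f in frequencies]

for j, (f, rf) in enumerate(zip(frequencies, relative), start=1):
    lower = start + (j - 1) * width
    print(f"class {j}: {lower} but under {lower + width}: f = {f}, rf = {rf:.2f}")
```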
Example (Continued)
[Figure: frequency histogram and relative frequency histogram of the discharge data]
Example (Continued)
[Figure: frequency polygon drawn over the histogram]
Area under polygon = Area under histogram
Example (Continued)

Boundary value (m³/s)   Description        Cumulative frequency   Cumulative relative frequency
1,200                   Less than 1,200    0                      0
1,800                   Less than 1,800    16                     0.32
2,400                   Less than 2,400    34                     0.68
3,000                   Less than 3,000    42                     0.84
3,600                   Less than 3,600    47                     0.94
4,200                   Less than 4,200    49                     0.98
4,800                   Less than 4,800    50                     1.00

[Figure: less-than cumulative frequency polygon]
This plot is called an ogive: the cumulative frequency polygon, or the cumulative frequency curve (smooth ogive).
What does the "more than" ogive look like?
More-Than Curve (Ogive)
[Figure: more-than curve; cumulative % frequency (%F, from 0 to 100) plotted against classes (axis ticks at 2, 6, 10, 14, 18, 22)]
Graphing Bivariate Numerical Data
[Figure: scatter plot of bivariate numerical data]

Describing Numerical Data with Numbers: Numerical Descriptive Measures
• Measures of central tendency
– Mean, median, mode
• Measures of variation (or dispersion)
– Range, variance and standard deviation, coefficient of variation
• Measure of shape
– Skewness coefficient
• Measure of association
– Coefficient of correlation
Measures of Central Tendency
Central tendency: average or arithmetic mean
• Population mean (a parameter): $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Sample mean (a statistic): $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
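Mean (Arithmetic Mean)
• Mean (arithmetic mean)
– Sample mean (n is the sample size):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
– Population mean (N is the population size):
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$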
Median
• The variate value that divides the ordered data into two equal halves
– 1 3 5 7 9: Median = 5
– 1 3 5 7 9 24: Median = (5+7)/2 = 6
• In an ordered array, the median is the "middle" number
– If n or N is odd, the median is the middle number, at position (n+1)/2.
– If n or N is even, the median is the average of the two middle numbers, at positions n/2 and (n/2)+1.
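The odd and even cases can be sketched as a small function. The name `median` is illustrative; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the middle number, at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even n: average of the numbers at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 3, 5, 7, 9]))       # odd number of values
print(median([1, 3, 5, 7, 9, 24]))   # even number of values
```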
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may be no mode
• There may be several modes
[Figure: three dot plots: one on a 0 to 6 axis with no mode, one on a 0 to 6 axis with one mode = 3, and one on a 0 to 14 axis with three modes = 5, 9, 12]
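The three cases (no mode, one mode, several modes) can be sketched as follows. The data sets are hypothetical, chosen only to reproduce the three situations, and the convention that data with no repeated value has no mode is an assumption of this sketch.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of most frequent values, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical data sets for the three situations.
print(modes([0, 1, 2, 3, 4, 5, 6]))            # no mode
print(modes([1, 2, 3, 3, 3, 4, 6]))            # one mode
print(modes([1, 3, 5, 5, 9, 9, 12, 12, 14]))   # several modes
```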
Measures of Variation
• Range
– Relative range
• Variance
– Population variance
– Sample variance
• Standard deviation
– Population standard deviation
– Sample standard deviation
• Coefficient of variation
Range
• Measure of variation
• Difference between the largest and the smallest observations:
$$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$$
• Ignores the way in which data are distributed
[Figure: two dot plots on a 7 to 12 axis with different distributions; in both, Range = 12 - 7 = 5]
Relative Range
$$\text{Relative Range} = \frac{\text{Range}}{\text{Mean}} = \frac{X_{\text{Largest}} - X_{\text{Smallest}}}{\text{Mean}}$$
Variance
• Important measure of variation
• Shows variation about the mean
– Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
– Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Standard Deviation
• Most important measure of variation
• Shows variation about the mean
• Has the same units as the original data
– Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
– Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
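The only difference between the sample and population formulas is the divisor (n - 1 versus N). A minimal sketch, with illustrative function names and a small made-up data set:

```python
import math

def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def sample_variance(values):
    return sum_of_squares(values) / (len(values) - 1)   # divisor n - 1

def population_variance(values):
    return sum_of_squares(values) / len(values)         # divisor N

# Illustrative data (not from the lecture).
data = [7, 8, 9, 10, 11, 12]

s2 = sample_variance(data)
sigma2 = population_variance(data)
print("sample variance:", s2, "sample std dev:", math.sqrt(s2))
print("population variance:", sigma2, "population std dev:", math.sqrt(sigma2))
```

The standard library functions `statistics.variance` and `statistics.pvariance` implement the same two definitions.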
Coefficient of Variation
$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different units
• Not suitable if mean is close to zero
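Because CV divides the standard deviation by the mean, the units cancel. The sketch below shows this on hypothetical rain depths expressed first in millimetres and then in centimetres; the function name and the data are illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Return CV = (S / mean) * 100 in percent, using the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

# Hypothetical rain depths in millimetres, and the same depths in centimetres.
depth_mm = [12.0, 18.0, 25.0, 31.0, 40.0]
depth_cm = [d / 10 for d in depth_mm]

cv_mm = coefficient_of_variation(depth_mm)
cv_cm = coefficient_of_variation(depth_cm)
print(f"CV in mm: {cv_mm:.2f}%  CV in cm: {cv_cm:.2f}%")
```

The two values agree, which is why CV can be compared across data sets measured in different units.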
Shape of a Distribution
• Skewness coefficient
– Describes how data is distributed
– Measure of shape
• For a population or a large sample:
$$C_S = \frac{\sqrt{n}\,\sum_{i=1}^{n} (X_i - \bar{X})^3}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{3/2}}$$
• Corrected form of $C_S$, for a small sample:
$$C_S = \frac{n^2 \sqrt{n}\,\sum_{i=1}^{n} (X_i - \bar{X})^3}{(n-1)(n-2)\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{3/2}}$$
Shape of a Distribution
• Symmetric or skewed
– Left-skewed: CS < 0, Mean < Median < Mode
– Symmetric: CS = 0, Mean = Median = Mode
– Right-skewed: CS > 0, Mode < Median < Mean
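The sign of the skewness coefficient can be checked numerically. The sketch below assumes the moment form of the coefficient (third central moment divided by the second central moment raised to the power 3/2) and assumes a small-sample correction factor of n²/((n-1)(n-2)); both the function names and the data sets are illustrative.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, with m_k the k-th central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def corrected_skewness(values):
    """Small-sample form, assuming the factor n^2 / ((n - 1)(n - 2)); needs n > 2."""
    n = len(values)
    return n ** 2 / ((n - 1) * (n - 2)) * skewness(values)

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 10]
left_skewed = [-10, -4, -3, -2, -1]

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(name, round(skewness(data), 3), round(corrected_skewness(data), 3))
```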

Descriptive Measures Using Grouped Data (Frequency Distribution)
• Sample mean
$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_{Class_j}}{\sum_{j=1}^{k} f_j} = \frac{1}{n}\sum_{j=1}^{k} f_j X_{Class_j}$$
• Sample variance
$$S^2 = \frac{1}{n-1}\sum_{j=1}^{k} f_j \left(X_{Class_j} - \bar{X}\right)^2$$
$X_{Class_j}$ is the midpoint of class j
$f_j$ is the frequency of class j

Class      Midpoint   f
0 – 4      2          1
4 – 8      6          4
8 – 12     10         9
12 – 16    14         20
16 – 20    18         16
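Applied to the grades table above, the grouped formulas can be evaluated as in this sketch (variable names are illustrative). Each class is represented by its midpoint, so the results approximate the values that the raw data would give.

```python
# Class midpoints and frequencies from the grades frequency table.
midpoints = [2, 6, 10, 14, 18]
frequencies = [1, 4, 9, 20, 16]

n = sum(frequencies)

# Grouped sample mean: frequency-weighted average of the class midpoints.
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Grouped sample variance: weighted squared deviations, divided by n - 1.
grouped_variance = sum(
    f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, midpoints)
) / (n - 1)

print(f"n = {n}")
print(f"grouped mean = {grouped_mean:.2f}")
print(f"grouped variance = {grouped_variance:.3f}")
```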

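• Sample mode
$$\text{Mode} = L_1 + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) C, \qquad C = L_2 - L_1$$
$L_1$ and $L_2$ are the lower and upper limits of the modal class (the class with the highest frequency). In the usual convention, $\Delta_1$ is the frequency of the modal class minus that of the preceding class, and $\Delta_2$ is the frequency of the modal class minus that of the following class.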

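• Sample median
$$\text{Median} = L_1 + \left(\frac{N/2 - \sum f_i}{f_{median}}\right) C, \qquad C = L_2 - L_1$$
$L_1$ and $L_2$ are the lower and upper limits of the class that contains the median
$f_{median}$ is the frequency of that class
$\sum f_i$ is the cumulative frequency up to $L_1$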
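A sketch of the grouped mode and median formulas, applied to the grades table (classes 0-4 to 16-20). It assumes the usual reading of Δ1 and Δ2 as the differences between the modal class frequency and the frequencies of the preceding and following classes; the function names are illustrative.

```python
lower_limits = [0, 4, 8, 12, 16]
frequencies = [1, 4, 9, 20, 16]
width = 4  # C = L2 - L1, the same for every class

def grouped_mode(lower_limits, frequencies, width):
    j = frequencies.index(max(frequencies))            # modal class
    f_prev = frequencies[j - 1] if j > 0 else 0
    f_next = frequencies[j + 1] if j < len(frequencies) - 1 else 0
    d1 = frequencies[j] - f_prev                       # assumed meaning of delta 1
    d2 = frequencies[j] - f_next                       # assumed meaning of delta 2
    return lower_limits[j] + d1 / (d1 + d2) * width

def grouped_median(lower_limits, frequencies, width):
    half = sum(frequencies) / 2                        # N / 2
    cumulative = 0                                     # frequency below L1
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:                     # median lies in this class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no observations")

print(f"grouped mode   = {grouped_mode(lower_limits, frequencies, width):.2f}")
print(f"grouped median = {grouped_median(lower_limits, frequencies, width):.2f}")
```

For this table the modal class and the median class are both 12-16.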
Coefficient of Correlation
• Measures the strength of the linear relationship between two quantitative variables
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

Features of the Correlation Coefficient
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker any linear relationship
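The features listed above can be checked with a short sketch of the correlation coefficient. The function name and the data are illustrative.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))    # perfect positive linear relationship
print(correlation(x, [10, 8, 6, 4, 2]))    # perfect negative linear relationship
print(correlation(x, [2, 1, 4, 3, 5]))     # positive, but not perfect
```

Rescaling either variable by a positive factor (for example, changing its units) leaves r unchanged, which is what "unit free" means here.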
Scatter Plots of Data with Various Correlation Coefficients
[Figure: five scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = 0.6, and r = 1]
