Lecture (1) - Statistics
By
Associate professor
Irrigation and Hydraulics Department
Faculty of Engineering, Cairo University
Course Outline
Chapter 1: Introduction
Chapter 2: Organizing and Graphing Data
Chapter 3: Basic Probability Concepts
Chapter 4: Random Variables, Probability Distributions
Chapter 5: Common Discrete Probability Distributions
Chapter 6: Common Continuous Probability Distributions
Chapter 7: Sampling Distributions
Chapter 8: Confidence Intervals
Chapter 9: Fundamentals of Hypothesis Testing: Part I
Chapter 10: Fundamentals of Hypothesis Testing: Part II
Chapter 11: Linear Regression Analysis
Chapter 12: Linear Regression and Correlation Analysis
1- Introduction
Key Definitions
• A population (universe) is the collection of all things under
consideration (e.g. the grades of 100 students)
• A sample is a portion of the population selected for
analysis (e.g. the grades of 10 students out of the 100)
• A parameter is a summary measure computed to describe
a characteristic of the population (e.g. the average grade of
all 100 students; it is a constant)
• A statistic is a summary measure computed to describe a
characteristic of the sample (e.g. the mean grade of a
sample of 10 students; it varies from sample to sample)
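The difference between a parameter and a statistic can be seen in a minimal Python sketch. The 100 grades below are hypothetical values generated at random for illustration only; they are not data from this lecture.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: made-up grades (0 to 20) for 100 students.
population = [random.randint(0, 20) for _ in range(100)]

# Parameter: computed from the whole population, so it is one fixed number.
population_mean = statistics.mean(population)

# Statistic: computed from a sample of 10, so it changes with the sample drawn.
sample_means = [statistics.mean(random.sample(population, 10)) for _ in range(5)]

print("parameter (population mean):", population_mean)
print("statistics (five sample means):", sample_means)
```

Each of the five sample means is an estimate of the same population mean, which is the idea behind inferential statistics.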
[Figure: A sample is drawn from the population. Statistics summarize features of the sample (descriptive statistics); parameters summarize features of the population. Inferential statistics: inference on the population from the sample.]
Statistical Methods
• Descriptive statistics
– Collecting and describing data
• Inferential statistics
– Drawing conclusions and/or making decisions
concerning a population based only on sample
data
Descriptive Statistics
• Collect data
– e.g., rain depth, temperature, river flow,
compressive strength, … etc.
• Present data
– e.g., Tables and graphs
• Characterize data
– e.g., Sample mean $= \frac{\sum X_i}{n}$
Inferential Statistics
• Estimation
– e.g.: Estimate the population mean weight using the sample mean weight
• Hypothesis testing
– e.g.: Test the claim that the population mean weight is 120 pounds
Drawing conclusions and/or making decisions concerning
a population based on sample results
2. Sampling Concepts
Definitions
Population: the total set of elements of
interest for a given problem
Sample
Applied Statistics
Presentation of Data
• Topics
– Organizing numerical data
• The ordered array
– Tabulating and graphing numerical data
• Grouping of Data
• Frequency distributions: tables, histograms, polygons
• Cumulative distributions: tables, diagrams
– Graphing bivariate numerical data
• Scatter plots
– Numerical Descriptive Measure
Ordered Array
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
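As a small illustration, the ordered array can be produced by sorting the raw values. This sketch uses the ten raw values listed in this lecture (41, 24, 32, ...); the variable names are illustrative.

```python
# Raw numerical data as collected, in no particular order.
raw = [41, 24, 32, 26, 27, 27, 30, 24, 38, 21]

# The ordered array is the same data sorted from smallest to largest.
ordered = sorted(raw)

print(ordered)
print("smallest:", ordered[0], "largest:", ordered[-1])
```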
Tabulating and Graphing Numerical Data
[Figure: The numerical data (41, 24, 32, 26, 27, 27, 30, 24, 38, 21) is arranged as an ordered array (21, 24, 24, 26, 27, 27, 30, 32, 38, 41) and summarized by frequency distributions and cumulative distributions, presented as tables, histograms, and polygons.]
• Frequency Tables
– Simple
– Multiple
• Relative Frequency Tables
– Fraction
– Percentage
• Cumulative Frequency Tables
– More than
– Less than
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
– The smaller the number of classes, the greater the loss
of information
• Compute class interval (width): 10 (46/5 then round up)
• Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
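The steps above can be sketched in Python. The sketch assumes the 20-value data set used in this lecture (minimum 12, maximum 58). Starting the first class at the multiple of the width just below the minimum is an assumption made here to reproduce the boundaries 10, 20, ..., 60.

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

data_range = max(data) - min(data)            # 58 - 12 = 46
num_classes = 5
width = math.ceil(data_range / num_classes)   # 46 / 5 = 9.2, rounded up to 10

# Assumption: the first class starts at the multiple of the width
# at or below the smallest observation.
start = (min(data) // width) * width
boundaries = [start + i * width for i in range(num_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Each class includes its lower boundary and excludes its upper boundary.
frequencies = [sum(lo <= x < hi for x in data)
               for lo, hi in zip(boundaries, boundaries[1:])]

for (lo, hi), mid, f in zip(zip(boundaries, boundaries[1:]), midpoints, frequencies):
    print(f"{lo} but under {hi}: midpoint {mid}, frequency {f}")
```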
Example
The following are the midterm exam grades for
a certain section of 50 students. Arrange the
data using tables.
18 19 9 3 12 13 8 17 19 15 7 16 13
13 4 14 18 17 12 11 16 15 17 12 11
12 12 15 16 14 5 17 15 18 19 13 11
9 13 17 12 13 9 18 19 11 6 15 12 9
3 4 5 6 7 8 9 9 9 9 11 11 11 11
12 12 12 12 12 12 12 13 13 13 13 13 13 14
14 15 15 15 15 15 16 16 16 17 17 17 17 17
18 18 18 18 19 19 19 19
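Cumulative Frequency Table (More than) (More than the lower limit)
Class      Frequency (f)
0 - 4      1
4 - 8      4
8 - 12     9
12 - 16    20
16 - 20    16
20 - 24    0
Σ          50

Class lower limit   Cumulative frequency   Cumulative frequency (%)
>0                  50                     100
>4                  49                     98
>8                  45                     90
>12                 36                     72
>16                 16                     32
>20                 0                      0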
Cumulative Frequency Table (Less than) (Less than the upper limit)
Class      Frequency (f)
0 - 4      1
4 - 8      4
8 - 12     9
12 - 16    20
16 - 20    16
Σ          50

Class upper limit   Cumulative frequency   Cumulative frequency (%)
<4                  1                      2
<8                  5                      10
<12                 14                     28
<16                 34                     68
<20                 50                     100
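The frequency and cumulative frequency tables for the 50 grades can be checked with a short Python sketch. It assumes each class includes its lower limit and excludes its upper limit, so that "more than the lower limit" counts grades at or above that limit. The variable names are illustrative.

```python
grades = [18, 19, 9, 3, 12, 13, 8, 17, 19, 15, 7, 16, 13,
          13, 4, 14, 18, 17, 12, 11, 16, 15, 17, 12, 11,
          12, 12, 15, 16, 14, 5, 17, 15, 18, 19, 13, 11,
          9, 13, 17, 12, 13, 9, 18, 19, 11, 6, 15, 12, 9]

limits = [0, 4, 8, 12, 16, 20]
n = len(grades)

# Simple frequency table: lower limit included, upper limit excluded.
frequencies = [sum(lo <= g < hi for g in grades)
               for lo, hi in zip(limits, limits[1:])]

# "Less than" table: count of grades below each class upper limit.
less_than = [sum(g < upper for g in grades) for upper in limits[1:]]

# "More than" table: count of grades at or above each class lower limit.
more_than = [sum(g >= lower for g in grades) for lower in limits]

less_than_pct = [100 * c / n for c in less_than]
more_than_pct = [100 * c / n for c in more_than]

print("frequencies:", frequencies)
print("less than:  ", less_than, less_than_pct)
print("more than:  ", more_than, more_than_pct)
```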
2- Organizing and Graphing Data
• Histogram
• Frequency Polygon
• Frequency Curve
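[Figure: Histogram built from the frequency table. Vertical axis: frequency; horizontal axis: class boundaries and class midpoints (5, 15, 25, 35, 45, 55, More). There are no gaps between bars.]
[Figure: Frequency polygon. Vertical axis: frequency; horizontal axis: class midpoints (5, 15, 25, 35, 45, 55, More).]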
Graphing Numerical Data:
The Frequency Curve
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: Frequency curve. Vertical axis: frequency; horizontal axis: class midpoints (5, 15, 25, 35, 45, 55, More).]
Example (Continued)
Class No.   Class interval   Description            Frequency   Relative frequency
j           I_j (m³/s)                              f_j         rf_j
1           (1200, 1800)     1200 but under 1800    16          0.32
2           (1800, 2400)     1800 but under 2400    18          0.36
3           (2400, 3000)     2400 but under 3000    8           0.16
4           (3000, 3600)     3000 but under 3600    5           0.10
5           (3600, 4200)     3600 but under 4200    2           0.04
6           (4200, 4800)     4200 but under 4800    1           0.02
Total                                               50          1.00
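The relative frequency column is each class frequency divided by the total number of observations. A minimal sketch using the six class frequencies from the table (variable names are illustrative):

```python
frequencies = [16, 18, 8, 5, 2, 1]   # f_j for the six classes
n = sum(frequencies)                 # total number of observations

# rf_j = f_j / n
relative = [f / n for f in frequencies]

for j, (f, rf) in enumerate(zip(frequencies, relative), start=1):
    print(f"class {j}: f = {f}, rf = {rf:.2f}")
print("total:", n, f"{sum(relative):.2f}")
```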
Example (Continued)
[Figures: frequency histogram and relative frequency histogram of the example data]
Example (Continued)
Boundary value (m³/s)   Description        Cumulative frequency   Cumulative relative frequency
1,200                   Less than 1,200    0                      0
1,800                   Less than 1,800    16                     0.32
2,400                   Less than 2,400    34                     0.68
3,000                   Less than 3,000    42                     0.84
3,600                   Less than 3,600    47                     0.94
4,200                   Less than 4,200    49                     0.98
4,800                   Less than 4,800    50                     1.00

The plot of these cumulative values is called an ogive: the cumulative frequency polygon and the cumulative frequency curve (smooth ogive).
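The cumulative columns are running totals of the class frequencies, starting from zero at the lowest boundary. A minimal sketch with the frequencies and boundary values of this example (variable names are illustrative):

```python
from itertools import accumulate

boundaries = [1200, 1800, 2400, 3000, 3600, 4200, 4800]   # m³/s
frequencies = [16, 18, 8, 5, 2, 1]

# Nothing lies below the first boundary, so the running total starts at 0.
cumulative = [0] + list(accumulate(frequencies))
cumulative_relative = [c / cumulative[-1] for c in cumulative]

# These (boundary, cumulative value) pairs are the points of the ogive.
for b, c, cr in zip(boundaries, cumulative, cumulative_relative):
    print(f"less than {b}: {c} ({cr:.2f})")
```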
[Figure: cumulative percentage frequency (%F, 0 to 100) plotted against classes (axis values 2, 6, 10, 14, 18, 22)]
Central Tendency
Average or Arithmetic Mean
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Median
• The value that divides the ordered data into two equal halves
1 3 5 7 9: Median = 5
1 3 5 7 9 24: Median = (5 + 7)/2 = 6
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may be no mode
• There may be several modes
[Figure: dot plots on number lines illustrating a data set with no mode and a data set with one mode (mode = 3)]
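The three measures of central tendency can be tried out with Python's standard `statistics` module. The sketch uses the two small data sets from the median example and the ten-value ordered array from earlier in this lecture. How "no mode" is reported is a convention of the library, as noted in the comments.

```python
import statistics

odd = [1, 3, 5, 7, 9]
even = [1, 3, 5, 7, 9, 24]

mean_odd = statistics.mean(odd)        # (1 + 3 + 5 + 7 + 9) / 5
median_odd = statistics.median(odd)    # middle value of an odd-sized set
mean_even = statistics.mean(even)      # pulled upward by the extreme value 24
median_even = statistics.median(even)  # average of the two middle values, 5 and 7

# multimode returns every value that ties for the highest count.
ordered_array = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
modes = statistics.multimode(ordered_array)   # 24 and 27 each occur twice

# When every value occurs once, the library returns all of them;
# in the lecture's terms such a data set has no mode.
all_tied = statistics.multimode(odd)

print(mean_odd, median_odd, mean_even, median_even, modes, all_tied)
```

Adding the extreme value 24 moves the mean from 5 to about 8.17, while the median only moves from 5 to 6.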
Measures of Variation
Variation
Range
• Measure of variation
• Difference between the largest and the smallest
observations: $\text{Range} = X_{\max} - X_{\min}$
[Figure: two data sets plotted on number lines from 7 to 12]
Relative Range
Variance
– Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
– Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Standard Deviation
• Most important measure of variation
• Shows variation about the mean
• Has the same units as the original data
– Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
– Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
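The only difference between the sample and population forms is the divisor (n - 1 versus N). The sketch below computes both by hand for the ten-value ordered array used earlier in this lecture and cross-checks them against Python's standard `statistics` module. Treating the same ten values once as a sample and once as a population is done here purely for illustration.

```python
import math
import statistics

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
n = len(data)
mean = sum(data) / n                               # 290 / 10 = 29

sum_sq_dev = sum((x - mean) ** 2 for x in data)    # sum of squared deviations

sample_var = sum_sq_dev / (n - 1)   # divide by n - 1 when the data is a sample
pop_var = sum_sq_dev / n            # divide by N when the data is the whole population

sample_sd = math.sqrt(sample_var)
pop_sd = math.sqrt(pop_var)

# Cross-check against the standard library.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(pop_var, statistics.pvariance(data))

print(f"S^2 = {sample_var:.3f}, S = {sample_sd:.3f}")
print(f"sigma^2 = {pop_var:.3f}, sigma = {pop_sd:.3f}")
```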
Coefficient of Variation
$$CV = \frac{S}{\bar{X}} \times 100\%$$
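Because the coefficient of variation is a ratio of two quantities with the same units, it is dimensionless and can be compared across data sets. A minimal sketch for the ten-value ordered array used earlier in this lecture (variable names are illustrative):

```python
import statistics

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

mean = statistics.mean(data)    # sample mean
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)

cv = s / mean * 100             # coefficient of variation, in percent

print(f"mean = {mean}, S = {s:.3f}, CV = {cv:.2f}%")
```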
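Shape of a Distribution
• Skewness Coefficient (CS)
– Describes how data is distributed
– Measure of shape
$$CS = \frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^3}{\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{3/2}}$$
• Corrected form of CS for small samples
$$CS = \frac{n \sum_{i=1}^{n} (X_i - \bar{X})^3}{(n - 1)(n - 2)\, S^3}$$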
Shape of a Distribution
• Symmetric or skewed
[Figure: three distribution shapes: left-skewed (CS < 0), symmetric (CS = 0), and right-skewed (CS > 0)]
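The sign of the skewness coefficient can be checked numerically. The sketch below assumes two commonly used definitions: the moment-based coefficient (third central moment divided by the second central moment to the power 3/2) and its small-sample adjustment by the factor sqrt(n(n - 1))/(n - 2). These may differ in detail from the form used on the original slide. The function name `skewness` is illustrative.

```python
import math

def skewness(values):
    """Return (moment-based CS, small-sample adjusted CS). Needs n >= 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    cs = m3 / m2 ** 1.5
    cs_adjusted = cs * math.sqrt(n * (n - 1)) / (n - 2)
    return cs, cs_adjusted

# Ten-value ordered array from this lecture: a longer tail to the right.
cs, cs_adjusted = skewness([21, 24, 24, 26, 27, 27, 30, 32, 38, 41])
print(f"CS = {cs:.3f}, adjusted CS = {cs_adjusted:.3f}")

# A symmetric data set has zero skewness.
cs_sym, _ = skewness([1, 2, 3, 4, 5])
print(f"symmetric data: CS = {cs_sym:.3f}")
```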
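• Sample Mean (grouped data)
$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_{Class\,j}}{\sum_{j=1}^{k} f_j} = \frac{1}{n} \sum_{j=1}^{k} f_j X_{Class\,j}$$
• Sample Variance (grouped data)
$$S^2 = \frac{1}{n - 1} \sum_{j=1}^{k} f_j \left(X_{Class\,j} - \bar{X}\right)^2$$

Class      Midpoint   Frequency (f)
0 - 4      2          1
4 - 8      6          4
8 - 12     10         9
12 - 16    14         20
16 - 20    18         16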
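For grouped data, each class midpoint stands in for every observation in its class, weighted by the class frequency. A minimal sketch using the midpoints and frequencies of the grades table (variable names are illustrative):

```python
midpoints = [2, 6, 10, 14, 18]      # class midpoints
frequencies = [1, 4, 9, 20, 16]     # class frequencies f_j

n = sum(frequencies)

# Grouped mean: frequency-weighted average of the class midpoints.
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Grouped sample variance: frequency-weighted squared deviations over n - 1.
grouped_var = sum(f * (x - grouped_mean) ** 2
                  for f, x in zip(frequencies, midpoints)) / (n - 1)

print(f"n = {n}, mean = {grouped_mean:.2f}, S^2 = {grouped_var:.3f}")
```

The grouped values are approximations, because the individual grades inside each class are replaced by the class midpoint.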
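• Median (grouped data)
$$\text{Median} = L_1 + \frac{N/2 - \sum f_i}{f_{median}} \, C$$
where $L_1$ and $L_2$ are the lower and upper limits of the median class, $f_{median}$ is the frequency of the median class, $\sum f_i$ is the cumulative frequency up to $L_1$, and $C = L_2 - L_1$.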
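The grouped median interpolates inside the class where the cumulative frequency first reaches N/2. The sketch below applies it to the grades frequency table from this lecture. The function name `grouped_median` is illustrative, and taking the median class as the first class whose cumulative frequency reaches N/2 is an assumption about how that class is chosen.

```python
def grouped_median(limits, frequencies):
    """Median of grouped data by linear interpolation within the median class."""
    total = sum(frequencies)
    cumulative = 0                      # frequency accumulated up to L1
    for lower, upper, f in zip(limits, limits[1:], frequencies):
        if f > 0 and cumulative + f >= total / 2:
            width = upper - lower       # C = L2 - L1
            return lower + (total / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one observation")

limits = [0, 4, 8, 12, 16, 20]
frequencies = [1, 4, 9, 20, 16]

# N/2 = 25 falls in class 12-16, where the cumulative frequency goes from 14 to 34.
median = grouped_median(limits, frequencies)
print(f"grouped median = {median:.1f}")
```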
Coefficient of Correlation
• Measures the strength of the linear relationship
between two quantitative variables
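$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
[Figure: five scatter plots of Y against X, showing r = -1, r = -0.6, r = 0, r = 0.6, and r = 1]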
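The coefficient of correlation can be computed directly from its definition. The lecture gives no paired data, so the X and Y values below are hypothetical and chosen only to show r = 1, r = -1, and an intermediate value. The function name `correlation` is illustrative.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
r_perfect_positive = correlation(x, [3, 5, 7, 9, 11])   # exact rising line
r_perfect_negative = correlation(x, [11, 9, 7, 5, 3])   # exact falling line
r_scattered = correlation(x, [2, 1, 4, 3, 5])           # rising trend with scatter

print(r_perfect_positive, r_perfect_negative, r_scattered)
```

Values of r close to +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.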