
05/07/19

STATISTICS

TOPIC 1

WHAT IS STATISTICS?
• Statistics is a field of study that involves collecting,
organizing, analyzing and interpreting data as a basis for
explanation, description and comparison in order to make
decisions.
Example
MMU – students’ results are dropping.
Why?


The Process of Statistics

Identify the research objectives

Collect the information needed to answer the questions

Organize and summarize the information

Make a decision / draw a conclusion

TYPES OF STATISTICS
Descriptive Statistics
is a field of study which involves organizing, displaying and
describing data by using tables, graphs (graphical techniques) and
summary measures (numerical techniques).

Inferential Statistics
is a field of study that uses sample results to make decisions about
a population.

[Diagram: a sample drawn from a population]


Descriptive Statistics
• Organizing data
• Numerical summary
  - Measures of central tendency: mode, median and mean
    (population mean: µ; sample mean: x̄)
  - Measures of dispersion: range, standard deviation and variance
    (population variance: σ²; sample variance: s²)
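
The numerical summary measures listed above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample (illustrative values only, not from the slides).
sample = [2, 3, 3, 5, 7, 10]

# Measures of central tendency
mean = statistics.mean(sample)        # sample mean, x-bar
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion
data_range = max(sample) - min(sample)
variance = statistics.variance(sample)   # sample variance s^2 (divides by n - 1)
std_dev = statistics.stdev(sample)       # sample standard deviation s

print(mean, median, mode, data_range, variance, std_dev)
```

For a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead of n - 1.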

Sampling Theory
• We look at sample characteristics to gain an understanding of
the population at hand. Recognize random variables and
random samples.
• The usual assumption is that a sample comes from a normally
distributed population, written as $X \sim N(\mu, \sigma^2)$.
• We use sample statistics to estimate population parameters:
  - $\bar{X}$ to estimate µ
  - $s^2$ to estimate σ²


SHAPE OF THE SAMPLING DISTRIBUTION

• Sampling from a normally distributed population
• Sampling from a non-normally distributed population

The central limit theorem states that for a large sample size, the
sampling distribution of the sample mean is approximately normal,
irrespective of the shape of the population distribution.
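
The central limit theorem can be checked with a small simulation. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a clearly non-normal shape). The sample size, number of repetitions and seed are arbitrary illustrative choices.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from a sample of size n drawn
    from an exponential population with mean 1 (assumed for illustration)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = sample_means(n=50, reps=2000)

# The sample means should centre near the population mean (1) and have
# spread near sigma / sqrt(n) = 1 / sqrt(50), roughly 0.141.
center = statistics.fmean(means)
spread = statistics.stdev(means)
print(round(center, 3), round(spread, 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.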


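Cases            | σ known | σ unknown, n ≥ 30 | σ unknown, n < 30
Approximation    | Normal | Normal (by the CLT) | t-distribution
Standardization  | $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ | $Z = \frac{\bar{X}-\mu}{s/\sqrt{n}}$ | $T = \frac{\bar{X}-\mu}{s/\sqrt{n}}$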

Example 1:


Example 2:

Statistical Inference

Statistical inference is the process of making an estimate,
prediction, or decision about a population based on a sample.

[Diagram: a sample is drawn from the population; inference is made
from the sample about the population]

Parameter: a descriptive measure of a population
Statistic: a descriptive measure of a sample


Statistical Inference

We use a statistic to make inferences about a parameter.

Rationale:
• Large populations make investigating each member impractical and
expensive.
• It is easier and cheaper to take a sample and make estimates about the
population from the sample.
However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build “measures of reliability” into the statistical
inference, namely the confidence level and the significance level.

Statistical inference

Estimation
• Point estimator
• Interval estimator
  - Confidence interval
• Estimating the mean
• Estimating a proportion
• Estimating the variance

Test of hypothesis
• Hypothesis testing
  - Statistical hypothesis, rejection region, decision rule,
    type I and II error
• Testing a population mean
• Testing a population proportion
• Testing a population variance


ESTIMATION
Estimation is a procedure by which numerical value(s) are assigned
to a population parameter based on the information collected from
a sample.

Example
• A lecturer wants to estimate the average time students spend
browsing the Internet per day.

Definition of Point Estimator

A point estimator is a sample statistic whose value (the point
estimate) is used to estimate a population parameter.

• $\bar{X}$ is the point estimate of µ
• $s^2$ is the point estimate of σ²


Definition of Interval Estimator

An interval estimator is an interval constructed around the point
estimate; it is stated that this interval is likely to contain the
corresponding population parameter.
Definition of Confidence Level
The confidence level associated with a confidence interval states how
much confidence we have that this interval contains the true
population parameter.
Definition of Confidence Interval
A confidence interval is an interval that is constructed based on a given
confidence level.

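Estimating the mean
(confidence interval for µ)

a) Normal population and σ known
   $\mu = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
b) Large sample and σ unknown
   $\mu = \bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
c) Small sample and σ unknown
   $\mu = \bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$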
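
The three cases can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names and the numbers are hypothetical, the z value comes from the standard library's `statistics.NormalDist`, and the t critical value (with n - 1 degrees of freedom) is assumed to be read from a t table and passed in.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sd, n, confidence=0.95):
    """Cases (a) and (b): sd is sigma if known, or s for a large sample."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin

def t_interval(xbar, s, n, t_crit):
    """Case (c): t_crit is t_{alpha/2} with n - 1 df, taken from a t table."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical numbers: x-bar = 50, standard deviation = 10.
print(z_interval(xbar=50, sd=10, n=100))               # 95% interval, n = 100
print(t_interval(xbar=50, s=10, n=10, t_crit=2.262))   # t_{0.025} with 9 df
```

With the same standard deviation, the small-sample interval is much wider, both because n is smaller and because the t critical value exceeds the z value.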


Example


Example:


Example:

Example:


Example:

Building a one-sided confidence interval

One-sided C.I.
• Lower bound: bounded from below
• Upper bound: bounded from above
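
As a hedged illustration, the sketch below assumes the standard one-sided bounds for a mean with σ known: a lower bound of x̄ − z_α(σ/√n) and an upper bound of x̄ + z_α(σ/√n). The only change from the two-sided interval is that z_α replaces z_{α/2}. The function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_bounds(xbar, sigma, n, confidence=0.95):
    """Return (lower_bound, upper_bound) for the mean, sigma known.

    Each bound is used on its own: either "mu is at least lower_bound"
    or "mu is at most upper_bound", each at the given confidence level.
    """
    z = NormalDist().inv_cdf(confidence)   # z_alpha, not z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = one_sided_bounds(xbar=50, sigma=10, n=100)
print(round(lower, 3), round(upper, 3))
```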


Estimating a proportion
(confidence interval for a population
proportion)
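
A minimal sketch, assuming the standard large-sample interval p̂ ± z_{α/2}√(p̂(1 − p̂)/n), where p̂ = x/n is the sample proportion. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion.

    x is the number of successes in a sample of size n.
    """
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 60 successes out of 100.
print(proportion_interval(x=60, n=100))
```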


Example:

Estimating the variance
(confidence interval for σ²)
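
A minimal sketch, assuming the standard interval for a normal population: ((n − 1)s²/χ²_{α/2}, (n − 1)s²/χ²_{1−α/2}), with n − 1 degrees of freedom. The chi-squared critical values are assumed to be read from a table and passed in; the function name and numbers are hypothetical.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    chi2_upper = chi^2_{alpha/2} and chi2_lower = chi^2_{1 - alpha/2},
    both with n - 1 degrees of freedom, taken from a chi-squared table.
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Hypothetical sample: n = 10, s^2 = 4, 95% confidence.
# Table values for 9 df: chi^2_{0.025} = 19.023, chi^2_{0.975} = 2.700.
low, high = variance_interval(s2=4, n=10, chi2_upper=19.023, chi2_lower=2.700)
print(round(low, 3), round(high, 3))
```

Note that the larger critical value produces the lower limit, because it appears in the denominator.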


Example:

Hypothesis Testing
Test a given theory about a population
parameter by using sample
information (a statistic).
• Useful in determining whether a new procedure is better than the
existing one.
E.g.:
- A drink company claims that on average its cans contain 12 ounces
of soda.
- A researcher thinks that if knee surgery patients go to physical
therapy twice a week (instead of 3 times), their recovery period will
be longer.


Hypothesis testing involves:

• Stating the statistical hypothesis
• Calculating the test statistic / p-value
• Indicating the rejection region
• Making the decision

Statistical hypothesis

Null hypothesis, H0
A claim about a population parameter (such as µ, p or σ²)
that is assumed to be true until it is declared false.

Alternative hypothesis, H1
A claim that will be true if the null hypothesis is false (that is,
contrary to the null hypothesis).


Test statistics
A value computed from sample data

Indicating the rejection region

The critical value C separates the non-rejection region (do not
reject H0) from the rejection region (reject H0).

REJECTION REGION OF A HYPOTHESIS TEST
• Two-tailed test:
- Rejection regions on both tails
• Left-tailed test:
- Rejection region on the left tail
• Right-tailed test:
- Rejection region on the right tail


Two-tailed test:
Sign for H1 is “≠”.
Keywords:
• “changed”
• “different”
E.g.: The mean family size in the US
was 3.18. We want to check
whether the mean has changed.

Right-tailed test:
Sign for H1 is “>”.
Keyword:
• “more than”
E.g.: A school teacher’s
salary was $2000. We want
to test whether the current salary is
higher than $2000.


Left-tailed test:
Sign for H1 is “<”.
Keyword:
• “less than”
E.g.: Cans contain an average of
12 ounces. We want to test
whether the mean amount is less
than 12 ounces.

Example:
Explain which of the following is a two-tailed test, a left-tailed test
and a right-tailed test.

a. H0: µ = 45, H1: µ < 45


b. H0: µ ≤ 85, H1: µ > 85
c. H0: µ = 33, H1: µ ≠ 33
Solution:
a) left-tailed test b) right-tailed test c) two-tailed test


SUMMARY OF THE SIGNS IN H0 AND H1 AND THE TAILS OF THE TEST
The hypotheses are stated in such a way that they are
mutually exclusive. That is, if one is true, the other must be
false; and vice versa.

                  | Two-Tailed Test | Left-Tailed Test | Right-Tailed Test
Sign in H0        | =               | = or ≥           | = or ≤
Sign in H1        | ≠               | <                | >
Rejection region  | On both tails   | On the left tail | On the right tail
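
The relationship between the sign in H1 and the tail of the test can be written as a small lookup. This is only an illustrative sketch; the function name is hypothetical, and "!=" is accepted as an ASCII spelling of "≠".

```python
def tail_of_test(h1_sign):
    """Map the sign used in H1 to the type of test."""
    tails = {
        "≠": "two-tailed",
        "!=": "two-tailed",
        "<": "left-tailed",
        ">": "right-tailed",
    }
    if h1_sign not in tails:
        raise ValueError(f"unsupported sign in H1: {h1_sign!r}")
    return tails[h1_sign]

# H1: mu < 45, H1: mu > 85, H1: mu != 33
for sign in ["<", ">", "≠"]:
    print(sign, "->", tail_of_test(sign))
```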

Example:
Write the null and alternative hypotheses for each of the
following examples. Determine the tail(s) of the test.

a. To test whether or not the mean price of houses in Johor is
greater than RM143,000
b. To test if the mean number of hours spent working per week
by college students who hold jobs is different from 15 hours
c. To test whether the mean life of a particular brand of auto
batteries is less than 45 months
d. To test whether the mean amount of time spent doing
assignments by MMU students is different from 5 hours a
week


Making the decision

Decision made after test

Actual situation | Fail to reject H0       | Reject H0
H0 is true       | Correct decision        | Type I error (α error)
H0 is false      | Type II error (β error) | Correct decision

Types of error
Type I error
• True H0 is rejected.
• α = significance level of the test
• α = P (H0 is rejected | H0 is true)

Type II error
• False H0 is not rejected.
• 1- β = power of the test
• β = P (H0 is not rejected | H0 is false)
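
The definition α = P(H0 is rejected | H0 is true) can be illustrated by simulation. The sketch below assumes a two-tailed z test with σ known and generates data for which H0 really is true, so every rejection is a type I error. All parameter values and the seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                        reps=4000, seed=1):
    """Estimate how often a true H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    rejections = 0
    for _ in range(reps):
        # The population mean equals mu0, so H0 is true for every sample.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

rate = type_one_error_rate()
print(rate)   # expected to be close to alpha = 0.05
```

Generating the samples from a mean different from `mu0` instead would make each non-rejection a type II error, so one minus the rejection rate would then estimate β.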


Hypothesis testing by using the test statistic approach:

Hypothesis testing by using the p-value approach:


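Hypothesis testing for population mean, µ
• For normal population and σ known

                 | Two-Tailed Test | Left-Tailed Test | Right-Tailed Test
Sign in H0       | = | = or ≥ | = or ≤
Sign in H1       | ≠ | < | >
Test statistic   | $z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$ (all three tests)
Critical region  | $\lvert z \rvert > z_{\alpha/2}$ | $z < -z_{\alpha}$ | $z > z_{\alpha}$
Decision rule    | Reject H0 if test statistic falls in critical region
p-value          | $2P\left(Z > \left\lvert\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right\rvert\right)$ | $P\left(Z < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)$ | $P\left(Z > \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)$
Decision rule    | Reject H0 if p-value < α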
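
The p-value approach for the σ-known case can be sketched as follows. The function name, the `tail` labels and the numbers are hypothetical. `mu0` stands for the value of µ stated in H0.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject_h0) for a test of the population mean."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, p_value, p_value < alpha

# Hypothetical data for H0: mu = 12 against H1: mu < 12 (left-tailed).
z, p_value, reject = z_test(xbar=11.9, mu0=12, sigma=0.3, n=36, tail="left")
print(round(z, 2), round(p_value, 4), reject)
```

For a large sample with σ unknown, the same function can be used with the sample standard deviation s passed as `sigma`.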

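Hypothesis testing for population mean, µ
• For large sample and σ unknown

                 | Two-Tailed Test | Left-Tailed Test | Right-Tailed Test
Sign in H0       | = | = or ≥ | = or ≤
Sign in H1       | ≠ | < | >
Test statistic   | $z = \frac{\bar{x}-\mu}{s/\sqrt{n}}$ (all three tests)
Critical region  | $\lvert z \rvert > z_{\alpha/2}$ | $z < -z_{\alpha}$ | $z > z_{\alpha}$
Decision rule    | Reject H0 if test statistic falls in critical region
p-value          | $2P\left(Z > \left\lvert\frac{\bar{x}-\mu}{s/\sqrt{n}}\right\rvert\right)$ | $P\left(Z < \frac{\bar{x}-\mu}{s/\sqrt{n}}\right)$ | $P\left(Z > \frac{\bar{x}-\mu}{s/\sqrt{n}}\right)$
Decision rule    | Reject H0 if p-value < α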


Hypothesis testing for population mean, µ
• For small sample (n < 30) and σ unknown

                 | Two-Tailed Test | Left-Tailed Test | Right-Tailed Test
Sign in H0       | = | = or ≥ | = or ≤
Sign in H1       | ≠ | < | >
Test statistic   | $t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$ (all three tests)
Critical region  | $\lvert t \rvert > t_{\alpha/2}$ | $t < -t_{\alpha}$ | $t > t_{\alpha}$
Decision rule    | Reject H0 if test statistic falls in critical region
p-value          | $2P\left(T > \left\lvert\frac{\bar{x}-\mu}{s/\sqrt{n}}\right\rvert\right)$ | $P\left(T < \frac{\bar{x}-\mu}{s/\sqrt{n}}\right)$ | $P\left(T > \frac{\bar{x}-\mu}{s/\sqrt{n}}\right)$
Decision rule    | Reject H0 if p-value < α
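
A minimal sketch of the test statistic approach for the small-sample case. Python's standard library has no t distribution, so the critical value (with n - 1 degrees of freedom) is assumed to be read from a t table and passed in. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """t = (x-bar - mu0) / (s / sqrt(n)), where mu0 is the value in H0."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

def reject_h0(t, t_crit, tail):
    """t_crit is the positive table value: t_{alpha/2} for a two-tailed
    test, t_{alpha} for a one-tailed test."""
    if tail == "two":
        return abs(t) > t_crit
    if tail == "left":
        return t < -t_crit
    if tail == "right":
        return t > t_crit
    raise ValueError("tail must be 'two', 'left' or 'right'")

# Hypothetical sample, H0: mu = 3 against H1: mu != 3, alpha = 0.05.
sample = [2, 3, 3, 5, 7, 10]
t = t_statistic(sample, mu0=3)
# Table value t_{0.025} with 5 degrees of freedom is 2.571.
print(round(t, 3), reject_h0(t, t_crit=2.571, tail="two"))
```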

Example:


Example:

Example:


Example:

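Hypothesis testing for population proportion, p
• For large sample

                 | Two-Tailed Test | Left-Tailed Test | Right-Tailed Test
Sign in H0       | = | = or ≥ | = or ≤
Sign in H1       | ≠ | < | >
Test statistic   | $z = \frac{\hat{p}-p}{\sqrt{pq/n}}$, where $\hat{p} = x/n$ and $q = 1-p$ (all three tests)
Critical region  | $\lvert z \rvert > z_{\alpha/2}$ | $z < -z_{\alpha}$ | $z > z_{\alpha}$
Decision rule    | Reject H0 if test statistic falls in critical region
p-value          | $2P\left(Z > \left\lvert\frac{\hat{p}-p}{\sqrt{pq/n}}\right\rvert\right)$ | $P\left(Z < \frac{\hat{p}-p}{\sqrt{pq/n}}\right)$ | $P\left(Z > \frac{\hat{p}-p}{\sqrt{pq/n}}\right)$
Decision rule    | Reject H0 if p-value < α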
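
A short sketch of the proportion test, assuming the usual large-sample statistic z = (p̂ − p0)/√(p0(1 − p0)/n), where p0 is the value of p stated in H0. The function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, tail="two"):
    """Return (z, p_value) for a large-sample test of a proportion."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5, H1: p > 0.5.
z, p_value = proportion_z_test(x=60, n=100, p0=0.5, tail="right")
print(round(z, 2), round(p_value, 4))   # reject H0 at alpha = 0.05 if p_value < 0.05
```

The standard error uses the hypothesized p0 rather than p̂, because the test statistic is computed under the assumption that H0 is true.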


Example:

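Hypothesis testing for population variance, σ²

                 | Two-Tailed Test | Left-Tailed Test | Right-Tailed Test
Sign in H0       | = | = or ≥ | = or ≤
Sign in H1       | ≠ | < | >
Test statistic   | $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$, where n is the sample size and $s^2$ is the sample variance (all three tests)
Critical region  | $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$ | $\chi^2 < \chi^2_{1-\alpha}$ | $\chi^2 > \chi^2_{\alpha}$
Decision rule    | Reject H0 if test statistic falls in critical region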
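
A minimal sketch of the variance test. The chi-squared critical values (n - 1 degrees of freedom) are assumed to come from a table and are passed in; the function names and numbers are hypothetical.

```python
def chi2_statistic(s2, n, sigma0_sq):
    """chi^2 = (n - 1) s^2 / sigma_0^2, with sigma_0^2 the value in H0."""
    return (n - 1) * s2 / sigma0_sq

def reject_h0(chi2, tail, lower_crit=None, upper_crit=None):
    """lower_crit is the left-tail table value (chi^2_{1-alpha/2} or
    chi^2_{1-alpha}); upper_crit is the right-tail value
    (chi^2_{alpha/2} or chi^2_{alpha})."""
    if tail == "two":
        return chi2 < lower_crit or chi2 > upper_crit
    if tail == "left":
        return chi2 < lower_crit
    if tail == "right":
        return chi2 > upper_crit
    raise ValueError("tail must be 'two', 'left' or 'right'")

# Hypothetical data: n = 10, s^2 = 4, H0: sigma^2 = 2, H1: sigma^2 > 2.
chi2 = chi2_statistic(s2=4, n=10, sigma0_sq=2)
# Table value chi^2_{0.05} with 9 degrees of freedom is 16.919.
print(chi2, reject_h0(chi2, tail="right", upper_crit=16.919))
```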


Example:

Chi-squared Goodness of Fit test (GOF)

• Used to test whether a model fits a given scenario well
• In a given scenario, frequencies are observed and compared
to the frequencies expected under the fitted model
• Procedures for GOF
Hypothesis
H0: The model fits the given scenario well
H1: The model does not fit the given scenario well
Test statistic
$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$
where $o_i$ = observed frequencies, $e_i$ = expected frequencies and
k = number of classes
Decision rule
Reject H0 if $\chi^2 > \chi^2_{\alpha}(k-1)$
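
The goodness of fit statistic can be sketched as below. The die-rolling data are hypothetical, and the critical value with k - 1 degrees of freedom is assumed to be read from a chi-squared table.

```python
def chi_squared_gof(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over the k classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical scenario: 60 rolls of a die, model = fair die (10 per face).
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6

chi2 = chi_squared_gof(observed, expected)
critical = 11.070   # table value chi^2_{0.05} with k - 1 = 5 degrees of freedom
reject = chi2 > critical
print(round(chi2, 2), reject)
```

Here the statistic is well below the critical value, so the fair-die model is not rejected at the 5% level for these illustrative counts.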


Example:

Example:
