ETF1100 Business Statistics Week 6: Midterm Test Revision
https://burst.shopify.com/photos/whiteboard-chart?q=graph
Population:
All members of a group about which you want to draw a conclusion.
E.g. all voters in an election, all Telstra shareholders, all invoices
submitted to Medicare for reimbursement.
Types of Data Wk1
Data Types
Categorical (Qualitative): numerical operations are not meaningful.
Numerical (Quantitative): numerical operations are meaningful.
[Chart examples]
• Great for illustrating relativity, particularly for ordinal data
• Great for illustrating portions or shares
• Great for illustrating the distribution
Normalization of Data Wk1
Purpose: comparability across observations
Central Tendency: What is the typical or the central value?
Variation: How much variation is there in the distribution?
Shape: Are there any unusual values that contribute to the distribution?
Mean, Median & Mode Wk3
Mean: measure of typical value, also known as “average”.
The sum of all observed values divided by the number of observations. In Excel: =AVERAGE(…)
Median: The middle value if values are sorted from smallest to largest (50th percentile).
50% of values are equal to or lower than the median, and 50% are equal to or higher.
In Excel : =MEDIAN(…)
All are measures of central tendency, but which one should we use?
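One way to think about the choice is to see how each measure reacts to an extreme value. The sketch below uses made-up sales figures (an assumption for illustration, not data from the unit) and Python's standard `statistics` module in place of Excel's AVERAGE and MEDIAN.

```python
from statistics import mean, median

# Hypothetical weekly sales figures, invented purely for illustration.
sales = [10, 12, 11, 13, 12]
sales_with_outlier = sales + [100]  # same data plus one extreme value

mean_before = mean(sales)                   # sum 58 over 5 observations
median_before = median(sales)               # middle of 10, 11, 12, 12, 13
mean_after = mean(sales_with_outlier)       # sum 158 over 6 observations
median_after = median(sales_with_outlier)   # average of the two middle values

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this example the single extreme value pulls the mean up sharply while the median stays where it was, which suggests the median is often the safer summary when the data contain unusual values.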
Measures of Variability Wk3
Range: The difference between the maximum and the minimum values. It relies just on the two
most extreme values in the dataset. In Excel: =MAX(…)-MIN(…)
Variance: the average squared deviation (distance) from the mean, reported in squared units.
In Excel: =VAR.S(…)
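As a rough check on the two Excel formulas above, the same quantities can be computed in Python. The data values are hypothetical. `statistics.variance` divides by n - 1, which is the same sample-variance convention as VAR.S.

```python
from statistics import variance

# Hypothetical observations, invented for illustration.
data = [2, 4, 4, 6, 9]

# Range: relies only on the two most extreme values.
data_range = max(data) - min(data)

# Sample variance: squared deviations from the mean, summed and divided by n - 1.
# Mean is 5; squared deviations are 9, 1, 1, 1, 16 (sum 28); 28 / 4 = 7.
sample_var = variance(data)

print("range:", data_range)
print("sample variance:", sample_var)
```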
Excel Functions:
For probability “=NORM.DIST(xvalue,mean,stdev,TRUE)”
For percentile “=NORM.INV(prob, mean,stdev)”
Representative Sample Wk4
Representative sample is determined by:
1) Data collection process (sampling design)
2) Survey design → wording design of the questions/form.
3) Sample size → a sufficiently large sample means the sample statistic gets closer to the population
parameter
Biased sample:
• Non-representative statistics
• Invalid inference → invalid conclusions. It could end with catastrophic outcomes if used in business
decisions
Potential biases:
• Selection bias – members of the population have unequal chances of being chosen
• Non-response bias – the data collection process leads to systematic non-response from certain
groups
Statistics is UNCERTAIN Wk4
• Statistics is about quantifying the uncertainty of the sample estimate
• $\bar{x}$ is an estimate of $E(X) = \mu$ (A sample statistic is only an estimate of the
truth. No sample statistic is exact; each has variation/error around
it.)
• Assume we take data samples repeatedly and compute the sample mean as the
statistic for each sample. Then we would have the sampling
distribution of the sample mean, which portrays its variability.
• Central Limit Theorem: if the sample size $n$ is large, $\bar{x} \sim N\left(\mu, \frac{s}{\sqrt{n}}\right)$
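The idea of repeated sampling can be sketched with a small simulation. Everything here is an assumption made for illustration: the population is exponential with mean 5 (so its standard deviation is also 5), each sample has n = 50 observations, and 2000 samples are drawn with a fixed seed.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

population_mean = 5.0   # assumed exponential population: mean 5, std dev 5
n = 50                  # size of each sample
num_samples = 2000      # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = [
    mean(random.expovariate(1 / population_mean) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)        # should sit close to the population mean
sd_of_means = stdev(sample_means)         # spread of the sampling distribution
theoretical_se = population_mean / sqrt(n)  # std dev / sqrt(n)

print(mean_of_means, sd_of_means, theoretical_se)
```

Even though the assumed population is strongly skewed, the sample means cluster around the population mean with a spread close to the standard deviation divided by the square root of n.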
Standard error $= \frac{\text{std deviation}}{\sqrt{n}}$
If the sample size (n) ↑, standard error ↓, width ↓, estimate is more precise
The bigger the sample, the more information we have, so the interval estimate of the population mean
is more precise and the interval is narrower.
If the level of confidence (1-α) ↑, critical value ↑, width ↑, the estimate is less precise
The more confident we want to be, the more values we need to include in our confidence interval, so the
interval is wider.
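Both effects can be checked numerically. The sketch below assumes a z-based interval of the form sample mean ± critical value × s/√n, which is one common form and is not spelled out in these notes. The function name `ci_width` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(s, n, confidence):
    """Full width of an assumed z-based interval: 2 * z * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return 2 * z * s / sqrt(n)

# Larger sample, same confidence level: the interval narrows.
print(ci_width(s=10, n=100, confidence=0.95))
print(ci_width(s=10, n=400, confidence=0.95))

# Higher confidence level, same sample size: the interval widens.
print(ci_width(s=10, n=100, confidence=0.99))
```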
Hypothesis Test for Evidence-based Decisions Wk5
A statistical framework for using data to derive evidence-based
decisions.
• Define business problem and variables relevant to that problem
• Formulate a hypothesis around those variables that is relevant to business
decisions
• Conduct hypothesis testing to establish degree of evidence for the
hypotheses
• Based on evidence, make business decisions
[Diagram: STATISTICS splits into DESCRIPTIVE and INFERENTIAL. Inferential statistics (Sample → Sampling Distribution) covers ESTIMATION (Point & Interval) and HYPOTHESIS TESTS.]
1 Formulate $H_0$ & $H_1$
2 Decide on the significance level $\alpha$
3 Calculate the p-value
4 Apply the decision rule: reject $H_0$ if p-value < $\alpha$, OR retain it if p-value > $\alpha$
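The decision rule in step 4 is simple enough to write as a small function. This is only a sketch: the name `decide` and the default significance level of 0.05 are assumptions, not values given in these notes.

```python
def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value is below alpha; otherwise retain H0."""
    return "reject H0" if p_value < alpha else "retain H0"

print(decide(0.01))              # p-value below the assumed alpha of 0.05
print(decide(0.20))              # p-value above the assumed alpha of 0.05
print(decide(0.03, alpha=0.01))  # stricter alpha changes the decision
```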
Defining the hypothesis Wk5
1 • Formulate $H_0$ & $H_1$
• The null hypothesis always involves an equality sign (=)
• The alternative hypothesis is what we are seeking evidence for. It can contain a "≠", ">" or "<" sign
Right-tail test: $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$
Left-tail test: $H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$
Two-tail test: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
$$\text{Test statistic} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$$
3 Judge whether the test statistic is unusually "far from zero" in the
direction of the alternative.
Decision:
P-value for a right-tail test: =1-NORM.S.DIST(test statistic,TRUE)
P-value for a left-tail test: =NORM.S.DIST(test statistic,TRUE)
P-value for a two-tail test: =2*NORM.S.DIST(??,TRUE)
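The same p-values can be sketched in Python with the standard normal distribution. The right-tail and left-tail branches mirror the Excel formulas above. The two-tail branch uses one common form, twice the tail area beyond the absolute value of the test statistic; that is an assumption and not necessarily what the ?? in the notes stands for. The function name `p_value` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def p_value(test_stat, tail):
    """p-value for a z test statistic; tail is 'right', 'left' or 'two'."""
    if tail == "right":
        return 1 - std_normal.cdf(test_stat)
    if tail == "left":
        return std_normal.cdf(test_stat)
    if tail == "two":
        # Assumed form: double the area beyond |test_stat|.
        return 2 * (1 - std_normal.cdf(abs(test_stat)))
    raise ValueError("tail must be 'right', 'left' or 'two'")

for tail in ("right", "left", "two"):
    print(tail, round(p_value(2.0, tail), 4))
```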
Type I and II errors Wk5
Since we rely on data samples to conduct hypothesis tests, there is a potential
for errors. Possible scenarios:
                      H0 is TRUE           H0 is FALSE
Do not reject H0      CORRECT DECISION     TYPE II ERROR (β)
Reject H0             TYPE I ERROR (α)     CORRECT DECISION