ETF1100 Business Statistics Week 6: Midterm Test Revision

Charanjit Kaur
Learning Outcomes
• Revision of materials taught in Weeks 1-5
• Complete the Practice Test for Mid-Semester

Basic Concepts of Statistics: Random Variables Wk1
Definition: outcomes of experiments whose values may vary due to chance
• Repeated observations of random variables produce a spread of values.
• These observations are the “data” that are relevant in analysing the problem.
Business outcomes are rarely predictable

Basic Concepts of Statistics: Population and Sample Wk1
Sample:
A subset of the population selected for
analysis.
• Often chosen randomly
• Preferably representative of the population.
Population:
All members of a group about which you want to draw a conclusion.
E.g. all voters in an election, all Telstra shareholders, all invoices
submitted to Medicare for reimbursement, etc.
Types of Data Wk1
Data Types
• Categorical (Qualitative): numerical operations are not meaningful.
    • Nominal: values are labels and do not imply any order
    • Ordinal: values are labels, but they have an order
• Numerical (Quantitative): numerical operations are meaningful.
    • Discrete: values arise from counting
    • Continuous: values arise from measuring
Visualising Data Wk1
• Nominal & Ordinal data
    • Bar Chart: great for illustrating relativity, particularly for ordinal data
    • Pie Chart: great for illustrating portions or shares
• Discrete data
    • Bar Chart: great for illustrating the distribution
• Continuous data
    • Histogram, Box plot: great for illustrating the distribution
Normalization of Data Wk1
Purpose: comparability across observations
• Choice of normalization depends on the purpose of the analysis!
Store   Profit ($000)   Net Profit %   ROI
A             8.06          3.96%      22.39%
B            54.229        10.81%       2.36%
C            17.981        17.55%       9.04%
D            94.891        15.67%       5.02%
E            70.913         7.04%       6.66%
F            23.005        15.49%      18.11%
G           108.656        17.94%       2.57%
Normalization of Data Wk1
Other common normalization
a. across time; real vs nominal values (CPI adjusted)
• Consumer Price Index (CPI): a weighted average of prices for everyday goods
and services people buy. It is indexed to 100 in the base period and is used
to calculate the inflation rate.
• Nominal value: value that is measured in terms of actual prices that exist at
the time.
• Real value: the value of the same item after it has been adjusted for
inflation.
b. across observations; e.g. percentage, per capita
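As a rough illustration of the two normalizations above, a nominal value can be converted to a real value by rescaling it with the ratio of the base-period CPI to the current CPI, and a total can be divided by a population to give a per capita figure. This is only a sketch: the function name `real_value` and every figure in it are hypothetical, not taken from the unit materials.

```python
def real_value(nominal, cpi_current, cpi_base=100.0):
    """Deflate a nominal amount into base-period dollars (CPI adjusted)."""
    return nominal * cpi_base / cpi_current


# Hypothetical: $120,000 of revenue measured when the CPI is 120 (base period = 100)
revenue_nominal = 120_000
revenue_real = real_value(revenue_nominal, cpi_current=120.0)

# Hypothetical per capita normalization across observations
customers = 4_000
revenue_per_customer = revenue_nominal / customers

print(revenue_real, revenue_per_customer)
```

The same revenue looks smaller in real terms because part of the nominal figure reflects higher prices rather than more activity.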
Analysis of Categorical Data Wk2
Probability = quantification of chance. The probabilities of all possible outcomes add to 1.
• Marginal Probability: $P(A)$ = probability of event A.
• Joint Probability: probability of the “Intersection”, describes “A AND B”: $P(A \cap B)$
• Conditional Probability: probability of A conditional on B having occurred, $P(A \mid B)$, where

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

• Probability of the “Union” describes “Either A OR B”: $P(A \cup B)$

Analysis of Categorical Data Wk2
• Mutually exclusive events: events that cannot occur together, e.g.
$P(\text{Head} \cap \text{Tail}) = 0$
• Independent events: event A is independent of event B if
$P(A \mid B) = P(A)$ or, equivalently, $P(A \cap B) = P(A) \times P(B)$
• Evaluate the relationship between two categorical variables: refer to the Exercise vs Heart Disease example in Seminar 2
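The marginal, joint and conditional probabilities, and the independence check, can be sketched from a two-way table of counts. The counts below are invented for illustration and are not the Seminar 2 data; the helper `marginal` is likewise a hypothetical name.

```python
# Hypothetical counts: (exercise habit, heart disease status) -> number of people
counts = {
    ("exercise", "disease"): 10,
    ("exercise", "no disease"): 90,
    ("no exercise", "disease"): 30,
    ("no exercise", "no disease"): 70,
}
total = sum(counts.values())


def marginal(label, position):
    """Marginal probability of one category (position 0 = habit, 1 = status)."""
    return sum(n for key, n in counts.items() if key[position] == label) / total


p_disease = marginal("disease", 1)                    # P(B)
p_exercise = marginal("exercise", 0)                  # P(A)
p_joint = counts[("exercise", "disease")] / total     # P(A and B)
p_disease_given_exercise = p_joint / p_exercise       # P(B | A) = P(A and B) / P(A)

# Independence would require P(A and B) = P(A) * P(B)
independent = abs(p_joint - p_exercise * p_disease) < 1e-12

print(p_disease, p_disease_given_exercise, independent)
```

In this made-up table the conditional probability of disease given exercise differs from the marginal probability of disease, so the two variables are not independent.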
Understanding Statistical Uncertainty Wk3
Distribution of Numerical Data
• Central Tendency: Arithmetic Mean, Median, Mode
  What is the typical or the central value?
• Variation: Range, Interquartile Range, Variance, Standard Deviation
  How much variation is there in the distribution?
• Shape: Skewness
  Are there any unusual values that contribute to the distribution?
Mean, Median & Mode Wk3
Mean: measure of typical value, also known as “average”.
The sum of all observed values divided by the number of observations. In Excel : =AVERAGE(…)
Median: The middle value if values are sorted from smallest to largest (50th percentile).
50% of values are equal to or lower than the median, and 50% are equal to or higher.
In Excel : =MEDIAN(…)
Mode: Value that occurs most frequently.

This might not be interesting if the values don’t repeat often. For numerical data, the most
populated bin range is often reported. In Excel : =MODE(…)
All are measures of central tendency, but which one should we use?
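A small sketch of the three measures using Python's standard `statistics` module, as a counterpart to the Excel functions above. The sales figures are hypothetical and include one unusually large value to show how differently the measures react.

```python
import statistics

# Hypothetical daily sales with one unusually large value
sales = [12, 15, 15, 18, 20, 22, 95]

mean_sales = statistics.mean(sales)      # like =AVERAGE(...)
median_sales = statistics.median(sales)  # like =MEDIAN(...)
mode_sales = statistics.mode(sales)      # like =MODE(...)

print(mean_sales, median_sales, mode_sales)
```

Here the single large value pulls the mean well above the median, which is one reason the median is often preferred for skewed data.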
Measures of Variability Wk3
Range: The difference between the maximum and the minimum values. It relies just on the two
most extreme values in the dataset. In Excel: =MAX(…)-MIN(…)
Interquartile Range: the spread of the middle 50% of the data

Q1 = first quartile → 25% of data falls below this value In Excel: =QUARTILE.EXC(…,1)
Q3 = third quartile → 25% of data falls above this value In Excel: =QUARTILE.EXC(…,3)
In Excel: =Q3 – Q1
Variance: average squared deviations (distance) from the mean. Reported in squared units
In Excel: =VAR.S(…)
Standard deviation: the square root of the variance, $\sqrt{\text{Variance}}$

Same unit as the data, so it is easier to interpret. In Excel: =STDEV.S(…)
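The measures of variability can be sketched in Python in the same way. The data are hypothetical. The `method="exclusive"` option of `statistics.quantiles` is used on the assumption that it corresponds to Excel's QUARTILE.EXC convention; treat that mapping as an assumption rather than a guarantee.

```python
import math
import statistics

# Hypothetical sample
data = [2, 4, 4, 5, 7, 9, 10, 12]

data_range = max(data) - min(data)                      # like =MAX(...)-MIN(...)

# Quartiles with the "exclusive" method (assumed analogue of QUARTILE.EXC)
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                                           # Q3 - Q1

variance = statistics.variance(data)                    # sample variance, like =VAR.S(...)
std_dev = statistics.stdev(data)                        # like =STDEV.S(...)

# The standard deviation is the square root of the variance
assert math.isclose(std_dev, math.sqrt(variance))

print(data_range, iqr, variance, std_dev)
```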
Shape/Skewness of Data Distribution Wk3
Skewness is the extent of asymmetry in the distribution.
If the distribution is symmetric, the mean is equal to the median.
Skewness > 0 (right-skewed)    Skewness = 0 (symmetric)    Skewness < 0 (left-skewed)
Probability Distribution Wk3
• In statistics, we use a smooth mathematical function to model the
probability density function (pdf)
• This function is an approximation to the data distribution: a “model”
• The function $f(X)$ denotes the “pdf”
• Areas under the curve represent probabilities
Normal Distribution Wk3
• The most common distribution in statistics → normal distribution
• It is a symmetric (bell-shaped) distribution
• The normal distribution has two parameters: Mean and Stdev
• Notation: 𝑋 ~ 𝑁(𝑀𝑒𝑎𝑛, 𝑆𝑡𝑑𝑒𝑣)
• Skewness = 0;
• Mean = Median = Mode
Excel Functions:
For probability “=NORM.DIST(xvalue,mean,stdev,TRUE)”
For percentile “=NORM.INV(prob, mean,stdev)”
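For readers working outside Excel, `statistics.NormalDist` in the Python standard library offers rough counterparts to the two functions above: `cdf` for a cumulative probability and `inv_cdf` for a percentile. The mean of 100 and standard deviation of 15 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: X ~ N(Mean = 100, Stdev = 15)
x_dist = NormalDist(mu=100, sigma=15)

# P(X <= 115), comparable to =NORM.DIST(115,100,15,TRUE)
prob_below_115 = x_dist.cdf(115)

# 97.5th percentile, comparable to =NORM.INV(0.975,100,15)
percentile_975 = x_dist.inv_cdf(0.975)

# Symmetry: half of the area lies below the mean
prob_below_mean = x_dist.cdf(100)

print(prob_below_115, percentile_975, prob_below_mean)
```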
Representative Sample Wk4
Representative sample is determined by:
1) Data collection process (sampling design)
2) Survey design → wording design of the questions/form.
3) Sample size → a sufficiently large sample means the sample statistic gets closer to the population
parameter
Biased sample:
• Non-representative statistics
• Invalid inference → invalid conclusions. It could end with catastrophic outcomes if used in business
decisions
Potential biases:
• Selection bias: members of the population have unequal chances of being chosen
• Non-response bias: the data collection process leads to systematic non-response from certain
groups
Statistics is UNCERTAIN Wk4
• Statistics is about quantifying the uncertainty of the sample estimate
• $\bar{x}$ is an estimate of $E(X) = \mu$ (A sample statistic is only an estimate of the
truth. Any sample statistic is not exact and has variation/error around
it.)
• Assume we take data samples repeatedly, and compute the sample mean as the
statistic for each sample. Then we would have the sampling
distribution of the sample mean to portray its variability.
• Central Limit Theorem: If the sample size $n$ is large:
$$\bar{x} \sim N\left(\mu, \frac{s}{\sqrt{n}}\right)$$
• This is true regardless of the shape of the population distribution
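The idea of repeated sampling can be sketched with a small simulation. The population here is assumed to be exponential (strongly right-skewed, with mean 1 and standard deviation 1); the sample size, number of repetitions and seed are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(1)   # fixed seed so the sketch is reproducible

n = 50           # size of each sample
reps = 2000      # number of repeated samples

# Draw repeated samples from a skewed population and record each sample mean
sample_means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

centre = statistics.mean(sample_means)     # should sit close to the population mean, 1
spread = statistics.stdev(sample_means)    # should sit close to 1 / sqrt(n)
theoretical_se = 1.0 / n ** 0.5

print(centre, spread, theoretical_se)
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to the standard deviation divided by the square root of n.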

Confidence Interval for the Population Mean Wk4
Confidence interval = plausible range for the unknown population
mean at a given level of confidence
$$\bar{X} - Z\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z\frac{s}{\sqrt{n}}$$
• If the standard deviation ($\sigma$) ↑, the spread of the distribution is larger:
standard error $\left(\frac{\text{std deviation}}{\sqrt{n}}\right)$ ↑, width ↑, estimate is less precise
• If the sample size (n) ↑: standard error $\left(\frac{\text{std deviation}}{\sqrt{n}}\right)$ ↓, width ↓, estimate is more precise
The bigger the sample, the more information we have to increase the precision of the interval estimate of the
population mean, and the narrower the interval.
• If the level of confidence (1-α) ↑: critical value Z ↑, width ↑, estimate is less precise
The more confident we want to be, the more values we need to include in our confidence interval, and the wider the
interval.
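The effect of sample size and confidence level on the width of the interval can be checked numerically. This is a minimal sketch of a Z-based interval; the function name `confidence_interval` and the sample figures (mean 50, standard deviation 10) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, s, n, confidence=0.95):
    """Z-based confidence interval for the population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    se = s / sqrt(n)                                    # standard error
    return x_bar - z * se, x_bar + z * se


# Hypothetical sample: mean 50, standard deviation 10
lo95, hi95 = confidence_interval(50.0, 10.0, n=100)                      # baseline
lo99, hi99 = confidence_interval(50.0, 10.0, n=100, confidence=0.99)    # more confidence
lo_big, hi_big = confidence_interval(50.0, 10.0, n=400)                  # larger sample

print((lo95, hi95), (lo99, hi99), (lo_big, hi_big))
```

Raising the confidence level widens the interval, while quadrupling the sample size halves the standard error and so halves the width.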
Hypothesis Test for Evidence-based Decisions Wk5
A statistical framework for using data to derive evidence-based
decisions.
• Define business problem and variables relevant to that problem
• Formulate hypotheses about the variables that are relevant to business
decisions
• Conduct hypothesis testing to establish degree of evidence for the
hypotheses
• Based on evidence, make business decisions
Hypothesis Test for Evidence-based Decisions Wk5
STATISTICS
• DESCRIPTIVE
• INFERENTIAL (from the Sample, via its Sampling Distribution)
    • ESTIMATION (Point & Interval): estimating the value of a population parameter
    • HYPOTHESIS TESTS: testing a claim about the value of a population parameter
Steps in Hypothesis Test Wk5
1. Formulate $H_0$ & $H_1$
2. Decide on $\alpha$
3. Calculate the p-value
4. Apply the decision rule: reject $H_0$ if p-value < $\alpha$ OR retain it if p-value > $\alpha$
Defining the hypothesis Wk5
Step 1: Formulate $H_0$ & $H_1$
• The null hypothesis always involves an equality sign (=)
• The alternative hypothesis is what we are searching for evidence for. It can contain a “≠”, “>” or “<” sign

Two-tailed test (“different to”):     $H_0: \mu = \mu_0$    $H_1: \mu \neq \mu_0$
Right-tailed test (“greater than”):   $H_0: \mu = \mu_0$    $H_1: \mu > \mu_0$
Left-tailed test (“less than”):       $H_0: \mu = \mu_0$    $H_1: \mu < \mu_0$
Mechanics of Hypothesis Testing Wk5
Step 2: Decide on $\alpha$.
Recommendations: $\alpha$ = 5%; or $\alpha$ = 1% for conservative cases.
Step 3: Calculate the test statistic
$$\text{Test Statistic} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$$
and judge whether or not the test statistic is outstanding (“far from zero”) in the
direction of the alternative.
Step 4: Decision:
Reject $H_0$ if p-value < $\alpha$ OR retain it if p-value > $\alpha$

A smaller p-value means that there is stronger evidence in favor of $H_1$
• P-value for a right-tailed test: =1-NORM.S.DIST(test statistic ,TRUE)
• P-value for a left-tailed test: =NORM.S.DIST(test statistic ,TRUE)
• P-value for a two-tailed test: =2*NORM.S.DIST(??,TRUE)
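The four steps can be traced end to end in a short sketch. The sample figures (mean 52, standard deviation 10, n = 100) and the hypothesised value of 50 are invented for illustration; the two-tailed p-value is computed here as twice the smaller tail area, which is one common way of writing it.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs: H0: mu = 50, sample mean 52, sample std dev 10, n = 100
x_bar, mu_0, s, n = 52.0, 50.0, 10.0, 100
alpha = 0.05                              # step 2: significance level

# Step 3: test statistic = (x_bar - mu_0) / (s / sqrt(n))
z = (x_bar - mu_0) / (s / sqrt(n))

phi = NormalDist().cdf(z)                 # like =NORM.S.DIST(z,TRUE)
p_right = 1 - phi                         # right-tailed test (H1: mu > mu_0)
p_left = phi                              # left-tailed test  (H1: mu < mu_0)
p_two = 2 * min(p_left, p_right)          # two-tailed test   (H1: mu != mu_0)

# Step 4: decision rule
reject_right = p_right < alpha
reject_left = p_left < alpha
reject_two = p_two < alpha

print(z, p_right, p_left, p_two)
```

With these made-up numbers the sample mean sits two standard errors above 50, so the right-tailed and two-tailed tests reject at the 5% level while the left-tailed test does not.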
Type I and II errors Wk5
Since we rely on data samples to conduct hypothesis tests, there is a potential
for errors. Possible scenarios:
• Type I error: reject a true null
• Type II error: retain a false null

                        The true ‘state of the world’
Decision                H0 is TRUE             H0 is FALSE
Do not reject H0        CORRECT DECISION!      TYPE II ERROR (β)
Reject H0               TYPE I ERROR (α)       CORRECT DECISION!
