
Central Limit Theorem and Confidence Interval

27)
Intro CLT: the central limit theorem (CLT) states that the distribution of sample means approximates a
normal distribution as the sample size becomes larger, whatever the shape of the population distribution.
Sample sizes equal to or greater than 30 are generally considered sufficient for the CLT to hold.
A sufficiently large sample can therefore be used to estimate the characteristics of a population accurately.
Understanding:
The mean of a sample will tend to be closer to the mean of the overall population in question as the
sample size increases, whatever the actual distribution of the data.
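A minimal simulation sketch of the statement above, using only the Python standard library. The exponential population, the sample size of 30, and the 5,000 repetitions are assumptions chosen for illustration. It checks whether the sample means behave roughly like a normal distribution, in which about 68% of values lie within one standard deviation of the center.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution (mean 1, strongly right-skewed)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Draw 5,000 sample means, each from a sample of size 30.
means = [sample_mean(30) for _ in range(5000)]

center = statistics.fmean(means)
spread = statistics.stdev(means)

# Share of sample means lying within one standard deviation of the center.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"share within 1 sd:    {within_one_sd:.3f}")
```

The population here is far from normal, yet the sample means should center near the population mean of 1 with a roughly bell-shaped spread.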

24)
Why useful:
The central limit theorem is vital in statistics for two main reasons—the normality assumption and the
precision of the estimates.

• The CLT allows hypothesis tests to be used on data that are not normally distributed.


Normality assumption - In statistics, the normality assumption is vital for parametric
hypothesis tests of the mean. Strictly, these tests assume that the data are normally
distributed. However, if your sample size is large enough, the central limit theorem kicks in
and produces sampling distributions that approximate a normal distribution. This fact allows
you to use these hypothesis tests even when your data are not normally distributed, as long
as your sample size is large enough.

• Sampling distributions cluster more tightly around the population mean as the sample size increases.

This property of the CLT matters when we use samples to estimate the
population mean. With a larger sample size, the sample mean tends to fall closer to the population mean, so
the estimate becomes more precise.
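A short sketch of this precision effect, assuming a uniform(0, 1) population and 2,000 repetitions per sample size (both chosen only for illustration). It compares the observed spread of the sample means with the textbook standard error, the population standard deviation divided by the square root of n.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def spread_of_sample_means(n, repeats=2000):
    """Standard deviation of `repeats` sample means, each from n uniform(0, 1) draws."""
    means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(repeats)]
    return statistics.stdev(means)

population_sd = math.sqrt(1 / 12)  # standard deviation of uniform(0, 1)

results = {}
for n in (10, 40, 160):
    results[n] = spread_of_sample_means(n)
    # Columns: sample size, observed spread, theoretical standard error.
    print(n, round(results[n], 4), round(population_sd / math.sqrt(n), 4))
```

Each fourfold increase in sample size should roughly halve the spread of the sample means.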

32)
Where CLT is useful:
Investors of all types rely on the CLT to analyze stock returns, construct portfolios, and manage risk.
Say, for example, an investor wishes to analyze the overall return for a stock index that comprises
1,000 equities. In this scenario, the investor may simply study a random sample of stocks to
estimate the returns of the total index.
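A rough sketch of the investor example. The returns are synthetic numbers generated at random (not real market data), and the sample size of 100 is an assumption made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical data: annual returns (in %) for 1,000 equities, generated at random.
index_returns = [random.gauss(8.0, 20.0) for _ in range(1000)]
true_mean = statistics.fmean(index_returns)

# Study a random sample of 100 stocks instead of all 1,000.
sample = random.sample(index_returns, 100)
estimate = statistics.fmean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"index mean:      {true_mean:.2f}%")
print(f"sample estimate: {estimate:.2f}% (standard error {standard_error:.2f})")
```

The sample estimate will usually land within a couple of standard errors of the full-index mean.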
30)
Cons

• The central limit theorem applies to almost all types of probability distributions, but there are
exceptions. For example, the population must have a finite variance. That restriction rules out
the Cauchy distribution, whose variance is not finite.
• Observations must be independent: the value of one observation should not depend on the value of another observation.
• Observations must be identically distributed: the distribution of the variable being measured must remain constant across all measurements.
• Typically, a sample size of 30 is sufficient for most distributions, but strongly skewed
distributions can require larger sample sizes.
Solution:
If the population distribution is extremely skewed, you might need a substantially larger sample size for the
central limit theorem to take effect and produce sampling distributions that approximate a normal
distribution.
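A sketch of the finite-variance restriction listed above, comparing the Cauchy distribution with a standard normal. The spread is measured with the interquartile range (IQR) because the Cauchy standard deviation is undefined. The sample sizes and the 2,000 repetitions are assumptions for illustration.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def cauchy():
    """One draw from the standard Cauchy distribution (inverse-CDF method)."""
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_sample_means(draw, n, repeats=2000):
    """Interquartile range of `repeats` sample means, each from n draws."""
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(repeats)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

for n in (1, 100):
    cauchy_iqr = iqr_of_sample_means(cauchy, n)
    normal_iqr = iqr_of_sample_means(lambda: random.gauss(0, 1), n)
    # Columns: sample size, IQR of Cauchy sample means, IQR of normal sample means.
    print(n, round(cauchy_iqr, 2), round(normal_iqr, 2))
```

For the normal population the sample means tighten as n grows, while for the Cauchy population their spread stays about the same, so larger samples do not help.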

33)
Intro Confidence Interval: To assess the accuracy of the sample mean
A confidence interval is an interval around the estimated mean (μₑ) that is likely to include the
unknown population mean (μ).
A 95% confidence level means that, if we repeated the sampling many times, we would expect 95% of the interval estimates to include the
population mean.
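A simulation sketch of what the 95% figure means in practice. The population mean and standard deviation, the sample size, and the number of trials are assumptions chosen for illustration. Each trial draws a fresh sample, builds a 95% interval, and checks whether it contains the true mean.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

TRUE_MEAN = 50.0  # assumed population mean for the simulation
TRUE_SD = 10.0    # assumed population standard deviation
N = 100           # size of each sample
Z = 1.96          # z-score for a 95% confidence level

def interval_covers_true_mean():
    """Draw one sample, build its 95% interval, and report whether it contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    margin = Z * statistics.stdev(sample) / math.sqrt(N)
    return mean - margin <= TRUE_MEAN <= mean + margin

trials = 2000
coverage = sum(interval_covers_true_mean() for _ in range(trials)) / trials
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should come out close to 0.95. Any single interval either contains the true mean or does not.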

Restrictions:
Usually, we work with only one random sample containing a large number of data points, and in that
case we have only one confidence interval estimate, which can be computed as follows for a 95%
confidence level:

μₑ ± 1.96 × SE, where SE = s / √n is the standard error of the mean (s is the sample standard deviation, n the sample size)

1.96 is the Z-score associated with a 95% confidence level.

A higher standard error leads to a wider confidence interval, which indicates that the mean of our random
sample is a less precise estimate of the population mean.
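A minimal sketch of the calculation: sample mean plus or minus 1.96 standard errors, with the standard error taken as the sample standard deviation divided by the square root of the sample size. The function name and the two synthetic samples are hypothetical, made up for illustration. The samples share a mean but differ in variability, so the noisier one has the higher standard error.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

def confidence_interval_95(sample):
    """Return (lower, upper) for a z-based 95% confidence interval of the mean."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

# Hypothetical data: two samples of 40 values with the same mean, different variability.
steady = [random.gauss(12.0, 0.2) for _ in range(40)]
noisy = [random.gauss(12.0, 1.0) for _ in range(40)]

steady_ci = confidence_interval_95(steady)
noisy_ci = confidence_interval_95(noisy)

print(f"steady sample: ({steady_ci[0]:.3f}, {steady_ci[1]:.3f})")
print(f"noisy sample:  ({noisy_ci[0]:.3f}, {noisy_ci[1]:.3f})")
```

For small samples a t-based multiplier is normally used instead of 1.96. This sketch keeps the z-score because the notes assume a large sample.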

31)
MARGIN OF ERROR
A margin of error tells you by how many percentage points your results may differ from the real
population value. For example, a 95% confidence level with a 4 percent margin of error means that
your statistic will be within 4 percentage points of the real population value 95% of the time.
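A small sketch of how such a margin arises for a survey percentage, using the standard formula for a sample proportion, z × √(p(1 − p) / n). The poll figures (600 respondents, 50% answering yes) are hypothetical, chosen so the margin comes out near 4 percentage points.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Margin of error for a sample proportion at the confidence level implied by z."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 600 respondents, 50% answer "yes".
p_hat, n = 0.50, 600
moe = margin_of_error(p_hat, n)
interval = (p_hat - moe, p_hat + moe)  # estimate ± margin of error

print(f"margin of error: {moe:.3f}")
print(f"95% CI: ({interval[0]:.3f}, {interval[1]:.3f})")
```

With these figures the margin is about 0.04, so the interval runs from roughly 46% to 54%.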

37)
Counter point:
The idea behind confidence levels and margins of error is that any survey or poll will differ from the
true population value by some amount. Confidence intervals and margins of error acknowledge
that room for error, so although 95% or 98% confidence with a 2 percent margin of error
might sound like a very good statistic, it still means that the statistics are sometimes
wrong.

34)
Difference between the confidence interval and margin of error?
The margin of error is how far from the estimate we think the true value might be (in either direction).
The confidence interval is the estimate ± the margin of error.

E.g. you might hear people say things like:

"It'll be an hour, give or take 10 minutes."
Meaning they are confident it will take between 50 and 70 minutes: one hour, up to 10 minutes more or up to 10
minutes less.
The estimate is unlikely to be off by more than 10 minutes either way. Ten minutes is the
margin of error here, just as a poll might quote a margin of 3 percentage points.

38)

SIGNIFICANCE LEVEL α

The value α=0.05 is a common choice; it corresponds to a 95% confidence level and means there's only a 5% chance our confidence interval will
not capture the true value. Using α=0.01 (a 99% confidence level) would mean there's only a 1% chance.
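A short sketch linking α to the Z-score used in a two-sided confidence interval, using `statistics.NormalDist` from the Python standard library. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Two-sided critical z-value for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  confidence={1 - alpha:.0%}  z={z_critical(alpha):.3f}")
```

For α=0.05 this gives a Z-score of about 1.96, and for α=0.01 about 2.576.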

35)
THINGS TO KNOW ABOUT CONFIDENCE INTERVALS
1. tell you the most likely range of the unknown population average or percentage
2. provide both the location and precision of a measure
3. Three things impact the width of a confidence interval
a. Confidence level: 90, 95, 99 (higher levels give wider intervals)
b. Variability: as measured by the standard deviation (more variability gives wider intervals)
c. Sample Size: Smaller sample sizes generate wider intervals
4. Confidence Intervals can be computed on sample sizes as small as two: The intervals will
be very wide but there’s nothing in the math preventing you from computing them. With
small sample sizes you can show that an interface is unusable, but it’s harder to show it’s
usable. For example, if 0 out of 2 people can complete a task, there’s only about a 5% chance
more than half of all users will.
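The three width factors in point 3 can be sketched with a z-based interval for a mean. The helper `ci_width` and the baseline values (standard deviation 10, sample size 100) are assumptions for illustration. Each comparison changes one factor at a time.

```python
import math
from statistics import NormalDist

def ci_width(confidence, sd, n):
    """Full width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / math.sqrt(n)

baseline = ci_width(0.95, sd=10, n=100)
higher_confidence = ci_width(0.99, sd=10, n=100)  # confidence level raised
more_variability = ci_width(0.95, sd=20, n=100)   # standard deviation doubled
smaller_sample = ci_width(0.95, sd=10, n=25)      # sample size cut to a quarter

print(f"baseline:          {baseline:.2f}")
print(f"99% confidence:    {higher_confidence:.2f}")
print(f"double the sd:     {more_variability:.2f}")
print(f"quarter the sample:{smaller_sample:.2f}")
```

All three changes widen the interval. Doubling the standard deviation and quartering the sample size have the same effect, because the width scales with sd / √n.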

DISCUSSION ON confidence interval


Q) Is it better to have a narrow confidence interval or a wider one?
A> The narrower the interval, the more precise the estimate of the mean.
A 99% confidence interval is wider than a 95% one, which means the former is more likely to
contain the true value.
With a 99% CI the estimate is more accurate, in the sense that the interval is more likely to contain the true value. With a 95% CI the estimate is
more precise, in the sense that the interval is narrower.
So overall it depends on which matters more for the value being estimated: accuracy or precision.
