
• A random variable describes the probabilities for an uncertain future numerical outcome of a random process. It has two characteristics: (a) it can take one of several possible values, and (b) a probability is associated with each value
• A probability distribution identifies the possible outcomes of a random variable and assigns a probability to each. It can be discrete, with a finite or countable set of values (Ex: die roll), or continuous, with all possible values in a range (Ex: weight of a person). Note: In a continuous distribution, the probability of any single exact value is zero, so the probability is calculated for a range of values instead.
• The expected value / mean of a random variable is the weighted average of its values. Formula: $E(X) = \sum_i x_i \, P(X = x_i)$
• Variance is the weighted average of the squared deviations from the mean. Formula: $\sigma^2(X) = \sum_i (x_i - \mu)^2 \, P(X = x_i)$
• Standard deviation (σ) is the square root of variance
• Normal distribution is a symmetric, continuous probability distribution (See figure on right). It:
• can be uniquely specified by a mean (μ) and standard deviation (σ)
• can take values from -∞ to +∞
• is symmetric & centered around the mean
• has mean = median = mode
• is denoted as $X \sim N(\mu, \sigma^2)$
• Every normally distributed variable can be converted into a standard normal variable (Z-score). Formula: $z_i = (x_i - \mu)/\sigma$
• Z-scores have a normal distribution with μ=0 & σ=1, i.e. $Z \sim N(0, 1)$
• A random variable aX has mean $a\mu_X$ and standard deviation $|a|\sigma_X$ (the variance scales by $a^2$)
• A linear combination of independent, normally distributed random variables is itself normally distributed. Ex: in $Y = aX_1 + bX_2$, if $X_1$ and $X_2$ are independent and normally distributed, Y is normally distributed with $E(Y) = a\mu_1 + b\mu_2$ and $\mathrm{Var}(Y) = a^2\sigma_1^2 + b^2\sigma_2^2$
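
As a rough illustration of the formulas in this section, the Python sketch below computes the mean, variance and standard deviation of a fair six-sided die, converts a value into a z-score, and combines two normal variables. The function names and all numeric inputs (the die, μ=70, σ=10, the weights 2 and 1) are assumptions chosen for illustration, and the combination step assumes the two variables are independent.

```python
import math

def expected_value(dist):
    """E(X): sum of x * P(X = x) over a {value: probability} mapping."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X): probability-weighted average of squared deviations from the mean."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def linear_combination(a, mu1, sigma1, b, mu2, sigma2):
    """Mean and SD of Y = a*X1 + b*X2, assuming X1 and X2 are independent."""
    mean = a * mu1 + b * mu2
    var = a ** 2 * sigma1 ** 2 + b ** 2 * sigma2 ** 2
    return mean, math.sqrt(var)

# Hypothetical example: a fair die, each face with probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
die_mean = expected_value(die)
die_var = variance(die)
die_sd = math.sqrt(die_var)

# Hypothetical example: a value of 85 when mu = 70 and sigma = 10.
z = z_score(85, mu=70, sigma=10)

# Hypothetical example: Y = 2*X1 + 1*X2 with X1 ~ N(10, 3^2), X2 ~ N(5, 4^2).
y_mean, y_sd = linear_combination(2, 10, 3, 1, 5, 4)

print(die_mean, die_var, die_sd, z, y_mean, y_sd)
```

Representing the distribution as a value-to-probability mapping keeps the code close to the summation formulas, so each function reads as a direct transcription of its definition.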

• Different notations: Population parameters are $\mu$, $\sigma^2$ & $\pi$. Sample statistics are $\bar{x}$, $s^2$ & $p$.
• Sample statistics are random variables because they vary across samples drawn from the same population.
• Sample statistics provide point estimates of the population parameter.
• Simple Random Sample (SRS):
• Every unit should have an equal chance of being chosen (unbiased)
• Selection of one unit should not influence selection of another
• The sampling frame is the list of items from which the sample is drawn; it should represent the population (Refer to figure)
• Central limit theorem states that, whatever the population distribution (provided its variance is finite), the sample mean ($\bar{X}$) is approximately normally distributed for a sufficiently large sample, with mean μ and standard error of the mean $\sigma/\sqrt{n}$
• n is the sample size, and the random variable $\bar{X}$ can be denoted as $\bar{X} \sim N(\mu, \sigma^2/n)$
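
The central limit theorem can be checked informally by simulation. The sketch below draws many samples from a clearly non-normal population and compares the spread of the sample means with $\sigma/\sqrt{n}$. The exponential population, the sample size of 50, the number of samples and the seed are all assumptions made for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Assumed population: exponential with rate 1, which is strongly skewed.
# Its mean and standard deviation are both 1.
POP_MEAN = 1.0
POP_SD = 1.0

n = 50              # size of each sample
num_samples = 5000  # how many samples to draw

# Draw num_samples samples of size n and record each sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = POP_SD / n ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

With these settings the two printed pairs should agree closely, and a histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.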

• Informally, a confidence interval indicates a range of values that’s likely to contain the true value of the population parameter at a confidence level
• The confidence level is the proportion of all possible confidence intervals that contain the true value of the population parameter (Imagine the proportion obtained on constructing millions of such confidence intervals from repeated samples. Ex: at 95% confidence, about 95% of all the intervals constructed will contain the true value of the population parameter)
• Interval estimate = Point estimate ± Margin of error
• The size of the margin of error depends on the level of confidence, the variation in the data and the sample size (1-α, σ, n respectively)
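• If the population standard deviation is known, Confidence Interval: $\bar{X} \pm z_{1-\alpha}\,\sigma/\sqrt{n}$, where $z_{1-\alpha}$ is the two-sided critical value for confidence level $1-\alpha$ (Remember: at 95% confidence, the 95% probability is the central area spread over both sides of the mean, so 2.5% lies in each tail and z ≈ 1.96. Refer to the figure:)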
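
A minimal sketch of the known-σ interval, using only the Python standard library. The function name and the inputs (sample mean 50, σ = 8, n = 64) are hypothetical. The critical value is taken as the $1-\alpha/2$ quantile of the standard normal, which leaves $\alpha/2$ in each tail.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided interval for the population mean when sigma is known."""
    alpha = 1 - confidence
    # Two-sided critical value: alpha/2 of the probability lies in each tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs, chosen only to illustrate the calculation.
low, high = z_confidence_interval(sample_mean=50.0, sigma=8.0, n=64)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Raising `confidence` or `sigma` widens the interval, while a larger `n` narrows it, which matches the three drivers of the margin of error listed above.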

• If only the sample standard deviation is known, Confidence interval: $\bar{X} \pm t_{1-\alpha,\,n-1}\,s/\sqrt{n}$ (n-1 is the degrees of freedom for the t-table)
• t-values at a particular level of confidence are larger than z-values at the same confidence level, and approach them as n grows
• For estimating population proportion (π), we can use sample proportion (p) as the point estimate.

• Population proportion confidence interval: $p \pm z_{1-\alpha}\sqrt{p(1-p)/n}$
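
The proportion interval can be sketched the same way, using the usual normal-approximation standard error $\sqrt{p(1-p)/n}$. The function name and the counts (120 successes out of 400) are hypothetical, and the approximation is generally considered reasonable only when both $np$ and $n(1-p)$ are not too small.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Two-sided normal-approximation interval for a population proportion."""
    p = successes / n  # sample proportion, the point estimate of pi
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical survey: 120 of 400 respondents answer "yes".
low, high = proportion_confidence_interval(successes=120, n=400)
print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")
```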
