
PROBABILITY DISTRIBUTION

Random (By Chance) Variable


Variable (Quantitative)
- Discrete
- Values that can be counted
- Continuous
- Values that can be measured

Probability Distributions
- A table consisting of values a random variable can assume and the corresponding probabilities of the values. The probabilities
are determined theoretically or by observation.
REQUIREMENTS
- The sum of the probabilities of all the events in the sample space must equal 1
- ∑P(X) = 1 or 100%
- The probability of each event in the sample space must be between 0 and 1, inclusive
- 0 ≤ P(X) ≤ 1
- The values of X themselves have no restriction (X can be negative)
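Table columns for computing the mean and variance:
X | P(X) | X · P(X) | X − µ | (X − µ)² | (X − µ)² · P(X)

Mean: µ = ∑[X · P(X)]
Variance: σ² = ∑[(X − µ)² · P(X)]
- Farther from 0 = more variable
- Closer to 0 = less variable (more uniform)
- Variance of 0 = no variability (the variable is constant)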
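
As a rough illustration of the two requirements and the mean and variance formulas, here is a minimal Python sketch. The function names and the example distribution are hypothetical and chosen only for illustration; note that one X value is negative, which is allowed.

```python
from math import isclose

def is_valid_distribution(dist):
    """Check both requirements: 0 <= P(X) <= 1 and sum of P(X) equals 1."""
    probs = list(dist.values())
    return all(0 <= p <= 1 for p in probs) and isclose(sum(probs), 1.0)

def mean_and_variance(dist):
    """Return (mu, sigma squared) for a dict mapping X to P(X)."""
    mu = sum(x * p for x, p in dist.items())               # mu = sum of X * P(X)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())  # sum of (X - mu)^2 * P(X)
    return mu, var

# Hypothetical distribution, not taken from the notes
example = {-1: 0.2, 0: 0.5, 2: 0.3}
mu, var = mean_and_variance(example)
print(is_valid_distribution(example), round(mu, 3), round(var, 3))
```

Each term of the two sums corresponds to one row of the table columns listed above.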

A box contains four red and three blue marbles. Michele picks three marbles at random from this box. If Z is the random variable
representing the number of blue marbles picked from the box, complete the table below.

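Z = 0: P(Z) = 4C3 / 7C3 = 4/35 (4 red marbles choose 3)

Z = 1: P(Z) = (4C2 · 3C1) / 7C3 = 18/35 (4 red choose 2 · 3 blue choose 1)

Z = 2: P(Z) = (4C1 · 3C2) / 7C3 = 12/35 (4 red choose 1 · 3 blue choose 2)

Z = 3: P(Z) = 3C3 / 7C3 = 1/35 (3 blue marbles choose 3)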
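
The marble table can be checked with a short Python sketch. It uses `math.comb` for the combinations and `fractions.Fraction` to keep exact values; the function name and its default arguments are assumptions made for this example.

```python
from fractions import Fraction
from math import comb

def blue_marble_distribution(red=4, blue=3, picks=3):
    """P(Z = z) for z blue marbles when drawing `picks` marbles without replacement."""
    total = comb(red + blue, picks)  # 7C3 ways to pick any three marbles
    return {
        z: Fraction(comb(blue, z) * comb(red, picks - z), total)
        for z in range(min(blue, picks) + 1)
    }

table = blue_marble_distribution()
for z, p in table.items():
    print(z, p)

# The probabilities of all events must add up to 1
print(sum(table.values()))
```

Because the marbles are drawn without replacement, the probabilities come from counting combinations rather than from a fixed per-draw probability.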

EXPECTED VALUE OF A DISCRETE PROBABILITY DISTRIBUTION
Expected Value of a discrete random variable of a probability distribution
- is the theoretical average of the variable.
- µ = 𝐸(𝑋) = ∑𝑋 · 𝑃(𝑋)

GROSS INCOME = INCOME BEFORE DEDUCTING EXPENSES AND OTHER STUFF

BINOMIAL EXPERIMENT
- Binomial distribution: the outcomes of a binomial experiment and the corresponding probabilities of these outcomes.

- Each trial can have only two outcomes. (success/failure)
- There must be a fixed number of trials (n).
- The outcomes of each trial must be independent.
- The probability of a success must remain the same for each trial.

T | Trials | Fixed & independent
O | Outcomes | Always two
P | Probability | Must be the same for each trial

- WITHOUT REPLACEMENT = NOT BINOMIAL EXPERIMENT
- WITH REPLACEMENT = YES BINOMIAL EXPERIMENT
- Complement rule (example with n = 6): 1 - P(6) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5)

p = numerical probability of a success
q = numerical probability of a failure (q = 1 - p)
n = number of trials (always given)
X = number of successes in n trials (X ≤ n)

In a binomial experiment, the probability of exactly X successes in n trials is


P(X) = nCx · p^X · q^(n − X)
Mean: µ = n · p
Variance: σ² = n · p · q
Standard deviation: σ = √(n · p · q)
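
The binomial formulas above can be sketched in Python as follows. The function names are hypothetical, and the coin-toss numbers are an illustrative assumption, not an example from the notes.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of a failure
    return comb(n, x) * p**x * q**(n - x)

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    q = 1 - p
    variance = n * p * q
    return n * p, variance, sqrt(variance)

# Illustrative assumption: a fair coin tossed 3 times, exactly 2 heads
print(binomial_pmf(2, 3, 0.5))

# Complement rule with n = 6: 1 - P(6) equals P(0) + ... + P(5)
print(1 - binomial_pmf(6, 6, 0.5), sum(binomial_pmf(x, 6, 0.5) for x in range(6)))
```

The standard deviation is the square root of the variance, so it is computed from n · p · q rather than stored separately.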

CONFIDENCE INTERVALS FOR THE MEAN


When σ is known (POPULATION MEAN)
1. Point estimate (sample mean, one value)
- a specific numerical value estimate of a parameter. The best point estimate of the population mean is the sample mean.
- disadvantage: a single value gives no indication of how close the estimate is to the true parameter

2. Mean
- The sample mean is an unbiased estimator of the population mean
- Every data value contributes to the mean

3. Interval estimate of a parameter
- Statisticians generally prefer this type of estimate.
- is an interval or a range of values used to estimate the parameter.
- This estimate may or may not contain the value of the parameter being estimated.
● The parameter is specified as being between two values.
● Either the interval contains the parameter or it does not. A degree of confidence must be assigned before an interval estimate is made.
● If you desire to be more confident, then you must make the interval wider.
- The wider the interval, the less precise (and so the less useful) the estimate becomes.

4. Confidence level of an interval estimate of a parameter


- probability that the interval estimate will contain the parameter.
- (90%, 95%, 99%)

5. Confidence interval
- is a specific interval estimate of a parameter determined by using data obtained from a sample and by using the specific
confidence level of the estimate.

* Don’t forget to divide α by 2 when finding z

Ex: 90% confidence → α = 10% → α/2 = 5% in each tail

● For an 80% confidence interval, z_{α/2} = 1.28
● For a 90% confidence interval, z_{α/2} = 1.65
● For a 95% confidence interval, z_{α/2} = 1.96
● For a 99% confidence interval, z_{α/2} = 2.58

ANSWER: ___ < µ < ___

* If no exact value for z-score, choose the one that’s farther from the mean
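
A minimal Python sketch of the z interval, assuming the usual textbook form x̄ − z_{α/2}(σ/√n) < µ < x̄ + z_{α/2}(σ/√n), which these notes do not spell out. The function names and the sample numbers are hypothetical. `NormalDist().inv_cdf` returns unrounded critical values (about 1.645 and 2.576 for 90% and 99%), so results can differ slightly from those based on rounded table values.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)  # alpha is split across two tails

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical numbers: sample mean 50, sigma 10, n = 25, 95% confidence
low, high = z_interval(50, 10, 25, 0.95)
print(f"{low:.2f} < mu < {high:.2f}")
```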
When σ is unknown (SAMPLE MEAN)
- Most of the time, the value of 𝝈 is not known, so it must be estimated by using s, namely, the standard deviation of the sample.
- When s is used, critical values greater than the values for z_{α/2} are used in confidence intervals. These values are taken from the Student t distribution, most often called the t distribution.

- The t distribution is similar to the standard normal distribution in these ways:


1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve approaches but never touches the x axis.
5. As the sample size increases, the t distribution approaches the standard normal distribution.
- The t distribution differs from the standard normal distribution in the concept of degrees of freedom, which is related to sample
size.

Degrees of freedom
- are the number of values that are free to vary after a sample statistic has been computed.
- df = n − 1

* If the exact df is not listed in the t table, always ROUND DOWN to the nearest df value in the table.


ANSWER: ___ < µ < ___
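
A small Python sketch of the t interval, assuming the usual form x̄ − t_{α/2}(s/√n) < µ < x̄ + t_{α/2}(s/√n), which the notes leave out. The Python standard library has no t distribution, so the critical value is passed in after being read from a t table. The sample data and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_critical):
    """Confidence interval for the mean when sigma is unknown.

    t_critical must be looked up in a t table using df = n - 1.
    """
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10 values, so df = 9
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
# Table value for df = 9 at 95% confidence (two tails): 2.262
low, high = t_interval(sample, 2.262)
print(f"{low:.3f} < mu < {high:.3f}")
```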

For Proportions (always use z)

STEP 1: Determine p hat and q hat


STEP 2: Determine the Critical Value
STEP 3: Substitute in the formula

ANSWER: ___ < p < ___ (can be fraction, decimal, or percentage)
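
The three steps can be sketched in Python, assuming the usual formula p̂ − z_{α/2}·√(p̂q̂/n) < p < p̂ + z_{α/2}·√(p̂q̂/n), which is not written out in these notes. The function name and the survey numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Confidence interval for a population proportion (always uses z)."""
    # STEP 1: p hat and q hat
    p_hat = successes / n
    q_hat = 1 - p_hat
    # STEP 2: critical value z_{alpha/2}
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # STEP 3: substitute in the formula
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 60 successes out of 200, 95% confidence
low, high = proportion_interval(60, 200, 0.95)
print(f"{low:.3f} < p < {high:.3f}")
```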

HYPOTHESIS TESTING
- It is a decision-making process for evaluating claims about a population.
STEP 1: Hypotheses and Claim
STEP 2: Critical Value(s)
STEP 3: Test Value
STEP 4: Decide
STEP 5: Conclusion

Statistical Hypothesis
- Is a conjecture about a population parameter. (±) (+) (-)
- May or may not be true

● Every hypothesis testing situation begins with the statement of a hypothesis.


○ The null hypothesis, symbolized by 𝐻0, is a statistical hypothesis that states that there is no difference between a
parameter and a specific value, or that there is no difference between two parameters.
○ The alternative hypothesis, symbolized by 𝐻1, is a statistical hypothesis that states the existence of a difference
between a parameter and a specific value, or states that there is a difference between two parameters.

Level of Significance (α)


- Maximum probability of committing a type I error.
- P(type I error) = α
- The chance of rejecting a null hypothesis that is actually true
Critical Value (C.V.)
- Separates the critical region from the noncritical region.
- If two tailed test, divide alpha by 2

Statistical test
- Uses data obtained from a sample to make a decision about whether the null hypothesis should be rejected

Test value
- Numerical value obtained from a statistical test
- If the test value is the same as the critical value, the null
hypothesis should be rejected.

P-VALUE METHOD
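- the probability of getting a sample statistic as extreme as (or more extreme than) the test value, in the direction of the alternative hypothesis, if the null hypothesis is true

Find the P-value from the z or t test value:
- If (z) and one-tailed, find the tail area that corresponds to the test value
- If (z) and two-tailed, find the tail area, then multiply that area (not the z value) by 2
- If (t), find df -> find the value closest to the test value (in the same row) -> read off the corresponding alpha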
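
For a z test value, the P-value lookup can be sketched in Python with `statistics.NormalDist` in place of a printed table. The function name and the `tail` labels are assumptions for this example. The t case is left out because the standard library has no t distribution.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """P-value for a z test value; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)                      # area to the left of z
    if tail == "right":
        return 1 - std_normal.cdf(z)                  # area to the right of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))       # double the tail AREA
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical test value z = 1.96
print(round(z_p_value(1.96, "right"), 4))
print(round(z_p_value(1.96, "two"), 4))
```

If the resulting P-value is less than or equal to α, the null hypothesis is rejected.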

CORRELATION AND REGRESSION


SCATTERPLOTS & CORRELATION
1. Positive Linear Relationship
- as the values of the independent variable (x variable) increase, the values of the dependent variable ( y variable)
increase.
2. Negative Linear Relationship
- as the values of the independent variable increase, the values of the dependent variable decrease.
3. Nonlinear Relationship or a Curvilinear Relationship
4. No Relationship
- No relationship between the independent variable and the dependent variable since no pattern (line or curve) can be
seen.

1. population correlation coefficient (Greek letter ρ, rho)
- is the correlation computed by using all possible pairs of data values (x, y) taken from a population.
- The two variables play different roles. We’ll call the variable of interest the response variable (y) and the other the explanatory or predictor variable (x).
2. linear correlation coefficient (r)
- is computed from the sample data and measures the strength and direction of a linear relationship between two quantitative variables. The symbol for the sample correlation coefficient is r.
- The linear correlation coefficient explained in this section is called the Pearson Product Moment Correlation Coefficient (PPMC), named after statistician Karl Pearson, who pioneered the research in this area.

The range of the linear correlation coefficient is from -1 to +1.


- Strong positive linear relationship between the variables – the value of r will be close to +1.
- Strong negative linear relationship between the variables – the value of r will be close to -1.
- No linear relationship between the variables, or only a weak relationship – the value of r will be close to 0.

CORRELATION COEFFICIENT AND ITS SIGNIFICANCE


PROPERTIES OF LINEAR CORRELATION COEFFICIENT
1. The correlation coefficient is a unit-less measure.
2. The value of r will always be between -1 and +1 inclusively. That is, -1≤ r ≤1.
3. If the values of x and y are interchanged, the value of r will be unchanged.
4. If the values of x and/or y are converted to a different scale, the value of r will be unchanged.
5. The value of r is sensitive to outliers and can change dramatically if they are present in the data.

ASSUMPTIONS FOR THE CORRELATION COEFFICIENT


1. The sample is a random sample.
2. The data pairs fall approximately on a straight line and are measured at the interval or ratio level.
3. The variables have a bivariate normal distribution.
Rounding Rule for the Correlation Coefficient
- Round the value of r to three decimal places.

n is the number of data pairs

Hypotheses (ALWAYS):
𝐻0: ρ = 0; This null hypothesis means that there is no correlation between the x and y variables in the population.
𝐻1: ρ ≠ 0; This alternative hypothesis means that there is a significant correlation between the variables in the population.

Test value: ALWAYS USE the (t) TEST VALUE, with df = n - 2

STEP 1: State the hypotheses. 𝐻0: ρ = 0 and 𝐻1: ρ ≠ 0


STEP 2: Find the critical values.
STEP 3: Find the test value.
STEP 4: Make the decision. (Reject the null hypothesis.)
STEP 5: Summarize the results. (There is a significant relationship between the number of cars a rental agency owns and its
annual income.)
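
A minimal Python sketch of r and its t test value. It assumes the standard computational formula for the Pearson coefficient and the standard test value t = r·√((n − 2)/(1 − r²)) with df = n − 2; neither formula is written out in these notes. The data pairs and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)  # number of data pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

def correlation_t(r, n):
    """t test value for H0: rho = 0, with df = n - 2 (undefined when r is -1 or 1)."""
    return r * sqrt((n - 2) / (1 - r**2))

# Hypothetical data pairs
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3), round(correlation_t(r, len(xs)), 3))
```

The test value is then compared with the critical values from the t table at df = n − 2 to make the decision in STEP 4.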

REGRESSION LINE (LINE OF BEST FIT)


Line of Best Fit (Trend Line)
- line of best fit will pass through ALL points if and only if r = -1 or 1
- A straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points,
or all of the points.
- The number of points above the line and below the line is about equal.
- y’ = a + bx

Round off the value of a and b to three decimal places

Residual
- The difference between the actual value y and the predicted value y’ (that is, the vertical distance) is called a residual or a prediction error.
- Residual = y - y’ (use the value of y that corresponds to the x that was used)
- If the residual is positive, the actual data point is above the regression line.
- If the residual is negative, the actual data point is below the regression line.
- If the residual is zero, the actual data point is on the regression line.
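
The regression line y’ = a + bx and its residuals can be sketched in Python. The formulas used for a and b are the standard least-squares ones and are an assumption here, since the notes do not show them. The data and the function names are hypothetical.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)  # slope
    a = (sum_y - b * sum_x) / n                                  # y-intercept
    return a, b

def residuals(xs, ys, a, b):
    """Residual y - y' for each data pair, using the x that belongs to that y."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data pairs
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)
print(round(a, 3), round(b, 3))
for x, res in zip(xs, residuals(xs, ys, a, b)):
    position = "above" if res > 0 else "below" if res < 0 else "on"
    print(x, round(res, 3), position)
```

For a least-squares line the residuals add up to zero (apart from floating-point rounding), so positive and negative residuals balance each other.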
