QM Reviewer
LINEAR REGRESSION
REGRESSION ANALYSIS
• Regression is the technique concerned with predicting the value of one variable from the known values of others.
*Correlation
*Regression
• The regression line makes the sum of the squares of the residuals smaller than for any other line.
LEAST-SQUARES METHOD
• A procedure that minimizes the sum of the squared vertical deviations of the plotted points from a straight line.
• It constructs a best-fitting straight line through the scatter diagram points and then formulates a regression equation of the form ŷ = a + bx, where:
➢ ŷ (y-hat) = the predicted value of y given by the regression equation (the line of best fit in linear regression)
➢ a = y-intercept (the value of ŷ when x = 0)
➢ b = slope (the change in ŷ for each one-unit increase in x; it measures how steeply the line departs from the horizontal)
Note:
• Predict the final grade of someone who studies for 12 hours:
ŷ = 59.95 + 3.17x
ŷ = 59.95 + 3.17(12)
ŷ = 97.99
Therefore, the predicted final grade for someone who studies for 12 hours is 97.99
• Predict the final grade of someone who studies for 1 hour:
ŷ = 59.95 + 3.17x
ŷ = 59.95 + 3.17(1)
ŷ = 63.12
Therefore, the predicted final grade for someone who studies for 1 hour is 63.12
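As a quick check of the arithmetic, the two predictions above can be reproduced in Python. This is a minimal sketch: the intercept 59.95 and slope 3.17 are taken from the regression equation ŷ = 59.95 + 3.17x, and `predict` is a hypothetical helper name.

```python
def predict(x: float, a: float = 59.95, b: float = 3.17) -> float:
    """Return the predicted value y-hat = a + b*x (defaults: the study-hours equation)."""
    return a + b * x


for hours in (12, 1):
    # Format to two decimals to match the hand computation.
    print(f"{hours} hour(s) of study -> predicted final grade {predict(hours):.2f}")
```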
Example 2: A sample of 6 persons was selected, and their ages (x variable) and weights (y variable) are shown in the following table. Find the regression equation and use it to predict the weight of a person who is 8.5 years old.
As the table shows, no weight is recorded for an age of 8.5 years; therefore, we will predict it using the least-squares method of regression analysis.
Step 1: Compute xy, x², and y² for each person to complete the table.
Person   Age (x)   Weight (y)   xy    x²    y²
1        7         12           84    49    144
2        6         8            48    36    64
3        8         12           96    64    144
4        5         10           50    25    100
5        6         11           66    36    121
6        9         13           117   81    169
Total    ∑x = 41   ∑y = 66      ∑xy = 461   ∑x² = 291   ∑y² = 742
Step 3: Solve for x̄ (the sample mean of x) by dividing ∑x by the number of observations, n: x̄ = 41 / 6 ≈ 6.83.
Step 4: Solve for ȳ (the sample mean of y) by dividing ∑y by the number of observations, n: ȳ = 66 / 6 = 11.
Step 5: Substitute the computed values in the formula.
Step 8: Find the predicted weight when age is 8.5 years using the regression equation.
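The slope and intercept formulas used in Step 5 are not reproduced here, so the sketch below assumes the common computational form b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²) and a = ȳ − b·x̄. It fits ŷ = a + bx to the six (age, weight) pairs from the table and then predicts the weight at age 8.5; the function name `least_squares` is illustrative only.

```python
def least_squares(xs, ys):
    """Fit y-hat = a + b*x by least squares; return (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the column totals of the table.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the fitted line passes through the point (x-bar, y-bar).
    a = sum_y / n - b * (sum_x / n)
    return a, b


ages = [7, 6, 8, 5, 6, 9]          # x values from the table
weights = [12, 8, 12, 10, 11, 13]  # y values from the table
a, b = least_squares(ages, weights)
print(f"y-hat = {a:.4f} + {b:.4f}x")
print(f"predicted weight at age 8.5: {a + b * 8.5:.2f}")
```

With these totals the slope is 60/65 ≈ 0.9231 and the intercept is 61/13 ≈ 4.6923, giving a predicted weight of about 12.54 at age 8.5. A hand computation that rounds a and b earlier may differ in the second decimal place.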
SYNTHESIS / GENERALIZATION
Understanding the concept of linear regression is needed for students to be able to show or predict the relationship
between two variables or factors. The concept is to predict the value of a certain variable (dependent/outcome variable)
based on the value of another variable (independent/predictor variable).
By using the least-squares method, students can find the line of best fit for a set of data points by minimizing the sum of the squared residuals of the points from the fitted line. Students will also be able to predict the behavior of dependent/outcome variables using this method.
CHAPTER 10
CONCEPT OF SAMPLING
• A sample is a subset of a larger population of objects, individuals, households, businesses, organizations and so forth.
SAMPLING
• Sampling enables researchers to make estimates of some unknown characteristics of the population in question.
• A finite group is called a population, whereas a nonfinite (infinite) group is called a universe.
WHY SAMPLING?
➢ Less costs
➢ More accuracy
*Probability Sampling
• A probability sample is a sample in which every element of the population has a known, nonzero probability of being selected into the sample.
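* Non-Probability Sampling
• A non-probability sample is a sample in which elements are selected by means other than chance (for example, convenience or the researcher's judgment), so the probability of selecting each element is unknown.
Simple Random Sampling
• Every item in the population has an equal chance of being selected in the sample.
• The selection of items depends entirely on chance, so this sampling technique is also sometimes known as the method of chances.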
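Equal-chance selection can be sketched with Python's standard `random` module. The population of 100 numbered items below is hypothetical; `random.Random(seed).sample` draws distinct items without replacement, and the seed only makes a run repeatable.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; each item is equally likely to be selected."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    return rng.sample(population, n)


population = list(range(1, 101))  # hypothetical population: items numbered 1 to 100
print(simple_random_sample(population, 10, seed=42))
```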
Stratified Random Sampling
• A type of probability sampling in which a research organization divides the entire population into multiple non-overlapping, homogeneous groups (strata) and randomly chooses the final members from the various strata for research, which reduces cost and improves efficiency.
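Stratified selection can be sketched as two steps: group the population into non-overlapping strata, then draw a simple random sample within each one. This minimal illustration assumes equal allocation (the same number drawn from every stratum); the regions and people are hypothetical, and real studies often allocate in proportion to stratum size instead.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into non-overlapping strata, then randomly sample each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)  # each item belongs to exactly one stratum
    sample = []
    for key in sorted(strata):  # fixed stratum order so a seed reproduces the same sample
        members = strata[key]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical population: 30 people, 10 in each of three regions.
regions = ["East", "North", "South"]
people = [(person_id, regions[person_id % 3]) for person_id in range(30)]
print(stratified_sample(people, lambda person: person[1], n_per_stratum=2, seed=7))
```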
Systematic Sampling
• A probability sampling method in which elements are chosen from a target population by selecting a random starting point and then selecting the other members after a fixed ‘sampling interval’.
• The sampling interval is calculated by dividing the population size by the desired sample size.
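The random start plus fixed interval can be sketched as a slice with a step. This sketch assumes the interval k = N / n is rounded down when N is not a multiple of n, and that n does not exceed N; the population and function name are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick a random start, then every k-th element, where k = N // n."""
    k = len(population) // n                  # sampling interval: population size / sample size
    start = random.Random(seed).randrange(k)  # random starting point within the first interval
    return population[start::k][:n]           # every k-th element, capped at n items


population = list(range(1, 101))  # hypothetical population of 100 numbered items
print(systematic_sample(population, 10, seed=3))
```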
Cluster Sampling
• This sampling technique, in the form of area or geographical cluster sampling, is used in market research.
• Surveying a widespread geographical area can be expensive compared with surveying clusters that are divided on the basis of area.
Example: A researcher is looking into smartphone usage in Germany. In this case, the cities of Germany will form the clusters.
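Following the smartphone-usage example, cluster selection can be sketched as choosing a few whole clusters at random and keeping everyone in them. The city names serve as clusters; the respondent IDs are hypothetical placeholders.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of clusters
    return {name: clusters[name] for name in chosen}


# Hypothetical respondents grouped by city (the clusters).
cities = {
    "Berlin": ["B1", "B2", "B3"],
    "Hamburg": ["H1", "H2"],
    "Munich": ["M1", "M2", "M3", "M4"],
    "Cologne": ["C1", "C2"],
}
print(cluster_sample(cities, n_clusters=2, seed=5))
```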
Multi-stage Random Sampling
• Divides large populations into stages to make the sampling process more practical.
• A combination of stratified sampling or cluster sampling and simple random sampling is usually used.
• In order to classify multistage sampling as probability sampling, each stage must involve a probability sampling method.
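A two-stage version (cluster sampling followed by simple random sampling) can be sketched as below. Both stages use random selection, which is what keeps the overall design a probability sample; the schools, students, and function name are hypothetical.

```python
import random


def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sample: random clusters first, then a simple random sample inside each."""
    rng = random.Random(seed)
    # Stage 1 (probability method): randomly select whole clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2 (probability method): simple random sample within each chosen cluster.
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical population: 5 schools (clusters) with 20 students each.
schools = {f"School {i}": [f"S{i}-{j}" for j in range(1, 21)] for i in range(1, 6)}
print(multistage_sample(schools, n_clusters=2, n_per_cluster=3, seed=11))
```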
Convenience Sampling
• Used to create a sample based on ease of access, readiness to be part of the sample, availability at a given time slot, or any other practical specification of a particular element.
• Researcher chooses members merely on the basis of proximity/nearness and doesn’t consider whether they represent
the entire population or not.
Example: Companies stop people at a mall or on a crowded street to distribute their promotional pamphlets and ask
questions.
Judgment Sampling
• It is most effective in situations where only a restricted number of people in a population possess the qualities that a researcher expects from the target population.
• The process of selecting a sample using judgmental sampling involves the researchers carefully picking and choosing
each individual to be a part of the sample.
• Researcher’s knowledge is primary in this sampling process as the members of the sample are not randomly chosen.
• The sample members are chosen only on the basis of the researcher’s knowledge and judgment.
• Researchers prefer to implement Judgmental sampling when they feel that other sampling techniques will consume more
time and that they have confidence in their knowledge to select a sample for conducting research.
Quota Sampling
• Researchers decide the traits according to which the sample subset will be selected, so that the sample can collect data that can be generalized to the entire population.
Example: A researcher seeks to conduct a comparative market analysis of how a product is received by different age groups, socio-economic backgrounds, and genders.
Snowball Sampling
• A technique in which the samples have traits that are rare to find.
• A sampling technique in which existing subjects provide referrals to recruit the samples required for a research study.
Example: If you are studying the level of customer satisfaction among the members of an elite country club, you will find it
extremely difficult to collect primary data unless a member of the club agrees to have a direct conversation with
you and provides the contact details of the other members of the club.
SYNTHESIS / GENERALIZATION
Understanding the concept of sampling enables students to select individuals or a subset of a population in order to make statistical inferences and estimate the characteristics of the whole population.
The different sampling techniques make it easier for students to collect data because they are practical, cost-effective, convenient, and manageable.
CHAPTER 11
• The most unbiased point estimate of the population mean, μ, is the sample mean, x̄.
Example:
A random sample of 32 textbook prices (rounded to the nearest dollar) is taken from a local college bookstore. Find a point
estimate for the population mean, μ.
Answer: The point estimate for the population mean price of textbooks in the bookstore is $74.22.
INTERVAL ESTIMATE
• Used to calculate an interval of possible values of an unknown population parameter, in contrast to point estimation,
which is a single number.
LEVEL OF CONFIDENCE
• The level of confidence c is the probability that the interval estimate contains the population parameter.
• It refers to the percentage of all possible samples that can be expected to include the true population parameter.
• The difference between the point estimate and the actual population parameter value is called the sampling error.
• Given a level of confidence, the margin of error E is the greatest possible distance between the point estimate and the
value of the parameter it is estimating.
$$E = z_c \sigma_{\bar{x}} = z_c \frac{\sigma}{\sqrt{n}}$$
Note that when n ≥ 30, the sample standard deviation, s, can be used in place of σ.
Example 1:
A random sample of 32 textbook prices is taken from a local college bookstore. The mean of the sample is 74.22, and the
sample standard deviation is s = 23.44. Use a 95% confidence level and find the margin of error for the mean price of
all textbooks in the bookstore.
$$E = z_c \frac{\sigma}{\sqrt{n}} \approx 1.96 \cdot \frac{23.44}{\sqrt{32}} \approx 8.12$$
Since n ≥ 30, s can be substituted for σ.
Answer:
We are 95% confident that the margin of error for the population mean is about $8.12.
• Using the problem in Example 1, solve for the interval estimates (left and right endpoints).
Left endpoint: x̄ − E = 74.22 − 8.12 = 66.10
Right endpoint: x̄ + E = 74.22 + 8.12 = 82.34
Answer: With 95% confidence, we can say that the mean price of all textbooks in the bookstore is between $66.10 and $82.34.
Example 2:
A random sample of 25 students had a grade point average with a mean of 2.86. Past studies have shown that the
standard deviation is 0.15 and the population is normally distributed. Construct a 90% confidence interval for the
population mean grade point average.
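Given:
x̄ = 2.86, σ = 0.15, n = 25, zc = 1.645 (90% confidence)
$$E = z_c \frac{\sigma}{\sqrt{n}} = 1.645 \cdot \frac{0.15}{\sqrt{25}} \approx 0.05$$
$$\bar{x} \pm E = 2.86 \pm 0.05$$
$$2.81 < \mu < 2.91$$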
Answer: With 90% confidence, the mean grade point average for all students in the population is between 2.81 and 2.91.
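Both confidence intervals above follow the same pattern, E = zc·σ/√n and then x̄ ± E, so they can be checked with one small helper. This is a sketch: `z_interval` is a hypothetical name, and the zc values 1.96 and 1.645 are the table values used in the two examples rather than values computed by the code.

```python
import math


def z_interval(x_bar, sigma, n, z_c):
    """Return (E, left, right) where E = z_c * sigma / sqrt(n)."""
    e = z_c * sigma / math.sqrt(n)
    return e, x_bar - e, x_bar + e


# Example 1 (textbooks): 95% level, z_c = 1.96; s = 23.44 stands in for sigma since n >= 30.
e1, left1, right1 = z_interval(74.22, 23.44, 32, 1.96)
print(f"E = {e1:.2f}, {left1:.2f} < mu < {right1:.2f}")

# Example 2 (GPA): 90% level, z_c = 1.645, sigma = 0.15 known.
e2, left2, right2 = z_interval(2.86, 0.15, 25, 1.645)
print(f"E = {e2:.2f}, {left2:.2f} < mu < {right2:.2f}")
```

Run as is, it reproduces the hand results: E ≈ 8.12 with 66.10 < μ < 82.34, and E ≈ 0.05 with 2.81 < μ < 2.91.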
SAMPLE SIZE
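• Given a level of confidence c and a maximum margin of error E, the minimum sample size n needed to estimate μ is:
$$n = \left( \frac{z_c \sigma}{E} \right)^2$$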
Example:
You want to estimate the mean price of all the textbooks in the college bookstore. How many books must be included in
your sample if you want to be 99% confident that the sample mean is within $5 of the population mean?
Given:
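A sketch of the sample-size calculation follows. It assumes zc = 2.575 for 99% confidence and takes s = 23.44 from the earlier textbook sample as the estimate of σ; both values are assumptions, not given in this example. The result is rounded up with `math.ceil`, because rounding down would give a margin of error slightly larger than E.

```python
import math


def required_sample_size(z_c, sigma, e):
    """n = (z_c * sigma / E)^2, rounded up so the margin of error is at most E."""
    return math.ceil((z_c * sigma / e) ** 2)


# Assumed inputs: z_c = 2.575 (99% level), sigma estimated by s = 23.44, E = $5.
print(required_sample_size(2.575, 23.44, 5))
```

Under these assumptions the formula gives about 145.7, which rounds up to 146 books.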
SYNTHESIS / GENERALIZATION
Understanding and applying the concept of confidence intervals for the mean using large samples is important for students to be able to solve for the margin of error and interval estimates. With this ability, students can determine how far the sample results may differ from the true population value.
CHAPTER 12
THE t-DISTRIBUTION
• When the sample size is less than 30, the random variable x is approximately normally distributed, and σ is unknown, the statistic t = (x̄ − μ)/(s/√n) follows a t-distribution.
• The t-distribution is a family of curves, each determined by a parameter called the degrees of freedom.
Example1:
Find the critical value tc for a 95% confidence level when the sample size is 5.
d.f. = n − 1 = 5 − 1 = 4
From the t-distribution table with d.f. = 4 and c = 0.95, tc = 2.776.
Example 2:
In a random sample of 20 customers at a local fast food restaurant, the mean waiting time to order is 95 seconds, and the
standard deviation is 21 seconds. Assume the wait times are normally distributed and construct a 90% confidence
interval for the mean wait time of all customers.
Given:
d.f. = n – 1
= 20 - 1
d.f = 19
tc = 1.729
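Then, solve for the margin of error and the interval estimates (left endpoint and right endpoint).
$$E = t_c \frac{s}{\sqrt{n}} = 1.729 \cdot \frac{21}{\sqrt{20}} \approx 8.1$$
Left endpoint: x̄ − E ≈ 95 − 8.1 = 86.9
Right endpoint: x̄ + E ≈ 95 + 8.1 = 103.1
Answer: With 90% confidence, the mean wait time for all customers is between 86.9 and 103.1 seconds.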
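The t-based interval has the same structure as the z-based one, with tc in place of zc and s in place of σ. The sketch below takes tc = 1.729 (90% confidence, d.f. = 19) as given rather than computing it, because the Python standard library has no t-distribution; `t_interval` is a hypothetical helper name.

```python
import math


def t_interval(x_bar, s, n, t_c):
    """Return (E, left, right) where E = t_c * s / sqrt(n)."""
    e = t_c * s / math.sqrt(n)
    return e, x_bar - e, x_bar + e


n = 20
df = n - 1  # degrees of freedom used to look up t_c = 1.729 in a t-table
e, left, right = t_interval(95, 21, n, 1.729)
print(f"d.f. = {df}, E = {e:.1f} s, {left:.1f} < mu < {right:.1f} seconds")
```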
SYNTHESIS / GENERALIZATION
Understanding and applying the concept of confidence intervals for the mean for small samples is important for students to solve for the critical values, margin of error, and interval estimates using the t-distribution.
Students will also become familiar with solving for the degrees of freedom and, moreover, will be able to distinguish when to use the normal distribution and when to use the t-distribution.
CHAPTER 13
HYPOTHESIS TESTING
WHAT IS A HYPOTHESIS?
HYPOTHESIS TESTS
• A hypothesis test is a process that uses sample statistics to test a claim about the value of a population parameter.
• Example:
If a manufacturer of rechargeable batteries claims that the batteries they produce are good for an average of at least 1,000
charges, a sample would be taken to test this claim.
HYPOTHESIS TESTING
• The general goal of a hypothesis test is to rule out chance (sampling error) as a plausible explanation for the results
from a research study.
• Hypothesis testing is a technique to help determine whether a specific treatment has an effect on the individuals in a
population.
STATING A HYPOTHESIS
• A null hypothesis H0 (“H sub-zero” or “H naught”) is a statistical hypothesis that contains a statement of equality, such as ≤, =, or ≥.
• An alternative hypothesis Ha (“H sub-a”) is the complement of the null hypothesis. It is a statement that must be true if H0 is false, and it contains a statement of strict inequality, such as >, ≠, or <.
• To write the null and alternative hypotheses, translate the claim made about the population parameter from a verbal
statement to a mathematical statement.
Example 1:
Write the claim as a mathematical sentence. State the null and alternative hypotheses and identify which represents the claim.
A manufacturer claims that its rechargeable batteries have an average life of at least 1,000 charges.
H0: μ ≥ 1000 (Claim)
Ha: μ < 1000
Example 2:
Write the claim as a mathematical sentence. State the null and alternative hypotheses and identify which represents the claim.
Statesville College claims that 94% of their graduates find employment within six months of graduation.
H0: p = 0.94 (Claim)
Ha: p ≠ 0.94
TYPES OF ERRORS
• A type I error occurs if the null hypothesis is rejected when it is true.
• A type II error occurs if the null hypothesis is not rejected when it is false.
LEVEL OF SIGNIFICANCE
• In a hypothesis test, the level of significance is your maximum allowable probability of making a type I error. It is denoted by α, the lowercase Greek letter alpha.
• The probability of making a type II error is denoted by β, the lowercase Greek letter beta.
• Commonly used levels of significance:
➢ α = 0.10
➢ α = 0.05
➢ α = 0.01
• If the null hypothesis is true, a P-value of a hypothesis test is the probability of obtaining a sample statistic with a value as
extreme or more extreme than the one determined from the sample data.
• There are three types of hypothesis tests – a left-, right-, or two-tailed test.
LEFT-TAILED TEST
If the alternative hypothesis contains the less-than inequality symbol (<), the hypothesis test is a left-tailed test.
RIGHT-TAILED TEST
If the alternative hypothesis contains the greater-than symbol (>), the hypothesis test is a right-tailed test.
TWO-TAILED TEST
If the alternative hypothesis contains the not-equal-to symbol (≠), the hypothesis test is a two-tailed test. In a two-tailed test, each tail has an area of ½P.
Example 1: A cigarette manufacturer claims that less than one-eighth of the US adult population smokes cigarettes.
H0: p ≥ 0.125
Ha: p < 0.125 (Claim)
Example 2: A local company claims that the average length of a phone call is 8 minutes
H0: µ = 8 (Claim)
Ha: µ ≠ 8
SYNTHESIS / GENERALIZATION
Understanding and applying the concept of hypothesis testing allows students to test whether a claim is correct or incorrect.
With the students having been able to identify the null and alternative hypotheses, the claim can now be tested and
evaluated. Also, students are now capable of using the decision-making rule and table in deciding whether to accept or
reject the claim.
CHAPTER 14
Recall that when the sample size is at least 30, the sampling distribution of the sample mean is approximately normal.
Examples:
The P-value for a hypothesis test is P = 0.0256. What is your decision if the level of significance is:
a.) 0.05
Answer: Because 0.0256 < 0.05, you should reject the null hypothesis.
b.) 0.01
Answer: Because 0.0256 > 0.01, you should fail to reject the null hypothesis.
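After determining the hypothesis test’s standardized test statistic and the test statistic’s corresponding area, do one of the following to find the P-value:
a.) Left-tailed test: P = (area in the left tail).
b.) Right-tailed test: P = (area in the right tail).
c.) Two-tailed test: P = 2 × (area in the tail of the test statistic).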
Example 1:
The test statistic for a right-tailed test is z=1.56. Find the P-value.
P-value = 1 − (area to the left of z = 1.56)
= 1 − 0.9406
= 0.0594
Example 2:
The test statistic for a two-tailed test is z = - 2.63. Find the P-value.
P-value = 2 × (area to the left of z = −2.63)
= 2(0.0043)
= 0.0086
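Instead of a printed standard normal table, the areas can be taken from `statistics.NormalDist` (Python 3.8 and later). The sketch follows the usual rule for each type of test: the left-tail area, the right-tail area, or twice the tail area for a two-tailed test. `p_value` and its `tail` argument are hypothetical names. Because the library does not round areas to four places first, the two-tailed result prints as 0.0085 rather than the table-based 0.0086.

```python
from statistics import NormalDist


def p_value(z, tail):
    """P-value of a standardized z statistic; tail is 'left', 'right', or 'two'."""
    area_left = NormalDist().cdf(z)  # area under the standard normal curve to the left of z
    if tail == "left":
        return area_left
    if tail == "right":
        return 1 - area_left
    if tail == "two":
        return 2 * min(area_left, 1 - area_left)  # twice the area in the tail beyond z
    raise ValueError("tail must be 'left', 'right', or 'two'")


print(f"{p_value(1.56, 'right'):.4f}")  # Example 1: right-tailed, z = 1.56
print(f"{p_value(-2.63, 'two'):.4f}")   # Example 2: two-tailed, z = -2.63
```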
• The z-test for the mean is a statistical test for a population mean.
• It can be used when the population is normal and σ is known, or for any population when the sample size n is at least 30.
Example:
A manufacturer claims that its rechargeable batteries are good for an average of more than 1,000 charges. A random
sample of 100 batteries has a mean life of 1002 charges and a standard deviation of 14. Is there enough evidence to support
this claim at α = 0.01?
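H0: μ ≤ 1000
Ha: μ > 1000 (Claim)
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{1002 - 1000}{14/\sqrt{100}} \approx 1.43$$
Area to the left of z = 1.43: 0.9236
P-value = 1 − 0.9236
= 0.0764
Because 0.0764 > 0.01, fail to reject the null hypothesis.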
ANSWER:
At the 1% level of significance, there is not enough evidence to support the claim.
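The battery example can be sketched end to end: compute the standardized test statistic z = (x̄ − μ0)/(s/√n), convert it to a right-tailed P-value, and compare the P-value with α (reject H0 when P ≤ α). The sketch rounds z to two decimals to mirror a z-table lookup and uses s = 14 in place of σ because n = 100 ≥ 30; `right_tailed_z_test` is a hypothetical name.

```python
import math
from statistics import NormalDist


def right_tailed_z_test(x_bar, mu0, s, n, alpha):
    """Return (z, P-value, decision) for H0: mu <= mu0 versus Ha: mu > mu0."""
    z = (x_bar - mu0) / (s / math.sqrt(n))
    z = round(z, 2)  # two decimals, as when reading a standard normal table
    p = 1 - NormalDist().cdf(z)  # area to the right of z
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return z, p, decision


z, p, decision = right_tailed_z_test(1002, 1000, 14, 100, alpha=0.01)
print(f"z = {z:.2f}, P = {p:.4f}: {decision}")
```

Since the P-value of about 0.0764 exceeds α = 0.01, the sketch fails to reject H0, which matches the conclusion that there is not enough evidence to support the claim.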
SYNTHESIS / GENERALIZATION
Understanding and applying hypothesis testing for the mean enables students to solve for P-values according to the type of hypothesis test.
Having learned to decide whether to accept or reject a claim, students can now perform the calculations needed to test theories and statements against sample data.