CHAPTER 9

LINEAR REGRESSION

REGRESSION ANALYSIS

• Regression is the technique concerned with predicting some variables by knowing others.

• Uses a variable (x) to predict some outcome variable (y).

• Tells you how values in y change as a function of changes in values of x.

CORRELATION AND REGRESSION

*Correlation

• Describes the strength of a linear relationship between two variables.

• Linear means “straight line”.

*Regression

• Tells us how to draw the straight line described by the correlation.

• Calculates the “best-fit” line for a certain set of data.

• The regression line makes the sum of the squares of the residuals smaller than for any other line.

• Regression minimizes the sum of the squared residuals.

LEAST-SQUARES METHOD

• A procedure that minimizes the vertical deviations of plotted points surrounding a straight line.

• It constructs a best-fitting straight line through the points of the scatter diagram and then formulates a regression equation of the form:

ŷ = a + bx

➢ ŷ (y-hat) = the predicted value of y on the line of best fit.

➢ b = slope (the change in ŷ for each one-unit increase in x).

➢ a = y-intercept (the value of ŷ where the line crosses the y-axis, at x = 0).

Note:

Regression equation describes the regression line mathematically.

Slope – indicates the steepness

Example 1: Regressing grades on hours where the Regression equation is given:

ŷ = 59.95 + 3.17x

• Predict the final grade of someone who studies for 12 hours:

ŷ = 59.95 + 3.17x

ŷ = 59.95 + 3.17 (no. of study hours per week)

ŷ = 59.95 + 3.17 (12)

ŷ = 97.99

Therefore, the predicted final grade for someone who studies for 12 hours is 97.99
• Predict the final grade of someone who studies for 1 hour:

ŷ = 59.95 + 3.17x

ŷ = 59.95 + 3.17 (no. of study hours per week)

ŷ = 59.95 + 3.17 (1)

ŷ = 63.12

Therefore, the predicted final grade for someone who studies for 1 hour is 63.12
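The two predictions above can be reproduced with a short sketch. The function name predict_grade is hypothetical and is used only for illustration; the intercept and slope are the ones given in Example 1.

```python
def predict_grade(hours, intercept=59.95, slope=3.17):
    """Return y-hat = a + b*x for the study-hours regression equation."""
    return intercept + slope * hours

# Predicted final grades for 12 hours and for 1 hour of study.
print(round(predict_grade(12), 2))  # 97.99
print(round(predict_grade(1), 2))   # 63.12
```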

Example 2: A sample of 6 persons was selected. Their ages (x variable) and weights (y variable) are shown in the
following table. Find the following:

1. Regression equation and

2. The predicted weight when age is 8.5 years.

Serial No.   Age (years), x   Weight (kg), y
1            7                12
2            6                8
3            8                12
4            5                10
5            6                11
6            9                13

As you can see in the table, there is no recorded weight for an age of 8.5 years; therefore, we will predict it using the
least-squares method of regression analysis.

Step 1: Compute xy, x² and y² in order to complete the table.

Serial No.   Age (years), x   Weight (kg), y   xy          x²          y²
1            7                12               84          49          144
2            6                8                48          36          64
3            8                12               96          64          144
4            5                10               50          25          100
5            6                11               66          36          121
6            9                13               117         81          169
Total        ∑x = 41          ∑y = 66          ∑xy = 461   ∑x² = 291   ∑y² = 742

Step 3: Solve for x̄ (the sample mean of x) by dividing ∑x by the number of observations (n): x̄ = 41 / 6 ≈ 6.83.

Step 4: Solve for ȳ (the sample mean of y) by dividing ∑y by the number of observations (n): ȳ = 66 / 6 = 11.

Step 5: Substitute the computed values in the formula.

Step 7: Simplify the regression equation.

Step 8: Find the predicted weight when age is 8.5 years using the regression equation.

Answer: The predicted weight when age is 8.5 years is 12.54 kg.
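The formulas for Steps 5 to 7 did not survive in these notes. The sketch below assumes the standard least-squares formulas, b = (∑xy – ∑x∑y/n) / (∑x² – (∑x)²/n) and a = ȳ – b·x̄. Under that assumption it reproduces the stated answer of 12.54 kg. The function name least_squares is hypothetical.

```python
# Data from Example 2: age in years (x) and weight in kg (y).
ages = [7, 6, 8, 5, 6, 9]
weights = [12, 8, 12, 10, 11, 13]

def least_squares(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x fitted by least squares."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: assumed standard formula, not shown in the notes.
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: the fitted line passes through (x-bar, y-bar).
    a = sum_y / n - b * sum_x / n
    return a, b

a, b = least_squares(ages, weights)
predicted = a + b * 8.5
print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"Predicted weight at 8.5 years: {predicted:.2f} kg")
```

With the table totals this gives b = 10 / 10.83 ≈ 0.923 and a ≈ 4.692, so ŷ ≈ 4.692 + 0.923x.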

SYNTHESIS / GENERALIZATION

Understanding the concept of linear regression is needed for students to be able to show or predict the relationship
between two variables or factors. The concept is to predict the value of a certain variable (dependent/outcome variable)
based on the value of another variable (independent/predictor variable).

By using the least-squares method, students can find the best-fit line for a set of data points by minimizing the sum of the
squared residuals of the points from the fitted line. Students will also be able to predict the behavior of dependent/outcome
variables using this method.
CHAPTER 10

CLASSIFICATION OF SAMPLING TECHNIQUES

CONCEPT OF SAMPLING

• A process of selecting units from a population.

• A process of selecting a sample to determine certain characteristics of a population.

• A sample is a subset of a larger population of objects, individuals, households, businesses, organizations and so forth.

SAMPLING

• Sampling enables researchers to make estimates of some unknown characteristics of the population in question.

• A finite group is called a population, whereas a nonfinite (infinite) group is called a universe.

• A census is an investigation of all individual elements of a population.

WHY SAMPLING?

➢ Lower cost

➢ More accuracy

➢ Less field time

➢ Get information about large populations

➢ When it is impossible to study the whole population

CLASSIFICATION OF SAMPLING TECHNIQUES

*Probability Sampling

• Utilizes some form of random selection.

• A probability sample is a sample in which every element of the population has a known, nonzero probability of being
selected into the sample.

* Non-Probability Sampling

• Does not involve random selection.


TYPES OF PROBABILITY SAMPLING

Simple Random Sampling

• Every item in the population has an equal chance of being selected in the sample.

• The selection of items completely depends on chance or by probability and therefore this sampling technique is also
sometimes known as a method of chances.

• A fair sampling technique.
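A minimal sketch of simple random sampling, assuming a hypothetical population of 100 numbered units. The fixed seed is only there to make the draw repeatable.

```python
import random

# Hypothetical population: units numbered 1 to 100.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the same sample is drawn each run
# Draw 10 distinct units; every unit has the same chance of being chosen.
sample = rng.sample(population, k=10)
print(sorted(sample))
```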

Stratified Random Sampling

• A type of probability sampling in which the researcher divides the entire population into multiple non-overlapping,
homogeneous groups (strata) and randomly chooses the final members from each stratum, which reduces cost and
improves efficiency.
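A minimal sketch of stratified random sampling. The strata (year levels), the 10% sampling fraction and the function name stratified_sample are assumptions made only for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the given fraction from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical non-overlapping strata of students.
strata = {
    "first_year": [f"F{i}" for i in range(40)],
    "second_year": [f"S{i}" for i in range(30)],
    "third_year": [f"T{i}" for i in range(30)],
}
chosen = stratified_sample(strata, fraction=0.10)
print({name: len(members) for name, members in chosen.items()})
```

Sampling the same fraction from every stratum keeps each group represented in proportion to its size.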

Systematic Random Sampling

• A probability sampling method where the elements are chosen from a target population by selecting a random starting
point and selecting other members after a fixed ‘sampling interval’.

• Sampling interval is calculated by dividing the entire population size by the desired sample size.
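A minimal sketch of systematic random sampling, assuming a hypothetical population of 100 numbered units and a desired sample of 10. The function name systematic_sample is hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=0):
    """Pick a random start, then every k-th unit, where k = N // n."""
    interval = len(population) // sample_size  # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(interval)            # random starting point in the first interval
    return [population[start + i * interval] for i in range(sample_size)]

population = list(range(1, 101))   # N = 100
sample = systematic_sample(population, 10)  # n = 10, so the interval is 10
print(sample)
```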

Cluster / Area Random Sampling

• This technique samples whole areas or geographical clusters and is often used in market research.

• Surveying a widespread geographical area can be expensive compared with surveying clusters that are divided on the
basis of area.

Example: A researcher is looking into understanding the smartphone usage in Germany. In this case, the cities of Germany
will form the clusters.
Multi-stage Random Sampling

• Divides large populations into stages to make the sampling process more practical.

• A combination of stratified sampling or cluster sampling and simple random sampling is usually used.

• In order to classify multistage sampling as probability sampling, each stage must involve a probability sampling method.

TYPES OF NON-PROBABILITY SAMPLING

Convenience Sampling

• Used to create sample as per ease of access, readiness to be a part of the sample, availability at a given time slot or any
other practical specifications of a particular element.

• A quick mode to collect data.

• Researcher chooses members merely on the basis of proximity/nearness and doesn’t consider whether they represent
the entire population or not.

Example: Companies stop people at a mall or on a crowded street to distribute their promotional pamphlets and ask
questions.

Judgment Sampling

• It is most effective in situations where only a restricted number of people in a population possess the qualities that
a researcher expects from the target population.

• The process of selecting a sample using judgmental sampling involves the researchers carefully picking and choosing
each individual to be a part of the sample.

• Researcher’s knowledge is primary in this sampling process as the members of the sample are not randomly chosen.

• The sample members are chosen only on the basis of the researcher’s knowledge and judgment.

• Researchers prefer to implement Judgmental sampling when they feel that other sampling techniques will consume more
time and that they have confidence in their knowledge to select a sample for conducting research.

Quota Sampling

• The samples are chosen according to traits or qualities.

• Researchers can decide the trait as per which the sample subset selection will be conducted so that the sample can be
effective in collecting data that can be generalized to the entire population.

• Used when a researcher seeks to conduct a comparative market analysis of how a product is received by different age
groups, socio-economic backgrounds and genders.

Example: Quota: Male, above 30.

Snowball Sampling

• A technique in which the samples have traits that are rare to find.

• A sampling technique, in which existing subjects provide referrals to recruit samples required for a research study.

• A popular business study method.

Example: If you are studying the level of customer satisfaction among the members of an elite country club, you will find it
extremely difficult to collect primary data sources unless a member of the club agrees to have a direct conversation with
you and provides the contact details of the other members of the club.
SYNTHESIS / GENERALIZATION

Being familiar with and understanding the concept of sampling is needed for students to be able to select individuals
or a subset of a population to make statistical inferences and estimate the characteristics of the whole population.

The different sampling techniques make it easier for students to collect data, since sampling is practical,
cost-effective, convenient, and manageable.
CHAPTER 11

CONFIDENCE INTERVALS FOR THE MEAN (LARGE SAMPLES)

CONFIDENCE INTERVALS FOR THE MEAN (Large Samples)

POINT ESTIMATE FOR POPULATION μ

• A single value estimate for a population parameter.

• The most unbiased point estimate of the population mean, μ, is the sample mean, x̄.

Example:

A random sample of 32 textbook prices (rounded to the nearest dollar) is taken from a local college bookstore. Find a point
estimate for the population mean, μ.

Answer: The point estimate for the population mean price of textbooks in the bookstore is $74.22.

INTERVAL ESTIMATE

• An interval estimate is an interval, or range of values, used to estimate a population parameter.

• Used to calculate an interval of possible values of an unknown population parameter, in contrast to point estimation,
which is a single number.

LEVEL OF CONFIDENCE

• The level of confidence c is the probability that the interval estimate contains the population parameter.

• It refers to the percentage of all possible samples that can be expected to include the true population parameter.

COMMON LEVELS OF CONFIDENCE

Confidence Interval Level      Corresponding z-score
90% confidence interval        1.645
95% confidence interval        1.96
99% confidence interval        2.575

MARGIN OF ERROR

• The difference between the point estimate and the actual population parameter value is called the sampling error.

• Given a level of confidence, the margin of error E is the greatest possible distance between the point estimate and the
value of the parameter it is estimating.

E = zc · σ_x̄ = zc · (σ / √n)

Note that when n ≥ 30, the sample standard deviation, s, can be used for σ.

Example 1:

A random sample of 32 textbook prices is taken from a local college bookstore. The mean of the sample is 74.22, and the
sample standard deviation is s = 23.44. Use a 95% confidence level and find the margin of error for the mean price of
all textbooks in the bookstore.

E = zc · (σ / √n) ≈ 1.96 · (23.44 / √32) ≈ 8.12

Since n ≥ 30, s can be substituted for σ.

Answer:

At the 95% confidence level, the margin of error for the population mean is about $8.12.

• Using the problem in Example 1, solve for the interval estimate (left and right endpoints).

Use this formula:

x̄ – E < μ < x̄ + E

74.22 – 8.12 < μ < 74.22 + 8.12

Answer: With 95% confidence, we can say that the mean price of all textbooks in the bookstore is between $66.10 and $82.34.
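The margin of error and the two endpoints from Example 1 can be checked with a short sketch. The function name z_interval is hypothetical. The critical value comes from Python's standard normal distribution rather than a printed table, so it is 1.95996 instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Return (E, left endpoint, right endpoint) for a large-sample interval."""
    # Critical value z_c leaves (1 - confidence) / 2 in each tail.
    z_c = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_c * sd / sqrt(n)
    return margin, mean - margin, mean + margin

# Example 1: n = 32, sample mean 74.22, s = 23.44 used in place of sigma.
margin, left, right = z_interval(74.22, 23.44, 32, 0.95)
print(f"E = {margin:.2f}, interval = ({left:.2f}, {right:.2f})")
# E = 8.12, interval = (66.10, 82.34)
```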

Example 2:

A random sample of 25 students had a grade point average with a mean of 2.86. Past studies have shown that the
standard deviation is 0.15 and the population is normally distributed. Construct a 90% confidence interval for the
population mean grade point average.

Given:

x̄ = 2.86, σ = 0.15, n = 25, zc = 1.645

E = zc · (σ / √n) = 1.645 · (0.15 / √25) ≈ 0.05

x̄ ± E = 2.86 ± 0.05

2.81 < μ < 2.91

Answer: With 90% confidence, the mean grade point average for all students in the population is between 2.81 and 2.91.

SAMPLE SIZE

n = (zc · σ / E)²

Example:

You want to estimate the mean price of all the textbooks in the college bookstore. How many books must be included in
your sample if you want to be 99% confident that the sample mean is within $5 of the population mean?

Given:

Answer: 146 books
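The given values for this example are missing from the notes. The sketch below assumes that σ is estimated by s = 23.44 from the earlier textbook example and uses zc = 2.575 from the table of common confidence levels. Under those assumptions it reproduces the answer of 146 books. The function name required_sample_size is hypothetical.

```python
from math import ceil

def required_sample_size(z_c, sigma, max_error):
    """Return the smallest whole n with n >= (z_c * sigma / E)^2."""
    return ceil((z_c * sigma / max_error) ** 2)

# Assumed inputs: z_c = 2.575 (99%), sigma estimated by s = 23.44, E = $5.
n_books = required_sample_size(2.575, 23.44, 5)
print(n_books)  # 146
```

The result is always rounded up, because rounding down would give a margin of error slightly larger than the one requested.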

SYNTHESIS / GENERALIZATION
Understanding and applying the concept of confidence intervals for the mean using large samples is important for
students to solve for the margin of error and interval estimates. By solving for the margin of error and interval
estimates, students will be able to determine how far a sample result is likely to differ from the true
population value.

CHAPTER 12

CONFIDENCE INTERVALS FOR THE MEAN (SMALL SAMPLES)

CONFIDENCE INTERVALS FOR THE MEAN (SMALL SAMPLES)

THE t-DISTRIBUTION

• When the sample size is less than 30, the random variable x is approximately normally distributed, and σ is unknown,
the sampling distribution of the sample mean follows a t-distribution.

• The t-distribution is a family of curves, each determined by a parameter called the degrees of freedom.

d.f. = n – 1 (Degrees of freedom)

Example 1:

Find the critical value tc for a 95% confidence level when the sample size is 5.

d.f. = n – 1

= 5 – 1

d.f. = 4

Answer: The critical value (tc) = 2.776

Example 2:

In a random sample of 20 customers at a local fast food restaurant, the mean waiting time to order is 95 seconds, and the
standard deviation is 21 seconds. Assume the wait times are normally distributed and construct a 90% confidence
interval for the mean wait time of all customers.

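Given:

n = 20, x̄ = 95, s = 21, c = 0.90

d.f. = n – 1

= 20 – 1

d.f. = 19

tc = 1.729

Now, compute the margin of error (E):

E = tc · (s / √n) = 1.729 · (21 / √20) ≈ 8.1

Then, solve for the interval estimate (left endpoint and right endpoint):

x̄ – E = 95 – 8.1 = 86.9 and x̄ + E = 95 + 8.1 = 103.1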

Answer:

➢ Critical value (tc) = 1.729

➢ Margin of Error (E) = 8.1

➢ Left endpoint = 86.9

➢ Right endpoint = 103.1
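The answers above can be checked with a short sketch. The critical value tc = 1.729 is taken from the example itself (d.f. = 19, 90% confidence) instead of being computed, because the Python standard library has no t-distribution. The function name t_interval is hypothetical.

```python
from math import sqrt

def t_interval(mean, s, n, t_c):
    """Return (E, left endpoint, right endpoint) using a given critical value t_c."""
    margin = t_c * s / sqrt(n)
    return margin, mean - margin, mean + margin

# Example 2: n = 20, mean 95 s, standard deviation 21 s, t_c = 1.729.
margin, left, right = t_interval(95, 21, 20, 1.729)
print(f"E = {margin:.1f}, left = {left:.1f}, right = {right:.1f}")
# E = 8.1, left = 86.9, right = 103.1
```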

DETERMINING WHAT TO USE (Normal or t-Distribution?)


Remember that:

➢ Normal – uses z-score.

➢ t-Distribution – uses tc and degrees of freedom (d.f.)

SYNTHESIS / GENERALIZATION

Understanding and applying the concept of confidence intervals for the mean for small samples is important for students
to solve for the critical values, margin of error and interval estimates using the t-distribution.

Students will also become familiar with solving for the degrees of freedom and, moreover, be able to
distinguish when to use the normal distribution and when to use the t-distribution.
CHAPTER 13

HYPOTHESIS TESTING

WHAT IS A HYPOTHESIS?

• A hypothesis is an assumption about the population parameter.

• A parameter is a numerical characteristic of a population, such as a population mean or proportion.

• The parameter must be identified before analysis.

HYPOTHESIS TESTS

• A hypothesis test is a process that uses sample statistics to test a claim about the value of a population parameter.

• A verbal statement, or claim, about a population parameter is called a statistical hypothesis.

• Example:

If a manufacturer of rechargeable batteries claims that the batteries they produce are good for an average of at least 1,000
charges, a sample would be taken to test this claim.

HYPOTHESIS TESTING

• The general goal of a hypothesis test is to rule out chance (sampling error) as a plausible explanation for the results
from a research study.

• Hypothesis testing is a technique to help determine whether a specific treatment has an effect on the individuals in a
population.

STATING A HYPOTHESIS

• A null hypothesis H0 (“H subzero” or “H naught”) is a statistical hypothesis that contains a statement of equality such
as ≤, =, or ≥.

• An alternative hypothesis Ha (“H sub-a”) is the complement of the null hypothesis. It is a statement that must be true if
H0 is false and contains a statement of inequality such as >, ≠, or <.

• To write the null and alternative hypotheses, translate the claim made about the population parameter from a verbal
statement to a mathematical statement.

Example 1:

Write the claim as a mathematical sentence. State the null and alternative hypotheses and identify which represents
the claim.

A manufacturer claims that its rechargeable batteries have an average life of at least 1,000 charges.

Null hypothesis (H0): μ ≥ 1000 (Claim)

Alternative hypothesis (Ha): μ < 1000


Example 2:

Write the claim as a mathematical sentence. State the null and alternative hypotheses and identify which represents
the claim.

Statesville College claims that 94% of their graduates find employment within six months of graduation.

Null hypothesis (H0): p = 0.94 (Claim)

Alternative hypothesis (Ha): p ≠ 0.94

TYPE OF ERRORS

• A type I error occurs if the null hypothesis is rejected when it is true.

• A type II error occurs if the null hypothesis is not rejected when it is false.

LEVEL OF SIGNIFICANCE

• In a hypothesis test, the level of significance is your maximum allowable probability of making a type I error. It is
denoted by α, the lowercase Greek letter alpha.

• The probability of making a type II error is denoted by β, the lowercase Greek letter beta.

Commonly used levels of significance:

➢ α = 0.10

➢ α = 0.05

➢ α = 0.01

PROBABILITY VALUE (P-values)

• If the null hypothesis is true, a P-value of a hypothesis test is the probability of obtaining a sample statistic with a value as
extreme or more extreme than the one determined from the sample data.

• The P-value of a hypothesis test depends on the nature of the test.

• There are three types of hypothesis tests – a left-, right-, or two-tailed test.

LEFT-TAILED TEST

If the alternative hypothesis contains the less-than inequality symbol (<), the hypothesis test is a left-tailed test.
RIGHT-TAILED TEST

If the alternative hypothesis contains the greater-than symbol (>), the hypothesis test is a right-tailed test.

TWO-TAILED TEST

If the alternative hypothesis contains the not-equal-to symbol (≠), the hypothesis test is a two-tailed test. In a two-tailed
test, each tail has an area of ½P.

IDENTIFYING TYPE OF TEST

Example 1: A cigarette manufacturer claims that less than one-eighth of the US adult population smokes cigarettes.

H0: p ≥ 0.125

Ha: p < 0.125 (Claim)

Type of test: Left-tailed, because Ha contains <.

Example 2: A local company claims that the average length of a phone call is 8 minutes.

H0: µ = 8 (Claim)

Ha: µ ≠ 8

Type of test: Two-tailed, because Ha contains ≠.

SYNTHESIS / GENERALIZATION

Understanding and applying the concept of hypothesis testing is needed in order to test the correctness or incorrectness
of a claim. Once students are able to identify the null and alternative hypotheses, the claim can be tested and
evaluated. Students are also capable of using the decision-making rule and table in deciding whether to reject or
fail to reject the null hypothesis.
CHAPTER 14

STATISTICAL TESTS: HYPOTHESIS TESTING FOR THE MEAN

USING P-values TO MAKE A DECISION

Recall that when the sample size is at least 30, the sampling distribution for the sample mean is approximately normal.

Examples:

The P-value for a hypothesis test is P = 0.0256. What is your decision if the level of significance is:

a.) 0.05

Answer: Because 0.0256 is < 0.05, you should reject the null hypothesis.

b.) 0.01

Answer: Because 0.0256 is > 0.01, you should fail to reject the null hypothesis.
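The two decisions above follow one rule: reject H0 when the P-value is less than or equal to the level of significance. A minimal sketch, with the hypothetical function name decide:

```python
def decide(p_value, alpha):
    """Apply the P-value decision rule: reject H0 when P <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.0256, 0.05))  # reject H0
print(decide(0.0256, 0.01))  # fail to reject H0
```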

FINDING THE P-value

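After determining the hypothesis test’s standardized test statistic and the test statistic’s corresponding area, do one of the
following to find the P-value.

a.) For a left-tailed test, P = (area in the left tail).

b.) For a right-tailed test, P = (area in the right tail).

c.) For a two-tailed test, P = 2 (area in the tail of the test statistic).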

Example 1:

The test statistic for a right-tailed test is z = 1.56. Find the P-value.

P-value = 1 – (area to the left of z = 1.56)

= 1 – 0.9406

= 0.0594

Example 2:

The test statistic for a two-tailed test is z = –2.63. Find the P-value.

P-value = 2 (area to the left of z = –2.63)

= 2 (0.0043)

= 0.0086
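Both examples can be checked with a short sketch that takes the areas from Python's standard normal distribution instead of a printed z-table. The function name p_value and the tail labels are hypothetical. Because a table rounds each area to four decimal places, the two-tailed result differs slightly from the table-based 0.0086.

```python
from statistics import NormalDist

def p_value(z, tail):
    """Return the P-value of a z-test for a 'left', 'right' or 'two' tailed test."""
    cdf = NormalDist().cdf  # area to the left of z under the standard normal curve
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * cdf(-abs(z))  # twice the area in one tail
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(round(p_value(1.56, "right"), 4))  # 0.0594
print(round(p_value(-2.63, "two"), 4))   # 0.0085 (a z-table gives 0.0086)
```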

USING P-values FOR A z-Test

• The z-test for the mean is a statistical test for a population mean.

• It can be used when the population is normal and σ is known, or for any population when the sample size n is at least
30.

• When n ≥ 30, the sample standard deviation s can be substituted for σ.

Example:

A manufacturer claims that its rechargeable batteries are good for an average of more than 1,000 charges. A random
sample of 100 batteries has a mean life of 1002 charges and a standard deviation of 14. Is there enough evidence to support
this claim at α = 0.01?

H0: μ ≤ 1000

Ha: μ > 1000 (Claim)

The level of significance is α = 0.01.

Standardized test statistic: z = (x̄ – μ) / (s / √n) = (1002 – 1000) / (14 / √100) ≈ 1.43

Area to the left of z = 1.43: 0.9236

Type of Test: Right-tailed (Ha: μ > 1000)

P-value = 1 – (area to the left of z)

= 1 – 0.9236

= 0.0764

Decision rule:

1. If P ≤ α, then reject H0.

2. If P > α, then fail to reject H0.

Since the P-value is 0.0764 and α is 0.01:

0.0764 > 0.01, so fail to reject H0.

ANSWER:

At the 1% level of significance, there is not enough evidence to support the claim.
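The whole battery example can be traced in a short sketch. The function name right_tailed_z_test is hypothetical. The P-value is computed from the unrounded z (about 1.4286), so it comes out near 0.0766 instead of the table-based 0.0764; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_z_test(sample_mean, mu0, s, n, alpha):
    """Return (z, P-value, decision) for Ha: mu > mu0 with a large sample."""
    z = (sample_mean - mu0) / (s / sqrt(n))   # standardized test statistic
    p = 1 - NormalDist().cdf(z)               # area in the right tail
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return z, p, decision

z, p, decision = right_tailed_z_test(1002, 1000, 14, 100, 0.01)
print(f"z = {z:.2f}, P = {p:.4f}, decision: {decision}")
```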

SYNTHESIS / GENERALIZATION

Understanding and applying the concept of hypothesis testing for the mean is useful for students to be able to
solve for P-values depending on the type of hypothesis test.

Once students are able to interpret a decision on whether to reject or fail to reject the null hypothesis, they can
perform the calculations and evaluate claims or statements about a population mean.
