
STA404 : TUTORIAL 1 – INTRODUCTION TO STATISTICS

1. Classify each of the following random variables as qualitative or quantitative.


a) Weights of fish caught in Lake Kenyir.
b) Marital status of club members in Penang Swimming Club Society.
c) The average Body Mass Index of Malaysia’s football players.

2. A researcher selected a sample of foreign tourists who arrived at Penang Airport in July
2015 and interviewed each of them to collect the following data: nationality, gender,
age, occupation, annual income, purpose of travel, and number of days stayed in Penang.
Classify each of these variables as qualitative or quantitative.

3. Classify each of the following random variables as discrete or continuous.


a) Capacity (in gallons) of six reservoirs in Malaysia.
b) The number of people waiting in line at a checkout counter.
c) The size of a pair of shoes.
d) The time it takes a customer to choose a product.
e) The height of 5-year-old durian trees.
f) The total number of phone calls a salesman makes in a week.

4. Identify whether each study described below is descriptive or inferential.


a) A biologist constructs a graph showing the average height of a plant that is given
a new plant food over a 6-month period.
b) The table below shows the number of homicides in a particular country from 1989
to 1993.
c) Based on a random sample of 1000 people, a researcher obtained the following
estimates of the percentage of people lacking health insurance in one big city.

5. Classify each of the following as nominal-level, ordinal-level, interval-level, or
ratio-level measurement.
a) Rating of eight local tourist sites (poor, fair, good, excellent).
b) Customer service waiting times at queue lines.
c) Age of students in a classroom.
d) Marital status of managers in restaurants.
e) Horsepower of tractor engines.
6. Identify the sampling technique used to obtain each sample. Explain your answer.
a) A group of people is classified according to height, and then random samples of
people from each group are taken.
b) For all local travelers arriving at Subang Airport, a customs officer stops every 5th
passenger and carries out a thorough inspection of his/her baggage.
c) In order to determine the expected number of broken eggs per carton, an egg
packaging worker selects a group of 40 cartons that come off a particular conveyor
belt and counts the number of broken eggs within these cartons.
d) A researcher randomly selects 8 hotels in Penang and interviews all the workers
at each of these 8 hotels.
e) A pollster uses a computer to generate 500 random numbers and then interviews
the voters corresponding to those numbers.
f) From a group of 496 students, every 49th student starting with the 3rd student is
selected.
g) At a college there are 120 certificate level students, 90 diploma students, 110
degree students, and 80 master students. A school administrator selects a simple
random sample of 12 of the certificate level students, a simple random sample of
9 of the diploma students, a simple random sample of 11 of the degree students,
and a simple random sample of 8 of the master students. She then interviews all
the students selected.

7. The manager of Sun Rise Hotel is interested in determining the level of customer
satisfaction with the online room booking service provided by the hotel. She decided to
select a random sample of 100 customers who use the service and stay at the hotel
during the next 20 days. A questionnaire will be distributed upon the customers’ arrival
at the hotel. The questionnaire contains items to elicit information on the customers’
demographic profile, such as gender, monthly income and occupation, and on their
satisfaction level (from 0, very poor, to 5, very good).
a) State the population for the above study.
b) State the sample for the above study.
c) Does the study involve primary or secondary data? State your reason.
d) State one advantage and one disadvantage of using the type of data stated in (c)
above.
e) State any TWO (2) qualitative variables that can be obtained from the
questionnaire. Determine their level of measurement. State the appropriate
graphical presentation for each of the variables.
f) State any TWO (2) quantitative variables that can be obtained from the
questionnaire. Determine their level of measurement. State the appropriate
graphical presentation for each of the variables.

8. The internet serves as a useful tool for suppliers and customers to communicate
information and enables purchasing online. Over the last two decades, various
approaches have been used to investigate travelers’ online satisfaction and purchase
intention. Because of their importance in the travel industry, a study was conducted by a
team of researchers from Royal Holiday Resort with the aim of providing outstanding,
high-quality service to their customers. The target population for this study was the
international and local travelers who use Royal Holiday Resort’s online purchasing
platform. The email questionnaire was sent to all 800 travelers, of various nationalities,
who used this platform. The email questionnaire instrument consisted of three sections.
The first part was designed to extract travelers’ demographic profile, including their age,
gender, race, nationality, working position, annual income, number of children, choice of
information search channels, hotel booking options, and number of frequently used
online purchasing channels. The second and third parts contain seven-point Likert
scale items, ranging from strongly agree (7) to strongly disagree (1), measuring
satisfaction with purchasing online. The satisfaction items cover staff service
quality, room quality, and general amenities.
a) State the population for the above study.
b) What is the sampling frame for the above study?
c) Does the study involve primary or secondary data? Give a reason to support your
answer.
d) State any FOUR (4) variables that can be obtained from the questionnaire. Hence,
determine the type of each variable and the corresponding scale of
measurement.

9. With the influx of mobile phones, housewives no longer rely on house phones
for easy communication. A national telephone company is interested in finding out ways to
improve its services so that house phones will stay in the market. A survey will be
conducted to elicit information on ways to improve the services provided by the
telephone company. A sampling frame consisting of 55500 names of telephone
subscribers in a rural area will be used to choose a sample. These names are listed in
alphabetical order. A random sample of 1500 subscribers is needed for this survey, and
questionnaires will be mailed to each of them.
a) Identify the population of the study.
b) Identify the sample of the study.
c) Determine whether the study involves primary data or secondary data. Give a
reason to support your answer.
d) State the main objective of the study.
e) Can the sample be chosen using a random sampling technique? Give a reason for
your answer.
f) State a possible question that would be in the questionnaire and its measurement
level.
10. Due to the increasing number of tourist arrivals in Malaysia, it is important for hoteliers
to be aware of the different needs of these tourists. A study was conducted to examine the
differences in the importance ratings of hotel selection criteria between local and foreign
tourists. A random sample of 120 local tourists and 130 foreign tourists was selected,
and a questionnaire was distributed to each of the tourists. The questionnaire
collects information on the demographic profile of the tourists as well as the importance
ratings (0 to 100) of hotel selection criteria such as price, type of services provided,
distance and type of food provided.
a) State the population for the above study.
b) State the sample for the above study.
c) Does the study involve primary or secondary data? State your reason.
d) State ONE (1) advantage and ONE (1) disadvantage of using the type of data in (c).
e) State any TWO (2) qualitative and ONE (1) quantitative variable that can be obtained
from the questionnaire. Determine their level of measurement.
f) State the appropriate graphical presentation for each of the variables in (e).
11. The management of a popular culinary magazine, ‘Malaysian Cooks Book’, decided to
conduct a study to determine which characteristics of Malaysian food attract Arabian
tourists the most. In order to obtain the information, they decided to invite all Arabian
tourists who are in Kuala Lumpur in September 2015 to participate in a food creativity
festival. Invitations will be issued through the Saudi Arabian embassy and announcements
in newspapers, radio and television. At the festival, the tourists will be given
questionnaires and will be asked to evaluate a number of dishes by giving each a score
out of 100 according to its taste and presentation. In addition to the scores,
data regarding their age, race, gender, income, education, and the region they come
from will also be obtained from each tourist.
a) Identify the population of the study.
b) Determine whether the study involves primary data or secondary data. Give ONE
reason to support your answer.
c) State the main objective of the study.
d) State FOUR (4) of the variables mentioned in the scenario above and identify the
best graphical presentation for EACH variable.
e) State TWO (2) possible questions that would be in the questionnaire and their
measurement levels.

12. The tourism industry is one of the contributors to Malaysian economic growth. As a
result, the Ministry of Tourism consistently monitors whether travel agencies provide
good transportation services to tourists. A study was conducted to identify the
level of satisfaction with the transportation services provided by the agencies. Due
to limited time and costs, a random sample of 200 tourists was selected. Questionnaires
were distributed to tourists who used the transportation services. Among the information
collected were gender, age, number of visits to Malaysia, types of transportation used
during the vacation and their perception of the transportation services (very poor, poor,
fair, good, excellent).
a) Identify the population for the study.
b) Determine if the study employs a sample survey or a census. Give a reason to
support your answer.
c) State TWO (2) variables in the study. Hence, identify the types of variables and
the corresponding scales of measurement.
d) State an appropriate graphical presentation for each of the variables stated in (c).
e) State the most appropriate data collection method for the study. Give a reason to
support your answer.
STA404 : TUTORIAL 2 – DESCRIPTIVE STATISTICS

1. The scores (out of 100) given by foreign tourists to the services provided at the tourist
destinations they visited are summarized below:

Table 1 : Descriptive Statistics

Gender   N    Mean   Median   Standard deviation   Minimum   Maximum   Skewness
Male     28   84     88       7.2                  78        92        -1.6667
Female   22   83     80       6.8                  75        85         0.9184

a) How many female foreign tourists are selected in the study?


b) State the lowest score given by the male foreign tourists to the services provided at
the tourist destinations.
c) State the highest score given to the services provided at the tourist destinations.
d) State the skewness of the distribution of the male tourists' scores and explain what
it means.
e) Which gender is more consistent when giving scores to the services provided at
the tourist destinations? Give a reason for your answer.
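As a cross-check for parts (d) and (e), the sketch below recomputes the male skewness from the mean, median and standard deviation in Table 1, and compares the genders using the coefficient of variation (a smaller CV means more consistent scores). It assumes the median-based Pearson formula, which reproduces the tabulated male value of -1.6667; the CV is one common consistency measure, not necessarily the one your lecturer requires.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's (median-based) coefficient of skewness: 3(mean - median) / s."""
    return 3 * (mean - median) / sd


def coefficient_of_variation(mean, sd):
    """Coefficient of variation as a percentage: (s / mean) x 100."""
    return sd / mean * 100


# Summary values from Table 1
male_skew = pearson_skewness(84, 88, 7.2)
male_cv = coefficient_of_variation(84, 7.2)
female_cv = coefficient_of_variation(83, 6.8)

print(f"Male skewness: {male_skew:.4f}")   # negative value -> skewed to the left
print(f"CV (male):   {male_cv:.2f}%")
print(f"CV (female): {female_cv:.2f}%")     # the smaller CV indicates more consistent scores
```

The CV is generally preferred to the raw standard deviation when the groups have different means, because it measures spread relative to the mean.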

2. The weights (in kilograms) of the respondents selected for a study are summarized as
follows:
Weights of Respondents
3 | 9
4 | 5 7 9
5 | 2 4 5 5 7
6 | 0 0 3 4 5 6 8
7 | 2 2 2 3 5 5 5 5 7
8 | 0 0 3 4 4 4 6 6 8 8 8
9 | 3 3 5 7

a) State the name of the above chart.


b) How many respondents were involved in the study?
c) What is the mean weight of the respondents? Interpret the value.
d) State the lightest and the heaviest weights of the respondents.
e) If a respondent whose weight is above 80 kilograms is considered obese, what is
the percentage of obese respondents in this study?
f) What are the median and modal weights of the respondents? Explain the values
obtained.
g) State the skewness of the above distribution and explain what it means.
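A quick way to check parts (b) to (g) is to rebuild the raw weights from the stem-and-leaf plot. The sketch below assumes each stem is the tens digit and each leaf the units digit of a weight in kilograms (so "3 | 9" is 39 kg); that key is an assumption, since the plot above has no legend.

```python
from statistics import mean, median, mode, stdev

# Stem-and-leaf plot: stem = tens digit, leaf = units digit (assumed key, kg)
stems = {
    3: [9],
    4: [5, 7, 9],
    5: [2, 4, 5, 5, 7],
    6: [0, 0, 3, 4, 5, 6, 8],
    7: [2, 2, 2, 3, 5, 5, 5, 5, 7],
    8: [0, 0, 3, 4, 4, 4, 6, 6, 8, 8, 8],
    9: [3, 3, 5, 7],
}
weights = [10 * stem + leaf for stem, leaves in stems.items() for leaf in leaves]

n = len(weights)
obese_pct = 100 * sum(w > 80 for w in weights) / n   # strictly above 80 kg
skew = 3 * (mean(weights) - median(weights)) / stdev(weights)

print("n =", n, "| lightest =", min(weights), "| heaviest =", max(weights))
print("mean =", mean(weights), "| median =", median(weights), "| mode =", mode(weights))
print(f"obese = {obese_pct:.1f}% | Pearson skewness = {skew:.4f}")
```

Note that the two 80 kg respondents are not counted as obese, because the rule in part (e) says "above 80 kilograms".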

3. Table 1 below gives the descriptive statistics on the number of cigarettes smoked by a
sample of respondents.

a) How many respondents were involved in the survey?


b) From the table, use the values of the mean and mode to show that the Pearson
coefficient of skewness is –0.0073. State the shape of the distribution and explain
what it means.
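For reference, part (b) uses the mode-based form of Pearson's coefficient of skewness; the median-based form is shown alongside for comparison. Here $\bar{x}$, the mode, the median and $s$ are the sample mean, mode, median and standard deviation read from Table 1. A value close to 0 suggests an approximately symmetric distribution, a positive value a right (positive) skew, and a negative value a left (negative) skew.

```latex
Sk_{\text{mode}} = \frac{\bar{x} - \text{mode}}{s},
\qquad
Sk_{\text{median}} = \frac{3\,(\bar{x} - \text{median})}{s}
```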
4. The following data represent the ages of tourists at Hotel Fabulous Vacation in January
2011.

a) State the ages of the youngest and oldest tourists at Hotel Fabulous Vacation.


b) Compute the range of the data.
c) Find the mode, mean and standard deviation for the age of tourists at the hotel.
d) Find the first quartile, median and third quartile for the age of tourists at the hotel.
e) Find the Pearson coefficient of skewness and state the skewness of the
distribution. Explain what it means.

5. The following table shows sample statistics of the diameter of vehicle disc brakes
measured in millimeters.

a) From the table, state the values of the minimum, maximum, first quartile, median and
third quartile.
b) Sketch a box and whisker plot for the above data.
c) From the table, determine the value of the Pearson coefficient of skewness. State
the shape of the distribution.

6. Table 1 below shows the number of hours spent doing part-time jobs in a month by
10 workers in Town A and Town B respectively.

Table 2 below shows the descriptive statistics, from computer output, for the data in
Table 1.
Based on the information provided,
a) determine the values of P, Q, R, S and T.
b) interpret the meaning of the value Q.
c) compute and interpret the Pearson coefficient of skewness for the
number of hours spent doing part-time jobs by the workers from Town B.
d) using an appropriate measure, determine the town where the workers are more
consistent in the number of hours spent doing part-time jobs.

7. The diagram below shows box-and-whisker plots for the marks scored by students in
Quiz 2 and Quiz 3.

a) Describe the general shape of the distributions of the marks from both quizzes.
b) In which quiz are the marks more dispersed? Give a reason to support your
answer.
c) In which quiz do more students score higher marks?
d) Comparing the median values of the two quizzes, in which quiz do 50% of the
students score higher marks?
8. Figure 1 shows the boxplot of family size (total family members) for two groups of tourists
(local and foreign). Based on the figure, answer the following questions.

a) Determine the shape of the distribution of family size for both local and foreign
tourists.
b) Which group shows the higher dispersion in family size? Give a reason to support
your answer.
c) Estimate the following for local tourists and interpret the meaning of each value:
i. Minimum
ii. Maximum
iii. First quartile
iv. Median
v. Third quartile
STA404: TUTORIAL 3 – SAMPLING DISTRIBUTION AND ESTIMATION (CONFIDENCE
INTERVAL FOR 1 MEAN)

For question 1 – 5, determine whether each statement is true or false. If the statement is
false, explain why.
1. A statistic is said to be an efficient estimator of a population parameter if, with increasing
sample size, it becomes almost certain that the value of the statistic comes very close
to that of the population parameter.
2. An interval estimate is a range of values used to estimate the shape of a population’s
distribution.
3. The most frequently used estimator of  is s.
4. Interval estimates are preferred over point estimates since a confidence level can be
specified.
5. For a specific confidence interval, the larger the sample size, the smaller the margin of
error will be.

6. A recent study showed that the modern working person experiences an average of 2.1
hours per day of distractions (phone calls, e-mails, impromptu visits, etc.). A random
sample of 50 workers for a large corporation found that these workers were distracted
an average of 1.8 hours per day and the population standard deviation was 20 minutes.
Estimate the true mean population distraction time with 90% confidence and compare
your answer to the results of the study.

7. A study of 415 university students showed that they had watched an average of 5000
hours of YouTube content. If the standard deviation is 900 hours, find the 95%
confidence interval for the mean of all students. If lecturers claimed that their students
watched 4000 hours, would the claim be believable?
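Questions 6 and 7 both use the z distribution, with margin of error $E = z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below is one way to check the arithmetic. For Question 7 it assumes that, because $n = 415$ is large, the given standard deviation of 900 can stand in for $\sigma$ (the usual large-sample approximation). Note that Question 6 states $\sigma$ in minutes, so it is converted to hours first.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval for a mean using the normal (z) distribution."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # e.g. 1.645 for 90%
    margin = z * sigma / sqrt(n)                           # E = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Question 6: sigma = 20 minutes = 20/60 hour, n = 50, 90% confidence
q6 = z_interval(1.8, 20 / 60, 50, 0.90)
# Question 7: large sample, so the standard deviation 900 is used in place of sigma
q7 = z_interval(5000, 900, 415, 0.95)

print(q6, "contains 2.1:", q6[0] <= 2.1 <= q6[1])
print(q7, "contains 4000:", q7[0] <= 4000 <= q7[1])
```

A claimed value that falls outside the interval is not supported by the sample at that confidence level.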

8. Ten randomly selected people were asked how long they slept at night. The mean time
was 7.1 hours, and the standard deviation was 0.78 hour. Find the 95% confidence
interval of the mean time. Assume the variable is normally distributed.

9. The posted speed limit on the Federal Highway in Kuala Lumpur is 90 km/h. Congestion
results in much slower actual speeds. A random sample of 25 vehicles clocked an
average speed of 70 km/h with a standard deviation of 0.48 km/h. Assume that
the speed follows a normal distribution.
a) Estimate the standard deviation of the population.
b) Estimate the standard error of the mean for this population.
c) What are the upper and lower limits of the confidence interval for the mean speed
given a desired confidence level of 95%?

10. The amounts spent on groceries (in RM) by 25 randomly selected housewives were
recorded. The amounts are taken from a normally distributed population. Based on the
information below, answer the following questions:
Table 1 : Summary statistics
                                   Mean     Std deviation
Amount spent for groceries (RM)    210.35   11.44

a) Construct a 95% confidence interval for the mean amount spent on groceries by the
housewives.
b) Based on the confidence interval obtained, can you conclude that the housewives
spent more than RM200 on groceries? Explain your answer.
11. A report prepared by the Academic Affairs of ABC College claimed that, on average,
students spent 3 hours per day studying. However, one of the deans at the college
believes that the mean study hours spent by students is actually more than 3 hours per
day. To investigate this claim, the dean carried out a survey and the following results
were obtained. Assume that the study hours follow a normal distribution.

Table 1 : One sample statistics

             N    Mean     Std deviation   Std error of mean
Study hour   20   3.3240   0.40829         0.0913

a) Show that the standard error of the mean is 0.0913.

b) Construct a 95% confidence interval for the mean study hours of the students per
day. Hence, can you conclude that the mean study hours of the students per day
is more than 3 hours? Explain your answer.
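Questions 8 to 11 all use the t distribution with $n - 1$ degrees of freedom: $SE = s/\sqrt{n}$ and $E = t_{\alpha/2,\,n-1} \times SE$. Python's standard library has no t distribution, so this sketch hard-codes the two-sided 95% critical values from a standard t table (2.262 for df = 9, 2.064 for df = 24, 2.093 for df = 19). For Question 9(a), the sample standard deviation 0.48 is taken as the point estimate of the population standard deviation.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Return (standard error, lower, upper) for a t confidence interval for a mean."""
    se = s / sqrt(n)              # standard error of the mean
    margin = t_crit * se          # E = t * s / sqrt(n)
    return se, xbar - margin, xbar + margin


# Critical values t(0.025, n - 1) read from a t table
q8 = t_interval(7.1, 0.78, 10, 2.262)          # df = 9
q9 = t_interval(70, 0.48, 25, 2.064)           # df = 24
q10 = t_interval(210.35, 11.44, 25, 2.064)     # df = 24
q11 = t_interval(3.3240, 0.40829, 20, 2.093)   # df = 19

for label, (se, lo, hi) in [("Q8", q8), ("Q9", q9), ("Q10", q10), ("Q11", q11)]:
    print(f"{label}: SE = {se:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

For Questions 10(b) and 11(b), the conclusion follows from whether the whole interval lies above the claimed value (RM200 or 3 hours).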
STA404: TUTORIAL 4 – ESTIMATION (CONFIDENCE INTERVAL FOR THE DIFFERENCE
BETWEEN 2 MEANS)

1. In parts of the eastern United States, whitetail deer are a major nuisance to farmers and
homeowners, frequently damaging crops, gardens, and landscaping. A consumer
organization arranges a test of two of the leading deer repellents A and B on the market.
Fifty-six unfenced gardens in areas having high concentrations of deer are used for the
test. Twenty-nine gardens are chosen at random to receive repellent A, and the other
27 receive repellent B. For each of the 56 gardens, the time elapsed between application
of the repellent and the appearance in the garden of the first deer is recorded. For
repellent A, the mean time is 101 hours. For repellent B, the mean time is 92 hours.
Assume that the two populations of elapsed times have normal distributions with
population standard deviations of 15 and 10 hours, respectively.

(a) Let 𝜇1 and 𝜇2 be the population means of elapsed times for the two repellents,
respectively. Find the point estimate of 𝜇1 − 𝜇2 . (answer: 9)
(b) Find a 97% confidence interval for 𝜇1 − 𝜇2 and interpret the interval.
(answer: (1.6529, 16.3471))

2. The management of New Century Bank claims that the mean waiting time for all
customers at its branches is less than that at the Public Bank, which is its main
competitor. A business consulting firm took a sample of 200 customers from the New
Century Bank and found that they waited an average of 4.5 minutes before being served.
Another sample of 300 customers taken from the Public Bank showed that these
customers waited an average of 4.75 minutes before being served. Assume that the
standard deviations for the two populations are 1.2 and 1.5 minutes, respectively.
Construct a 98% confidence interval for the difference between the two population
means. (answer: (-0.532, 0.032))
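When both population standard deviations are known, as in Questions 1 and 2, the interval is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. This sketch checks both answer keys; it computes the exact z critical value (about 2.1701 for 97% and 2.3263 for 98%) rather than a rounded table value.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(x1, x2, sigma1, sigma2, n1, n2, confidence):
    """CI for mu1 - mu2 when both population standard deviations are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x1 - x2                      # point estimate of mu1 - mu2
    return diff - z * se, diff + z * se


q1 = two_sample_z_interval(101, 92, 15, 10, 29, 27, 0.97)
q2 = two_sample_z_interval(4.5, 4.75, 1.2, 1.5, 200, 300, 0.98)
print(q1)   # answer key: (1.6529, 16.3471)
print(q2)   # answer key: (-0.532, 0.032)
```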

3. The following information was obtained from two independent samples selected from
two normally distributed populations with unknown but equal standard deviations.
Sample 1: 47.7 46.9 51.9 34.1 65.8 61.5 50.2 40.8 53.1 46.1 47.9 45.7 49.0
Sample 2: 50.7 47.4 32.7 48.8 54.0 46.3 42.5 40.8 39.0 68.2 48.5 41.8

(a) Let 𝜇1 be the mean of population 1 and 𝜇2 be the mean of population 2. What is
the point estimate of 𝜇1 − 𝜇2 ? (answer: 2.56)
(b) Construct a 98% confidence interval for 𝜇1 − 𝜇2 . (answer: (- 5.92, 11.04))
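With unknown but equal population standard deviations, the two sample variances are pooled: $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$, and the interval uses $t$ with $n_1 + n_2 - 2 = 23$ degrees of freedom. The sketch below works directly from the raw data in Question 3; the critical value 2.500 is $t_{0.01,23}$ from a t table, hard-coded because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

sample1 = [47.7, 46.9, 51.9, 34.1, 65.8, 61.5, 50.2, 40.8, 53.1, 46.1, 47.9, 45.7, 49.0]
sample2 = [50.7, 47.4, 32.7, 48.8, 54.0, 46.3, 42.5, 40.8, 39.0, 68.2, 48.5, 41.8]

n1, n2 = len(sample1), len(sample2)
diff = mean(sample1) - mean(sample2)   # point estimate of mu1 - mu2

# Pooled variance from the two sample variances (n - 1 denominators)
sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = 2.500   # t(0.01, 23) for a 98% interval
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"point estimate = {diff:.2f}, 98% CI = ({lower:.2f}, {upper:.2f})")
```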

4. A company claims that its medicine, Brand A, provides faster relief from pain than
another company’s medicine, Brand B. A researcher tested both brands of medicine on
two groups of randomly selected patients. The results of the test are given in the
following table. The mean and standard deviation of relief times are in minutes.

Brand   Sample Size   Mean of Relief Times   Standard Deviation of Relief Times
A       25            44                     11
B       22            49                     9

Assume that the two populations are normally distributed with unknown but equal
standard deviations. Construct a 99% confidence interval for the difference between the
mean relief times for the two brands of medicine. (answer: (-12.996, 2.996))
5. A sample of 35 customers who drive luxury cars showed that their average distance
between oil changes was 5099 kilometres with a sample standard deviation of 67.80
kilometres. Another sample of 31 customers who drive compact lower-price cars resulted
in an average distance of 5142 kilometres with a standard deviation of 72.10 kilometres.
Suppose that the standard deviations of the two populations are not equal.

(a) Construct a 95% confidence interval for the difference in the mean distance
between oil changes for all luxury cars and all compact lower-price cars. (answer:
(- 77.585, - 8.415))
(b) Can you conclude from (a) if there is a difference in the mean distance between
oil changes for all luxury cars and all compact lower-price cars? Give a reason to
support your answer. (answer: Yes)

6. Quadro Corporation has two supermarkets in a city. The company’s quality control
department wanted to check if the customers are equally satisfied with the service
provided at these two stores. A sample of 380 customers selected from Supermarket I
produced a mean satisfaction index of 7.6 (on a scale of 1 to 10, 1 being the lowest and
10 being the highest) with a standard deviation of 0.75. Another sample of 370
customers selected from Supermarket II produced a mean satisfaction index of 8.1 with a
standard deviation of 0.59. Assume that the customer satisfaction index for each
supermarket has an unknown and different population standard deviation. Construct a
98% confidence interval for the difference in the mean satisfaction indexes of the
customers of the two supermarkets. (answer: (- 0.614, - 0.386))
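For Questions 5 and 6 the population standard deviations are unknown and unequal, so the standard error is $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ and the degrees of freedom come from the Welch-Satterthwaite formula. In this sketch the critical value is passed in: for Question 5 the Welch df is about 62, and the t-table value 2.000 (the df = 60 row) reproduces the answer key; for Question 6 the df is in the hundreds, so $t_{0.01}$ is essentially $z_{0.01} \approx 2.326$.

```python
from math import sqrt
from statistics import NormalDist


def welch_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Return (Welch df, lower, upper) for mu1 - mu2 with unequal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (used to choose t_crit from a table)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    diff = x1 - x2
    return df, diff - t_crit * se, diff + t_crit * se


q5 = welch_interval(5099, 67.80, 35, 5142, 72.10, 31, 2.000)
q6 = welch_interval(7.6, 0.75, 380, 8.1, 0.59, 370, NormalDist().inv_cdf(0.99))
print(q5)   # interval entirely below 0 -> the means differ
print(q6)
```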
STA404: TUTORIAL 5 – ESTIMATION (CONFIDENCE INTERVAL FOR THE DIFFERENCE
BETWEEN 2 MEANS – PAIRED SAMPLE)

1. A company believes that by setting up canteen facilities, the time spent by employees
during lunch could be reduced. A sample of 23 employees was selected, and the time
(in minutes) each spent during lunch before and after the canteen facilities were set up
was recorded. The results are shown in the following tables.

Based on the SPSS output,

i) Show that the standard error of the mean is 4.55.
ii) Show that the test statistic is 1.05.
iii) State the 95% confidence interval for the mean difference in lunch time before and
after the canteen was set up. Interpret the results obtained.

2. The weights (in kilograms) of five patients before they stopped smoking and five weeks
after they stopped smoking are given as follows:

Patient 1 2 3 4 5
Before 66 80 69 52 75
After 71 82 74 56 85

Using a 99% confidence interval, can you conclude that stopping smoking increased the
patients’ weight?

3. Examiner A and Examiner B are assigned to mark the students’ projects. The marks
given by the examiners for each of twelve randomly selected students’ projects are
recorded. The SPSS analysis is given below.

Student 1 2 3 4 5 6 7 8 9 10 11 12
Examiner A 53 53 39 47 72 61 64 51 68 59 58 53
Examiner B 42 44 42 56 65 63 58 59 89 87 72 79

Paired Samples Statistics

                     Mean    N    Std. Deviation   Std. Error Mean
Pair 1  Examiner A   56.58   12   9.120            2.633
        Examiner B   63.00   12   16.310           4.708

Paired Samples Test (Pair 1: Examiner A – Examiner B)

Mean     Std. Dev   Std. Error Mean   95% CI of the Difference (Lower, Upper)   t        df   Sig. (2-tailed)
-6.417   13.708     3.957             (-15.126, 2.293)                          -1.622   11   0.133

Based on the SPSS output,


i) Show that the standard error of the mean is 3.957.
ii) Show that the t-statistic is -1.622.
iii) Using the 95% confidence interval, is there a difference in the marks given by the two
examiners?

4. The following table shows the number of defective units produced during the day shift
and afternoon shift for a sample of six days.

Day 1 2 3 4 5 6
Day shift 11 8 12 15 14 17
Afternoon shift 8 6 10 13 10 14

Using a 99% confidence interval, can we conclude that more defective units are
produced during the day shift? Assume that the population of the number of defective
items is normally distributed.

5. To help overweight people reduce their weight, a nutritionist introduced a special
diet. Below are the recorded weights (in kilograms) of eight volunteers before and
after following the special diet.

Person 1 2 3 4 5 6 7 8
Before 72 105 99 75 80 77 89 70
After 67 90 89 69 72 72 79 67

Using a 99% confidence interval, can you conclude that the special diet is effective in
reducing weight?
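Paired questions reduce to a one-sample t interval on the differences $d$: $\bar{d} \pm t_{\alpha/2,\,n-1}\, s_d/\sqrt{n}$. The sketch below applies this to Questions 2, 4 and 5, choosing the direction of $d$ so that a positive mean supports each question's claim, and also checks the Question 3 SPSS figures from the reported summary. The 99% critical values (4.604, 4.032, 3.499 for df = 4, 5, 7) are hard-coded from a t table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_interval(first, second, t_crit):
    """Return (mean difference, lower, upper) for d = first - second."""
    d = [a - b for a, b in zip(first, second)]
    d_bar, s_d = mean(d), stdev(d)
    margin = t_crit * s_d / sqrt(len(d))
    return d_bar, d_bar - margin, d_bar + margin


# Q2: d = after - before; t(0.005, 4) = 4.604
q2 = paired_interval([71, 82, 74, 56, 85], [66, 80, 69, 52, 75], 4.604)
# Q4: d = day shift - afternoon shift; t(0.005, 5) = 4.032
q4 = paired_interval([11, 8, 12, 15, 14, 17], [8, 6, 10, 13, 10, 14], 4.032)
# Q5: d = before - after; t(0.005, 7) = 3.499
q5 = paired_interval([72, 105, 99, 75, 80, 77, 89, 70],
                     [67, 90, 89, 69, 72, 72, 79, 67], 3.499)

# Q3 check from the SPSS summary: SE = s_d / sqrt(n), t = d_bar / SE
se_q3 = 13.708 / sqrt(12)
t_q3 = -6.417 / se_q3

for label, result in [("Q2", q2), ("Q4", q4), ("Q5", q5)]:
    print(label, result)   # an interval that excludes 0 indicates a real difference
print(f"Q3: SE = {se_q3:.3f}, t = {t_q3:.3f}")
```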
STA404: TUTORIAL 6 – HYPOTHESIS TESTING (ONE-SAMPLE MEAN)

1. A researcher wishes to see if the mean number of days that a basic, low-price, small
automobile sits on a dealer’s lot is 29. A sample of 30 automobile dealers has a mean
of 30.1 days for basic, low-price, small automobiles. At 𝛼 = 0.05, test the claim that the
mean time is greater than 29 days. The standard deviation of the population is 3.8 days.
(Answer: 𝐻1 : 𝜇 > 29, critical value = 1.6449, test statistic = 1.59, do not reject 𝐻0 )

2. A medical investigator claims that the average number of infections per week at a
hospital is 16.3. A random sample of 10 weeks had a mean number of 17.7 infections.
The sample standard deviation is 1.8. Is there enough evidence to reject the
investigator’s claim at 𝛼 = 0.05? Assume that the population distribution of the number
of infections is approximately normally distributed.
(Answer: 𝐻1 : 𝜇 ≠ 16.3, critical values = ±2.262, test statistic = 2.459, reject 𝐻0 )

3. A survey found that the average hotel room rate in Shah Alam is RM176.84 and the
average room rate in Klang is RM161.22. Assume that the data were obtained from two
samples of 50 hotels each and that the standard deviations of the populations are
RM11.24 and RM9.66, respectively. At 𝛼 = 0.05, can it be concluded that there is a
significant difference in the rates?
(Answer: 𝐻1 : 𝜇𝑆𝐴 − 𝜇𝐾 ≠ 0, critical values = ±1.96, test statistic = 7.45, reject 𝐻0 )

4. The amounts spent on groceries (RM) by 25 randomly selected housewives were recorded.
Based on the information given, answer the following questions:

Table 1 : Summary Statistics

                                   Mean     Std. Deviation
Amount spent for groceries (RM)    210.35   11.44

Specify the null and alternative hypotheses for testing whether housewives spent more
than RM200 on groceries. Perform a test at the 5% significance level.
(Answer : Test statistic = 4.52. Critical value = 1.711. Decision: reject 𝐻0 )

5. A report prepared by the Academic Affairs Department of UCC College claimed that, on
average, students spent 3 hours per day studying. However, one of the deans at
the college believes that the mean study hours is actually more than 3 hours per day.
To investigate this claim, the dean carried out a survey and the following results were
obtained.

Table 2 : Summary Statistics

              N    Mean     Std. Deviation
Study hours   20   3.3240   0.40829

Perform a test of hypotheses at the 5% significance level to test the claim made by one of
the deans at the college that the mean study hours is actually more than 3 hours per day.
Show all steps clearly.
(Answer : Test statistic = 3.5487. Critical value=1.729. Decision: reject 𝐻0 )
6. The performances, measured by the time taken (in seconds), of 21 finishers in the men’s
2000-meter event at a championship were analyzed, and the computer output is given
below:

Table 3 : One-Sample Statistics

       N    Mean       Std. Deviation   Std. Error Mean
Time   21   407.6143   17.6038          P

Table 4 : One-Sample Test (Test Value = 420)

       t        df   Sig. (2-tailed)   Mean Difference   98% Confidence Interval of the Difference
                                                          Lower       Upper
Time   -3.224   20   0.004             -12.3857          -22.097     -2.675

a) Determine the value of P.
b) Based on the p-value in the computer output, is there any evidence that the
average performance is significantly different from 420 seconds? Use 𝛼 = 0.01.
(Answer : a) P = 3.841, b) p-value = 0.004. Decision: reject 𝐻0 )

7. It is claimed that many members of the working population suffer from depression. Forty
participants are recruited for a psychological intervention study. The researcher performs
a test on each participant to characterize his/her depression level based on a particular
depression index. Anyone who achieves a score of 4.0 is defined to have a ‘normal’ level
of depression. Meanwhile, lower scores indicate a lower level of depression and higher
scores indicate a greater level of depression. Depression scores are recorded in terms of
the variable dep_score. An analysis has been conducted and the results are reported in
the following tables.

Table 4 : One-Sample Statistics

            N    Mean     Std. Deviation   Std. Error Mean
Dep_score   40   3.7225   0.73709          0.11654

Table 5 : One-Sample Test (Test Value = 4)

            t        df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                               Lower       Upper
Dep_score   -2.381   39   0.022             -0.27750          -0.5132     -0.0418

Based on the above tables, answer the following questions.

a) Show that the standard error of the mean is 0.11654.
b) State the null and alternative hypotheses for the above study.
c) Using the p-value, decide at the 5% significance level whether there is sufficient
evidence that the mean depression score of the participants differs from the ‘normal’
level of 4.0.
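The SPSS one-sample output in Questions 6 and 7 can be checked by hand: the standard error is $s/\sqrt{n}$, the t value is the mean difference divided by that standard error, and $H_0$ is rejected when the two-tailed p-value is below $\alpha$. The sketch below reproduces P for Question 6 and the standard error for Question 7 from the reported figures.

```python
from math import sqrt

# Q6: P = s / sqrt(n), then t = mean difference / P
se_q6 = 17.6038 / sqrt(21)
t_q6 = -12.3857 / se_q6
# Q7: standard error and t value from Tables 4 and 5 of that question
se_q7 = 0.73709 / sqrt(40)
t_q7 = -0.27750 / se_q7

# Decision rule: reject H0 when the two-tailed p-value is below alpha
reject_q6 = 0.004 < 0.01
reject_q7 = 0.022 < 0.05

print(f"Q6: P = {se_q6:.3f}, t = {t_q6:.3f}, reject H0: {reject_q6}")
print(f"Q7: SE = {se_q7:.5f}, t = {t_q7:.3f}, reject H0: {reject_q7}")
```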
STA404: TUTORIAL 7 – HYPOTHESIS TESTING (TWO POPULATION MEANS – LARGE
SAMPLE AND EQUAL VARIANCE)

1. A survey found that the average hotel room rate in Shah Alam is RM176.84 and the
average room rate in Klang is RM161.22. Assume that the data were obtained from two
samples of 50 hotels each and that the standard deviations of the populations are
RM11.24 and RM9.66, respectively. At 𝛼 = 0.05, can it be concluded that there is a
significant difference in the rates?
(Answer: 𝐻1 : 𝜇𝑆𝐴 − 𝜇𝐾 ≠ 0, critical values = ±1.96, test statistic = 7.45, reject 𝐻0 )
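With both population standard deviations known, the test statistic is $z = (\bar{x}_{SA} - \bar{x}_K)/\sqrt{\sigma_{SA}^2/n_{SA} + \sigma_K^2/n_K}$, compared with $\pm z_{0.025}$ for a two-tailed test at $\alpha = 0.05$. A minimal sketch of that calculation:

```python
from math import sqrt
from statistics import NormalDist

x_sa, x_k = 176.84, 161.22          # sample mean room rates (RM)
sigma_sa, sigma_k = 11.24, 9.66     # population standard deviations (RM)
n_sa = n_k = 50

z = (x_sa - x_k) / sqrt(sigma_sa**2 / n_sa + sigma_k**2 / n_k)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed critical value at alpha = 0.05
print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {abs(z) > z_crit}")
```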

2. A researcher hypothesizes that people who are allowed to sleep for only four hours will
score significantly lower on a cognitive skills test than people who are allowed to sleep
for eight hours. He brings sixteen participants into his sleep lab and randomly assigns
them to one of two groups. He has the participants in one group sleep for eight hours
and those in the other group sleep for four hours. The next morning, he
administers the SCAT (Sam’s Cognitive Ability Test) to all participants. Scores on the
SCAT range from 1 – 9, with higher scores representing better performance.

Sleeping time    SCAT Scores
8-hour sleep     5  7  5  3  5  3  3  9
4-hour sleep     8  1  4  6  6  4  1  2

Using the computer output below, test whether there is a difference between the two mean
scores at the 5% level of significance using the t-value (assume equal variances).

Group Statistics
              Group    N   Mean   Std. Deviation   Std. Error Mean
SCAT_Score    8-hour   8   5.00   2.138            .756
              4-hour   8   4.00   2.563            .906

Independent Samples Test (SCAT_Score)
                              Levene's Test   t-test for Equality of Means
                              F      Sig.     t      df       Sig. (2-tailed)   Mean Diff   Std. Error Diff   95% CI Lower   95% CI Upper
Equal variances assumed       .500   .491     .847   14       .411              1.000       1.180             -1.531         3.531
Equal variances not assumed                   .847   13.563   .412              1.000       1.180             -1.539         3.539

(Answer : Do not reject Ho)
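The pooled-variance t statistic in the output can be reproduced from the raw SCAT scores. Below is a minimal Python sketch that uses only the standard library. The critical value 2.145 is the usual t table entry for α/2 = 0.025 with 14 degrees of freedom. It is hardcoded here as an assumption, not computed.

```python
import math
from statistics import mean, stdev

eight_hour = [5, 7, 5, 3, 5, 3, 3, 9]
four_hour = [8, 1, 4, 6, 6, 4, 1, 2]

n1, n2 = len(eight_hour), len(four_hour)
s1, s2 = stdev(eight_hour), stdev(four_hour)   # sample SDs (n - 1 divisor)

# Pooled variance under the equal-variance assumption
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = (mean(eight_hour) - mean(four_hour)) / se
df = n1 + n2 - 2

t_crit = 2.145  # t table value for alpha/2 = 0.025, df = 14
print(f"t = {t_stat:.3f}, df = {df}, SE = {se:.3f}, reject H0: {abs(t_stat) > t_crit}")
```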

3. Last semester, male students outperformed female students on a statistics test given
at a college. The statistics test scores obtained this semester from a random sample of
male and female students are shown below. Assume that the population variances are equal.

Male 34 34 30 18 24 16 15 26 13 21 24 11 11 15
Female 18 33 27 11 23 23 18 20 26 10 20 22 14 21

The data collected were analysed, and the output is given below:
Group Statistics
              Group    N    Mean    Std. Deviation   Std. Error Mean
Stat_Score    Male     14   20.86   7.999            2.138
              Female   14   20.43   6.198            1.657

Independent Samples Test (Stat_Score)
                              Levene's Test   t-test for Equality of Means
                              F       Sig.    t      df       Sig. (2-tailed)   Mean Diff   Std. Error Diff   95% CI Lower   95% CI Upper
Equal variances assumed       2.051   .164    .158   26       .875              .429        2.704             -5.130         5.988
Equal variances not assumed                   .158   24.475   .875              .429        2.704             -5.147         6.005

a) Using Levene’s test, show that the assumption of equal variances is reasonable for
these data.
b) At the 5% level of significance, test whether male students still outperform female
students in the statistics test this semester.
(Answer : p‐value = 0.438 > 0.05. Do not reject Ho)
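For part (b), the test is right-tailed (H1: μmale − μfemale > 0). The p-value is therefore the upper-tail area beyond the observed t. Because t is positive, this equals half of the two-tailed Sig. value (0.875 / 2). The sketch below recomputes t from the raw scores and approximates that tail area with Simpson's rule. `t_pdf` and `t_sf` are illustrative helpers, not a library API.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t: float, df: float, steps: int = 2000) -> float:
    """Upper-tail area P(T > t), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


male = [34, 34, 30, 18, 24, 16, 15, 26, 13, 21, 24, 11, 11, 15]
female = [18, 33, 27, 11, 23, 23, 18, 20, 26, 10, 20, 22, 14, 21]

n1, n2 = len(male), len(female)
sp2 = ((n1 - 1) * stdev(male) ** 2 + (n2 - 1) * stdev(female) ** 2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = (mean(male) - mean(female)) / se
df = n1 + n2 - 2
p_one = t_sf(t_stat, df)   # right-tailed p-value for H1: mu_male - mu_female > 0

print(f"t = {t_stat:.3f}, df = {df}, one-tailed p = {p_one:.3f}")
```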

4. Two methods of learning a computer program are the visual manual and the textual
manual. A study was conducted to investigate whether students’ comprehension scores
differ between the two methods. A random sample of 36 students is selected and divided
equally into two groups. Students in Group 1 use the visual manual, while students in
Group 2 use the textual manual.

Based on the computer output, answer the following questions.

Group Statistics
          Group   N    Mean    Std. Deviation   Std. Error Mean
Sample    1       18   62.20   12.0             2.8
          2       18   54.40   9.0              2.1

t-test for Equality of Means
                                  t      df   Sig. (2-tailed)   Mean Diff   Std. Error Diff   95% CI Lower   95% CI Upper
Sample  Equal variances assumed   2.21   34   0.034             7.80        3.5355            0.61493        14.98507

Using the p-value approach, is there sufficient evidence at the 5% significance level to
conclude that there is a significant difference in the average comprehension score between
students using the visual manual and students using the textual manual?
(Answer : Reject 𝐻0 ).
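The standard error, t value and confidence interval in this output follow from the group summaries under the equal-variance (pooled) assumption. Below is a minimal Python sketch. The value 2.032 for t with α/2 = 0.025 and 34 degrees of freedom is an approximate table value, supplied as an assumption.

```python
import math

# Summary statistics from the Group Statistics table
n1, mean1, sd1 = 18, 62.20, 12.0   # Group 1: visual manual
n2, mean2, sd2 = 18, 54.40, 9.0    # Group 2: textual manual

# Pooled variance and standard error of the difference in means
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
diff = mean1 - mean2
t_stat = diff / se
df = n1 + n2 - 2

# Approximate t table value for alpha/2 = 0.025 with 34 degrees of freedom
t_crit = 2.032
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"SE = {se:.4f}, t = {t_stat:.2f}, df = {df}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```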
5. Two groups of drivers are surveyed to see how many kilometres per week they drive for
pleasure trips. The recorded data were analysed using SPSS. The output of the
statistical analysis is shown in the following tables.

Group Statistics
                 Types of drivers   N    Mean       Std. Deviation   Std. Error Mean
Distance in km   Single drivers     35   196.4779   32.62732         5.51502
                 Married drivers    35   189.1209   25.94217         4.38503

Independent Samples Test (Distance in km)
                              Levene's Test   t-test for Equality of Means
                              F       Sig.    t       df       Sig. (2-tailed)   Mean Diff   Std. Error Diff   95% CI Lower   95% CI Upper
Equal variances assumed       1.102   .298    1.044   68       .300              7.35700     7.04585           -6.70277       21.41677
Equal variances not assumed                   1.044   64.714   .300              7.35700     7.04585           -6.71570       21.42971

a) Based on p-value in the Levene’s Test, test the equality of variances in this study. Use
𝛼 = 0.05.
b) State the null and alternative hypotheses to test whether single drivers do more
driving for pleasure trips, on average, than married drivers.
c) At 5% significance level, can it be concluded that the single drivers do more driving for
pleasure trips on average than the married drivers?
(Answer : a) Equal variances are assumed, b) 𝐻0 : 𝜇1 − 𝜇2 = 0, 𝐻1 : 𝜇1 − 𝜇2 > 0 c) p-
value = 0.15. Decision: Do not reject 𝐻0 ).
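Part (c) is right-tailed. With a positive t, the p-value is half of the two-tailed Sig. value (0.300 / 2 = 0.15). The sketch below recomputes the pooled standard error, t and the one-tailed p-value from the Group Statistics table. It assumes the same illustrative Simpson's-rule helpers (`t_pdf`, `t_sf`) for the t distribution.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t: float, df: float, steps: int = 2000) -> float:
    """Upper-tail area P(T > t), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


n1, mean1, sd1 = 35, 196.4779, 32.62732   # single drivers
n2, mean2, sd2 = 35, 189.1209, 25.94217   # married drivers

sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)   # pooled variance
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se
df = n1 + n2 - 2
p_one = t_sf(t_stat, df)   # right-tailed p-value for H1: mu1 - mu2 > 0

print(f"SE = {se:.5f}, t = {t_stat:.3f}, df = {df}, one-tailed p = {p_one:.3f}")
```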
STA404: TUTORIAL 8 – HYPOTHESIS TESTING (TWO POPULATION MEANS –
UNEQUAL VARIANCE AND PAIRED SAMPLE)

UNEQUAL VARIANCE

1. The average size of a farm on Pulau Meranti is 191 m². The average size of a farm on
Pulau Jati is 199 m². Assume the data were obtained from two samples with standard
deviations of 38 m² and 12 m², respectively, and sample sizes of 8 and 10, respectively.
At 𝛼 = 0.05, can it be concluded that the average size of the farms on the two islands is
different? Assume the populations are normally distributed with unequal variances.
(Answer: 𝐻1 : 𝜇𝑀 − 𝜇𝐽 ≠ 0, 𝜈 = 8, critical values = ±2.306, test statistic = −0.573, do
not reject 𝐻0 )
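The answer's degrees of freedom, ν = 8, matches the Welch-Satterthwaite approximation rounded down to a whole number. The sketch below (Python, standard library only) assumes that convention. The value 2.306 is the t table entry for α/2 = 0.025 with 8 degrees of freedom.

```python
import math

# Summary figures from the question
n1, mean1, sd1 = 8, 191, 38    # Pulau Meranti
n2, mean2, sd2 = 10, 199, 12   # Pulau Jati

v1, v2 = sd1**2 / n1, sd2**2 / n2   # estimated variance of each sample mean
se = math.sqrt(v1 + v2)             # unequal-variance standard error
t_stat = (mean1 - mean2) / se

# Welch-Satterthwaite degrees of freedom, rounded down for use with a t table
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
df_table = math.floor(df)

t_crit = 2.306  # t table value, alpha/2 = 0.025, df = 8
print(f"t = {t_stat:.3f}, df = {df:.2f} -> {df_table}, reject H0: {abs(t_stat) > t_crit}")
```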
2. The numbers of points scored by a sample of National Hockey League scorers from the
Eastern region and the Western region are summarised in the SPSS output below. Assume
that both populations are normally distributed with unequal variances.

Group Statistics
          Group     N    Mean    Std. Deviation   Std. Error Mean
Sample    Eastern   11   65.73   9.12             2.8
          Western   9    60.20   11.4             3.8

t-test for Equality of Means
                                                   t      df   Sig. (2-tailed)   Mean Diff   Std. Error Diff   95% CI Lower   95% CI Upper
Eastern - Western  Equal variances not assumed     1.17   15   0.259             5.51        4.691             -4.49          15.50

a) Show that the test statistic, 𝑡 = 1.17.


b) At 𝛼 = 0.05, is there enough evidence to conclude that there is a difference in
means based on output? Use 𝑝-value.
(Answer: 𝐻1 : 𝜇𝐸 − 𝜇𝑊 ≠ 0, 𝑝-value = 0.259 > 𝛼 = 0.05, do not reject 𝐻0 )
c) State the 95% confidence interval for the mean difference of the two populations.
Is the interval consistent with the conclusion in (b)? Explain your answer.
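Part (a) can be checked from the standard deviations in the Group Statistics table using the unequal-variance (Welch) standard error. The rounded group means differ by 5.53, but the output reports a mean difference of 5.51. The sketch uses the reported 5.51, which reproduces t = 1.17. It also addresses part (c) by checking whether 0 lies inside the reported interval.

```python
import math

n_e, sd_e = 11, 9.12    # Eastern
n_w, sd_w = 9, 11.4     # Western
mean_diff = 5.51        # "Mean Diff" as reported in the output

v_e, v_w = sd_e**2 / n_e, sd_w**2 / n_w
se = math.sqrt(v_e + v_w)                   # Welch standard error
t_stat = mean_diff / se
df = (v_e + v_w) ** 2 / (v_e**2 / (n_e - 1) + v_w**2 / (n_w - 1))

ci_lower, ci_upper = -4.49, 15.50           # 95% CI from the output
zero_inside = ci_lower <= 0 <= ci_upper     # 0 inside the 95% CI <=> do not reject H0 at 5%

print(f"SE = {se:.3f}, t = {t_stat:.2f}, df = {df:.1f}, 0 in CI: {zero_inside}")
```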

PAIRED SAMPLE
3. A company is considering installing new machines to assemble its products. The
company is considering two types of machines, but it will buy only one type. The
company selected eight assembly workers and asked them to use these two types of
machines to assemble products. The following table gives the time taken (in minutes) to
assemble one unit of the product on each type of machine for each of these eight
workers. Test at 5% significance level whether the mean time taken to assemble a unit
of product is different for the two types of machines. Assume the population of the paired
differences is (approximately) normally distributed.

Paired Samples Statistics
                    Mean    N   Std. Deviation   Std. Error Mean
Pair 1  Machine_1   22.38   8   3.249            1.149
        Machine_2   24.00   8   2.000            .707

Paired Samples Test
                                  Paired Differences
                                  Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
Pair 1  Machine_1 - Machine_2     -1.625   3.583            1.267             -4.621         1.371          -1.283   7    .240

(Answer: 𝐻1 : 𝜇𝑑 ≠ 0, do not reject 𝐻0 )
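The paired t statistic and interval are computed from the mean and standard deviation of the differences. Below is a minimal Python sketch. The value t = 2.365 (α/2 = 0.025, 7 degrees of freedom) is taken from a standard t table and hardcoded as an assumption.

```python
import math

n, d_mean, d_sd = 8, -1.625, 3.583   # paired differences (Machine_1 - Machine_2)

se = d_sd / math.sqrt(n)             # standard error of the mean difference
t_stat = d_mean / se
df = n - 1

t_crit = 2.365  # t table value, alpha/2 = 0.025, df = 7
ci = (d_mean - t_crit * se, d_mean + t_crit * se)

print(f"SE = {se:.3f}, t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```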

4. An archery equipment manufacturer has developed a new arrow release. The
manufacturer wants to know whether it performs better than the old arrow release. Fifty
archers of varying abilities are selected, and their average scores with both types of
release are measured. The computer output of the scores is given below:

Paired Samples Statistics
               Mean     N    Std. Deviation   Std. Error Mean
Pair 1  New    129.42   50   10.82            1.53
        Old    129.70   50   6.31             .89

Paired Samples Test
                         Paired Differences
                         Std. Error Mean   95% CI Lower   95% CI Upper   t       df   Sig. (2-tailed)
Pair 1  New - Old        1.23              -2.76          2.20           -0.23   49   .822

a) Show that the value of the test statistic is -0.23.
b) Give the null and alternative hypotheses.
c) Test at the 5% level of significance whether the new arrow release is an
improvement over the old arrow release.
d) State the 95% confidence interval for the mean difference.
e) Does the result in (d) above support your conclusion in (c)? Explain.

(Answer : b) 𝐻0 : 𝜇𝑑 = 0, 𝐻1 : 𝜇𝑑 > 0 c) 𝑝-value = 1 − 0.822/2 = 0.589, do not reject 𝐻0 d) (-2.76, 2.20) )
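Part (a) divides the difference of the two means by the Std. Error Mean in the output. The difference of the means equals the mean of the paired differences. Because H1 is right-tailed (μd > 0) while t is negative, the p-value is P(T > t) = 1 − 0.822/2 ≈ 0.589. Halving the two-tailed value (giving 0.411) would measure the tail on the wrong side. The sketch approximates the tail with illustrative Simpson's-rule helpers, not a library function.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t: float, df: float, steps: int = 2000) -> float:
    """Upper-tail area P(T > t), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


mean_new, mean_old = 129.42, 129.70   # Paired Samples Statistics
se_d = 1.23                           # Std. Error Mean of the paired differences
n = 50

d_bar = mean_new - mean_old           # mean of the (New - Old) differences
t_stat = d_bar / se_d
df = n - 1
p_right = t_sf(t_stat, df)            # H1: mu_d > 0, so p = P(T > t)

print(f"t = {t_stat:.2f}, df = {df}, right-tailed p = {p_right:.3f}")
```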


STA404: TUTORIAL 9 – HYPOTHESIS TESTING FOR MORE THAN TWO POPULATION
MEANS (ONE-WAY ANOVA)

1. A university employment office wants to compare the time taken by graduates with three
different majors to find their first full-time job after graduation. The following table lists
the time (in days) taken to find their first full-time job after graduation for a random
sample of eight business majors, seven computer science majors and six engineering
majors who graduated last year.

Business           208  162  240  180  148  312  176  292
Computer Science   156  113  281  128  305  147  232
Engineering        126  275  363  146  298  392

At the 5% significance level, using the F-value, can you conclude that the mean time taken
to find a first full-time job is the same for last year’s graduates in all three fields?
(Answer: do not reject 𝐻0 )
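A minimal one-way ANOVA in Python (standard library only) reproduces the F-value from the raw data. `one_way_anova` is an illustrative helper. The value 3.55 is the F table entry for α = 0.05 with (2, 18) degrees of freedom, hardcoded as an assumption.

```python
from statistics import fmean

days_to_job = {
    "Business": [208, 162, 240, 180, 148, 312, 176, 292],
    "Computer Science": [156, 113, 281, 128, 305, 147, 232],
    "Engineering": [126, 275, 363, 146, 298, 392],
}


def one_way_anova(groups):
    """Return (SSB, SSW, df_between, df_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, df_between, df_within, f_stat


ssb, ssw, df_b, df_w, f_stat = one_way_anova(list(days_to_job.values()))
f_crit = 3.55  # F table value, alpha = 0.05, df = (2, 18)
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_stat:.3f}, reject H0: {f_stat > f_crit}")
```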

2. The following ANOVA table, based on information obtained for three samples selected
from three independent populations that are normally distributed with equal variances,
has a few missing values.

ANOVA table
Source of      Degrees of   Sum of     Mean       Value of the
Variation      Freedom      Squares    Square     Test Statistic
Treatments     2                       19.2813    𝐹 = ____ / ____ = ____
Error                       89.3677
Total          12

(a) Find the missing values and complete the ANOVA table.
(b) Using 𝛼 = 0.01, what is your conclusion for the test of the null hypothesis that the
means of the three populations are all equal against the alternative hypothesis that
the means of the three populations are not all equal?
(Answer: (a) 𝐹 = 2.1575 (b) critical value= 7.56, do not reject 𝐻0 )
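Completing the table requires only three identities: df_total = df_treatments + df_error, MS = SS / df and F = MS_treatments / MS_error. The sketch reads 19.2813 as the treatment mean square and 89.3677 as the error sum of squares. This is the reading that reproduces F = 2.1575 in the answer.

```python
# Entries given in the table (reading 19.2813 as the treatment mean square)
df_treat = 2
ms_treat = 19.2813
ss_error = 89.3677
df_total = 12

df_error = df_total - df_treat        # 10
ss_treat = ms_treat * df_treat        # SS = MS x df
ms_error = ss_error / df_error
ss_total = ss_treat + ss_error
f_stat = ms_treat / ms_error

f_crit = 7.56  # F table value, alpha = 0.01, df = (2, 10)
print(f"SS_treat = {ss_treat:.4f}, MS_error = {ms_error:.5f}, SS_total = {ss_total:.4f}")
print(f"F = {f_stat:.4f}, reject H0: {f_stat > f_crit}")
```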
3. The following data represent the final grades obtained by 5 students in Mathematics,
English, French, and Biology:

Subject
Student Math English French Biology
1 68 57 73 61
2 83 94 91 86
3 72 81 63 59
4 55 73 77 66
5 92 68 75 87

Below is the SPSS output of the data above :

ANOVA
Grade
                  Sum of Squares   df    Mean Square   F       Sig.
Between Groups    42.150           (i)   14.050        (iii)   .969
Within Groups     2730.800         16    (ii)
Total             2772.950         19

(a) Show that the total sum of squares is 2772.95.

(b) Find the missing values (i), (ii) and (iii)

(c) Test the hypothesis that the courses are of equal difficulty. Use 𝛼 = 0.10.

(Answer : (b) (i) 3, (ii) 170.675 (iii) 0.082 (c) Do not reject 𝐻0 )
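Part (a) and the missing entries can be checked from the raw grades, grouped by subject. Below is a minimal sketch using `statistics.fmean`. The variable names are illustrative.

```python
from statistics import fmean

grades = {
    "Math": [68, 83, 72, 55, 92],
    "English": [57, 94, 81, 73, 68],
    "French": [73, 91, 63, 77, 75],
    "Biology": [61, 86, 59, 66, 87],
}

values = [x for g in grades.values() for x in g]
grand_mean = fmean(values)

sst = sum((x - grand_mean) ** 2 for x in values)                        # total SS
ssb = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in grades.values())
ssw = sst - ssb

df_b = len(grades) - 1                 # (i)
df_w = len(values) - len(grades)
ms_w = ssw / df_w                      # (ii)
f_stat = (ssb / df_b) / ms_w           # (iii)

print(f"SST = {sst:.2f}, df_between = {df_b}, MS_within = {ms_w:.3f}, F = {f_stat:.3f}")
```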

4. A billiards parlour in a small town is open just 4 days per week, i.e., Thursday through
Sunday. Revenue varies considerably from day to day and week to week, so the owner
is not sure whether some days of the week are more profitable than others. He takes
random samples of 5 Thursdays, 5 Fridays, 5 Saturdays and 5 Sundays from last year’s
records and lists the revenue for these 20 days. His bookkeeper finds the average
revenue for each of the four samples and then calculates ∑𝑥² over all 20 days. The results
are shown in the following table; the value of ∑𝑥² came out to be 2,890,000.
Day Mean Revenue (RM) Sample Size
Thursday 295 5
Friday 380 5
Saturday 405 5
Sunday 345 5

At a 1% level of significance, can you conclude that the mean revenue is the same for
each of the four days of the week?

(Answer: critical value = 5.29, test statistic = 0.5725, do not reject 𝐻0 )
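This problem gives only the group means, the common sample size and ∑x². The sums of squares follow from SST = ∑x² − (∑x)²/N and SSB = ∑ nᵢ(x̄ᵢ − x̄)², with SSW = SST − SSB. Below is a sketch in Python. The value 5.29 is the F table entry for α = 0.01 with (3, 16) degrees of freedom, supplied as an assumption.

```python
mean_revenue = {"Thursday": 295, "Friday": 380, "Saturday": 405, "Sunday": 345}
n_per_day = 5
sum_x_squared = 2_890_000      # bookkeeper's value of sum(x^2) over all 20 days

k = len(mean_revenue)
N = k * n_per_day
grand_total = n_per_day * sum(mean_revenue.values())   # sum(x) recovered from the means
grand_mean = grand_total / N

sst = sum_x_squared - grand_total**2 / N
ssb = sum(n_per_day * (m - grand_mean) ** 2 for m in mean_revenue.values())
ssw = sst - ssb
f_stat = (ssb / (k - 1)) / (ssw / (N - k))

f_crit = 5.29  # F table value, alpha = 0.01, df = (3, 16)
print(f"SST = {sst}, SSB = {ssb}, SSW = {ssw}, F = {f_stat:.4f}, reject H0: {f_stat > f_crit}")
```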


5. A large company buys thousands of lightbulbs every year. The company is currently
considering four brands of lightbulbs. Before deciding which lightbulbs to buy, it wants
to investigate whether the mean lifetimes of the four brands are the same. The company’s
research department randomly selected a few bulbs of each brand and tested them. The
following table lists the number of hours (in thousands) that each bulb lasted before
burning out.

Brand I 23 24 19 26 22 23 25
Brand II 19 23 18 24 20 22 19
Brand III 23 27 25 26 23 21 17
Brand IV 26 24 21 29 28 24 28

(a) Find the missing values (i), (ii) and (iii)

(b) At 2.5% significance level, using F-value, test the null hypothesis that the mean
lifetime of bulbs for each of these four brands is the same.

ANOVA
Lifetime
                  Sum of Squares   df    Mean Square   F       Sig.
Between Groups    87.536           3     (ii)          (iii)   .022
Within Groups     180.571          (i)   7.524
Total             268.107          27

(Answer: critical value = 3.72, test statistic, 𝐹 = 3.88, reject 𝐻0 )
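The missing entries follow from df_within = df_total − df_between, MS = SS / df and F = MS_between / MS_within. Below is a sketch. The value 3.72 is the F table entry for α = 0.025 with (3, 24) degrees of freedom, supplied as an assumption.

```python
ss_between, df_between = 87.536, 3
ss_within, ms_within_given = 180.571, 7.524
ss_total, df_total = 268.107, 27

df_within = df_total - df_between            # (i)
ms_between = ss_between / df_between         # (ii)
ms_within = ss_within / df_within            # should agree with 7.524 after rounding
f_stat = ms_between / ms_within              # (iii)

f_crit = 3.72  # F table value, alpha = 0.025, df = (3, 24)
print(f"(i) = {df_within}, (ii) = {ms_between:.3f}, (iii) = {f_stat:.2f}, reject H0: {f_stat > f_crit}")
```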


STA404: TUTORIAL 10 – HYPOTHESIS TESTING FOR TEST OF INDEPENDENCE

1. The manager of CBC Sdn Bhd wishes to determine whether the age of employees is
related to their level of job stress. The results are shown below:

a) Calculate the value of X.


b) State the null and alternative hypotheses.
c) At 1% significance level, what can you conclude about the relationship between
the age of the employees and their degree of job stress?
(Answer : a) 5.0788, b) H0 : Age and job stress are not related. H1 : Age and job
stress are related. c) Decision: Do not reject H0 )

2. A survey is conducted to investigate whether the level of support for social welfare
reform varies according to age. Respondents are grouped according to whether they
are aged ‘under 45’ or ‘45 and older’. The computer output for the data collected is
given below:
a) Calculate the values of A, B and C.
b) Give the appropriate hypotheses to test whether the level of support for social
welfare reform varies according to age.
c) Based on the p-value, can we conclude that the level of support for social welfare
reform varies according to age? Use 𝛼 = 0.05.
(Answer : a) A = 147, B = 67.5, C = 2 b) 𝐻0 : The level of support for social
welfare reform does not vary by age. 𝐻1 : The level of support for social welfare
reform varies by age. c) Decision: Do not reject H0 )

3. The table below shows the number of students in three departments, i.e. Chemistry,
French and Mechanical Engineering, at a leading university in Malaysia.

The crosstabulation below presents the relationship between the gender of the
students and their department of study.

Based on the above tables, answer the following questions:

a) State the total number of students in the three departments.


b) How many students are from the French department?
c) How many female (f) students are from the Chemistry department? Hence, show
that the expected frequency for the group is 15.5.
d) Based on the crosstabulation table, determine whether there is any association
between the gender of the students and the department of their studies at 5%
significance level.
(Answer : Test statistics = 2.905. Critical value = 5.991. Decision: Do not reject
𝐻0 )
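Parts (c) and (d) rely on the standard formulas for a chi-square test of independence. The formulas below are for a crosstabulation with r rows and c columns. For gender (r = 2) by department (c = 3), they give ν = 2. This matches the critical value 5.991 = χ²₀.₀₅,₂ used in the answer. H0 is rejected when the test statistic exceeds that value.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\nu = (r - 1)(c - 1)
```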
4. Many university graduates are deeply in debt from student loans, credit card debt and
so on. A sociologist took a random sample of graduates who were still single, classified
them by gender and asked, “Would you consider marrying someone who has RM25,000
or more in debt?” The crosstabulation of this survey is shown in the following table.

Based on the table, answer the following questions :

a) State the total number of students giving their opinions.


b) How many students give ‘No’ as their opinions?
c) How many female students give ‘Yes’ as their opinion? Hence, show that the
expected frequency is 115.5.
d) Determine whether there is an association between gender of the students and
their opinion at 5% significance level.
(Answer : d) 𝜒² = 5.9102. Critical value = 5.991. Decision: Do not reject 𝐻0 )

5. A lecturer at UWM College wishes to determine whether students’ gender is related to
their smoking habit. A sample of 180 students was selected at random. The data were
analyzed using SPSS, and the following tables were obtained.

a) Find the values of A and B.


b) State the hypotheses of the study.
c) Based on the p-value, can we conclude that the students’ gender is related to their
smoking habit? Use 𝛼 = 0.01.
(Answer : b) 𝐻0 : Gender and smoking habit are not related, H1 : Gender and
smoking habit are related c) p-value = 0.000. Reject H0 )
STA404: TUTORIAL 11 – BIVARIATE ANALYSIS (CORRELATION AND REGRESSION)

1. The following table gives information on the number of megapixels and the prices of nine
randomly selected point-and-shoot digital cameras that were available on bestbuy.com.
The scatter diagram and SPSS output are shown below.

Megapixels 10.3 10.2 7.0 9.1 10.0 12.1 8.0 5.0 14.7
Price (RM) 130 150 62 160 200 280 125 60 400

Based on the above information, answer the following questions.

(a) Identify the independent and dependent variables.


(b) By referring to the scatter diagram, describe the relationship between the two
variables.
(c) Using the data set above, show that Pearson’s correlation coefficient is 0.933
and interpret the value.
(d) State the coefficient of determination. Comment on this value.
(e) Write the regression equation.
(f) Show that the slope is 35.68 and give the interpretation in the context of the
problem.
(g) Estimate the price of the digital camera if the number of megapixels is 11.
(h) Test at 1% significance level whether the linear regression model is significant.
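The quantities in parts (c) to (h) can be reproduced from the raw data with the usual least-squares sums Sxx, Syy and Sxy. Below is a minimal sketch. The significance test uses t = r√((n − 2)/(1 − r²)). For simple linear regression, this is equivalent to the t test on the slope. The value 3.499 is the t table entry for α/2 = 0.005 with 7 degrees of freedom, supplied as an assumption.

```python
import math

megapixels = [10.3, 10.2, 7.0, 9.1, 10.0, 12.1, 8.0, 5.0, 14.7]   # x (independent)
price = [130, 150, 62, 160, 200, 280, 125, 60, 400]               # y (dependent), RM

n = len(megapixels)
x_bar = sum(megapixels) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in megapixels)
syy = sum((y - y_bar) ** 2 for y in price)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(megapixels, price))

r = sxy / math.sqrt(sxx * syy)    # Pearson's correlation coefficient
b1 = sxy / sxx                    # slope
b0 = y_bar - b1 * x_bar           # intercept
r_squared = r ** 2                # coefficient of determination
price_at_11 = b0 + b1 * 11        # part (g)

# Part (h): t = r * sqrt((n - 2) / (1 - r^2)), compared with the t table value
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))
t_crit = 3.499  # alpha/2 = 0.005, df = 7

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, price = {b0:.2f} + {b1:.2f} x megapixels")
print(f"estimate at 11 MP = RM{price_at_11:.2f}, t = {t_stat:.2f}, significant: {abs(t_stat) > t_crit}")
```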

2. The experience (in years) and monthly salaries (in hundreds of RM) of nine randomly
selected secretaries are tabulated in the table below.

Experience 14 3 5 6 4 9 18 5 16
Monthly salary 62 29 37 43 35 60 67 32 60

Below is the SPSS output of the above data.

Based on the SPSS output, answer the following questions :

(a) State the independent and dependent variables.


(b) Write down the regression equation.
(c) Show by calculation that the slope value is 2.438 and interpret its value in the
context of the problem.
(d) Determine the coefficient of correlation.
(e) State the coefficient of determination and explain its meaning.
(f) Based on the regression equation, estimate the monthly salary of a secretary
who has worked for 10 years.
(g) Perform a test to determine whether the linear regression model is significant.
Use 1% level of significance.
3. Determine whether each of the following statements is TRUE or FALSE.
a) The best fit regression line passes through the mean values of the data set.
b) Pearson’s correlation coefficient, r, measures the strength of linear relationship
between an independent variable and a dependent variable.
c) Coefficient of determination, 𝑅², denotes the proportion of variation in the predictor
variable that can be explained by the response variable.

4. The manager of Puteri Hotel is interested in assessing the impact of advertising by looking
at the monthly advertising cost (RM thousands) and the monthly income (RM millions)
for seven consecutive months. The data were recorded and analyzed using SPSS.
The results are as follows:

Monthly advertising
60 100 50 150 30 20 40
cost (RM thousand)
Monthly income (RM
7 10 4 20 6 3 5
millions)

a) Show that Pearson's correlation coefficient is 0.955. Comment on the value
obtained.
b) Calculate the coefficient of determination. Interpret the value obtained.
c) Write down the estimated regression equation.
d) Interpret the slope in the context of the problem.
e) Estimate the monthly income (RM millions) if the monthly advertising cost is
RM90,000.
