
MINI PROJECT – A STUDY ON PAR INC CUT-RESISTANT GOLF BALL

Submitted by

Anindita Das
TABLE OF CONTENTS

1. Introduction
1.1 Problem
1.2 Questions
2. Project Objectives
3. Assumptions
4. Exploratory Data Analysis
4.1 Environment Setup and Data Import
4.1.1 Packages used during Analysis
4.1.2 Set up of working Directory
4.1.3 Importing and reading the data Set
4.2 Variable Identification
4.3 Univariate Analysis
4.3.1 Graphical representation of data using histogram and boxplot
5. Hypothesis formulation and Testing
5.1 Rationale of Hypothesis
5.2 Hypothesis Formulation
5.3 Results of Hypothesis Testing
5.4 Reservation about the Results
6. Required Sample Size
7. Conclusion
8. Appendix A Source Code

1. Introduction

1.1 Problem

Par Inc is a major manufacturer of golf equipment. Management believes that Par’s market share
could be increased with the introduction of a cut-resistant, longer-lasting golf ball. Therefore, the
research group at Par has been investigating a new golf ball coating designed to resist cuts and
provide a more durable ball. The tests with the coating have been promising. One of the
researchers voiced concern about the effect of the new coating on driving distances. Par would
like the new cut-resistant ball to offer driving distances comparable to those of the current-model
golf ball. To compare the driving distances for the two balls, 40 balls of both the new and
current models were subjected to distance tests. The testing was performed with a mechanical
hitting machine so that any difference between the mean distances for the two models could be
attributed to a difference in the design. The results of the tests, with distances measured to the
nearest yard, are contained in the data set “Golf”. Prepare a Managerial Report.

1.2 Questions

a) Formulate and present the rationale for a hypothesis test that Par could use to compare the
driving distances of the current and new golf balls.
b) Analyze the data to provide the hypothesis testing conclusion. What is the p value for your
test? What is your recommendation for Par Inc?
c) Provide descriptive statistical summaries of the data for each model.
d) What is the 95% confidence interval for the population mean of each model? And what is
the 95% confidence interval for the difference between the means of the two populations?
e) Do you see a need for larger sample sizes and more testing with the golf balls? Discuss.

2. Project Objectives
The objectives of the report are listed below.

a. Importing the data set “Golf” in R
b. Exploring descriptive statistical summaries of the data
c. Visualizing the data by graphical representation
d. Analyzing the data to provide a recommendation for Par Inc.

3. Assumptions
The assumptions made during the analysis are as follows.

a. As a hitting machine is used to drive the balls, it is assumed that the force applied to the
ball is the same in both cases.
b. The airflow during the test is assumed to be uniform and to have no effect on the
driving distances.
c. The probabilities of Type I and Type II error are assumed to be 0.05 and 0.2 respectively
for the calculation of sample size.
d. The difference in population mean between Current and New balls is assumed to be
µd = 5 yards for the calculation of sample size.

4. Exploratory Data Analysis


4.1 Environment Setup and Data Import

This is the first step of data analysis in R. It consists of installing the necessary R
packages, setting up the working directory and importing the data set into R.

4.1.1 Packages used during Analysis

library(readxl)

4.1.2 Set up of working Directory

setwd("F:/PGPBABI/course/statistical methods for decision making/project")

4.1.3 Importing and reading the data Set

golf=read_xls("Golf.xls")
attach(golf) # lets the later calls refer to the columns Current and New by name

4.2 Variable Identification

dim(golf)

[1] 40 2

Dimension of data: It has 40 rows and 2 columns.


names(golf)

[1] "Current" "New"

Name of columns: The column names are Current and New.


Current = driving distances of golf balls without the coating
New = driving distances of golf balls with the coating

str(golf)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
$ Current: num 264 261 267 272 258 283 258 266 259 270 ...
$ New : num 277 269 263 266 262 251 262 289 286 264 ...

Data Structure: Both columns in the data set are numeric.

4.3 Univariate Analysis

summary(golf)

           Current    New
Min.       255.0      250.0
1st Qu.    263.0      262.0
Median     270.0      265.0
Mean       270.3      267.5
3rd Qu.    275.0      274.5
Max.       289.0      289.0

cat("SD for Variable current:",sd(Current))

SD for Variable current: 8.752985

cat("SD for variable New:",sd(New))

SD for variable New: 9.896904

cat("Variance for Variable current:",var(Current))

Variance for Variable current: 76.61474

cat("Variance for variable New:",var(New))

Variance for variable New: 97.94872

cat("SD for the difference of variables current and new",sd(New-Current))

SD for the difference of variables current and new 13.74397

The summary shows that the mean and median are close for each model (Current: 270.3 vs 270.0;
New: 267.5 vs 265.0). The variance of the New balls' driving distances is 97.95, which is about
28% more than the variance of the Current balls' driving distances (76.61).
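
The 28% figure follows directly from the two reported variances. A minimal check, using only the values printed above (the variable names are illustrative and not part of the original script):

```r
# Variances reported by var(Current) and var(New) above
var_cur <- 76.61474
var_new <- 97.94872

# Percentage by which the New variance exceeds the Current variance
pct_more <- (var_new / var_cur - 1) * 100
print(round(pct_more, 1))

# The reported standard deviations are the square roots of these variances
print(sqrt(c(Current = var_cur, New = var_new)))
```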

4.3.1 Graphical representation of data using histogram and boxplot

hist(Current,breaks=c(250,255,260,265,270,275,280,285,290),col="green",border=9,
main="Histogram of Current Golf Balls", xlab="Driving distances of Current Balls",
ylab="Frequency")

hist(New,breaks=c(250,255,260,265,270,275,280,285,290),col="orange",border=9,
main="Histogram of New Golf Balls", xlab="Driving distances of New Balls",
ylab="Frequency")

boxplot(Current,horizontal = TRUE,col="Green",border="black",
main="Boxplot of Current Balls", xlab="Driving distances of Current Balls")

boxplot(New,horizontal = TRUE,col="orange",border="black",
main="Boxplot of New Balls", xlab="Driving distances of New Balls")

The Current distances look closer to normally distributed, whereas the New distances look right-skewed.

5. Hypothesis formulation and Testing
5.1 Rationale of hypothesis

a. Each sample has 40 observations, which is large enough for a Z test. However, since the
population standard deviations are not known, a t test is used.
b. The level of significance is α = 0.05.
c. The two samples are independent, so a pooled-variance t test would have 40+40-2 = 78
degrees of freedom. R's t.test() applies the Welch correction by default (var.equal = FALSE),
which does not assume equal variances and gives about 76.85 degrees of freedom here.
d. A two-tailed t test is used because the purpose of the test is to check whether the new
coating has any effect on driving distance, in either direction.
e. The samples are unpaired.

5.2 Hypothesis formulation

Null Hypothesis

H0 : µCurrent - µNew = 0

(The new coating has no effect on the mean driving distance of the new balls)

Alternate Hypothesis

H1 : µCurrent - µNew ≠ 0

(The new coating does have an effect on the mean driving distance of the new balls)

print("Two tailed Independent Two sample T test")

[1] "Two tailed Independent Two sample T test"

t.test(Current,New,paired=FALSE,conf.level = 0.95,alternative = "t")

Welch Two Sample t-test

data: Current and New
t = 1.3284, df = 76.852, p-value = 0.188
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.384937 6.934937
sample estimates:
mean of x mean of y
  270.275   267.500
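
The Welch output above can be reproduced from the summary statistics alone, which makes the link between the reported means, variances and the test result explicit. The following is a sketch under that assumption: it uses only the values printed in section 4.3, and the variable names are illustrative rather than taken from the original script.

```r
# Summary statistics reported in section 4.3 (40 balls per model)
n_cur <- 40
n_new <- 40
mean_cur <- 270.275
mean_new <- 267.5
var_cur <- 76.61474
var_new <- 97.94872

# Standard error of the difference in means (variances are not pooled)
se_diff <- sqrt(var_cur / n_cur + var_new / n_new)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean_cur - mean_new) / se_diff
df_welch <- (var_cur / n_cur + var_new / n_new)^2 /
  ((var_cur / n_cur)^2 / (n_cur - 1) + (var_new / n_new)^2 / (n_new - 1))

# Two-sided p-value and 95% confidence interval for Current - New
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci_diff <- (mean_cur - mean_new) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
print(round(ci_diff, 4))
```

Because the inputs are rounded summary values, the results agree with the t.test() output only up to rounding.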

print("Two tailed One sample T test for Current balls")

[1] "Two tailed One sample T test for Current balls"

t.test(Current,paired=FALSE,conf.level = 0.95,alternative="t")

One Sample t-test

data: Current
t = 195.29, df = 39, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 267.4757 273.0743
sample estimates:
mean of x
  270.275

print("Two tailed One sample T test for New balls")

[1] "Two tailed One sample T test for New balls"

t.test(New,paired=FALSE,conf.level = 0.95,alternative="t")

One Sample t-test

data: New
t = 170.94, df = 39, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 264.3348 270.6652
sample estimates:
mean of x
    267.5
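
The one-sample calls above test against a mean of 0, so only their confidence intervals are of interest here. Those intervals are the usual mean ± t × sd/√n, and can be recomputed from the summary statistics as a check. This is a sketch, and the helper ci_mean() is a hypothetical name introduced for illustration; it is not part of the original script.

```r
# Confidence interval for a mean from summary statistics (hypothetical helper)
ci_mean <- function(mean_x, sd_x, n, conf = 0.95) {
  # Half-width: t quantile with n - 1 degrees of freedom times the standard error
  half_width <- qt(1 - (1 - conf) / 2, df = n - 1) * sd_x / sqrt(n)
  c(lower = mean_x - half_width, upper = mean_x + half_width)
}

# Means and standard deviations reported in section 4.3, n = 40 each
ci_current <- ci_mean(270.275, 8.752985, 40)
ci_new <- ci_mean(267.5, 9.896904, 40)

print(round(ci_current, 4))
print(round(ci_new, 4))
```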

5.3 Results of Hypothesis Testing
• The p-value is 0.188, which is greater than 0.05, so the null hypothesis cannot be
rejected. The data therefore do not show a statistically significant change in
driving distance due to the new coating.
• The 95% confidence interval for the difference in means (Current - New) is -1.384937 to
6.934937. It contains 0, which is consistent with the test result.
• The 95% confidence interval for the mean of Current is 267.4757 to 273.0743.
• The 95% confidence interval for the mean of New is 264.3348 to 270.6652.

5.4 Reservation about the results

• Comparing the two sample means (270.3 for Current, 267.5 for New), the new coating
appears to reduce the driving distance slightly, but the difference is not
statistically significant.
• The variance of the New balls' driving distances is 97.95, which is about 28% more than
the variance of the Current balls' driving distances.
• The extent of sampling error in the data is not known.
• Although no statistically significant effect of the new coating on driving distance was
found, it is suggested that its effects on the weight, size and shape of the new balls be checked.
• The given sample is from only one golf course. It is advisable to repeat the test on
different kinds of golf courses to account for differences in the grounds.

6. Required Sample Size


• The probabilities of Type I and Type II error are assumed to be 0.05 and 0.2 respectively, so
the level of significance is 0.05 (95% confidence level) and the power of the test is 80%.
• It is assumed that the sample standard deviation is equal to the population standard deviation.
• It is assumed that the difference in population mean between Current and New balls is
µd = 5 yards.
• Null Hypothesis: µd = 0. Alternate Hypothesis: µd = 5.
• The standard deviation used is 13.74397, the sample standard deviation of the differences
New - Current computed in section 4.3.

power.t.test(power=0.80,delta=5,sd=13.74397,type="t",alternative="t",sig.level = 0.05)

Two-sample t test power calculation

n = 119.5782
delta = 5
sd = 13.74397
sig.level = 0.05
power = 0.8
alternative = two.sided

NOTE: n is number in *each* group

To retain at least 80% power, the value of n is rounded up to the next whole number. It may
therefore be concluded that a sample size of 120 balls of each model is required.
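
As a rough cross-check of this figure, the usual normal-approximation formula n = 2(z(1-α/2) + z(1-β))² σ²/δ² can be evaluated with the same assumed values. This is only a sketch: the variable names are illustrative, and the approximation ignores the t-distribution correction, so it lands slightly below the power.t.test() result.

```r
# Assumed inputs, identical to those passed to power.t.test() above
alpha <- 0.05        # probability of Type I error
power <- 0.80        # 1 - probability of Type II error
delta <- 5           # difference in means to detect (yards)
sd_assumed <- 13.74397

# Normal-approximation sample size per group for a two-sided two-sample test
z_alpha <- qnorm(1 - alpha / 2)
z_beta <- qnorm(power)
n_approx <- 2 * (z_alpha + z_beta)^2 * sd_assumed^2 / delta^2

# Exact calculation, as in the report
n_exact <- power.t.test(power = power, delta = delta, sd = sd_assumed,
                        sig.level = alpha, type = "two.sample",
                        alternative = "two.sided")$n

print(c(approx = n_approx, exact = n_exact, rounded_up = ceiling(n_exact)))
```

Rounding the exact value up with ceiling() rather than round() keeps the power at or above the 80% target.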

7. Conclusion
From the given data it may be concluded that

• There is no statistically significant change in driving distance due to the new coating on
golf balls.
• It is recommended that the test be carried out with a larger sample covering a
number of golf courses to improve the accuracy of the test results.
• The test results need to be interpreted, and future actions planned, for other
characteristics such as size, shape and weight.

8. Appendix A Source Code


#Set up of working directory

setwd("F:/PGPBABI/course/statistical methods for decision making/project")

#Packages used

library(readxl)

#Importing the data set

golf=read_xls("Golf.xls")

attach(golf)

# Variable Identification

dim(golf)

names(golf)

str(golf)

#Univariate analysis

summary(golf)

cat("SD for Variable current:",sd(Current))

cat("SD for variable New:",sd(New))

cat("Variance for Variable current:",var(Current))

cat("Variance for variable New:",var(New))

cat("SD for the difference of variables current and new",sd(New-Current))

#visualisation using histogram and boxplot

hist(Current,breaks=c(250,255,260,265,270,275,280,285,290),col="green",border=9,

main="Histogram of Current Golf Balls",

xlab="Driving distances of Current Balls",ylab="Frequency")

hist(New,breaks=c(250,255,260,265,270,275,280,285,290),col="orange",border=9,

main="Histogram of New Golf Balls",

xlab="Driving distances of New Balls",

ylab="Frequency")

boxplot(Current,horizontal = TRUE,col="Green",border="black",

main="Boxplot of Current Balls",

xlab="Driving distances of Current Balls")

boxplot(New,horizontal = TRUE,col="orange",border="black",

main="Boxplot of New Balls",

xlab="Driving distances of New Balls")

#Hypothesis formulation

#Null hypothesis:-H0: µC-µN=0 (New coating does not have effect on the driving
#distance of new balls)

#Alternate hypothesis:-H1: µC-µN not equal to 0 (New coating does have significant
#effect on the driving distance of new balls)

print("Two tailed Independent Two sample T test")

t.test(Current,New,paired=FALSE,conf.level = 0.95,alternative = "t")

print("Two tailed One sample T test for Current balls")

t.test(Current,paired=FALSE,conf.level = 0.95,alternative="t")

print("Two tailed One sample T test for New balls")

t.test(New,paired=FALSE,conf.level = 0.95,alternative="t")

#Size of sample by assuming α=0.05 and β=0.2

power.t.test(power=0.80,delta=5,sd=13.74397,type="t",alternative="t",sig.level = 0.05)
