Explanation For Project Major League Baseball

Master of Management
Odette School of Business, University of Windsor, Canada

BSMM-8320 Quantitative Studies
Winter 2023
Group (9)
Submitted to: Dr. Mostafa Moussa
Student Name ID
Puja Chakraborty 110120227
Binny Kaur 110114712
Jothi Prakash Murugan 110117643
Suhani Prajapati 110119218
Deepack Ravichandran 110119595
Team Members’ Contributions
1. Puja Chakraborty
a. Continuous Probability Distribution
b. Sampling Methods and Central Limit Theorem
c. Estimations and Confidence Intervals
d. Compiling the final work including alignment, word file processing
and work validation.
2. Binny Kaur
a. Creating Files and folder, sharing those among group members and
refining initial data.
b. Introduction to Statistics
c. Frequency Tables, Distributions and Graphs
d. Numerical Measures
e. Displaying and Exploring Data
3. Jothi Prakash Murugan
a. Steps of Hypothesis Testing
b. Nonparametric methods: Nominal Level Hypothesis
c. Correlation and Linear Regression
4. Suhani Prajapati
a. A Survey of Probability Concepts
b. Discrete Probability Distribution
5. Deepack Ravichandran
a. Steps of Hypothesis Testing
b. Nonparametric methods: Nominal Level Hypothesis
c. Correlation and Linear Regression
Project Major League Baseball
1. Introduction to Statistics
Consider the following variables: number of wins, payroll, season attendance, whether the
team is in the American or National League, and the number of home runs hit.
a. Which of these variables are quantitative and which are qualitative?
Variable Type
Number of Wins Quantitative
Payroll Quantitative
Season Attendance Quantitative
Team classification (American or National League) Qualitative
Number of home runs hit Quantitative
b. Determine the level of measurement for each of the variables.

Variable Level of Measurement
Number of Wins Ratio
Payroll Ratio
Season Attendance Ratio
Team classification (American or National League) Nominal
Number of home runs hit Ratio
2. Frequency Tables, Distributions and Graphs

Create a frequency distribution for the Team Salary variable and answer the following
questions.
a. What is the typical salary for a team? What is the range of the salaries?
The typical team salary is about $95 million. Salaries are concentrated in the $80 to $110 million interval.
b. Comment on the shape of the distribution. Does it appear that any of the teams have a
salary that is out of line with the others?
Distribution is normal and symmetric
Teams in salary intervals of $80-$110 million are out of line with the others. There are 12 such
teams.
c. Draw a cumulative relative frequency distribution of team salary. Using this distribution,
40% of the teams have a salary of less than what amount? About how many teams have a
total salary of more than $165 million?
40% of Teams have salary less than $110 million.
Two Teams have a total salary of more than $165 million.
Team Salary > $ 165 million
Chicago Cubs 194.08
Boston Red Sox 187.23
3. Numerical Measures
Refer to the team salary variable, include the answer to the following questions in your
report.
a. Around what values do the data tend to cluster? Specifically, what is the mean team
salary? What is the median team salary? Is one measure more representative of the typical
team salary than the others?
Mean Team Salary = Total Team Salary / Number of Teams = 3162.18/30 = $105.4 million
The median team salary is the middle value when the data are arranged in ascending order.
Because there is an even number of teams, we take the average of the two middle values. As a result,
Median Team Salary = $100.43 million
The data tend to cluster around the mean value of $105.40 million. The median team salary is $100.43 million.
The mean is taken as the more representative measure of team salary, as more values approach this number.
b. What is the range of the team salaries? What is the standard deviation? About 95% of the
salaries are between what two values?
Range = Maximum Value – Minimum Value
Maximum Salary = $194.08 million (Team Chicago Cubs)
Minimum Salary = $49.08 million (Tampa Bay Rays)
Range of Team Salaries = $194.08 - $49.08 = $145 million
Standard Deviation = 36.73 (considering the given data as population)
By the Empirical Rule, about 95% of the salaries lie between $31.93 million and $178.88 million.
Lower Limit = Mean – 2 × Standard Deviation = $31.93 million
Upper Limit = Mean + 2 × Standard Deviation = $178.88 million
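The Empirical Rule limits above can be reproduced with a short calculation. The following is a minimal sketch in Python that uses only the total salary, the team count and the population standard deviation quoted in this section. The variable names are our own, and small differences in the second decimal place come from rounding.

```python
# Figures quoted in this report (salaries in $ million).
total_salary = 3162.18
number_of_teams = 30
std_dev = 36.73  # population standard deviation, as reported

mean_salary = total_salary / number_of_teams

# Empirical Rule: about 95% of values lie within two standard deviations.
lower_limit = mean_salary - 2 * std_dev
upper_limit = mean_salary + 2 * std_dev

print(f"Mean salary: {mean_salary:.2f}")
print(f"About 95% of salaries lie between {lower_limit:.2f} and {upper_limit:.2f}")
```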
4. Displaying and Exploring Data
a. In the data set, the year opened is the first year of operation for that stadium. For each
team, use this variable to create a new variable, stadium age, by subtracting the value of the
variable year opened from the current year. Develop a box plot with the new variable,
stadium age. Are there any outliers? If so, which of the stadiums are outliers?
Please refer to Box Plot for new variable, Stadium Age.
Box Plot Variables for Stadium Age

Minimum Value 6
Quartile 1 18.5
Median (Quartile 2) 23.5
Quartile 3 33.25
Maximum Value 111
Mean 32.37
Interquartile Range 14.75
Lower Outlier Limit -3.63
Upper Outlier Limit 55.38
There are five outliers for stadium age, each above the upper outlier limit of 55.38.
Outlier Stadium Outlier Value
Boston Red Sox 111
Chicago Cubs 109
Los Angeles Dodgers 61
Oakland Athletics 57
Los Angeles Angels 57
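The outlier limits can be checked from the quartiles alone. The sketch below assumes the usual 1.5 × IQR rule, which reproduces the limits in the table above. The function name and the dictionary of ages are our own illustration, and only the five stadium ages listed in this report are included.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier limits using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles for stadium age, from the box plot table.
lower, upper = outlier_fences(18.5, 33.25)

# Only the five ages listed in the report, not the full data set.
listed_ages = {
    "Boston Red Sox": 111,
    "Chicago Cubs": 109,
    "Los Angeles Dodgers": 61,
    "Oakland Athletics": 57,
    "Los Angeles Angels": 57,
}
outliers = [team for team, age in listed_ages.items() if age < lower or age > upper]

print(f"Lower limit: {lower:.3f}, upper limit: {upper:.3f}")
print("Outliers:", outliers)
```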
b. Using the variable salary create a box plot. Are there any outliers? Compute the quartiles.
Write a summary of your analysis.
Please refer to the box plot for the variable salary. Salaries range from 49.08 to 194.08, with a
median of about 100.43. The middle 50% of the salaries fall within the interquartile range (80.21 to
126.82), and there are no outliers for the variable salary because every value lies within the outlier limits.
Box Plot Variables for Salary
Minimum Value 49.08
Quartile 1 80.21
Median (Quartile 2) 100.43
Quartile 3 126.82
Maximum Value 194.08
Mean 105.41
Interquartile Range 46.60
Lower Outlier Limit 10.31
Upper Outlier Limit 196.72
c. Draw a scatter diagram with the variable wins on the vertical axis and salary on the
horizontal axis. Compute the correlation coefficient between wins and salary. What are your
conclusions?
Please refer to scatter diagram.
[Figure: scatter plot of wins (vertical axis) against salary (horizontal axis), with fitted line y = 0.1252x + 67.769 and R² = 0.0866]
Correlation Coefficient between wins and salary = 0.29

We can conclude that there is a positive relationship between wins and salary. But the relationship
does not seem to be a strong relationship.
d. Using the variable wins draw a dot plot. What can you conclude from this plot?
Please refer to the dot plot for the variable wins in the Excel sheet. The distribution of the number of wins
appears roughly uniform between 60 and 100. The dot plot also shows that only two teams have the same
number of wins, namely 84.
5. A Survey of Probability Concepts
a. In Major League Baseball, each team plays 162 games in a season. A rule-of-thumb is
that 90 or more wins in a season qualifies a team for the post-season playoffs. To summarize
the 2022 season, create a frequency table of wins. Start the first class at 40 and use a class
interval of 10.
Frequency table of wins in 2022 season
Wins Number of Teams
40-49 1
50-59 3
60-69 3
70-79 7
80-89 6
90-99 6
100-109 4
Grand Total 30
1. What is the probability that a team wins 90 or more games?
To find the probability that a team wins 90 or more games, we determine the proportion of teams
that won 90 or more games. Out of the total of 30 teams, 10 teams (6 in the 90-99 class and 4 in the
100-109 class) won 90 or more games, so the probability that a team will win 90 or more games is
Probability = 10/30 = 0.33 or 33%
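As a check on this count, the probability can be read directly from the frequency table. The sketch below is only an illustration: it stores the table as a Python dictionary (our own representation) and adds up the classes whose lower bound is 90 or more.

```python
# Frequency table of wins in the 2022 season: class label -> number of teams.
wins_table = {
    "40-49": 1,
    "50-59": 3,
    "60-69": 3,
    "70-79": 7,
    "80-89": 6,
    "90-99": 6,
    "100-109": 4,
}

total_teams = sum(wins_table.values())

# Sum the classes whose lower bound is at least 90 wins.
teams_90_plus = sum(
    count for label, count in wins_table.items() if int(label.split("-")[0]) >= 90
)

probability = teams_90_plus / total_teams
print(f"{teams_90_plus} of {total_teams} teams, probability = {probability:.2f}")
```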
2. In the playoffs, only 10 teams can enter the playoffs. Based on the 2022 season, what is the
probability that a team that wins 90 or more games makes the playoffs?
We know that 10 teams won 90 or more games. To determine the probability that a team that
wins 90 or more games makes the playoffs, based on the 2022 season, we use:
Probability = (Total teams that won 90 or more games) / (Total teams that can enter the playoffs)
= 10/10 = 1 or 100%
A team that wins 90 or more games will therefore have a 100% chance of making the playoffs in
the 2022 season.
3. Make a statement based on your responses to parts (1) and (2).
(1) Only 33% of the teams in the 2022 season had 90 or more wins, so reaching the 90-win
rule-of-thumb threshold for the playoffs is relatively uncommon.
(2) At 100%, every team that won 90 or more games would make the playoffs, since the 10 such teams
exactly fill the 10 playoff places. This indicates that winning 90 or more games is a
reliable predictor of making the playoffs.
b. Presently the National League requires that all fielding players, including pitchers, take a
turn to bat. In the American League, teams can use a designated hitter (DH) to take the
pitcher's turn to bat. For each league, create a frequency distribution and a relative
frequency distribution of teams based on the season total of home runs. For the frequency
distributions, start the first class at 140 home runs and use a class interval of 30.
Frequency distribution of Home Runs in the American League:
HR Count of HR Relative Frequency
140-169 2 13.33%
170-199 1 6.67%
200-229 5 33.33%
230-259 4 26.67%
260-289 1 6.67%
290-319 2 13.33%
Grand Total 15 100%
Frequency distribution of Home Runs in the National League

HR Count of HR Relative Frequency
140-169 3 20.00%
200-229 6 40.00%
230-259 5 33.33%
260-289 1 6.67%
Grand Total 15 100%
1. In the American League, what is the probability that a team hits 200 or more home runs?
The probability that a team hits 200 or more home runs in the American League
= (Total teams with 200 or more home runs) / (Total number of teams)
= 12/15 = 0.8
2. In the National League, what is the probability that a team hits 200 or more home runs?
The probability that a team hits 200 or more home runs in the National League
= (Total teams with 200 or more home runs) / (Total number of teams)
= 12/15 = 0.8
3. Make statements comparing the two distributions.

(1) In the American League, there is an 80% chance that a team hits 200 or more home runs.
(2) Similarly, in the National League there is an 80% chance that a team hits 200 or more home
runs.
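The relative frequencies and the two probabilities can be recomputed from the team counts in each class. The following sketch is an illustration under our own naming (`relative_frequencies` and `prob_at_least` are helper functions we define here). Each relative frequency is the class count divided by the 15 teams in that league.

```python
def relative_frequencies(counts):
    """Map each class to its share of the total number of teams."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def prob_at_least(counts, threshold):
    """Share of teams in classes whose lower bound is at least `threshold`."""
    total = sum(counts.values())
    return sum(n for (low, _high), n in counts.items() if low >= threshold) / total

# Team counts per home-run class, keyed by (lower, upper) class limits.
american = {(140, 169): 2, (170, 199): 1, (200, 229): 5,
            (230, 259): 4, (260, 289): 1, (290, 319): 2}
national = {(140, 169): 3, (200, 229): 6, (230, 259): 5, (260, 289): 1}

for name, counts in (("American", american), ("National", national)):
    for (low, high), share in relative_frequencies(counts).items():
        print(f"{name} League {low}-{high}: {share:.2%}")

p_american = prob_at_least(american, 200)
p_national = prob_at_least(national, 200)
print(f"P(200 or more): American = {p_american:.2f}, National = {p_national:.2f}")
```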
6. Discrete Probability Distributions

Compute the mean number of home runs per game. To do this, first find the mean number
of home runs per team for 2022. Next, divide this value by 162 (a season comprises 162
games). Then multiply by 2 because there are two teams in each game. Use the Poisson
distribution to estimate the number of home runs that will be hit in a game.
Mean number of home runs per game = (Mean home runs per team for 2022 ÷ 162) × 2
= 2.79
Find the probability that:
a. There are no home runs in a game.
P(X = 0) = 0.0615
b. There are two home runs in a game.
P(X = 2) = 0.239
c. There are at least four home runs in a game.
P(X ≥ 4) = 1 - P(X ≤ 3) = 0.305
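The three Poisson probabilities can be verified with the standard formula P(X = k) = e^(-μ) μ^k / k!. The sketch below uses only the Python standard library and takes the rounded mean of 2.79 home runs per game, so the results may differ from the values above in the third decimal place. The function name is our own.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number of events is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 2.79  # mean home runs per game, as computed above (rounded)

p_none = poisson_pmf(0, mu)
p_two = poisson_pmf(2, mu)
# "At least four" is the complement of zero, one, two or three home runs.
p_at_least_four = 1 - sum(poisson_pmf(k, mu) for k in range(4))

print(f"P(X = 0)  = {p_none:.4f}")
print(f"P(X = 2)  = {p_two:.4f}")
print(f"P(X >= 4) = {p_at_least_four:.4f}")
```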
7. Continuous Probability Distributions
a. For the variable salary, compute the mean, median, range, standard deviation, and
coefficient of skewness. Also, make a box plot for the variable, salary. Does it seem reasonable
that salary is normally distributed? Explain.
For the salary variable, the mean is 105.406, the median is 100.43, the range is 145.00, and the
standard deviation is 37.364. The coefficient of skewness is 0.39953, which indicates a mild
positive skew, so the distribution is only approximately symmetric.
In the box plot the median is closer to Q1 than to Q3, indicating a slightly right-skewed
(positively skewed) distribution. Therefore, based on the calculations and the box plot, it
seems reasonable to conclude that the salary variable is not normally distributed.
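The reported coefficient of skewness is consistent with Pearson's coefficient, sk = 3(mean - median)/s. This is our assumption about the formula used, since it reproduces the value 0.39953 from the figures above. A minimal sketch:

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

# Salary figures quoted in this section ($ million).
salary_skewness = pearson_skewness(mean=105.406, median=100.43, std_dev=37.364)
print(f"Coefficient of skewness for salary: {salary_skewness:.5f}")
```

A positive value means the mean lies above the median, which is the usual sign of a longer right tail.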
b. Compute a new variable, stadium age, by subtracting the year the stadium was built from
2020. For the variable stadium age, compute the mean, median, range, standard deviation,
and coefficient of skewness. Also, make a box plot for the variable, stadium age. Does it seem
reasonable that stadium age is normally distributed? Explain.
Based on the calculations, the mean stadium age is 29.37 years, with a median of 20.5 years. The
stadium ages vary widely, from 3 to 108 years, with a standard deviation of 25.12 years.
The coefficient of skewness is positive, with a value of 1.06, indicating that the distribution is
highly skewed to the right.
From the box plot, we can see that the distribution is right-skewed (positively skewed), as the median
is closer to Q1 than to Q3.
So, it can be concluded that the variable stadium age is not normally distributed.
8. Sampling Methods and the Central Limit Theorem

Over the last decade, the mean attendance per team followed a normal distribution with a
mean of 2.45 million per team and a standard deviation of .71 million. Compute the mean
attendance per team for the 2022 season. Determine the likelihood of a sample mean
attendance this large or larger from the population.
By the central limit theorem, the mean of the sampling distribution is μx̄ = 2.45. The population
standard deviation is σ = 0.71 and the sample size is n = 30. The standard error of the mean is
σ/√n = 0.71/√30 = 0.13.
The mean attendance per team for the 2022 season is 2.283 million. To determine the likelihood of a
sample mean attendance as large as or larger than 2.283, we calculate the z-score and the
corresponding probability.
z = (2.283 - 2.45) / (0.71/√30)
= -1.29
From a standard normal distribution table, the area between 0 and 1.29 is 0.4015. Because the sample
mean lies below the population mean, the probability of a sample mean this large or larger is the area
to the right of z = -1.29:
P(Z ≥ -1.29) = 0.5 + 0.4015 = 0.9015
The left-tail area, P(Z < -1.29) = 0.5 - 0.4015 = 0.0985, is instead the probability of a sample mean of
2.283 or smaller.
Therefore, the likelihood of a sample mean attendance as large as or larger than 2.283 from the
population is approximately 0.9015 or 90.15%.
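The z-score and both tail areas can be checked without a table. The sketch below uses the error function from the Python standard library to evaluate the standard normal cumulative distribution. The helper name `normal_cdf` is our own, and the results differ slightly from table values because z is not rounded to two decimals.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu = 2.45            # population mean attendance (millions)
sigma = 0.71         # population standard deviation (millions)
n = 30               # number of teams
sample_mean = 2.283  # mean attendance for 2022 (millions)

standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu) / standard_error

p_this_small_or_smaller = normal_cdf(z)
p_this_large_or_larger = 1 - p_this_small_or_smaller

print(f"z = {z:.2f}")
print(f"P(sample mean <= 2.283) = {p_this_small_or_smaller:.4f}")
print(f"P(sample mean >= 2.283) = {p_this_large_or_larger:.4f}")
```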
9. Estimation and Confidence Intervals

The 2022 data are assumed to represent a sample.
As the population standard deviation is unknown, the t distribution has been used.
a. Develop a 95% confidence interval for the mean number of home runs per team.
95% confidence interval for the mean number of home runs per team is (210.30,241.44)
b. Develop a 95% confidence interval for the mean batting average by each team.
95% confidence interval for the mean batting average by each team (0.248,0.256)
c. Develop a 95% confidence interval for the mean earned run average (ERA) for each team.
95% confidence interval for the mean earned run average (ERA) for each team (4.280,4.714)
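Each interval above has the form x̄ ± t × s/√n. Because the sample standard deviations for home runs, batting average and ERA are not listed in this report, the sketch below illustrates the calculation with the salary figures reported elsewhere in this report (mean 105.406, s = 37.364, n = 30) and the t value of 2.045 for 29 degrees of freedom. The function name is our own, and the salary interval is shown only as a worked illustration.

```python
import math

def t_confidence_interval(sample_mean, sample_std, n, t_critical):
    """Confidence interval for a mean when the population std. dev. is unknown."""
    margin = t_critical * sample_std / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Illustration only: salary statistics from this report, t for df = 29 at 95%.
low, high = t_confidence_interval(105.406, 37.364, 30, t_critical=2.045)
print(f"95% confidence interval for mean salary: ({low:.2f}, {high:.2f})")
```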
10. Steps of Hypothesis Testing
a. Conduct a test of hypothesis to determine whether the mean salary of the teams was
different from $100.0 million. Use the .05 significance level.
To determine whether the mean salary of the team was different from 100 million, we first state
the null and alternate hypothesis.
Step 1: Stating Null and Alternate Hypothesis
H0: μ = $100.0 million
H1: μ ≠ $100.0 million
Step 2: Selecting Level of Significance
We should select the level of significance α. From the Question we get to know that,
α = 0.05
Step 3: Selecting the test Statistic.
We should select the test statistic. Here we select the test statistic t.
Where, t = (X̄ - μ)/(s/√n)
Step 4: Formulate the Decision rule.
From Step 1 we know that this is a two-tailed test. Using the level of significance α = 0.05, we
can establish the conditions under which the null hypothesis is rejected. The degrees of freedom
are found by the following formula:
Degrees of freedom = n - 1 = 30 - 1 = 29
From the t distribution table, the critical value is 2.045. The conditions are:
Decision Rule: If the computed value of t is not between −2.045 and 2.045, reject
the null hypothesis. If t falls between −2.045 and 2.045, do not reject the null
hypothesis.
Step 5: Make a decision.
In this step, we compute the value of test statistic and compare it to the critical value which was
obtained in step 4. By comparing it we can decide whether to reject the null hypothesis or not. By
using the following formula, we compute the value of t.
t = (X̄ - μ)/(s/√n)
We know that:
X̄ = 105.406; μ = 100; s = 37.36; n = 30
t = ((105.406 - 100) × √30)/37.36
t = 0.79
Because 0.79 lies between −2.045 and 2.045, it does not fall in the rejection region, so we decide not to reject H0.
Step 6: Interpret the result.

We did not reject the null hypothesis, so we failed to show that the population mean salary is
different from $100 million.
b. Using a 5% significance level, conduct a test of the hypothesis to determine whether the
mean attendance was more than 2 million per team.
To determine whether the mean attendance of the team was more than 2 million, we first state the
null and alternate hypothesis.
Step 1: Stating Null and Alternate Hypothesis
H0: μ ≤ 2 million
H1: μ > 2 million
Step 2: Selecting Level of Significance
We should select the level of significance α. From the Question, we get to know that,
α = 0.05
Step 3: Select the test Statistic.
We should select the test statistic. Here we select the test statistic t.
Where, t = (X̄ - μ)/(s/√n)
Step 4: Formulate the Decision rule.
From Step 1 we know that this is a one-tailed (right-tailed) test. Using the level of significance
α = 0.05, we can establish the conditions under which the null hypothesis is rejected. The degrees
of freedom are found by the following formula:
Degrees of freedom = n - 1 = 30 - 1 = 29
From the t distribution table, the critical value is 1.699. The conditions are:
Decision Rule: If the computed value of t is greater than 1.699, reject
the null hypothesis. If t is less than or equal to 1.699, do not reject the null
hypothesis.
Step 5: Make a decision.

In this step, we compute the value of test statistic and compare it to the critical value which was
obtained in step 4. By comparing it we can decide whether to reject the null hypothesis or not. By
using the following formula, we compute the value of t.
t = (X̄ - μ)/(s/√n)
We know that:
X̄ = 2283163; μ = 2000000; s = 762164.1704; n = 30
t = ((2283163 - 2000000) × √30)/762164.1704
t = 2.03
Because 2.03 is greater than 1.699, it falls in the rejection region, so we decide to reject H0.
Step 6: Interpret the result.

We reject the null hypothesis and conclude that the population mean attendance of the teams was
more than 2 million.
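Both tests in this section use the same statistic, t = (X̄ - μ)/(s/√n), and differ only in the decision rule. The sketch below recomputes the two t values from the figures given above and applies the two-tailed rule for salary and the right-tailed rule for attendance. The function name is our own, and the critical values are the table values quoted in the text.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_std, n):
    """t statistic for a test about a single population mean."""
    return (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))

# Part a: two-tailed test, critical value 2.045 (df = 29, alpha = 0.05).
t_salary = one_sample_t(105.406, 100.0, 37.36, 30)
reject_salary = abs(t_salary) > 2.045

# Part b: right-tailed test, critical value 1.699 (df = 29, alpha = 0.05).
t_attendance = one_sample_t(2283163, 2000000, 762164.1704, 30)
reject_attendance = t_attendance > 1.699

print(f"Salary: t = {t_salary:.2f}, reject H0: {reject_salary}")
print(f"Attendance: t = {t_attendance:.2f}, reject H0: {reject_attendance}")
```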
11. Nonparametric Methods: Nominal Level Hypothesis

Set up a variable that divides the teams into two groups, those that had a winning season and
those that did not. There are 162 games in the season, so define a winning season as having
won 81 or more games. Next, find the median team salary and divide the teams into two
salary groups. Let the 15 teams with the largest salaries be in one group and the 15 teams
with the smallest salaries be in the other. At the .05 significance level, is there a relationship
between salaries and winning?
To determine whether there is a relationship between salaries and winning, we can use a contingency
table, which tests whether two traits or characteristics are related.
From the raw data, we can establish a contingency table using the conditions given in the above
question.
Contingency Table
High Salary Low Salary Total
Winning 9 7 16
Not Winning 6 8 14
Total 15 15 30
Step 1: Stating the null and alternate hypothesis.

Null Hypothesis: H0: There is no relationship between salary and wins
Alternate Hypothesis: H1: There is a relationship between salary and wins
Step 2: Level of Significance
From the question, we can infer the significance level which is,
Significance level = 0.05
Step 3: Identify test statistics.
For the test statistic, we will use chi-square, χ².
Step 4: Formulating Decision Rule
We can formulate the decision rule by finding the critical value of χ². First, we find the degrees of
freedom using the following formula.
Degrees of freedom = (No. of rows - 1) × (No. of columns - 1)
= (2 - 1) × (2 - 1)
= 1
With a level of significance of 0.05 and 1 degree of freedom, we read the critical χ² value from the
chi-square table.
Critical χ² value from the chi-square table: 3.84

So, the decision rule is to reject the null hypothesis if the computed chi-square is > 3.84.
Step 5: Make a Decision
To decide whether or not to reject H0, we compute the chi-square value using the following
formula.
Chi-Square: χ² = Σ[(fo - fe)²/fe]
To compute χ², we must find the expected frequencies, which are calculated by the following
expression.
Formula for expected frequency: fe = (Row total × Column total)/Grand total
Contingency Table - Observed Frequency

High Salary Low Salary Total
Winning 9 7 16
Not Winning 6 8 14
Total 15 15 30
Contingency Table - Expected Frequency

High Salary Low Salary Total
Winning 8.00 8.00 16
Not Winning 7.00 7.00 14
Total 15 15 30
Observed (fo) and Expected (fe) Frequencies
High Salary Low Salary
fo fe fo fe
Winning 9 8 7 8
Not Winning 6 7 8 7
From the above tables, we can compute the value:
Chi-Square: χ² = Σ[(fo - fe)²/fe]
= 0.13 + 0.13 + 0.14 + 0.14
= 0.54
Because the computed chi-square value (0.54) is less than 3.84, we do not reject the null hypothesis.
Step 6: Interpret the results.

The sample data do not provide enough evidence that salary and wins are related.
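The expected frequencies and the chi-square value can be recomputed from the observed contingency table. The sketch below uses plain Python lists (our own representation) and compares the result with the critical value of 3.84 quoted above.

```python
# Observed frequencies: rows = (Winning, Not Winning), columns = (High, Low salary).
observed = [[9, 7],
            [6, 8]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

CRITICAL_VALUE = 3.84  # chi-square table, df = 1, alpha = 0.05
print(f"Expected frequencies: {expected}")
print(f"Chi-square = {chi_square:.2f}, reject H0: {chi_square > CRITICAL_VALUE}")
```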
12. Correlation and Linear Regression
Let attendance be the dependent variable and total team salary be the independent variable.
Determine the regression equation and answer the following questions.
a. Draw a scatter diagram. From the diagram, does there seem to be a direct relationship
between the two variables?
[Figure: scatter plot of attendance (vertical axis) against salary (horizontal axis), with fitted line y = 12960x + 917096]
From the above plot, we can observe a direct relationship: there is a positive correlation between
attendance and team salary (r = 0.635, R² = 0.4037). This can be considered a moderate positive
correlation.
b. What is the expected attendance for a team with a salary of $100.0 million?
From the plot we can get the linear regression equation:
y = 12960x + 917096
for the expected attendance for a team with a salary of $100 million:
y = 12960(100) + 917096
y = 1296000 + 917096
y = 2,213,096
A total number of 2,213,096 attendees is the expected attendance for the team.
c. If the owners pay an additional $30 million, how many more people could they expect to
attend?
For an additional $30 million, the total salary will be $130 million:
y = 12960(130) + 917096
y = 1684800 + 917096
y = 2,601,896
The expected attendance for the team rises to 2,601,896, which is 2,601,896 - 2,213,096 = 388,800
more people (equivalently, 30 × 12960 = 388,800).
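The two predictions and their difference follow directly from the regression equation. A minimal sketch, with the slope and intercept taken from the fitted line in this section and a function name of our own:

```python
SLOPE = 12960        # additional attendance per extra $1 million of salary
INTERCEPT = 917096   # predicted attendance at a salary of zero

def predicted_attendance(salary_millions):
    """Expected attendance from the fitted line y = 12960x + 917096."""
    return SLOPE * salary_millions + INTERCEPT

at_100 = predicted_attendance(100)
at_130 = predicted_attendance(130)
additional = at_130 - at_100

print(f"Salary $100 million: {at_100:,}")
print(f"Salary $130 million: {at_130:,}")
print(f"Additional attendance for $30 million more: {additional:,}")
```

The difference depends only on the slope, so the same increase applies to a $30 million raise from any starting salary.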
d. At the .05 significance level, can we conclude that the slope of the regression line is
positive? Conduct the appropriate test of the hypothesis.
Step 1: State the null and the alternate hypothesis

H0: ρ ≤ 0 The correlation between team salary and attendance is negative or zero.
H1: ρ > 0 The correlation between the team salary and attendance is positive.
Step 2: The level of significance is .05
Step 3: Select the test statistic, we use t.
Step 4: Formulate the decision rule, reject H0 if t > 1.701
Degrees of freedom = n - 2 = 30 - 2 = 28
t = r√(n - 2) / √(1 - r²)
t = 0.635340317 × √(30 - 2) / √(1 - 0.635340317²)
t = 4.353489327
Step 5: Make a decision. Because t = 4.3534 is greater than 1.701, reject H0.
Step 6: Interpret the result.
From the above one-tailed test, the null hypothesis H0 is rejected and there is a positive correlation
between team salary and attendance. In simple linear regression the slope has the same sign as the
correlation coefficient, so we conclude that the slope of the regression line is positive.
e. What percentage of the variation in attendance is accounted for by salary?
R square indicates the percentage of the variation in attendance accounted for by salary, which is 40.37% (R² = 0.403657318).
f. Determine the correlation between attendance and team batting average and between
attendance and team ERA. Which is stronger? Conduct an appropriate test of the hypothesis
for each set of variables.
Calculating the respective correlations:

There is a weak positive correlation between attendance and team batting average: 0.309906328
There is a weak negative correlation between attendance and team ERA: -0.359561518
In absolute value, the correlation between attendance and team ERA (about 0.36, inversely related) is
slightly stronger than the correlation between attendance and batting average (about 0.31, directly related).
Test of hypothesis for each set of variables

For attendance and team batting average:
H0: ρ ≤ 0 The correlation between team batting average and attendance is negative or zero.
H1: ρ > 0 The correlation between team batting average and attendance is positive.
Step 3: Select the test statistic, we use t
Degrees of freedom = n - 2 = 30 - 2 = 28
t = r√(n - 2) / √(1 - r²)
t = 0.309906328 × √(30 - 2) / √(1 - 0.309906328²)
t = 1.724786393
Step 5: Make a decision. Because t = 1.7247 is greater than 1.701, reject H0.
From the above one-tailed test, the null hypothesis H0 is rejected. Hence, there is a positive
correlation between team batting average and attendance.
For attendance and team ERA:

H0: ρ ≥ 0 The correlation between team ERA and attendance is positive or zero.
H1: ρ < 0 The correlation between team ERA and attendance is negative.
Step 3: Select the test statistic, we use t.
Step 4: Formulate the decision rule, reject H0 if t < -1.701 (degrees of freedom = n - 2 = 28)
t = r√(n - 2) / √(1 - r²)
t = -0.359561518 × √(30 - 2) / √(1 - (-0.359561518)²)
t = -2.038985228
Step 5: Make a decision. Because t = -2.0389 is less than -1.701, reject H0.
From the above one-tailed test, the null hypothesis H0 is rejected.
Hence, there is a negative correlation between team ERA and attendance.
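The three correlation tests in this section all use t = r√(n - 2)/√(1 - r²) with 28 degrees of freedom. The sketch below recomputes each t value from the correlation coefficients reported above. The function name is our own, and the check against 1.701 is made in the direction of each coefficient's sign, which is our reading of the one-tailed tests.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether a correlation coefficient differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

N_TEAMS = 30
T_CRITICAL = 1.701  # one-tailed, df = 28, alpha = 0.05

correlations = {
    "salary": 0.635340317,
    "batting average": 0.309906328,
    "ERA": -0.359561518,
}

t_values = {name: correlation_t(r, N_TEAMS) for name, r in correlations.items()}

for name, t in t_values.items():
    direction = "positive" if t > 0 else "negative"
    beyond = abs(t) > T_CRITICAL
    print(f"Attendance vs {name}: t = {t:.4f}, {direction}, beyond critical value: {beyond}")
```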
