Mathematics For Management - Statistics Section


3/10/22, 10:38 AM Mathematics for Management: Statistics Section

This document is authorized for use only by RAJESH KUMAR NAYAK. Copy or posting is an infringement of copyright.

Mathematics for Management: Statistics Section


Pretest Introduction

Welcome to the pre-assessment test for the Mathematics for Management tutorial. This test will allow you to assess your knowledge of Mathematics for Management.
All questions must be answered for your exam to be scored.

Navigation:

To advance from one question to the next, select one of the answer choices (or, if applicable, enter your own answer) and click the “Submit” button. After submitting your
answer, you will not be able to change it, so make sure you are satisfied with your selection before you submit each answer. You may also skip a question by pressing the forward
advance arrow. Please note that you can return to “skipped” questions using the “Jump to unanswered question” selection menu or the navigational arrows at any time. Although
you can skip a question, you must navigate back to it and answer it; all questions must be answered for the exam to be scored.

Your results will be displayed immediately upon completion of the exam.

After completion, you can review your answers at any time by returning to the exam.

Good luck!

Introduction

During your business degree program, you will use mathematics in many situations. In your economics courses, you will have to determine how demand is related to price. You
might even use basic calculus to come up with a profit-maximizing price. In your statistics course, you might be expected to know the basic laws of probability. In your finance
courses, you will need to understand the mathematics behind valuing cash flows. Most of your professors will expect you to know how to solve simple equations and do basic
manipulations of algebraic formulas. You may not have used algebra, calculus, probability, and statistics for five or ten years. If you are an undergraduate English or music major
now entering a graduate business program, you may have never studied calculus or basic probability and statistics.

The purpose of our course is to help level the playing field by giving you the analytic background you need to hit the ground running and complete a top MBA program successfully.
We will try to make the concepts as interesting and easy to learn as possible. You may find it useful to refer to the Mathematics for Management Concept Summary while taking the
course. Let's get started!

Setting

Statistics

INTRODUCTION

The analysis of data is crucial to business. In finance class, you will analyze returns on stocks and other investments. In your operations and marketing classes, you will analyze
monthly demand for products that are being sold. This section of the course begins by introducing you to the basics of data analysis.

Summation Notation

INTRODUCTION

Suppose you want to add up the first 100 even positive integers. You could write a lengthy addition operation that lists all 100 numbers, i.e., 2 + 4 + 6 + ... + 198 + 200. A less
cumbersome, more elegant way to represent the operation is with the symbol ∑, which means summation.


\[ \sum_{i=1}^{100} 2i \]

The notation says: for each of the first 100 positive integers i, compute 2i; then add the results together. The only values of i for which you compute 2i are those from 1
through 100; 1 and 100 are, respectively, the lower and upper limits of the summation. The sum of all the 2i values (2 + 4 + 6 + ... + 198 + 200) is the
answer: 10,100.
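As a quick check of the arithmetic, the summation translates directly into a loop over the index i from the lower limit to the upper limit. The following is a minimal Python sketch; the function name `sum_of_even_integers` is our own illustration, not something the course provides.

```python
def sum_of_even_integers(upper: int) -> int:
    """Evaluate the sum of 2*i for i = 1, ..., upper."""
    # range(1, upper + 1) runs i from the lower limit 1 through the upper limit
    return sum(2 * i for i in range(1, upper + 1))


print(sum_of_even_integers(100))  # 10100
```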

USING SUMMATION NOTATION TO EXPRESS AN AVERAGE


When you analyze data, you often need to find the average of n numbers. The list of numbers is written as x1, x2, ..., xn, and the average of them is written with the x-bar symbol.
In summation notation, you write:

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

It means that to find the average of n numbers, add up the n numbers and divide the sum by n. For example, if x1 = 3, x2 = 5, and x3 = 4, then

\[ \bar{x} = \frac{1}{3}\sum_{i=1}^{3} x_i = \frac{1}{3}(3 + 5 + 4) = 4 \]
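The averaging formula can be sketched the same way: add the n values, then multiply by 1/n. This Python function (a hypothetical helper named `mean`, our own) follows the formula literally and reproduces the three-number example.

```python
def mean(xs):
    """x-bar = (1/n) * (x_1 + x_2 + ... + x_n)."""
    n = len(xs)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(xs) / n


print(mean([3, 5, 4]))  # 4.0
```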


EXERCISES
(1) Evaluate \( \sum_{i=1}^{4} (3i - 1) \).

(2) Smalltown Bagels bakes n types of bagels. Today the shop is planning to bake \(x_i\) bagels of type i, each of which costs \(c_i\) dollars to produce.

a. In summation notation, write an expression for the total cost of baking today's bagels.

b. Given the following assumptions

n=3
c1 = $1.20, x1 = 100
c2 = $1.50, x2 = 50
c3 = $2.00, x3 = 50

     compute the total cost of baking today's bagels.

Using Bar Graphs and Histograms to Summarize Data

INTRODUCTION

Summarizing data often yields important managerial insights. There are two main ways to summarize data:
         1. using a bar graph, or histogram, that gives a graphical summary of the data

         2. using descriptive statistics such as mean, median, mode, and standard deviation

Let's start with bar graphs and histograms.

SETTING THE BIN RANGES FOR THE BAR GRAPH


The following table gives the height (in inches) of all of the girls in Tina's seventh-grade homeroom.


You can easily summarize these data in a bar graph.

First, divide the data into 5 to 10 categories, or bin ranges, of equal size. In this case, you might create seven bins: one for girls up to and including 58 inches tall, another for
girls over 58 inches up to and including 60 inches tall, a third for girls over 60 inches up to and including 62 inches tall, and so on. For example, 13 girls fall in the range of
heights over 60 inches up to and including 62 inches (see the pink shaded cells). Note: There are other ways to treat bin ranges; we are using the convention used by the
Histogram tool in Microsoft Excel.

Next, create a frequency table that identifies how many data points, or observations, fall into each bin range.
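Once the bin ranges are chosen, building the frequency table is a counting loop. The sketch below follows the convention described above (each bin includes its upper boundary but not its lower one) and, as an assumption modeled on Excel's Histogram tool, adds a final "More" bin for values above the highest boundary. The heights are hypothetical, since the homeroom table is not reproduced here, and the function name `frequency_table` is our own.

```python
from bisect import bisect_left


def frequency_table(data, upper_bounds):
    """Count observations per bin; each bin includes its upper boundary
    (x <= bound) but not its lower one, as in the convention above."""
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)  # extra slot: values above the top bound
    for x in data:
        # bisect_left gives the index of the first boundary >= x,
        # or len(bounds) if x exceeds every boundary.
        counts[bisect_left(bounds, x)] += 1
    # Each bin is labeled by its upper boundary, as Excel does.
    labels = [f"<= {b}" for b in bounds] + ["More"]
    return dict(zip(labels, counts))


# Hypothetical heights in inches (not the actual homeroom data).
heights = [57, 58, 59, 61, 62, 62, 63, 66, 70]
for label, count in frequency_table(heights, [58, 60, 62, 64, 66, 68]).items():
    print(f"{label:>6}: {count}")
```

With these bins, the 58-inch girl lands in the first bin and the 62-inch girls land in the "over 60 up to and including 62" bin, matching the convention in the text.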

CONSTRUCTING THE FREQUENCY TABLE

INSTALLING THE ANALYSIS TOOLPAK FOR A HISTOGRAM


If your data include thousands of observations, manual construction of a frequency table is impractical. Excel makes it easy to construct a histogram for a data set of any size.

Before using Excel to create a histogram, install the Analysis ToolPak.

CONSTRUCTING A HISTOGRAM WITH EXCEL


Let's construct a histogram for the heights of the girls in Tina's homeroom. Please download the file histogram.xlsx.

EXERCISES
Please download the file histogramdata.xlsx.

(1) The Salaries worksheet contains the annual salaries (in thousands of dollars) for the employees of the Smalltown tourist bureau. With bin ceilings of 40, 50, 60, 70, 80, and
90, construct a bar graph of employee salaries.

(2) The Microsoft worksheet gives a sample of daily percentage returns on Microsoft stock. Use Excel to summarize these data with a histogram. For your bin ranges, use upper
boundaries of -20%, -15%, -10%, -5%, 0%, 5%, 10%, and 15%.

Measures of Central Tendency


INTRODUCTION

It's often practical to summarize data with a single number that typifies the data set. For example,

What is the typical number of ounces in a can of Coca-Cola?
What is a typical family income in Smalltown?
What is the typical number of points that a team scores in a game?

This section focuses on three measures of central tendency for a data set: mean, median, and mode.

DEFINING THE MEAN, MEDIAN, AND MODE


Suppose that you have a set of n numbers x1, x2, ..., xn.

The mean is simply the average of the n numbers. It is usually written as x-bar and expressed as

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \]

To compute the mean, simply add up all of your observations and divide by the number of observations.

The median is the middle value of a data set. To find the median, first order the numbers from smallest to
largest. If n is odd, the median is the ((n + 1)/2)th-smallest number. For example, if the data set includes 9 numbers, calculate (9 + 1)/2 = 5 to find that the median is the
fifth-smallest number (the one in the middle). Of the eight other numbers, four are smaller than the median and four are larger. If n is even, the median is the average of
the (n/2)th-smallest and ((n + 2)/2)th-smallest numbers. For example, if the data set includes 10 numbers, calculate 10/2 = 5 and (10 + 2)/2 = 6. The median is,
therefore, the average of the fifth- and sixth-smallest numbers. Five of the numbers are smaller than the median and five are larger; the median sits in between these two
groups.
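The odd/even rule for the median can be written as a short function. This is a sketch using a hypothetical helper named `median`; note that the positions in the text count from 1, while Python lists are indexed from 0, hence the `- 1` adjustments.

```python
def median(xs):
    """Median via the odd/even rule: sort, then take the middle value(s)."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # the ((n + 1)/2)th-smallest value; subtract 1 for 0-based indexing
        return s[(n + 1) // 2 - 1]
    # average of the (n/2)th- and ((n + 2)/2)th-smallest values
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2


print(median([7, 1, 9, 3, 5, 2, 8, 4, 6]))  # 9 values, fifth-smallest: 5
print(median(list(range(1, 11))))           # 10 values, (5 + 6) / 2: 5.5
```

Python's standard library also offers `statistics.median`, which gives the same results for these inputs.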

The mode is the most frequently occurring number in a data set. A data set can have more than one mode (if two or more numbers tie for the highest frequency). If no
number occurs more than once in a data set, the data set has no mode.

EXAMPLE OF COMPUTING THE MEAN


Suppose that the six employees of the Happytail Vet Clinic earn the following salaries (in thousands of dollars): 80, 30, 40, 50, 70, and 30. The mean salary is simply

\[ \frac{80 + 30 + 40 + 50 + 70 + 30}{6} = \frac{300}{6} = 50, \]

or $50,000.


EXAMPLE OF COMPUTING THE MEDIAN


Let's compute the median salary of the six Happytail employees.

EXAMPLE OF COMPUTING THE MODE


Given that the employee salaries are 80, 30, 40, 50, 70, and 30, what is the mode? Two employees make exactly $30,000, and no other salary occurs more than once. Therefore,
the mode is $30,000.

If another employee were hired at a salary of $80,000, there would be two modes: $30,000 and $80,000.

If, instead, one of the two employees making $30,000 were to leave, there would be no mode.
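The three scenarios above (the original staff, an added $80,000 hire, and the departure of one $30,000 employee) can be checked with a small helper. The function `modes` below is a hypothetical sketch that follows this section's conventions: it returns every value tied for the highest frequency, and an empty list when no value repeats.

```python
from collections import Counter


def modes(xs):
    """All values tied for the highest frequency; [] if no value repeats."""
    counts = Counter(xs)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no number occurs more than once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


salaries = [80, 30, 40, 50, 70, 30]     # thousands of dollars
print(modes(salaries))                  # [30]
print(modes(salaries + [80]))           # [30, 80]
print(modes([80, 40, 50, 70, 30]))      # []
```

We wrote our own helper rather than using `statistics.multimode` from Python's standard library because `multimode` returns every value when none repeats, whereas this section says such a data set has no mode.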

USING EXCEL TO COMPUTE MEASURES OF CENTRAL TENDENCY

For large data sets, calculating the mean, median, and mode is difficult to do manually. Excel's AVERAGE, MEDIAN, and MODE functions make it simple.

Please download the file Colts.xlsx. The file contains the number of yards gained on all passing plays attempted by the 2006 Super Bowl Champion Indianapolis Colts. You can
use Excel to compute the mean, median, and mode of the number of yards gained on a passing play.

EXERCISES
(1) Ten geography majors at the University of North Carolina had the following starting salaries (in thousands of dollars): 20, 25, 30, 28, 35, 20, 20, 25, 40, and 757. Find the
mean, median, and mode of these salaries. Which seems to be the best measure of a typical geography major's starting salary?

(2) Find the mean, median, and mode for both data sets in the file histogramdata.xlsx.

Skewness and Measures of Central Tendency

INTRODUCTION

The mode is rarely used as a measure of central location. If a shoe store could only stock one size, it would probably stock the modal shoe size. In most situations, however, we
use the mean or median as a measure of central location for a data set. In general, we use the mean as a measure of central location unless extreme values greatly distort the
mean. The U.S. government reports family income for the country as a whole as a median, not a mean. A football team's offense is assessed in terms of the average, not the
median, points scored per game. Why use the median in the first situation and the mean in the second? The answer is that people with large incomes distort, or skew, the mean
family income; the median is not subject to that distortion. Before identifying precisely when to use the mean or median as a measure of central tendency, let's return to the topic
of histograms and define the concept of skewness.

SYMMETRIC DATA
A data set is symmetric if the data set's histogram has a single peak at the center and "looks the same" to the left and right of the most likely value of the data. The following
histogram displays IQs of students at Smalltown High School.


A symmetric data set's mean, median, and mode are approximately equal because the peak is at the center and the data decline at the same rate to the left and to the right of
the peak. For example, nearly as many people have IQs around 85 as have IQs around 115.

POSITIVELY SKEWED DATA


A data set exhibits positive skewness (or is "skewed right") if its histogram has a single peak and the values of the data extend much farther to the right than to the left of the
peak. The following histogram describes the family income (in thousands of dollars) of Smalltown's residents.

The histogram shows that the most common income range is $30,000 to $50,000. Some people earn more than $300,000, whereas some earn $10,000 or less. Because the data
extend farther to the right of the peak than to the left, family incomes in Smalltown are positively skewed.

NEGATIVELY SKEWED DATA


A data set exhibits negative skewness (or is "skewed left") if its histogram has a single peak and the values of the data extend much farther to the left than to the right of the peak.
The following histogram shows the number of days from conception to birth for babies born at Smalltown Hospital.

The most common category is "more than 280 days." Because the data extend much farther to the left of the highest bar than to the right, days from conception to birth is
negatively skewed.

THE EXCEL SKEW FUNCTION

If the data you are analyzing are not skewed, use the mean as the measure of central tendency. In cases of great skewness, use the median as the measure of central tendency to
avoid distortion by extreme values.

You can usually assess skewness by simply eyeballing a histogram. To be precise about measuring skewness, apply the Excel SKEW function to a data set.

If SKEW > +1, the data are positively skewed and the median is the better measure of central tendency.
If SKEW < -1, the data are negatively skewed and the median is again the better measure of central tendency.
If SKEW is between -1 and +1, the data are relatively symmetric and the mean is the better measure of central tendency.
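Excel documents its SKEW function as \(\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{S}\right)^3\), where S is the sample standard deviation. The sketch below implements that formula together with the ±1 rule of thumb above; the function names and the small data sets are illustrative assumptions, not part of Skewness.xlsx.

```python
import math


def skew(xs):
    """Sample skewness, following the formula documented for Excel's SKEW:
    n / ((n - 1)(n - 2)) * sum over i of ((x_i - x_bar) / S) ** 3."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))  # sample std. dev.
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in xs)


def preferred_center(xs):
    """Rule of thumb from this section: median if |SKEW| > 1, otherwise mean."""
    return "median" if abs(skew(xs)) > 1 else "mean"


right_skewed = [1, 1, 1, 1, 10]        # illustrative data with one large value
print(round(skew(right_skewed), 3))    # 2.236
print(preferred_center(right_skewed))  # median
print(preferred_center([1, 2, 3]))     # mean
```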

MEAN OR MEDIAN?
Please download the file Skewness.xlsx. Let's compare the mean and the median as measures of central tendency for the IQ, income, and conception-to-birth data sets. In the cell
range D3:F3, the skewness for each data set has been computed — e.g., using the formula =SKEW(D8:D657) for IQs in cell D3. The median, mode, and mean for each data set
have been computed using the MEDIAN, MODE, and AVERAGE functions, respectively.


The results reveal which measure of central tendency is best for each data set.

RELATION BETWEEN MEAN, MEDIAN, AND SKEWNESS


For positively skewed data sets, the mean is greater than the median. For negatively skewed data sets, the mean is less than the median. For relatively symmetric data sets, the
mean and median are usually very close in value. The three example data sets are consistent with these rules.

IQs are symmetric, and the mean (100.04) and the median (100) are virtually identical.
Income is positively skewed, and the mean (67.745) is larger than the median (48).
Days from conception to birth is negatively skewed, and the mean (259.9) is smaller than the median (269.5).


EXERCISES
(1) Please download the file Income.xlsx. The file contains data that are representative of the income of U.S. families (adjusted for inflation) during the years 1975, 1985, 1995, and
2005. Does it appear that Americans were becoming better off as the decades passed?

(2) For the Colts' passing data, what measure of central location would you use?

Measures of Variability
INTRODUCTION

Sarah Lopez Clooney is trying to determine in which of two stocks to invest a client's money. For each of the last six years, the annual percentage returns (expressed as a
decimal) for the stocks were as follows:

Stock 1: .18, .22, .20, .20, .19, .21

Stock 2: -.4, .8, -.4, .8, -.4, .8

For each stock, the mean and median return for the last six years is .2. Therefore, the stocks are identical with respect to "typical" value. If you assume (naively) that the past is a
good predictor of the future, these two stocks seem to be equally good investments. Most investors, however, would choose Stock 1, because its annual returns are more
consistent than those on Stock 2. In this segment, you will learn how to use variance and standard deviation to measure the dispersion, or spread, of the data set about its mean.

DEFINITION OF SAMPLE VARIANCE


Suppose that you have a data set of n observations x1, x2, ..., xn. The sample variance of the data set (written as \(S^2\)) is

\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2. \]

If you divided by n instead of n - 1, the result would be the average squared deviation of the data points from their mean. The reasons why you should divide by n - 1
instead of n are complex enough to defer to your statistics class.

The kind of data set that has the least spread about its mean is, not surprisingly, one in which all points have the same value and, thus, all equal the mean. Such a data set has a
sample variance of zero.

DEFINITION OF SAMPLE STANDARD DEVIATION


The sample standard deviation S is simply the square root of the sample variance. The sample standard deviation is often used as a measure of spread, or dispersion, in a data set
because it has the same units as the data. For example, if your data are in dollars, the sample variance is given in dollars squared. But what exactly is a dollar
squared? Taking the square root of the variance to obtain the standard deviation returns the units to dollars.

COMPUTING THE SAMPLE VARIANCE


How can Sarah determine which of the two stocks is the better investment? Since both stocks have the same average return, Sarah will want to recommend the
less risky investment. Risk is often measured by the standard deviation of a stock's annual return. Therefore, you need to calculate the standard deviation of each stock's annual
return. You do this by computing the variance of each stock's annual return and then taking the square root of that variance. Because each stock's mean return is .2, the
sample variances are

\[ S_1^2 = \frac{(-.02)^2 + (.02)^2 + 0^2 + 0^2 + (-.01)^2 + (.01)^2}{6 - 1} = \frac{.001}{5} = .0002 \]

\[ S_2^2 = \frac{(-.6)^2 + (.6)^2 + (-.6)^2 + (.6)^2 + (-.6)^2 + (.6)^2}{6 - 1} = \frac{2.16}{5} = .432 \]

COMPUTING THE SAMPLE STANDARD DEVIATION


For each stock, the sample standard deviation is simply the square root of the sample variance. Therefore, for Stock 1, the sample standard deviation is \(\sqrt{.0002} = .0141\). The
sample standard deviation for Stock 2 is \(\sqrt{.432} = .657\). Since Stocks 1 and 2 have the same mean and Stock 2 has a much larger standard deviation than Stock 1, most investors
would prefer the less risky Stock 1 to Stock 2.
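Sarah's calculation can be reproduced directly from the definitions of sample variance and sample standard deviation. In this sketch the helper names are our own; the last line cross-checks against Python's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics


def sample_variance(xs):
    """S^2 = sum of (x_i - x_bar)^2, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_std_dev(xs):
    """S = square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(xs))


stock1 = [.18, .22, .20, .20, .19, .21]
stock2 = [-.4, .8, -.4, .8, -.4, .8]

for name, returns in [("Stock 1", stock1), ("Stock 2", stock2)]:
    print(name, round(sample_variance(returns), 4), round(sample_std_dev(returns), 4))
# Stock 1 0.0002 0.0141
# Stock 2 0.432 0.6573

# The standard library's statistics.variance also divides by n - 1.
assert math.isclose(sample_variance(stock2), statistics.variance(stock2))
```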

USING EXCEL
For any data set, Excel makes it easy to find the sample variance and standard deviation. Use the VAR function to find the sample variance and the STDEV function to compute
the sample standard deviation.

Please download the file Samplevariance.xlsx. In the file, the VAR and STDEV functions have been used to determine the sample variance and standard deviation for each stock.
For example, for Stock 1 in cell C13, the sample variance has been computed with the formula =VAR(C7:C12), and in cell C14 the sample standard deviation has been computed
with the formula =STDEV(C7:C12).

EXERCISE
(1) The heights (in inches) of the members of Smalltown High School girls' basketball team are 68, 70, 64, 62, and 68. Compute the sample variance and sample standard
deviation of these heights. Use Excel to verify your computations.

The Rule of Thumb and Outliers


INTRODUCTION

William Edwards Deming (1900-1993) was an American quality-control guru who stressed the importance of understanding "normal variation" in a business process. When a
data set has a symmetric histogram (skewness between -1 and +1), you can usually gain insight into the "normal range of variation for a data set" by relying on the following rule
of thumb involving the sample mean x-bar and sample standard deviation S:

68% of the data points are within S of the mean (between x-bar − S and x-bar + S).
95% of the data points are within 2S of the mean (between x-bar − 2S and x-bar + 2S).
99.7% of the data points are within 3S of the mean (between x-bar − 3S and x-bar + 3S).

Any data point that is more than 2S from the mean is designated an unusual observation or outlier. Deming showed how identifying the cause of "unfavorable" outliers can help
you prevent them from occurring again. Let's now apply these ideas to the distribution of IQs. The graph would look like this:

COMPUTING RULE OF THUMB LIMITS FOR MONTHLY STOCK RETURNS


Please download the file Cisco.xlsx. The file contains monthly stock returns for Cisco Systems during the 1990s. Use the rule of thumb to decipher the "normal variation" in
monthly returns of Cisco. In cells E7:E12, the bounds for the rule of thumb have been computed. The results and formulas used are shown here:

Cells E3:E5 reflect computations of the mean (0.055), standard deviation (0.122), and skewness for the monthly returns. The skewness of .104 indicates that the Cisco returns
are symmetric, so you would expect the rule of thumb to be approximately valid for this data set.

Computing the limits for the rule of thumb in cells E7:E12 reveals that

99.7% of the monthly returns should be between .055 ± 3 (.122) — i.e., between -31% and 42%.
95% of the monthly returns should be between .055 ± 2 (.122) — i.e., between -19% and 30%.
68% of the monthly returns should be between .055 ± .122 — i.e., between -7% and 18%.
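The rule-of-thumb bounds are just the mean plus or minus 1, 2, or 3 standard deviations. This sketch plugs in the Cisco mean (0.055) and standard deviation (0.122) quoted above; the helper names are hypothetical, and `is_outlier` applies this section's definition of an outlier (more than 2S from the mean).

```python
def rule_of_thumb_limits(mean, sd):
    """Return {k: (mean - k*sd, mean + k*sd)} for k = 1, 2, 3.

    For roughly symmetric data, about 68%, 95%, and 99.7% of the
    observations are expected to fall inside these intervals."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


def is_outlier(x, mean, sd):
    """This section's definition: more than 2 standard deviations from the mean."""
    return abs(x - mean) > 2 * sd


# Cisco monthly returns, using the mean and S quoted above
limits = rule_of_thumb_limits(0.055, 0.122)
for k, (low, high) in limits.items():
    print(f"within {k}S: {low:.1%} to {high:.1%}")
# within 1S: -6.7% to 17.7%
# within 2S: -18.9% to 29.9%
# within 3S: -31.1% to 42.1%
```

Rounded to whole percentages, these are the limits listed above. With the same inputs, `is_outlier(0.28, 0.055, 0.122)` and `is_outlier(-0.15, 0.055, 0.122)` are both `False`.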

HOW WELL DOES THE RULE OF THUMB DESCRIBE MONTHLY STOCK RETURNS?
According to the example, "normal variation" for monthly Cisco returns is between -19% and 30%. Therefore, a month in which Cisco returned, say, 28% or -15% would not be
surprising. Any month during which Cisco returned less than -19% or more than 30% would be an outlier.

Highlighted in gray are Cisco monthly returns that fell within one standard deviation of the mean. Returns shown in light or dark orange fell more than one standard deviation
from the mean in either direction, and the dark orange bars represent returns that fell more than 2S from the mean. No returns fell more than 3S from the mean. Of the 130
monthly returns that constitute our data set, 9, or 6.9%, deviated from the mean by more than 2S, which is close to the rule-of-thumb prediction of 5%. Of the 130 returns, 43, or
33% (close to the rule-of-thumb prediction of 32%), were more than S from the mean.


EXERCISES
The file Cisco.xlsx also contains monthly stock returns for GM and Microsoft. Use these data to answer the following questions.

(1) You would expect 95% of the Microsoft monthly returns to be between __________ and ________.

(2) You would expect 68% of the GM monthly returns to be between __________ and __________.

Covariance and Correlation


INTRODUCTION

So far in our study of statistics we have discussed how to use measures of central tendency and variability to summarize a data set. We now turn our attention to studying how
to measure the strength of the relationship between two data sets. For example, how is the price of a house related to the size of the house? How are the returns on two stocks
related? How is a high school senior's SAT score related to his college GPA? The relationship between two data sets is usually measured by the covariance and correlation
between the two data sets.

COVARIANCE DEFINITION
Given n points (x1, y1), (x2, y2), ..., (xn, yn), the covariance between data sets X and Y is given by

\[ \text{Covariance}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} \]

Suppose that X and Y tend to go up and down together. That is, when X is larger than average, Y is usually larger than average, and when X is smaller than average, Y is
usually smaller than average. Then most of the terms in the numerator of the covariance formula will be positive, and the covariance will be positive. Conversely, suppose that
when X is larger than average, Y is usually smaller than average, and when X is smaller than average, Y is usually larger than average. Then most of the terms in the
numerator of the covariance formula will be negative, and the covariance will be negative. Therefore, if X and Y "covary" in the same direction, their covariance will be positive,
whereas if X and Y covary in opposite directions, their covariance will be negative.

In summary, a positive covariance indicates that X and Y tend to go up or down together, whereas a negative covariance indicates that X and Y tend to move in opposite directions
(relative to their averages). Note that covariance captures only linear relationships and is not useful for detecting nonlinear relationships between variables; that is,
covariance is a measure of linear association between two variables.

SAMPLE DATA FOR COMPUTING COVARIANCE


Please download the file Correlcov.xlsx. The file gives the size in square feet and price (in dollars) of five houses in Smalltown.


When you graph these five points in the x-y plane it becomes clear that bigger houses tend to sell for a higher price.

SAMPLE DATA FOR COMPUTING COVARIANCE


Here X = size of house in square feet and Y = price of house. For example, x1 = 1500 and y1 = $140,000.

Note that for houses 1 and 2, both size and price are below average, whereas for houses 4 and 5, size and price are above average. For house 3, size is average and price is slightly
above average. Therefore, you expect that the covariance between home size and price will be positive.

COMPUTING THE COVARIANCE


We find that

\[ \bar{x} = \frac{1{,}500 + 2{,}000 + 2{,}500 + 3{,}000 + 3{,}500}{5} = 2{,}500 \text{ square feet} \]

and

\[ \bar{y} = \frac{140{,}000 + 260{,}000 + 330{,}000 + 345{,}000 + 420{,}000}{5} = \$299{,}000 \]

Applying the formula for covariance, you find that

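\[
\begin{aligned}
\text{Covariance}(X,Y) = \frac{1}{5-1}\big[&(1{,}500 - 2{,}500)(140{,}000 - 299{,}000) + (2{,}000 - 2{,}500)(260{,}000 - 299{,}000) \\
&+ (2{,}500 - 2{,}500)(330{,}000 - 299{,}000) + (3{,}000 - 2{,}500)(345{,}000 - 299{,}000) \\
&+ (3{,}500 - 2{,}500)(420{,}000 - 299{,}000)\big] = \frac{322{,}500{,}000}{4}
\end{aligned}
\]

or Covariance(X,Y) = 80,625,000 sq. ft. dollars.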
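The covariance calculation can be checked with a direct translation of the formula. This is a minimal Python sketch (the function name `sample_covariance` is our own), using the five Smalltown houses.

```python
def sample_covariance(xs, ys):
    """Covariance(X, Y) = sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length data sets with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


size = [1500, 2000, 2500, 3000, 3500]                  # square feet
price = [140_000, 260_000, 330_000, 345_000, 420_000]  # dollars
print(sample_covariance(size, price))  # 80625000.0
```

Python 3.10 and later also include `statistics.covariance`, which likewise divides by n - 1.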

The positive covariance indicates that home size and home price tend to go up and down together. As you will now see, however, it is difficult to interpret the magnitude of the
covariance.

COVARIANCE DEPENDS ON UNITS!


As we will now show, the covariance between two variables depends on the units in which the variables are measured. This makes it very difficult to interpret the magnitude of a
covariance. Continuing with our home example, suppose that we measure the size of a home in thousands of square feet and the price in units of $100,000. Then our
data look like this:

For example, 1,500 square feet is 1.5 thousand square feet, whereas the $140,000 home price is 1.4 (in units of $100,000). In each term of the covariance numerator, the home-size
deviation is now divided by 1,000 and the home-price deviation is divided by 100,000. This means that each term in the numerator of the covariance is divided by
(1,000)(100,000), or 100 million. Therefore, the covariance becomes the original covariance of 80,625,000 divided by 100,000,000, which yields a covariance of .80625,
measured in units of (thousands of square feet) × ($100,000s). Since covariance depends on the units in which the data are measured, interpreting the
magnitude of a covariance is difficult. We now turn our attention to the correlation coefficient (called r), which is a unit-free measure of the strength of a linear
relationship between two variables.
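A quick numerical check of the unit argument: rescale both variables and recompute. The helper below repeats the covariance formula so that the snippet stands alone; its name is our own.

```python
def sample_covariance(xs, ys):
    """Sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


size_sqft = [1500, 2000, 2500, 3000, 3500]
price_usd = [140_000, 260_000, 330_000, 345_000, 420_000]

# Re-express size in thousands of sq. ft. and price in units of $100,000.
size_k = [s / 1_000 for s in size_sqft]
price_100k = [p / 100_000 for p in price_usd]

print(sample_covariance(size_sqft, price_usd))  # 80625000.0
print(sample_covariance(size_k, price_100k))    # approximately 0.80625
```

The second result is the first divided by (1,000)(100,000); only the units changed, not the data.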

CORRELATION DEFINITION
The Pearson correlation (usually denoted by r) is a unit-free measure of the degree of linear association between two data sets X and Y. Given n points (x1, y1), (x2, y2), ..., (xn, yn),
the correlation between data sets X and Y is given by

\[ r = \text{Correlation}(X,Y) = \frac{\text{Covariance}(X,Y)}{S_X S_Y} \]

Here \(S_X\) = sample standard deviation of X and \(S_Y\) = sample standard deviation of Y.

It can be shown that for any set of n points, -1 ≤ r ≤ 1. The correlation r is a unit-free measure of the degree of linear association between the data sets X and Y. Values of r may be
interpreted as follows:

Values of r near -1 indicate a strong negative linear relationship between X and Y. When X is larger than average, Y is almost always smaller than average; when X is
smaller than average, Y is almost always larger than average.
Values of r near -.5 indicate a moderate negative linear relationship between X and Y. When X is larger than average, Y tends to be smaller than average; when X is
smaller than average, Y tends to be larger than average.
Values of r near 0 indicate a weak linear relationship between X and Y. When X is larger than average, Y has little or no tendency to be larger or smaller than average.
Similarly, when X is smaller than average, Y has little or no tendency to be larger or smaller than average.
Values of r near +.5 indicate a moderate positive linear relationship between X and Y. When X is larger than average, Y tends to be larger than average; when X is smaller
than average, Y tends to be smaller than average.
Values of r near +1 indicate a strong positive linear relationship between X and Y. When X is larger than average, Y is almost always larger than average; when X is
smaller than average, Y is almost always smaller than average.

EXAMPLE OF A STRONG POSITIVE LINEAR RELATIONSHIP


Plotted below are the size in square feet and the price in dollars for a sample of Smalltown homes.


The correlation between home size and home price is r = .90, which indicates a strong positive linear relationship between home size and home price. This high positive
correlation is consistent with the fact that the data points are tightly scattered about a straight line that has a positive slope.

EXAMPLE OF A MODERATE NEGATIVE LINEAR RELATIONSHIP


Plotted below are the daily prices of lasagna dinners and the number of lasagna dinners sold at Smalltown's Italian restaurant. The correlation between price and demand is r =
-.50, indicating a moderate negative linear relationship between price and demand. Note that the data points are widely scattered about a line that has a negative slope.

EXAMPLE OF A WEAK LINEAR RELATIONSHIP


Plotted below are monthly returns on Microsoft and GM.


Note that the correlation of r = .06 is near 0. That result is reflected in the weak linear relationship shown in the graph.

EXAMPLE OF COMPUTING A CORRELATION


Recall that for the following five data points

Covariance(Home Size, Home Price) = 80,625,000 sq. ft. dollars.

Let's find Correlation(Home Size, Home Price). Simply compute \(S_{\text{Home Size}}\) and \(S_{\text{Home Price}}\).

Since Mean Size = 2500 sq. ft. and Mean Price = $299,000, we find that

\[ S_{\text{Home Size}} = \sqrt{\frac{(1500-2500)^2 + (2000-2500)^2 + (2500-2500)^2 + (3000-2500)^2 + (3500-2500)^2}{4}} = 790.57 \text{ sq. ft.} \]

\[ S_{\text{Home Price}} = \sqrt{\frac{(140{,}000-299{,}000)^2 + (260{,}000-299{,}000)^2 + (330{,}000-299{,}000)^2 + (345{,}000-299{,}000)^2 + (420{,}000-299{,}000)^2}{4}} = \$105{,}498.82 \]

Therefore,

\[ \text{Correlation}(\text{Home Size}, \text{Home Price}) = \frac{80{,}625{,}000}{(790.57)(105{,}498.82)} = .967 \]

Note that the units of the numerator are sq. ft. dollars. These are also the units of the denominator. Therefore, the correlation is unit-free.
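The whole correlation calculation fits in a few lines. In this sketch (hypothetical helper `pearson_r`), the covariance and both standard deviations divide by n - 1, so those factors cancel in the ratio; the second print confirms that rescaling the units, as in the thousands-of-square-feet example, leaves r unchanged.

```python
import math


def pearson_r(xs, ys):
    """r = Covariance(X, Y) / (S_X * S_Y), with every term dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (s_x * s_y)


size = [1500, 2000, 2500, 3000, 3500]                  # square feet
price = [140_000, 260_000, 330_000, 345_000, 420_000]  # dollars
print(round(pearson_r(size, price), 3))  # 0.967

# Rescaling either variable leaves r unchanged:
print(round(pearson_r([s / 1_000 for s in size], [p / 100_000 for p in price]), 3))  # 0.967
```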

USING EXCEL TO COMPUTE COVARIANCE


Recall the definition of covariance:

\[
\text{Covariance}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}
\]
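The definition translates directly into code. In the Python sketch below (the function names are our own), each deviation product for the home data is nonnegative: the larger homes lie above both means and the smaller homes below both, which is why the covariance is positive.

```python
def deviation_products(xs, ys):
    """(x_i - x_bar) * (y_i - y_bar) for each data point."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]


def sample_covariance(xs, ys):
    """Covariance(X, Y): the sum of the deviation products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    return sum(deviation_products(xs, ys)) / (len(xs) - 1)


sizes = [1500, 2000, 2500, 3000, 3500]                 # X: home size, sq. ft.
prices = [140_000, 260_000, 330_000, 345_000, 420_000]  # Y: home price, dollars

print(deviation_products(sizes, prices))
# [159000000.0, 19500000.0, 0.0, 23000000.0, 121000000.0]
print(sample_covariance(sizes, prices))  # 80625000.0
```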

If the values of X are in range1 of our spreadsheet and the values of Y are in range2, then the Excel function COVAR(range1, range2) computes

\[
\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}.
\]

This is called the population covariance. In most uses of covariance, you should divide by n - 1, which yields the sample covariance. To convert Excel's covariance to a
sample covariance, multiply the result of the COVAR function by n/(n - 1). In the current example n = 5, so you obtain the sample covariance by multiplying the result of
the COVAR function by 5/4 = 1.25.
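A short Python sketch of the conversion, assuming (as stated above) that COVAR divides by n. The function population_covariance is our own stand-in for COVAR, not an Excel API.

```python
sizes = [1500, 2000, 2500, 3000, 3500]                 # home size, sq. ft.
prices = [140_000, 260_000, 330_000, 345_000, 420_000]  # home price, dollars
n = len(sizes)


def population_covariance(xs, ys):
    """Divides the sum of deviation products by n, as COVAR does."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


pop_cov = population_covariance(sizes, prices)
sample_cov = pop_cov * n / (n - 1)  # multiply by n/(n - 1) = 5/4 = 1.25
print(pop_cov, sample_cov)  # 64500000.0 80625000.0
```

In Python 3.10 or later, statistics.covariance returns the sample (n - 1) version directly.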

Please download the file Correlcov.xlsx.


In cell E23, compute the sample covariance (80,625,000 sq. ft. dollars) by entering the formula
=(5/4)*COVAR(D16:D20,E16:E20).

USING EXCEL TO COMPUTE CORRELATIONS


If the values of X are in range1 of our spreadsheet and the values of Y are in range2, then the Excel function CORREL(range1, range2) computes the correlation between X and Y.
No n/(n - 1) adjustment is needed here: whether you divide by n or by n - 1, the same factor appears in the covariance and in both standard deviations, so it cancels.
For example, entering in cell E22 the formula =CORREL(D16:D20,E16:E20) returns the .967 correlation between home size and home price.
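The cancellation argument can be checked numerically. In this sketch (the helper name correlation_with_divisor is hypothetical), the correlation is computed once with every quantity divided by n and once with every quantity divided by n - 1; both give .967.

```python
import math


def correlation_with_divisor(xs, ys, divisor):
    """Covariance / (S_X * S_Y), where all three quantities use the same divisor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / divisor
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / divisor)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / divisor)
    return cov / (s_x * s_y)


sizes = [1500, 2000, 2500, 3000, 3500]
prices = [140_000, 260_000, 330_000, 345_000, 420_000]
n = len(sizes)

r_population = correlation_with_divisor(sizes, prices, n)  # divide by n
r_sample = correlation_with_divisor(sizes, prices, n - 1)  # divide by n - 1
print(f"{r_population:.3f} {r_sample:.3f}")  # 0.967 0.967
```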

DOES A STRONG CORRELATION IMPLY CAUSATION?


People often assume that if X and Y have a correlation near +1 or -1, then X causes Y or Y causes X. However, this need not be true. For example, let \(X_i\) = the number of restaurants
in city i and \(Y_i\) = the number of swimming pools in city i. For the hundred largest cities in the U.S., you would almost certainly find a correlation near +1. Does that mean that
the presence of restaurants in a city leads to the establishment of swimming pools? Or does the strong correlation imply that the presence of swimming pools causes the opening of
restaurants? Of course not! The strong correlation arises because large cities have many swimming pools and many restaurants, while small cities have few of each. In effect, this
spurious correlation between restaurants and swimming pools is caused by a third variable: the size of a city. In general, a correlation alone cannot establish causality; doing so
requires carefully designed experiments or advanced statistical methods.
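A small simulation makes the third-variable explanation concrete. The model below is entirely hypothetical, with invented parameters: city size drives both counts, and neither count affects the other. The raw correlation between restaurants and pools comes out near +1. Removing the straight-line effect of city size from each count (the helper names correlation and residuals are our own) leaves two series whose correlation is near 0.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def residuals(xs, ys):
    """What is left of ys after subtracting the least-squares line fitted on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical model: city size (population, thousands) drives both counts.
city_size = [rng.uniform(100, 8000) for _ in range(100)]
restaurants = [2.0 * s + rng.gauss(0, 300) for s in city_size]
pools = [0.5 * s + rng.gauss(0, 100) for s in city_size]

r_raw = correlation(restaurants, pools)
# Correlate what is left after removing the effect of city size from each count.
r_size_removed = correlation(residuals(city_size, restaurants), residuals(city_size, pools))

print(f"restaurants vs. pools: r = {r_raw:.2f}")              # near +1
print(f"after removing city size: r = {r_size_removed:.2f}")  # near 0
```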

EXERCISES
(1) Annual returns on Hot Cakes Amalgamated and Bridges Consolidated stocks for the last five years are given below.

Find the covariance and correlation between the Hot Cakes and Bridges annual returns.

(2) Please download the file Nfldata.xlsx. The file contains the points scored and the punts attempted by each NFL team during the 2008 season. Compute the covariance and
correlation between punts and points scored.

(3) For the data in the file Nfldata.xlsx, each team played 16 games. Compute points scored per game and punts attempted per game. Now compute the covariance and
correlation between these two data sets.

(4) There is a moderate negative correlation between points scored and punts attempted. Therefore, the SportsCenter anchors will sometimes say that the less you punt, the more
points you score. Hence you should never punt! What is wrong with this argument?

Appendices


Appendix A: Mathematics Concept Summary


MATHEMATICS CONCEPT SUMMARY

You may find it useful to refer to the Mathematics for Management Concept Summary while taking the course. This .pdf document is available in the Briefcase as well.
Appendix B: Exercise Solutions
EXERCISE SOLUTIONS

As you work through the exercises at the end of each section, you may find it helpful to check your answers for accuracy. Below are links to spreadsheets that contain the
answers to each exercise presented in the tutorial. The answer sheets are organized by chapter for your convenience. You can also download these items from the Briefcase at any
time.

Algebra - algebraanswers.xlsx

Calculus - calculusanswers.xlsx

Statistics - statisticsanswers.xlsx

Probability - probabilityanswers.xlsx

Finance - financeanswers.xlsx

Final Exam Introduction

Welcome to the final exam for the Mathematics for Management tutorial. This test will allow you to assess your knowledge of Mathematics for Management.
All questions must be answered for your exam to be scored.

Navigation:

To advance from one question to the next, select one of the answer choices or, if applicable, complete with your own choice and click the “Submit” button. After submitting your
answer, you will not be able to change it, so make sure you are satisfied with your selection before you submit each answer. You may also skip a question by pressing the forward
advance arrow. Please note that you can return to “skipped” questions using the “Jump to unanswered question” selection menu or the navigational arrows at any time. Although
you can skip a question, you must navigate back to it and answer it: all questions must be answered for the exam to be scored.

Your results will be displayed immediately upon completion of the exam.

After completion, you can review your answers at any time by returning to the exam.

Good luck!

Copyright Harvard Business School Publishing. Copying or posting is an infringement of copyright. Permissions@hbsp.harvard.edu or 617-783-7860.
