Assignment No1 ML
1. Calculate the quarterly growth rate over the available data (1995.1 to 1998.4). The growth
rate can be calculated as:
Growth Rate = (Sales in Quarter N - Sales in Quarter N-1) / Sales in Quarter N-1
2. Calculate the average growth rate as the mean of the quarterly growth rates from step 1.
3. Forecast the sales for each quarter of 1999 by applying the average growth rate to the
sales in the last quarter of 1998 (1998.4).
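The steps above can be sketched in Python. This is a minimal illustration, not the assignment's actual working: the function names and the three sales figures are hypothetical placeholders (the gasoline data are not reproduced here), and the sketch assumes the average growth rate is compounded quarter by quarter starting from the last observed quarter.

```python
def quarterly_growth_rates(sales):
    """Growth rate of each quarter relative to the quarter before it."""
    return [(curr - prev) / prev for prev, curr in zip(sales, sales[1:])]


def forecast_with_average_growth(sales, periods=4):
    """Forecast `periods` quarters by compounding the average growth rate."""
    rates = quarterly_growth_rates(sales)
    avg_rate = sum(rates) / len(rates)
    forecasts = []
    level = sales[-1]  # start from the last observed quarter
    for _ in range(periods):
        level *= 1 + avg_rate  # assumption: growth compounds each quarter
        forecasts.append(level)
    return avg_rate, forecasts


# Hypothetical placeholder figures, NOT the assignment's gasoline data.
example_sales = [100.0, 110.0, 121.0]
avg_rate, forecasts = forecast_with_average_growth(example_sales)
print(f"average growth rate = {avg_rate:.3f}")
print("forecasts:", [round(x, 2) for x in forecasts])
```

With the real data, `example_sales` would hold the sixteen quarterly values from 1995.1 to 1998.4 in chronological order.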
So, the estimated gasoline sales in the United States for each quarter of 1999 are
approximately as follows (rounded to the nearest thousand barrels):
To find the product moment correlation coefficient (Pearson correlation coefficient) between
the two variables m and v, you can use the following formula:
r = Σ((m - ȳ)(v - ẍ)) / √[Σ(m - ȳ)² * Σ(v - ẍ)²]
Where:
- Σ represents the sum of values.
- m and v are the data points.
- ȳ and ẍ are the means of m and v, respectively.
Calculate the means, then apply the formula to find the product moment correlation coefficient:
Mean of m (ȳ) = (1370 + 1350 + 1400 + 1330 + 1270 + 1210 + 1330 + 1350) / 8 = 10610 / 8 = 1326.25
Mean of v (ẍ) = (2450 + 2480 + 2540 + 2420 + 2350 + 2290 + 2400 + 2460) / 8 = 19390 / 8 = 2423.75
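The calculation can be checked with a short Python sketch. It uses the eight (m, v) pairs listed in the mean calculations and applies the formula directly; the function name `pearson_r` is just an illustrative choice.

```python
import math

# Data copied from the mean calculations above.
m = [1370, 1350, 1400, 1330, 1270, 1210, 1330, 1350]
v = [2450, 2480, 2540, 2420, 2350, 2290, 2400, 2460]


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


mean_m = sum(m) / len(m)
mean_v = sum(v) / len(v)
r = pearson_r(m, v)
print(f"mean of m = {mean_m}, mean of v = {mean_v}, r = {r:.3f}")
```

For these data the sketch gives r of roughly 0.962, a strong positive linear association.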
b. Give a reason to support fitting a regression model of the form m = a + bv to these data:
To find the slope (b) of the regression line, you can use the formula for linear regression:
b = r * (Sy / Sx)
Where:
- r is the product moment correlation coefficient (from part a).
- Sy is the standard deviation of m.
- Sx is the standard deviation of v.
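As a rough check of this formula, the following Python sketch computes b as r * (Sy / Sx) and compares it with the equivalent direct form Σ((m - ȳ)(v - ẍ)) / Σ(v - ẍ)². The variable names are illustrative, and the sample standard deviation is assumed; the ratio Sy / Sx is the same if the population version is used for both.

```python
import math
import statistics

# Data copied from the mean calculations in part a.
m = [1370, 1350, 1400, 1330, 1270, 1210, 1330, 1350]
v = [2450, 2480, 2540, 2420, 2350, 2290, 2400, 2460]

mean_m = sum(m) / len(m)
mean_v = sum(v) / len(v)

s_mv = sum((mi - mean_m) * (vi - mean_v) for mi, vi in zip(m, v))
s_mm = sum((mi - mean_m) ** 2 for mi in m)
s_vv = sum((vi - mean_v) ** 2 for vi in v)

r = s_mv / math.sqrt(s_mm * s_vv)

s_y = statistics.stdev(m)  # standard deviation of m (sample version assumed)
s_x = statistics.stdev(v)  # standard deviation of v (sample version assumed)

b = r * (s_y / s_x)        # slope from the formula above
b_direct = s_mv / s_vv     # equivalent least-squares form

print(f"b = {b:.4f}, direct form = {b_direct:.4f}")
```

Both routes give a slope of roughly 0.74 for these data.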
m = a + bv
You already found the value of b in part c. To find a, you can use the means:
a = ȳ - b * ẍ
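A minimal Python sketch of this step, using the data from part a. The slope is recomputed here in its least-squares form so that the snippet runs on its own; the variable names are illustrative.

```python
# Data copied from the mean calculations in part a.
m = [1370, 1350, 1400, 1330, 1270, 1210, 1330, 1350]
v = [2450, 2480, 2540, 2420, 2350, 2290, 2400, 2460]

mean_m = sum(m) / len(m)
mean_v = sum(v) / len(v)

# Slope b in its least-squares form (equal to r * Sy / Sx).
s_mv = sum((mi - mean_m) * (vi - mean_v) for mi, vi in zip(m, v))
s_vv = sum((vi - mean_v) ** 2 for vi in v)
b = s_mv / s_vv

# Intercept from the means: the fitted line passes through (mean_v, mean_m).
a = mean_m - b * mean_v

print(f"m = {a:.2f} + {b:.4f} v")
```

For these data the intercept comes out at roughly -467.2, so the fitted line is approximately m = -467.2 + 0.74v.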
The value of b is the slope of the regression line. It indicates how much the amount of
money spent (m) changes, on average, for each one-unit increase in the number of visitors
(v), measured in the units in which v is recorded. A positive b means that as the number of
visitors increases, the amount of money spent also tends to increase; a negative b means the
opposite. For these data b is positive.
f. Use your answer to part (d) to estimate the amount of money spent when the number of
visitors to the UK in a month is 2,500,000:
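To estimate the amount of money spent (m) when the number of visitors is 2,500,000,
substitute the corresponding value of v into the equation from part d. The v values in the
data (2290 to 2540) appear to be recorded in thousands of visitors, so 2,500,000 visitors
corresponds to v = 2500:
m = a + b * 2500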
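The substitution can be sketched in Python as follows. It assumes, as the data suggest, that v is recorded in thousands of visitors, so 2,500,000 visitors is entered as 2500. The result is in whatever units m is recorded in, which are not stated here.

```python
# Data copied from the mean calculations in part a.
m = [1370, 1350, 1400, 1330, 1270, 1210, 1330, 1350]
v = [2450, 2480, 2540, 2420, 2350, 2290, 2400, 2460]

mean_m = sum(m) / len(m)
mean_v = sum(v) / len(v)

# Least-squares slope and intercept of m = a + b v.
b = sum((mi - mean_m) * (vi - mean_v) for mi, vi in zip(m, v)) / sum(
    (vi - mean_v) ** 2 for vi in v
)
a = mean_m - b * mean_v

v_new = 2500  # assumption: v is in thousands, so this is 2,500,000 visitors
estimate = a + b * v_new

print(f"estimated m at v = {v_new}: {estimate:.1f}")
```

With these data the estimate is roughly 1383.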
g. Comment on the reliability of your estimate in part (f). Give a reason for your answer:
The reliability of the estimate in part (f) depends on the reliability of the linear regression
model and the assumptions underlying it. Here are some factors to consider:
- Linearity: The reliability of the estimate assumes that the relationship between m and v is
linear. If the relationship is not truly linear, the estimate may be less reliable.
- Sample Size: The reliability of the estimate can be influenced by the sample size. A larger
sample size generally leads to more reliable estimates.
- Outliers: If there are outliers in the data, they can significantly impact the reliability of the
estimate.
- Assumptions: Linear regression assumes that the residuals (differences between actual
and predicted values) are normally distributed and have constant variance. Violation of these
assumptions can affect the reliability of the estimate.
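It's important to check these factors and assess the reliability of the estimate based on the
specific data and context. In this case, 2,500,000 visitors lies within the range of the
observed values of v (2290 to 2540, which appear to be in thousands), so the estimate is an
interpolation rather than an extrapolation, which supports its reliability.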
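Two of these checks can be sketched in Python: whether the new value of v lies inside the observed range, and how large the residuals of the fitted line are. This is only an illustration; it assumes v is recorded in thousands, and with eight data points any judgement about the residuals can only be rough.

```python
# Data copied from the mean calculations in part a.
m = [1370, 1350, 1400, 1330, 1270, 1210, 1330, 1350]
v = [2450, 2480, 2540, 2420, 2350, 2290, 2400, 2460]

mean_m = sum(m) / len(m)
mean_v = sum(v) / len(v)

# Least-squares slope and intercept of m = a + b v.
b = sum((mi - mean_m) * (vi - mean_v) for mi, vi in zip(m, v)) / sum(
    (vi - mean_v) ** 2 for vi in v
)
a = mean_m - b * mean_v

v_new = 2500  # assumption: v is in thousands, so this is 2,500,000 visitors

# Interpolation check: is the new value inside the observed range of v?
is_interpolation = min(v) <= v_new <= max(v)

# Residuals: actual m minus the value predicted by the fitted line.
residuals = [mi - (a + b * vi) for mi, vi in zip(m, v)]
largest_residual = max(abs(e) for e in residuals)

print(f"v = {v_new} inside observed range: {is_interpolation}")
print(f"largest absolute residual: {largest_residual:.1f}")
```

Residuals that are small relative to the values of m and show no obvious pattern against v would be consistent with the linear model being adequate over this range.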