Statistics Assignment 2: Matteo Conta Femba 9B

STATISTICS
ASSIGNMENT
Matteo Conta
FEMBA 9B
This study source was downloaded by 100000831236167 from CourseHero.com on 11-11-2022 06:57:03 GMT -06:00
https://www.coursehero.com/file/17221507/Assignment-2-Consumer-Research-Inc/
Matteo Conta, FEMBA9 -
Consumer Research, Inc.

Consumer Research, Inc. is an independent agency that conducts research on consumer
attitudes and behaviors for a variety of firms. In one study, a client asked for an investigation
of consumer characteristics that can be used to predict the amount charged by credit card
users. Data were collected on annual income, household size, and annual credit card charges
for a sample of 50 customers.
Managerial Report
1. Use methods of descriptive statistics to summarize the data. Comment on the findings.
I will first analyze the three variables separately, to see how the data is distributed.
Statistics
Income_($1000s) Household_Size Amount_Charged ($)
N 50 50 50
Mean 43.4800 3.4200 3964.0600
Median 42.0000 3.0000 4090.0000
Mode 30.00 2.00 3890.00
Std. Deviation 14.5507 1.7390 933.4941
Skewness .096 .528 -.130
Std. Error of .337 .337 .337
Skewness
Range 46.00 6.00 3814.00
Minimum 21.00 1.00 1864.00
Maximum 67.00 7.00 5678.00
2*Std.dev/mean 67% 101% 47%
Income_($1000s)
Household_Size Amount_Charged ($)
8
16 10
14
6 8
12
10 6
4
8
4
6
2
Std. Dev = 14.55 4 2
Frequ
Std . Dev = 9 33 . 49
Mean = 43.5
Frequ
Freque
Std. Dev = 1.74

M e a n = 3 96 4 .1
N = 50.00 2
0 Mean = 3.4 N = 5 0 .0 0
0
20.0 30.0 40.0 50.0 60.0 N = 50.00
0
25.0 35.0 45.0 55.0 65.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0
Income_($1000s) Household_Size Amount_Charged ($)
This study source was downloaded by 100000831236167 from CourseHero.co m

P agone112-1o1-f20922
https://www.coursehero.com/file/17221507/Assignment-2-Consumer-Research-
All three sets of data are significantly spread, since the standard deviation is a high percentage
of the mean. The income data seems to be uniformly distributed, as the histogram suggests.
The household size data is skewed, due to the fact that two persons compose the majority of
households. The amount charged data seems to be more normally distributed.
With the help of the correlation matrix, I will analyze the sample correlation coefficients
between each pair of variables:
Income Househ Amount

($1000s) old size Charged
Income Pearson 1.000 .173 .631
($1000s) Correlation
Sig. (2-tailed) . .231 .000
Household Pearson .173 1.000 .753
size Correlation
Sig. (2-tailed) .231 . .000
Amount Pearson .631 .753 1.000
charged Correlation
Sig. (2-tailed) .000 .000 .
Income_($1000s)
Household_Size
Amount_Charged ($)
There is a correlation between amount charged and income and between amount charged and
household size; on the other hand, it is possible to conclude with very high level of confidence

P agone113-1o1-f20922
(better that 99%) that there is no relationship between income and household size, as the
scatter matrix plot suggests visually.
2. Develop estimated regression equations, first using annual income as the
independent variable, and then using the household size as the independent variable.
Which variable is the better predictor of annual credit card charges? Discuss your
findings.
• Simple linear regression: Amount Charged vs. Annual Income
ˆ =2204
Yi + 40.48 X i
Where
ˆ
Yi = estimated, or predicted, Y value for Amount Charged in $
Xi = value of the independent variable, annual income in thousands of $
Significance of model:
Sum of df Mean F Sig.

Squares Square
Regressio 16999744 1 16999744 31.751 .000
n
Model is significant with more than 99% confidence level.
Residual analysis also shows no particular pattern and no problems of autocorrelation.
R R Square Adjusted R Square Std. Error of the Estimate

.631 .398 .386 732
About 40% of the variation in Amount Charged is explained by annual income.
The Standard Error of the Estimate is a quite significant portion of the possible predicted
values within the range: it is about 20% of the mode, and 40% of the minimum. This indicates
that the error in the prediction using this regression equation may be high, and I would
consider this model unacceptable if I were the client of Consumer Research, Inc.
P agone114-1o1-f20922
• Simple linear regression: Amount Charged vs. Household Size
ˆ =2581.94
Yi + 404.13 X i
Where
ˆ
Yi = estimated, or predicted, Y value for Amount Charged in $
Xi = value of the independent variable, household size

Squares Square
Regressio 2.4E+07 1 2.4E+07 63 .000
n
Residual analysis also shows no particular pattern and no problems of autocorrelation.

0.75 0.567 0.558 621
3
Household Size explains about 60% of the variation in Amount Charged.
The Standard Error of the Estimate is a quite significant portion of the possible predicted
values within the range: it is about 16% of the mode, and 33% of the minimum. This indicates
that the error in the prediction using this regression equation may be high, and I would
consider this model unacceptable if I were the client of Consumer Research, Inc.
Household Size is a better predictor of Amount Charged than Annual Income, since it does
explain about 60% of the variation in Amount Charged, against only about 40% for
Annual Income.
P agone115-1o1-f20922
3. Develop an estimated regression equation with annual income and household size
as the independent variables. Discuss your findings.
ˆ =1304.91 + X + X 2i
Yi
33.13 1i 356.3
Where
ˆ
Yi = estimated, or predicted Y value for Amount Charged, $
X 1i = value of the first independent variable Annual Income, in thousand of $
X 2i = value of the second independent variable, Household Size

Squares Square
Regressio 3.5E+07 2 1.8E+07 111 .000
n

0.91 0.826 0.8188 398
Household Size and Annual Income explain about 80% of the variation in Amount Charged.
The computer output also reports high level of significance for the single variables.
Residual analysis shows no violations of assumptions about the error term (trends, outliers,
patterns) and the functional form of the model. There is no problem of multicollinearity.
-1
-2
-3
-4
2000 3000 4000 5000 6000
Unstandardized Predicted Value
The Standard Error of the Estimate is reduced compared to the two previous simple linear
regression models; this means that the model has actually improved. However, the Standard

P agone116-1o1-f20922
Error of the Estimate is still a quite significant portion of the possible predicted values within
the range: it is about 10% of the mode, and 21% of the minimum. This indicates that the
error in the prediction using this regression equation may be high, and I would consider this
model unacceptable if I were the client of Consumer Research, Inc.
4. What is the predicted annual credit card charge for a three-person household with
an annual income of $40,000?
Yˆ + ×40 + 356.3 ×3 =3699

=1304.91 33.13
The predicted annual credit card charge for a three-person household with an annual
income of $40,000 is about $3,700. The 95% confidence interval for the mean of the annual
credit card charges for this values of income and household size is $3700 $120, while for an
individual prediction this interval widens to $3700 $800, which I would consider
unacceptable if I was the client of Consumer Research, Inc.
5. Discuss the need for other independent variables that could be added to the model.
What additional variables might be helpful?
Because the Standard Error of the Estimate is too big, as discussed in the previous questions, I
see the need to improve the model, with the introduction of new variables. In order for the
model to be really improved, new variables independent from the ones we already have need
to be found. This will help avoiding the problem of multicollinearity and improving the
Standard Error of the Estimate.
Following are some proposed variables with explanation on expected independence from the
ones we already have in the model.
P agone117-1o1-f20922
• Total household assets: I expect more money to be spent on credit card if the
household is rich in assets, which doesn’t necessarily mean rich in income. Simply
stated, I can afford to spend much of my income if I have already saved a lot for
retirement, I own my house. I expect this variable to be low correlated with income
(retired persons have high assets and low income, old workers have high income and
high assets, young workers have low income and low assets) and with household
size (if I have more kids, it does not necessarily mean that I have more assets).
• Number of credit card. The expected effect of this variable is that the higher the
number of credit cards, the higher the total amount charged is likely to be. It will be
low correlated with income, cause also low-income people have multiple cards, and
low correlated with household size, even though some correlation may exist there.
Last, but not least, the number of credit cards can be valuable information for the
client and it is easily measurable.
• Purchase preference: cash or credit. People in certain households may have more a
culture of spending by cash, meaning with money that they have, rather than by credit
card, meaning with money that they may not have. This dummy variable can add
some more precision into the model, and it is not correlated with the other proposed
since it is the only one to involve cultural habits.
• Cash flow. This variable could be useful to model the typical situation of persons
having a high income but limited money available to spend by credit card, since
almost all the income is used to pay off bills and loans. It may be more difficult to
gather, but it can add more precision to the model, and it should be rather
independent from the previously mentioned.
P agone118-1o1-f20922
• Average age of household. This variable is very easy to collect and will give
some additional help to the model, if it turns out that it is independent from the
others mentioned.
• Percentage of females in household. Based on the widespread belief that females
like shopping more than men, and they also like the credit card for that, this variable
could help refine further the model, and it is again very easy to collect.
P agone119-1o1-f20922

Statistics Assignment 2: Matteo Conta Femba 9B

Uploaded by

Copyright:

Available Formats

You might also like

Statistics Assignment 2: Matteo Conta Femba 9B

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Statistics Assignment 2: Matteo Conta Femba 9B

Uploaded by

Copyright:

Available Formats

STATISTICS

Consumer Research, Inc.

for a sample of 50 customers.

Std. Dev = 1.74

Income_($1000s) Household_Size Amount_Charged ($)

This study source was downloaded by 100000831236167 from CourseHero.co m

households. The amount charged data seems to be more normally distributed.

between each pair of variables:

Income Househ Amount

This study source was downloaded by 100000831236167 from CourseHero.co m

scatter matrix plot suggests visually.

2. Develop estimated regression equations, first using annual income as the

• Simple linear regression: Amount Charged vs. Annual Income

Sum of df Mean F Sig.

Model is significant with more than 99% confidence level.

Residual analysis also shows no particular pattern and no problems of autocorrelation.

R R Square Adjusted R Square Std. Error of the Estimate

About 40% of the variation in Amount Charged is explained by annual income.

• Simple linear regression: Amount Charged vs. Household Size

Sum of df Mean F Sig.

Model is significant with more than 99% confidence level.

Residual analysis also shows no particular pattern and no problems of autocorrelation.

R R Square Adjusted R Square Std. Error of the Estimate

Household Size explains about 60% of the variation in Amount Charged.

as the independent variables. Discuss your findings.

Sum of df Mean F Sig.

Model is significant with more than 99% confidence level.

R R Square Adjusted R Square Std. Error of the Estimate

Unstandardized Predicted Value

This study source was downloaded by 100000831236167 from CourseHero.co m

model unacceptable if I were the client of Consumer Research, Inc.

an annual income of $40,000?

Yˆ + ×40 + 356.3 ×3 =3699

unacceptable if I was the client of Consumer Research, Inc.

What additional variables might be helpful?

Standard Error of the Estimate.

ones we already have in the model.

client and it is easily measurable.

since it is the only one to involve cultural habits.

independent from the previously mentioned.

• Percentage of females in household. Based on the widespread belief that females

You might also like