
2ND ASSIGNMENT FOR DATA ANALYSIS FOR DECISION MAKING

BY

DIPANWITA GHOSH

PGPMex APR_A’21
Q1 : A company manager says that the average balance on their credit cards is $500. Do you
think that this assertion is justified? Use a one-sample t-test to draw your conclusion.

H0 : The average balance on credit cards is $500

H1 : The average balance on credit cards is different from $500

Because H1 is two-sided, the two-tail P value applies. It is 0.384, which is more than 0.05.

Hence, we fail to reject H0.

Hence, we don't have enough evidence to say that the company manager's assertion of an average
credit card balance of $500 is unjustified.

t-Test: Two-Sample Assuming Unequal Variances

                                Balance        DUMMY VARIABLE FOR ONE SAMPLE T TEST
Mean                            520.015        0
Variance                        211378.2253    0
Observations                    400            2
Hypothesized Mean Difference    500
df                              399
t Stat                          0.870673781
P(T<=t) one-tail                0.192227914
t Critical one-tail             1.648681534
P(T<=t) two-tail                0.384455827
t Critical two-tail             1.965927296
Please refer sheet ANS 1 ONE SAMPLE T TEST in excel file.
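
The t statistic in the output above can be reproduced from the summary values alone. The sketch below is only an illustration, not the Excel procedure itself: the function name one_sample_t is hypothetical, and the critical value is copied from the output rather than computed.

```python
import math


def one_sample_t(mean, variance, n, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    standard_error = math.sqrt(variance / n)
    return (mean - mu0) / standard_error, n - 1


# Summary values copied from the Q1 output (Balance column).
t_stat, df = one_sample_t(mean=520.015, variance=211378.2253, n=400, mu0=500)

# Two-tail critical value for df = 399, copied from the same output.
T_CRITICAL_TWO_TAIL = 1.965927296
reject_h0 = abs(t_stat) > T_CRITICAL_TWO_TAIL

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```

Since |t| is well below the two-tail critical value, the sketch reaches the same decision as the P value: H0 is not rejected.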

Q2 : Is there a difference between men and women as far as average balance is concerned?
Use a two-sample t-test to draw your conclusion.

H0 - Average balance for Male and Female is the same

H1 - Average balance for Male and Female is different

The two-tail P value (0.995) is more than 0.05. Hence, we fail to reject the Null Hypothesis.

There is not enough evidence to say that there is a difference between men and women as far as
average balance is concerned.

t-Test: Two-Sample Assuming Unequal Variances

                                BALANCE MALE    BALANCE FEMALE
Mean                            509.8031088     510.0932642
Variance                        213554.5652     213091.7829
Observations                    193             193
Hypothesized Mean Difference    0
df                              384
t Stat                          -0.006171281
P(T<=t) one-tail                0.497539633
t Critical one-tail             1.648831425
P(T<=t) two-tail                0.995079266
t Critical two-tail             1.966160961
Please refer sheet ANS 2 TWO SAMPLE T TEST in excel file.

Q3 : Is there a difference between students and non-students as far as average balance is
concerned? Use a two-sample t-test to draw your conclusion.

H0 - Average balance for Students and Non Students is the same

H1 - Average balance for Students and Non Students is different

The two-tail P value (0.0027) is less than 0.05. Hence, we reject the Null Hypothesis.

There is enough evidence to say that there is a difference between Students and Non Students as
far as average balance is concerned. In this sample, students have the higher average balance
(876.83 against 546.00).

t-Test: Two-Sample Assuming Unequal Variances

                                STUDENT         NON STUDENT
Mean                            876.825         546
Variance                        240101.9429     216710.7179
Observations                    40              40
Hypothesized Mean Difference    0
df                              78
t Stat                          3.095702734
P(T<=t) one-tail                0.001363536
t Critical one-tail             1.664624645
P(T<=t) two-tail                0.002727072
t Critical two-tail             1.990847069
Please refer sheet ANS 3 TWO SAMPLE T TEST in excel file.
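
The unequal-variances (Welch) t statistic and its degrees of freedom can be checked from the summary values in the output above. This is a minimal sketch for illustration; the function name welch_t is hypothetical, and Excel reports the degrees of freedom rounded to a whole number.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    a, b = var1 / n1, var2 / n2  # squared standard error of each group mean
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df


# Summary values copied from the Q3 output (STUDENT, then NON STUDENT).
t_stat, df = welch_t(876.825, 240101.9429, 40, 546.0, 216710.7179, 40)

print(f"t = {t_stat:.4f}, df = {df:.1f}")
```

The same function can be applied to the summary values of the other two-sample tests in this assignment.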
Q4 : It is generally assumed that if there are more credit cards then the balance on the cards
will be more. Based on this dataset, do you think this is true? Calculate a correlation coefficient
and show a scatter plot to support your answer.

The correlation coefficient between Number of Cards and Credit Balance is 0.086, which indicates
only a very weak positive linear relationship between these two parameters. The scatter plot also
portrays the same conclusion. Hence, we cannot say that the balance will be more if there are
more credit cards.

           Cards          Balance
Cards      1
Balance    0.086456347    1

Please refer sheet ANS 4 CORELATION in excel file.
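
One way to judge how weak a correlation of 0.086 is: square it to get the share of variation explained, and convert it to a t statistic. The sketch below assumes all 400 cardholders were used (the sample size is not stated next to the correlation) and uses 1.966 as an approximate two-tail critical value, close to the values Excel reports for similar degrees of freedom elsewhere in this assignment.

```python
import math

r = 0.086456347  # correlation of Cards vs Balance, copied from the matrix
n = 400          # assumption: the full dataset of 400 observations was used

# Share of the variation in Balance associated with Cards.
r_squared = r ** 2

# t statistic for testing whether the true correlation is zero (df = n - 2).
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

T_CRITICAL_APPROX = 1.966  # assumption: approximate two-tail value at the 5% level
significant = abs(t_stat) > T_CRITICAL_APPROX

print(f"r^2 = {r_squared:.4f}, t = {t_stat:.3f}, significant: {significant}")
```

Under these assumptions, the number of cards accounts for less than 1% of the variation in balance, and the correlation is not significant at the 5% level.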

Q5 : Examine whether the following demographic variables influence balance: (a) age, (b) years
of education, (c) marital status. For age and years of education, use scatter plots to depict their
relationship with balance and calculate the correlation coefficient. For the relationship
between marital status and balance, use a two-sample t-test to draw your conclusion

A. Age Vs Balance : The correlation coefficient (0.0018) is practically zero, so we cannot say
that age has an influence on balance.

CORRELATION 0.001835119

[Scatter plot: AGE VS BALANCE. x-axis: Age (0 to 120), y-axis: Balance (0 to 2500)]

Please refer sheet ANS 5 DEMOGRAPHY VS BALANCE in excel file.

B. Education Vs Balance : The correlation coefficient (-0.008) is practically zero, so we cannot
say that the years of education have an influence on balance.

CORRELATION OF EDUCATION VS BALANCE -0.008061576

[Scatter plot: YEARS OF EDUCATION VS BALANCE. x-axis: Years of education (0 to 25), y-axis: Balance (0 to 2500)]

Please refer sheet ANS 5 DEMOGRAPHY VS BALANCE in excel file.

C. Marital Status Vs Balance :

H0 - Average balance for Married and Unmarried people is the same

H1 - Average balance for Married and Unmarried people is different

The two-tail P value (0.911) is more than 0.05. Hence, we fail to reject the Null Hypothesis.
Hence, there is not enough evidence to say that marital status has an influence on
balance.

t-Test: Two-Sample Assuming Unequal Variances

                                Married Balance    Unmarried Balance
Mean                            517.9428571        523.2903226
Variance                        205696.7262        221735.0385
Observations                    245                155
Hypothesized Mean Difference    0
df                              319
t Stat                          -0.112233601
P(T<=t) one-tail                0.455354389
t Critical one-tail             1.649644319
P(T<=t) two-tail                0.910708777
t Critical two-tail             1.967428387
Please refer sheet ANS 5 DEMOGRAPHY VS BALANCE in excel file.

Q6 : “Ethnicity of the cardholder does not matter as far as balance is concerned.” Carry
out an analysis of variance (ANOVA) and discuss whether this statement is supported by the
data or not.

H0 : Ethnicity of the card holder does not matter as far as balance is concerned
H1 : Ethnicity of the card holder matters as far as balance is concerned

As the P-Value (0.916) is more than the significance level of 0.05, we fail to reject H0.

Hence, the data do not contradict the statement - "Ethnicity of the cardholder does not
matter as far as balance is concerned". There is no evidence that the average balance differs
across the three ethnic groups.

Anova: Single Factor

SUMMARY
Groups                      Count    Sum      Average       Variance
AFRICAN AMERICAN BALANCE    99       52569    531           235839.163
ASIAN BALANCE               99       49897    504.010101    226080.112
CAUCASIAN BALANCE           99       50635    511.464646    192363.394

ANOVA
Source of Variation    SS            df     MS            F             P-value       F crit
Between Groups         38466.6128    2      19233.3064    0.08818806    0.91561289    3.0264659
Within Groups          64119701.6    294    218094.223

Total                  64158168.2    296


Please refer sheet ANS 6 ETHNICITY VS BALANCE in excel file.
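
The F statistic can be rebuilt from the counts, sums and variances in the SUMMARY block. The following is a sketch under the assumption that the variances are sample variances (n - 1 in the denominator), which is what Excel reports; the variable names are illustrative.

```python
# name: (count, sum, variance), copied from the SUMMARY block of the Q6 output
groups = {
    "African American": (99, 52569, 235839.163),
    "Asian": (99, 49897, 226080.112),
    "Caucasian": (99, 50635, 192363.394),
}

total_n = sum(count for count, _, _ in groups.values())
grand_mean = sum(total for _, total, _ in groups.values()) / total_n

# Between-groups sum of squares: spread of the group means around the grand mean.
ss_between = sum(
    count * (total / count - grand_mean) ** 2 for count, total, _ in groups.values()
)
# Within-groups sum of squares: pooled spread inside each group.
ss_within = sum((count - 1) * variance for count, _, variance in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F = {f_stat:.5f} with ({df_between}, {df_within}) degrees of freedom")
```

An F value this far below F crit (3.03) means the group means differ by much less than the variation within the groups.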
Q7 : A general principle that credit card companies often follow is to assign a higher credit limit
to people with a higher credit rating. Does the data show that this principle is being followed?

The correlation coefficient between Rating and Credit Limit is 0.997, which indicates a very
strong positive correlation. This is strong evidence that people with a higher rating have a
higher credit limit, i.e. that the principle is being followed.

The scatter plot also portrays the same trend.

          Rating        Limit
Rating    1
Limit     0.99687974    1

[Scatter plot: RATING VS CREDIT LIMIT. x-axis: Rating (0 to 1200), y-axis: Credit limit (0 to 16000)]
Please refer sheet ANS 7 RATING VS LIMIT in excel file.

Q8 : Run a simple linear regression of balance on the credit limit. (Here credit limit is the X and
the balance is the Y). Report the coefficients and the R-squared. Show a scatter plot.

Co-efficient of Limit (slope) : 0.1716
Intercept : -292.79
R Square : 0.74

P value of the Limit coefficient is far below 0.05

As per the regression, each unit increase in Limit is associated with an average increase of
0.1716 in Balance, which is the coefficient of X.
Also, from the R Square value, we can conclude that about 74% of the variation in Balance is
explained by Limit, which is a substantial share.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.861697267
R Square             0.74252218
Adjusted R Square    0.741875251
Standard Error       233.5849982
Observations         400

ANOVA
              df     SS             MS            F              Significance F
Regression    1      62624255.25    62624255.3    1147.764214    2.5306E-119
Residual      398    21715656.66    54561.9514
Total         399    84339911.91

             Coefficients    Standard Error    t Stat        P-value        Lower 95%       Upper 95%
Intercept    -292.7904955    26.68341452       -10.972752    1.18415E-24    -345.2485494    -240.3324415
Limit        0.171637278     0.005066234       33.878669     2.5306E-119    0.161677354     0.181597203

[Scatter plot: LIMIT VS BALANCE. x-axis: Limit (0 to 16000), y-axis: Balance (-500 to 2500). Fitted line: y = 0.1716x - 292.79, R² = 0.7425]

Please refer sheet ANS 8 LIMIT VS BALANCE in excel file.
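
R Square is the regression sum of squares divided by the total sum of squares, so it can be verified from the ANOVA block of the output. The sketch below also turns the fitted line into a small prediction function; the function name predict_balance and the limit of 5000 are illustrative assumptions, not values taken from the assignment.

```python
# Sums of squares copied from the ANOVA block of the Q8 output.
SS_REGRESSION = 62624255.25
SS_RESIDUAL = 21715656.66

ss_total = SS_REGRESSION + SS_RESIDUAL
r_squared = SS_REGRESSION / ss_total  # share of variation in Balance explained by Limit

# Coefficients copied from the Q8 output.
INTERCEPT = -292.7904955
SLOPE_LIMIT = 0.171637278


def predict_balance(limit):
    """Predicted balance for a given credit limit under the fitted line."""
    return INTERCEPT + SLOPE_LIMIT * limit


example_limit = 5000  # hypothetical credit limit, for illustration only
print(f"R^2 = {r_squared:.4f}")
print(f"Predicted balance at limit {example_limit}: {predict_balance(example_limit):.2f}")
```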

Q9 : Run a simple linear regression of balance (Y) on credit rating (X). Report the coefficients
and R-squared. Show a scatter plot.

Co-efficient of Rating (slope) : 2.566
Intercept : -390.85
R Square : 0.746

P value of the Rating coefficient is far below 0.05

As per the regression, each unit increase in Rating is associated with an average increase of
2.566 in Balance, which is the coefficient of X.
Also, from the R Square value, we can conclude that about 75% of the variation in Balance is
explained by Rating.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.863625161
R Square             0.745848418
Adjusted R Square    0.745209846
Standard Error       232.0713048
Observations         400

ANOVA
              df     SS             MS             F              Significance F
Regression    1      62904789.88    62904789.88    1167.994581    1.8989E-120
Residual      398    21435122.03    53857.09053
Total         399    84339911.91

             Coefficients    Standard Error    t Stat          P-value        Lower 95%      Upper 95%
Intercept    -390.8463418    29.06851463       -13.44569362    3.07318E-34    -447.993365    -333.69932
Rating       2.566240327     0.075089102       34.1759357      1.8989E-120    2.418619483    2.71386117

[Scatter plot: RATING VS BALANCE. x-axis: Rating (0 to 1200), y-axis: Balance (-500 to 2500). Fitted line: y = 2.5662x - 390.85, R² = 0.7458]

Please refer sheet ANS 9 RATING VS BALANCE in excel file.

Q10 : Consider your findings in questions 8-9. Discuss business mechanisms to increase or
decrease the balance on credit cards. Try to quantify your answers.

From the regressions we performed in questions 8 & 9, we found that Credit Limit and Credit
Rating both have a statistically significant effect on Credit Balance. The R square is about 0.74
in both cases, so each variable on its own explains roughly 74% of the variation in Balance,
which is a considerably high value. The coefficient of Credit Limit is smaller than the
coefficient of Credit Rating (0.1716 < 2.566): one additional rating point is associated with a
balance that is higher by about 2.57, whereas one additional unit of credit limit is associated
with a balance that is higher by about 0.17. The two coefficients are measured per unit of
different scales (rating points versus units of limit), so this comparison holds per unit only.
Similarly, a one-unit reduction in Credit Rating is associated with a larger reduction in Credit
Balance than a one-unit reduction in Credit Limit.
So, credit card companies that want higher balances should look for individuals with higher
credit ratings, because our data analysis shows that such individuals are likely to have a higher
Credit Balance.

Also, credit card companies might think of increasing the Credit Limit of their existing
customers, without any change in Credit Rating. By the model in question 8, a limit increase of
1,000 is associated with a balance that is higher by about 172 on average. Per unit, however, the
impact is not as high as in the case of Credit Rating.
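
To quantify such mechanisms, the slopes from questions 8 and 9 can be multiplied by a planned change. The sketch below is illustrative only: the helper name balance_change and the change sizes (1000 units of limit, 50 rating points) are hypothetical, and because the two slopes come from two separate simple regressions, the two effects should not be added together.

```python
SLOPE_LIMIT = 0.171637278   # coefficient of Limit, from the Q8 output
SLOPE_RATING = 2.566240327  # coefficient of Rating, from the Q9 output


def balance_change(slope, delta_x):
    """Average change in balance associated with a change of delta_x in X."""
    return slope * delta_x


# Hypothetical policy changes, chosen only for illustration.
change_from_limit = balance_change(SLOPE_LIMIT, 1000)  # raise the limit by 1000
change_from_rating = balance_change(SLOPE_RATING, 50)  # rating higher by 50 points

print(f"Limit +1000  -> balance {change_from_limit:+.2f}")
print(f"Rating +50   -> balance {change_from_rating:+.2f}")
```

Negative values of delta_x give the corresponding expected decrease in balance.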

Q11 : The credit limit is provided as a consolidated amount for all the credit cards the
cardholder has. Run a multiple linear regression of Balance (Y) on Limit and Cards as two X
variables. Report the coefficients. Discuss the effect on the balance of (a) increasing the credit
limit on the same number of cards and (b) increasing the number of cards without altering the
total credit limit.

Co-efficient of Limit : 0.171
Co-efficient of Cards : 26.033
Intercept : -369.04

P value is less than 0.05 for both (about 2.0E-120 for Limit and 0.0022 for Cards)

(a) As per the regression, with each unit increase in Limit, the Balance increases by 0.171 units
on average, when the number of cards is constant.
(b) With each additional card, the Balance increases by 26.033 units on average, when the Limit
is constant.

In both cases the balance increases. Per unit, the increase is larger for the number of cards,
because its coefficient is much larger. Note, however, that one unit of Cards is a whole extra
card, while one unit of Limit is a single unit of credit limit.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.865188295
R Square             0.748550786
Adjusted R Square    0.74728404
Standard Error       231.1247525
Observations         400

ANOVA
              df     SS             MS            F             Significance F
Regression    2      63132707.37    31566353.7    590.923824    9.7585E-120
Residual      397    21207204.54    53418.6512
Total         399    84339911.91

             Coefficients    Standard Error    t Stat        P-value       Lower 95%      Upper 95%
Intercept    -369.0359554    36.16414657       -10.20447     7.2269E-22    -440.133128    -297.93878
Limit        0.171479037     0.005013136       34.2059386    2.002E-120    0.161623424    0.18133465
Cards        26.03375427     8.438363509       3.08516625    0.00217682    9.444290848    42.6232177
Please refer sheet ANS 11 MULTIPLE LINEAR REGRESSION in excel file.
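
The two scenarios in this question can be compared by evaluating the fitted equation before and after each change. This is a rough sketch: the starting point (limit 5000, 2 cards) and the size of the limit increase (1000) are hypothetical values chosen only for illustration, and predict_balance is an illustrative name.

```python
# Coefficients copied from the Q11 output.
COEF = {"intercept": -369.0359554, "limit": 0.171479037, "cards": 26.03375427}


def predict_balance(limit, cards):
    """Predicted balance from the two-variable model."""
    return COEF["intercept"] + COEF["limit"] * limit + COEF["cards"] * cards


base = predict_balance(limit=5000, cards=2)  # hypothetical cardholder

# (a) Higher limit on the same number of cards.
effect_more_limit = predict_balance(limit=6000, cards=2) - base
# (b) One more card with the same total limit.
effect_more_cards = predict_balance(limit=5000, cards=3) - base

# Limit increase that moves the predicted balance as much as one extra card.
limit_equivalent_of_one_card = COEF["cards"] / COEF["limit"]

print(f"(a) limit +1000: {effect_more_limit:+.2f}")
print(f"(b) cards +1:    {effect_more_cards:+.2f}")
print(f"One extra card is worth about {limit_equivalent_of_one_card:.1f} units of limit")
```

Because the model is linear, these differences do not depend on the chosen starting point.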
Q12 : Run a simple linear regression equation with Income as X and Balance as Y. Report the
coefficients. Is the coefficient of Income significantly different from zero? What does this say
about the effect of income on balance?

Co-efficient of Income (slope) : 6.048
Intercept : 246.51

H0 : Coefficient of Income is 0
H1 : Coefficient of Income is different from 0

P value of the Income coefficient (1.03E-22) is far below 0.05

Hence, we reject H0.

So, we have evidence to conclude that the coefficient of Income is significantly different from 0.

So, income has an effect on balance. With every unit increase in Income, the balance increases
by 6.048 units on average. The R Square is only 0.215, however, so Income on its own explains
about 21% of the variation in Balance.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.463656457
R Square             0.21497731
Adjusted R Square    0.213004891
Standard Error       407.8647195
Observations         400

ANOVA
              df     SS             MS             F             Significance F
Regression    1      18131167.4     18131167.4     108.991715    1.03089E-22
Residual      398    66208744.51    166353.6294
Total         399    84339911.91

             Coefficients    Standard Error    t Stat         P-value       Lower 95%      Upper 95%
Intercept    246.5147506     33.19934735       7.425289058    6.9034E-13    181.2467485    311.782753
Income       6.048363409     0.579350163       10.43990973    1.0309E-22    4.909394402    7.18733242

[Scatter plot: INCOME VS BALANCE. x-axis: Income (0 to 200), y-axis: Balance (0 to 2500). Fitted line: y = 6.0484x + 246.51, R² = 0.215]

Please refer sheet ANS 12 INCOME VS BALANCE in excel file.
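
The test of whether the Income coefficient differs from zero comes down to dividing the coefficient by its standard error. The sketch below reproduces the t Stat and an approximate 95% confidence interval. The critical value 1.966 is an assumption (an approximate two-tail value for 398 degrees of freedom), so the interval matches the Excel output only to about three decimals.

```python
# Values copied from the Income row of the Q12 output.
COEF_INCOME = 6.048363409
STANDARD_ERROR = 0.579350163

# t statistic for H0: coefficient of Income = 0.
t_stat = COEF_INCOME / STANDARD_ERROR

T_CRITICAL_APPROX = 1.966  # assumption: approximate two-tail 5% value, df = 398
margin = T_CRITICAL_APPROX * STANDARD_ERROR
confidence_interval = (COEF_INCOME - margin, COEF_INCOME + margin)

# If zero lies outside the interval, the coefficient is significant at 5%.
zero_inside = confidence_interval[0] <= 0 <= confidence_interval[1]

print(f"t = {t_stat:.4f}")
print(f"95% CI: {confidence_interval[0]:.3f} to {confidence_interval[1]:.3f}")
print(f"Zero inside interval: {zero_inside}")
```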

Q13 : Based on the equation derived in question 12, what is the estimated balance for a person
with an income of USD 100k per year?

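Equation : Y = aX + b ; where Y = Estimated Balance, X = Income, a = Coefficient of Income,
b = constant (intercept)

Income in the dataset is recorded in thousands of dollars (the Income axis of the scatter plot in
question 12 runs from 0 to 200), so an income of USD 100k corresponds to X = 100.

Y = 6.0484*100 + 246.51
= 604.84 + 246.51
= 851.35

Therefore, the estimated balance is $851.35.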
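
The estimate can be reproduced with a one-line function. The sketch assumes that Income is recorded in thousands of dollars, as the 0 to 200 range of the Income axis in question 12 suggests, so USD 100k is entered as 100. If Income were instead recorded in dollars, the same function would be called with 100000. The function name predict_balance is illustrative.

```python
SLOPE_INCOME = 6.0484  # coefficient of Income, from the Q12 trendline
INTERCEPT = 246.51     # constant, from the Q12 trendline


def predict_balance(income):
    """Estimated balance for an income given in the dataset's own units."""
    return SLOPE_INCOME * income + INTERCEPT


# Assumption: Income is in thousands of dollars, so USD 100k is X = 100.
estimate = predict_balance(100)
print(f"Estimated balance: {estimate:.2f}")
```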

Q14 : Based on the dataset, explore the relationship between credit card balance (Y) and (a)
Income, (b) Age, (c) Education, (d) Limit, and (e) Rating as X variables. Estimate a multiple
linear regression model and report the statistical significance of each of these variables.

The multiple regression model here is

Y = -7.61X1 - 0.86X2 + 1.97X3 + 0.08X4 + 2.77X5 - 473.25

Where Y = Balance, X1 = Income, X2 = Age, X3 = Education, X4 = Limit, X5 = Rating

The P values of Income (1.37E-61) and Rating (3.94E-05) are less than 0.05, hence these variables
are statistically significant and have an effect on Balance.

The P values of Age (0.073), Education (0.451) and Limit (0.078) are higher than 0.05, hence
these variables are not statistically significant at the 5% level, so we do not interpret their
coefficients.

With every unit increase in Income, the balance decreases by 7.61 units on average if all the
other variables are held constant.

With every unit increase in Rating, the balance increases by 2.77 units on average if the other
variables are held constant.

The R square value is 0.877, so about 88% of the variation in Balance is explained by these five
variables together.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.936702578
R Square             0.87741172
Adjusted R Square    0.875856031
Standard Error       161.9917647
Observations         400

ANOVA
              df     SS             MS             F              Significance F
Regression    5      74000827.17    14800165.43    564.0020686    4.5908E-177
Residual      394    10339084.74    26241.33183
Total         399    84339911.91

             Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept    -473.2514026    55.10833546       -8.587655545    2.08837E-16    -581.5945666    -364.9082387
Income       -7.608832003    0.381931562       -19.92197755    1.37077E-61    -8.359710677    -6.85795333
Age          -0.860030445    0.478700493       -1.796594023    0.073165937    -1.801157147    0.081096257
Education    1.967791521     2.605290902       0.755305874     0.450516748    -3.154218733    7.089801776
Limit        0.07901642      0.044791005       1.764113581     0.078487737    -0.009042839    0.167075679
Rating       2.773843725     0.667079559       4.158190261     3.93909E-05    1.462363177     4.085324273
Please refer sheet ANS 14 MULTIPLE VARIABLES in excel file.
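
The significance decision for each variable is a comparison of its P value with the 0.05 level. A minimal sketch of that step, with the P values copied from the output above and illustrative variable names:

```python
ALPHA = 0.05  # significance level used throughout this assignment

# P values copied from the coefficient rows of the Q14 output.
p_values = {
    "Income": 1.37077e-61,
    "Age": 0.073165937,
    "Education": 0.450516748,
    "Limit": 0.078487737,
    "Rating": 3.93909e-05,
}

significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name, p in p_values.items() if p >= ALPHA]

print("Significant at 5%:    ", significant)
print("Not significant at 5%:", not_significant)
```

Age and Limit have P values between 0.05 and 0.10, so they would count as significant if a 10% level were used instead.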
