Project/ Assignment-1 (PA1) 1. Credit Risk (Excel File) or Credit - Sav


Project/ Assignment-1 (PA1)

1. Credit risk (Excel file) or credit.sav


We can classify the credit data to predict whether a person will repay the
loan or not.
We carried out this classification in the lab using a Random Trees classifier.

2. POS
We can run a market basket analysis on the POS data to identify the products
that are sold together, a technique that is in high demand in marketing.
We carried out the following analysis in the lab using basket analysis.

5 group(s) of associated products were found.

group #1 contains 6 products


p_value: 0.000000 (log = -594.488269); support 127
"01D"
"01C"
"01B"
"01I"
"02B"
"01K"

group #2 contains 3 products


p_value: 0.000000 (log = -252.174180); support 126
"06K"
"02C"
"02D"

group #3 contains 2 products


p_value: 0.000000 (log = -79.178433); support 148
"02L"
"01X"

group #4 contains 2 products


p_value: 0.000000 (log = -77.613141); support 129
"09C"
"01L"

group #5 contains 2 products


p_value: 0.000000 (log = -71.941401); support 119
"03X"
"02P"

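The "support" figure reported for each group can be illustrated with a short sketch. It assumes that support means the number of transactions that contain every product in the group; the lab tool's exact definition and its p-value test are not reproduced here. The function name `itemset_support` and the sample baskets are hypothetical and are not taken from the POS data.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so every itemset has one canonical form.
        items = sorted(set(basket))
        for itemset in combinations(items, size):
            counts[itemset] += 1
    return counts


# Hypothetical baskets: the codes mimic the report's style but are made up.
baskets = [
    ["02L", "01X", "09C"],
    ["02L", "01X"],
    ["09C", "01L"],
    ["02L", "01X", "01L"],
]

pair_support = itemset_support(baskets, 2)
for pair, support in pair_support.most_common(3):
    print(pair, "support", support)
```

Pairs with high support are only candidates; a tool like the one used in the lab additionally tests whether the co-occurrence is more frequent than chance.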
3. IRIS.sav
We can classify the iris data to identify the flower species that a record
belongs to.

We used a CRT classifier to classify this dataset.
The major problem with the neural network and linear regression was that they
did not fit the data set properly.

4. Predicting Wages
We can use regression and a neural network to predict the log of average
hourly earnings.
A) Regression Model
Variables Entered/Removed

Model  Variables Entered                            Variables Removed  Method
1      Education (years)                            .                  Stepwise
2      Age in years                                 .                  Stepwise
3      Years of labor market experience (AGE-ED-6)  .                  Stepwise
4      Squared years of labor market exper          .                  Stepwise

Stepwise criteria for every model: Probability-of-F-to-enter <= .050,
Probability-of-F-to-remove >= .100.

Coefficients

                                                    Unstandardized       Standardized
                                                    Coefficients         Coefficients
Model  Term                                         B        Std. Error  Beta    t        Sig.
1      (Constant)                                   .984     .075                13.162   .000
       Education (years)                            .069     .006        .345    12.078   .000
2      (Constant)                                   .360     .089                4.033    .000
       Education (years)                            .078     .005        .390    14.318   .000
       Age in years                                 .014     .001        .312    11.463   .000
3      (Constant)                                   -1.281   .165                -7.785   .000
       Education (years)                            -.242    .028        -1.204  -8.614   .000
       Age in years                                 .328     .027        7.375   12.106   .000
       Years of labor market experience (AGE-ED-6)  -.315    .027        -7.469  -11.604  .000
4      (Constant)                                   -1.304   .163                -8.006   .000
       Education (years)                            -.240    .028        -1.194  -8.632   .000
       Age in years                                 .321     .027        7.219   11.958   .000
       Years of labor market experience (AGE-ED-6)  -.290    .027        -6.875  -10.603  .000
       Squared years of labor market exper          .000     .000        -.456   -4.894   .000
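As a worked example of reading the Coefficients table, the sketch below turns the unstandardized B values of Model 2 into a prediction. Model 2 is used only because all of its coefficients are printed with usable precision; Model 4's squared-experience coefficient is rounded to .000 in the output. The function name, the example person, and the assumption that the target is a natural log are illustrative and are not stated in the report.

```python
import math

# Unstandardized coefficients (B) of Model 2, copied from the table.
INTERCEPT = 0.360
B_EDUCATION = 0.078
B_AGE = 0.014


def predict_log_wage(education_years, age_years):
    """Predicted log of average hourly earnings under Model 2."""
    return INTERCEPT + B_EDUCATION * education_years + B_AGE * age_years


# Hypothetical person: 12 years of education, 30 years old.
log_wage = predict_log_wage(12, 30)

# Assumption: the log is a natural log, so exp() recovers the wage scale.
hourly_wage = math.exp(log_wage)
print(f"predicted log wage: {log_wage:.3f}")
print(f"implied hourly wage: {hourly_wage:.2f}")
```

Under Model 2, each extra year of education adds .078 to the predicted log wage and each extra year of age adds .014, holding the other variable fixed.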

The error analysis is as follows:

Minimum Error          -2.326
Maximum Error          2.0
Mean Error             -0.0
Mean Absolute Error    0.349
Standard Deviation     0.449
Linear Correlation     0.563
Occurrences            1,085
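The error summaries in this report can be reproduced from a list of actual and predicted values. The sketch below is one plausible way to do so, under stated assumptions: the error is taken as predicted minus actual, the standard deviation is the sample standard deviation of the errors, and the linear correlation is Pearson's r between actual and predicted values. The lab tool's exact conventions are not given in the report, and the sample numbers are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def error_summary(actual, predicted):
    """Summarise prediction errors (assumed sign: predicted minus actual)."""
    errors = [p - a for a, p in zip(actual, predicted)]
    return {
        "Minimum Error": min(errors),
        "Maximum Error": max(errors),
        "Mean Error": statistics.fmean(errors),
        "Mean Absolute Error": statistics.fmean(abs(e) for e in errors),
        # Assumption: sample (n - 1) standard deviation of the errors.
        "Standard Deviation": statistics.stdev(errors),
        "Linear Correlation": pearson(actual, predicted),
        "Occurrences": len(errors),
    }


# Hypothetical values, not taken from the wage data.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

summary = error_summary(actual, predicted)
for name, value in summary.items():
    print(f"{name:<22}{value}")
```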
B) Neural Network

The error analysis is as follows:

Minimum Error          -2.355
Maximum Error          1.692
Mean Error             -0.023
Mean Absolute Error    0.305
Standard Deviation     0.395
Linear Correlation     0.689
Occurrences            1,085

5. Housing
We can use regression and neural networks to predict the median value of
owner-occupied homes.

A) Regression Model
Variables Entered/Removed

Model: 1
Method: Enter
Variables Removed: .
Variables Entered (b):
  % lower status of the population
  Charles River connection dummy
  Proportion of blacks per town -- transformed
  Pupil-teacher ratio by town
  Proportion of residential land zoned 25k+
  Per Capita Crime Rate
  Average # of rooms per dwelling
  Proportion of non-retail bus acres per town
  Proportion of owner-occ dwellings before 1940
  Access index to radial hwys
  Wtd dist to five Boston employment ctrs
  Nitric oxides concentration pp 10M
  Full-value prop-tax rate per $10K

b. All requested variables entered.

Coefficients

                                                      Unstandardized       Standardized
                                                      Coefficients         Coefficients
Model  Term                                           B        Std. Error  Beta    t        Sig.
1      (Constant)                                     36.459   5.103               7.144    .000
       Per Capita Crime Rate                          -.108    .033        -.101   -3.287   .001
       Proportion of residential land zoned 25k+      .046     .014        .118    3.382    .001
       Proportion of non-retail bus acres per town    .021     .061        .015    .334     .738
       Charles River connection dummy                 2.687    .862        .074    3.118    .002
       Nitric oxides concentration pp 10M             -17.767  3.820       -.224   -4.651   .000
       Average # of rooms per dwelling                3.810    .418        .291    9.116    .000
       Proportion of owner-occ dwellings before 1940  .001     .013        .002    .052     .958
       Wtd dist to five Boston employment ctrs        -1.476   .199        -.338   -7.398   .000
       Access index to radial hwys                    .306     .066        .290    4.613    .000
       Full-value prop-tax rate per $10K              -.012    .004        -.226   -3.280   .001
       Pupil-teacher ratio by town                    -.953    .131        -.224   -7.283   .000
       Proportion of blacks per town -- transformed   .009     .003        .092    3.467    .001
       % lower status of the population               -.525    .051        -.407   -10.347  .000
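The Sig. column of the Coefficients table above can be screened to see which predictors contribute little once the others are in the model. The sketch below copies the Sig. values from the table and flags those above a 0.05 cutoff. The cutoff is a conventional choice assumed here, not one stated in the report, and the variable names `SIG` and `ALPHA` are illustrative.

```python
# Sig. values copied from the housing Coefficients table (constant omitted).
SIG = {
    "Per Capita Crime Rate": 0.001,
    "Proportion of residential land zoned 25k+": 0.001,
    "Proportion of non-retail bus acres per town": 0.738,
    "Charles River connection dummy": 0.002,
    "Nitric oxides concentration pp 10M": 0.000,
    "Average # of rooms per dwelling": 0.000,
    "Proportion of owner-occ dwellings before 1940": 0.958,
    "Wtd dist to five Boston employment ctrs": 0.000,
    "Access index to radial hwys": 0.000,
    "Full-value prop-tax rate per $10K": 0.001,
    "Pupil-teacher ratio by town": 0.000,
    "Proportion of blacks per town -- transformed": 0.001,
    "% lower status of the population": 0.000,
}

# Assumed significance level; the report does not name one for this model.
ALPHA = 0.05

not_significant = [name for name, p in SIG.items() if p > ALPHA]
for name in not_significant:
    print(f"{name}: Sig. = {SIG[name]:.3f}")
```

With this cutoff, only the non-retail business acreage and the pre-1940 dwelling proportion are flagged; because the Enter method forces all variables in, they remain in the model regardless.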

Minimum Error          -13.295
Maximum Error          12.505
Mean Error             0.054
Mean Absolute Error    2.299
Standard Deviation     3.057
Linear Correlation     0.943
Occurrences            506

B) Neural Networks

Minimum Error          -15.594
Maximum Error          26.199
Mean Error             -0.0
Mean Absolute Error    3.271
Standard Deviation     4.684
Linear Correlation     0.861
Occurrences            506
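The reported error summaries for the two prediction tasks can be placed side by side to see which model did better on each dataset. The sketch below uses only the Mean Absolute Error and Linear Correlation figures printed in this report. Picking the model with the lower Mean Absolute Error is one reasonable criterion assumed here, not a rule stated in the report, and the names `REPORTED` and `best_model` are illustrative.

```python
# Figures copied from the error summaries in sections 4 and 5.
REPORTED = {
    "Wages": {
        "Regression": {"mae": 0.349, "corr": 0.563},
        "Neural network": {"mae": 0.305, "corr": 0.689},
    },
    "Housing": {
        "Regression": {"mae": 2.299, "corr": 0.943},
        "Neural network": {"mae": 3.271, "corr": 0.861},
    },
}


def best_model(metrics):
    """Return the model name with the lowest mean absolute error."""
    return min(metrics, key=lambda name: metrics[name]["mae"])


winners = {dataset: best_model(m) for dataset, m in REPORTED.items()}
for dataset, model in winners.items():
    print(f"{dataset}: {model}")
```

On these figures the neural network has the lower error on the wage data, while the regression model has the lower error on the housing data; in both cases the model with the lower error also has the higher linear correlation.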
