Stern's MBA 1 Students Expect To Make The Big Bucks After Graduation!

Data File: salary.doc

The Stern School of Business is a very reputable school. Students generally attend
it not only to broaden their knowledge of business but also to increase their
salary. I am interested in finding out what leads MBA 1 students to believe that they are
going to earn a certain salary after graduation. In other words, what factors affect the
expected salary of students? As I could not find enough meaningful data on the topic, I
decided to conduct my own survey. I was actually very surprised to see how cooperative
students were in filling out my survey when I gave them candy in return!
Here are some descriptive statistics for all 40 MBA 1 students surveyed. Variables include
expected salary after graduation (expected), age of the person (age), the number of years
of working experience in the industry chosen for after graduation (num.of), if the person
plans to work in finance (plan fin), if the person plans to work in consulting (plan con), if
the person plans to work in marketing (plan mar), if the person plans to work in other
industries (plan oth), if the person is unsure about the industry (unsure), if the person
plans to work for a Fortune 500 company (500 comp), the number of hours he/she plans
to work a week (Hrs.of), and his/her past salary (past sal).
Descriptive Statistics

Variable     N      Mean    Median    TrMean     StDev    SEMean
Expected    40     84250     80000     83333     17213      2722
age         40    26.825    26.000    26.694     2.490     0.394
num.of      40     1.725     0.000     1.500     2.449     0.387
planfin     40    0.6750    1.0000    0.6944    0.4743    0.0750
plancon     40    0.2500    0.0000    0.2222    0.4385    0.0693
planmar     40    0.0750    0.0000    0.0278    0.2667    0.0422
planoth     40   0.00000   0.00000   0.00000   0.00000   0.00000
unsure      40   0.00000   0.00000   0.00000   0.00000   0.00000
500comp     40    0.5000    0.5000    0.5000    0.5064    0.0801
Hrs.of      40     70.62     70.00     70.42     12.57      1.99
pastsal     40     54250     48500     51667     20664      3267

Variable       Min       Max        Q1        Q3
Expected     60000    125000     70000     97500
age         23.000    33.000    25.000    29.000
num.of       0.000     8.000     0.000     3.000
planfin     0.0000    1.0000    0.0000    1.0000
plancon     0.0000    1.0000    0.0000    0.7500
planmar     0.0000    1.0000    0.0000    0.0000
planoth    0.00000   0.00000   0.00000   0.00000
unsure     0.00000   0.00000   0.00000   0.00000
500comp     0.0000    1.0000    0.0000    1.0000
Hrs.of       50.00     95.00     60.00     80.00
pastsal      30000    125000     40000     60000
At first look, several things stand out in the data. None of the 40 students claims
to be unsure about which industry to enter after graduation, and none of them plans
to work in an industry other than finance, consulting or marketing. Moreover, the
average expected salary of $84,250 is much higher than the actual salary of
graduating MBA students in 1996, $70,000 (footnote). This discrepancy could mean
that MBA 1 students are rather optimistic.
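To put a number on that gap, the short Python sketch below compares the mean expected salary from the descriptive statistics with the 1996 figure quoted in the text. The variable names are illustrative only.

```python
# Mean expected salary from the survey and the 1996 actual figure quoted above.
mean_expected = 84_250
actual_1996 = 70_000

gap = mean_expected - actual_1996
gap_pct = 100 * gap / actual_1996

print(f"Gap: ${gap:,} ({gap_pct:.1f}% above the 1996 actual salary)")
```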
Salaries are often right-tailed. Let's check the distributions of both expected and
past salaries.
[Figure: histograms of expected salary and past salary]
Both salary distributions are right-tailed. Thus, it might be helpful to log the
salaries.
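The effect of logging on a right tail can be sketched in a few lines of Python. The salaries below are hypothetical, not the survey data, and the base-10 log is an assumption based on the logged values reported later (for example, 4.84510 for a salary of $70,000). The `skewness` helper is also just an illustration.

```python
import math
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# Hypothetical right-tailed salaries, NOT the survey data.
salaries = [30_000, 35_000, 40_000, 42_000, 48_000, 55_000, 60_000, 125_000]
logged = [math.log10(s) for s in salaries]

# The logged values should show a smaller positive skew than the raw ones.
print(round(skewness(salaries), 2), round(skewness(logged), 2))
```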
[Figure: histograms of log expected and log past]
Let's check for potential outliers in both expected and past salaries:
[Figure: plots of expected salary and past salary]
There are apparently 3 outliers among the past salary observations.

Now, let's look at expected salary across the different industries students plan to work in.
[Figure: expected salary by plan fin., plan cons. and plan mark.]
Only 3 of the 40 students plan to work in the marketing industry (the mean of planmar
is 0.075), so this variable is unlikely to be statistically significant. Students who plan
to work in finance are coded 1. It is interesting to see that students who plan to work in
finance and the ones who do not actually have the same median expected salary ($80,000).
It looks like the indicators for planning to go into finance and planning to go into
consulting are negatively correlated.
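The size of that negative correlation can be worked out from the descriptive statistics alone. The sketch below assumes the three industry indicators are mutually exclusive, which is consistent with their means summing to 1 (0.675 + 0.250 + 0.075), so that 27 students plan finance, 10 consulting and 3 marketing. The `pearson` helper is written out for illustration.

```python
import math

# Indicator columns implied by the table means (27 finance, 10 consulting,
# 3 marketing); mutual exclusivity is an assumption, and row order is arbitrary.
planfin = [1] * 27 + [0] * 10 + [0] * 3
plancon = [0] * 27 + [1] * 10 + [0] * 3

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(planfin, plancon)
print(f"corr(planfin, plancon) = {r:.2f}")
```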
Regression Analysis

* planmark. is highly correlated with other X variables
* planmark. has been removed from the equation
* planother has all values = 0
* planother has been removed from the equation
* unsure has all values = 0
* unsure has been removed from the equation

The regression equation is
Expected salary = - 44885 + 3352 age + 647 num.of yrs.of exp.
           - 2591 plan fin. + 4025 plan cons. + 3171 500 comp?
           + 432 Hrs.of work + 0.125 past salary

Predictor        Coef       StDev        T        P     VIF
Constant       -44885       18068    -2.48    0.018
age            3351.8       849.5     3.95    0.000     3.6
num.of          647.5       514.7     1.26    0.217     1.3
planfin         -2591        5082    -0.51    0.614     4.6
plancon          4025        5344     0.75    0.457     4.4
500comp          3171        2887     1.10    0.280     1.7
Hrs.of          431.6       122.2     3.53    0.001     1.9
pastsal       0.12506     0.09684     1.29    0.206     3.2

S = 6999    R-Sq = 86.4%    R-Sq(adj) = 83.5%

Analysis of Variance

Source        DF            SS            MS        F        P
Regression     7    9988126411    1426875202    29.13    0.000
Error         32    1567373589      48980425
Total         39   11555500000

Source    DF        Seq SS
age        1    8185071089
num.of     1      75586010
planfin    1     531542752
plancon    1     158051549
500comp    1      84132640
Hrs.of     1     872058157
pastsal    1      81684215

Unusual Observations
Obs    age   Expected      Fit   StDev Fit   Residual   St Resid
 16   25.0      70000    86541        3156     -16541     -2.65R
 34   27.0     120000   101312        3493      18688      3.08R

R denotes an observation with a large standardized residual

Durbin-Watson statistic = 1.64
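The summary statistics in this output all follow from the Analysis of Variance table. As a hedged illustration (the variable names are mine, not Minitab's), the sketch below recomputes R-Sq, R-Sq(adj), F and S from the sums of squares and degrees of freedom.

```python
# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_regression, df_regression = 9_988_126_411, 7
ss_error, df_error = 1_567_373_589, 32
ss_total, df_total = 11_555_500_000, 39

r_sq = ss_regression / ss_total
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
f_stat = (ss_regression / df_regression) / (ss_error / df_error)
s = (ss_error / df_error) ** 0.5  # standard error of the regression

print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, F = {f_stat:.2f}, S = {s:.0f}")
```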
We can see that Minitab directly gets rid of 3 variables: students planning to work in
marketing, students planning to work in other industries, and students who are unsure
about which industry they will be working in. I would also remove the plan consulting
variable because it is highly negatively correlated with the plan finance variable.
The overall regression is statistically significant. However, some variables have P-values
over .05.
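The VIF column gives another view of this overlap between predictors. Since VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, each reported VIF can be turned back into an R-squared. A minimal sketch using the VIF values printed above:

```python
# VIF values reported in the regression output above.
vif = {"age": 3.6, "planfin": 4.6, "plancon": 4.4, "pastsal": 3.2}

# VIF_j = 1 / (1 - R_j^2), so R_j^2 = 1 - 1 / VIF_j.
for name, v in vif.items():
    r_sq_j = 1 - 1 / v
    print(f"{name}: VIF = {v}, R-Sq against the other predictors = {r_sq_j:.1%}")
```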
[Figure: residual plots (response is Expected): residuals versus the fitted values, residuals versus the order of the data, normal probability plot of the residuals, histogram of the residuals]
The distribution of the residuals looks roughly normal. However, we can notice a couple of
outliers. Now, let's try a regression with logged past and expected salaries while keeping
the same variables.
Regression Analysis

* planmark. is highly correlated with other X variables
* planmark. has been removed from the equation
* planother has all values = 0
* planother has been removed from the equation
* unsure has all values = 0
* unsure has been removed from the equation

The regression equation is
log expected = 4.26 + 0.0178 age + 0.00289 num.of yrs.of exp.
           - 0.0137 plan fin. + 0.0190 plan cons. + 0.0125 500 comp?
           + 0.00202 Hrs.of work + 0.000001 past salary

Predictor          Coef        StDev        T        P     VIF
Constant        4.26054      0.08967    47.52    0.000
age            0.017759     0.004216     4.21    0.000     3.6
num.of         0.002887     0.002554     1.13    0.267     1.3
planfin        -0.01366      0.02522    -0.54    0.592     4.6
plancon         0.01899      0.02652     0.72    0.479     4.4
500comp         0.01251      0.01433     0.87    0.389     1.7
Hrs.of        0.0020215    0.0006063     3.33    0.002     1.9
pastsal      0.00000057   0.00000048     1.18    0.246     3.2

S = 0.03473    R-Sq = 86.2%    R-Sq(adj) = 83.2%

Analysis of Variance

Source        DF         SS         MS        F        P
Regression     7   0.241162   0.034452    28.56    0.000
Error         32   0.038600   0.001206
Total         39   0.279762

Source    DF     Seq SS
age        1   0.201723
num.of     1   0.001471
planfin    1   0.012164
plancon    1   0.003731
500comp    1   0.001379
Hrs.of     1   0.019007
pastsal    1   0.001687

Unusual Observations
Obs    age   logexpe       Fit   StDev Fit   Residual   St Resid
 16   25.0   4.84510   4.92668     0.01566   -0.08158     -2.63R
 34   27.0   5.07918   4.99767     0.01733    0.08151      2.71R
Logging the salaries does not change the regression model significantly. Thus, I will keep
the unlogged salaries.
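One thing the logged model does change is how coefficients are read: with a logged response, a coefficient is a multiplicative (percentage) effect rather than a dollar amount. The sketch below assumes base-10 logs, which matches the logged values in the output (4.84510 for $70,000). The `pct_change` helper is illustrative.

```python
# Coefficients from the logged regression (assumed base-10 log of expected salary).
coef_age = 0.017759
coef_hours = 0.0020215

def pct_change(coef):
    """Percent change in salary per one-unit increase, for a log10 response."""
    return 100 * (10 ** coef - 1)

print(f"One more year of age: {pct_change(coef_age):.1f}%")
print(f"One more weekly hour: {pct_change(coef_hours):.2f}%")
```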
Let's see how the regression looks without the plan consulting variable.
Regression Analysis

The regression equation is
Expected salary = - 46346 + 3477 age + 574 num.of yrs.of exp.
           - 5884 plan fin. + 2553 500 comp? + 466 Hrs.of work
           + 0.113 past salary

Predictor        Coef       StDev        T        P     VIF
Constant       -46346       17846    -2.60    0.014
age            3476.8       827.6     4.20    0.000     3.4
num.of          573.8       501.9     1.14    0.261     1.2
planfin         -5884        2573    -2.29    0.029     1.2
500comp          2553        2750     0.93    0.360     1.6
Hrs.of          465.9       112.6     4.14    0.000     1.6
pastsal       0.11303     0.09489     1.19    0.242     3.1

S = 6953    R-Sq = 86.2%    R-Sq(adj) = 83.7%

Analysis of Variance

Source        DF            SS            MS        F        P
Regression     6    9960345863    1660057644    34.34    0.000
Error         33    1595154137      48338004
Total         39   11555500000

Source    DF        Seq SS
age        1    8185071089
num.of     1      75586010
planfin    1     531542752
500comp    1      22521046
Hrs.of     1    1077031091
pastsal    1      68593876

Unusual Observations
Obs    age   Expected      Fit   StDev Fit   Residual   St Resid
 10   33.0     110000   111467        5049      -1467     -0.31 X
 16   25.0      70000    86410        3131     -16410     -2.64R
 34   27.0     120000   101124        3461      18876      3.13R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
[Figure: residual plots: histogram of the residuals, residuals versus the fitted values, normal probability plot of the residuals; observations 34 and 16 are labeled]
Note that the outliers are still here. Now, let's run a best subsets regression to find out which variables are best to
choose for our model.
Best Subsets Regression

Response is Expected

Candidate predictors: age, num.of, planfin, 500comp, Hrs.of, pastsal
(one X per predictor included in the model)

Vars   R-Sq   R-Sq(adj)    C-p         S   Predictors
   1   70.8        70.1   33.7    9417.8   X
   1   57.8        56.7   64.8     11325   X
   2   81.2        80.2   10.9    7660.7   X X
   2   75.5        74.1   24.6    8752.2   X X
   3   84.7        83.4    4.6    7012.4   X X X
   3   83.2        81.8    8.1    7334.3   X X X
   4   85.3        83.6    5.2    6971.0   X X X X
   4   85.2        83.5    5.5    7000.3   X X X X
   5   85.8        83.8    5.9    6938.4   X X X X X
   5   85.6        83.5    6.3    6983.8   X X X X X
   6   86.2        83.7    7.0    6952.6   X X X X X X
My choice is between the two possibilities in bold. One reason is that they have a small S
and a relatively high R-Sq. Another reason is that C-p should be approximately P+1 = 6.
Thus, I picked the one that has a C-p of 5.5 and an S of 7000.3. Here is the new regression:
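To make the C-p comparison easier to scan, the sketch below lists, for the better model of each size in the table, how far C-p is from k + 1, where k is the number of predictors in that row. Taking k per row is my assumption about how the rule of thumb is applied, and the code does not pick a winner.

```python
# (number of predictors, R-Sq, R-Sq(adj), C-p, S) rows from the best subsets table.
models = [
    (3, 84.7, 83.4, 4.6, 7012.4),
    (4, 85.3, 83.6, 5.2, 6971.0),
    (4, 85.2, 83.5, 5.5, 7000.3),
    (5, 85.8, 83.8, 5.9, 6938.4),
    (6, 86.2, 83.7, 7.0, 6952.6),
]

# For a model with k predictors plus a constant, C-p near k + 1 suggests little bias.
for k, r_sq, r_sq_adj, cp, s in models:
    gap = cp - (k + 1)
    print(f"{k} predictors: C-p = {cp}, k + 1 = {k + 1}, gap = {gap:+.1f}, S = {s}")
```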
Regression Analysis

The regression equation is
Expected salary = - 57732 + 4034 age - 6828 plan fin. + 0.0975 past salary
           + 468 Hrs.of work

Predictor        Coef       StDev        T        P     VIF
Constant       -57732       16415    -3.52    0.001
age            4034.1       753.2     5.36    0.000     2.8
planfin         -6828        2392    -2.85    0.007     1.0
pastsal       0.09755     0.09198     1.06    0.296     2.9
Hrs.of          468.5       111.7     4.19    0.000     1.6

S = 7000    R-Sq = 85.2%    R-Sq(adj) = 83.5%

Analysis of Variance

Source        DF            SS            MS        F        P
Regression     4    9840364419    2460091105    50.20    0.000
Error         35    1715135581      49003874
Total         39   11555500000

Source    DF        Seq SS
age        1    8185071089
planfin    1     536172676
pastsal    1     257832450
Hrs.of     1     861288204

Unusual Observations
Obs    age   Expected      Fit   StDev Fit   Residual   St Resid
  8   29.0     125000   111564        2946      13436      2.12R
 10   33.0     110000   115013        4497      -5013     -0.93 X
 16   25.0      70000    87329        2846     -17329     -2.71R
 34   27.0     120000   101545        3129      18455      2.95R
 39   30.0     100000   107987        4449      -7987     -1.48 X
 40   31.0     110000   114851        4466      -4851     -0.90 X
Past salary still has a P-value above .05 (0.296). So, I decided to take this variable out of
the regression.
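The T column behind that P-value is just the coefficient divided by its standard error. The sketch below recomputes it for pastsal and compares it with an approximate two-sided 5% critical value for 35 error degrees of freedom. The critical value is taken from standard t tables rather than computed.

```python
# Coefficient and standard error for pastsal from the regression above.
coef, st_dev = 0.09755, 0.09198

t_stat = coef / st_dev

# Approximate two-sided 5% critical value of Student's t with 35 error
# degrees of freedom (from standard tables, not computed here).
t_crit = 2.03

print(f"T = {t_stat:.2f}; significant at 5%: {abs(t_stat) > t_crit}")
```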
Regression Analysis

The regression equation is
Expected salary = - 69455 + 4583 age - 6843 plan fin. + 501 Hrs.of work

Predictor        Coef       StDev        T        P
Constant       -69455       12157    -5.71    0.000
age            4583.5       547.7     8.37    0.000
planfin         -6843        2396    -2.86    0.007
Hrs.of          500.9       107.7     4.65    0.000

S = 7012    R-Sq = 84.7%    R-Sq(adj) = 83.4%

Analysis of Variance

Source        DF            SS            MS        F        P
Regression     3    9785248786    3261749595    66.33    0.000
Error         36    1770251214      49173645
Total         39   11555500000

Source    DF        Seq SS
age        1    8185071089
planfin    1     536172676
Hrs.of     1    1064005021

Durbin-Watson statistic = 1.64: no autocorrelation. It confirms the
residuals versus order plot.

Unusual Observations
Obs    age   Expected      Fit   StDev Fit   Residual   St Resid
  8   29.0     125000   111047        2910      13953      2.19R
 10   33.0     110000   116859        4154      -6859     -1.21 X
 16   25.0      70000    87704        2829     -17704     -2.76R
 34   27.0     120000   101880        3118      18120      2.88R
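As an illustration of how this three-predictor equation would be used, the sketch below wraps it in a small function. The negative signs on the constant and on planfin are inferred: they are what make the fit at the sample means land near the mean expected salary of $84,250, and what make Residual equal Expected minus Fit in the Unusual Observations rows. The function name and the example student are hypothetical.

```python
def predict_expected_salary(age, planfin, hours):
    """Fitted expected salary from the three-predictor model above.

    The constant and planfin coefficients are taken as negative (inferred
    from the sample means and the residuals, see the lead-in).
    """
    return -69455 + 4583.5 * age - 6843 * planfin + 500.9 * hours

# Hypothetical student: 27 years old, plans finance, expects 70 hours a week.
print(f"{predict_expected_salary(27, 1, 70):,.1f}")

# Sanity check at the sample means (age 26.825, planfin 0.675, 70.62 hours):
# the result should be close to the mean expected salary of 84,250.
print(f"{predict_expected_salary(26.825, 0.675, 70.62):,.1f}")
```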

[Figure: residual plots: residuals versus the order of the data, residuals versus the fitted values, normal probability plot of the residuals, histogram of the residuals; observations 34 and 16 are labeled]
Now we have a statistically significant model in which every predictor has a P-value below .05. However, two outliers are still
visible in the residual plots. We can try to get rid of these 2 outliers (observations 34 and 16).
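The St Resid values that flag these two observations can be reproduced, up to rounding, from the other columns of the Unusual Observations table. The sketch below uses the usual standardized residual, with leverage recovered as (StDev Fit / S)^2. The negative sign on observation 16's residual is inferred from Expected minus Fit, and the function name is illustrative.

```python
import math

def standardized_residual(residual, s, st_dev_fit):
    """Standardized residual from Minitab-style output columns."""
    leverage = (st_dev_fit / s) ** 2  # h_i = (StDev Fit / S)^2
    return residual / (s * math.sqrt(1 - leverage))

# Observations 16 and 34 from the three-predictor model (S = 7012).
obs_16 = standardized_residual(-17704, 7012, 2829)
obs_34 = standardized_residual(18120, 7012, 3118)
print(f"obs 16: {obs_16:.3f}, obs 34: {obs_34:.3f}")
```

Because S and StDev Fit are themselves rounded in the output, the last digit can differ slightly from the printed St Resid.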
Regression Analysis

Predictor        Coef       StDev        T        P     VIF
Constant       -65739       10088    -6.52    0.000
age            4508.2       474.3     9.51    0.000     1.6
planfin         -6676        2071    -3.22    0.003     1.0
Hrs.of          474.7       100.6     4.72    0.000     1.6

S = 5742    R-Sq = 88.9%    R-Sq(adj) = 87.9%

Analysis of Variance

Source        DF            SS            MS        F        P
Regression     3    8941383570    2980461190    90.41    0.000
Error         34    1120826956      32965499
Total         37   10062210526

Source    DF        Seq SS
age        1    7937259218
planfin    1     270764351
Hrs.of     1     733360001

Unusual Observations
Obs    age   Expected      Fit   StDev Fit   Residual   St Resid
  8   29.0     125000   110091        2846      14909      2.99R
 10   33.0     110000   116257        3415      -6257     -1.36 X
 13   24.0      70000    80430        2798     -10430     -2.08R
[Figure: residual plots: residuals versus the order of the data, residuals versus the fitted values, normal probability plot of the residuals]
Regression Analysis

Predictor        Coef       StDev        T        P     VIF
Constant       -63050        8827    -7.14    0.000
age            4652.8       415.5    11.20    0.000     1.6
planfin         -4558        1908    -2.39    0.023     1.1
Hrs.of         351.10       94.81     3.70    0.001     1.7

S = 5004    R-Sq = 90.1%    R-Sq(adj) = 89.2%

Analysis of Variance

Source        DF            SS            MS        F        P
Regression     3    7482915093    2494305031    99.63    0.000
Error         33     826165988      25035333
Total         36    8309081081

Source    DF        Seq SS
age        1    7066013767
planfin    1      73551726
Hrs.of     1     343349601

Unusual Observations
Obs    age   Expected      Fit   StDev Fit   Residual   St Resid
  9   33.0     110000   115070        2996      -5070     -1.27 X
 34   27.0      70000    80840        1093     -10840     -2.22R
[Figure: residuals versus the fitted values]
Heteroscedasticity is non-constant variance. It appears that there is non-constant
variance in the plot of the residuals versus the fitted values. But to explore that aspect
further, we would have to do a Levene's test. Hopefully, logging the variables will take
care of this. Let's do a regression with logged expected salaries.
[Figure: residuals versus the order of the data, normal probability plot of the residuals, histogram of the residuals]
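The Levene's test mentioned above was not run in this report. As a rough sketch of what it would involve, the code below computes the test statistic in its median-based (Brown-Forsythe) form for two groups of residuals, for example those with low and high fitted values. The residuals are hypothetical, the grouping rule is an assumption, and no P-value is computed.

```python
import statistics

def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group's median
    (the Brown-Forsythe variant)."""
    z = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand_mean = sum(sum(g) for g in z) / n_total
    group_means = [statistics.fmean(g) for g in z]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical standardized residuals, split at the median fitted value.
low_fit = [-0.4, 0.2, -0.1, 0.5, -0.3, 0.1]
high_fit = [-1.8, 1.2, -0.9, 2.1, -1.4, 0.7]
print(round(levene_statistic(low_fit, high_fit), 2))
```

A large W relative to the F distribution with (k - 1, N - k) degrees of freedom would point to unequal spread between the groups.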
Let's check residuals, leverage points and Cook's distance.

Row      SRES5         HI5      COOK5
  1    0.59872    0.079655   0.007756
  2    0.27269    0.116828   0.002459
  3   -0.01353    0.076361   0.000004
  4    0.97156    0.103741   0.027315
  5   -1.26510    0.044574   0.018667
  6    1.15005    0.063633   0.022470
  7    1.27035    0.105834   0.047753
  8    1.99149    0.152316   0.178160
  9   -1.26524    0.358517*  0.223670
 10   -0.73194    0.143014   0.022351
 11   -0.15638    0.058155   0.000378
 12   -1.58437    0.284480   0.249509
 13   -0.91533    0.063633   0.014234
 14    0.76627    0.063603   0.009971
 15    0.23355    0.128017   0.002002
 16   -1.44970    0.231559   0.158323
 17   -0.75144    0.054100   0.008074
 18   -0.39198    0.079616   0.003323
 19    1.54343    0.100642   0.066644
 20   -1.68307    0.092249   0.071967
 21    0.94590    0.060364   0.014370
 22   -0.01353    0.076361   0.000004
 23    0.35320    0.063603   0.002118
 24   -0.30542    0.095103   0.002451
 25    0.59163    0.084579   0.008085
 26    1.66542    0.197385   0.170528
 27    0.21357    0.105834   0.001350
 28   -1.20202    0.170243   0.074111
 29    0.04569    0.064749   0.000036
 30    0.42040    0.043466   0.002008
 31    0.37278    0.145178   0.005900
 32    1.33865    0.097529   0.048414
 33   -0.91533    0.063633   0.014234
 34   -2.22013    0.047736   0.061771
 35   -0.15638    0.058155   0.000378
 36   -0.38165    0.091066   0.003648
 37    0.38048    0.134487   0.005624
Leverage values should be less than 2.5*(p+1)/n = 2.5*(4+1)/37 = .34

One leverage value (*), about .36, is above this cutoff.
Cook's distance should be less than 1, which is true for every observation.
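Both checks are simple enough to recompute by hand. The sketch below evaluates the leverage cutoff exactly as the text writes it and rebuilds Cook's distance from SRES and HI for the starred row. The function names are illustrative, and treating the fitted model as having 4 coefficients (constant plus three predictors) in the Cook's distance formula is my reading of the output.

```python
def leverage_cutoff(multiplier, n_obs):
    """Rule of thumb from the text, 2.5 * (p + 1) / n, with (p + 1)
    passed in as `multiplier`."""
    return 2.5 * multiplier / n_obs

def cooks_distance(std_resid, leverage, n_coefficients):
    """Cook's D from a standardized residual and its leverage value."""
    return std_resid ** 2 / n_coefficients * leverage / (1 - leverage)

# The text uses (4 + 1) = 5 and n = 37 for the cutoff.
cutoff = leverage_cutoff(5, 37)
print(f"leverage cutoff = {cutoff:.3f}")

# Starred row: SRES = -1.26524, HI = 0.358517, assumed 4 coefficients.
d_9 = cooks_distance(-1.26524, 0.358517, 4)
print(f"Cook's D = {d_9:.4f}, HI above cutoff: {0.358517 > cutoff}")
```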
Regression Analysis

The regression equation is
logexp = 4.18 + 0.0231 age - 0.0256 plan fin. + 0.00186 Hrs.of work

Predictor          Coef        StDev        T        P     VIF
Constant        4.18204      0.04868    85.92    0.000
age            0.023056     0.002291    10.06    0.000     1.6
planfin        -0.02560      0.01052    -2.43    0.021     1.1
Hrs.of        0.0018643    0.0005228     3.57    0.001     1.7

S = 0.02759    R-Sq = 88.3%    R-Sq(adj) = 87.2%

Analysis of Variance

Source        DF         SS         MS        F        P
Regression     3   0.188991   0.062997    82.74    0.000
Error         33   0.025126   0.000761
Total         36   0.214116

Source    DF     Seq SS
age        1   0.176882
planfin    1   0.002427
Hrs.of     1   0.009681

Unusual Observations
Obs    age    logexp       Fit   StDev Fit   Residual   St Resid
  8   25.0   4.90309   4.85166     0.01077    0.05143      2.02R
  9   33.0   5.04139   5.07340     0.01652   -0.03201     -1.45 X
 20   25.0   4.77815   4.83539     0.00838   -0.05724     -2.18R
 34   27.0   4.84510   4.90015     0.00603   -0.05505     -2.04R
[Figure: residuals versus the fitted values (response is logexp)]
It appears that there is still non-constant variance. The only reasonable thing to do now is
weighted least squares.
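Weighted least squares was not carried out in this report. As a minimal sketch of the idea, the code below fits a weighted regression with a single predictor, which is a simplification of the three-predictor model above. The data and the assumed error variances are hypothetical, and the weights are set to 1 / variance so that noisier observations count for less.

```python
def weighted_least_squares(x, y, w):
    """Weighted simple linear regression; returns (intercept, slope)."""
    w_sum = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical data: log expected salary against age, with weights set to
# 1 / (assumed error variance).
age = [24, 25, 27, 29, 31, 33]
log_expected = [4.80, 4.85, 4.88, 4.95, 5.00, 5.04]
assumed_variance = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0]
weights = [1 / v for v in assumed_variance]

intercept, slope = weighted_least_squares(age, log_expected, weights)
print(f"log expected = {intercept:.3f} + {slope:.4f} * age")
```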

[Figure: residuals versus the order of the data, normal probability plot of the residuals, histogram of the residuals]
