Computer Project MAS291 SE150263

Computer Project MAS291
Subsample
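id <- 263                                      # last three digits of the student ID
set.seed(id)                                   # fix the seed so the subsample is reproducible
wage <- read.csv("wage.csv")                   # full data set
yourdata <- wage[sample(1:nrow(wage), 30), ]   # draw 30 rows without replacement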
     educ age wage  IQ exper    lwage black
1917   14  27  265  75     7 5.579730     1
161    17  27  285 121     4 5.652489     0
1592   12  34  817  93    16 6.705639     0
1589   16  27  529 108     5 6.270988     1
2113   16  24  565  NA     2 6.336826     0
2835   12  28  558  90    10 6.324359     0
1467   12  31  428  NA    13 6.059123     1
2566   12  29  753  NA    11 6.624065     1
1627   11  25  700  93     8 6.551080     0
2994   12  27  263 100     9 5.572154     0
2957   10  25  200  NA     9 5.298317     1
956    11  26  705  89     9 6.558198     0
606     9  24  427  NA     9 6.056784     0
2128   17  25  442  NA     2 6.091310     0
223    12  26  475 104     8 6.163315     0
1064   12  33 1374  NA    15 7.225482     0
2350   15  27  577  93     6 6.357842     0
1946   12  31  365  70    13 5.899898     1
227    13  26  300 122     7 5.703783     0
2202   18  33 1309  NA     9 7.177019     0
634    12  33  708 101    15 6.562444     0
2025   12  30  325  86    12 5.783825     1
1712   14  27  250  NA     7 5.521461     1
655    18  25  462 107     1 6.135565     0
1224   13  32  650  99    13 6.476973     0
1780   13  29  364  80    10 5.897154     1
1508   17  29  955 132     6 6.861712     0
829    12  24  640  NA     6 6.461468     1
2765   17  25  560 116     2 6.327937     0
1767   13  24  462  NA     5 6.135565     0
Topic: Probability
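Ex 1
X: number of black people among the 30 people chosen from the data
Number of black people in the data: 703 out of 3010, so 3010 − 703 = 2307 are not black
$$P(X = 6) = \frac{\binom{703}{6}\binom{2307}{24}}{\binom{3010}{30}} \approx 0.16$$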
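Ex 2
X: number of black people among the 5 people chosen from the subsample
Number of black people in the subsample: 10 (out of 30)
The probability that there is at least one black person:
$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{\binom{20}{5}}{\binom{30}{5}} = 0.891$$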
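Both probabilities above are hypergeometric, so they can be checked with R's `dhyper`. This is only a sketch: it assumes the counts quoted in the text (703 black people among 3010 in the data, 10 among 30 in the subsample) and that Ex 2 draws 5 people, as the binomial coefficients suggest.

```r
# dhyper(x, m, n, k): probability of x "successes" in k draws without
# replacement from a population with m successes and n failures.

# Ex 1: 6 black people among 30 drawn from 3010 (703 black, 2307 not black).
p_ex1 <- dhyper(6, m = 703, n = 3010 - 703, k = 30)

# Ex 2: at least one black person among 5 drawn from the 30-person subsample
# (10 black, 20 not black).
p_ex2 <- 1 - dhyper(0, m = 10, n = 20, k = 5)

# The same Ex 2 value written with binomial coefficients, as in the text.
p_ex2_choose <- 1 - choose(20, 5) / choose(30, 5)

print(round(c(ex1 = p_ex1, ex2 = p_ex2, ex2_choose = p_ex2_choose), 4))
```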
Topic: Discrete Random Variables and Probability Distribution

Ex 3
Number of black people in the data: 703
$$p = \frac{703}{3010} = 0.233$$
a. $E(X) = n p = 30 \times 0.233 \approx 7$

b. Number of black people in the data: 703; number of black people in the subsample: 10
Ex 4
X: number of years of education
$\mu_X = \lambda = 12.3$
$$P(X = 6) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-12.3} \times 12.3^{6}}{6!} = 0.022$$
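The Poisson probability in Ex 4 can be checked both with the built-in pmf and with the formula written out. A minimal sketch, assuming λ = 12.3 as stated in the exercise:

```r
lambda <- 12.3   # mean number of years of education, as given in Ex 4

# Built-in Poisson probability mass function.
p_builtin <- dpois(6, lambda)

# The same probability from the formula e^(-lambda) * lambda^x / x!.
p_formula <- exp(-lambda) * lambda^6 / factorial(6)

print(round(c(dpois = p_builtin, formula = p_formula), 4))
```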
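Ex 5
If age X were uniformly distributed on the integers 24, 25, ..., 34:
$$E(X) = \frac{24 + 34}{2} = 29$$
$$V(X) = \frac{(34 - 24 + 1)^2 - 1}{12} = 10$$
Statistics from the subsample:

Age  Frequency
24   4
25   5
26   3
27   6
28   1
29   3
30   1
31   2
32   1
33   3
34   1
→ The frequencies are far from equal, so age does not follow a uniform distribution.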
Topic: Continuous Random Variables and Probability Distribution

Ex 6
X: “lwage”
μ = 6.26
σ = 0.44
$$P(5 < X < 6) = P(X < 6) - P(X < 5) = P\left(Z < \frac{6 - 6.26}{0.44}\right) - P\left(Z < \frac{5 - 6.26}{0.44}\right)$$
$$= P(Z < -0.59) - P(Z < -2.86) = 0.2776 - 0.0021 = 0.2755$$
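The normal probability in Ex 6 can be computed without a z table. A minimal sketch, assuming lwage is treated as normal with the μ and σ given in the exercise; the result differs slightly from a table lookup because the z values are not rounded to two decimals.

```r
mu <- 6.26      # mean of lwage, as given in Ex 6
sigma <- 0.44   # standard deviation of lwage, as given in Ex 6

# P(5 < X < 6) as a difference of two cumulative probabilities.
p_between <- pnorm(6, mean = mu, sd = sigma) - pnorm(5, mean = mu, sd = sigma)

# The same value after standardising by hand.
p_standardised <- pnorm((6 - mu) / sigma) - pnorm((5 - mu) / sigma)

print(round(p_between, 4))
```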
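Ex 7
μ = 99.04
σ = 20.2
a. For the mean $\bar{X}$ of a sample of size n = 20:
$$P(100 < \bar{X} < 110) = P\left(Z < \frac{110 - 99.04}{20.2/\sqrt{20}}\right) - P\left(Z < \frac{100 - 99.04}{20.2/\sqrt{20}}\right)$$
$$= P(Z < 2.43) - P(Z < 0.21) = 0.9925 - 0.5832 = 0.4093$$
b. The standard error of the sample mean is $\sigma/\sqrt{n}$. Setting $\sigma/\sqrt{n} = E = 1$ gives $n = (\sigma/E)^2 = 20.2^2 = 408.04$.
→ The sample size must be at least 409 for the standard error of the sample mean to be at most 1.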
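Both parts of Ex 7 rest on the standard error of the sample mean, σ/√n. The sketch below assumes μ = 99.04 and σ = 20.2 as given, takes n = 20 from the √20 used in the standardisation, and takes the target standard error of 1 from part b.

```r
mu <- 99.04
sigma <- 20.2
n <- 20

# Part a: the sample mean is approximately normal with sd = sigma / sqrt(n).
se <- sigma / sqrt(n)
p_mean <- pnorm(110, mean = mu, sd = se) - pnorm(100, mean = mu, sd = se)

# Part b: smallest n whose standard error sigma / sqrt(n) is at most 1.
target_se <- 1
n_needed <- ceiling((sigma / target_se)^2)

print(round(p_mean, 4))
print(n_needed)
```

The probability comes out slightly below the table-based 0.4093 because the z values are not rounded to two decimals here.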
Ex 8
- Statistics for wage:
Minimum       200
Maximum       1374
1st Quartile  364.25
3rd Quartile  687.5
Mean          557.1
Median        502
Variance      79733.4724
Stdev         282.371161
- Statistics for IQ:
NAs           11
Minimum       70
Maximum       132
1st Quartile  89.5
3rd Quartile  107.5
Mean          98.894737
Median        99
Variance      266.766082
Stdev         16.332975
- Statistics for education:
Minimum       9
Maximum       18
1st Quartile  12
3rd Quartile  15.75
Mean          13.466667
Median        12.5
Variance      6.050575
Stdev         2.459792
- Statistics for experience:
Minimum       1
Maximum       16
1st Quartile  6
3rd Quartile  10.75
Mean          8.3
Median        8.5
Variance      16.493103
Stdev         4.06117
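The wage figures can be reproduced in R along these lines. This is a sketch, not the original code: the `wage` vector is typed in from the subsample table instead of being read from `wage.csv`, and the quartiles use the default interpolation of `quantile` (type 7), which is an assumption about how the reported values were obtained.

```r
# Subsample wages, in the row order of the table at the top of the report.
wage <- c(265, 285, 817, 529, 565, 558, 428, 753, 700, 263,
          200, 705, 427, 442, 475, 1374, 577, 365, 300, 1309,
          708, 325, 250, 462, 650, 364, 955, 640, 560, 462)

q <- quantile(wage, c(0.25, 0.75))   # default type 7 interpolation

wage_stats <- c(Minimum  = min(wage),
                Maximum  = max(wage),
                Q1       = unname(q[1]),
                Q3       = unname(q[2]),
                Mean     = mean(wage),
                Median   = median(wage),
                Variance = var(wage),   # sample variance, divisor n - 1
                Stdev    = sd(wage))

print(round(wage_stats, 4))
```

The same pattern applies to the other columns; for IQ, the missing values would have to be dropped first, for example with `na.rm = TRUE`.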
Topic: Sampling Distributions and Point Estimation of Parameters

Ex 9
X: “IQ”
Mean: $\bar{x} = 98.894$
Standard deviation: $s = 16.33$
Variance: $s^2 = 266.766$
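Ex 10
X: “lwage”
Mean: $\bar{x} = 6.212$, with σ = 0.44 and n = 30
95% confidence interval:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$6.212 - 1.96 \times \frac{0.44}{\sqrt{30}} \le \mu \le 6.212 + 1.96 \times \frac{0.44}{\sqrt{30}}$$
$$6.0545 \le \mu \le 6.3695$$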
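A z interval of this kind can be wrapped in a small helper. The sketch below is illustrative: `z_interval` is a hypothetical helper name, σ = 0.44 is treated as known as in the exercise, and the sample mean is recomputed from the subsample wages (typed in from the table at the top of the report) by taking logs.

```r
# Subsample wages, in the row order of the table at the top of the report.
wage <- c(265, 285, 817, 529, 565, 558, 428, 753, 700, 263,
          200, 705, 427, 442, 475, 1374, 577, 365, 300, 1309,
          708, 325, 250, 462, 650, 364, 955, 640, 560, 462)
lwage <- log(wage)

# Two-sided z interval for a mean when sigma is treated as known.
z_interval <- function(xbar, sigma, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  margin <- z * sigma / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

ci <- z_interval(mean(lwage), sigma = 0.44, n = length(lwage))
print(round(mean(lwage), 4))
print(round(ci, 4))
```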
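Ex 11
X: “IQ”
Mean: $\bar{x} = 98.894$
Standard deviation: $s = 16.33$
n = 19 (11 of the 30 values are NA)
$$\bar{x} - t_{\alpha/2,\,18}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,18}\frac{s}{\sqrt{n}}$$
$$98.894 - 2.214 \times \frac{16.33}{\sqrt{19}} \le \mu \le 98.894 + 2.214 \times \frac{16.33}{\sqrt{19}}$$
$$90.6 \le \mu \le 107.1884$$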
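The t interval for IQ can be reproduced from the 19 non-missing values. A sketch under stated assumptions: the `iq` vector is typed in from the subsample table, and the multiplier 2.214 is matched by α/2 = 0.02 with 18 degrees of freedom, so a 96% confidence level is inferred here rather than stated in the exercise.

```r
# Subsample IQ values, in the row order of the table at the top of the report.
iq <- c(75, 121, 93, 108, NA, 90, NA, NA, 93, 100,
        NA, 89, NA, NA, 104, NA, 93, 70, 122, NA,
        101, 86, NA, 107, 99, 80, 132, NA, 116, NA)

x <- iq[!is.na(iq)]   # drop the 11 missing values
n <- length(x)

# t multiplier with n - 1 degrees of freedom; alpha / 2 = 0.02 is an assumption.
t_mult <- qt(1 - 0.02, df = n - 1)

ci <- mean(x) + c(-1, 1) * t_mult * sd(x) / sqrt(n)
print(round(c(n = n, mean = mean(x), sd = sd(x), t = t_mult), 3))
print(round(ci, 3))
```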
Ex 12
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 0.44}{0.2}\right)^2 = 18.59$$
→ The sample size to use is 19.

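Ex 13
Number of people living near a 4-year college: 2053
$$\hat{p} = \frac{2053}{3010} = 0.682$$
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$0.682 - 2.576\sqrt{\frac{0.682 \times 0.318}{3010}} \le p \le 0.682 + 2.576\sqrt{\frac{0.682 \times 0.318}{3010}}$$
$$0.6602 \le p \le 0.7039$$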
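The large-sample interval for a proportion is short enough to compute directly. A minimal sketch, assuming the count 2053 out of 3010 from the exercise and a 99% confidence level, which is what the multiplier 2.576 corresponds to:

```r
x <- 2053   # people living near a 4-year college
n <- 3010   # size of the full data set

p_hat <- x / n
z <- qnorm(1 - 0.01 / 2)   # about 2.576 for a 99% interval

margin <- z * sqrt(p_hat * (1 - p_hat) / n)
ci <- c(lower = p_hat - margin, upper = p_hat + margin)

print(round(c(p_hat = p_hat, z = z), 4))
print(round(ci, 4))
```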
Ex 14
Number of black people: 703
$$\hat{p} = \frac{703}{3010} = 0.23355$$
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{2.576}{0.01}\right)^2 \times 0.23355 \times 0.76645 = 11878.33$$
→ The sample size to use is 11879.

Topic: Test of Hypotheses for a Single Sample
Ex 15
H0: mean(IQ) = 100
H1: mean(IQ) ≠ 100
Mean: $\bar{x} = 98.894$
n = 19 (11 of the 30 values are NA)
$$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{98.894 - 100}{15/\sqrt{19}} = -0.321$$
$|z_0| = 0.321 < z_{\alpha/2} = 2.33$
→ Fail to reject H0
Ex 16
H0: mean(lwage) = 6
H1: mean(lwage) > 6
Mean: $\bar{x} = 6.212$
Standard deviation: $s = 0.473$
$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{6.212 - 6}{0.473/\sqrt{30}} = 2.455 > t_{\alpha,\,n-1} = 1.31$$
→ Reject H0
Ex 17
H0: p = 0.07
H1: p > 0.07
Number of people with less than 10 years of work experience: 1850
$$\hat{p} = \frac{1850}{3010} = 0.615$$
$$z_0 = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{0.615 - 0.07}{\sqrt{0.615 \times 0.385/3010}} = 61.4485 > z_{\alpha} = 2.05$$
→ Reject H0
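Ex 18
a.
H0: Δ = 0
H1: Δ ≠ 0
X1: wage of black people
X2: wage of non-black people
$\bar{x}_1 = 411.9$, $n_1 = 10$
$\bar{x}_2 = 629.7$, $n_2 = 20$
$s_1 = 178.6023$
$s_2 = 299.9067$
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(10 - 1) \times 178.6023^2 + (20 - 1) \times 299.9067^2}{28}} = 266.9955$$
Test statistic:
$$t_0 = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = -2.106$$
$-t_{0.005,\,28} < t_0 < t_{0.005,\,28} = 2.76$ (the pooled test has $n_1 + n_2 - 2 = 28$ degrees of freedom)
→ Fail to reject H0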
b.
H0: p1 − p2 = 0
H1: p1 − p2 ≠ 0
X1: black
X2: not black
n1 = 5, x1 (number of people with IQ < 90) = 4
n2 = 14, x2 (number of people with IQ < 90) = 1
(11 of the 30 IQ values are NA)
→ $\hat{p}_1 = 0.8$, $\hat{p}_2 = 0.0714$
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{5}{19} = 0.2632$$
Test statistic:
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 3.18 > z_{0.025} = 1.96$$
→ Reject H0
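Both tests in Ex 18 can be checked in R. This is a sketch: the `wage` and `black` vectors are typed in from the subsample table, `t.test` with `var.equal = TRUE` is used as the pooled two-sample t test, and the two-proportion z statistic is computed by hand from the counts given in part b, with the pooled proportion taken as (x1 + x2) / (n1 + n2).

```r
# Subsample columns, in the row order of the table at the top of the report.
wage <- c(265, 285, 817, 529, 565, 558, 428, 753, 700, 263,
          200, 705, 427, 442, 475, 1374, 577, 365, 300, 1309,
          708, 325, 250, 462, 650, 364, 955, 640, 560, 462)
black <- c(1, 0, 0, 1, 0, 0, 1, 1, 0, 0,
           1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
           0, 1, 1, 0, 0, 1, 0, 1, 0, 0)

# Part a: pooled-variance t test for the difference in mean wage.
tt <- t.test(wage[black == 1], wage[black == 0], var.equal = TRUE)
t0 <- unname(tt$statistic)
df <- unname(tt$parameter)

# Part b: two-proportion z statistic for IQ < 90, from the counts in the text.
x1 <- 4; n1 <- 5     # black people with a recorded IQ
x2 <- 1; n2 <- 14    # non-black people with a recorded IQ
p_pool <- (x1 + x2) / (n1 + n2)
z0 <- (x1 / n1 - x2 / n2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

print(round(c(t0 = t0, df = df, p_pool = p_pool, z0 = z0), 4))
```

With groups this small (5 and 14 people), the normal approximation behind the z statistic is rough, so the result is best read as indicative.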
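Ex 19:
x: “education”
y: “wage”
$\sum x = 404$, $\sum x^2 = 5616$, $\sum xy = 228219$
$\sum y = 16713$, $\sum y^2 = 11623083$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 175.4667$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 3150.6$$
a. Slope: $\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{3150.6}{175.4667} = 17.955$
Intercept: $\beta_0 = \bar{y} - \beta_1 \bar{x} = 315.2986$

$$SS_T = \sum y^2 - \frac{(\sum y)^2}{n} = 2312271$$
$$SS_E = SS_T - \beta_1 S_{xy} = 2255700$$

$$\sigma^2 = \frac{SS_E}{n - 2} = \frac{2255700}{28} = 80560.71$$
$$se(\beta_1) = \sqrt{\frac{\sigma^2}{S_{xx}}} = 21.42$$
b. H0: β1 = 0, H1: β1 ≠ 0
Test statistic: $t_0 = \frac{\beta_1}{se(\beta_1)} = \frac{17.955}{21.42} = 0.84 < t_{\alpha/2,\,n-2} = 1.7$
→ Fail to reject H0
c. Coefficient of determination: $R^2 = \frac{\beta_1 S_{xy}}{SS_T} = 0.024465$
Meaning: R² measures how much of the variability in the outcome is explained by the linear regression model. Here education explains only about 2.4% of the variability in wage.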
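The hand calculation in Ex 19 can be cross-checked with `lm`. A minimal sketch, assuming the `educ` and `wage` vectors below, which are typed in from the subsample table rather than read from `wage.csv`:

```r
# Subsample columns, in the row order of the table at the top of the report.
educ <- c(14, 17, 12, 16, 16, 12, 12, 12, 11, 12,
          10, 11, 9, 17, 12, 12, 15, 12, 13, 18,
          12, 12, 14, 18, 13, 13, 17, 12, 17, 13)
wage <- c(265, 285, 817, 529, 565, 558, 428, 753, 700, 263,
          200, 705, 427, 442, 475, 1374, 577, 365, 300, 1309,
          708, 325, 250, 462, 650, 364, 955, 640, 560, 462)

fit <- lm(wage ~ educ)          # simple linear regression of wage on education
coefs <- summary(fit)$coefficients

slope     <- coefs["educ", "Estimate"]
intercept <- coefs["(Intercept)", "Estimate"]
se_slope  <- coefs["educ", "Std. Error"]
t_slope   <- coefs["educ", "t value"]     # slope divided by its standard error
r2        <- summary(fit)$r.squared

print(round(c(slope = slope, intercept = intercept,
              se = se_slope, t = t_slope, R2 = r2), 4))
```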
Ex 20:
x: “education”
y: “lwage”
$\sum x = 404$, $\sum x^2 = 5616$, $\sum xy = 2515.057$
$\sum y = 186.3725$, $\sum y^2 = 1164.322$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 175.4667$$
$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 5.240778$$
d. Slope: $\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{5.240778}{175.4667} = 0.0298$
Intercept: $\beta_0 = \bar{y} - \beta_1 \bar{x} = 5.81$

→ Regression line: y = 0.0298x + 5.81
SST = 6.49
SSE = 6.34
Coefficient of determination: $R^2 = \frac{\beta_1 S_{xy}}{SS_T} = 0.024087$
→ Using wage as the response gives the better prediction, because

R²(using lwage) = 0.024087 < R²(using wage) = 0.024465
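The comparison of the two fits can be reproduced with `lm`. A sketch, assuming the `educ` and `wage` vectors typed in from the subsample table and lwage taken as the natural log of wage:

```r
# Subsample columns, in the row order of the table at the top of the report.
educ <- c(14, 17, 12, 16, 16, 12, 12, 12, 11, 12,
          10, 11, 9, 17, 12, 12, 15, 12, 13, 18,
          12, 12, 14, 18, 13, 13, 17, 12, 17, 13)
wage <- c(265, 285, 817, 529, 565, 558, 428, 753, 700, 263,
          200, 705, 427, 442, 475, 1374, 577, 365, 300, 1309,
          708, 325, 250, 462, 650, 364, 955, 640, 560, 462)
lwage <- log(wage)

fit_wage  <- lm(wage ~ educ)    # response on the original scale
fit_lwage <- lm(lwage ~ educ)   # response on the log scale

slope_log     <- unname(coef(fit_lwage)["educ"])
intercept_log <- unname(coef(fit_lwage)["(Intercept)"])
r2_wage  <- summary(fit_wage)$r.squared
r2_lwage <- summary(fit_lwage)$r.squared

print(round(c(sum_lwage = sum(lwage), slope = slope_log,
              intercept = intercept_log), 4))
print(round(c(R2_wage = r2_wage, R2_lwage = r2_lwage), 6))
```

The two R² values differ only in the fourth decimal place, so the preference for one response over the other is weak on this subsample.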
