Matlab Assignment

Matlab assignment, question 2:

(a) The data (US household income in 2012) represent a population rather than a sample, because
they come from a census. The data are also quantitative and continuous.

(b) From the histogram we can estimate the mean by taking the midpoint of each class (bin) as a
representative for that class.

Class interval            Midpoint    Frequency
$0 to $5,000                  2500        4,204
$5,000 to $9,999              7500        4,729
$10,000 to $14,999           12500        6,982
$15,000 to $19,999           17500        7,157
$20,000 to $24,999           22500        7,131
$25,000 to $29,999           27500        6,740
$30,000 to $34,999           32500        6,354
$35,000 to $39,999           37500        5,832
$40,000 to $44,999           42500        5,547
$45,000 to $49,999           47500        5,254
$50,000 to $54,999           52500        5,102
$55,000 to $59,999           57500        4,256
$60,000 to $64,999           62500        4,356
$65,000 to $69,999           67500        3,949
$70,000 to $74,999           72500        3,756
$75,000 to $79,999           77500        3,414
$80,000 to $84,999           82500        3,326
$85,000 to $89,999           87500        2,643
$90,000 to $94,999           92500        2,678
$95,000 to $99,999           97500        2,223
$100,000 to $104,999        102500        2,606
$105,000 to $109,999        107500        1,838
$110,000 to $114,999        112500        1,986
$115,000 to $119,999        117500        1,464
$120,000 to $124,999        122500        1,596
$125,000 to $129,999        127500        1,327
$130,000 to $134,999        132500        1,253
$135,000 to $139,999        137500        1,140
$140,000 to $144,999        142500        1,119
$145,000 to $149,999        147500          920
$150,000 to $154,999        152500        1,143
$155,000 to $159,999        157500          805
$160,000 to $164,999        162500          731
$165,000 to $169,999        167500          575
$170,000 to $174,999        172500          616
$175,000 to $179,999        177500          570
$180,000 to $184,999        182500          502
$185,000 to $189,999        187500          364
$190,000 to $194,999        192500          432
$195,000 to $199,999        197500          378
$200,000 to $205,000        202500        5,460

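Now we use the mean formula for grouped data, where \( m_i \) is the midpoint of class \( i \) and
\( f_i \) is its frequency:

\[ \bar{x} \approx \frac{\sum_i f_i m_i}{\sum_i f_i} \]

which gives us \( \bar{x} \approx 65843 \).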
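
As a cross-check, the grouped-mean calculation can be sketched in a few lines of Python (used here for illustration only; the assignment's own MATLAB code is not shown). The names `freqs`, `midpoints`, and `grouped_mean` are invented for this sketch, and the midpoints are generated on the assumption that every class is 5,000 wide, as in the table.

```python
# Class frequencies from the table above, lowest class first (41 classes).
freqs = [
    4204, 4729, 6982, 7157, 7131, 6740, 6354, 5832, 5547, 5254,
    5102, 4256, 4356, 3949, 3756, 3414, 3326, 2643, 2678, 2223,
    2606, 1838, 1986, 1464, 1596, 1327, 1253, 1140, 1119, 920,
    1143, 805, 731, 575, 616, 570, 502, 364, 432, 378, 5460,
]

# Midpoints 2500, 7500, ..., 202500 (assumes a constant class width of 5000).
midpoints = [2500 + 5000 * k for k in range(len(freqs))]


def grouped_mean(midpoints, freqs):
    """Frequency-weighted average of the class midpoints."""
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)


mean_estimate = grouped_mean(midpoints, freqs)
print(round(mean_estimate))
```

Rounded to the nearest dollar, the printed value should agree with the estimate quoted above.
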
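Next, we find the median from the histogram. The total frequency is n = 122,459 (an odd number),
so the median is the observation at position

\[ \frac{n + 1}{2} = \frac{122459 + 1}{2} = 61230 \]

The cumulative frequency is 59,930 up to the end of the $45,000 to $49,999 class and 65,032 up to
the end of the $50,000 to $54,999 class, so position 61,230 falls in the interval $50,000 to
$54,999. We can take the midpoint of this interval as the median:

\[ \text{median} \approx 52500 \]
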
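The search for the median class can be sketched with a running total of the frequencies. This is a minimal Python illustration, not the assignment's own code; `median_class_index` and `lower_bounds` are hypothetical names. Note that the frequencies as listed sum to 122,458, one less than the stated n, but the median class comes out the same either way.

```python
from itertools import accumulate

# Class frequencies from the table above, lowest class first.
freqs = [
    4204, 4729, 6982, 7157, 7131, 6740, 6354, 5832, 5547, 5254,
    5102, 4256, 4356, 3949, 3756, 3414, 3326, 2643, 2678, 2223,
    2606, 1838, 1986, 1464, 1596, 1327, 1253, 1140, 1119, 920,
    1143, 805, 731, 575, 616, 570, 502, 364, 432, 378, 5460,
]

# Lower class limits 0, 5000, ..., 200000 (assumes a class width of 5000).
lower_bounds = [5000 * k for k in range(len(freqs))]


def median_class_index(freqs, n=None):
    """Index of the class containing the observation at position (n + 1) / 2."""
    if n is None:
        n = sum(freqs)
    position = (n + 1) / 2
    for i, cumulative in enumerate(accumulate(freqs)):
        if cumulative >= position:
            return i
    raise ValueError("position exceeds the total listed frequency")


i = median_class_index(freqs, n=122459)   # n as stated in the text
median_estimate = lower_bounds[i] + 2500  # midpoint of the median class
print(lower_bounds[i], median_estimate)
```
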
(c) The discrepancy between the mean and the median is mainly caused by the outlier group
$200,000 to $205,000. This group does not follow the general downward trend: its frequency
(5,460) is far higher than that of the neighboring classes. Because the mean is not a robust
statistic, this group pulls the mean income above the median income.
(d) The median is a better representative of the average household income. Household income
distributions are usually not symmetric, and they often contain outliers: households with very
high or very low incomes. It is therefore important to use a robust statistic as the measure of
center. The median is robust (outliers have little effect on it), so it represents household
income better than the mean, which is affected by outlier groups.
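
The robustness argument can be illustrated with a what-if experiment on the table data. The Python sketch below is hypothetical: it moves the midpoint of the top class from 202,500 to an arbitrary 402,500 and recomputes both statistics. The function names and the shifted value are assumptions made only for this illustration.

```python
from itertools import accumulate

# Class frequencies from the table above, lowest class first.
freqs = [
    4204, 4729, 6982, 7157, 7131, 6740, 6354, 5832, 5547, 5254,
    5102, 4256, 4356, 3949, 3756, 3414, 3326, 2643, 2678, 2223,
    2606, 1838, 1986, 1464, 1596, 1327, 1253, 1140, 1119, 920,
    1143, 805, 731, 575, 616, 570, 502, 364, 432, 378, 5460,
]
midpoints = [2500 + 5000 * k for k in range(len(freqs))]


def grouped_mean(mids, freqs):
    """Frequency-weighted average of the class midpoints."""
    return sum(m * f for m, f in zip(mids, freqs)) / sum(freqs)


def grouped_median(mids, freqs):
    """Midpoint of the class that contains position (n + 1) / 2."""
    position = (sum(freqs) + 1) / 2
    for m, cumulative in zip(mids, accumulate(freqs)):
        if cumulative >= position:
            return m


# Hypothetical what-if: push the top class midpoint from 202500 to 402500.
shifted = midpoints[:-1] + [402500]

for label, mids in (("original", midpoints), ("top class shifted", shifted)):
    print(label, round(grouped_mean(mids, freqs)), grouped_median(mids, freqs))
```

In this experiment the mean rises by several thousand dollars while the median estimate stays in the same class, which is the behavior part (d) describes.
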
