Intro To Statistics CH 1
Statistics
Semester 2, 2012-13
Course Notes
Party code               1     2     3     4
Number of supporters   366   344   212    78
(ii) Take say 50 such manufactured items from the production line and
determine the lifetimes (in hours) of each of them. A sample of data
representing this scenario has been simulated and resulted in the following frequency table where each of the listed intervals is open on the
left and closed on the right:
Intervals            Frequencies   Percents
323.75 to 326.25           1            2
326.25 to 328.75           0            0
328.75 to 331.25           9           18
331.25 to 333.75          12           24
333.75 to 336.25          11           22
336.25 to 338.75          10           20
338.75 to 341.25           5           10
341.25 to 343.75           1            2
343.75 to 346.25           1            2
Totals                    50          100
(iii) Obtain the gross income details (in units of a thousand pounds) of 500
adult males in Manchester who are working full-time. Such a sample
has been simulated and the data is summarized in the following table.
Again, each interval is open on the left and closed on the right.
Intervals      Frequencies   Percents
5 to 15             83          16.6
15 to 25           142          28.4
25 to 35            90          18.0
35 to 45            79          15.8
45 to 55            46           9.2
55 to 65            28           5.6
65 to 75            13           2.6
75 to 85             6           1.2
85 to 95             4           0.8
95 to 105            3           0.6
105 to 115           0           0.0
115 to 125           2           0.4
125 to 135           0           0.0
135 to 145           0           0.0
145 to 155           1           0.2
155 to 165           0           0.0
165 to 175           1           0.2
175 to 185           1           0.2
185 to 195           1           0.2
Totals             500         100.0
(iv) Repeat the laboratory experiment say 15 times, under the same conditions, and record the measurement of interest each time.
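The frequency tables in (ii) and (iii) use intervals that are open on the left and closed on the right. The following is a minimal Python sketch of that binning rule, not the procedure used to produce the tables in these notes. The function name `frequency_table` and the six lifetimes are hypothetical and chosen only for illustration; the interval edges match those of table (ii).

```python
from bisect import bisect_left

def frequency_table(values, edges):
    """Count values in the intervals (edges[k], edges[k+1]]: open left, closed right."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # bisect_left gives the first index i with edges[i] >= v,
        # so v lies in (edges[i-1], edges[i]].
        i = bisect_left(edges, v)
        if 1 <= i <= len(counts):  # ignore values outside the table's range
            counts[i - 1] += 1
    return counts

# Hypothetical lifetimes in hours (NOT the simulated data from the notes).
lifetimes = [326.25, 329.0, 331.25, 331.3, 335.0, 340.1]
edges = [323.75 + 2.5 * k for k in range(10)]  # 323.75, 326.25, ..., 346.25

counts = frequency_table(lifetimes, edges)
percents = [100 * c / len(lifetimes) for c in counts]
```

A value equal to an upper edge, such as 326.25, is counted in the interval that ends there, while a value equal to the lowest edge, 323.75, falls outside the table.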
As we can see by the above examples, the nature of the data collected
can vary. In general, we have either qualitative or quantitative variables.
Qualitative variables are either nominal (such as the sex of a person or
the political party they support) or ordinal (such as the variable size with
categories small, medium and large). Quantitative variables are either
discrete (based on counting, for example) or continuous like the variables
income and lifetime in the above examples.
Sampling techniques used are probabilistic in nature in that members of
the population will be included in the sample with a certain probability so
that the actual composition of the final sample is random. If these samples
are representative of the populations from which they were drawn then the
information determined from them should enable us to say something about
\[ \prod_{i=1}^{n} F_X(x_i) \]
for all possible $\{x_1, \ldots, x_n\} \in R_{(X_1,\ldots,X_n)}$, where $F_X$ denotes the common
cdf of each $X_i$.
In the discrete case, the joint pmf of $X_1, \ldots, X_n$ under independence is
\[ p_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} p_X(x_i) \]
for all possible $\{x_1, \ldots, x_n\} \in R_{(X_1,\ldots,X_n)}$, where $p_X$ denotes the common
pmf of each $X_i$.
We have an analogous result in the continuous case.
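As an illustration of the discrete case, the sketch below evaluates a joint pmf as a product of common marginal pmfs. The pmf `p_X` on the values 0, 1, 2 and the helper name `joint_pmf` are assumptions made up for this example; they do not come from the notes.

```python
from fractions import Fraction
from math import prod

# Hypothetical common pmf p_X on the values 0, 1, 2 (illustration only).
p_X = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

def joint_pmf(xs, pmf):
    """P(X_1 = x_1, ..., X_n = x_n) for independent X_i sharing the pmf `pmf`."""
    # Values outside the support have probability zero.
    return prod(pmf.get(x, Fraction(0)) for x in xs)

print(joint_pmf((0, 1, 1), p_X))  # 1/2 * 1/3 * 1/3 = 1/18
```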
Suppose that the population is finite of size N and that we wish to obtain
a random sample of size n. Random sampling with replacement involves
at each draw selecting an item and then returning it to the population. Hence,
at any of the n draws all N members of the population have an equal chance
of being selected, no matter how often they have already been selected and
there are $N^n$ possible samples. In this scenario the $X_i$'s are independent
and identically distributed.
Random sampling without replacement
requires that each of the $\binom{N}{n}$ possible samples has an equal probability of
being selected. Such a sample is usually drawn one-at-a-time and at each
draw each item left in the population has an equal chance of being selected.
The $X_i$'s in this case are actually dependent but the level of dependence is
very small provided $N$ is large. In fact, when $N \gg n$ they can be regarded
as being essentially independent.
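The two sampling schemes can be sketched in Python with the standard `random` module. The population of labels 1 to 10 and the sample size 3 are hypothetical choices for illustration, and the seed is fixed only so that the run is repeatable.

```python
import random
from math import comb

population = list(range(1, 11))  # hypothetical population labelled 1..N with N = 10
N, n = len(population), 3

n_ordered_with_replacement = N ** n    # N^n possible (ordered) samples
n_without_replacement = comb(N, n)     # "N choose n" possible samples

rng = random.Random(0)                 # seeded only for repeatability
sample_wr = rng.choices(population, k=n)   # with replacement: repeats are allowed
sample_wor = rng.sample(population, n)     # without replacement: all items distinct
```

Here `choices` draws each item independently from the full population, whereas `sample` removes each selected item from further consideration.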
For illustrative purposes, consider the very simplistic scenario where we
have a population of size N = 3 consisting of three items A, B, C taking
values 1, 2, 3, respectively. We want to select a random sample of size n = 2
from this population using random sampling with replacement and then calculate the sample mean, which is the average of the two selected items.
This means that, at each draw, each of A, B and C has an equal probability
(of 1/3) of being selected. There are thus 9 possible, equally likely samples
which are:
{A, B}, {A, C}, {B, C}, {B, A}, {C, A}, {C, B}, {A, A}, {B, B}, {C, C}
We can then define sample variables to be Xj = the measured value associated with the jth member of the sample for j = 1, 2. This leads to 9 possible
pairs of data values which, in the corresponding order as above, are:
{1, 2}, {1, 3}, {2, 3}, {2, 1}, {3, 1}, {3, 2}, {1, 1}, {2, 2}, {3, 3}
Under this scheme, A, B, and C each have probability 1/3 of being the
first member of the sample and probability 1/3 of being the second member
of the sample. Consequently, their respective values 1, 2 and 3 also each have
probability 1/3 of being the first member of the sample and of being the
second member of the sample.
The set of all possible sample means is
{1.5, 2.0, 2.5, 1.5, 2.0, 2.5, 1.0, 2.0, 3.0}
Note that the mean of the 9 possible sample means listed above is 18.0/9 =
2.0, which is the same as the population mean, calculated as $\mu = (1+2+3)/3 =
2.0$. This is an important property of random samples: the mean of the
sample means over all possible samples of size n from the population is equal to the population
mean.
The population variance is $\sigma^2 = \frac{14.0}{3} - 2.0^2 = 0.667$, whereas the variance
of the above set of means is $\frac{39.0}{9} - 2.0^2 = 0.333$, which can be shown to be equal
to $\sigma^2/n = 0.667/2 = 0.333$.
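The figures for sampling with replacement can be checked by enumerating all 9 ordered samples. The sketch below uses exact fractions so that the comparisons are not affected by rounding; the variable names are chosen for this example only.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance

values = [Fraction(v) for v in (1, 2, 3)]  # values of items A, B, C
n = 2

samples = list(product(values, repeat=n))   # all 3^2 = 9 ordered samples
sample_means = [mean(s) for s in samples]

mu = mean(values)                           # population mean
sigma2 = pvariance(values)                  # population variance (divides by N)
var_of_means = pvariance(sample_means)      # variance of the 9 sample means

print(mean(sample_means) == mu)             # True: both equal 2
print(var_of_means == sigma2 / n)           # True: both equal 1/3
```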
If we now use random sampling without replacement to select samples of
size two then the set of possible samples is
{1, 2}, {1, 3}, {2, 3}
with corresponding sample means
1.5, 2.0, 2.5.
The mean of these sample means is 6.0/3 = 2.0 which is again equal to the
population mean. The variance of these sample means is $\frac{12.5}{3} - 2.0^2 = 0.1667$,
which can be shown to be related to the population variance by the formula
\[ \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}. \]
When sampling without replacement, the variance of the sampling
distribution of sample means is smaller by the fraction $\frac{N-n}{N-1}$ than in the case
when we are sampling with replacement.
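The same check can be made for sampling without replacement by enumerating the 3 possible samples of size two. This is a minimal sketch using exact fractions; the name `fpc` for the ratio (N - n)/(N - 1) is a label introduced here, not one used in the notes.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, pvariance

values = [Fraction(v) for v in (1, 2, 3)]  # values of items A, B, C
N, n = len(values), 2

samples = list(combinations(values, n))    # the 3 samples {1,2}, {1,3}, {2,3}
sample_means = [mean(s) for s in samples]

sigma2 = pvariance(values)                 # population variance, 2/3
var_of_means = pvariance(sample_means)     # variance of the 3 sample means
fpc = Fraction(N - n, N - 1)               # (N - n) / (N - 1), here 1/2

print(var_of_means == (sigma2 / n) * fpc)  # True: both equal 1/6
```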
We can select a random sample from any size population by making use of
random numbers. A sequence of random numbers is a collection of digits
(0–9) such that each one is equally likely to occur in any one position,
independently of all the others. To select a random sample:
(i) Number the individual members of the target population (if finite).
Sample by time and/or space if this is not possible.
(ii) Obtain a random number, for example from a published table of random numbers or by using a routine programmed on a computer and
then identify the corresponding member of the population.