Professional Documents
Culture Documents
Math 100 Final Week 2
Math 100 Final Week 2
Math 100 Final Week 2
To find things out about a population of interest, it is common practice to take a sample.
A sample is a selection of objects and observations taken from the population of interest.
For example, a population might be all apples in an orchard at a given time. We wish to know how
big the apples are. We can't measure all of them, so we take a sample of some of them and
measure them.
The method chosen for taking the sample depends on the nature of the population, and the
resources available in terms of time and money.
The ideal is for each object in a population to be equally likely to be chosen as part of the sample.
This is called an unbiased sample. It is also desirable for the sample to be representative of the
population. If the population of apples were two thirds red and one-third green, the sample should
be similarly split.
Note that no matter what we do, there will always be sampling error or variation due to sampling,
as we are looking at a part of the population, not the whole population.
The video on variation covers these concepts more thoroughly. This video presents five methods of
sampling:
For each method we will outline the process and the advantages and disadvantages.
Simple random sampling is theoretically the ideal method of Sampling.
o You list each member the population and use random numbers to decide which objects
are in the sample. Each object is equally likely to be selected. This produces an unbiased
sample which we hope is representative. However it can be difficult and expensive to take
a simple random sample when dealing with people.
Convenience samples are often biased in some way. But, for a quick and cheap poll it may not
really matter.
Convenience samples can also have self-selection bias when people choose to participate,
because they have an interest in the issue in question.
With systematic sampling, you choose a starting point at random, and then systematically take
objects at a certain number apart.
For example, if there are a thousand in the population and you want a sample of fifty, you would
take every twentieth object.
Systematic samples are easier to administer than simple random samples and are usually a good
approximation of a random sample. However, if there's a pattern in the population, certain types
of objects could be chosen more or less often than others.
Cluster sampling the population is divided into clusters which are then chosen at random.
For example, departments of a business can be clusters, or suburbs within a city. Within each
cluster, all of the objects are included in the sample.
Cluster sampling can be more convenient and practical than simple random sampling. However, if
the clusters are different from each other with regard to the elements we are measuring, it can
lead to bias or non-representativeness.
Stratified sampling seems like cluster sampling, but the strata, or groups, are chosen specifically to
represent different characteristics within the population, such as ethnicity, location, age or
occupation. Within each group, a random sample is taken, sometimes in proportion to the size of
the group.
Stratified sampling can lead to a very good random representative sample. However it can be
complex to administer, and a sampling frame with considerable information about the population
is required.
There are other sampling methods. The five explained here give an idea of the advantages and
disadvantages of various methods. You should attempt to use a sampling method that produces
the best result for the resources you have available. If your sample has known bias, this should be
taken into account in analysis and reporting.
Examples of parameters and statistics are the mean and standard deviation but
because of the definitions we just talked about we have to be very careful with
what symbols we use to represent these numbers.
When we are dealing with a sample, we use the symbol x-bar to represent the
sample mean and we use the letter s to represent the sample standard
deviation, these are statistics. When we are dealing with a population, we use
the Greek letter mu to represent the population mean and we use the Greek
letter Sigma to represent the population standard deviation, these are
parameters.
The population parameters mu and Sigma are very important when we talk
about normally distributed populations.
What is a normal distribution? Normal distribution is a special type of density
curve that is bell-shaped.
For this reason the normal distribution is sometimes called the bell curve or the
normal curve the normal distribution describes the tendency for data to cluster
around a central value.
In fact, this central value is the population mean mu which is always located in
the middle of the curve.
So for any normal distribution we can say that some data points will fall below the
mean other data points will fall above the mean but most of the data values are
located near the mean.
This is why the normal distribution is commonly studied. For example exam scores
are known to follow a normal distribution, some people do great on exams
some, people do poorly on exams but a large majority of people score near the
average or the mean. In this example, the average exam score is 50 because it
is located in the middle of the curve.
Now that you know what a normal distribution looks like, we need to talk about
the population mean and the population standard deviation sigma. Both of
these tell us important information about how the normal distribution looks.
We all talk about the population mean mu first; the population means mu
characterizes the position of the normal distribution. If you increase the mean the
curve will follow and move towards the right and if you decrease the mean the
curve will still follow and move towards the left.
Increase the mean means go to right Decrease the means means go to left
This happens because the data will always cluster around the mean in normally
distributed populations, as a result the value of the mean determines the position
of the normal distribution.
on the other hand, the population standard deviation Sigma characterizes the
spread of the normal distribution. The larger the standard deviation, the more
spread out the distribution will be and the smaller the standard deviation the less
spread.
Will be notice that when the spread increases, the curve gets much flatter
;and when the spread decreases, the curve gets taller. The reason for this is
because the normal distribution is a density curve and the total area of any
density curve must remain equal to one or a hundred percent. So changes in the
width of the curve, must be compensated for by changes in the height of the
curve and vice versa.
Overall, here are some points about the normal distribution, the normal
distribution is unimodal this means that the distribution has a single peak.
The normal curve is symmetric about its mean so you can clearly see that the
distribution can be cut into two equal halves.
68-95-99.7 rule says is that, within one standard deviation away from the mean it
contains a total area of 0.68 or 68%. Because of this we can say that 68% of the
populations are between five and six feet tall.
And finally within three standard deviations away from the mean, it contains a
total area of 99.7%. This means that for the population we are studying, 99.7% of
the people are between 4 and 7 feet tall.
Now you might be wondering what happens if we go four standard deviations
away from the mean or five or six standard deviations away from the mean, and
to answer that you actually can a normal distribution actually never touches the
x-axis it continues on to infinity, so you can go as many standard deviations away
from the mean as you want but the area contained within these regions will be
very very small.
The 68-95-99.7 rule is a great way for approximating the areas of a normal
distribution and this works for any normal distribution no matter what shape and
size.