
MATH 14

Obtaining and Organizing Data


Lecture Outline

• Data

• Obtaining Data: Techniques and Methods

• Graphical Organization & Summarization of Data

• Measures of Location (Mean, Median and Mode)

• Measures of Variability
Data

• Statistics is a tool for converting data into information


Data

sheffield.ac.uk
Obtaining Data:
Techniques and Methods

Methods of Collecting Data

• Direct Observations
• Experiments
• Surveys

Sampling
• the process in which a predetermined number of observations is taken from a
larger population

psu.edu

Sampling Methods

• Simple Random Sampling (SRS)
• Stratified Random Sampling
• Cluster Sampling

Simple Random Sampling


• every possible sample of the same size is equally likely to be chosen

Stratified Random Sampling


• separates the population into non-overlapping groups (strata), then takes a
simple random sample from each stratum

Cluster Sampling
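• divides the population into groups (clusters), then takes a simple random
sample of the groups; every member of each selected group is observed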
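
The three sampling methods above can be sketched in Python. This is only an illustration under stated assumptions: the population of 60 numbered students, the three groups "A", "B" and "C", and the function names are all hypothetical and do not come from the lecture.

```python
import random


def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely to be chosen."""
    return rng.sample(population, n)


def stratified_random_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole groups and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(14)  # fixed seed so the run is repeatable

# Hypothetical population: 60 students numbered 1 to 60, in three groups of 20.
population = list(range(1, 61))
groups = {"A": population[:20], "B": population[20:40], "C": population[40:]}

srs = simple_random_sample(population, 6, rng)
stratified = stratified_random_sample(groups, 2, rng)
clustered = cluster_sample(groups, 1, rng)

print("SRS:       ", srs)
print("Stratified:", stratified)
print("Cluster:   ", clustered)
```

Note that the stratified sample always contains members of every group, while the cluster sample contains all members of only the selected group.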

Sampling Methods (Comparison)

colby.edu

Common Errors in Data Acquisition

• incorrect measurements being taken because of faulty equipment

• mistakes made during transcription from primary sources

• inaccurate recording of data due to misinterpretation of terms

• inaccurate responses to questions concerning sensitive issues


Graphical Organization & Summarization of Data

Frequency Table/Distribution - a systematic arrangement of values grouped
into class intervals. Frequency tables are used to summarize data so that the
frequency of each interval is clearly displayed and the relative frequency of
each interval can be easily computed.

Class Interval - a range of numbers defined by the lowest and highest
values in the class.

Frequency - the number of times a particular value or phenomenon occurs.

Midpoint - the average of the lower and upper limits of a class.

Relative Frequency - the proportion of all given values that fall within the
interval. Usually expressed as a percentage.

Cumulative Frequency - the sum of the frequency for that class and all the
previous classes.

Constructing a Frequency Distribution


• The following data represent the ages of 30 students in a statistics class. Construct a
frequency distribution that has five classes.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21

1. The number of classes (5) is stated in the problem.


2. The minimum data entry is 18 and the maximum entry is 54, so the range is
54 - 18 = 36. Divide the range by the number of classes to find the class width:

Class width = 36 / 5 = 7.2, rounded up to 8

3. The minimum data entry of 18 may be used for the lower limit of the first class. To
find the lower class limits of the remaining classes, add the width (8) to each lower
limit.
The lower class limits are 18, 26, 34, 42, and 50
The upper class limits are 25, 33, 41, 49, and 57

4. Make a tally mark for each data entry in the appropriate class.
5. The number of tally marks for a class is the frequency for that class.
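
Steps 1 to 5 can be checked with a short Python sketch. The variable names are illustrative, and the rule `upper limit = lower limit + width - 1` assumes whole-number data such as these ages.

```python
import math

ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 37, 38, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]
num_classes = 5                                    # step 1: given in the problem

# Step 2: range and class width (36 / 5 = 7.2, rounded up to 8).
data_range = max(ages) - min(ages)
class_width = math.ceil(data_range / num_classes)

# Step 3: class limits, starting from the minimum data entry.
lower_limits = [min(ages) + i * class_width for i in range(num_classes)]
upper_limits = [low + class_width - 1 for low in lower_limits]

# Steps 4 and 5: count the entries that fall in each class.
frequencies = [sum(low <= age <= high for age in ages)
               for low, high in zip(lower_limits, upper_limits)]

for low, high, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{low} - {high}: {f}")
```

The counts should add up to the number of data entries (30), which is a quick check that no entry was missed or counted twice.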


Age of Students
Class      Tally              Frequency, f
18 - 25    ||||| ||||| |||    13
26 - 33    ||||| |||          8
34 - 41    ||||               4
42 - 49    |||                3
50 - 57    ||                 2


Age of Students
Class      Frequency, f    Midpoint
18 - 25    13              (18 + 25) / 2 = 21.5
26 - 33    8               (26 + 33) / 2 = 29.5
34 - 41    4               (34 + 41) / 2 = 37.5
42 - 49    3               (42 + 49) / 2 = 45.5
50 - 57    2               (50 + 57) / 2 = 53.5


Age of Students
Class      Frequency, f    Relative Frequency
18 - 25    13              (13/30) x 100% = 43.3%
26 - 33    8               (8/30) x 100% = 26.7%
34 - 41    4               (4/30) x 100% = 13.3%
42 - 49    3               (3/30) x 100% = 10.0%
50 - 57    2               (2/30) x 100% = 6.7%
Total      30              100%


Age of Students
Class      Frequency, f    Cumulative Frequency
18 - 25    13              13
26 - 33    8               13 + 8 = 21
34 - 41    4               21 + 4 = 25
42 - 49    3               25 + 3 = 28
50 - 57    2               28 + 2 = 30

The last cumulative frequency (30) should equal the total number of data entries (30).
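
The midpoint, relative frequency and cumulative frequency columns can be reproduced with a few lines of Python. This is a minimal sketch that takes the class limits and frequencies from the tables above as given; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the Age of Students tables.
classes = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]
frequencies = [13, 8, 4, 3, 2]
n = sum(frequencies)                       # total number of data entries

# Midpoint: average of the lower and upper limits of each class.
midpoints = [(low + high) / 2 for low, high in classes]

# Relative frequency: share of all entries in each class, in percent.
relative = [100 * f / n for f in frequencies]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

for (low, high), f, m, r, c in zip(classes, frequencies, midpoints,
                                   relative, cumulative):
    print(f"{low} - {high}  f={f:2d}  midpoint={m}  relative={r:.1f}%  cumulative={c}")
```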

Bar Graph
• a graphical representation of a frequency table for qualitative data.
• On one axis of the graph, the frequencies or the relative frequencies are represented.
• The various classes of data are labeled on the other axis.

Histogram
• a graphical representation of a frequency table; it displays quantitative data.
• The class intervals are marked off on the horizontal axis; frequencies or relative
frequencies are marked off on the vertical axis.

Frequency Polygon
• the shape obtained by connecting, with straight line segments, the points
plotted above the midpoints of adjacent class intervals of a histogram.
• Presenting data in graphical form makes patterns easier to see than in a table.
• Frequency polygons give an idea of the shape of the data and the trends that a
particular data set follows.
• This can be very useful in comparing different sets of data by superimposing one
on the other.

Line graph

• a graphical display of information that changes continuously over time.
• allows a visual comparison of how two variables, shown on the x- and y-axes,
are related or vary with each other.

Curve Smoothing - the process of smoothing the corners of a frequency
polygon so that we obtain a smooth curve, suggesting the basic shape of the
distribution of numbers.

The aim of smoothing is to give a general idea of relatively slow changes
of value, with little attention paid to the close matching of data values,
while curve fitting concentrates on achieving as close a match as possible.

Scatter Plot
• a graphic display of data points in a two-dimensional plane.
• Each data point represents a single unit of observation on which two
measurements, X and Y, have been made.
• The values of each of the measurements are scaled on the X and Y axes,
respectively.
• Each data point is located in the plane at the intersection of its associated X and
Y values.
Measures of Location (Mean, Median and Mode)

Measure of location - a number that represents the central or most
representative measurement in a set.

Mean ($\bar{x}$) - the arithmetic average of a set of measurements: the sum of
the measurements divided by the number of measurements.



Median (Md) - the middle number in an ordered set of measurements.

Mode (Mo) - the number that occurs most frequently in a set of measurements.
It is possible for a set of measurements to have more than one mode.

Outlier - a value that is very different from the other values in the data set.
Outliers can distort results, especially the mean.
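
As a rough illustration, the three measures can be computed with Python's standard `statistics` module. The two data sets below are the small examples that appear elsewhere in this lecture; pairing them with these definitions is an assumption made here for illustration.

```python
import statistics

data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5]
with_outliers = [1, 1, 2, 2, 2, 3, 3, 4, 4.5, 5, 5, 5.3, 105, 205]

# Mean: sum of the measurements divided by how many there are.
mean = statistics.mean(data)

# Median: middle value of the ordered data (average of the two middle
# values when the number of measurements is even).
median = statistics.median(data)

# Mode: most frequent value; multimode returns a list because a data
# set can have more than one mode.
modes = statistics.multimode(data)

print("mean:", mean, "median:", median, "mode(s):", modes)

# The two outliers (105 and 205) pull the mean far more than the median.
print("mean with outliers:  ", round(statistics.mean(with_outliers), 2))
print("median with outliers:", statistics.median(with_outliers))
```

Comparing the two runs shows why the median is often preferred when a data set contains outliers.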
Measures of Variability

1, 1, 2, 2, 2, 3, 3, 4, 5, 5

Data set with outliers (105 and 205): 1, 1, 2, 2, 2, 3, 3, 4, 4.5, 5, 5, 5.3, 105, 205



Measure of variability - a single number that represents the spread or amount
of dispersion in a set of data.

Range - measures the total spread of a set of data and is computed from only
two numbers.

Range = largest measurement - smallest measurement


Example: for the data 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, the range is 5 - 1 = 4.

Variance
• Variance is a numerical value that describes the variability of observations around
their arithmetic mean.
• Variance measures how far the outcomes vary from the mean.
• The population variance equals the average of the squared deviations from the
population mean.
• A deviation is the distance from any single measurement of a set to the mean of that set.
• It indicates how spread out the individuals or observations in a group are.
• Statisticians use variance to see how individual numbers relate to each other
within a data set, rather than using broader mathematical techniques such as
arranging numbers into quartiles.
• The advantage of variance is that, because deviations are squared, it treats all
deviations from the mean the same regardless of their direction.

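Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$,
where $\mu$ is the population mean and $N$ is the population size.

Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$,
where $\bar{x}$ is the sample mean and $n$ is the sample size.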

Standard Deviation, Sd
• the square root of the variance.
• This measurement is very useful for describing the spread or dispersion of
a set of data around the mean.
• Measures how far a typical observation lies from the expected value (the mean).
• Indicates how much the observations or individuals of a data set differ
from the mean.

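Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$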



Height of dogs

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
Mean (average) height = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394 mm.


Calculate each dog's difference from the Mean (394 mm):
206 mm, 76 mm, -224 mm, 36 mm and -94 mm.


Variance (treating the five dogs as the whole population)
= (206² + 76² + (-224)² + 36² + (-94)²) / 5 = 108520 / 5 = 21704 mm²
Standard Deviation = square root of the variance = √21704 ≈ 147 mm
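Now we can show which heights are within one Standard Deviation (147 mm) of the
Mean (394 mm), that is, between 247 mm and 541 mm: the heights 470 mm, 430 mm and
300 mm are within this band, while 600 mm and 170 mm are outside it.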

• So, using the Standard Deviation we have a "standard" way of knowing what is
normal, and what is extra large or extra small.
• Rottweilers are tall dogs. And Dachshunds are a bit short, right?
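
The dog-height example can be checked with Python's standard `statistics` module. This is a sketch, not part of the original slides: it treats the five dogs as a whole population (dividing by N) and also shows the sample versions (dividing by n - 1) for comparison.

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]

mean = statistics.mean(heights_mm)                  # 394
deviations = [h - mean for h in heights_mm]         # difference from the mean

# Population formulas divide by N (here 5).
pop_variance = statistics.pvariance(heights_mm)     # 21704
pop_sd = statistics.pstdev(heights_mm)              # about 147

# Sample formulas divide by n - 1 (here 4), so they come out larger.
sample_variance = statistics.variance(heights_mm)   # 27130
sample_sd = statistics.stdev(heights_mm)            # about 165

# Heights within one (population) standard deviation of the mean.
within_one_sd = [h for h in heights_mm if abs(h - mean) <= pop_sd]

print("mean:", mean)
print("deviations:", deviations)
print("population variance:", pop_variance, " sd:", round(pop_sd))
print("sample variance:", sample_variance, " sd:", round(sample_sd))
print("within one sd of the mean:", within_one_sd)
```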

Coefficient of Variation (CV)

• This indicates the degree of precision with which the treatments are
compared and is a good index of the reliability of the experiment.
• It expresses the experimental error as a percentage of the mean; thus, the
higher the CV value, the lower the reliability of the experiment.

• As a rule of thumb, a CV below 10% is very good, 10-20% is good, 20-30% is
acceptable, and a CV above 30% is not acceptable.

• For field experiments, a CV of 30% is tolerable; for laboratory/clinical experiments,
5% is the limit.

• The acceptable CV depends on different factors: experimental design, number and
size of replications, experimental materials, parameters, etc.

CV of population: $CV = \frac{\sigma}{\mu} \times 100\%$

CV of sample: $CV = \frac{s}{\bar{x}} \times 100\%$
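
A minimal Python sketch of the coefficient of variation, reusing the dog heights from the standard deviation example. The helper name `cv_percent` and its `population` flag are illustrative, not from the lecture.

```python
import statistics


def cv_percent(values, population=True):
    """Standard deviation expressed as a percentage of the mean."""
    mean = statistics.mean(values)
    if population:
        sd = statistics.pstdev(values)   # divides by N
    else:
        sd = statistics.stdev(values)    # divides by n - 1
    return 100 * sd / mean


heights_mm = [600, 470, 170, 430, 300]

cv_population = cv_percent(heights_mm, population=True)
cv_sample = cv_percent(heights_mm, population=False)

print(f"CV of population: {cv_population:.1f}%")
print(f"CV of sample:     {cv_sample:.1f}%")
```

Because the CV is a ratio of two quantities in the same unit, it has no unit of its own, which is what makes it usable for comparing the variability of data sets measured on different scales.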
Next Meeting…
Probability
