SEE5211 Chapter1 P2017
(SEE5211/SEE8212)
Reading material
Statistics: The Exploration and Analysis of Data (2011)
Roxy Peck, Jay L DeVore | ISBN-10: 0840058012 | ISBN-13: 9780840058010
Data Analysis in Environmental Applications
Week 1
What is statistics?
1. To be informed . . .
a) Extract information from tables, charts and graphs
b) Follow numerical arguments
c) Understand the basics of how data should be gathered, summarized,
and analyzed to draw statistical conclusions
2. To make informed judgments
3. To evaluate decisions that affect your life
Types of variables: categorical or numerical; a numerical variable is either discrete or continuous.
Identify the following variables:
discrete numerical
Classifying data sets by the number of variables: univariate (one variable), bivariate (two variables), or multivariate (more than two variables)
She plans on having two groups. On one group she will write
“Thank you” on the receipt and on the other group she will not
write “Thank you” on the receipt.
[Figure: randomized block design with 2 blocks. In each block, the experimental units are randomly assigned to Treatment A or Treatment B, the response is measured for each treatment, and the treatments are compared within the block.]
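The randomized block design in the figure can be sketched in Python as below. This is a minimal sketch, assuming the experimental units have already been grouped into blocks of similar units; the function name randomized_block_assignment and the unit labels u1 to u8 are hypothetical, chosen only for illustration.

```python
import random

def randomized_block_assignment(blocks, treatments=("A", "B"), seed=None):
    """Randomly assign the experimental units in each block to the treatments.

    blocks: dict mapping a block name to its list of experimental units.
    Returns a dict: block name -> {treatment: [units]}.
    """
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    plan = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization happens separately inside each block
        # Deal the shuffled units out in turn, so group sizes differ by at most one
        plan[block] = {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}
    return plan

# Hypothetical units: 8 units already grouped into 2 blocks of similar units
blocks = {
    "Block 1": ["u1", "u2", "u3", "u4"],
    "Block 2": ["u5", "u6", "u7", "u8"],
}
plan = randomized_block_assignment(blocks, seed=2017)
for block, groups in plan.items():
    print(block, groups)
```

Because the shuffle is done block by block, every block contributes units to both treatments, so the treatments can be compared within each block as the design requires.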
The Role of Statistics
Comparative Bar Charts
How to construct
• Constructed like bar charts, but with two (or more) groups
being compared
• MUST use relative frequencies on the vertical axis
• MUST include a key to denote the different bars
Example
A survey of students applying to college and of parents of college applicants:
In 2009, 12,715 high school students responded to the question “Ideally how
far from home would you like the college you attend to be?” Also, 3007
parents of students applying to college responded to the question “How far
from home would you like the college your child attends to be?” The data are
displayed in the frequency table below.
Frequency
Ideal distance            Students   Parents
Less than 250 miles           4450      1594
250 to 500 miles              3942       902
500 to 1000 miles             2416       331
More than 1000 miles          1907       180

Relative frequency
Ideal distance            Students   Parents
Less than 250 miles           0.35      0.53
250 to 500 miles              0.31      0.30
500 to 1000 miles             0.19      0.11
More than 1000 miles          0.15      0.06
[Figure: chart of the relative frequencies of each ideal-distance category for students and parents; vertical axis: relative frequency, 0 to 1.0]
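A short Python sketch of the relative-frequency step for a comparative bar chart, using the survey counts from the frequency table above. Dividing by each group's own total is what makes the students (12,715) and parents (3007) comparable on one vertical axis. The names categories, counts and relative_frequencies are illustrative, not from the original slides.

```python
# Ideal-distance survey counts from the frequency table
categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
counts = {
    "Students": [4450, 3942, 2416, 1907],
    "Parents": [1594, 902, 331, 180],
}

def relative_frequencies(freqs):
    """Divide each frequency by the group total."""
    total = sum(freqs)
    return [f / total for f in freqs]

rel = {group: relative_frequencies(c) for group, c in counts.items()}

print(f"{'Ideal distance':<22}{'Students':>10}{'Parents':>10}")
for i, cat in enumerate(categories):
    print(f"{cat:<22}{rel['Students'][i]:>10.2f}{rel['Parents'][i]:>10.2f}")
```

Rounded to two decimals, these values reproduce the relative frequency table.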
Segmented Bar Charts
How to construct
• MUST first calculate relative frequencies
• Draw a bar representing 100% of the group
• Divide the bar into segments corresponding to the relative
frequencies of the categories
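The segments can be located by accumulating relative frequencies: each segment runs from the cumulative relative frequency of the categories before it up to the cumulative value that includes it. A minimal sketch, reusing the college-distance survey counts as assumed example data; segment_boundaries is a hypothetical helper name.

```python
from itertools import accumulate

def segment_boundaries(counts):
    """Return (bottom, top) of each segment of a bar representing 100% of a group."""
    total = sum(counts)
    # Cumulative relative frequencies give the top of each segment
    tops = [c / total for c in accumulate(counts)]
    bottoms = [0.0] + tops[:-1]
    return list(zip(bottoms, tops))

categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
counts = {
    "Students": [4450, 3942, 2416, 1907],
    "Parents": [1594, 902, 331, 180],
}

for group, c in counts.items():
    print(group)
    for cat, (bottom, top) in zip(categories, segment_boundaries(c)):
        print(f"  {cat:<22}{bottom:.2f} to {top:.2f}")
```

Computing the tops from cumulative counts, rather than adding rounded proportions, guarantees that each bar ends exactly at 1.0 (100%).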
Pie (Circle) Chart
How to construct
• Draw a circle to represent the entire data set
• Calculate the size of each “slice”:
Relative frequency × 360°
• Using a protractor, mark off each slice
To describe
– comment on which categories had the largest and smallest
proportions
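The slice-size rule (relative frequency × 360°) can be sketched as follows. Because the résumé-typo table is not reproduced here, the students' ideal-distance counts from the earlier survey are used as stand-in data; pie_slices is a hypothetical helper that also returns where each slice starts and ends, which is what you would mark off with a protractor.

```python
from itertools import accumulate

def pie_slices(counts):
    """Return (start, end) angles in degrees for each slice.

    Each slice spans relative frequency x 360 degrees; the slices are laid
    out one after another starting at 0 degrees.
    """
    total = sum(counts)
    ends = [360 * c / total for c in accumulate(counts)]
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))

# Stand-in data: students' ideal-distance counts from the earlier survey example
categories = ["Less than 250 miles", "250 to 500 miles",
              "500 to 1000 miles", "More than 1000 miles"]
slices = pie_slices([4450, 3942, 2416, 1907])
for cat, (start, end) in zip(categories, slices):
    print(f"{cat:<22}{end - start:6.1f} deg  (mark from {start:.1f} to {end:.1f})")
```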
Example
Typos on a résumé do not make a very good impression when
applying for a job. Senior executives were asked how many typos
in a résumé would make them not consider a job candidate. The
resulting data are summarized in the table below.
Example: 2,3,5,6,7,8,8,13,15,17,17,17,17,19,22,33
Mean = 209/16 ≈ 13.1; Median = (13 + 15)/2 = 14; Mode = 17
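The summary values for this example can be checked with Python's standard statistics module. A minimal sketch; note that the sixteen listed values sum to 209, so the mean works out to 209/16 ≈ 13.1.

```python
import statistics

data = [2, 3, 5, 6, 7, 8, 8, 13, 15, 17, 17, 17, 17, 19, 22, 33]

total = sum(data)                 # 209
mean = statistics.mean(data)      # 209 / 16
median = statistics.median(data)  # even n: average of the 8th and 9th values
mode = statistics.mode(data)      # most frequent value
print(f"sum = {total}, mean = {mean:.4f}, median = {median}, mode = {mode}")
```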
Numerical / Univariate Graph: Spread
What strikes you as the most distinctive difference among the
distributions of scores in classes D, E, & F?
2. Spread
• discuss how spread out the data are
3. Shape
• If the distribution is skewed, the direction is either positive (right) or negative (left).
4. Unusual occurrences
• Outlier: a value that lies far away from the rest of the data
• Gaps
• Clusters
Stem-and-Leaf Displays
How to construct
• Select one or more of the leading digits for the
stem
• List the possible stem values in a vertical column
• Record the leaf for each observation beside each
corresponding stem value
• Indicate the units for stems and leaves in a key
or legend
To describe
– comment on the center, spread, and shape of the
distribution and if there are any unusual features
The following data are prices per ounce for various brands of dandruff
shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
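A minimal Python sketch of the construction steps above, applied to the eight shampoo prices listed. It assumes the stem is the tenths digit and the leaf is the hundredths digit (so 2 | 1 represents 0.21); stem_and_leaf and its scale parameter are hypothetical names.

```python
def stem_and_leaf(values, scale=100):
    """Return the lines of a stem-and-leaf display.

    Each value is scaled to an integer (scale=100 turns 0.21 into 21); the
    stem is every digit but the last and the leaf is the last digit.
    """
    scaled = sorted(round(v * scale) for v in values)
    leaves_by_stem = {}
    for v in scaled:
        leaves_by_stem.setdefault(v // 10, []).append(v % 10)
    lines = []
    # List every possible stem, including stems with no leaves
    for stem in range(scaled[0] // 10, scaled[-1] // 10 + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]
for line in stem_and_leaf(prices):
    print(line)
print("Key: 2 | 1 represents 0.21")
```

For these prices the sketch returns the lines 1 | 7, 2 | 1 3 8 9, 3 | 2 6, 4 | and 5 | 4; the empty stem 4 is kept so the gap before 0.54 stays visible.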
Histograms with Unequal Interval Widths
When to use
- when the data are concentrated in the middle with some extreme values
How to construct
- construct as for a histogram of continuous data, but put density
(relative frequency ÷ interval width) on the vertical axis
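Density is relative frequency divided by interval width, so that bar areas (not heights) reflect relative frequencies when intervals have unequal widths. The sketch below is illustrative only: it assumes the Albuquerque rainfall intervals (given later in this section) are regrouped into four unequal-width intervals; the regrouping and the helper density are not from the original slides.

```python
def density(rel_freq, lower, upper):
    """Bar height for a density histogram: relative frequency / interval width."""
    return rel_freq / (upper - lower)

# Illustrative regrouping (not from the original slides): the Albuquerque
# rainfall intervals merged into four intervals of unequal width
regrouped = [(4, 7, 0.241), (7, 10, 0.344), (10, 11, 0.207), (11, 14, 0.207)]

for lower, upper, rf in regrouped:
    print(f"{lower} to <{upper}: relative frequency {rf:.3f}, "
          f"density {density(rf, lower, upper):.3f}")
```

The 10 to <11 and 11 to <14 intervals have the same relative frequency (0.207), but on the density scale the narrow 10 to <11 bar is three times taller, which is why density rather than relative frequency goes on the vertical axis.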
Cumulative Relative Frequency Plots
When to use
- to answer questions about percentiles (a value with a given percentage of
observations at or below that value)
How to construct
- Mark the boundaries of the intervals on the horizontal axis
- Draw a vertical scale and mark it with cumulative relative frequency
- Plot a point at the upper end of each interval at its cumulative relative
frequency, plus a starting point at the lower end of the first interval with
cumulative relative frequency 0
- Connect the points.
The National Climatic Center has been collecting weather data for many
years. The annual rainfall amounts for Albuquerque, New Mexico from
1950 to 2008 were used to create the frequency distribution below.
Annual rainfall (inches)   Relative frequency   Cumulative relative frequency
4 to <5                    0.052                0.052
5 to <6                    0.103                0.052 + 0.103 = 0.155
6 to <7                    0.086                0.155 + 0.086 = 0.241
7 to <8                    0.103
8 to <9                    0.172
9 to <10                   0.069
10 to <11                  0.207
11 to <12                  0.103
12 to <13                  0.052
13 to <14                  0.052
Continue this pattern to complete the table.
Annual rainfall (inches)   Relative frequency   Cumulative relative frequency
4 to <5                    0.052                0.052
5 to <6                    0.103                0.155
6 to <7                    0.086                0.241
7 to <8                    0.103                0.344
8 to <9                    0.172                0.516
9 to <10                   0.069                0.585
10 to <11                  0.207                0.792
11 to <12                  0.103                0.895
12 to <13                  0.052                0.947
13 to <14                  0.052                0.999
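The cumulative column is a running total of the relative-frequency column. A minimal sketch using Python's decimal module so the three-decimal sums come out exactly as in the table (floating-point addition can leave tiny rounding errors); the variable names are illustrative.

```python
from decimal import Decimal
from itertools import accumulate

intervals = ["4 to <5", "5 to <6", "6 to <7", "7 to <8", "8 to <9",
             "9 to <10", "10 to <11", "11 to <12", "12 to <13", "13 to <14"]
rel_freq = [Decimal(s) for s in ["0.052", "0.103", "0.086", "0.103", "0.172",
                                 "0.069", "0.207", "0.103", "0.052", "0.052"]]

# Running total: each cumulative value adds the current relative frequency
# to the cumulative value of the previous interval
cum_rel_freq = list(accumulate(rel_freq))

for interval, rf, crf in zip(intervals, rel_freq, cum_rel_freq):
    print(f"{interval:<10}{str(rf):>8}{str(crf):>8}")
```

The last cumulative value is 0.999 rather than 1 because each relative frequency was rounded to three decimals.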
[Figure: cumulative relative frequency plot of annual rainfall; horizontal axis: rainfall (inches), 2 to 14; vertical axis: cumulative relative frequency, 0 to 1.0]
What proportion of years had rainfall amounts that were 9.5 inches or less?
Approximately 0.55
Approximately 30% of the years had annual rainfall less than what amount?
Which rainfall amounts had a larger proportion of years: 9 to 10 inches or 10 to 11 inches? Explain.
The interval 10 to 11 inches, because its slope is steeper, indicating that a larger proportion of years had rainfall in that interval.
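Reading values off the cumulative relative frequency plot amounts to linear interpolation between the plotted points (the upper end of each interval with its cumulative relative frequency, starting from 0 at 4 inches). A minimal sketch under that assumption; the helper names proportion_at_or_below and rainfall_at_proportion are hypothetical.

```python
from bisect import bisect_left

# Plotted points: the start of the first interval (4 in., cumulative 0), then
# the upper end of each interval with its cumulative relative frequency
boundaries = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
cum = [0.0, 0.052, 0.155, 0.241, 0.344, 0.516, 0.585,
       0.792, 0.895, 0.947, 0.999]

def proportion_at_or_below(x):
    """Read the plot at rainfall x (inches) by linear interpolation."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return cum[-1]
    i = bisect_left(boundaries, x)  # boundaries[i-1] < x <= boundaries[i]
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = cum[i - 1], cum[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def rainfall_at_proportion(p):
    """Invert the plot: rainfall amount with a proportion p of years at or below it."""
    if not 0 <= p <= cum[-1]:
        raise ValueError("p must lie between 0 and the last cumulative value")
    i = bisect_left(cum, p)  # cum[i-1] < p <= cum[i]
    if i == 0:
        return float(boundaries[0])
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = cum[i - 1], cum[i]
    return x0 + (x1 - x0) * (p - y0) / (y1 - y0)

print(round(proportion_at_or_below(9.5), 2))
print(round(rainfall_at_proportion(0.30), 1))
```

With this sketch, proportion_at_or_below(9.5) gives about 0.55 (0.5505), matching the estimate read from the plot, and rainfall_at_proportion(0.30) gives about 7.6 inches, one way to answer the 30% question. The steeper-slope argument also shows up numerically: the rise from 10 to 11 inches (0.207) is larger than the rise from 9 to 10 inches (0.069).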
Scatterplots
How to construct
- Draw a horizontal scale and mark it with appropriate values of the
independent variable
- Draw a vertical scale and mark it with appropriate values of the dependent
variable
- Plot a point for each observation
To describe
- comment on the relationship between the variables
Time Series Plots
When to Use
- measurements collected over time at regular intervals
How to construct
- Draw a horizontal scale and mark it with appropriate values
of time
- Draw a vertical scale and mark it with appropriate values of the
observed variable
- Plot a point for each observation and connect consecutive points
with line segments
To describe
- comment on any trends or patterns over time
Group project
Group Project: 20% (group presentation 10% and term paper 10%; individual
participation 2%). Students will first be divided into 10-12 small groups of
4-6 students each.
Each small group will conduct a forum on a topic of its choice. Your group
will select one type of dataset (such as air pollutant concentrations, weather
data, power data, or others). Group members will work together to prepare a
15-minute presentation and a term paper (1500 words + 4 figures) on data
analysis. Each project should first introduce the environmental datasets or
historical events and discuss the types of data, focusing especially on
collecting, analyzing, and drawing conclusions from data.
Topics
Ozone (O3): higher concentrations, longer exposure and greater activity levels
cause greater effects