Lecture 1-Statistics Introduction-Defining, Displaying and Summarizing Data
What is statistics?
[Figure: a population (items a through z) and a sample drawn from it (b, c, g, i, n, o, r, u, y)]
◼ Statistics
◼ The branch of mathematics that transforms data into
useful information for decision makers.
◼ Collect data
◼ e.g., Survey
◼ Present data
◼ e.g., Tables and graphs
◼ Characterize data
◼ e.g., Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Inferential Statistics
◼ Estimation
◼ e.g.: Estimate the population mean weight using the sample mean weight
◼ Hypothesis testing
◼ e.g.: Test the claim that the population mean weight is 70 kg
Drawing conclusions and/or making decisions
concerning a population based on sample results.
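As a minimal sketch of both ideas, the Python below computes a point estimate of the population mean and the one-sample t statistic for the claim that the mean weight is 70 kg. The weights are made-up illustrative values and the function names are hypothetical. Turning the statistic into a decision requires a t table or a statistics package, which is left out here.

```python
import math
import statistics

def estimate_mean(sample):
    """Point estimate of the population mean: the sample mean."""
    return statistics.mean(sample)

def t_statistic(sample, claimed_mean):
    """One-sample t statistic: (sample mean - claimed mean) / (S / sqrt(n))."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (statistics.mean(sample) - claimed_mean) / (s / math.sqrt(n))

# Hypothetical sample of weights in kg (illustrative values only).
weights = [68.0, 72.5, 71.0, 69.5, 74.0, 70.5, 73.0, 67.5]

x_bar = estimate_mean(weights)   # estimation
t = t_statistic(weights, 70.0)   # hypothesis testing
```

A t statistic near zero means the sample mean is close to the claimed value relative to the sampling variability; a large absolute value is evidence against the claim.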
Basic Vocabulary of Statistics
VARIABLE
A variable is a characteristic of an item or individual that can change or take on
different values. Most research begins with a general question about the relationship
between two variables.
DATA
Data are the different values associated with a variable.
OPERATIONAL DEFINITIONS
Data values are meaningless unless their variables have operational definitions,
universally accepted meanings that are clear to all associated with an analysis.
Collecting Data
◼ Primary: Data Collection
  ◼ Observation
  ◼ Survey
  ◼ Experimentation
◼ Secondary: Data Compilation
  ◼ Print or Electronic
Collecting Data Correctly Is A Critical Task
Variables
◼ Categorical (defined categories)
  Examples:
  ◼ Marital Status
  ◼ Political Party
  ◼ Eye Color
◼ Numerical
  ◼ Discrete (counted items)
    Examples:
    ◼ Number of Children
    ◼ Defects per hour
  ◼ Continuous (measured characteristics)
    Examples:
    ◼ Weight
    ◼ Voltage
Examples of Types of Variables
Levels of measurement
• There are four levels of measurement: Nominal, Ordinal,
Interval and Ratio. These go from lowest level to highest
level.
• Data is classified according to the highest level which it
fits. Each additional level adds something the previous
level did not have.
– Nominal is the lowest level. Only names are
meaningful here;
– Ordinal adds an order to the names;
– Interval adds meaningful differences;
– Ratio adds a true zero so that ratios are meaningful.
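The difference between interval and ratio data can be illustrated with temperature, a standard example that is not taken from these slides: Celsius is an interval scale (its zero is arbitrary), while Kelvin is a ratio scale. The sketch below shows that differences agree on both scales, but a ratio computed on Celsius values does not.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15

a_c, b_c = 20.0, 10.0  # two temperatures in degrees Celsius

# Differences are meaningful on an interval scale and match on both scales.
difference_c = a_c - b_c
difference_k = celsius_to_kelvin(a_c) - celsius_to_kelvin(b_c)

# The Celsius ratio suggests "twice as hot", which is not meaningful.
naive_ratio = a_c / b_c
# The Kelvin ratio is meaningful because Kelvin has a true zero.
true_ratio = celsius_to_kelvin(a_c) / celsius_to_kelvin(b_c)
```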
Data Types
Levels of Measurement and Measurement Scales
(listed from the highest level down)
◼ Ratio Data (highest level, strongest form of measurement): differences between measurements, true zero exists
◼ Interval Data (higher level): differences between measurements but no true zero
◼ Ordinal Data (higher level): ordered categories (rankings, order, or scaling)
Banking Preference

Banking Preference?                  %
ATM                                  16%
Automated or live telephone          2%
Drive-through service at branch      17%
In person at branch                  41%
Internet                             24%

[Figure: charts of the banking preference percentages]
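A percentage summary like the banking preference table can be built by counting raw categorical responses. The sketch below assumes a hypothetical group of 100 respondents whose answers are constructed to reproduce the percentages above; the original response counts are not given in the slides.

```python
from collections import Counter

def percentage_table(responses):
    """Return the percentage of responses falling in each category."""
    counts = Counter(responses)
    total = len(responses)
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical raw responses, constructed to match the slide's percentages.
responses = (
    ["ATM"] * 16
    + ["Automated or live telephone"] * 2
    + ["Drive-through service at branch"] * 17
    + ["In person at branch"] * 41
    + ["Internet"] * 24
)

table = percentage_table(responses)
```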
Cross Tabulations
▪ The cell is the intersection of the row and column and the
value in the cell represents the data corresponding to that
specific pairing of row and column categories.
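As a rough sketch, a cross tabulation can be built by counting (row category, column category) pairs. The records and category names below are hypothetical, chosen only to illustrate how a cell is looked up.

```python
from collections import Counter

def cross_tabulate(records):
    """Count how many records fall in each (row, column) cell."""
    return Counter(records)

def row_total(cells, row):
    """Sum the counts of every cell in one row."""
    return sum(count for (r, _c), count in cells.items() if r == row)

# Hypothetical records: (account type, preferred banking channel).
records = [
    ("Checking", "ATM"), ("Checking", "Internet"), ("Savings", "In person"),
    ("Checking", "ATM"), ("Savings", "Internet"), ("Savings", "In person"),
]

cells = cross_tabulate(records)
checking_atm = cells[("Checking", "ATM")]  # the cell at row Checking, column ATM
```

A pairing that never occurs simply has a count of zero, since `Counter` returns 0 for missing keys.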
[Figure: frequency histogram; vertical axis: Frequency; horizontal axis classes labeled 5, 15, 25, 35, 45, 55, More]
(In a percentage histogram the vertical axis would be defined to show the percentage of observations per class.)
Histograms
A histogram shows three general types of information:
◼ It provides a visual indication of where the approximate center of the data is.
◼ We can gain an understanding of the degree of spread, or variation, in the data.
◼ We can observe the shape of the distribution.
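The counts behind a histogram come from sorting values into equal-width classes. The sketch below uses hypothetical data and an assumed class width of 10 starting at 10; it also derives the percentages that a percentage histogram would plot.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count the values in each class [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:  # values outside all classes are ignored
            counts[index] += 1
    return counts

# Hypothetical observations (illustrative values only).
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

counts = class_frequencies(data, lower=10, width=10, n_classes=5)
percentages = [100 * c / len(data) for c in counts]
```

Here the tallest class, 20 to under 30, marks the approximate center, the span of non-empty classes shows the spread, and the pattern of counts shows the shape.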
Organizing Numerical Data:
Stem and Leaf Display
▪ A stem-and-leaf display organizes data into groups (called
stems) so that the values within each group (the leaves)
branch out to the right on each row.
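A minimal sketch of building such a display, assuming non-negative integer data where the tens digit is the stem and the units digit is the leaf. The ages are hypothetical and the function names are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

def render(stems):
    """Format one row per stem, with leaves branching out to the right."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    )

ages = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]  # hypothetical ages
display = render(stem_and_leaf(ages))
# display is:
# 2 | 1 4 4 6 7 7
# 3 | 0 2 8
# 4 | 1
```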
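[Figure: stem-and-leaf display, "Age of College Students"]
[Figure: scatter plot; horizontal axis: Volume per Day (20 to 70); vertical axis label not recoverable]

Volume per Day    (unlabeled second variable)
29                146
33                160
38                167
42                170
50                188
55                195
60                200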
Visualizing Two Numerical Variables:
Time Series Plot

Year    Number of Franchises
1996    43
1997    54
1998    60
1999    73
2000    82
2001    95
2002    107
2003    99
2004    95

[Figure: time series plot, "Number of Franchises, 1996-2004"; horizontal axis: Year; vertical axis: Number of Franchises]
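What a time series plot reveals visually can also be read off numerically. Using the franchise counts from the table, the sketch below computes the year-over-year change and the peak year; the function name is illustrative.

```python
# Number of franchises by year, taken from the table in these notes.
franchises = {
    1996: 43, 1997: 54, 1998: 60, 1999: 73, 2000: 82,
    2001: 95, 2002: 107, 2003: 99, 2004: 95,
}

def year_over_year_changes(series):
    """Return the change from each year to the next, keyed by the later year."""
    years = sorted(series)
    return {
        later: series[later] - series[earlier]
        for earlier, later in zip(years, years[1:])
    }

changes = year_over_year_changes(franchises)
peak_year = max(franchises, key=franchises.get)
```

The changes are positive through 2002 and negative afterwards, which matches the rise and subsequent decline seen in the plot.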
Measures of Central Tendency
Central tendency is the extent to which all the data values group around a typical or central value.
The Mean
$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
The Median
The Mode
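The three measures can be computed with Python's standard `statistics` module. This is a small sketch on an illustrative sample of eight values: the mean is the sum divided by n, the median is the middle value of the sorted data (the average of the two middle values when n is even), and the mode is the most frequent value.

```python
import statistics

sample = [10, 12, 14, 15, 17, 18, 18, 24]  # illustrative sample, n = 8

mean = statistics.mean(sample)      # 128 / 8 = 16
median = statistics.median(sample)  # average of 15 and 17 = 16
mode = statistics.mode(sample)      # 18 occurs twice, more than any other value
```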
Measures of Dispersion
Variation is the amount of dispersion, or scattering, of values.
Variance
The Range
Interquartile Range
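A short sketch of the range and the interquartile range on an illustrative sample. Quartile conventions differ between textbooks and software; this example assumes the default "exclusive" method of `statistics.quantiles` (Python 3.8+), so other tools may report slightly different quartiles for the same data.

```python
import statistics

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR: third quartile minus first quartile (exclusive method assumed)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

sample = [10, 12, 14, 15, 17, 18, 18, 24]  # illustrative sample

r = value_range(sample)            # 24 - 10 = 14
iqr = interquartile_range(sample)  # 18 - 12.5 = 5.5 under this convention
```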
Measures of Variation:
The Variance
◼ Average (approximately) of squared deviations
of values from the mean
◼ Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
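The definition translates directly into code. The sketch below computes the sample variance step by step on an illustrative sample and checks it against `statistics.variance`, which uses the same n - 1 divisor; the function name is hypothetical.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

sample = [10, 12, 14, 15, 17, 18, 18, 24]  # illustrative sample, mean 16

s2 = sample_variance(sample)             # 130 / 7
s2_library = statistics.variance(sample)  # same definition, from the library
```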
Measures of Variation:
The Standard Deviation
◼ Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Measures of Variation:
Comparing Standard Deviations
Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
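For this sample, the standard deviation can be worked out by hand as follows (results rounded to two decimals):

```latex
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= (10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 \\
  &\quad + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2 \\
  &= 36 + 16 + 4 + 1 + 1 + 4 + 4 + 64 = 130 \\
S^2 &= \frac{130}{8 - 1} \approx 18.57 \\
S &= \sqrt{\frac{130}{7}} \approx 4.31
\end{aligned}
```

So a typical value in this sample lies roughly 4.3 units from the mean of 16.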
Personal Computer Programs
Used For Statistics
◼ Minitab
◼ A statistical package to perform statistical analysis
◼ Designed to perform analysis as accurately as possible
◼ Microsoft Excel
◼ A multi-functional data analysis tool
◼ Can perform many functions but none as well as programs that
are dedicated to a single function.