Descriptive Statistics
They argue long and loud and though each was partly in the
right, all were in the wrong.
What Is Statistics?
(Figure 2.1)
Data Definitions
Numerical Data
Numerical or quantitative data arise from counting or some kind
of mathematical operation.
For example,
- Number of auto insurance claims filed in
March (e.g., X = 114 claims).
- Ratio of profit to sales for last quarter
(e.g., X = 0.0447).
Continuous Data
• Cross-sectional data
– Data collected by recording a characteristic of many
subjects at the same point in time, or without regard to
differences in time.
– Subjects might include individuals, households, firms,
industries, regions, and countries.
– The survey data from the Introductory Case are an example
of cross-sectional data.
Types of Data
• Time series data
– Data collected by recording a characteristic of a subject
over several time periods.
– Data can include daily, weekly, monthly, quarterly, or
annual observations.
– This graph plots the U.S. GDP growth rate from 1980 to
2010; it is an example of time series data.
Time Series Data
We are interested in
- variation among observations or in
- relationships.
LO 1.4
Variables and Scales of Measurement
Scales of Measure
Qualitative Variables
- Nominal
- Ordinal
Quantitative Variables
- Interval
- Ratio
Levels of Measurements
Level of Measurement | Characteristics | Example
Nominal | Categories only | Eye color (blue, brown, green, hazel)
Ordinal | Rank has meaning | Bond ratings (Aaa, Aab, C, D, F, etc.)
Interval | Distance has meaning | Temperature (57° Celsius)
Ratio | Meaningful zero exists | Accounts payable ($21.7 million)
Nominal Measurement
Nominal data merely identify a category.
Nominal data are qualitative, attribute, categorical, or
classification data (e.g., Small, Medium, Large, Extra Large).
Nominal data are usually coded numerically, but the codes are
arbitrary (e.g., 36 = Small, 40 = Medium, 42 = Large, 44 = Extra
Large).
The only permissible mathematical operations are counting
(e.g., frequencies) and simple statistics such as the mode.
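As a minimal sketch of the counting operation described above, the following Python snippet tallies frequencies for numerically coded nominal data. The size codes come from the example above; the list of observations and the function name are made up for illustration.

```python
from collections import Counter

# Arbitrary numeric codes for the categories, taken from the example above.
SIZE_LABELS = {36: "Small", 40: "Medium", 42: "Large", 44: "Extra Large"}


def category_frequencies(codes):
    """Count how often each coded category occurs."""
    counts = Counter(codes)
    return {label: counts[code] for code, label in SIZE_LABELS.items()}


# Hypothetical observations, invented for illustration only.
observed_codes = [36, 40, 40, 42, 44, 40, 36, 42]
frequencies = category_frequencies(observed_codes)
print(frequencies)
```

Averaging the codes would not be meaningful here: (36 + 40) / 2 = 38 does not correspond to any category, which is why counting is the appropriate operation.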
Ordinal Measurement
Ordinal data codes can be ranked
(e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).
Interval Measurement
Data can not only be ranked, but also have meaningful intervals
between scale points
(e.g., the difference between 60°F and 70°F is the same as the
difference between 20°F and 30°F).
Since intervals between numbers represent distances,
mathematical operations can be performed (e.g., average).
The Ratio Scale
• The strongest level of measurement.
• Ratio data may be categorized and ranked with
respect to some characteristic or trait.
• Differences between interval values are equal and
meaningful.
• There is an “absolute 0” or defined starting point.
“0” does mean “the absence of …” Thus, meaningful
ratios may be obtained.
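To illustrate why a meaningful zero matters, the sketch below compares ratios on an interval scale (temperature) with ratios on a ratio scale (dollar amounts). The temperature readings and the second dollar amount are made up for illustration; the Fahrenheit-to-Celsius conversion is the standard one.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Interval scale: the zero point is arbitrary, so a ratio depends on the unit.
warm_f, cool_f = 60.0, 30.0  # hypothetical readings
ratio_in_f = warm_f / cool_f
ratio_in_c = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)

# Ratio scale: zero means "none", so a ratio survives a change of unit.
payable_a, payable_b = 21.7, 10.85  # millions of dollars; second value is hypothetical
ratio_in_millions = payable_a / payable_b
ratio_in_dollars = (payable_a * 1e6) / (payable_b * 1e6)

print("temperature ratio in F vs C:", ratio_in_f, ratio_in_c)
print("payables ratio in $M vs $:", ratio_in_millions, ratio_in_dollars)
```

The temperature ratio changes (and even changes sign) when the unit changes, so "twice as hot" is not a meaningful statement. The payables ratio is the same in either unit.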
Overview of Statistics
Statistics
LO 1.2
Population and Sample
• Population
– Consists of all items of interest.
• Sample
– A subset of the population.
• A sample statistic is calculated from the sample data
and is used to make inferences about the population
parameter.
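The distinction between a population parameter and a sample statistic can be sketched as follows. The population here is synthetic (generated with a fixed seed), and the sizes 1000 and 50 are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 1000 values (synthetic, for illustration only).
population = [random.gauss(50, 10) for _ in range(1000)]

# Sample of n = 50 items drawn without replacement from the population.
sample = random.sample(population, 50)

population_mean = statistics.mean(population)  # parameter: uses all N items
sample_mean = statistics.mean(sample)          # statistic: uses only the n sampled items

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The sample mean will generally be close to, but not equal to, the population mean; that gap is what inference has to account for.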
The Need for Sampling
Here, N/n ≥ 20, where N is the population size and n is the
sample size.
Descriptive Statistics
Numerical Description
Central Tendency
Dispersion
Numerical Description
Statistics are descriptive measures derived from a
sample (n items).
Parameters are descriptive measures derived from a
population (N items).
Central Tendency
Mean
• A familiar measure of central tendency.
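The mean is the sum of the observations divided by their number. Assuming the usual notation (x̄ for the mean of a sample of n items, μ for the mean of a population of N items), it can be written as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```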
Measures of Variation
Statistic | Formula | Excel | Pro | Con
Mean absolute deviation (MAD) | $\frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$ | =AVEDEV(Data) | Easy to understand. | Lacks “nice” theoretical properties.
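A minimal Python sketch of the mean absolute deviation follows. It is intended to mirror what =AVEDEV(Data) computes, namely the average absolute distance of each observation from the mean; the function name and the data values are made up for illustration.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of each observation from the mean."""
    n = len(data)
    if n == 0:
        raise ValueError("data must contain at least one value")
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


# Hypothetical data, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
mad = mean_absolute_deviation(values)
print(mad)  # 1.5 (mean is 5; absolute deviations sum to 12; 12 / 8 = 1.5)
```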
Dispersion
Variance
• The population variance ($\sigma^2$) is defined as the sum of squared
deviations around the mean divided by the population size:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
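The verbal definition above (sum of squared deviations around the mean, divided by the population size) translates directly into code. This is a sketch with a hypothetical function name and made-up data, treating the list as the entire population.

```python
def population_variance(data):
    """Sum of squared deviations around the mean, divided by the population size N."""
    n_items = len(data)
    if n_items == 0:
        raise ValueError("data must contain at least one value")
    mu = sum(data) / n_items
    return sum((x - mu) ** 2 for x in data) / n_items


# Hypothetical population, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(values)
print(variance)  # 4.0 (mean is 5; squared deviations sum to 32; 32 / 8 = 4.0)
```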
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
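The two standard deviations differ only in the divisor: under the usual definitions, the population form divides the sum of squared deviations by N, while the sample form divides by n − 1. The sketch below compares both using Python's standard `statistics` module and a by-hand calculation; the data values are made up for illustration.

```python
import math
import statistics

# Hypothetical data, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(values)  # treats the data as a population (divides by N)
sample_sd = statistics.stdev(values)       # treats the data as a sample (divides by n - 1)

# The same quantities computed directly from the sum of squared deviations.
n = len(values)
mean = sum(values) / n
sum_squares = sum((x - mean) ** 2 for x in values)
population_sd_by_hand = math.sqrt(sum_squares / n)
sample_sd_by_hand = math.sqrt(sum_squares / (n - 1))

print(population_sd, population_sd_by_hand)
print(sample_sd, sample_sd_by_hand)
```

For the same data the sample standard deviation is always at least as large as the population standard deviation, because its divisor is smaller.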
Descriptive Statistics
Standardized Data
Percentiles, Quartiles and Box Plots
Standardized Data
Chebyshev’s Theorem
• Developed by mathematicians Jules Bienaymé
(1796-1878) and Pafnuty Chebyshev (1821-1894).
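The theorem itself is not stated above. In its usual form, Chebyshev's theorem says that for any data set and any k > 1, at least 1 − 1/k² of the observations lie within k standard deviations of the mean. The sketch below checks that bound on made-up data; the function names are illustrative, and the population standard deviation is assumed.

```python
import statistics


def chebyshev_bound(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def fraction_within(data, k):
    """Observed fraction of data within k population standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)


# Hypothetical data with one outlier, invented for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9, 30]
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(values, k))
```

The bound is deliberately conservative: it holds for any distribution, so the observed fraction is usually well above it.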
Quartiles divide the sorted data into four groups of approximately
equal size. The three values that separate the four groups are
called Q1, Q2, and Q3, respectively.
Percentiles, Quartiles and Box Plots
Quartiles
The first quartile Q1 is the median of the data values
below Q2, and the third quartile Q3 is the median of the
data values above Q2.
Q1 Q2 Q3
Lower 25% | Second 25% | Third 25% | Upper 25%
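The median-of-halves definition above can be sketched as follows. This version splits the sorted data by position, leaving out the middle observation when n is odd, which is one common reading of "values below Q2" and "values above Q2"; the function name and data are made up for illustration.

```python
import statistics


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3), with Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q2 = statistics.median(ordered)
    lower_half = ordered[: n // 2]        # positions below the median
    upper_half = ordered[(n + 1) // 2:]   # positions above the median
    return statistics.median(lower_half), q2, statistics.median(upper_half)


# Hypothetical data, invented for illustration only.
values = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
print(quartiles_by_halves(values))  # (36, 40.5, 43)
```

Other conventions, such as interpolation-based methods used by spreadsheet functions or by `statistics.quantiles`, can give slightly different quartile values for the same data.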
Correlation Coefficient
The sample correlation coefficient is a statistic that describes
the degree of linearity between paired observations on two
quantitative variables X and Y.
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Its range is -1 ≤ r ≤ +1.
Excel’s formula =CORREL(Xdata, Ydata)
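For readers working outside Excel, the sample correlation coefficient can be sketched in plain Python as the sum of cross-products of deviations divided by the product of the square roots of the two sums of squared deviations. The function name and the paired data are made up for illustration.

```python
import math


def sample_correlation(x, y):
    """Sample correlation coefficient r for paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least two pairs")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    sum_yy = sum((yi - y_bar) ** 2 for yi in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sum_xy / (math.sqrt(sum_xx) * math.sqrt(sum_yy))


# Hypothetical paired data, invented for illustration only.
x_data = [1, 2, 3, 4, 5]
y_data = [2, 4, 5, 4, 5]
r = sample_correlation(x_data, y_data)
print(round(r, 4))  # 0.7746
```

A value near +1 or −1 indicates a nearly perfect linear relationship; a value near 0 indicates little linear association.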
Correlation
Illustration of Correlation Coefficients