
Section 1.

What is Statistics?

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1.1


In today’s world…
…we are constantly being bombarded with statistics and
statistical information. For example:

Customer surveys, medical news, political polls, economic
predictions, and marketing information.

How can we make sense out of all this data?


How do we differentiate valid from flawed claims?
What is Statistics?!



What is Statistics?

“Statistics is a way to get information from data. That’s it!”
-Gerald Keller
Statistics – the science of data

It provides techniques and methods to

– Collect data
– Evaluate data
(classification, summary, organization and analysis)
– Interpret summarized results from data.



What is Statistics?

“Statistics is a way to get information from data”


Data → Statistics → Information

Data: Facts, especially numerical facts, collected together for
reference or information.

Information: Knowledge communicated concerning some particular fact.

Statistics is a tool for creating new understanding from a set of numbers.

Definitions: Oxford English Dictionary


Example
A business school student is anxious about their statistics course, since
they’ve heard the course is difficult. The professor provides last term’s
final exam marks to the student. What can be discerned from this list of
numbers?
Data → Statistics → Information

Data: List of last term’s marks.
95, 89, 70, 65, 78, 57, …

Information: New information about the statistics class.
E.g. class average, proportion of class receiving A’s, most frequent
mark, marks distribution, etc.
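As a minimal sketch of turning this list into information, the Python below summarizes only the six marks visible on the slide (the rest of the list is elided and is not guessed at). The cutoff of 80 for an A and the ten-mark bands are assumptions made purely for illustration; they are not part of the course material.

```python
from collections import Counter
from statistics import mean

# Only the six marks shown on the slide; the rest of the list is elided.
marks = [95, 89, 70, 65, 78, 57]

A_CUTOFF = 80  # assumed threshold for an A (hypothetical)

# Class average of the visible marks.
class_average = mean(marks)

# Proportion of the visible marks at or above the assumed A cutoff.
proportion_a = sum(m >= A_CUTOFF for m in marks) / len(marks)

# Marks distribution: how many marks fall in each ten-mark band.
distribution = Counter((m // 10) * 10 for m in marks)

print(f"Class average: {class_average:.1f}")
print(f"Proportion receiving A's: {proportion_a:.2f}")
for band in sorted(distribution):
    print(f"{band}-{band + 9}: {distribution[band]}")
```

With the full list of marks in place of the six shown here, the same few lines would answer the student's question about how the class performed.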
Key Statistical Concepts
Population
— a population is the group of all items of interest to
a statistics practitioner.
— frequently very large; sometimes infinite.
E.g. All registered voters in Malaysia

Sample
— A sample is a subset of data drawn from the
population.
— Potentially very large, but less than the population.
E.g. a sample of 100 registered voters to be surveyed.
Key Statistical Concepts…

Parameter
— A descriptive measure of a population.
For instance, the population mean.

Statistic
— A descriptive measure of a sample.
For instance, the sample mean.



Key Statistical Concepts…
Population → (subset) → Sample
Parameter: describes the population.
Statistic: describes the sample.

Populations have parameters,
Samples have statistics.
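The distinction can be sketched in Python. The population below is entirely hypothetical (randomly generated ages standing in for 10,000 registered voters); the point is only that the population mean is a parameter, while the mean of a 100-voter sample is a statistic.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 registered voters.
population = [random.randint(21, 90) for _ in range(10_000)]

# A sample is a subset drawn from the population.
sample = random.sample(population, 100)

parameter = mean(population)  # population mean: a parameter
statistic = mean(sample)      # sample mean: a statistic

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

In practice the parameter is usually unknown, because measuring every member of the population is impractical; the statistic is what we can actually compute.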



Descriptive Statistics
…are methods of organizing, summarizing, and presenting
data in a convenient and informative way.

Descriptive Statistics - describe collected data

“Nearly 87% of players participating in a Speed Training Program
improved their sprint times.”

“Only about 3% of players participating in a Speed Training Program
had decreased times.”



Inferential Statistics
Descriptive statistics describes the data set that is being
analyzed, but does not allow us to draw any conclusions or
make any inferences beyond that data set. Hence we need another
branch of statistics: inferential statistics.

Inferential statistics is also a set of methods, but it is used to
draw conclusions or make generalizations about
characteristics of populations based on data from a sample.

For instance, based on an exit poll of a sample of voters, taken
immediately after they have left the polling stations, it may
be concluded that more people voted for the Barisan Nasional
candidate in that parliamentary seat.
Statistical Inference…
Statistical inference is the process of making an estimate,
prediction, or decision about a population based on a sample.
Population → Sample
Parameter ← Inference ← Statistic

What can we infer about a Population’s Parameters based on a
Sample’s Statistics?
Statistical Inference…
We use statistics to make inferences about parameters.

Therefore, we can make an estimate, prediction, or decision
about a population based on sample data.

Thus, we can apply what we know about a sample to the
larger population from which it was drawn!



Statistical Inference…
Rationale:
• Large populations make investigating each member impractical
and expensive.
• Easier and cheaper to take a sample and make estimates about the
population from the sample.

However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build “measures of reliability” into the
statistical inference. Measures of reliability are statements about
the uncertainty associated with an inference, namely the confidence
level and the significance level.



Confidence & Significance Levels
The confidence level is the proportion of times that an
estimating procedure will be correct.
E.g. a confidence level of 95% means that estimates based on this
form of statistical inference will be correct 95% of the time.

When the purpose of the statistical inference is to draw a
conclusion about a population, the significance level
measures how frequently the conclusion will be wrong in the
long run.
E.g. a 5% significance level means that, in the long run, this type of
conclusion will be wrong 5% of the time.
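The "proportion of times correct" reading can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method taken from the slides: the population is assumed normal with a known mean of 70 and standard deviation of 10, and the estimating procedure is a standard 95% interval for the mean using the normal critical value 1.96. All names and numbers below are hypothetical.

```python
import math
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

MU, SIGMA = 70.0, 10.0   # assumed population mean and standard deviation
N, TRIALS = 50, 2000     # sample size and number of repeated samples
Z_95 = 1.96              # standard normal critical value for 95% confidence

# Half-width of the interval estimate when sigma is known.
half_width = Z_95 * SIGMA / math.sqrt(N)

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    # The interval estimate is "correct" when it contains the true mean.
    if x_bar - half_width <= MU <= x_bar + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"Proportion of correct interval estimates: {coverage:.3f}")
print(f"Proportion of incorrect ones:             {1 - coverage:.3f}")
```

Over many repeated samples, the proportion of correct intervals should sit close to the 95% confidence level, and the proportion of incorrect ones close to 5%.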



Confidence & Significance Levels…

If we use α (Greek letter “alpha”) to represent significance,
then our confidence level is 1 – α.

This relationship can also be stated as:

Confidence Level + Significance Level = 1



Confidence & Significance Levels…

Consider a statement from polling data you may hear about
in the news:

“This poll is considered accurate within 3.4
percentage points, 19 times out of 20.”

In this case, our confidence level is 95% (19/20 = 0.95),
while our significance level is 5%.
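The arithmetic behind the poll statement can be checked in a few lines of Python. Exact fractions are used here only to avoid floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

# "19 times out of 20" expressed as an exact fraction.
confidence_level = Fraction(19, 20)

# Significance level (alpha) = 1 - confidence level.
significance_level = 1 - confidence_level

print(f"Confidence level:   {float(confidence_level):.0%}")
print(f"Significance level: {float(significance_level):.0%}")
```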



Statistical Applications in Business…

Statistical analysis plays an important role in virtually all
aspects of business and economics.

Throughout this course, we will see applications of statistics
in accounting, economics, finance, human resources
management, marketing, and operations management.



Introduction & Re-cap…
Descriptive statistics involves arranging, summarizing, and
presenting a set of data in such a way that useful information
is produced.
Data → Statistics → Information

Its methods make use of graphical techniques and numerical
descriptive measures (such as averages) to summarize and
present the data.



Populations & Samples

Population → (subset) → Sample

The graphical and tabular methods presented here apply both to an
entire population and to a sample drawn from the population.



Definitions…
A variable is some characteristic of a population or sample.
E.g. student grades.
Typically denoted with a capital letter: X, Y, Z…

The values of the variable are the range of possible values
for a variable.
E.g. student marks (0..100)

Data are the observed values of a variable.
E.g. a sample of seven student marks: {67, 74, 71, 83, 93, 55, 48}
Types of Data & Information

Data (at least for purposes of Statistics) fall into two main
groups:

• Qualitative Data
– Data that are non-numeric and are classified into categories.

• Quantitative Data
– Data that can be expressed as numbers.



Interval Data…
Quantitative data can be measured on one of two scales:

Interval scale
• Consists of real numbers, but has no true zero point (a zero that
means none of the measured quantity is present).
For instance, temperature measured in degrees Celsius.
Zero degrees Celsius does not mean there is no temperature in a place.

Ratio scale
• Consists of real numbers with an absolute zero (zero means no
value).
For instance, the annual net profit of a business entity, or a bank
balance.

Arithmetic operations can be performed on interval and ratio data.
For instance, 2*Height, price + $1, and so on.
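The practical consequence of a missing true zero can be sketched in Python. Differences between interval-scale values survive a change of units, but ratios do not; on a ratio scale both are meaningful. The temperatures and profits below are hypothetical, and the only outside fact used is the standard Celsius-to-kelvin conversion.

```python
import math

def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return celsius + 273.15

# Interval scale: two hypothetical temperatures in degrees Celsius.
cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales, so they are meaningful.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios disagree, so "twice as hot" is not meaningful in Celsius.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

# Ratio scale: hypothetical annual net profits in dollars.
profit_small, profit_large = 50_000.0, 100_000.0
profit_ratio = profit_large / profit_small  # "twice the profit" is meaningful

print(f"Difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"Ratio:      {ratio_c:.3f} in Celsius vs {ratio_k:.3f} in kelvins")
print(f"Profit ratio: {profit_ratio:.1f}")
```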
Types of qualitative data
Ordinal Data appear to be categorical in nature, but their values have
an order, a ranking, to them:

E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
Age categories:
under 12 years old, 12 to under 20 years old, 20 years old and over.

While it is still not meaningful to do arithmetic on this data (e.g. does
2*fair = very good?), we can say things like:
excellent > poor or fair < very good

That is, order is maintained no matter what numeric values are
assigned to each category, provided the values preserve the ranking.
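A short Python sketch of this point: two different numeric codings of the same ratings give the same ranking and the same median rating. The second coding and the list of responses are made up for illustration.

```python
from statistics import median

RATING_ORDER = ["poor", "fair", "good", "very good", "excellent"]

# Coding from the slide: poor = 1 ... excellent = 5.
coding_a = {rating: i + 1 for i, rating in enumerate(RATING_ORDER)}
# A hypothetical alternative coding that keeps the same order.
coding_b = dict(zip(RATING_ORDER, [10, 20, 35, 60, 100]))

# Made-up survey responses (an odd number, so the median is one response).
responses = ["good", "poor", "excellent", "fair", "good", "very good", "good"]

def ranked(items, coding):
    """Sort responses from lowest to highest rating under a coding."""
    return sorted(items, key=coding.get)

def median_rating(items, coding):
    """Median response, found by rank and mapped back to its label."""
    label_of = {code: rating for rating, code in coding.items()}
    return label_of[median(coding[r] for r in items)]

print(ranked(responses, coding_a) == ranked(responses, coding_b))
print(median_rating(responses, coding_a), median_rating(responses, coding_b))
```

Rank-based results such as the ordering and the median do not depend on which order-preserving codes are chosen, whereas an average of the codes would.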



Nominal Data…
Nominal Data
• The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4

Because the numbers are arbitrary, arithmetic operations
don’t make any sense (e.g. does Widowed ÷ 2 = Married?!)

Nominal data are also called qualitative or categorical.
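What can sensibly be done with nominal data is counting. The sketch below tallies a made-up list of coded marital-status responses; the codes follow the slide, but the responses themselves are hypothetical.

```python
from collections import Counter

# Codes from the slide; the numbers are labels, not quantities.
CODES = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Made-up coded survey responses.
coded_responses = [2, 1, 2, 4, 1, 2, 3, 2, 1, 2]

# Count how many observations fall in each category.
counts = Counter(CODES[code] for code in coded_responses)

for status, n in counts.most_common():
    print(f"{status}: {n} ({n / len(coded_responses):.0%})")
```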



Types of Data & Information…

Data: Categorical?
  N → Interval Data
  Y → Categorical Data: Ordered?
        Y → Ordinal Data
        N → Nominal Data



E.g. Representing Student Grades…

Data: Categorical?
  N → Interval Data, e.g. {0..100}
  Y → Categorical Data: Ordered?
        Y → Ordinal Data, e.g. {F, D, C, B, A} (rank order to data)
        N → Nominal Data, e.g. {Pass | Fail} (NO rank order to data)



Calculations for Types of Data
As mentioned above,

• All calculations are permitted on interval data.

• Only calculations involving a ranking process are allowed for
ordinal data.

• No calculations are allowed for nominal data, save counting the
number of observations in each category.

This lends itself to the following “hierarchy of data”…



Hierarchy of Data…
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.

Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.

Nominal
Values are the arbitrary numbers that represent categories.
Only calculations based on the frequencies of occurrence are valid.
Data may not be treated as ordinal or interval.
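The hierarchy can be sketched in Python using the seven student marks from the Definitions slide. The grade cutoffs and the pass mark are assumptions for illustration only. Interval data are first averaged, then treated as ordinal (letter grades, summarized by a rank-based median), then as nominal (pass or fail, summarized by counts).

```python
from collections import Counter
from statistics import mean

marks = [67, 74, 71, 83, 93, 55, 48]  # seven marks from the Definitions slide

GRADE_ORDER = ["F", "D", "C", "B", "A"]                 # lowest to highest
CUTOFFS = [(80, "A"), (70, "B"), (60, "C"), (50, "D")]  # assumed cutoffs
PASS_MARK = 50                                          # assumed pass mark

def to_grade(mark):
    """Map a mark to a letter grade using the assumed cutoffs."""
    for cutoff, grade in CUTOFFS:
        if mark >= cutoff:
            return grade
    return "F"

# Interval: all calculations are valid, e.g. the average.
average_mark = mean(marks)

# Ordinal: only ranking-based calculations, e.g. the median grade.
grades = [to_grade(m) for m in marks]
ranked_grades = sorted(grades, key=GRADE_ORDER.index)
median_grade = ranked_grades[len(ranked_grades) // 2]

# Nominal: only frequencies of occurrence.
outcome_counts = Counter("Pass" if m >= PASS_MARK else "Fail" for m in marks)

print(f"Average mark: {average_mark:.1f}")
print(f"Median grade: {median_grade}")
print(f"Outcomes: {dict(outcome_counts)}")
```

The conversion only runs downhill: once the marks have been reduced to Pass or Fail, neither the letter grades nor the average can be recovered from them.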



Cross-sectional and Time Series Data
Observations measured at the same point in time (without
regard to differences in time) are called cross-sectional
data. They provide a ‘snapshot’ of the population at a specific
point in time.
For instance, a survey of a sample of registered voters taken before
the polling day of an election.

Observations measured at successive points in time are
called time-series data. These data indicate changes in a
variable over a period of time. For instance, the monthly
sales of a consumer product over the last 3 years.
Time-series data are typically graphed on a line chart, which plots
the value of the variable on the vertical axis against the
time periods on the horizontal axis.
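The two kinds of data differ mainly in how the observations are indexed, which a small Python sketch can show. All figures below are hypothetical: the time series follows one product across successive months, while the cross-section looks at several products in a single month.

```python
# Time-series data: one variable observed at successive points in time.
# Hypothetical monthly sales (units) of one consumer product.
monthly_sales = [
    ("Jan", 120), ("Feb", 135), ("Mar", 128),
    ("Apr", 150), ("May", 162), ("Jun", 158),
]

# Because the observations are ordered in time, change over time is meaningful.
changes = [
    (later[0], later[1] - earlier[1])
    for earlier, later in zip(monthly_sales, monthly_sales[1:])
]

# Cross-sectional data: several units observed at the same point in time.
# Hypothetical June sales (units) of four products.
june_sales = {"Product A": 158, "Product B": 97, "Product C": 210, "Product D": 64}
best_seller = max(june_sales, key=june_sales.get)

for month, change in changes:
    print(f"{month}: {change:+d} units vs previous month")
print(f"Best seller in June: {best_seller}")
```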
Line Chart for time series data
For example, the line graph for the total amounts of U.S.
income tax for the years 1987 to 2002…



Line Chart for time series data
From ’87 to ’92, the tax was fairly flat. Starting in ’93, there was a rapid
increase in taxes until 2001. Finally, there was a downturn in 2002.
