Week 2 Introduction To Statistics


Week 2

Introduction to Statistics
by
Anath Rau Krishnan, PhD
Senior Lecturer at Labuan Faculty of International Finance, Universiti Malaysia Sabah

Learning outcome
By the end of this lecture, students should be able to:
Define what statistics is.
Discuss the use of statistics in business.
Interpret the basic components of a data set.
Identify the measurement scales used to describe a variable.
Differentiate between qualitative and quantitative variables.
Explain the difference between descriptive and inferential statistics.
List the techniques for collecting data.
Choose a suitable sampling approach for data collection.

What is statistics?

Application of Statistics in Business


A few examples:
Visualizing the trend of profit earned in the past (time series plot)
Assessing the risk involved in establishing a new branch of a company (probability)
Forecasting the future of the economy, or some aspect of it, from statistical information, as economists do (regression)
Determining whether a 1 kg packet of rice produced by a machine really contains 1 kg of rice (sampling and hypothesis testing)
Finding the relationship between the cost of repairing production machines and both the number of technicians assigned to the machines and the number of hours the machines operate per day (correlation, regression)

In Class Exercise (1):


Identify the elements, number of observations, and the variables involved in the given toy data set:

Hotel    Breakfast    Room size     Distance (km)    Room temperature (Celsius)
         Yes          Medium
         No           Small
         No           Large         0.5
         Yes          Medium        0.8              -1
         Yes          Very large

Scales of Measurements
Nominal
Ordinal
Interval
Ratio

Nominal
The information on a variable is represented or described using labels, names, or categories (linguistic terms).
Numeric codes may be used to represent the data.
Example:
Students of the Labuan Faculty of International Finance are grouped by the management according to their program, namely International Finance, Banking, Marketing, and so on.
Alternatively, a numeric code could be used for the program variable (e.g. 1 denotes International Finance, 2 denotes Banking, 3 denotes Marketing, and so on).
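
The numeric coding described above can be sketched in a few lines of Python. This is only an illustration: the list of students is made up, and the codes are arbitrary labels, so counting is the only arithmetic that makes sense on them.

```python
from collections import Counter

# Program labels from the example above; the codes 1, 2, 3 are arbitrary labels.
programs = ["International Finance", "Banking", "Marketing"]
code_of = {name: code for code, name in enumerate(programs, start=1)}

# Hypothetical students, recorded by program name.
students = ["Banking", "Marketing", "Banking", "International Finance"]

# Replace each label with its numeric code.
coded = [code_of[s] for s in students]
print(coded)

# For nominal data, counting how often each category occurs is meaningful;
# averaging or ranking the codes is not.
counts = Counter(students)
print(counts)
```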

Ordinal

The information on a variable is represented or described using labels, names, or categories (linguistic terms), and the ranking of these terms is meaningful.

Numeric codes could be used.

Example:
Academic staff of a university are classified by their position using a nonnumeric label such as:
Tutor < Lecturer < Senior Lecturer < Associate Professor < Professor
Alternatively, a numeric code could be used for the position variable (e.g. 1 denotes tutor, 2 denotes lecturer, 3 denotes senior lecturer, and so on).
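
As a small sketch of why the order matters for ordinal data, the position codes can be used to compare and sort staff. The helper name `outranks` and the staff list are hypothetical, chosen only for illustration.

```python
# Positions listed from lowest to highest rank, as in the example above.
ranks = ["Tutor", "Lecturer", "Senior Lecturer", "Associate Professor", "Professor"]
rank_code = {title: code for code, title in enumerate(ranks, start=1)}

def outranks(a, b):
    """Return True if position a is ranked above position b."""
    return rank_code[a] > rank_code[b]

# Hypothetical staff list, sorted from lowest to highest position.
staff = ["Professor", "Tutor", "Senior Lecturer"]
sorted_staff = sorted(staff, key=rank_code.get)

print(outranks("Senior Lecturer", "Lecturer"))
print(sorted_staff)
```

The codes support comparisons such as "higher than", but the gap between code 1 and code 2 is not assumed to equal the gap between code 4 and code 5.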

Interval

The data have the properties of ordinal data, and the interval between two values is meaningful.
Interval data are always numeric.
Example: temperature;
the difference between 40°F and 50°F equals the difference between 70°F and 80°F.

Ratio

The data have all the properties of interval data and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time use the ratio scale.
This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.
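
The contrast between the interval and ratio scales can be checked with a little arithmetic. The sketch below uses the standard Fahrenheit-to-Celsius conversion; the distances are hypothetical values. Equal temperature differences stay equal after changing units, but the ratio of two temperatures does not, because 0°F does not mean "no temperature". A ratio of two distances keeps its value in any unit.

```python
import math

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: equal differences remain equal in either unit.
diff_low_c = f_to_c(50) - f_to_c(40)
diff_high_c = f_to_c(80) - f_to_c(70)
print(math.isclose(diff_low_c, diff_high_c))

# ...but ratios depend on the unit, so "80°F is twice as hot as 40°F" is not meaningful.
ratio_f = 80 / 40
ratio_c = f_to_c(80) / f_to_c(40)
print(ratio_f, round(ratio_c, 2))

# Ratio scale: distance has a true zero, so the ratio is the same in km and in metres.
ratio_km = 10 / 5
ratio_m = 10_000 / 5_000
print(ratio_km == ratio_m)
```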

Categorical and Quantitative Data


Data can be further classified as being categorical
or quantitative.
The statistical analysis that is appropriate depends
on whether the data for the variable are categorical
or quantitative.
In general, there are more alternatives for statistical
analysis when the data are quantitative.

Categorical Data
Labels or names used to identify an attribute of
each element
Often referred to as qualitative data
Use either the nominal or ordinal scale of
measurement
Can be either numeric or nonnumeric
Appropriate statistical analyses are rather limited

Scales of Measurement
Data
  Categorical
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio

Cross-Sectional Data
Cross-sectional data are collected at the same or
approximately the same point in time.
Example: data detailing the number of building
permits issued in November 2010 in each of the
counties of Ohio

Time Series Data


Time series data are collected over several time
periods.
Example: data detailing the number of building
permits issued in Lucas County, Ohio in each of
the last 36 months
Graphs of time series help analysts understand
what happened in the past,
identify any trends over time, and
project future levels for the time series
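
One simple way to identify a trend and project a future level is to fit a straight line through the series. The sketch below is only an illustration: the monthly permit counts are invented, and a least-squares line is just one of several possible techniques, not necessarily the one used in this course.

```python
# Hypothetical number of building permits issued in each of 12 consecutive months.
permits = [20, 22, 21, 25, 27, 26, 30, 31, 29, 34, 35, 37]

n = len(permits)
months = list(range(1, n + 1))

mean_x = sum(months) / n
mean_y = sum(permits) / n

# Least-squares slope and intercept of the trend line: permits = intercept + slope * month.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, permits)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x

# Project the level for the next month (month 13).
forecast = intercept + slope * (n + 1)

print(f"trend: {slope:.2f} permits per month")
print(f"projected permits for month {n + 1}: {forecast:.1f}")
```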

Time Series Data

Graph of Time Series Data
U.S. Average Price Per Gallon for Conventional Regular Gasoline
Source: Energy Information Administration, U.S. Department of Energy, May 2009.

Data Sources (i.e. collecting data)

Existing Sources / Secondary data
Internal company records
Business database services
Government agencies
Industry associations
Special-interest organizations
Internet

Data Sources

Data Available From Internal Company Records

Record: Some of the Data Available
Employee records: name, address, social security number
Production records: part number, quantity produced, direct labor cost, material cost
Inventory records: part number, quantity in stock, reorder level, economic order quantity
Sales records: product number, sales volume, sales volume by region
Credit records: customer name, credit limit, accounts receivable balance
Customer profile: age, gender, income, household size

Data Sources
Data Available From Selected Government Agencies

Government Agency: Some of the Data Available
Census Bureau (www.census.gov): Population data, number of households, household income
Federal Reserve Board (www.federalreserve.gov): Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget (www.whitehouse.gov/omb): Data on revenue, expenditures, debt of federal government
Department of Commerce (www.doc.gov): Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics (www.bls.gov): Consumer spending, unemployment rate, hourly earnings, safety record

Data Sources
Statistical Studies - Experimental
In experimental studies the variable of interest is
first identified. Then one or more other variables
are identified and controlled so that data can be
obtained about how they influence the variable of
interest.
The largest experimental study ever conducted is
believed to be the 1954 Public Health Service
experiment for the Salk polio vaccine. Nearly two
million U.S. children (grades 1-3) were selected.

Data Sources

Statistical Studies - Observational


In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.
Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.

Data Acquisition Considerations


Time Requirement
Searching for information can be time consuming.
Information may no longer be useful by the time it is available.

Cost of Acquisition
Organizations often charge for information even when it is not their primary business activity.

Data Errors
Using any data that happen to be available or were acquired with little care can lead to misleading information.

Descriptive statistics
Most of the statistical information in newspapers, magazines, company
reports, and other publications consists of data that are summarized and
presented in a form that is easy to understand.
Such summaries of data, which may be tabular, graphical, or numerical, are
referred to as descriptive statistics.
Descriptive statistics deals with the statistical techniques used to organize and
summarize the collected data.

Example: Hudson Auto Repair


The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Example: Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

Tabular Summary: Frequency and Percent Frequency

Example: Hudson Auto

Parts Cost ($)   Frequency   Percent Frequency
50-59                2             4    (2/50)100
60-69               13            26
70-79               16            32
80-89                7            14
90-99                7            14
100-109              5            10
Total               50           100
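
The frequency and percent frequency columns can be reproduced from the 50 parts costs. The following is a minimal sketch, assuming the class limits 50-59 through 100-109 used in the Hudson Auto example; the variable names are chosen only for illustration.

```python
# Parts cost ($) for the 50 tune-ups in the Hudson Auto example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Class limits (inclusive) for the frequency distribution.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]

n = len(costs)

# Frequency: number of costs falling inside each class.
freq = {f"{lo}-{hi}": sum(lo <= c <= hi for c in costs) for lo, hi in classes}

# Percent frequency: (frequency / n) * 100.
pct = {label: 100 * count / n for label, count in freq.items()}

for label in freq:
    print(f"{label:>8} {freq[label]:>4} {pct[label]:>6.0f}")
```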

Graphical Summary: Histogram
Example: Hudson Auto
[Histogram: Tune-up Parts Cost. Vertical axis: Frequency (ticks from 2 to 18). Horizontal axis: Parts Cost ($), with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109.]
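
A rough text version of the histogram can be drawn from the class frequencies in the Hudson Auto tabular summary. This is only a sketch; the function name `text_histogram` is hypothetical, and a plotting library would normally be used for a real chart.

```python
# Class frequencies from the Hudson Auto tabular summary.
frequencies = {
    "50-59": 2,
    "60-69": 13,
    "70-79": 16,
    "80-89": 7,
    "90-99": 7,
    "100-109": 5,
}

def text_histogram(freqs):
    """Return one line per class, with one asterisk per observation."""
    return [f"{label:>7} | {'*' * count} ({count})" for label, count in freqs.items()]

lines = text_histogram(frequencies)
print("\n".join(lines))
```

The longest bar belongs to the 70-79 class, which matches the tallest column of the histogram.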

Numerical Descriptive Statistics


The most common numerical descriptive statistic
is the average (or mean).
The average is a measure of the central
tendency, or central location, of the data for a variable.
Hudson's average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
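
The calculation of the average can be checked directly from the 50 parts costs. This is a minimal sketch using plain Python.

```python
# Parts cost ($) for the 50 tune-ups in the Hudson Auto example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Mean = sum of the values divided by the number of values.
total = sum(costs)
mean_cost = total / len(costs)

print(total, len(costs), mean_cost)
```

The sum is 3949, so the mean is 3949 / 50 = 78.98, which rounds to the $79 quoted for Hudson Auto.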

Inferential statistics
Population
the set of all elements of interest in a particular study
Sample
a subset of the population
Statistical inference
the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
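
The idea of estimating a population characteristic from a sample can be illustrated with a small simulation. Everything here is hypothetical: the population is generated artificially, and in practice the population mean is unknown, which is exactly why a sample is used.

```python
import random

# Fixed seed so the illustration is repeatable.
rng = random.Random(42)

# Hypothetical population: parts costs ($) for 10,000 tune-ups.
population = [rng.randint(50, 109) for _ in range(10_000)]

# Sample: a randomly chosen subset of 50 elements of the population.
sample = rng.sample(population, 50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

# The sample mean serves as an estimate of the (normally unknown) population mean.
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```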

Sampling approaches
Probability sampling
o Simple random sampling
o Systematic sampling
o Stratified sampling
o Cluster sampling

Non-probability sampling
o Convenience sampling
o Judgmental sampling
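
The four probability sampling approaches can be sketched with Python's standard `random` module. The population of 100 numbered elements, the two strata, and the ten clusters are all assumptions made for illustration; real strata and clusters would come from the study design.

```python
import random

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 100 elements identified by the numbers 1 to 100.
population = list(range(1, 101))
sample_size = 10

# Simple random sampling: every element has the same chance of selection.
simple = rng.sample(population, sample_size)

# Systematic sampling: random starting point, then every k-th element.
k = len(population) // sample_size
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: divide the population into strata, then sample within each
# stratum (here 10% of each stratum).
strata = {"A": population[:60], "B": population[60:]}
stratified = [
    element
    for group in strata.values()
    for element in rng.sample(group, len(group) // 10)
]

# Cluster sampling: divide the population into clusters, then select whole clusters.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [element for cluster in chosen_clusters for element in cluster]

print(simple)
print(systematic)
print(stratified)
print(cluster_sample)
```

Convenience and judgmental sampling are not coded here because the selection depends on the researcher's access or judgment rather than on a random mechanism.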

THE END
