Introduction To Stats, Datasets

BSA
Statistics
Applications in Business and Economics
Data
Descriptive Statistics
Statistical Inference
The term statistics can refer to numerical facts such as
averages, medians, percents, and index numbers that
help us understand a variety of business and economic
situations.
Statistics can also refer to the art and science of
collecting, analyzing, presenting, and interpreting
data.
Definition:
Croxton and Cowden –

“Statistics may be defined as the collection,
presentation, analysis and interpretation of
numerical data.”
Seligman –
“Statistics is the science which deals with the methods
of collecting, classifying, presenting, comparing and
interpreting numerical data collected to throw some
light on any sphere of enquiry.”
i) Weekly wages of 100 workers of a factory.
ii) Height of Ram is six feet.
iii) Mohan's weight is 70 Kgs, Sohan's height is 6.2
feet, and Ram's monthly income is Rs. 1,500.
iv) Sales of a company during the past 10 years.

Accounting
Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
Economics
Economists use statistical information in making
forecasts about the future of the economy or some
aspect of it.
 Finance
Financial advisors use price-earnings ratios and
dividend yields to guide their investment advice.
Applications in
Business and Economics
Marketing
Electronic point-of-sale scanners at retail checkout
counters are used to collect data for a variety of
marketing research applications.
Production
A variety of statistical quality control charts are used
to monitor the output of a production process.
Samples and Populations
 A population consists of the set of all measurements in
which the investigator is interested.
 A sample is a subset of the measurements selected from
the population.
 A census is a complete enumeration of every item in a
population.
A POPULATION or UNIVERSE consists of all
members of a class or category of interest or under
consideration.
A SAMPLE is some portion or subset of a population.
If every member of a population is evaluated, a
CENSUS has been performed and the summary
value of all of the individual measurements is called
a PARAMETER. If only a subset or sample from a
population has been evaluated, the summary of such
measurements is called a STATISTIC. INFERENTIAL
STATISTICS, therefore, involves using sample
statistics to estimate population parameters.
Samples and Populations
Population (N) Sample (n)

Reasons for Sampling
Census of a population may be:

 Difficult
 Impractical
 Expensive
Sample vs. Census
Conditions Favoring the Use of a Sample or a Census
Factor                               Sample        Census
1. Budget                            Small         Large
2. Time available                    Short         Long
3. Population size                   Large         Small
4. Variance in the characteristic    Small         Large
5. Cost of sampling errors           Low           High
6. Cost of nonsampling errors        High          Low
7. Nature of measurement             Destructive   Nondestructive
8. Attention to individual cases     Yes           No

Population vs. Samples
 Statistical analyses are based on a simple model.
 You want to extrapolate from the data you have collected to make
general conclusions.
 There is a large population of data out there, and you have randomly
sampled parts of it.
 You analyze your sample to make inferences about the population.

DESCRIPTIVE STATISTICS
This area of statistics consists of methods dealing with the
collection, tabulation, and summarization of data.
INFERENTIAL STATISTICS
This area of statistics consists of methods that permit
one to make inferences and estimates about a population based
upon information from a sample.
Using Statistics (Two Categories)
 Descriptive Statistics
Collect
Organize
Summarize
Display
Analyze
 Inferential Statistics
Predict and forecast values of population parameters
Test hypotheses about values of population parameters
Make decisions
[Diagram: select a random sample from the population, then calculate the sample mean x̄ (a statistic) to estimate the population mean µ (a parameter).]
Functions of Statistics:
i) To present facts in a proper form

ii) To simplify complex data
iii) Provide techniques for making comparison
iv) To formulate policies in different fields
v) Study relationships between different phenomena
vi) Forecast future values
vii) Measure uncertainty
viii) To test hypotheses
ix) To draw valid inferences
x) To develop models
Statistics in business:
i) Best way to market

ii) Stress on the job
iii) Financial decisions
iv) How is the economy doing
v) The impact of technology at work
Limitations of Statistics:
i) Deals only with quantitative data

ii) Does not deal with individuals
iii) Statistical laws are not exact
iv) Statistical results are true only on average
v) Statistics is only one of the methods of studying a problem
vi) Statistical information can be misused
STATISTICAL THINKING: UNDERSTANDING AND
MANAGING VARIABILITY
1. An understanding of variation
2. An awareness of when and how variability affects
the quality
3. An ability to identify variability that can be
controlled
4. A commitment to controlling and reducing
variability in a never-ending striving for quality
improvement
Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
 All the data collected in a particular study are referred

to as the data set for the study.
Elements, Variables, and Observations
 Elements are the entities on which data are collected.

 A variable is a characteristic of interest for the elements.
 The set of measurements obtained for a particular
element is called an observation.
 A data set with n elements contains n observations.
 The total number of data values in a complete data
set is the number of elements multiplied by the
number of variables.
Data, Data Sets,
Elements, Variables, and Observations
Element names are the companies; the variables are Stock Exchange, Annual Sales, and Earnings per Share.
Company        Stock Exchange   Annual Sales (Rs. M)   Earn/Share (Rs.)
Dataram        BSE              73.10                  0.86
EnergySouth    NSE              74.00                  1.67
Keystone       Nasdaq           365.70                 0.86
LandCare       MCX              111.40                 0.33
Psychemedics   N                17.60                  0.13
Data Set
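The counting rule above (number of data values = elements × variables) can be checked with a small sketch. The field names `exchange`, `annual_sales`, and `earn_per_share` are illustrative labels chosen here, not names from the slides.

```python
# Each dictionary is one observation: the set of measurements for one element.
# Field names are hypothetical labels for the table's columns.
companies = [
    {"company": "Dataram", "exchange": "BSE", "annual_sales": 73.10, "earn_per_share": 0.86},
    {"company": "EnergySouth", "exchange": "NSE", "annual_sales": 74.00, "earn_per_share": 1.67},
    {"company": "Keystone", "exchange": "Nasdaq", "annual_sales": 365.70, "earn_per_share": 0.86},
    {"company": "LandCare", "exchange": "MCX", "annual_sales": 111.40, "earn_per_share": 0.33},
    {"company": "Psychemedics", "exchange": "N", "annual_sales": 17.60, "earn_per_share": 0.13},
]

# The company name identifies the element; the other fields are the variables.
variables = ["exchange", "annual_sales", "earn_per_share"]

n_elements = len(companies)       # one element per company
n_observations = len(companies)   # a data set with n elements has n observations
n_data_values = n_elements * len(variables)

print(n_elements, n_observations, n_data_values)
```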
Scales of measurement include:
Nominal
Ordinal
Interval
Ratio
The scale determines the amount of information
contained in the data.
The scale indicates the data summarization and
statistical analyses that are most appropriate.
Nominal
Data are labels or names used to identify an

attribute of the element.
A nonnumeric label or numeric code may be used.
Example: Employment Classification
1 for Educator
2 for Construction Worker
3 for Manufacturing Worker
Example: Ethnicity
1 for African-American
2 for Anglo-American
3 for Hispanic-American
Scales of Measurement
Nominal
Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business,
2 denotes Humanities, 3 denotes Education, and
so on).
Ordinal
The data have the properties of nominal data and

the order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.
Example: Ranking productivity of employees
Example: Taste test ranking of three brands of soft drink
Example: Position within an organization
1 for President
2 for Vice President
3 for Plant Manager
4 for Department Supervisor
5 for Employee
Ordinal
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Ordinal Data
Faculty and staff should receive preferential
treatment for parking space.
1 Strongly Agree   2 Agree   3 Neutral   4 Disagree   5 Strongly Disagree
Interval
The data have the properties of ordinal data, and

the interval between observations is expressed in
terms of a fixed unit of measure.
Interval data are always numeric.

Interval
Example:
Tushar has an SAT score of 1205, while Priya
has an SAT score of 1090. Tushar scored 115
points more than Priya.
Example: Fahrenheit Temperature

Example: Calendar Time
Ratio
The data have all the properties of interval data

and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time

use the ratio scale.
This scale must contain a zero value that indicates

that nothing exists for the variable at the zero point.
Example: Monetary Variables, such as Profit and Loss, Revenues, and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory Turnover, and Quick
Ratio.
Ratio
Example:
If we compare the cost of Rs. 30000 for one
automobile to the cost of Rs. 15000 for a second
automobile, the ratio property shows that the first
automobile is Rs 30000/Rs 15000= 2 times the cost
of the second one.
Tushar’s college record shows 36 credit hours
earned, while Priya’s record shows 72 credit
hours earned. Priya has twice as many credit
hours earned as Tushar.
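A minimal sketch of which operations are meaningful on each scale, using the figures from the examples above. The two small coded lists (`employment_codes`, `class_standing`) are made-up illustrations, not data from the slides.

```python
from statistics import median, mode

# Nominal: codes are only labels, so counting (the mode) is meaningful,
# but averaging the codes is not.
employment_codes = [1, 3, 2, 3, 3, 1]  # hypothetical: 1 Educator, 2 Construction, 3 Manufacturing
most_common_code = mode(employment_codes)

# Ordinal: order is meaningful, so a median rank can be reported.
class_standing = [1, 2, 2, 3, 4]  # hypothetical: 1 Freshman ... 4 Senior
middle_standing = median(class_standing)

# Interval: differences are meaningful (SAT scores from the slide).
sat_difference = 1205 - 1090

# Ratio: ratios are meaningful because zero means "nothing".
cost_ratio = 30000 / 15000   # first automobile vs. second
credit_ratio = 72 / 36       # Priya's credit hours vs. Tushar's

print(most_common_code, middle_standing, sat_difference, cost_ratio, credit_ratio)
```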
Scale      Measure                              Values for three runners
Nominal    Numbers assigned to runners          7             8              3
Ordinal    Rank order of winners                Third place   Second place   First place
Interval   Temperature (in °C or Fahrenheit)    8.2           9.1            9.6
Ratio      Time to finish, in seconds           15.2          14.1           13.4
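Scales of Measurement: Characteristics, Examples, and Permissible Statistics
Nominal
Basic characteristics: Numbers identify and classify objects
Common examples: Social Security numbers, numbering of football players
Marketing examples: Brand numbers, store types
Permissible descriptive statistics: Percentages, mode
Permissible inferential statistics: Chi-square, binomial test
Ordinal
Basic characteristics: Numbers indicate the relative positions of objects but not the magnitude of differences between them
Common examples: Quality rankings, rankings of teams in a tournament
Marketing examples: Preference rankings, market position, social class
Permissible descriptive statistics: Percentile, median
Permissible inferential statistics: Rank-order correlation, Friedman ANOVA
Interval
Basic characteristics: Differences between objects can be compared
Common examples: Temperature (Fahrenheit)
Marketing examples: Attitudes, opinions, index numbers
Permissible descriptive statistics: Range, mean, standard deviation
Permissible inferential statistics: Product-moment correlation
Ratio
Basic characteristics: Zero point is fixed, ratios of scale values can be compared
Common examples: Length, weight
Marketing examples: Age, sales, income, costs
Permissible descriptive statistics: Geometric mean, harmonic mean
Permissible inferential statistics: Coefficient of variation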
1. How long ago were you released from the hospital?
2. Which type of unit were you in for most of your stay?
i) Coronary care ii) Intensive care iii) Maternity care iv) Medical unit v) Surgical unit
3. In choosing a hospital, how important was the hospital
location? i) VI ii) SWI iii) NVI iv) NAI
4. How serious was your condition? i) serious ii) critical
iii) moderate iv) minor
5. Rate the skill of your doctor. i) E ii) VG iii) G iv) F v) P
6. On the following scale from one to seven, rate the
nursing care. poor 1 2 3 4 5 6 7 excellent
Categorical and Quantitative Data
Data can be further classified as being categorical

or quantitative.
The statistical analysis that is appropriate depends

on whether the data for the variable are categorical
or quantitative.
In general, there are more alternatives for statistical

analysis when the data are quantitative.
Categorical Data
Labels or names used to identify an attribute of
each element
Often referred to as qualitative data
Use either the nominal or ordinal scale of
measurement
Can be either numeric or nonnumeric
Appropriate statistical analyses are rather limited

Quantitative Data
Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for

quantitative data.
Data
  Categorical
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio

Cross-Sectional Data
Cross-sectional data are collected at the same or

approximately the same point in time.
Example: data detailing the number of building
permits issued in February 2010 in each of the
sectors of Greater Noida
Time Series Data
Time series data are collected over several time

periods.
Example: data detailing the number of building
permits issued in Greater Noida, U.P. in each of
the last 36 months
Time Series Data
[Figure: U.S. Average Price Per Gallon for Conventional Regular Gasoline]
Source: Energy Information Administration, U.S. Department of Energy, May 2009.

SOURCES OF DATA
  Internal
  External
    Primary
    Secondary
METHODS OF COLLECTING DATA
COLLECTION OF DATA
  Primary
    Direct personal interview
    Indirect personal interview
    Mailed questionnaire
    Schedules through enumerators
  Secondary
    Published sources
      Government publications
      International publications
      Semi-official publications
      Commission reports
      Private publications
    Unpublished sources
Tables and Charts for Categorical Data
Categorical Data
  Tabulating Data: Summary Table
  Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
CLASSIFICATION:
Suppose that, out of thirty students entering the
school, we wish to compare the number with
black hair with the number with brown hair. We
can make entries in a notebook and obtain, say,
the following information.
Black, brown, brown, black, brown, black, brown, black,
brown, black, brown, brown, brown, brown, black, black,
brown, brown, black, brown, brown, black, brown, brown,
brown, black, brown, brown, black, black.
It is difficult to find any features in the
information presented in this way, but if we
tabulate this information, the important features
become evident.
Hair Colour of 30 Students
Colour Number
Brown 18
Black 12
Total 30
The bringing together of items with
common characteristics is known
as CLASSIFICATION.
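The tabulation above can be reproduced with a short sketch that tallies the thirty notebook entries. The variable names are illustrative; `collections.Counter` is used here only as a convenient way to count items with a common characteristic.

```python
from collections import Counter

# The thirty notebook entries, in the order recorded.
raw_entries = """
black, brown, brown, black, brown, black, brown, black,
brown, black, brown, brown, brown, brown, black, black,
brown, brown, black, brown, brown, black, brown, brown,
brown, black, brown, brown, black, black
"""

# Split on commas and strip surrounding whitespace from each entry.
observations = [entry.strip() for entry in raw_entries.split(",") if entry.strip()]

# Classification: bring together the items that share a characteristic.
counts = Counter(observations)
total = sum(counts.values())

for colour, number in counts.most_common():
    print(f"{colour.capitalize():<8}{number}")
print(f"{'Total':<8}{total}")
```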
21 50 42 75 55 67 74 55 47 64
71 61 40 25 25 54 64 37 88 44
31 70 81 51 45 63 49 43 35 67
68 31 38 45 59 75 57 29 66 50
56 84 56 88 63 32 55 88 79 78
MARKS IN STATISTICS OF 250 STUDENTS
32 47 41 51 41 30 39 18 48 53
54 32 31 46 15 37 32 56 42 48
38 26 50 40 38 42 35 22 62 51
44 21 45 31 37 41 44 18 37 47
68 41 30 52 52 60 42 38 38 34
41 53 48 21 28 49 42 36 41 29
30 33 37 35 29 37 38 40 32 49
43 32 24 38 38 22 41 50 17 46
46 50 26 15 23 42 25 52 38 46
41 38 40 37 40 48 45 30 28 31
40 33 42 36 51 42 56 44 35 38
31 51 45 41 50 53 50 32 45 48
49 43 40 34 34 44 38 58 49 28
40 45 19 24 34 47 37 33 37 36
36 32 61 30 44 43 50 31 38 45
46 40 32 34 44 54 35 39 31 48
48 50 43 55 43 39 41 48 53 34
32 31 42 34 34 32 33 24 43 39
40 50 27 47 34 44 34 33 47 42
17 42 57 35 38 17 33 46 36 23
48 50 31 58 33 44 26 29 31 37
47 55 57 37 41 54 42 45 47 43
34 52 47 46 44 50 44 38 42 19
52 45 23 41 47 33 42 24 48 39
48 44 60 38 38 44 38 43 40 48
MARKS NO. OF STUDENTS ( F )
15 – 19 9
20 – 24 11
25 – 29 10
30 – 34 44
35 – 39 45
40 – 44 54
45 – 49 37
50 – 54 26
55 – 59 8
60 – 64 5
65 – 69 1
TOTAL 250
MARKS NO. OF STUDENTS ( F )
15 – 20 9
20 – 25 11
25 – 30 10
30 – 35 44
35 – 40 45
40 – 45 54
45 – 50 37
50 – 55 26
55 – 60 8
60 – 65 5
65 – 70 1
TOTAL 250
MARKS   NO. OF STUDENTS ( F )   CUMULATIVE FREQUENCY (<)   CUMULATIVE FREQUENCY (>)
15 – 20 9 9 250
20 – 25 11 20 241
25 – 30 10 30 230
30 – 35 44 74 220
35 – 40 45 119 176
40 – 45 54 173 131
45 – 50 37 210 77
50 – 55 26 236 40
55 – 60 8 244 14
60 – 65 5 249 6
65 – 70 1 250 1
TOTAL 250
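The two cumulative columns can be derived from the class frequencies alone. The following is a minimal sketch; the names `less_than` and `more_than` are chosen here for the "(<)" and "(>)" columns.

```python
from itertools import accumulate

classes = ["15-20", "20-25", "25-30", "30-35", "35-40", "40-45",
           "45-50", "50-55", "55-60", "60-65", "65-70"]
frequencies = [9, 11, 10, 44, 45, 54, 37, 26, 8, 5, 1]

# "Less than" cumulative frequency: running total up to and including each class.
less_than = list(accumulate(frequencies))
total = less_than[-1]

# "More than" cumulative frequency: observations in this class and all later ones,
# i.e. the total minus everything that came before the class.
more_than = [total - before for before in [0] + less_than[:-1]]

for label, f, lt, mt in zip(classes, frequencies, less_than, more_than):
    print(f"{label:<8}{f:>4}{lt:>6}{mt:>6}")
```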
Group Data and the Histogram
 Dividing data into groups or classes or intervals
 Groups should be:
Mutually exclusive: not overlapping, so every observation is assigned to
only one group
Exhaustive: every observation is assigned to a group
Equal-width (if possible): the first or last group may be open-ended
Frequency Distribution
 Table with two columns listing:
Each and every group or class or interval of values
Associated frequency of each group
 Number of observations assigned to each group
 Sum of frequencies is number of observations
N for population
n for sample
 Class midpoint is the middle value of a group or class or interval
 Relative frequency is the proportion of total observations in each
class
Sum of relative frequencies = 1
Spending Class ($), x    Frequency (number of customers), f(x)    Relative Frequency, f(x)/n
0 to less than 100       30                                       0.163
100 to less than 200     38                                       0.207
200 to less than 300     50                                       0.272
300 to less than 400     31                                       0.168
400 to less than 500     22                                       0.120
500 to less than 600     13                                       0.070
Total                    184                                      1.000
• Example of relative frequency: 30/184 = 0.163

• Sum of relative frequencies = 1
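As a small sketch of the definitions above, the relative frequencies and class midpoints can be computed directly from the class limits and frequencies in the spending table. The variable names are illustrative.

```python
# (lower limit, upper limit) of each spending class, in dollars;
# each class runs from the lower limit to less than the upper limit.
spending_classes = [(0, 100), (100, 200), (200, 300),
                    (300, 400), (400, 500), (500, 600)]
frequencies = [30, 38, 50, 31, 22, 13]

n = sum(frequencies)  # number of observations in the sample

# Relative frequency: proportion of all observations falling in each class.
relative = [f / n for f in frequencies]

# Class midpoint: the middle value of each class.
midpoints = [(low + high) / 2 for low, high in spending_classes]

for (low, high), f, rel, mid in zip(spending_classes, frequencies, relative, midpoints):
    print(f"{low} to less than {high}: f={f}, midpoint={mid}, relative={rel:.3f}")
print("sum of relative frequencies:", round(sum(relative), 3))
```

Rounding each proportion independently gives 0.071 for the last class; the table's 0.070 was presumably adjusted so that the rounded column totals exactly 1.000.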
Cumulative Frequency Distribution
Spending Class ($), x    Cumulative Frequency, F(x)    Cumulative Relative Frequency, F(x)/n
0 to less than 100       30                            0.163
100 to less than 200     68                            0.370
200 to less than 300     118                           0.641
300 to less than 400     149                           0.810
400 to less than 500     171                           0.929
500 to less than 600     184                           1.000
The cumulative frequency of each group is the sum of the
frequencies of that and all preceding groups.
Methods of Displaying Data
 Pie Charts
Categories represented as percentages of total
 Bar Graphs
Heights of rectangles represent group frequencies
 Frequency Polygons
Height of line represents frequency
 Ogives
Height of line represents cumulative frequency
Organizing Numerical Data
Numerical Data
  Ordered Array
  Stem-and-Leaf Display
  Frequency Distributions and Cumulative Distributions
    Histogram
    Polygon
    Ogive
The Ordered Array
A sequence of data in rank order:

 Shows range (min to max)
 Provides some signals about variability
within the range
 May help identify outliers (unusual observations)
 If the data set is large, the ordered array is
less useful
Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
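A minimal sketch of building the ordered array from the raw data above and reading the range off its ends. The variable names are illustrative.

```python
# Data in raw form, as collected.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Ordered array: the same values in rank order, smallest to largest.
ordered = sorted(raw)

# The ends of the ordered array give the minimum and maximum directly.
smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest

print(ordered)
print("min:", smallest, "max:", largest, "range:", data_range)
```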

Bar and Pie Charts
Bar charts and pie charts are often used
for categorical data.
Height of bar or size of pie slice shows
the frequency or percentage for each
category.
Existing Sources
Internal company records – almost any department

Business database services – Dow Jones & Co.
Government agencies - U.S. Department of Labor
Industry associations – Travel Industry Association
of America
Special-interest organizations – Graduate Management
Admission Council
Internet – more and more firms
Data Sources
Data Available From Internal Company Records

Record: Some of the Data Available
Employee records: name, address, social security number
Production records: part number, quantity produced, direct labor cost, material cost
Inventory records: part number, quantity in stock, reorder level, economic order quantity
Sales records: product number, sales volume, sales volume by region
Credit records: customer name, credit limit, accounts receivable balance
Customer profile: age, gender, income, household size
Data Sources
Data Available From Selected Government Agencies

Government Agency: Some of the Data Available
Census Bureau (www.census.gov): Population data, number of households, household income
Federal Reserve Board (www.federalreserve.gov): Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget (www.whitehouse.gov/omb): Data on revenue, expenditures, debt of federal government
Department of Commerce (www.doc.gov): Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics (www.bls.gov): Consumer spending, unemployment rate, hourly earnings, safety record
Statistical Studies - Experimental
In experimental studies the variable of interest is

first identified. Then one or more other variables
are identified and controlled so that data can be
obtained about how they influence the variable of
interest.
The largest experimental study ever conducted is
believed to be the 1954 Public Health Service
experiment for the Salk polio vaccine. Nearly two
million U.S. children (grades 1-3) were selected.
Data Sources
Statistical Studies - Observational

In observational (nonexperimental) studies no
attempt is made to control or influence the
variables of interest.
A survey is a good example.
Studies of smokers and nonsmokers are
observational studies because researchers
do not determine or control
who will smoke and who will not smoke.
Most of the statistical information in newspapers, magazines,
company reports, and other publications consists of data
that are summarized and presented in a form that is easy to
understand.
 Such summaries of data, which may be tabular, graphical,

or numerical, are referred to as descriptive statistics.
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a

better understanding of the cost of parts used in the
engine tune-ups performed in her shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.
Example: Hudson Auto Repair
Sample of Parts Cost ($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Example: Hudson Auto
Parts Cost ($)   Frequency   Percent Frequency
50-59            2           4     [(2/50)100]
60-69            13          26
70-79            16          32
80-89            7           14
90-99            7           14
100-109          5           10
Total            50          100
Example: Hudson Auto
[Histogram: Tune-up Parts Cost. Horizontal axis: Parts Cost ($), with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency, scaled from 2 to 18.]
 The most common numerical descriptive statistic
is the average (or mean).
 The average demonstrates a measure of the central
tendency, or central location, of the data for a variable.
 Hudson’s average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
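The Hudson Auto summaries can be recomputed from the 50 invoice values listed earlier. This is a sketch under the assumption that each class includes both of its limits (for example, 50-59 covers costs from 50 to 59 dollars); the variable names are illustrative.

```python
# Parts cost ($) for the 50 sampled tune-ups, as listed in the example.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]
n = len(parts_cost)

# Class limits, both ends inclusive.
class_limits = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]

# Frequency: how many costs fall in each class.
frequency = [sum(low <= cost <= high for cost in parts_cost)
             for low, high in class_limits]

# Percent frequency: (frequency / n) * 100.
percent = [100 * f / n for f in frequency]

# Average (mean): sum of the values divided by how many there are.
average = sum(parts_cost) / n

for (low, high), f, p in zip(class_limits, frequency, percent):
    print(f"{low}-{high}: {f} ({p:.0f}%)")
print(f"average parts cost: ${average:.2f}")
```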
Statistical Inference
Population - the set of all elements of interest in a
particular study
Sample - a subset of the population
Statistical inference - the process of using data obtained
from a sample to make estimates and test hypotheses
about the characteristics of a population
Census - collecting data for the entire population
Sample survey - collecting data for a sample
Population
Population and Census Data
Identifier Color MPG
RD1 Red 12
RD2 Red 10
RD3 Red 13
RD4 Red 10
RD5 Red 13
BL1 Blue 27
BL2 Blue 24
GR1 Green 35
GR2 Green 35
GY1 Gray 15
GY2 Gray 18
GY3 Gray 17
Sample and Sample Data
Identifier Color MPG
RD2 Red 10
RD5 Red 13
GR1 Green 35
GY2 Gray 18
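Using the two tables above, a short sketch can contrast a parameter (computed from the census) with a statistic (computed from the sample). The names `mu` and `x_bar` follow the µ and x̄ notation used earlier; the dictionary layout is an assumption made for illustration.

```python
# Census data: MPG for every element in the population.
population_mpg = {
    "RD1": 12, "RD2": 10, "RD3": 13, "RD4": 10, "RD5": 13,
    "BL1": 27, "BL2": 24,
    "GR1": 35, "GR2": 35,
    "GY1": 15, "GY2": 18, "GY3": 17,
}

# Identifiers of the elements selected for the sample.
sample_ids = ["RD2", "RD5", "GR1", "GY2"]
sample_mpg = [population_mpg[identifier] for identifier in sample_ids]

N = len(population_mpg)  # population size
n = len(sample_mpg)      # sample size

# Parameter: summary of the whole population (population mean).
mu = sum(population_mpg.values()) / N

# Statistic: summary of the sample, used to estimate the parameter.
x_bar = sum(sample_mpg) / n

print(f"N = {N}, population mean = {mu:.2f}")
print(f"n = {n}, sample mean = {x_bar:.2f}")
```

In practice the population mean is usually unknown, which is why the sample mean is used as its estimate; here both can be computed because the full census is listed.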
1. Population consists of all tune-ups. Average cost of
parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost
of $79 per tune-up.
4. The sample average is used to estimate the
population average.
Descriptive vs. Inferential Statistics
Descriptive Statistics — using data gathered on

a group to describe or reach conclusions about
that same group only
Inferential Statistics — using sample data to

reach conclusions about the population from
which the sample was taken
Ethical Guidelines for Statistical Practice
 In a statistical study, unethical behavior can take a

variety of forms including:
• Improper sampling
• Inappropriate analysis of the data
• Development of misleading graphs
• Use of inappropriate summary statistics
• Biased interpretation of the statistical results
 You should strive to be fair, thorough, objective, and
neutral as you collect, analyze, and present data.
 As a consumer of statistics, you should also be aware
of the possibility of unethical behavior by others.
Ethical Guidelines for Statistical Practice
 The American Statistical Association developed the

report “Ethical Guidelines for Statistical Practice”.
 The report contains 67 guidelines organized into
eight topic areas:
• Professionalism
• Responsibilities to Funders, Clients, Employers
• Responsibilities in Publications and Testimony
• Responsibilities to Research Subjects
• Responsibilities to Research Team Colleagues
• Responsibilities to Other Statisticians/Practitioners
• Responsibilities Regarding Allegations of Misconduct
• Responsibilities of Employers Including Organizations,
Individuals, Attorneys, or Other Clients
