Introduction To Stats, Datasets


BSA

Statistics
Applications in Business and Economics
Data

Descriptive Statistics
Statistical Inference
The term statistics can refer to numerical facts such as
averages, medians, percents, and index numbers that
help us understand a variety of business and economic
situations.
Statistics can also refer to the art and science of
collecting, analyzing, presenting, and interpreting
data.
Definition:

Croxton and Cowden –
“Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data.”

Seligman –
“Statistics is the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry.”
i) Weekly wages of 100 workers of a factory.
ii) Height of Ram is six feet.
iii) Mohan's weight is 70 Kgs, Sohan's height is 6.2 feet, and Ram's monthly income is Rs. 1,500.
iv) Sales of a company during the past 10 years.


Accounting
Public accounting firms use statistical sampling
procedures when conducting audits for their clients.
Economics
Economists use statistical information in making
forecasts about the future of the economy or some
aspect of it.
Finance
Financial advisors use price-earnings ratios and
dividend yields to guide their investment advice.
Applications in
Business and Economics
Marketing
Electronic point-of-sale scanners at retail checkout
counters are used to collect data for a variety of
marketing research applications.
Production
A variety of statistical quality control charts are used
to monitor the output of a production process.
Samples and Populations

• A population consists of the set of all measurements in which the investigator is interested.
• A sample is a subset of the measurements selected from the population.
• A census is a complete enumeration of every item in a population.
A POPULATION or UNIVERSE consists of all
members of a class or category of interest or under
consideration.

A SAMPLE is some portion or subset of a population.

If every member of a population is evaluated, a CENSUS has been performed, and the summary value of all of the individual measurements is called a PARAMETER. If only a subset or sample from a population has been evaluated, the summary of such measurements is called a STATISTIC.

INFERENTIAL STATISTICS, therefore, involves using sample statistics to estimate population parameters.
Samples and Populations

Population (N) Sample (n)


Reasons for Sampling

Census of a population may be:

• Difficult
• Impractical
• Expensive
Sample vs. Census

Conditions favoring the use of a sample or a census:

Type of Study                          Sample         Census
1. Budget                              Small          Large
2. Time available                      Short          Long
3. Population size                     Large          Small
4. Variance in the characteristic      Small          Large
5. Cost of sampling errors             Low            High
6. Cost of nonsampling errors          High           Low
7. Nature of measurement               Destructive    Nondestructive
8. Attention to individual cases       Yes            No


Population vs. Samples

• Statistical analyses are based on a simple model.
• You want to extrapolate from the data you have collected to make general conclusions.
• There is a large population of data out there, and you have randomly sampled parts of it.
• You analyze your sample to make inferences about the population.


DESCRIPTIVE STATISTICS

This area of statistics consists of methods dealing with the collection, tabulation, and summarization of data.

INFERENTIAL STATISTICS

This area of statistics consists of methods that permit one to make inferences and estimates about a population based upon information from a sample.
Using Statistics (Two Categories)

Descriptive Statistics
• Collect
• Organize
• Summarize
• Display
• Analyze

Inferential Statistics
• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
• Make decisions

Select a random sample from the population, calculate the sample value x̄ (a statistic), and use it to estimate the population value µ (a parameter).
Functions of Statistics:

i) To present facts in a proper form


ii) To simplify complex data
iii) Provide techniques for making comparison
iv) To formulate policies in different fields
v) Study relationships between different phenomena
vi) Forecast future values
vii) Measure uncertainty
viii) To test hypotheses
ix) To draw valid inferences
x) To develop models
Statistics in business:

i) Best way to market


ii) Stress on the job
iii) Financial decisions
iv) How is the economy doing
v) The impact of technology at work
Limitations of Statistics:

i) Deals only with quantitative data


ii) Does not deal with individuals
iii) Statistical laws are not exact
iv) Statistical results are true only on average
v) Statistics is only one of the methods of studying a problem
vi) Statistical information can be misused
STATISTICAL THINKING: UNDERSTANDING AND MANAGING VARIABILITY

1. An understanding of variation
2. An awareness of when and how variability affects quality
3. An ability to identify variability that can be controlled
4. A commitment to controlling and reducing variability in a never-ending striving for quality improvement
Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

• All the data collected in a particular study are referred to as the data set for the study.

Elements, Variables, and Observations

• Elements are the entities on which data are collected.
• A variable is a characteristic of interest for the elements.
• The set of measurements obtained for a particular element is called an observation.
• A data set with n elements contains n observations.
• The total number of data values in a complete data set is the number of elements multiplied by the number of variables.
Data, Data Sets, Elements, Variables, and Observations

                 Variables
Element Names    Stock       Annual Sales   Earn/Share
(Company)        Exchange    (Rs. M)        (Rs.)
Dataram          BSE          73.10         0.86
EnergySouth      NSE          74.00         1.67
Keystone         Nasdaq      365.70         0.86
LandCare         MCX         111.40         0.33
Psychemedics     N            17.60         0.13

Data Set: the five rows above
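As a rough illustration of how elements, variables, and observations relate, the company table can be held as a list of records, one per element. The dictionary keys (`exchange`, `annual_sales`, `earn_per_share`) are names chosen here for illustration and are not part of the original slide.

```python
# Each dictionary is one observation: the set of measurements for one element (a company).
data_set = [
    {"company": "Dataram", "exchange": "BSE", "annual_sales": 73.10, "earn_per_share": 0.86},
    {"company": "EnergySouth", "exchange": "NSE", "annual_sales": 74.00, "earn_per_share": 1.67},
    {"company": "Keystone", "exchange": "Nasdaq", "annual_sales": 365.70, "earn_per_share": 0.86},
    {"company": "LandCare", "exchange": "MCX", "annual_sales": 111.40, "earn_per_share": 0.33},
    {"company": "Psychemedics", "exchange": "N", "annual_sales": 17.60, "earn_per_share": 0.13},
]

# The company name identifies the element; the other three fields are the variables.
variables = ["exchange", "annual_sales", "earn_per_share"]

n_elements = len(data_set)
n_observations = len(data_set)  # one observation per element
n_data_values = n_elements * len(variables)

print(f"{n_elements} elements x {len(variables)} variables = {n_data_values} data values")
```

With five elements and three variables, the data set holds 15 data values.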
Scales of measurement include:
• Nominal
• Ordinal
• Interval
• Ratio

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.
Nominal

Data are labels or names used to identify an attribute of the element.

A nonnumeric label or numeric code may be used.

Example: Employment Classification

1 for Educator
2 for Construction Worker
3 for Manufacturing Worker

Example: Ethnicity

1 for African-American
2 for Anglo-American
3 for Hispanic-American
Scales of Measurement

Nominal

Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business, Humanities,
Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business,
2 denotes Humanities, 3 denotes Education, and
so on).
Ordinal

The data have the properties of nominal data and the order or rank of the data is meaningful.

A nonnumeric label or numeric code may be used.

Example: Ranking productivity of employees

Example: Taste test ranking of three brands of soft drink

Example: Position within an organization

1 for President
2 for Vice President
3 for Plant Manager
4 for Department Supervisor
5 for Employee
Ordinal

Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Ordinal Data

Faculty and staff should receive preferential treatment for parking space.

1 = Strongly Agree, 2 = Agree, 3 = Neutral, 4 = Disagree, 5 = Strongly Disagree
Interval

The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.

Interval data are always numeric.


Interval

Example:
Tushar has an SAT score of 1205, while Priya
has an SAT score of 1090. Tushar scored 115
points more than Priya.

Example: Fahrenheit Temperature


Example: Calendar Time
Ratio

The data have all the properties of interval data and the ratio of two values is meaningful.

Variables such as distance, height, weight, and time use the ratio scale.

This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Example: Monetary Variables, such as Profit and Loss, Revenues, and Expenses
Example: Financial ratios, such as P/E Ratio, Inventory Turnover, and Quick
Ratio.
Ratio
Example:
If we compare the cost of Rs. 30,000 for one automobile to the cost of Rs. 15,000 for a second automobile, the ratio property shows that the first automobile is Rs. 30,000 / Rs. 15,000 = 2 times the cost of the second one.
Tushar’s college record shows 36 credit hours earned, while Priya’s record shows 72 credit hours earned. Priya has twice as many credit hours earned as Tushar.
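The difference between the interval and ratio scales can be checked with a small calculation. The sketch below reuses the automobile costs and credit hours from the examples above; the two temperatures (80 and 40 degrees Fahrenheit) are hypothetical values added only for illustration. A ratio of costs keeps its meaning, while a ratio of temperatures changes when the same readings are expressed in Celsius, because the zero point of an interval scale is arbitrary.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Ratio scale: a true zero exists, so ratios are meaningful.
cost_ratio = 30000 / 15000   # first automobile costs 2 times the second
credit_ratio = 72 / 36       # Priya has twice Tushar's credit hours

# Interval scale: hypothetical temperatures, zero point is arbitrary.
temp_a_f, temp_b_f = 80.0, 40.0
ratio_in_fahrenheit = temp_a_f / temp_b_f
ratio_in_celsius = fahrenheit_to_celsius(temp_a_f) / fahrenheit_to_celsius(temp_b_f)

print(f"Cost ratio: {cost_ratio}")
print(f"Credit-hour ratio: {credit_ratio}")
print(f"Temperature ratio in Fahrenheit: {ratio_in_fahrenheit:.1f}")
print(f"Same temperatures, ratio in Celsius: {ratio_in_celsius:.1f}")
```

The temperature "ratio" is 2 in Fahrenheit but about 6 in Celsius, so statements such as "twice as hot" carry no fixed meaning on an interval scale.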
Scales of Measurement

Scale      Example                               Values for three runners
Nominal    Numbers assigned to runners           7              8              3
Ordinal    Rank order of winners                 Third place    Second place   First place
Interval   Temperature (in °C or Fahrenheit)     8.2            9.1            9.6
Ratio      Time to finish, in seconds            15.2           14.1           13.4
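Scales of Measurement

Nominal
  Basic characteristics: Numbers identify and classify objects
  Common examples: Social Security nos., numbering of football players
  Marketing examples: Brand nos., store types
  Permissible statistics (descriptive): Percentages, mode
  Permissible statistics (inferential): Chi-square, binomial test

Ordinal
  Basic characteristics: Nos. indicate the relative positions of objects but not the magnitude of differences between them
  Common examples: Quality rankings, rankings of teams in a tournament
  Marketing examples: Preference rankings, market position, social class
  Permissible statistics (descriptive): Percentile, median
  Permissible statistics (inferential): Rank-order correlation, Friedman ANOVA

Interval
  Basic characteristics: Differences between objects can be compared
  Common examples: Temperature (Fahrenheit)
  Marketing examples: Attitudes, opinions, index numbers
  Permissible statistics (descriptive): Range, mean, standard deviation
  Permissible statistics (inferential): Product-moment correlation

Ratio
  Basic characteristics: Zero point is fixed, ratios of scale values can be compared
  Common examples: Length, weight
  Marketing examples: Age, sales, income, costs
  Permissible statistics (descriptive): Geometric mean, harmonic mean
  Permissible statistics (inferential): Coefficient of variation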
1. How long ago were you released from the hospital?
2. Which type of unit were you in for most of your stay?
   i) Coronary care ii) Intensive care iii) Maternity care iv) Medical unit v) Surgical unit
3. In choosing a hospital, how important was the hospital location?
   i) VI ii) SWI iii) NVI iv) NAI
4. How serious was your condition?
   i) Serious ii) Critical iii) Moderate iv) Minor
5. Rate the skill of your doctor.
   i) E ii) VG iii) G iv) F v) P
6. On the following scale from one to seven, rate the nursing care.
   poor 1 2 3 4 5 6 7 excellent
Categorical and Quantitative Data

Data can be further classified as being categorical or quantitative.

The statistical analysis that is appropriate depends on whether the data for the variable are categorical or quantitative.

In general, there are more alternatives for statistical analysis when the data are quantitative.
Categorical Data

Labels or names used to identify an attribute of each element

Often referred to as qualitative data

Use either the nominal or ordinal scale of measurement

Can be either numeric or nonnumeric

Appropriate statistical analyses are rather limited


Quantitative Data

Quantitative data indicate how many or how much:

discrete, if measuring how many

continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for


quantitative data.
Scales of Measurement

Data
• Categorical
  - Numeric: Nominal, Ordinal
  - Non-numeric: Nominal, Ordinal
• Quantitative
  - Numeric: Interval, Ratio


Cross-Sectional Data

Cross-sectional data are collected at the same or approximately the same point in time.

Example: data detailing the number of building permits issued in February 2010 in each of the sectors of Greater Noida
Time Series Data

Time series data are collected over several time periods.

Example: data detailing the number of building permits issued in each sector of Greater Noida, U.P., in each of the last 36 months
Time Series Data

[Figure: U.S. average price per gallon for conventional regular gasoline. Source: Energy Information Administration, U.S. Department of Energy, May 2009.]


SOURCES OF DATA

Internal External

Primary Secondary
METHODS OF COLLECTING DATA

COLLECTION OF DATA
• Primary
  - Direct Personal Interview
  - Indirect Personal Interview
  - Mailed Questionnaire
  - Schedules through enumerators
• Secondary
  - Published Sources: Government Publications, International Publications, Semi-official Publications, Commission Reports, Private Publications
  - Unpublished Sources
Tables and Charts for Categorical Data

Categorical Data
• Tabulating Data: Summary Table
• Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
CLASSIFICATION:
Suppose that, for thirty students entering a school, we wish to compare the number with black hair with the number with brown hair. We can make entries in a notebook and obtain, say, the following information.
Black, brown, brown, black, brown, black, brown, black,
brown, black, brown, brown, brown, brown, black, black,
brown, brown, black, brown, brown, black, brown, brown,
brown, black, brown, brown, black, black.
It is difficult to pick out any features from the information presented in this way, but if we tabulate this information, the important features become evident.
Hair Colour of 30 Students

Colour Number

Brown 18

Black 12

Total 30
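The tabulation above can be reproduced mechanically. The following is a minimal sketch that counts the thirty recorded hair colours with Python's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# The thirty notebook entries, in the order recorded.
observations = """
Black, brown, brown, black, brown, black, brown, black,
brown, black, brown, brown, brown, brown, black, black,
brown, brown, black, brown, brown, black, brown, brown,
brown, black, brown, brown, black, black
"""

# Split on commas and normalise spacing and capitalisation.
colours = [entry.strip().lower() for entry in observations.split(",")]
counts = Counter(colours)

print("Hair Colour of 30 Students")
for colour, number in counts.most_common():
    print(f"{colour.capitalize():<8}{number:>3}")
print(f"{'Total':<8}{sum(counts.values()):>3}")
```

The counts come out as 18 brown and 12 black, matching the table.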
The bringing together of items with
common characteristics is known
as CLASSIFICATION.
21 50 42 75 55 67 74 55 47 64

71 61 40 25 25 54 64 37 88 44

31 70 81 51 45 63 49 43 35 67

68 31 38 45 59 75 57 29 66 50

56 84 56 88 63 32 55 88 79 78
MARKS IN STATISTICS OF 250 STUDENTS
32 47 41 51 41 30 39 18 48 53

54 32 31 46 15 37 32 56 42 48

38 26 50 40 38 42 35 22 62 51

44 21 45 31 37 41 44 18 37 47

68 41 30 52 52 60 42 38 38 34

41 53 48 21 28 49 42 36 41 29

30 33 37 35 29 37 38 40 32 49

43 32 24 38 38 22 41 50 17 46

46 50 26 15 23 42 25 52 38 46

41 38 40 37 40 48 45 30 28 31

40 33 42 36 51 42 56 44 35 38

31 51 45 41 50 53 50 32 45 48

49 43 40 34 34 44 38 58 49 28

40 45 19 24 34 47 37 33 37 36

36 32 61 30 44 43 50 31 38 45

46 40 32 34 44 54 35 39 31 48

48 50 43 55 43 39 41 48 53 34

32 31 42 34 34 32 33 24 43 39

40 50 27 47 34 44 34 33 47 42

17 42 57 35 38 17 33 46 36 23

48 50 31 58 33 44 26 29 31 37

47 55 57 37 41 54 42 45 47 43

34 52 47 46 44 50 44 38 42 19

52 45 23 41 47 33 42 24 48 39

48 44 60 38 38 44 38 43 40 48
MARKS NO. OF STUDENTS ( F )
15 – 19 9
20 – 24 11
25 – 29 10
30 – 34 44
35 – 39 45
40 – 44 54
45 – 49 37
50 – 54 26
55 – 59 8
60 – 64 5
65 – 69 1
TOTAL 250
MARKS NO. OF STUDENTS ( F )
15 – 20 9
20 – 25 11
25 – 30 10
30 – 35 44
35 – 40 45
40 – 45 54
45 – 50 37
50 – 55 26
55 – 60 8
60 – 65 5
65 – 70 1
TOTAL 250
MARKS      NO. OF STUDENTS (F)   CUMULATIVE FREQUENCY (<)   CUMULATIVE FREQUENCY (>)
15 – 20      9                      9                        250
20 – 25     11                     20                        241
25 – 30     10                     30                        230
30 – 35     44                     74                        220
35 – 40     45                    119                        176
40 – 45     54                    173                        131
45 – 50     37                    210                         77
50 – 55     26                    236                         40
55 – 60      8                    244                         14
60 – 65      5                    249                          6
65 – 70      1                    250                          1
TOTAL      250
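Both cumulative columns follow directly from the class frequencies. The sketch below, with illustrative variable names, builds the "less than" column as a running total and the "more than" column as the number of observations in the current class and all later classes.

```python
from itertools import accumulate

classes = ["15-20", "20-25", "25-30", "30-35", "35-40", "40-45",
           "45-50", "50-55", "55-60", "60-65", "65-70"]
frequencies = [9, 11, 10, 44, 45, 54, 37, 26, 8, 5, 1]

total = sum(frequencies)

# "Less than" cumulative frequency: running total up to and including each class.
less_than = list(accumulate(frequencies))

# "More than" cumulative frequency: total minus everything before the class.
more_than = [total - before for before in [0] + less_than[:-1]]

for label, f, lt, mt in zip(classes, frequencies, less_than, more_than):
    print(f"{label:<8}{f:>4}{lt:>6}{mt:>6}")
print(f"{'TOTAL':<8}{total:>4}")
```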
Group Data and the Histogram

• Dividing data into groups or classes or intervals
• Groups should be:
  - Mutually exclusive: not overlapping, every observation is assigned to only one group
  - Exhaustive: every observation is assigned to a group
  - Equal-width (if possible); the first or last group may be open-ended
Frequency Distribution
• Table with two columns listing:
  - Each and every group or class or interval of values
  - Associated frequency of each group
• Frequency is the number of observations assigned to each group
• Sum of frequencies is the number of observations
  - N for population
  - n for sample
• Class midpoint is the middle value of a group or class or interval
• Relative frequency is the proportion of total observations in each class
  - Sum of relative frequencies = 1
Spending Class ($)       Frequency (number of customers)   Relative Frequency
x                        f(x)                              f(x)/n
0 to less than 100        30                               0.163
100 to less than 200      38                               0.207
200 to less than 300      50                               0.272
300 to less than 400      31                               0.168
400 to less than 500      22                               0.120
500 to less than 600      13                               0.070
Total                    184                               1.000

• Example of relative frequency: 30/184 = 0.163
• Sum of relative frequencies = 1
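A relative frequency is each class frequency divided by the number of observations. The following sketch recomputes the column from the customer counts in the spending table; the variable names are illustrative.

```python
spending_classes = [
    "0 to less than 100",
    "100 to less than 200",
    "200 to less than 300",
    "300 to less than 400",
    "400 to less than 500",
    "500 to less than 600",
]
frequencies = [30, 38, 50, 31, 22, 13]

n = sum(frequencies)  # number of customers in the sample

# Relative frequency f(x)/n for each class.
relative = [f / n for f in frequencies]

for label, f, rel in zip(spending_classes, frequencies, relative):
    print(f"{label:<24}{f:>5}{rel:>8.3f}")
print(f"{'Total':<24}{n:>5}{sum(relative):>8.3f}")
```

The unrounded relative frequencies sum to 1. Values rounded independently to three decimals can differ in the last digit from a published column that was adjusted so that it totals exactly 1.000.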
Cumulative Frequency Distribution

Spending Class ($)       Cumulative Frequency   Cumulative Relative Frequency
x                        F(x)                   F(x)/n
0 to less than 100        30                    0.163
100 to less than 200      68                    0.370
200 to less than 300     118                    0.641
300 to less than 400     149                    0.810
400 to less than 500     171                    0.929
500 to less than 600     184                    1.000

The cumulative frequency of each group is the sum of the frequencies of that and all preceding groups.
Methods of Displaying Data

• Pie Charts
  - Categories represented as percentages of total
• Bar Graphs
  - Heights of rectangles represent group frequencies
• Frequency Polygons
  - Height of line represents frequency
• Ogives
  - Height of line represents cumulative frequency
Organizing Numerical Data

Numerical Data
• Ordered Array
• Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
The Ordered Array

A sequence of data in rank order:

• Shows range (min to max)
• Provides some signals about variability within the range
• May help identify outliers (unusual observations)
• If the data set is large, the ordered array is less useful
Data in raw form (as collected):

24, 26, 24, 21, 27, 27, 30, 41, 32, 38

Data in ordered array from smallest to largest:

21, 24, 24, 26, 27, 27, 30, 32, 38, 41
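Building the ordered array is a single sort. This short sketch, using the ten raw values above, also reads off the minimum, maximum, and range that the ordered array makes visible.

```python
# Data in raw form, as collected.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Ordered array: the same values from smallest to largest.
ordered = sorted(raw)

smallest, largest = ordered[0], ordered[-1]
data_range = largest - smallest

print("Ordered array:", ordered)
print(f"Min = {smallest}, Max = {largest}, Range = {data_range}")
```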


Bar and Pie Charts

Bar charts and pie charts are often used for categorical data.

Height of bar or size of pie slice shows the frequency or percentage for each category.
Existing Sources

Internal company records – almost any department
Business database services – Dow Jones & Co.
Government agencies – U.S. Department of Labor
Industry associations – Travel Industry Association of America
Special-interest organizations – Graduate Management Admission Council
Internet – more and more firms
Data Sources

Data Available From Internal Company Records

Record               Some of the Data Available
Employee records     name, address, social security number
Production records   part number, quantity produced, direct labor cost, material cost
Inventory records    part number, quantity in stock, reorder level, economic order quantity
Sales records        product number, sales volume, sales volume by region
Credit records       customer name, credit limit, accounts receivable balance
Customer profile     age, gender, income, household size
Data Sources

Data Available From Selected Government Agencies

Government Agency            Website                    Some of the Data Available
Census Bureau                www.census.gov             Population data, number of households, household income
Federal Reserve Board        www.federalreserve.gov     Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget     www.whitehouse.gov/omb     Data on revenue, expenditures, debt of federal government
Department of Commerce       www.doc.gov                Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics   www.bls.gov                Consumer spending, unemployment rate, hourly earnings, safety record
Statistical Studies - Experimental

In experimental studies the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.

The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.
Data Sources

Statistical Studies - Observational

In observational (nonexperimental) studies no attempt is made to control or influence the variables of interest. A survey is a good example.

Studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not smoke.
Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.

• Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Example: Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Example: Hudson Auto

Parts Cost ($)   Frequency   Percent Frequency
50-59             2            4
60-69            13           26
70-79            16           32
80-89             7           14
90-99             7           14
100-109           5           10
Total            50          100

Percent frequency for the 50-59 class: (2/50)100 = 4
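The frequency and percent frequency columns can be rebuilt from the 50 invoice values. This is a minimal sketch that assumes equal class widths of $10 starting at multiples of 10, which matches the classes in the table; the variable names are illustrative.

```python
from collections import Counter

# Sample of parts cost ($) for 50 tune-ups.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

class_width = 10

# Map each cost to the lower limit of its class, e.g. 57 -> 50, 104 -> 100.
lower_limits = [(cost // class_width) * class_width for cost in parts_cost]
frequency = Counter(lower_limits)
n = len(parts_cost)

print(f"{'Parts Cost ($)':<16}{'Frequency':>10}{'Percent':>9}")
for lower in sorted(frequency):
    upper = lower + class_width - 1
    count = frequency[lower]
    percent = count / n * 100
    print(f"{str(lower) + '-' + str(upper):<16}{count:>10}{percent:>9.0f}")
print(f"{'Total':<16}{n:>10}{100:>9}")
```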
Example: Hudson Auto

[Figure: histogram titled "Tune-up Parts Cost". Vertical axis: Frequency (scale 2 to 18). Horizontal axis: Parts Cost ($), with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-110.]

• The most common numerical descriptive statistic is the average (or mean).
• The average is a measure of the central tendency, or central location, of the data for a variable.
• Hudson’s average cost of parts, based on the 50 tune-ups studied, is $79 (found by summing the 50 cost values and then dividing by 50).
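The $79 figure can be checked directly from the invoice data. The sketch below sums the 50 parts costs and divides by 50 using the standard library's `statistics.mean`.

```python
from statistics import mean

# Sample of parts cost ($) for 50 tune-ups.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

total_cost = sum(parts_cost)   # sum of the 50 cost values
average = mean(parts_cost)     # total_cost divided by 50

print(f"Sum of costs: ${total_cost}")
print(f"Average parts cost: ${average:.2f}, about ${round(average)}")
```

The costs total $3,949, so the exact average is $78.98, which rounds to the $79 quoted on the slide.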
Statistical Inference

Population - the set of all elements of interest in a particular study
Sample - a subset of the population
Statistical inference - the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Census - collecting data for the entire population
Sample survey - collecting data for a sample
Population
Population and Census Data

Identifier Color MPG

RD1 Red 12
RD2 Red 10
RD3 Red 13
RD4 Red 10
RD5 Red 13
BL1 Blue 27
BL2 Blue 24
GR1 Green 35
GR2 Green 35
GY1 Gray 15
GY2 Gray 18
GY3 Gray 17
Sample and Sample Data

Identifier Color MPG

RD2 Red 10

RD5 Red 13

GR1 Green 35

GY2 Gray 18
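The two tables above make it possible to compare a parameter with a statistic. As a rough sketch, the code below computes the average MPG for the full population of twelve cars (a parameter, available only because census data exist) and for the four sampled cars (a statistic used to estimate it). The dictionary layout and variable names are illustrative.

```python
from statistics import mean

# Census data: MPG for every car in the population, keyed by identifier.
population_mpg = {
    "RD1": 12, "RD2": 10, "RD3": 13, "RD4": 10, "RD5": 13,
    "BL1": 27, "BL2": 24,
    "GR1": 35, "GR2": 35,
    "GY1": 15, "GY2": 18, "GY3": 17,
}

# Identifiers of the four cars that appear in the sample table.
sample_ids = ["RD2", "RD5", "GR1", "GY2"]
sample_mpg = [population_mpg[car_id] for car_id in sample_ids]

population_mean = mean(list(population_mpg.values()))  # parameter
sample_mean = mean(sample_mpg)                         # statistic

print(f"Population mean MPG (N = {len(population_mpg)}): {population_mean:.2f}")
print(f"Sample mean MPG (n = {len(sample_mpg)}): {sample_mean:.2f}")
```

For this particular sample the statistic (19.00) lands close to the parameter (about 19.08); a different sample of four cars would generally give a different estimate.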
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.
Descriptive vs. Inferential Statistics

Descriptive Statistics: using data gathered on a group to describe or reach conclusions about that same group only

Inferential Statistics: using sample data to reach conclusions about the population from which the sample was taken
Ethical Guidelines for Statistical Practice

• In a statistical study, unethical behavior can take a variety of forms including:
  - Improper sampling
  - Inappropriate analysis of the data
  - Development of misleading graphs
  - Use of inappropriate summary statistics
  - Biased interpretation of the statistical results
• You should strive to be fair, thorough, objective, and neutral as you collect, analyze, and present data.
• As a consumer of statistics, you should also be aware of the possibility of unethical behavior by others.
Ethical Guidelines for Statistical Practice

• The American Statistical Association developed the report “Ethical Guidelines for Statistical Practice”.
• The report contains 67 guidelines organized into eight topic areas:
  - Professionalism
  - Responsibilities to Funders, Clients, Employers
  - Responsibilities in Publications and Testimony
  - Responsibilities to Research Subjects
  - Responsibilities to Research Team Colleagues
  - Responsibilities to Other Statisticians/Practitioners
  - Responsibilities Regarding Allegations of Misconduct
  - Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients
