Statistics and Data


Outline
Relevance of Statistics

Introduction to Basic Concepts

Course Details
Relevance of Statistics
Case: Pepsi’s Exclusivity Agreement
• A large university with a total enrollment of
about 50,000 students has offered Pepsi an
exclusivity agreement that would give Pepsi
exclusive rights to sell its products at all
university facilities for the next year with an
option for future years.
• In return, the university would receive 35% of
the on-campus revenues and an additional lump
sum of $200,000 per year.
• Pepsi has been given 2 weeks to respond.
Case 1: Background Details
• The market for soft drinks is measured in terms of 12-ounce cans.
• Pepsi currently sells an average of 22,000 cans per week (over the 40 weeks of the year that the university operates).
• The cans sell for an average of 75 cents each. The costs, including labor, amount to 20 cents per can.
Case 1: A Problem
• Pepsi is unsure of its market share.
• However, they suspect that it is considerably less
than 50%.

Source: https://99designs.com/icon-button-design/contests/icon-button-design-wanted-guessing-game-167222
Profit-Loss Calculation
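• Suppose the current market share were around 25%.
• The total campus market would then be 88,000 cans per week (22,000 is 25% of 88,000). With exclusivity, Pepsi would sell all 88,000 cans per week, or 3,520,000 cans per year over the 40 weeks.
• The profit or loss can then be calculated.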

Source: https://www.score.org/resource/12-month-profit-and-loss-projection
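The profit-or-loss calculation can be sketched as below. This is a minimal sketch under stated assumptions, not the case's official solution: it assumes that exclusivity gives Pepsi the entire campus market, that the university's 35% is taken from gross on-campus revenue, and that Pepsi bears the 20-cent cost on every can. The function and parameter names are illustrative.

```python
def exclusivity_profit(current_weekly_cans, market_share, weeks=40,
                       price=0.75, unit_cost=0.20,
                       university_share=0.35, lump_sum=200_000):
    """Annual profit under the agreement (assumed cost structure, in dollars)."""
    weekly_market = current_weekly_cans / market_share  # total cans sold on campus
    annual_cans = weekly_market * weeks
    revenue = annual_cans * price
    retained = revenue * (1 - university_share)         # Pepsi keeps 65% of revenue
    costs = annual_cans * unit_cost
    return retained - costs - lump_sum


def current_profit(current_weekly_cans, weeks=40, price=0.75, unit_cost=0.20):
    """Annual profit without the agreement, at today's sales level."""
    return current_weekly_cans * weeks * (price - unit_cost)


if __name__ == "__main__":
    # Try a few market shares, since Pepsi does not know its true share.
    for share in (0.25, 0.40, 0.50):
        with_deal = exclusivity_profit(22_000, share)
        print(f"share={share:.0%}  with agreement: {with_deal:,.0f}  "
              f"without: {current_profit(22_000):,.0f}")
```

Under these assumptions a 25% share gives roughly $812,000 with the agreement against roughly $484,000 without it. The outcome is sensitive to the assumed market share, which is why the missing market-size figure matters.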
Case 1: Market Survey
• The only problem is that Pepsi does not know
how many soft drinks are sold weekly at the
university.
• Pepsi assigned a recent university graduate to
survey the university's students to supply the
missing information.
• Accordingly, she organizes a survey that asks 500
students to keep track of the number of soft drinks
they purchase in the next 7 days.

Source: https://getthematic.com/insights/customer-survey-design/
Simple Random Sample
• A simple random sample is a sample of n observations that has the same probability of being selected from the population as any other sample of n observations.
• Most statistical methods presume simple random samples.
• However, in some situations other sampling methods have an advantage over simple random samples.

Source: https://www.statisticshowto.com/simple-random-sample/
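A simple random sample can be drawn with Python's standard library, as in the sketch below. The student ID list is hypothetical (the case only gives an enrollment of about 50,000 and a sample of 500), and the helper name is illustrative.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical ID numbers standing in for about 50,000 enrolled students.
student_ids = list(range(1, 50_001))
surveyed = simple_random_sample(student_ids, 500, seed=42)
print(len(surveyed), "students selected, all distinct:",
      len(set(surveyed)) == len(surveyed))
```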
Stratified Random Sampling
• Divide the population into mutually exclusive and collectively exhaustive groups, called strata.
• Randomly select observations from each stratum, in numbers proportional to the stratum’s size.
• Advantages:
  - Guarantees that each population subdivision is represented in the sample.
  - Parameter estimates have greater precision than those estimated from simple random sampling.

Source: https://www.netquest.com/blog/en/random-sampling-stratified-sampling
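Proportional stratified sampling can be sketched as follows. The two strata and their sizes are hypothetical, chosen only to illustrate the allocation; rounding each stratum's share means the total can differ slightly from n when the proportions are not whole numbers.

```python
import random


def stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to a list of its members.

    Each stratum contributes a simple random sample whose size is
    proportional to the stratum's share of the population.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata: 35,000 undergraduates and 15,000 graduate students.
strata = {
    "undergraduate": list(range(0, 35_000)),
    "graduate": list(range(35_000, 50_000)),
}
picked = stratified_sample(strata, 500, seed=1)
print({name: len(members) for name, members in picked.items()})
```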
Cluster Sampling
• Divide the population into mutually exclusive and collectively exhaustive groups, called clusters.
• Randomly select clusters.
• Sample every observation in those randomly selected clusters.
• Advantages and disadvantages:
  - Less expensive than other sampling methods.
  - Less precision than simple random sampling or stratified sampling.
  - Useful when clusters occur naturally in the population.

Source: https://www.netquest.com/blog/en/cluster-sampling
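The cluster-sampling steps can be sketched as below. The residence halls used as clusters are a hypothetical example of naturally occurring groups; they are not part of the case.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """clusters maps a cluster name to a list of its members.

    Clusters are chosen at random, then every member of each chosen
    cluster is included in the sample.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical clusters: 20 residence halls with 100 students each.
halls = {f"hall_{h:02d}": [f"hall_{h:02d}-student_{i}" for i in range(100)]
         for h in range(20)}
picked_halls = cluster_sample(halls, 4, seed=7)
print(sorted(picked_halls), sum(len(m) for m in picked_halls.values()))
```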
A Simple Representation of Survey Data (First 8 Rows)
Student ID    No. of Cans Purchased in a Week
1             14
2             10
3             8
4             6
5             9
6             12
7             13
8             4
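One way to turn survey responses into a volume estimate is to scale the sample mean by the enrollment, as sketched below. Only the first 8 of the 500 responses are shown in the slides, so the resulting figure is purely illustrative; the sketch also assumes the surveyed students are representative of all 50,000.

```python
# First 8 survey responses (cans purchased in a week), taken from the table.
cans = [14, 10, 8, 6, 9, 12, 13, 4]
enrollment = 50_000  # "about 50,000" students, per the case

mean_cans = sum(cans) / len(cans)        # sample statistic
weekly_volume = mean_cans * enrollment   # estimated cans sold per week on campus

print(f"mean cans per student per week: {mean_cans}")
print(f"estimated weekly volume: {weekly_volume:,.0f} cans")
```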
Decision-Making
Design a Market Survey and Collect Data → Estimate the Potential Volume → Profit and Loss Calculation
What is Statistics?

Data → Statistics → Information

Statistics is a tool for creating new understanding from a set of numbers.


Steps for Good Statistical Analysis
• Find the right data.
• Use the appropriate statistical tools.
• Communicate the numerical information clearly.


Basic Concepts
Population and Sample

Population (Parameter) → subset → Sample (Statistic)

Populations have Parameters, Samples have Statistics.


Population and Sample
• Population
  - A population is the group of all items of interest to a statistics practitioner.
  - Frequently very large.

• Sample
  - A sample is a set of data drawn from the population.
  - Large enough to be informative, but smaller than the population.
Parameter and Statistic
• Parameter
  - A descriptive measure of a population.

• Statistic
  - A descriptive measure of a sample.
Need for Sampling
• Too expensive to gather information on the entire population.
• Often impossible to gather information on the entire population.
Two Branches

Statistics divides into Descriptive Statistics and Inferential Statistics.


Descriptive Statistics
• Descriptive statistics provides a set of methods for organizing, summarizing, and presenting data in a convenient and informative way.
• These methods include:
  - Graphical techniques
  - Numerical techniques
A Problem…
• Descriptive statistics describes the data set being analyzed but does not allow us to draw any conclusions about the population.
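The numerical techniques of descriptive statistics can be illustrated with Python's `statistics` module. This is a small sketch that summarizes only the eight survey responses shown in the Case 1 table; it describes those eight values and says nothing by itself about all students.

```python
import statistics

# Cans purchased in a week by the first 8 surveyed students (Case 1 table).
cans = [14, 10, 8, 6, 9, 12, 13, 4]

summary = {
    "n": len(cans),
    "mean": statistics.mean(cans),
    "median": statistics.median(cans),
    "stdev": statistics.stdev(cans),  # sample standard deviation (n - 1)
    "min": min(cans),
    "max": max(cans),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```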
Inferential Statistics
• Statistical inference is the process of making an estimate, prediction, or
decision about a population based on a sample.

Population (Parameter) → Sample (Statistic) → Inference
What can we infer about a population’s parameters based on a sample’s statistics?
Inferential Statistics
• We use statistics to make inferences about parameters.
• Therefore, we can make an estimate, prediction, or decision about a population based on sample data.
• Thus, we can apply what we know about a sample to the larger population from which it was drawn!
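The relationship between a statistic and the parameter it estimates can be sketched with simulated data. The population below is entirely hypothetical (randomly generated weekly purchase counts); in practice the parameter is unknown, which is the reason for sampling.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: weekly can purchases for 50,000 students.
population = [rng.randint(0, 20) for _ in range(50_000)]
parameter = statistics.mean(population)  # population mean (normally unknown)

# A simple random sample of 500 students and its sample mean.
sample = rng.sample(population, 500)
statistic = statistics.mean(sample)      # used to estimate the parameter

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The sample mean will generally be close to, but not exactly equal to, the population mean; quantifying that gap is the subject of sampling distributions and interval estimation later in the course plan.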
Data Types

Types of Data: Cross-Sectional and Time Series
Cross-sectional Data
• Data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
• Subjects might include individuals, households, firms, industries, regions, and countries.

Case 1: Survey Data
Student ID    No. of Cans Purchased in a Week
1             14
2             10
3             8
4             6
5             9
6             12
7             13
8             4
Time Series Data
• Data collected by recording a characteristic of a subject over several time periods.
• Data can include daily, weekly, monthly, quarterly, or annual observations.
• The graph shows e-3 wheeler registrations in India.
• 3-wheeler EVs like e-autos and e-rickshaws account for close to 65% of all EVs registered in India.
• For more details, check our article: https://www.thehindu.com/opinion/op-ed/indias-ev-ambition-rides-on-three-wheels/article65480119.ece

[Figure: e-3 Wheeler Registrations in India; horizontal axis from Jan 2013 to Jan 2022, vertical axis from 0 to 800,000]
Case 2: Tween Survey
• Luke McCaffrey owns a ski resort two hours outside Boston.
• Luke is in need of a new marketing manager.
• Luke is particularly interested in serving the needs of the “tween” population
(children aged 8 to 12 years old).
• He believes that tween spending power has grown over the past few years, and
he wants their skiing experience to be memorable so that they want to return.
Tween Survey
• At the end of last year’s ski season, Luke asked 20 tweens four specific questions:
  - Q1. On your car drive to the resort, which music streaming service was playing?
  - Q2. Rate the quality of the food at the resort on a scale of 1 to 4.
  - Q3. What time should the main dining area close?
  - Q4. How much of your own money did you spend at the resort today?
Tween Survey Data
Variables and Scales of Measurement
Variable
• A variable is the general characteristic being observed on an object of
interest.
Types of Variables
Variables are either qualitative or quantitative.
• Qualitative – gender, race, political affiliation
• Quantitative – test scores, age, weight
  - Discrete
  - Continuous
Discrete Variable
• A discrete variable assumes a countable number of distinct values.
• Examples: Number of children in a family, number of points scored in a
basketball game.
Continuous Variable
• A continuous variable can assume an uncountable number of values within
some interval.
• Examples: Weight, height, investment return.
Scales of Measurement
• Qualitative Variables: Nominal, Ordinal
• Quantitative Variables: Interval, Ratio
Nominal Scale
• The least sophisticated level of measurement.
• The values are simply categories used to group the data.
• Qualitative values may be converted to quantitative values for analysis purposes.
Ordinal Scale
• Ordinal data may be categorized and ranked with respect to some
characteristic or trait.
• For example, students are often evaluated on an ordinal scale
(excellent, good, fair, poor).
• Differences between categories are meaningless because the actual
numbers used may be arbitrary.
• There is no objective way to interpret the difference between levels of student
quality.
Tweens Survey
• What is the scale of measurement for the music streaming data?
• Solution: These are nominal data—the values in the data differ merely in
name or label.
Tweens Survey
• How are the data based on the ratings of the food quality similar to or
different from the music streaming data?
• Solution: Like the music streaming data, the ratings can be categorized; unlike them, they can also be ranked, so these are ordinal data.
Interval Scale
• Differences between values are equal and meaningful. Thus, the
arithmetic operations of addition and subtraction are meaningful.
• No “absolute 0” or starting point is defined, so meaningful ratios cannot be obtained.
• For example, consider the Fahrenheit scale of temperature.
• This scale is interval because the data are ranked and differences (+ or -) may be obtained.
• But there is no “absolute 0”.
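A short calculation shows why ratios on an interval scale are not meaningful. The sketch below uses two arbitrary example temperatures (80°F and 40°F, chosen only for illustration) and converts them to Celsius, another interval scale with a different arbitrary zero.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9


warm_f, cool_f = 80.0, 40.0  # illustrative temperatures
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# Ratios depend on where the arbitrary zero sits, so they change with the scale.
ratio_f = warm_f / cool_f
ratio_c = warm_c / cool_c

# Differences stay meaningful: they convert by a fixed factor of 5/9.
diff_f = warm_f - cool_f
diff_c = warm_c - cool_c

print(f"ratio in Fahrenheit: {ratio_f:.2f}, ratio in Celsius: {ratio_c:.2f}")
print(f"difference: {diff_f:.1f} F degrees = {diff_c:.2f} C degrees")
```

The same two temperatures give a ratio of 2 in Fahrenheit and about 6 in Celsius, so saying that one is "twice as hot" as the other has no fixed meaning, while the difference between them does.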
Ratio Scale
• The strongest level of measurement.
• Differences between values are equal and meaningful.
• There is an “absolute 0” or defined starting point. “0” does mean
“the absence of …” Thus, meaningful ratios may be obtained.
• The following variables are measured on a ratio scale:
  - General Examples: Weight and Distance
  - Business Examples: Sales, Profits, and Inventory Levels
Tween Survey
• How are the time data classified? In what ways do the time data differ from
ordinal data? What is a potential weakness of this measurement scale?
• Solution: Clock time responses are on an interval scale. With this type of data, we can calculate meaningful differences; however, there is no apparent zero point.
Tween Survey
• What is the measurement scale of the money data? Why is it considered the
most sophisticated form of data?
• Solution: Since the tweens’ responses are in dollar amounts, this is ratio-scaled
data; ratio-scaled data has a natural zero point which allows the calculation of
ratios.
Synopsis of Tween Survey
• 60% of the tweens listened to Spotify. The resort may want to direct its
advertising dollars to this streaming service.
• 55% of the tweens felt that the food was, at best, fair.
• 95% of the tweens would like the dining area to remain open later.
• 85% of the tweens spent their own money at the lodge.
Course Details
Course Plan
• Introduction
• Descriptive Statistics
• Introduction to Probability and Probability Distributions
• Sampling Distribution and Interval Estimation
• Hypothesis Testing
• ANOVA
• Regression Analysis


Textbook
• Jaggia and Kelly (2021), Business Statistics, McGraw Hill Education (India).
Evaluation Components
Components (Tentative)    Weightage
Quiz                      20%
Mid-Term                  25%
End-Term                  35%
Project                   20%
Quiz
• 6 In-Class Quizzes and 4 Scheduled Quizzes
• Best 4 out of 6 In-Class Quizzes and Best 2 out of 4 Scheduled Quizzes will be taken for final grading
• Quizzes will be mainly concept-based and may require minor computing
• In-Class Quizzes: 5 Questions, 5 Minutes
• Scheduled Quizzes: 10 Questions, 12 Minutes
Mid-Term & End-Term Exams
• Mainly Descriptive Questions
• 5/6 Questions
• Total Marks: 50
• Open book with Excel
• Duration: 2-3 Hours


Project
• Group project (group decision at the end of the second week; final date will be updated by TA)
• Analysis on primary data is preferred
• Two Submissions: Project Proposal & Final Submission
• Project Proposal submission at the end of the 18th session (exact date will be updated by TA)
• Final Submission (most likely) on the day of the end-term exam (exact date will be updated by TA)
Project Proposal
• Project Proposal Preparation Details (one page)
  - Title of the project
  - Introduction and Motivation for the Problem
  - Data Source/Data Collection (if it is a survey, then a brief discussion about the questionnaire)
Final Project Submission
• Final project report should have sections as follows:
  - A title of the project with introduction and motivation for the problem
  - Data Source(s)/Data Collection
  - Descriptive Statistics
  - Methodology
  - Results
  - Conclusion
• The data set should be provided in the Appendix.
• Project submission should include the data set and the report.
Potential Project Topics
• Effect of Pandemic on Online Shopping
• Students’ Perceptions towards the Quality of Online Education
• Future of EV Industry in India
• Future of Startup Industry in India
• Maternal Education and Child Health
Reading Materials
• Chapter 1 of Jaggia and Kelly
Sections 1.1, 1.2
• Homework: Introductory Case: Gaining Insights into Retail Customer Data
