Quantitative Methods

Lecture-1 16Jan’22. 10.30am-12.30pm.

BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

Course and programs

Course Title: Quantitative Methods


Course Codes: MBA ZC417 PDBA ZC417 PDFT ZC417

• Timings: 10.30am-12.30pm.
• Days: Sundays +++.

• At Impartus: Recordings of online lectures and the PPTs used in the lectures will be available at Impartus. Avoid posting your queries and concerns at Impartus.

• At Taxila portal: Quizzes, Assignment, PPTs in advance, links to pre-recorded lectures, and announcements by faculty. Post your queries and concerns only on Taxila, and not at Impartus.
BITS Pilani, Pilani Campus
Textbooks

This course has two textbooks. Both textbooks are required.

1. Business Statistics: A First Course. D.M. Levine, T.C. Krehbiel, M.L. Berenson and P.K. Viswanathan. Seventh edition. Pearson Education. 2019. This book is required now.

2. Quantitative Methods for Business. David R Anderson, Dennis J Sweeney, Thomas A Williams, Jeffrey D Camm and Kipp Martin. Twelfth edition. Cengage Learning. 2013. This book will be required in April.

Syllabus
Textbook #   Chapter #   Chapter Title
1            1           Defining and Collecting Data
1            2           Organizing and Visualizing Variables
1            3           Numerical Descriptive Measures
1            4           Basic Probability
1            5           Discrete Probability Distributions
1            6           The Normal Distribution
1            7           Sampling Distributions
1            8           Confidence Interval Estimation
1            9           Fundamentals of Hypothesis Testing: One-Sample Tests
1            10          Two-Sample Tests and ANOVA
1            11          Chi-Square Tests
1            12          Simple Linear Regression
2            7           An Introduction to Linear Programming
2            9           Linear Programming Applications in Marketing, Finance, and Operations Management
2            10          Distribution and Network Models

Side labels on the slide: Mid-Term Test Syllabus; Comprehensive Exam Syllabus.

Pre-recorded lectures

• Pre-recorded lectures (PRLs) of this course are available at Taxila.
• The PRLs have been recorded by Prof. Sekhar Rajgopalan.
• Before the start of every session, go through the relevant pre-recorded lecture.

[Screenshot from Taxila]

Course handout and Evaluation

Course handout is available at Taxila.
• Textbooks, Reference books, Topics, Delivery plan ……

Evaluation scheme (Evaluation Component, EC)

Code    Name                 Weight, %   Date(s)
EC-1a   Quiz-1               5           February 14-24, 2022
EC-1b   Quiz-2               5           March 14-24, 2022
EC-1c   Quiz-3               5           April 14-24, 2022
EC-1d   Assignment/Quiz-4    5           To be announced
EC-2    Mid-Semester Test    35          Announced by WILP office
EC-3    Comprehensive Exam   45          Announced by WILP office

Familiarity with QM techniques

Type the Topic nos. you are familiar with, in a single chat.

Example: 1, 3, 9 and 12.

Computer software

• MS Excel (for all chapters).
• http://reshmat.ru/graphical_method_lpp.html (for Chapters 7, 9, 10 of TB-2; online and free-to-use).
• http://www.phpsimplex.com/simplex/simplex.htm?l=en (for Chapters 7, 9, 10 of TB-2; online and free-to-use).
• You can also use other software and programming languages: R, SAS, SPSS, Matlab, SysStat, LINDO, CPLEX, GAMS, Python….

Familiarity with MS Excel?

0: Almost nil
1: Sum()…
2: Graphs
3: PivotTables
4: Macros/VB

Course feedback by students of previous batches

Course feedback

1. “I'm a structural engineer (civil) working in L&T. To study and understand the effect of wind, on structure I gone through the journal (attachment). After seeing the term it my pleasure to pride that I know what is skewness, kurtosis.” –Civil Engineer

2. “I am back to learning after 20 years of industrial work experience … Now I can correlate this subject to my work space. And I am happy that I have learned something which I can remember for life rather than alone scoring marks in exams.” –Engineer

3. “I, for one can say, that I find QM as easy as Physiology, if not easier.” –BDS.

4. “I never thought i would like Statistics classes… Its really important for us doctors to know how to come into conclusions for our research purposes.” –Anesthesiologist.

5. “statistics and probability always scared me, while reading scientific articles would skip the statistics part, all the graphs were greek and latin to me. Now can atleast confidently go through them, will atleast understand something.” –Endocrinologist.

6. “Now … I am planning to take few classes in basis to my students. I am a refractive surgeon.” –Ophthalmologist.



Enhanced learning

1. Sessions will be highly interactive: participate through the chat. Queries that remain unanswered during the session will be taken up in the next lecture.
2. Learn from one another.
3. NITB (Not in the TextBook), QM News, BBO (Big Bang Origin), Take 5 (Additional 5 examples), Nice to know, Do it Yourself (DIY), HotSeat, All-in-One slide, One-step solution, Beware, QM around me, ….
4. Excel-based HW assignment for each chapter; will be made available at Taxila. HWs are not to be submitted. Solutions will be posted after 2 weeks at Taxila.
5. PPT of the chapter to be used will be available in advance at Taxila. Today’s PPT is already available at Taxila, under Topic-1.
6. Do not shy away from giving feedback during the course.

Concept first… calculations later.
Answer first… calculations later.

The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

14
BITS Pilani, Pilani Campus
Expect examples from diverse fields

1. 3A- Agriculture, Astronomy, Anthropometry
2. 3E- Economics, Engineering, Entertainment
3. 3M- Metrology, Meteorology, Mythology
4. 3S- Stock market, Sport, Science
5. 5Cs- Cold/Cough/Cancer/Covid/Cardiology
6. Companies- Marketing, Production, Finance
7. Banking
8. Tourism
9. Aviation
10. Transportation
11. Crime
12. Veterinary science
13. KBC
14. Cricket
15. Bollywood

Type on the chat now

About you… Type all 6 items in a single chat.

1. Your company/organisation-
2. Industry-
3. Designation-
4. Experience in years-
5. Highest educational qualification-
6. Your city-

For example-
 ONGC/Oil&Gas/Asst Mgr/9 yrs/ElectricalEngg/Mumbai.
 TataMotors/Automobile/Senior Engineer/6 yrs/MechanicalEngg/Pune.

Class Monitor

BITS Pilani, Pilani Campus


Today’s lecture
Textbook #   Chapter #   Chapter Title                                                                      Date(s)
1            1           Defining and Collecting Data                                                       16Jan
1            2           Organizing and Visualizing Variables                                               16Jan
1            3           Numerical Descriptive Measures
1            4           Basic Probability
1            5           Discrete Probability Distributions
1            6           The Normal Distribution
1            7           Sampling Distributions
1            8           Confidence Interval Estimation
1            9           Fundamentals of Hypothesis Testing: One-Sample Tests
1            10          Two-Sample Tests and ANOVA
1            11          Chi-Square Tests
1            12          Simple Linear Regression
2            7           An Introduction to Linear Programming
2            9           Linear Programming Applications in Marketing, Finance, and Operations Management
2            10          Distribution and Network Models

Side label on the slide: Mid-Term Test Syllabus.

BITS Pilani
Pilani Campus

Chapter-1: Defining and Collecting data


Chapter-2: Organizing and Visualizing Variables
Topics

Chapter-1: Defining and Collecting Data
Refer to the following PRLs, accessible from Taxila.
• Defining variables
• Collecting data
• Types of sampling methods (will be covered with Chapter-7).
• Types of survey methods (will be covered with Chapter-7).

Chapter-2: Organizing and Visualizing Variables
• Categorical variables
• One numerical variable
• Two numerical variables

Statistics

21
BITS Pilani, Pilani Campus
Recent developments

Data-based decision making- Data Analytics


 Automated collection of data- Huge data size
 Real-time: as it happens
 Integrated: diverse sources
 Non-numerical (video, text, voice)
 Automated analysis
 Automated pricing
 Automated customisation
 Automated promotions
 Automated recommendations
 ……
Realtime data visualisation

22
BITS Pilani, Pilani Campus
Data in a manufacturing firm

Marketing: Sales, order booking, usage, returns, refunds, loyalty schemes, stocks, customer satisfaction surveys….
Production and Operations: Output, schedules, machine utilisation, inventory, quality, Project Management….
Purchase: Prices, orders, suppliers, inventory, supplier rating, quality, returns….
Human Resources: Employee records, performance, manpower planning….
Finance: Receivables, assets, budgeting, interest rates, stock market index….
Maintenance: Breakdowns, MTBF, MTFF, MTTR….
Government: Population, Land, Economy, Trade, Taxes, Weather, Agriculture, Industry, Labour, Tourism, Education, Health, Infrastructure, Pollution, Banking….

Data visualization- Dashboard

[Dashboard images: Car, Human body, Sales, Factory]

QM- Applications and techniques

Business applications
• Casino game design, KBC
• Insurance
• Quality control
• Portfolio management
• CIBIL score/ Risk management
• Amazon/Netflix recommendation
• Email spam filters
• Warranty policies
• Airline pricing
• Railway RAC/Waitlist
• Opinion/Exit polls
• Marketing research
• Emergency services
• Data mining
• ….

Other fields
• Clinical trials
• Fertilizers- Design of Experiments
• Meteorology
• Dam design and reservoir operation
• Statistical mechanics
• Quantum theory
• Genetics
• Econometrics
• Law enforcement
• Radar- Aircraft detection
• Image/Signal processing
• Theory building and validation
• …..

Techniques
• Forecasting
• Regression
• Classification
• Clustering
• Association
• Decision tree analysis
• Discriminant analysis
• Singular Value Decomposition (SVD)
• Principal Component Analysis (PCA)
• Factor analysis
• Markov process
• Random walk
• …..

BITS Pilani
Pilani Campus

Defining variables,
Collecting data
Types of data

• Unstructured: unorganized, original. [Text passage from Wikipedia]
• Structured: organized.

  Item           Count
  Nouns          *
  Pronouns       *
  Articles       *
  Conjunctions   *
  Adjectives     *
  Adverbs        *
  * DIY

• Population: complete data.
• Sample: partial data.

The statistical technique to be used depends on the type of data.



Data organization

500 200 425 425 275 375 200 350 425 200
350 425 425 425 375 375 375 500 375 425
350 200 400 500 275 500 200 400 275 200
425 350 425 425 200 425 375 350 200 500
425 500 375 200 200 375 500 425 500 425

Observation   Count
200           10
275           3
350           5
375           8
400           2
425           14
500           8
Total         50

Class Interval   Frequency
1-200            10
201-300          3
301-400          15
401-500          22
Total            50

Tables
• Frequency table
• Contingency table

Charts
• Bar chart (Horizontal), Pie chart, Histogram, Column chart (Vertical), Box plot, Line chart/Time series, Scatter, 3D chart, Stem-and-Leaf chart ...
• Pareto chart, Radar, Area, Surface, Gantt chart, Tree map, Network diagram, Word cloud, Venn diagram, Onion chart….

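The two frequency tables on the Data organization slide can be reproduced from the 50 raw observations. The following is a minimal sketch in Python (the course itself uses MS Excel); the variable and function names are illustrative assumptions, and the class intervals are treated as inclusive at both ends.

```python
from collections import Counter

# Raw observations copied from the "Data organization" slide (50 values).
data = [
    500, 200, 425, 425, 275, 375, 200, 350, 425, 200,
    350, 425, 425, 425, 375, 375, 375, 500, 375, 425,
    350, 200, 400, 500, 275, 500, 200, 400, 275, 200,
    425, 350, 425, 425, 200, 425, 375, 350, 200, 500,
    425, 500, 375, 200, 200, 375, 500, 425, 500, 425,
]

# Frequency table: how many times each distinct observation occurs.
observation_counts = dict(sorted(Counter(data).items()))

# Class intervals from the slide, as inclusive (lower, upper) bounds.
class_intervals = [(1, 200), (201, 300), (301, 400), (401, 500)]


def class_frequencies(values, intervals):
    """Count how many values fall inside each inclusive class interval."""
    return {
        f"{low}-{high}": sum(low <= v <= high for v in values)
        for low, high in intervals
    }


interval_counts = class_frequencies(data, class_intervals)

print(observation_counts)
print(interval_counts)
```

Grouping into class intervals shortens the table from seven rows to four, at the cost of no longer showing the individual observation values.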
BITS Pilani
Pilani Campus

Categorical and One numerical variable


Graduate survey data
ID     Gender   Age   Height   Class   Major   Grad School   GPA    Expected Salary   Annual Salary in 5 Years   Employment Status   Number of Affiliations   Satisfaction Advisement   Spending
ID01   m        19    69       so      mr      y             3.19   40                70                         un                  0                        2                         550
ID02   m        21    67       sr      m       un            3.11   50                60                         pt                  0                        2                         400
ID03   m        20    68       jr      ef      n             3.02   50                60                         pt                  0                        5                         450
ID04   m        18    79       fr      ef      y             4.00   50                57                         pt                  0                        5                         360
ID05   m        19    67       so      m       y             2.75   40                100                        pt                  1                        1                         500
ID06   m        21    70       jr      a       y             3.24   60                100                        pt                  2                        5                         650
...
ID47   f        22    66       jr      mr      n             2.76   40                65                         pt                  0                        3                         500
ID48   m        19    69       fr      un      un            3.10   45                70                         pt                  0                        4                         400
ID49   m        20    68       so      is      n             2.61   40                65                         pt                  1                        3                         450
ID50   f        20    66       so      is      n             3.13   45                80                         pt                  0                        2                         500

1. ID- Roll no
2. Gender: m/male, f/female
3. Age- years
4. Height- inches
5. Class- fr/Fresher, so/Sophomore, jr/Junior, sr/Senior.
6. Major- a/Accounting, is/CIS, ef/Economics/Finance, ib/International Business, m/Management, mr/Marketing and Retailing, o/Others.
7. Grad school- y/yes, n/no, un/unknown
8. GPA- Grade Point Average out of 4.00.
9. Expected salary- $ ‘000
10. Annual Salary in 5 years- expected after graduation, $ ‘000.
11. Employment status- ft/Full Time, pt/Part Time, un/Unemployed.
12. No of affiliations- of clubs
13. Satisfaction advisement- likely to advise others to join the college, 1 to 5 scale.
14. Spending- Money spent on laptop, books, etc. $.

• Columns are called Fields. This dataset has 14 Fields.
• Rows are called Records. This dataset has 50 records. Record # 7 to 46 are not shown.

Measurement scales

1. Nominal (Categorical)
2. Ranked (Ordinal, from Order)
3. Interval
4. Ratio

More information is available as we move from Nominal to Ratio.

Measurement scale determines:
• Data organization and representation.
• Data analysis technique.

1. Nominal data

Nominal (from ‘name’), also called Categorical data.

• Gender?- Male/Female.
• Marital status?- Married/Single/Unmarried.
• Engineer?- Yes/No.
• Material?- Wood/Metal/Plastic/Steel/Bronze.
• Industry?- Oil/Mining/Automobile/Media/IT/Food processing.
• TV Channel?- Entertainment/Music/News/Kids/Others.
• Support political party?- CPI/BJP/Cong/TDP/TMC.
• IPL team player?- KKR/CSK/MI/RCB…
• Brand?- Lakhani/Liberty/Paragon/Bata/Nike/Adidas
• Veg-NonVeg/Soil Types/Taxonomy…

Nominal data can only be counted; it cannot be ranked or measured.

Organizing Nominal data
Gender
Category   Frequency
m          26
f          24
Total      50

Major
Category   Frequency
mr         10
ef         9
a          11
ib         3
is         4
o          2
un         2
m          9
Total      50

Grad School
Category   Frequency
y          18
un         17
n          15
Total      50

Employment Status
Category   Frequency
un         11
pt         38
ft         1
Total      50

Frequency and Relative Frequency Percentage: Gender
Category   Frequency   Percentage, %
m          26          52
f          24          48
Total      50

Measurement scale: Nominal
Organizing data: Frequency table, Relative frequency percentage table
Visualizing data: Bar chart, Pie chart, Pareto chart

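The relative frequency percentage of a category is its frequency divided by the total, times 100. A minimal sketch in Python, using the Gender and Employment Status counts from the slide (the function name `relative_frequency_table` is a hypothetical helper, not part of the course material):

```python
def relative_frequency_table(frequencies):
    """Return {category: (frequency, percentage)} for a frequency table."""
    total = sum(frequencies.values())
    return {
        category: (count, 100 * count / total)
        for category, count in frequencies.items()
    }


# Frequencies taken from the "Organizing Nominal data" slide.
gender = {"m": 26, "f": 24}
employment_status = {"un": 11, "pt": 38, "ft": 1}

gender_table = relative_frequency_table(gender)
employment_table = relative_frequency_table(employment_status)

for category, (count, pct) in gender_table.items():
    print(f"{category:>3} {count:>3} {pct:5.1f}%")
```

The percentages of any such table add up to 100, which is a quick check that no category has been left out.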
Visualizing Nominal data

Gender
Category Frequency
m 26
f 24
Total 50

Major
Category Frequency
mr 10
ef 9
a 11
ib 3
is 4
o 2
un 2
m 9
Total 50

34
BITS Pilani, Pilani Campus
2. Ranked data
Ranked (from ‘Rank’), also called Ordinal (from ‘Order’) data.
• Tall, taller, tallest.
• Big, bigger, biggest.
• Major, Colonel, Brigadier.
• Child, adult, senior citizen.
• Olympics: First, second, third, fourth….
• Thickness: Very thick, thick, thin.
• Taste: Good, average, below average, bad.
• Temperature: Freezing, cool, warm, hot.
• Garment sizes: S, M, L, XL, XXL, XXXL.
• Customer satisfaction: not satisfied, somewhat satisfied, satisfied, highly satisfied.

Ranked data cannot be measured.

Organizing Ranked data-

Measurement scale: Ranked/Ordinal
Organizing data: Frequency table, Relative frequency percentage table, Cumulative frequency table
Visualizing data: Bar chart, Pie chart, Pareto chart, Stem-and-Leaf chart

Visualizing Ranked data
Class

Class   Frequency   Cumulative frequency
fr      18          18
so      23          41
jr      5           46
sr      4           50
Total   50

This chapter will be continued in the next session.

You can download PPT for Chapter-01 and 02 at eLearn (Taxila), under Topic-1.

Quantitative Methods

Lecture-2 23Jan’22. 10.30am-12.30pm.

BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

1
Items of interest
1. Excel-2 will be held on 27Jan, Thursday 7-9 pm. Recording of the session will be available.
2. HW-1: Tables and Charts is available at Taxila, under Topic-2. HWs are not to be submitted. Solutions of HWs are posted after 2 weeks, at Taxila.
3. Post your messages only on Discussion Forum at Taxila, and not at Impartus.
4. PPTs of Chapter 01 and 02, and Chapter-03 are available in advance at Taxila, under Topic-1.

The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

Concept first… calculations later.


Answer first..… calculations later.

3
BITS Pilani, Pilani Campus
Today’s lecture
Textbook #   Chapter #   Chapter Title                                                                      Date(s)
1            1           Defining and Collecting Data                                                       16Jan, 23Jan
1            2           Organizing and Visualizing Variables                                               16Jan, 23Jan
1            3           Numerical Descriptive Measures                                                     23Jan
1            4           Basic Probability
1            5           Discrete Probability Distributions
1            6           The Normal Distribution
1            7           Sampling Distributions
1            8           Confidence Interval Estimation
1            9           Fundamentals of Hypothesis Testing: One-Sample Tests
1            10          Two-Sample Tests and ANOVA
1            11          Chi-Square Tests
1            12          Simple Linear Regression
2            7           An Introduction to Linear Programming
2            9           Linear Programming Applications in Marketing, Finance, and Operations Management
2            10          Distribution and Network Models

Side label on the slide: Mid-Term Test Syllabus.

BITS Pilani
Pilani Campus

Chapter-1: Defining and Collecting data


Chapter-2: Organizing and Visualizing Variables
Topics

Chapter-1: Defining and Collecting Data
Refer to the following PRLs, accessible from Taxila.
• Defining variables
• Collecting data
• Types of sampling methods (will be covered with Chapter-7).
• Types of survey methods (will be covered with Chapter-7).

Chapter-2: Organizing and Visualizing Variables
• Categorical variables
• One numerical variable (will be covered today)
• Two numerical variables (will be covered today)

3. Interval scale

Numerical data, where the zero is arbitrarily chosen.

• Temperature measured in Celsius.
• Temperature measured in Fahrenheit.
• Employee/Customer satisfaction measured on a 1 to 7 scale (1 = Not satisfied, 7 = Highly satisfied).
• Glasgow Coma scale.

Interval scale data can be measured.

• Zero degrees Celsius and zero degrees Fahrenheit are arbitrary. Zero Kelvin is not arbitrary.
• In the satisfaction scale above, zero Employee/Customer satisfaction is arbitrary.
• Bata shoe size: interval scale.
• Date of birth (16Jan’1999): interval scale. But age (26 years): Ratio scale.

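One practical consequence of an arbitrary zero is that differences are meaningful on an interval scale, but ratios are not. The sketch below illustrates this in Python with two temperatures; the sample values (10 and 20 degrees Celsius) and the function name are illustrative assumptions, not from the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


t1_c, t2_c = 10.0, 20.0  # illustrative temperatures in Celsius
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Differences agree on both scales: a 10-degree gap either way.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratios do not agree: "20 C is twice 10 C" depends on where zero was placed.
ratio_c = t2_c / t1_c  # 2.0, but not physically meaningful
ratio_k = t2_k / t1_k  # about 1.035, meaningful because zero Kelvin is a true zero

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The same reasoning applies to date of birth versus age: subtracting two dates is meaningful, while dividing one date by another is not.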
Shoe size

• Notice that US, EU and UK shoe sizes are on the Interval scale, because their zero size will not be of zero length.
• Shoe sizes in cm or inches are on the Ratio scale.

Organizing Interval data

Measurement scale: Interval
Organizing data: Frequency table, Relative frequency percentage table, Cumulative frequency table
Visualizing data: Histogram, Pie chart, Stem-and-Leaf chart, Box plot, Cumulative frequency

Satisfaction Advisement

Rating   Frequency   Cumulative frequency
1        3           3
2        5           8
3        12          20
4        13          33
5        13          46
6        3           49
7        1           50
Total    50

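The cumulative frequency column is a running total of the frequency column. A minimal sketch in Python using the Satisfaction Advisement counts from the slide; the cumulative percentage column is an extra shown here for illustration and does not appear on the slide.

```python
from itertools import accumulate

# Rating frequencies from the "Satisfaction Advisement" table.
ratings = [1, 2, 3, 4, 5, 6, 7]
frequencies = [3, 5, 12, 13, 13, 3, 1]

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequencies))

# Cumulative percentage: share of records at or below each rating.
total = sum(frequencies)
cumulative_pct = [100 * c / total for c in cumulative]

for rating, f, c, p in zip(ratings, frequencies, cumulative, cumulative_pct):
    print(f"{rating:>6} {f:>9} {c:>10} {p:6.1f}%")
```

The last cumulative frequency always equals the total number of records, here 50.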
Visualizing Interval data
4. Ratio scale

Numerical data, Zero means no value/absence.


 Blood pressure, mm of Hg
 Temperature in Kelvin
 Production (nos, tons)
 Sales (nos, Rs)
 No of defects, no. of employees
 BSE/NSE indices
 Height in mm/cm/m
 Weight in g/kg/tons
 Time in sec/min/hr

Ratio data can be measured.

19
BITS Pilani, Pilani Campus
Organizing and Visualizing Ratio data
Spending, $   Frequency   Cumulative frequency
0-200         1           1
201-400       20          21
401-600       20          41
601-800       8           49
801-1000      1           50
Total         50

Measurement scale: Ratio
Organizing data: Frequency table, Relative frequency percentage table, Cumulative frequency table
Visualizing data: Histogram, Pie chart, Stem-and-Leaf chart, Box plot, Cumulative frequency

Frequency polygon

• A frequency polygon smoothens the Histogram.
• Useful when two Histograms to be compared are overlapping.

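A frequency polygon is commonly drawn by joining the points (class midpoint, frequency) with straight lines. The sketch below computes those vertices in Python for the Spending table shown earlier; the function name is a hypothetical helper, and the optional zero-frequency end points that some textbooks add are left out.

```python
# Class intervals and frequencies from the Spending ($) frequency table.
spending_classes = [(0, 200), (201, 400), (401, 600), (601, 800), (801, 1000)]
spending_freqs = [1, 20, 20, 8, 1]


def polygon_points(classes, freqs):
    """Return (class midpoint, frequency) vertices of a frequency polygon."""
    return [((low + high) / 2, f) for (low, high), f in zip(classes, freqs)]


points = polygon_points(spending_classes, spending_freqs)

for midpoint, freq in points:
    print(f"{midpoint:7.1f} {freq:>3}")
```

Plotting two such sets of points as lines on the same axes keeps both distributions visible, where two overlapping histograms would hide each other.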
Stem-and-Leaf plot
Data: 153, 154, 154, 162, 165, 169, 172, 176, 176, 176, 177, 181, 182, 186, 187, 193

Class Interval   Frequency
150-160          3
160-170          3
170-180          5
180-190          4
190-200          1
Total            16

[Histogram]

Stem   Leaf
15     344
16     259
17     26667
18     1267
19     3

• Stem-and-Leaf plot retains the data, unlike Histogram.
• Stem-and-Leaf plot is used when data size is small.

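A stem-and-leaf plot splits each value into a stem (here the tens and hundreds digits) and a leaf (the units digit). A minimal sketch in Python that rebuilds the plot from the 16 data values on the slide; the function name and the `leaf_unit` parameter are illustrative assumptions.

```python
from collections import defaultdict

# The 16 data values listed on the Stem-and-Leaf slide.
values = [153, 154, 154, 162, 165, 169, 172, 176,
          176, 176, 177, 181, 182, 186, 187, 193]


def stem_and_leaf(data, leaf_unit=10):
    """Group sorted values by stem; each leaf is the remainder after the stem."""
    plot = defaultdict(list)
    for v in sorted(data):
        stem, leaf = divmod(v, leaf_unit)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(values)

for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Because every leaf is kept, the original values can be read back from the plot, which is not possible from a histogram.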
BITS Pilani
Pilani Campus

Two numerical variables


( + Two categorical variables)

23
Two categorical variables

Customer #   Food   Beverage
1            Dosa   Tea
2            Dosa   Coffee
3            Dosa   Coffee
4            Idly   Tea
5            Idly   Tea
6            Idly   Tea
7            Idly   Coffee
8            Idly   Coffee
9            Idly   Coffee
10           Idly   Coffee

One variable: Frequency table

Food    Frequency
Dosa    3
Idly    7
Total   10

Beverage   Frequency
Tea        4
Coffee     6
Total      10

Two variables: Contingency table

              Beverage
Food    Tea   Coffee   Total
Dosa    1     2        3
Idly    3     4        7
Total   4     6        10

Contingency tables will be used later in:
• Joint probability
• Conditional probability
• Marginal probability
• Chi-square test

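The frequency tables and the contingency table can all be counted from the same ten customer records. A minimal sketch in Python using the standard-library `Counter`; the variable names are illustrative assumptions.

```python
from collections import Counter

# (Food, Beverage) for customers 1 to 10, copied from the slide.
orders = [
    ("Dosa", "Tea"), ("Dosa", "Coffee"), ("Dosa", "Coffee"),
    ("Idly", "Tea"), ("Idly", "Tea"), ("Idly", "Tea"),
    ("Idly", "Coffee"), ("Idly", "Coffee"), ("Idly", "Coffee"),
    ("Idly", "Coffee"),
]

# One variable at a time: frequency tables (the row and column totals).
food_counts = Counter(food for food, _ in orders)
beverage_counts = Counter(beverage for _, beverage in orders)

# Two variables together: the cells of the contingency table.
cell_counts = Counter(orders)

print(f"{'':6}{'Tea':>5}{'Coffee':>8}{'Total':>7}")
for food in ("Dosa", "Idly"):
    tea = cell_counts[(food, "Tea")]
    coffee = cell_counts[(food, "Coffee")]
    print(f"{food:6}{tea:>5}{coffee:>8}{food_counts[food]:>7}")
print(f"{'Total':6}{beverage_counts['Tea']:>5}{beverage_counts['Coffee']:>8}{len(orders):>7}")
```

The one-variable frequency tables reappear as the Total row and Total column of the contingency table.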
Two categorical variables

Contingency Table

25
BITS Pilani, Pilani Campus
Visualizing two categorical variables-
Column charts
              Gender
Class   f     m     Total
fr      3     2     5
so      11    12    23
jr      7     11    18
sr      3     1     4
Total   24    26    50

Two variables are- Gender (f/m) and Class (fr, so, jr, sr).

BITS Pilani
Pilani Campus

Two numerical variables- Line/Time series chart

27
Two numerical variables- Line chart

Year No (million)
2001 2.54
2002 2.38
2003 2.73
2004 3.46
2005 3.92
2006 4.45
2007 5.08
2008 5.28
2009 5.17
2010 5.78
2011 6.31
2012 6.58
2013 6.97
2014 7.68
2015 8.03
2016 8.80

Two numerical variables are- Year and Tourists.

28
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Two numerical variables- Scatter plot

29
Two numerical variables- Scatter plot

Two variables are- Height and GPA.

Measurement scale: Two variables, both Interval/Ratio
Organizing data: Contingency table
Visualizing data: 3D chart (Surface chart), Stacked column chart, Clustered column chart, Time series (Line) chart, Scatter chart

In a scatter plot, both variables are numerical and also random.

HW:
Measurement scale of each variable (16 Columns)?

Nominal data-???
Ranked data-???
Interval data-???
Ratio data-???

31
BITS Pilani, Pilani Campus
HW:
Measurement scale of each variable (10 Columns)?
Warranty Claims- Truck tyres
Tyre Product Code | Claim Acceptance | Tyre Defect | Tyre Usage (%) | Claim Loss (Rs) | Claim Month/Year | Production Month | Claim Location | Manufacturing Plant Code | Tyre Hardness (SHA)
104113 Accepted-Special K LOCK RING /FITMENT D 37.50 9,310.62 Apr-19 Apr-18 Faridabad 1400 75
104113 Accepted-Special K LOCK RING /FITMENT D 50.00 7,448.50 Apr-19 Apr-18 Faridabad 1400 78
104113 Accepted-Special K LOCK RING /FITMENT D 40.00 8,938.20 Apr-19 Apr-18 Faridabad 1400 77
104113 Accepted-Special K FITMENT/LOCK RING/BE 66.25 5,027.74 Apr-19 Jul-18 Faridabad 1400 73
101766 Accepted-Manufacturing TREAD / SHOULDER SEP 54.05 6,269.42 Apr-19 Jun-16 Faridabad 1400 74
104113 Accepted-Special K TURN UP SEPARATION 55.00 6,703.65 May-19 Jul-18 Faridabad 1400 76
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 35.00 9,683.05 May-19 Feb-18 Faridabad 1400 75
104113 Accepted-Manufacturing BELT EDGE / BELT SEP 37.50 9,310.62 May-19 Jun-18 Faridabad 1400 76
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 55.00 6,703.65 May-19 Feb-18 Faridabad 1400 74
104113 Rejected-Non Manufacturing FITMENT/LOCK RING/BE 5.00 0.00 May-19 Jan-19 Faridabad 1400 78
104113 Accepted-Manufacturing TURN UP SEPARATION 26.75 10,912.05 May-19 Aug-18 Faridabad 1400 77
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 52.50 7,076.07 Jun-19 Sep-18 Faridabad 1400 78
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 42.50 8,565.77 Jun-19 May-18 Faridabad 1400 75
104113 Accepted-Special K TURN UP SEPARATION 60.00 5,958.80 Jun-19 May-18 Faridabad 1400 74
104113 Accepted-Manufacturing TREAD / SHOULDER SEP 45.00 8,193.35 Jun-19 Oct-18 Faridabad 1400 74
104113 Accepted-Special K TURN UP SEPARATION 15.00 12,947.20 Aug-19 Feb-19 Chandigarh 1400 72
105627 Accepted-Manufacturing BELT EDGE / BELT SEP 60.00 6,184.40 Aug-19 Dec-18 Chandigarh 1400 74
104113 Accepted-Special K TURN UP SEPARATION 55.00 6,854.40 Aug-19 Nov-17 Chandigarh 1400 76
104112 Accepted-Special K TURN UP SEPARATION 60.00 6,094.40 Aug-19 Jan-19 Chandigarh 1400 76
104113 Accepted-Special K TURN UP SEPARATION 70.00 4,569.60 Aug-19 Jan-17 Chandigarh 1400 77

Nominal data-???
Ranked data-???
Interval data-???
Ratio data-???

32
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Excel based HW-01: For Chapter-1 & 2


Download HW-01 at Taxila, below Topic-2.
HWs are NOT to be submitted. Solutions of HWs will be available at Taxila, after 2 weeks.
33
HW-01

Contents
1. Graphs- Bar charts and Pie charts
2. Histograms
3. Frequency tables
4. Contingency tables
5. Stem-and-Leaf diagram
6. Line charts
7. Fun (Application of Histograms)
8. More Fun (Application of Histograms)
9. Big Fun (Application of Histograms)
10. Medical Image processing- (Application of Histograms)

BITS Pilani, Pilani Campus


BITS Pilani
Pilani Campus

Nice to know

35
The origin of …..

• Bar chart, Pie chart, and Line/Time series chart were first introduced by William Playfair (1759-1823), from Scotland.
• Polar chart, a special type of Pie chart, was popularized by Florence Nightingale.

[Image: Polar chart by Florence Nightingale, 1858. See Wikipedia.]

Structured and Unstructured data

Unstructured data in Marketing


From www

BITS Pilani, Pilani Campus


Take 5… Pie charts

38
BITS Pilani, Pilani Campus
Take 5… Bar charts

Bar chart- The position of categories on x (or y) axis can be interchanged.


39
BITS Pilani, Pilani Campus
Take 5… Histograms

Histogram- The position of class intervals on x axis cannot be interchanged.


40
BITS Pilani, Pilani Campus
Bar graph vs Histogram

1. X axis in a Bar chart has Categories; X axis in a Histogram has Interval or Ratio data.
2. There should not be a gap between the ‘bars’ in a Histogram.

Take 5… Stem-and-Leaf plots

42
BITS Pilani, Pilani Campus
Take 5… Line (Time Series) charts

43
BITS Pilani, Pilani Campus
Additional charts

Onion chart Word Cloud Waterfall chart Radar chart

44
BITS Pilani, Pilani Campus
MS Excel can also plot three variable
charts

Date High Low Close


Aug 06, 2020 11,256.80 11,127.30 11,200.15
Aug 05, 2020 11,225.65 11,064.05 11,101.65
Aug 04, 2020 11,112.25 10,908.10 11,095.25
Aug 03, 2020 11,058.05 10,882.25 10,891.60
Jul 31, 2020 11,150.40 11,026.65 11,073.45
Jul 30, 2020 11,299.95 11,084.95 11,102.15
Jul 29, 2020 11,341.40 11,149.75 11,202.85
Jul 28, 2020 11,317.75 11,151.40 11,300.55
Jul 27, 2020 11,225.00 11,087.85 11,131.80
Jul 24, 2020 11,225.40 11,090.30 11,194.15
Jul 23, 2020 11,239.80 11,103.15 11,215.45
Jul 22, 2020 11,238.10 11,056.55 11,132.60
Jul 21, 2020 11,179.55 11,113.25 11,162.25
Jul 20, 2020 11,037.90 10,953.00 11,022.20
Jul 17, 2020 10,933.45 10,749.65 10,901.70

Here three variables are- High, Low and Close prices of a stock.
45
BITS Pilani, Pilani Campus
Bad or Incorrect charts…1/2

• Pie chart would have been better?
• 8 variables plotted, difficult to analyse.
• 3D, is it adding value?
• X axis has Part numbers, and not time. Hence it is not a Line chart.
• Since Nominal data (name of the player) is on the horizontal axis, a gap is required between the columns.



Bad or Incorrect charts… 2/2

• Since dimensions of the parts are measured on the Ratio scale, it should be drawn as a Histogram and not as a bar chart; no gap between the bars.
• In a Scatter chart both variables are random. Here the X axis has Part numbers, which are non-random variables. Hence these are not Scatter charts.
• Indeed colorful, but difficult to analyze.



How many class intervals for a frequency table?

• Choosing the number of class intervals (22 age groups here) is a tradeoff between loss of information and manageability.
• Few class intervals: easy to grasp, but a lot of information is lost.
• Several class intervals: less loss of information, but difficult to grasp.
• 4-8 class intervals are ok.
• Census of India-2011 summarizes huge data (121+ crore persons) in 22 class intervals.

Age distribution- India 2011

No      Age group   Male           Female         Total
1       0–4         5,86,32,074    5,41,74,704    11,28,06,778
2       5–9         6,63,00,466    6,06,27,660    12,69,28,126
3       10–14       6,94,18,835    6,32,90,377    13,27,09,212
4       15–19       6,39,82,396    5,65,44,053    12,05,26,449
5       20–24       5,75,84,693    5,38,39,529    11,14,24,222
6       25–29       5,13,44,208    5,00,69,757    10,14,13,965
7       30–34       4,46,60,674    4,39,34,277    8,85,94,951
8       35–39       4,29,19,381    4,22,21,303    8,51,40,684
9       40–44       3,75,45,386    3,48,92,726    7,24,38,112
10      45–49       3,21,38,114    3,01,80,213    6,23,18,327
11      50–54       2,58,43,266    2,32,25,988    4,90,69,254
12      55–59       1,94,56,012    1,96,90,043    3,91,46,055
13      60–64       1,87,01,749    1,89,61,958    3,76,63,707
14      65–69       1,29,44,326    1,35,10,657    2,64,54,983
15      70–74       96,51,499      95,57,343      1,92,08,842
16      75–79       44,90,603      47,41,900      92,32,503
17      80–84       29,27,040      32,93,189      62,20,229
18      85–89       11,20,106      12,63,061      23,83,167
19      90–94       6,52,465       7,94,069       14,46,534
20      95–99       2,94,759       3,38,538       6,33,297
21      100+        2,89,325       3,16,453       6,05,778
22      Unknown     23,72,881      21,16,921      44,89,802
Total               62,32,70,258   58,75,84,719   1,21,08,54,977

Mutually exclusive and Collectively exhaustive

Raw data: 10 10 20 20 20 30 40 50 50

Class Interval Frequency
1-10 2
11-20 3
21-30 1
31-40 1
41-50 2
Total 9

Mutually exclusive
• Class intervals are mutually exclusive when each data value can be placed in only one class interval. In the table below, data value 20 can be placed in two class intervals (11-20 and 11-30). Hence these two class intervals are not mutually exclusive.

Class Interval Frequency (not mutually exclusive)
1-10 2
11-20 3
11-30 4
31-40 1
41-50 2

Collectively exhaustive
• Class intervals are collectively exhaustive when every data value can be placed in some class interval. In the table below (From www), data value 50 cannot be placed in any class interval. Hence the class intervals are not collectively exhaustive.

Class Interval Frequency (not collectively exhaustive)
1-10 2
11-20 3
21-30 4
31-40 1
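
The two checks on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `check_class_intervals` is made up here, intervals are assumed to be inclusive on both ends, and mutual exclusivity is tested only against the data values actually present.

```python
def check_class_intervals(intervals, data):
    """Check inclusive (low, high) class intervals against the data values.

    Returns (mutually_exclusive, collectively_exhaustive).
    """
    # For every data value, count how many intervals it falls into.
    hits = [sum(low <= x <= high for low, high in intervals) for x in data]
    mutually_exclusive = all(h <= 1 for h in hits)       # no value fits two intervals
    collectively_exhaustive = all(h >= 1 for h in hits)  # every value fits somewhere
    return mutually_exclusive, collectively_exhaustive


raw_data = [10, 10, 20, 20, 20, 30, 40, 50, 50]
proper = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
overlapping = [(1, 10), (11, 20), (11, 30), (31, 40), (41, 50)]
incomplete = [(1, 10), (11, 20), (21, 30), (31, 40)]

for name, intervals in [("proper", proper),
                        ("overlapping", overlapping),
                        ("incomplete", incomplete)]:
    print(name, check_class_intervals(intervals, raw_data))
```

The overlapping set fails the first check because 20 fits two intervals, and the incomplete set fails the second because 50 fits none.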

For doctors (and their patients)


Summary tables- One, Two, and Three
Categorical variables

Frequency table
One nominal variable- Blood Group

Contingency table
Two nominal variables- Blood Group and Ethnicity

Contingency table
Three nominal variables- Blood group, Gender and Rh
Charts of Categorical data

1 Pie chart
2 Column chart
3 Side-by-side chart
4 Stacked horizontal chart
5 Stacked column chart
Two numerical variables charts

Time Series charts
Scatter plot (Chest-G vs. Length)

HW: Is this patient Healthy?
Additional charts

Venn diagram
Network diagram
Tree (Mendel)
Data on Map

Next Chapter
Chapter-3: Numerical Descriptive Measures
Quantitative Methods

Lecture-3 30Jan’22. 10.30am-12.30pm.

BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

Items of interest
1. Excel-3 will be held on 1 Feb, Tuesday 7-9 pm.
Recording of the session will be available.
2. Solution of HW-01 is now available at Taxila,
under Topic-2.
3. HW-2: Numerical Descriptive Measures is
available at Taxila, under Topic-2. HWs are
not to be submitted. Solutions of HWs are
posted after 2 weeks, at Taxila.
4. Post your messages only on Discussion Forum
at Taxila, and not at Impartus.
5. PPT of Chapter-03 is available in advance at
Taxila, under Topic-1.

The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

Concept first… calculations later.


Answer first..… calculations later.

Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan

Mid-Term Test Syllabus


1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan
1 4 Basic Probability
1 5 Discrete Probability Distributions
1 6 The Normal Distribution
1 7 Sampling Distributions
Textbook #1
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
Topics

Relevant PRLs
Numerical Descriptive Measures
1. Central tendency
2. Variation
3. Shape
4. Exploring numerical data
5. Covariance and Coefficient of correlation
(will be covered later, with Chapter-12)

Summary measures

Data: Production/Orders/Accidents/Fires/Runs/Income/ # of Beds/Rooms….


500 200 425 425 275 375 200 350 425 200
350 425 425 425 375 375 375 500 375 425
350 200 400 500 275 500 200 400 275 200
425 350 425 425 200 425 375 350 200 500
425 500 375 200 200 375 500 425 500 425

Chapter summary-
Numerical Descriptive Measures
Numerical Descriptive Measures
Innings played 329
Total score 15917
Outliers Mean 48
Stdev 51
Median 32
Mean
Minimum 0
Median
Quartile 1, Q1 10
Quartile 2, Q2 32
Quartile 3, Q3 73.5
15 15
40 28
122
31
88
92
113
67
52
45
20
69
41
86
1
0
248
36
4 56
82
88
10
105
16
61
36
73
32
7
37
Maximum 248
59 148 104 177 4 4 155 16 94 1 68 143 107 41 21
8
41
6
17
71
142
74
10 83
0
136
97
8
15
1
12
34
1
37 52
34
44
62
15
47
109 7
13
6
80
15
32
1
IQR, Q3 - Q1 63.5
35 114 96 0 143 6 21 22 92 0 52 14 154 12 100 146 8 10
57 5 6 42 139 29 20 88 193 44 41 63 12 37 106 14 25 74 CoV 1.05
0 0 43 7 8 0 18 54 241 16 0 71 103 8 34 13
24
88
11
111
11
34
18
2
23
15
9
53 122
103
26
35
43
60
194
22
109
64
14
13
153
11
5
84
203
12
16
19
17
Skewness 1.5
5 1 85 61 148 124 39 90 16 2 16 101 13 160 41 56 27
6 179 36 13 18 201 176 36 8 23 31 0 54 1 13 Kurtosis -1.2
10 0 54 15 4 126 76 36 176 1 19 122 27 49 38 40
27 73 40 4 155 15 65 42 8 8 37 12 64 98 23 8
68 10 169 79 44 10 79 51 2 14 16 5 62 214 91 8
119 50 4 9 177 217 10 117 9 5 91 31 9 53 7 76 Boxplot
21 9 0 35 31 15 126 0 32 55 23 1 6 4 40 76 5
11 165 52 9 34 61 17 0 8 3 26 82 14 100 12 38 2
16 78 2 7 7 0 74 8 7 20 16 1 13 40 13 94 81
7 62 24 15 47 116 36 0 55 32 28 1 49 53 3 13

Runs scored in Tests by Tendulkar (played 347 and batted in 329 innings)

1. Central tendency
(Mean, Median and Mode)

Central tendency measures for raw data

1. Arithmetic Mean, AM (average) = sum of the values / number of observations.
AM = $\frac{1}{N}\sum_{i=1}^{N} x_i$
1 3 4 8 9: AM = (1+3+4+8+9)/5 = 25/5 = 5.

2. Median- value of the middle observation, after sorting the data in ascending order.
1 3 4 8 9: Median value = 4.
11 22 30 40 50 60 700000: Median value = 40.
1 3 4 8 9 12: With an even number of observations, Median value = (4+8)/2 = 6.

3. Mode- value of the most frequently occurring observation in the data.
1 2 2 3 4 5: Mode value = 2.
11 22 30 40 50 50 50 50 66 66 77: Mode value = 50.
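
As a quick cross-check of the three measures above, here is a minimal sketch using Python's standard `statistics` module (the course itself uses MS Excel; the variable names are only for illustration). `multimode` returns a list, because a dataset can have more than one mode.

```python
from statistics import mean, median, multimode

am = mean([1, 3, 4, 8, 9])                               # (1+3+4+8+9)/5
median_odd = median([1, 3, 4, 8, 9])                     # middle value of 5 observations
median_extreme = median([11, 22, 30, 40, 50, 60, 700000])  # unaffected by the extreme value
median_even = median([1, 3, 4, 8, 9, 12])                # average of the two middle values
modes_a = multimode([1, 2, 2, 3, 4, 5])
modes_b = multimode([11, 22, 30, 40, 50, 50, 50, 50, 66, 66, 77])

print(am, median_odd, median_extreme, median_even, modes_a, modes_b)
```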
When to use Median?

1. Median is preferred when the mean value gets highly 'distorted' by a few
extreme values, as in the data given below (Mean = Sum/N = 1566/9 = 174, a poor
measure of the centre of the data).

2 5 6 7 8 9 11 18 1500

Distribution of wealth is known to be highly distorted (skewed)- a few individuals
have high wealth, as in the above data. Hence median is preferred over mean.
Distributions of income and salary are also skewed, hence median income and
median salary are preferred over mean income and mean salary.

2. If the mean cannot be computed, as in the case of the Olympic race here (Men's
5000 meters, 2016 Summer Olympics), then the median can be used.

3. Median is used in the 5-number summary and Boxplot (later in this chapter) to
spot whether a frequency distribution is symmetric.
When to use Mode?

 Mode- Most frequently occurring value in the data.

1 2 2 3 4 5 Unique modal value, 2.


1 2 2 3 4 5 5 6 Two modal values, 2 and 5.
1 2 3 4 5 6 No modal value.

• Usage of Mode as a measure of central tendency is rare. It is mostly used in
voting- the candidate who secures the largest number of votes is declared the
winner.
• KBC participants generally choose the Mode- the option (A, B, C or D) with the
largest number of votes in the audience poll.
Central tendency measures (raw data)
Spending, GradSurvey dataset: After sorting
200 400 400 500 550
250 400 425 500 600
300 400 450 500 600
300 400 450 500 600
350 400 450 500 600
350 400 450 500 600
350 400 450 500 650
350 400 450 525 700
360 400 500 550 800
375 400 500 550 1000

Mean 470.7 (=23535/50)
Median 450
Mode 400 (11 times)
Central Tendency Measures-
Raw vs. Grouped data
Raw data: Sorted
200 400 400 500 550
250 400 425 500 600
300 400 450 500 600
300 400 450 500 600
350 400 450 500 600
350 400 450 500 600
350 400 450 500 650
350 400 450 525 700
360 400 500 550 800
375 400 500 550 1000

Grouped data
The observation 500 is considered in Class Interval 300-500 and so on.
Class Interval Freq. Cum. Freq.
100-300 4 4
300-500 33 37
500-700 11 48
700-900 1 49
900-1100 1 50
Total 50

Measure | Raw data | Grouped data
Mean | 470.7 (=23535/50) | 448.0
Median | 450 | 427.3
Mode | 400 (11 times) | 413.7

Notice that the Mean, Median and Mode values for Raw data and Grouped data are not equal.
Mean of Grouped data- Computation

Class Interval, CI Freq., f MidPoint, X f*X


100-300 4 200 800
300-500 33 400 13200
500-700 11 600 6600
700-900 1 800 800
900-1100 1 1000 1000
Total Freq. 50 Sum 22400
Mean = Sum / Total Freq. = 22400/50 = 448.0

Computation of Mean of Grouped data is in the syllabus.


Computation of Median and Mode of Grouped data is not in the syllabus.
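
The grouped-mean table above can be reproduced with a short Python sketch. Each class interval is represented by its midpoint; the variable names are illustrative and not part of the course material.

```python
# (low, high) class interval and its frequency, taken from the table above.
classes = [((100, 300), 4), ((300, 500), 33), ((500, 700), 11),
           ((700, 900), 1), ((900, 1100), 1)]

total_freq = sum(freq for _, freq in classes)
# Sum of f * X, where X is the midpoint of each class interval.
weighted_sum = sum(freq * (low + high) / 2 for (low, high), freq in classes)
grouped_mean = weighted_sum / total_freq

print(total_freq, weighted_sum, grouped_mean)
```

The result differs from the raw-data mean (470.7) because every observation in a class is treated as if it sat at the class midpoint.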

What is the difference?


2. Variation

Variation in daily life

Both charts plot 10 observations with the same total but different variation.

Chart 1 (values vary): 3 6 4 2 4 1 4 2 1 3. Total = 3 + 6 + 4 + 2 + 4 + 1 + 4 + 2 + 1 + 3 = 30.
Chart 2 (values constant): 3 3 3 3 3 3 3 3 3 3. Total = 3 x 10 = 30.

Following may mean high variation

High volatility
High fluctuations
Low predictability
High risk
Not steady
High uncertainty
Highly uneven
Low reliability
High noise
Poor quality
High vibrations
High contrast (Image)
Highly inconsistent
Measuring variability

Most important slide of this course

Learn, understand, and master-

1. Error

2. Frequency/Probability distribution

3. Variation

4. Correlation

Frequency distribution (example):
Class Interval, CI Freq.
100-300 4
300-500 33
500-700 11
700-900 1
900-1100 1
Total 50
Most important formula of this course

Error from mean

Error = Observed value – Mean of the dataset

Errors of 1, 2, 3, 4, 5? Mean = 15/5 = 3.

Error from mean of-
1. 1 - 3 = -2
2. 2 - 3 = -1
3. 3 - 3 = 0
4. 4 - 3 = 1
5. 5 - 3 = 2

Error from mean is also called Deviation, Residual, Noise, Innovation (in Kalman filter).
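
A minimal Python sketch of the error-from-mean calculation for the data 1, 2, 3, 4, 5 (variable names are illustrative). It also shows that the errors always add up to zero, which is why they must be squared or made absolute before being averaged.

```python
data = [1, 2, 3, 4, 5]
mean_value = sum(data) / len(data)

# Error = Observed value - Mean of the dataset
errors = [x - mean_value for x in data]

print(mean_value, errors, sum(errors))
```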
Measuring variation

1. Range = Maximum - Minimum
2. Variance, population = σ² = 1/N * ∑ (xi - Mean)²
3. Standard deviation, population = σ
4. Coefficient of Variation, population = σ/Mean
5. Variance, sample = s² = 1/(N-1) * ∑ (xi - Mean)²
6. Standard deviation, sample = s
7. Coefficient of Variation, sample = s/Mean
8. Mean absolute deviation = 1/N * ∑ |xi - Mean|
9. Z score = (xi - Mean)/σ
10. Quartiles (Q1, Q2, Q3): values below which the smallest 25%, 50% and 75% of observations lie.
11. Inter-quartile range = Q3 - Q1
12. 5-number summary: Minimum, Q1, Q2, Q3, Maximum.
13. Boxplot (called Box and Whisker chart in MS Excel 2019): plot of the 5-number summary.

Variance and Standard deviation


(Population)

Higher variation- Red or Blue?

Obs No Blue- Value Red- Value


1 1 3
2 5 3
3 3 1
4 4 3
5 2 5

Range 4 4
Range= Maximum-Minimum.

According to Range, both datasets have equal variation.

Variance (σ²) and Standard deviation (σ)

Blue Data- 1, 5, 3, 4, 2.
Mean = 15/5 = 3

1 5 3 4 2 Data
(1-3) (5-3) (3-3) (4-3) (2-3) 1. Error from Mean.
-2 2 0 1 -1 Simplifying
4 4 0 1 1 2. Square the Error.

Sum = 4 + 4 + 0 + 1 + 1 = 10 3. Sum of Square of Errors (SSE).

Variance = Sum/5 = 10/5 = 2 4. Mean of Square of Errors. σ² = 1/N * ∑ (xi - Mean)²

Standard deviation = +√2 = 1.41 5. + Square Root of Mean of Square of Errors (RMSE).
CW: Variance (σ²) and Standard deviation (σ)

Red Data- 3, 3, 1, 3, 5.
Mean = 15/5 = 3

3 3 1 3 5 Data
(3-3) (3-3) (1-3) (3-3) (5-3) 1. Error from Mean.
0 0 -2 0 2 Simplifying
0 0 4 0 4 2. Square the Error.

Sum = 0 + 0 + 4 + 0 + 4 = 8 3. Sum of Square of Errors (SSE).

Variance = Sum/5 = 8/5 = 1.6 4. Mean of Square of Errors. σ² = 1/N * ∑ (xi - Mean)²

Standard deviation = +√1.6 = 1.265 5. + Square Root of Mean of Square of Errors (RMSE).
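
The five steps worked by hand for the Blue and Red data can be sketched in Python as below. The function names are illustrative; the sample variance (divide by N-1) is included only to contrast it with the population variance (divide by N) used in these two slides.

```python
from math import sqrt


def population_variance(data):
    """Mean of squared errors from the mean (divide by N)."""
    mean_value = sum(data) / len(data)
    return sum((x - mean_value) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sum of squared errors from the mean, divided by N - 1."""
    mean_value = sum(data) / len(data)
    return sum((x - mean_value) ** 2 for x in data) / (len(data) - 1)


blue = [1, 5, 3, 4, 2]
red = [3, 3, 1, 3, 5]

for name, data in [("Blue", blue), ("Red", red)]:
    var_pop = population_variance(data)
    print(name, var_pop, round(sqrt(var_pop), 3), sample_variance(data))
```

Both datasets have the same Range (4), yet the Blue data has the larger variance and standard deviation.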
Higher variation- Red or Blue?

Obs No Blue- Value Red- Value


1 1 3
2 5 3
3 3 1
4 4 3
5 2 5

Range 4 4
Stdev 1.41 1.26
Range= Maximum-Minimum.
Standard deviation = SQRT(1/N * ∑ (xi - Mean)²).

According to Range, both datasets have equal variation.


According to standard deviation, Blue dataset has higher variation.

Remaining chapter will be covered in the next session.

Quantitative Methods

Lecture-4 6Feb’22. 10.30am-12.30pm.

BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

Items of interest
1. Post your messages only on Discussion
Forum at Taxila, and not at Impartus.

2. PPT of Chapter-03 is available in advance at


Taxila, under Topic-1.

The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

Concept first… calculations later.


Answer first..… calculations later.

Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan

Mid-Term Test Syllabus


1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability
1 5 Discrete Probability Distributions
1 6 The Normal Distribution
1 7 Sampling Distributions
Textbook #1
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
Topics

Relevant PRLs
Numerical Descriptive Measures
1. Central tendency

To be done today
2. Variation (partly covered in previous session)
3. Shape
4. Exploring numerical data
5. Covariance and Coefficient of correlation (will
be covered later, with Chapter-12)


2. Variation

Measuring variation

1. Range = Maximum - Minimum
2. Variance, population (entire data) = σ² = 1/N * ∑ (xi - Mean)²
3. Standard deviation, population = σ = RMSE
4. Coefficient of Variation, population = σ/Mean
5. Variance, sample (partial data) = s² = 1/(N-1) * ∑ (xi - Mean)²
6. Standard deviation, sample = s
7. Coefficient of Variation, sample = s/Mean
8. Mean absolute deviation = 1/N * ∑ |xi - Mean|
9. Z score = (xi - Mean)/σ
10. Quartiles (Q1, Q2, Q3): values below which the smallest 25%, 50% and 75% of observations lie.
11. Inter-quartile range = Q3 - Q1
12. 5-number summary: Minimum, Q1, Q2, Q3, Maximum.
13. Boxplot (called Box and Whisker chart in MS Excel 2019): plot of the 5-number summary.
Higher variation- Red or Blue?

Obs No Blue- Value Red- Value


1 1 3
2 5 3
3 3 1
4 4 3
5 2 5

Range 4 4
Stdev 1.41 1.26
Range= Maximum-Minimum.
Stdev = SQRT(1/N * ∑ (xi - Mean)²).

According to Range, both datasets have equal variation.


According to Standard deviation, Blue dataset has higher variation.

HW: DIY
Range, Variance, Standard deviation?

A: 10 10 10 10 10 10.

B: -30 40 -30 40 -30 40.

C: -60 60 -60 60.

D: 0 20 0 20 0 20 0 20 0 20.

Dataset | Range | Variance | Standard Deviation
A | | |
B | | |
C | | |
D | | |

Do only by hand, do not use calculator or computer.
Range- Uses and shortcomings
Uses
• Stock price- minimum and maximum in a day.
• Ambient temperature- minimum and maximum in a day.
• Blood pressure- high and low within a few minutes.
• R-chart (Range) in Statistical Process Control (SPC).
• Range is computed from only two observations. Hence, it is easy to compute.

[Charts: Stock price during a day; Ambient temperature during a day; Blood pressure.]

Shortcomings
• Since Range is computed from only two observations, and not from all
observations, it does not capture the variation well when the data has a few
extreme values, as in the following case-
31 41 5 9 26 53 58979 3 23 ...
Range = (58979-3) = 58976.


An application of standard deviation
Anthropometric data

Anthropometric data is used to design tables, chairs, seats, knobs, handles,
garments, dashboards, gear shifting levers, steering wheels, brake pedals, etc.
https://www.slideshare.net/hi_4m_khi/anthropometric-measurements-majedawad

Variations in the dimensions (SD above) are used to decide how many sizes of
garments (tables, seats, handles, etc.) are to be designed.
ISC Exam results 2020- A school’s analysis

https://aniruddhadeb.blogspot.com/2020/07/


Coefficient of Variation (CoV)

Coefficient of Variation, CoV

London Olympics 2012. Only top 8 ranks considered.
Measure | 200m Men's Final, Secs | Marathon Men, Secs | Javelin Men's Final, Meters | Comparison
Min | 19.32 | 7681 | 80.22 |
Max | 20.69 | 7937 | 84.58 |
Mean | 19.99 | 7830 | 82.60 |
Range | 1.37 | 256 | 4.36 | 200m < Marathon
Stdev | 0.45 | 91 | 1.36 | 200m < Marathon
CoV | 2.27 | 1.2 | 1.65 | Marathon < Javelin < 200m

Measure | NSE | Sensex | Comparison
Min | 16614 | 55822 |
Max | 18477 | 61766 |
Mean | 17616 | 59096 |
Range | 1863 | 5944 | NSE < Sensex
Stdev | 457 | 1493 | NSE < Sensex
CoV | 2.6 | 2.5 | NSE ≈ Sensex, almost equal

• When the means of two datasets differ a lot, as in both the cases here, use CoV
to compare variation between them.
• CoV is dimensionless. Hence, CoV can be used to compare variation between very
different kinds of systems- say variation in the incomes and experience in years
of employees of a company; Javelin (meters) and Marathon (secs), etc.

CoV = Standard deviation/Mean * 100
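
The CoV row of the NSE/Sensex table can be reproduced from the Stdev and Mean rows with a small Python sketch (the helper name `cov_percent` is made up for illustration):

```python
def cov_percent(stdev, mean_value):
    """Coefficient of Variation, in percent: Standard deviation / Mean * 100."""
    return stdev / mean_value * 100


# Stdev and Mean values taken from the NSE / Sensex table on this slide.
cov_nse = cov_percent(457, 17616)
cov_sensex = cov_percent(1493, 59096)

print(round(cov_nse, 1), round(cov_sensex, 1))
```

Although the Sensex has a much larger Range and Stdev, dividing by the mean shows that the relative variation of the two indices is almost the same.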

Z score

Z score

How far is an observation from the mean, in terms of standard deviations.

Z = (ObservedValue - Mean)/Standard deviation = Error/Standard deviation

• Unlike Mean and Variance, Z score is not a summary measure of the dataset;
Z score can be computed for each data point.

Z scores of 1, 2, 3, 4, 5?
Mean = 15/5 = 3.
Standard deviation = 1.41.

Z score of 1 = (1-3)/1.41 = -1.41
Z score of 2 = (2-3)/1.41 = -0.71
Z score of 3 = (3-3)/1.41 = 0
Z score of 4 = (4-3)/1.41 = 0.71
Z score of 5 = (5-3)/1.41 = 1.41

Uses
• To identify Outliers (extreme values). Observations with Z score < -3 or > +3 are called Outliers.
• To read Normal distribution table (Chapter-6).
• To estimate Confidence Intervals (Chapter-8).
• To test Hypothesis (Chapter-9, 10, and 11).
Z score- Outlier

Z = (ObservedValue - Mean)/Stdev

Tendulkar's Test innings: Mean = 48, Stdev = 51.

+3 = (ObservedValue - 48)/51. Z = +3 is used to identify outlier scores.

ObservedValue = 3*51 + 48 = 201. Runs above 201 are Outliers (extreme).
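
The Z score calculations can be sketched in Python as follows. The function name is illustrative, the population standard deviation (divide by N) is assumed as in the worked example for 1, 2, 3, 4, 5, and the sketch does not handle a dataset whose standard deviation is zero.

```python
from math import sqrt


def z_scores(data):
    """Z score of every observation, using the population standard deviation."""
    n = len(data)
    mean_value = sum(data) / n
    sigma = sqrt(sum((x - mean_value) ** 2 for x in data) / n)
    return [(x - mean_value) / sigma for x in data]


rounded = [round(z, 2) for z in z_scores([1, 2, 3, 4, 5])]
print(rounded)

# Outlier cut-off at Z = +3 for a mean of 48 runs and a standard deviation of 51 runs.
upper_cutoff = 48 + 3 * 51
print(upper_cutoff)
```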

3. Shape of the data- Skewness

Typical shapes of frequency
distributions…1/2

1. Symmetric, bell shaped, high concentration in the middle.
2. Symmetric, less concentration in the middle.
3. Symmetric and flattest. There is no mode.
4. Negatively skewed, tail on left side.
5. Positively skewed, tail on right side.
6. Another positively skewed, tail on right side.

Degree of asymmetry is measured by Skewness.
In MS Excel: =Skew.P(DataRange)
• Negative value- Left skewed.
• Positive value- Right skewed.
• 0 value- Symmetric.

Calculation of Skewness is not in the course.
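
For the curious only, since the calculation is outside the course: one common definition of skewness is the mean of the cubed Z scores, using the population standard deviation. The sketch below assumes that definition (it is believed to be what Excel's SKEW.P computes, but treat that as an assumption); the function name and the three small datasets are illustrative.

```python
from math import sqrt


def population_skewness(data):
    """Mean of cubed Z scores, using the population standard deviation."""
    n = len(data)
    mean_value = sum(data) / n
    sigma = sqrt(sum((x - mean_value) ** 2 for x in data) / n)
    return sum(((x - mean_value) / sigma) ** 3 for x in data) / n


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 3, 4, 10]   # one large value pulls the tail to the right
left_tail = [1, 7, 8, 9, 10]    # one small value pulls the tail to the left

for name, data in [("symmetric", symmetric),
                   ("right tail", right_tail),
                   ("left tail", left_tail)]:
    print(name, round(population_skewness(data), 2))
```

The sign of the result matches the rule on the slide: zero for symmetric data, positive for a right tail, negative for a left tail.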
Typical shapes of frequency
distributions…2/2

Shape | Description | MMM relationship
1 | Symmetric, bell shaped, high concentration in the middle. | Mean=Median=Mode
2 | Symmetric, less concentration in the middle. | Mean=Median=Mode
3 | Symmetric and flattest. There is no mode. | Mean=Median
4 | Negatively skewed, tail on left side. | Mean<Median<Mode
5 | Positively skewed, tail on right side. | Mode<Median<Mean
6 | Another positively skewed, tail on right side. | Mode<Median<Mean

Frequency distribution | MMM relationship | Shapes
Negatively skewed | Mean < Median < Mode | 4
Symmetric | Mean = Median = Mode | 1, 2, 3
Positively skewed | Mode < Median < Mean | 5, 6

Shape of the data- Kurtosis

Flatness (or Peakedness) of a frequency distribution

Left distribution: Standard deviation = 20.5. Kurtosis = 0.67. No. of observations = 262.
Right distribution: Standard deviation = 27.0. Kurtosis = -0.47. No. of observations = 262.

Frequency distribution on the right is flatter.

Kurtosis measures Flatness (or Peakedness) of a frequency distribution, as
compared with the Normal distribution.
In MS Excel: =Kurt(DataRange)
• If positive number, more Peakedness than Normal distribution.
• If negative number, flatter than Normal distribution.
• If zero, Peakedness same as of Normal distribution.

Calculation of Kurtosis is not in the course.

4. Exploring numerical data


(Quartiles, 5-number summary, The Box plot)


Quartiles

Quartiles (from Quarter)

Dataset (sorted): 40, 50, 60, 80, 100, 110, 120, 180, 220, 300, 600, 700, 900, 910, 930 and 950.

• Divide the sorted data into 4 quarters- 25% observations in each quarter.
• Q1, First Quartile- Lowest 25% observations. Q1 = 90 = (80+100)/2.
• Q2, Second Quartile- Lowest 50% observations. Q2 = 200 = (220+180)/2.
• Q3, Third Quartile- Lowest 75% observations. Q3 = 800 = (900+700)/2.
• Inter-Quartile Range (IQR) = Q3-Q1. IQR = 800 - 90 = 710.
• Notice that Q2 = Median.
• Quartiles are used to study variation in the data, and to spot whether the distribution of data is symmetric.
• Quartiles are also used in 5-number summary and Boxplots.

5-number Summary

5-number summary

The dataset is summarized by the following 5 numbers-
1. Minimum
2. Q1- Quartile 1
3. Q2- Quartile 2
4. Q3- Quartile 3
5. Maximum

• 5-number summary is used to study variation in the data, and to quickly
establish whether distribution of data is symmetric.
• Used in Boxplots.

Dataset (sorted): 40, 50, 60, 80, 100, 110, 120, 180, 220, 300, 600, 700, 900, 910, 930 and 950.
Q1 = 90 = (80+100)/2, Q2 = 200 = (220+180)/2, Q3 = 800 = (900+700)/2.

5-number summary of above dataset- 40, 90, 200, 800 and 950.

The Boxplot

Boxplot

• A visual representation of 5-number summary.
• Put boxes over Q1 and Q2, and Q2 and Q3.
• Boxplot is used to study variation in the data, and to spot whether distribution of data is symmetric.
• Boxplot = 'Box and Whisker' in MS Excel 2019.

[Figure: Boxplot of the 16-observation dataset, marking Min = 40, Q1 = 90, Q2 = 200, Q3 = 800 and Max = 950.]
Boxplot, another example

2 4 50 60 65 70 72 100 110 112 118 120 12 observations, sorted.

5-number summary
1. Minimum: 2
2. Q1- 55 [=(50+60)/2]
3. Q2- 71 [=(70+72)/2]
4. Q3- 111 [=(110+112)/2]
5. Maximum:120
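
The quartile values in both worked examples (the 16-observation dataset and the 12-observation dataset) follow a "median of each half" rule. A minimal Python sketch of that rule is given below; the function name is illustrative, and for an odd number of observations it assumes the middle value is left out of both halves, which the slides do not specify. Spreadsheet quartile functions may interpolate differently and give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Minimum, Q1, Q2, Q3, Maximum using the median-of-halves rule."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return s[0], median(lower), median(s), median(upper), s[-1]


data_16 = [40, 50, 60, 80, 100, 110, 120, 180,
           220, 300, 600, 700, 900, 910, 930, 950]
data_12 = [2, 4, 50, 60, 65, 70, 72, 100, 110, 112, 118, 120]

print(five_number_summary(data_16))
print(five_number_summary(data_12))
```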

Typical Boxplots
Boxplot no. | Q1, Lowest 25% | Q2, Lowest 50% | Q3, Lowest 75% | All 100% | Comments
1 | 25 | 5 | 5 | 25 | Symmetric, narrow IQR
2 | 20 | 20 | 20 | 20 | Symmetric, wider IQR, narrower Q1
3 | 20 | 30 | 30 | 20 | Symmetric, very wide IQR, narrower Q1
4 | 35 | 15 | 15 | 35 | Symmetric, narrow IQR, wider Q1
5 | 10 | 40 | 40 | 10 | Symmetric, very wide IQR, very narrow Q1
6 | 25 | 5 | 5 | 25 | Symmetric, low (30) median
7 | 45 | 10 | 10 | 15 | Negative (left) skewed
8 | 20 | 5 | 10 | 45 | Positive (Right) skewed

[Figures: Boxplots 1 to 8.]

• This table gives the range within which lowest 25%, 50%, 75%, and 100% (All)
observations lie.
• In Fig 1, lowest 25% observations lie in a range of 25 (from 20 to 45), next 25%
observations are in a range of 5 (from 45 to 50), next 25% observations are in a
range of 5 (from 50 to 55), and the last 25% observations are in a range of 25
(from 55 to 80).

Covariance and Correlation


(will be done later with Chapter #12, Simple Linear Regression)


Nano Case study on Chapter-3


Olympics- 10,000m Women & Men
12 & 13 Aug, 2012.
Visualizing the races-
Position Men Women
1 1625 1757
2 1626 1773
3 1626 1783
4 1626 1794
5 1629 1808
6 1630 1813
7 1643 1826
8 1644 1827
9 1651 1874
10 1652 1875
11 1652 1887
12 1654 1888
13 1656 1889
14 1656 1892
15 1656 1893
16 1672 1896
17 1672 1896
18 1675 1896
19 1678 1897
20 1681 1904
21 1682 1911
22 1683 1913
23 1685 1917
24 1696 1918
25 1699 1924
26 1700 1926
27 1713 1928
28 1726 1929
29 1735 1930
30 1743 1932
31 1755 1958
32 1772.84 1959.08
Data taken from Wikipedia. Timings are in seconds.
Decimals shown only for the last position, # 32.
Visualizing the races-
Time series + Histograms + Polygons + Boxplots
Descriptive Numerical Measures
Measure | Men, M | Women, W
Average | 1675 | 1882
Median | 1672 | 1896
Mode | - | -
Minimum | 1625 | 1757
Maximum | 1773 | 1959
Range | 148 | 202
Variance | 1537 | 2882
Stdev. | 39 | 54
CoV, % | 2.3 | 2.9

5-Number summary | Men, M | Women, W
Minimum | 1625 | 1757
Q1 | 1649 | 1862
Q2 | 1672 | 1896
Q3 | 1697 | 1919
Maximum | 1773 | 1959

Q3-Q1 | 48 | 57
Skewness | 0.78 | -0.90
Kurtosis | -0.0002 | -0.1682

Notice, in the Polygons and the table-
Men timings- Positively skewed.
Women timings- Negatively skewed.

X- mean time to complete the race. MS Excel has been used.
HW-02:
Download Excel file at Taxila. Below Topic-2.

Contents
1. CentralTendency- Raw
2. DescriptiveMeasures
3. StemAndLeaf
4. BoxPlot
5. CentralTendency- Grouped
6. Variation1
7. Variation2
8. Z-score-1
9. Z-score-2
10. Shape
11. MoreFun Returns


Nice to know

The origin of….

In modern times
• Average- Early 1500s- A part of the cargo was thrown overboard to make the ship
lighter/safer/stable when it faced a bad storm. The losses were distributed
proportionately among the merchants whose goods were on the ship.

• Median- In the late 1500s median was often used in astronomy.

• Standard deviation- used by Karl Pearson, a British statistician, in 1893, for
the already known term 'root mean square error.'
Applications of mean

1. To estimate the missing value (Interpolation).

2. Image processing- to blur a digital image.

3. Signal processing- low pass filters; Moving average.

• Cricket: is this batsman 'in form'? Moving average.
• Car dashboard design: How many kilometers will the car run before it runs out
of fuel? Moving average.
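
A simple moving average, as mentioned above, is just the mean of the most recent few observations, recomputed as the window slides along the data. A minimal Python sketch follows; the function name and the five scores are hypothetical, chosen only to illustrate the idea.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical scores of a batsman in five consecutive innings.
recent_scores = [10, 50, 30, 70, 90]
averages = moving_average(recent_scores, 3)

print(averages)
```

A rising moving average suggests the batsman is coming into form, even though individual scores fluctuate.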
Median in other disciplines…

Geometry: Median is a line from a vertex of a triangle that divides its opposite
side equally (50%).

Political Science: Median-voter theorem.

Highway: Median- 50% on each side.
An application of Median…

https://www.bulbs.com/learning/arl.aspx

The rated life of light bulbs is Median value, and not Mean value.

Another application of Median…

Image Processing
Severely corrupted image on the left has salt and pepper
noise…it has been improved by using a Median filter.
https://en.wikipedia.org/wiki/Median_filter
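
To show why a median removes salt and pepper noise, here is a one-dimensional sketch in Python (real images are two-dimensional; the function name, window size and pixel values are illustrative assumptions). Each value is replaced by the median of itself and its neighbours, so an isolated extreme value never survives.

```python
from statistics import median


def median_filter(signal, window=3):
    """Replace each value by the median of its neighbourhood (shorter at the edges)."""
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        low = max(0, i - half)
        high = min(len(signal), i + half + 1)
        filtered.append(median(signal[low:high]))
    return filtered


# A flat row of pixels (value 10) with one 'salt' (255) and one 'pepper' (0) pixel.
noisy = [10, 10, 255, 10, 10, 0, 10, 10]
cleaned = median_filter(noisy)

print(cleaned)
```

A mean filter on the same row would smear the 255 and the 0 into their neighbours instead of removing them.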

Median…..

Income and wealth distribution has high inequality. Hence median, and not mean, is preferred.
Few properties of Mean, Median, Mode

Mean (AM)
• AM is unique. Example: 1 2 3 4. AM=2.5.
• Change in the value of any observation always affects AM. Example: 1 2 3 4. AM=2.5; 1 2 3 8. AM=3.5.
• AM cannot be computed if any value is not available. Example: 1 2 3 X. AM=?

Median
• Median may not be unique. Example: 1 2 8 18. Median can be 3, 4 or 5.
• Change in the values may not affect Median. Example: 1 2 3 4 5 6 7 8 9 and 1 2 3 4 5 6 7 8 10000000 have the same Median.
• Median may be computed even if values of all observations are not available. Example: 1 2 3 4 5 6 7 9 X. The Median will remain the same even if the largest value X is unknown.

Mode
• Mode may not exist. Example: 11 12 13 14.
• Multiple modes may exist. Example: 11 12 13 14 15 15 16 16.
• Change in the values may not affect Mode. Example: Blue Green Red Red Red Yellow. Mode remains Red even if Yellow is replaced by Blue.
• Mode may be computed even if values of all observations are not available. Example: 1 2 3 3 3 3 4 X. The Mode will remain the same even if value X is unknown.
High/Low variation
Measure | Low variation | High variation
Temperature | Human body | Ambient
Temperature | Mumbai | Delhi
Travel comfort | Train | Horse
Train punctuality | Mumbai local | Passenger train
Mail punctuality | Speedpost | Ordinary mail
Surface smoothness | Machine made | Handloom
Pill size | Homeopathy | Allopathy
Size | Chilli powder | Chilli
Voltage | DC | AC
Age | Class 10 students | WILP students
Trade price | Government bonds | Stocks
Salary | Government employees | Private sector employees
Income | Salaried employees | Businessmen
Runs scored | Rahul Dravid | Chris Gayle
Straightness | Potato chips | Kurkure
Taste | Appy | Fresh apple juice
Unevenness | Straight line | Fractal
Movie stories | Bollywood | Hollywood
Phone number | 99999 99999 | 98765 43210
Brand name | MMM (3M) | TCS
Height above MSL in the city | Chennai | Ooty
Road surface | Before monsoon | During monsoon
Smoothness | Silk | Cotton
Precision | CNC machine | Traditional machine tool
Surface finish | Lapping | Grinding
Electronic signal | Music | Noise
Vibration | After repair | Before repair
HW: Rank these graphs on their variations
[Figure: six column charts, numbered 1 to 6; vertical axis from 0 to 100, lettered categories on the horizontal axis.]

Standard deviations- 1. 0   2. 10   3. 30   4. 14.98   5. 25.97   6. 25.97
Ranked on Range- 1 < 2 < 4 < 3 < 5&6.
Ranked on Stdev.- 1 < 2 < 4 < 5&6 < 3.
Standard deviation = RMSE
Standard deviation- Image processing

Above three standard deviations (σ) have been computed from the histograms of their digital images.
Low standard deviation- Low contrast image; High standard deviation- High contrast image.
Standard deviation- Signal processing

HW: Rank on variation.


https://www.mdpi.com/2076-3417/9/22/4938/htm



Standard deviation- Batting

Test Cricket
Dravid Tendulkar Sehwag
Total score 13289 15917 8229
Mean 46 48 47
Standard deviation 48 51 58

HW: What do the differences in the standard deviations indicate?

Measuring variation of a dataset

1. Mean Error appears to be a natural candidate for measuring variation of a
dataset. But it does not work, because the mean error of a dataset is always zero.
Mean Error = 1/N * ∑ (xi - Mean)

2. Mean Absolute Error- works, but not used very often.
Mean Absolute Error = 1/N * ∑ |xi - Mean|

3. Mean Square Error (or Variance)- works and used most frequently.
Mean Square Error = σ² = 1/N * ∑ (xi - Mean)²
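
The three candidate measures can be compared side by side on the Blue dataset (1, 5, 3, 4, 2) used earlier in this chapter. This is a minimal Python sketch with illustrative variable names:

```python
blue = [1, 5, 3, 4, 2]
n = len(blue)
mean_value = sum(blue) / n
errors = [x - mean_value for x in blue]

mean_error = sum(errors) / n                            # positive and negative errors cancel
mean_absolute_error = sum(abs(e) for e in errors) / n   # signs removed with |.|
mean_square_error = sum(e ** 2 for e in errors) / n     # signs removed by squaring (variance)

print(mean_error, mean_absolute_error, mean_square_error)
```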


Six-Sigma (6 σ)

• Six-Sigma (6σ) is a set of techniques and tools for process improvement.

• The Six-Sigma concept was introduced by American engineer Bill Smith while
working at Motorola in 1986.

• A Six-Sigma process is one in which 99.99966% of all opportunities to produce
some feature of a part are statistically expected to be free of defects.

• Sigma (σ) refers to standard deviation.

From Wikipedia.
HW…. A tale of 3 exams

Exam | Mean | Stdev | Skewness | Kurtosis | No. of observations
A | 105 | 24.5 | -1.3 | 1.26 | 262
B | 75 | 27 | 0 | 0.67 | 262
C | 45 | 24.5 | 1.3 | 1.26 | 262

What do the results of these 3 exams indicate- in terms of
a) Difficulty of the exam- A, B or C?
b) Copying happened in exam A, B or C?
c) Correction was strict/liberal in exam A, B or C?
d) Which exam is of English, Math, History?

Hint: Use Mean, Stdev, Skewness, Kurtosis, etc.
Take 5… Quartiles

Even number (10) of observations. Odd number (7) of observations .

54
BITS Pilani, Pilani Campus
Take 5… Boxplots

From Husband-Wife
169 pairs dataset,
available at Taxila.

55
BITS Pilani, Pilani Campus
Take 5… Outliers

Outlier on a scatter chart Outlier on a bar chart Outliers on a Histogram

Identifying Outliers using Quartiles- An observation is considered to be an outlier (extreme value) if its value is-
• Smaller than Q1 - 1.5*IQR, or
• Larger than Q3 + 1.5*IQR.
Here IQR = Q3 - Q1 is the interquartile range.
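
The quartile rule can be sketched in a few lines. This is a minimal illustration on hypothetical data; it assumes the "inclusive" quartile convention of Python's `statistics.quantiles`, which may differ slightly from the quartile rule used in the textbook or in Excel.

```python
from statistics import quantiles

def iqr_fences(data):
    """Return (lower, upper) outlier fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """Return the observations lying outside the fences."""
    lower, upper = iqr_fences(data)
    return [x for x in data if x < lower or x > upper]

# Hypothetical observations; 55 is far away from the rest.
data = [10, 12, 13, 14, 15, 16, 18, 19, 20, 55]

lower, upper = iqr_fences(data)
outliers = find_outliers(data)
print("Fences:", lower, upper)
print("Outliers:", outliers)
```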

Outliers on a BoxPlot Outliers on a BoxPlot

BITS Pilani, Pilani Campus


HW

A B C D

1. Which archer should be selected for Paris Olympics-2024?


2. Comment on the performance of archers- in terms of mean and standard deviation.
3. Rank the archers.
4. Which archer may require least training to improve his performance?
5. Identify Eklavya.

57
BITS Pilani, Pilani Campus
Pre- Tokyo Olympics race

Draw speed vs. time graph(s) of this race.

Express the moral of the story in terms of Variance (or Standard deviation) and Mean.

58
BITS Pilani, Pilani Campus
From mythology….

Median, Quartiles, and Percentiles.

59
BITS Pilani, Pilani Campus
Formulas

1. Mean, μ = 1/N * ∑ xi (sum over i = 1 to N)

2. Variance, σ² = 1/N * ∑ (xi - Mean)²

3. Standard deviation, σ = √Variance

4. Coefficient of variation, CoV = (Standard deviation / Mean) * 100

5. Z score = (Observed value - Mean) / Standard deviation
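
A minimal sketch of the five formulas, applied to a hypothetical list `values` (not a course dataset). It uses the population (1/N) form of the variance, as on the slide; the function names are illustrative.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance: 1/N * sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    return math.sqrt(variance(xs))

def coefficient_of_variation(xs):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev(xs) / mean(xs) * 100

def z_score(observed, xs):
    """Distance of an observation from the mean, in standard deviations."""
    return (observed - mean(xs)) / std_dev(xs)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

print("Mean     =", mean(values))
print("Variance =", variance(values))
print("Std dev  =", std_dev(values))
print("CoV (%)  =", round(coefficient_of_variation(values), 2))
print("Z of 9   =", z_score(9, values))
```

Because CoV is a ratio, it allows datasets with different means or units to be compared on variation.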

60
BITS Pilani, Pilani Campus
Mean- Discrete and Continuous

Discrete sequence (x1, x2, ….., xN)
• Mean = 1/N * ∑ xi (sum over i = 1 to N)

Continuous function f(x) on the interval from a to b
• Mean = 1/(b-a) * ∫ f(x) dx (integral from a to b)

This slide is not for everyone.


61
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

For doctors (and their patients)


Eye surgery- RMS values

This dataset is of 22 eyes of 11 patients who underwent laser refractive surgery.

S.No  MRD No  Age      Eye  Corrected Refraction  Pre Op CCT-OCT  Pre Op HOA   Post Op HOA
              (Years)       (minus Diopters)      (microns)       (RMS Value)  (RMS Value)
1     171065  22       OD   6.25                  498             0.35         0.91
2     171065  22       OS   6.50                  513             0.52         1.41
3     170756  21       OD   2.25                  518             0.58         0.57
4     170756  21       OS   2.00                  520             0.58         0.64
5     49203   21       OD   7.25                  515             0.40         0.72
6     49203   21       OS   7.25                  510             0.35         0.60
7     158934  26       OD   1.50                  529             0.34         0.35
8     158934  26       OS   2.25                  534             0.35         0.46
9     172125  24       OD   2.50                  497             0.50         0.65
10    172125  24       OS   3.00                  488             0.38         0.52
11    173295  31       OD   2.50                  520             0.55         0.47
12    173295  31       OS   2.50                  525             0.51         0.50
13    173594  22       OD   4.50                  525             0.33         0.34
14    173594  22       OS   4.75                  528             0.78         0.45
15    167778  24       OD   2.00                  500             0.60         0.54
16    167778  24       OS   1.75                  501             0.62         0.46
17    121238  22       OD   3.75                  488             0.46         0.61
18    121238  22       OS   4.25                  492             0.43         0.55
19    143788  44       OD   6.50                  565             0.46         0.69
20    143788  44       OS   6.50                  559             0.41         0.81
21    155546  22       OD   4.50                  513             0.35         0.40
22    155546  22       OS   4.50                  510             0.43         0.55

Notice RMS values in the last two columns.

HW for Ophthalmologists- which patient needed the surgery most?
63
BITS Pilani, Pilani Campus
Heart Rate Variation (HRV)

HW (not for Cardiologists)-


 Who should consult a cardiologist- A or B?
 Bollywood songs are written for A or B type of cases?
 B is running a race- is he accelerating or decelerating?

64
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Next Chapter
Chapter-4: Basic Probability
Quantitative Methods

Lecture-5 13Feb’22. 10.30am-12.30pm.

BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

1
Items of interest
1. Quiz-1 will be available at Taxila (eLearn) between
14-24 Feb. Syllabus- Chapters 1 to 4, TB-1. Last
date will not be extended.
2. Extra class…… on 15Feb, Tue 7-9 pm.

3. PPT of Chapter-4 (Basic Probability) is available in advance at Taxila, under Topic-1.
4. Students who have joined late should refer to the
Course Handout available at Taxila (eLearn) and
PPT of Lecture-1 (16Jan) available at Impartus for
prescribed Textbooks, Evaluation plan, coverage,
syllabus, etc.
5. Post your messages only on Discussion Forum at
Taxila, and not at Impartus.
2
BITS Pilani, Pilani Campus
The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

Concept first… calculations later.


Answer first..… calculations later.

3
BITS Pilani, Pilani Campus
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan

Mid-Term Test Syllabus


1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability 13Feb
1 5 Discrete Probability Distributions
1 6 The Normal Distribution
1 7 Sampling Distributions
Textbook #1
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
4
BITS Pilani, Pilani Campus
Topics

Chapter-4: Basic Probability Relevant Pre-recorded lectures,


accessible from Taxila

1. Basic probability concepts


2. Conditional probability
3. Bayes Theorem

5
BITS Pilani, Pilani Campus
What happens when….

Acid + Blue litmus paper. Outcome? Litmus paper turns Red, R. Certainty.
S1: R R R R R R R R R R R R R R R R R R R R R.....
S2: R R R R R R R R R R R R R R R R R R R R R.....
......

Fever + Paracetamol. Outcome? Recovers, R, or No change or Worse, N. Uncertainty.
S1: N R R R R R R N R R R R R R N R R R N N R.....
S2: N R R N R N R R R N N R R R R R R R R R R.....
......

Toss a Coin. Outcome? Head, H, or Tail, T. Uncertainty.
S1: T T T H H T T H T H T T T T H T T H H H H…..
S2: T H T H H H T H H H T H H T H T H T T H H…..
…...
6
BITS Pilani, Pilani Campus
Uncertainty to Probability

• Probability measures uncertainty.
• Probability is the chance that an outcome occurs.

Probability, p, is a number between 0 and 1 (0 ≤ p ≤ 1), or equivalently between 0% and 100%.

Scale: 0  0.2  0.4  0.6  0.8  1.0

• The probabilities of all possible outcomes add up to 1.

Paracetamol  P(Recovers, R)  P(No change or Worse, N)  Observed outcomes
Brand A      0.50            0.50                      R R R N R R R N N N N N R N N R N R N N N.....
Brand B      0.70            0.30                      R N R R R N R R R N N R N N R N R R R R R.....
Brand C      0.10            0.90                      N R N N N R N N R N R R N N R N N N N N N.....
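
The link between a probability and a long sequence of R/N outcomes can be illustrated by simulation. This is a sketch under stated assumptions: the three recovery probabilities are the ones shown for Brands A, B and C, while the function names, the seed and the number of simulated patients are arbitrary choices made here.

```python
import random

def simulate_outcomes(p_recover, n_patients, seed=0):
    """Return a list of 'R'/'N' outcomes; each is 'R' with probability p_recover."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    return ["R" if rng.random() < p_recover else "N" for _ in range(n_patients)]

def relative_frequency(outcomes, symbol="R"):
    """Fraction of outcomes equal to the given symbol."""
    return outcomes.count(symbol) / len(outcomes)

brands = {"Brand A": 0.50, "Brand B": 0.70, "Brand C": 0.10}

for name, p in brands.items():
    outcomes = simulate_outcomes(p, n_patients=100_000)
    # Show the first 21 outcomes and the long-run share of recoveries.
    print(name, " ".join(outcomes[:21]), round(relative_frequency(outcomes), 3))
```

With a large number of simulated patients, the share of R outcomes settles close to the stated probability, even though any short stretch of the sequence can look quite different.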


BITS Pilani, Pilani Campus
An application of measurement of
uncertainty-1

Earlier, approve if p >= 70%.


Now, approve if p >= 50%.

8
BITS Pilani, Pilani Campus
An application of measurement of
uncertainty-2

9
BITS Pilani, Pilani Campus
An application of measurement of
uncertainty-3

Probability of attack

X Y Y+ Z Z+

• Z+ category is a security detail of 55 personnel, including 10+ NSG commandos and police personnel.
• Z category is a security detail of 22 personnel, including 4-6 NSG commandos and police personnel.
• Y+ category is a security detail of 11 personnel, including 2-4 commandos and police personnel.
• Y category is a security detail of 8 personnel, including 1 or 2 commandos and police personnel.
• X category is a security detail of 2 personnel, with no commandos but only armed police personnel.

From Wikipedia.

10
BITS Pilani, Pilani Campus
An application of measurement of
uncertainty-4

Average = 1/26 = 3.85%

Codes with shorter lengths were chosen for the letters with higher usage or frequencies (e, t, a, o…) for faster typing of telegraph messages and to carry more messages in the same time.

Another application on similar lines- Huffman encoding.

etaoi hnsrd luwgc ymfpb kvqxjz
BITS Pilani, Pilani Campus
Familiar applications

?
Uncertainty can be measured (estimated)
in above cases..

12
BITS Pilani, Pilani Campus
Business applications-1/2

13
BITS Pilani, Pilani Campus
Business applications-2/2

Rain

No Rain
50% days.

10% days.

Applications: Visit Mumbai or not; Schedule IPL/Test match or not; Insurance for Olympics 20XX… Application: Portfolio selection by Mutual Funds…

Earthquakes

0 1 2 3 4 5 6 7 8 9 10 11 12 …

Applications: Life insurance premium; Warranty policy...


Applications: Building design; Insurance premiums; Disaster management… 14
BITS Pilani, Pilani Campus
What is the Probability (0 to 100%)?

1. A coin is tossed, probability of a) Head, b) Tail?

2. A six-sided dice is tossed, probability of number 6?

3. Probability of 5+ Richter scale earthquake in Japan tomorrow?

4. Probability of snowfall in Kerala next month?

15
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Basic probability concepts

16
Getting values of probability
(p: 150-151, TB-1)

1. A priori
 Classical/Equi-likely.
 Textbook examples of Coin tossing, Playing cards,
Throwing a dice, etc.
 When you know nothing.

2. Empirical
 From historical data, observations, or experiments.
 Life tables in insurance, earthquakes, rainfall,
twins, quality, stock market …

3. Subjective
 Personal judgement.
 Covid-19 will be over in 2024, Outcome of India vs
Brazil cricket match …

17
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

A priori probability
Probability- a priori…1/3
Probability = Number of outcomes in which the event occurs / Total number of possible outcomes

1. Tossing a coin-
 Outcomes- Head or Tail.
 P(Head) = P(Tail) = ½.

2. Throwing a Dice-
 Outcomes- 1, 2, 3, 4, 5, or 6.
 P(1)=P(2)=P(3)=P(4)=P(5)=P(6)= 1/6.
 P(Even) = 3/6. P(<3) = 2/6.

3. Births-
 Outcomes- Male or Female.
 P(Male) = P(Female) = ½.

4. Playing cards-
 Outcomes- 52 nos. Its probability tree will be very large- 52
branches, hence not shown.
 P(King) = 4/52.
 P(Heart) = 13/52.

BITS Pilani, Pilani Campus


Probability- a priori…2/3

Probability = Number of outcomes in which the event occurs / Total number of possible outcomes

Event                   Outcomes in which     Possible outcomes   Probability
                        the event occurs
1. P(4)                 1                     1, 2, 3, 4, 5, 6    1/6
2. P(5)                 1                     1, 2, 3, 4, 5, 6    1/6
3. P(Even)              3                     1, 2, 3, 4, 5, 6    3/6
4. P(<5)                4                     1, 2, 3, 4, 5, 6    4/6
5. P(<=5)               5                     1, 2, 3, 4, 5, 6    5/6
6. P(Divisible by 3)    2                     1, 2, 3, 4, 5, 6    2/6
7. P(Divisible by 5)    1                     1, 2, 3, 4, 5, 6    1/6
8. P(Prime)             3                     1, 2, 3, 4, 5, 6    3/6
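
The a priori rule can be sketched as counting over the sample space of a six-sided dice. The helper name `probability` and the event functions are illustrative; exact fractions are used so that the results match the slide (in reduced form).

```python
from fractions import Fraction

SAMPLE_SPACE = [1, 2, 3, 4, 5, 6]  # equally likely outcomes of one throw

def probability(event):
    """Favourable outcomes divided by total possible outcomes."""
    favourable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favourable, len(SAMPLE_SPACE))

def is_even(n):
    return n % 2 == 0

def is_prime(n):
    return n in (2, 3, 5)

print("P(4)              =", probability(lambda n: n == 4))
print("P(Even)           =", probability(is_even))
print("P(<5)             =", probability(lambda n: n < 5))
print("P(<=5)            =", probability(lambda n: n <= 5))
print("P(Divisible by 3) =", probability(lambda n: n % 3 == 0))
print("P(Prime)          =", probability(is_prime))
```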

20
BITS Pilani, Pilani Campus
Probability- a priori...3/3

From a 52-card deck (suits: Diamond, Club, Heart, Spade)

1. P(Red) = 26/52.
2. P(Diamond) = 13/52.
3. P(Picture) = 12/52.
4. P(=7) = 4/52.
5. P(<7) = 24/52.
6. P(King) = 4/52.

P(Red) > P(<7) > P(Diamond) > P(Picture) > P(King) = P(=7).

21
BITS Pilani, Pilani Campus
Shortcoming of ‘a priori’ approach

Probability 50:50 each?


 Rain / No Rain.
 NEET Pass/ NEET Fail.
 Left-handed/Right-handed person.
 Brazil win/ India win- Football or Cricket.

Probability 1/3 each?


 P(RCB_Wins)=1/3. P(CSK_Wins)=1/3. P(Others_Win)=1/3.

22
BITS Pilani, Pilani Campus
Uncertainty, Probability and Risk

When probabilities are not considered (Outcome-1 or Outcome-2), there is a risk.

When probabilities are considered (Outcome-1 with probability p, Outcome-2 with probability 1-p), there is still a risk.

23
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Empirical probability-
From experiments or observations
Empirical probability
• When probability is computed from experiments, observations, surveys, etc.

Item                  Probability
Left-handed           1 : 10 persons
Twins                 3 : 100 births
Breast Cancer         1 : 8 women in US
Lung Cancer           17.2 in 100 male smokers; 11.6 in 100 female smokers
Vegetarian            38 : 100 persons
Aircraft crash        1 : 48 lakh flights
Boys to Girls ratio   51.2 : 48.8 (in most industrialized countries)

Sources of these probabilities are given in a later slide- Nice to Know.

25
BITS Pilani, Pilani Campus
Empirical probability computation-
Examples

Example 1: S&P BSE Sensex observed for 26 days (i.e., 25 day-to-day changes)- Down- 11 times, Up- 14 times.
• P(Down) = 11/25 = 44%.
• P(Up) = 14/25 = 56%.

Example 2: Probabilities from a frequency table.

Range   Frequency  Probability*
20-30   1          0.01
30-40   40         0.26
40-50   76         0.50
50-60   26         0.17
60-70   6          0.04
70-80   1          0.01
80-90   1          0.01
Total   151        1
* or Relative Frequency
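
Empirical probability is simply relative frequency: the count in each class divided by the total count. A minimal sketch using the frequencies from the table above (the variable names are illustrative):

```python
# Frequencies from the slide's table, keyed by class range.
frequency = {
    "20-30": 1,
    "30-40": 40,
    "40-50": 76,
    "50-60": 26,
    "60-70": 6,
    "70-80": 1,
    "80-90": 1,
}

total = sum(frequency.values())

# Relative frequency of each class = empirical probability of that class.
probability = {rng: count / total for rng, count in frequency.items()}

for rng, p in probability.items():
    print(f"{rng}: {frequency[rng]:3d}  {p:.2f}")
print("Total:", total, round(sum(probability.values()), 2))
```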
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Subjective probability
Subjective probability
 Based on experience, private knowledge, personal
opinions, biases, etc.

1. Covid-19 will be over by 2023-


 ExpertA = 0.90. ExpertB = 0.20. ExpertC = 0.25. Layman = 0.30.

2. BSE Sensex will close 400-500 points up tomorrow-


 Broker-A: 0.40. Broker-B: 0.30. Investor-A: 0.60. MutualFund: 0.30.

3. Sports betting-
 P(IndiaWillWin) = 0.40. BookieA.
 P(IndiaWillWin) = 0.45. BookieB.
 P(IndiaWillWin) = 0.70. BookieC.

4. Cancer?
 P(Cancer=Yes) = 0.40. DoctorA.
 P(Cancer=Yes) = 0.45. DoctorB.
 P(Cancer=Yes) = 0.70. DoctorC.

28
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Conditional probability
Joint, Marginal and Conditional probability- for two or more events

29
Types of probability
(p: 153-156, TB-1)
For a single event-
• Simple probability.

Single Coin
Outcome  Probability
H        1/2
T        1/2
Total    1

Single Dice
Outcome  Probability
1        1/6
2        1/6
3        1/6
4        1/6
5        1/6
6        1/6
Total    1

For two or more events-
1. Joint probability.
2. Marginal probability.
3. Conditional probability.

Two Coins- A and B
                Coin A
                Head   Tail   Total
Coin B  Head    0.25   0.25   0.5
        Tail    0.25   0.25   0.5
        Total   0.5    0.5    1.0

Colour and Value
                Value
                King   NotKing  Total
Color   Red     2/52   24/52    26/52
        Black   2/52   24/52    26/52
        Total   4/52   48/52    52/52

Probabilities-
• Joint: Both A and B occur. P(A and B).
• Marginal: A occurs, irrespective of B. P(A) = P(A and B) + P(A and NotB).
  B occurs, irrespective of A. P(B) = P(A and B) + P(NotA and B).
• Conditional: A occurs given that B has occurred. P(A/B).

SK rule- For small classroom problems, draw Probability (Decision) Tree for better understanding and faster calculations.
BITS Pilani, Pilani Campus
Remaining chapter will be covered in the next session.

31
BITS Pilani, Pilani Campus
Quantitative Methods

Lecture-6 15Feb’22. 7-9pm.


BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

1
Items of interest
1. Quiz-1 is available at Taxila (eLearn) between 14-
24 Feb. Syllabus- Chapters 1 to 4, TB-1. Last date
will not be extended.
2. Excel HW-03 will be available at Taxila on 17 Feb.
3. PPT of Chapter-4 (Basic Probability) is available in
advance at Taxila, under Topic-1.
4. Students who have joined late should refer to the
Course Handout available at Taxila (eLearn) and
PPT of Lecture-1 (16Jan) available at Impartus for
prescribed Textbooks, Evaluation plan, coverage,
syllabus, etc.
5. Post your messages only on Discussion Forum at
Taxila, and not at Impartus.

2
BITS Pilani, Pilani Campus
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan

Mid-Term Test Syllabus


1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability 13Feb, 15Feb
1 5 Discrete Probability Distributions
1 6 The Normal Distribution
1 7 Sampling Distributions
Textbook #1
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
4
BITS Pilani, Pilani Campus
Topics

Chapter-4: Basic Probability Relevant Pre-recorded lectures,


accessible from Taxila

1. Basic probability concepts

Will be covered today


2. Conditional probability
3. Bayes’ Theorem

5
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Conditional probability
Joint, Marginal and Conditional probability- for two or more events

6
Types of probability
(p: 153-156, TB-1)

For a single event-
• Simple probability.

Single Coin
Outcome  Probability
H        1/2
T        1/2
Total    1
Tree: H 1/2, T 1/2.

Single Dice
Outcome  Probability
1        1/6
2        1/6
3        1/6
4        1/6
5        1/6
6        1/6
Total    1
Tree: 1, 2, 3, 4, 5, 6, each with probability 1/6.

For two or more events-
1. Joint probability.
2. Marginal probability.
3. Conditional probability.

Two Coins- 1 and 2
Tree: Coin 1- Head 1/2, Tail 1/2. Coin 2- Head 1/2, Tail 1/2 on each branch. Each of the four joint outcomes has probability 1/4.

                Coin 1
                Head   Tail   Total
Coin 2  Head    1/4    1/4    2/4
        Tail    1/4    1/4    2/4
        Total   2/4    2/4    4/4

Colour and Value- 1 and 2
Tree: Value, 1- King 4/52, NotKing 48/52. Color, 2- Red 2/4 and Black 2/4 given King; Red 24/48 and Black 24/48 given NotKing.
Joint outcomes: KR 2/52, KB 2/52, K’R 24/52, K’B 24/52.

                Value
                King   NotKing  Total
Color   Red     2/52   24/52    26/52
        Black   2/52   24/52    26/52
        Total   4/52   48/52    52/52

Two events- their probabilities are written as-
1. Joint: Both A and B occur. P(A and B).
2. Marginal: A occurs, irrespective of B. P(A).
3. Conditional: A occurs given that B has occurred. P(A/B).

SK rule- For small classroom problems, draw Probability (Decision) Tree for better understanding and faster calculations.
BITS Pilani, Pilani Campus
Marginal, Joint and Conditional probability…1/2.
A priori

Marginal
1. P(King) = 4/52.
2. P(Red) = 26/52.
3. P(7) = 4/52.
4. P(Picture) = 12/52.
5. P(Diamond) = 13/52.
Marginal probability- concerned with only one event. P(King) means the probability that the card is a King.

Joint
6. P(Red and King) = 2/52.
7. P(Diamond and Red) = 13/52.
8. P(Picture and Red) = 6/52.
9. P(Black and Red) = 0/52.
10. P(<3 and Red) = 4/52.
Joint probability- both events occur. P(Red and King) means the probability that the card is of Red color and it is also a King.

Conditional
11. P(Red/King) = 2/4.
12. P(Red/7) = 2/4.
13. P(Diamond/Picture) = 3/12.
14. P(Picture/Diamond) = 3/13.
Conditional probability- has knowledge of one of the events. P(Red/King) means the probability that the card is of Red color if the card is known to be a King.

Here all the probabilities were computed from the data (the picture of 52 cards), and not from the probabilities of other events.
Notice that P(Diamond/Picture) is not equal to P(Picture/Diamond).
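
The conditional probabilities on this slide can be checked by listing all 52 cards and counting inside the reduced set of cards that satisfy the known condition. This is a minimal sketch; the card representation and helper names are choices made here, not part of the course material.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Diamond", "Heart", "Club", "Spade"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def is_red(card):
    return card[1] in ("Diamond", "Heart")

def is_king(card):
    return card[0] == "K"

def is_picture(card):
    return card[0] in ("J", "Q", "K")

def is_diamond(card):
    return card[1] == "Diamond"

def cond_prob(event, given):
    """P(event/given): count the event only among cards where 'given' holds."""
    reduced = [card for card in DECK if given(card)]
    favourable = sum(1 for card in reduced if event(card))
    return Fraction(favourable, len(reduced))

print("P(Red/King)        =", cond_prob(is_red, is_king))
print("P(Diamond/Picture) =", cond_prob(is_diamond, is_picture))
print("P(Picture/Diamond) =", cond_prob(is_picture, is_diamond))
```

The last two lines differ because the reduced set differs: 12 picture cards in one case, 13 diamonds in the other.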

BITS Pilani, Pilani Campus


Marginal, Joint and Conditional probability…2/2.
Empirical

Table: Historical data

Customer #  Food  Beverage
1           Dosa  Tea
2           Dosa  Coffee
3           Dosa  Coffee
4           Idly  Tea
5           Idly  Tea
6           Idly  Tea
7           Idly  Coffee
8           Idly  Coffee
9           Idly  Coffee
10          Idly  Coffee

Event-1: Food (Events- Dosa, Idly).
Event-2: Beverage (Events- Tea, Coffee).

Computing probabilities from historical data-

Marginal probability-
• P(Dosa) = 3/10 = 0.3. P(Idly) = 7/10 = 0.7.
• P(Tea) = 4/10 = 0.4. P(Coffee) = 6/10 = 0.6.

Joint probability- (AND)
• P(Dosa and Tea) = 1/10 = 0.1. P(Dosa and Coffee) = 2/10 = 0.2.
• P(Idly and Tea) = 3/10 = 0.3. P(Idly and Coffee) = 4/10 = 0.4.

Conditional probability- Tea/Coffee status is known.
• P(Dosa/Tea) = 1/4 = 0.25. P(Idly/Tea) = 3/4 = 0.75.
• P(Dosa/Coffee) = 2/6 = 0.33. P(Idly/Coffee) = 4/6 = 0.67.

Conditional probability- Dosa/Idly status is known.
• P(Tea/Dosa) = 1/3 = 0.33. P(Coffee/Dosa) = 2/3 = 0.67.
• P(Tea/Idly) = 3/7 = 0.43. P(Coffee/Idly) = 4/7 = 0.57.

Or-
• P(Dosa or Tea) = 6/10 = 0.6. P(Dosa or Coffee) = 7/10 = 0.7.
• P(Idly or Tea) = 8/10 = 0.8. P(Idly or Coffee) = 9/10 = 0.9.

Joint and Marginal probabilities
               Beverage
               Tea    Coffee  Total
Food  Dosa     0.1    0.2     0.3
      Idly     0.3    0.4     0.7
      Total    0.4    0.6     1.0

Conditional probabilities- Tea/Coffee status known
               Beverage
               Tea    Coffee  Total
Food  Dosa     0.25   0.33    x
      Idly     0.75   0.67    x
      Total    1.00   1.00    x

Computing probabilities from the probabilities of other events-

Joint and Marginal probabilities
                 Event 2
                 B      NotB   Total
Event 1  A       0.1    0.2    0.3
         NotA    0.3    0.4    0.7
         Total   0.4    0.6    1.0

Conditional probabilities- Event B status known
                 Event 2
                 B      NotB   Total
Event 1  A       0.25   0.33   x
         NotA    0.75   0.67   x
         Total   1.00   1.00   x

P(A) = P(A and B) + P(A and NotB) = 0.1 + 0.2 = 0.3.
P(A and B) = P(B) * P(A/B) = 0.4 * 0.25 = 0.1.
P(A/B) = P(A and B) / P(B) = 0.1 / 0.4 = 0.25.
P(A or B) = P(A) + P(B) - P(A and B) = 0.3 + 0.4 - 0.1 = 0.6.
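
The same marginal, joint, conditional and "or" probabilities can be computed directly from the ten customer records. This is a sketch; the tuple layout and helper names are choices made here, and exact fractions are used so that the results can be compared with the slide.

```python
from fractions import Fraction

# Historical data: (food, beverage) for customers 1 to 10.
records = [
    ("Dosa", "Tea"), ("Dosa", "Coffee"), ("Dosa", "Coffee"),
    ("Idly", "Tea"), ("Idly", "Tea"), ("Idly", "Tea"),
    ("Idly", "Coffee"), ("Idly", "Coffee"), ("Idly", "Coffee"), ("Idly", "Coffee"),
]

def is_dosa(record):
    return record[0] == "Dosa"

def is_tea(record):
    return record[1] == "Tea"

def prob(event):
    """Share of all records in which the event occurs."""
    return Fraction(sum(1 for r in records if event(r)), len(records))

def cond_prob(event, given):
    """Share of the records satisfying 'given' in which the event occurs."""
    reduced = [r for r in records if given(r)]
    return Fraction(sum(1 for r in reduced if event(r)), len(reduced))

p_dosa = prob(is_dosa)                                     # marginal
p_tea = prob(is_tea)                                       # marginal
p_dosa_and_tea = prob(lambda r: is_dosa(r) and is_tea(r))  # joint
p_dosa_given_tea = cond_prob(is_dosa, is_tea)              # conditional
p_dosa_or_tea = p_dosa + p_tea - p_dosa_and_tea            # general addition rule

print(p_dosa, p_tea, p_dosa_and_tea, p_dosa_given_tea, p_dosa_or_tea)
```

The multiplication rule can be confirmed on the same numbers: P(Tea) * P(Dosa/Tea) gives back P(Dosa and Tea).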

BITS Pilani, Pilani Campus


Formulas from previous slide
Marginal probability: From Joint probabilities-
• P(A) = P(A and B) + P(A and NotB) *

Conditional probability: From Joint and Marginal probabilities-
• P(A/B) = P(A and B) / P(B) **

Joint probability: From Conditional and Marginal probabilities-
• P(A and B) = P(B) * P(A/B) ***
  = P(A) * P(B) ……this is a special case, when A and B are independent.

Or: From Marginal and Joint probabilities-
• P(A or B) = P(A) + P(B) - P(A and B) ****

* p-155.
** p-160.
*** p-164, General Multiplication rule; the Multiplication rule for Independent events is its special case.
**** p-156, General Addition rule.
10
BITS Pilani, Pilani Campus
Conditional probability

The opponent draws a card from a pack of 52 shuffled cards. *


1. What is the probability that the card is a King? P(K)= 4/52.

2. Mona signals: The card is of Red color.


Now, what is the probability that the card is a King? P(K/R)= 2/26.

3. Mona signals: The card is a Picture.


Now, what is the probability that the card is a King? P(K/P)= 4/12.

4. Mona signals: The card is of Red color and a Picture.


Now, what is the probability that the card is a King? P(K/R&P)= 2/6.

5. Mona signals: The card is a Diamond.


Now, What is the probability that the card is a King? P(K/D)= 1/13.

P(King) = 4/52 = 0.077.
P(King/Red) = 2/26 = 0.077 = P(King). Independent events.
P(King/Picture) = 4/12 = 0.33 ≠ P(King). Dependent events.
P(King/PandR) = 2/6 = 0.33 ≠ P(King). Dependent events.
P(King/Diamond) = 1/13 = 0.077 = P(King). Independent events.
*SW:- CCIITH.
BITS Pilani, Pilani Campus
Independent and Dependent events
Two events, A and B-

If P(A/B) = P(A), then A is independent of B, i.e., B does not affect A.
If P(A/B) ≠ P(A), then A is dependent on B, i.e., B affects A.

P(King/Red) = 2/26 = 1/13.
P(King) = 4/52 = 1/13.
Since P(King/Red) = P(King), the event King is independent of event Red.
Knowledge of color of the card did not change the probability.

P(King/Picture) = 4/12 = 1/3.
P(King) = 4/52 = 1/13.
Since P(King/Picture) ≠ P(King), the event King is dependent on event Picture.
Knowledge of Picture card changed the probability, from 1/13 to 1/3.

Examples-

Medicine-
P(Cancer) ≠ P(Cancer/Smoker). Hence dependent.
If P(Cancer) = P(Cancer/UsesColgate). Then independent.

Credit card issue/Loans-
P(WillRepay) ≠ P(WillRepay/Employed). Hence dependent.
If P(WillRepay) = P(WillRepay/EastFacingHouse). Then independent.

Elections-
P(Win) ≠ P(Win/IndependentCandidate). Hence dependent.

Agrawal Sweet Shop-
P(BuyRasogolla) = 10%.
P(BuyRasogolla/Bengali) = 90%. Hence dependent.

Births- A family has two children
 P(2ndChildIsSon) = or ≠ P(2ndChildIsSon/FirstChildIsDaughter)? HW.
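
An equivalent test of independence is the product rule: A and B are independent exactly when P(A and B) = P(A) * P(B). The sketch below applies it to the card examples by enumerating a 52-card deck; the card representation and function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["Diamond", "Heart", "Club", "Spade"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def is_king(card):
    return card[0] == "K"

def is_red(card):
    return card[1] in ("Diamond", "Heart")

def is_picture(card):
    return card[0] in ("J", "Q", "K")

def prob(event):
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def are_independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    joint = prob(lambda card: a(card) and b(card))
    return joint == prob(a) * prob(b)

print("King and Red independent?    ", are_independent(is_king, is_red))
print("King and Picture independent?", are_independent(is_king, is_picture))
```

Exact fractions are used because a floating-point comparison with `==` could report a false mismatch.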

12
BITS Pilani, Pilani Campus
Kitty party
1. How many guests took Frooti?
= PizzaAndFrooti + BurgerAndFrooti
= 100*0.25*0.20 + 100*0.75*0.60 = 5 + 45 = 50.
2. What % of the guests who took Frooti had taken Pizza?
= 5/ (5 + 45) = 0.10, or 10%.
3. What % of the guests who took Frooti had taken Burger?
= 45/ (5 + 45) = 0.90, or 90%.

4. How many guests took Coke?
= PizzaAndCoke + BurgerAndCoke
= 100*0.25*0.80 + 100*0.75*0.40 = 20 + 30 = 50.
5. What % of the guests who took Coke had taken Pizza?
= 20/ (20 + 30) = 0.40, or 40%.
6. What % of the guests who took Coke had taken Burger?
= 30/ (20 + 30) = 0.60, or 60%.

Guest counts from the tree: Frooti- 5 (Pizza) + 45 (Burger) = 50. Coke- 20 (Pizza) + 30 (Burger) = 50.

For 2 above, following formulas were used (even without thinking about them)-

P(Pizza/Frooti) = P(Pizza and Frooti) / P(Frooti)
= P(Pizza and Frooti) / [P(Pizza and Frooti) + P(Burger and Frooti)]
= P(Pizza)*P(Frooti/Pizza) / [P(Pizza)*P(Frooti/Pizza) + P(Burger)*P(Frooti/Burger)]
= (0.25*0.20) / (0.25*0.20 + 0.75*0.60)
= 0.05 / (0.05 + 0.45) = 0.10, or 10%.
13
BITS Pilani, Pilani Campus
Remaining chapter will be covered in the next session.

14
BITS Pilani, Pilani Campus
Quantitative Methods

Lecture-7 20Feb’22. 10.30-12.30pm.


BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

1
Items of interest
1. Quiz-1 is available at Taxila (eLearn) between 14-24 Feb.
Syllabus- Chapters 1 to 4, TB-1. Last date will not be
extended.

2. Extra class- 23Feb, Wed 7-9 pm.

3. PPT for today’s session is available in advance at Taxila, under Topic-1.

4. Excel HW-03 is available at Taxila.

5. Students who have joined late should refer to the Course Handout available at Taxila (eLearn) and PPT of Lecture-1 (16Jan) available at Impartus for prescribed Textbooks, Evaluation plan, coverage, syllabus, etc.

6. Post your messages only on Discussion Forum at Taxila, and not at Impartus.

2
BITS Pilani, Pilani Campus
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan

Mid-Term Test Syllabus


1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability 13Feb, 15Feb, 20Feb
1 5 Discrete Probability Distributions 20Feb
1 6 The Normal Distribution
1 7 Sampling Distributions
Textbook #1
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
4
BITS Pilani, Pilani Campus
Topics

Chapter-4: Basic Probability Relevant Pre-recorded lectures,


accessible from Taxila

1. Basic probability concepts


2. Conditional probability

Will be covered today


3. Bayes’ Theorem

5
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Bayes’ Theorem

6
Kitty party
Tree: P(Pizza) = 0.25, P(Burger) = 0.75. P(Frooti/Pizza) = 0.20, P(Coke/Pizza) = 0.80. P(Frooti/Burger) = 0.60, P(Coke/Burger) = 0.40.
Guest counts (out of 100): Frooti- 5 (Pizza) + 45 (Burger) = 50. Coke- 20 (Pizza) + 30 (Burger) = 50.

1. How many guests took Frooti?
= PizzaAndFrooti + BurgerAndFrooti
= 100*0.25*0.20 + 100*0.75*0.60 = 5 + 45 = 50.
2. What % of the guests who took Frooti had taken Pizza?
= 5/ (5 + 45) = 0.10, or 10%.
3. What % of the guests who took Frooti had taken Burger?
= 45/ (5 + 45) = 0.90, or 90%.
4. How many guests took Coke?
= PizzaAndCoke + BurgerAndCoke
= 100*0.25*0.80 + 100*0.75*0.40 = 20 + 30 = 50.
5. What % of the guests who took Coke had taken Pizza?
= 20/ (20 + 30) = 0.40, or 40%.
6. What % of the guests who took Coke had taken Burger?
= 30/ (20 + 30) = 0.60, or 60%.

For 2 above, following formulas were used (even without thinking about them)-

P(Pizza/Frooti) = P(Pizza and Frooti) / P(Frooti)
= P(Pizza and Frooti) / [P(Pizza and Frooti) + P(Burger and Frooti)]
= P(Pizza)*P(Frooti/Pizza) / [P(Pizza)*P(Frooti/Pizza) + P(Burger)*P(Frooti/Burger)]
= (0.25*0.20) / (0.25*0.20 + 0.75*0.60)
= 0.05 / (0.05 + 0.45) = 0.10, or 10%.

You have learnt Bayes’ Theorem !!!
7
BITS Pilani, Pilani Campus
Reverse probability and Bayes’ Theorem

Reverse tree: P(Pizza/Frooti) = 0.10, P(Burger/Frooti) = 0.90. P(Pizza/Coke) = 0.40, P(Burger/Coke) = 0.60.

P(Frooti/Pizza) = 0.20 is known. P(Pizza/Frooti) = to be determined.
In general: P(B/A) = x is known. P(A/B) = y to be determined.

A: Pizza. NotA: Burger.
B: Frooti. NotB: Coke.

P(Pizza/Frooti) = P(Pizza and Frooti) / P(Frooti)
= P(Pizza and Frooti) / [P(Pizza and Frooti) + P(Burger and Frooti)]
= P(Pizza)*P(Frooti/Pizza) / [P(Pizza)*P(Frooti/Pizza) + P(Burger)*P(Frooti/Burger)]
= (0.25*0.20) / (0.25*0.20 + 0.75*0.60)
= 0.05 / (0.05 + 0.45) = 0.10, or 10%.

P(A/B) = P(A and B) / P(B)
= P(A and B) / [P(A and B) + P(NotA and B)]
= P(A)*P(B/A) / [P(A)*P(B/A) + P(NotA)*P(B/NotA)]     Bayes’ Theorem.
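
Bayes' Theorem for two complementary events (A and NotA) fits in one small function. The sketch below is illustrative: the function and parameter names are chosen here, and the inputs are the Kitty party values P(Pizza) = 0.25, P(Frooti/Pizza) = 0.20 and P(Frooti/Burger) = 0.60.

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A/B) from P(A), P(B/A) and P(B/NotA)."""
    joint_a_and_b = p_a * p_b_given_a                # P(A and B)
    joint_not_a_and_b = (1 - p_a) * p_b_given_not_a  # P(NotA and B)
    p_b = joint_a_and_b + joint_not_a_and_b          # P(B), total probability
    return joint_a_and_b / p_b

# A: Pizza, NotA: Burger, B: Frooti.
p_pizza_given_frooti = bayes(p_a=0.25, p_b_given_a=0.20, p_b_given_not_a=0.60)
print("P(Pizza/Frooti) =", round(p_pizza_given_frooti, 2))
```

The denominator is the sum of the two branches of the tree that end in B, which is the step done "without thinking" when counting guests.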

8
BITS Pilani, Pilani Campus
 Why second test?

. False Positives

False Negatives

9
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Henceforth, several numerical problems from TB-1 will be done.

10
From Textbook…
(p: 169-170, TB-1)

• The probability that a person has a certain disease is 0.03, or 3% (positivity rate).
• If the disease is actually present, the probability that the medical diagnostic test will give a positive result (indicating the disease is present) is 0.90.
• If the disease is not actually present, the probability of a positive test result is 0.02.
The last two probabilities describe the efficacy of the testing procedure.

Notation: DY- disease present, DN- disease not present, TP- test positive, TN- test negative.
Tree: P(DY) = 0.03, P(DN) = 0.97. P(TP/DY) = 0.90, P(TN/DY) = 0.10. P(TP/DN) = 0.02, P(TN/DN) = 0.98.
Joint probabilities: P(DY and TP) = 0.0270, P(DN and TP) = 0.0194, P(DY and TN) = 0.0030, P(DN and TN) = 0.9506.
P(TP) = 0.0464, P(TN) = 0.9536.

a. What is the probability of a positive test result?
= HasDiseaseANDTestsPositive + NoDiseaseANDTestsPositive
= 0.03 * 0.90 + 0.97 * 0.02
= 0.0270 + 0.0194 = 0.0464, or 4.64%.
P(TP) = P(DY) * P(TP/DY) + P(DN) * P(TP/DN)

b. Suppose the test has given a positive result. What is the probability that the disease is actually present?
= HasDiseaseANDTestsPositive / TestsPositive
= 0.03 * 0.90 / 0.0464 = 0.5819, or 58.19%.
P(DY/TP) = P(DY) * P(TP/DY) / P(TP)
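
The same answer can be reached by counting people instead of multiplying probabilities, in the spirit of the Kitty party example. The sketch below assumes a hypothetical population of 10,000 persons; the population size and variable names are illustrative, while the three probabilities are those of the textbook problem.

```python
from fractions import Fraction

population = 10_000                    # hypothetical number of persons tested
p_disease = Fraction(3, 100)           # P(DY)
p_pos_given_disease = Fraction(90, 100)    # P(TP/DY)
p_pos_given_no_disease = Fraction(2, 100)  # P(TP/DN)

with_disease = population * p_disease
without_disease = population - with_disease

true_positives = with_disease * p_pos_given_disease
false_positives = without_disease * p_pos_given_no_disease
all_positives = true_positives + false_positives

# Among those who test positive, what share actually has the disease?
p_disease_given_pos = true_positives / all_positives

print("Have disease and test positive:", true_positives)
print("No disease but test positive:  ", false_positives)
print("P(TP)    =", float(all_positives / population))
print("P(DY/TP) =", round(float(p_disease_given_pos), 4))
```

Although the test detects 90% of true cases, the much larger disease-free group contributes a sizeable number of false positives, which pulls P(DY/TP) well below 0.90.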

11
BITS Pilani, Pilani Campus
‘Reverse’ probability of previous problem

Reverse tree: P(DY/TP) = 0.5819, P(DN/TP) = 0.4181.

P(TP) = (0.03 * 0.90) + (0.97 * 0.02)
= 0.0464.

P(DY/TP) = (0.03 * 0.90) / (0.03 * 0.90 + 0.97 * 0.02)
= 0.0270 / (0.0270 + 0.0194) = 0.0270 / 0.0464
= 0.5819.

P(DN/TP) = 1 - 0.5819 = 0.4181.

DIY: P(DY/TN) and P(DN/TN).

12
BITS Pilani, Pilani Campus
From Textbook
(p: 167-168, TB-1)

• In the past, 40% of the new models introduced by a TV company have been successful and 60% unsuccessful.
• Before introducing a new model, the company’s market research department conducts an extensive study and releases a favorable/unfavorable report.
• In the past, 80% of successful new models had received favorable reports and 30% of unsuccessful new models had received favorable reports.

a. What is the probability that a report is Favorable?
P(F) = P(S)*P(F/S) + P(U)*P(F/U)
= 0.4*0.8 + 0.6*0.3 = 0.50, or 50%.

b. For the next model to be introduced, the market research department has issued a favorable report. What is the probability that the new model will be successful?
P(S/F) = P(S)*P(F/S) / P(F)     Bayes’ Formula.
= (0.4*0.8) / (0.4*0.8 + 0.6*0.3) = 0.64, or 64%.
13
BITS Pilani, Pilani Campus
‘Reverse’ probability of previous problem

P(F) = (0.4 * 0.8 + 0.6 * 0.3)
= 0.50.

P(S/F) = (0.4 * 0.8) / (0.4 * 0.8 + 0.6 * 0.3)
= 0.32 / (0.32 + 0.18)
= 0.64.

14
BITS Pilani, Pilani Campus
HW-03:
Download Excel file at Taxila. Below Topic-2.

Contents
1. A priori
2. Empirical- Graph
3. Empirical- Dataset
4. Empirical- BSE
5. CFC Hospital
6. BBC
7. Cancer
8. NotObvious
9. Fun with 2019
10. More Fun with 2019

15
BITS Pilani, Pilani Campus
Hot seat

cold pneumonia chicken-pox

p1 p2 p3
No fever
fever

p1, p2 and p3: Reverse probabilities

Probabilities
Probabilities

Outcome Outcome

16
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Nice to know
The origin of Probability

• Pascal and Fermat- both French. 1654.
Pascal (of Pascal’s triangle fame and inventor of mechanical calculator) and Fermat (of Fermat's last theorem fame) laid the groundwork of probability theory while contemplating a gambling problem.

• Cardano- Italian. 1560s.
Cardano (of imaginary numbers fame; also a gambler) had worked out probability but his work remained unknown for 100 years.

• Bayes- English. 1760s.
Bayes was the first to use probability inductively.

18
BITS Pilani, Pilani Campus
God and uncertainty

19
BITS Pilani, Pilani Campus
Empirical probability-1

1/48,00,000

90%
https://en.wikipedia.org/wiki/Handedness#:~:text=In%20human%20biology%2C%20handedness%20is,called%20the%20non%2Ddominant%20hand.

3%

38%

20
BITS Pilani, Pilani Campus
Empirical probability-2

1:8

17.2%, 11.6%

51.2%

≠ 50:50

21
BITS Pilani, Pilani Campus
Empirical probability-3

Beti Bachao Beti Padhao Scheme

P(Male) ≠ P(Female) ?

Prevention means changing the probabilities…


P(M) and P(F).

Refer to HW-4 (Discrete Probability Distributions) for a numerical on a BBBP scheme launched by the TN government.
https://wcd.nic.in/bbbp-schemes
22
BITS Pilani, Pilani Campus
Procedure for computing probability of death

Bathtub distribution
• Start observing 1,00,000 newly born babies, and record the number surviving after every 5 years, Col(3).
• Col(4) is the difference in successive values of Col(3).
• Col(2) = Col(4)/Col(3).
• A graph between Col(2) on the Y axis and Col(1) on the X axis will resemble the graph given on the left side.
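
The procedure can be sketched on a tiny life table. The survivor counts below are hypothetical and only illustrate the column arithmetic; a real life table would come from observed data, and the column names follow the slide (Col(1) age, Col(2) probability of death, Col(3) survivors, Col(4) deaths).

```python
# Col(1): age at the start of each 5-year interval (hypothetical table).
ages = [0, 5, 10, 15]
# Col(3): number surviving at each age, out of 1,00,000 births (hypothetical).
survivors = [100_000, 96_000, 95_500, 95_200]

# Col(4): deaths in each interval = difference of successive survivor counts.
deaths = [survivors[i] - survivors[i + 1] for i in range(len(survivors) - 1)]

# Col(2): probability of death in the interval = Col(4) / Col(3).
prob_death = [d / s for d, s in zip(deaths, survivors)]

for age, d, p in zip(ages, deaths, prob_death):
    print(f"Age {age:2d}-{age + 5:2d}: deaths = {d:5d}, P(death) = {p:.4f}")
```

Each probability is conditional on having survived to the start of the interval, which is why the divisor is the survivor count at that age and not the original 1,00,000.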

23
BITS Pilani, Pilani Campus
An application of probability- file
compression
A file of 50KB is compressed to 20 KB file using probability.

24
BITS Pilani, Pilani Campus
An application of probability-
Storage strategy in a Warehouse

https://www.allaboutlean.com/storage-strategies-random-chaotic-abc/
25
BITS Pilani, Pilani Campus
Reliability based pricing of Warranty?

First product:
1 year warranty- Rs 340.
2 year warranty- Rs 504.
Notice that Warranty cost for first year is Rs 340 and additional cost for second year
is lower, Rs 164 (=504-340).

Second product:
1 year warranty- Rs 3,301.
2 year warranty- Rs 7,735.
Notice that Warranty cost for first year is Rs 3,301 and additional cost for second
year is higher, Rs 4,434 (= 7,735-3,301).

HW: Explain the difference… why one is lower and another is higher in terms of reliability.

26
BITS Pilani, Pilani Campus
Applications of conditional probability
 Email spam filters. Conditional probability is
used to classify emails like the one reproduced
below as spam.

Other applications
 Autocorrect spellings in SMS, MS Word.
 fb- ‘people you may know.’
 Amazon’s/Netflix’s recommendation system- if watched Tom
and Jerry, high probability will watch Chotta Bhim.
 Target advertising- if female and young, show ads of cosmetics.
 Target pricing- if purchased iPhone, high probability the buyer is
rich… try selling other products above the competitive price.
 Loan/Credit card approval (next slide- Banking services).
 Medical investigation- (slides later- Medical diagnosis- 1, 2 and 3).
 Crime investigation (slide later- Crime Patrol).
 Neural Networks and Machine learning.
 Why second Covid-19 test/ second opinion?

27
BITS Pilani, Pilani Campus
Application of conditional probability-
Banking services

Decision- To issue Credit card or not?


 Questions asked-
 Do you own a house?
 Are you a working professional?
 Do you have a PAN card?
 Is your CIBIL score higher than 800?
 Do you have a credit card from another bank?

 Application forms for Credit cards have the above kinds of
questions.
 If answers to these questions are yes, the applicant is
more likely to be issued a credit card, since the
probability of making payments by such an applicant is
high.
 P(LikelyToRepay/OwnsHouse) > P(LikelyToRepay/DoesNotOwnHouse).
 This probability is computed from the past data of
customers who already have credit cards.
28
BITS Pilani, Pilani Campus
HW for everyone

 It is said that only two executives know the
formula of Coca Cola, and they live in different
cities and never meet in person. Why?
 Why do some companies not allow their
senior executives to fly together?
 Explain the phrase “Do not put all your eggs
in the same basket.”
 Why do banks send Debit card and its PIN
number by separate couriers?
 Why do mutual funds not invest the entire fund in
a single industry?

Hints: First make a Contingency table/Decision Tree, then
assume reasonable probabilities, and then use Conditional
probability.

29
BITS Pilani, Pilani Campus
For card addicts

Fig.1 Fig.2

Method-1: Get probabilities for Fig.1 directly from the picture of 52 cards above.

Method-2: Get probabilities for Fig.1 without referring to the picture of 52 cards. Instead
refer to the probabilities available in Fig. 2 and use Bayes’ Theorem.
P(Diamond/Picture)= (13/52*3/13)/(13/52*3/13+39/52*9/39)= 3/12.
P(Diamond/NotPicture)= (13/52*10/13)/(13/52*10/13+39/52*30/39)= 10/40.
P(NotDiamond/Picture)= 1- P(Diamond/Picture)= 1- 3/12= 9/12.
P(NotDiamond/NotPicture)= 1- P(Diamond/NotPicture)= 1 - 10/40= 30/40.

Method-2 is extensively used in practice since raw data (here 52 card Picture) is rarely
available.
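The Method-2 arithmetic can be checked with exact fractions. A minimal sketch, assuming the Fig. 2 inputs are the ones used in the formulas above (P(Diamond)= 13/52, P(Picture/Diamond)= 3/13, P(Picture/NotDiamond)= 9/39); the function name bayes is only illustrative.

```python
from fractions import Fraction as F

# Inputs taken from the Method-2 formulas on this slide.
p_diamond = F(13, 52)
p_not_diamond = F(39, 52)
p_picture_given_diamond = F(3, 13)
p_picture_given_not_diamond = F(9, 39)


def bayes(prior_a, likelihood_a, prior_not_a, likelihood_not_a):
    """P(A/B) from P(A), P(B/A), P(NotA), P(B/NotA)."""
    numerator = prior_a * likelihood_a
    return numerator / (numerator + prior_not_a * likelihood_not_a)


p_diamond_given_picture = bayes(
    p_diamond, p_picture_given_diamond,
    p_not_diamond, p_picture_given_not_diamond,
)
p_diamond_given_not_picture = bayes(
    p_diamond, 1 - p_picture_given_diamond,
    p_not_diamond, 1 - p_picture_given_not_diamond,
)

print(p_diamond_given_picture)          # 3/12 in lowest terms
print(p_diamond_given_not_picture)      # 10/40 in lowest terms
print(1 - p_diamond_given_picture)      # P(NotDiamond/Picture)
print(1 - p_diamond_given_not_picture)  # P(NotDiamond/NotPicture)
```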
30
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

For doctors (and their patients)


Medicine=QM?

32
BITS Pilani, Pilani Campus
Application of conditional probability-
Medical diagnosis-1

 Diabetologists ask questions like- Do you
urinate a lot? Do you have sores that heal
slowly? (Refer to the 'Diabetes Symptoms'
box: https://www.cdc.gov/diabetes/basics/symptoms.html).

 Notice that diabetologists ask several
questions, and not one question.
 Response to these questions changes the
probability of having diabetes.

 P(HasDiabetes/UrinateALot)= a.
 P(HasDiabetes/DoesNotUrinateALot)= b.
 Above, a ≠ b.

33
BITS Pilani, Pilani Campus
Application of conditional probability-
Medical diagnosis-2

The data available to a Surgeon turned Data
Scientist is given in the side panel.
http://v1.probmods.org/patterns-of-inference.html

HW: Very easy.
If a patient smokes, what is the probability
he/she has a Lung disease?

HW: Easy.
If a patient smokes, what is the probability he/she
has Chest pain?

HW: Not very easy.
If a patient has Cough, what is the probability that
he/she has Lung Disease?

HW: Not easy.
If a patient has Chest Pain and Cough, what is the
probability that he/she Smokes?
34
BITS Pilani, Pilani Campus
Application of conditional probability-
Medical diagnosis-3

Refer to Excel based HW-03:


Cancer.

https://journals.plos.org/plosone/article/figures?id=10.1371/journal.pone.0195029

35
BITS Pilani, Pilani Campus
a
b
From above-
P(HusbandDoctor/WifeDoctor)= 1/4= 0.25.
P(WifeDoctor/HusbandDoctor)= 0.16.

HW for males (non-Doctors)-
Guess, Probability, a=?
HW for females (non-Doctors)-
Guess, Probability, b=?
HW for everyone-
What might the difference between 16% and 25% indicate?

36
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Cricket, Mythology, Bollywood….

37
Field placement

Where probability of hitting the ball is high, a fielder is placed.


HW: Only for non-cricket lovers-
Is the bowler in this picture a fast bowler or a spinner?

38
BITS Pilani, Pilani Campus
Seven self-declared experts on Coin Tossing have
offered the following strategies for the next several tosses-
 Expert # 1: HHHHHHHHHHHHH…. Every time choose Head.
 Expert # 2: TTTTTTTTTTTTTTTT….
 Expert # 3: HTHTHTHTHTHTHT….
 Expert # 4: HHHTTTHHHTTTHHHTTT....
 Expert # 5: Same as the last outcome.
 Expert # 6: Opposite of the last outcome.
 Expert # 7: Toss another coin, and choose its outcome.
 …..

HW:
Which Expert’s advice should Kohli follow?

39
BITS Pilani, Pilani Campus
For Cinephiles

p Khushi

1-p Gham

HW: Only if you have watched this movie-


p=?

40
BITS Pilani, Pilani Campus
From Mythology

Others

Perfect ‘machine’ Imperfect ‘machines’

41
BITS Pilani, Pilani Campus
Conditional probability- From Mythology

My son The Elephant

Ashwathama
has died

42
BITS Pilani, Pilani Campus
Conditional probability- From Bollywood

A B
HAHK?

A or B?
Hum

43
BITS Pilani, Pilani Campus
A priori vs. Empirical

A priori- 50:50. (Equi-likely)
Empirical- 100:0. (Based on experiments)

44
BITS Pilani, Pilani Campus
HW: Make the toss fair

 If the coin to be used is known to be unfair
(probabilities of head and tail are unequal),
as in the previous slide, how can a fair toss
be ensured with this coin?

 If three equally competent candidates have


applied for the post of Project Leader, how
to select the Project Leader by tossing a
single coin?

45
BITS Pilani, Pilani Campus
Make the game fair

https://www.dutchreferee.com/alternative-penalty-shootout-for-football-attacker-defender-goalkeeper/

HW:
 Suggest two or more rules to make penalty shootouts fairer
(closer to 50:50).
46
BITS Pilani, Pilani Campus
Crime patrol

A burglary has been reported at QM Jewels and Gems store.

Two eyewitnesses gave the following clues-


Mona Jr.: I saw the burglar… the burglar was a man.
Mona Sr.: I saw the burglar… the burglar drove away in a
Maruti car.

Based on these clues the burglar was apprehended.


 Which eyewitness gave the most information?

Data available with the Police-

City population
Male 60,000
Female 40,000

Cars in the city
Maruti 1,00,000
NotMaruti 8,000

47
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Next chapter
Chapter-5: Discrete Probability Distributions

48
Chapter-5:
Discrete Probability Distributions
BITS Pilani
Pilani Campus

49
This chapter
Textbook # Chapter # Chapter Title
1 1 Defining and Collecting Data
1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures
1 4 Basic Probability
1 5 Discrete Probability Distributions
1 6 The Normal Distribution
1 7 Sampling Distributions
Textbook #1
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
50
BITS Pilani, Pilani Campus
Topics
Relevant Pre-recorded lectures,
Chapter-5: Discrete Probability accessible from Taxila
Distributions

1. The probability distribution for a


discrete random variable
2. Binomial distribution
3. Poisson distribution

51
BITS Pilani, Pilani Campus
Most important slide of this course

Learn, understand, and master-

1. Error
2. Frequency/Probability distribution
3. Variation
4. Correlation

Class Interval, CI Freq.
100-300 4
300-500 33
500-700 11
700-900 1
900-1100 1
Total 50
This slide was first used in L-03, 30Jan.

52
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

The probability distribution for a


discrete random variable
53
Discrete probability distributions
 Probability distribution: Probability of all the outcomes.

1. Restaurant
Item Probability
Tea 0.3
Coffee 0.7
Total 1

2. Housing Loan Sanction
Days Probability
4 0.10
5 0.20
6 0.30
7 0.25
7+ 0.15
Total 1

3. Throwing a fair dice
Outcome Probability
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
Total 1

4. Hospital stay
Days Probability
1 0.10
2 0.15
3 0.30
4 0.25
5 0.15
5+ 0.05
Total 1

Discrete means the outcome is a-
 Categorical variable (H/T, Won/Lost/Draw), or
an integer (1, 2, 3, 4…).
 But not a fraction (0.666, 1.414, 1.618, 2.718,
3.1415, 9.11, etc.). Fractions are considered in
Continuous probability distributions, like Normal
distribution.

A probability distribution may also
be expressed by an equation.
54
BITS Pilani, Pilani Campus
Probability distribution
p-184, TB-1.

Distribution of interruptions per day
in a computer network is given
below.

Interruptions per day, x Probability, P(x)
0 0.35
1 0.25
2 0.20
3 0.10
4 0.05
5 0.05
Total 1

 What is the mean number (Expected Value, EV) of
interruptions/day?
 What is the variance in the number of interruptions/day?

55
BITS Pilani, Pilani Campus
Expected Value (EV)
p-184, TB-1.

Interruptions per day, x Probability, P(x) x*P(x)
0 0.35 0.00
1 0.25 0.25
2 0.20 0.40
3 0.10 0.30
4 0.05 0.20
5 0.05 0.25
Total 1 1.40 (EV)

Expected Value (Mean) = 1.4 interruptions/day.

 Mean is called Expected Value (EV).
 EV, µ= E(X)= ∑ xi * P(xi),
where xi is a value of the random variable X and P(xi) is its probability.
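The Expected Value calculation is just a probability-weighted sum. A minimal sketch using the interruptions table from this slide (the variable names are illustrative):

```python
# Interruptions per day and their probabilities (from the slide's table).
xs = [0, 1, 2, 3, 4, 5]
ps = [0.35, 0.25, 0.20, 0.10, 0.05, 0.05]

# A probability distribution must sum to 1.
assert abs(sum(ps) - 1.0) < 1e-9

# EV = sum of x * P(x) over all outcomes.
expected_value = sum(x * p for x, p in zip(xs, ps))
print(f"EV = {expected_value:.2f} interruptions/day")
```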

56
BITS Pilani, Pilani Campus
Standard deviation of interruptions
p-184, TB-1.

Interruptions per day, x Probability, P(x) (x-Mean)^2, C C*P(x)
0 0.35 1.96 0.69
1 0.25 0.16 0.04
2 0.20 0.36 0.07
3 0.10 2.56 0.26
4 0.05 6.76 0.34
5 0.05 12.96 0.65
Total 1 2.04 Variance.

1.43 Standard deviation, σ.
σ = + Sqrt(Variance).

 Expected Value (EV)= 1.40, computed in the previous slide.
 EV, µ= ∑ xi * P(xi).
 Variance, σ^2 = ∑ (xi-µ)^2 * P(xi),
where xi is a value of the random variable and P(xi) is its probability.
 Standard deviation, σ = + Sqrt(Variance).

Potential applications-
 Service Level Agreements (SLA); AMC rate.
 No. of technicians required; No. of spare parts required.
 Reliability studies- Mean Time Between Failures (MTBF).
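The variance and standard deviation in the table can be reproduced the same way. A minimal sketch, again using the interruptions data from the slide (names are illustrative):

```python
from math import sqrt

# Interruptions per day and their probabilities (from the slide's table).
xs = [0, 1, 2, 3, 4, 5]
ps = [0.35, 0.25, 0.20, 0.10, 0.05, 0.05]

# Mean (EV), then the probability-weighted squared deviations from it.
mean = sum(x * p for x, p in zip(xs, ps))
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
std_dev = sqrt(variance)

print(f"Mean = {mean:.2f}, Variance = {variance:.2f}, Std dev = {std_dev:.2f}")
```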

57
BITS Pilani, Pilani Campus
Sources of probability distributions

1. Empirical
 From historical data, experiments.
 Example- Interruptions: Empirical probability distribution.

2. Theoretical
 Binomial
 Poisson
 Example- A fair dice: Theoretical probability distribution (Uniform).
Numerous other theoretical distributions.
 Normal, F, t, Chi-square (later in the course)
 Not in the course- Uniform, Geometric, Hypergeometric, Beta,
Gamma, Maxwell-Boltzmann, Cauchy, Rayleigh, Erlang, …

58
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Binomial distribution

59
Binomial examples

Only Two outcomes.

1. Coin: Head/Tail.
2. Births: Male/Female. Or, Underweight/NotUnderweight.
3. Quality: Ok/Defective.
4. Machine status: Working/Not Working.
5. Diagnostic result: Positive/Negative.
6. Cards: Red/Black. Or, Picture/NotPicture. Or, Diamond/NotDiamond.
7. KBC: Correct/Incorrect.
8. Will vote for: Cong/NotCong.

9. Football: Win/NotWin.
10. Patient: Inpatient/Outpatient.
11. SS Operation: Successful/Failure.
12. Dice: Odd/Even. Or, Prime/NotPrime. Or, <2 / >=2.
13. More: Defaulter/NotDefaulter; OnDuty/OnLeave; Employed/Unemployed;
Graduate/NotGraduate; BSEup/BSEdown; Immigrant/NotImmigrant.

60
BITS Pilani, Pilani Campus
Generate Binomial distribution….

Given: P(Head)= 0.50, P(Tail)= 0.50.
Generate probability distribution for 2 tosses- all possible outcomes and their probabilities.

Tree diagram: each toss branches into H (0.5) and T (0.5); the leaves are HH, HT, TH, TT, each with probability 0.25.

Solution: A fair coin tossed twice.
No. Outcome Probability Calculations
1 HH 0.25 0.5*0.5
2 HT 0.25 0.5*0.5
3 TH 0.25 0.5*0.5
4 TT 0.25 0.5*0.5
Total 1.00

Probability of 0, 1, 2 Heads?
P(x)= 2Cx (0.5)^x (1-0.5)^(2-x)
Excel function-
=BINOM.DIST(x,2,0.5,FALSE)

P(HT)=P(H)*P(T)= 0.5*0.5= 0.25. And so on, when events are independent.

61
BITS Pilani, Pilani Campus
Generate Binomial distribution….

Given: P(Male)= 0.60, P(Female)= 0.40.
Generate probability distribution for 2 Children family- all possible outcomes and their probabilities.

Tree diagram: each birth branches into M (0.6) and F (0.4); the leaves are MM (0.36), MF (0.24), FM (0.24), FF (0.16).

Solution: 2-children family.
No. Outcome Probability Calculations
1 MM 0.36 0.6*0.6
2 MF 0.24 0.6*0.4
3 FM 0.24 0.4*0.6
4 FF 0.16 0.4*0.4
Total 1.00

Probability of 0, 1, 2 Males?
P(x)= 2Cx (0.6)^x (1-0.6)^(2-x)
Excel function-
=BINOM.DIST(x,2,0.6,FALSE)

P(MF)= P(M)*P(F)= 0.6*0.4= 0.24. And so on, when events are independent.
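The link between the outcome table and the formula can be sketched in code: list every outcome, multiply the probabilities along it (independence), and add up the outcomes that have the same number of Males. This is only an illustrative sketch; changing n to 3 or 4 reproduces the larger families.

```python
from itertools import product
from math import comb

p = {"M": 0.6, "F": 0.4}   # P(Male), P(Female) as given on the slide
n = 2                      # number of children

# Enumerate all outcomes (MM, MF, FM, FF) and group them by number of Males.
by_enumeration = {}
for outcome in product("MF", repeat=n):
    prob = 1.0
    for child in outcome:
        prob *= p[child]            # independent births: multiply probabilities
    males = outcome.count("M")
    by_enumeration[males] = by_enumeration.get(males, 0.0) + prob

# The binomial formula gives the same totals directly.
by_formula = {x: comb(n, x) * 0.6**x * 0.4**(n - x) for x in range(n + 1)}

for x in range(n + 1):
    print(x, round(by_enumeration[x], 4), round(by_formula[x], 4))
```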

62
BITS Pilani, Pilani Campus
Generate Binomial distribution….

Given: P(Male)= 0.60, P(Female)= 0.40.
Generate probability distribution for 3 Children family- all possible outcomes and their probabilities.

Tree diagram: each birth branches into M (0.6) and F (0.4); the 8 leaves are the outcomes listed in the table.

Solution: 3-children family.
No Outcome Probability Calculations
1 MMM 0.216 0.6*0.6*0.6
2 MMF 0.144 0.6*0.6*0.4
3 MFM 0.144 0.6*0.4*0.6
4 MFF 0.096 0.6*0.4*0.4
5 FMM 0.144 0.4*0.6*0.6
6 FMF 0.096 0.4*0.6*0.4
7 FFM 0.096 0.4*0.4*0.6
8 FFF 0.064 0.4*0.4*0.4
Total 1.000

Probability of 0, 1, 2, 3 Males?
P(x)= 3Cx (0.6)^x (1-0.6)^(3-x)
Excel function-
=BINOM.DIST(x,3,0.6,FALSE)

P(MFM)= P(M)*P(F)*P(M)= 0.6*0.4*0.6= 0.144. And so on, when events are independent.

63
BITS Pilani, Pilani Campus
Generate Binomial distribution….

Given: P(Male)= 0.60, P(Female)= 0.40.
Generate probability distribution for 4 Children family- all possible outcomes and their probabilities.

Tree diagram: each birth branches into M (0.6) and F (0.4); the 16 leaves are the outcomes listed in the table.

Solution: 4-children family.
No Outcome Probability Calculations
1 MMMM 0.1296 0.6*0.6*0.6*0.6
2 MMMF 0.0864 0.6*0.6*0.6*0.4
3 MMFM 0.0864 0.6*0.6*0.4*0.6
4 MMFF 0.0576 0.6*0.6*0.4*0.4
5 MFMM 0.0864 0.6*0.4*0.6*0.6
6 MFMF 0.0576 0.6*0.4*0.6*0.4
7 MFFM 0.0576 0.6*0.4*0.4*0.6
8 MFFF 0.0384 0.6*0.4*0.4*0.4
9 FMMM 0.0864 0.4*0.6*0.6*0.6
10 FMMF 0.0576 0.4*0.6*0.6*0.4
11 FMFM 0.0576 0.4*0.6*0.4*0.6
12 FMFF 0.0384 0.4*0.6*0.4*0.4
13 FFMM 0.0576 0.4*0.4*0.6*0.6
14 FFMF 0.0384 0.4*0.4*0.6*0.4
15 FFFM 0.0384 0.4*0.4*0.4*0.6
16 FFFF 0.0256 0.4*0.4*0.4*0.4
Total 1.0000

Probability of 0, 1, 2, 3, 4 Males?
P(x)= 4Cx (0.6)^x (1-0.6)^(4-x)
Excel function-
=BINOM.DIST(x,4,0.6,FALSE)

P(MFMF)= P(M)*P(F)*P(M)*P(F)= 0.6*0.4*0.6*0.4= 0.0576. And so on, when events are independent.
64
BITS Pilani, Pilani Campus
From Textbook….1/2
p-183, 191-192, TB-1.

 When customers submit orders online,
the Accounting Information System
reviews the order for possible mistakes.
 Any questionable invoices are tagged and
included in a daily exception report.
 Recent data collected by the company
show that the likelihood is 10% that an
order will be tagged.
[π=P(Y)=0.10, P(N)=0.90].

0.10 Tagged, Y
0.90 NotTagged, N

a. What is the probability that there are 3
tagged order forms in a sample of 4?
(x=3; n=4).
P(Tagged=3)=?
b. What is the probability that there are 3 or
more tagged order forms in a sample of 4?
(x=3, x=4; n=4).
P(Tagged>=3)=?
c. What is the probability that there are less
than 3 tagged forms in a sample of 4? (x=0,
x=1, x=2; n=4).
P(Tagged<3)=?

65
BITS Pilani, Pilani Campus
From Textbook….2/2
p-183, 191-192, TB-1.

P(Y)=0.1, P(N)=0.9. n=4.

Outcome no Outcomes Calculations Probability
1 YYYY 0.1*0.1*0.1*0.1 0.0001
2 YYYN 0.1*0.1*0.1*0.9 0.0009
3 YYNY 0.1*0.1*0.9*0.1 0.0009
4 YYNN 0.1*0.1*0.9*0.9 0.0081
5 YNYY 0.1*0.9*0.1*0.1 0.0009
6 YNYN 0.1*0.9*0.1*0.9 0.0081
7 YNNY 0.1*0.9*0.9*0.1 0.0081
8 YNNN 0.1*0.9*0.9*0.9 0.0729
9 NYYY 0.9*0.1*0.1*0.1 0.0009
10 NYYN 0.9*0.1*0.1*0.9 0.0081
11 NYNY 0.9*0.1*0.9*0.1 0.0081
12 NYNN 0.9*0.1*0.9*0.9 0.0729
13 NNYY 0.9*0.9*0.1*0.1 0.0081
14 NNYN 0.9*0.9*0.1*0.9 0.0729
15 NNNY 0.9*0.9*0.9*0.1 0.0729
16 NNNN 0.9*0.9*0.9*0.9 0.6561
Sum 1.00

a. What is the probability that there are 3 tagged
order forms in the sample of 4?
(x=3; n=4).
P(Tagged=3)= 0.0036, or 0.36%.

b. What is the probability that there are 3 or more
tagged order forms in the sample of four?
(x=3, x=4; n=4).
P(Tagged>=3)= 0.0037, or 0.37%.

c. What is the probability that there are less than 3
tagged forms in the sample of 4?
(x=0, x=1, x=2; n=4).
P(Tagged<3)= 0.9963, or 99.63%.

P(3Y 1N) = 4 x 0.0009 = 0.0036, or 0.36%.
P(>=3Y) = P(3Y 1N) + P(4Y 0N)
= 4 x 0.0009 + 1 x 0.0001 = 0.0036 + 0.0001 = 0.0037, or 0.37%.
P(<3Y) = P(0Y 4N) + P(1Y 3N) + P(2Y 2N)
= 0.6561 + 4 x 0.0729 + 6 x 0.0081 = 0.6561 + 0.2916 + 0.0486 = 0.9963, or 99.63%.

Probability of 0, 1, 2, 3, 4 Tagged
P(x)= 4Cx (0.1)^x (1-0.1)^(4-x)
Excel function-
=BINOM.DIST(x,4,0.1,FALSE)
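The three answers can be cross-checked with the binomial formula instead of the 16-row table. A minimal sketch (the helper name binom_pmf is illustrative; it plays the role of Excel's =BINOM.DIST(x,n,π,FALSE)):

```python
from math import comb


def binom_pmf(x, n, pi):
    """P(exactly x successes in n independent trials), success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


n, pi = 4, 0.10  # sample of 4 order forms, P(Tagged) = 0.10

p_exactly_3 = binom_pmf(3, n, pi)                            # part a
p_at_least_3 = sum(binom_pmf(x, n, pi) for x in (3, 4))      # part b
p_less_than_3 = sum(binom_pmf(x, n, pi) for x in (0, 1, 2))  # part c

print(f"P(Tagged=3)  = {p_exactly_3:.4f}")
print(f"P(Tagged>=3) = {p_at_least_3:.4f}")
print(f"P(Tagged<3)  = {p_less_than_3:.4f}")
```

Parts b and c are complements, so their probabilities add up to 1.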
66
BITS Pilani, Pilani Campus
From Textbook
p-193, TB-1.

Wendy’s fast food


 Orders filled correctly- 86.8% [P(C), π=0.868].
 Three orders have been placed (n=3).

P(C)=0.868, P(I)=0.132. n=3.


Outcome no Outcomes Calculations Probability
1 CCC 0.868*0.868*0.868 0.6540
2 CCI 0.868*0.868*0.132 0.0995
3 CIC 0.868*0.132*0.868 0.0995
4 CII 0.868*0.132*0.132 0.0151
5 ICC 0.132*0.868*0.868 0.0995
6 ICI 0.132*0.868*0.132 0.0151
7 IIC 0.132*0.132*0.868 0.0151
8 III 0.132*0.132*0.132 0.0023
Sum 1.00

Probability of 0, 1, 2, 3 Correct
P(x)= 3Cx (0.868)^x (1-0.868)^(3-x)
Excel function-
=BINOM.DIST(x,3,0.868,FALSE)

What is the probability-
a. All three orders are filled correctly? (x=3). P(3C 0I) = 0.6540.
b. None of the three orders are filled correctly? (x=0). P(0C 3I) = 0.0023.
c. At least 2 out of 3 orders are filled correctly? (x=2) + (x=3). P(2C 1I) + P(3C 0I) = 3*0.0995 + 0.6540= 0.9525.
67

BITS Pilani, Pilani Campus


Remaining chapter will be done in the next session.

68
BITS Pilani, Pilani Campus
Quantitative Methods

Lecture-8 23Feb’22. 7-9 pm.


BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

1
Items of interest
1. Quiz-1 will close on 24 Feb. Syllabus- Chapters 1 to 4,
TB-1. Last date will not be extended.

2. PPT for today’s session is available in advance at Taxila,


under Topic-1.

3. Students who have joined late should refer to the Course


Handout available at Taxila (eLearn) and PPT of Lecture-
1 (16Jan) available at Impartus for prescribed Textbooks,
Evaluation plan, coverage, syllabus, etc.

4. Post your messages only on Discussion Forum at Taxila,


and not at Impartus.

2
BITS Pilani, Pilani Campus
The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

Concept first… calculations later.


Answer first..… calculations later.

3
BITS Pilani, Pilani Campus
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan

Mid-Term Test Syllabus


1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability 13Feb, 15Feb, 20Feb
1 5 Discrete Probability Distributions 20Feb, 23Feb
1 6 The Normal Distribution 23Feb
1 7 Sampling Distributions
Textbook #1
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
4
BITS Pilani, Pilani Campus
Chapter-5:
Discrete Probability Distributions
BITS Pilani
Pilani Campus

5
Topics
Relevant Pre-recorded lectures,
Chapter-5: Discrete Probability accessible from Taxila
Distributions

1. The probability distribution for a


discrete random variable

Will be done today


2. Binomial distribution (remaining part)
3. Poisson distribution

6
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Binomial distribution

7
Binomial distribution-
Formula and Excel function
Invoice Audit. p-191, TB-1.
P(Tagged, Y), π= 0.1; P(NotTagged, N)= 1- π= 1-0.1= 0.9. n=4.

Outcomes: Success or Failure
(for, Head or Tail; Male or Female; Ok or Defective…)
π= Probability of Success.
1-π= Probability of Failure.
n= number of Trials.
(No. of Tosses, Children, Lot size inspected…).
x= number of Successes (0, 1, 2, …. n).
n-x= number of Failures (0, 1, 2, …. n).
P(x)= Probability of x successes happening.

P(x) = nCx π^x (1-π)^(n-x)
nCx = n! / (x! (n-x)!). MS Excel: =Combin(n,x). =Combin(4,3)=4.
x! MS Excel: =Fact(x). =Fact(5)=120. =Fact(0)=1.

P(x)= 4Cx (0.1)^x (1-0.1)^(4-x)
P(0)= 4C0 (0.1)^0 (1-0.1)^(4-0) = 1 * 0.6561= 0.6561 None tagged.
P(1)= 4C1 (0.1)^1 (1-0.1)^(4-1) = 4 * 0.0729= 0.2916 One tagged.
P(2)= 4C2 (0.1)^2 (1-0.1)^(4-2) = 6 * 0.0081= 0.0486 Two tagged.
P(3)= 4C3 (0.1)^3 (1-0.1)^(4-3) = 4 * 0.0009= 0.0036 Three tagged.
P(4)= 4C4 (0.1)^4 (1-0.1)^(4-4) = 1 * 0.0001= 0.0001 Four tagged.

P(Y)=0.1, P(N)=0.9. n=4.
Outcome no Outcomes Calculations Probability
1 YYYY 0.1*0.1*0.1*0.1 0.0001
2 YYYN 0.1*0.1*0.1*0.9 0.0009
3 YYNY 0.1*0.1*0.9*0.1 0.0009
4 YYNN 0.1*0.1*0.9*0.9 0.0081
5 YNYY 0.1*0.9*0.1*0.1 0.0009
6 YNYN 0.1*0.9*0.1*0.9 0.0081
7 YNNY 0.1*0.9*0.9*0.1 0.0081
8 YNNN 0.1*0.9*0.9*0.9 0.0729
9 NYYY 0.9*0.1*0.1*0.1 0.0009
10 NYYN 0.9*0.1*0.1*0.9 0.0081
11 NYNY 0.9*0.1*0.9*0.1 0.0081
12 NYNN 0.9*0.1*0.9*0.9 0.0729
13 NNYY 0.9*0.9*0.1*0.1 0.0081
14 NNYN 0.9*0.9*0.1*0.9 0.0729
15 NNNY 0.9*0.9*0.9*0.1 0.0729
16 NNNN 0.9*0.9*0.9*0.9 0.6561
Sum 1.00

Using MS Excel
Probability of 0, 1, 2, 3, 4 Tagged?
Tagged, x Probability, P(x) MS Excel function
0 0.6561
1 0.2916 =BINOM.DIST(1,4,0.1,FALSE)
2 0.0486
3 0.0036 =BINOM.DIST(3,4,0.1,FALSE)
4 0.0001
Total 1.0000

Excel function-
=BINOM.DIST(x,4,0.1,FALSE)
In general, =BINOM.DIST(x,n,π,FALSE)
TRUE- for Cumulative.

8

8

BITS Pilani, Pilani Campus


Binomial distribution requirements
Requirements
a. Only two outcomes are possible for each trial: Success and Failure. Head/Tail; Male/Female, Ok/Defective, etc.
b. Probability of Success + Probability of Failure = 1. P(S)= π, P(F)= 1-π. P(H)=0.5, P(T)=0.5; P(M)=0.52, P(F)=0.48.
c. The probability of outcomes remains constant. π does not change. Coin properties do not change.
d. The trials are independent. Therefore, P(A and B)= P(A) * P(B).

Independent trials meaning-


English-
 If the outcome of a toss cannot be predicted by the outcomes of previous tosses,
then outcomes are independent.
 If the gender of a newly born cannot be predicted by the gender of previous births,
then gender is independent.
 If daily change in stock market index like Sensex cannot be predicted by previous
days’ changes, then daily changes are independent.

Statistical-
 If P(A)=P(A/B), then A is independent of B, i.e., B does not affect probability of A.
 P(King)= 4/52=1/13, P(King/Red)= 2/26= 1/13. Since P(King)= P(King/Red),
the event King is independent of event Red.
 P(King)= 4/52=1/13, P(King/Picture)= 4/12=1/3. Since P(King)≠
P(King/Picture), the event King is dependent on event Picture.
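The King/Red/Picture check can be reproduced by counting cards in a standard 52-card deck. A minimal sketch; the helper prob and the card labels are illustrative choices, not part of the slide.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def prob(event, given=lambda card: True):
    """P(event / given), computed by counting cards."""
    pool = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))


def is_king(card):
    return card[0] == "K"


def is_red(card):
    return card[1] in ("Hearts", "Diamonds")


def is_picture(card):
    return card[0] in ("J", "Q", "K")


p_king = prob(is_king)
p_king_given_red = prob(is_king, given=is_red)
p_king_given_picture = prob(is_king, given=is_picture)

print(p_king, p_king_given_red, p_king_given_picture)
print("King independent of Red:    ", p_king == p_king_given_red)
print("King independent of Picture:", p_king == p_king_given_picture)
```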

9
BITS Pilani, Pilani Campus
Binomial probability distributions
Generate probability distributions for-

1. No. of Heads in 2 tosses. P(Head)= 0.50.


2. No. of Boys in 3-children families. P(Male)= 0.52.
3. No. of Breakdowns out of 4 m/c. P(BDN)= 0.008.
4. No of days stock market is Up in 5 days. P(Up)= 0.55.
5. No. of Rainy days in 10 days. P(Rain)= 0.75.
6. No. of Correct answers in 10 MCQs. P(Correct)= 0.20.
7. No. of Males in 20-employees BPO. P(Male)= 0.20.
8. No. of Defectives in a lot of 30 parts. P(Def.)= 0.03.
Assume the events are independent.

Solution given in the next slide.

10
BITS Pilani, Pilani Campus
Solutions- previous slide

Charts 1 to 8: Binomial probability distributions for the eight cases listed in the previous slide.

P(x)= nCx (π)^x (1-π)^(n-x)

Excel function-
=BINOM.DIST(x,n,π,FALSE)
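As an alternative to Excel, the eight distributions can be generated with a short script. This is a sketch under the slide's assumption of independent events; the case labels and the helper name are illustrative.

```python
from math import comb

# (description, number of trials n, probability of success pi) from the previous slide
cases = [
    ("Heads in 2 tosses", 2, 0.50),
    ("Boys in 3-children families", 3, 0.52),
    ("Breakdowns out of 4 m/c", 4, 0.008),
    ("Days stock market is Up in 5 days", 5, 0.55),
    ("Rainy days in 10 days", 10, 0.75),
    ("Correct answers in 10 MCQs", 10, 0.20),
    ("Males in 20-employees BPO", 20, 0.20),
    ("Defectives in a lot of 30 parts", 30, 0.03),
]


def binomial_distribution(n, pi):
    """List of P(x) for x = 0, 1, ..., n."""
    return [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]


distributions = {name: binomial_distribution(n, pi) for name, n, pi in cases}

for name, probs in distributions.items():
    print(name)
    for x, p in enumerate(probs):
        print(f"  x={x:>2}  P(x)={p:.4f}")
```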

11
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Poisson distribution

12
What is the Probability distribution?
1. Avg no. of accidents= 4/day. Probability of 0, 1, 2, 3, 4, 5… accidents, x?
2. Avg no. of potholes= 6/km. Probability of 0, 1, 2, 3, 4, 5… potholes, x?
3. Avg no. of goals= 3.2/match. Probability of 0, 1, 2, 3, 4, 5… goals, x?
4. Avg no. of teeth cavities= 3.28/patient. Probability of 0, 1, 2, 3, 4, 5… teeth cavities, x?
5. Avg no. of shooting stars= 0.3/hour. Probability of 0, 1, 2, 3, 4, 5… shooting stars, x?
6. Avg no. of typos= 2.7/page. Probability of 0, 1, 2, 3, 4, 5… typos, x?

 Distribution of the data is essential to make decisions.
 Often distribution of the data is not available.
 But, Mean (Avg.) of the data may be available.
 If only Mean is known, then Poisson distribution
formula can be used provided certain conditions are
met.

Solution of 1 above. Mean=4. Probability of 0, 1, 2, 3…accidents?
x Probability, P(x)
0 0.0183
1 0.0733
2 0.1465
3 0.1954
4 0.1954
5 0.1563
6 0.1042
7 0.0595
8 0.0298
9 0.0132
10 0.0053
… …
Sum 1.0000

Excel function-
=POISSON.DIST(x,4,FALSE)

13
BITS Pilani, Pilani Campus
Poisson distribution formula
P(x)= e^(-λ) * λ^x / x!

λ= Mean (average) number of events, is given.

e= 2.718… a constant.
x= 0, 1, 2, 3…. number of events (accidents, cavities, goals, potholes…).

P(x)= probability of x events happening.

x!= x * (x-1)*(x-2)…2*1. Example: 5!=120, 4!=24, 3!=6.


MS Excel function- =FACT(5) gives 120.
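The formula translates directly into code. A minimal sketch for the accidents example (λ = 4/day); the helper name poisson_pmf is illustrative and mirrors Excel's =POISSON.DIST(x,Mean,FALSE).

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(exactly x events) when the mean number of events is lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 4  # average number of accidents per day
table = [(x, poisson_pmf(x, lam)) for x in range(11)]

for x, p in table:
    print(f"{x:>2}  {p:.4f}")
```

The printed values for x = 0 to 10 do not add up to exactly 1, because x can in principle be any non-negative integer; the remaining probability sits in the tail beyond 10.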

14
BITS Pilani, Pilani Campus
From Textbook
p-196, TB-1.

Suppose the mean number of customers (λ) who
arrive per minute at the bank during the noon-to-1 PM
hour is 3.
Average no. of customers per minute, λ = 3.

 What is the probability exactly two customers
will arrive in a given minute?
P(x=2)= 0.2240, or 22.4%.

 What is the probability more than two
customers will arrive in a given minute?
P(x>2)= P(3) + P(4) + P(5) + P(6) + P(7) +… or
= 1 – P(0) – P(1) – P(2)
= 1 - 0.0498 - 0.1494 - 0.2240
= 0.5768, or 57.68%.

P(0)= 0.0498. No customer arrives.
P(1)= 0.1494. 1 customer arrives.
P(2)= 0.2240. 2 customers arrive.
P(3)= 0.2240. 3 customers arrive.
P(4)= 0.1680. And so on.
P(5)= 0.1008.
…
P(8)= e^(-3) * 3^8 /8! = 0.0081.
P(9)= …
P(10)= …
…
P(200)= …

Excel function-
=POISSON.DIST(x,3,FALSE)
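Because x has no upper limit, "more than two" is easiest to compute through the complement, 1 – P(0) – P(1) – P(2). A minimal sketch for the bank example (the helper name is illustrative):

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(exactly x arrivals) when the mean number of arrivals is lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 3  # mean customers per minute

p_exactly_2 = poisson_pmf(2, lam)

# Complement rule: subtract the finite part (x = 0, 1, 2) from 1.
p_more_than_2 = 1 - sum(poisson_pmf(x, lam) for x in range(3))

print(f"P(x=2) = {p_exactly_2:.4f}")
print(f"P(x>2) = {p_more_than_2:.4f}")
```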
15
BITS Pilani, Pilani Campus
From Textbook
p-197, TB-1.

The number of work-related injuries in a glass
factory can be approximated by the Poisson
distribution, with a mean (λ) of 2.5 work-related
injuries a month.
Average no. of injuries, λ = 2.5/month.

 What is the probability that in a given month, no
work-related injuries occur?
P(x=0)= 0.0821, or 8.21%.

 What is the probability that at least one work-related
injury occurs?
P(x>=1)= P(1) + P(2) + P(3) + P(4) + P(5) + P(6) + P(7) +… or
= 1 – P(0)
= 1 - 0.0821 = 0.9179, or 91.79%.

P(0)= 0.0821. No injury reported.
P(1)= 0.2052. 1 injury reported.
P(2)= 0.2565. 2 injuries reported.
P(3)= 0.2138. 3 injuries reported.
P(4)= 0.1336. And so on.
P(5)= 0.0668.
…
P(8) = e^(-2.5) * 2.5^8 /8! = 0.0031.
P(9) = …
P(10) = …
…
P(200)= …

Excel function-
=POISSON.DIST(x,2.5,FALSE)
16
BITS Pilani, Pilani Campus
Mean no. of customers per minute, λ= 3.

Using MS Excel
x Probability, P(x) MS Excel function
0 0.0498 =POISSON.DIST(0,3,FALSE)
1 0.1494 =POISSON.DIST(1,3,FALSE)
2 0.2240 …
3 0.2240 …
4 0.1680 …
5 0.1008 …
6 0.0504 …
7 0.0216 …
8 0.0081 =POISSON.DIST(8,3,FALSE)
9 0.0027 …
10 0.0008 …
… … …
Sum 1.0000

Excel function-
=POISSON.DIST(x,Mean,FALSE)
Here, =POISSON.DIST(x,3,FALSE)
TRUE is for Cumulative probability.
x=0, 1, 2, 3, 4 …….

Using the Formula
P(0)= 0.0498.
P(1)= 0.1494.
P(2)= 0.2240.
P(3)= 0.2240.
P(4)= 0.1680.
P(5)= 0.1008.
….
P(8) = e^(-3) * 3^8 /8! = 0.0081, or 0.81%
P(9) = …
P(10) = …
…
P(200)= …

17
BITS Pilani, Pilani Campus
Poisson probability distributions
Generate probability distributions for-

1. Average no. of accidents reported= 4/day.


2. Average no. potholes observed= 6/km.
3. Average no. goals scored in FIFA World cup matches= 3.2/match.
4. Average no. of decayed or missing teeth= 3.28/person.
5. Average no. of shooting stars sighted= 0.3/hour.
6. Average no. of typos noticed= 2.7/page.

Solution given in the next slide.

18
BITS Pilani, Pilani Campus
Poisson probability distributions

Excel function-
=POISSON.DIST(x,Mean,FALSE)

x= 0, 1, 2, 3….20….

19
BITS Pilani, Pilani Campus
On Poisson distribution

 For Poisson distribution, Variance = Mean.
 A tip: If the variance of a data set is approximately equal to its mean,
its distribution may be approximated by the Poisson distribution.
 Applications- arrivals of customers, machine
breakdowns, complaints/accidents, phone calls….

Requirements-
 The quantity of interest is the number of events in a given
interval (time/length/area).
 The probability that an event occurs is the same in every
interval of equal size.
 The number of events that occur in one interval is
independent of the number of events that occur in another
interval.
 The probability that two or more events will occur in
an interval approaches zero as the interval becomes
smaller.
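The Variance = Mean property can be checked numerically from the formula. A minimal sketch, assuming λ = 3 and cutting the sum off at x = 59, beyond which the probabilities are negligibly small:

```python
from math import exp, factorial

lam = 3.0
xs = range(60)  # truncation point is an assumption; the tail beyond it is negligible

pmf = [exp(-lam) * lam**x / factorial(x) for x in xs]

mean = sum(x * p for x, p in zip(xs, pmf))
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, pmf))

print(f"Mean = {mean:.4f}, Variance = {variance:.4f}")
```

For real data the two numbers will rarely match exactly; the tip on the slide is about them being close.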

20
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Mean and Variance:


Binomial and Poisson distributions
Mean and Variance of Binomial and
Poisson distributions
Binomial Distribution:
Mean = No of trials * Probability of success.
= n * π.
Variance= No of trials * Probability of Success * Probability of Failure.
= n * π * (1-π).

Poisson Distribution:
Mean = λ, given or known.
Variance= Mean= λ.

Examples:
Binomial distribution: Mean= n*π. Variance= n*π*(1-π). Stdev= Sqrt(Variance).
Coin Tossing: n= 100 (No. of Tosses). P(Head), π= 0.5. Mean no. of Heads= 100*0.5= 50. Variance= 100*0.5*0.5= 25. Stdev= Sqrt(25)= 5.
Children in family: n= 4 (Children in family). P(Male), π= 0.6. Mean no. of Males= 4*0.6= 2.4. Variance= 4*0.6*0.4= 0.96. Stdev= Sqrt(0.96)= 0.98.
Quality inspection: n= 400 (Lot size). P(Defective), π= 0.03. Mean no. of Defectives= 400*0.03= 12. Variance= 400*0.03*0.97= 11.64. Stdev= Sqrt(11.64)= 3.4.
MCQ questions: n= 30 (No. of questions). P(Correct), π= 0.20. Mean no. of Correct answers= 30*0.20= 6. Variance= 30*0.20*0.80= 4.8. Stdev= Sqrt(4.8)= 2.2.

Poisson distribution: Mean= λ. Variance= Mean= λ.
Mean no. of injuries, λ= 3/month. Variance=Mean= λ= 3.
Mean no. of potholes, λ= 6/km. Variance=Mean= λ= 6.
Mean no. of goals scored, λ= 3.2/match. Variance=Mean= λ= 3.2.
Mean no. of shooting stars, λ= 0.3/hour. Variance=Mean= λ= 0.3.
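The binomial examples can be recomputed with the two formulas. A minimal sketch (the function and variable names are illustrative):

```python
from math import sqrt

# (example, number of trials n, probability of success pi) from the slide
examples = [
    ("Coin Tossing", 100, 0.5),
    ("Children in family", 4, 0.6),
    ("Quality inspection", 400, 0.03),
    ("MCQ questions", 30, 0.20),
]


def binomial_summary(n, pi):
    """Mean, variance and standard deviation of a binomial distribution."""
    mean = n * pi
    variance = n * pi * (1 - pi)
    return mean, variance, sqrt(variance)


summaries = {name: binomial_summary(n, pi) for name, n, pi in examples}

for name, (mean, variance, stdev) in summaries.items():
    print(f"{name}: Mean={mean:.2f}, Variance={variance:.2f}, Stdev={stdev:.2f}")
```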
22
BITS Pilani, Pilani Campus
HW-04:
Download Excel file at Taxila. Below Topic-2.

Contents-
1. EV- Expected Value and Variance
2. Binomial
3. BetiBachao
4. Quality
5. MumbaiRains
6. Poisson
7. Ambulance
8. eLudo
9. BigBFun

23
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Nice to know
Origin of ….

Binomial distribution:
 Introduced by Jakob Bernoulli, from the
family of famous Swiss mathematicians.
 Published in 1713, posthumously.

Poisson distribution:
 Introduced in 1837 by Poisson, a French
mathematician.
 While studying wrongful convictions.

Binomial distribution in action

Left: Boy (Head), Right: Girl (Tail)


Click to Play: http://www.mathsisfun.com/data/quincunx.html
• Called: Galton Box, Bean Machine, Quincunx.
• Monte Carlo simulation has been used.

Play and Learn

HW

• Anyone can open a number-lock. Why do people then use number-locks?

• Jewel Thief: What is the average number (Expected value) of trials needed to open a 4-digit number lock? Assume only 0, 1, 2, …8, and 9 are allowed.

• The Digital Jewel Thief: What is the average number (Expected value) of trials needed to crack an 8-digit password? Assume only 0, 1, 2, …8, and 9 are allowed.
HW

[Figure: cars numbered 1 2 3 4 5 6.]

• Guess which car the PM is in: 1, 2, 3, 4, 5, or 6 (Q1)?

• What is the average number (Expected value) of guesses required to get the correct answer to Q1?

• What is the variance of the number of guesses for Q1?

• If one identical car is added to the motorcade, by how much is the risk to the PM reduced and the risk to the attacker increased?

HW: KBC for Monkeys
A monkey has reached the hot seat of KBC. The
monkey picks the answers randomly (say, it
uses a four-sided fair ‘dice’ with A, B, C, and D
on the faces).
1. What is the probability that the monkey answers
all 15 questions correctly without using any
lifeline?
2. What is the average amount (Expected value)
won by the monkey? What is the variance?
3. What is the probability that option A gets chosen
on all 15 questions?
4. Suggest strategies to choose life lines- when to use
Ask the Expert, take Audience poll, or use 50:50
life line?
5. How many people are in the audience? (Hint: Use
Mode value and Minimum % responses of the
audience poll result).

From Mythology

HW:
What is the probability of having 100 sons and
1 daughter in a family of 101 children?

Application of Binomial distribution

• A town has 600 families with 2 children and 100 families with one child.

• What is the expected financial outlay if a Fixed deposit of Rs 22,200/- is to be made in the name of the girl child for each family which has only one girl child, and Rs 15,200/- for each girl child if the family has only two girl children?

This problem also appears in HW-04, Excel file at Taxila.

https://vikaspedia.in/social-welfare/women-and-child-development/child-development-1/girl-child-welfare/state-wise-schemes-for-girl-child-welfare/sivagami-ammaiyar-memorial-girl-child-protection-scheme

For doctors (and their patients)

Application of Poisson distribution

The manager of a factory has invited bids for an ambulance service to take injured workers to a nearby hospital.
Bids submitted by three service providers are as follows-
1. AlwaysFast services: Rs 3,000/case.
2. BestAndFast services: Rs 4,000/case up to 3 cases in a month and Rs 2,000/case if cases in a month exceed 3.
3. CarryFast services: A fixed amount Rs 8,000/month and Rs 3,000/case if cases in a month exceed 5.
The average number of accidents reported in the factory is 4.2/month. Assume the distribution of accidents is Poisson.

Which bid is the most economical?

This problem also appears in HW-04, Excel file at Taxila.
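
One way to compare the bids is to compute the expected monthly cost of each under the Poisson distribution. The sketch below does that. The bid wording is open to interpretation, so the cost rules are assumptions and not the official HW-04 solution: bid 2 is read as Rs 4,000 for each of the first 3 cases and Rs 2,000 for each further case, and bid 3 as Rs 8,000 covering the first 5 cases plus Rs 3,000 for each case beyond 5. All function names are hypothetical.

```python
import math

LAM = 4.2  # mean accidents per month, from the slide

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def cost_always_fast(k):
    return 3000 * k

def cost_best_and_fast(k):
    # Assumption: first 3 cases at Rs 4,000 each, further cases at Rs 2,000 each.
    return 4000 * min(k, 3) + 2000 * max(k - 3, 0)

def cost_carry_fast(k):
    # Assumption: Rs 8,000 covers up to 5 cases; Rs 3,000 per case beyond 5.
    return 8000 + 3000 * max(k - 5, 0)

def expected_cost(cost_fn, lam=LAM, k_max=60):
    """Expected monthly cost: sum of cost(k) * P(X = k); the tail beyond k_max is negligible."""
    return sum(cost_fn(k) * poisson_pmf(k, lam) for k in range(k_max + 1))

for name, fn in [("AlwaysFast", cost_always_fast),
                 ("BestAndFast", cost_best_and_fast),
                 ("CarryFast", cost_carry_fast)]:
    print(f"{name}: expected cost per month = Rs {expected_cost(fn):,.0f}")
```

A different reading of the bids changes only the three cost functions; `expected_cost` stays the same.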


Next chapter
Chapter-6: The Normal Distribution
This chapter
Textbook # Chapter # Chapter Title
1 1 Defining and Collecting Data
1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures
1 4 Basic Probability
1 5 Discrete Probability Distributions
1 6 The Normal Distribution Textbook #1
1 7 Sampling Distributions
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models Textbook #2

Most important slide of this course

Learn, understand, and master-

1. Error

2. Frequency/Probability distribution
Class Interval, CI    Freq.
100-300               4
300-500               33
500-700               11
700-900               1
900-1100              1
Total                 50

3. Variation

4. Correlation
This slide was first used in L-03, on 30Jan.


The Normal distribution


Availability of data
 Distribution of the data is essential to make decisions.
 Often distribution of the data is not available.
 But, Mean and Standard deviation may be available.

Typical Continuous Probability distributions

[Figure source: Relationship between Obesity and Periodontal Diseases in Saudi Women (Asir Region): A Prospective Study]

If a few conditions are met, then the distribution can be generated from only the Mean and Standard deviation of the data.

Distribution of data

Distributions of data (vertical rectangles) from diverse phenomena such as weight, height, IQ, BMI, measurement errors, etc. have been found to have similar characteristics-
• Nearly symmetrical about the mean value,
• Concentration towards the mean value, and
• Very few extremely small or large values.

Examples shown:
Inner diameter of bearing rings: N=120. Mean= 62.998 mm. Stdev= 0.020 mm.
IQ score: N=736,808.
Birthweight (kg): N=3,326. Mean= 3.39 kg. Stdev= 0.55 kg.

Resemblance to the theoretical Normal distribution
Normal distribution: A theoretical distribution.

Inner diameter of bearing rings: N=120. Mean= 62.998 mm. Stdev= 0.020 mm.
IQ score: N=736,808.
Birthweight (kg): N=3,326. Mean, μ= 3.39 kg. Stdev, σ= 0.55 kg.
μ= 3.39. μ+σ= 3.39+0.55= 3.94. μ+2σ= 3.39+2*0.55= 4.49.
Area between 3.39 and 3.94: 34.13%. Area between 3.94 and 4.49: 13.59%.

Obtaining the probabilities
If the mean and standard deviation of a data set are known, then the data can be approximated by the Normal distribution provided the following conditions are met-
• Distribution of the data is nearly symmetric,
• A vast majority of the observations are close to the mean,
• Very few extremely small or large values.

Three ways to obtain probabilities from a Normal distribution-
1. From the graph…. Not easy, Not accurate.
2. Use MS Excel function. =NORM.DIST(x,Mean,Stdev,TRUE)
3. From the z Table (P-540 & 541, TB-1), using z=(x-Mean)/Stdev.

Obtaining probability values using MS Excel is easier, faster and more accurate than using the z table.

Normal distribution
1. Normal distribution curve is symmetric about the Mean, and a vast majority of observations lie close to the mean.
2. Other names- Bell-shaped curve, Law of error and Gaussian.
3. It is a continuous distribution- its x-axis can have fractional values like 3.39, 5.55, etc. (weight, diameter…) and the curve ranges from –infinity to +infinity on the x-axis.
4. Area under the curve represents probability. Total area under the curve is 1, that is, probability of all the events=1.
5. A Normal distribution is described by two parameters- mean (μ) and standard deviation (σ).
6. Reading probabilities from the graph is not easy. Therefore, published tables are used to obtain area (probability) of different regions.
7. Textbooks provide only one table, for mean, μ= 0 and standard deviation, σ = 1, called the z table, or Cumulative Standardized Normal distribution table (p-540-541, TB-1).
8. z value is used to get probabilities from this table when μ≠ 0 or σ ≠ 1.
9. Obtaining the probabilities however is easier with MS Excel.

Probability (or Area)=? Use the z table or =Norm.Dist(x,Mean,Stdev,True)
X=? Use =Norm.Inv(Probability,Mean,Stdev)
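
For readers without MS Excel, the same three quantities (z score, area to the left of x, and x for a given area) can be obtained in Python. The sketch below uses the standard library's `statistics.NormalDist` as a stand-in for NORM.DIST and NORM.INV; the helper names are made up, and the birthweight figures (μ= 3.39 kg, σ= 0.55 kg) are the ones quoted on the earlier slide.

```python
from statistics import NormalDist

birthweight = NormalDist(mu=3.39, sigma=0.55)

def z_score(x, mean, stdev):
    """z = (x - Mean) / Stdev, the value looked up in the z table."""
    return (x - mean) / stdev

def area_left(dist, x):
    """Area to the left of x, like =NORM.DIST(x,Mean,Stdev,TRUE)."""
    return dist.cdf(x)

def x_for_area(dist, area):
    """x value for a given area to its left, like =NORM.INV(Area,Mean,Stdev)."""
    return dist.inv_cdf(area)

one_sd = area_left(birthweight, 3.94) - area_left(birthweight, 3.39)
two_sd = area_left(birthweight, 4.49) - area_left(birthweight, 3.94)
print(f"z for 3.94 kg: {z_score(3.94, 3.39, 0.55):.2f}")
print(f"Area between mean and mean + 1 Stdev: {one_sd:.4f}")
print(f"Area between mean + 1 Stdev and mean + 2 Stdev: {two_sd:.4f}")
print(f"Median birthweight: {x_for_area(birthweight, 0.5):.2f} kg")
```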


Remaining chapter will be covered in the next session.

Quantitative Methods

Lecture-9 27Feb’22. 10.30am- 12.30pm.


BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

Items of interest
1. Quiz-1: Corrected answer for the question on defectives in 25 nos of parts, Minor defect (6) and Major defect (2): P(Major/Defect)= 2/8 and not 6/8. Correction will be made in 4-5 days.

2. Extra class on 1 March, Tue 7-9pm.

3. For today’s class, keep ready: “The Cumulative Standardized Normal Distribution” table on p-540 & 541.

4. For next class, keep ready: “Critical Values of t” table on p-542 & 543.

5. PPT for today’s session is available in advance at Taxila, under Topic-1.

6. Students who have joined late should refer to the Course Handout available at Taxila (eLearn) and PPT of Lecture-1 (16Jan) available at Impartus for prescribed Textbooks, Evaluation plan, coverage, syllabus, etc.

7. Post your messages only on Discussion Forum at Taxila, and not at Impartus.

QM Mid-Term test

WILP will share common instructions applicable to all the courses.

QM Mid-Term test-
1. Syllabus: Chapter 1 to 8; TB-1.
2. Number of questions-5-7; spread over all the chapters.
3. Numerical: Non-Numerical; planning for … 70:30.
4. Read QP instructions carefully. Questions may have sub-parts.
5. Ensure what is asked is answered, and completely.
6. Show all the workings- no marks without detailed workings.
7. Only handwritten answers will be evaluated. No computer-typed answers, no
screenshots, no computer outputs.
8. Explain well. Express well.

A tip/a warning/an advice- TB-1


 No unfair means.
 Do not plead for fairness with other questions or with other
students if unfair practice is used.

The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

Concept first… calculations later.


Answer first..… calculations later.

Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data 16Jan, 23Jan
1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability 13Feb, 15Feb, 20Feb
1 5 Discrete Probability Distributions 20Feb, 23Feb
1 6 The Normal Distribution 23Feb, 27Feb
1 7 Sampling Distributions 27Feb
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Mid-Term Test Syllabus: Chapters 1 to 8, Textbook #1.
Chapter-6:
The Normal Distribution
Topics

Chapter-6: The Normal Distribution
1. Continuous probability distributions.
2. The Normal distribution. To be done today (remaining part).

Relevant Pre-recorded lectures are accessible from Taxila.

Probability from MS Excel


x is given, Area=?- Using MS Excel
The average height (μ) of visitors to the Statute Museum has been recorded as 150 cms with standard deviation (σ) of 5 cms.

Assume the distribution of heights can be approximated by a Normal distribution with mean (μ) of 150 cms and standard deviation (σ) of 5 cms.

Proportion of visitors with heights (for pricing/entry fee, estimating revenue….):
• Below 145 cms. = 0.1587 or 15.87%.
• Below 150 cms. = 0.5000 or 50.00%.
• Below 160 cms. = 0.9772 or 97.72%.
• Below 157.5 cms. = 0.9332 or 93.32%.
• Above 160 cms. = 1.0000 – Area below 160 cms. = 1.0000 - 0.9772 = 0.0228 or 2.28%.
• Above 145 cms. = 1.0000 - 0.1587 = 0.8413 or 84.13%.
• Between 140 and 160 cms. = Area below 160 cms – Area below 140 cms. = 0.9772 - 0.0228 = 0.9544 or 95.44%.

MS Excel function-
=NORM.DIST(x,Mean,Stdev,TRUE)
MS Excel gives area to the left of x.

Less than, x cms    Area      Excel function
135                 0.0013    =NORM.DIST(135,150,5,TRUE)
140                 0.0228    …
145                 0.1587    =NORM.DIST(145,150,5,TRUE)
150                 0.5000    …
155                 0.8413
157.5               0.9332    =NORM.DIST(157.5,150,5,TRUE)
158.22              0.9499    =NORM.DIST(158.22,150,5,TRUE)
160                 0.9772    …
165                 0.9987    =NORM.DIST(165,150,5,TRUE)
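
The "below", "above" and "between" areas can be reproduced outside Excel. The following is a minimal sketch using Python's `statistics.NormalDist`; the helper names `below`, `above` and `between` are hypothetical.

```python
from statistics import NormalDist

heights = NormalDist(mu=150, sigma=5)  # visitor heights in cms, from the slide

def below(x):
    """Area to the left of x, as returned by =NORM.DIST(x,150,5,TRUE)."""
    return heights.cdf(x)

def above(x):
    """Area to the right of x = 1 - area to the left of x."""
    return 1 - heights.cdf(x)

def between(low, high):
    """Area between two heights = area below high - area below low."""
    return heights.cdf(high) - heights.cdf(low)

print(f"Below 145 cms: {below(145):.4f}")
print(f"Below 157.5 cms: {below(157.5):.4f}")
print(f"Above 160 cms: {above(160):.4f}")
print(f"Between 140 and 160 cms: {between(140, 160):.4f}")
```

The last figure can differ from the slide in the fourth decimal place, because the slide subtracts areas that were already rounded to four decimals.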



Area is given, x=?- Using MS Excel
A government provides subsidized fertilizers to all the owners of Tea plantations.
Land records show the average area of Tea plantations (μ) is 20 acres with standard deviation (σ) of 4 acres. Assume the distribution of plantation sizes can be approximated by a Normal distribution.
Plantations of what size will now be eligible for the subsidy if the subsidy is restricted to-
• Smallest 10% plantations. = 14.9 acres.
• Smallest 30% plantations. = 17.9.
• Smallest 50% plantations. = 20.0.
• Smallest 80% plantations. = 23.4.
• Largest 10% plantations. = 25.1.
• Largest 40% plantations. = 21.0.
• Middle 50% plantations. = 17.3 to 22.7.
• Middle 80% plantations. = 14.9 to 25.1.

Area is given, x=? MS Excel formula to get x-
=NORM.INV(AreaToTheLeftOfx,Mean,Stdev)

Plantation size    Area to the Left    x       Excel Function
Smallest 10%       0.1                 14.9    =NORM.INV(0.1,20,4)
Smallest 30%       0.3                 17.9    =NORM.INV(0.3,20,4)
Smallest 50%       0.5                 20.0    =NORM.INV(0.5,20,4)
Smallest 80%       0.8                 23.4    =NORM.INV(0.8,20,4)
Largest 10%        0.9                 25.1    =NORM.INV(0.9,20,4)
Largest 40%        0.6                 21.0    =NORM.INV(0.6,20,4)
Middle 50%         0.2500              17.3    =NORM.INV(0.25,20,4)
                   0.7500              22.7    =NORM.INV(0.75,20,4)
Middle 80%         0.1000              14.9    =NORM.INV(0.1,20,4)
                   0.9000              25.1    =NORM.INV(0.9,20,4)
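
The key step in this slide is converting "smallest", "largest" and "middle" shares into an area to the left of x before calling NORM.INV. The sketch below illustrates that conversion with Python's `NormalDist.inv_cdf`; the function names are made up for illustration.

```python
from statistics import NormalDist

plantations = NormalDist(mu=20, sigma=4)  # plantation size in acres, from the slide

def smallest(share):
    """Upper size limit for the smallest `share` of plantations."""
    return plantations.inv_cdf(share)

def largest(share):
    """Lower size limit for the largest `share`: area to the left is 1 - share."""
    return plantations.inv_cdf(1 - share)

def middle(share):
    """Size limits for the middle `share`: equal tails of (1 - share)/2 on each side."""
    tail = (1 - share) / 2
    return plantations.inv_cdf(tail), plantations.inv_cdf(1 - tail)

print(f"Smallest 10%: up to {smallest(0.10):.1f} acres")
print(f"Largest 10%: from {largest(0.10):.1f} acres")
low, high = middle(0.50)
print(f"Middle 50%: {low:.1f} to {high:.1f} acres")
```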


Probability from z Table


Cumulative Standardized Normal Distribution or z table

[Table images: Area to the left of –z, similar to p-540, TB-1. Area to the left of +z, similar to p-541, TB-1.]

• This table is for Mean, μ=0 and Stdev, σ=1.
• This table gives area (probability) to the left of z.
• The area to the right of z= (1 – area to the left of z).

Examples-
1: Area to the left of z= -1 is 0.1587, or 15.87%.
2: Area to the left of z= 2 is 0.9772, or 97.72%.
3: Area to the left of z= 1.12 is 0.8686, or 86.86%.

Notice that-
68.26% of observations lie within Mean ± 1 Stdev.
95.44% of observations lie within Mean ± 2 Stdev.
99.73% of observations lie within Mean ± 3 Stdev.

• If μ≠0 or σ≠1, then compute z= (x – μ)/σ to use the z table. Notice z= x when μ=0 and σ=1.
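
The three "Notice that" percentages follow from the table: the share within Mean ± k Stdev is the area to the left of +k minus the area to the left of –k. A short Python sketch (the name `share_within` is hypothetical):

```python
from statistics import NormalDist

standard = NormalDist()  # Mean 0, Stdev 1, the distribution behind the z table

def share_within(k):
    """Share of observations within Mean ± k Stdev."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"Within ±{k} Stdev: {share_within(k):.4%}")
```

Unrounded values differ from table-based ones in the last digit, since the z table holds only four decimals.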
Area (probability) is given, find z

Area to the left of z is given- What is z? (Mean=0, Stdev=1.)

Area                  z=?
1. 5.48% (0.0548)     -1.60
2. 88.49% (0.8849)    1.20
3. 30% (0.3000)       -0.52
4. 50% (0.5000)       0.00
5. 80% (0.8000)       0.84
6. 90% (0.9000)       1.28
7. 99% (0.9900)       2.33
8. 99.5% (0.9950)     2.57

MS Excel function for the above problems, Mean=0 and Stdev=1: =NORM.INV(Area,0,1)
General MS Excel function: =NORM.INV(Area,Mean,Stdev)
It gives the x value for the area given to the left of x.
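
The table lookups can be cross-checked with an inverse Normal function. The sketch below uses `NormalDist().inv_cdf` as a stand-in for =NORM.INV(Area,0,1). It returns unrounded z values, so entries such as 99% and 99.5% fall between two neighbouring z table rows.

```python
from statistics import NormalDist

standard = NormalDist()  # Mean 0, Stdev 1

areas = [0.0548, 0.8849, 0.30, 0.50, 0.80, 0.90, 0.99, 0.995]
# z value whose area to the left equals each given area
z_values = {area: standard.inv_cdf(area) for area in areas}

for area, z in z_values.items():
    print(f"Area to the left = {area:.4f} -> z = {z:.3f}")
```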
x is given, Area=?- Using z Table
The average height (μ) of visitors to the Statute Museum has been recorded as 150 cms with standard deviation (σ) of 5 cms.

Assume the distribution of heights can be approximated by a Normal distribution with mean (μ) of 150 cms and standard deviation (σ) of 5 cms.

The z table cannot be used directly because the z table is for Mean=0 and Stdev=1. To use the z table-
1. Compute the z score of the x value: z= (x-Mean)/Stdev.
2. Then use the z table (P-540/541, TB-1) to get the Area.

Proportion of visitors with heights (for pricing/entry fee, estimating revenue):
• Below 155 cms. = 0.8413, 84.13%.
• Below 135 cms. = 0.00135, 0.135%.
• Above 145 cms. = 1.0000 - 0.1587 = 0.8413, 84.13%.
• Above 155 cms. = 1.0000 - 0.8413 = 0.1587, 15.87%.
• Between 150 and 155 cms. = 0.8413 - 0.5000 = 0.3413, 34.13%.
• Between 145 and 160 cms. = 0.9772 - 0.1587 = 0.8185, 81.85%.
• Between 155 and 163 cms. = 0.9953 - 0.8413 = 0.1540, 15.40%.
• Between 145 and 150 cms. = 0.5000 - 0.1587 = 0.3413, 34.13%.
• Between 138 and 160 cms. = 0.9772 - 0.0082 = 0.9690, 96.90%.

x      z = (x-Mean)/Stdev         Area to the left of z
135    z= (135-150)/5= -3.        0.00135
138    z= (138-150)/5= -2.4.      0.0082
142    z= (142-150)/5= -1.6.      0.0548
145    z= (145-150)/5= -1.        0.1587
150    z= (150-150)/5= 0.         0.5000
155    z= (155-150)/5= 1.         0.8413
157    z= (157-150)/5= 1.4.       0.9192
160    z= (160-150)/5= 2.         0.9772
163    z= (163-150)/5= 2.6.       0.9953


Area is given, x=?- Using z Table
A government provides subsidized fertilizers to all the owners of Tea plantations.
Land records show the average area of Tea plantations (μ) is 20 acres with standard deviation (σ) of 4 acres. Assume the distribution of plantation sizes can be approximated by a Normal distribution.
Plantations of what size will now be eligible for the subsidy if the subsidy is restricted to-
• Smallest 10% plantations. = 14.9 acres.
• Smallest 30% plantations. = 17.9.
• Smallest 50% plantations. = 20.0.
• Smallest 80% plantations. = 23.4.
• Largest 10% plantations. = 25.2.
• Largest 40% plantations. = 21.0.
• Middle 50% plantations. = 17.3 to 22.7.
• Middle 80% plantations. = 14.9 to 25.2.

The x value cannot be directly obtained from the z table because the z table is for Mean=0 and Stdev=1.
Steps to use the z table-
1. First get the z value for the given area from P-540/541, TB-1.
2. Then compute the x value: x = Mean + z * Stdev.

Plantation size    Area to the Left    z        x = Mean + z * Stdev
Smallest 10%       0.1000              -1.28    20+(-1.28)*4 = 14.9.
Smallest 30%       0.3000              -0.52    20+(-0.52)*4 = 17.9.
Smallest 50%       0.5000              0.00     20+0*4 = 20.0.
Smallest 80%       0.8000              0.84     20+0.84*4 = 23.4.
Largest 10%        0.9000              1.29     20+1.29*4 = 25.2.
Largest 40%        0.6000              0.26     20+0.26*4 = 21.0.
Middle 50%         0.2500              -0.67    20+(-0.67)*4 = 17.3.
                   0.7500              0.68     20+0.68*4 = 22.7.
Middle 80%         0.1000              -1.28    20+(-1.28)*4 = 14.9.
                   0.9000              1.29     20+1.29*4 = 25.2.



From Textbook- Using MS Excel
p- 213 & 214. TB-1.

Video download time is Normally distributed with Mean, μ= 7 secs and Standard deviation, σ= 2 secs.

a. Probability that download time > 9 secs? 15.87%.
b. Probability that download time < 7 or > 9 secs? 65.87%.
c. Probability that download time is between 5 and 9 secs? 68.27%.

Required       Excel function                                          Excel output, Area=
P(x>9)         =1-NORM.DIST(9,7,2,TRUE)                                0.1587
P(<7 or >9)    =NORM.DIST(7,7,2,TRUE)+(1-NORM.DIST(9,7,2,TRUE))        0.6587
P(5 to 9)      =NORM.DIST(9,7,2,TRUE)-NORM.DIST(5,7,2,TRUE)            0.6827

MS Excel function-
=Norm.Dist(X,Mean,Stdev,TRUE).
It gives area to the left of x.

From Textbook- Using Z table
p- 213 & 214. TB-1.

Video download time is Normally distributed with Mean, μ= 7 secs and Standard deviation, σ= 2 secs.

From z table: p-540 & 541, TB-1.
x    z = (x-Mean)/Stdev    Area to the left of z
5    = (5-7)/2 = -1.00     0.1587
7    = (7-7)/2 = 0.        0.5000
9    = (9-7)/2 = 1.00      0.8413

a. Probability that download time > 9 secs?
P(>9)= 1 - P(<9)= 1.0000 - 0.8413
= 0.1587, or 15.87% (dark blue area; the remaining 84.13% is the light blue area).

b. Probability that download time < 7 or > 9 secs?
P(<7 or >9)= P(<7) + P(>9)
= 0.5000 + (1-0.8413)
= 0.6587, or 65.87% (two light blue areas).

c. Probability that download time is 5 to 9 secs?
P(5 to 9)= P(<9) - P(<5)
= 0.8413 - 0.1587
= 0.6826, or 68.26% (red area).
HWs with solutions

Using MS Excel: Obesity

What proportion of 60-year old males have BMI-
a) <18, b) 18 to 23, c) 23 to 27, d) 27 to 32, e) 32 to 37, f) >37?

MS Excel function-
=NORM.DIST(x,Mean,Stdev,TRUE)
MS Excel gives area to the left of x.

BMI             Category           Area      MS Excel Formula
Less than 18    Underweight        0.0334    =NORM.DIST(18,29,6,TRUE)
18 to 23        Normal weight      0.1253    =NORM.DIST(23,29,6,TRUE)-NORM.DIST(18,29,6,TRUE)
23 to 27        Over weight        0.2108    =NORM.DIST(27,29,6,TRUE)-NORM.DIST(23,29,6,TRUE)
27 to 32        Obesity Class-I    0.3220    =NORM.DIST(32,29,6,TRUE)-NORM.DIST(27,29,6,TRUE)
32 to 37        Obesity Class-II   0.2173    =NORM.DIST(37,29,6,TRUE)-NORM.DIST(32,29,6,TRUE)
More than 37    Obesity Class-III  0.0912    =1-NORM.DIST(37,29,6,TRUE)
Sum                                1.0000

In percentages: 3.3%, 12.5%, 21.1%, 32.2%, 21.7%, 9.1%.

Potential applications:
Capacity planning & target marketing by Wellness centers, Demand for nutrition, Demand for Obesity Class-III specialists……
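
Splitting a Normal distribution into classes follows one pattern: take the cumulative area at each cut point and subtract neighbouring values. The sketch below wraps this in a hypothetical helper, `bin_probabilities`, and applies it to the BMI example, using Mean= 29 and Stdev= 6 as they appear in the Excel formulas above.

```python
from statistics import NormalDist

def bin_probabilities(mean, stdev, cut_points):
    """Probability of each class formed by the sorted cut points.

    Returns len(cut_points) + 1 values: below the first cut point,
    between consecutive cut points, and above the last cut point.
    """
    dist = NormalDist(mean, stdev)
    cumulative = [0.0] + [dist.cdf(c) for c in cut_points] + [1.0]
    return [high - low for low, high in zip(cumulative, cumulative[1:])]

labels = ["Underweight", "Normal weight", "Over weight",
          "Obesity Class-I", "Obesity Class-II", "Obesity Class-III"]
bmi_shares = bin_probabilities(29, 6, [18, 23, 27, 32, 37])

for label, share in zip(labels, bmi_shares):
    print(f"{label}: {share:.4f}")
print(f"Sum: {sum(bmi_shares):.4f}")
```

The same helper should apply to the IQ example by passing Mean= 100, Stdev= 15 and the cut points 70, 80, 90, 100, 120, 140.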
Using Z table: Distribution of IQ

IQ test results have a near Normal distribution, with Mean, μ=100 and Standard deviation, σ=15.

• What % of people have IQ < 70? = 0.0228, 2.28%.
• What % of people have IQ 70 to 80? = 0.0918 - 0.0228 = 0.0690, 6.90%.
• What % of people have IQ 80 to 90? = 0.2514 - 0.0918 = 0.1596, 15.96%.
• What % of people have IQ 90 to 100? = 0.5000 - 0.2514 = 0.2486, 24.86%.
• What % of people have IQ 100 to 120? = 0.9082 - 0.5000 = 0.4082, 40.82%.
• What % of people have IQ 120 to 140? = 0.9962 - 0.9082 = 0.0880, 8.80%.
• What % of people have IQ > 140? = 1.0000 - 0.9962 = 0.0038, 0.38%.

Area from z table- P-540/541, TB-1. IQ: Mean=100; Stdev=15.
x      z = (x-Mean)/Stdev           Area to the left of z
70     z= (70-100)/15= -2.00.       0.0228
80     z= (80-100)/15= -1.33.       0.0918
90     z= (90-100)/15= -0.67.       0.2514
100    z= (100-100)/15= 0.          0.5000
110    z= (110-100)/15= 0.67.       0.7486
120    z= (120-100)/15= 1.33.       0.9082
130    z= (130-100)/15= 2.00.       0.9772
140    z= (140-100)/15= 2.67.       0.9962
HW: Use Z table AND Excel function

Industrial Loan amount sanctioned-
• Mean amount sanctioned= Rs 300 lakhs (L).
• Stdev= Rs 50 lakhs (L).
Assume Normal distribution.

A borrower is selected at random. What is the probability (area) that the following loan amount is sanctioned to the borrower-
a) <300 L
b) <350 L
c) <375 L
d) <325 L
e) >320 L
f) >270 L
g) between 200 to 400 L
h) between 280 to 380 L

You must do this HW if Normal distribution is new to you.

HW-05:
Download Excel file at Taxila. Below Topic-2.

Likely Contents-
1. Find Area- z value is given.
2. Find z value- Area is given.
3. IQ- Get distribution of IQ; Mean and Stdev are given.
4. Obesity- Get distribution of BMI; Mean and Stdev are given.
5. Museum- Revenue projection from Height-based pricing.
6. QuarterFinals- Stadium capacity.
7. Warranty- Financial impact of different duration of warranty, and
Budgeting.
8. JaipurExpress- OnTimeEveryTime.


Nice to know
Big Bang origin of ….

Normal distribution
 This distribution first appeared in a paper
by De Moivre, a Frenchman, in 1733, on
gambling.
 Other contributors- Laplace- a Frenchman,
and Gauss- a German, 1807.
 The name Normal is attributed to Galton
and Pearson, both English, 1880s/1890s.

Normal distribution?
CBSE. Class 12. Examinations 2015.
“The distribution, in this case, is much more normal and
symmetrical than the individual subjects' distribution.”

The peak on zero marks could mostly be attributed to students who did not show up for the exams at all.

https://en.wikipedia.org/wiki/Central_Board_of_Secondary_Education

Mean marks= 63.1, Std dev= 19.9. N= 1,002,879.
Poor interpretation of Normal distribution by CBSE.

HW: What might the several peaks in the graph indicate?

Compare following Normal distributions

Rank the Normal distributions in each of the three graphs on their means and standard deviations.

Solution-
Graph 1: Mean: Blue = Red = Yellow = 1000. Stdev: Blue < Red < Yellow.
Graph 2: Mean: Blue (30) < Red (50). Stdev: Blue = Red.
Graph 3 (curves A, B, C, D): Mean: A (60) < B (70) < C (90) < D (100). Stdev: D < A < C < B.

Beware- Three types of tables

Three layouts: Area is given from extreme Left. Area is given in the Middle. Area is given from extreme Right.

• Other textbooks may give area in the middle or to the right of z value.
• TB-1 gives area to the left of z value.
• MS Excel also gives area to the left of z value.


Normal distribution- Applications and HW


Pricing
Weight-based pricing Body-size based pricing Height-based pricing

See next slide.

How much discount are you likely to get at this restaurant?

Application: Height-based pricing
The average height (μ) of visitors to the Statute Museum has been recorded as 150 cms with standard deviation (σ) of 5 cms.
The museum is planning to introduce the following height-based entry fee.
What is the expected daily revenue (Expected Value) if the museum gets 600 visitors a day? Ans- Rs 11,672/day.

Height is assumed to be Normally distributed, with Mean= 150 cms and Stdev= 5 cms.

Height, X cm    Entry fee, Rs    Visitors, %    Visitors, nos.    Revenue, Rs
< 140           0                2.28           14                -
140-147         10               25.15          151               1,509
147-157         20               64.50          387               7,740
> 157           50               8.08           48                2,423
Total                            100.00         600               11,672

(Expected) Visitors, % = From Normal distribution.
(Expected) Visitors, nos. = (Expected) Visitors, %/100 * Visitors (600).
(Expected) Revenue= Fee, Rs * (Expected) Visitors, nos.

Excel functions used-
0.0228; =Norm.Dist(140,150,5,True)
0.2515; =Norm.Dist(147,150,5,True) - Norm.Dist(140,150,5,True)
0.6450; =Norm.Dist(157,150,5,True) - Norm.Dist(147,150,5,True)
0.0808; =1-Norm.Dist(157,150,5,True)

This problem also appears in HW-05, Excel file at Taxila.
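
The revenue table can be reproduced as an expected-value calculation: the share of visitors in each height band, times the fee, times the number of visitors. The Python sketch below assumes the fee bands shown on the slide; the variable and function names are hypothetical.

```python
from statistics import NormalDist

heights = NormalDist(mu=150, sigma=5)  # visitor heights in cms
DAILY_VISITORS = 600

# (lower limit, upper limit, entry fee in Rs); None marks an open-ended band
fee_bands = [(None, 140, 0), (140, 147, 10), (147, 157, 20), (157, None, 50)]

def band_share(low, high):
    """Share of visitors whose height falls between low and high."""
    lower_area = heights.cdf(low) if low is not None else 0.0
    upper_area = heights.cdf(high) if high is not None else 1.0
    return upper_area - lower_area

expected_revenue = 0.0
for low, high, fee in fee_bands:
    share = band_share(low, high)
    revenue = fee * share * DAILY_VISITORS
    expected_revenue += revenue
    print(f"{low}-{high} cm: share={share:.4f}, revenue=Rs {revenue:,.0f}")

print(f"Expected daily revenue: Rs {expected_revenue:,.0f}")
```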
Application: Warranty
OnOff Ltd sells 100K nos. of electric bulbs annually. The Mean life of the bulbs is 15K hours, with Stdev of 4K hours. Assume the life of the bulbs can be approximated by the Normal distribution.
If a bulb fails within the warranty period, OnOff offers to refund Rs 200.

What is the annual expected warranty refund cost if the warranty period is-
a. 12K hours? Rs 45.33 L.
b. 13K hours? Rs 61.71 L.
c. 14K hours? Rs 80.26 L.

Warranty, '000 hours    Area %    Expected failures, nos    Refund Amount, Rs    Excel function
12                      22.7      22,663                    45,32,547            =Norm.Dist(12,15,4,True)
13                      30.9      30,854                    61,70,751            =Norm.Dist(13,15,4,True)
14                      40.1      40,129                    80,25,873            =Norm.Dist(14,15,4,True)
Expected failures, nos= Area %/100 * 100,000.
Refund Amount, Rs= Expected failures * Refund/failure (Rs 200).

d. If the average annual budget for refunds is Rs 25 lakhs, what should be the warranty period? 10.399 K hours.

Budget, Rs    Equivalent failures, nos.    Area %    X, Life    Excel function
25,00,000     12,500                       12.50     10.399     =NORM.INV(0.125,15,4)
Equivalent failures, nos= Budget amount (Rs 25L)/Refund per failure (Rs 200).
Area %= Equivalent failures/Annual sales * 100
= 12,500/100,000 * 100= 12.50%.

This problem also appears in HW-05, Excel file at Taxila.
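
Parts a to c use the Normal distribution in the forward direction (warranty period to area), while part d runs it in reverse (budget to area to warranty period). A minimal Python sketch of both directions, with hypothetical function names and the figures from the slide:

```python
from statistics import NormalDist

bulb_life = NormalDist(mu=15, sigma=4)  # bulb life in thousand hours
ANNUAL_SALES = 100_000                  # bulbs sold per year
REFUND_PER_FAILURE = 200                # Rs

def expected_refund_cost(warranty_k_hours):
    """Expected annual refund: share failing within warranty * sales * refund."""
    share_failing = bulb_life.cdf(warranty_k_hours)
    return share_failing * ANNUAL_SALES * REFUND_PER_FAILURE

def warranty_for_budget(budget_rs):
    """Warranty period (thousand hours) whose expected refund equals the budget."""
    allowed_share = budget_rs / (ANNUAL_SALES * REFUND_PER_FAILURE)
    return bulb_life.inv_cdf(allowed_share)

for period in (12, 13, 14):
    print(f"{period}K hours: Rs {expected_refund_cost(period):,.0f}")
print(f"Budget Rs 25 lakhs: {warranty_for_budget(2_500_000):.3f}K hours")
```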



Generate probability distribution

Runs scored by Tendulkar in an innings: Mean= 48 and Stdev= 51.

1. Proportion of innings in which Tendulkar scored between 48 and 99 runs?
2. What is the probability distribution of runs scored?

It will be misuse/abuse if Normal distribution is used in this case !!!

[Graph: Normal curve with x-axis at Mean ± 1, 2, 3, 4 Stdev: -156, -105, -54, -3, 48, 99, 150, 201, 252.]
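
One way to see why the Normal distribution is a poor fit here is to ask what it implies for impossible outcomes. The sketch below assumes, for illustration only, a Normal distribution with Mean= 48 and Stdev= 51 and computes the area it would place on negative runs.

```python
from statistics import NormalDist

runs = NormalDist(mu=48, sigma=51)  # the (inappropriate) Normal assumption

# Area the Normal model gives to 48-99 runs, i.e. Mean to Mean + 1 Stdev
share_48_to_99 = runs.cdf(99) - runs.cdf(48)

# Area the Normal model gives to negative runs, which cannot occur
share_negative = runs.cdf(0)

print(f"Model share of innings with 48 to 99 runs: {share_48_to_99:.4f}")
print(f"Model share of innings with negative runs: {share_negative:.4f}")
```

A sizeable area below zero signals that the data cannot be near-symmetric about the mean, so the conditions for using the Normal approximation are not met.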



For doctors (and their patients)

Take 5…

For Cardiologists: https://www.researchgate.net/publication/318569954_Early_diagnosis_of_chronic_conditions_and_lifestyle_modification/figures?lo=1
For Gynecologists: https://en.wikipedia.org/wiki/Gestational_age
For Oncologists: https://www.researchgate.net/publication/24356000_Lung_Cancer_Susceptibility_Model_Based_on_Age_Family_History_and_Genetic_Variants/figures?lo=1
For Ophthalmologists
For ENT specialists: https://dizziness-and-balance.com/disorders/bppv/bppv.html

HW:
Which of these distributions can be approximated by the Normal distribution?

Next chapter
Chapter-7: Sampling Distributions
Quantitative Methods

Lecture-10 1Mar’22. 7-9pm.


BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

Items of interest
1. Will be used today- “The Cumulative Standardized Normal Distribution” table on p-540 & 541.

2. Will be used in the next class- “Critical Values of t” table on p-542 & 543.

3. PPT for today’s session is available in advance at Taxila, under Topic-1.

4. Students should refer to the Course Handout available at Taxila (eLearn) and PPT of Lecture-1 (16Jan) available at Impartus for prescribed Textbooks, Evaluation plan, coverage, syllabus, etc.

5. Post your messages only on Discussion Forum at Taxila, and not at Impartus.

Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data 16Jan, 23Jan
1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability 13Feb, 15Feb, 20Feb
1 5 Discrete Probability Distributions 20Feb, 23Feb
1 6 The Normal Distribution 23Feb, 27Feb
1 7 Sampling Distributions 1Mar
1 8 Confidence Interval Estimation
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Mid-Term Test Syllabus: Chapters 1 to 8, Textbook #1.
Chapter-7
Sampling Distributions
Topics

Chapter-1: Defining and Collecting Data
1.3 Types of sampling methods.
1.4 Types of survey errors.

Chapter-7: Sampling Distributions
1. Sampling distributions.
2. Sampling distribution of the mean.
3. Sampling distribution of the proportion.

Relevant Pre-recorded lectures for Chapter-7 are accessible from Taxila.

Samples

Census and Sampling

Census
• Entire population (population, tiger, agriculture, health facilities).

Sampling
• A portion of the population.
• Quality, voting, blood, soil, customer surveys, voice, interviews, ….

We want to know-
• Who will win- A or B?
• Is the water safe? Is the soil suitable for the crop?
• Is the new Drug safe? Is it effective?
• Do the Bulbs/Concrete cubes meet the specs?
• Life of elephants?
• What is the inflation (price change)?
• Customer/Employee satisfaction surveys.
• What % of people believe in Evolution theory?
• Time spent watching TV in the last week.
• Own a pet- Y/N?

Why samples?
• Quicker.
• Cheaper.
• Members of the population may not participate or may not be available.
• When tests are destructive.
• Scientifically chosen samples can give good accuracy about the properties of the population.

Types of sampling methods


Sampling methods
Non-probability sampling
 Judgement sampling
 Convenience sampling

Probability sampling
 Simple random sampling
   Each item has an equal probability of being chosen (=1/N, where N is the population size).
 Systematic sampling
   Every nth customer/item/bottle on the production line.
 Stratified sampling
   Samples from each stratum: Men/Women (20%/80%); Rural/Urban (30%/70%); Steel/Chemicals/Telecom stocks (10%/20%/70%); Customers/Non-customers (10%/90%); Tourist/Non-tourist (60%/40%).
 Cluster sampling
   Samples from a geographical district.

Figure labels: Simple random sampling (probability = 1/12 for each item); Stratified sampling (probability proportional to each stratum); Systematic sampling (every 3rd item/bottle/customer); Cluster sampling (probability = 1/6 for each cluster).
12
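The three probability sampling methods listed above can be sketched in a few lines of Python. This is only an illustration: the function names, the 12-item population, and the rural/urban strata sizes are assumptions made for the example, not material from TB-1.

```python
import random

def simple_random_sample(items, k, rng):
    """Simple random sampling: every item has the same chance of selection."""
    return rng.sample(items, k)

def systematic_sample(items, step, start=0):
    """Systematic sampling: every `step`-th item, beginning at index `start`."""
    return items[start::step]

def stratified_sample(strata, k, rng):
    """Stratified sampling: draw from each stratum in proportion to its size."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(k * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(42)            # fixed seed so the run is repeatable
population = list(range(1, 13))    # 12 items (hypothetical population)

srs = simple_random_sample(population, 4, rng)
systematic = systematic_sample(population, 3)   # items 1, 4, 7, 10

# Hypothetical strata: 30 rural and 70 urban members (30%/70%).
strata = {"rural": list(range(30)), "urban": list(range(100, 170))}
stratified = stratified_sample(strata, 10, rng)  # 3 rural + 7 urban
```

Cluster sampling would follow the same pattern: pick whole groups (for example districts) at random and then sample within the chosen groups.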
BITS Pilani, Pilani Campus
Parameter and Statistic

Parameter (Population)            Statistic (Sample)
Population mean, μ                Sample mean, X̄
Population stdev, σ               Sample stdev, S
Population size, N                Sample size, n
Proportion in population, π       Proportion in sample, p

Characteristics of a population are called parameters, and characteristics of a sample are called statistics.

Population and Sample Variance

Formulas-
Population variance, σ² = (1/N) * ∑ (xi − μ)²          p-128, TB-1.
Sample variance, S² = (1/(n−1)) * ∑ (xi − X̄)²          p-109, TB-1.

Population variance: divide by the population size, N.
Sample variance: divide by the sample size minus 1, n−1.

In MS Excel- Variance and Standard deviation:
              Variance         Standard deviation
Population    =VAR.P(Range)    =STDEV.P(Range)
Sample        =VAR.S(Range)    =STDEV.S(Range)
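For readers working outside Excel, the same N versus n−1 distinction can be checked with Python's standard `statistics` module. This is a minimal sketch; the five data values are illustrative only.

```python
import statistics

data = [11, 32, 5, 78, 45]  # illustrative values

pop_var = statistics.pvariance(data)     # divides by N, like =VAR.P(Range)
sample_var = statistics.variance(data)   # divides by n-1, like =VAR.S(Range)
pop_sd = statistics.pstdev(data)         # like =STDEV.P(Range)
sample_sd = statistics.stdev(data)       # like =STDEV.S(Range)

print(f"Population variance = {pop_var:.2f}, stdev = {pop_sd:.2f}")
print(f"Sample variance     = {sample_var:.2f}, stdev = {sample_sd:.2f}")
```

The sample figures are always the larger of the two, because the same sum of squared deviations is divided by a smaller number.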
13
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Types of survey errors


Survey errors

1. Coverage error- Items excluded from the sampling frame.
2. Nonresponse error.
3. Measurement error- Bad or leading question.
4. Sampling Error (SE)- Chance differences from sample to sample.

Examples of sampling error (sample mean − population mean):
SE: 45.8−36.3 = 9.5.
SE: 37.8−36.3 = 1.5.
SE: 35.5−36.3 = −0.8.
15
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Sampling distributions
Sampling distribution of the mean
Sampling distribution of the proportion
Relationship between population and
samples
Characteristics of the sample are known. What are the characteristics of its population?
 Chapter-8 to 11: Confidence Interval Estimation, Hypothesis Testing, Two Sample Tests and ANOVA, Chi-Square Tests.

Sample: 11, 32, 5, 78, 45.
X̄ = 34.2; S = 29.3; n = 5.
Population: μ = ?, σ = ?

To understand the above-
1. First understand: When samples are drawn from a population whose characteristics are known, what are the characteristics of the samples?
2. Then establish a relationship between sample characteristics and population characteristics.
 Chapter-7: Sampling Distributions (Mean and Proportion)

Characteristics of a population/sample can be its-
 Mean, Standard deviation, Median, Mode, Skewness, Kurtosis, etc., for Interval/Ratio scale data; Proportion for Nominal/Category data, etc.
17
BITS Pilani, Pilani Campus
Sampling distributions
What are the characteristics of samples that are drawn from a population whose characteristics (mean, stdev., etc.) are known?

Each sample has its own characteristics (mean, stdev., median, etc.). Since numerous samples can be taken from a population, a sample characteristic will have a distribution, called the Sampling distribution.

Relationship between a Sampling distribution and its Population distribution-
1. Mean of the Sampling distribution of X̄ = Mean of Pop. distribution, μ.
2. Stdev. of the Sampling distribution, σX̄ = Stdev. of Pop. distribution/√n = σ/√n.
3. A sampling distribution tends to resemble the Normal distribution as sample size (n) increases.

Stdev. of a sampling distribution is called the Standard error.

Example (distribution of the means of samples, X̄; for n = 4)-
Population distribution: Mean, μ = 36.3; Stdev, σ = 29.8.
Sampling distribution: Mean of X̄ = μ = 36.3; Stdev, σX̄ = σ/√n = 29.8/Sqrt(4) = 14.9.

Similarly for sample proportions (Y/N; T/F; Defective/Ok)-
1. Mean of the sampling distribution of p = Proportion in the population, π.
2. Stdev. of the sampling distribution, σp = √(π ∗ (1 − π)/n).
3. The sampling distribution tends to resemble the Normal distribution as sample size (n) increases.

 Notice that the Sampling distribution (green colored) has a lower Stdev. than the Population distribution (black colored).
 And the Stdev. of the Sampling distribution (green colored) decreases as sample size increases- see the equation.
18
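The two standard error formulas can be checked numerically. The sketch below reuses the slide's population stdev (σ = 29.8) for the mean; the function names and the π = 0.5, n = 100 values for the proportion are assumptions chosen only for illustration.

```python
import math

def standard_error_mean(sigma, n):
    """Stdev of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def standard_error_proportion(pi, n):
    """Stdev of the sampling distribution of the proportion: sqrt(pi*(1-pi)/n)."""
    return math.sqrt(pi * (1 - pi) / n)

se_n4 = standard_error_mean(29.8, 4)     # slide example: 29.8/Sqrt(4)
se_n16 = standard_error_mean(29.8, 16)   # four times the sample size
se_p = standard_error_proportion(0.5, 100)  # hypothetical pi and n

print(f"n=4:  standard error = {se_n4:.2f}")
print(f"n=16: standard error = {se_n16:.2f}")
print(f"proportion: standard error = {se_p:.3f}")
```

Going from n = 4 to n = 16 halves the standard error, which is the same pattern used in Problem-1 of this chapter.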
BITS Pilani, Pilani Campus
Central Limit Theorem (CLT)

If the population is Normally distributed, then the Sampling distribution is Normally distributed.

If the population is not Normally distributed, then the Sampling distribution approaches the Normal distribution as sample size increases.

That is, the Sampling distribution can be taken as Normally distributed if the sample size is large… say 25+ nos.

Figure labels: Population distribution (mean μ); Sampling distribution (mean of X̄ = μ).

The Central Limit Theorem result is the major reason that the Normal distribution is so important and widely used.
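A small simulation can make the Central Limit Theorem concrete. The sketch below is an illustration under stated assumptions: the population is exponential (strongly skewed, with mean 1 and stdev 1), the sample size is 25, and the seed and number of samples are arbitrary choices.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

def sample_means(n, num_samples):
    """Means of `num_samples` samples of size n from a skewed population."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = sample_means(n=25, num_samples=5000)
mean_of_means = statistics.fmean(means)   # should be close to mu = 1.0
se_observed = statistics.stdev(means)     # should be close to sigma/sqrt(n) = 0.2

print(f"Mean of sample means:  {mean_of_means:.3f}")
print(f"Stdev of sample means: {se_observed:.3f}")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the individual values come from a skewed population.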

19
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Problems from TB-1

20
Problem-1
p-238, TB-1.

A cereal filling machine is set to fill the boxes with 368g of cereal. The standard deviation of the filling process is 15g.

1. A random sample of 25 boxes is taken. What is the standard error (standard deviation) of the mean?
Standard error, σX̄ = σ/√n = 15/√25 = 3.0g.

2. A random sample of 100 boxes is taken. What is the standard error (standard deviation) of the mean?
Standard error, σX̄ = σ/√n = 15/√100 = 1.5g.

A fourfold increase in the sample size is required to reduce the standard error to a half.

Standard error is the Standard deviation of the sampling distribution.

Figure labels: Population distribution (mean 368g, σ = 15g); Sampling distributions with σX̄ = 3.0g (n = 25) and σX̄ = 1.5g (n = 100).
21
BITS Pilani, Pilani Campus
Problem-2
p-239-240, TB-1.

A cereal filling machine is set to fill the boxes with 368g of cereal. The standard deviation of the filling process is 15g. Assume Normal distribution.
 A random sample of 25 boxes is taken.
 What is the probability that the sample mean is below 365g?

Mean of the sampling distribution, X̄, will be 368g.
Its standard error will be σX̄ = σ/√n = 15/√25 = 3.0g.
The sampling distribution will be Normally distributed (due to CLT).

 The probability that the sample mean is below 365g is 15.87%.

Figure labels: Population distribution (mean 368g, σ = 15g); Sampling distribution (mean 368g, σX̄ = 3.0g) with the area below 365g = 15.87%.

To get the area from Excel function-
=Norm.Dist(365,368,3,True) gives 0.1587, or 15.87%.

To get the area from z table (p-540)-
z = (x-Mean)/StdError = (365-368)/3 = -1.0.
From z table, Area to the left of z = -1.0 is 0.1587, or 15.87%.
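The same area can be reproduced without Excel or the z table. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 368, 15, 25          # values from Problem-2
se = sigma / math.sqrt(n)           # standard error of the mean

sampling_dist = NormalDist(mu, se)  # sampling distribution of the mean
p_below_365 = sampling_dist.cdf(365)  # area to the left of 365g
z = (365 - mu) / se                 # z value used with the z table

print(f"Standard error = {se:.1f} g, z = {z:.1f}")
print(f"P(sample mean < 365 g) = {p_below_365:.4f}")
```

`NormalDist.cdf` returns the area to the left, so it plays the same role as `=Norm.Dist(x, mean, stdev, True)`.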
22

BITS Pilani, Pilani Campus


Problem-3
p-241, TB-1.

A cereal filling machine is set to fill the boxes with (μ) 368g of cereal. The standard deviation of the filling process is (σ) 15g. Assume Normal distribution.

 A random sample of 25 boxes is taken. What is the range that will include the middle 95% of sample means?

Mean of the sampling distribution will be 368g (equal to the population mean).
Its standard error will be σX̄ = σ/√n = 15/√25 = 3.0g.
The sampling distribution will be Normally distributed, due to CLT.

 The range that will include the middle 95% of sample means is 362.12 to 373.88g.

Figure labels: Population distribution (σ = 15g); Sampling distribution (σX̄ = 3.0g) with the middle 95% between 362.12g and 373.88g.

To get x values from Excel functions-
=Norm.Inv(0.025,368,3) gives 362.12, for 2.5% area on the left side.
=Norm.Inv(0.975,368,3) gives 373.88, for 97.5% area on the left side.
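As a cross-check on the two `Norm.Inv` calls, the limits of the middle 95% can be computed with `NormalDist.inv_cdf`, which also takes the area to the left. A minimal sketch with illustrative variable names:

```python
import math
from statistics import NormalDist

mu, sigma, n = 368, 15, 25               # values from Problem-3
se = sigma / math.sqrt(n)                # 3.0 g
sampling_dist = NormalDist(mu, se)

lower = sampling_dist.inv_cdf(0.025)     # 2.5% area on the left side
upper = sampling_dist.inv_cdf(0.975)     # 97.5% area on the left side

print(f"Middle 95% of sample means: {lower:.2f} g to {upper:.2f} g")
```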

23
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Sampling distribution of proportion


24
Problem-4
p-247/248, TB-1.

A report says that 32% of adults are unable to stop thinking about work while on vacation. {That is, population proportion, π is 0.32.}
 A random sample of 200 vacationers is taken.
 What is the probability that more than 40% of the vacationers in the sample are unable to stop thinking about work?

The sampling distribution will be Normally distributed, due to CLT.

Standard error, σp = √(π ∗ (1 − π)/n)
= √(0.32 ∗ (1 − 0.32)/200) = 0.033.

 The probability that more than 40% of the vacationers in the sample are unable to stop thinking about work is about 0.78%.

To get the value from Excel function-
=1-Norm.Dist(0.40,0.32,0.033,True) gives 0.0077, or 0.77%.

Using z table- Population proportion, π = 0.32; StdError, σp = 0.033.
z = (p-π)/σp = (0.40-0.32)/0.033 = 2.424.
From z table, Area on the left of z = 2.42 is 0.9922.
Area on the right of z = 1.0 - 0.9922 = 0.0078, or 0.78%.
(The small difference between 0.77% and 0.78% comes from rounding z to two decimals for the table.)
25
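The proportion calculation can be verified the same way. In this sketch the standard error is kept unrounded, so the result may differ from the slide in the fourth decimal place; the variable names are illustrative.

```python
import math
from statistics import NormalDist

pi, n = 0.32, 200                         # values from Problem-4
se_p = math.sqrt(pi * (1 - pi) / n)       # standard error of the proportion

sampling_dist = NormalDist(pi, se_p)
p_above_40 = 1 - sampling_dist.cdf(0.40)  # area to the right of 0.40
z = (0.40 - pi) / se_p

print(f"Standard error = {se_p:.3f}, z = {z:.3f}")
print(f"P(sample proportion > 0.40) = {p_above_40:.4f}")
```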

BITS Pilani, Pilani Campus


BITS Pilani
Pilani Campus

Nice to know

26
A nice way to explain….

A simple, pictorial way to explain Sampling distribution of mean


and Central Limit Theorem….. was available at www.

27
BITS Pilani, Pilani Campus
Beware

1. In TB-1:
 z table gives area to the left of z value (p-540, 541).
 t table gives area to right of t value (542, 543).
 Chi-square table gives area to the right of Chi square value (p-
544).
 F tables give area to the right of F value (p-545 to 551).

2. In MS Excel:
 MS Excel functions for probability distributions are consistent:
They all give areas to the left of z/t/Chi-square/F value.
 Guess- Why all to the left?
 My Guess: Because Bill Gates is left-handed.

BITS Pilani, Pilani Campus


BITS Pilani
Pilani Campus

Next chapter
Chapter-8: Confidence Interval estimation

29
Quantitative Methods

Lecture-11 6Mar’22. 10.30am-12.30pm.

BITS Pilani
Pilani Campus Sandeep Kayastha, at Hyderabad

1
Items of interest
1. Will be used today-
 “The Cumulative Standardized Normal Distribution” table on p-540 & 541.
 “Critical Values of t” table on p-542 & 543.

2. PPT for today’s session is available in advance at Taxila, under Topic-1.

3. Students should refer to the Course Handout available at Taxila (eLearn) and PPT of Lecture-1 (16Jan) available at Impartus for prescribed Textbooks, Evaluation plan, coverage, syllabus, etc.

4. Post your messages only on Discussion Forum at Taxila, and not at Impartus.

2
BITS Pilani, Pilani Campus
QM Mid-Term test

WILP will share common instructions applicable to all the courses.

QM Mid-Term test-
1. Syllabus: Chapter 1 to 8; TB-1.
2. Number of questions: 5-7; spread over all the chapters.
3. Numerical: Non-Numerical; planning for … 70:30.
4. Read QP instructions carefully. Questions may have sub-parts.
5. Ensure that what is asked is answered, and completely.
6. Show all the workings- no marks without detailed workings.
7. Only handwritten answers will be evaluated. No computer-typed material/No screenshots of typed material/No computer outputs.
8. Explain well. Express well.

A tip/a warning/a piece of advice-
 No unfair means.
 Do not plead for fairness with other questions or with other students if unfair practice is used.

This slide was first used in 27Feb class.
3
BITS Pilani, Pilani Campus
The approach

Pace of coverage

Intuitive approach + Visual approach (Stats without equations)

Concept first… calculations later.


Answer first..… calculations later.

4
BITS Pilani, Pilani Campus
Today’s lecture
Textbook # Chapter # Chapter Title Date(s)
1 1 Defining and Collecting Data
16Jan, 23Jan

Mid-Term Test Syllabus


1 2 Organizing and Visualizing Variables
1 3 Numerical Descriptive Measures 30Jan, 6Feb
1 4 Basic Probability 13Feb, 15Feb, 20Feb
1 5 Discrete Probability Distributions 20Feb, 23Feb
1 6 The Normal Distribution 23Feb, 27Feb
1 7 Sampling Distributions 1Mar
Textbook #1
1 8 Confidence Interval Estimation 6Mar
1 9 Fundamentals of Hypothesis Testing: One-Sample Tests
1 10 Two-Sample Tests and ANOVA
1 11 Chi-Square Tests
1 12 Simple Linear Regression
2 7 An Introduction to Linear Programming
2 9 Linear Programming Applications in Marketing, Finance, and Operations Management
2 10 Distribution and Network Models
Textbook #2
5
BITS Pilani, Pilani Campus
Chapter-8
Confidence Interval Estimation
BITS Pilani
Pilani Campus

6
Topics

Chapter-8: Confidence Interval Estimation

1. Confidence interval estimation for the mean (σ known): skipped, since this does not happen in practice because the population standard deviation, σ, is always unknown.
2. Confidence interval estimation for the mean (σ unknown).
3. Confidence interval estimation for the proportion.

Will be done in the next session + not in the syllabus for MidTerm exam
4. Determining sample size.

Relevant Pre-recorded lectures, accessible from Taxila

7
BITS Pilani, Pilani Campus
Estimation of population proportion and
Population mean

Proportion (Percent)
Sample result: Y=4, N=16 (20 responses).
Have a credit card= 20% (4/20).
What % of the population has a credit card?
Estimation of proportion in the population, from sample statistics.
Examples: Likely to vote for opposition, Children take tuition, Smokers, Employees insured, Parts/Components manufactured are defective, Adults who watch PogoTV, Computers infected, Customers have multiple accounts, Live in rented accommodation, Loans declined, Have DTH, Liked inflight food, Ordered online, Domestic violence…

Mean
Sample values: 4, 8, 4, 7, 7, 9, 4, 9, 3, 12, 7, 3, 6, 10, 8.
Sample size, n=15. Sum = 101.
Sample mean=101/15=6.73.
Sample Stdev=2.71.
Mean of the population?
Estimation of mean of the population, from sample statistics.
Examples: Life of the battery, Duration of a phone call, Daily change in stock market index, Income of iPhone users, Number of days of stay in a hotel/hospital, Resistance (of resistors), Time spent on fb, Tax liability, TRP, Number of ambulance requests in a day……

Types of Estimation-
1. Estimation of population proportion.
2. Estimation of population mean.
8
BITS Pilani, Pilani Campus
Point and Interval estimates

Sample statistics
Sample values: 4, 8, 4, 7, 7, 9, 4, 9, 3, 12, 7, 6, 10, 3, 8.
Sample size, n=15. Sum = 101.
Sample mean=101/15=6.73.
Sample Stdev=2.71.

Estimate of population mean
Point estimate: Most likely it is an incorrect estimate of the population mean.
Interval estimate-1: Somewhat confident that the estimate interval is correct.
Interval estimate-2: More confident that the estimate interval is correct.
Interval estimate-3: Much more confident that the estimate interval is correct.

The wider the interval, the more the confidence, but the less useful the estimate.

In statistics, the most used confidence level is 95%, followed by 90% and 99%.

9
BITS Pilani, Pilani Campus
Making the estimates

1. To get Point estimate, use these equations-
 Population mean estimate= Mean of the Sample.
 Population proportion estimate= Proportion in the Sample.

2. To get Interval estimate, use these equations-
 Population mean= Sample mean ± Margin of Error.
 Population proportion= Sample proportion ± Margin of Error.

Figure labels: X̄−MoE, Sample Mean X̄, X̄+MoE.

Notice that, after rearranging the last two equations-
For population mean-
 Margin of Error= Population mean - Sample mean.
For population proportion-
 Margin of Error= Population proportion - Sample proportion.

10
BITS Pilani, Pilani Campus
Confidence level

If the sample mean is used as an estimate of the population mean, then the point estimate is likely to be wrong.

 Therefore, a confidence interval estimate is constructed around the point estimate. “The confidence interval is constructed such that the probability that the interval includes population parameter is known” (p-259, TB-1).

 Mostly 95% (sometimes 90% or 99%) confidence interval is used. The meaning of 95% confidence interval: 95% of such interval estimates will contain the true population mean.

 z or t values for 95% confidence level can be obtained from Excel functions or t-table/z-table-
   t-table is used for interval estimate of mean (p-542,543; TB-1).
   z-table is used for interval estimate of proportion (p-540, 541; TB-1).
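The statement that 95% of such interval estimates contain the true population mean can be checked by simulation. The sketch below is only an illustration: the Normal population (mean 50, stdev 10), the sample size of 15, the seed, and the number of trials are all assumptions; 2.1448 is the t-value for 2.5% right-tail area with df = 14.

```python
import math
import random
import statistics

rng = random.Random(11)          # fixed seed so the run is repeatable
MU, SIGMA, N = 50.0, 10.0, 15    # hypothetical population and sample size
T_VALUE = 2.1448                 # t for 2.5% right-tail area, df = N-1 = 14

def interval_covers_mu():
    """Draw one sample, build its 95% interval, and report whether it contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    margin = T_VALUE * statistics.stdev(sample) / math.sqrt(N)
    return x_bar - margin <= MU <= x_bar + margin

trials = 2000
coverage = sum(interval_covers_mu() for _ in range(trials)) / trials
print(f"Share of intervals containing the population mean: {coverage:.3f}")
```

Each individual interval either contains μ or it does not; the 95% describes the long-run share of intervals that do.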

11
BITS Pilani, Pilani Campus
Equations for Estimating population mean and
proportion
Population mean, μ = Sample mean ± Margin of Error
= Sample mean ± t-value * Std error of mean
= Sample mean ± t-value * S/√n
= X̄ ± t-value * S/√n

Population proportion, π = Sample proportion ± Margin of error
= Sample proportion ± z-value * Std error of proportion
= Sample proportion ± z-value * √(p ∗ (1 − p)/n)
= p ± z-value * √(p ∗ (1 − p)/n)

Standard error is the standard deviation of the sampling distribution.
n- sample size. S- Standard deviation of the sample (divide by n-1, and not by n).
For mean- t-value from t-table, p-542, 543, TB-1.
For proportion: p- proportion in the sample. z-value from z-table, p-540, 541; TB-1.

z = (X − μ)/σ = (Observed Value − Pop. Mean)/Pop. Stdev
t = (X̄ − μ)/(S/√n) = (Sample Mean − Pop. Mean)/Stdev of Sampling distribution

(A quick, dirty trick: t or z value is generally between 1.5 and 3).


12
BITS Pilani, Pilani Campus
t (Student) distribution

 t-distribution is a theoretical distribution. This distribution is symmetrical and it resembles the Normal distribution. Total area (probability) under the curve=1.
 t-distribution is flatter than the Normal distribution; that is, its variance is higher. t-distribution is defined by a single parameter- degrees of freedom, df. Recall that the Normal distribution is defined by two parameters- mean and standard deviation.
 t-distribution tables on the slide (right-tail areas 5%, 2.5%, 2%, 1%, 0.5%) and in TB-1 (p-542, 543) give t-values for the area given in the right tail, for a given degree of freedom, df. The t-table given in TB-1 has df up to 120.

 Reading the t-distribution table-
   For area in the right tail 2.5% and df=15, t-value is 2.131.
   For area in the right tail 5% and df=10, t-value is 1.812.
   For area in the left tail 5% and df=10, t-value is -1.812… due to symmetry.
   For area in the left tail 10% and df=20, t-value is -1.325… due to symmetry.

 Excel function gives t-value for the given area on the left tail.
   =T.INV(AreaOnTheLeftOftValue,df).
   =T.INV(0.975,15) =2.131. for 97.5% area on left or 2.5% on right.
   =T.INV(0.950,10) =1.812. for 95.0% area on left or 5.0% on right.

 Not for everyone: The ratio of a standard Normal variate to the square root of an independent Chi-square variate divided by its degrees of freedom has a t-distribution. For a sample of size n from a Normal population, (X̄ − μ)/(S/√n) has a t-distribution with n-1 degrees of freedom.
BITS Pilani, Pilani Campus
Degrees of freedom, df

Degrees of freedom: The number of independent pieces of information that go into the estimate of a parameter.

Sample variance, S² = (1/(n-1)) * ∑ (Xi − X̄)²

The number of independent pieces of information that go into the estimate of variance is n-1, where n is the sample size.
One degree of freedom is lost since the sample mean used in the above equation is computed from the sample data.

For using t-distribution, degrees of freedom= Sample size-1.

14
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Confidence interval estimation


for the mean (σ unknown)
15
Example-1
p-268, TB-1.
Figure: sample of sales invoices (values shown: 130, 140, 80, 80, 120).

a. Estimate mean value of sales invoices.
b. What is the estimation interval, for 95% confidence level?

Sample statistics-
 Number of invoices, n=100. Mean, X̄ = $ 110.27. Stdev.s, S = $ 28.95.

Point estimate, μ = Sample mean.
μ = X̄ = $ 110.27.   a.

Interval estimate, μ = Sample mean ± Margin of Error
= X̄ ± tvalue * Std. Error of mean
= X̄ ± tvalue * S/√n
= 110.27 ± 1.9842 * 28.95/√100
= 110.27 ± 5.74
= 110.27-5.74 to 110.27+5.74.
104.53 to 116.01 dollars.   b.

t-value from Excel function- df= n-1= 100-1= 99.
=T.INV(area in the left tail,df) [2.5% and 97.5%; middle area 95%].
=T.INV(0.025,99) = -1.9842. 2.5% area in the left tail.
=T.INV(0.975,99) = +1.9842. 97.5% area in the left tail.

t-value from table, p-543, TB-1. df= n-1= 100-1= 99.
tvalue = +1.9842. For 2.5% area in the right tail.
tvalue = -1.9842. For 2.5% area in the left tail, since the t-distribution is symmetric.
16
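The interval arithmetic in Example-1 can be wrapped in a small helper. This is a sketch: the function name is an assumption, and the t-value is passed in as read from the t-table or `=T.INV`, because Python's standard library has no t-distribution function.

```python
import math

def confidence_interval_mean(x_bar, s, n, t_value):
    """Interval estimate of the population mean: x_bar ± t_value * s / sqrt(n)."""
    margin = t_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example-1: n = 100 invoices, mean $110.27, stdev $28.95, t = 1.9842 (df = 99).
lower, upper = confidence_interval_mean(110.27, 28.95, 100, 1.9842)
print(f"95% interval estimate: {lower:.2f} to {upper:.2f} dollars")
```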

BITS Pilani, Pilani Campus


Example-2
p-269, 270. TB-1.

a. Estimate the mean time taken to process life insurance applications.
b. What is the estimation interval, for 95% confidence level?

Processing time of 27 applications (days).
73 19 16 64 28 28 31 90 60 56 31 56 22 18
45 48 17 17 17 91 92 63 50 51 69 16 17

Sample characteristics-
 Sample size, n= 27. Mean, X̄= 43.89 days. Stdev.s, S= 25.28 days.

Point estimate, μ = Sample mean.
μ = X̄ = 43.89 days.   a.

Interval estimate, μ = Sample mean ± Margin of error
= X̄ ± tvalue * Std error of mean
= X̄ ± tvalue * S/√n
= 43.89 ± 2.0555 * 25.28/√27
= 43.89 ± 10.00, or
33.89 to 53.89 days.   b.

t-value from Excel function- df= n-1= 27-1= 26.
=T.INV(area in the left tail,df) [2.5% and 97.5%; middle area 95%].
=T.INV(0.025,26) = -2.0555 2.5% area in the left tail.
=T.INV(0.975,26) = +2.0555 97.5% area in the left tail.

t-value from table, p-542, TB-1. df= n-1= 27-1= 26.
tvalue = +2.0555 For 2.5% area in the right tail.
tvalue = -2.0555 For 2.5% area in the left tail, since the t-distribution is symmetric.
17
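Example-2 can also be worked from the raw data, which confirms the sample mean and standard deviation quoted on the slide. A minimal sketch; the t-value 2.0555 (df = 26) is taken from the slide rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Processing time of 27 life insurance applications (days), from Example-2.
days = [73, 19, 16, 64, 28, 28, 31, 90, 60, 56, 31, 56, 22, 18,
        45, 48, 17, 17, 17, 91, 92, 63, 50, 51, 69, 16, 17]

n = len(days)
x_bar = statistics.fmean(days)   # point estimate of the population mean
s = statistics.stdev(days)       # sample stdev (divides by n-1)
t_value = 2.0555                 # 2.5% right-tail area, df = n-1 = 26

margin = t_value * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, mean = {x_bar:.2f}, stdev = {s:.2f}")
print(f"95% interval estimate: {lower:.2f} to {upper:.2f} days")
```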
BITS Pilani, Pilani Campus
Problem-1
p-286, problem 8.65, TB-1.

a. Estimate the mean weight of Tea bags.
b. What is the estimation interval, for 99% confidence level?

Weight of 50 Tea bags (g).
5.25 5.42 5.49 5.54 5.58
5.29 5.42 5.50 5.54 5.58
5.32 5.44 5.50 5.55 5.61
5.32 5.44 5.50 5.55 5.61
5.34 5.44 5.51 5.56 5.62
5.36 5.45 5.52 5.56 5.63
5.40 5.45 5.53 5.57 5.65
5.40 5.46 5.53 5.57 5.67
5.40 5.47 5.53 5.57 5.67
5.41 5.47 5.53 5.58 5.77

Sample characteristics-
 Sample size, n= 50. Mean, X̄= 5.5014g. Stdev.s, S= 0.1058g.

Point estimate, μ = Sample mean.
μ = X̄ = 5.5014g.   a.

CI estimate, μ = Sample mean ± Margin of Error
= X̄ ± tvalue * Std. Error of mean
= X̄ ± tvalue * S/√n
= 5.5014 ± 2.6800 * 0.1058/√50
= 5.5014 ± 0.0401, or
= 5.4613 to 5.5415g.   b.

t-value from Excel function- df=n-1= 50-1= 49.
=T.Inv(area in the left tail,df) [0.5% and 99.5%; middle area 99%]
=T.Inv(0.005,49) = -2.6800 0.5% area in the left tail.
=T.Inv(0.995,49) = +2.6800 99.5% area in the left tail.

t-value from table, p-542, TB-1. df=n-1= 50-1= 49.
tvalue = +2.6800 For 0.5% area in the right tail.
tvalue = -2.6800 For 0.5% area in the left tail, since the t-distribution is symmetric.
18
BITS Pilani, Pilani Campus
Problem-2
p-286, problem 8.66. TB-1.

a. Estimate the mean width of Steel troughs.
b. What is the estimation interval, for 95% confidence level?

Width of 49 Steel troughs (inches).
8.312 8.476 8.436 8.460 8.396
8.343 8.382 8.413 8.444 8.447
8.317 8.484 8.489 8.429 8.405
8.383 8.403 8.414 8.460 8.439
8.348 8.414 8.481 8.412 8.411
8.410 8.419 8.415 8.420 8.427
8.351 8.385 8.479 8.410 8.420
8.373 8.465 8.429 8.405 8.498
8.481 8.498 8.458 8.323 8.409
8.422 8.447 8.462 8.420

Sample characteristics-
 Sample size, n= 49. Mean, X̄= 8.4209 inches. Stdev.s, S= 0.0461 inches.

Point estimate, μ = Sample mean.
μ = X̄ = 8.4209 inches.   a.

CI estimate, μ = Sample mean ± Margin of Error
= X̄ ± tvalue * Std. Error of mean
= X̄ ± tvalue * S/√n
= 8.4209 ± 2.011 * 0.0461/√49
= 8.4209 ± 0.0132, or
= 8.4077 to 8.4341 inches.   b.

t-value from Excel function- df= n-1= 49-1= 48.
=T.INV(area in the left tail,df) [2.5% and 97.5%; middle area 95%].
=T.INV(0.025,48) = -2.011. 2.5% area in the left tail.
=T.INV(0.975,48) = +2.011. 97.5% area in the left tail.

t-value from table, p-542, TB-1. df= n-1= 49-1= 48.
tvalue = +2.011. For 2.5% area in the right tail.
tvalue = -2.011. For 2.5% area in the left tail, since the t-distribution is symmetric.
19
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Confidence interval estimation


for the proportion
20
Example-1
p-273, TB-1.

a. Estimate the proportion of invoices (π) having errors.
b. What is the estimation interval, for 95% confidence level?

Sample characteristics-
 Sample size, n= 100. Invoices containing errors, X= 10.
 Proportion containing errors, p= 10/100= 0.10 or 10%.

Point estimate, π = Sample proportion.
π = p = 0.10.   a.

CI estimate: π = Sample proportion ± Margin of error
= p ± zvalue * Std. Error of proportion
= p ± zvalue * √(p ∗ (1 − p)/n)
= 0.10 ± 1.96 * √(0.10 ∗ (1 − 0.10)/100)
= 0.10 ± 0.0588, or 0.0412 to 0.1588.
4.12 to 15.88%.   b.

z-value from Excel function-
=NORM.INV(Area in the Left Tail,0,1) [2.5% and 97.5%; middle area 95%].
=NORM.INV(0.025,0,1) = -1.96. 2.5% area in the left tail.
=NORM.INV(0.975,0,1) = +1.96. 97.5% area in the left tail.

z-value from table, p-540, 541, TB-1.
zvalue = -1.96. For 2.5% area in the left tail. p-540.
zvalue = +1.96. For 97.5% area in the left tail. p-541.
21

BITS Pilani, Pilani Campus


Example-2
p-274, TB-1.

a. Estimate the proportion of newspapers (π) having a non-conformance attribute.
b. What is the estimation interval, for 90% confidence level?

Sample characteristics-
 Sample size, n=200. Non-conforming newspapers= 35 nos.
 Proportion non-conforming, p = 35/200 = 0.175 or 17.5%.

Point estimate, π = Sample proportion.
π = p = 0.175.   a.

Interval estimate of π = p ± z90% * Std. Error of proportion
= p ± z90% * √(p ∗ (1 − p)/n)
= 0.175 ± 1.645 * √(0.175 ∗ (1 − 0.175)/200)
= 0.175 ± 0.0442, or 0.1308 to 0.2192.
13.08 to 21.92%.   b.

z-value from Excel function-
=NORM.INV(area in the left tail,0,1) [5% and 95%; middle area 90%].
=NORM.INV(0.05,0,1) = -1.645 5% area in the left tail.
=NORM.INV(0.95,0,1) = +1.645 95% area in the left tail.

z-value from table, p-540, 541, TB-1.
zvalue = -1.645 For 5% area in the left tail. p-540.
zvalue = +1.645 For 95% area in the left tail. p-541.
22
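Both proportion examples follow the same steps, so they can share one helper. This is a sketch with an assumed function name; the z-value is obtained from `NormalDist().inv_cdf`, which mirrors `=NORM.INV(area, 0, 1)`.

```python
import math
from statistics import NormalDist

def confidence_interval_proportion(count, n, confidence):
    """Interval estimate of a population proportion: p ± z * sqrt(p*(1-p)/n)."""
    p = count / n
    # Area to the left of the upper z-value, e.g. 0.975 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Example-1: 10 of 100 invoices contain errors, 95% confidence level.
invoices = confidence_interval_proportion(10, 100, 0.95)
# Example-2: 35 of 200 newspapers are non-conforming, 90% confidence level.
newspapers = confidence_interval_proportion(35, 200, 0.90)

print(f"Invoices:   {invoices[0]:.4f} to {invoices[1]:.4f}")
print(f"Newspapers: {newspapers[0]:.4f} to {newspapers[1]:.4f}")
```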

BITS Pilani, Pilani Campus


Interval estimate- Chapter summary

1. To estimate population mean-
Use the following formula (and t distribution value)
Interval estimate, μ = X̄ ± tvalue * S/√n

2. To estimate population proportion-
Use the following formula (and z distribution value)
Interval estimate, π = p ± zvalue * √(p ∗ (1 − p)/n)

23
BITS Pilani, Pilani Campus
BITS Pilani
Pilani Campus

Determining Sample Size
