Probability & Statistics

For Bachelor of Engineering (Pokhara University)

Insights on PROBABILITY and STATISTICS

Published by: System Inception
Author: Anjeeb Lal Shrestha
Copyright ©: Author

All rights reserved. This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold, hired out, or otherwise circulated without the author's prior written consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser. Without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior written permission of the copyright owner of the book.

Edition: Second, 2079 BS
Computer: The Creation Graphics, Bagbazar, Kathmandu

CONTENTS

INTRODUCTION OF STATISTICS AND PRESENTATION OF DATA
1.1 Introduction to Statistics
1.2 Application of Statistics in Engineering
1.3 Variable, Types of Variable: Numerical and Categorical Variable
1.4 Sources of Data: Primary and Secondary Source
1.5 Presentation and Classification of Data; Stem-and-Leaf Displays
1.6 Frequency Distribution
1.7 Diagrammatic and Graphical Presentation of Data: Pareto Diagram
1.8 Pie-Diagram, Histogram, Frequency Curve, and Frequency Polygon
1.9 Cumulative Frequency Curve or Ogive Curve
SOLUTION TO IMPORTANT QUESTIONS

SUMMARIZING AND DESCRIBING THE NUMERICAL DATA
2.1 Measure of Central Tendency, Partition Values
2.2 Measure of Variation
2.3 Coefficient of Variation
2.4 Box and Whisker Plot
SOLUTION TO IMPORTANT QUESTIONS

PROBABILITY
3.1 Random Experiment, Sample Space, Event and Types of Events, Counting Rule
3.2 Various Approaches to Probability
3.3 Laws of Probability: Additive, Multiplicative
3.4 Conditional Probability and Independence
3.5 Bayes' Theorem
SOLUTION TO IMPORTANT QUESTIONS

RANDOM VARIABLE AND PROBABILITY DISTRIBUTION
4.1 Random Variable: Discrete and Continuous Random Variable
4.2 Probability Mass Function
4.3 Expectation, Laws of Expectation (Addition and Product Law)
4.4 Discrete Probability Distribution
4.4.1 Binomial Distribution
4.4.2 Poisson Distribution
4.4.3 Hypergeometric Distribution
4.4.4 Negative Binomial Distribution
4.5 Probability Density Function, Cumulative Distribution Function, Expected Values of Continuous Random Variables
4.6 Continuous Probability Distribution
4.6.1 Rectangular Distribution
4.6.2 Exponential Distribution
4.6.3 Gamma Distribution
4.6.4 Normal Distribution
4.6.5 Log-Normal Distribution
4.6.7 Beta Distribution
4.6.8 Chi Square Distribution
SOLUTION TO IMPORTANT QUESTIONS

BIVARIATE RANDOM VARIABLES AND JOINT PROBABILITY DISTRIBUTION
5.1 Joint Probability Mass Function, Joint Probability Density Function, Joint Probability Distribution Function
5.2 Marginal Probability Mass Function, Marginal Probability Density Function, Conditional Probability Mass Function
5.3 Sums and Average of Random Variables
SOLUTION TO IMPORTANT QUESTIONS

SAMPLING AND ESTIMATION
6.1 Population and Sample
6.2 Sampling Distribution of Sample Mean
6.3 Types of Sampling
6.4 Determination of Sample Size
6.5 Central Limit Theorem and its Application
6.6 Estimation
6.6.1 Concept of Point Estimation and Interval Estimation
6.6.2 Criteria of Good Estimator
6.6.3 Maximum Likelihood Estimation
6.7 Confidence Interval for Population Mean and Population Proportion
SOLUTION TO IMPORTANT QUESTIONS

TESTING OF HYPOTHESIS
7.1 Hypothesis
7.2 One Sample Test for Mean and Proportion
7.3 Two Sample Test for Mean (Independent and Dependent) and Proportion
SOLUTION TO IMPORTANT QUESTIONS

SIMPLE LINEAR REGRESSION AND CORRELATION
8.1 Simple Correlation and its Properties
8.2 Concept of Simple Regression Analysis
8.2.1 Estimation of Regression Coefficient by using Least Square Estimation Method
8.3 Standard Error and Coefficient of Determination
8.4 Inference Concerning Least Square Method
SOLUTION TO IMPORTANT QUESTIONS

INTRODUCTION OF STATISTICS AND PRESENTATION OF DATA

1.1 Introduction to Statistics
Everything dealing with the collection, processing, analysis and interpretation of numerical data belongs to the domain of statistics. Answers provided by statistical approaches can provide the basis for making decisions or choosing actions. There are two branches of statistics:
a. Descriptive statistics: It consists of procedures used to summarize and describe the important characteristics of a set of measurements. When it is used to describe a whole population, collecting every measurement can become too expensive or too time-consuming.
b. Inferential statistics: It consists of procedures used to make inferences about population characteristics from information contained in a sample drawn from this population. It is easier, faster and cheaper, but less accurate than describing the whole population, because conclusions drawn from a sample are subject to sampling error.

1.2 Application of Statistics in Engineering
In engineering, statistics can be used for a wide variety of tasks, and its importance can be felt in different fields of engineering. Some of its applications are listed below:
1. Collecting and presenting data on persons or products for analysis and management.
2. Showing the relationship between two different engineering components.
3. Checking the quality of products and controlling production.
4. Accurately estimating the time, quantity and quality of different parameters.
5. Understanding phenomena subject to variability, and effectively predicting or controlling them.

1.3 Variable, Types of Variable: Numerical and Categorical Variable
A variable is any characteristic, number or quantity that can be measured or counted. Age, sex, country of birth, income, education level etc. are variables. Some of them can be counted, whereas others are measured.
Numerical variables have quantitative attributes. Height, weight, age, volume, voltage etc. are numerical variables.
Categorical variables, also called qualitative variables, can only be categorized; they do not have mathematical properties. Voting preference, color of things, breeds of dogs etc. are categorical variables.

1.4 Sources of Data: Primary and Secondary Source
Primary data are generated by the researcher himself/herself from surveys, interviews and experiments. Sources of primary data are population census, mailed questionnaires, direct personal interviews etc.
Secondary data are second-hand information: they are not originally collected by the user but taken from already published or unpublished sources. Sources of secondary data are books, journals, newspapers, websites, government records etc.

1.5 Presentation and Classification of Data; Stem-and-Leaf Displays
Raw data, which are in general huge and unwieldy, need to be organized and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis. Data can be presented in three broad ways:
a. Textual presentation
b. Tabular presentation
c. Graphic or diagrammatic presentation
While presenting data in a table, data having some similarity or resemblance should be arranged into groups or classes. This process is known as classification of data. The bases of classification are:
a. Geographical: arranging data according to geographical region
b. Chronological: arranging data according to the order of time
c. Quantitative: arranging data according to numerical magnitude
d. Qualitative: arranging data on the basis of some attribute or quality
A stem-and-leaf display is a technique used to present quantitative data in condensed form. Unlike a frequency distribution, it does not lose the information on individual observations, which is its advantage. In this type of presentation, each value is divided into two portions, a stem and a leaf, and the leaves for each stem are shown separately in the display.
For example, to construct a stem-and-leaf display for two-digit numbers, we split each score into two parts. The first part contains the first digit, which is called the stem; the second part contains the second digit, which is called the leaf. So, for a score of 52, 5 is the stem and 2 is the leaf. We draw a vertical line, write the stems on its left side in increasing order, and then put the leaves for all scores on the right side of the vertical line.
[Figure: Stem and leaf display]

1.6 Frequency Distribution
After the classification of data according to quantitative magnitude, the items are arranged into groups or classes in increasing order of magnitude, and the number of items falling into each group is determined; the result is known as a frequency distribution. The number of items falling in a particular group is known as the frequency of that group.
For an ideal frequency distribution, the number of class intervals can be determined with Sturges' rule. According to this rule, the number of classes (K) is given by
$$K = 1 + 3.322\log_{10} N$$
where $N$ = total number of observations, $\log_{10}$ = common (base-10) logarithm and $K$ = number of classes. The size of the class interval is then determined as
$$\text{Size of class interval} = \frac{\text{Range}}{K} = \frac{\text{Largest value} - \text{Smallest value}}{K}$$

1.7 Diagrammatic and Graphical Presentation of Data: Pareto Diagram
Frequency distributions are easier to visualize and comprehend when they are represented graphically or diagrammatically.
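Sturges' rule and the class-width formula from Section 1.6 are easy to check numerically. The following is a minimal Python sketch; the function names are our own, and rounding K to the nearest whole number (with the width rounded up to the next integer) is an assumption that matches the student-weights question among the solved questions, not a fixed part of the rule.

```python
import math

def sturges_classes(n: int) -> int:
    """Sturges' rule: K = 1 + 3.322 * log10(N), rounded to the nearest whole number.

    Rounding to the nearest integer is an assumption; some texts round K up instead.
    """
    return round(1 + 3.322 * math.log10(n))

def class_width(largest: float, smallest: float, k: int) -> float:
    """Size of class interval = (largest value - smallest value) / K, unrounded."""
    return (largest - smallest) / k

# Figures from the student-weights question: N = 40, values from 102 to 173 lbs
n, largest, smallest = 40, 173, 102
k = sturges_classes(n)
width = class_width(largest, smallest, k)
# The width is rounded up so that K classes of that size cover the whole range
print(k, round(width, 2), math.ceil(width))
```

For these figures it prints `6 11.83 12`: six classes, each 12 lbs wide.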
Graphical presentation also helps to make comparisons between two or more sets of data. A Pareto diagram is a bar graph for qualitative data, with the bars arranged in decreasing order of frequency: the tallest bar is at the left and the shorter bars are farther to the right.
[Figure: Pareto diagram (Restaurant Complaints)]

1.8 Pie-Diagram, Histogram, Frequency Curve, and Frequency Polygon
A pie chart is the familiar circular graph that shows how the measurements are distributed among categories; the angle of each sector is proportional to the frequency of its category.
A histogram shows how quantitative measurements are distributed among class intervals. The class intervals are marked on the horizontal axis and the frequencies on the vertical axis; when the class intervals are equal, the height of each bar measures how often values in that class were observed. The bars in a histogram are drawn adjacent to each other with no gaps between them.
A frequency polygon is obtained by joining the midpoints of the upper horizontal sides of adjacent rectangles of the histogram.
If there is a large number of observations and the class intervals are taken small enough, most classes may have a sizeable frequency. The frequency polygon then closely approximates a smooth curve, which is called a frequency curve.

1.9 Cumulative Frequency Curve or Ogive Curve
When frequencies are added successively, they are called cumulative frequencies. The curve obtained by plotting cumulative frequencies is called a cumulative frequency curve or ogive curve. There are two methods of constructing an ogive:
i. Less than ogive
ii. More than ogive
In the 'less than' method, we start with the upper limits of the classes and go on adding the frequencies. When these cumulative frequencies are plotted against the upper limits, we get a rising curve.
In the 'more than' method, we start with the lower limits of the classes and, from the total frequency, go on subtracting the frequency of each preceding class. When these cumulative frequencies are plotted against the lower limits, we get a declining curve.

Importance and limitations of diagrammatic and graphical presentation:
Importance:
a. Easy to understand
b. Simplified presentation
c. Reveals hidden facts
d. Quick to grasp
e. Easy to compare
f. Universally accepted
Limitations:
a. Gives only a vague idea of complex data
b. Limited information
c. Low precision
d. Restricts further data analysis
e. Possibility of misuse
f. Requires careful use

SOLUTION TO IMPORTANT QUESTIONS

1. The amount of money expended in fiscal year 2019 by Curtin University in various categories is shown in the table.
Category | Teacher's salary | Staff's salary | Maintenance | Research | Taxes and others
Amount | … | … | … | … | 5.6
Construct a bar chart.
Solution:
Interpretation of the graph: The graph shows that the largest amount of money was spent on maintenance. The institute seems to invest only a small amount in research projects and in its staff. It also pays a large amount in taxes to the government.

2. For the same data as example 1.1, draw a pie chart.
Solution:
We know that a complete circle is 360°, so we divide the circle in proportion to the amount spent in each category.
$$\text{Teacher's salary} = \frac{\text{Amount on teacher's salary}}{\text{Total amount}} \times 360^\circ = 69.7^\circ \approx 70^\circ$$
$$\text{Staff's salary} = \frac{\text{Amount on staff's salary}}{\text{Total amount}} \times 360^\circ = 15.2^\circ \approx 15^\circ$$
Similarly, maintenance ≈ 190°, research ≈ 14°, and taxes and others ≈ 71°.
The pie chart is then drawn using these angles.

3. Draw a line chart for the following population growth projections and interpret the result.
Year | 2010 | 2020 | 2030 | 2040 | 2050
85 and older (millions) | 6.1 | 7.3 | 9.6 | 15.4 | 20.9
Solution:
[Figure: Line chart of the population aged 85 and older (millions), 2010 to 2050]
Interpretation of the diagram: The chart gives a clear picture of the steadily increasing number of people aged 85 and older, with the increase becoming much steeper after 2030. We can conclude that more and more people are expected to survive to old age over the years.

4. From the following frequency distribution,
Obtained marks | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70
No. of students | 5 | 10 | 17 | 24 | 9 | 6 | 4
construct an ogive that will help you to find the number of students securing:
i. less than 35 marks
ii. between 20 and 50 marks
iii. more than 25 marks [2019 Spring]
Solution:
Obtained marks | No. of students | Less than c.f. | More than c.f.
0-10 | 5 | 5 | 75
10-20 | 10 | 15 | 70
20-30 | 17 | 32 | 60
30-40 | 24 | 56 | 43
40-50 | 9 | 65 | 19
50-60 | 6 | 71 | 10
60-70 | 4 | 75 | 4
Total | 75 | |
Constructing the less than and more than ogives on a graph:
[Figure: Less than ogive and more than ogive (x-axis: Marks; y-axis: No. of students)]
i. Less than 35 marks: Locating 35 on the less than ogive, the corresponding value on the y-axis is 44 (midway between 32 at 30 marks and 56 at 40 marks).
∴ No. of students securing less than 35 marks = 44.
ii. Locating 20 and 50 on the less than ogive, the corresponding cumulative frequencies are 15 and 65.
∴ No. of students securing between 20 and 50 marks = 65 - 15 = 50.
iii. Locating 25 on the more than ogive, the corresponding frequency is about 52.
∴ No. of students securing more than 25 marks ≈ 52.

5. The weights (in lbs) of 40 students in a class are as follows:
138 | 172 | 145 | 147 | 150 | 119 | 158 | 152
168 | 142 | 157 | 147 | 102 | 144 | 165 | 136
164 | 163 | 128 | 135 | 126 | 150 | 146 | 148
145 | 125 | 146 | 153 | 138 | 156 | 173 | 140
135 | 149 | 140 | 144 | 132 | 154 | 142 | 135
i. Construct a frequency distribution with a suitable class interval.
ii. Construct a less than ogive. [2019 Fall]
Solution:
a. Construction of frequency distribution:
The number of class intervals is decided with the help of Sturges' rule:
$$K = 1 + 3.322\log_{10} N$$
where $N$ is the total number of observations, and
$$\text{Size of class interval} = \frac{\text{Largest value} - \text{Smallest value}}{K}$$
Here, largest observation = 173, smallest observation = 102 and N = 40.
$$K = 1 + 3.322\log_{10} 40 = 6.322 \approx 6$$
$$\text{Size of class interval} = \frac{173 - 102}{6} = 11.83 \approx 12$$
So, the number of class intervals = 6 and the size of each class = 12.
The frequency distribution, with the less than cumulative frequencies needed for the ogive, is:
Weight (lbs) | No. of students | Less than c.f.
102-113 | 1 | 1
114-125 | 2 | 3
126-137 | 7 | 10
138-149 | 16 | 26
150-161 | 8 | 34
162-173 | 6 | 40
b. Less than ogive graph:
[Figure: Less than ogive (y-axis: No. of students)]

6. After the implementation of an economic program to uplift the economic condition of a community, the following information was found.
Monthly income (Rs '000) | 4-6 | 6-8 | 8-10 | 10-12 | 12-14 | 14-16 | 16-18
No. of families after the plan | 8 | 65 | 37 | 15 | 15 | 5 | 5
Construct an ogive to:
i. find the number of families whose monthly income is between Rs. 8,000 and Rs. 14,000.
ii. find the number of families whose monthly income is above Rs. 12,000. [2018 Spring]
Solution:
Monthly income (Rs '000) | 4-6 | 6-8 | 8-10 | 10-12 | 12-14 | 14-16 | 16-18
More than c.f. | 150 | 142 | 77 | 40 | 25 | 10 | 5
i. From the more than c.f.:
No. of families whose monthly income is more than Rs. 8,000 = 77.
No. of families whose monthly income is more than Rs. 14,000 = 10.
∴ No. of families whose monthly income is between Rs. 8,000 and Rs. 14,000 = 77 - 10 = 67.
ii. From the more than c.f.:
∴ No. of families whose monthly income is above Rs. 12,000 = 25.

7. Represent the following data by means of a histogram:
Wages | 10-15 | 15-20 | 20-25 | 25-30 | 30-40 | 40-60 | 60-80
No. of workers | 7 | 19 | 28 | 15 | 12 | 12 | 3
Solution:
Since the class intervals of the given distribution are not equal, we have to adjust the frequencies. Taking the smallest class width (5) as the standard, the 5th frequency is divided by 2 and the frequencies of the last two intervals are divided by 4.
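When class widths differ, as in Question 7, bar heights must be adjusted so that each bar's area, not its height, is proportional to its frequency. A minimal Python sketch, assuming the smallest class width (5) is taken as the standard width; the function name `adjusted_frequencies` is hypothetical and the frequencies are those in the question's table:

```python
# Wage classes and numbers of workers as they appear in Question 7
classes = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 40), (40, 60), (60, 80)]
workers = [7, 19, 28, 15, 12, 12, 3]

def adjusted_frequencies(classes, freqs):
    """Rescale frequencies to the narrowest class width (assumed standard width).

    A class m times wider than the narrowest one has its frequency divided by m,
    so each bar's area, rather than its height, stays proportional to its frequency.
    """
    base = min(upper - lower for lower, upper in classes)
    return [freq / ((upper - lower) / base) for (lower, upper), freq in zip(classes, freqs)]

for (lower, upper), freq, adj in zip(classes, workers, adjusted_frequencies(classes, workers)):
    print(f"{lower}-{upper}: {freq} workers -> bar height {adj:g}")
```

The adjusted heights are 7, 19, 28, 15, 6, 3 and 0.75; the histogram is drawn with these heights over the original class intervals.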

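Reading a value off an ogive amounts to straight-line interpolation between neighbouring plotted points. The sketch below is a minimal Python version using the marks distribution of Question 4: it builds both ogives (less than at the upper class limits, more than at the lower limits) and reads them at the required marks. `read_ogive` is a hypothetical helper, and a hand-drawn graph will only agree with it approximately.

```python
from itertools import accumulate

# Marks distribution from Question 4: classes 0-10, 10-20, ..., 60-70
lower_limits = [0, 10, 20, 30, 40, 50, 60]
freqs = [5, 10, 17, 24, 9, 6, 4]
width = 10
total = sum(freqs)

# Less than ogive: running totals plotted at the upper class limits, starting at (0, 0)
less_x = [0] + [x + width for x in lower_limits]
less_y = [0] + list(accumulate(freqs))

# More than ogive: total minus the frequencies already passed, plotted at the
# lower class limits and ending at (70, 0)
more_x = lower_limits + [lower_limits[-1] + width]
more_y = [total - c for c in [0] + list(accumulate(freqs))]

def read_ogive(xs, ys, x):
    """Read the ogive at x by straight-line interpolation between neighbouring points."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")

print(read_ogive(less_x, less_y, 35))                                   # below 35 marks
print(read_ogive(less_x, less_y, 50) - read_ogive(less_x, less_y, 20))  # 20 to 50 marks
print(read_ogive(more_x, more_y, 25))                                   # above 25 marks
```

It prints 44.0, 50.0 and 51.5; the last value rounds to the 52 students read from the more than ogive.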
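The grouping in Question 5 can be checked by counting the 40 weights into six inclusive classes of width 12 starting at 102 (102-113, 114-125, ..., 162-173). Inclusive limits are assumed here and work only because every weight is a whole number. A minimal Python sketch; `frequency_table` is a hypothetical helper name:

```python
# Weights (lbs) of the 40 students in Question 5, row by row as in the table
weights = [
    138, 172, 145, 147, 150, 119, 158, 152,
    168, 142, 157, 147, 102, 144, 165, 136,
    164, 163, 128, 135, 126, 150, 146, 148,
    145, 125, 146, 153, 138, 156, 173, 140,
    135, 149, 140, 144, 132, 154, 142, 135,
]

def frequency_table(data, start, width, k):
    """Count whole-number values in k inclusive classes of the given width.

    Returns (lower, upper, frequency, less-than cumulative frequency) per class.
    """
    rows, running = [], 0
    for i in range(k):
        lower = start + i * width
        upper = lower + width - 1          # inclusive upper limit
        count = sum(lower <= x <= upper for x in data)
        running += count
        rows.append((lower, upper, count, running))
    return rows

for lower, upper, count, cf in frequency_table(weights, start=102, width=12, k=6):
    print(f"{lower}-{upper}: f = {count}, less than c.f. = {cf}")
```

The frequencies it prints (1, 2, 7, 16, 8, 6, with cumulative totals 1, 3, 10, 26, 34, 40) are the values plotted for the less than ogive.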