
Lecture 

Descriptive statistics

Associate Prof. Mohamed El Ashhab

Statistics and Data
• What is the science of statistics?
Statistics is the science of:

• Collecting,
• Organizing,
• Analyzing, and
• Interpreting data in order to make decisions.

Data
• Qualitative Data: Consists of attributes, labels, or nonnumerical entries.
• Quantitative Data: Consists of numerical measurements or counts.

Course: Engineering Statistics and Probability Theory

Dr. Mohamed El Ashhab
Lecture contents
1. Pareto chart

2. Dot plot

3. Scatter plot

4. Frequency distribution

5. Histogram

6. Stem and Leaf display

Pareto Diagram
• This display, which orders each type of failure or defect according to its frequency, can help engineers identify important defects and their causes.

• When a company identifies a process as a candidate for improvement, the first step is to collect data on the frequency of each type of failure. For example, the performance of a computer‐controlled lathe is low, so the engineers record the following causes of defects and their frequencies:

Cause of Defect           Frequency
Power fluctuations        6
Bad maintenance           22
Operator error            13
Worn tool not replaced    2
Bad material              5

Construction of Pareto Diagram

The causes are sorted in descending order of frequency, and each frequency is converted to a percentage of the total number of defects (48):

Cause of Defect           Frequency   Percentage
Bad maintenance           22          (22/48) × 100% = 45.8%
Operator error            13          (13/48) × 100% = 27.1%
Power fluctuations        6           (6/48) × 100% = 12.5%
Bad material              5           (5/48) × 100% = 10.4%
Worn tool not replaced    2           (2/48) × 100% = 4.2%
Total                     48          100%
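
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pareto_table` is hypothetical, and the cumulative-percentage column is an addition that does not appear in the slide table (it is commonly drawn as a line on a Pareto diagram).

```python
# Defect counts from the lathe example.
defects = {
    "Power fluctuations": 6,
    "Bad maintenance": 22,
    "Operator error": 13,
    "Worn tool not replaced": 2,
    "Bad material": 5,
}

def pareto_table(counts):
    """Return rows (cause, frequency, percent, cumulative percent), largest first."""
    total = sum(counts.values())
    rows, running = [], 0
    # Sort causes by frequency, from most to least frequent.
    for cause, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += freq
        rows.append((cause, freq, 100 * freq / total, 100 * running / total))
    return rows

rows = pareto_table(defects)
for cause, freq, pct, cum in rows:
    print(f"{cause:<24}{freq:>4}{pct:>8.1f}%{cum:>8.1f}%")
```

The first row is the most frequent cause, so reading from the top shows which few causes account for most of the defects.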

[Figure: Pareto diagram for the causes of defects]

[Figure: Pareto in Excel]

Dot Plot
• In a dot plot, each data entry is plotted, using a point, above a
horizontal axis.
• Example: Use a dot plot to display the ages of the 30 students in the
statistics class.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21

[Figure: dot plot of Ages of Students; horizontal axis from 15 to 57 in steps of 3]

From this graph, we can conclude that most of the values lie
between 18 and 32.
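
As a rough check of this reading, the dot plot can be reproduced as text. The sketch below is an assumption-laden illustration, not the slide's figure: it prints the plot sideways (one row per distinct age, one dot per student), and the helper name `dot_plot_rows` is hypothetical.

```python
from collections import Counter

# Ages of the 30 students from the example.
ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 37, 38, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]

def dot_plot_rows(values):
    """Return (value, dots) pairs, one per distinct value, in ascending order."""
    counts = Counter(values)
    return [(v, "." * counts[v]) for v in sorted(counts)]

for value, dots in dot_plot_rows(ages):
    print(f"{value:>3} {dots}")

# Fraction of students whose age lies between 18 and 32 inclusive.
share_18_32 = sum(18 <= a <= 32 for a in ages) / len(ages)
print(f"{share_18_32:.0%} of the ages lie between 18 and 32")
```

Here 20 of the 30 ages fall in that interval, which supports the statement that most of the values lie between 18 and 32.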

Scatter Plot
• When each entry in one data set corresponds to an entry in another data 
set, the sets are called paired data sets.  

• In a scatter plot, the ordered pairs are graphed as points in a coordinate 
plane.  The scatter plot is used to show the relationship between two 
quantitative variables.

• The following scatter plot represents the relationship between the number 
of absences from a class during the semester and the final grade.

Scatter Plot
Absences (x)   Final grade (y)
8              78
2              92
5              90
12             58
15             43
9              74
6              81

[Figure: scatter plot of final grade (y), axis from 40 to 100, against absences (x), axis from 0 to 16]

From the scatter plot, you can see that as the number of absences increases, the final
grade tends to decrease.
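
Without drawing the plot, the same trend can be checked by listing the paired data in order of absences. This is a minimal sketch using only the seven pairs from the table; the variable names are illustrative.

```python
# Paired data from the example: absences (x) and final grade (y).
absences = [8, 2, 5, 12, 15, 9, 6]
grades = [78, 92, 90, 58, 43, 74, 81]

# Sort the ordered pairs by number of absences.
pairs = sorted(zip(absences, grades))
for x, y in pairs:
    print(f"absences = {x:>2}  grade = {y}")

# Check whether each grade is lower than the one before it.
sorted_grades = [y for _, y in pairs]
always_decreasing = all(a > b for a, b in zip(sorted_grades, sorted_grades[1:]))
print("grade decreases with every increase in absences:", always_decreasing)
```

For these seven students the grades fall at every step, which is a stronger pattern than a scatter plot usually shows; in general only a tendency should be expected.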
Frequency Distribution
• A frequency distribution is a table that divides a set of data into a
suitable number of classes (categories) and shows the number of
items belonging to each class. Grouping loses some information: instead
of knowing the exact value of each item, we only know that it belongs to
a certain class. On the other hand, grouping often brings out important
features of the data, and the gain in “legibility” usually more than
compensates for the loss of information.


Frequency Distributions
Consider the following heights of 50 nanopillars, measured in nanometers
(nm) during the fabrication of a new transmission-type electron multiplier of a flat
silicon membrane.

245 333 296 304 276 336 289 234 253 292
366 323 309 284 310 338 297 314 305 330
266 391 315 305 290 300 292 311 272 312
315 355 346 337 303 265 278 276 373 271
308 276 364 390 298 290 308 221 274 343

Constructing a Frequency Distribution
The following steps should be followed:

1‐ The maximum number of classes may be determined by the formula:

Number of classes = $C = 1 + 3.3 \log_{10} n$ or $C = \sqrt{n}$, where n is the total
number of observations in the data. As can be seen, the number of
classes depends on the number of observations, but it is seldom appropriate
to use fewer than 5 or more than 15 classes. The exception to the upper limit
is when the size of the data set is several hundred or even a few thousand.

2‐ Calculate the range of the data (Range = Max – Min).

3‐ Decide on the approximate class width, denoted by h and obtained from:

h = Round up (Range / Number of Classes)

The starting point of the first class is arbitrary, but it should be less than or
equal to the minimum value.
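
The three steps can be sketched as small Python helpers. The function names are hypothetical, and the logarithm is assumed to be base 10, as in the usual form of this rule. The sample call uses the nanopillar data, whose largest and smallest values are 391 and 221.

```python
import math

def classes_log_rule(n):
    """Step 1, first formula: C = 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

def classes_sqrt_rule(n):
    """Step 1, second formula: C = sqrt(n)."""
    return math.sqrt(n)

def class_width(data_range, num_classes):
    """Step 3: h = range / number of classes, rounded up to an integer."""
    return math.ceil(data_range / num_classes)

n = 50
data_range = 391 - 221          # Step 2: Range = Max - Min
print(round(classes_log_rule(n), 2))   # about 6.6
print(round(classes_sqrt_rule(n), 2))  # about 7.07
print(class_width(data_range, 6))      # 29
```

Both rules only suggest an upper bound; the number of classes actually used is still a judgment call within the 5 to 15 guideline.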

Frequency distribution ‐ Example 
• From the measurements given previously, the largest observation is 391 and the
smallest is 221, so the range is 391 − 221 = 170. The maximum number of classes
is $\sqrt{50} \approx 7$. However, take the actual number of classes to be 6, so that
the width of each class is round-up(170/6) = round-up(28.3) = 29. The grouping
range is therefore 29 × 6 = 174, and the lower limit of the first class may be
221, 220, or 219 (less than or equal to the smallest observation).

• The number of observations in each class is counted to obtain the frequency
distribution:


Limits of Classes   Tally                   Frequency
(221 – 250]         |||                     3
(250 – 279]         ||||| |||||             10
(279 – 308]         ||||| ||||| ||||| |     16
(308 – 337]         ||||| ||||| ||          12
(337 – 366]         ||||| |                 6
(366 – 395]         |||                     3
Total                                       50

Each class includes its upper limit; the first class also includes the smallest observation, 221.
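
The counting step can be reproduced with a short Python sketch. It assumes classes that include their upper limit, with the smallest observation (221) counted in the first class, which is the convention that reproduces the frequencies in the table. The function name `frequency_table` is hypothetical.

```python
# Heights of the 50 nanopillars (nm) from the example.
heights = [245, 333, 296, 304, 276, 336, 289, 234, 253, 292,
           366, 323, 309, 284, 310, 338, 297, 314, 305, 330,
           266, 391, 315, 305, 290, 300, 292, 311, 272, 312,
           315, 355, 346, 337, 303, 265, 278, 276, 373, 271,
           308, 276, 364, 390, 298, 290, 308, 221, 274, 343]

def frequency_table(data, start, width, num_classes):
    """Count data in classes (lower, upper]; the first class also includes `start`."""
    table = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        count = sum(1 for x in data
                    if lower < x <= upper or (k == 0 and x == lower))
        table.append((lower, upper, count))
    return table

table = frequency_table(heights, start=221, width=29, num_classes=6)
for lower, upper, count in table:
    # The row of stars gives a rough sideways histogram of the classes.
    print(f"({lower} - {upper}]  {count:>2}  {'*' * count}")
```

Choosing a different starting point, such as 220 or 219, shifts every class boundary and can change the counts.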

Frequency Histogram ‐ Example 

[Figure: frequency histogram]

Frequency density distribution ‐ Example 
Students’ mid‐term exam grade percentage

Limits of Classes   Frequency   Class width   Frequency density
(0 – 60]            3           60            3 / 60 = 0.05
(60 – 70]           9           10            9 / 10 = 0.9
(70 – 80]           15          10            15 / 10 = 1.5
(80 – 90]           12          10            12 / 10 = 1.2
(90 – 100]          7           10            7 / 10 = 0.7
Total               46
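
Frequency density is the class frequency divided by the class width, which makes classes of unequal width comparable. A minimal Python sketch using the table's values follows; the function name `frequency_density` is hypothetical.

```python
# (lower limit, upper limit, frequency) for each class of mid-term grades.
classes = [(0, 60, 3), (60, 70, 9), (70, 80, 15), (80, 90, 12), (90, 100, 7)]

def frequency_density(lower, upper, frequency):
    """Frequency per unit of class width."""
    return frequency / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
for (lo, hi, f), d in zip(classes, densities):
    print(f"({lo} - {hi}]  frequency = {f:>2}  width = {hi - lo:>2}  density = {d:.2f}")

total = sum(f for _, _, f in classes)
print("total frequency:", total)
```

The wide first class has the smallest density (0.05) even though it holds three students, because those three are spread over a width of 60.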


Stem‐and‐Leaf display 
• Generally, a stem and leaf plot, or stem plot, is a technique used to
classify either discrete or continuous variables.

• A stem and leaf plot is used to organize data as they are collected.

• To illustrate, consider the following humidity readings rounded to the
nearest percent:

29 44 12 53 21 34 39 25 48 23
17 24 27 32 34 15 42 21 28 37

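• Proceeding as for a frequency distribution, these data
might be grouped into the following distribution:

Humidity Readings   Frequency
10 – 19             3
20 – 29             8
30 – 39             5
40 – 49             3
50 – 59             1

• If we wanted to avoid the loss of information
inherent in the preceding table, we could keep track
of the last digits of the readings within each class,
getting

10 – 19   2 7 5
20 – 29   9 1 5 3 4 7 1 8
30 – 39   4 9 2 4 7
40 – 49   4 8 2
50 – 59   3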

This can also be written as

1 | 2 5 7
2 | 1 1 3 4 5 7 8 9
3 | 2 4 4 7 9
4 | 2 4 8
5 | 3

• The left-hand column forms the stem, and the numbers to the left of the vertical line are the stem
labels, which in our example are 1, 2, . . . , 5. Each number to the right of the vertical line is a leaf.

• The numbers in a row, the leaves, have the unit 1.0. In the last step, the leaves are written in
ascending order.

• The three numbers in the first row are 12, 15, and 17.

• There should not be any gaps in the stem even if there are no leaves for that particular value.
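
The rules in these bullets can be sketched in Python. This is a minimal illustration that assumes non-negative whole-number readings with the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Humidity readings from the example, rounded to the nearest percent.
readings = [29, 44, 12, 53, 21, 34, 39, 25, 48, 23,
            17, 24, 27, 32, 34, 15, 42, 21, 28, 37]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting first puts the leaves in ascending order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    # Keep every stem between the smallest and largest, even if it has no leaves.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

display = stem_and_leaf(readings)
for stem, row in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

The first printed row is `1 | 2 5 7`, which corresponds to the readings 12, 15, and 17.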
