
PAN African eNetwork Project

Course Name
BASIC MATHEMATICS Semester - I

Jitendra Kumar
Copyright Amity University 1

Module V: Data Analysis


Syllabus:
• Data and Statistical Data
• Frequency Distribution
• Graphical Representation
• Measure of Central Tendency
• Measure of Dispersion

Data and Statistical Data


• Data: Concept & Definition
• Data Sources

Descriptive Statistics: Tabular and Graphical Presentations

• Summarizing Qualitative Data
• Summarizing Quantitative Data

Summarizing Qualitative Data

• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Graphs
• Pie Charts

Frequency Distribution

A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.

The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average   Above Average   Above Average   Average   Above Average   Average   Above Average
Average   Above Average   Below Average   Poor   Excellent   Above Average   Average
Above Average   Above Average   Below Average   Poor   Above Average   Average

Frequency Distribution

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
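The frequency table for the Marada Inn ratings can be reproduced by simple counting. The following is a minimal sketch in Python; the variable names (`ratings`, `classes`, `frequency`) are illustrative and not part of the course material.

```python
from collections import Counter

# The 20 guest ratings, in the order they are listed in the example.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average",
    "Above Average", "Average", "Above Average",
    "Average", "Above Average", "Below Average", "Poor",
    "Excellent", "Above Average", "Average",
    "Above Average", "Above Average", "Below Average", "Poor",
    "Above Average", "Average",
]

# Class order used in the table (worst to best).
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

for c in classes:
    print(f"{c:<15}{frequency[c]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Listing the classes explicitly keeps the table in rating order rather than in order of first appearance.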

Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Percent Frequency Distribution

The percent frequency of a class is the relative frequency multiplied by 100.

A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency   Percent Frequency
Poor                    .10                  10
Below Average           .15                  15
Average                 .25                  25
Above Average           .45                  45
Excellent               .05                   5
Total                  1.00                 100

For example, the relative frequency of the Excellent class is 1/20 = .05, and the percent frequency of the Poor class is .10(100) = 10.
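As a small illustration of the two definitions, the sketch below derives the relative and percent frequencies from the Marada Inn frequency counts. The dictionary layout and names are assumptions made for the example.

```python
# Frequency counts from the Marada Inn example.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of data items (20)

# Relative frequency = class frequency / total number of items.
relative = {c: f / n for c, f in frequency.items()}

# Percent frequency = relative frequency x 100.
percent = {c: 100 * f / n for c, f in frequency.items()}

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}")
```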

Bar Graph

A bar graph is a graphical device for depicting qualitative data.
On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
The bars are separated to emphasize the fact that each class is a separate category.

[Figure: bar graph, "Marada Inn Quality Ratings". Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency, scaled 1 to 10.]

Pie Chart

The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.

First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
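The sector-angle rule can be checked for every class of the Marada Inn data. This is a minimal sketch, assuming the frequency counts from the example; the names are illustrative only.

```python
# Frequency counts from the Marada Inn example.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())

# Sector angle = relative frequency x 360 degrees.
angles = {c: 360 * f / n for c, f in frequency.items()}

for c, a in angles.items():
    print(f"{c:<15}{a:>6.1f} degrees")
```

The Average class (relative frequency .25) comes out at 90 degrees, matching the worked figure in the text, and the angles add up to the full circle.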

[Figure: pie chart, "Marada Inn Quality Ratings". Sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]

Example: Marada Inn

Insights Gained from the Preceding Pie Chart

• One-half of the customers surveyed gave Marada a quality rating of "above average" or "excellent" (looking at the left side of the pie). This might please the manager.
• For each customer who gave an "excellent" rating, there were two customers who gave a "poor" rating (looking at the top of the pie). This should displease the manager.

Summarizing Quantitative Data

• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Histogram
• Cumulative Distributions
• Ogive


Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Example: Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

 91    78    93    57    75    52    99    80    97    62
 71    69    72    89    66    75    79    75    72    76
104    74    62    68    97   105    77    65    80   109
 85    97    88    68    83    68    71    69    67    74
 62    82    98   101    79   105    79    69    62    73

Frequency Distribution

Guidelines for Selecting Number of Classes

• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
• Use enough classes to show the variation in the data.
• Do not use so many classes that some contain only a few data items.

Frequency Distribution

Guidelines for Selecting Width of Classes

• Use classes of equal width.
• Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes

Frequency Distribution

For Hudson Auto Repair, if we choose six classes:

Approximate Class Width = (109 − 52)/6 = 9.5 ≈ 10

Parts Cost ($)   Frequency
50-59                2
60-69               13
70-79               16
80-89                7
90-99                7
100-109              5
Total               50
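The class-width calculation and the resulting frequency distribution can be traced step by step. The sketch below assumes the 50 parts costs from the Hudson Auto Repair example; the starting limit of 50 and the rounded width of 10 are taken from the classes used in the table, and the variable names are illustrative.

```python
# Parts costs ($) for the 50 tune-ups in the Hudson Auto Repair example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6

width = 10         # approximate width rounded up to a convenient value
first_lower = 50   # lower limit of the first class, as in the table

frequency = {}
for k in range(num_classes):
    lower = first_lower + k * width
    upper = lower + width - 1
    # Count the costs that fall inside this class (limits inclusive).
    frequency[f"{lower}-{upper}"] = sum(lower <= c <= upper for c in costs)

print("Approximate class width:", approx_width)
for label, f in frequency.items():
    print(f"{label:<10}{f:>3}")
```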

Relative Frequency and Percent Frequency Distributions

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                   .04                   4
60-69                   .26                  26
70-79                   .32                  32
80-89                   .14                  14
90-99                   .14                  14
100-109                 .10                  10
Total                  1.00                 100

For example, the relative frequency of the 50-59 class is 2/50 = .04, and its percent frequency is .04(100) = 4.

Relative Frequency and Percent Frequency Distributions

Insights Gained from the Percent Frequency Distribution

• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.

Dot Plot

One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed above the axis.

[Figure: dot plot, "Tune-up Parts Cost". Horizontal axis: Cost ($), from 50 to 110.]

Histogram

Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.


[Figure: histogram, "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-110; vertical axis: Frequency, scaled 2 to 18.]

Histogram

Moderately Skewed Left
• A longer tail to the left
• Example: exam scores
[Figure: relative frequency histogram; vertical axis: Relative Frequency, from 0 to .35.]

Histogram

Moderately Right Skewed
• A longer tail to the right
• Example: housing values
[Figure: relative frequency histogram; vertical axis: Relative Frequency, from 0 to .35.]

Histogram

Highly Skewed Right
• A very long tail to the right
• Example: executive salaries
[Figure: relative frequency histogram; vertical axis: Relative Frequency, from 0 to .35.]

Cumulative Distributions

• Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
• Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
• Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Distributions

Hudson Auto Repair

Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59                2                       .04                              4
≤ 69               15                       .30                             30
≤ 79               31                       .62                             62
≤ 89               38                       .76                             76
≤ 99               45                       .90                             90
≤ 109              50                      1.00                            100

For the ≤ 69 row: cumulative frequency 2 + 13 = 15, cumulative relative frequency 15/50 = .30, cumulative percent frequency .30(100) = 30.
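The three cumulative distributions are running totals of the class frequencies. A minimal sketch, assuming the Hudson Auto Repair class frequencies; the list names are illustrative.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Hudson Auto Repair example.
upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]

n = sum(frequency)  # 50 tune-ups

# Running total of the frequencies.
cum_freq = list(accumulate(frequency))

# Proportion and percentage of items at or below each upper limit.
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * c / n for c in cum_freq]

for u, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {u:<5}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```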


Ogive

An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
• cumulative frequencies, or
• cumulative relative frequencies, or
• cumulative percent frequencies
The frequency (one of the above) of each class is plotted as a point.
The plotted points are connected by straight lines.

Ogive

Hudson Auto Repair
• Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.
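The halfway rule gives the coordinates of the points that an ogive connects. The sketch below computes them for a cumulative percent frequency ogive of the Hudson Auto Repair data; it only produces the coordinates (no plotting), and the names are illustrative.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Hudson Auto Repair example.
upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]

n = sum(frequency)
cum_freq = list(accumulate(frequency))

# Each class is plotted halfway between its upper limit and the next
# class's lower limit (59.5, 69.5, ...), against its cumulative percent.
points = [(u + 0.5, 100 * c / n) for u, c in zip(upper_limits, cum_freq)]

for x, y in points:
    print(f"({x}, {y:.0f})")
```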

Ogive with Cumulative Percent Frequencies

[Figure: ogive, "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), from 50 to 110; vertical axis: Cumulative Percent Frequency, from 20 to 100. The point (89.5, 76) is labeled.]

Chapter 2, Part B
Descriptive Statistics: Tabular and Graphical Presentations

• Exploratory Data Analysis: Stem-and-Leaf Display
• Crosstabulations and Scatter Diagrams

Exploratory Data Analysis

The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.

One such technique is the stem-and-leaf display.

Stem-and-Leaf Display

A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.

Stem-and-Leaf Display (Hudson Auto Repair parts costs)

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

Each row (for example, "5 | 2 7") is a stem; each digit to the right of the vertical line is a leaf.
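A stem-and-leaf display for whole-number data can be built by splitting each value into its leading digits and its last digit. The following is a minimal sketch using the Hudson Auto Repair parts costs; the function name `stem_and_leaf` is a hypothetical helper, not something defined in the course material.

```python
from collections import defaultdict

# Parts costs ($) for the 50 tune-ups in the Hudson Auto Repair example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(values):
    """Group non-negative integers by leading digits (stem) and last digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in rank order
        stems[v // 10].append(v % 10)
    return dict(stems)


display = stem_and_leaf(costs)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```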

Stretched Stem-and-Leaf Display

If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).

Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.

Stretched Stem-and-Leaf Display

 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9

Example: Leaf Unit = 0.1

If we have data with values such as

8.6   11.7   9.4   9.1   10.2   11.0   8.8

a stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7

Example: Leaf Unit = 10

If we have data with values such as

1806   1717   1974   1791   1682   1910   1838

a stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
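Both leaf-unit examples follow the same recipe: express each value in leaf units, drop anything finer, then split off the last digit. A hedged sketch follows; the function `stem_and_leaf` and its `leaf_unit` parameter are hypothetical names, and the `round(..., 9)` step is an assumption added only to guard against floating-point noise in divisions such as 8.6 / 0.1.

```python
import math
from collections import defaultdict


def stem_and_leaf(values, leaf_unit):
    """Stem-and-leaf groups for non-negative values with the given leaf unit."""
    stems = defaultdict(list)
    for v in sorted(values):
        # Number of whole leaf units in the value; finer digits are dropped
        # (so 1682 with leaf unit 10 becomes 168, i.e. stem 16, leaf 8).
        units = math.floor(round(v / leaf_unit, 9))
        stems[units // 10].append(units % 10)
    return dict(stems)


tenths = stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1)
tens = stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10)

for title, display in (("Leaf Unit = 0.1", tenths), ("Leaf Unit = 10", tens)):
    print(title)
    for stem in sorted(display):
        print(f"{stem:>2} | {' '.join(str(leaf) for leaf in display[stem])}")
```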

Crosstabulations and Scatter Diagrams

Thus far we have focused on presentations that are used to summarize the data for one variable at a time.

Often a manager is interested in presentations that will help understand the relationship between two variables.

Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously.

Crosstabulation

A crosstabulation is a tabular summary of data for two variables.

Crosstabulation can be used when:
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.

The left and top margin labels define the classes for the two variables.

Crosstabulation

Example: Finger Lakes Homes
The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price Range is the quantitative variable; Home Style is the qualitative variable.

                          Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
≤ $99,000         18        6     19       12       55
> $99,000         12       14     16        3       45
Total             30       20     35       15      100

Crosstabulation

Insights Gained from Preceding Crosstabulation

• The greatest number of homes (19) in the sample are a split-level style and priced at less than or equal to $99,000.
• Only three homes in the sample are an A-Frame style and priced at more than $99,000.

Crosstabulation

In the Finger Lakes Homes crosstabulation, the Total column (≤ $99,000: 55; > $99,000: 45; sum 100) is the frequency distribution for the price variable, and the Total row (Colonial 30, Log 20, Split 35, A-Frame 15; sum 100) is the frequency distribution for the home style variable.
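The two marginal frequency distributions are the row and column sums of the crosstabulation. A minimal sketch, assuming the Finger Lakes Homes cell counts; the dictionary keys and variable names are illustrative.

```python
# Cell counts from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
table = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row totals: frequency distribution for the price variable.
row_totals = {price: sum(row) for price, row in table.items()}

# Column totals: frequency distribution for the home style variable.
col_totals = {
    style: sum(row[i] for row in table.values())
    for i, style in enumerate(styles)
}

grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print("Total:", grand_total)
```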

Crosstabulation: Row or Column Percentages

Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.


Crosstabulation: Row Percentages

                          Home Style
Price Range    Colonial    Log    Split   A-Frame   Total
≤ $99,000       32.73     10.91   34.55    21.82     100
> $99,000       26.67     31.11   35.56     6.67     100

Note: row totals are actually 100.01 due to rounding.

(Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100 = 26.67

Crosstabulation: Column Percentages

                          Home Style
Price Range    Colonial    Log    Split   A-Frame
≤ $99,000       60.00     30.00   54.29    80.00
> $99,000       40.00     70.00   45.71    20.00
Total            100       100     100      100

(Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100 = 40.00
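Row and column percentages differ only in which total each cell is divided by. The sketch below recomputes both from the Finger Lakes Homes cell counts; the names are illustrative.

```python
# Cell counts from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
table = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row percentages: each cell divided by its row (price range) total.
row_pct = {
    price: [100 * v / sum(row) for v in row]
    for price, row in table.items()
}

# Column percentages: each cell divided by its column (home style) total.
col_totals = [sum(col) for col in zip(*table.values())]
col_pct = {
    price: [100 * v / t for v, t in zip(row, col_totals)]
    for price, row in table.items()
}

for price in table:
    print(price, [round(p, 2) for p in row_pct[price]],
          [round(p, 2) for p in col_pct[price]])
```

Rounding each row percentage to two decimals and then adding them gives 100.01 for both rows, which is the rounding effect noted for the row-percentage table.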

Crosstabulation: Simpson's Paradox

Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.

We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.

Simpson's Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.
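To see how such a reversal can happen, the sketch below uses hypothetical counts, invented purely for illustration and not taken from the course material. Option A has the higher success rate inside each group, yet option B has the higher rate once the two groups are aggregated, because the options are spread very unevenly across the groups.

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
groups = {
    "Group 1": {"A": (9, 10), "B": (80, 100)},
    "Group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials


# Unaggregated view: compare A and B within each group.
within = {
    name: {option: rate(*counts) for option, counts in g.items()}
    for name, g in groups.items()
}

# Aggregated view: add successes and trials across the groups first.
aggregated = {}
for option in ("A", "B"):
    successes = sum(g[option][0] for g in groups.values())
    trials = sum(g[option][1] for g in groups.values())
    aggregated[option] = rate(successes, trials)

print(within)      # A is ahead in both groups
print(aggregated)  # B is ahead after aggregation
```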

Scatter Diagram and Trendline

A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests the overall relationship between the variables.
A trendline is an approximation of the relationship.

Scatter Diagram and Trendline

A Positive Relationship
[Figure: scatter diagram of y against x with trendline.]

A Negative Relationship
[Figure: scatter diagram of y against x with trendline.]

No Apparent Relationship
[Figure: scatter diagram of y against x with trendline.]

Example: Panthers Football Team

Scatter Diagram and Trendline
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions   y = Number of Points Scored
            1                             14
            3                             24
            2                             18
            1                             17
            3                             30
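The slides do not say how the trendline is fitted. One common choice is the least-squares line, and the sketch below applies it to the Panthers data as an assumption; the variable names are illustrative.

```python
# Panthers data: interceptions (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (assumed fitting method).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope corresponds to a positive relationship: more interceptions are associated with more points scored.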

Scatter Diagram and Trendline

[Figure: scatter diagram with trendline for the Panthers data. Horizontal axis x: Number of Interceptions, from 0 to 3; vertical axis y: Number of Points Scored, from 0 to 35.]

Example: Panthers Football Team

Insights Gained from the Preceding Scatter Diagram

• The scatter diagram and trendline indicate a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram are on a straight line.


Tabular and Graphical Procedures

Data

Qualitative Data
  Tabular Methods
  • Frequency Distribution
  • Relative Freq. Distribution
  • Percent Freq. Distribution
  • Crosstabulation
  Graphical Methods
  • Bar Graph
  • Pie Chart

Quantitative Data
  Tabular Methods
  • Frequency Dist.
  • Rel. Freq. Dist.
  • % Freq. Dist.
  • Cum. Freq. Dist.
  • Cum. Rel. Freq. Distribution
  • Cum. % Freq. Distribution
  • Crosstabulation
  Graphical Methods
  • Dot Plot
  • Histogram
  • Ogive
  • Stem-and-Leaf Display
  • Scatter Diagram

End of Module V: Data Analysis

Thank You
Please forward your query To: jkumar@amity.edu
CC: manoj.amity@panafnet.com