Download as pdf or txt
Download as pdf or txt
You are on page 1of 29

Statistika Rekayasa

Tabular & Graphical

Types of data

Categorical data (qualitative data)


Categorical Data
Use labels or names to identify an attribute of each
element.
Use either the nominal scale or ordinal scale of
measurement and may be nonnumeric or numeric.
The statistical analysis for qualitative data are rather
limited

Categorical variable is a variable with categorical


data
Example:
Car type (sedan, sport car, SUV, minivan, MPV, and so
on)
Method of payment (cash, credit card, check)
3

Data set for 25 mutual funds

Quantitative data
Quantitative data
Quantitative data are always numeric.
Use either the interval or ratio scale of measurement.
Ordinary arithmetic operations are meaningful only
with quantitative data.

Quantitative variable is a variable with


quantitative data.
Discrete if they are countable data and are collected
by counting. Ex: the number of items
Continuous if they are collected by measuring and
are expressed on a continuous scale. Ex: time to
failure of pump component
5

Question: whats the type of these data?


Types of ships
Categorical

JTSP students collecting data on the number


of ships entering inner channel of Surabaya
West Access Channel in a particular day.
Quantitative discrete

Time until a fuel oil simplex filter getting


clogged up.
Quantitative continuous
6

Tabular and graphical methods for


summarizing data

Summarizing categorical/qualitative data


Frequency Distribution
Relative Frequency
Percent Frequency Distribution
Bar Graph
Pie Chart

Frequency distribution
A frequency distribution is a tabular
summary of data showing the frequency (or
number) of items in each of several
nonoverlapping classes.
The objective is to provide insights about the
data that cannot be quickly obtained by
looking only at the original data.

Example: Marada Inn

10

Example: Marada Inn

11

Relative frequency distribution

12

Percent frequency distribution


The percent frequency of a class is the
relative frequency multiplied by 100.
A percent frequency distribution is a tabular
summary of a set of data showing the
percent frequency for each class.

13

Example: Marada Inn


Relative frequency and percent frequency distributions

14

Bar graph
A bar graph is a graphical device for depicting
qualitative data that have been summarized in a
frequency, relative frequency, or percent
frequency distribution.
On the horizontal axis we specify the labels that
are used for each of the classes.
A frequency, relative frequency, or percent
frequency scale can be used for the vertical axis.
Using a bar of fixed width drawn above each
class label, we extend the height appropriately.
The bars are separated to emphasize the fact
that each class is a separate category.
15

Bar graph Marada Inn

16

Pie chart
The pie chart is a commonly used graphical
device for presenting relative frequency
distributions for qualitative data.
First draw a circle; then use the relative
frequencies to subdivide the circle into
sectors that correspond to the relative
frequency for each class.
Since there are 360 degrees in a circle, a class
with a relative frequency of 0.25 would
consume 0.25(360) = 90 degrees of the circle.
17

Pie chart Marada Inn

18

Example Marada Inn


Insights Gained from the
Preceding Pie Chart
Onehalf of the customers surveyed
gave Marada a quality rating of
above average or excellent
(looking at the left side of the pie).
This might please the manager.
For each customer who gave an
excellent rating, there were two
customers who gave a poor rating
(looking at the top of the pie). This
should displease the manager.
19

Example Five popular softdrinks

20

Relative freq. and percent freq.


distributions

21

Bar chart and pie chart

22

Summarize of quantitative data


Frequency Distribution
Relative Frequency and Percent Frequency
Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive

23

Example: Hudson Auto Repair


The manager of Hudson Auto would like to get a better
picture of the distribution of costs for engine tuneup parts. A
sample of 50 customer invoices has been taken and the costs
of parts, rounded to the nearest dollar, are listed below.

24

Frequency distribution
Guidelines for Selecting Number of Classes
Use between 5 and 20 classes.

Approximate formula to calculate number of


class may also be introduced as:
range = largest data value smallest data value
class = k = 1+3.3 log N, where N is number of samples
interval = range/class
Data sets with a larger number of elements usually
require a larger number of classes.
Smaller data sets usually require fewer classes.
25

Frequency distribution
Guidelines for selecting width of classes
Use classes of equal width.

Approximate class width =

26

Example: Hudson Auto Repair


Frequency distribution

27

Example: Hudson Auto Repair

28

Example: Hudson Auto Repair

Insights gained from the percent frequency


distribution:
Only 4% of the parts costs are in the $5059 class.
30% of the parts costs are under $70.
The greatest percentage (32% or almost onethird) of the
parts costs are in the $7079 class.
10% of the parts costs are $100 or more.

29

Dot plot
One of the simplest graphical summaries of
data is a dot plot.
A horizontal axis shows the range of data
values.
Then each data value is represented by a dot
placed above the axis.

30

Example: Hudson Auto Repair

31

Histogram
Another common graphical presentation of
quantitative data is a histogram.
The variable of interest is placed on the
horizontal axis and the frequency, relative
frequency, or percent frequency is placed on the
vertical axis.
A rectangle is drawn above each class interval
with its height corresponding to the intervals
frequency, relative frequency, or percent
frequency.
Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent
classes.
32

Example: Hudson Auto Repair

33

Cumulative distribution
The cumulative frequency distribution shows
the number of items with values less than or
equal to the upper limit of each class.
The cumulative relative frequency
distribution shows the proportion of items
with values less than or equal to the upper
limit of each class.
The cumulative percent frequency
distribution shows the percentage of items
with values less than or equal to the upper
limit of each class.
34

Example: Hudson Auto Repair

35

Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal
axis.
Shown on the vertical axis are the:
cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies

The frequency (one of the above) of each class is


plotted as a point.
The plotted points are connected by straight
lines.
36

Example: Hudson Auto Repair


Ogive
Because the class limits for the partscost data
are 5059, 6069, and so on, there appear to be
one unit gaps from 59 to 60, 69 to 70, and so on.
These gaps are eliminated by plotting points
halfway between the class limits.
Thus, 59.5 is used for the 5059 class, 69.5 is used
for the 6069 class, and so on.

37

Example: Hudson Auto Repair


Ogive with Cumulative Percent Frequencies

38

Exploratory data analysis


The techniques of exploratory data analysis
consist of simple arithmetic and easytodraw
pictures that can be used to summarize data
quickly.
One such technique is the stemandleaf
display.

39

Stemandleaf display
A stemandleaf display shows both the rank
order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has
the advantage of showing the actual data
values.
The first digits of each data item are arranged to
the left of a vertical line.
To the right of the vertical line we record the last
digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
40

Example: Hudson Auto Repair

41

Stretched stemandleaf display


If we believe the original stemandleaf
display has condensed the data too much, we
can stretch the display by using two more
stems for each leading digit(s).
Whenever a stem value is stated twice, the
first value corresponds to leaf values of 04,
and the second values corresponds to values
of 59.

42

Example: Hudson Auto Repair

43

Stemandleaf display
Leaf Units
A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to
equal 1.

44

Example: Leaf unit = 0.1

45

Example: Leaf unit = 10

46

Crosstabulations and scatter diagrams


Thus far we have focused on methods that
are used to summarize the data for one
variable at a time.
Often a manager is interested in tabular and
graphical methods that will help understand
the relationship between two variables.
Crosstabulation and a scatter diagram are
two methods for summarizing the data for
two (or more) variables simultaneously.
47

Crosstabulation
Crosstabulation is a tabular method for
summarizing the data for two variables
simultaneously.
Crosstabulation can be used when:
One variable is qualitative and the other is
quantitative
Both variables are qualitative
Both variables are quantitative

The left and top margin labels define the


classes for the two variables.
48

Example: Finger Lakes Homes


Crosstabulation
The number of Finger Lakes homes sold for each style
and price for the past two years is shown below.

49

Example: Finger Lakes Homes

Insights Gained from the Preceding


Crosstabulation
The greatest number of homes in the sample (19) are
a splitlevel style and priced at less than or equal to
$99,000.
Only three homes in the sample are an AFrame style
and priced at more than $99,000.
50

Crosstabulation: Row or column


percentages
Converting the entries in the table into row
percentages or column percentages can
provide additional insight about the
relationship between the two variables.

51

Example: Finger Lakes Homes

52

Scatter diagram
A scatter diagram is a graphical presentation
of the relationship between two quantitative
variables.
One variable is shown on the horizontal axis
and the other variable is shown on the
vertical axis.
The general pattern of the plotted points
suggests the overall relationship between the
variables.
53

Scatter diagram

54

Example: Panthers Football Team

55

Example: Panthers Football Team

56

Example: Panthers Football Team


The preceding scatter diagram indicates a
positive relationship between the number of
interceptions and the number of points
scored.
Higher points scored are associated with a
higher number of interceptions.
The relationship is not perfect; all plotted
points in the scatter diagram are not on a
straight line.
57

References
Statistics for Business and Economics, Anderson,
Sweeney, and Williams, West Publishing Company.
Statistics for Business and Economics.,
SouthWestern/Thompson Learning

58

You might also like