Download as pdf or txt
Download as pdf or txt
You are on page 1of 3

SEC-2

Ch-7 introduction to graphical Analysis


Use the Beginning.RData file for exercise 5 only ; the file contains the re-
quired data objects. The other data objects are all built into R.

Try Them Out


1. Do all parts of activity on Pages-219-221, 227-228, 255-256.

Back Exercise
1. Look at the warpbreaks data that comes built into R. Create
a box-whisker plot of the number of breaks for the different
tension. Make the plot using horizontal bars and display the
whiskers to the extreme range of the data. How can you draw
this graph to display only a single type of wool?

Sol. > boxplot(breaks ∼ tension, data = warpbreaks, horizontal = TRUE,


range = 0, subset = wool %in% ’A’, col = ’cornsilk’)
> title(xlab = ’Number of breaks’, ylab = ’Tension’, main = ’Wool type
”A”’)

Note that the simplest way to create the box-whisker plot is to use the for-
mula syntax to state how the graph should be constructed. You can use
the range = 0 instruction to force the whiskers to extend to the max and
min. The horizontal = T RU E instruction forces the plot to be displayed
horizontally. To select a single wool type you need to use the subset in-
struction. The title() command has been used to add titles to the plot, but
you could have specified the text as part of the main boxplot() command.

2. The trees data come as part of R. The data is composed of three


columns: the Girth of black cherry trees (in inches), the Height
(in feet), and the Volume of wood (in cubic feet). How can you
make a scatter plot of girth versus volume and display a line of
best-fit? Modify the axes so that the intercept is shown clearly.
Use an appropriate plotting symbol and colors to help make the
chart more interesting.

Sol. > plot(Girth ∼ Volume, trees, ylim = c(5, 20), xlim = c(0,70), pch = 17,
col = ’darkgreen’, xaxs = ’i’, yaxs = ’i’, cex = 1.5, xlab = ’Volume (cubic
ft.)’, ylab = ’Girth (inches)’)

1
> abline(lm(Girth ∼ Volume, trees), lty = 2, lwd = 2, col = ’green’)

Note that the plot() command is best used in this case. The formula no-
tation is the simplest way to specify the data to plot. The axes are scaled
to fit the points into the plot area, and this will not show the origin of the
graph. The line of best-fit will not cross the y-axis unless you modify the
scales of the axes. The xaxs = i and yaxs = i instructions have been used
to remove extra space at the ends of the axes. The abline() command is
used to add a trend line using the lm() command to determine slope and
intercept from a linear model.

3. The HairEyeColor data are built into R. These data are in the
form of a table, which has three dimensions. As well as the usual
rows and columns, the table is split in two: Male and Female.
Use the males table to create a Cleveland dot chart of the data.
Use the mean values for the columns as an additional grouping
summary result.

Sol. > dotchart(HairEyeColor[,,1], gdata = colMeans(HairEyeColor[,,1]), gpch


= 16, gcolor = ’blue’)
> title(xlab = ’Number of individuals’, main = ’Males Hair and Eye color’)
> mtext(’Grouping = mean’, side =3, adj = 1)

Note that you get the main table using HairEyeColor[, , 1], which selects
all rows and columns of the first group (male). To get the column means
you use the colM eans() command on the same data. The other instruc-
tions set the plotting character and color for the group results. The mtext()
command is optional because you could have given this information in a
caption. In this case, the reader is informed that the mean is the group-
ing summary used, and the title is placed at the top (side = 3) and right
justified (adj = 1).

4. Look at the HairEyeColor data again. This time make a bar chart
of the female data. Add a legend and make the colors reflect the
different hair colors.

Sol. > barplot(HairEyeColor[,,2],legend =TRUE, col = c(’black’, ’tan’, ’tomato’,’cornsilk’),


beside = TRUE)
> title(xlab = ’Hair Color’, ylab = ’Frequency’)

5. The bf s data object is part of the example data used in this


book. Here you have two columns, butterfly abundance (count)

2
and habitat (site). How can you draw a bar chart of the median
butterfly abundance from the three sites?

Sol. > barplot(tapply(bfs$count, bfs$site, FUN = median))


> abline(h=0)
> title(xlab = ’Habitat’, ylab = ’Butterfly abundance’)

Note that the $ syntax is used in the preceding code, but you can also use
the with() command to achieve the same result. Alternatively, you might
create a new data object to hold the result of the tapply() command and
then create the bar chart from that:
> with(bfs, barplot(tapply(count, site, FUN = median)))
In any event, the final commands draw a line under the bars to ground
them and add some axis labels.

You might also like