Topic2 EDA 3
Boxplots are a visual representation of the five numbers in the Five Number
Summary.
They identify the min, max, median, lower and upper quartiles, and point
out the outliers (if any).
Definition 2 (Outliers)
An observation is an outlier if it is smaller than Q1 − 1.5 × IQR or larger than
Q3 + 1.5 × IQR, where IQR = Q3 − Q1 is the inter-quartile range.
If there are outliers, mention how many there are, and on which side of the
median they are.
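A minimal sketch of the outlier rule in R, assuming a small made-up vector `x` (it is not the data used in these notes). It computes the two cut-offs from the definition, lists the outliers, and counts how many fall on each side of the median. Note that `boxplot()` places its box at the hinges from `fivenum()`, which can differ slightly from `quantile()` for some sample sizes.

```r
# Hypothetical data, for illustration only.
x <- c(2, 11, 12, 13, 14, 15, 16, 17, 18, 40)

q1  <- unname(quantile(x, 0.25))   # lower quartile
q3  <- unname(quantile(x, 0.75))   # upper quartile
iqr <- q3 - q1                     # same value as IQR(x)

lower_cutoff <- q1 - 1.5 * iqr
upper_cutoff <- q3 + 1.5 * iqr

# Observations outside the two cut-offs are outliers.
outliers <- x[x < lower_cutoff | x > upper_cutoff]

# How many outliers lie below and above the median.
n_below <- sum(outliers < median(x))
n_above <- sum(outliers > median(x))

outliers
c(below = n_below, above = n_above)
```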
If you are presented with more than one boxplot in a figure, try to compare
their medians and inter-quartile ranges.
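One possible way to compare several boxplots is to draw them side by side and back up the visual impression with the group medians and IQRs. The variables `scores` and `group` below are made up for illustration.

```r
# Hypothetical scores for two groups, for illustration only.
scores <- c(10, 12, 13, 15, 18, 20, 22, 25, 27, 35)
group  <- factor(rep(c("A", "B"), each = 5))

# Side-by-side boxplots, one box per group.
boxplot(scores ~ group, main = "Scores by group", ylab = "Score")

# Numerical comparison of centre and spread.
medians <- tapply(scores, group, median)
iqrs    <- tapply(scores, group, IQR)

medians
iqrs
```

Here group B has both a higher median and a wider inter-quartile range than group A.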
Let’s try to get a frequency table for the variable Cancer, then a proportion table or
percentage table for this variable.
Try to label the categories, such as “Yes” and “No” instead of “0” and “1”.
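A sketch of these steps, assuming the variable is coded 0/1. The values of `Cancer` below are made up, and the mapping 0 = "No", 1 = "Yes" is an assumption; check the data description before labelling.

```r
# Hypothetical 0/1 codes, for illustration only.
Cancer <- c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0)

# Assumption: 0 means "No" and 1 means "Yes".
cancer_f <- factor(Cancer, levels = c(0, 1), labels = c("No", "Yes"))

freq <- table(cancer_f)        # frequency table (counts)
prop <- prop.table(freq)       # proportion table (sums to 1)
pct  <- round(100 * prop, 1)   # percentage table

freq
prop
pct
```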
Plot a bar plot and a pie chart for this variable with a different color for each
category, and add a title to each plot.
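One way to draw both plots in base R is sketched below. The data, the 0/1 labelling, the colors and the titles are all illustrative choices, not taken from the notes.

```r
# Hypothetical 0/1 codes; assumption: 0 = "No", 1 = "Yes".
Cancer   <- c(0, 1, 0, 0, 1, 0, 0, 0, 1, 0)
cancer_f <- factor(Cancer, levels = c(0, 1), labels = c("No", "Yes"))
freq     <- table(cancer_f)

# One color per category, in the same order as the factor levels.
cols <- c("steelblue", "tomato")

# Bar plot; barplot() returns the x-positions of the bar centres.
bar_mids <- barplot(freq, col = cols,
                    main = "Bar plot of Cancer", ylab = "Frequency")

# Pie chart with the same colors.
pie(freq, col = cols, main = "Pie chart of Cancer")
```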
Plot a histogram
Plot a boxplot
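A short sketch of both plots for a numeric variable. The vector `mark_demo` is made up and is not the `mark` data summarised in these notes.

```r
# Hypothetical marks, for illustration only.
mark_demo <- c(0.5, 8, 11, 12.5, 15, 17, 18, 18.5, 20, 22, 24, 24.5, 26, 28)

# Histogram; hist() also returns the bin breaks and counts.
h <- hist(mark_demo, main = "Histogram of marks", xlab = "Mark")

# Boxplot; boxplot() returns the five plotted statistics in $stats
# and any outliers in $out.
b <- boxplot(mark_demo, main = "Boxplot of marks", ylab = "Mark")

h$counts
b$stats
b$out
```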
Summary command
> summary(mark)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.50   12.12   18.25   17.50   24.00   28.00
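The `summary()` output is the five-number summary plus the mean. In the output above the mean (17.50) is below the median (18.25), which may hint at some left skew. The sketch below shows how to pull individual values out of the result by name, using a made-up vector `mark_demo` rather than the actual `mark` data.

```r
# Hypothetical marks, for illustration only.
mark_demo <- c(4, 9, 12, 15, 18, 21, 25)

s <- summary(mark_demo)
s

# Individual entries can be extracted by name.
names(s)
s[["Median"]]
s[["3rd Qu."]] - s[["1st Qu."]]   # inter-quartile range
```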
Topic 02 - EDA ST1131 25 / 33
Some Basic Numerical Summaries
53.804124 7.335129
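The second number above is the square root of the first (7.335129² ≈ 53.8041), which is the relationship between a standard deviation and a variance. Assuming that is what these two values are, they could be produced with `var()` and `sd()`. The sketch uses a made-up vector `mark_demo`, not the actual data.

```r
# Hypothetical marks, for illustration only.
mark_demo <- c(4, 9, 12, 15, 18, 21, 25)

v <- var(mark_demo)   # sample variance (divides by n - 1)
s <- sd(mark_demo)    # sample standard deviation

c(variance = v, sd = s)

# The standard deviation is the square root of the variance.
all.equal(s, sqrt(v))
```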
Form a histogram (by frequency and then by probability) with color and a title.
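A sketch of the two versions in base R, assuming made-up data in `mark_demo`; the colors and titles are illustrative. With `freq = TRUE` the bar heights are counts, and with `freq = FALSE` they are densities, so the bar areas add up to 1.

```r
# Hypothetical marks, for illustration only.
mark_demo <- c(0.5, 8, 11, 12.5, 15, 17, 18, 18.5, 20, 22, 24, 24.5, 26, 28)

# Histogram by frequency (counts on the y-axis).
h_freq <- hist(mark_demo, freq = TRUE, col = "lightblue",
               main = "Histogram of marks (frequency)", xlab = "Mark")

# Histogram by probability (density on the y-axis).
h_dens <- hist(mark_demo, freq = FALSE, col = "lightgreen",
               main = "Histogram of marks (density)", xlab = "Mark")

# Total area of the density bars.
total_area <- sum(h_dens$density * diff(h_dens$breaks))
total_area
```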