MATH 027A Laboratory Activity No. 3 ACOSTA
Describing Data Numerically
1. Objective(s):
3. Discussion:
Google Sheets is free software that can help you analyze data. Its statistical functions can be
used to generate the computations needed for statistical interpretation.
This laboratory activity assesses students’ understanding of important concepts in describing data
graphically. It also assesses students’ understanding in generating simple descriptive statistics
necessary for reports.
Graphs are important because they present data visually, highlight the most important facts, and
are easy to interpret.
4. Resources:
Google Sheets
5. Procedure:
Exercise 1
1.5 Now, let us remove some of the ridiculous data points! Make a boxplot of ‘Height’.
1.7 Find out which observations represent the five outliers that have a height of 10 inches or less –
(5.085, 5.4, 5.5, 5.5, 10)
1.8 After removing the outliers, create a new histogram. How would you describe the shape
of the distribution? Right-skewed
About 1/4 of the students are less than _65 inches tall.
1.11 Give the value that completes the following sentence.
About 1/4 of the students are more than _70 inches tall.
1.12 What is an interval that describes the middle one half of the students’ heights?
1.14 Now let’s compare the variable ‘Height’ for the different genders. Create side by side
boxplots:
Now, considering only the boxplots from ‘Male’ and ‘Female’:
d. Are there any other noticeable differences between genders in their distribution
of height?
1.15 Creating a side-by-side boxplot like this one is one of the first steps in answering the
following question: Is there a statistically significant difference in height between college-aged
men and women? More on this later in the semester.
Exercise 2
2.1 The following table shows the film lengths (in minutes) of a sample of videotape versions of
n = 22 films directed by Alfred Hitchcock. Films are listed in alphabetical order.
Film                            Length (min)    Film                    Length (min)
The Birds                       119             Psycho                  108
Dial M for Murder               105             Rear Window             113
Family Plot                     120             Rebecca                 132
Foreign Correspondent           120             Rope                    81
Frenzy                          116             Shadow of a Doubt       108
I Confess                       108             Spellbound              111
The Man Who Knew Too Much       120             Strangers on a Train    101
Marnie                          130             To Catch a Thief        103
North by Northwest              136             Topaz                   126
Notorious                       103             Under Capricorn         117
The Paradine Case               116             Vertigo                 128
2.2 Calculate the five-number summary statistics (minimum, maximum, first quartile (Q1), second
quartile (Q2), and third quartile (Q3)) for this data. The data below are ordered from minimum to
maximum.
2.4 Calculate the lower and upper fences for a boxplot of this data using the IQR from part
2.3. Recall that the lower fence is at position Q1 – 1.5×IQR with the upper fence at Q3 +
1.5×IQR.
2.5 Use the fences from part 2.4 to determine the data values for the endpoint of the lower
whisker, the endpoint of the upper whisker, and outliers (if any). Outliers are points beyond the
fences.
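As a cross-check outside Google Sheets, the steps in parts 2.2 to 2.5 can be sketched in Python.
This is a minimal sketch, not the method the activity prescribes. It assumes the inclusive
(linear-interpolation) quartile convention, which is assumed here to correspond to the Sheets
QUARTILE function; other conventions can give slightly different quartiles. The names
film_lengths and boxplot_summary are illustrative only.

```python
from statistics import quantiles

# Film lengths in minutes, copied from the table in 2.1 (n = 22).
film_lengths = [
    119, 105, 120, 120, 116, 108, 120, 130, 136, 103, 116,
    108, 113, 132, 81, 108, 111, 101, 103, 126, 117, 128,
]


def boxplot_summary(values):
    """Return the five-number summary, IQR, fences, whisker ends, and outliers."""
    data = sorted(values)
    # Assumption: inclusive quartile convention (linear interpolation).
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data values that still lie within the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "min": data[0], "q1": q1, "q2": q2, "q3": q3, "max": data[-1],
        "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "lower_whisker": inside[0], "upper_whisker": inside[-1],
        "outliers": outliers,
    }


summary = boxplot_summary(film_lengths)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the fences are cut-off values, not data values: the whisker endpoints are the smallest
and largest observations that fall inside the fences, which is why the sketch looks them up in the
data instead of reporting the fences themselves.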
2.6 Print the descriptive statistics values such as the mean, median, mode (if there are
any), standard deviation, and variance.
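The descriptive statistics in part 2.6 can also be checked with Python's standard statistics
module. This is only a sketch under one assumption: the 22 films are treated as a sample, so the
variance and standard deviation use the n - 1 divisor. If the activity intends the population
versions, pvariance and pstdev would be used instead. The variable names are illustrative.

```python
from statistics import mean, median, multimode, stdev, variance

# Film lengths in minutes, copied from the table in 2.1 (n = 22).
film_lengths = [
    119, 105, 120, 120, 116, 108, 120, 130, 136, 103, 116,
    108, 113, 132, 81, 108, 111, 101, 103, 126, 117, 128,
]

film_mean = mean(film_lengths)
film_median = median(film_lengths)
# multimode returns every value tied for the highest frequency.
film_modes = multimode(film_lengths)
# Assumption: sample statistics (divisor n - 1), since the films are described as a sample.
film_variance = variance(film_lengths)
film_stdev = stdev(film_lengths)

print(f"mean     = {film_mean:.2f}")
print(f"median   = {film_median}")
print(f"mode(s)  = {sorted(film_modes)}")
print(f"variance = {film_variance:.2f}")
print(f"std dev  = {film_stdev:.2f}")
```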
2.7 Construct a boxplot using the statistics from parts 2.1 – 2.4. In the plot, make sure to include
and label the following: Q1, Q2, Q3, IQR, lower whisker endpoint, upper whisker endpoint, and
outliers (if any).
Exercise 3
5.1 Temperatures in the cities of Math Village and Stat Village are greatest in the month of
August. The highest temperature, in degrees Fahrenheit, in Math Village for each August from
1980 to 2021 is given below. The temperatures are sorted from minimum to maximum over this
42-year period.
69.1 72.3 75.6 77.6 78.1 79.0 79.1 79.7 81.8 82.5 83.1 83.5 83.5 83.8
84.1 84.4 84.6 84.8 85.7 86.5 86.7 86.8 87.3 87.3 87.5 87.7 87.8 88.0
88.2 88.3 88.4 88.5 88.7 89.1 89.2 89.3 89.5 89.5 89.5 89.7 89.8 89.8
The highest temperature, in degrees Fahrenheit, in Stat Village for each August from 1980 to 2021
is given below. The temperatures are sorted from minimum to maximum over this 42-year period.
70.1 70.1 70.1 70.4 71.7 72.4 72.7 72.9 73.7 75.0 77.4 78.2 78.7 78.9
79.2 79.5 79.6 79.8 79.8 80.0 80.1 80.1 80.2 84.8 85.0 86.1 86.4 88.3
89.1 90.4 90.4 91.6 92.2 93.2 94.5 97.7 98.6 98.7 98.7 99.5 100.6 102.0
5.2 Create comparison boxplots for the highest temperature in Math Village versus Stat Village in
August from 1980 to 2021. Use a meaningful title and correctly label the axes with units.
5.3 Given the comparison boxplots in part 5.2, answer the following true/false questions about the
data from both villages.
a. The temperatures are more variable for Stat Village than Math Village.
c. Stat Village has a greater median temperature for those 42 years than Math Village.
e. It is obvious from the boxplots that Stat Village’s mean temperature for those 42 years
is less than Math Village’s mean temperature.
f. The lower whisker endpoint for Stat Village is less than the lower whisker endpoint
for Math Village.
g. Stat Village’s second quartile is less than Math Village’s first quartile.
h. If you prefer August high temperatures that are consistently around 85 degrees Fahrenheit,
then you should move to Stat Village.
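Several of the statements in 5.3 compare quantities that a boxplot displays directly, so they can
be checked numerically as well as visually. The following is a hedged sketch, not part of the
activity: it assumes the inclusive quartile convention and the 1.5×IQR fence rule from Exercise 2,
and the names math_village, stat_village, summarize, and checks are illustrative. Statements about
the mean are left out because a boxplot does not show the mean.

```python
from statistics import quantiles

# August highs in degrees Fahrenheit, 1980 to 2021, as listed in 5.1.
math_village = [
    69.1, 72.3, 75.6, 77.6, 78.1, 79.0, 79.1, 79.7, 81.8, 82.5, 83.1, 83.5, 83.5, 83.8,
    84.1, 84.4, 84.6, 84.8, 85.7, 86.5, 86.7, 86.8, 87.3, 87.3, 87.5, 87.7, 87.8, 88.0,
    88.2, 88.3, 88.4, 88.5, 88.7, 89.1, 89.2, 89.3, 89.5, 89.5, 89.5, 89.7, 89.8, 89.8,
]
# Assumption: the sixth value is 72.4, the only reading consistent with the sorted order.
stat_village = [
    70.1, 70.1, 70.1, 70.4, 71.7, 72.4, 72.7, 72.9, 73.7, 75.0, 77.4, 78.2, 78.7, 78.9,
    79.2, 79.5, 79.6, 79.8, 79.8, 80.0, 80.1, 80.1, 80.2, 84.8, 85.0, 86.1, 86.4, 88.3,
    89.1, 90.4, 90.4, 91.6, 92.2, 93.2, 94.5, 97.7, 98.6, 98.7, 98.7, 99.5, 100.6, 102.0,
]


def summarize(values):
    """Quartiles, IQR, and whisker endpoints under the 1.5 x IQR fence rule."""
    data = sorted(values)
    # Assumption: inclusive quartile convention (linear interpolation).
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_whisker": inside[0], "upper_whisker": inside[-1],
    }


math_stats = summarize(math_village)
stat_stats = summarize(stat_village)

# Each entry restates one statement from 5.3 as a comparison of boxplot quantities.
checks = {
    "a. Stat Village has the larger IQR": stat_stats["iqr"] > math_stats["iqr"],
    "c. Stat Village has the greater median": stat_stats["median"] > math_stats["median"],
    "f. Stat Village has the lower whisker endpoint":
        stat_stats["lower_whisker"] < math_stats["lower_whisker"],
    "g. Stat Village's Q2 is below Math Village's Q1":
        stat_stats["median"] < math_stats["q1"],
}
for statement, result in checks.items():
    print(f"{statement}: {result}")
```

Statement a is checked here with the IQR only; the overall range (maximum minus minimum) is another
reasonable measure of variability to compare.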
Exercise 1
[Output screenshots for items 1.1, 1.5, 1.8, and 1.9]
Exercise 2
[Output screenshots for items 2.2 – 2.6 and 2.7]
Exercise 3
[Output screenshots]
7. Conclusion
Describing data numerically means computing summary statistics that capture the essential features
of a dataset: its central tendency, its variability, and the shape of its distribution. The mean,
which is the sum of all data points divided by the number of points, gives the average value of the
dataset, while the median is the middle value when the data are sorted in ascending or descending
order. The mode identifies the value or values that appear most frequently. The range, variance,
and standard deviation quantify the spread or dispersion of the data, with larger values indicating
higher variability. Skewness and kurtosis measure the distribution's asymmetry and tailedness,
respectively, providing insight into its shape. Numerical summaries make it possible to communicate
complicated datasets effectively, and they make it easier to compare data, spot trends, and draw
defensible conclusions from the quantitative characteristics of the information.
Name: Acosta, Arthur James U. Laboratory Activity No.: 3
8. Assessment (Rubric for Laboratory Performance):
Rating = (Total Score / 24) x 50 + 50%