P1 Data Analysis Project - EQAO

Data Analysis Project

In this project we will use data collected by the Grade 9 teachers last semester and analyze it using
the techniques we’ve learned in this unit.

Data: The Google Sheet below contains the marks (in percentages) of 5 grade 9 classes. The marks
are split into two categories, Course Marks Prior to EQAO and EQAO marks. You will be analyzing
the marks for ONE of the grade 9 classes. The class you will be analyzing has been chosen for you.

Grade 9 Google Sheet (Course Marks Prior to EQAO vs EQAO Marks)

Class 1: Araf, Aniruddha, Kishona, Iniya, Nafeesa, Sadia

Class 2: Srijani, Ahnaf, Syed, Sonia, Faieaz

Class 3: Kathika, Marcina, Tanya, Shahzada, Praveen

Class 4: Saquib, Mika, Tahsin, Oswald, Vishnuka

Class 5: Subhodip, Dhannush, Evelyn, Sahran, Tazwar

Part 1: One Variable Analysis

a. Calculate the Mean, Median and Mode for Course Marks Prior to EQAO. (hint: there are functions
in Google Sheets/Desmos that will calculate these for you) [3 marks]
The mean is 69.04545455. The median is 69. The mode is 69.

b. Interpret the mean, median and mode from a). Please be as detailed and specific as possible.
[3 marks]

The mean is 69.04545455, which means that the typical mark in this class is approximately 69.
The median is 69, which means that about half of the class scored 69 or lower and about half
scored 69 or higher. The mode is 69, which means that 69 is the most common mark in the class
before the EQAO. Because the mean, median, and mode are close in value, the data is roughly
symmetric rather than skewed toward high or low marks.

c. Which of mean, median and mode do you consider to be the best "central" measure? Include the
concept of an outlier in your explanation. [2 marks]
I think that the best central measure is the median because it is the middle number of the
sorted data set. Outliers don’t affect the median much because it depends on the order of the
data rather than on how far any one number stands out from the rest.
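
The effect of an outlier on the mean and the median can be checked with a small sketch. The marks below are hypothetical numbers invented for illustration; they are not the class data from the Google Sheet.

```python
from statistics import mean, median

# Hypothetical marks, invented for illustration (not the class data).
marks = [62, 65, 69, 69, 72, 75]
marks_with_outlier = marks + [5]  # add one unusually low mark

# How far each measure moves when the outlier is added.
mean_shift = abs(mean(marks_with_outlier) - mean(marks))
median_shift = abs(median(marks_with_outlier) - median(marks))

print(f"mean shift: {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

In this made-up set the median stays at 69 while the mean drops by several marks, which is the behaviour described in the answer.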

d. Create two box and whisker plots for Course Marks Prior to EQAO and EQAO Marks (need help?
review 4.2 Desmos activity). Copy the images and paste them below [4 marks]

e. State the following: [6 marks]

Course Marks Prior to EQAO:

Min: 34 Max: 95 Q1: 57

Q3: 85 Range: 61 IQR: 28

EQAO Marks:

Min: 55 Max: 77 Q1: 67

Q3: 73 Range: 22 IQR: 6
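
The range and IQR values above follow from the other four numbers: range = Max - Min and IQR = Q3 - Q1. A minimal sketch that checks the arithmetic, using the values stated in this answer (the function name `spread` is just an illustrative choice):

```python
def spread(minimum, maximum, q1, q3):
    """Return (range, IQR) computed from a five-number summary."""
    return maximum - minimum, q3 - q1

# Values taken from the answers stated above.
course_range, course_iqr = spread(minimum=34, maximum=95, q1=57, q3=85)
eqao_range, eqao_iqr = spread(minimum=55, maximum=77, q1=67, q3=73)

print("Course marks:", course_range, course_iqr)
print("EQAO marks:", eqao_range, eqao_iqr)
```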

f. Use the box and whisker plot to discuss the spread of the data when comparing the grade 9
classes Course Marks Prior to EQAO and EQAO marks. [4 marks]

Looking at the range, the marks before the EQAO are more spread out (a range of 61 compared
with 22). This means that classmates had a wide range of marks. The interquartile range is also
much bigger (28 compared with 6), meaning that the middle 50% of classmates got very different
marks from each other. The EQAO marks are closer together, and the interquartile range is much
smaller, meaning that the middle 50% got similar marks. Seeing Q1 (which is 57) for the marks
before the EQAO, we know that 25% of the class got less than 57%. Q1 for the EQAO marks (which
is 67) tells us that 25% of the class got less than 67%, so the lower end of the class did better
on the EQAO. The Q3 for the EQAO, which is 73%, tells us that 25% of the class got more than 73%
on the EQAO. For the marks before the EQAO, Q3 (85) tells us that 25% of the class scored more
than 85%, which is much better than on the EQAO.

g. Based on your comparisons from f) did the class do better prior to the EQAO or on the EQAO?
[2 marks]
I think that the class did better before the EQAO. This is because the IQR went down on the
EQAO. My reasoning for this is stated in the answer above.

Part 2: Two Variable Analysis

a. State a reasonable hypothesis to describe the relationship between course marks prior to EQAO
and EQAO marks. [1 mark]
The higher a student’s mark is before the EQAO, the lower their EQAO mark is likely to be.

b. State the null hypothesis. [1 mark]

The higher the mark is before the EQAO, the less likely or equally likely the EQAO mark is to be lower.
c. Using Course Marks Prior to EQAO and EQAO Marks data, create a scatter plot. Set Course Marks
Prior to EQAO as your independent variable. Include a line of best fit. (need help? review the 4.4
Desmos activity). Label your axes and provide an appropriate title for your graph. Paste your graph
below. [4 marks]
Line of best fit: y=-0.11362x+77.754

d. Are there any outliers on the scatter plot you created? If yes, which points and why. If not, explain
why there aren’t any. [2 marks]
Yes, there is an outlier. I think the point (69, 55) is an outlier because it is the only point that
sits apart from the others on the graph with no other points beside it. Other points may be slightly
away from where the majority is, but each of them has at least one other point close to it.
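
One way to back up a visual judgment like this is the 1.5 × IQR rule. The assignment does not mention this rule; it is a common convention used here as an assumption. The sketch below applies it to the point (69, 55) using the Q1 and Q3 values from Part 1 e).

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Q1 and Q3 are the values stated in Part 1 e).
course_low, course_high = iqr_fences(q1=57, q3=85)
eqao_low, eqao_high = iqr_fences(q1=67, q3=73)

x, y = 69, 55  # the point named in the answer
x_is_outlier = not (course_low <= x <= course_high)
y_is_outlier = not (eqao_low <= y <= eqao_high)

print("Course mark fences:", course_low, course_high, "outlier:", x_is_outlier)
print("EQAO mark fences:", eqao_low, eqao_high, "outlier:", y_is_outlier)
```

Under this rule the course mark of 69 is ordinary, but the EQAO mark of 55 falls below the lower fence of 58, which is consistent with treating (69, 55) as an outlier.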

e. Write the equation for the line of best fit. Explain what the variables y and x represent. [2 marks]
Line of best fit: y=-0.11362x+77.754

y represents the EQAO Marks and is the dependent variable

x represents the Marks Prior to the EQAO and is the independent variable

f. Write down your current mark in our Grade 9 class. Using the equation from e), calculate the
expected EQAO mark you would get. Show your work. [2 marks]
Current Course Mark: 83.4%
Expected EQAO Mark: y = -0.11362x + 77.754
y = -0.11362(83.4) + 77.754
y = about 68%
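
The substitution can be checked with a short sketch. The slope and intercept come from the line of best fit stated in this project; the function name `predict_eqao` is a hypothetical helper chosen for illustration.

```python
# Line of best fit from this project: y = -0.11362x + 77.754
SLOPE = -0.11362
INTERCEPT = 77.754

def predict_eqao(course_mark):
    """Estimate the EQAO mark (y) from the course mark before the EQAO (x)."""
    return SLOPE * course_mark + INTERCEPT

expected = predict_eqao(83.4)
print(round(expected, 1))  # 68.3
```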

g. Write down the mark you want to get on the EQAO (please choose a mark over 50). Using the
equation from e), estimate the mark you should have in the course prior to completing the EQAO.
Show your work. [2 marks]
Mark I want to get on the EQAO: about 78%

Course mark I need before the EQAO: 1.5%

y = -0.11362(1.5) + 77.754 = about 78%
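
Working backwards from an EQAO mark means solving y = -0.11362x + 77.754 for x, which gives x = (y - 77.754) / (-0.11362). The sketch below is one way to check this; the function name `course_mark_for` is a hypothetical helper, not part of the assignment.

```python
# Line of best fit from this project: y = -0.11362x + 77.754
SLOPE = -0.11362
INTERCEPT = 77.754

def course_mark_for(eqao_mark):
    """Solve y = SLOPE * x + INTERCEPT for x, the course mark."""
    return (eqao_mark - INTERCEPT) / SLOPE

# A course mark of 1.5% gives an EQAO mark that rounds to 78%.
check = SLOPE * 1.5 + INTERCEPT
# Solving for exactly 78% instead gives a negative course mark.
exact = course_mark_for(78)

print(round(check, 2))
print(round(exact, 2))
```

A course mark of 1.5% predicts roughly 77.6%, which rounds to 78%. Solving for exactly 78% gives a course mark slightly below zero, because 78 is above the line's y-intercept of 77.754, so the line cannot produce that mark for any real course mark.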

h. State the correlation coefficient. What does this tell you about our data? [2 marks]
The correlation coefficient is negative because when the mark before the EQAO (the x variable)
increases, the EQAO mark (the y variable) decreases. This tells us that the lower the mark is
before the EQAO, the higher the EQAO mark tends to be. In other words, the higher the mark is
before the EQAO, the lower the EQAO mark tends to be.
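
For reference, the correlation coefficient r is a single number between -1 and 1 that Google Sheets or Desmos can report. The sketch below shows how it is calculated. The marks in it are hypothetical numbers invented for illustration, so the value of r it prints is not the class's actual value.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical marks, invented for illustration (not the class data).
course = [40, 55, 69, 80, 95]
eqao = [74, 70, 72, 68, 67]

r = pearson_r(course, eqao)
print(round(r, 2))
```

A negative r matches a line of best fit that slopes downward, and the closer r is to -1, the more tightly the points follow that line.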

i. From the scatter plot, what conclusions can you make about the relationship between grade 9
students' course marks prior to the EQAO and their EQAO marks? Does your conclusion agree with
your hypothesis? Explain, be specific. [2 marks]
From this scatter plot, I think that the class got higher marks on the EQAO than before the
EQAO because for the majority of points the dependent variable (y coordinate) is higher in value
than the independent variable, which is the mark before the EQAO (x coordinate). This scatter
plot has a negative correlation, because when the mark before the EQAO (the x variable)
increases, the EQAO mark (the y variable) decreases. In other words, the higher the mark is
before the EQAO, the lower the EQAO mark tends to be. Also, since the points in the scatter plot
are pretty close together, I can say that most of the class got similar marks on the EQAO. The
line of best fit supports this because it is almost horizontal, crossing the y-axis at about 78.
My conclusion agrees with my hypothesis because I stated that a higher mark before the EQAO
makes a lower EQAO mark more likely. My conclusion is the same: the higher your mark is before
the EQAO, the lower the EQAO mark will tend to be.

j. Based on your conclusion from i) and your answers from f) and g) how can you better prepare for
the EQAO? [1 mark]
I can prepare for the EQAO by getting the lowest mark I can because, according to the equation I
made and the work shown in question g, if I want a 78% on the EQAO, I will need 1.5% before
the EQAO. I can also make sure I am prepared to do well on the EQAO.

