Cici Yan - Bivariate Data Summative Assessment

Studying Bivariate Data 

Math 9E Summative Assessment 

You will find a data set that includes a minimum of 30 points (I recommend somewhere between 30 - 
60 data points). You will then use the skills from this unit to complete the graphic organizer below by 
analyzing the data and explaining why your conclusions are important to the general population. 

Criterion C: communicating 
Level 1-2:
i. Use limited mathematical language.
ii. Use limited forms of mathematical representation to present information.
iii. Communicate through lines of reasoning that are difficult to understand.

Level 3-4:
i. Use some appropriate mathematical language.
ii. Use appropriate forms of mathematical representation to present information adequately.
iii. Communicate through lines of reasoning that are complete.
iv. Adequately organize information using a logical structure.

Level 5-6:
i. Usually use appropriate mathematical language.
ii. Usually use appropriate forms of mathematical representation to present information correctly.
iii. Usually move between different forms of mathematical representation.
iv. Communicate through lines of reasoning that are complete and coherent.
v. Present work that is usually organized using a logical structure.

Level 7-8:
i. Consistently use appropriate mathematical language.
ii. Use appropriate forms of mathematical representation to consistently present information
correctly.
iii. Move effectively between different forms of mathematical representation.
iv. Communicate through lines of reasoning that are complete, coherent and concise.
v. Present work that is consistently organized using a logical structure.

Task Specific Clarifications:

Reminders in order to do well:
● Be sure to use the technical/mathematical language that we practiced in this unit. 
● Make sure that your explanations are clear and that they make sense.   
● Make sure that you discuss both your graph and your equation to show that you understand both 
representations of the data AND how they are related to each other.   

Student Justification  Teacher Justification 

I think I   
Criterion D: applying maths in real world contexts 
Level 1-2:
i. Identify some of the elements of the authentic real-life situation.
ii. Apply mathematical strategies to find a solution to the authentic real-life situation, with
limited success.

Level 3-4:
i. Identify the relevant elements of the authentic real-life situation.
ii. Select, with some success, adequate mathematical strategies to model the authentic real-life
situation.
iii. Apply mathematical strategies to reach a solution to the authentic real-life situation.
iv. Discuss whether the solution makes sense in the context of the authentic real-life situation.

Level 5-6:
i. Identify the relevant elements of the authentic real-life situation.
ii. Select adequate mathematical strategies to model the authentic real-life situation.
iii. Apply the selected mathematical strategies to reach a valid solution to the authentic
real-life situation.
iv. Explain the degree of accuracy of the solution.
v. Explain whether the solution makes sense in the context of the authentic real-life situation.

Level 7-8:
i. Identify the relevant elements of the authentic real-life situation.
ii. Select adequate mathematical strategies to model the authentic real-life situation.
iii. Apply the selected mathematical strategies to reach a correct solution to the authentic
real-life situation.
iv. Justify the degree of accuracy of the solution.
v. Justify whether the solution makes sense in the context of the authentic real-life situation.

Task Specific Clarifications: 

Reminders in order to do well: 
● Make sure that your responses show critical thinking. 
● Your responses need to show me how you connect the math part of the unit (statistics) with real 
world applications (the “so what” part of the unit). 
● Ensure you discuss the accuracy of your answers (trendlines, interpolating, extrapolating). 

Student Justification  Teacher Justification 

I think I should get a 7 on this because I use the technical/mathematical language that we
practiced in this unit to communicate my thoughts. All of my responses show my critical
thinking. Lastly, my responses show how I can connect the math part of the unit (statistics)
with real world applications.

Your Work Begins Here 

Provide the reference​ for your data set here. 
Data link: 
APA citation:   
Socialblade. (2018, September 13). Top 5000 youtube channels data from socialblade. Retrieved 
December 13, 2018, from 

Describe your data set. What are you comparing? Why do you think that this will be
interesting or important data to study? (Minimum 4 sentences)

About the data: 

Socialblade is a well-known company that mainly focuses on providing data and statistics.
It works with YouTube, Instagram and many other platforms by recording their data and
turning it into statistical charts. On the official Socialblade website, I found a data set
about the top 5000 YouTube channels and some basic information about them.
There are 3 variables in this data: video uploads, subscribers and video views.
I first noticed this data set when I typed the keyword "youtube" into the search bar of a
data app one day and it showed up at the top of the results, so I clicked on it. After looking
at the data closely, I found it interesting because my favorite YouTuber was on the top 5000
list. That is why I decided to choose this data set.

Identify the two variables that are being compared in this data. Which is the independent
variable and which is the dependent variable? (Minimum 2 sentences)

The variables I am comparing: 

As described above, there are 3 different variables in this data. However, to keep things
simple I will compare only 2 of them: video uploads and video views, because these are the
two variables that interest me the most.
I think my x variable (independent) will be video uploads, and my y variable (dependent) will
be video views. I think this because the number of video views depends on how many videos you
upload.

Select a Global Context and explain how this data relates to that Global Context. (Minimum
4 sentences)
The global context: 

I chose this data set because I had been looking for a data set for Identities and
relationships, and this one is really interesting and fits this global context. It fits
because it is all about how one individual connects to the wider world through the internet.
To be more detailed: when a video is uploaded to YouTube, we can think of that video as an
individual, because it represents a person by sharing that person's life, hobbies, music, etc.
Others watch the video and give likes and comments, which share their ideas and some of their
experiences through the video. At this point, 2 different individuals are connected to each
other by one video, which means there is a new relationship between 2 individuals who never
knew each other before. Video views give you a basic idea of how many other individuals have
connected with you by viewing your life. Video uploads show how much you want to build
relationships with others, and subscribers show how many people enjoy your life.

Predict what kind of correlation you will find when you graph this data and explain why you
think it will be that type of correlation. (Minimum 3 sentences)

My prediction for this data set is that the more videos a YouTuber uploads, the more video
views the YouTuber will receive. Based on my prediction, this data will have a positive
linear association. I believe there will be a strong association between these two variables
because they are very connected to each other.

Now use Google Sheets to create a scatterplot of your data. Insert your graph with a
trendline here.

What is the equation of the trendline and the R² value? What does your equation tell you
about the data? What does your R² value tell you about your equation? (Minimum 3 sentences)


y (views) = 145881x (uploads) + 5616336614
This equation comes from the formula for a linear function, y = mx + b. Here m is the slope
of the line, and b is the y-intercept, which is the point where the line crosses the vertical
y-axis.
In this equation the y-intercept is (0, 5616336614), which means the trendline predicts
5616336614 video views for a channel with 0 uploads. The slope is 145881, which means that
every time you post 1 more video, the trendline predicts 145881 more video views than before.
R² value: 0.117
The R² value tells me how strong the association between the x and y variables is. The closer
the R² value is to 1, the stronger the association between the x and y variables. When the R²
value is equal to 1, the association between the x and y variables is perfect. In this
particular example the association between video uploads (x variable) and video views
(y variable) is weak, because the R² value is 0.117, which is not even close to 1. This means
my prediction of the association (details are in the prediction section) was wrong: the
correct association is a positive but weak linear association.
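
A minimal Python sketch of how an R² value can be calculated for a straight trendline. It
assumes an ordinary least-squares line (Google Sheets most likely fits its linear trendline
this way, but that is an assumption). The function names fit_line and r_squared and the
sample points are made up for illustration; they are not from the Socialblade data.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys):
    """Return R^2: the share of the variation in y that the line explains."""
    m, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Leftover variation around the line.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total variation around the mean of y.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical points (NOT real channel data).
xs = [1, 2, 3, 4]
on_a_line = [3, 5, 7, 9]   # exactly y = 2x + 1, so R^2 should be 1
scattered = [3, 9, 2, 8]   # jumps around, so R^2 should be close to 0

print(r_squared(xs, on_a_line))
print(r_squared(xs, scattered))
```

Points that sit exactly on a line give an R² of 1, while points that jump around give a value
near 0, which is the situation for the 0.117 found here.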

Based on your trendline, what predictions or inferences can you make about your data?
Provide an expected value (show your work) AND justify your result. (Minimum 4 sentences)
● You should have at least one piece of information that would be considered interpolation.
● You should have at least one prediction that requires you to extrapolate your data. 
● You should explain why both those points are significant. 
● Number Predictions:
Interpolating prediction: If a YouTuber uploaded 75000 videos, then the video views would be
about 16557411614.
● How to get this answer:
y (views) = 145881x (uploads) + 5616336614
y = (145881 × 75000) + 5616336614 = 16557411614
● Extrapolation prediction:
If a YouTuber wanted a total of 70000000000 video views, then the YouTuber would need to
upload about 441344 videos.
● How to get this answer:
y (views) = 145881x (uploads) + 5616336614
70000000000 = (145881 × x) + 5616336614
x = (70000000000 − 5616336614) ÷ 145881 ≈ 441344 videos
● Prediction:
The more videos a YouTuber uploads, the more video views the YouTuber will receive.
● Justify the graph:
1. Cluster - From the graph we know that most of the YouTubers have uploaded only around
676 to 4710 videos to their channels. I think this is because, even though many YouTubers
are creative and like to share as many videos as possible, it is just too unrealistic to
post that many videos: it is hard to produce a good-quality video, and they need some
time to relax too.
2. Outliers - From this graph we know that there is an outlier at (12661, 47548839843).
After checking the data I found out that the YouTube account for this outlier is called
T-SERIES. T-SERIES is India's largest music label and movie studio, almost all of India
is a fan of T-SERIES, and India is one of the countries with the largest populations.
Just think: if everyone in India watched the videos produced by T-SERIES on TV, how many
video views would that produce in a day? Probably a lot. So I am not surprised that
T-SERIES is an outlier.
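
The two number predictions above can be double-checked with a short Python sketch. The slope
and y-intercept are taken from the trendline equation; the names predict_views and
uploads_needed are made up for this example.

```python
SLOPE = 145881          # extra views predicted per extra upload (from the trendline)
INTERCEPT = 5616336614  # views predicted at 0 uploads (from the trendline)


def predict_views(uploads):
    """Use y = mx + b to predict views from a number of uploads."""
    return SLOPE * uploads + INTERCEPT


def uploads_needed(target_views):
    """Rearrange y = mx + b to x = (y - b) / m to find the uploads for a view goal."""
    return (target_views - INTERCEPT) / SLOPE


# Interpolating prediction: views for 75000 uploads.
print(predict_views(75000))

# Extrapolation prediction: uploads for 70000000000 views, rounded to whole videos.
print(round(uploads_needed(70000000000)))
```

Because the R² value is only 0.117, both results should be read as rough estimates from a
weak trendline, not as exact values.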


Why might someone be interested in your study?​ In other words, what makes this 
interesting data to study? What information or deeper understanding does it provide? How 
could people use the results of your study? ​(​Minimum​ 4 sentences) 

YouTubers would probably love this data set, because YouTubers make money from video views:
the more views, the more money the YouTuber gets. If a professional YouTuber looked at this
data, that YouTuber would get an idea of how many videos they should produce to reach the
number of video views they expect, and so earn the amount of money they want.
What could you recommend as a follow-up study? I.e.: How could your conclusions be
used to help people make decisions or to create a new study? (Minimum 4 sentences)

Since many YouTubers need video views to make money, I think the follow-up study should be
about the relationship between video views and how much money YouTubers earn. Then
YouTubers could have a very specific goal for how many video views they need for each video.
This study would still fit the same global context, Identities and relationships, because
all studies related to social media share the same idea of one individual connecting to the
wider world through the internet.

