
Four Ways Big Data Will Revolutionize Education

New technologies allow schools, colleges and universities to analyse almost everything that happens: student behaviour, test results, the career development of graduates, and the educational needs of changing societies. Much of this data has already been stored and is used for statistical analysis by government agencies such as the National Center for Education Statistics. With the rise of online education and the development of MOOCs, all of this data takes on a completely new meaning. Big data allows for very exciting changes in the educational field that will revolutionize the way students learn and teachers teach. To stimulate this trend, the US Department of Education (DOE) was one of a host of agencies sharing a $200 million initiative to begin applying Big Data analytics to their respective functions, as described in a post by James Locus.

Improve Student Results


The overall goal of Big Data within the educational system should be to improve student results. Better-performing students are good for society, for organisations and for educational institutions. Currently, answers to assignments and exams are the only measurements of student performance. During his or her student life, however, every student generates a unique data trail. This data trail can be analysed in real time to deliver an optimal learning environment for the student as well as to gain a better understanding of the individual behaviour of each student.

It is possible to monitor every action a student takes: how long they take to answer a question, which sources they use, which questions they skip, how much research they do, how an answer relates to other questions answered, which tips work best for which student, and so on. Answers can be checked instantly and automatically (except perhaps for essays), giving students instant feedback.
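As a rough illustration of such a data trail, the sketch below records a single question attempt and grades closed-form answers automatically to return instant feedback. The field names, thresholds and feedback messages are invented for this example and do not describe any particular platform.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionAttempt:
    """One entry in a student's data trail (hypothetical schema)."""
    question_id: str
    answer: str
    seconds_taken: float
    sources_opened: list = field(default_factory=list)
    skipped: bool = False

def check_and_feedback(attempt: QuestionAttempt, answer_key: dict) -> str:
    """Automatically grade a closed-form answer and return instant feedback."""
    if attempt.skipped:
        return "Question skipped. Revisit it after reviewing the linked material."
    correct = answer_key.get(attempt.question_id)
    if attempt.answer.strip().lower() == str(correct).strip().lower():
        return "Correct. Well done."
    if attempt.seconds_taken < 10:
        return "Incorrect. You answered very quickly; re-read the question."
    return f"Incorrect. The expected answer was '{correct}'."

# Example usage with a toy answer key
key = {"q1": "photosynthesis"}
attempt = QuestionAttempt("q1", "respiration", seconds_taken=8.2)
print(check_and_feedback(attempt, key))
```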

In addition, Big Data can help to create groups of students that prosper because of how the group is composed. Students often work in groups in which the members are not complementary to each other. With algorithms it becomes possible to determine the strengths and weaknesses of each individual student based on the way the student learned online, how and which questions were answered, the student's social profile and so on. This makes it possible to form stronger groups, giving students a steeper learning curve and better group results.
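The text above does not prescribe a specific algorithm, so the following is only one possible sketch of the idea: given a per-student skill vector inferred from online behaviour, a greedy heuristic places each student in the group whose skill coverage they improve the most. Students, skills and scores are all made up.

```python
import numpy as np

def form_complementary_groups(skills, group_size=3):
    """Greedy sketch: place each student in the group whose current skill
    coverage they improve the most (one simple heuristic among many)."""
    students = list(skills)
    n_groups = max(1, -(-len(students) // group_size))  # ceiling division
    dims = len(next(iter(skills.values())))
    groups = [[] for _ in range(n_groups)]
    coverage = [np.zeros(dims) for _ in range(n_groups)]

    # Strongest students first, so early picks seed diverse groups.
    for s in sorted(students, key=lambda name: -float(np.sum(skills[name]))):
        open_groups = [g for g in range(n_groups) if len(groups[g]) < group_size]
        # Gain = how much this student raises the group's weakest skills.
        best = max(open_groups,
                   key=lambda g: float(np.sum(np.maximum(skills[s] - coverage[g], 0))))
        groups[best].append(s)
        coverage[best] = np.maximum(coverage[best], skills[s])
    return groups

# Toy skill vectors: [math, writing, programming]
skills = {"ana": np.array([0.9, 0.2, 0.4]), "ben": np.array([0.3, 0.8, 0.2]),
          "cho": np.array([0.2, 0.3, 0.9]), "dee": np.array([0.7, 0.6, 0.1]),
          "eli": np.array([0.1, 0.9, 0.5]), "fay": np.array([0.5, 0.1, 0.8])}
print(form_complementary_groups(skills, group_size=3))
```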

Create Mass-customized Programs


All this data will help to create a customized program for each individual student. Big data allows for customization at colleges and universities, even those with tens of thousands of students. This will be achieved with blended learning: a combination of online and offline learning. It will give students the opportunity to develop their own personalized program, following the classes they are interested in and working at their own pace, while still having access to (offline) guidance from professors. Providing mass customization in education is a challenge, but thanks to algorithms it becomes possible to track and assess each individual student.

We already see this happening in the MOOCs that are being developed around the world. When Andrew Ng taught the Machine Learning class at Stanford University, around 400 students typically participated. When it was offered as a MOOC on Coursera in 2011, it attracted 100,000 students. It would normally take Andrew Ng 250 years to teach the same number of students. A class of 100,000 students generates a lot of data that will deliver insights. Being able to cater for 100,000 students at once also requires the right tools to process, store, analyse and visualize all the data involved in the course. At the moment these MOOCs are still mass-produced, but in the future they can be mass-customized.

With 100,000 students participating in a MOOC, universities gain the possibility to find the very best students from all over the world. Based on the individual behaviour of the students, their grades, their social profile and their networking skills, algorithms can identify the best students. These students can then receive a scholarship, raising the overall level of the university.
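One simple way such a selection could be sketched is a weighted composite score over grades, engagement and peer interaction, as below. The feature names and weights are purely illustrative assumptions; a real selection process would involve far more care and fairness review.

```python
def rank_candidates(profiles, weights=None, top_n=5):
    """Rank MOOC participants by a weighted composite score.
    Feature names and weights are illustrative, not a prescribed model."""
    weights = weights or {"grade": 0.5, "engagement": 0.3, "peer_help": 0.2}
    scored = [(sum(weights[k] * p[k] for k in weights), p["student"]) for p in profiles]
    return sorted(scored, reverse=True)[:top_n]

profiles = [
    {"student": "s001", "grade": 0.92, "engagement": 0.70, "peer_help": 0.40},
    {"student": "s002", "grade": 0.85, "engagement": 0.95, "peer_help": 0.90},
    {"student": "s003", "grade": 0.78, "engagement": 0.50, "peer_help": 0.30},
]
print(rank_candidates(profiles, top_n=2))
```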

Improve the Learning Experience in Real-time


When students start working on their own within a customized blended-learning program, the bulk of the teaching, which usually consists of general topics that have to appeal to students of all levels, can be done online and at the students' own pace. The professor can monitor all students in real time and use class time for a much more interesting and deeper conversation on the topic at hand. This gives students the possibility to gain a better understanding of the topics.

When students are monitored in real time, this can also help to improve the digital textbooks and course outlines they use. Algorithms can monitor how students read the texts: which parts are difficult to understand, which parts are easy and which parts are unclear, based on how often a passage is read, how long it takes to read, how many questions are asked about that topic, how many links are clicked for more information, and so on. If this information is provided in real time, authors can revise their textbooks to meet the needs of the students, thereby improving overall results.
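A minimal sketch of this kind of reading analytics might aggregate per-section metrics and flag passages whose re-read and question rates are unusually high, so authors know where to revise. The metric names and thresholds here are placeholder assumptions.

```python
def flag_difficult_sections(section_stats, reread_ratio=1.5, question_rate=0.2):
    """Flag textbook sections whose reading metrics suggest students struggle.
    Thresholds are arbitrary placeholders, to be tuned against real data."""
    flagged = []
    for section, s in section_stats.items():
        rereads_per_reader = s["rereads"] / max(s["readers"], 1)
        questions_per_reader = s["questions_asked"] / max(s["readers"], 1)
        if rereads_per_reader > reread_ratio or questions_per_reader > question_rate:
            flagged.append(section)
    return flagged

stats = {
    "3.1 Derivatives": {"readers": 200, "rereads": 420, "questions_asked": 65},
    "3.2 Integrals":   {"readers": 200, "rereads": 150, "questions_asked": 12},
}
print(flag_difficult_sections(stats))  # -> ['3.1 Derivatives']
```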

Even more, Big Data can give insight into how each student learns at an individual level. Each student learns differently, and the way a student learns of course affects the final grade. Some students learn very efficiently while others may be extremely inefficient. When the course materials are available online, how a student learns can be monitored. This information can be used to provide a customized program to the student, or to give real-time feedback on how to learn more efficiently and thus improve results.

Reduce Dropouts, Increase Results


All these analyses will improve student results and may also reduce dropout rates at universities and colleges. Dropouts are expensive for educational institutions as well as for society. When students are closely monitored, receive instant feedback and are coached based on their personal needs, dropout rates can come down, as also noted in a post by Hortonworks.

Using predictive analytics on all the data that is collected can give educational institutions insight into future student outcomes. These predictions can be used to adjust a program that is predicted to deliver poor results, or even to run scenario analyses on a program before it starts. Universities and colleges will become more efficient at developing programs that improve results, thereby minimizing trial and error.
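As a hedged sketch of what such predictive analytics could look like, the example below fits a simple logistic regression (using scikit-learn) on made-up attendance, grade and submission features, then flags students whose estimated dropout risk is high. Real early-warning systems use far richer data and careful validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative features only: [attendance rate, average grade, assignment submission rate]
X_train = np.array([[0.95, 0.85, 0.90], [0.60, 0.55, 0.40], [0.80, 0.70, 0.75],
                    [0.40, 0.50, 0.30], [0.90, 0.65, 0.85], [0.55, 0.45, 0.50]])
y_train = np.array([0, 1, 0, 1, 0, 1])  # 1 = dropped out (synthetic labels)

model = LogisticRegression().fit(X_train, y_train)

# Estimate dropout risk for current students and flag the high-risk ones for coaching.
X_current = np.array([[0.70, 0.60, 0.55], [0.92, 0.80, 0.88]])
risk = model.predict_proba(X_current)[:, 1]
for student, r in zip(["s101", "s102"], risk):
    print(student, "high risk" if r > 0.5 else "low risk", round(float(r), 2))
```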

After graduation, students can still be monitored to see how they are doing in the job market. When this information is made public, it will help future students choose the right university.

Big data will revolutionize the learning industry in the coming years. More and more universities and colleges are already turning to Big Data to improve overall student results. Smarter students who study faster will have a positive effect on organisations and society. So let's not wait; let's embrace Big Data in education!
Big Data Applications in Education

Education offers a number of applications for big data. While many of these remain rather cutting-edge, some have already found their way into the classroom and onto your phone.

Big Data & Education


Education is one of the first places that we're exposed to the idea of data. After all, our performance through
school is in large part based on the data that our teachers keep on us throughout the school year, known as
our grades. However, with the rise of big data, the blanket term given to the ability to gather massive amounts
of digital information and interact with it, schools may find themselves in a position to implement a great deal of
big data-motivated changes. In this lesson, we'll look at some hypothetical uses for big data in education as
well as some ways that it's being used right now.

Capturing Attention
As someone who has spent more than a little time at the head of a classroom, I can tell you that one of the worst things you can do for student achievement is to start losing a child's attention. However, when you've got a large number of faces behind those desks, it may not always be apparent who's still focused on your lesson. That's why some big data advocates want to start collecting biometric data on students. By tracking things like heart rate, facial expressions, and even which objects are touched, the data can be analyzed in real time and sent back to the teacher so that he or she can do something to regain engagement. The really interesting thing is that these measurements can be taken via a camera on the ceiling or a watch-like device, so if you've got a mental image of each student wearing a bunch of electrodes, you can be relieved!
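Purely as an illustration of the idea (not of any real product), the toy function below combines a few invented engagement signals into a score and alerts the teacher when it drops below a threshold; the signals, weights and threshold are assumptions.

```python
def engagement_alerts(readings, threshold=0.5):
    """Toy real-time check: combine a few biometric signals (values in [0, 1])
    into an engagement score and alert the teacher when it drops too low.
    The signals, weights, and threshold are illustrative only."""
    alerts = []
    for student, r in readings.items():
        score = 0.4 * r["gaze_on_screen"] + 0.3 * r["facial_attention"] + 0.3 * r["activity"]
        if score < threshold:
            alerts.append((student, round(score, 2)))
    return alerts

readings = {
    "row1_seat3": {"gaze_on_screen": 0.2, "facial_attention": 0.3, "activity": 0.1},
    "row2_seat1": {"gaze_on_screen": 0.9, "facial_attention": 0.8, "activity": 0.7},
}
print(engagement_alerts(readings))  # -> [('row1_seat3', 0.2)]
```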
‘Big data’ was supposed to fix education. It didn’t. It’s time for ‘small data.’
For over a decade, “big data” and “analytics” have increasingly become a part of the
education world. (Big data is a term used to describe data sets so large that they can
only be analyzed by computers, and analytics is used to describe how the data is
collected, analyzed and used.) Big data lovers believe the information can help
policy-makers make systemic improvements in student outcomes — but, so far, that
hasn’t happened. Here is a post about the problems with big data in education and
about something new that could actually make a real difference: “small data.” What
is it? Here’s the post by Pasi Sahlberg and Jonathan Hasak.

Sahlberg, one of the world’s leading experts on school reform and educational
practices, is a visiting professor of practice at the Harvard Graduate School of
Education and author of the best-selling “Finnish Lessons: What Can the World
Learn About Educational Change in Finland?” The former director general of
Finland’s Center for International Mobility and Cooperation, Sahlberg has written a
number of important posts for this blog, including “What if Finland’s great teachers
taught in U.S. schools,” and “What the U.S. can’t learn from Finland about ed
reform.”
Hasak, based in Boston, is working to change public policies to better support youth
who are disconnected from the labor market and disengaged from school. Follow
him on twitter @JonathanHasak

By Pasi Sahlberg and Jonathan Hasak


One thing that distinguishes schools in the United States from schools around the
world is how data walls, which typically reflect standardized test results, decorate
hallways and teacher lounges. Green, yellow, and red colors indicate levels of
performance of students and classrooms. For serious reformers, this is the type of transparency that reveals more data about schools and is seen as part of the solution for effective school improvement. These data sets, however, often
don’t spark insight about teaching and learning in classrooms; they are based on
analytics and statistics, not on emotions and relationships that drive learning in
schools. They also report outputs and outcomes, not the impacts of learning on the
lives and minds of learners.

After the No Child Left Behind Act became law in 2002, all students in grades 3 to 8 had to be tested in reading and mathematics every year, and once more in high school, using external standardized tests. On top of that, states had their own testing requirements to hold schools and teachers accountable. As a result, various teacher evaluation procedures emerged in response to data from these tests. Yet for all of these good intentions, there is now more data available than can reasonably be consumed, and still there has been no significant improvement in outcomes.

If you are a leader of any modern education system, you probably care a lot about collecting, analyzing, storing, and communicating massive amounts of information about your schools, teachers, and students based on these data sets. This information is “big data,” a term that first appeared around 2000, which refers to data sets that are so large and complex that processing them with conventional data processing applications isn’t possible. Two decades ago, the types of data that education management systems processed were input factors of the education system, such as student enrollments, teacher characteristics, or education expenditures, handled by an education department’s statistical office. Today, however, big data covers a range of indicators about teaching and learning processes and increasingly reports on student achievement trends over time.

With the outpouring of data, international organizations continue to build regional and global data banks. Whether it’s the United Nations, the World Bank, the European Commission, or the Organization for Economic Cooperation and Development, today’s international reformers are collecting and handling more data about human development than ever before. Beyond government agencies, there are global education and consulting enterprises like Pearson and McKinsey that see business opportunities in big data markets.

Among the best known today is the OECD’s Program for International Student Assessment (PISA), which measures reading, mathematical, and scientific literacy of 15-year-olds around the world. The OECD now also administers an Education GPS, or global positioning system, that aims to tell policymakers where their education systems stand in a global grid and how to move to desired destinations. The OECD has clearly become a world leader in the big data movement in education.

Despite all this new information and the benefits that come with it, there are clear handicaps in how big data has been used in education reform. In fact, pundits and policymakers often forget that big data, at best, only reveals correlations between variables in education, not causality. As any introduction to statistics course will tell you, correlation does not imply causation.

Data from PISA, for example, suggests that the “highest performing education systems are those that combine quality with equity.” What we need to keep in mind is that this statement only says that student achievement (quality) and equity (the strength of the relationship between student achievement and family background) occur together in these education systems. It does not mean, however, that one variable causes the other. Correlation is a valuable part of the evidence in education policy-making, but it must first be shown to be real, and then all possible causal relationships must be carefully explored.
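A tiny simulation makes this caution concrete: two outcomes that are both driven by a shared hidden factor correlate strongly even though neither causes the other. The variables and numbers below are synthetic and only illustrate the statistical point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hidden confounder, e.g. broader socioeconomic context (synthetic, standardized).
z = rng.normal(size=n)

# Two system-level outcomes that both depend on z but not on each other.
quality = 0.8 * z + rng.normal(scale=0.5, size=n)   # e.g. mean achievement
equity  = 0.8 * z + rng.normal(scale=0.5, size=n)   # e.g. weak background effect

r = np.corrcoef(quality, equity)[0, 1]
print(f"correlation = {r:.2f}")  # strong (~0.7), although neither variable causes the other
```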
The problem is that education policymakers around the world are now reforming their education systems on the basis of correlations in big data from their own national student assessment systems and international education databases, without adequately understanding the details that make a difference in schools.

A doctoral thesis at the University of Cambridge, for example, recently concluded that most OECD countries that take part in the PISA survey have made changes to their education policies based primarily on PISA data in order to improve their performance in future PISA tests. But are changes based on big data really well suited to improving teaching and learning in schools and classrooms?

We believe it is becoming evident that big data alone won’t be able to fix education systems. Decision-makers need to gain a better understanding of what good teaching is and how it leads to better learning in schools. This is where information about details, relationships and narratives in schools becomes important. These are what Martin Lindstrom calls “small data”: small clues that uncover huge trends. In education, these small clues are often hidden in the invisible fabric of schools. Understanding this fabric must become a priority for improving education.

To be sure, there is not one right way to gather small data in education. Perhaps the most important next step is to realize the limitations of current big data-driven policies and practices. Too strong a reliance on externally collected data can be misleading in policy-making. Here is what small data can look like in practice:

1. It reduces census-based national student assessments to the necessary minimum and transfers the saved resources to enhancing the quality of formative assessment in schools and to training teachers in alternative assessment methods. Evidence shows that formative and other school-based assessments are much more likely to improve the quality of education than conventional standardized tests.

2. It strengthens the collective autonomy of schools by giving teachers more independence from bureaucracy and by investing in teamwork in schools. This would enhance social capital, which has proved to be a critical aspect of building trust within education and enhancing student learning.

3. It empowers students by involving them in assessing and reflecting on their own learning and then incorporating that information into collective human judgment about teaching and learning (supported by national big data). Because there are different ways students can be smart in schools, no single way of measuring student achievement will reveal success. Students’ voices about their own growth may be exactly those tiny clues that uncover important trends for improving learning.

W. Edwards Deming once said that “without data you are another person with an opinion.” But Deming couldn’t have imagined the size and speed of the data systems we have today. Automation that relies on continuously gathered data is now changing our daily lives. Drivers today no longer need to know how to read maps when smart navigators can find them the best routes; airline pilots spend more time flying on autopilot than by hand. Similar trends are happening in education systems, with countless reformers trying to “disrupt” schools as they are.

Big data has certainly proved useful for global education reform by informing us
about correlations that occurred in the past. But to improve teaching and learning, it
behooves reformers to pay more attention to small data – to the diversity and beauty
that exists in every classroom – and the causation they reveal in the present. If we
don’t start leading through small data we might find out soon enough that we are
being led by big data and spurious correlations.
Big Data in the Education System
Big data in education is a hot topic, and getting hotter. Advocates tout its potential for reform, naysayers are concerned about privacy, and skeptics don’t see the point of spending the money.

Few people seem to have a clear understanding of what big data in education means. Most don’t understand the differences between fundamental data types. The responsibility for clarifying and communicating this understanding starts with the organizations building data platforms or applications. Let’s see if we can clarify some of the misconceptions.

Two Keys of Big Data: Privacy and Benefits to the User
Let’s look at a recent example of big data in education: inBloom. inBloom, which is no longer operating, offered a platform for school districts to manage student data. The company collapsed after security allegations, including concerns that confidential information might be made public without parental permission. inBloom made the mistake of holding personally identifiable information (PII) and failed to be clear about privacy and public benefits. For an education company to get big data right, it needs to operate in the opposite way: by avoiding holding unnecessary PII and communicating clearly how its service makes good use of users’ data. (For the record: Knewton doesn’t hold any PII unless a user is able to consent and wants us to use the information for a specific reason, like creating a private learning profile that can be carried by that user from app to app.)

What is the Potential for Big Data?


Education produces tremendous amounts of data for two main reasons:

1. Academic study involves hours of school and homework for many years. These extended interactions with materials produce a wealth of useful information.

2. Education content is tailor-made for big data, since it often revolves around gathering data on student performance and learning. This concrete information can be analyzed for numerous insights.

Only recently have advances in technology and data science made it possible to make use of these vast data sets. The potential benefits of big data in education range from improved self-paced learning to tools that enable instructors to address student weaknesses, create productive peer groups, and free up class time for creativity and problem solving.

The 5 Categories of Big Data


At Knewton, we divide educational data into five types: one pertaining to student identity and onboarding (organizational training), and four student activity-based data sets that have the potential to improve learning outcomes. They’re listed below in order of how difficult they are to attain:

1) Identity Data

Who are you? Are you allowed to use this application? What admin rights do you have? What district are you in? How about demographic info?

2) User Interaction Data

User interaction data includes engagement metrics, click rate, page views, bounce rate (the percentage of people who come and go without exploring the website), etc. These metrics have long been the cornerstone of internet optimization for consumer web companies, which use them to improve user experience and keep their audience engaged.

This is the easiest to collect of the data sets that affect student outcomes. Everyone who creates an online app can and should get this information for themselves.
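As a small, hypothetical example of turning a raw event log into these metrics, the sketch below computes page views, clicks per view and a simple bounce rate. The event schema is made up for illustration, not a standard.

```python
from collections import defaultdict

def interaction_metrics(events):
    """Compute basic user-interaction metrics from a raw event log.
    The event schema (user, type, page) is a made-up example."""
    pages_by_user = defaultdict(set)
    page_views = clicks = 0
    for e in events:
        if e["type"] == "page_view":
            page_views += 1
            pages_by_user[e["user"]].add(e["page"])
        elif e["type"] == "click":
            clicks += 1
    users = len(pages_by_user)
    bounces = sum(1 for pages in pages_by_user.values() if len(pages) == 1)
    return {
        "page_views": page_views,
        "clicks_per_view": clicks / max(page_views, 1),
        "bounce_rate": bounces / max(users, 1),   # users who saw only one page
    }

log = [{"user": "u1", "type": "page_view", "page": "/unit1"},
       {"user": "u1", "type": "click", "page": "/unit1"},
       {"user": "u2", "type": "page_view", "page": "/unit1"},
       {"user": "u2", "type": "page_view", "page": "/unit2"}]
print(interaction_metrics(log))
```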

3) Inferred Content Data

How well does a piece of content “perform” across a group, or for any one subgroup, of students? What measurable student proficiency gains result when a certain type of student interacts with a certain piece of content? How well does a question actually assess what it intends to?

Efficacy data on instructional materials isn’t easy to generate — it requires algorithmically normed assessment items. However, it’s possible now for even small companies to “norm” small quantities of items. (Years ago, before we developed more sophisticated methods of norming items at scale, Knewton did so using Amazon’s “Mechanical Turk” service.) Then, by splitting up instructional content and measuring (via the normed items) the resulting student proficiency gains of students using each pool, it’s possible to tease out differences in content efficacy.
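A drastically simplified sketch of that comparison follows: students split across two content pools take the same normed pre- and post-items, and the mean proficiency gain per pool is compared. Real norming and efficacy analysis are far more involved; the pool names and numbers are invented.

```python
import numpy as np

def content_efficacy(pre, post):
    """Average proficiency gain (post minus pre, on normed items) per content pool.
    A simplified stand-in for real item norming and efficacy analysis."""
    return {pool: float(np.mean(np.array(post[pool]) - np.array(pre[pool])))
            for pool in pre}

pre  = {"video_lessons": [0.42, 0.50, 0.38, 0.45], "worked_examples": [0.44, 0.48, 0.40, 0.47]}
post = {"video_lessons": [0.55, 0.61, 0.49, 0.58], "worked_examples": [0.66, 0.70, 0.59, 0.68]}
print(content_efficacy(pre, post))
# A larger mean gain suggests (but does not prove) that one pool teaches this concept better.
```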

4) System-Wide Data

Rosters, grades, disciplinary records, and attendance information are all examples of system-wide data. Assuming you have permission (e.g. you’re a teacher or principal), this information is easy to acquire locally for a class or school. But it isn’t very helpful at small scale because there is so little of it on a per-student basis.

At very large scale it becomes more useful, and inferences that may help inform system-wide recommendations can be teased out. But even a lot of these inferences are tautological (e.g. “if we improve system-wide student attendance rates we boost learning outcomes”), unreliable (because they hopelessly muddle correlation and causation), or inactionable (because they point to known, societal problems that no one knows how to solve). So these data sets — which are extremely wide but also extremely shallow on a per-student basis — should only be used with an understanding of their limits.

5) Inferred Student Data

Exactly what concepts does a student know, at exactly what percentile of proficiency? Was an incorrect answer due to a lack of proficiency, or forgetfulness, or distraction, or a poorly worded question, or something else altogether? What is the probability that a student will pass next week’s quiz, and what can she do right this moment to increase it?

Inferred student data are the most difficult type of data to generate — and the kind Knewton is focused on producing. Doing so requires low-cost algorithmic assessment norming. Without normed items, you don’t have inferred student data; you only have crude guesswork at best. You also need sophisticated database architecture and tagging infrastructure, complex taxonomic systems, and groundbreaking machine learning algorithms.
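To give a flavour of the inference involved (without claiming this is Knewton's method), the toy example below performs a Bayesian knowledge-tracing style update of a single concept's mastery probability from a sequence of responses; such an estimate could then feed a pass-probability forecast. All parameters are illustrative.

```python
def update_mastery(p_known, correct, p_guess=0.2, p_slip=0.1, p_learn=0.15):
    """One step of a toy Bayesian knowledge-tracing update: revise the probability
    that a concept is mastered after observing one response. Parameters are
    illustrative; real systems estimate them from large normed data sets."""
    if correct:
        evidence = p_known * (1 - p_slip) / (p_known * (1 - p_slip) + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip / (p_known * p_slip + (1 - p_known) * (1 - p_guess))
    # Allow for learning between practice opportunities.
    return evidence + (1 - evidence) * p_learn

p = 0.3  # prior belief that the student has mastered the concept
for outcome in [True, False, True, True]:
    p = update_mastery(p, outcome)
print(f"estimated mastery: {p:.2f}")  # could feed a forecast for next week's quiz
```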

To build a system capable of gathering and analyzing this data, you need teams of teachers, course designers, technologists, and data scientists. Then you need a lot of content and an even bigger number of engaged students and instructors interacting with that content. No one would build this system to get inferred student data for just one application — it would be much too expensive. Knewton can only accomplish it by amortizing, over every app our platform supports, the cost of creating these capabilities. To our knowledge, we’re the only ones out there doing it.

Educators are sometimes skeptical of adaptive apps because almost all of them go straight from gathering user interaction data to making recommendations, using simple rules engines with no inferred content data or inferred student data. (It is precisely because we envisioned a world in which everyone would try to build these apps that we created Knewton — so that app makers could all build them on top of low-cost, yet highly accurate, inferred content data and inferred student data.)
Big Data and the Future of Education
Big data is going to impact education in a big way. It is inevitable. It has already begun. If you’re part of an education organization, you need to have a vision for how you will take advantage of big data. Wait too long and you’ll wake up to find that your competitors have left you behind with new capabilities and insights that seem almost magical.

No one institution will build functionality to acquire all five of the above data sets. Most institutions will build none. Yet every institution must have an answer for all five to get the most out of modern data analysis capability. The answer will come in assembling an overall platform by using the best solution for each major data set.

It is incumbent upon the organizations building these solutions to make them as easy to integrate as possible, so that institutions can get the most value from them. Even more importantly, we must all commit to the principle that the data ultimately belong to the students and the schools. We are merely custodians, and we must do our utmost to safeguard it while providing maximum openness for those to whom it belongs.
