
Education Analytics Based on Machine Learning

BITS ZG628T: Dissertation

By

NISHANT SRIVASTAVA

2016HT13105

Dissertation work carried out at

YSCART, Bangalore

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE


PILANI (RAJASTHAN)

April 2019
Table of Contents

Chapter 1----------------------------------------------------------------------------------------------- 01-10
Introduction
Research aim and objectives
Background and scope
Research hypothesis
Research variables
Data set definition
Chapter 2----------------------------------------------------------------------------------------------- 11-21
Literature review
Empirical analysis
Research gaps and issues
Chapter 3----------------------------------------------------------------------------------------------- 22-35
Regression and data analysis
Regression analysis
Data analysis
Chapter 4----------------------------------------------------------------------------------------------- 36-45
Research methodology
Machine learning approaches
Results and graphs
Regression numerical analysis
Comparison analysis with the PyCAM tool
Python code implementation
Chapter 5
Simulation and result discussion------------------------------------------------------------------ 45-51
Chapter 6
Findings and conclusion----------------------------------------------------------------------------- 59-65
Chapter 7
References--------------------------------------------------------------------------------------------- 69-75
DECLARATION
Date:

We do hereby recommend that the thesis work prepared under our supervision by
--------------------------------, entitled Education Analytics Based on Machine Learning,
be accepted in partial fulfilment of the requirements of the degree of M.Tech in
CSE (Machine Learning Analytics) for examination.

Counter signed

______________________________
CERTIFICATE OF APPROVAL*

The foregoing thesis is hereby approved as a creditable study in the area of

information technology carried out and presented in a manner satisfactory to

warrant its acceptance as a pre-requisite to the degree for which it has been

submitted. It is understood that by this approval the undersigned do not

necessarily endorse or approve any statement made, opinion expressed or

conclusion drawn therein but approve the thesis only for the purpose for which

it is submitted.

COMMITTEE ON

FINAL EXAMINATION

FOR EVALUATION

OF THE THESIS
List of figures and tables:

Figure 1.1. Intelligent System Model for Educational Analytics
Figure 1.2. Example of K-means clustering using R
Figure 1.3. Classification as a task
Figure 1.4. Prediction as a task
Fig. 3.2 Linear regression
Fig. 3.3 Non-linear regression
Fig. 5.1 Regression graph representing the numerical analysis
Fig. 5.2 Graph representing the Python code results for education analytics
Fig. 5.3 Graph representing the passing percentage on a yearly basis
Table 1.1 Examples from both industry practice and academic research
Table 1.2 Example data
Table 2.1 Educational analytics trends during the period 1998-2002
Table 5.1 Education analytics report 2018
Table 5.2 Passing percentage yearly basis report 2018

Abstract:
Higher education is at the core of the national public agenda in a growing number of countries. Decision makers need to consider more systematically the role of universities as instruments of economic development and social mobility, making it imperative to ground higher education policies carefully on evidence about what works. Likewise, at the institutional level, universities must learn to direct their improvement efforts with a more thorough analysis of their strengths and weaknesses and a deeper understanding of the factors behind the results of successful universities. The use of computers in education is now pervasive. Students access course media through a variety of devices including PCs, tablets, and smartphones. The data generated by online learning platforms has the potential to change the way we teach and learn. In this work, we investigate the role of analytics in teaching and learning, in particular how learning analytics can drive improvement of the educational process. We examine the history, current state, tools, privacy issues, case studies, and future potential of analytics-based education.
This study collects and summarizes information on the use of learning analytics. It identifies how learning analytics has been used in the higher education sector, and the expected benefits for higher education institutions. In this research proposal we implement regression analysis for education analytics in higher education. For this purpose we use a primary data set of various college records for skills analysis and confirm the findings with a secondary data set.
Method: regression and machine learning approaches are used for this research. Results are presented as plotted graphs based on the numerical values derived from linear regression analysis.
Concept: the study is essentially an analysis of the higher education system and an assessment of the improvement in students' skills, using regression, clustering techniques, and data mining methods to complete the proposed work.
Approach: in this research procedure, we apply data mining, clustering, and classification techniques. Regression analysis and machine learning approaches are then used for the comparative evaluation of student skills.
Keywords: data mining, clustering techniques, machine learning, regression analysis, statistical tools, KDD Cup data set, classification techniques, education analytics, Python code, PyCAM tool, WEKA results.

CHAPTER 1

INTRODUCTION
1.1 Introduction

There is pressure on higher educational institutions to improve institutional effectiveness and to be accountable for student success (C. Romero & Ventura, 2010), which is driving the search for new ways to apply analytical techniques. Even though data mining (DM) has been applied in numerous industries and sectors, its application to educational contexts is still limited. Researchers have found that data mining can be applied to the rich educational data sets that come from course management systems such as Angel, Blackboard, WebCT, and Moodle. The emerging field of educational data mining examines the unique ways of applying data mining to solve educationally related problems. Educational data mining is an emerging discipline that focuses on applying data mining tools and techniques to educationally related data, ranging from using data mining to improve institutional effectiveness to improving student learning processes. Retention, personalized recommender systems, and the evaluation of student learning within course management systems (CMS) are all topics within this broad field. Researchers interested in educational data mining have established a dedicated journal, the Journal of Educational Data Mining (2009), and a yearly international conference. The literature draws on several reference disciplines, including data mining, visualization, machine learning, and psychometrics; related work is also published in the Conference on Artificial Intelligence in Education and the International Journal of Artificial Intelligence in Education [1].

1.2. Educational Task

It is a continual process of forming the vision and mission of an institution and nurturing the talent of students, addressing issues in a responsive, ethical, and innovative manner to meet academic and administrative objectives. This task can be divided into two types:

1.2.1 Decision making task

Active participation of a hybrid group of stakeholders to fulfil administration-oriented objectives [2].

1.2.2 Learner-based task

Active participation of primary stakeholders to fulfil academic objectives.

1.3 Background of Educational Analytics

There are different ways that educational analytics has been defined. One definition (2007) describes academic analytics as a practice that helps faculty and advisors become more proactive in identifying at-risk students and responding accordingly; in this way, the results can improve retention. Academic analytics focuses on processes that occur at the department, unit, or college and university level. This type of analysis does not focus on the details of each individual course, so academic analytics can be said to have a macro perspective. Educational data mining can be considered a sub-field of educational analytics; it has been defined as “an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings in which they learn” [3]. This definition does not explicitly mention data mining and is open to exploring and developing other analytic approaches to educationally related data. Also, many educators would not know how to use data mining tools, so there is a need to make it easy for educators to conduct advanced analytics against data that pertains to them (such as online CMS data).

Knowledge discovery and data mining can be thought of as tools for improving institutional effectiveness. The complexity of data mining has motivated efforts to establish a standard process for data mining activities. The Cross Industry Standard Process for Data Mining (CRISP-DM) is a life-cycle process for building and analyzing data mining models. CRISP-DM is important because it gives specific tips and techniques on how to move from understanding the business data through deployment of a data mining model. CRISP-DM has six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The benefits of CRISP-DM are that it is non-proprietary and software-vendor neutral, and it provides a solid framework for guidance in data mining. The model also includes templates to aid in analysis. This process is used in a number of projects, even when it is not explicitly stated as such. Data mining has its roots in machine learning, artificial intelligence, computer science, and statistics. There are a variety of different data mining techniques and approaches, such as clustering, classification, and association rule mining. Each of these approaches can be used to quantitatively analyze large data sets to find hidden meaning and patterns. Data mining is primarily an exploratory process, but it can also be used for confirmatory investigations; it differs from other searching and analysis techniques, which are typically problem-driven rather than exploratory. While data mining has been applied in a variety of industries, including government, military, retail, and banking, it has not received much attention in the educational context. Educational data mining is a field of study that analyzes educational data and applies data mining to educationally related problems. Applying data mining this way can help researchers and practitioners discover new ways to uncover patterns and trends [4].

1.5 Approaches of data mining in educational data

Data mining is the field of computer science that aims to find potential factors and patterns that help decision making.

Figure 1.1. Intelligent System Model for Educational analytics

The model in Figure 1.1 illustrates the design of an educational data mining system. In this way, data mining can facilitate institutional memory. Data mining [25], also popularly known as Knowledge Discovery in Databases, refers to extracting or “mining” knowledge from large amounts of data. An educational system typically holds a large amount of educational data. This data [26] may be students’ data, teachers’ data, alumni data, resource data, etc. EDM focuses on the development of methods for exploring the unique types of data that come from an educational context. These data come from several sources, including traditional face-to-face classroom environments, educational software, online courseware, etc. Data mining techniques are used to operate on large volumes of data to discover hidden patterns and relationships helpful for decision-making. Various algorithms and techniques such as classification, clustering, regression, artificial intelligence, neural networks, association rules, decision trees, genetic algorithms, the nearest-neighbour method, etc., are used for knowledge discovery from databases.

1.5.1 Clustering Techniques

Clustering can be defined as the identification and classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, forming similar classes of objects (Figure 1.2) [5].

Figure 1.2. Example of K-means clustering using R
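The figure above illustrates K-means clustering in R; since the rest of this work uses Python, the following is a minimal, hypothetical sketch of the same idea with scikit-learn. The attribute names (weekly study hours, exam score) and the values are illustrative assumptions, not data from this study.

# Hypothetical sketch: grouping students by two illustrative attributes
# (weekly study hours, exam score) with K-means clustering.
import numpy as np
from sklearn.cluster import KMeans

students = np.array([
    [2, 45], [3, 50], [4, 55],      # low engagement
    [10, 70], [12, 75], [11, 72],   # medium engagement
    [20, 90], [22, 95], [21, 88],   # high engagement
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(students)

for cluster_id in range(3):
    members = students[labels == cluster_id]
    print("Cluster", cluster_id,
          "centre =", kmeans.cluster_centers_[cluster_id].round(1),
          "size =", len(members))

Each cluster centre can then be read as a student profile (for example, a low-engagement group), which is the kind of grouping described above.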

1.5.2 Classification

Classification models describe data relationships and predict values for future observations (Figure 1.3). Classification is the task of learning a target function that maps each attribute set X to one of the predefined class labels Y. There are different classification techniques, namely decision tree based methods, rule-based methods, memory-based reasoning, neural networks, Naïve Bayes and Bayesian belief networks, and support vector machines. In classification [26], test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to the new data tuples. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination.

Figure 1.3. Classification as a task.
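As a minimal illustration of the classification task described above, the following hypothetical Python sketch trains a decision tree (one of the techniques listed) on two assumed attributes, attendance percentage and internal marks, to predict a pass/fail label. The attribute names, values, and labels are invented for the example.

# Hypothetical sketch: decision-tree classification of students into pass/fail
# classes from two illustrative attributes (attendance %, internal marks).
from sklearn.tree import DecisionTreeClassifier, export_text

X_train = [
    [95, 28], [88, 25], [75, 20], [60, 15],
    [55, 10], [40, 8], [35, 12], [90, 22],
]
y_train = ["pass", "pass", "pass", "fail", "fail", "fail", "fail", "pass"]

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[70, 18], [45, 9]]))   # classify two new, unseen students
print(export_text(clf, feature_names=["attendance", "internal_marks"]))   # learned rules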

1.5.3 Prediction

Regression techniques (Figure 1.4) can be adapted for prediction [6]. Regression analysis can be used to model the relationship between one or more independent variables and a dependent variable. In data mining, independent variables are attributes that are already known, and response variables are what we want to predict. Unfortunately, many real-world relationships are not simple, so more complex techniques (e.g., logistic regression, decision trees, or neural nets) may be necessary to forecast future values.

Figure 1.4. Prediction as a task
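To make the prediction task concrete, the following hypothetical sketch fits a simple linear regression with one assumed independent variable (hours studied) and one dependent variable (marks); the numbers are illustrative and not drawn from this study's data set.

# Hypothetical sketch: simple linear regression predicting marks from study hours.
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[1], [2], [3], [4], [5], [6]])   # independent variable
marks = np.array([35, 42, 50, 58, 66, 72])         # dependent variable

model = LinearRegression().fit(hours, marks)
print("slope:", round(model.coef_[0], 2), "intercept:", round(model.intercept_, 2))
print("predicted marks for 7 hours of study:", round(model.predict([[7]])[0], 1))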

1.6 Goals

Educational data mining researchers [6] view the following as the goals for their research:

1. Predicting students’ future learning behavior by creating student models that incorporate such detailed information as students’ knowledge, motivation, metacognition, and attitudes;
2. Discovering or improving domain models that characterize the content to be learned and optimal instructional sequences;
3. Studying the effects of different kinds of pedagogical support that can be provided by learning software; and
4. Advancing scientific knowledge about learning and learners through building computational models that incorporate models of the student, the domain, and the software’s pedagogy. To accomplish these four goals, educational data mining research uses the five categories of technical methods [7] described below.
5. Prediction entails developing a model that can infer a single aspect of the data (the predicted variable) from some combination of other aspects of the data (predictor variables). Examples of using prediction include detecting such student behaviours as gaming the system, engaging in off-task behaviour, or failing to answer a question correctly despite having a skill. Predictive models have been used to understand which behaviors in an online learning environment (participation in discussion forums, taking practice tests, and the like) will predict which students might fail a class. Prediction shows promise in developing domain models, such as connecting procedures or facts with the specific sequence and amount of practice items that best teach them, and in forecasting and understanding student educational outcomes, such as success on posttests after tutoring [8].
6. Clustering refers to finding data points that naturally group together and can be used to split a full dataset into categories. Examples of clustering applications are grouping students based on their learning difficulties and interaction patterns, such as how and how much they use tools in a learning management system [9], and grouping users for purposes of recommending actions and resources to similar users. Data as varied as online learning resources, student cognitive interviews, and postings in discussion forums can be analysed using techniques for working with unstructured data to extract characteristics of the data and then clustering the results. Clustering can be used in any domain that involves classifying, even to determine how much collaboration users exhibit based on postings in discussion forums [10].
7. Relationship mining involves discovering relationships between variables in a dataset and
encoding them as rules for later use. For example, relationship mining can identify the
relationships among products purchased in online shopping [11].
8. Association rule mining can be used for finding student mistakes that co-occur, associating content with user types to build recommendations for content that is likely to be interesting, or for making changes to teaching approaches [12] (a minimal sketch of this idea appears after this list). These techniques can be used to associate student activity, in a learning management system or discussion forums, with student grades, or to investigate such questions as why students’ use of practice tests decreases over a semester of study.
9. Sequential pattern mining builds rules that capture the connections between occurrences of
sequential events, for example, finding temporal sequences, such as student mistakes
followed by help seeking. This could be used to detect events, such as students regressing
to making errors in mechanics when they are writing with more complex and critical
thinking techniques, and to analyze interactions in online discussion forums.
10. Key educational applications of relationship mining include the discovery of associations between student performance and course sequences, and discovering which pedagogical strategies lead to more effective or robust learning. This latter area, called teaching analytics, is of growing importance and is intended to help researchers build automated systems that model how effective teachers operate by mining their use of educational systems. Distillation for human judgment is a technique that involves depicting data in a way that enables a human to quickly identify or classify features of the data. This area of educational data mining improves machine-learning models because humans can identify patterns in, or features of, student learning actions, student behaviors, or data involving collaboration among students. This approach overlaps with visual data analytics.
11. Discovery with models is a technique that involves using a validated model of a phenomenon (developed through prediction, clustering, or manual knowledge engineering) as a component in further analysis. A sample student activity discerned from the data was “map probing”. A model of map probing was then used within a second model of learning strategies and helped researchers study how the strategy varied across different experimental states. Discovery with models supports the discovery of relationships between student behaviours and student characteristics or contextual variables, analysis of research questions across a wide variety of contexts, and integration of psychometric modelling frameworks into machine-learned models.

In the remainder of this section, each area is explored in more detail along with examples from
both industry practice and academic research.

Table 1.1 Examples from both industry practice and academic research.

Application area: User knowledge modelling
Questions: What content does a student know (e.g., specific skills and concepts, or procedural knowledge and higher-order thinking skills)? [13]
Data needed for analysis: Student’s responses (correct, incorrect, partially correct); time spent before responding to a prompt or question; hints requested; repetitions of wrong answers and errors made; the skills that a student practiced and total opportunities for practice; the student’s performance level inferred from system work or collected from other sources, such as standardized tests.

Application area: User behavior modeling
Questions: What do patterns of student behavior mean for their learning? Are students motivated?
Data needed for analysis: Student’s responses (correct, incorrect, partially correct); time spent before responding to a prompt or question; hints requested; repetitions of wrong answers and errors made; any changes in the classroom/school context during the investigation period.

Application area: User experience modeling
Questions: Are users satisfied with their experience?
Data needed for analysis: Responses to surveys or questionnaires; choices, behaviors, or performance in subsequent learning units or courses.

Application area: User profiling
Questions: What groups do users cluster into?
Data needed for analysis: Student’s responses (correct, incorrect, partially correct); time spent before responding to a prompt or question; hints requested; repetitions of wrong answers and errors made.

Application area: Domain modeling
Questions: What is the correct level at which to divide topics into modules, and how should these modules be sequenced?
Data needed for analysis: Student’s responses (correct, incorrect, partially correct) and performance on modules at different grain sizes compared to an external measure; a domain model taxonomy; associations among problems and between skills and problems.

Application area: Learning component analysis and instructional principle analysis
Questions: Which components are effective at promoting learning? What learning principles work well? How effective are whole curricula?
Data needed for analysis: Student’s responses (correct, incorrect, partially correct) and performance on modules at different levels of detail compared to an external measure; a domain model taxonomy; association structure among problems and between skills and problems.

Application area: Trend analysis
Questions: What changes over time, and how?
Data needed for analysis: Varies depending on what information is of interest; typically at least three longitudinal data points are needed to discern a trend. Data collected include enrollment records, degrees, completion, student source, and high school data in consecutive years.

Application area: Adaptation and personalization
Questions: What next actions can be suggested for the user? How should the user experience be changed for the next user? How can the user experience be altered, most often in real time?
Data needed for analysis: Varies depending on the actual recommendation given; may need historical data about the user and related information on the product or service to be recommended; the student’s academic performance record.
1.7. Machine learning basics.

1.7.1. Definition

A common definition of machine learning is: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” Basically, machine learning is the ability of a computer to learn from experience. Experience is usually given in the form of input data. Looking at this data, the computer can find dependencies in the data that are too complex for a human to find. Machine learning can be used to reveal a hidden class structure in unstructured data, or it can be used to find dependencies in structured data to make predictions. The latter is the main focus of this thesis [6].

1.7.2 Predictive analytics

Predictive analytics is the act of predicting future events and behaviours present in previously unseen data, using a model built from similar past data. It has a wide range of applications in different fields, such as finance, education, healthcare, and law. The method of application in all these fields is similar: using previously collected data, a machine learning algorithm finds the relations between different properties of the data. The resulting model is able to predict one of the properties of future data based on its other properties.

Table 1.2 shows example data about students who passed or failed an exam, along with other information about the students.

Table 1.2 Example data

The aim is to predict whether a student has passed the exam by looking at the other variables (the columns of the table). In this case, the column “Passed” is called the dependent variable, and every other variable is called an independent variable. In the “Passed” column, “1” means the student has passed the exam and “0” means failure. By applying a machine learning algorithm to this data, a function can be created, also known as the prediction model, that gives the value of the dependent variable as output and takes every other variable as input. The act of creating a prediction model from previously known data is called training, and such data is called the training data or training set. After the model is created, it must be applied to another data set to test its effectiveness [7]. Data used for this purpose is called the test data or test set. The reason for using two different sets is to ensure that the model is flexible enough to be used on data sets other than the one it was built with. Otherwise, the problem of overfitting may occur, which is when a model is accurate on its original data set but performs poorly on other data sets because it is overly complicated. A common method to avoid overfitting is to divide the input data set into training and test sets.

1.8 Learning Analytics Challenges in Education

The review of the literature revealed learning analytics (LA) challenges concerning data tracking, data collection, and data analysis, the connection with the learning sciences, learning environment optimization, emerging technology, and ethical concerns regarding legal and privacy issues.

1.8.1 Data tracking

The digital tracking of information is a technique used by analysts to determine how best to present new learning opportunities as the wave of education continues to move forward into the second decade of the 21st century. The tracking of big data represents the monitoring system. Current trend-tracking indicators regarding the delivery and dissemination of instruction depend on the learning management system used by the institution. Platforms such as Moodle, Canvas, EPIC, and Blackboard have the capability to track the number of times an individual logs into the course room. These platforms also provide significant documentation to determine how involved the student was upon logging in. Such tracking provides those who plan and implement new educational programs with valuable information. The monitoring reveals how engaging the presented curriculum is, as well as identifying areas that cause confusion.

Data collection. The collection of data can be a challenge when looking at LA. Nonetheless, it represents an important component in planning for the continued growth of educational programs. Educators must consider several elements. They must consider the availability of resources at a venue. Next, instructors must establish a viable social platform, as it directly relates to the interactions between learners needed to synthesize the educational content. Finally, instructors must discriminate whether the learner population possesses the requisite suitability for this type of learning environment and knowledge acquisition. Besides these challenges, gaps exist because of the inability to share proprietary information gathered by the institution. Further, another problem emerges because the creation of the ideal framework to disseminate educational curricula takes teamwork, especially among the organizations bidding against one another to capture the learner population who want to engage in this type of learning experience [5].

1.8.2 Evaluation process

An important consideration of data collection concerns how learning analytics has become a force in the evaluation process. As greater amounts of educational resources become available online, there is a subsequent increase in the total data available regarding learning interactions. For learning analytics to help instructor evaluation function appropriately, data need to be delivered in a timely and accurate manner. Learning analytics can provide powerful tools for developing meaning from interactions and actions within a higher education learning environment. With the unprecedented explosion of available data on online interactions, it is critical for the continued development of the evaluation process. LA can translate from other fields as interest in the growth of data in education becomes more focused. It has been noted that statistical evaluation of rich data sources already exists within other professions and fields [8].

1.8.3 Data analysis

Technical challenges also exist in the assimilation of data analysis because of the presentation format of the data. Erroneous data can skew the findings, causing a misinterpretation of the overall population. Such scenarios are commonplace in the online learning environment. For example, an instructor may create a student profile to isolate an assignment that requires grading, to test the ease of the submission process, or to determine whether there are any gaps in the presentation of the curriculum as it appears to students. Creation of a non-existent learner introduces redundant information that appears in the course without identification. This data does not represent student information but rather misinformation created by the instructor that flows into the big data pool of information. When conducting data analysis manually, this information can easily be identified and excluded from the population. However, working with data collection from the learning analysis vantage point adds a significant margin of error to the overall results.

Learning sciences connection. According to Pea, personalized learning and learning opportunities demonstrate an inability to leverage learning analytics optimally; therefore, “the endgame is personalized cyber learning at scale for everyone on the planet for any knowledge domain”. It has been asserted that to optimize and fully understand learning requires understanding how knowledge develops and how to support knowledge development. Further, researchers must understand the components of identity, reputation, and affect. Researchers must find ways to connect “cognition, metacognition, and pedagogy” to help improve learning processes. With a stronger connection to the learning sciences, learning analytics can promote effective learning design.

Learning environment optimization. It has been noted that as learners expand the boundaries of the learning management system into open or blended learning settings, researchers must discover the problems faced by students and how to determine success from the learners’ perspectives. This process will require a shift toward more challenging datasets that may include mobile, biometric, and mood data. Besides the individual learning aspect of learning analytics, researchers are seeking to address another component known as social learning analytics. In this context, social learning analytics focuses on the collaboration and interaction of learners in a socialized learning environment, not just on individual learning outcomes [4].

1.8.4 Emerging technology

The full potential of learning analytics in relation to learning requires continued development of emerging technology that presently remains at an early stage. This presents a challenge, as the technology must continue to develop to keep pace with the growth of learning analytics. Further, to fully understand the method and practice of teaching, more research is needed. Research focusing on learning analytics and pedagogy is still in its beginning stages.

1.8.5 Ethical and privacy issues

Another issue that emerges about learning analytics concerns the ethical, legal, and risk considerations. Because of dynamic changes in technology as well as how users store data and applications in cloud-based systems, “the challenges of privacy and control continue to affect adoption and deployment”. Further, the ethical and legal complexities of learning analytics challenge institutions that seek to implement its use. For example, these considerations can include obvious areas of privacy such as consent, data accuracy, how to respect privacy, maintaining anonymity, opting out of data gathering, and the potential effects on students. Additional concerns include data interpretation, data ownership, data preservation, sharing data with parties outside the institution, and proper training of staff members regarding the handling of data. Further, the question becomes who owns this aggregate data, because having an infrastructure with the capacity to house large amounts of information becomes a daunting task. Because of these different issues, institutions must achieve a balanced approach that safeguards data while also assuring benefits to the educational process through the use of four guiding principles. These principles consist of clear communication, care, proper consent, and complaint. Institutions must demonstrate adherence to legal and ethical parameters to safeguard student privacy concerns while also achieving the educational goals for students and educators [8].

1.9 Background
As part of a new project idea, our organisation wants to develop a product that helps not only teachers and students but also parents, including those who are illiterate and unable to read or write but who can listen to and understand the language.

1.10 Objectives

Identifying the attributes that most affect students’ performance and providing suggestions by using machine learning algorithms and big data, along with a Python implementation and the PyCAM tool.

As part of the objectives, we apply regression analysis to the student data set to find the parameters that affect the education system and the scope of the success ratio. Using linear regression, we carry out numerical analysis based on various parameters to show how the employability ratio has increased.

1.11 Scope of work

Research on machine learning has yielded techniques for knowledge discovery, or data mining, that discover novel and potentially useful information in large amounts of unstructured data. These techniques find patterns in data and then build predictive models that probabilistically predict an outcome. Applications of these models can then be used in computing analytics over large datasets. Two areas that are specific to the use of big data in education are educational data mining and learning analytics. Although there is no hard and fast distinction between these two fields, they have had somewhat different research histories and are developing as distinct research areas. Generally, educational data mining looks for new patterns in data and develops new algorithms and/or new models, while learning analytics applies known predictive models in instructional systems.

Educational data mining and learning analytics research are beginning to answer increasingly complex questions about what a student knows and whether a student is engaged. For example, questions may concern what a short-term boost in performance in reading a word says about overall learning of that word, and whether gaze-tracking technology can learn to detect student engagement. Researchers have experimented with new techniques for model building and with new kinds of learning system data that have shown promise for predicting student outcomes. Previous sections presented the research goals and techniques used for educational data mining and learning/visual analytics. This section presents broad areas of application that are found in practice, especially in emerging companies. These application areas were discerned from the review of the published and gray literature and were used to frame the interviews with industry experts. These areas represent the broad categories in which data mining and analytics can be applied to online activity, especially as it relates to learning online. This is in contrast to the more general areas of big data use, such as health care, manufacturing, and retail.

These application areas are (1) modelling of user knowledge, user behaviour, and user experience; (2) user profiling; (3) modelling of key concepts in a domain and modelling a domain’s knowledge components; and (4) trend analysis. Another application area concerns how analytics are used to adapt to or personalize the user’s experience. Each of these application areas uses different sources of data; Table 1.1 briefly describes the questions that these categories answer and lists the data sources that have been used thus far in these applications. In the remainder of this section, each area is explored in more detail along with examples from both industry practice and academic research.

Working with big data using data mining and analytics is rapidly becoming common in the commercial sector. Tools and techniques once confined to research laboratories are being adopted by forward-looking industries, most notably those serving end users through online systems. Higher education institutions are applying learning analytics to improve the services they provide and to improve visible and measurable targets such as grades and retention. Now, with advances in adaptive learning systems, possibilities exist to harness the power of feedback loops at the level of individual teachers and students. Measuring and making visible students’ learning and assessment activities opens up the possibility for students to develop skills in monitoring their own learning and to see directly how their effort improves their success. Teachers gain views into students’ performance that help them adapt their teaching or initiate interventions in the form of tutoring, tailored assignments, and the like. Adaptive learning systems enable educators to quickly see the effectiveness of their adaptations and interventions, providing feedback for continuous improvement. Researchers and developers can more rapidly compare versions of designs, products, and approaches to teaching and learning, enabling the state of the art and the state of the practice to keep pace with the rapid adoption of online and blended learning environments.

1.12 Tools and techniques

Implementation Techniques:

The challenges faced in processing big data are overcome by using various techniques. The most popular techniques used in educational data mining are listed below.

Regression – Regression is used to predict the values of a dependent variable by estimating the relationship among variables using statistical analysis.

Nearest Neighbour – In this technique, values are predicted based on the values of the records that are nearest to the record that needs to be predicted (see the sketch after these definitions).

Clustering – Clustering involves grouping of records that are similar by identifying the distance
between them in an n-dimensional space where n is the number of variables.

Classification – Classification is the identification of the category/class to which a value belongs, based on previously categorized values.
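As an illustration of the nearest-neighbour and classification techniques defined above, the following hypothetical Python sketch assigns a new student record the class of its closest neighbours in feature space; the attributes and result categories are assumptions made for the example.

# Hypothetical sketch: nearest-neighbour classification. A new record is assigned
# the majority class of its k closest records in feature space.
from sklearn.neighbors import KNeighborsClassifier

# [internal marks, attendance %] -> result category (illustrative data)
X = [[28, 95], [25, 88], [20, 75], [15, 60], [10, 55], [8, 40]]
y = ["distinction", "distinction", "pass", "pass", "fail", "fail"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[22, 80], [9, 45]]))   # classify two new students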

Open Source Tools:

Several open-source tools exist that help in taming big data [9]; some of the top tools are listed below.

MongoDB is a cross-platform, document-oriented database management system. It uses JSON-like documents instead of a table-based architecture.

Hadoop is a framework that allows distributed processing of large datasets across clusters of networked computers using simple programming models.

MapReduce is a programming model and framework used by Hadoop. It enables processing huge amounts of data in parallel on large clusters of compute nodes.
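To show the idea behind MapReduce without assuming a running Hadoop cluster, the sketch below simulates the map, shuffle, and reduce phases in plain Python, counting enrolments per department; the input records are illustrative.

# Hypothetical sketch: the MapReduce idea simulated in plain Python.
# Map emits (key, 1) pairs, shuffle groups them by key, reduce sums each group.
from collections import defaultdict

records = ["CSE", "ECE", "CSE", "MECH", "CSE", "ECE"]   # department of each enrolled student

mapped = [(dept, 1) for dept in records]                # map phase

groups = defaultdict(list)                              # shuffle phase: group by key
for key, value in mapped:
    groups[key].append(value)

counts = {key: sum(values) for key, values in groups.items()}   # reduce phase
print(counts)   # {'CSE': 3, 'ECE': 2, 'MECH': 1}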

Orange is a Python-based tool for processing and mining big data. It has an easy-to-use interface with drag-and-drop functionality and a variety of add-ons.

Weka is a Java-based tool for processing large amounts of data. It has a vast selection of algorithms that can be used for mining data.
Proprietary Tools:

SAP HANA is a proprietary in-memory RDBMS capable of handling large amounts of data. It uses parallel in-memory relational query techniques and columnar stores.

1.13 Plan of Work

The first chapter is an introduction to the thesis work. The rest of the chapters are organized as follows:
• Chapter 1: Introduction to education analytics and its concepts.
• Chapter 2: Machine learning framework; this chapter discusses the literature survey.
• Chapter 3: Proposed work. Regression analysis for the evaluation of higher education analytics, based on numerical validation using the linear regression and multiple regression methods along with machine learning approaches.
• Chapter 4: Research methodology. In this chapter we describe regression analysis and numerical analysis.
• Chapter 5: Implementation details and result discussion with the simulation tool.
• Chapter 6: Conclusion and findings.
• Chapter 7: References.

CHAPTER 2

LITERATURE SURVEY

2.1 Introduction

Sunil Erevelles et al. [13] illustrated that the study of consumer analytics lies at the junction of big data and consumer behaviour. Data provide behavioral insights about consumers; marketers translate these insights into market advantage. Hidden insights here means predicting possible consumer activities that have not yet been exploited. Even though big data is the new form of capital in the current trend, many firms have failed to exploit its benefits. To profit from this new form of capital, firms must allocate appropriate physical, human, and organizational capital resources to big data. A conceptual framework is introduced to illustrate the impact of big data; using this framework, a firm can create value and gain a competitive advantage. With today’s evolving technology, the consumer data of any organization is generated incessantly, comprising both transactional and behavioral data [11]. Data is constantly generated along three dimensions: volume, velocity, and variety. The volume of consumer big data is constantly increasing, by roughly one zettabyte every two years. The velocity of data concerns how quickly created data can be analyzed at a given time. Comparing census data with clothing retailer data, i.e. what consumers are posting on social networks about the retailer, gives the ability to make decisions. The variety of structured and unstructured data is organized using various software tools that bring order to the unstructured data. Standard generalized mark-up language software enables the viewing of videos to determine common elements that an organization wants to capture. Resources include physical capital resources and organizational capital resources; enormous amounts of data are generated in the context of these resources [11].

In this hyper-competitive environment, organizations must often update and reconfigure their resources as the environment changes in order to sustain a competitive advantage. Both dynamic and adaptive capabilities are achieved through consumer insights obtained from big data [16]. Ignorance is defined as not knowing; in general, researchers focus on what they know, but it is equally important to focus on ignorance because it provides the latitude and liberty that inspire creativity within an organization. Inductive reasoning, one method of scientific enquiry, starts with observing a phenomenon before forming a hypothesis.

Farshad Kooti et al. [14] observed that a large fraction of the population now spends a considerable share of its income on shopping and purchases. Consumers from affluent areas frequently purchase more expensive items, which results in more money being spent on online shopping. Temporal patterns of consumers are identified to find their finite budgets, and consumers often wait until the last moment to make a purchase. It is observed that shoppers who email each other purchase more similar items than socially unconnected shoppers. Using temporal patterns, prediction is improved of whether and when consumers will make an online purchase again.

Mostafa Sabbaghi et al. [15] provide a statistical analysis of the dynamic nature of electronic waste (e-waste) by reviewing the effects of design features, brand, and consumer type on the usage time of electronics and the end-of-use time in storage.

Higher education today operates in a complex and competitive environment and therefore has to face its challenges accordingly. Moreover, the different stakeholders in higher education have paved the way for big data to play a crucial role. The vast data that keeps coming every day can be utilized only through big data techniques. Big data’s benefits have not been used to their fullest possible extent in health care; in terms of business value, health care has not yet reaped the rewards of big data. Readily available messages from social media and consumer-generated content on the internet can be used to solve real-life problems using big data analytics, which will eventually reshape our understanding of the field and its decision making. The challenges of storing data, accessing it in real time, analysis, obstacles, and security become paramount. This paper discusses how using predictive analysis on big data could improve decision making in different applications.

Ben K. Daniel et al. [16] suggested that big data today operates in a complex and competitive environment and therefore has to face its challenges accordingly. Moreover, the different stakeholders in higher education have paved the way for big data to play a crucial role. The vast data that keeps coming every day can be utilized only through big data techniques. Data today is too big and too fast, so conventional databases cannot process it; big data therefore refers to methods to capture, store, distribute, manage, and analyze diverse, larger data. Stored data can be properly explored using analytic techniques. Systems such as Apache Hadoop, Hortonworks, MapReduce, and Tableau are powerful software that can be used even without advanced technical knowledge. Big data in higher education can have an important effect on management decision making: the vast available administrative and operational data can be processed and assessed to predict future performance and identify potential areas in academic programming, research, teaching, and learning. Large-scale statistical techniques combined with predictive modelling help improve decision making. Big data can play an important role in three data models: descriptive, prescriptive, and predictive. Descriptive analytics analyzes the raw data received, predictive analytics tries to figure out future probabilities, and, based on the predictive analysis, prescriptive information is given to students and stakeholders. The value of big data will be based on creating governing structures and creating more progressive and better policies. Security and privacy are other challenges faced by big data.

Greenberg and Buxton [17] stressed the need for “higher education to transform its own culture.” Information technology should be used to apply rigorous approaches to analytics in “supporting evidence-based decision-making and management”. In a similar context, the online learning research community must bring transparency to the effective practice of learning analytics to deter potentially wrongful uses of big data in online courses.

Kelderman et al. [18] reported that accreditors are attempting to keep pace with new federal regulations to provide tighter oversight of online programs, “requiring colleges to prove that students learn as much in distance courses as in face-to-face courses”. These requirements increase the pressure on educational institutions to respond to new rules and provide clear evaluations of the quality of online education. Moreover, the instructor needs to know what is happening in the online course; the use of learning analytics would produce information about student progress and the instructional process.

Siemens et al. [19] insisted that the online learning community wants to guide the direction in which knowledge analytics are used to define and evaluate big data in online courses. This guidance includes the need for defining data, emerging learning analytics methodologies and tools, picturing and sharing the nature of education analytics output, and informing effective processes and practices that lead to meaningful decision-making about learner performance.

Waltman et al. [20] claimed that highly cited papers are not always suggestive of impactful research. However, as the authors further noted, on average this idea does tend to hold true. As such, it is sensible to assume that high citation rates do reflect a certain level of excellence.

Zheng Xiang et al. [21] suggested that big data is generated through internet traffic, mobile transactions, user-generated content, social media, sensor networks, and other sources. This big data is crucial for business intelligence and in turn helps in understanding customers, competitors, market characteristics, products, the business environment, the impact of technologies, and so forth. Unstructured human-authored documents can be analyzed with sentiment analysis technologies. This research paper studies customer satisfaction in hotels as soon as the person receives the product or service; given the complexity of hotel guest satisfaction, measuring it is very challenging. Though customer reviews are found on many travel websites, Expedia and Travelocity allow only their own customers to write reviews and share their experience, which prevents inauthentic reviews. The main goal of this study is to understand the content and structure of customer reviews and how they are associated with hotel guest satisfaction as reflected in the overall rating. For this, data were collected during the period of December 18-29, 2007 using an automated web crawler. It covered 10,537 hotels, resulting in 60,648 customer reviews, from which 6,642 unique words were identified. The data were stored in Microsoft Access, with a unique identifier assigned to every hotel property and customer review. The data analysis then followed a text analysis process, which included stemming, misspelling identification, and removal of stop words such as pronouns, adverbs, and conjunctions. A coding scheme was established to guide the domain identification process, removing generic nouns that lack specificity, generic verbs, words with high ambiguity, and finally hotel brand names. Based on the findings and using a pivot table in Excel, 416 words were considered irrelevant. The findings were then presented in two parts: (i) a basic description and (ii) the clean data. Interestingly, the top ten sites held 60% of the total properties in the clean data, while they held only 34% of the original data set. Conventional methods rely on a set of predefined hypotheses justified using previously existing knowledge; big data lets researchers understand new patterns reflective of customers’ evaluations, thereby generating and creating new knowledge. This study is not based on sentiment analysis, which is subjective in nature, but is purely analytical. Although this study opens a new field of knowledge and understanding unlike conventional guest survey studies, it has many limitations, and therefore the findings should be treated with caution, because customer reviews are subject to self-selection bias. However, this does not reduce the internal validity. Future studies could apply triangulation across multiple sources of data to validate the semantic structure using big data analytics. The authors might have taken their survey a little deeper; perhaps future work will address this [13].

Hyun Jeong “Spring” Han et al. [22] indicated that hotels are rated based on guest/customer satisfaction. The strategy resulted in negative comments carrying more weight than positive comments. This uneven weighting means that a guest’s bad feelings about poor service can submerge the positive feelings created by good service. In this study, text analytics using regression analysis is applied through big data analytics to improve guests’ assessments and their ratings.

Zhen Xiang et al. [23] suggested that the association between guest experience and satisfaction appears strong, indicating that these two domains of consumer behaviour are characteristically connected. This study discloses that big data analytics can produce new insights into variables that have been widely studied in the existing hospitality literature. Implications for theory and practice as well as directions for future research are discussed, incorporating data confidentiality, data quality, and privacy. Data governance prevents unauthorised access to data, and data quality depends on data privacy. Data governance is the sum total of usability, availability, integrity, and security; it is essential for obtaining funds, increasing confidence levels, increasing the speed of data access, fast decision making, and precise, trustworthy information. It has six steps, namely data extraction, content analysis, data maintenance, process computing, secure delivery, and fast delivery. Its benefits are six-fold: heterogeneous data integration, security and privacy, deeper knowledge, data validity, data protection, and faster delivery.

Henry C. Lucas et al. [24] noted that information technology has altered the traditional way of doing business by redefining business capabilities and entering new market spaces. There has been a marked change in the process of doing business [5]: creating new organisations like Amazon, Facebook, and Google; developing new relationships in terms of social media; creating new user experiences [11]; and creating new markets like iTunes. The impacts of information technology have been different on individuals, on firms, and on the economy or society at large.

Nevertheless, since only a fraction of the world population accesses digital technologies to achieve ‘native’-like fluency in their use, the term “digital natives” is not a fitting description, and for this reason (amongst others) it has become less accepted in the current educational discourse. Education, experience, breadth of use, and self-efficacy are more relevant than age in explaining how people become “digital natives”. In response, a different classification has been proposed based on a study comprising 2096 students in Australian universities: “power users (14% of sample), ordinary users (27%), irregular users (14%) and basic users (45%)”. However, rather than a discrete classification, a more useful typology is a continuum, as individuals are placed along it depending on a number of factors.

Jones and Shao [25] indicate that various demographic factors affect student responses to new technologies, such as gender, mode of study (distance or place-based), and whether the student is a home or international one. A JISC report questions the validity of certain characteristics attributed to this generation. Examples are a preference for “quick information” and the need to be constantly connected to the web, now shown to be myths: these traits are not generational.

Whilst Turkle [26] notes that young people have digital devices always on and always on them, becoming virtually ‘tethered’, this behaviour is not restricted to young people. For these reasons, this term has increasingly been replaced by the term digital residents and its counterpart digital visitors. In any case, we acknowledge that many of our students today are not only engaged with digital technologies on a daily basis, but in their world there have always been digital technologies in various forms. Even with the proviso that this behaviour may not be generalisable “outside of the social class currently wealthy enough to afford such things”, it is an observable behaviour that is becoming increasingly common as digital technologies have become more affordable than ever before. This suggests that in planning a study involving higher education students as participants, not only those in this generation should be considered, but also those outside it, such as mature students. Ascribing generational traits to today’s learners is somewhat of an overgeneralisation.

As Jones and Shao [27] point out, global empirical evidence indicates that, on the whole, students do not form a generational cohort but are “a mixture of groups with various interests, motives, and behaviours”, not cohering into a single group or generation of students with common characteristics. In particular, research on higher education students often focuses on the standard age band of students under 21 years of age, not accounting for mature students (a term typically used to refer to those who are over this threshold upon entrance). Even amongst this group, there are significant differences in behaviour and attainment. Studies have found that older mature students were more likely to study part-time than full-time, as family and work commitments have been acquired. In fact, 90% of part-time undergraduate students are 25 years old or over and as many as 67% are over 30 (Smith, 2008).

On this note, Baxter and Hatt [28] argued that mature students can be disaggregated according to age
bands that seem to correlate with different levels of academic success. Therefore, instead of
considering only standard and mature students (under and over 21 respectively), they introduce the
distinction between younger and older matures, as those over 24 were more likely to progress into
their second year, despite a longer period of time out of education. In general, the younger mature
learners were more at risk of leaving the course than the older mature students. However, even this
division may still be a poor generalisation about (mature) students: beside their age, there is a
myriad of more relevant factors affecting their experience, such as their route into HE, their
background and their motivation to study, all of which are difficult (if not pointless) to use for a
classification of mature learners. An approach that acknowledges the individual characteristics of
learners is to be preferred over one that conflates them into a homogeneous group, which in turn
requires educational providers to act on means to identify these characteristics.

Erudition analytics (also known as academic analytics and educational data_mining) is widely regarded
as the analysis of student records held by the institution as well as course mgmt system audits,
including statistics on online participation and similar metrics, in order to inform stakeholders'
decisions in HE institutions. Academic analytics is considered a useful tool to study scholarly
innovations in teaching and erudition. According to these authors, the term academic analytics was
originally coined by the makers of the virtual erudition environment (VLE) Blackboard TM, and it has
become widely accepted to describe the actions "that can be taken with real-time data reporting and
with predictive modelling", which in turn helps to suggest likely outcomes from certain behavioural
patterns. Educational data_mining involves processing such data (collected from the VLE or other
sources) through machine_erudition algorithms, enabling acquaintance discovery, which is "the
nontrivial extraction of implicit, previously unknown, and potentially useful information from data".
Whilst data_mining does not explain causality, it can discover important correlations which may still
offer interesting insights. When applied to higher education, this might enable the discovery of
positive behaviours: for example, whether students posting more than a certain number of times in an
online forum tend to have higher final marks, or whether attendance at lectures is a defining factor
for academic success, or for any of its measures such as "retention, progression and completion".
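
As an illustration of the kind of correlation check mentioned above, the following minimal Python
sketch (not taken from the thesis; the data and column names are invented placeholders) computes the
association between forum posts and final marks.

import pandas as pd

# Toy stand-in for a VLE / course mgmt system export; values are invented
df = pd.DataFrame({
    "forum_posts": [2, 5, 0, 8, 3, 12, 1, 7],
    "final_mark":  [55, 64, 40, 78, 60, 85, 48, 70],
})

# Pearson correlation measures linear association, not causation
r = df["forum_posts"].corr(df["final_mark"])
print(f"correlation between forum posts and final mark: {r:.2f}")

A strong positive value here would only flag a candidate behaviour worth investigating further; as
noted above, it says nothing about causality.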

Watanabe et al. [29] show that the use of badges by all participants is easily enforced in an
environment with a strict dress code, such as school uniforms. Since our population of interest is
higher education students, smartphones are probably more appropriate than smart badges as sensor
carriers, but it is nonetheless interesting to see how much can be learned from sensor data,
especially when combined with erudition analytics, in case certain behaviours can be found to be
related to a measure of success. Smartphones present another advantage over badges: equipped with
ambient light sensors, proximity sensors, accelerometers, GPS, camera(s), microphone, compass and
gyroscope, plus WiFi and Bluetooth radios, they allow a variety of applications to be built to gather
a great range of sensed data. Thanks to their communication and processing capabilities, smartphones
could support a sensing architecture such as the one depicted. Contextual information can be inferred
from the sensor data thus gathered, and the context determined as, for example, location. However, it
has long been accepted that "there is more to context than location". Contextual information broadly
falls into one of two types: physical environment context (such as light, pressure, humidity,
temperature, etc.) and human-factor-related context, such as information about users (habits,
emotional state, bio-physiological conditions, etc.), their social environment (co-location with
others, social interaction, group dynamics, etc.), and their tasks (spontaneous activity, engaged
tasks, goals, plans, etc.) [21].

In Ferguson (2012) [30], erudition analytics is defined as the measurement, collection, analysis and
reporting of data about learners and their contexts, for the purposes of understanding and optimising
erudition and the environments in which it occurs.

According to Johnson et al., "erudition analytics is an educational application of web analytics
aimed at learner profiling, a process of gathering and analysing details of individual student
interactions in online erudition activities. The goal is to build better pedagogies, empower active
erudition, target at-risk student populations, and assess factors affecting completion and student
success." Institutions are expected, in one year's time or less, to make use of mobile erudition and
of the student data that can be gathered through online erudition environments [31].

Further, Kasemsap [32] highlights that the application of erudition analytics contributes to
improving educational performance and to reaching strategic goals in the information age. The role of
erudition analytics (LA) in global higher education is then presented, illustrating the theoretical
and practical overview of LA; LA and educational data_mining; LA and the erudition mgmt system; LA
and Course Signals; LA and acquaintance perspectives; LA and social networking sites; and the
significance of LA in global higher education. Concerning the application of erudition analytics as a
strong driver for the modernisation of higher education, some universities in the United Kingdom, the
United States and Australia have deployed erudition analytics at a national level. Currently there is
no such steep progress in pre-tertiary education. The report Erudition Analytics in Higher Education
presents an overview of the evidence currently available on the impact that analytics are having on
teaching and erudition at universities in the United States, Australia and the United Kingdom.

Considering erudition analytics as one of the promising trends that lead to new insights on learners'
behaviour, interactions and erudition paths, as well as to improvement of technology enhanced
erudition methods in a data-driven way, erudition analytics was expected to mature and become a
reality by 2016. There are many references to erudition analytics research and implementation in
higher education. Providing high quality, relevant and widely accessible higher education is a
fundamental goal of the European Higher Education Area (High Level Group, 2014). The report includes
a list of new modes of erudition and teaching to modernise higher education, among which is more
personalised erudition informed by better data. Advances in big data and erudition analytics can help
the higher education system customise teaching tools and develop more personalised erudition pathways
based on student data. Bring Your Own Device (BYOD), along with erudition analytics and adaptive
erudition, is expected to be increasingly adopted by higher education.

The results of this study indicate that feature engineering provides more improvement to prediction
results than method selection. Although feature engineering was done only in a limited capacity, it
made a bigger difference to prediction performance. Furthermore, the biggest leap in improvement was
made in the case of decision trees, where both feature selection and feature modification were
applied to the data. When trying to improve the prediction of student performance, the modification
of the input data is an important factor besides selecting the right method for the data. Although
feature engineering was more effective than method selection, the combination of both approaches
provided the best results. In both data sets, the best accuracy values achieved were a clear
improvement over the baseline accuracy values. This shows that using machine_erudition is an
effective way of predicting student performance.
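
As a hedged illustration of this comparison, the following minimal sketch (assuming scikit-learn is
available; the data are invented placeholders, not the study's data sets) contrasts a decision tree
trained on raw features with one trained after a simple feature-engineering step.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
test1 = rng.uniform(0, 100, n)        # periodic test marks (synthetic)
test2 = rng.uniform(0, 100, n)        # assignment marks (synthetic)
attendance = rng.uniform(0, 100, n)   # class attendance percentage (synthetic)
passed = ((test1 + test2) / 2 + 0.1 * attendance > 60).astype(int)

X_raw = np.column_stack([test1, test2, attendance])
# Feature engineering: add the average mark as an extra, more informative feature
X_eng = np.column_stack([test1, test2, attendance, (test1 + test2) / 2])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
print("raw features accuracy:       ", cross_val_score(clf, X_raw, passed, cv=5).mean())
print("engineered features accuracy:", cross_val_score(clf, X_eng, passed, cv=5).mean())

In this toy setting the engineered feature usually helps because it matches how the label was
generated; with real student data the gain would have to be verified empirically.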

2.2 Trends of educational analytics research during the period 1998-2012.

A survey of EDM for the period 1998 to 2002 is presented in Table 2.1. The leverage points of this
survey are the trends in DM techniques, tools, the data sets used, and the respective educational
outcomes.

Table 2.1 Educational analytics trends during the period 1998-2002.

Author(s) and Year | Data_mining Technique(s) | Data_mining Tool(s) | Dataset | Educational Outcome

Zaïane, O., Xin, M. and Han, J. 1998 [33] | Time Series Analysis | DB Miner | Web logs collected from the WWW, web log data cube and web log database | Design and modification of websites using access patterns in web logs

Sison, R. and Shimura, M. 1998 [34] | Classification | - | - | To discover aggregate and individual paths of a learner in a distance education system

Ingram 1999-2000 [35] | Web Mining | - | Web logs collected from the WWW | Planning

Ha et al. 2000 [36] | Web Mining | - | - | On-line user behaviours

Zaïane, O. 2001 [37] | WebLogMiner | WebSIFT | - | To provide a more systematic view of the modern AIWBES

Zaïane, O. 2002 [38] | Association Rules Mining | SPSS data analyzer, C++ | WIRE website interaction data 2001 | Student behaviour erudition
Chapter 3.

Regression Analysis for Education Analytics.

3.1 REGRESSION MODELS

Regression models involve the following variables:

 The unknown parameters, denoted as B, which may represent a scalar or vector.


 The independent variables, denoted as X.
 The dependent variable, denoted as Y.

In various fields of application, different terminologies are used in place of dependent and
independent variables.

A regression model relates Y to a function of X and B:

Y = f(X, B)

The approximation is usually formalised as E(Y | X) = f(X, B). To carry out regression analysis, the
form of the function f must be specified. Sometimes the form of this function is based on
acquaintance about the relation between Y and X that does not rely on data. If no such acquaintance
is available, a flexible or convenient form of f is chosen.

Assume now that the vector of unknown parameters B is of length k. In order to perform a regression
analysis, the user must provide information about the dependent variable Y.

3.2 HOW DOES REGRESSION WORK?

You do not need to understand the mathematics used in regression analysis to develop quality
regression models for data_mining. However, it is helpful to understand a few basic concepts.

The goal of regression analysis is to determine the values of parameters for a function that cause
the function to best fit a set of data observations that you provide. The following equation
expresses these relationships in symbols. It shows that regression is the process of estimating the
value of a continuous target (y) as a function (F) of one or more predictors (x1, x2, ..., xn), a set
of parameters (θ1, θ2, ..., θn), and a measure of error (e):

y = F(x, θ) + e

The process of training a regression model involves finding the parameter values that minimise a
measure of the error, for example the sum of squared errors.
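
As a minimal, hedged sketch of this idea (not code from the thesis; the data points are invented),
the following Python snippet fits a straight-line F by choosing the parameters that minimise the sum
of squared errors.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Design matrix with an intercept column; lstsq finds theta minimising sum((y - X @ theta)**2)
X = np.column_stack([np.ones_like(x), x])
theta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
sse = np.sum((y - X @ theta) ** 2)
print("intercept and slope:", theta, " sum of squared errors:", sse)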

3.3 TYPES OF REGRESSION

There are different families of regression functions and different ways of measuring the error.[30]

 Linear Regression

The simplest form of regression to visualise is linear regression with a single predictor. A linear
regression technique can be used if the relationship between x and y can be approximated with a
straight line, as shown in the figure below [30].

Fig. 3.2 Linear Regression

3.3.1 Linear regression is a common Statistical Data Analysis technique. It is used to determine
the extent to which there is a linear relationship between a dependent variable and one or more
independent variables. There are three types of linear regression. These are:

i. Simple linear regression

ii. Multiple linear regression.

iii. Multivariate linear regression.[32]

In simple linear regression a single independent variable is used to predict the value of a
dependent variable.

Or we can say that, Simple regression pertains to one dependent variable (y) and one independent
variable (x):

y= f(x)

In linear regression, the model specification is that the dependent variable yi is a linear
combination of the parameters. For example, in simple linear regression for modelling n data points
there is one independent variable, xi, and two parameters, B0 and B1:

Straight line: yi = B0 + B1·xi + ei,  i = 1, 2, ..., n

In multiple linear regression, there are several independent variables or functions of independent
variables.

Adding a term in xi^2 to the preceding regression gives:

Parabola: yi = B0 + B1·xi + B2·xi^2 + ei,  i = 1, 2, ..., n

This is still linear regression; although the expression on the right hand side is quadratic in the
independent variable xi, it is linear in the parameters B0, B1 and B2.

In both cases, ei is an error term and the subscript i indexes a particular observation.
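
A minimal sketch of this point, with invented data: because the parabola model is still linear in B0,
B1 and B2, the same least-squares machinery fits both forms.

import numpy as np

xi = np.linspace(0, 4, 20)
yi = 1.0 + 2.0 * xi + 0.5 * xi**2 + np.random.default_rng(1).normal(0, 0.2, xi.size)

# Straight line: design matrix columns [1, x]
line_coef = np.linalg.lstsq(np.column_stack([np.ones_like(xi), xi]), yi, rcond=None)[0]
# Parabola: columns [1, x, x^2] -- one extra column, same linear solver
parab_coef = np.linalg.lstsq(np.column_stack([np.ones_like(xi), xi, xi**2]), yi, rcond=None)[0]

print("straight line B0, B1:", line_coef)
print("parabola B0, B1, B2: ", parab_coef)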

Parametric nonlinear regression models the dependent variable (also called the response) as a
function of a combination of nonlinear parameters and one or more independent variables (called
predictors). The model can be univariate (single response variable) or multivariate (multiple
response variables).

The parameters can take the form of an exponential, trigonometric, power, or any other nonlinear
function. To determine the nonlinear parameter estimates, an iterative algorithm is typically
used.[33]

Y = f(X, B) + e

where B represents the nonlinear parameter estimates to be computed and e represents the error term.
All of the models we have discussed thus far have been linear in the parameters (i.e., linear in the
betas). For example, polynomial regression was used to model curvature in our data by using
higher-order values of the predictors. However, the final regression model was still just a linear
combination of higher-order predictors.
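
As a hedged sketch of parametric nonlinear regression (assuming SciPy is available; the exponential
model and data are illustrative, not from the thesis), an iterative algorithm estimates the nonlinear
parameters.

import numpy as np
from scipy.optimize import curve_fit

def model(x, b0, b1):
    # Exponential model: nonlinear in b1, so ordinary linear least squares cannot fit it directly
    return b0 * np.exp(b1 * x)

x = np.linspace(0, 2, 30)
y = model(x, 2.0, 1.3) + np.random.default_rng(2).normal(0, 0.1, x.size)

# curve_fit iteratively refines the estimates starting from the initial guess p0
params, covariance = curve_fit(model, x, y, p0=[1.0, 1.0])
print("estimated B0, B1:", params)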

3.4 LINEAR REGRESSION

A model is linear when each term is either a constant or the product of a parameter and
a predictor variable. A linear equation is constructed by adding the results for each term. This
constrains the equation to just one basic form:[36]

Response = constant + parameter * predictor + ... + parameter * predictor

Y = b0 + b1·X1 + b2·X2 + ... + bk·Xk

In statistics, a regression equation (or function) is linear when it is linear in the parameters. While
the equation must be linear in the parameters, you can transform the predictor variables in ways
that produce curvature. For instance, you can include a squared variable to produce a U-shaped
curve.

41
Y = b0 + b1·X1 + b2·X1^2 is a linear regression equation.

This model is still linear in the parameters even though the predictor variable is squared. You can
also use log and inverse functional forms that are linear in the parameters to produce different
types of curves.

3.4 SOME OTHER TYPES OF REGRESSION

3.4.1 LOGISTIC REGRESSION

3.4.2 Logistic regression models a relationship between predictor variables and a categorical
response variable. For example, we could use logistic regression to model the relationship between
various measurements of a manufactured specimen (such as dimensions and chemical
composition) to predict if a crack greater than 10 mils will occur (a binary variable: either yes or
no). Logistic regression helps us estimate a probability of falling into a certain level of the
categorical response given a set of predictors. We can choose from three types of logistic
regression, depending on the nature of the categorical response variable:

3.4.3 Binary Logistic Regression:


Used when the response is binary (i.e., it has two possible outcomes). The cracking example given
above would utilize binary logistic regression. Other examples of binary responses could include
passing or failing a test, responding yes or no on a survey, and having high or low blood pressure.

3.4.4 Nominal Logistic Regression:

Used when there are three or more categories with no natural ordering to the levels. Examples of
nominal responses could include departments at a business (e.g., marketing, sales, HR), type of
search engine used (e.g., Google, Yahoo!, MSN), and color (black, red, blue, orange).

42
3.4.5 Ordinal Logistic Regression:
Used when there are three or more categories with a natural ordering to the levels, but the ranking
of the levels do not necessarily mean the intervals between them are equal. Examples of ordinal
responses could be how students rate the effectiveness of a college course on a scale of 1-5, levels
of flavors for hot wings, and medical condition (e.g., good, stable, serious, critical).[E]

3.4.6 The multiple binary logistic regression model is the following:

π = exp(β0 + β1X1 + … + βp−1Xp−1) / [1 + exp(β0 + β1X1 + … + βp−1Xp−1)]
  = exp(Xβ) / [1 + exp(Xβ)]
  = 1 / [1 + exp(−Xβ)]
where π denotes a probability and not the irrational number 3.14....
 π is the probability that an observation is in a specified category of the binary Y variable,
generally called the "success probability."
 Notice that the model describes the probability of an event happening as a function of the X
variables. For instance, it might provide estimates of the probability that an older person has
heart disease.
 With the logistic model, estimates of π from equations like the one above will always be between
0 and 1. The reasons are:
o The numerator exp(β0 + β1X1 + … + βp−1Xp−1) must be positive, because it is a power of a positive
value (e).
o The denominator of the model is (1 + numerator), so the answer will always be less than 1.
o With one X variable, the theoretical model for π has an elongated "S" shape (or sigmoidal shape)
with asymptotes at 0 and 1, although in sample estimates we may not see this "S" shape if the range
of the X variable is limited.
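
A minimal, hedged sketch of the binary case (assuming scikit-learn is available; the age and
heart-disease data are invented placeholders) shows the fitted probabilities always falling between
0 and 1.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
age = rng.uniform(30, 80, 300)
# Synthetic outcome: older people are more likely to have the condition
disease = (rng.uniform(0, 1, 300) < 1 / (1 + np.exp(-(age - 55) / 5))).astype(int)

clf = LogisticRegression().fit(age.reshape(-1, 1), disease)
beta0, beta1 = clf.intercept_[0], clf.coef_[0][0]

# Success probability for a 70-year-old: pi = 1 / (1 + exp(-(beta0 + beta1 * x)))
pi = 1 / (1 + np.exp(-(beta0 + beta1 * 70)))
print(f"estimated probability at age 70: {pi:.2f}")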
3.5 Formula of applied regression analysis:

Y (dependent variable) = F(X (independent variables), θ) + e

3.6 Regression Methodology:

Education analysis → primary data set → secondary data set → pre-evaluation → post-evaluation →
comparison of results → results plotted as graphs.

We take the data analysis for the education data set as given below.

Three methods are used: 1. pre-analytics method; 2. post-analytics method; 3. next-change method.

The comparison is based on a linear regression equation with:
 Independent variables β0 to β11.
 Hypotheses H0 to H11.
Linear regression is performed on the generated data. Therefore:

R (education analytics) = β0·H0 + β1·H1 + β2·H2 + … + β11·H11 + ei

44
3.7 HYPOTHESES FOR THE RESEARCH DATA SET.
The following hypotheses concern the variables impacting education analytics:
1. Aptitude skills.
2. Communication skills.
3. Mental ability test (MAT).
4. Core subjective skills.
5. Reasoning skills.
6. Presentation skills.

The related research objectives are:
1. To assess the importance of quality in higher education institutions for employability.
2. To know the actions required for the implementation of quality mgmt improvement.
3. To determine the success of Total Quality Mgmt actions through various measures.

3.8 Research Variables

To study their influence, the following key variables are identified. These are treated as constant
variables which affect the quality ratio of the education analytics system.

1. Commitment of top mgmt
2. Course delivery
3. Campus facilities
4. Courtesy
5. Customer feedback and improvement

3.9 Research Independent Variables:

Independent variables are used to perform the regression numerical analysis, so we take the following
independent variables, whose values change across observations:
1. Subjective marks.
2. Periodic test marks.
3. Assignment marks.
4. Trimester marks.
5. Final exam marks.

3.10 Dependent Variables:

For the linear regression analysis we take a dependent variable for each level of the research
analysis; these are not changed during the analysis and behave like constant variables. After that we
show the data set analysis, which is collected on a per-student basis.

1. Teaching skills.

2. Syllabus pattern.

3. Teaching methodology.

4. Delivery of lecture pedagogy.

5. Pattern of exams.

3.11 Real data set.

Educational Process Mining (EPM): An Erudition Analytics Data Set. This real data set was collected
from the board of education department site for the analysis of the education system based on
machine_erudition approaches, along with the regression method used for the numerical analysis
carried out in the research methodology chapter.

Data Set Characteristics: Education data set | Number of Instances: 100 | Area: Computer

Attribute Characteristics: Integer and character values | Number of Attributes: 23 | Date Donated: 2019-04-01

Associated Tasks: Classification, Regression, Clustering, machine_erudition approaches | Missing Values?: N/A | Number of Web Hits: 76931

3.12 Data collection and Analysis. The following attributes are given in the data set. Over these
attributes we carry out the data analysis for education quality improvement, so that students can
approach good employability in future. There are 23 attributes in the real data set, and 100 records
were taken from it to complete the analysis (a minimal loading sketch follows the attribute list
below).

1. Institutions-name 50 ns.
2. State 05.
3. location m.p
4. Control Edu board
5. number-of-students 500
6. male:female (ratio) 2:3
7. student:faculty (ratio) 1:25
8. sat-verbal 15
9. sat-math 20
10. expenses 10k.
11. percent-financial-aid 1k
12. number-of-applicants 500
13. percent-admittance 50.
14. percent-enrolled 50
15. academics 05.
16. social 10
17. quality-of-life Average
18. academic-emphasis Pedagogy
19. subjective marks S1+S2+S3+S4+S5
20. course pattern M.P. board
21. presentation skills Leaning
22. course contents Eng medium
23. teaching methodology Physical presence
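
As the loading sketch referred to above (a hedged illustration only, not the thesis's code; the
column names and values are placeholders drawn from the attribute list), a few of these attributes
can be held in a pandas DataFrame and summarised.

import pandas as pd

records = pd.DataFrame({
    "number_of_students": [500, 480, 520],
    "student_faculty_ratio": [25, 22, 30],   # students per faculty member
    "percent_admittance": [50, 55, 45],
    "quality_of_life": ["Average", "Good", "Average"],
})

# describe(include="all") summarises numeric and categorical attributes together
print(records.describe(include="all"))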

Chapter 4.

Research methodology

4.1 Simulation and Result Analysis based on the linear regression model.

4.1.1 Implementation:
Regression analysis is the research technique used to generate a numerical analysis for education
analytics.

We now take a data sample for result analysis. Linear regression is the method used to generate
numerical values based on the various dependent and independent variables.

The numerical analysis produced by the linear regression method validates the employability success
ratio obtained from these variables. After this numerical linear analysis of the education system, we
improve the quality mgmt principles using the regression analysis; it produces numerical values
showing the employability ratio achieved by the students.

The linear regression formula:

R = (B1·P1) + (B2·P2) + (B3·P3) + (B4·P4) + (B5·P5) + … + (Bn·Pn)

where R = education analytics score,
Bn = parameters defined in the real data set,
Pn = coefficient of relation of the parameter.

The data taken for the regression analysis are given below:

4.2 Linear Regression Analysis for determining the employability ratio, using the educational process
real data set.
Data collection: educational process mining real data set EPM 2018.

Core subjective marks [m] - physics + chemistry + maths percentage (0.32)

Mental ability test MAT [m] - reasoning marks (0.23)

Aptitude marks [m] - quantitative study (0.22)

Class attendance marks [m] - (0.28)

Assignment submission marks [m] - (0.25)

Communication skills [m] - (0.24)

Practical skills marks [m] - (0.32)

Writing skills + reading and erudition skills [wr] - (0.21)

Complex [c, marks reduced] - (-0.1)

4.2.1 NOW APPLY LINEAR REGRESSION ANALYSIS FOR THE SUCCESS RATIO:

Y (dependent variable) = f(X (independent variables), B)

R = (B1·P1) + (B2·P2) + (B3·P3) + … + (Bn·Pn)

Education analytics = success ratio.

log10 R: R = 0.32 + 0.23 + 0.24 + 0.28 + 0.25 + 0.22 + 0.21 + 0.32 + (-0.1) = 1.97

Antilog(1.97) = 10^1.97 ≈ 93.33 percentile.

Success ratio = 93.33 percentile out of 100, based on the grading system. Gross employability success
ratio = 100.0 percentile.
49
Percentage error = (93.33 − 104.04) / 104.04 = −10.29% (minus cut-off).

The overall percentile of the success ratio for achieving employability is thus determined from the
various types of parameters (dependent, independent and constant variables), all of which are taken
from the real data set.
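
A minimal Python sketch of the arithmetic in section 4.2.1 (the coefficient values are those listed
in section 4.2; only the calculation itself is shown here):

# Coefficients of the skill measures from the EPM 2018 data set (section 4.2)
coefficients = {
    "core subjective marks": 0.32,
    "mental ability test (MAT)": 0.23,
    "communication skills": 0.24,
    "class attendance": 0.28,
    "assignment submission": 0.25,
    "aptitude": 0.22,
    "writing/reading skills": 0.21,
    "practical skills": 0.32,
    "complex marks (reduction)": -0.10,
}

log_r = sum(coefficients.values())   # R = 1.97, treated as log10 of the success ratio
success_ratio = 10 ** log_r          # antilog base 10, about 93.33 percentile
print(f"log10 R = {log_r:.2f}, success ratio = {success_ratio:.2f} percentile")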

S. No | Measures used | Mean score | Standard deviation | Rank based on mean score
1 | Students' performance based on assignments | 4.82 | 0.5482 | 1
2 | Student opinion surveys | 4.41 | 0.5133 | 4
3 | Alumni surveys | 3.75 | 1.2972 | 7
4 | Course final exam | 4.65 | 0.4537 | 2
5 | Class average GPA compared with class average grade | 4.02 | 0.5198 | 5
6 | Standardized tests | 3.87 | 1.3025 | 6
7 | Failure rate for the course | 4.58 | 0.4962 | 3

4.3 Real implementation using machine_erudition approaches.

4.3.1 Implementation using the Python language and the Py CAM tool:

Using Python code we execute the programming of the various attributes of the education analytics;
the success ratio is determined by the tool, and the resulting graphs show the measured values.

Using regression analysis we performed the linear regression that gave the numerical success ratio of
the education analytics for good employability. We now execute the machine_erudition approaches using
Python code over the various parameters (core subjective marks, presentation and leaning skills, MAT
skills, etc.), then show the percentile of the success ratio; the failure ratio is also determined.

4.4 Coding in Python:

1. Code to read input values and determine the percentile.

# Read the student's name and subject marks
name = input("enter name of the student :")
phy = int(input("enter marks in physics :"))
chem = int(input("enter marks in chemistry :"))
math = int(input("enter marks in maths :"))
bio = int(input("enter marks in biology :"))
eng = int(input("enter marks in english :"))

total_marks = phy + chem + math + bio + eng
total_aggrigate = phy + chem + math
print(name, "total in physics, chemistry, maths is", total_aggrigate)
print()

per = total_aggrigate / 3            # percentage in the three core subjects
total_percentage = total_marks / 5   # overall percentage across the five subjects
print(name, "your percentage in physics, chemistry, maths is:", per)
print(name, "your total percentage is :", total_percentage)
print()

if per > 65 and total_percentage >= 70:
    print(name, "your percentage in physics, chemistry, maths is", per,
          "and your total percentage is", total_percentage, "you are eligible for science")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can also take biology")
    else:
        print(name, "you can't take biology because your marks in biology is less than 60")

if 60 <= per <= 65 and 60 <= total_percentage < 70:
    print(name, "your percentage in physics, chemistry, maths is", per,
          "and your total percentage is", total_percentage, "you are eligible for commerce")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can also take biology")
    else:
        print(name, "you can't take biology because your marks in bio less than 60")

if total_percentage < 60:
    print("sorry you are not eligible for science and commerce")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can take biology")
    else:
        print(name, "try for any other course")

2. Regression implementation code in Python.

name = input("enter name of the student :")
phy = int(input("enter marks in physics :"))
chem = int(input("enter marks in chemistry :"))
math = int(input("enter marks in maths :"))
bio = int(input("enter marks in biology :"))
eng = int(input("enter marks in english :"))

total_marks = phy + chem + math + bio + eng
total_aggrigate = phy + chem + math
print(name, "total in physics, chemistry, maths is", total_aggrigate)
print()

per = total_aggrigate / 3
total_percentage = total_marks / 5
print(name, "your percentage in physics, chemistry, maths is:", per)
print(name, "your total percentage is :", total_percentage)
print()

if per > 65 and total_percentage >= 70:
    print(name, "your percentage in physics, chemistry, maths is", per,
          "and your total percentage is", total_percentage, "you are eligible for science")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can also take biology")
    else:
        print(name, "you can't take biology because your marks in biology is less than 60")

if 60 <= per <= 65 and 60 <= total_percentage < 70:
    print(name, "your percentage in physics, chemistry, maths is", per,
          "and your total percentage is", total_percentage, "you are eligible for commerce")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can also take biology")
    else:
        print(name, "you can't take biology because your marks in bio less than 60")

if total_percentage < 60:
    print("sorry you are not eligible for science and commerce")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can take biology")
    else:
        print(name, "try for any other course")

3. Python code for aptitude, MAT and communication skills.

# The original listing omits the input statements; the definitions below are assumed
# additions so that the code runs, mirroring the pattern of listing 1.
name = input("enter name of the student :")
apt = int(input("enter marks in aptitude :"))
mat = int(input("enter marks in MAT :"))
comm = int(input("enter marks in communication :"))
bio = int(input("enter marks in biology :"))
pres = int(input("enter marks in presentation :"))

total_marks = apt + mat + comm + bio + pres
total_aggrigate = apt + mat + comm

print(name, "total in aptitude, mat, communication is", total_aggrigate)
print()
per = total_aggrigate / 3
print(name, "your percentage in aptitude, mat, communication is:", per)
total_percentage = total_marks / 5
print(name, "your total percentage is :", total_percentage)
print()

if per > 65 and total_percentage >= 70:
    print(name, "your percentage in aptitude, mat, communication is", per,
          "and your total percentage is", total_percentage, "you are eligible for science")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can also take biology")
    else:
        print(name, "you can't take biology because your marks in biology is less than 60")

if 60 <= per <= 65 and 60 <= total_percentage < 70:
    print(name, "your percentage in aptitude, mat, communication is", per,
          "and your total percentage is", total_percentage, "you are eligible for commerce")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can also take biology")
    else:
        print(name, "you can't take biology because your marks in bio less than 60")

if total_percentage < 60:
    print("sorry you are not eligible for science and commerce")
    print()
    if bio >= 60:
        print(name, "you score", bio, "in biology you can take biology")
    else:
        print(name, "try for any other course")

4.5 Snapshots of code execution in the Py CAM tool:

4.5.1 Snapshot of the Python code executed in the Py CAM tool

4.6 Output results taken from the Py CAM tool.

1. Output

2. Output

3. Output

4. Output

Chapter 5.

Simulation and result discussion

5.1 Simulation result graphs:

In this simulation, the result graphs show the results obtained from the Python code executed in the
Py CAM tool. Continuing this process, we create the graphs that finalise the simulation work with the
help of the numerical analysis performed by the linear regression method.

5.1 Graph representing the measured values as percentiles

education data set  | core sub marks | presentation | aptitude | MAT | communication
education analytics | 93             | 80           | 75       | 65  | 60

[Bar chart: passing percentage and employability ratio for the years 2019, 2014, 2010, 2006 and 2000; y-axis 0-100.]

5.1 graph

62
[Chart: passing percentage by year (2019, 2014, 2010, 2006, 2000).]

5.2 graph

[Chart: passing percentage by year (2019, 2014, 2010, 2006, 2000).]

5.3 graph

5.2 Survey report, IBM annual report 2018.

education survey report 2018 | 2019 | 2014 | 2010 | 2006 | 2000
passing percentage           | 94   | 78   | 68   | 62   | 50
employability ratio          | 68   | 60   | 52   | 49   | 40

[Bar chart: passing percentage and employability ratio for the years 2019, 2014, 2010, 2006 and 2000; y-axis 0-100.]

5.4 Graph
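
A minimal plotting sketch (assuming matplotlib is available; not the thesis's original plotting code)
that reproduces a grouped bar chart like Fig. 5.4 from the survey figures tabulated above:

import numpy as np
import matplotlib.pyplot as plt

years = ["2019", "2014", "2010", "2006", "2000"]
passing = [94, 78, 68, 62, 50]
employability = [68, 60, 52, 49, 40]

x = np.arange(len(years))
width = 0.35
plt.bar(x - width / 2, passing, width, label="passing percentage")
plt.bar(x + width / 2, employability, width, label="employability ratio")
plt.xticks(x, years)
plt.ylabel("percentage")
plt.legend()
plt.show()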

[Chart: passing percentage by year (2019, 2014, 2010, 2006, 2000).]

5.5 Graph

[Horizontal bar chart: passing percentage and employability ratio by year (2019, 2014, 2010, 2006, 2000); scale 0-100.]

5.6 Graph

5.3 Graph representing the employment ratio, 2018 report

Employment ratio analytics report 2018:

employment ratio 2018 | 58
employment ratio 2010 | 49

[Bar chart: percentage ratio of employability, 2018 versus 2010; y-axis 44-60.]

5.7 graph

[Chart: percentage ratio of employability, 2018.]

5.8 graph

These graphs show the various education analysis reports along with the regression results and the
Python code results. This simulation and result chapter explains in detail the various attributes
through which excellent education employability is achieved.

The survey results indicate that the popular measures respondents would use to assess the success of
TQM actions are students' performance based on assignments, the course final exam and the failure
rate for the course.

Chapter 6.

Conclusion and findings.


The financial, social and innovative changes add to the erudition society. The present rate of
monetary development can be significantly expanded if India turns out to be a super power in the
erudition sector. A reasonable TQM model for brilliance in Higher Education Institutes, based on the
accompanying five factors which lead to understudy fulfilment, is proposed.

1. Commitment of top administration: Top administration, through their supervision of all
activities, ought to guarantee that everyone is focused on accomplishing quality.

2. Course conveyance: Expert information must be coordinated with expert skill to transmit that
erudition; the capacity to gain erudition must be coordinated with the enthusiasm to transmit it.

3. Campus facilities: Utmost consideration is to be shown in providing superb infrastructure and
physical facilities on the campus for understudy erudition, co-curricular and extra-curricular
exercises.

4. Courtesy: An emotive and uplifting demeanour towards understudies will prompt a suitable erudition
environment.

5. Student feedback and improvement: Constant feedback from the understudies, prompting consistent
improvement in the process, is the way to accomplish perfection.

68
6. Based on excellent performance supported by better education analytics, students will achieve an
improved employability ratio.

7. Using regression analysis we determine the percentile ratio of the various skill measures that
make for an excellent education analytics system for total quality mgmt.

The advanced education framework should be reinforced so that it is fit for sharpening the framework
to achieve an all-round, multifaceted identity; to secure administration characteristics; to hone
correspondence and relational aptitudes; to procure information on the most recent patterns in
innovation; to have presentation to the modern atmosphere; and to pick up the certainty to confront
changes in the exceedingly focused and consistently evolving world.

Chapter 7.

REFERENCES
[1]. Anaya, A. R., and J. G. Boticario. 2009. ―A Data_mining Approach to Reveal Representative
Collaboration Indicators in Open Collaboration Frameworks. In Educational Data_mining 2009:
Proceedings of the 2nd International Conference on Educational Data_mining, edited by T. Barnes,
M. Desmarais, C. Romero, and S. Ventura, 210–219.

[2]. Amershi, S., and C. Conati. 2009. ―Combining Unsupervised and Supervised Classification
to Build User Models for Exploratory Erudition Environments.‖ Journal of Educational
Data_mining 1 (1): 18–71.

[3]. Arnold, K. E. 2010. ―Signals: Applying Academic Analytics. EDUCAUSE Quarterly 33 (1).

[4]. Baker, R. S. J. d., S. M. Gowda, and A. T. Corbett. 2011. ―Automatically Detecting a


Student’s Preparation for Future Erudition: Help Use Is Key. In Proceedings of the 4th
International Conference on Educational Data_mining, edited by M. Pechenizkiy, T. Calders, C.
Conati, S. Ventura, C. Romero, and J. Stamper, 179–188.

[5]. Blikstein, P. 2011. ―Using Erudition Analytics to Assess Students’ Behavior in Open-Ended
Programming Tasks.‖ Proceedings of the First International Conference on Erudition Analytics
and Acquaintance. New York, NY: Association for Computing Machinery, 110–116.

[6]. Jeong, H., and G. Biswas. 2008. ―Mining Student Behavior Models in Erudition-by-
Teaching Environments.‖ In Proceedings of the 1st International Conference on Educational
Data_mining, Montréal, Québec, Canada,127–136.

[7]. Köck, M., and A. Paramythis. 2011. ―Activity Sequence Modeling and Dynamic Clustering
for Personalized E-Erudition. Journal of User Modeling and User-Adapted Interaction 21 (1-2):
51–97.

[8]. Koedinger, K. R., R. Baker, K. Cunningham, A. Skogsholm, B. Leber, and J. Stamper. 2010.
―A Data Repository for the EDM Community: The PSLC DataShop.‖ In Handbook of

Educational Data_mining, edited by C. Romero, S. Ventura, M. Pechenizkiy, and R.S.J.d. Baker.
Boca Raton, FL: CRC Press, 43–55.

[9] YiChuan Wang, LeeAnn Kung, Chaochi Ting, “Beyond a Technical Perspective:
Understanding Big Data Capabilities in Health Care”, publications on ResearchGate , 2015.

[10] Baker, R. S. J. D. “Erudition, schooling, and data analytics”. Handbook on innovations in


erudition for states,districts, and schools, Philadelphia, PA: Center on Innovations in Erudition ,
2013, pp. 179–190.

[11] BasU.A, “Five pillars of prescriptive analytics success”s. Analytics-magazine.org, 2013, pp.
8–12.

[12] Ben K. Daniel, “Big Data and analytics in higher education: Opportunities and challenges”,
British journal of educational technology. September , 2015.

[13] Sunil Erevelles (2009) “Combining unsupervised and supervised classification to build user
models for exploratory erudition environments” Journal of Educational Data_mining.Vol.1, No.1,
pp. 18-71.

[14] Farshad Kooti (2005), “Educational Data_mining: a case study” in Proc. Conf. on Artificial
Intelligence in Education Supporting Erudition through Intelligent and Socially Informed
Technology. IOS Press, Amsterdam, The Netherlands, pp. 467-474.

[15] Mostafa Sabbaghi (2006) “Using Feature Selection and Unsupervised Clustering to Identify
Affective Expressions in Educational Games”, in Proc .Of The Intelligent Tutoring Systems
Workshop on Motivational and Affective Issues. pp. 21-28.

[16] Ben K. Daniel (2010), “Discovering Vital Patterns From UST Students Data by Applying
Data_mining Techniques”, in Proc. Int. Conf. On Computer and Automation Engineering, China:
IEEE, 2010, 2,547-551.DOI:10.1109/ICCAE.2010.5451653.

[17] Greenberg and Buxton’s (2011) “Measuring job satisfaction among teachers in Abu Dhabi:
design and testing differences” in Proc.Int. Conf. on NIE, 4th Redesigning Pedagogy.Singapore.

[18] Kelderman (2004) , “Detecting Student Misuse of Intelligent Tutoring Systems” in Proc.
Lecture Notes in Computer ScienceVol.3220,531-540.

[19] Siemens.(2009), “The state of Educational Data_mining in 2009:A review and future vision”
Journal of Educational Data_mining, Vol.1,No. 1,pp.3-17.

[20] Waltman 2012), “On Instructional Utility, Statistical Methodology, and the Added Value of
ECD: Lessons Learned from the Special Issue”, Journal of Educational Data_mining.Vol.4,
No.1,pp.224-230.

[21] Zheng Xiang. (2011), “Mining Student Data to Analyze Students’ Performance”
.International Journal of advanced Computer Science and applications.2,6.

[22]. Hyun Jeong “Spring” Han 2009. ―A Data_mining Approach to Reveal Representative
Collaboration Indicators in Open Collaboration Frameworks.‖ In Educational Data_mining 2009:
Proceedings of the 2nd International Conference on Educational Data_mining, edited by T. Barnes,
M. Desmarais, C. Romero, and S. Ventura, 210–219.

[23]. Zhen Xiang 2009. ―Combining Unsupervised and Supervised Classification to Build User
Models for Exploratory Erudition Environments.‖ Journal of Educational Data_mining 1 (1): 18–
71.

[24]. Henry C. Lucas 2010. ―Signals: Applying Academic Analytics. EDUCAUSE Quarterly
33(1).http://www.educause.edu/EDUCAUSE+Quarterly/EDUCAUSEQuarterlyMagazineVolum
/SignalsApplyingAcademicAnalyti/199385

[25]. Jones and Shao 2011. ―Automatically Detecting a Student’s Preparation for Future
Erudition: Help Use Is Key.‖ In Proceedings of the 4th International Conference on Educational
Data_mining, edited by M. Pechenizkiy, T. Calders, C. Conati, S. Ventura, C. Romero, and J.
Stamper, 179–188

[26] Whilst Turkle, “A New Data_mining Model Adopted for Higher Institutions,” Procedia
Comput. Sci., vol. 65, no. Iccmit, pp. 836–844, 2015.

[27] Jones and Shao, “Data_mining Algorithms and their applications in Education Data_mining,”
Int. J., vol. 7782, pp. 50–56, 2014. [22] A. P. K. H. Rashan, Data_mining Applications in the
Education Secto. 2011.

[28] Baxter and Hatt, “Predicting Erudition and Affect from Multimodal Data Streams in
TaskOriented Tutorial Dialogue,” Proc. 7th Int. Conf. Educ. Data Min., no. Edm, pp. 122–129,
2014.

[29] Watanabe, “Erudition Individual Behavior in an Educational Game: A Data-Driven


Approach,” Proc. 7th Int. Conf. Educ. Data Min., no. Edm, pp. 114–121, 2014.

[30] Ferguson, “Hybrid Agent Based Educational Data_mining Model for Student Performance
Improvement,” no. 4, pp. 45–47, 2012.

[31] Johnson, “Predicting academic success from student enrolment data using decision tree
technique,” Int. J. Appl. Inf. Syst., vol. 4, no. 3, pp. 1–6, 2016.

[32] Kasemsap, “Does Education matter? Vocational education and social mobility strategies in
young people of Barcelona and Lisbon. A comparative study.,” ULHT, 2014.

[33] Zaïane, O., Xin, M. and Han, J. (1998), "Discovering Web Access Patterns and Trends by Applying
OLAP and Data_mining Technology on Web Logs", in Proc. of Advances in Digital Libraries, pp. 19-29.

[34] Raymund and Shimura, M. (1998), “Student Modeling and Machine_erudition”, In Proc. Int.
Conf. on Artificial Intelligence in Education.pp.128-158.

[35] Ingram, (1999-2000), “Using Web Sever Logs in Evaluating Instructional web sites”, Journal
of Educational Technology Systems. Vol.28,No.2.

[36] Ha, S., Bae, S., and Park, S. (2000) “Web mining for distance education” in Proc. Int. Conf.
On Mgmt of Innovation and Technology, IEEE. Pp.715-719.

[37] Zaïane, O. (2001), “Web Usage Mining for a Better Web-Based Erudition Environment”, in
Proc. Int. Conf. on Advanced Technology for Education.Alberta,pp.60-64.

[38] Zaïane, O., (2002), “Building a Recommender Agent for e-Erudition Systems.” in Proc. 7th
Int. Conf. on Computers in Education. New Zealand, pp.55–59.

---------------------------------------------------------------------------------------------------------------------

