
Human Activity Analysis using Machine Learning Classification Techniques

ABSTRACT

In recent times, smartphones have played a vital role in recognizing human activities, and this has become a well-known field of research. A detailed overview of various research papers on human activity recognition is discussed in this paper. Artificial Intelligence (AI) models are developed to recognize human activity from a dataset obtained from the UCI online repository. The chosen data is multivariate, and we have applied various machine learning classification techniques, namely Random Forest, kNN, Neural Network, Logistic Regression, Stochastic Gradient Descent and Naïve Bayes, to analyse human activity. Besides building the AI models, the dimensionality of the dataset is reduced through a feature selection process. Precision and recall values were calculated and a confusion matrix was produced for each model. Experimental results show that the Neural Network and Logistic Regression provide better accuracy for human activity recognition than classifiers such as k-nearest neighbour (kNN), SGD, Random Forest and Naïve Bayes, although they require more computational time and memory resources.

Keywords: human activity recognition, Artificial Intelligence (AI) models, machine learning classification, UCI repository, feature selection.

Table of Contents

ABSTRACT
LIST OF FIGURES

CHAPTER 1: INTRODUCTION
1.1 WHAT IS SOFTWARE
1.2 WHAT IS SOFTWARE DEVELOPMENT LIFE CYCLE

CHAPTER 2: LITERATURE REVIEW
2.1 RELATED WORK

CHAPTER 3: PROBLEM IDENTIFICATION AND OBJECTIVE
3.1 PROBLEM STATEMENT
3.2 PROJECT OBJECTIVE

CHAPTER 4: METHODOLOGY
4.1 METHODOLOGY
4.1.1 USING PYTHON TOOL ON STANDALONE MACHINE LEARNING ENVIRONMENT
4.2 DATA DESCRIPTION
4.3 EVALUATION CRITERIA USED FOR CLASSIFICATION
4.3.1 CONFUSION MATRIX
4.3.2 ACCURACY AND PRECISION
4.3.3 RECALL AND F-SCORE
4.3.4 SENSITIVITY, SPECIFICITY AND ROC
4.3.5 SIGNIFICANCE AND ANALYSIS OF ENSEMBLE METHOD IN MACHINE LEARNING
4.4 UML DIAGRAMS
4.4.1 USE CASE DIAGRAM
4.4.2 STATE DIAGRAM

CHAPTER 5: OVERVIEW OF TECHNOLOGIES
5.1 ALGORITHMS USED
5.1.1 DECISION TREE INDUCTION
5.1.3 ARTIFICIAL NEURAL NETWORK
5.1.4 SUPPORT VECTOR MACHINE MODEL
5.1.5 KERNEL FUNCTIONS
5.2 TENSORFLOW

CHAPTER 6: IMPLEMENTATION AND RESULTS
6.1 FRAMEWORK DESIGN
6.2 CODING AND TESTING

CHAPTER 7: CONCLUSION

REFERENCES

Existing System:

In controlled conditions, the bulk of the known techniques represent human activities as a collection of picture elements captured in surveillance images or photographs, and various classification patterns are used to identify the main activity tag. The datasets considered are all generic. These limits, however, create an unrealistic environment that does not account for real-world situations and fails to fulfil the requirements for an appropriate human activity dataset, which should examine how human beings go about their daily lives, including activities such as reading and eating. A device can be employed to gather data for testing and training; after the data is collected, it is transmitted to the report output.

Proposed System:

In this proposed system, we present a convolutional neural network method for video action recognition. The input footage is recorded using a webcam, and the video sequence is split into a set number of frames. The precise frame segment is then identified using the CNN (Convolutional Neural Network) approach, and the maximum criteria weights are obtained from the extracted feature frames by the convolutional neural network. Ultimately, the activity in the video is identified and a label (the activity name) is assigned.
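
Since Chapter 5 lists TensorFlow among the technologies used, a minimal, hedged sketch of a frame-level CNN classifier of the kind described above is given below; the frame size, number of activity classes and layer sizes are illustrative assumptions, not the project's actual configuration.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_CLASSES = 6            # assumed number of activity labels
    FRAME_SHAPE = (64, 64, 3)  # assumed size of each extracted video frame

    def build_frame_cnn():
        # Small CNN that assigns an activity label to a single extracted frame.
        model = models.Sequential([
            layers.Input(shape=FRAME_SHAPE),
            layers.Conv2D(16, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(64, activation="relu"),
            layers.Dense(NUM_CLASSES, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # model = build_frame_cnn()
    # model.fit(frames, labels, epochs=10)  # 'frames' and 'labels' are hypothetical arrays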

CHAPTER 1 INTRODUCTION

1.1 Introduction to the project

To recognize, detect and classify human activity, many applications with human-centred monitoring have been developed, and researchers have proposed different solutions. Human activity recognition is one of the important technologies for monitoring the dynamism of a person, and this can be attained with the support of machine learning techniques. Threshold-based algorithms are simpler and faster and are often applied to recognize human activity, but machine learning algorithms provide more reliable results. Numerous sensors have been deployed to observe human dynamic characteristics. This paper intends to measure the effectiveness of various machine learning classification algorithms. Low-cost, commercial smartphones are used as sensors to record human activities. Different studies have been conducted in intelligent environments to observe human activities. We developed AI models for the “Human Activity Recognition Using Smartphones” data set from the UCI online repository. The motivation behind our work is to apply machine learning algorithms to real-world datasets so that their accuracy can be studied and effective conclusions can be drawn.

With smartphones becoming a major part of daily human life, extensive research has been going on to widen the range of applications that can be processed using a smartphone. These include many important smartphone applications such as health monitoring, fall detection, human survey systems and home automation. The most important technology involved in all such applications is that of smartphone-based Human Activity Recognition (HAR) systems. The evolution of HAR systems is increasingly creating demand in the health-care domain, physiotherapy assistance and cognitive-impairment support, while applications such as human survey systems and location indicators have wider uses as well. An extensive training process forms the crux of the system and is a necessary procedure whenever a new activity is added to the system for recognition. Fine tuning and training of algorithm parameters are required to implement the system on different devices with various built-in sensing hardware. However, the process of labeling a training dataset is a time-consuming procedure. Due to negligible installation cost and robustness, smartphones are increasingly becoming the main platform for human activity recognition. In this paper, we have focused on applying different machine learning algorithms along with appropriate feature selection to find the best fit for HAR systems in terms of efficiency and accuracy.
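
To make this workflow concrete, the following hedged scikit-learn sketch chains feature selection with a few of the classifiers named above; a synthetic 561-feature matrix stands in for the real UCI HAR data, and all parameter values are illustrative assumptions rather than the project's tuned settings.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in for the 561-feature, six-activity HAR feature matrix.
    X, y = make_classification(n_samples=1000, n_features=561, n_informative=50,
                               n_classes=6, n_clusters_per_class=1, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    candidates = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "kNN": KNeighborsClassifier(n_neighbors=5),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }

    for name, clf in candidates.items():
        # Keep only the 100 most informative features, scale them, then classify.
        pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=100), clf)
        pipe.fit(X_train, y_train)
        print(name, accuracy_score(y_test, pipe.predict(X_test)))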

Objective:

Human activity recognition has been studied extensively over the past years, and researchers have proposed different solutions for the problem. The existing approaches typically use vision sensors, inertial sensors, or a mixture of both, with machine learning or threshold-based algorithms applied on top. Machine learning has led to more accurate, reliable results, whereas threshold-based algorithms have been found to be faster and simpler. Cameras have been used to capture body posture, and multiple accelerometers and gyroscopes attached to different body positions have been common solutions; combinations of vision and inertial sensors have also been used for the purpose. Data processing is the next essential part of using these ML algorithms. The results vary greatly with the quality of the input features fed to the algorithm. Previous works have focused on synthesizing the most useful features from the time-series data set, and analyzing the signals in both the time and frequency domains has been found to be a commonly used approach. These machine learning problems, however, have often been found to be extremely time-consuming and labour-intensive.
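
As a small, hedged illustration of the time- and frequency-domain feature synthesis mentioned above, the numpy sketch below turns one windowed accelerometer signal into a handful of features; the window length follows the UCI HAR convention (2.56 s at 50 Hz, i.e. 128 samples), while the specific feature choices are assumptions for illustration.

    import numpy as np

    def window_features(window):
        # Simple time- and frequency-domain features for one signal window.
        spectrum = np.abs(np.fft.rfft(window))   # frequency-domain magnitudes
        return np.array([
            window.mean(),                       # time-domain mean
            window.std(),                        # time-domain standard deviation
            np.abs(np.diff(window)).mean(),      # mean absolute first difference
            float(spectrum.argmax()),            # dominant frequency bin
            (spectrum ** 2).sum(),               # spectral energy
        ])

    # Synthetic 2.56 s accelerometer window sampled at 50 Hz (128 samples).
    t = np.linspace(0.0, 2.56, 128, endpoint=False)
    fake_accel = np.sin(2 * np.pi * 2.0 * t) + 0.1 * np.random.randn(128)
    print(window_features(fake_accel))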

Algorithms used:

Random Forest Algorithm

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It
can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the
performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, produces the final output.

A greater number of trees in the forest leads to higher accuracy and helps prevent the problem of overfitting.

The diagram below explains the working of the Random Forest algorithm:

Assumptions for Random Forest

Since the random forest combines multiple trees to predict the class of the dataset, it is possible that some
decision trees may predict the correct output, while others may not. But together, all the trees predict the
correct output. Therefore, below are two assumptions for a better Random forest classifier:

There should be some actual values in the feature variable of the dataset so that the classifier can predict
accurate results rather than a guessed result.

The predictions from each tree must have very low correlations.

Why use Random Forest?

Below are some points that explain why we should use the Random Forest algorithm:

It takes less training time as compared to other algorithms.

It predicts output with high accuracy, and it runs efficiently even on large datasets.

It can also maintain accuracy when a large proportion of data is missing.

How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions with each tree created in the first phase.

The Working process can be explained in the below steps and diagram:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points (Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Step 1 & 2.

Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the
category that wins the majority votes.

The working of the algorithm can be better understood by the below example:

Example: Suppose there is a dataset that contains multiple fruit images, and this dataset is given to the Random Forest classifier. The dataset is divided into subsets and given to each decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, the Random Forest classifier predicts the final decision based on the majority of those results.
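
A brief, hedged scikit-learn sketch of the voting behaviour just described, with a synthetic dataset standing in for the fruit images; the number of trees and all other values are illustrative.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                               n_classes=3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # N = 100 trees; each tree is trained on a bootstrap subset of the training data,
    # and the forest reports the majority vote of the individual trees.
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)

    print("forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))
    print("vote of one individual tree:", forest.estimators_[0].predict(X_test[:1]))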

Applications of Random Forest

There are mainly four sectors where Random Forest is mostly used:

Banking: The banking sector mostly uses this algorithm to identify loan risk.

Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.

Land Use: We can identify the areas of similar land use by this algorithm.

Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest

Random Forest is capable of performing both Classification and Regression tasks.

It is capable of handling large datasets with high dimensionality.

It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest

Although Random Forest can be used for both classification and regression tasks, it is less suitable for regression tasks.

Artificial Neural Network Tutorial

Artificial Neural Network Tutorial provides basic and advanced concepts of ANNs. Our Artificial Neural Network tutorial is developed for beginners as well as professionals.

The term "artificial neural network" refers to a biologically inspired sub-field of artificial intelligence modelled after the brain. An artificial neural network is usually a computational network, based on biological neural networks, that mirrors the structure of the human brain. Just as a human brain has neurons interconnected with each other, artificial neural networks also have neurons that are linked to each other in the various layers of the network. These neurons are known as nodes.

Artificial neural network tutorial covers all the aspects related to the artificial neural network.
In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-organizing
map, Building blocks, unsupervised learning, Genetic algorithm, etc.

What is Artificial Neural Network?

The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are known as
nodes.


The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.

Relationship between the biological neural network and the artificial neural network:

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

An artificial neural network, in the field of artificial intelligence, attempts to mimic the network of neurons that makes up the human brain so that computers have an option to understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.

There are around 1000 billion neurons in the human brain. Each neuron has between 1,000 and 100,000 association points. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from our memory in parallel when necessary. We can say that the human brain is made up of incredibly amazing parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic gate that takes an input and gives an output, such as an "OR" gate, which takes two inputs: if one or both of the inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends only on the input. Our brain does not perform the same task; the output-to-input relationship keeps changing because the neurons in our brain are "learning."

The architecture of an artificial neural network:

To understand the architecture of an artificial neural network, we have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.

The artificial neural network takes the inputs, computes their weighted sum and includes a bias. This computation is represented in the form of a transfer function.

The weighted total is then passed as an input to an activation function to produce the output. Activation functions decide whether a node should fire or not; only the nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the sort of task we are performing.
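
A hedged numpy sketch of the weighted-sum-plus-bias computation described above; the weights, bias and threshold are arbitrary illustrative values.

    import numpy as np

    def neuron(inputs, weights, bias):
        # Transfer function: weighted sum of the inputs plus a bias term.
        total = np.dot(inputs, weights) + bias
        # Activation function: the node fires only when the total is above zero.
        return 1 if total > 0 else 0

    x = np.array([0.5, -1.2, 3.0])   # example inputs
    w = np.array([0.4, 0.1, 0.7])    # example weights
    print(neuron(x, w, bias=-1.0))   # prints 1: this node fires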

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks can perform more than one task simultaneously.

Storing data on the entire network:

Data used by an ANN is stored on the whole network rather than in a database, so the disappearance of a couple of pieces of data in one place does not prevent the network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even with inadequate data; the loss of performance here depends on the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to select representative examples and to train the network toward the desired output by demonstrating these examples to it. The success of the network is directly proportional to the chosen instances; if an event cannot be shown to the network in all its aspects, the network can produce false output.

Having fault tolerance:

The corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks.
The appropriate network structure is accomplished through experience, trial, and error.

Unrecognized behavior of the network:

This is the most significant issue with ANNs: when an ANN produces a solution, it does not provide insight into why and how, which decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, in accordance with their structure; the realization of the network is therefore dependent on suitable equipment.

Difficulty of showing the issue to the network:

ANNs can work with numerical data. Problems must be converted into numerical values
before being introduced to ANN. The presentation mechanism to be resolved here will
directly impact the performance of the network. It relies on the user's abilities.

The duration of the network is unknown:

Training reduces the network's error to a specific value, but reaching this value does not guarantee optimum results.

Artificial neural networks, which entered the scientific world in the mid-20th century, are developing exponentially. Here we have reviewed the advantages of artificial neural networks and the issues encountered in the course of their use. It should not be overlooked that the drawbacks of ANNs, a flourishing branch of science, are being eliminated one by one while their advantages increase day by day, which means that artificial neural networks will progressively become an irreplaceable part of our lives.

How do artificial neural networks work?

An artificial neural network can best be represented as a weighted directed graph, where the artificial neurons form the nodes and the connections between neuron outputs and neuron inputs can be viewed as directed edges with weights. The artificial neural network receives the input signal from an external source in the form of a pattern or image represented as a vector. These inputs are then mathematically denoted by the notation x(n) for each of the n inputs.

Afterwards, each input is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is equal to zero, a bias is added to make the output non-zero, or to otherwise scale up the system's response; the bias can be viewed as an extra input of 1 with its own weight. The total of the weighted inputs can lie anywhere from 0 to positive infinity, so to keep the response within the limits of the desired value, a certain maximum value is benchmarked and the total of the weighted inputs is passed through the activation function.

The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, primarily either linear or non-linear sets of functions. Some of the commonly used activation functions are the binary, linear and tan-hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:
Binary:

In the binary activation function, the output is either a 1 or a 0. To accomplish this, a threshold value is set up: if the net weighted input of the neuron is more than 1, the final output of the activation function is returned as 1; otherwise the output is returned as 0.

Sigmoidal Hyperbolic:

The Sigmoidal Hyperbola function is generally seen as an "S" shaped curve. Here the tan
hyperbolic function is used to approximate output from the actual net input. The function is
defined as:

F(x) = 1 / (1 + exp(-βx))

where β is considered the steepness parameter.
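
For concreteness, a small, hedged sketch of the binary and sigmoidal activations just described; the threshold and steepness values are arbitrary assumptions.

    import numpy as np

    def binary_activation(net_input, threshold=1.0):
        # Returns 1 when the net weighted input exceeds the threshold, else 0.
        return 1 if net_input > threshold else 0

    def sigmoid(x, steepness=1.0):
        # F(x) = 1 / (1 + exp(-steepness * x)), the S-shaped logistic curve.
        return 1.0 / (1.0 + np.exp(-steepness * x))

    for value in (-2.0, 0.0, 2.0):
        print(value, binary_activation(value),
              round(sigmoid(value), 3), round(float(np.tanh(value)), 3))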

Types of Artificial Neural Network:

There are various types of Artificial Neural Networks (ANNs) which, depending on the human brain's neuron and network functions, perform tasks in a similar manner. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their expected tasks, for example segmentation or classification.

Feedback ANN:

In this type of ANN, the output is fed back into the network to achieve the best-evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.

Feed-Forward ANN:

A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. By assessing its output with respect to its input, the strength of the network can be judged from the group behaviour of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.
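
A hedged sketch of a small feed-forward network using scikit-learn's MLPClassifier; the synthetic data and the single hidden layer of 32 neurons are assumptions for illustration only.

    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=600, n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # Input layer -> one hidden layer of 32 neurons -> output layer.
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1)
    net.fit(X_train, y_train)
    print("feed-forward accuracy:", net.score(X_test, y_test))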

WHAT IS SOFTWARE?

Python is an easy to learn, powerful programming language. It has efficient high-level data
structures and a simple but effective approach to object-oriented programming. Python’s
elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal
language for scripting and rapid application development in many areas on most platforms.

The Python interpreter and the extensive standard library are freely available in source or
binary form for all major platforms from the Python Web site, https://www.python.org/, and
may be freely distributed. The same site also contains distributions of and pointers to many
free third party Python modules, programs and tools, and additional documentation.

The Python interpreter is easily extended with new functions and data types implemented in
C or C++ (or other languages callable from C). Python is also suitable as an extension
language for customizable applications.

This tutorial introduces the reader informally to the basic concepts and features of the Python
language and system. It helps to have a Python interpreter handy for hands-on experience, but
all examples are self-contained, so the tutorial can be read off-line as well.

For a description of standard objects and modules, see The Python Standard Library. The
Python Language Reference gives a more formal definition of the language. To write
extensions in C or C++, read Extending and Embedding the Python Interpreter and Python/C
API Reference Manual. There are also several books covering Python in depth.

This tutorial does not attempt to be comprehensive and cover every single feature, or even
every commonly used feature. Instead, it introduces many of Python’s most noteworthy
features, and will give you a good idea of the language’s flavor and style. After reading it,
you will be able to read and write Python modules and programs, and you will be ready to
learn more about the various Python library modules described in The Python Standard
Library.

The Python Standard Library

While The Python Language Reference describes the exact syntax and semantics of the
Python language, this library reference manual describes the standard library that is
distributed with Python. It also describes some of the optional components that are commonly
included in Python distributions.

Python’s standard library is very extensive, offering a wide range of facilities as indicated by
the long table of contents listed below. The library contains built-in modules (written in C)
that provide access to system functionality such as file I/O that would otherwise be
inaccessible to Python programmers, as well as modules written in Python that provide
standardized solutions for many problems that occur in everyday programming. Some of
these modules are explicitly designed to encourage and enhance the portability of Python
programs by abstracting away platform-specifics into platform-neutral APIs.

The Python installers for the Windows platform usually include the entire standard library and often also include many additional components. For Unix-like operating systems, Python is normally provided as a collection of packages, so it may be necessary to use the packaging tools provided with the operating system to obtain some or all of the optional components.
Dealing with Bugs

Python is a mature programming language which has established a reputation for stability. In
order to maintain this reputation, the developers would like to know of any deficiencies you
find in Python.

It can sometimes be faster to fix bugs yourself and contribute patches to Python, as this streamlines the process and involves fewer people. Learn how to contribute.

Documentation bugs

If you find a bug in this documentation or would like to propose an improvement, please
submit a bug report on the tracker. If you have a suggestion how to fix it, include that as well.

If you’re short on time, you can also email documentation bug reports to docs@python.org
(behavioral bugs can be sent to python-list@python.org). ‘docs@’ is a mailing list run by
volunteers; your request will be noticed, though it may take a while to be processed.

See also

Documentation bugs on the Python issue tracker

Using the Python issue tracker

Bug reports for Python itself should be submitted via the Python Bug Tracker
(https://bugs.python.org/). The bug tracker offers a Web form which allows pertinent
information to be entered and submitted to the developers.

The first step in filing a report is to determine whether the problem has already been reported.
The advantage in doing so, aside from saving the developers time, is that you learn what has
been done to fix it; it may be that the problem has already been fixed for the next release, or
additional information is needed (in which case you are welcome to provide it if you can!).
To do this, search the bug database using the search box on the top of the page.

If the problem you’re reporting is not already in the bug tracker, go back to the Python Bug
Tracker and log in. If you don’t already have a tracker account, select the “Register” link or,
if you use OpenID, one of the OpenID provider logos in the sidebar. It is not possible to
submit a bug report anonymously.

Being now logged in, you can submit a bug. Select the “Create New” link in the sidebar to
open the bug reporting form.

The submission form has a number of fields. For the “Title” field, enter a very short
description of the problem; less than ten words is good. In the “Type” field, select the type of
your problem; also select the “Component” and “Versions” to which the bug relates.

In the “Comment” field, describe the problem in detail, including what you expected to
happen and what did happen. Be sure to include whether any extension modules were
involved, and what hardware and software platform you were using (including version
information as appropriate).

Each bug report will be assigned to a developer who will determine what needs to be done to
correct the problem. You will receive an update each time action is taken on the bug.

Introduction to Data Mining

Data mining integrates approaches and techniques from various disciplines such as machine learning, statistics, artificial intelligence, neural networks, database management, data warehousing, data visualization, spatial data analysis, probability and graph theory. In short, data mining is a multi-disciplinary field.

Statistics
Statistics includes a number of methods to analyze numerical data in large quantities. Different statistical tools used in data mining are regression analysis, cluster analysis, correlation analysis and Bayesian networks. Statistical models are usually built from a training data set. Correlation analysis identifies the correlation of variables with each other. A Bayesian network is a directed graph that represents causal relationships among data, found using the Bayesian probability theorem; its nodes represent variables and its edges represent the relationships between the nodes.
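
A tiny, hedged illustration of the correlation analysis mentioned above, using numpy on invented numbers.

    import numpy as np

    hours_active = np.array([1.0, 2.5, 3.0, 4.5, 6.0])
    steps_walked = np.array([1200, 3100, 3500, 5200, 7400])

    # Pearson correlation coefficient between the two variables (close to +1 here).
    print(np.corrcoef(hours_active, steps_walked)[0, 1])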

Machine Learning

Machine learning is the collection of methods, principles and algorithms that enables learning and
prediction on the basis of past data. Machine learning is used to build new models and to search for a
best model matching the test data. Machine learning methods normally use heuristics while searching
for the model. Data mining uses a number of machine learning methods including inductive concept
learning, conceptual clustering and decision tree induction. A decision tree is a classification tree that
decides the class of an object by following the path from the root to a leaf node. Given below is a
simple decision tree that is used for weather forecasting.
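
Because the original figure of the weather decision tree is not reproduced here, the following hedged scikit-learn sketch builds a comparable toy tree; the weather records are invented purely for illustration.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy weather records: [outlook (0=sunny, 1=overcast, 2=rainy), humidity (%), windy (0/1)]
    X = [[0, 85, 0], [0, 90, 1], [1, 78, 0], [2, 96, 0], [2, 80, 1], [1, 70, 1], [0, 65, 0]]
    y = ["no", "no", "yes", "yes", "no", "yes", "yes"]   # whether to go outside

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["outlook", "humidity", "windy"]))
    print(tree.predict([[1, 75, 0]]))   # follow the path from the root to a leaf for a new day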

Database Oriented Techniques

Advancements in database and data warehouse implementation helps data mining in a number of
ways. Database oriented techniques are used mainly to develop characteristics of the available data.
Iterative database scanning for frequent item sets, attribute focusing, and attribute oriented induction
are some of the database oriented techniques widely used in data mining. The iterative database
scanning searches for frequent item sets in a database. Attribute oriented induction generalizes low
level data into high level concepts using conceptual hierarchies.

Neural Networks

A neural network is a set of connected nodes called neurons. A neuron is a computing device that computes some function of its inputs, and the inputs can even be the outputs of other neurons. A neural network can be trained to find the relationship between input attributes and an output attribute by adjusting the connections and the parameters of the nodes.

Data Visualization

The information extracted from large volumes of data should be presented well to the end user, and data visualization techniques make this possible. Data is transformed into different visual objects such as dots, lines and shapes, and displayed in a two- or three-dimensional space. Data visualization is an effective way to identify trends, patterns, correlations and outliers in large amounts of data.
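
A minimal, hedged matplotlib sketch of the kind of two-dimensional visualization described above, with random points standing in for real mined data.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 0.8 * x + rng.normal(scale=0.5, size=200)   # correlated cloud with a visible trend

    plt.scatter(x, y, s=10)
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")
    plt.title("A scatter plot reveals trends and outliers")
    plt.show()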

Summary

Data mining combines different techniques from various disciplines such as machine learning, statistics, database management and data visualization. These methods can be combined to deal with complex problems or to obtain alternative solutions. Normally a data mining system employs one or more techniques to handle different kinds of data, different data mining tasks, different application areas and different data requirements.

Patterns in Data Mining

1. Association
The items or objects in relational databases, transactional databases or any other information
repositories are considered, while finding associations or correlations.

2. Classification
 The goal of classification is to construct a model with the help of historical data that can accurately
predict the value.
 It maps the data into the predefined groups or classes and searches for the new patterns.
For example:
The weather on a particular day can be categorized as sunny, rainy or cloudy.

3. Regression
 Regression creates predictive models. Regression analysis is used to make predictions based on
existing data by applying formulas.
 Regression is very useful for finding (or predicting) the information on the basis of previously
known information.
4. Cluster analysis
 It is the process of partitioning a set of data into meaningful subclasses, called clusters.
 It is used to place data elements into related groups without advance knowledge of the group definitions (see the clustering sketch after this list).

5. Forecasting
Forecasting is concerned with the discovery of knowledge or information patterns in data that can
lead to reasonable predictions about the future.
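
As referenced under the cluster-analysis item above, here is a short, hedged scikit-learn sketch of partitioning data into clusters without any predefined group labels; the number of clusters is an assumption.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Synthetic points with three natural groupings; no labels are given to the algorithm.
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.cluster_centers_)   # the meaningful subclasses discovered in the data
    print(kmeans.labels_[:10])       # cluster assignment of the first ten points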

Technologies used in data mining

Several techniques are used in the development of data mining methods; some of them are mentioned below:

1. Statistics:

 It uses mathematical analysis to express representations of, and to model and summarize, empirical data or real-world observations.
 Statistical analysis involves a collection of methods, applicable to large amounts of data, used to draw conclusions and report trends.
2. Machine learning

 Arthur Samuel defined machine learning as a field of study that gives computers the ability to learn without being explicitly programmed.
 When new data is entered into the computer, machine learning algorithms allow the learned model to grow or change accordingly.
 In machine learning, an algorithm is constructed to predict the data from the available
database (Predictive analysis).
 It is related to computational statistics.
The four types of machine learning are:

1. Supervised learning
 It is based on the classification.
 It is also called as inductive learning. In this method, the desired outputs are included in the training
dataset.
2. Unsupervised learning
Unsupervised learning is based on clustering. Clusters are formed on the basis of similarity measures
and desired outputs are not included in the training dataset.

3. Semi-supervised learning
Semi-supervised learning includes some desired outputs in the training dataset to generate the appropriate functions. This method generally avoids the need for a large number of labeled examples (i.e. desired outputs).

4. Active learning
 Active learning is a powerful approach for analyzing data efficiently.
 The algorithm is designed in such a way that the desired output is decided by the algorithm itself (the user plays an important role in this type).
3. Information retrieval

Information retrieval deals with uncertain representations of the semantics of objects (text, images).
For example: finding relevant information in a large document.

TEXT CLASSIFICATION

Several prediction approaches are employed in the software engineering discipline, such as correction cost prediction, test effort prediction, quality prediction, security prediction, reusability prediction, fault prediction and energy prediction. However, these approaches have only reached an initial phase, and further research is required to attain the desired level of robustness. Software bug prediction is by far the most prevalent research field among these approaches, and many research centres have initiated new projects in this field. Software metrics and fault data are used to predict which modules are buggy.

Bugs can be introduced at any level of software development; they may come from design, coding or the external environment. Depending on the nature of the faults, some faults in software can cause anything from a straightforward miscalculation to a full system collapse. According to one survey, the detection and removal of software faults covers around 50% of the overall project budget; therefore, finding and fixing faults as early as possible can save considerable software development cost. Bug prediction helps identify fault proneness in software modules during the testing phase using some underlying properties of the code and consequently helps allocate testing resources efficiently and thoroughly.

The existence of software bugs mainly affects software reliability, quality and maintenance cost. Achieving bug-free software is an additional challenge, because even when the software is developed carefully, hidden bugs often remain; developing a bug prediction model that can predict buggy modules in the primary phase is therefore a real challenge in software engineering.

Software bug prediction is also a necessary activity in software development, because predicting buggy modules before software deployment improves user satisfaction and software performance. Moreover, early prediction of software bugs improves software adaptation to different environments and increases resource utilization.
Various techniques have been proposed to tackle the Software Bug Prediction (SBP) problem; the best-known techniques are Machine Learning (ML) techniques. Machine learning techniques are used extensively in SBP to predict buggy modules based on previous bug data, essential data metrics and different software computing techniques.

Here, four supervised ML classifiers are used to evaluate ML capabilities in SBP: the Naïve Bayes (NB) classifier, Support Vector Machine (SVM), Decision Tree (DT) classifier and Artificial Neural Network (ANN) classifier. The ML classifiers are applied to the dataset, and in addition we compare the NB, DT and ANN classifiers.

The comparison is based on different evaluation measures such as accuracy, precision, recall, F-measure and the ROC curves of the classifiers. Abundant research has been carried out in the past to construct and evaluate fault prediction models that predict the faulty modules in software systems. Many statistical and machine learning techniques have been proposed and studied to predict the faults in software, including Logistic Regression, Discriminant Analysis, Classification Trees, Case-based Reasoning, Bagging, Boosting and SVM.
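
As a hedged sketch of the kind of comparison described above (NB, SVM, decision tree and an ANN scored on accuracy, precision, recall, F-measure and ROC), with synthetic data standing in for a real fault dataset; all settings are illustrative assumptions.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_validate
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier

    # Synthetic, slightly imbalanced binary data standing in for buggy / non-buggy modules.
    X, y = make_classification(n_samples=800, n_features=20, weights=[0.7, 0.3], random_state=3)
    scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

    models = {
        "Naive Bayes": GaussianNB(),
        "SVM": SVC(random_state=3),
        "Decision Tree": DecisionTreeClassifier(random_state=3),
        "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=3),
    }

    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=5, scoring=scoring)
        print(name, {m: round(scores["test_" + m].mean(), 3) for m in scoring})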

Most earlier software bug prediction studies have predicted the bug proneness of software sub-modules in binary terms, buggy or non-buggy. There are quite a few disputes with this binary classification: even when the performance of the predicted models is reported as excellent, the findings are hard to place in the right usability context, and the number of bugs per module is not identified. Subsequent assessment measures confirmed the predictive accuracy and consistency of the constructed bug prediction models. Software bug prediction aims to anticipate bug-prone software modules by using some hidden properties of the software system; it is intended to help streamline software quality assurance efforts and to advance fault identification and removal efforts and their cost. Commonly, software fault prediction is performed by training prediction models on project attributes labelled with fault data for a known project and then using the trained model to predict faults for an unknown project. The proposed framework first identifies the fault dataset characteristics of the given software, and in view of these identified attributes a recommendation regarding the suitability of a fault prediction technique is given. Thus, if there is no significant change in the fault attributes for a given release, there will be no change in the recommendation regarding fault prediction techniques; but if there is a notable change in the bug dataset characteristics, the recommendation regarding bug prediction techniques is modified.

As the world becomes increasingly reliant on technology with each passing day, software has naturally become an important organ of development. Since software is required everywhere today, its development is a highly deliberate and precise process involving different steps. Called the software development life cycle, these steps include planning, analysis, design, development and implementation, testing and maintenance, and together they produce the right software for customers. It is obvious that technology is accelerating at a rapid pace and people are becoming ever more dependent on it for every purpose.

The first chapter provides a basic introduction to bug prediction techniques and the way bugs in software modules affect the software industry. The second chapter reviews machine learning techniques and the various technologies with which prediction is done. The third chapter states the problem statement and objectives. The fourth chapter deals with the tools and methodology employed in the work.

1.2 WHAT IS SOFTWARE DEVELOPMENT LIFE CYCLE(SDLC)

The software development life cycle is a framework that defines the tasks performed at each step of the software development process. SDLC is a structure followed by a development team within a software organization. It consists of a detailed plan describing how to develop, maintain and replace specific software. The life cycle defines a methodology for improving the quality of software and the overall development process. The software development life cycle is also called the software development process. SDLC consists of the following activities:

1. Planning: Requirement gathering, or requirement analysis, is one of the most important parts of software development and is usually done by the most skilled and experienced software engineers in the organization. After the requirements are gathered from the client, a scope document is created in which the scope of the project is determined and documented.

2. Implementation: The software engineers start writing the code in keeping with the client's requirements.

3. Testing: This is the process of finding defects or bugs in the created software.

4. Documentation: Every step in the project is documented for future reference and for the improvement of the software during the development process. The design documentation may include writing the application programming interface (API).

5. Deployment: The software is deployed after it has been approved for release.

6. Maintenance: Software maintenance is done for future reference. Software improvement and new requirements (change requests) can take longer than the time needed for the initial development of the software.

SDLC is nothing but the Software Development Life Cycle, a standard used by the software industry to develop good software.

SDLC (Spiral Model):

Fig 1: Spiral Model
Stages of SDLC:

 Requirement Gathering and Analysis

 Designing

 Coding

 Testing

 Deployment

Requirements Definition Stage and Analysis:


The requirements gathering process takes as its input the goals identified in the high-level
requirements section of the project plan. Each goal will be refined into a set of one or more
requirements. These requirements define the major functions of the intended application, define
operational data areas and reference data areas, and define the initial data entities. Major functions
include critical processes to be managed, as well as mission critical inputs, outputs and reports. A
user class hierarchy is developed and associated with these major functions, data areas, and data
entities. Each of these definitions is termed a Requirement. Requirements are identified by unique
requirement identifiers and, at minimum, contain a requirement title and textual description.

Fig 2: Requirement Stage
These requirements are fully described in the primary deliverables for this stage: the
Requirements Document and the Requirements Traceability Matrix (RTM). The requirements document contains complete descriptions of each requirement, including diagrams and references to
external documents as necessary. Note that detailed listings of database tables and fields are not
included in the requirements document. The title of each requirement is also placed into the first
version of the RTM, along with the title of each goal from the project plan. The purpose of the RTM
is to show that the product components developed during each stage of the software development
lifecycle are formally connected to the components developed in prior stages.
In the requirements stage, the RTM consists of a list of high-level requirements, or goals, by
title, with a listing of associated requirements for each goal, listed by requirement title. In this
hierarchical listing, the RTM shows that each requirement developed during this stage is formally
linked to a specific product goal. In this format, each requirement can be traced to a specific product
goal, hence the term requirements traceability. The outputs of the requirements definition stage
include the requirements document, the RTM, and an updated project plan.
Design Stage:
The design stage takes as its initial input the requirements identified in the approved
requirements document. For each requirement, a set of one or more design elements will be produced
as a result of interviews, workshops, and/or prototype efforts. Design elements describe the desired
software features in detail, and generally include functional hierarchy diagrams, screen layout
diagrams, tables of business rules, business process diagrams, pseudo code, and a complete entity-
relationship diagram with a full data dictionary. These design elements are intended to describe the

software in sufficient detail that skilled programmers may develop the software with minimal
additional input.

Fig 3: Design Stage


When the design document is finalized and accepted, the RTM is updated to show that each
design element is formally associated with a specific requirement. The outputs of the design stage are
the design document, an updated RTM, and an updated project plan.
Development Stage:
The development stage takes as its primary input the design elements described in the
approved design document. For each design element, a set of one or more software artifacts will be
produced. Software artifacts include but are not limited to menus, dialogs, data management forms,
data reporting formats, and specialized procedures and functions. Appropriate test cases will be
developed for each set of functionally related software artifacts, and an online help system will be
developed to guide users in their interactions with the software.

Fig 4: Development Stage
The RTM will be updated to show that each developed artefact is linked to a specific design element,
and that each developed artefact has one or more corresponding test case items. At this point, the
RTM is in its final configuration. The outputs of the development stage include a fully functional set
of software that satisfies the requirements and design elements previously documented, an online
help system that describes the operation of the software, an implementation map that identifies the
primary code entry points for all major system functions, a test plan that describes the test cases to be
used to validate the correctness and completeness of the software, an updated RTM, and an updated
project plan.
Integration & Test Stage:
During the integration and test stage, the software artefacts, online help, and test data are
migrated from the development environment to a separate test environment. At this point, all test
cases are run to verify the correctness and completeness of the software. Successful execution of the
test suite confirms a robust and complete migration capability.
During this stage, reference data is finalized for production use and production users are
identified and linked to their appropriate roles. The final reference data (or links to reference data
source files) and production user list are compiled into the Production Initiation Plan.

Fig 5: Integration and Test stage
The outputs of the integration and test stage include an integrated set of software, an online
help system, an implementation map, a production initiation plan that describes reference data and
production users, an acceptance plan which contains the final suite of test cases, and an updated
project plan.
Installation & Acceptance Stage
During the installation and acceptance stage, the software artifacts, online help, and initial
production data are loaded onto the production server. At this point, all test cases are run to verify the
correctness and completeness of the software. Successful execution of the test suite is a prerequisite
to acceptance of the software by the customer.
After customer personnel have verified that the initial production data load is correct and the
test suite has been executed with satisfactory results, the customer formally accepts the delivery of
the software.

Fig 6: Installation and Acceptance Stage
The primary outputs of the installation and acceptance stage include a production application,
a completed acceptance test suite, and a memorandum of customer acceptance of the software.
Finally, the PDR enters the last of the actual labour data into the project schedule and locks the
project as a permanent project record. At this point the PDR "locks" the project by archiving all
software items, the implementation map, the source code, and the documentation for future reference.
2.4 SYSTEM ARCHITECTURE
Architecture Flow:
The architecture diagram below mainly represents the flow of requests from the users to the database through servers. In this scenario the overall system is designed in three separate tiers, using three layers called the presentation layer, the business layer and the data access layer. This project was developed using a 3-tier architecture.
3-Tier Architecture:
The three-tier software architecture (three layer architecture) emerged in the 1990s to
overcome the limitations of the two-tier architecture. The third tier (middle tier server) is between the
user interface (client) and the data management (server) components. This middle tier provides
process management where business logic and rules are executed and can accommodate hundreds of
users (as compared to only 100 users with the two tier architecture) by providing functions such as
queuing, application execution, and database staging.
The three tier architecture is used when an effective distributed client/server design is needed
that provides (when compared to the two-tier) increased performance, flexibility, maintainability, reusability, and scalability, while hiding the complexity of distributed processing from the user. These characteristics have made three-layer architectures a popular choice for Internet applications and net-centric information systems.
Advantages of Three-Tier:
 Separates functionality from presentation.

 Clear separation - better understanding.

 Changes limited to well-defined components.

CHAPTER-2
LITERATURE REVIEW

2.1 RELATED WORK

Anguita et al. (2012) demonstrated how human activities can be recognized by exploiting different sensors so as to provide adaptation to exogenous computing resources. When these sensors are attached to the subject's body, they permit continuous monitoring of various physiological signals. The authors presented a framework to recognize human physical activities using an inertial navigation system. As cell phones are constrained in terms of energy and processing power, a hardware-friendly approach is proposed for multiclass classification. This strategy adapts the standard Support Vector Machine (SVM) and exploits fixed-point arithmetic for computational cost reduction. A comparison with the conventional SVM shows a noteworthy improvement in computational cost while maintaining comparable accuracy, which can contribute to developing more sustainable systems for AmI. [1]

Shahroudy et al. (2016) examined recent depth-based approaches to human motion analysis that achieved exceptional performance and demonstrated the effectiveness of 3D representation for classifying action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have numerous limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. This paper introduced a large-scale dataset for human activity recognition with 56,000 video samples and 4 million frames, collected from 40 distinct subjects. It contains sixty different activity classes, including daily, mutual and health-related activities. Moreover, a new recurrent neural network structure is proposed to model the long-term temporal correlation of the features for all the body parts and to use them for proper activity classification. Finally, the benefits of applying deep learning methods over state-of-the-art hand-crafted features are demonstrated using cross-subject and cross-view evaluation criteria on the chosen dataset. [2]

Oyelade et al. (2010) considered the ability to monitor the progression of students' academic performance, a critical issue for the academic community of higher learning. A framework for analyzing students' results based on cluster analysis, which uses standard statistical algorithms to arrange their score data according to their level of performance, is described. In order to evaluate the academic performance of the students, cluster analysis and standard statistical algorithms are applied to a student dataset containing the scores of one particular semester. The number of clusters to be obtained is given as input for the chosen random samples; the arithmetic mean of each cluster is computed and the process is repeated until there is no change in the data points. The performance is evaluated using a deterministic model for the chosen semester with nine offered courses, and a fuzzy model is applied to predict academic performance. [3]

Vesanto et al. (1999) studied the self-organizing map (SOM), an efficient tool for the visualization of multidimensional numerical data. In this paper, a review and categorization of both old and new methods for the visualization of the SOM is presented. The purpose is to give an idea of what kind of information can be gained from different presentations and how the SOM can best be used in exploratory data visualization. The majority of the presented techniques can also be applied in the more general case of first creating a vector quantization and afterwards a vector projection. [4]

Viola et al. (2010) presented a concept for automatically focusing on features within a volumetric data set. The user selects a focus, i.e., an object of interest, from a set of pre-defined features. The proposed framework automatically determines the most expressive view on this feature. A characteristic viewpoint is estimated by a novel information-theoretic framework which is based on the mutual information measure. Viewpoints change smoothly by switching the focus from one feature to another. This mechanism is controlled by changes in the importance distribution among features in the volume, with the highest importance assigned to the feature in focus. Apart from viewpoint selection, the focusing mechanism also guides visual emphasis by assigning a visually more prominent representation. To permit a clear view on features that are often occluded by other parts of the volume, the focusing, for instance, incorporates cut-away views. [5]

Jiang et al. (2009) noted that since learning an optimal Bayesian network classifier is an NP-hard problem, learning improved naive Bayes classifiers has attracted much attention from researchers. In this paper, improved algorithms are investigated and a hidden naive Bayes (HNB) model is proposed. In HNB, a hidden parent is created for each attribute which combines the influences from all the other attributes. HNB is tested in terms of classification accuracy, using the 36 UCI data sets selected by Weka, and compared with naive Bayes (NB), selective Bayesian classifiers (SBC), naive Bayes tree (NBTree), tree-augmented naive Bayes (TAN), and averaged one-dependence estimators (AODE). The experimental results show that HNB significantly outperforms NB, SBC, NBTree, TAN, and AODE. In many data mining applications, accurate class probability estimation and ranking are also desirable. [6]

Vincenzi et al. (2011) presented a modeling framework that combines machine learning methods and Geographic Information Systems to support the management of an important aquaculture species, the Manila clam (Ruditapes philippinarum). They used the Venice lagoon (Italy), the main site in Europe for the production of R. philippinarum, to illustrate the potential of this modeling approach. To investigate the relationship between the yield of R. philippinarum and a set of environmental variables, a Random Forest (RF) algorithm was used. The RF model was tuned with a large data set (n = 1698) and validated with an independent data set (n = 841). Overall, the model gave good predictions of site-specific yields, and the analysis of the marginal effects of the predictors showed considerable agreement between the modeled responses and the available ecological knowledge for R. philippinarum. [7]

Jie Hu et al. (2013) proposed a method to recognize human actions and facial expressions from recorded videos. The input is bounding boxes detected from a sequence of images represented in three layers, and the chosen Weizmann dataset contains nine human actions performed by ten different people. The model built contains a spatial part and a temporal part to capture the typical characteristics. The Random Forest method is applied on the Weizmann, UCF and facial expression datasets to recognize human behavior, and improved performance is obtained on the Weizmann dataset. [8]

Aggarwal et al. (2011) discussed space-time volume approaches and sequential approaches to recognize activities from input images [9]. Most of these applications require automated recognition of high-level activities composed of multiple simple (or atomic) actions of persons. The article gives a detailed overview of various state-of-the-art research papers on human activity recognition, and an approach-based taxonomy is chosen that analyzes the advantages and limitations of each approach. Enea et al. (2016) discussed the usage of low-cost RGB-D sensors in surveillance and human-computer interaction and computed features from the skeleton joints. A histogram was developed from the skeleton features, which was represented as various key poses. The existing approaches concentrated on basic machine learning measures which do not give the best results when compared to the currently available solutions; they are prone to overfitting and underfitting, for example in the case of the decision trees in a Random Forest. [10]

Qingzhong et al. (2018) proposed a method to identify human activity using smartphone sensor data based on two categories: motion based and phone-movement based. After extracting features, machine learning classification models were implemented to analyse the human activity, and finally the performance was analysed using a Convolutional Neural Network [11]. Roobini et al. (2019) used a deep learning approach to identify human motion. They compared a Convolutional Neural Network with Long Short-Term Memory against a Recurrent Neural Network with Long Short-Term Memory, and showed that the Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM) provides better accuracy with a lower mean absolute percentage error. Hence they suggested that RNN-LSTM can be used to help reduce the loss of human lives by recognizing human activities in the real world [12]. Human activity recognition is an emerging research field in smart environments. Human activities can be identified by extracting features from the raw sensor data after preprocessing and segmentation steps. Feature selection plays an important role in activity recognition; hence appropriate features have to be chosen and dimensionality reduction has to be applied, if necessary, before passing the data to the classifier ([13], [14], [15]).

CHAPTER-3
OBJECTIVES

CHAPTER-4
METHODOLOGY

Machine learning is a branch of computer science that allows computers to learn without explicit programming. As the name suggests, it provides the computer with the ability to learn, which makes it more human-like, and it is currently in vogue in far more places than expected. Humans can learn from their past experiences, but computers traditionally cannot; machine learning is a branch of artificial intelligence that helps systems learn from their past experience and improve without the need for human intervention. In other words, machine learning is similar to gardening: the seeds are the algorithms, the nutrients are the data, you are the gardener, and the plants are the programs. Its main purpose is to create computer programs that learn from data. Machine learning is inextricably linked to data analysis and statistics, since the effectiveness of a learning algorithm is determined by the data used. Learning techniques, in general, are data-driven methods that combine core computer science concepts with notions from statistics, probability, and optimization. Machine learning is used in a variety of industries, including pharmaceutical, military, marketing, and security. The system uses machine learning algorithms to analyze datasets, including daily actual past data, and make predictions for future days.

Neural networks currently occupy a very important place in machine learning. Using neural networks makes it possible to capture abstract features from the original data and therefore fulfill our goal of assessing the epidemic situation in a novel manner. The simplest and most intuitive neural network is a fully connected neural network, whose most basic component is the neuron. Neurons take input from the neurons in the previous layer and use an activation function to perform a nonlinear transformation. A sophisticated nonlinear mapping emerges when all neurons are combined with the appropriate structure, weights, and biases. In theory, a neural network with enough neurons can approximate any complex function. On the other hand, better performance and shorter calculation time usually cannot be obtained at the same time, so it is important to design a network with a specific architecture that balances performance and time consumption.
A. Linear Regression
Regression analysis is a statistical study of the dependencies and relationships between a dependent variable and one or more independent variables. There are many regression techniques, but linear regression is the most commonly used. In regression analysis, linear models and linear regression are used most often, and in that sequence linear regression was also used here. Linear regression models the target using a number of independent variables whose values vary from day to day. The fitted linear model captures the relationship between the underlying parameters and the data points. To determine accuracy and make predictions, the train-and-test approach is used: the system is trained and then tested on held-out data before making estimates.
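As a hedged illustration (not the exact code used in this study), a minimal train-and-test sketch with scikit-learn's LinearRegression on synthetic placeholder data might look like the following:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic data standing in for the real independent/dependent variables
rng = np.random.RandomState(0)
X = rng.rand(200, 3)                                   # 200 samples, 3 independent variables
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.randn(200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)       # learn the linear relationship on the training split
print("R^2 on the held-out test split:", r2_score(y_test, model.predict(X_test)))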
B. The KNN algorithm
The KNN (k-nearest neighbor) algorithm is also widely used. KNN can be applied to both classification and regression problems, and in the field of regression analysis the KNN regression model is widely used. As a result, the KNN model was implemented, and the method performed well. KNN regression predicts the value of an object as the average of the numerical targets of its k nearest neighbors; the distance function is the same in KNN regression and KNN classification. In [14], a forecast for the next 90 days was produced with KNN, and the KNN-based prediction result is similar to that of linear regression. The data were handled using the train-and-test approach: the machine splits the data into training and test sets automatically and evaluates on the held-out data after learning from the training data. Because the Dhaka City data were compiled differently, linear regression and k-nearest neighbors gave the same result. The accuracy of the model is determined, which helps us to understand the operation of the whole model.
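As a small hedged sketch (synthetic data, not the series used in this work), KNN regression with scikit-learn could be set up as follows; the prediction for a point is the average target of its nearest neighbours:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

# Synthetic one-dimensional series standing in for the real data
rng = np.random.RandomState(1)
X = np.arange(200).reshape(-1, 1).astype(float)
y = 0.5 * X.ravel() + 5 * rng.randn(200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNeighborsRegressor(n_neighbors=5)   # prediction = average target of the 5 nearest neighbours
knn.fit(X_train, y_train)
print("Test R^2:", knn.score(X_test, y_test))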

C. Polynomial Regression
Linear regression can be considered a special case of polynomial regression. Linear regression works with known continuous data and interrelated variables (a target variable and independent variables). What if we know that the variables are connected, but the connection does not appear to be linear? We can apply polynomial regression to fit a polynomial equation to our dataset. Polynomial regression is a supervised machine learning algorithm that is learned using previous data and then validated using another dataset.

Figure 1. Polynomial Regression of degree 5

Figure 2. Linear Regression

M. Singh et al [13] note that because the loss function and error rate of a simple linear model are high, the accuracy predicted by the simple linear model is lower than the accuracy predicted by the polynomial model for non-linear data sets such as those shown in Figure 1 and Figure 2. As a result, polynomial regression is a linear model with minor modifications that helps to improve accuracy for non-linear and complex datasets.
We utilize the PolynomialFeatures transformer to expand the data into polynomial terms, and then use linear regression to fit the parameters in polynomial regression. The graphic illustration of this can be found in Figure 3. The polynomial features convert the equation into an nth-degree equation, so the degree must be selected carefully: if the degree of the polynomial is too low, the data will be underfit, and if it is too high, the data will be overfit.
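A minimal sketch of this PolynomialFeatures-plus-LinearRegression pattern, on synthetic non-linear data rather than the project data, could look like this:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic non-linear data standing in for the real dataset
rng = np.random.RandomState(2)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - 3 * X.ravel() + rng.randn(100)

# The degree must be chosen carefully: too low underfits, too high overfits
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print("Training R^2:", poly_model.score(X, y))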
V. K. Gupta et al [7] transformed the train and test data for polynomial regression, and the expected values from August 7, 2020, to August 28, 2020, were depicted visually. As a result, the polynomial regression algorithm has a 93 percent accuracy. Feature selection: to get the best results from our model, this stage involves feature extraction and selection. Having good features allows us to visualize the data's underlying structure. Feature engineering has a substantial impact on the model's performance: it could entail separating or aggregating features to create new ones, or it could entail gathering data from external sources.

E. Gambhir et al [1] note that dimensionality reduction makes it easier to evaluate and draw conclusions from a dataset. To draw better inferences from this dataset, useless parameters such as longitude and latitude were removed, and the dates were transformed into a date-and-time object.
A. Bansal et al [10] note that feature selection techniques are used in machine learning because they reduce training time, simplify the model so that users and researchers can understand it, mitigate the curse of dimensionality, and improve generalization. Another significant advantage of removing redundant or irrelevant features is that less redundant data translates to fewer decisions based on noise, resulting in less overfitting.
Information gain is used in mutual-information feature selection to calculate the surprise, or reduction in entropy, caused by transforming the dataset in some way. Typically, information gain is applied by evaluating the gain of each variable and then selecting the variable that maximizes information gain (and thereby minimizes entropy), splitting the dataset into groups for effective classification. For feature selection, each variable's gain is evaluated in the context of the target variable; the mutual information calculation is symmetric between the two random variables.
Recursive feature elimination recursively removes features and builds the model on the attributes that remain; it is an example of a wrapper feature selection technique. Model accuracy is used in this case to determine which attributes from the list of attributes would contribute the most to predicting the target variable.
Correlation is defined as a measure of how two variables change in relation to one another in
correlation feature selection. It is not uncommon for some features, despite being designed to
measure different qualities, to be influenced by a common mechanism and to vary in tandem.
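The sketch below is a hedged illustration of these three ideas (mutual information, recursive feature elimination, and correlation) with scikit-learn on synthetic data; it is not the feature-selection code used in this work:

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

# Synthetic classification data standing in for the real feature matrix
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Information gain / mutual information of each feature with the target
mi = mutual_info_classif(X, y, random_state=0)
print("Mutual information per feature:", np.round(mi, 3))

# Recursive feature elimination keeping the 4 strongest features
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
print("Features kept by RFE:", np.where(rfe.support_)[0])

# Correlation-based view: features that vary in tandem show high pairwise correlation
corr = pd.DataFrame(X).corr().abs()
print("Highest pairwise feature correlation:", corr.where(~np.eye(10, dtype=bool)).max().max())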
D. Fbprophet
Fbprophet is an open-source algorithm developed by Facebook to estimate time-series data using an additive model that fits non-linear trends with yearly, weekly, and daily seasonality as well as holiday effects. It works best with time series that have significant seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and it usually handles outliers well. We validate our data against the rolling mean, just like any other time-series model.
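A minimal sketch of how Prophet is typically called is shown below; the package name (prophet, formerly fbprophet) and the synthetic dataframe are assumptions for illustration, not code from this project:

import pandas as pd
from prophet import Prophet   # in older releases the package is named fbprophet

# Synthetic daily case counts standing in for the real time series
df = pd.DataFrame({
    "ds": pd.date_range("2020-03-01", periods=120, freq="D"),
    "y": range(120),
})

m = Prophet()                                   # additive model with trend and seasonality
m.fit(df)
future = m.make_future_dataframe(periods=30)    # extend 30 days beyond the observed data
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())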
There are three target classes in our dataset, each with many discrete instances. The following are the
target classes:

(1) Confirmed cases: The number of instances that have been confirmed as of a specific date. It
can be increased or decreased based on the following day, time, and location, which is only
applicable to Indian states.
(2) Death cases: The total number of death cases at any given time. It can be increased or
decreased based on the following day, time, and location, which is only applicable to Indian states.
(3) Cured cases: The total number of cured cases at any given time. It can be increased or
decreased based on the following day, time, and location, which is only applicable to Indian states.
E. Performance Tuning
V. K. Gupta et al [7] list the popular prediction models utilized in the investigation in Figure 4; the packages employed by these models are open-source libraries written in the R programming language and licensed under the GNU GPL. All of the packages are used here, each with its own method of model building, and they are tuned for better results.

Italy, the USA, the UK, and France are currently in Stage 4 of the pandemic, while India is in Stage 3. This study attempts to establish a system for predicting the number of cases affected by COVID-19 using machine learning methods. The data used for the study include daily worldwide reports of recent COVID-19 infections. This is an alarming situation, as the number of confirmed cases is increasing day by day, and the number of people affected by the COVID-19 pandemic in different parts of the world is not precisely known.
A. Data Preprocessing
The purpose of this step is to convert raw data into a form suitable for machine learning. Systematic
and clean data allows the data scientist to obtain accurate results from the machine learning model
used. This process includes data formatting, cleaning, and modeling. At the modelling stage, the data scientist trains many models to determine which one of them gives the most accurate predictions.
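As a rough sketch (the column names here are illustrative placeholders, not taken from the actual dataset), a typical pandas preprocessing step could look like this:

import pandas as pd

# Hypothetical raw dataframe; column names are illustrative only
raw = pd.DataFrame({
    "Date": ["2020-08-01", "2020-08-02", None],
    "Confirmed": [10, None, 30],
})

clean = raw.dropna(subset=["Date"]).copy()          # drop rows with missing dates
clean["Date"] = pd.to_datetime(clean["Date"])       # convert to a date-and-time object
clean["Confirmed"] = clean["Confirmed"].fillna(0)   # simple imputation for missing counts
print(clean.dtypes)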
B. Model Training
The data scientist first collects the data and divides it into three subsets, after which model training can proceed. This process involves "feeding" the algorithm with training data. The algorithm processes the data and produces a model that is able to find the target value (attribute) in new data - the answer you want to obtain through predictive analysis. The purpose of model training is to improve the model. Two styles of model training are very common - supervised and unsupervised learning. The choice of style depends on whether you have to predict specific attributes or group data elements by similarity.
Supervised learning processes data with target signals or labels; these attributes are present in the historical data prior to training. With supervised learning, a data scientist can solve classification and regression problems.
Unsupervised learning: with this training style, the algorithm analyzes unlabeled data. The purpose of model training here is to find hidden connections between data objects and structural similarities or differences. Unsupervised learning aims to solve problems such as clustering, association rule learning, and dimensionality reduction. For example, it can be used in the data preprocessing phase to reduce the complexity of the data.
C. Model Testing
The purpose of this step is to develop the simplest model that is able to produce the target value quickly and efficiently. A data scientist can achieve this goal through model tuning, that is, adjusting model parameters to achieve the best performance of the algorithm. One of the most effective methods for model testing and tuning is cross-validation, described next.
D. Cross Validation
Cross-validation is a very common model evaluation method. It involves splitting the training dataset into ten equal parts (folds). The model is trained on nine folds and then tested on the tenth fold (the one that was left out). Training continues until every fold has been set aside once and used for testing. From the resulting performance measurements, the specialist calculates a cross-validated score for each set of parameters. A data scientist trains models with different sets of hyperparameters to define which model has the highest predictive accuracy; the cross-validated score indicates the average model performance across the ten hold-out folds. The data scientist then selects the hyperparameter values that achieve the best cross-validated score. There are various error metrics for machine learning tasks.
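A minimal sketch of ten-fold cross-validation with hyperparameter tuning in scikit-learn, using a built-in toy dataset rather than the project data, could look like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Ten-fold cross-validation over a small hyperparameter grid
cv = KFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=cv)
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))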

4.3 EVALUATION CRITERIA USED FOR CLASSIFICATION

4.3.1 Confusion Matrix

The confusion matrix is also called the error matrix. It is a table that is often used to describe the performance of a classification method on a set of test data for which the actual values are known. Each row of the matrix denotes the occurrences in an actual class, and each column of the matrix denotes the occurrences in a predicted class. The entries of the confusion matrix are interpreted as follows:

True Positive: case in which a fault is predicted as "Yes" and the fault actually exists.
True Negative: case in which a fault is predicted as "No" and the fault actually does not exist.
False Positive: case in which a fault is predicted as "Yes" but the fault actually does not exist.
False Negative: case in which a fault is predicted as "No" but the fault actually exists.
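As a small illustrative sketch (the labels are hypothetical, not the activity classes of this project), the confusion matrix can be computed with scikit-learn as follows:

from sklearn.metrics import confusion_matrix

# Illustrative ground-truth and predicted fault labels
y_true = ["Yes", "No", "Yes", "Yes", "No", "No"]
y_pred = ["Yes", "No", "No", "Yes", "Yes", "No"]

# Rows correspond to the actual class, columns to the predicted class
cm = confusion_matrix(y_true, y_pred, labels=["Yes", "No"])
print(cm)   # [[TP FN], [FP TN]] with the "Yes"/"No" ordering above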

4.3.2 Accuracy and Precision

In classification, accuracy and precision are two important evaluation parameters. Accuracy is defined as the sum of the true positive and true negative instances divided by the total number of instances, and precision is the fraction of true positive instances among all instances predicted as yes. The formulas for accuracy and precision are given below:
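Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)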

4.3.3 Recall and F-Square

Recall is defined as the fraction of true positive instances among the actual yes instances, whereas the F-measure (referred to here as F-Square) combines recall and precision as twice their product divided by their sum. The formulas for recall and the F-measure are given below:
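Recall = TP / (TP + FN)

F-measure = 2 × (Precision × Recall) / (Precision + Recall)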

4.3.4 Sensitivity, Specificity and ROC

Sensitivity is defined as the fraction of true positive instances among the actual yes instances (equivalent to recall), whereas specificity is the difference between one and the false positive rate. The ROC curve plots the true positive rate against the false positive rate at different classification thresholds.
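In terms of the confusion matrix entries:

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP) = 1 - False Positive Rate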

4.3.5 MCC

MCC (Matthews correlation coefficient) is a measure that takes into account both true and false positives and negatives. The MCC can be obtained as:
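MCC = (TP × TN - FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))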

FIG. 4.3.5 Classification Model

1. FEASIBILITY STUDY

A feasibility study is conducted once the problem is clearly understood. The feasibility study is a high-level capsule version of the entire system analysis and design process. The objective is to determine whether the proposed system is feasible or not; it helps us determine, at minimum expense, how to solve the problem and whether the problem is worth solving. The following are the three important tests that have been carried out for the feasibility study.
This study tells how this package is useful to the users, its advantages and disadvantages, and also whether this package is cost effective or not. There are three types of feasibility study:
 Economic Feasibility.

 Technical Feasibility.

 Operational Feasibility.

3.1 TECHNICAL FEASIBILITY

Evaluating the technical feasibility is the trickiest part of a feasibility study. This is because, at this point in time, not much detailed design of the system is available, making it difficult to assess issues like performance and costs (on account of the kind of technology to be deployed). A number of issues have to be considered while doing a technical analysis. Understand the different technologies involved in the proposed system: before commencing the project we have to be very clear about the technologies that are required for the development of the new system. Find out whether the organization currently possesses the required technologies: is the required technology available with the organization?
3.2 OPERATIONAL FEASIBILITY

The proposed project is beneficial only if it can be turned into an information system that will meet the organization's operating requirements. Simply stated, this test of feasibility asks if the system will work when it is developed and installed, and whether there are major barriers to implementation. Here are questions that will help test the operational feasibility of a project:
 Is there sufficient support for the project from management from users? If the current system
is well liked and used to the extent that persons will not be able to see reasons for change,
there may be resistance.

 Are the current business methods acceptable to the user? If they are not, users may welcome a change that will bring about a more operational and useful system.


 Have the user been involved in the planning and development of the project?

 Since the proposed system was to help reduce the hardships encountered in the existing manual system, the new system was considered to be operationally feasible.

3.3 ECONOMIC FEASIBILITY

Economic feasibility attempts to weigh the costs of developing and implementing a new system against the benefits that would accrue from having the new system in place. This feasibility study gives the top management the economic justification for the new system. A simple economic analysis which gives an actual comparison of costs and benefits is much more meaningful in this case. In addition, it proves to be a useful point of reference to compare actual costs as the project progresses. There could be various types of intangible benefits on account of automation. These could include increased customer satisfaction, improvement in product quality, better decision making, timeliness of information, expediting of activities, improved accuracy of operations, better documentation and record keeping, faster retrieval of information, and better employee morale.

System Design

UML Diagrams

UML (Unified Modeling Language) is a standard language for specifying, visualizing, constructing, and documenting the artifacts of software systems. UML was created by the Object Management Group (OMG), and the UML 1.0 specification draft was proposed to the OMG in January 1997. It was initially started to capture the behavior of complex software and non-software systems and has now become an OMG standard. This section gives an overview of UML.

OMG is continuously making efforts to create a truly industry standard.

 UML stands for Unified Modeling Language.

 UML is different from other common programming languages such as C++, Java, COBOL, etc.
 UML is a pictorial language used to make software blueprints.

 UML can be described as a general-purpose visual modeling language to visualize, specify, construct, and document a software system.
 Although UML is generally used to model software systems, it is not limited
within this boundary. It is also used to model non-software systems as well. For
example, the process flow in a manufacturing unit, etc.
UML is not a programming language but tools can be used to generate code in
various languages using UML diagrams. UML has a direct relation with object- oriented
analysis and design. After some standardization, UML has become an OMG standard.

Components of the UML

UML diagrams are the ultimate output of the entire discussion. All the elements,
relationships are used to make a complete UML diagram and the diagram represents a
system. The visual effect of the UML diagram is the most important part of the entire
process. All the other elements are used to make it complete.
UML includes the following nine diagrams, the details of which are described in
the subsequent chapters.
 Class diagram

 Object diagram

 Use case diagram

 Sequence diagram

 Collaboration diagram

 Activity diagram

 State chart diagram

 Deployment diagram

 Component diagram

The following are the main components of UML:

1. Use-case Diagram

2. Class Diagram

3. Sequence Diagram

4. Activity Diagram

5. Collaboration Diagram

4.4 UML DIAGRAMS

4.4.1 USE CASE DIAGRAM

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by and created from a use-case analysis. Its purpose is to present a graphical overview of the functionality provided by a system in terms of actors, their goals (represented as use cases), and any dependencies between those use cases. The main purpose of a use case diagram is to show what system functions are performed for which actor. The roles of the actors in the system can be depicted.

Figure: Use case diagram - the System actor is associated with the use cases Reading Dataset, Preprocessing, Split the Dataset, Training the Dataset, Prediction, and Accuracy.
4.4.2 SEQUENCE DIAGRAM

Figure: Sequence diagram - the System interacts with the Dataset through the following messages: reading the dataset, preprocessing, splitting the dataset into training and testing sets, training the dataset and creating the model, prediction, and accuracy.

Collaboration diagram

Figure: Collaboration diagram - messages exchanged between the System and the Dataset: 1: Reading the dataset, 2: Preprocessing, 3: Split the dataset into training and testing set, 4: Train the dataset and create model, 5: Prediction, 6: Accuracy.

Sample Code:

from flask import Flask, flash, redirect, render_template, request, session, abort
import os
import shutil
from werkzeug.utils import secure_filename
from werkzeug.datastructures import FileStorage
import mysql.connector
import sys
import glob
import pandas as pd
from sklearn.svm import SVC
import numpy as np
from datetime import date
from datetime import datetime
import smtplib
from csv import reader

lst = []

app = Flask(__name__)

db = mysql.connector.connect(user='root', database='human')

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/userhome')
def userhome():
    return render_template('userhome.html')

@app.route('/adminhome')
def adminhome():
    return render_template('adminhome.html')

@app.route('/userregister')
def userregister():
    return render_template('registration.html')

@app.route('/upload')
def upload():
    return render_template('upload.html')

@app.route('/aboutus')
def aboutus():
    return render_template('aboutus.html')

@app.route('/feedback')
def feedback():
    return render_template('feedback.html')

@app.route('/userlogin')
def userlogin():
    return render_template('index.html')

@app.route('/Recommend')
def Recommend():
    import numpy as np   # linear algebra
    import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
    import os

    for dirname, _, filenames in os.walk('input'):
        for filename in filenames:
            print(os.path.join(dirname, filename))

    # Read dataset into dataframe
    data = pd.read_csv('input/train.csv')
    data.head()
    data.info()
    data.shape
    data.columns
    data.Activity.value_counts()

    # Encode the activity labels as numeric codes
    data['activity_code'] = data.Activity.astype('category').cat.codes
    data1 = data.drop('Activity', axis=1)

    x_col = data1.columns.to_list()
    x_col.pop(-1)
    x_data = data1[x_col]
    y_col = 'activity_code'

    from sklearn.model_selection import train_test_split
    train_x, test_x, train_y, test_y = train_test_split(data1[x_col], data1[y_col].values, test_size=0.1)
    train_x.shape, test_x.shape, train_y.shape, test_y.shape

    # Build Random Forest model using ensemble.RandomForestClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    rf = RandomForestClassifier()
    rf.fit(train_x, train_y)
    test_y_pred = rf.predict(test_x)
    print("TEST ", test_x.iloc[0])
    score = accuracy_score(test_y, test_y_pred)
    print("Testing Accuracy", score)

    return render_template('Recommend.html', score=score)

@app.route('/neural')
def neural():
    import numpy as np   # linear algebra
    import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
    from subprocess import check_output

    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Activation
    from keras.optimizers import SGD

    train = pd.read_csv('input/train.csv')
    test = pd.read_csv('input/test.csv')

    # Feature matrix
    train_features = train.iloc[:, :561].to_numpy()
    test_features = test.iloc[:, :561].to_numpy()
    train_results = train.iloc[:, 562:].to_numpy()
    test_results = test.iloc[:, 562:].to_numpy()

    # One-hot encode the six activity labels
    train_resultss = np.zeros((len(train_results), 6))
    test_resultss = np.zeros((len(test_results), 6))
    print(train_resultss)

    for k in range(0, len(train_results)):
        if train_results[k] == 'STANDING':
            train_resultss[k][0] = 1
        elif train_results[k] == 'WALKING':
            train_resultss[k][1] = 1
        elif train_results[k] == 'WALKING_UPSTAIRS':
            train_resultss[k][2] = 1
        elif train_results[k] == 'WALKING_DOWNSTAIRS':
            train_resultss[k][3] = 1
        elif train_results[k] == 'SITTING':
            train_resultss[k][4] = 1
        else:
            train_resultss[k][5] = 1

    for k in range(0, len(test_results)):
        if test_results[k] == 'STANDING':
            test_resultss[k][0] = 1
        elif test_results[k] == 'WALKING':
            test_resultss[k][1] = 1
        elif test_results[k] == 'WALKING_UPSTAIRS':
            test_resultss[k][2] = 1
        elif test_results[k] == 'WALKING_DOWNSTAIRS':
            test_resultss[k][3] = 1
        elif test_results[k] == 'SITTING':
            test_resultss[k][4] = 1
        else:
            test_resultss[k][5] = 1

    # Fully connected neural network with dropout and a softmax output layer
    model = Sequential()
    model.add(Dense(64, activation='relu', input_dim=561))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(6, activation='softmax'))

    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy',
                  optimizer=sgd,
                  metrics=['accuracy'])

    model.fit(train_features, train_resultss, epochs=30, batch_size=128)
    score = model.evaluate(test_features, test_resultss, batch_size=128)
    print(score[1])
    s = score[1]

    return render_template('Recommend1.html', score=s)

@app.route('/index')
def index():
    return render_template('index.html')

@app.route('/adminlogin')
def adminlogin():
    return render_template('index.html')

@app.route('/userregisterdb', methods=['POST'])
def do_userregisterdb():
    uid = request.form['userid']
    name = request.form['name']
    email = request.form['email']
    phno = request.form['phno']
    area = request.form['area']
    password = request.form['password']
    cursor = db.cursor()
    cursor.execute('insert into users values("%s", "%s", "%s", "%s", "%s", "%s")' %
                   (uid, name, email, phno, area, password))
    db.commit()
    return render_template('index.html', msg="Registered Successfully")

@app.route('/login', methods=['POST'])
def do_login():
    flag = False
    cursor = db.cursor()
    username = request.form['username']
    password = request.form['password']
    sql = "SELECT * FROM users WHERE userid= '%s' and password = '%s' " % (username, password)
    print("Sql is ", sql)
    rows_count = cursor.execute(sql)
    data = cursor.fetchall()
    if len(data) > 0:
        session['logged_in'] = True
        session['uid'] = username
        flag = True
    else:
        flag = False
    if flag:
        return render_template('userhome.html', msg="User Login Successfully")
    else:
        return render_template('index.html', msg="Username/password not Match")

# admin module starts
@app.route('/adminlogin', methods=['POST'])
def do_adminlogin():
    flag = False
    username = request.form['username']
    password = request.form['password']
    if username == 'admin' and password == 'admin':
        session['logged_in'] = True
        flag = True
    else:
        flag = False
    if flag:
        return render_template('adminhome.html', msg="Login success")
    else:
        return render_template('index.html')

@app.route('/viewfeedback')
def do_viewfeedbackdb():
    cursor = db.cursor()
    sql = ("SELECT users.userid,name,feedback,date,time FROM users,feedback "
           "where users.userid=feedback.userid")
    rows_count = cursor.execute(sql)
    data = cursor.fetchall()
    if len(data) > 0:
        return render_template('viewfeedback.html', ddata=data)

@app.route('/viewadminfeedback')
def do_viewadminfeedbackdb():
    cursor = db.cursor()
    sql = ("SELECT users.userid,name,feedback,date,time FROM users,feedback "
           "where users.userid=feedback.userid")
    rows_count = cursor.execute(sql)
    data = cursor.fetchall()
    if len(data) > 0:
        return render_template('viewadminfeedback.html', ddata=data)

@app.route('/profile')
def profiledb():
    cursor = db.cursor()
    uid = session['uid']
    sql = "SELECT userid,name,password,area,email,phno FROM users where userid='%s'" % (uid)
    rows_count = cursor.execute(sql)
    data = cursor.fetchall()
    if len(data) > 0:
        return render_template('profile.html', ddata=data)

@app.route('/viewusers')
def viewusersdb():
    cursor = db.cursor()
    sql = "SELECT userid,name,email,phno,area FROM users"
    rows_count = cursor.execute(sql)
    data = cursor.fetchall()
    if len(data) > 0:
        return render_template('viewusers.html', ddata=data)

@app.route('/userfeedbackdb', methods=['POST'])
def do_userfeedbackdb():
    userid = request.form['userid']
    feedback = request.form['feedback']
    today = date.today()
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    cursor = db.cursor()
    cursor.execute('insert into feedback(userid,feedback,date,time) values("%s", "%s", "%s", "%s")' %
                   (userid, feedback, today, current_time))
    db.commit()
    return render_template('userhome.html', msg="Thank You for Your Valuable Feedback")

@app.route('/addplacesdb', methods=['POST'])
def do_addplacesdb():
    name = request.form['name']
    email = request.form['email']
    phno = request.form['phno']
    uname = request.form['username']
    password = request.form['password']
    cursor = db.cursor()
    cursor.execute('insert into patient values("%s", "%s", "%s", "%s", "%s")' %
                   (name, email, phno, uname, password))
    db.commit()
    return render_template('patientlogin.html')

@app.route('/uploadDB', methods=['POST'])
def do_uploadDB():
    f = request.files['files']
    df = pd.read_csv(f, encoding='cp1252')
    df.to_csv("dataset.csv", index=False)
    return render_template('adminhome.html', msg="File Uploaded Successfully")
# admin module ends

@app.route("/logout")
def logout():
    session['logged_in'] = False
    return home()

if __name__ == "__main__":
    app.secret_key = os.urandom(12)
    app.run(debug=True, host='0.0.0.0', port=8000)

Results :

1. TESTING

SYSTEM TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test, and each test type addresses a specific testing requirement.

TYPES OF TESTS

Unit testing
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application, and it is done after the completion of an individual unit before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.

Integration testing

Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

Functional test

Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user
manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identifying business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

System Test
System testing ensures that the entire integrated software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system testing is
the configuration oriented system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and integration points.

White Box Testing


White box testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.

Black Box Testing
Black box testing is testing the software without any knowledge of the inner workings, structure or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot "see" into it. The test provides inputs and responds to outputs without considering how the software works.

6.1 Unit Testing:

Unit testing is usually conducted as part of a combined code and unit test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be conducted as
two distinct phases.

Test strategy and approach


Field testing will be performed manually and functional tests will be written in detail.

Test objectives
 All field entries must work properly.
 Pages must be activated from the identified link.
 The entry screen, messages and responses must not be delayed.

Features to be tested
 Verify that the entries are of the correct format
 No duplicate entries should be allowed
 All links should take the user to the correct page.

6.2 Integration Testing

Software integration testing is the incremental integration testing of two or more
integrated software components on a single platform to produce failures caused by interface
defects.
The task of the integration test is to check that components or software applications,
e.g. components in a software system or – one step up – software applications at the company
level – interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

6.3 Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.

CONCLUSION

Human activity analysis is a popular task in a growing industry, and we have applied different machine learning algorithms to it. A comparative study was performed among the applied techniques: kNN, SVM, Random Forest, Neural Networks, Logistic Regression and Naïve Bayes. Among them, Logistic Regression and the neural network gave good results, whereas the Naïve Bayes results were not good. The implementation of the neural network in Python gave better results than the one provided in the Orange tool. The limitation of this work is that, though the efficiency of the neural network is good, the model is not dynamic: the inability to train on real-time data forces us to retrain the model every time new data arrives. In future, these results can be used for making smart watches and similar devices which can track a user's activity and notify him/her of the daily activity log. They can also be used for monitoring elderly people, prison inmates, or anyone who needs constant supervision.

REFERENCES
[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge L. Reyes-Ortiz (2012). Human Activity Recognition on Smartphones Using a Multiclass Hardware-Friendly Support Vector Machine. Springer International Workshop on Ambient Assisted Living, Lecture Notes in Computer Science, Vol. 7657, pp. 216-223.

[2] Jun Liu, Amir Shahroudy, Dong Xu, Gang Wang (2016). Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition. European Conference on Computer Vision, Vol. 9907, pp. 816-833.

[3] Oyelade, Oladipupo, Obagbuwa (2010). Application of k-Means Clustering Algorithm for Prediction of Students' Academic Performance. International Journal of Computer Science and Information Security, 7(1), pp. 292-295.

[4] Juha Vesanto (1999). SOM-Based Data Visualization Methods. Intelligent Data Analysis, vol. 3(2), pp. 111-126. Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015 HUT, Finland.

[5] Ivan Viola (2010). Information Theory in Computer Graphics and Visualization. Proceedings of SA'11 SIGGRAPH Asia 2011 Courses.

[6] Jiang, L., Zhang, H., & Cai, Z. (2009). A novel Bayes model: Hidden Naive Bayes. IEEE Transactions on Knowledge and Data Engineering, 21(10), pp. 1361-1371.

[7] Vincenzi S., Zucchetta M., Franzoi P., Pellizzato M., Pranovi F., De Leo G.A., Torricelli P. (2011). Application of a Random Forest algorithm to predict spatial distribution of the potential yield of Ruditapes philippinarum in the Venice lagoon, Italy. Ecological Modelling, 222, pp. 1471-1478.

[8] Jie Hu, Yu Kong, Yun Fu (2013). Activity Recognition by Learning Structural and Pairwise Mid-level Features Using Random Forest. 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[9] J. K. Aggarwal, M. S. Ryoo (2011). Human activity analysis: A review. ACM Computing Surveys, 43(3).

[10] Enea Cippitelli, Ennio Gambi, Susanna Spinsante, Francisco Florez-Revuelta (2016). Human Action Recognition Based on Temporal Pyramid of Key Poses Using RGB-D Sensors. Springer International Conference on Advanced Concepts for Intelligent Vision Systems, Vol. 10016, pp. 510-521.

[11] Qingzhong Liu, Zhaoxian Zhou, Sarbagya Ratna Shakya, Prathyusha Uduthalapally, Mengyu Qiao, and Andrew H. Sung (2018). Smartphone Sensor-Based Activity Recognition by Using Machine Learning and Deep Learning Algorithms. International Journal of Machine Learning and Computing, Vol. 8, No. 2.

[12] S. Roobini, J. Fenila Naomi (2019). Smartphone Sensor Based Human Activity Recognition using Deep Learning Models. International Journal of Recent Technology and Engineering, ISSN: 2277-3878, Volume-8, Issue-1.

[13] Ye, G. Stevenson, and S. Dobson (2015). KCAR: A knowledge-driven approach for concurrent activity recognition. Pervasive and Mobile Computing, vol. 19, pp. 47-70.

[14] O. D. Lara and M. A. Labrador (2013). A survey on human activity recognition using wearable sensors. IEEE Communications Surveys and Tutorials, vol. 15, no. 3, pp. 1192-1209.

[15] M. Ziaeefard and R. Bergevin (2015). Semantic human activity recognition: A literature review. Pattern Recognition, vol. 48, no. 8, pp. 2329-2345.
