Unit-5 Classification and Prediction

5.1 Concept of Classification and Prediction, Evaluating Classification Algorithms

 Concept of Classification and Prediction

We use classification and prediction to extract a model that represents the data classes and can predict future data trends. This analysis gives us a good understanding of the data at a large scale. Classification predicts categorical labels of data, while prediction models continuous-valued functions.

 Classification:
Data mining is known as an interdisciplinary field. It draws on a range of disciplines such as analytics, database systems, machine learning, simulation, and information science. Classifying data mining systems helps users understand them and align their criteria with such systems. Classification itself is the discovery of a model that distinguishes classes and concepts of data; the goal is to use this model to forecast the class of new objects. The derived model relies on the analysis of training data sets.

A classification task starts with a data set in which the class assignments are known. For example, based on observed data for multiple loan borrowers over a period of time, a classification model may be built that forecasts credit risk. The data could track job records, homeownership or leasing, years of residency, number and type of deposits, historical credit ranking, and so on. Credit ranking would be the target, the other attributes would be the predictors, and each consumer would be one case in the data.

How does Classification work?

Using the bank loan application discussed above, let us understand how classification works. The data classification process includes two steps:
 Building a Classifier or Model
 Using the Classifier for Classification
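The two steps above can be sketched in plain Python. As a stand-in for a real learning algorithm, this sketch uses a simple nearest-mean classifier, and the training data (years employed, number of deposits, and a credit-risk label) is entirely hypothetical:

```python
# A minimal sketch of the two-step classification process using a
# nearest-mean classifier in plain Python. The training data
# (years employed, number of deposits -> credit risk) is invented.

def build_classifier(training_data):
    """Step 1: learn a model (here, the mean feature vector per class)."""
    sums, counts = {}, {}
    for features, label in training_data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def classify(model, features):
    """Step 2: assign the class whose mean vector is closest."""
    def distance(mean):
        return sum((f - m) ** 2 for f, m in zip(features, mean))
    return min(model, key=lambda label: distance(model[label]))

training = [
    ([10, 5], "low risk"), ([8, 4], "low risk"),
    ([1, 0], "high risk"), ([2, 1], "high risk"),
]
model = build_classifier(training)
print(classify(model, [9, 4]))   # a long-employed applicant -> low risk
```

Step 1 builds the model from records whose class is already known; step 2 applies the model to new, unlabeled records.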

 Prediction:
Prediction is a data mining technique that discovers the relationship between independent variables and dependent variables. It uses regression analysis to estimate inaccessible data and to detect missing numeric values in the data. When the class label is absent, the prediction is performed using classification. Prediction is common because of its relevance in business intelligence.

The following is an example of a case where the data analysis task is prediction:
Suppose a marketing manager needs to predict how much a particular customer will spend at his company during a sale. Here we are asked to forecast a numeric value, so this data processing activity is an example of numeric prediction. In this case, a model or predictor is developed that forecasts a continuous-valued or ordered-value function.

 Evaluating Classification Algorithms

In data mining, classification involves the problem of predicting which category or class a new observation belongs to. The derived model (classifier) is based on the analysis of a set of training data in which each record is given a class label. The trained model (classifier) is then used to predict the class label for new, unseen data.

To understand classification metrics, one of the most important concepts is the


confusion matrix.

Fig: Confusion Matrix


 True Positive (TP): predicted positive, and the actual class is positive.
 True Negative (TN): predicted negative, and the actual class is negative.
 False Positive (FP, Type 1 error): predicted positive, but the actual class is negative.
 False Negative (FN, Type 2 error): predicted negative, but the actual class is positive.

Accuracy measures the overall fraction of correct predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The precision metric shows the accuracy of the positive class; it measures how likely a positive prediction is to be correct:

Precision = TP / (TP + FP)

Sensitivity (also called recall) computes the ratio of positive cases correctly detected; it measures how good the model is at recognizing the positive class:

Sensitivity = TP / (TP + FN)
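These metrics can be computed directly from the four confusion-matrix counts; the counts in this sketch are made up for illustration:

```python
# Computing accuracy, precision, and sensitivity from raw
# confusion-matrix counts (TP, TN, FP, FN). The counts are invented.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def sensitivity(tp, fn):   # also called recall, or the true-positive rate
    return tp / (tp + fn)

tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn))   # 0.85
print(precision(tp, fp))          # ~0.889
print(sensitivity(tp, fn))        # 0.8
```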

5.2 Bayesian Classification, Decision Tree Classification, Concept of Entropy


 Bayesian Classification
Bayesian classification uses Bayes' theorem to predict the occurrence of an event. Bayesian classifiers are statistical classifiers grounded in Bayesian probability. The theorem expresses how a level of belief, expressed as a probability, should change to account for evidence.
Bayes' theorem is expressed mathematically by the following equation:

P(X/Y) = P(Y/X) P(X) / P(Y)

where X and Y are events and P(Y) ≠ 0.


P(X/Y) is the conditional probability of event X occurring given that Y is true.
P(Y/X) is the conditional probability of event Y occurring given that X is true.
P(X) and P(Y) are the probabilities of observing X and Y independently of each other; these are known as marginal probabilities.
Interpretation:

In the Bayesian interpretation, probability measures a "degree of belief," and Bayes' theorem connects the degree of belief in a hypothesis before and after accounting for evidence. For example, consider a coin. If we toss a coin, we get either heads or tails, each with a 50% chance of occurrence. If the coin is flipped a number of times and the outcomes are observed, the degree of belief may rise, fall, or remain the same depending on the outcomes.
For proposition X and evidence Y,
 P(X), the prior, is the initial degree of belief in X.
 P(X/Y), the posterior, is the degree of belief after accounting for Y.
 The quotient P(Y/X)/P(Y) represents the support Y provides for X.
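The prior-to-posterior update can be sketched in a few lines of Python. The scenario (a medical test, with X = "has disease" and Y = "test is positive") and all the probabilities are invented for illustration; P(Y) is expanded using the law of total probability:

```python
# Sketch of a Bayesian update: computing the posterior P(X/Y) from a
# prior P(X) and the likelihoods P(Y/X) and P(Y/not X). All numbers
# are hypothetical.

def posterior(prior_x, p_y_given_x, p_y_given_not_x):
    """Return P(X/Y) via Bayes' theorem, with P(Y) from total probability."""
    p_y = p_y_given_x * prior_x + p_y_given_not_x * (1 - prior_x)
    return p_y_given_x * prior_x / p_y

# P(X) = 0.01, P(Y/X) = 0.99, P(Y/not X) = 0.05
print(posterior(0.01, 0.99, 0.05))  # ~0.167: belief rises from 1% to ~17%
```

The evidence raises the degree of belief because P(Y/X)/P(Y) is greater than 1 here.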

Bayes' theorem can be derived from the definition of conditional probability:

P(X/Y) = P(X⋂Y) / P(Y) and P(Y/X) = P(X⋂Y) / P(X)

where P(X⋂Y) is the joint probability of both X and Y being true. Equating the two expressions for P(X⋂Y) and dividing by P(Y) yields Bayes' theorem.

Bayesian Network:

A Bayesian network is a Probabilistic Graphical Model (PGM) used to compute uncertainties by means of the concept of probability. Also known as belief networks, Bayesian networks represent uncertainties using a Directed Acyclic Graph (DAG).

A Directed Acyclic Graph is used to show a Bayesian Network, and like some
other statistical graph, a DAG consists of a set of nodes and links, where the links
signify the connection between the nodes.
The nodes here represent random variables, and the edges define the relationship
between these variables.

A DAG models the uncertainty of an event taking place based on the Conditional Probability Distribution (CPD) of each random variable. A Conditional Probability Table (CPT) is used to represent the CPD of each variable in the network.
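A tiny network makes the CPT idea concrete. This sketch uses an invented two-node network, Rain → WetGrass, stores the CPTs as plain dictionaries, and computes a joint and a marginal probability by the chain rule of the DAG:

```python
# A two-node Bayesian network (Rain -> WetGrass) with its CPTs as plain
# dictionaries; all probabilities are invented for illustration.
# Chain rule of the DAG: P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain).

p_rain = {True: 0.2, False: 0.8}         # prior for the root node Rain
p_wet_given_rain = {                     # CPT for WetGrass, one row per
    True:  {True: 0.9, False: 0.1},      # value of its parent Rain
    False: {True: 0.2, False: 0.8},
}

def joint(rain, wet):
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Marginal P(WetGrass=True) sums the joint over both parent values.
p_wet = sum(joint(r, True) for r in (True, False))
print(joint(True, True))  # 0.18
print(p_wet)              # 0.34
```

In a larger network the joint probability is the product of one CPT entry per node, each conditioned on that node's parents in the DAG.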

 Decision tree classification


A decision tree is a structure that includes a root node, branches, and leaf nodes.
Each internal node denotes a test on an attribute, each branch denotes the outcome
of a test, and each leaf node holds a class label. The topmost node in the tree is
the root node.

Fig: Decision tree


Benefits of Decision tree:
 It does not require any domain knowledge.
 It is easy to comprehend.
 The learning and classification steps of a decision tree are simple and
fast.
Decision Tree Terminologies:
 Root Node: Root node is from where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more
homogeneous sets.
 Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf node is reached.
 Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
 Branch/Sub-tree: A subtree formed by splitting a node.
 Pruning: Pruning is the process of removing unwanted branches from the tree.
 Parent/Child node: A node that splits into sub-nodes is called their parent node, and the sub-nodes are its child nodes.

Example: Suppose a candidate has a job offer and wants to decide whether to accept it or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an attribute selection measure). The root node splits further into the next decision node (distance from the office) and one leaf node, based on the corresponding labels. The next decision node further splits into one decision node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer).
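The decision path in this example can be sketched as nested if/else tests, where each condition is an internal node and each return value is a leaf label; the thresholds are invented for illustration:

```python
# The job-offer decision tree as nested if/else tests. Each test is an
# internal node, each return value a leaf. Thresholds are hypothetical.

def decide(salary, distance_km, has_cab_facility):
    if salary < 50000:            # root node: Salary attribute
        return "Declined offer"   # leaf
    if distance_km > 30:          # decision node: distance from office
        if has_cab_facility:      # decision node: cab facility
            return "Accepted offer"
        return "Declined offer"
    return "Accepted offer"

print(decide(60000, 40, True))   # Accepted offer
print(decide(60000, 40, False))  # Declined offer
print(decide(40000, 5, True))    # Declined offer
```

Classifying a new case means following one root-to-leaf path, testing one attribute at each internal node.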
 Concept of Entropy

Entropy is a measure of the impurity, disorder, or uncertainty in a bunch of examples.

Entropy controls how a decision tree decides to split the data. It directly affects how a decision tree draws its boundaries.

The figure below depicts the splitting process. Red rings and blue crosses symbolize elements with two different labels. The decision starts by evaluating the feature values of the elements inside the initial set. Based on their values, elements are put in Set 1 or Set 2. In this example, after the splitting, the state looks tidier: most of the red rings have been put in Set 1, while a majority of the blue crosses are in Set 2.

So decision trees tidy the dataset by looking at the values of the feature vector associated with each data point. Based on the values of each feature, decisions are made that eventually lead to a leaf and an answer.

At each step, each branching, you want to decrease the entropy, so this quantity
is computed before the cut and after the cut. If it decreases, the split is validated
and we can proceed to the next step, otherwise, we must try to split with another
feature or stop this branch.

Before and after the decision, the sets are different and have different sizes. Still,
entropy can be compared between these sets, using a weighted sum, as we will
see in the next section.
Equation of Entropy:

H(S) = -P(+) log2 P(+) - P(-) log2 P(-)

where P(+) and P(-) are the probabilities of the positive and negative classes, and S is the subset of training examples.
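The entropy formula and the weighted-sum comparison described above can be sketched as follows; the label counts before and after the split are made up for illustration:

```python
# Computing H(S) and the weighted post-split entropy used to validate
# a cut. The label counts (positives, negatives) are invented.

from math import log2

def entropy(positives, negatives):
    total = positives + negatives
    h = 0.0
    for count in (positives, negatives):
        if count:                       # 0 * log2(0) is taken as 0
            p = count / total
            h -= p * log2(p)
    return h

before = entropy(10, 10)               # maximally impure set: H = 1.0
# Weighted sum of the entropies of Set 1 (12 elements) and Set 2 (8).
after = (12 / 20) * entropy(9, 3) + (8 / 20) * entropy(1, 7)
print(before, after)
assert after < before                  # entropy decreased: split validated
```

Because the weighted entropy after the cut is lower than before, this split would be accepted and the branching could proceed.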

5.3 Linear Regression, Concept of Non-linear regression


Regression can be defined as a data mining technique that is generally used for the
purpose of predicting a range of continuous values (which can also be called
“numeric values”) in a specific dataset. For example, Regression can predict sales,
profits, temperature, distance and so on.
 Linear Regression
Linear regression is the simplest form of regression. It attempts to model the relationship between two variables by fitting a linear equation to the observed data. If the outcome is a straight line, the model is linear; if it is a curved line, the model is non-linear. The relationship between the dependent variable and the single independent variable is given by a straight line:
Y = A + BX
Here 'Y' is a linear function of 'X': the value of 'Y' increases or decreases in a linear manner as the value of 'X' changes.
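The coefficients A and B can be estimated by ordinary least squares. This sketch uses the standard closed-form formulas in plain Python, on made-up points that lie exactly on y = 2 + 3x:

```python
# Fitting Y = A + B*X by ordinary least squares. The (x, y) points are
# invented and lie exactly on y = 2 + 3x, so the fit recovers A=2, B=3.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # B = covariance(X, Y) / variance(X); A = mean_y - B * mean_x
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [5, 8, 11, 14]          # y = 2 + 3x
a, b = fit_line(xs, ys)
print(a, b)                  # ~2.0, ~3.0
```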
 Concept of Non-linear regression
Polynomial regression is a common form of non-linear regression. Non-linear regression is a method to model a non-linear relationship between the dependent and independent variables. It is used when the data shows a curvy trend, where linear regression would not produce very accurate results, because linear regression presumes that the data is linear.

The scatter plot shows the relationship between a country's GDP and time, but the relationship is not linear: after 2005 the line starts to curve and no longer follows a straight path. In such cases, a special estimation method called non-linear regression is required.
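One simple way to sketch polynomial regression is by transformation: a curvy trend such as y = a + b·x² becomes linear in the new feature z = x², so the same least-squares formulas apply. The data points here are invented and follow y = 1 + 2x² exactly:

```python
# A sketch of polynomial (non-linear) regression by transformation:
# fitting y = a + b*x**2 by replacing the feature x with z = x**2 and
# reusing ordinary least squares. The data points are invented.

def fit_line(zs, ys):
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    b = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
         / sum((z - mean_z) ** 2 for z in zs))
    return mean_y - b * mean_z, b

xs = [1, 2, 3, 4]
ys = [3, 9, 19, 33]          # follows y = 1 + 2*x**2, a curved trend
zs = [x ** 2 for x in xs]    # transformed feature
a, b = fit_line(zs, ys)
print(a, b)                  # ~1.0, ~2.0
```

More general non-linear models, whose parameters cannot be made linear by such a transformation, require iterative estimation methods instead.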
