Unit 1
2)
a) Explain the working of the Find-S algorithm with an example.
Ans)
Find-S Algorithm:
The Find-S algorithm finds the most specific hypothesis that fits all the positive training examples; negative examples are ignored. It starts with the most specific hypothesis possible and generalizes it only as far as the positive examples require:
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x, and for each attribute constraint in h: if the constraint is satisfied by x, do nothing; otherwise, replace it with the next more general constraint that is satisfied by x (the "don't care" symbol '?').
3. Output hypothesis h.
(OR)
Example:
Here is an example of finding the target hypothesis using the Find-S algorithm. The
example is divided into multiple parts:
1. Training data generation: Since concept learning works on past experience, we
need to have training data ready for the learning process. This step involves
generating training data for a simple example. To test the application you can
create your own data; the application currently generates test data as
Map[String, Any].
2. Trainer initialization: This task involves creating a Trainer with a model
(Find-S) and some basic configuration, such as the training ratio (the ratio
between training samples and validation samples, typically represented by a
double value in the range 0 to 1, where 0 represents 0% and 1 represents 100%).
5. Testing: To test the trained model, we pass a sample object to the model's
predict function and compare the actual output with the expected output for
verification.
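The generalization rule above can be sketched in a few lines of Python. This is a minimal illustration, not the Scala application described in the steps; the attribute values and the toy training set are made up for the example.

```python
# Minimal sketch of Find-S. Each example is (attribute_tuple, label);
# the data below is an illustrative toy set, not from the original app.

def find_s(examples):
    """Return the most specific hypothesis consistent with the positive
    examples. '?' marks an attribute generalized to "any value"."""
    hypothesis = None
    for attributes, label in examples:
        if label != "yes":                 # Find-S ignores negative examples
            continue
        if hypothesis is None:
            hypothesis = list(attributes)  # start from the first positive
        else:
            for i, value in enumerate(attributes):
                if hypothesis[i] != value:
                    hypothesis[i] = "?"    # generalize the mismatched attribute
    return hypothesis

training_data = [
    (("sunny", "warm", "normal", "strong"), "yes"),
    (("sunny", "warm", "high",   "strong"), "yes"),
    (("rainy", "cold", "high",   "strong"), "no"),
    (("sunny", "warm", "high",   "strong"), "yes"),
]

print(find_s(training_data))   # ['sunny', 'warm', '?', 'strong']
```

Only the third attribute differs across the positive examples, so it is the only one replaced by '?'; the negative example has no effect on the hypothesis.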
Avoiding overfitting:
The ID3 algorithm continues splitting on attributes until either it classifies
all the data points or there are no more attributes to split on. As a result, it
is prone to creating decision trees that overfit: they perform very well on the
training data at the expense of accuracy on the entire distribution of data.
There are, in general, two approaches to avoid this in decision trees:
- Allow the tree to grow until it overfits and then prune it.
- Prevent the tree from growing too deep by stopping it before it perfectly
classifies the training data.
A decision tree’s growth is specified in terms of the number of layers, or depth,
it’s allowed to have. The data available to train the decision tree is split into
training and testing data and then trees of various sizes are created with the help
of the training data and tested on the test data. Cross-validation can also be used
as part of this approach. Pruning the tree, on the other hand, involves testing the
original tree against pruned versions of it. Leaf nodes are removed from the tree
as long as the pruned tree performs better on the test data than the larger tree.
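The early-stopping approach can be sketched by adding a depth cap to a toy ID3 implementation. This is an illustrative sketch only: the dataset, attribute names, and the simple tie-breaking are made up, and a real implementation would also handle unseen attribute values and use a held-out set to choose the depth.

```python
import math
from collections import Counter

# Sketch of ID3 with a max_depth cap as an early-stopping rule.
# Rows are (attribute_dict, label); the toy data below is illustrative.

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, attr):
    """Entropy reduction from splitting rows on attr."""
    base = entropy([label for _, label in rows])
    remainder = 0.0
    for value in {r[attr] for r, _ in rows}:
        subset = [label for r, label in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def id3(rows, attrs, max_depth):
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop early: pure node, no attributes left, or depth limit reached.
    if len(set(labels)) == 1 or not attrs or max_depth == 0:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, a))
    tree = {}
    for value in {r[best] for r, _ in rows}:
        subset = [(r, label) for r, label in rows if r[best] == value]
        tree[value] = id3(subset, [a for a in attrs if a != best],
                          max_depth - 1)
    return (best, tree)

rows = [
    ({"outlook": "sunny", "wind": "weak"},   "no"),
    ({"outlook": "sunny", "wind": "strong"}, "no"),
    ({"outlook": "rain",  "wind": "weak"},   "yes"),
    ({"outlook": "rain",  "wind": "strong"}, "no"),
]
print(id3(rows, ["outlook", "wind"], max_depth=1))
```

With `max_depth=1` the tree splits once and then returns majority labels, rather than growing until every training point is perfectly classified; in practice the depth would be chosen by testing trees of various sizes on held-out data, as described above.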
(OR)
Regression Trees: where the target variable is continuous and the tree is used
to predict its value.
Classification tree example: Consider the widely referenced Iris data classification
problem introduced by Fisher [1936; see also Discriminant Function
Analysis and General Discriminant Analysis (GDA)]. The data file Irisdat reports the
lengths and widths of sepals and petals of three types of irises (Setosa, Versicol,
and Virginic). The purpose of the analysis is to learn how we can discriminate
between the three types of flowers, based on the four measures of width and
length of petals and sepals. Discriminant function analysis will estimate several
linear combinations of predictor variables for computing classification scores (or
probabilities) that allow the user to determine the predicted classification for
each observation. A classification tree will determine a set of logical if-then
conditions (instead of linear equations) for predicting or classifying cases instead:
The interpretation of this tree is straightforward: if the petal width is less
than or equal to 0.8, the respective flower is classified as Setosa; if the
petal width is greater than 0.8 and less than or equal to 1.75, the flower is
classified as Versicol; otherwise, it belongs to class Virginic.
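The if-then rules the tree produces can be written directly as code, which makes the contrast with linear discriminant scores concrete. The thresholds 0.8 and 1.75 (petal width in cm) come from the tree described above; the function name is illustrative.

```python
# The classification tree's if-then rules from the Iris example,
# written as a plain function. Petal width is in centimeters.

def classify_iris(petal_width):
    if petal_width <= 0.8:
        return "Setosa"
    elif petal_width <= 1.75:
        return "Versicol"
    else:
        return "Virginic"

print(classify_iris(0.2))   # Setosa
print(classify_iris(1.4))   # Versicol
print(classify_iris(2.1))   # Virginic
```

Unlike a discriminant function, which combines all four measurements into linear scores, this tree reaches its prediction using a single predictor and two comparisons.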
(OR)
Visit the links below for CART examples:
https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
https://www.datasciencecentral.com/profiles/blogs/introduction-to-classification-regression-trees-cart