Cia 4 ML
SUBMITTED BY
SHIVANGI GUPTA (20221026)
BUSINESS UNDERSTANDING
Abalone is a common type of shellfish. Its flesh is prized as a delicacy, and its shell is
frequently used in jewellery. This paper addresses the problem of estimating the age of an abalone
from its physical measurements, which is of interest because the alternative techniques for
determining age are time-consuming. Depending on the species, abalone can live up to 50 years.
Environmental factors such as water flow and wave activity play a significant role in how quickly
they grow: abalone from protected waters often develop more slowly than those from exposed reef
areas because of differences in food availability. Estimating age is challenging because size
depends not only on age but also on the availability of food. Furthermore, abalone can form
so-called 'stunted' populations, whose growth characteristics differ substantially from those of
other populations. Most research on the dataset has treated age prediction as a classification
problem, which entails assigning a label to each case. Here the label is the ring count, which is
really a numeric quantity; treating every count as its own class produces so many classes that a
classifier struggles to tell them apart and performs poorly. The age of an abalone is positively
correlated with its price, yet determining that age is a time-consuming operation. As the abalone
matures, rings form in its inner shell, generally at a rate of one ring per year. The shell is cut
to expose the rings, and a lab technician polishes and stains a shell sample and then counts the
rings under a microscope.
DATA UNDERSTANDING
The abalone dataset is a collection of measurements of the physical features of individual
abalone. It contains 4177 examples. To demonstrate the algorithms in action, we use this
previously collected dataset. With it we can build a number of regression models to investigate
how the independent variables affect our dependent variable, Rings. Knowing how each factor
relates to an abalone's age can help oceanographers, jewellers, and businesses better examine
their production, distribution, and pricing strategies. Understanding the data starts with knowing
what it contains: the type (continuous numeric, discrete numeric, or categorical) and meaning of
each feature, and the number of instances and features in the dataset.
VARIABLES
Sex: The sex of the abalone. Categorical value (M, F or I, where I denotes an infant).
Length: The longest shell measurement in mm. Continuous numeric value.
Shucked Weight: Weight of the meat alone in grams. Continuous numeric value.
Viscera Weight: Gut weight after bleeding in grams. Continuous numeric value.
Shell Weight: Weight of the shell after being dried in grams. Continuous numeric value.
Rings: The target, that is, the feature that we will train the model to predict. As
mentioned earlier, we are interested in the age of the abalone, and it has been established that
the number of rings + 1.5 gives the age in years. Discrete numeric value.
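The rings-to-age rule above can be written as a one-line conversion. The sketch below is only an illustration: the function name `rings_to_age` is hypothetical, and the ring counts are the six values shown in the first rows of the dataset listing later in this report.

```r
# Convert ring counts to approximate age in years (age = rings + 1.5).
# rings_to_age is a hypothetical helper name used only for this sketch.
rings_to_age <- function(rings) {
  rings + 1.5
}

# Ring counts from the first six rows of the dataset as printed in this report.
rings <- c(15, 7, 9, 10, 7, 8)
age_years <- rings_to_age(rings)
print(age_years)
```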
MODELLING
1. Logistic regression is a classification method used to estimate the probability of an event's
success or failure. It applies when the dependent variable is binary (true/false, yes/no). The
response is assumed to follow a binomial distribution, and the logit function is employed as the
link function. A multiple linear regression on Rings was fitted as well (its output appears under
OUTPUTS).
• The model selected for predicting the lifespan/ageing of abalone was logistic regression.
o The number of rings on the abalone is reduced to a binary scale of long life or short life.
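As a rough illustration of the logit link described above, the following sketch fits a binomial GLM to simulated data and converts the coefficients to odds ratios. Everything about the data is an assumption: `diameter`, `long_life` and the coefficients used to simulate them are hypothetical stand-ins, not values from the abalone spreadsheet. Wald intervals from `confint.default` are used here for simplicity, whereas the report's own code calls `confint`.

```r
set.seed(1)
n <- 500
# Hypothetical stand-in data: one shell measurement and a binary long-life flag.
diameter <- runif(n, 0.05, 0.65)
long_life <- rbinom(n, 1, plogis(-4 + 10 * diameter))
sim <- data.frame(diameter, long_life)

# Binomial GLM with a logit link, the same model family as in the report.
fit <- glm(long_life ~ diameter, family = binomial(link = "logit"), data = sim)

# Coefficients are on the log-odds scale; exponentiate to get odds ratios.
odds_ratios <- exp(coef(fit))
# Wald confidence intervals, also exponentiated onto the odds-ratio scale.
ci <- exp(confint.default(fit))
print(round(cbind(OR = odds_ratios, ci), 3))

# Predicted probability of a long life for a new (hypothetical) diameter.
p_new <- predict(fit, newdata = data.frame(diameter = 0.45), type = "response")
print(p_new)
```

An odds ratio above 1 means the odds of the "long life" outcome rise as the predictor increases; very wide intervals usually signal near-perfect separation or collinear predictors.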
> library(readxl)
> abalone <- read_excel("C:/Users/Shivangi Gupta/Desktop/abalone.xlsx")
> View(abalone)
> summary(abalone)
> # W, H, S, D and R are recoded columns stored in the spreadsheet next to the raw measurements
> abalonelogmod1 <- glm(W ~ H + S + D + R, family = binomial(link = "logit"), data = abalone)
> summary(abalonelogmod1)
> # odds ratios with 95% confidence intervals, rounded to 3 decimals
> round(exp(cbind(coef(abalonelogmod1), confint(abalonelogmod1))), 3)
> abalonelogmod2 <- glm(W ~ D, family = binomial(link = "logit"), data = abalone)
> summary(abalonelogmod2)
> set.seed(555)
> ind <- sample(2, nrow(abalone), replace = TRUE,prob = c(0.6, 0.4))
> training <- abalone[ind==1,]
> testing <- abalone[ind==2,]
> library(MASS)
> linear <- lda(W~., training)
> linear
> attributes(linear)
HISTOGRAM
> p <- predict(linear, training)
> ldahist(data = p$x[,1], g = training$W)
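> p1 <- predict(linear, training)$class
> tab <- table(Predicted = p1, Actual = training$W)
> sum(diag(tab))/sum(tab)
> p2 <- predict(linear, testing)$class
> tab1 <- table(Predicted = p2, Actual = testing$W)
> sum(diag(tab1))/sum(tab1)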
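The partition-then-LDA workflow above can be sketched end to end on simulated data. The data frame `sim` and its column means are assumptions chosen only to give two separable groups; the column names `W`, `Length` and `Height` mirror the report but the values are not the abalone data.

```r
library(MASS)  # provides lda(); bundled with standard R installations

set.seed(555)
n <- 600
# Hypothetical two-class data standing in for the abalone measurements.
w <- rbinom(n, 1, 0.8)
sim <- data.frame(
  W = factor(w),
  Length = rnorm(n, mean = ifelse(w == 1, 0.57, 0.33), sd = 0.08),
  Height = rnorm(n, mean = ifelse(w == 1, 0.15, 0.08), sd = 0.03)
)

# Random 60/40 split, following the same pattern as the report.
ind <- sample(2, nrow(sim), replace = TRUE, prob = c(0.6, 0.4))
training <- sim[ind == 1, ]
testing <- sim[ind == 2, ]

linear <- lda(W ~ ., data = training)

# Evaluate on the held-out rows, tabulating predictions against the true W.
p2 <- predict(linear, testing)$class
tab1 <- table(Predicted = p2, Actual = testing$W)
accuracy <- sum(diag(tab1)) / sum(tab1)
print(tab1)
print(accuracy)
```

Reporting accuracy on `testing` rather than `training` gives a fairer picture of how the discriminant function generalises.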
CONCLUSION
The dataset was examined with logistic regression, covering the basics of machine learning as
well as model construction and workflow. The model accuracy was comparatively good. There does
not appear to be much of a difference between males and females, a claim supported by the small
variation in the between-sex means of each of the eight regressors. In addition, because the
accuracy exceeds the "No-Information Rate" (the accuracy that would be attained by assigning
every observation to the most frequent class), the model can be considered a reasonably decent
predictor of abalone sex. To summarize, after accounting for all of the interfering factors, I
was generally pleased with the findings obtained by logistic regression. Given the abundance of
other, more advanced, and generally more effective classification algorithms, I strongly
recommend applying them to the dataset, in the expectation that they will yield higher accuracy.
To measure how good a model is, we fit it on the training data, apply it to the testing data and
compute the error; if the error is small, the model is practical. Our R-squared value for
predicting the age of an abalone was low (0.3607), so the regression explains only about 36% of
the variance in Rings. R-squared alone, whether high or low, does not establish whether a model
fits well. All of our predictor variables were statistically significant, with p-values below
the α of 0.05. For this reason we were still able to draw some conclusions about our variables,
but multiple linear regression may not be the best way to predict the age of an abalone.
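To make the comparison between accuracy and the No-Information Rate concrete, the sketch below recomputes both from the test-set confusion matrix printed in the OUTPUTS section for the discriminant analysis (286, 40, 34, 1288). Only the variable names are introduced here; the counts come from the report.

```r
# Test-set confusion matrix from the report (rows = predicted, columns = actual).
# matrix() fills column by column, so the first column is c(286, 34).
tab1 <- matrix(c(286, 34, 40, 1288), nrow = 2,
               dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1")))

# Accuracy: share of observations on the diagonal.
accuracy <- sum(diag(tab1)) / sum(tab1)

# No-Information Rate: accuracy of always predicting the most frequent actual class.
nir <- max(colSums(tab1)) / sum(tab1)

print(round(c(accuracy = accuracy, no_information_rate = nir), 4))
```

A model is only informative to the extent that its accuracy clears this baseline, which is high here because one class makes up about four fifths of the observations.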
3. CLASSIFICATION
We use four classifiers to classify the data: random forest, decision tree, KNN and SVM, and we
determine which parameters are best for each. We do not use cross-validation to find the optimal
parameters because the target has numerous levels. Instead, we use a simple grid search to find
the optimal parameters for each classifier.
RANDOM FOREST
Random forest is an ensemble learning technique that builds a large number of decision trees
during training. For classification problems it predicts the mode of the trees' class votes, and
for regression tasks it predicts the mean of the trees' outputs. During tree construction it
combines bagging with the random subspace approach. It also comes with a built-in measure of
feature importance.
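The two ingredients named above, bagging and random feature subsets, can be sketched by hand with `rpart` trees. This is a simplified illustration and not the `randomForest` package: it draws one feature subset per tree, whereas a true random forest re-draws candidate features at every split. The data, the feature names `x1` to `x3` and the choice of 25 trees are all assumptions.

```r
library(rpart)  # recommended package bundled with R

set.seed(222)
n <- 400
# Hypothetical data: two informative features and one pure-noise feature.
y <- factor(rbinom(n, 1, 0.5))
sim <- data.frame(
  y = y,
  x1 = rnorm(n, mean = ifelse(y == "1", 1.5, 0)),
  x2 = rnorm(n, mean = ifelse(y == "1", 1.0, 0)),
  x3 = rnorm(n)
)
train <- sim[1:300, ]
test <- sim[301:400, ]
features <- c("x1", "x2", "x3")

n_trees <- 25
votes <- matrix(NA_character_, nrow = nrow(test), ncol = n_trees)
for (b in seq_len(n_trees)) {
  # Bagging: each tree is grown on a bootstrap sample of the training rows.
  rows <- sample(nrow(train), replace = TRUE)
  # Random subspace (per tree in this sketch): a random subset of the features.
  feats <- sample(features, 2)
  tree <- rpart(reformulate(feats, response = "y"),
                data = train[rows, ], method = "class")
  votes[, b] <- as.character(predict(tree, test, type = "class"))
}

# Classification output: the mode (majority vote) across the trees.
pred <- apply(votes, 1, function(v) names(which.max(table(v))))
accuracy <- mean(pred == as.character(test$y))
print(accuracy)
```

Averaging many decorrelated trees is what reduces variance relative to a single tree; for regression the vote would be replaced by the mean of the trees' predictions.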
CODES
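> install.packages("randomForest")
> library(randomForest)
> library(caret)
> datarf <- abalone1
> datarf$D <- as.factor(datarf$D)
> str(datarf)
> # train/test split: the proportions are not shown in the source; 70/30 is an assumption
> set.seed(123)
> ind <- sample(2, nrow(datarf), replace = TRUE, prob = c(0.7, 0.3))
> train <- datarf[ind==1,]
> test <- datarf[ind==2,]
> set.seed(222)
> rf <- randomForest(D ~ W, data = train, ntree = 300, mtry = 8, importance = TRUE, proximity = TRUE)
> print(rf)
> attributes(rf)
> rf$confusion
> p1 <- predict(rf, train)
> head(p1)
> head(train$D)
> confusionMatrix(p1, train$D)
> p2 <- predict(rf, test)
> head(p2)
> head(test$D)
> confusionMatrix(p2, test$D)
> importance(rf)
> varUsed(rf)
> getTree(rf, 1, labelVar = TRUE)
> MDSplot(rf, train$D)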
K-NN
KNN is a supervised learning algorithm that predicts the output for a data point using a labelled
input data set. It is one of the most basic machine learning algorithms and can be applied to a
wide range of problems. It is based on similarity of features: KNN measures how close a data
point is to its neighbours and assigns it to the class most common among the k nearest ones. KNN
is a non-parametric model, which means it makes no assumptions about the distribution of the
data, unlike many algorithms; this helps it cope with realistic data. KNN is also a lazy
algorithm: instead of learning a discriminative function from the training data, it memorises
the training data. Both classification and regression problems can be solved with KNN.
> set.seed(1234)
> set.seed(222)
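A minimal KNN classification run might look like the sketch below, using `knn()` from the `class` package. The simulated features, the 210/90 split and `k = 5` are assumptions for illustration; the report's own KNN settings are not shown.

```r
library(class)  # provides knn(); recommended package bundled with R

set.seed(1234)
n <- 300
# Hypothetical two-class data on two numeric features.
cls <- factor(rep(c("0", "1"), each = n / 2))
x <- data.frame(
  length = rnorm(n, mean = ifelse(cls == "1", 0.57, 0.33), sd = 0.08),
  height = rnorm(n, mean = ifelse(cls == "1", 0.15, 0.08), sd = 0.03)
)

# Hold out 90 of the 300 rows for testing.
idx <- sample(n, size = 210)

# KNN is distance based, so scale the features using training statistics only.
mu <- colMeans(x[idx, ])
sdv <- apply(x[idx, ], 2, sd)
train_x <- scale(x[idx, ], center = mu, scale = sdv)
test_x <- scale(x[-idx, ], center = mu, scale = sdv)

# There is no fitting step: the stored training points are the model.
pred <- knn(train = train_x, test = test_x, cl = cls[idx], k = 5)
tab <- table(Predicted = pred, Actual = cls[-idx])
accuracy <- sum(diag(tab)) / sum(tab)
print(tab)
print(accuracy)
```

Because every prediction searches the stored training set, the "lazy" label refers to the cost being paid at prediction time rather than at training time.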
SVM
Support vector machines (SVMs) are supervised learning models, with associated learning
algorithms, for classification and regression analysis. They are used primarily for
classification problems. Each data item is represented as a point in n-dimensional space (where
n is the number of features), with the value of each feature being the value of a specific
coordinate. The hyperplane that best separates the two classes is then used to classify the data.
In addition to linear classification, SVMs can perform non-linear classification by implicitly
mapping their inputs into high-dimensional feature spaces.
CODES
> library(e1071)
> str(abalone)
> # D is numeric here, so svm() fits an eps-regression rather than a classifier
> mymodel <- svm(D~., data=abalone)
> summary(mymodel)
> # plot.svm only draws classification models, so this call needs a factor response
> plot(mymodel, data=abalone, Height~Length)
> pred <- predict(mymodel, abalone)
> # tab: cross-tabulation of pred against the observed values (definition not included in this listing)
> tab
> 1-sum(diag(tab))/sum(tab)
> mymodel <- svm(Diameter~Length, data=abalone, kernel="polynomial")
> summary(mymodel)
> tab
> 1-sum(diag(tab))/sum(tab)
> mymodel <- svm(Diameter~Length, data=abalone, kernel="sigmoid")
> summary(mymodel)
> set.seed(123)
> tmodel <- tune(svm, Diameter~Length, data=abalone, ranges = list(epsilon = seq(0,1,0.1), cost=2^(2:9)))
> summary(tmodel)
> tab
> 1-sum(diag(tab))/sum(tab)
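The misclassification rates close to 1 reported for these SVM runs are what one would expect when continuous regression predictions are cross-tabulated against the observed values: the table has one row per distinct prediction, so its diagonal carries no meaning. The sketch below reproduces the effect with a plain linear model on simulated data (an `lm` stand-in, not an SVM; all names and values are hypothetical) and shows that turning the scores into classes gives a usable 2 x 2 table. With `e1071`, passing a factor response makes `svm()` fit a classifier in the first place.

```r
set.seed(123)
n <- 200
# Hypothetical data: a 0/1 label stored as numeric, like a numeric D column.
x <- rnorm(n)
d_num <- as.numeric(x + rnorm(n, sd = 0.5) > 0)

# Regression on a numeric 0/1 target yields continuous predictions.
reg_fit <- lm(d_num ~ x)
pred_num <- predict(reg_fit)

# Tabulating raw predictions: one row per distinct value, no meaningful diagonal.
tab_bad <- table(Predicted = pred_num, Actual = d_num)
print(dim(tab_bad))
err_bad <- 1 - sum(diag(tab_bad)) / sum(tab_bad)

# Thresholding the scores gives class labels and a proper 2 x 2 table.
pred_cls <- factor(as.numeric(pred_num > 0.5), levels = c(0, 1))
tab_ok <- table(Predicted = pred_cls, Actual = factor(d_num, levels = c(0, 1)))
err_ok <- 1 - sum(diag(tab_ok)) / sum(tab_ok)

print(c(err_bad = err_bad, err_ok = err_ok))
```

For a genuinely continuous target such as Diameter, an error measure like RMSE is the appropriate summary rather than a confusion table.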
DECISION TREE
In machine learning, a decision tree is a supervised method. It assigns a target value to each
data sample using a binary tree graph (each internal node has two children). The tree leaves
represent the target values. Starting at the root node, the sample is propagated through the
nodes until it reaches a leaf. At each node a decision is made about which child node the sample
moves to, and that decision is based on a single feature of the sample. The process of
discovering the best rules at each internal node according to the chosen metric is known as
decision tree learning.
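The node-by-node routing described above can be seen by fitting and printing a small tree. This sketch uses `rpart` rather than the `party` package used in the report, purely because `rpart` ships with R; the simulated `mydata` frame and its columns are hypothetical.

```r
library(rpart)  # recommended package bundled with R

set.seed(42)
n <- 300
# Hypothetical two-class data with two numeric features.
cls <- factor(rep(c("0", "1"), each = n / 2))
mydata <- data.frame(
  D = cls,
  Length = rnorm(n, mean = ifelse(cls == "1", 0.57, 0.33), sd = 0.08),
  Height = rnorm(n, mean = ifelse(cls == "1", 0.15, 0.08), sd = 0.03)
)

# Each internal node tests a single feature against a threshold.
mytree <- rpart(D ~ Length + Height, data = mydata, method = "class")
print(mytree)

# Route every sample from the root to a leaf and compare with the truth.
pred <- predict(mytree, mydata, type = "class")
tab <- table(Predicted = pred, Actual = mydata$D)
err <- 1 - sum(diag(tab)) / sum(tab)
print(tab)
print(err)
```

The printed tree lists one rule per node, so the path taken by any single sample can be read off directly. The error here is measured on the training data and will understate the error on new samples.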
CODES
> library(party)
> print(mytree)
> plot(mytree, type="simple")
> tab <- table(predict(mytree), mydata$D)
> print(tab)
> 1-sum(diag(tab))/sum(tab)
CONCLUSION
We cross-validated each of the models on the test data before optimising them. Because
cross-validation is a random process, we use pairwise t-tests to see whether there is a
statistically significant difference between the performance of any two tuned classifiers. First,
we run each of the best models through a 10-fold stratified cross-validation procedure (without
repetitions). Second, we use a paired t-test to compare the accuracy of the RF model with that of
the other models, because the RF model is the most accurate. RF outperforms the other models in
terms of F1-score and weighted average recall, followed by KNN, while KNN has a higher precision
score. The results for the other classes are similar, but because of the large number of target
levels we did not print them all. The confusion matrix shows the same picture.
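The paired comparison described above can be sketched with base R's `t.test`. The per-fold accuracies below are invented placeholders, not results from this report; the point is only that both models are scored on the same ten folds, so the test is applied to the fold-wise differences.

```r
# Hypothetical per-fold accuracies from one 10-fold cross-validation,
# with both models evaluated on the same folds (values are made up).
acc_rf  <- c(0.91, 0.92, 0.90, 0.93, 0.91, 0.92, 0.90, 0.91, 0.93, 0.92)
acc_knn <- c(0.89, 0.90, 0.89, 0.91, 0.90, 0.90, 0.88, 0.90, 0.91, 0.90)

# Paired t-test: tests whether the mean fold-wise difference is zero.
tt <- t.test(acc_rf, acc_knn, paired = TRUE)
print(tt)

mean_diff <- mean(acc_rf - acc_knn)
significant <- tt$p.value < 0.05
print(c(mean_difference = mean_diff, significant = significant))
```

Pairing matters because fold difficulty varies: comparing the two models fold by fold removes that shared variation from the test.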
OUTPUTS
MULTILINEAR AND LINEAR REGRESSION
##   Sex Length Diameter Height Whole.weight Shucked.weight Viscera.weight
## 1   M  0.455    0.365  0.095       0.5140         0.2245         0.1010
## 2   M  0.350    0.265  0.090       0.2255         0.0995         0.0485
## 3   F  0.530    0.420  0.135       0.6770         0.2565         0.1415
## 4   M  0.440    0.365  0.125       0.5160         0.2155         0.1140
## 5   I  0.330    0.255  0.080       0.2050         0.0895         0.0395
## 6   I  0.425    0.300  0.095       0.3515         0.1410         0.0775
##   Shell.weight Rings
## 1        0.150    15
## 2        0.070     7
## 3        0.210     9
## 4        0.155    10
## 5        0.055     7
## 6        0.120     8
##      Sex               Length          Diameter          Height
##  Length:4177        Min.   :0.075   Min.   :0.0550   Min.   :0.0000
##  Class :character   1st Qu.:0.450   1st Qu.:0.3500   1st Qu.:0.1150
##  Mode  :character   Median :0.545   Median :0.4250   Median :0.1400
##                     Mean   :0.524   Mean   :0.4079   Mean   :0.1395
##                     3rd Qu.:0.615   3rd Qu.:0.4800   3rd Qu.:0.1650
##                     Max.   :0.815   Max.   :0.6500   Max.   :1.1300
##   Whole.weight    Shucked.weight   Viscera.weight    Shell.weight
##  Min.   :0.0020   Min.   :0.0010   Min.   :0.0005   Min.   :0.0015
#The pairwise plot indicates a lack of linearity in the dependent variable column (Rings). Also,
the other columns show multi-collinearity.
##                   Length  Diameter    Height Whole.weight Shucked.weight
## Length         1.0000000 0.9868116 0.8275536    0.9252612      0.8979137
## Diameter       0.9868116 1.0000000 0.8336837    0.9254521      0.8931625
## Height         0.8275536 0.8336837 1.0000000    0.8192208      0.7749723
## Whole.weight   0.9252612 0.9254521 0.8192208    1.0000000      0.9694055
## Shucked.weight 0.8979137 0.8931625 0.7749723    0.9694055      1.0000000
## Viscera.weight 0.9030177 0.8997244 0.7983193    0.9663751      0.9319613
## Shell.weight   0.8977056 0.9053298 0.8173380    0.9553554      0.8826171
## Rings          0.5567196 0.5746599 0.5574673    0.5403897      0.4208837
##                Viscera.weight Shell.weight     Rings
## Length              0.9030177    0.8977056 0.5567196
#An upward inclination is observed and hence the model is not linear and contains outliers.
shapiro.test(abalone$Rings)
##
## Shapiro-Wilk normality test
##
## data: abalone$Rings
summary(regfinal)
##
## Call:
## lm(formula = Rings ~ ., data = train.norm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.2569 -1.2994 -0.3121 0.8494 14.2412
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
     0    1
0  129  643
1 1399 2006
Call:
glm(formula = W ~ S, family = binomial(link = "logit"), data = abalone)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2235 0.4200 0.4200 0.7457 0.7457
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.38370 0.09201 25.91 <2e-16 ***
S -1.24595 0.10257 -12.15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
0 771 1
1 345 3060
> abalonelogmod2<-glm(abalone$W ~ abalone$D, family=binomial(link="logit"), data=abalone)
> summary(abalonelogmod2)
Call:
glm(formula = abalone$W ~ abalone$D, family = binomial(link =
"logit"), data = abalone)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.0066 0.0256 0.0256 0.0256 1.5323
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.80414 0.06477 -12.415 <2e-16 ***
abalone$D 8.83031 1.00207 8.812 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
, , = 0, = 0
0 1
0 129 642
1 0 0
, , = 1, = 0
0 1
0 0 0
1 81 264
, , = 0, = 1
0 1
0 0 1
1 0 0
, , = 1, = 1
0 1
0 0 0
1 1318 1742
> library(readxl)
> abalone <- read_excel("C:/Users/Shivangi Gupta/Desktop/abalone.xlsx")
> View(abalone)
> abalonelogmod1<-glm(W ~ Sex + Length + Diameter + Rings, family=binomial (link="logit"),
data=abalone)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> abalonelogmod1<-glm(W ~ H + S + D + R, family=binomial (link="logit"), data=abalone)
> summary(abalonelogmod1)
Call:
glm(formula = W ~ H + S + D + R, family = binomial(link =
"logit"), data = abalone)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.9568 0.0130 0.0146 0.0282 2.2501
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2090 0.2217 -9.964 < 2e-16 ***
H 3.3347 0.1779 18.741 < 2e-16 ***
S -0.2395 0.2222 -1.078 0.281
D 6.9416 1.0064 6.897 5.30e-12 ***
R 1.3141 0.3195 4.113 3.91e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2.5 % 97.5 %
(Intercept) 0.110 0.070 0.168
H 28.070 19.930 40.056
S 0.787 0.510 1.218
D 1034.393 229.832 18253.516
R 3.721 2.016 7.059
DATA PARTITION
> set.seed(555)
> ind <- sample(2, nrow(abalone), replace = TRUE,prob = c(0.6, 0.4))
> training <- abalone[ind==1,]
> testing <- abalone[ind==2,]
> library(MASS)
> linear <- lda(W~., training)
Warning message:
In lda.default(x, grouping, ) : variables are collinear
> linear
Call:
Prior probabilities of groups:
        0         1
0.1787268 0.8212732
Group means:
          S      SexI      SexM    Length         D  Diameter         H     Height
0 0.8628319 0.7986726 0.1371681 0.3333407 0.0000000 0.2503319 0.1283186 0.08277655
1 0.5917188 0.2161772 0.4082812 0.5666827 0.8955224 0.4430380 0.9740010 0.15133606
  `Whole weight` `Shucked weight` `Viscera weight` `Shell weight`          R
0      0.1962909       0.08656305        0.0428219     0.05803982 0.03539823
1      0.9655279       0.42000289        0.2107434     0.27752792 0.41550313
      Rings
0  6.597345
1 10.630236
$names
[1] "prior" "counts" "means" "scaling" "lev" "svd" "N" "call"
[9] "terms" "xlevels"
$class
[1] "lda"
HISTOGRAM
> p <- predict(linear, training)
> ldahist(data = p$x[,1], g = training$W)
> library(devtools)
Actual
Predicted 0 1
0 394 48
1 58 2029
> sum(diag(tab))/sum(tab)
[1] 0.9580862
> p2 <- predict(linear, testing)$class
> tab1 <- table(Predicted = p2, Actual = testing$W)
Actual
Predicted 0 1
0 286 40
1 34 1288
> sum(diag(tab1))/sum(tab1)
[1] 0.9550971
Thus, Linear Discriminant Analysis has produced robust, decent,
and interpretable classification results, classifying abalone
shells on the basis of their gender, which was not possible at
first glance. The continuous independent variables help in
determining the classifying variable, that is, gender.
RANDOM FOREST
> datarf <- abalone1
> str(datarf)
$ Diameter : num [1:4177] 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
$ Height : num [1:4177] 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
$ Whole weight : num [1:4177] 0.514 0.226 0.677 0.516 0.205 ...
$ Shucked weight: num [1:4177] 0.2245 0.0995 0.2565 0.2155 0.0895 ...
$ Viscera weight: num [1:4177] 0.101 0.0485 0.1415 0.114 0.0395 ...
$ Shell weight : num [1:4177] 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
- attr(*, "spec")=
.. cols(
.. S = col_double(),
.. Sex = col_character(),
.. Length = col_double(),
.. D = col_double(),
.. Diameter = col_double(),
.. H = col_double(),
.. Height = col_double(),
.. R = col_double(),
.. Rings = col_double(),
.. W = col_double()
.. )
- attr(*, "problems")=<externalptr>
>
> table(datarf$D)
0 1
1116 3061
> set.seed(123)
>
> library(randomForest)
> install.packages("randomForest")
> set.seed(222)
> print(rf)
Call:
randomForest(formula = D ~ W, data = train, ntree = 300, mtry = 8, importance =
TRUE, proximity = TRUE)
Confusion matrix:
    0    1  class.error
0 548  232 0.2974358974
1   1 2137 0.0004677268
> attributes(rf)
$names
$class
> rf$confusion
0 1 class.error
0 548 232 0.2974358974
1 1 2137 0.0004677268
> library(caret)
> head(p1)
1 2 3 4 5 6
1 1 1 1 1 1
Levels: 0 1
> head(train$D)
[1] 1 1 0 1 1 1
Levels: 0 1
Reference
Prediction 0 1
0 548 1
1 232 2137
Accuracy : 0.9202
Kappa : 0.775
Sensitivity : 0.7026
Neg Pred Value : 0.9021
Prevalence : 0.2673
'Positive' Class : 0
> head(p2)
1 2 3 4 5 6
0 1 0 1 1 1
Levels: 0 1
> head(test$D)
[1] 0 1 0 1 1 1
Levels: 0 1
Reference
Prediction 0 1
0 223 0
1 113 923
Accuracy : 0.9102
Kappa : 0.7432
Neg Pred Value : 0.8909
Prevalence : 0.2669
'Positive' Class : 0
> plot(rf)
> hist(treesize(rf), main = "No. of nodes for the Trees", col = "green")
> varImpPlot(rf)
varImpPlot(rf, sort=T, n.var=10, main="Top 10 - Variable Importance")
> importance(rf)
0 1 MeanDecreaseAccuracy MeanDecreaseGini
> varUsed(rf)
[1] 300
> partialPlot(rf, train, Height, "1")
> getTree(rf, 1, labelVar = TRUE)
left daughter right daughter split var split point status prediction
1 2 3 W 0.5 1 <NA>
2 0 0 <NA> 0.0 -1 0
3 0 0 <NA> 0.0 -1 1
KNN
> data <- read.csv(file.choose(), header = T)
> set.seed(1234)
> set.seed(222)
SVM
> View(abalone)
> data(abalone)
> data(Abalone)
> str(abalone)
$ Length : num [1:4177] 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
$ Diameter : num [1:4177] 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
$ Height : num [1:4177] 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
$ Whole weight : num [1:4177] 0.514 0.226 0.677 0.516 0.205 ...
$ Shucked weight: num [1:4177] 0.2245 0.0995 0.2565 0.2155 0.0895 ...
$ Viscera weight: num [1:4177] 0.101 0.0485 0.1415 0.114 0.0395 ...
$ Shell weight : num [1:4177] 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
> library(ggplot2)
> library(e1071)
> mymodel <- svm(D~., data=abalone)
> summary(mymodel)
Call:
Parameters:
SVM-Type: eps-regression
SVM-Kernel: radial
cost: 1
gamma: 0.06666667
epsilon: 0.1
> library(predtoolsTS)
Warning message:
> tab
[Predicted-versus-actual contingency table, garbled in extraction and omitted. Rows are the distinct continuous SVM predictions (about -0.428 to -0.079); columns are the observed values (0, 0.01, 0.015, ...). Almost every cell is 0.]
> 1-sum(diag(tab))/sum(tab)
[1] 0.9997606
> summary(mymodel)
Call:
Parameters:
SVM-Type: eps-regression
SVM-Kernel: linear
cost: 1
gamma: 1
epsilon: 0.1
> tab
[Predicted-versus-actual contingency table, garbled in extraction and omitted. Rows are the distinct continuous SVM predictions (about 0.042 to 0.156); columns are the observed values (0, 0.01, 0.015, ...). Most cells are 0, with a few counts of 1 to 3 in the lowest columns.]
> 1-sum(diag(tab))/sum(tab)
[1] 0.9997606
> mymodel <- svm(Diameter~Length, data=abalone, kernel="polynomial")
> summary(mymodel)
Call:
Parameters:
SVM-Type: eps-regression
SVM-Kernel: polynomial
cost: 1
degree: 3
gamma: 1
coef.0: 0
epsilon: 0.1
> library(predtoolsTS)
> tab
[Predicted-versus-actual contingency table, garbled in extraction and omitted. Rows are the distinct continuous SVM predictions (about -1.060 to -0.056); columns are the observed values (0, 0.01, 0.015, ...). Most cells are 0, with a few counts of 1 to 3 in the lowest columns.]
SVM-Kernel: sigmoid
cost: 1
gamma: 1
coef.0: 0
epsilon: 0.1
>
> tab
[Predicted-versus-actual contingency table, garbled in extraction and omitted. Rows are the distinct continuous SVM predictions (from about -64.8 upward); columns are the observed values (0, 0.01, 0.015, ...). Every visible cell is 0.]