Knowledge-Based Systems 276 (2023) 110627
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/knosys

Robust twin depth support vector machine based on average depth

Jiamin Xu (a), Huamin Wang (a,*), Libo Zhang (a), Shiping Wen (b)
(a) College of Artificial Intelligence, Southwest University, Chongqing 400715, PR China
(b) Australian AI Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney 2007, Australia
* Corresponding author.

ARTICLE INFO
Article history: Received 27 July 2022; Received in revised form 19 April 2023; Accepted 5 May 2023; Available online 15 May 2023.
Keywords: Twin support vector machines; Average depth.

ABSTRACT
As one of the classical machine learning algorithms, the twin support vector machine (TWSVM) constructs two nonparallel hyperplanes, each of which is kept close to the points of one class and far away from those of the other class. Unlike the classical SVM, TWSVM reduces the computational complexity by replacing one large quadratic programming problem (QPP) with a pair of smaller ones. Although TWSVM is about four times faster than the SVM classifier, it is still insufficient in resisting noise or outliers. In this paper, we propose a twin depth support vector machine (TDSVM), a binary SVM classifier that considers the influence of depth when calculating distances. A novel average depth is proposed and applied to TWSVM to construct a robust SVM framework, which can identify outliers in the dataset. By strengthening the center and weakening the edge, better generalization performance is achieved, and the SRM principle is also implemented. TDSVM can be applied wherever TWSVM can be applied, and it is useful for reducing the influence of outliers or noise in the data and for obtaining more robust results. Finally, the advantages of the method are verified theoretically and empirically. Experimental results on eight UCI datasets and one synthetic dataset demonstrate the effectiveness and robustness of TDSVM: its classification accuracy is better than that of the compared algorithms on almost every dataset, whether or not a certain percentage of Gaussian noise is introduced.
© 2023 Elsevier B.V. All rights reserved.

1. Introduction

With the rapid development of big data and the application of classification and regression algorithms in the real world, such as intelligent diagnosis [1], gene data classification [2], face recognition [3], and epidemiological forecasting [4], noise or outliers are almost inevitably contained in the collected data, and their influence on model performance is worth further study. Noise or outliers in a dataset directly affect the generalization performance and classification accuracy of a model. Therefore, how to deal with datasets containing noise or outliers, and how to design models with strong robustness and high accuracy, are particularly important.

The support vector machine (SVM) is a powerful algorithm for classification and regression [5,6]. It implements the structural risk minimization (SRM) principle while simultaneously minimizing the generalization error, which improves the generalization ability of the model without being limited by the data dimension [7-9]. However, to determine an optimal hyperplane in the classical SVM, the two half-planes are required to be strictly parallel,
which greatly limits its application to heterogeneous distribution learning problems. To overcome these challenges, the twin support vector machine (TWSVM) and the generalized eigenvalue proximal support vector machine (GEPSVM) have been proposed [10,11]. GEPSVM, proposed by Mangasarian and Wild, assigns points by constructing a pair of nonparallel hyperplanes, so that each point is closer to the hyperplane of its corresponding class and as far from the other hyperplane as possible. After two related generalized eigenvalue problems are solved, the eigenvector corresponding to the smallest eigenvalue gives the parameters of each hyperplane. Although TWSVM also assigns points by constructing two nonparallel hyperplanes, it solves for the hyperplane parameters in a completely different way from GEPSVM, namely by solving two QPPs. This reduces the computational cost roughly by a factor of four compared with SVM and improves the accuracy. So far, applications in unbalanced data classification [12,13], image classification [14,15], regression [16], data clustering [17,18] and other fields have demonstrated the good performance of TWSVM. However, TWSVM only realizes empirical risk minimization instead of the SRM principle. If the sample size is very small, the overfitting problem can be serious, and the effect of empirical risk minimization learning may not be satisfactory. Based on this, Shao et al. proposed the twin bounded support vector machine (TBSVM), which realizes the SRM principle on the basis of TWSVM [19]. The overfitting problem was addressed and the performance of the model was further improved [20]. Kumar and Gopal [21] formulated the least squares twin support vector machine (LSTSVM), which replaces the inequality constraints with equality constraints and reduces the computational time.

In addition, over the past decade, many variants of TWSVM have been proposed. The main improvements focus on enhancing the performance of classifiers on specific datasets, strengthening the resistance against noise or outliers, and requiring as little computation time as possible. Gao et al. [22] formulated the 1-norm LS-TSVM (NLSTSVM), in which pertinent features can be selected automatically so that the model performs better on high-dimensional datasets. To address the class imbalance problem in multicategory classification, Nasiri et al. [23] proposed the energy-based LS-TSVM (ELS-TSVM); the difference between LS-TSVM and ELS-TSVM is that the constraints of LS-TSVM are replaced with an energy model in ELS-TSVM. The hybrid feature selection based weighted LS-TSVM (HFS-based WLS-TSVM) method [24] further enhances the classification accuracy on unbalanced datasets. To make the model perform better when noise or outliers exist, Li et al. [25] proposed a robust SVM algorithm in which privileged information is introduced to compress the influence of noise. Density-based weighting multi-surface least squares classification (DWLSC) [26] assigns a weight to each data point based on the density of the sample points; correlation information between sample points is exploited, and the influence of noise or outliers is indirectly weakened. Xu et al. [27] formulated a structural least squares twin support vector machine (S-LSTSVM), which incorporates some vital data distribution information into the formulation of LS-TSVM.
It performs better than LS-TSVM in terms of noise sensitivity and has higher computational efficiency than other models based on structural information. Inspired by the performance gain obtained after implementing structural risk minimization in TBSVM, the improved LS-TSVM (ILS-TSVM) [28] implements structural risk minimization on the basis of LS-TSVM. Moreover, the robust ELS-TSVM (RELS-TSVM) proposed by Tanveer et al. [29] combines the advantages of ILS-TSVM and ELS-TSVM, and the robustness of the model is further improved. To further improve the performance of LSTSVM on large-scale datasets, Tanveer et al. [30] proposed a large-scale least squares twin support vector machine (LS-LSTSVM), in which the computation time is reduced by using Lagrangian functions to avoid computing inverse matrices. Recently, Yuan and Yang proposed a robust algorithm referred to as the capped L2,p-norm metric based robust least squares twin support vector machine (CL2,p-LSTSVM) [31], which replaces the L2-norm with the more robust capped L2,p-norm in the formulation to address the impact of outliers or noise. As a result, CL2,p-LSTSVM is superior to other existing variants of TWSVM in terms of robustness.

However, most of these variants of TWSVM view all sample points as being of equal importance and ignore the information provided by the relative position of the sample points. Yet measuring the relative position between sample points is very meaningful in pattern recognition [32]. Generally speaking, a sample point in the central region is rarely an outlier. This motivates us to construct a new classifier which can make full use of the samples in the central regions. Data depth is an important statistic which measures the position of a point relative to the center or to the other data points [33]. Although there have been many results about this concept in the statistical literature [34,35], its application to pattern recognition is relatively limited and is still an open problem. As is well known, the computation of many types of depth, such as the Tukey depth, is very demanding even in low-dimensional spaces [36]. In order to calculate a kind of "average" depth, Fraiman and Muniz designed an algorithm that calculates the univariate depths of the function values and integrates them over a specific interval [37]. Inspired by this method, we calculate the univariate depths of each feature on a particular dataset and then average over all features, which yields a novel "average" depth for each data point. This method is not as accurate as multivariate depth but has a great advantage in computational complexity.

Considering the influence of outliers or noise on the construction of the hyperplanes, we propose an improved algorithm called the twin depth support vector machine (TDSVM), based on the concept of depth in the field of robust statistics. Instead of using the multivariate Tukey depth, which is computationally demanding [38], we apply the average depth in TDSVM. Different from other variants of TWSVM, the TDSVM of this paper takes the influence of the depth into account when calculating the distance from each point to the corresponding hyperplane. Outliers are far from the best hyperplane, so they easily pull the constructed hyperplane away from the best one when they exist. This problem can be solved very well by introducing the average depth.
The TDSVM method takes full advantage of each point in the central regions by assigning it a relatively large weight, and this weight is set based on its average depth. On the one hand, small weights are assigned to points whose depth value is small, so the influence of outliers or noise on the construction of the hyperplanes is weakened. On the other hand, large weights are assigned to points whose depth value is large, especially in the central region, which enhances the aggregation of the classes and resists the disturbance. Furthermore, the two small QPPs are solved by utilizing the Lagrange multiplier method and the Karush-Kuhn-Tucker (KKT) conditions. The decision function for prediction, based on the average depth, is determined by the two nonparallel hyperplanes, and the classification model is thus obtained. Finally, the advantages of the proposed method are verified by experiments on eight UCI datasets and one synthetic dataset, which demonstrate the effectiveness and robustness of TDSVM.

In this article, our main contributions are summarized as follows:
(1) A novel average depth is proposed to measure the relative position of any point and to identify outliers in the dataset. The significant advantage of this depth is its computational simplicity.
(2) To construct a robust SVM framework, the average depth is innovatively applied to TWSVM to obtain TDSVM. Furthermore, in order to reduce the overfitting problem when the sample size is very small, a regularization term is added to the decision function of the proposed TDSVM. Experimental results show that the proposed method achieves better performance than the other algorithms in most cases.
(3) The proposed method is extended from linear classifiers to nonlinear kernel classifiers, so that both linear and nonlinear problems can be solved.
(4) On both the synthetic dataset and the eight UCI datasets, the proposed method has advantages over the other algorithms. For example, TDSVM can significantly reduce the impact of noise or outliers. Its generalization ability and robustness therefore make it a more competitive supervised algorithm.

The rest of this paper is organized as follows. Section 2 briefly dwells on TWSVM and TBSVM. Section 3 is composed of the detailed formulation and theoretical analysis of our proposed TDSVM. Section 4 reports the experimental results, and concluding remarks are given in Section 5.

2. Preliminaries

In this section, we briefly review TWSVM and TBSVM for classification problems. For more details, please refer to [10,19].

2.1. Twin support vector machine

Since the classical SVM requires a lot of training time to solve one large QPP, TWSVM was proposed on the basis of the classical SVM [7], achieving comparable accuracy with less training time. For a binary classification problem, the training data matrices $A\in\mathbb{R}^{m_1\times n}$ and $B\in\mathbb{R}^{m_2\times n}$ collect the points of class $+1$ and class $-1$, respectively; thus the numbers of training points of class $+1$ and class $-1$ are $m_1$ and $m_2$, and the number of features is $n$. We try to learn a function that predicts the class of a new, unlabeled point. Instead of determining one hyperplane by solving a single QPP as in SVM, TWSVM determines two nonparallel hyperplanes by solving the following pair of QPPs:

(TWSVM1)
$$\min_{w^{(1)},b^{(1)},q}\ \frac{1}{2}\left(Aw^{(1)}+e_1b^{(1)}\right)^T\left(Aw^{(1)}+e_1b^{(1)}\right)+c_1e_2^Tq \tag{1}$$
$$\text{s.t.}\quad -\left(Bw^{(1)}+e_2b^{(1)}\right)+q\ge e_2,\quad q\ge 0, \tag{2}$$

(TWSVM2)
$$\min_{w^{(2)},b^{(2)},q}\ \frac{1}{2}\left(Bw^{(2)}+e_2b^{(2)}\right)^T\left(Bw^{(2)}+e_2b^{(2)}\right)+c_2e_1^Tq \tag{3}$$
$$\text{s.t.}\quad \left(Aw^{(2)}+e_1b^{(2)}\right)+q\ge e_1,\quad q\ge 0, \tag{4}$$
where $e_1$ and $e_2$ are vectors of ones of appropriate dimensions, $c_1$ and $c_2$ are the penalty parameters, and $q$ is the slack variable.

The dual problem of the primal problem TWSVM1 can be obtained from duality theory. It can be expressed as
$$\max_{\alpha}\ e_2^T\alpha-\frac{1}{2}\alpha^TG\left(H^TH\right)^{-1}G^T\alpha\qquad\text{s.t.}\quad 0\le\alpha\le c_1, \tag{5}$$
and the dual problem of the primal problem TWSVM2 can be expressed as
$$\max_{\gamma}\ e_1^T\gamma-\frac{1}{2}\gamma^TH\left(G^TG\right)^{-1}H^T\gamma\qquad\text{s.t.}\quad 0\le\gamma\le c_2, \tag{6}$$
where $H=[A\ \ e_1]$, $G=[B\ \ e_2]$, and $\alpha$, $\gamma$ are the vectors of Lagrange multipliers. The TWSVM classifier reduces the computational complexity by replacing one large QPP with a pair of smaller QPPs, and is therefore about four times faster than the SVM classifier. Defining the augmented vectors $u_1=[w^{(1)T},b^{(1)}]^T$ and $u_2=[w^{(2)T},b^{(2)}]^T$, we have
$$u_1=-\left(H^TH\right)^{-1}G^T\alpha,\qquad u_2=\left(G^TG\right)^{-1}H^T\gamma.$$
A new point is assigned to the class whose hyperplane is closer to it. The decision function is described by
$$f(x)=\arg\min_{i=1,2}\ \frac{\left|x^Tw^{(i)}+b^{(i)}\right|}{\left\|w^{(i)}\right\|_2}, \tag{7}$$
where $|\cdot|$ denotes the absolute value and $\|\cdot\|_2$ denotes the $\ell_2$ norm of a vector. One disadvantage of TWSVM is that it only attains empirical risk minimization and does not realize the SRM principle. If the sample size is very small, the overfitting problem can be serious and the performance of the algorithm can be damaged.

Remark 1. Nonlinear classifiers can be obtained by introducing a kernel function. More details about the kernel-generated TWSVM can be found in Ref. [10].

2.2. Twin bounded support vector machine

In general, a regularization term helps to alleviate the overfitting problem. To improve the performance of the TWSVM classifier, the twin bounded support vector machine (TBSVM) attains the SRM principle by introducing a regularization term. Problems (1)-(4) can be rewritten as

(TBSVM1)
$$\min_{w^{(1)},b^{(1)},q}\ \frac{1}{2}\left(Aw^{(1)}+e_1b^{(1)}\right)^T\left(Aw^{(1)}+e_1b^{(1)}\right)+c_1e_2^Tq+\frac{1}{2}c_3\left(\left\|w^{(1)}\right\|_2^2+\left(b^{(1)}\right)^2\right) \tag{8}$$
$$\text{s.t.}\quad -\left(Bw^{(1)}+e_2b^{(1)}\right)+q\ge e_2,\quad q\ge 0, \tag{9}$$

(TBSVM2)
$$\min_{w^{(2)},b^{(2)},q}\ \frac{1}{2}\left(Bw^{(2)}+e_2b^{(2)}\right)^T\left(Bw^{(2)}+e_2b^{(2)}\right)+c_2e_1^Tq+\frac{1}{2}c_4\left(\left\|w^{(2)}\right\|_2^2+\left(b^{(2)}\right)^2\right) \tag{10}$$
$$\text{s.t.}\quad \left(Aw^{(2)}+e_1b^{(2)}\right)+q\ge e_1,\quad q\ge 0, \tag{11}$$

where $c_3$ and $c_4$ are the penalty parameters of the regularization terms. Similarly to (5) and (6), the dual problems of the primal problems TBSVM1 and TBSVM2 can be expressed as
$$\max_{\alpha}\ e_2^T\alpha-\frac{1}{2}\alpha^TG\left(H^TH+c_3I\right)^{-1}G^T\alpha\qquad\text{s.t.}\quad 0\le\alpha\le c_1, \tag{12}$$
and
$$\max_{\gamma}\ e_1^T\gamma-\frac{1}{2}\gamma^TH\left(G^TG+c_4I\right)^{-1}H^T\gamma\qquad\text{s.t.}\quad 0\le\gamma\le c_2, \tag{13}$$
where $H=[A\ \ e_1]$, $G=[B\ \ e_2]$, and $\alpha$, $\gamma$ are the vectors of Lagrange multipliers. Once the augmented vectors $u_1$ and $u_2$ are known, the two nonparallel hyperplanes are obtained, and a new data point is assigned to the class whose hyperplane is closer. The decision function is the same as (7).

Remark 2. Nonlinear classifiers generated by a kernel can also be obtained. More details about TBSVM can be found in Ref. [19].
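The two duals (12)-(13) are small box-constrained quadratic programs, so any QP routine can solve them. The sketch below is a minimal illustration of our own (not the authors' code), assuming NumPy and SciPy; the helper names `solve_box_qp`, `tbsvm_train` and `tbsvm_predict`, and the choice of L-BFGS-B as the solver over the box constraints, are our assumptions.

```python
# Minimal sketch of TBSVM training, Eqs. (8)-(13): solve the two box-constrained
# dual QPs and recover the augmented vectors u1 = [w1; b1], u2 = [w2; b2].
# Illustrative only; the solver choice (L-BFGS-B) is an assumption, not the paper's.
import numpy as np
from scipy.optimize import minimize


def solve_box_qp(Q, c):
    """Solve max_a e^T a - 0.5 a^T Q a  subject to  0 <= a <= c."""
    obj = lambda a: 0.5 * a @ Q @ a - a.sum()
    grad = lambda a: Q @ a - np.ones_like(a)
    a0 = np.full(Q.shape[0], 0.5 * c)
    res = minimize(obj, a0, jac=grad, method="L-BFGS-B",
                   bounds=[(0.0, c)] * Q.shape[0])
    return res.x


def tbsvm_train(A, B, c1=1.0, c2=1.0, c3=1e-3, c4=1e-3):
    m1, m2 = A.shape[0], B.shape[0]
    H = np.hstack([A, np.ones((m1, 1))])     # H = [A  e1]
    G = np.hstack([B, np.ones((m2, 1))])     # G = [B  e2]
    k = H.shape[1]

    M1 = np.linalg.inv(H.T @ H + c3 * np.eye(k))
    alpha = solve_box_qp(G @ M1 @ G.T, c1)   # dual (12)
    u1 = -M1 @ G.T @ alpha                   # u1 = [w1; b1]

    M2 = np.linalg.inv(G.T @ G + c4 * np.eye(k))
    gamma = solve_box_qp(H @ M2 @ H.T, c2)   # dual (13)
    u2 = M2 @ H.T @ gamma                    # u2 = [w2; b2]
    return u1, u2


def tbsvm_predict(X, u1, u2):
    # Decision rule (7): assign each row of X to the class of the closer hyperplane.
    w1, b1 = u1[:-1], u1[-1]
    w2, b2 = u2[:-1], u2[-1]
    d1 = np.abs(X @ w1 + b1) / np.linalg.norm(w1)
    d2 = np.abs(X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(d1 <= d2, 1, -1)
```

Setting c3 = c4 = 0 recovers the TWSVM duals (5)-(6), although in practice the small regularizer of TBSVM also keeps the matrix inverses well conditioned.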
3. Twin depth support vector machine based on average depth

Based on TWSVM and TBSVM, the average depth is introduced in this section to construct a novel model, TDSVM, which includes a linear classifier and a nonlinear kernel classifier.

3.1. Average depth

For a set of data $x=(x_1,x_2,\ldots,x_i,\ldots,x_m)$, $x_i\in\mathbb{R}$, sort the values in ascending order to obtain $x_{(1)}\le x_{(2)}\le\cdots\le x_{(m)}$, where $x_{(i)}$ is the $i$th smallest value, so the ascending rank of $x_{(i)}$ is $i$. For any $x_i$, its ascending rank is denoted by $R_{x_i}$. Correspondingly, if the data are sorted in descending order, the descending rank of $x_i$ is $m+1-R_{x_i}$. Since we usually pay the same attention to the upper and the lower boundary of the dataset, the position of a data point can be measured by its depth value, a statistic that summarizes the ascending and descending ranks. The depth of $x_i$ in the dataset $x$ is defined as the minimum of the two ranks:
$$D_{x_i,x}=\min\left(R_{x_i},\ m+1-R_{x_i}\right). \tag{14}$$

For a data point $X_i$ in a matrix $X\in\mathbb{R}^{m\times n}$ with $m$ data points and $n$ features, the average depth is defined as
$$D_{X_i,X}=\frac{1}{n}\sum_{j=1}^{n}D_{X_{ij},\,X_{\cdot j}}, \tag{15}$$
where $X_{\cdot j}$ denotes the $j$th feature (column). Formula (15) computes the univariate depth (14) of each feature and averages over all features; it replaces the multivariate Tukey depth, which is more accurate but much more computationally demanding.

An example of computing the average depth is shown in Table 1. The points A, B, C and D, E, F are drawn from two randomly generated samples, each of size 250. For point A, the ascending rank of its first feature is $R_{A_1}=20$; since $m=250$, the depth of A in the first dimension computed by (14) is $D_{A_1}=\min(20,\,250+1-20)=\min(20,231)=20$. The depths of B, ..., F are obtained in the same manner. Finally, the average depth of A is obtained as the mean of its univariate depths over all dimensions.

[Table 1: Example of computing the average depth (feature values, ascending ranks, univariate depths, and average depth of the points A-F).]

[Fig. 1: Average depth of points.]

Remark 3. As shown in Fig. 1, if a data point is near the center of the dataset in each dimension, its average depth is large. On the contrary, if a data point is far away from the center in each dimension, its average depth is small. Noise or outliers are therefore penalized by this type of average depth.
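To make (14)-(15) concrete, the following sketch is our own NumPy illustration of the average depth; the function name `average_depth` and the tie-handling are our choices, not the paper's.

```python
# Minimal sketch of the average depth in Eqs. (14)-(15), assuming numpy.
# For each feature (column), the depth of a value is the minimum of its
# ascending rank and its descending rank; the average depth of a point is
# the mean of these univariate depths over all features.
import numpy as np

def average_depth(X):
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    # Ascending ranks per column (1-based); tied values receive distinct ranks.
    asc_rank = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    desc_rank = m + 1 - asc_rank
    univariate_depth = np.minimum(asc_rank, desc_rank)   # Eq. (14), per feature
    return univariate_depth.mean(axis=1)                 # Eq. (15)

# Example: central points receive a larger depth than boundary points/outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 2))
d = average_depth(X)
print(d.argmax(), d.argmin())   # index of the most central / most outlying point
```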
3.2. Linear TDSVM

Both TWSVM and TBSVM construct two hyperplanes that fit the sample points of the corresponding class. However, they view all sample points as being of equal importance, which weakens the robustness. To enhance the robustness, each hyperplane should best fit the points in the central region of its class, whose depth is relatively large. Therefore, based on the idea of TWSVM and TBSVM, we construct average depths for each of the hyperplane pair and propose a novel method: the twin depth support vector machine (TDSVM). TDSVM utilizes an adjusted depth ratio as a weight to modify the distance from each sample point to the corresponding hyperplane. Then, like many variants of TWSVM, it also generates two nonparallel proximal hyperplanes, each of which is nearer to one class and far away from the other class. To obtain the TDSVM classifier, the following pair of QPPs must be solved:

(TDSVM1)
$$\min_{w^{(1)},b^{(1)},q}\ \frac{1}{2}\left[d_A\circ\left(Aw^{(1)}+e_1b^{(1)}\right)\right]^T\left[d_A\circ\left(Aw^{(1)}+e_1b^{(1)}\right)\right]+c_1e_2^Tq+\frac{1}{2}c_3\left(\left\|w^{(1)}\right\|_2^2+\left(b^{(1)}\right)^2\right) \tag{16}$$
$$\text{s.t.}\quad -\left(Bw^{(1)}+e_2b^{(1)}\right)+q\ge e_2,\quad q\ge 0, \tag{17}$$
$$d_{A_i}=\begin{cases}\dfrac{c_5\,D_{A_i,A}}{m_1}, & D_{A_i,A}<\dfrac{m_1}{2c_5},\\[6pt]\dfrac{1}{2}, & D_{A_i,A}\ge\dfrac{m_1}{2c_5},\end{cases}\qquad i=1,2,\ldots,m_1, \tag{18}$$

(TDSVM2)
$$\min_{w^{(2)},b^{(2)},q}\ \frac{1}{2}\left[d_B\circ\left(Bw^{(2)}+e_2b^{(2)}\right)\right]^T\left[d_B\circ\left(Bw^{(2)}+e_2b^{(2)}\right)\right]+c_2e_1^Tq+\frac{1}{2}c_4\left(\left\|w^{(2)}\right\|_2^2+\left(b^{(2)}\right)^2\right) \tag{19}$$
$$\text{s.t.}\quad \left(Aw^{(2)}+e_1b^{(2)}\right)+q\ge e_1,\quad q\ge 0, \tag{20}$$
$$d_{B_i}=\begin{cases}\dfrac{c_5\,D_{B_i,B}}{m_2}, & D_{B_i,B}<\dfrac{m_2}{2c_5},\\[6pt]\dfrac{1}{2}, & D_{B_i,B}\ge\dfrac{m_2}{2c_5},\end{cases}\qquad i=1,2,\ldots,m_2, \tag{21}$$

where $d_A=(d_{A_1},\ldots,d_{A_{m_1}})^T$ and $d_B=(d_{B_1},\ldots,d_{B_{m_2}})^T$ are the weight vectors, $\circ$ denotes the element-wise (Hadamard) product, $c_1,\ldots,c_5$ are the penalty parameters, and $e_1$, $e_2$ are vectors of ones of appropriate dimensions.

We provide a detailed explanation and proof for the main equations. First, we present a useful lemma.

Lemma 1. For any $x=(x_1,\ldots,x_m)$,
$$\max_{1\le i\le m}\frac{D_{x_i,x}}{m}=\begin{cases}\dfrac{1}{2}, & m=2k,\\[4pt]\dfrac{k+1}{2k+1}, & m=2k+1,\end{cases}\qquad\text{so that}\qquad\lim_{m\to\infty}\ \max_{1\le i\le m}\frac{D_{x_i,x}}{m}=\frac{1}{2}. \tag{22}$$

Proof. For $i=1,2,\ldots,m$, from (14) we have
$$\max_{1\le i\le m}D_{x_i,x}=\max_{1\le i\le m}\min\left(R_{x_i},\ m+1-R_{x_i}\right).$$
Since the ascending ranks $\{R_{x_i}\}$ are exactly the integers $1,\ldots,m$, the maximum of $\min(r,\,m+1-r)$ over $r\in\{1,\ldots,m\}$ is attained at the middle rank(s): it equals $k$ when $m=2k$ and $k+1$ when $m=2k+1$. Dividing by $m$ gives the two cases in (22). When $m\to+\infty$, $k\to+\infty$ and $\frac{k+1}{2k+1}\to\frac{1}{2}$, which completes the proof.

From Lemma 1, the depth ratio is less than or equal to $\frac{1}{2}$ approximately, since $m$ is usually large. Formula (18) then means that sample points are treated as being of equal importance, each receiving the weight $\frac{1}{2}$, when their depth values are greater than the threshold $m_1/(2c_5)$, while a point whose depth value is smaller than the threshold receives a weight equal to its adjusted depth ratio $c_5D_{A_i,A}/m_1$. In this way, points with a larger depth value play a greater role in determining the hyperplane, and points with a smaller depth value play a smaller role.

The first term of (16) is the sum of squared distances, adjusted by the depth weights, from the hyperplane to the points of the corresponding class; minimizing it keeps the hyperplane close to the depth-weighted points of its own class. Constraint (17) requires the points of the other class to be at a distance of at least 1 from the hyperplane. The second term of (16) minimizes the sum of the error variables, which measure the violation whenever this distance is smaller than 1. The parameter $c_1$ controls the tolerance of noise and misclassification for points of class $-1$: when $c_1$ is very large, misclassified samples are almost not allowed, which resembles a hard-margin SVM and easily leads to overfitting; when $c_1$ tends to 0, the classifier no longer cares whether the classification is correct and only asks for the largest possible margin, so no meaningful solution is obtained and the algorithm does not converge, which causes underfitting. The third term of (16) is a regularization term, through which the structural risk is minimized.

The Lagrangian of problem (16) can be expressed as
$$L\left(w^{(1)},b^{(1)},q,\alpha,\beta\right)=\frac{1}{2}\left[d_A\circ\left(Aw^{(1)}+e_1b^{(1)}\right)\right]^T\left[d_A\circ\left(Aw^{(1)}+e_1b^{(1)}\right)\right]+c_1e_2^Tq+\frac{1}{2}c_3\left(\left\|w^{(1)}\right\|_2^2+\left(b^{(1)}\right)^2\right)-\alpha^T\left(-\left(Bw^{(1)}+e_2b^{(1)}\right)+q-e_2\right)-\beta^Tq, \tag{29}$$
where $\alpha$, $\beta$ are the vectors of Lagrange multipliers. Setting the derivatives with respect to $w^{(1)}$, $b^{(1)}$ and $q$ to zero and adding the feasibility and complementarity conditions, the KKT conditions are
$$c_3w^{(1)}+(d_A\circ A)^T\left[d_A\circ\left(Aw^{(1)}+e_1b^{(1)}\right)\right]+B^T\alpha=0, \tag{30}$$
$$c_3b^{(1)}+(d_A\circ e_1)^T\left[d_A\circ\left(Aw^{(1)}+e_1b^{(1)}\right)\right]+e_2^T\alpha=0, \tag{31}$$
$$c_1e_2-\alpha-\beta=0, \tag{32}$$
$$-\left(Bw^{(1)}+e_2b^{(1)}\right)+q\ge e_2,\quad q\ge 0, \tag{33}$$
$$\alpha^T\left(-\left(Bw^{(1)}+e_2b^{(1)}\right)+q-e_2\right)=0,\quad \beta^Tq=0, \tag{34}$$
$$\alpha\ge 0,\quad \beta\ge 0. \tag{35}$$
From (32) and (35), we have
$$0\le\alpha\le c_1. \tag{36}$$
Then (30) and (31) can be combined as
$$\begin{bmatrix}(d_A\circ A)^T\\ (d_A\circ e_1)^T\end{bmatrix}\left[d_A\circ A\ \ \ d_A\circ e_1\right]u_1+c_3u_1+G^T\alpha=0, \tag{37}$$
where $H=[A\ \ e_1]$, $G=[B\ \ e_2]$, and $u_1=[w^{(1)T},b^{(1)}]^T$ is the augmented vector. Writing $H_d=[d_A\circ A\ \ \ d_A\circ e_1]$, i.e. $H$ with its $i$th row multiplied by $d_{A_i}$, the above equation can be rewritten as
$$\left(H_d^TH_d+c_3I\right)u_1+G^T\alpha=0. \tag{38}$$
As a result, according to (38) the parameter $u_1$ of the hyperplane can be obtained as
$$u_1=-\left(H_d^TH_d+c_3I\right)^{-1}G^T\alpha, \tag{39}$$
where $I$ is an identity matrix of appropriate dimensions. According to (30)-(35) and (38), the dual problem of the primal problem TDSVM1 can be expressed as
$$\max_{\alpha}\ e_2^T\alpha-\frac{1}{2}\alpha^TG\left(H_d^TH_d+c_3I\right)^{-1}G^T\alpha\qquad\text{s.t.}\quad 0\le\alpha\le c_1. \tag{40}$$
The dual of TDSVM2 is obtained in the same manner, and a new point is assigned to a class by the decision function $f(x)=\arg\min_{i=1,2}\left|x^Tw^{(i)}+b^{(i)}\right|/\left\|w^{(i)}\right\|_2$, as in (7).
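Relative to TBSVM, only two changes are needed in the derivation above: compute the average-depth weights and scale the rows of H before forming the dual. The sketch below is our own illustration of this step; it reuses the hypothetical `average_depth` and `solve_box_qp` helpers from the earlier sketches, and the weighting rule follows Eq. (18) as reconstructed above.

```python
# Minimal sketch of linear TDSVM training, Eqs. (16)-(21) and (39)-(40); our own
# illustration, reusing average_depth and solve_box_qp from the sketches above.
# The only changes w.r.t. TBSVM are the depth weights d_A, d_B and the row-scaled
# matrices H_d, G_d used inside the matrix inverses.
import numpy as np

def depth_weights(X, c5):
    # Eqs. (18)/(21): adjusted depth ratio, capped at 1/2 for central points.
    m = X.shape[0]
    return np.minimum(c5 * average_depth(X) / m, 0.5)

def tdsvm_train(A, B, c1=1.0, c2=1.0, c3=1e-3, c4=1e-3, c5=4.0):
    m1, m2 = A.shape[0], B.shape[0]
    H = np.hstack([A, np.ones((m1, 1))])      # H = [A  e1]
    G = np.hstack([B, np.ones((m2, 1))])      # G = [B  e2]
    Hd = depth_weights(A, c5)[:, None] * H    # d_A o H
    Gd = depth_weights(B, c5)[:, None] * G    # d_B o G
    k = H.shape[1]

    M1 = np.linalg.inv(Hd.T @ Hd + c3 * np.eye(k))
    alpha = solve_box_qp(G @ M1 @ G.T, c1)    # dual (40)
    u1 = -M1 @ G.T @ alpha                    # Eq. (39): u1 = [w1; b1]

    M2 = np.linalg.inv(Gd.T @ Gd + c4 * np.eye(k))
    gamma = solve_box_qp(H @ M2 @ H.T, c2)    # dual of TDSVM2 (analogous)
    u2 = M2 @ H.T @ gamma                     # u2 = [w2; b2]
    return u1, u2
```

Prediction is unchanged: the earlier `tbsvm_predict` decision rule (7) can be applied to the returned u1, u2.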
4. Experiments and results

4.1. Evaluation criteria and experimental setup

To compare the performance of TDSVM, TBSVM, CL2,p-LSTSVM and TWSVM, we use a synthetic dataset with introduced outliers and several standard datasets from the UCI machine learning repository. The experiments are implemented in Python 3.10 on a PC with an AMD R5 processor (2.38 GHz) and 16 GB of RAM. The accuracy of each method is measured by the ratio of the sum of true positive and true negative samples to the total number of samples. To obtain objective results, standard tenfold cross-validation is implemented: the datasets are divided into ten parts, nine of which are selected as the training set while the remaining one is reserved for testing each time; ten experiments are carried out, and the performance of each method is measured by the mean and standard deviation of the results over the ten runs. All reported results use the optimal parameters obtained by grid search. For $c_1$, $c_2$, $c_3$ and $c_4$ the best parameters are searched over a grid of powers of 2, and the range of $c_5$ depends on the concrete dataset.

4.2. Experiments on the synthetic datasets with introduced outliers

In data analysis, the accuracy and effectiveness of the results are often damaged by the existence of outliers. To verify the advantages of our algorithm on datasets with outliers, TWSVM, TBSVM, CL2,p-LSTSVM and TDSVM are applied to a synthetic dataset with outliers. Fig. 2 shows the four pairs of nonparallel hyperplanes constructed by the four classifiers, and their performance is compared.

[Table 2: Comparison of classification accuracy on the synthetic datasets with introduced outliers (TWSVM, TBSVM, CL2,p-LSTSVM, TDSVM).]

[Fig. 2: Comparison results on the synthetic data.]

From Fig. 2 we can observe that the pair of nonparallel hyperplanes of TDSVM characterizes the synthetic dataset with outliers well and is hardly affected by the outliers, while those of TWSVM, TBSVM and CL2,p-LSTSVM are affected by the outliers to a certain extent, which results in an offset. According to the final results, the classification accuracy of TDSVM (with the optimal parameters $c_3=2^{-1}$, $c_4=2$ and $c_5=4.06$) on the test set is 96.67%. On the same test set, the classification accuracies of TBSVM, TWSVM and CL2,p-LSTSVM are 92.33%, 91.67% and 93.33%, respectively. After Gaussian noise is introduced into the synthetic dataset, similar results are obtained. The specific parameters and results are compared in Table 2. Therefore, the robustness of the TDSVM classifier is better than that of the other three classifiers.

Remark 4. Depth is used in statistics to measure the position of data points, and the position of data points plays an important role in data classification. The points at the center of the dataset are often conducive to accurate classification, while outliers and points at the boundary usually have large residuals. Therefore, to suppress the influence of outliers on data classification to a certain extent, the outliers can be penalized according to their depth values. The experimental results show that classifiers that do not consider depth values are sensitive to outliers and have poor robustness, whereas when depth is introduced, the influence of outliers is weakened and the robustness of the model is improved.
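The same protocol of Section 4.1 is reused for the UCI experiments below. As a compact illustration, the following sketch (our own, not the authors' code) wires the hypothetical `tdsvm_train` and `tbsvm_predict` helpers from the earlier sketches into tenfold cross-validation with a power-of-2 grid search; the exact grid bounds and the `c5` candidate values are our assumptions, and labels y are assumed to be in {+1, -1}.

```python
# Minimal sketch of the evaluation protocol in Section 4.1: tenfold cross-validation
# with a grid search over powers of 2. Our own illustration, reusing tdsvm_train and
# tbsvm_predict from the sketches above; scikit-learn is used only for the fold split.
import itertools
import numpy as np
from sklearn.model_selection import KFold

def cv_accuracy(X, y, c1, c3, c5, n_splits=10, seed=0):
    accs = []
    for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        A = X[tr][y[tr] == 1]           # class +1 training points
        B = X[tr][y[tr] == -1]          # class -1 training points
        u1, u2 = tdsvm_train(A, B, c1=c1, c2=c1, c3=c3, c4=c3, c5=c5)
        accs.append(np.mean(tbsvm_predict(X[te], u1, u2) == y[te]))
    return np.mean(accs), np.std(accs)

def grid_search(X, y, powers=range(-8, 9), c5_grid=(1.0, 2.0, 4.0, 8.0)):
    # Hypothetical grid: powers of 2 for c1/c3 and a small set of c5 candidates.
    best = (-np.inf, None)
    for p1, p3, c5 in itertools.product(powers, powers, c5_grid):
        mean_acc, _ = cv_accuracy(X, y, c1=2.0**p1, c3=2.0**p3, c5=c5)
        if mean_acc > best[0]:
            best = (mean_acc, (2.0**p1, 2.0**p3, c5))
    return best
```

Tying c2 = c1 and c4 = c3 is a simplification of ours to keep the grid small; the paper tunes the parameters independently.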
4.3. Experiments on UCI datasets

4.3.1. Experiments on UCI datasets without noise

The eight UCI datasets are Heart, Ionosphere, Sonar, Pima, BUPA Liver, Australian, Wpbc and Hepatitis. Table 3 shows the classification accuracy and standard deviation of our TDSVM, TBSVM, TWSVM and CL2,p-LSTSVM based on the optimal parameters, using the linear kernel and the standard tenfold cross-validation methodology; the algorithms are compared according to their performance with optimal parameters on the test datasets, and the best results are shown in bold.

[Table 3: Performance comparison of linear classifiers on the UCI datasets.]

[Table 4: Performance comparison of nonlinear (RBF) classifiers on the UCI datasets.]

From Table 3 we can find that the classification accuracy of our TDSVM is better than that of the other classification methods on seven datasets. For example, on Sonar the accuracy of TDSVM is 80.78%, while TBSVM, TWSVM and CL2,p-LSTSVM achieve 79.83%, 78.79% and 71.26%, respectively, each with its own optimal parameters. Similar results are obtained for the other datasets. By introducing the average depth of the data points and selecting the optimal value of the parameter $c_5$, TDSVM handles outliers well, and it outperforms the other algorithms with both common kernel functions, linear and RBF.

Table 4 shows the results with the RBF kernel, which are similar to those with the linear kernel in Table 3. Based on the optimal parameters, the classification accuracy of our TDSVM is better than that of the other classification methods on all datasets except Sonar and Pima. In other words, TDSVM gains the best classification accuracy on 6 out of 8 datasets, TWSVM obtains the best classification accuracy on 2 out of 8 datasets, and TBSVM obtains the best classification accuracy on 1 out of 8 datasets.

Fig. 3 shows the effect of $c_1$ and $c_3$ on the results when the value of $c_5$ is fixed and the linear kernel is used; the selected optimal parameters and accuracies are shown on each subplot. Similarly, Fig. 4 shows the situation when the RBF kernel is used. These figures show that the selection of the parameters is very important to the classification accuracy, so selecting appropriate parameters can make the algorithm perform better.

[Fig. 3: Sensitivity performance of TDSVM with respect to c1 and c3 using the linear kernel (Heart, Ionosphere, Sonar, Hepatitis).]

[Fig. 4: Sensitivity performance of TDSVM with respect to c1 and c3 using the RBF kernel (Heart, Ionosphere, Sonar, Hepatitis).]

Fig. 5 shows the running time of each algorithm on the UCI datasets. To reduce random error, ten repeated tests are carried out and the average running times are compared. It is observed that TDSVM is not the fastest but has no obvious disadvantage in learning efficiency compared with the other algorithms; TDSVM is only slightly slower than TBSVM, since a little extra time is required to calculate the depth values of the data points in each dimension.

[Fig. 5: Average running time of each algorithm on the UCI datasets.]
4.3.2. Experiments on UCI datasets with noise

To further confirm that the introduction of the average depth is conducive to the robustness of the classifier, we introduce 10%, 20% and 30% Gaussian noise into the UCI datasets separately and then apply TDSVM, TBSVM, CL2,p-LSTSVM and TWSVM to them. The best results are obtained with the optimal parameters. As in the experiments on the UCI datasets without noise, the performance of the classifiers is measured by the classification accuracy and standard deviation through the standard tenfold cross-validation methodology. The experimental results of the four methods on the UCI datasets with 10%, 20% and 30% Gaussian noise are shown in Tables 5, 6 and 7 (linear kernel) and Tables 8, 9 and 10 (RBF kernel).

[Tables 5-7: Performance comparison of linear classifiers on the UCI datasets with 10%, 20% and 30% Gaussian noise.]

[Tables 8-10: Performance comparison of nonlinear (RBF) classifiers on the UCI datasets with 10%, 20% and 30% Gaussian noise.]

From Tables 5, 6 and 7 it is observed that, although the accuracy of all classifiers decreases after introducing Gaussian noise, the classification accuracy of our TDSVM with optimal parameters and the linear kernel is better than that of the other classification methods on most datasets with 10% Gaussian noise: TDSVM gains the best classification accuracy on 6 out of 8 datasets. For example, on the Ionosphere dataset the average accuracy of TDSVM is 91.74% (with optimal $c_5=7.00$), while TBSVM achieves 89.74%, TWSVM 87.75%, and CL2,p-LSTSVM 90.88% (with $p=1.9$). Similar results are obtained for the other datasets. Likewise, TDSVM gains the best classification accuracy on 6 out of 8 datasets with 20% noise and on 7 out of 8 datasets with 30% noise. That is to say, the classification accuracy of our TDSVM on most datasets with 20% or 30% noise is superior to that of the other classification methods.
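The robustness protocol above perturbs part of each dataset with Gaussian noise before rerunning the same cross-validated comparison. The excerpt does not spell out the exact corruption mechanism, so the sketch below is one plausible reading and entirely our assumption: a given percentage of the rows is selected at random and additive zero-mean Gaussian noise, scaled by each feature's standard deviation, is added to them.

```python
# Hypothetical illustration of the noise-injection step in Section 4.3.2. The exact
# corruption scheme is not specified in the excerpt; this sketch assumes that a
# fraction `ratio` of the samples receives additive zero-mean Gaussian noise whose
# scale follows each feature's standard deviation.
import numpy as np

def add_gaussian_noise(X, ratio=0.1, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X_noisy = np.array(X, dtype=float, copy=True)
    n_corrupt = int(round(ratio * X.shape[0]))
    idx = rng.choice(X.shape[0], size=n_corrupt, replace=False)
    sigma = X.std(axis=0, keepdims=True)
    X_noisy[idx] += rng.normal(0.0, scale, size=(n_corrupt, X.shape[1])) * sigma
    return X_noisy

# Example: corrupt 10%, 20% and 30% of the samples before rerunning the comparison.
# X10, X20, X30 = (add_gaussian_noise(X, r) for r in (0.1, 0.2, 0.3))
```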
In the same manner, from Tables 8, 9 and 10 it is observed that the accuracy of the classifiers decreases further as more Gaussian noise is introduced. However, based on the optimal parameters and the RBF kernel, the proposed TDSVM gains the best classification accuracy on 7 out of 8 datasets with 10% or 20% noise and on 6 out of 8 datasets with 30% noise. This result demonstrates how the influence of noise on the algorithm can be effectively reduced by the introduced average depth: the support vector machine framework with the average depth is more robust, which is in line with our previous related research. Therefore, the experimental results essentially support our hypothesis that TDSVM has the best classification performance on most standard datasets, especially on standard datasets with Gaussian noise.

4.3.3. Statistical analysis

In this section, the Friedman test and the Nemenyi post-hoc test are used to verify that the proposed TDSVM is significantly superior to TBSVM, TWSVM and CL2,p-LSTSVM. We perform the Friedman test to check whether the null hypothesis that all algorithms have the same performance is false. Under the null hypothesis, the Friedman statistic
$$\chi_F^2=\frac{12N}{k(k+1)}\left[\sum_{j=1}^{k}R_j^2-\frac{k(k+1)^2}{4}\right]$$
is distributed according to $\chi^2$ with $k-1$ degrees of freedom, where $k$ is the number of employed algorithms, $N$ is the number of UCI datasets and $R_j$ is the average rank of the $j$th algorithm; the derived statistic
$$F_F=\frac{(N-1)\chi_F^2}{N(k-1)-\chi_F^2} \tag{62}$$
follows an $F$ distribution with $(k-1)$ and $(k-1)(N-1)$ degrees of freedom.

[Table 11: Friedman test results on the employed UCI datasets.]

Table 11 shows the Friedman test results on the employed UCI datasets. For $\alpha=0.05$, the critical value is $F(3,21)=3.072$. For the four classifiers TDSVM, TBSVM, TWSVM and CL2,p-LSTSVM, comparing the values of $F_F$ and $F_c$ in Table 11 shows that $F_F>F_c$ holds in all tests. Thus, the null hypothesis should be rejected, and we conclude that not all algorithms have the same performance. For further pairwise comparison, the Nemenyi test is performed on the four algorithms. The critical difference CD can be calculated by
$$CD=q_\alpha\sqrt{\frac{k(k+1)}{6N}}. \tag{63}$$
At $p=0.05$, CD $=1.658$.

[Table 12: Difference of average ranks between TBSVM, TWSVM, CL2,p-LSTSVM and the proposed TDSVM (without noise and with 10%, 20% and 30% Gaussian noise; linear and RBF kernels).]

Table 12 shows the differences between TDSVM and the remaining algorithms. In the linear case, the difference is larger than the CD value in all three pairs of comparison, which indicates that the proposed TDSVM is superior to the other compared algorithms. Further, in the nonlinear case (using the RBF kernel), the difference between TDSVM and CL2,p-LSTSVM is larger than CD, but the differences in the two remaining pairs of comparison are smaller than CD. Hence, it is concluded that the proposed TDSVM is significantly better than TBSVM, TWSVM and CL2,p-LSTSVM in the linear case, and significantly better than CL2,p-LSTSVM and TWSVM in the nonlinear case; the Nemenyi test, however, is not powerful enough to detect a significant difference between TDSVM and TBSVM in the nonlinear case.

We may objectively conclude from the previous experimental results and analysis that, by computing the average depths of the sample points and using them as the weights of the distances in the objective function, the effect of outliers or noise can be reduced and the classification performance can be enhanced. Consequently, the proposed TDSVM is a competitive and robust supervised algorithm.
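As a compact illustration of the statistics in (62)-(63), the following sketch (our own, assuming NumPy and SciPy) computes the Friedman statistic, its F-form and the Nemenyi critical difference from an accuracy table with k = 4 algorithms and N = 8 datasets; the value q_0.05 = 2.569 for k = 4 comes from the standard Nemenyi table.

```python
# Minimal sketch of the Friedman test (Eq. (62)) and the Nemenyi critical difference
# (Eq. (63)) used in Section 4.3.3; our own illustration with numpy/scipy.
import numpy as np
from scipy.stats import rankdata, f as f_dist

def friedman_nemenyi(acc, q_alpha=2.569):
    # acc: (N datasets) x (k algorithms) accuracy matrix; higher is better.
    N, k = acc.shape
    ranks = rankdata(-acc, axis=1)          # rank 1 = best algorithm on each dataset
    R = ranks.mean(axis=0)                  # average rank of each algorithm
    chi2_F = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    F_F = (N - 1) * chi2_F / (N * (k - 1) - chi2_F)
    F_crit = f_dist.ppf(0.95, k - 1, (k - 1) * (N - 1))   # e.g. F(3, 21) ~ 3.07
    CD = q_alpha * np.sqrt(k * (k + 1) / (6.0 * N))        # Nemenyi critical difference
    return R, F_F, F_crit, CD
```

With k = 4 and N = 8 the returned CD is about 1.66, matching the value reported in the text; any pair of algorithms whose average ranks differ by more than CD is declared significantly different.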
5. Conclusions

Since using the average depth as the weight of the points can effectively weaken the influence of noise or outliers, we construct a novel model, TDSVM, by introducing the average depth into the model of TWSVM in this paper. The method utilizes the concept of depth from the field of robust statistics, which can effectively reduce the loss of classification accuracy when noise or outliers are introduced into the dataset; at the same time, the SRM principle is also implemented. Experiments on synthetic datasets with introduced outliers and on several standard datasets from the UCI machine learning repository show that our TDSVM is very competitive: the classification accuracy is better than that of the other algorithms on almost every dataset, and the model has good generalization ability as well. Furthermore, we introduce 10%, 20% and 30% Gaussian noise into the UCI standard datasets to further confirm the robustness of TDSVM. From the experimental results, we can also conclude that the method proposed in this paper is competitive for classification problems. In addition, the multivariate Tukey depth is more accurate for measuring the position of sample points, but it is computationally demanding on high-dimensional datasets, which limits its applicability. If the computational complexity can be reduced, we can further consider using the multivariate Tukey depth to develop better classification models, which is our main future work.

CRediT authorship contribution statement

Jiamin Xu: Investigation, Methodology, Algorithm design, Writing - original draft. Huamin Wang: Investigation, Methodology, Supervision, Writing - review & editing. Libo Zhang: Software, Validation. Shiping Wen: Supervision, Writing - reviewing.

Declaration of competing interest

We declare that we have no financial and personal relationships with other people or organizations that could inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the revised manuscript.

Data availability

No data was used for the research described in the article.

Acknowledgments

This work is supported by the Natural Science Foundation of Chongqing, China (No. cstc2021jcyj-msxmX0565), the Fundamental Research Funds for the Central Universities, China (No. SWU021002), the Project of Science and Technology Research Program of Chongqing Education Commission of China (No. KJZD-K202100203), and the National Natural Science Foundation of China (No. U1804158).

References
[4] Forecasting during the COVID-19 pandemic based on SVM and Markov switching regression.
Concurr. Comput. Pract. Exper. 34 (2022).
O.L. Mangasarian, Nonlinear Programming, SIAM, 1994.
S. Abe, Support Vector Machines for Pattern Classification, Springer, 2005.
[10] Jayadeva, R. Khemchandani, S. Chandra, Twin support vector machines for pattern classification, IEEE Trans. Pattern Anal. Mach. Intell. 29 (5) (2007) 905-910.
[11] O.L. Mangasarian, E.W. Wild, Multisurface proximal support vector machine classification via generalized eigenvalues, IEEE Trans. Pattern Anal. Mach. Intell. 28 (1) (2006) 69-74.
[12] Twin support vector machine for imbalanced data classification (2020).
[13] Imbalanced data classification based on a twin support vector machine, (2017).
[14] In: 2017 International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2017.
[15] Twin support vector machine, Neural Comput. Appl. 25 (2014).
[16] Optimized twin support vector regression enhanced by ensemble empirical mode decomposition and gated recurrent unit.
[17] Z. Wang, Y.-H. Shao, L. Bai, N.-Y. Deng, Twin support vector machine for clustering, IEEE Trans. Neural Netw. Learn. Syst. 26 (10) (2015) 2583-2588.
[18] Clustering by twin support vector machine, Knowl.-Based Syst.
[19] Y.-H. Shao, C.-H. Zhang, X.-B. Wang, N.-Y. Deng, Improvements on twin support vector machines, IEEE Trans. Neural Netw. 22 (6) (2011) 962-968.
[21] M.A. Kumar, M. Gopal, Least squares twin support vector machines for pattern classification, Expert Syst. Appl. 36 (2009) 7535-7543.
[22] S. Gao, Q. Ye, N. Ye, 1-Norm least squares twin support vector machines, Neurocomputing 74 (2011) 3590-3597.
[23] J.A. Nasiri, N.M. Charkari, K. Mozafari, Energy-based model of least squares twin support vector machines for human action recognition, Signal Process. 104 (2014) 248-257.
[24] Hybrid feature selection based weighted least squares twin support vector machine.
[25] X. Li, B. Du, C. Xu, Y. Zhang, L. Zhang, D. Tao, Robust learning with imperfect privileged information, Artificial Intelligence 282 (2020) 103246.
[26] Density-based weighting multi-surface least squares classification with its applications, Knowl. Inf. Syst. (2012) 288-308.
[27] Y. Xu, X. Pan, Z. Zhou, Z. Yang, Y. Zhang, Structural least square twin support vector machine for classification, Appl. Intell. 42 (2015) 527-536.
[28] An improved least squares twin support vector machine, J. Inf. Comput. Sci. (2012) 1063-1071.
[29] M. Tanveer, M.A. Khan, S.-S. Ho, Robust energy-based least squares twin support vector machines, Appl. Intell. 45 (2016) 174-186.
[30] M. Tanveer, S. Sharma, K. Muhammad, Large-scale least squares twin SVMs, ACM Trans. Internet Technol. 21 (2) (2021).
[31] C. Yuan, L. Yang, Capped L2,p-norm metric based robust least squares twin support vector machine for pattern recognition, Neural Netw. 142 (2021).
[32] S. Huang, Z. Kang, Z. Xu, Q. Liu, Robust deep k-means: An effective and simple method for data clustering, Pattern Recognit. 117 (2021).
[33] R.Y. Liu, J.M. Parelius, K. Singh, Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion and a rejoinder by Liu and Singh), Ann. Statist. 27 (3) (1999) 783-858.
[34] H.L. Hammer, A. Yazidi, H. Rue, Estimating Tukey depth using incremental quantile estimators, Pattern Recognit. 122 (2022) 108339.
[35] J. Cerdeira, P. Silva, A centrality notion for graphs based on Tukey depth, Appl. Math. Comput. 409 (2021).
[36] X. Liu, K. Mosler, P. Mozharovskyi, Fast computation of Tukey trimmed regions and median in dimension p > 2, J. Comput. Graph. Statist. 28 (3) (2019) 682-697.
[37] K. Mosler, Depth statistics, in: Robustness and Complex Data Structures, Springer, 2013, pp. 17-34.
[38] R. Jörnsten, Clustering and classification based on the L1 data depth, J. Multivariate Anal. 90 (1) (2004) 67-89.
