Arbitrary-Norm Separating Plane*

O. L. Mangasarian†
Abstract
A plane separating two point sets in $n$-dimensional real space is constructed such that it minimizes the sum of arbitrary-norm distances of misclassified points to the plane. In contrast to previous approaches that used surrogates for the misclassified-point distance-minimization problem, the present approach is based on a norm-dependent explicit closed form for the projection of a point on a plane. This projection is used to formulate the separating-plane problem as a minimization of a convex function on a unit sphere in the norm dual to the arbitrary norm used. For the 1-norm only, the problem can be solved in polynomial time by solving $2n$ linear programs or by solving a bilinear program. For all other $p$-norms, $p \in (1, \infty]$, a decision problem related to the minimization problem is NP-complete. For a general $p$-norm, the minimization problem can be transformed via an exact penalty formulation to minimizing the sum of a convex function and a bilinear function on a convex set. For the one and infinity norms, a finite successive linearization algorithm is proposed for solving the exact penalty formulation.

1 Introduction
One of the fundamental problems of machine learning is that of discriminating between two finite
point sets in $n$-dimensional real space $R^n$. When the convex hulls of the two sets do not intersect,
a single linear program can construct a strict separating plane such that each of the two open
halfspaces generated by the plane contains one of the two sets [5, 12, 2]. Such a plane corresponds
to a perceptron and can also be obtained by the iterative perceptron learning algorithm [21, 10]
which can be interpreted as the Motzkin-Schoenberg iterative scheme for solving consistent linear
inequalities [20]. When the convex hulls of the two sets intersect the iterative scheme fails because
the underlying linear inequalities are inconsistent, while the linear programming approach must
be provided with an error criterion to be minimized. We propose as a criterion here the sum of
arbitrary-norm distances to the separating plane of points lying on the wrong side of the plane.
Unfortunately, if precise distances are used, linearity of the objective function is lost, as will be
shown in this work. Many previous approaches instead used distance surrogates that maintained
linearity of the problem [9, 8, 19, 2] but did not measure distances
of violating points to the separating plane. In contrast, our approach here depends on a closed
form formula for the projection of a point in $R^n$ onto a given plane using an arbitrary norm,
which is given in Theorem 2.2 of Section 2. In Section 3 we formulate the problem of obtaining a separating plane that minimizes the sum of distances of points on the wrong side of the plane as that of minimizing a convex function on a unit sphere in the norm dual to that used in measuring the distance to the projection on the separating plane (problem (17)). In Theorem 3.2 the nonconvex sphere constraint is replaced by two unit-ball convex constraints and a bilinear constraint. This allows us to reformulate the problem as an exact penalty problem for the 1-norm and $\infty$-norm that leads to a problem of minimizing the sum of a convex function and a bilinear function on a convex set (Theorem 3.3). This in turn leads to a finitely terminating successive linear programming algorithm for the 1-norm problem (Algorithm 3.4). The 1-norm problem can also be solved exactly by $2n$ linear programs (problem (22)). In Section 4 we show that, for all $p$-norms except the 1-norm, a problem closely related to the separating-plane problem is NP-complete. Section 5 concludes with a summary and an open question.

* Mathematical Programming Technical Report 97-07, May 1997. This material is based on research supported by National Science Foundation Grant CCR-9322479 and Air Force Office of Scientific Research Grant F49620-97-1-0326.
† Computer Sciences Department, University of Wisconsin, 1210 West Dayton Street, Madison, WI 53706, olvi@cs.wisc.edu. The author gratefully acknowledges the gracious hospitality of the Mathematics Department of the University of California at San Diego during his sabbatical leave, January-May 1997.
1.1 Notation & Background
A word about our notation and background material. All vectors will be column vectors unless transposed to a row vector by a prime superscript $'$. For $x \in R^n$, $|x|$ will denote the vector in $R^n$ with components that are the absolute values of the components $x_i$ of $x$. The scalar product of two vectors $x$ and $y$ in the $n$-dimensional real space $R^n$ will be denoted by $x'y$. For a linear program $\min_{x \in X} c'x$, the notation $\arg \operatorname{vertex} \min_{x \in X} c'x$ will denote the set of vertex solutions of the linear program. For $x \in R^n$ and $p \in [1, \infty)$, the norm $\|x\|_p$ will denote the $p$-norm $\left( \sum_{i=1}^n |x_i|^p \right)^{\frac{1}{p}}$, and $\|x\|_\infty$ will denote $\max_{1 \le i \le n} |x_i|$. For $x \in R^n$, $(x_+)_i = \max\{0, x_i\}$, $i = 1, \ldots, n$. For an $m \times n$ matrix $A$, $A_i$ will denote the $i$th row of $A$ and $A_{ij}$ will denote the element in row $i$ and column $j$. The identity matrix in a real space of arbitrary dimension will be denoted by $I$, while a column vector of ones of arbitrary dimension will be denoted by $e$. The symbol $:=$ will denote a definition of the term appearing to the left of the symbol by the term appearing to the right of the symbol.

Because most of our results hold for a general norm, and not only for a monotonic (or absolute) norm [22], it is convenient to recall the definition of a general norm and that of a monotonic norm.
Definition 1.1 A norm on $R^n$ is a function $\|\cdot\| : R^n \to R$ with the following three properties for all $x$ and $y$ in $R^n$:
(a) $\|x\| \ge 0$, and $\|x\| = 0 \iff x = 0$
(b) $\|\alpha x\| = |\alpha|\,\|x\|$ for $\alpha \in R$
(c) $\|x + y\| \le \|x\| + \|y\|$

For a general norm $\|\cdot\|$ on $R^n$, the dual norm $\|\cdot\|'$ on $R^n$ is defined as
$$\|x\|' := \max_{\|y\| = 1} x'y, \qquad (1)$$
from which follows the generalized Cauchy-Schwarz inequality:
$$x'y \le |x'y| \le \|x\|'\,\|y\|. \qquad (2)$$
For $p, q \in [1, \infty]$ with $\frac{1}{p} + \frac{1}{q} = 1$, the $p$-norm and the $q$-norm are dual norms by the classical Hölder inequality [1]. A norm $\|\cdot\|$ on $R^n$ is said to be monotonic (or absolute) if either of the following equivalent conditions holds:
$$x, y \in R^n, \ |x| \le |y| \implies \|x\| \le \|y\|, \qquad (3)$$
$$\|\,|x|\,\| = \|x\| \quad \forall x \in R^n.$$
For $p \in [1, \infty]$, the $p$-norm is monotonic.
2 Arbitrary-Norm Projection on a Plane
In this section we derive an explicit expression for the projection of a point on a given plane using
an arbitrary norm. We will use this expression to derive a mathematical programming formulation
of the separating plane problem. For convenience we begin with an elementary lemma.
Lemma 2.1 The general norm $\|x\|$ (a) is convex on $R^n$, (b) has bounded level sets in $R^n$, and (c) is continuous on $R^n$.

Proof
(a) For $\lambda \in (0, 1)$ and $x, y \in R^n$ we have, by (c) and (b) of Definition 1.1:
$$\|(1 - \lambda)x + \lambda y\| \le (1 - \lambda)\|x\| + \lambda\|y\|.$$
(b) Let $L(\alpha) := \{x \mid \|x\| \le \alpha\}$, $\alpha \in R$. If $L(\alpha)$ is unbounded, then there exists a sequence of nonzero points $\{x^i\} \subset L(\alpha)$ such that $\|x^i\|_2 \to \infty$, which implies that $\left\| \frac{x^i}{\|x^i\|_2} \right\| \le \frac{\alpha}{\|x^i\|_2} \to 0$. This contradicts the fact that the bounded sequence $\left\{ \frac{x^i}{\|x^i\|_2} \right\}$ has an accumulation point $z$ such that $\|z\|_2 = 1$.
(c) The continuity of the norm follows from its convexity on $R^n$ [14, Theorem 4.1.15]. $\Box$
We give now an explicit form for the projection of an arbitrary point on a given plane using a
general norm for measuring the distance between the point and its projection.
Theorem 2.2 Arbitrary-Norm Projection on a Plane. Let $q \in R^n$ be any point in $R^n$ not on the plane:
$$P := \{x \mid w'x = \gamma\}, \qquad 0 \ne w \in R^n, \ \gamma \in R. \qquad (4)$$
A projection $p(q) \in P$ using a general norm $\|\cdot\|$ on $R^n$ is given by:
$$p(q) = q - \frac{w'q - \gamma}{\|w\|'}\, y(w), \qquad (5)$$
where $\|\cdot\|'$ is the dual norm to $\|\cdot\|$ and:
$$y(w) \in \arg\max_{\|y\| = 1} w'y. \qquad (6)$$
Consequently, the distance between $q$ and its projection $p(q)$ is given by:
$$\|q - p(q)\| = \frac{|w'q - \gamma|}{\|w\|'}. \qquad (7)$$
Proof The proof consists of showing that $p(q) \in P$ and that it satisfies the right inequality of the Karush-Kuhn-Tucker saddlepoint sufficient optimality criterion [14, Equation 5.1.4, Theorem 5.3.1] for some Lagrange multiplier $\lambda \in R$:
$$\|p(q) - q\| \le \|x - q\| - \lambda(w'x - \gamma), \quad \forall x \in R^n. \qquad (8)$$
To show that $p(q) \in P$, note that:
$$w'p(q) = w'q - \frac{w'q - \gamma}{\|w\|'}\, w'y(w) = w'q - \frac{w'q - \gamma}{\|w\|'}\, \|w\|' = \gamma. \qquad (9)$$
Hence $p(q) \in P$. To show that (8) holds, define:
$$\lambda := \frac{1}{\|w\|'} \cdot \frac{\gamma - w'q}{|w'q - \gamma|}. \qquad (10)$$
So we need to show that:
$$\left\| \frac{w'q - \gamma}{\|w\|'}\, y(w) \right\| \le \|x - q\| + \frac{1}{\|w\|'} \cdot \frac{w'q - \gamma}{|w'q - \gamma|}\,(w'x - \gamma), \quad \forall x \in R^n, \qquad (11)$$
or equivalently that:
$$|w'q - \gamma|\,\|y(w)\| + (w'x - \gamma)\,\frac{\gamma - w'q}{|w'q - \gamma|} \le \|w\|'\,\|x - q\|, \quad \forall x \in R^n. \qquad (12)$$
Since $\|y(w)\| = 1$, inequality (12) is equivalent to:
$$\frac{\gamma - w'q}{|w'q - \gamma|}\, w'(x - q) \le \|w\|'\,\|x - q\|, \quad \forall x \in R^n, \qquad (13)$$
which follows immediately from the generalized Cauchy-Schwarz inequality (2), or equivalently from the definition of the dual norm. Hence (8) holds, and by the Karush-Kuhn-Tucker saddlepoint sufficiency theorem [14, Theorem 5.3.1], $p(q)$ as given by (5) is a projection of $q$ on $P$. Formula (7) for the distance between $q$ and its projection $p(q)$ follows from (5) by noting that $\|y(w)\| = 1$. $\Box$
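For $1 < p < \infty$, the maximizer $y(w)$ in (6) has a simple closed form, so formulas (5) and (7) can be evaluated directly. The following minimal numeric sketch (our own, not part of the paper; it assumes numpy and the hypothetical name project_pnorm) illustrates Theorem 2.2 for the $p$-norm, whose dual is the $q$-norm with $1/p + 1/q = 1$:

    import numpy as np

    def project_pnorm(point, w, gamma, p):
        # dual exponent: 1/p + 1/q = 1 (this sketch assumes 1 < p < inf)
        q = p / (p - 1.0)
        # y(w) attains the max in (6): ||y||_p = 1 and w'y = ||w||_q = ||w||'
        y = np.sign(w) * np.abs(w) ** (q - 1)
        y /= np.linalg.norm(w, q) ** (q - 1)
        # formula (5); the distance (7) is then |w'point - gamma| / ||w||_q
        return point - (w @ point - gamma) / np.linalg.norm(w, q) * y

    w, gamma = np.array([1.0, 1.0]), 4.0
    proj = project_pnorm(np.array([5.0, 3.0]), w, gamma, p=3.0)
    print(proj, w @ proj)   # w'proj = gamma = 4, so proj lies on the plane (4)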
As immediate consequences of the above theorem, we give the projections of $q$ on $P$ for the common one, two and infinity norms.

Corollary 2.3 1-Norm Projection. For the 1-norm $\|\cdot\|_1$, $w'y(w) = \|w\|' = \|w\|_\infty$ and hence
$$y_i(w) = 0 \ \text{ for } |w_i| \ne \|w\|_\infty, \qquad y_i(w) = \lambda_i \ \text{ for } w_i = \|w\|_\infty, \qquad y_i(w) = -\lambda_i \ \text{ for } w_i = -\|w\|_\infty,$$
where $\lambda_i \ge 0$ and $\sum_{i=1}^n \lambda_i = 1$. Furthermore, since $\|y(w)\|_1 = 1$, it follows from (5) and (7) that:
$$p(q) = q - \frac{w'q - \gamma}{\|w\|_\infty}\, y(w), \qquad \|q - p(q)\|_1 = \frac{|w'q - \gamma|}{\|w\|_\infty}.$$

Corollary 2.4 2-Norm Projection. For the 2-norm $\|\cdot\|_2$, $w'y(w) = \|w\|_2$ and hence
$$y(w) = \frac{w}{\|w\|_2}.$$
Furthermore, since $\|y(w)\|_2 = 1$, it follows from (5) and (7) that:
$$p(q) = q - \frac{w'q - \gamma}{\|w\|_2}\, y(w), \qquad \|q - p(q)\|_2 = \frac{|w'q - \gamma|}{\|w\|_2}.$$

Corollary 2.5 $\infty$-Norm Projection. For the infinity-norm $\|\cdot\|_\infty$, $w'y(w) = \|w\|_1$ and hence
$$y_i(w) = 1 \ \text{ for } w_i > 0, \qquad y_i(w) = -1 \ \text{ for } w_i \le 0.$$
Furthermore, since $\|y(w)\|_\infty = 1$, it follows from (5) and (7) that:
$$p(q) = q - \frac{w'q - \gamma}{\|w\|_1}\, y(w), \qquad \|q - p(q)\|_\infty = \frac{|w'q - \gamma|}{\|w\|_1}.$$

If we make use of standard inequalities between various norms, such as [23, p. 170]:
$$\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n^{\frac{1}{2}}\,\|x\|_2 \le n\,\|x\|_\infty, \qquad (14)$$
and the fact that for a fixed $x \in R^n$ the $p$-norm is a nonincreasing function of $p \in (0, \infty]$ [1, p. 18], we obtain the following corollary to Theorem 2.2, relating norm-dependent distances between the point $q$ and its projection $p(q)$ on the plane $P$.

Corollary 2.6 Inequalities Between Norm-Dependent Distances to Projections. Let $d(q, P)_\ell$ denote the distance between the point $q$ and its projection $p(q)$ on the plane $P$ using the norm $\|\cdot\|_\ell$, $\ell \in [1, \infty]$. Then
$$d(q, P)_\infty \le d(q, P)_2 \le d(q, P)_1 \le n^{\frac{1}{2}}\, d(q, P)_2 \le n\, d(q, P)_\infty \qquad (15)$$
and
$$d(q, P)_{\ell_1} \ge d(q, P)_{\ell_2}, \qquad \text{for } \ell_1, \ell_2 \in [1, \infty], \ \ell_2 > \ell_1. \qquad (16)$$
We demonstrate now the above projections with a very simple example.

Example 2.7 Let
$$P = \{x \in R^2 \mid x_1 + x_2 = 4\}, \qquad q = \begin{bmatrix} 5 \\ 3 \end{bmatrix},$$
so that $w = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, $\gamma = 4$ and $w'q - \gamma = 8 - 4 = 4$.

1-norm $\|\cdot\|_1$:
$$y(w) = \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix}, \quad \lambda_1 \ge 0, \ \lambda_2 \ge 0, \ \lambda_1 + \lambda_2 = 1,$$
$$p(q) = \begin{bmatrix} 5 \\ 3 \end{bmatrix} - \frac{8 - 4}{1} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} 5 - 4\lambda_1 \\ 3 - 4\lambda_2 \end{bmatrix}, \qquad \|q - p(q)\|_1 = 4.$$

2-norm $\|\cdot\|_2$:
$$y(w) = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad p(q) = \begin{bmatrix} 5 \\ 3 \end{bmatrix} - \frac{8 - 4}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \qquad \|q - p(q)\|_2 = 2\sqrt{2}.$$

$\infty$-norm $\|\cdot\|_\infty$:
$$y(w) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad p(q) = \begin{bmatrix} 5 \\ 3 \end{bmatrix} - \frac{8 - 4}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \qquad \|q - p(q)\|_\infty = 2.$$

We note that the inequalities (15) and (16) of Corollary 2.6 are satisfied by the norm-dependent distances computed in Example 2.7.
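As a numeric check (our own sketch, assuming numpy; not part of the paper), the closed forms of Corollaries 2.3-2.5 reproduce the three distances of Example 2.7 and the ordering (15):

    import numpy as np

    w, gamma, q = np.array([1.0, 1.0]), 4.0, np.array([5.0, 3.0])
    excess = w @ q - gamma                               # w'q - gamma = 4

    # 1-norm distance (dual norm: infinity-norm); the projection is not unique,
    # one valid y(w) puts all weight on one index attaining ||w||_inf
    j = int(np.argmax(np.abs(w)))
    y1 = np.zeros_like(w); y1[j] = np.sign(w[j])
    p1 = q - excess / np.linalg.norm(w, np.inf) * y1
    # 2-norm distance: y(w) = w / ||w||_2
    p2 = q - excess / np.linalg.norm(w, 2) ** 2 * w
    # infinity-norm distance (dual norm: 1-norm): y_i(w) = sign(w_i)
    pinf = q - excess / np.linalg.norm(w, 1) * np.where(w > 0, 1.0, -1.0)

    for p in (p1, p2, pinf):
        assert abs(w @ p - gamma) < 1e-12                # each projection lies on P
    d1 = np.linalg.norm(q - p1, 1)                       # 4.0
    d2 = np.linalg.norm(q - p2, 2)                       # 2*sqrt(2)
    dinf = np.linalg.norm(q - pinf, np.inf)              # 2.0
    print(dinf <= d2 <= d1)                              # True, as in (15)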
With the above projection results we can now turn our attention to separating planes based on
arbitrary norms.
3 Arbitrary-Norm Separating Plane as a Mathematical Program
We consider two finite point sets $\mathcal{A}$ and $\mathcal{B}$ in $R^n$, represented respectively by the matrices $A \in R^{m \times n}$ and $B \in R^{k \times n}$. We wish to construct a plane $P$, defined by (4), so that most of the points of $\mathcal{A}$ are on one side of $P$ and most of the points of $\mathcal{B}$ are on the other side. We will attempt to achieve this by minimizing the sum of arbitrary-norm distances to $P$ of points of $\mathcal{A}$ on the wrong side of $P$, together with the distances to $P$ of points of $\mathcal{B}$ on the wrong side of the plane. If we assume, without loss of generality, that the points of $\mathcal{A}$ should lie on the side of $P$ in the direction of the normal $w$ to $P$, and the points of $\mathcal{B}$ on the other side, we then have the following optimization problem, where the variables $(w, \gamma)$ have been normalized by dividing by $\|w\|'$, the dual of the general norm $\|\cdot\|$ used (so that, by (7), the violation $(-A_i w + \gamma)_+$ or $(B_i w - \gamma)_+$ of a misclassified point is precisely its distance to $P$):
$$\min_{(w, \gamma) \in R^{n+1}} \left\{ \left\| \begin{bmatrix} (-Aw + e\gamma)_+ \\ (Bw - e\gamma)_+ \end{bmatrix} \right\|_1 \;\middle|\; \|w\|' = 1 \right\}. \qquad (17)$$
We note that the 1-norm in the objective function of (17) is a fixed norm on $R^{m+k}$ that is used to obtain the sum of the norm-dependent distances to the plane $P$ of the misclassified points of $\mathcal{A}$ and $\mathcal{B}$. However, the norm in the constraint is an arbitrary norm on $R^n$ that is dual to the norm used in measuring the distances of misclassified points to the plane $P$. We also note that the objective function of (17) is convex on $R^{n+1}$ for any monotonic norm, including the 1-norm used here. In fact this follows from the following simple lemma.
Lemma 3.1 Let $h : R^n \to R^m$ be a convex function on $R^n$. The function $f(x) := \|h(x)_+\|$ is convex on $R^n$ for any monotonic norm $\|\cdot\|$ on $R^m$.

Proof Let $x, y \in R^n$ and let $\lambda \in (0, 1)$. Then,
$$\begin{array}{rcll}
f((1-\lambda)x + \lambda y) &=& \|(h((1-\lambda)x + \lambda y))_+\| & \\
&\le& \|((1-\lambda)h(x) + \lambda h(y))_+\| & \text{(by convexity of $h$ \& monotonicity of the plus function \& the norm)} \\
&\le& \|(1-\lambda)h(x)_+ + \lambda h(y)_+\| & \text{(by $(a+b)_+ \le a_+ + b_+$ \& norm monotonicity)} \\
&\le& (1-\lambda)\|h(x)_+\| + \lambda\|h(y)_+\| & \\
&=& (1-\lambda)f(x) + \lambda f(y). &
\end{array}$$
Hence $\|h(x)_+\|$ is convex on $R^n$. $\Box$
Thus the objective function of the mathematical program (17) is convex, but its feasible region, which is the unit sphere in the dual norm $\|\cdot\|'$, is not convex. It is precisely this essential nonconvex condition that has been either ignored in most previous work [12, 9, 8, 2], or used heuristically [13, 19] to enforce nonzeroness of $w$ but not as a distance-normalization constraint. Thus in these papers, the sum of the distances of misclassified points to the separating plane has not been the real objective function that has been minimized. In fact the nonconvexity of the program (17) leads to NP-completeness of a related decision problem for all $p \in (1, \infty]$, but not for $p = 1$ (see Section 4). In order to alleviate this difficulty to some extent, we introduce an equivalent formulation of (17) which is convex except for a single bilinear constraint. This will allow us to construct a penalty problem that is convex except for a bilinear penalty term. Such bilinear reformulations have been successfully used to solve other difficult problems [3, 16, 4]. We state and establish this equivalent reformulation of (17).

Theorem 3.2 The mathematical program (17) is equivalent to the following:
$$\min_{(w, \gamma, t) \in R^{n+1+n}} \left\{ \left\| \begin{bmatrix} (-Aw + e\gamma)_+ \\ (Bw - e\gamma)_+ \end{bmatrix} \right\|_1 \;\middle|\; \|w\|' \le 1, \ \|t\| \le 1, \ w't \ge 1 \right\}. \qquad (18)$$
Proof We establish the equivalence of the mathematical programs (17) and (18) by showing that the constraints of each program imply the constraints of the other program.
($\Longrightarrow$) Let $\|w\|' = 1$ and let $y(w) \in \arg\max_{\|y\| = 1} w'y$, so that $w'y(w) = \|w\|' = 1$. Then $(w, t := y(w))$ satisfies the constraints of (18), because
$$\|t\| = \|y(w)\| = 1, \qquad \|w\|' = 1, \qquad w't = w'y(w) = \|w\|' = 1.$$
Consequently:
$$\text{min of (18)} \le \text{min of (17)}.$$
($\Longleftarrow$) Let $(w, t)$ satisfy the constraints of (18). Then by the generalized Cauchy-Schwarz inequality:
$$1 \le w't \le \|w\|'\,\|t\| \le 1.$$
Since $\|w\|' \le 1$ and $\|t\| \le 1$, it follows that $\|w\|' = 1$ and $\|t\| = 1$. Hence $w$ satisfies the constraint of (17) and
$$\text{min of (17)} \le \text{min of (18)}. \quad \Box$$
We can reformulate problem (18) as a penalty problem for which the only nonconvexity is a bilinear term in the objective function. In addition, for the cases of the one and infinity norms, the penalty function formulation can be shown to be exact. We state this result as the following theorem.
Theorem 3.3 The mathematical program (18) can be reformulated as the following penalty problem with penalty parameter $\mu > 0$:
$$\min_{(w, \gamma, t) \in R^{n+1+n}} \left\{ \left\| \begin{bmatrix} (-Aw + e\gamma)_+ \\ (Bw - e\gamma)_+ \end{bmatrix} \right\|_1 - \mu\, w't \;\middle|\; \|w\|' \le 1, \ \|t\| \le 1 \right\}. \qquad (19)$$
For a general norm, the sequence $\{(w(\mu_i), \gamma(\mu_i), t(\mu_i))\}$ of penalty problem solutions of (19), for a sequence $\{\mu_i\} \uparrow \infty$ of penalty parameters, has an accumulation point $(\bar{w}, \bar{\gamma}, \bar{t})$, and each such accumulation point solves (18). For the one and infinity norms, some fixed vertex solution solves (19) as well as (18) for each $\mu_{i_j}$ of some subsequence $\{\mu_{i_j}\}$ of $\{\mu_i\} \uparrow \infty$.

Proof From the first two constraints of (18) we have that
$$w't \le \|w\|'\,\|t\| \le 1,$$
and hence $1 - w't \ge 0$. Thus the exterior penalty term $\mu(1 - w't)_+$ for the constraint $w't \ge 1$ of (18) is equivalent to $\mu(1 - w't)$. Dropping the constant term $\mu$ gives the penalty term $-\mu\, w't$ of (19). The first part of the theorem then follows from standard exterior penalty function results such as [15, Theorem 2.8]. We establish the second part for the 1-norm only; the proof for the $\infty$-norm is similar. For the 1-norm, problems (18) and (19) can be written as the following programs, with a single bilinear term in the constraints and in the objective function respectively:
$$\min_{(w, t, s, \gamma, y, z) \in R^{3n+1+m+k}} \left\{ e'y + e'z \;\middle|\; \begin{array}{l} -Aw + e\gamma \le y, \ y \ge 0, \ Bw - e\gamma \le z, \ z \ge 0, \\ -e \le w \le e, \ -s \le t \le s, \ e's \le 1, \ w't \ge 1 \end{array} \right\} \qquad (20)$$
$$\min_{(w, t, s, \gamma, y, z) \in R^{3n+1+m+k}} \left\{ e'y + e'z - \mu\, w't \;\middle|\; \begin{array}{l} -Aw + e\gamma \le y, \ y \ge 0, \ Bw - e\gamma \le z, \ z \ge 0, \\ -e \le w \le e, \ -s \le t \le s, \ e's \le 1 \end{array} \right\} \qquad (21)$$
Since the quadratic objective function of (21) is bounded below by $-\mu\,\|w\|_\infty\,\|t\|_1 \ge -\mu$, it follows that the quadratic program (21) has a solution [6]. Since the constraints of (21) are separable in $(w, \gamma, y, z)$ and $(t, s)$, it follows that (21) has a vertex solution, because from any solution point a vertex of the $(w, \gamma, y, z)$ constraints can be obtained by a single linear program in $(w, \gamma, y, z)$, and a second linear program in $(t, s)$ starting from that vertex gives the desired vertex solution of (21) [3, Proposition 2.1]. The second part of the theorem then follows by [18, Theorem 2.1], or more simply by noting that some fixed vertex of the feasible region of (21) solves (21) for a subsequence $\{\mu_{i_j}\} \uparrow \infty$ and hence must also solve (20). $\Box$
We turn our attention now to algorithmic aspects of solving the 1-norm separating-plane problem. First we note that it can be solved in polynomial time, because the constraint $\|w\|_\infty = 1$ of (17) holds if and only if $-e \le w \le e$ and $w_i = 1$ or $w_i = -1$ for some $i$, so that the solution of (17) for the 1-norm can be obtained as the best solution of the following $2n$ polynomially solvable [11, 24] linear programs:
$$\min_{(w, \gamma, y, z) \in R^{n+1+m+k}} \left\{ e'y + e'z \;\middle|\; \begin{array}{l} -Aw + e\gamma \le y, \ y \ge 0, \ Bw - e\gamma \le z, \ z \ge 0, \\ -e \le w \le e, \ w_i = \pm 1 \end{array} \right\}, \qquad i = 1, \ldots, n. \qquad (22)$$
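As an illustration of the $2n$-linear-program approach (22), here is a minimal sketch (our own code, not the paper's; it assumes scipy) that returns the best $(w, \gamma)$ over all $2n$ programs:

    import numpy as np
    from scipy.optimize import linprog

    def best_plane_1norm(A, B):
        m, n = A.shape
        k = B.shape[0]
        # variable order: (w, gamma, y, z) in R^{n+1+m+k}
        c = np.concatenate([np.zeros(n + 1), np.ones(m + k)])
        # -Aw + e*gamma - y <= 0  and  Bw - e*gamma - z <= 0
        A_ub = np.block([
            [-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))],
            [ B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)],
        ])
        b_ub = np.zeros(m + k)
        best = (np.inf, None, None)
        for i in range(n):
            for sigma in (-1.0, 1.0):
                bounds = ([(-1, 1)] * n + [(None, None)]
                          + [(0, None)] * (m + k))
                bounds[i] = (sigma, sigma)       # fix w_i = +1 or w_i = -1
                res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
                if res.status == 0 and res.fun < best[0]:
                    best = (res.fun, res.x[:n], res.x[n])
        return best                              # (objective value, w, gamma)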
If $n$ is large, one may wish to resort to the following finitely terminating successive linearization algorithm for solving the bilinear formulation (21) of the 1-norm separating-plane problem.
Algorithm 3.4 Let $\mu > 0$ and start with an arbitrary $(w^0, \gamma^0)$ such that $-e \le w^0 \le e$, with $y^0 = (-Aw^0 + e\gamma^0)_+$, $z^0 = (Bw^0 - e\gamma^0)_+$, and an arbitrary $t^0$ such that $\|t^0\|_1 \le 1$. Having $(w^i, \gamma^i, t^i, y^i, z^i, s^i)$, compute $(w^{i+1}, \gamma^{i+1}, t^{i+1}, y^{i+1}, z^{i+1}, s^{i+1})$ as a vertex solution of the following linear program:
$$\min_{(w, t, s, \gamma, y, z) \in R^{3n+1+m+k}} \left\{ \begin{array}{l} e'(y - y^i) + e'(z - z^i) \\ \quad - \, \mu\left( (w^i)'(t - t^i) + (w - w^i)'t^i \right) \end{array} \;\middle|\; \begin{array}{l} -Aw + e\gamma \le y, \ y \ge 0, \ Bw - e\gamma \le z, \ z \ge 0, \\ -e \le w \le e, \ -s \le t \le s, \ e's \le 1 \end{array} \right\} \qquad (23)$$
Stop when the minimum of (23) is zero.
It follows by an argument similar to that of [3, Algorithm 2.2] that the above algorithm terminates at a vertex that satisfies the minimum principle necessary optimality condition for (21).
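A compact sketch of Algorithm 3.4 (our own, assuming scipy; the paper gives no code). Each pass solves the linear program (23), i.e. (21) with the bilinear term $w't$ linearized at the current iterate, and stops when the minimum of (23) is (numerically) zero:

    import numpy as np
    from scipy.optimize import linprog

    def successive_linearization(A, B, mu=1.0, max_iter=100, tol=1e-8):
        m, n = A.shape
        k = B.shape[0]
        Zn, In = np.zeros((n, n)), np.eye(n)
        # variable order: (w, t, s, gamma, y, z) in R^{3n+1+m+k}
        A_ub = np.vstack([
            np.hstack([-A, np.zeros((m, 2 * n)), np.ones((m, 1)),
                       -np.eye(m), np.zeros((m, k))]),      # -Aw + e*gamma <= y
            np.hstack([B, np.zeros((k, 2 * n)), -np.ones((k, 1)),
                       np.zeros((k, m)), -np.eye(k)]),      #  Bw - e*gamma <= z
            np.hstack([Zn, In, -In, np.zeros((n, 1 + m + k))]),    #  t - s <= 0
            np.hstack([Zn, -In, -In, np.zeros((n, 1 + m + k))]),   # -t - s <= 0
            np.concatenate([np.zeros(2 * n), np.ones(n),
                            np.zeros(1 + m + k)])[None, :],        #  e's <= 1
        ])
        b_ub = np.concatenate([np.zeros(m + k + 2 * n), [1.0]])
        bounds = ([(-1, 1)] * n + [(None, None)] * n + [(0, None)] * n
                  + [(None, None)] + [(0, None)] * (m + k))
        # start: w0 = 0, gamma0 = 0, y0 = (-Aw0+e*gamma0)+, z0 = (Bw0-e*gamma0)+
        w, gamma, t = np.zeros(n), 0.0, In[0].copy()        # ||t0||_1 <= 1
        y, z = np.maximum(-A @ w + gamma, 0), np.maximum(B @ w - gamma, 0)
        for _ in range(max_iter):
            # objective of (23) with its constant terms dropped
            c = np.concatenate([-mu * t, -mu * w, np.zeros(n), [0.0],
                                np.ones(m), np.ones(k)])
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds,
                          method="highs-ds")        # simplex: a vertex solution
            const = y.sum() + z.sum() - 2 * mu * (w @ t)    # dropped constants
            if res.fun - const >= -tol:             # minimum of (23) is zero
                break
            x = res.x
            w, t = x[:n], x[n:2 * n]
            gamma = x[3 * n]
            y, z = x[3 * n + 1:3 * n + 1 + m], x[3 * n + 1 + m:]
        return w, gamma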
We turn our attention now to complexity issues associated with the separating plane problem
that depend on the norm used.

4 Complexity of Separating Plane Problems


In this section we will show that for all $p \in (1, \infty]$, a problem closely related to (17) is NP-complete, obtained by adding to (17) the constraint
$$\|w\|_\infty \le \frac{1}{n}. \qquad (24)$$
To see how this constraint interacts with the norm constraint of (17) when $\|\cdot\| = \|\cdot\|_p$ and $\|\cdot\|' = \|\cdot\|_q$, $p \in (1, \infty]$, $\frac{1}{p} + \frac{1}{q} = 1$, note that it follows from
$$\|w\|_q \le n^{\frac{1}{q}}\,\|w\|_\infty \le n\,\|w\|_\infty, \qquad q \in [1, \infty), \qquad (25)$$
and the constraint of (17),
$$\|w\|_q = \|w\|' = 1, \qquad (26)$$
that
$$\|w\|_\infty \ge \frac{1}{n}. \qquad (27)$$
Accordingly, in the problems below the constraint (26) is rescaled to $\|w\|_q^q = \frac{1}{n^{q-1}}$ so that it is compatible with (24). From (24), this rescaled constraint, the inequality $\|w\|_q^q \le n\,\|w\|_\infty^q$ and $q \in [1, \infty)$, it follows that
$$\frac{1}{n^{q-1}} = \|w\|_q^q \le n\,\|w\|_\infty^q \le \frac{n}{n^q} = \frac{1}{n^{q-1}}, \qquad (28)$$
and hence
$$w_i = \pm\frac{1}{n}, \qquad i = 1, \ldots, n. \qquad (29)$$
We note however that the constraint (24) cannot be added to (17) when $p = 1$, and hence $q = \infty$, because that would lead to the following contradiction for $n > 1$:
$$\frac{1}{n} \ge \|w\|_\infty = \|w\|_q = \|w\|' = 1. \qquad (30)$$

We show now that (17), with its norm constraint so rescaled and augmented by the constraint (24), is NP-complete [7] when $A$ and $B$ have rational entries and the norm is the $p$-norm, $p \in (1, \infty]$; that is:
$$\min_{(w, \gamma) \in R^{n+1}} \left\{ \left\| \begin{bmatrix} (-Aw + e\gamma)_+ \\ (Bw - e\gamma)_+ \end{bmatrix} \right\|_1 \;\middle|\; \|w\|_q^q = \frac{1}{n^{q-1}}, \ \|w\|_\infty \le \frac{1}{n} \right\}, \qquad p \in (1, \infty], \ \frac{1}{p} + \frac{1}{q} = 1. \qquad (31)$$
Theorem 4.1 NP-Completeness of a Separating-Plane-Related Problem, $p \in (1, \infty]$. The following decision problem, for $p = 2, 3, \ldots$ and $\frac{1}{q} = 1 - \frac{1}{p}$, associated with the nonconvex program (31) with rational entries for $A$ and $B$, is NP-complete:
$$\text{Is} \ \min_{(w, \gamma) \in R^{n+1}} \left\{ \left\| \begin{bmatrix} (-Aw + e\gamma)_+ \\ (Bw - e\gamma)_+ \end{bmatrix} \right\|_1 \;\middle|\; \|w\|_q^q = \frac{1}{n^{q-1}}, \ \|w\|_\infty \le \frac{1}{n} \right\} \le 0\,? \qquad (32)$$
For $p = \infty$ the decision problem (32) is interpreted as:
$$\text{Is} \ \min_{(w, \gamma) \in R^{n+1}} \left\{ \left\| \begin{bmatrix} (-Aw + e\gamma)_+ \\ (Bw - e\gamma)_+ \end{bmatrix} \right\|_1 \;\middle|\; \|w\|_1 = 1, \ \|w\|_\infty \le \frac{1}{n} \right\} \le 0\,? \qquad (33)$$
Proof By (29), the constraints of (32) and of (33) are each equivalent to $w_i = \pm\frac{1}{n}$, $i = 1, \ldots, n$. Therefore problems (32) and (33) are in NP, because a correct guess of $w$ that gives a minimum value of the objective function of either problem will answer the question of (32) or (33) in polynomial time by checking whether or not
$$\max_{1 \le i \le k} B_i w \le \min_{1 \le i \le m} A_i w, \qquad (34)$$
and setting
$$\gamma = \frac{1}{2}\left( \max_{1 \le i \le k} B_i w + \min_{1 \le i \le m} A_i w \right) \qquad (35)$$
when (34) is satisfied.
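In code, the certificate check (34)-(35) is immediate; a small sketch (ours, not the paper's) of the polynomial-time verification used above:

    import numpy as np

    def check_certificate(A, B, w):
        # w is the guessed vector with components +-1/n; the objective of
        # (32)/(33) is zero iff (34) holds, with gamma then given by (35)
        lo = (B @ w).max()              # max_i B_i w
        hi = (A @ w).min()              # min_i A_i w
        if lo <= hi:                    # condition (34)
            return 0.5 * (lo + hi)      # separating gamma, as in (35)
        return None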
We now show that (32) and (33) are NP-hard by reducing to them an instance of the partition problem:
$$\text{Is } d'w = 0 \text{ for some } w \in R^n \text{ such that } \|w\|_q^q = \frac{1}{n^{q-1}}, \ \|w\|_\infty \le \frac{1}{n}\,? \qquad p = 2, 3, \ldots, \ \frac{1}{q} = 1 - \frac{1}{p}, \qquad (36)$$
or equivalently:
$$\text{Is } d'w = 0 \text{ for some } w \in R^n \text{ such that } \|w\|_1 = 1, \ \|w\|_\infty \le \frac{1}{n}\,? \qquad (37)$$
Here the components $(d_1, \ldots, d_n)$ of $d$ are given positive integers. The partition problem is the following instance of (32) and (33), obtained by taking $A = \begin{bmatrix} d' \\ -d' \end{bmatrix}$ and $B = \begin{bmatrix} d' \\ -d' \end{bmatrix}$:
$$\text{Is} \ \min_{(w, \gamma) \in R^{n+1}} \left\{ \left\| \begin{bmatrix} -d'w + \gamma \\ d'w + \gamma \\ d'w - \gamma \\ -d'w - \gamma \end{bmatrix}_+ \right\|_1 \;\middle|\; \|w\|_q^q = \frac{1}{n^{q-1}}, \ \|w\|_\infty \le \frac{1}{n} \right\} \le 0\,? \qquad p = 2, 3, \ldots, \ \frac{1}{q} = 1 - \frac{1}{p}, \qquad (38)$$
or
$$\text{Is} \ \min_{(w, \gamma) \in R^{n+1}} \left\{ \left\| \begin{bmatrix} -d'w + \gamma \\ d'w + \gamma \\ d'w - \gamma \\ -d'w - \gamma \end{bmatrix}_+ \right\|_1 \;\middle|\; \|w\|_1 = 1, \ \|w\|_\infty \le \frac{1}{n} \right\} \le 0\,? \qquad (39)$$
If the minimum of either problem is zero, then $\gamma \le d'w \le \gamma$ and $\gamma \le -d'w \le \gamma$. Hence $d'w = \gamma = 0$ and $w_i = \pm\frac{1}{n}$, $i = 1, \ldots, n$, and we have a solution to the partition problem. Hence (32) and (33) are NP-hard, and since they are in NP, they are NP-complete. $\Box$

5 Summary & Conclusion


By using an exact expression for an arbitrary-norm projection of a point on a plane, we were able to pose the separating-plane problem as the minimization of a convex function, consisting of the 1-norm of a piecewise-linear function, on a unit sphere. A problem closely related to this minimization problem is shown to be NP-complete for all $p$-norms except $p = 1$. For $p = 1$ the separating-plane problem can be solved in polynomial time, and for $p = 1$ and $p = \infty$ the problem can be reduced, via an exact penalty reformulation, to minimizing a bilinear function on separable linear constraints. A successive linearization algorithm for the bilinear problem terminates in a finite number of steps at a stationary point of the bilinear exact penalty formulation. Successful experience in solving difficult NP-hard problems by such bilinear formulations [3, 16, 17, 4] leads us to conjecture that the bilinear approach would be viable in the present context as well. An interesting open question that has not been resolved in this work is this: are the formulations (32) and (33) NP-complete without the additional constraint $\|w\|_\infty \le \frac{1}{n}$?

References
[1] E. F. Beckenbach and R. Bellman. Inequalities. Springer-Verlag, Berlin, 1961.
[2] K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23-34, 1992.
[3] K. P. Bennett and O. L. Mangasarian. Bilinear separation of two sets in n-space. Computational Optimization & Applications, 2:207-227, 1993.
[4] P. S. Bradley, O. L. Mangasarian, and W. N. Street. Clustering via concave minimization. Technical Report 96-03, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, May 1996. Advances in Neural Information Processing Systems 9, MIT Press, Cambridge, MA, 1997, to appear. Available at ftp://ftp.cs.wisc.edu/math-prog/tech-reports/96-03.ps.Z.
[5] A. Charnes. Some fundamental theorems of perceptron theory and their geometry. In J. T. Lou and R. H. Wilcox, editors, Computer and Information Sciences, pages 67-74, Washington, D.C., 1964. Spartan Books.
[6] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3:95-110, 1956.
[7] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, San Francisco, 1979.
[8] F. Glover. Improved linear programming models for discriminant analysis. Decision Sciences, 21:771-785, 1990.
[9] R. C. Grinold. Mathematical methods for pattern classification. Management Science, 19:272-289, 1972.
[10] J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, California, 1991.
[11] N. Karmarkar. A new polynomial time algorithm for linear programming. Combinatorica, 4:373-395, 1984.
[12] O. L. Mangasarian. Linear and nonlinear separation of patterns by linear programming. Operations Research, 13:444-452, 1965.
[13] O. L. Mangasarian. Multi-surface method of pattern separation. IEEE Transactions on Information Theory, IT-14:801-807, 1968.
[14] O. L. Mangasarian. Nonlinear Programming. McGraw-Hill, New York, 1969. Reprint: SIAM Classics in Applied Mathematics 10, SIAM, Philadelphia, 1994.
[15] O. L. Mangasarian. Some applications of penalty functions in mathematical programming. In R. Conti, E. De Giorgi, and F. Giannessi, editors, Optimization and Related Fields, pages 307-329. Springer-Verlag, Heidelberg, 1986. Lecture Notes in Mathematics 1190.
[16] O. L. Mangasarian. Misclassification minimization. Journal of Global Optimization, 5:309-323, 1994.
[17] O. L. Mangasarian. The linear complementarity problem as a separable bilinear program. Journal of Global Optimization, 6:153-161, 1995.
[18] O. L. Mangasarian and J.-S. Pang. Exact penalty functions for mathematical programs with linear complementarity constraints. Optimization, 1997. To appear. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/96-06.ps.Z.
[19] O. L. Mangasarian, R. Setiono, and W. H. Wolberg. Pattern recognition via linear programming: Theory and application to medical diagnosis. In T. F. Coleman and Y. Li, editors, Large-Scale Numerical Optimization, pages 22-31, Philadelphia, Pennsylvania, 1990. SIAM. Proceedings of the Workshop on Large-Scale Numerical Optimization, Cornell University, Ithaca, New York, October 19-20, 1989.
[20] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6:393-404, 1954.
[21] N. J. Nilsson. Learning Machines. MIT Press, Cambridge, Massachusetts, 1966.
[22] J. M. Ortega. Numerical Analysis, A Second Course. Academic Press, 1972.
[23] G. W. Stewart. Introduction to Matrix Computations. Academic Press, New York, 1973.
[24] R. J. Vanderbei. Linear Programming: Foundations and Extensions. Kluwer Academic Publishers, Hingham, MA, 1997.
