To appear in: Pattern Recognition (accepted manuscript). Reference: PR5574.
Received date: 2 April 2015. Revised date: 16 October 2015. Accepted date: 15 November 2015.
Cite this article as: Adil M. Bagirov, Sona Taheri and Julien Ugon, Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems, Pattern Recognition.
Nonsmooth DC programming approach to the minimum
sum-of-squares clustering problems
Adil M. Bagirov∗ Sona Taheri Julien Ugon
Abstract
This paper introduces an algorithm for solving the minimum sum-of-squares clustering problems using their difference of convex representations. A nonsmooth nonconvex optimization formulation of the clustering problem is used to design the algorithm. Characterizations of critical points, stationary points in the sense of generalized gradients and inf-stationary points of the clustering problem are given. The proposed algorithm is tested and compared with other clustering algorithms using large real-world data sets.
1 Introduction
Clustering is an unsupervised partitioning technique dealing with the problems of organizing a collection of patterns into clusters based on similarity. Most clustering algorithms are based on the hierarchical and partitional approaches. Algorithms based on the hierarchical approach generate a dendrogram representing the nested grouping of patterns and similarity levels at which groupings change [22]. Partitional clustering algorithms find the partition that optimizes a clustering criterion [22]. In this paper we develop a partitional clustering algorithm; more specifically, we develop an algorithm for solving the minimum sum-of-squares clustering (MSSC) problems.
To date various heuristics such as the k-means algorithm and its variations have been
developed to solve the MSSC problem (see, for example, [23, 24] and references therein).
The global k-means algorithm and its many modifications are among the most efficient
algorithms for solving the MSSC problem [6, 12, 14, 25, 27, 28, 30].
The MSSC problem can be formulated as a mixed integer nonlinear programming or a nonconvex nonsmooth optimization problem [10, 13, 33]. Different optimization techniques have been applied to solve it. These techniques include branch and bound [18] and interior point methods [19], nonsmooth optimization algorithms [8, 11, 13], algorithms based on the hyperbolic smoothing technique [9, 36, 37], the variable neighborhood search [20], simulated annealing [31], tabu search [1] and genetic algorithms [29]. Not all of these algorithms are efficient for solving the MSSC in very large data sets.
The objective functions of clustering problems, called cluster functions, can be represented as differences of convex (DC) functions. The above-mentioned algorithms do not exploit this special structure of the clustering problem. There are several papers where
∗ Faculty of Science and Technology, Federation University Australia, Victoria, Australia; Phone: +61353276306; Fax: +61353279289; Email: [Link]@[Link]
the DC representation of the MSSC problems is used to design algorithms. In [16], the truncated codifferential method is applied to solve the MSSC using its DC representation. The branch and bound method was modified for such problems in [34] using their DC representation. In [2] an algorithm based on DC programming and DC Algorithms (DCA) is introduced. In [5], the authors use the hard combinatorial optimization model to formulate the MSSC as a DC program and propose an algorithm based on the DCA. Such an approach allows one to make simpler and less expensive computations in the resulting DCA. In [3], the DCA and a Gaussian kernel are applied to design an algorithm to solve the MSSC problem.
In this paper, we propose a new approach for solving the MSSC problems using their
DC representations. The main contributions of this paper are: (i) the characterization of
critical points, stationary points in the sense of generalized gradients and inf-stationary
points of the MSSC problem; (ii) an algorithm for solving the MSSC problem based on
its DC representation; (iii) convergence results for the algorithm. Results of numerical
experiments on some real world data sets are reported and the proposed algorithm is
compared with several other clustering algorithms. It is demonstrated that the proposed
algorithm is especially efficient for solving the MSSC problems in very large data sets.
The rest of the paper is organized as follows. Section 2 provides some preliminaries on
DC functions and nonsmooth analysis. DC representations of cluster functions are given
in Section 3. Optimality conditions for the auxiliary clustering problem are studied in
Section 4 and for the clustering problem in Section 5. Section 6 presents an algorithm
for solving clustering problems. An incremental algorithm is described in Section 7. The
implementation of algorithms is discussed in Section 8. Numerical results are reported in
Section 9 and Section 10 contains some concluding remarks.
2 Preliminaries
In this section we give some results on nonsmooth analysis and DC functions used throughout the paper. In what follows we denote by $\mathbb{R}^n$ the $n$-dimensional Euclidean space with the inner product $\langle u, v \rangle = \sum_{i=1}^{n} u_i v_i$ and the associated norm $\|u\| = \langle u, u \rangle^{1/2}$, $u, v \in \mathbb{R}^n$. $B_\varepsilon(x) = \{y \in \mathbb{R}^n : \|y - x\| < \varepsilon\}$ is the open ball centered at $x$ with radius $\varepsilon > 0$.
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. Its subdifferential at $x \in \mathbb{R}^n$ is defined as:
$$\partial_c f(x) = \left\{\xi \in \mathbb{R}^n : f(y) - f(x) \ge \langle \xi, y - x \rangle \ \forall y \in \mathbb{R}^n\right\}.$$
A function $f : \mathbb{R}^n \to \mathbb{R}$ is called locally Lipschitz on $\mathbb{R}^n$ if for any bounded subset $X \subset \mathbb{R}^n$ there exists $L > 0$ such that
$$|f(x) - f(y)| \le L\|x - y\| \quad \forall x, y \in X.$$
The generalized derivative of a locally Lipschitz function $f$ at a point $x$ with respect to a direction $u \in \mathbb{R}^n$ is defined as [15]:
$$f^\circ(x, u) = \limsup_{\alpha \downarrow 0,\ y \to x} \frac{f(y + \alpha u) - f(y)}{\alpha}.$$
The subdifferential $\partial f(x)$ of the function $f$ at $x$ is:
$$\partial f(x) = \left\{\xi \in \mathbb{R}^n : f^\circ(x, u) \ge \langle \xi, u \rangle \ \forall u \in \mathbb{R}^n\right\}.$$
According to Rademacher's theorem any locally Lipschitz function defined on $\mathbb{R}^n$ is differentiable almost everywhere and its subdifferential can also be defined as:
$$\partial f(x) = \mathrm{conv}\left\{\lim_{i \to \infty} \nabla f(x^i) : x^i \to x \text{ and } \nabla f(x^i) \text{ exists}\right\}.$$
Here "conv" denotes the convex hull of a set. Each vector $\xi \in \partial f(x)$ is called a subgradient.
For a convex function $f : \mathbb{R}^n \to \mathbb{R}$ one has $\partial f(x) = \partial_c f(x)$, $x \in \mathbb{R}^n$. From now on we use the notation $\partial f$ for subdifferentials of convex functions.
Now assume that the function $f$ is directionally differentiable at $x$, that is, the limit
$$f'(x, u) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha u) - f(x)}{\alpha}$$
exists for any $u \in \mathbb{R}^n$. The function $f$ is called regular at $x$ if $f^\circ(x, u) = f'(x, u)$ for all $u \in \mathbb{R}^n$.
Definition 1. f : Rn → R is called a DC function if there exist convex functions g, h :
Rn → R such that:
f (x) = g(x) − h(x), x ∈ Rn .
Here g − h is called a DC decomposition of f while g and h are DC components of f .
A function $f$ is locally DC if for any $x_0 \in \mathbb{R}^n$ there exists $\varepsilon > 0$ such that $f$ is DC on the ball $B_\varepsilon(x_0)$. It is well known that every locally DC function is DC [21]. Note that a DC function has infinitely many DC decompositions.
An unconstrained DC program is an optimization problem of the form:
$$\text{minimize } f(x) = g(x) - h(x) \text{ subject to } x \in \mathbb{R}^n. \qquad (1)$$
In general, nonsmooth DC functions are not regular and the Clarke subdifferential calculus exists for such functions only in the form of inclusions:
$$\partial f(x) \subseteq \partial g(x) - \partial h(x), \quad x \in \mathbb{R}^n. \qquad (2)$$
Necessary optimality conditions for the problem (1) can be written as:
$$\partial h(x^*) \subseteq \partial g(x^*), \qquad (3)$$
$$0 \in \partial f(x^*) \qquad (4)$$
and
$$\partial h(x^*) \cap \partial g(x^*) \ne \emptyset. \qquad (5)$$
Points satisfying (3) are called inf-stationary, points satisfying (4) are called Clarke stationary and points satisfying (5) are called critical points of the problem (1). In general, any inf-stationary point is also a Clarke stationary and a critical point. Furthermore, any Clarke stationary point is also a critical point.
3 DC programming approach to clustering problems
In this section we give a nonsmooth optimization formulation of clustering problems and
their DC representations.
In cluster analysis we assume that we are given a finite set of points $A$ in the $n$-dimensional space $\mathbb{R}^n$, that is, $A = \{a^1, \ldots, a^m\}$, where $a^i \in \mathbb{R}^n$, $i = 1, \ldots, m$. The hard unconstrained clustering problem is the distribution of the points of the set $A$ into a given number $k$ of disjoint subsets $A^j$, $j = 1, \ldots, k$ such that:
1. $A^j \ne \emptyset$ and $A^j \cap A^l = \emptyset$, $j, l = 1, \ldots, k$, $j \ne l$;
2. $A = \bigcup_{j=1}^{k} A^j$.
The sets Aj , j = 1, . . . , k are called clusters and each cluster Aj can be identified by its
center xj ∈ Rn , j = 1, . . . , k. The problem of finding these centers is called the k-clustering
(or k-partition) problem. In order to formulate the clustering problem one needs to define
the similarity (or dissimilarity) measure. In this paper, the similarity measure is defined
using the squared Euclidean ($L_2$) distance:
$$d_2(x, a) = \sum_{i=1}^{n} (x_i - a_i)^2.$$
The MSSC problem is then formulated as the following optimization problem:
$$\text{minimize } f_k(x^1, \ldots, x^k) \text{ subject to } (x^1, \ldots, x^k) \in \mathbb{R}^{nk}, \qquad (6)$$
where
$$f_k(x^1, \ldots, x^k) = \frac{1}{m} \sum_{a \in A} \min_{j=1,\ldots,k} d_2(x^j, a). \qquad (7)$$
The function $f_k$ admits the DC representation
$$f_k(x) = f_{k1}(x) - f_{k2}(x), \qquad (8)$$
where
$$f_{k1}(x) = \frac{1}{m} \sum_{a \in A} \sum_{j=1}^{k} d_2(x^j, a), \qquad f_{k2}(x) = \frac{1}{m} \sum_{a \in A} \max_{j=1,\ldots,k} \sum_{s=1, s \ne j}^{k} d_2(x^s, a).$$
Since the function $d_2$ is convex in $x$, the function $f_{k1}$, as a sum of convex functions, is also convex. The function $f_{k2}$ is a sum of maxima of sums of convex functions: the functions under the maximum are convex as sums of convex functions, the maximum of a finite number of convex functions is convex, and hence $f_{k2}$, as a sum of convex functions, is also convex.
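The decomposition above is easy to verify numerically. The following sketch uses a hypothetical toy data set and centers; only formula (7) and the two DC components are taken from the text:

```python
# Numerical check of the DC decomposition f_k = f_k1 - f_k2 on a toy
# 2-D data set (data and centers are made up for illustration).
import math

def d2(x, a):
    """Squared Euclidean distance."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def f_k(centers, A):
    """Cluster function (7): mean distance to the closest center."""
    return sum(min(d2(x, a) for x in centers) for a in A) / len(A)

def f_k1(centers, A):
    """Convex component: mean of the sum of distances to all centers."""
    return sum(sum(d2(x, a) for x in centers) for a in A) / len(A)

def f_k2(centers, A):
    """Convex component: mean over a of max over j of the sum over s != j."""
    total = 0.0
    for a in A:
        dists = [d2(x, a) for x in centers]
        s = sum(dists)
        total += max(s - dj for dj in dists)  # drop one center at a time
    return total / len(A)

A = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
centers = [(0.5, 0.0), (4.5, 4.0)]
assert math.isclose(f_k(centers, A), f_k1(centers, A) - f_k2(centers, A))
print(f_k(centers, A))  # 0.25 for this configuration
```

Note that for $k = 2$ the inner maximum in $f_{k2}$ reduces to the distance to the farther center, which the code obtains by dropping one term at a time.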
Note that the similarity measure can also be defined using other norms and in par-
ticular, the L1 -norm. However, in this case the clustering function fk is more complex
and both functions fk1 and fk2 are nonsmooth whereas for d2 only the function fk2 is
nonsmooth.
Problem (6) is a global optimization problem: the objective function $f_k$ has many local minimizers and only its global minimizers provide the best cluster structure of a data set with the least number of clusters. In general, conventional global optimization methods cannot be applied to solve this problem in large data sets, so in such data sets heuristics and deterministic local search algorithms are the only choice. However, the success of these algorithms heavily depends on the choice of starting cluster centers, and the development of efficient procedures for generating starting cluster centers is crucial for the success of such algorithms. We apply an approach introduced in [28] to find starting cluster centers. This approach involves the solution of the so-called auxiliary clustering problem.
Assume that the solution $x^1, \ldots, x^{k-1}$, $k \ge 2$, to the $(k-1)$-clustering problem is known. Denote by $r_{k-1}^a$ the distance between the data point $a \in A$ and the closest cluster center among the $k-1$ centers $x^1, \ldots, x^{k-1}$:
$$r_{k-1}^a = \min\left\{d_2(x^1, a), \ldots, d_2(x^{k-1}, a)\right\}. \qquad (9)$$
Define the $k$-th auxiliary cluster function
$$\bar{f}_k(y) = \frac{1}{m} \sum_{a \in A} \min\left\{r_{k-1}^a, d_2(y, a)\right\}. \qquad (10)$$
The problem
$$\text{minimize } \bar{f}_k(y) \text{ subject to } y \in \mathbb{R}^n \qquad (11)$$
is called the $k$-th auxiliary clustering problem [6]. The DC representation of the function $\bar{f}_k$ is as follows:
$$\bar{f}_k(y) = \bar{f}_{k1}(y) - \bar{f}_{k2}(y), \qquad (12)$$
where
$$\bar{f}_{k1}(y) = \frac{1}{m} \sum_{a \in A} \left(r_{k-1}^a + d_2(y, a)\right), \qquad \bar{f}_{k2}(y) = \frac{1}{m} \sum_{a \in A} \max\left\{r_{k-1}^a, d_2(y, a)\right\}.$$
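The identity $\min\{p, q\} = (p + q) - \max\{p, q\}$ underlying (12) can be checked numerically. The following sketch uses a hypothetical 1-D data set and a made-up previous center; the formulas for $\bar{f}_k$, $\bar{f}_{k1}$ and $\bar{f}_{k2}$ follow the text:

```python
# Numerical check of the DC decomposition (12) of the auxiliary cluster
# function on a toy 1-D data set (data and the previous center are
# hypothetical).
import math

def d2(x, a):
    return (x - a) ** 2

A = [0.0, 1.0, 4.0, 5.0]
prev_centers = [0.5]                                      # 1-clustering solution
r = {a: min(d2(x, a) for x in prev_centers) for a in A}   # (9)

def f_bar(y):    # (10): mean of min{r_a, d2(y, a)}
    return sum(min(r[a], d2(y, a)) for a in A) / len(A)

def f_bar1(y):   # convex component of (12)
    return sum(r[a] + d2(y, a) for a in A) / len(A)

def f_bar2(y):   # convex component of (12)
    return sum(max(r[a], d2(y, a)) for a in A) / len(A)

for y in (-1.0, 0.5, 4.5):
    assert math.isclose(f_bar(y), f_bar1(y) - f_bar2(y))
print(f_bar(4.5))  # 0.25 for this configuration
```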
The function $\bar{f}_{k2}$, in general, is nondifferentiable. To write its subdifferential at a given point $y \in \mathbb{R}^n$, introduce the following sets:
$$\bar{A}_1(y) = \{a \in A : r_{k-1}^a > d_2(y, a)\}, \quad \bar{A}_2(y) = \{a \in A : r_{k-1}^a < d_2(y, a)\},$$
$$\bar{A}_3(y) = \{a \in A : r_{k-1}^a = d_2(y, a)\}.$$
We rewrite the function $\bar{f}_{k2}$ at $y$ as:
$$\bar{f}_{k2}(y) = \frac{1}{m}\left[\sum_{a \in \bar{A}_1(y)} r_{k-1}^a + \sum_{a \in \bar{A}_2(y)} d_2(y, a) + \sum_{a \in \bar{A}_3(y)} \max\left\{r_{k-1}^a, d_2(y, a)\right\}\right].$$
Proposition 2. The generalized subdifferential $\partial \bar{f}_k(y)$ of the function $\bar{f}_k$ at $y \in \mathbb{R}^n$ can be given as:
$$\partial \bar{f}_k(y) = \nabla \bar{f}_{k1}(y) - \partial \bar{f}_{k2}(y).$$
Proof. Denote $D(y) = \nabla \bar{f}_{k1}(y) - \partial \bar{f}_{k2}(y)$. The inclusion $\partial \bar{f}_k(y) \subseteq D(y)$ follows from (2), therefore we prove only the opposite inclusion. As finite valued convex functions defined on $\mathbb{R}^n$, $\bar{f}_{k1}$ and $\bar{f}_{k2}$ are directionally differentiable. Therefore, the function $\bar{f}_k$ is also directionally differentiable and
$$\bar{f}_k'(y, u) = \bar{f}_{k1}'(y, u) - \bar{f}_{k2}'(y, u), \quad u \in \mathbb{R}^n.$$
Since the function $\bar{f}_k$ is DC, the function $-\bar{f}_k$ is also DC and
$$(-\bar{f}_k)'(y, u) = \bar{f}_{k2}'(y, u) - \langle \nabla \bar{f}_{k1}(y), u \rangle \ge \langle \xi, u \rangle - \langle \nabla \bar{f}_{k1}(y), u \rangle$$
for any $\xi \in \partial \bar{f}_{k2}(y)$. Then
$$(-\bar{f}_k)'(y, u) \ge \langle \xi - \nabla \bar{f}_{k1}(y), u \rangle$$
for all $u \in \mathbb{R}^n$. This means that $(-\bar{f}_k)'(y, u) \ge \langle v, u \rangle$ for all $u \in \mathbb{R}^n$ and $v \in -D(y)$, which in turn, due to the convexity of the set $D(y)$, implies that $-D(y) \subseteq \partial(-\bar{f}_k)(y) = -\partial \bar{f}_k(y)$. Therefore $D(y) \subseteq \partial \bar{f}_k(y)$.
Proof. The subdifferential $\partial \bar{f}_{k1}(y)$ is a singleton at any $y \in \mathbb{R}^n$, and therefore it follows from the definition of inf-stationary points (3) that the set $\partial \bar{f}_{k2}(y^*)$ must be a singleton at all such points.
It is clear that the first term in (14) is a singleton. Therefore at the inf-stationary point $y^*$ the second term must also be a singleton (namely, it has to be $\{0\}$). That is, $y^* = a$ for every $a \in \bar{A}_3(y^*)$, meaning that the subdifferential $\partial \bar{f}_{k2}(y^*)$ is given by (15).
Remark 1. The condition $y^* = a$ for every $a \in \bar{A}_3(y^*)$ implies that $y^* = x^j$ for some $j = 1, \ldots, k-1$. Furthermore, since the set $\bar{A}_3(y^*)$ is a singleton, the cluster $A^j$ contains only one data point.
Proof. Any local minimizer of Problem (11) is an inf-stationary point of this problem. The proof then follows from the expression for the gradient $\nabla \bar{f}_{k1}(y^*)$, Proposition 3 and Remark 1.
Proposition 5. The sets of Clarke stationary and critical points of Problem (11) coincide and they are given by:
$$S = \left\{y \in \mathbb{R}^n : \nabla \bar{f}_{k1}(y) \in \partial \bar{f}_{k2}(y)\right\}. \qquad (17)$$
Proof. Assume the point $\bar{y}$ is a critical point of Problem (11). Since the subdifferential of $\bar{f}_{k1}$ at any $y \in \mathbb{R}^n$ is a singleton, we get that $\nabla \bar{f}_{k1}(\bar{y}) \in \partial \bar{f}_{k2}(\bar{y})$. Then it follows from Proposition 2 that $0 \in \partial \bar{f}_k(\bar{y})$, that is, $\bar{y}$ is Clarke stationary.
Now assume that $\bar{y}$ is Clarke stationary. Then Proposition 2 implies that $\nabla \bar{f}_{k1}(\bar{y}) \in \partial \bar{f}_{k2}(\bar{y})$ and therefore $\bar{y}$ is a critical point. The expression (17) for the set of Clarke stationary points is obvious.
Remark 2. It is obvious that any inf-stationary point of Problem (11) is also a Clarke stationary and a critical point of this problem. In general, the set of inf-stationary points is a strict subset of these two sets.
Proposition 6. Let x ∈ Rnk be a local minimizer of the problem (6). Then the objective
function fk is continuously differentiable at this point and ∇fk (x) = 0.
Proof. First we derive expressions for the subdifferentials of the functions $f_{k1}$ and $f_{k2}$. The function $f_{k1}$ is differentiable and its gradient is:
$$\nabla f_{k1}(x) = 2(x - \hat{A}),$$
where $\hat{A} = (\hat{A}^1, \ldots, \hat{A}^k)$, $\hat{A}^1 = \cdots = \hat{A}^k = (\hat{a}_1, \ldots, \hat{a}_n)$ and
$$\hat{a}_t = \frac{1}{m} \sum_{a \in A} a_t.$$
This means that the subdifferential $\partial f_{k1}(x)$ is a singleton for any $x \in \mathbb{R}^{nk}$.
In general, the function $f_{k2}$ is nonsmooth. To compute its subdifferential, consider the following function and set [28]:
$$\varphi_a(x) = \max_{j=1,\ldots,k} \sum_{s=1, s \ne j}^{k} d_2(x^s, a), \qquad (19)$$
and
$$R_a(x) = \left\{j \in \{1, \ldots, k\} : \sum_{s=1, s \ne j}^{k} d_2(x^s, a) = \varphi_a(x)\right\}. \qquad (20)$$
For $a = a^i$ the subdifferential of $\varphi_a$ at $x$ is
$$\partial \varphi_a(x) = \mathrm{conv}\left\{2\left(\tilde{x}_j - \tilde{A}_j^i\right) : j \in R_a(x)\right\}, \qquad (21)$$
where
$$\tilde{x}_j = \left(x^1, \ldots, x^{j-1}, 0_n, x^{j+1}, \ldots, x^k\right), \quad \tilde{A}_j^i = \left(\tilde{A}_{j1}^i, \ldots, \tilde{A}_{jk}^i\right) \in \mathbb{R}^{nk},$$
and
$$\tilde{A}_{jt}^i = a^i, \ t = 1, \ldots, k, \ t \ne j, \qquad \tilde{A}_{jj}^i = 0_n.$$
Then the subdifferential $\partial f_{k2}(x)$ can be expressed as:
$$\partial f_{k2}(x) = \frac{1}{m} \sum_{a \in A} \partial \varphi_a(x). \qquad (22)$$
The local minimizer $x$ of the problem (6) is also its inf-stationary point. Since the subdifferential $\partial f_{k1}(x)$ is a singleton at any $x$, it follows from (3) that the subdifferential $\partial f_{k2}(x)$ is a singleton at any inf-stationary point $x$. This means that the subdifferentials $\partial \varphi_a(x)$, $a \in A$, are also singletons at any such point, which in turn means that the index sets $R_a(x)$ are singletons for all $a \in A$. This implies that for each $a \in A$ there exists a unique $j \in \{1, \ldots, k\}$ such that $R_a(x) = \{j\}$. It follows from the DC representation of the function $f_k$ that this $j$ is the index of the cluster to which the data point $a$ belongs. Thus, if $x$ is an inf-stationary point, then for each data point $a \in A$ there exists only one cluster center $x^j$ such that
$$d_2(x^j, a) < d_2(x^s, a) \ \text{ for any other } s = 1, \ldots, k, \ s \ne j.$$
This means that the clustering function $f_k$ is continuously differentiable at any inf-stationary point $x$ of Problem (6) and the Clarke subdifferential of this function is a singleton at such points, that is, $\partial f_k(x) = \{\nabla f_k(x)\}$, where
$$\nabla f_k(x) = \frac{2}{m} \sum_{a \in A} \sum_{j \in R_a(x)} (x^j - a).$$
Then the necessary condition for a minimum is $\nabla f_k(x) = 0$ and, in addition, each cluster center $x^j$, $j = 1, \ldots, k$, attracts the data points $a \in A$ such that $j \in R_a(x)$.
Proposition 7. The generalized subdifferential $\partial f_k(x)$ of the function $f_k$ at $x \in \mathbb{R}^{nk}$ is:
$$\partial f_k(x) = \nabla f_{k1}(x) - \partial f_{k2}(x).$$
Proposition 8. The sets of Clarke stationary and critical points of Problem (6) coincide and at these points
$$\nabla f_{k1}(x) \in \partial f_{k2}(x).$$
Proof. The proof follows from Proposition 7 and the definitions of Clarke stationary and critical points.
6 An algorithm for solving clustering problems
Both the clustering problem (6) and the auxiliary clustering problem (11) can be written as the unconstrained DC program:
$$\text{minimize } f(x) \text{ subject to } x \in \mathbb{R}^n, \qquad (23)$$
where $f(x) = f_1(x) - f_2(x)$, the function $f_1$ is continuously differentiable and convex, and the function $f_2$ is, in general, a nonsmooth convex function.
According to Propositions 2 and 7, for the objective function $f$ in the problem (23) we have
$$\partial f(x) = \nabla f_1(x) - \partial f_2(x). \qquad (24)$$
For $x \in \mathbb{R}^n$ and $\lambda > 0$ consider the set
$$Q_1(x, \lambda) = \mathrm{conv}\left\{\nabla f_1(x + \lambda u) : u \in S_1\right\}.$$
Here $S_1 = \{u \in \mathbb{R}^n : \|u\| = 1\}$ is the unit sphere. It is obvious that the set $Q_1(x, \lambda)$ is convex and compact for any $x \in \mathbb{R}^n$ and $\lambda > 0$. A point $x \in \mathbb{R}^n$ is called a $(\lambda, \delta)$-stationary point, $\lambda, \delta > 0$, if
$$\partial f_2(x) \cap \left(Q_1(x, \lambda) + \bar{B}_\delta(0)\right) \ne \emptyset.$$
Moreover, due to the convexity of $f_1$,
$$f_1(x + \lambda u) - f_1(x) \le \lambda \langle \nabla f_1(x + \lambda u), u \rangle \le \lambda \max_{\eta \in Q_1(x, \lambda)} \langle \eta, u \rangle.$$
Since $f_2(x + \lambda u) - f_2(x) \ge \lambda \langle \xi, u \rangle$ for any $\xi \in \partial f_2(x)$, we get
$$f(x + \lambda u) - f(x) \le \lambda \max_{\eta \in Q_1(x, \lambda)} \langle \eta - \xi, u \rangle.$$
Assume that a point $x \in \mathbb{R}^n$ is not a $(\lambda, \delta)$-stationary point. Then $\|\xi - z\| \ge \delta$ for all $\xi \in \partial f_2(x)$ and $z \in Q_1(x, \lambda)$. Take any $\xi \in \partial f_2(x)$ and define the following set:
$$\bar{Q}_\xi(x, \lambda) = \left\{\eta - \xi : \eta \in Q_1(x, \lambda)\right\}.$$
Then we have
$$f(x + \lambda u) - f(x) \le \lambda \max_{z \in \bar{Q}_\xi(x, \lambda)} \langle z, u \rangle \quad \forall u \in \mathbb{R}^n. \qquad (28)$$
Proposition 9. Assume that the point $x$ is not $(\lambda, \delta)$-stationary. Then the direction
$$u_0 = -\frac{z_0}{\|z_0\|}, \qquad z_0 = \operatorname*{argmin}\left\{\|z\| : z \in \bar{Q}_\xi(x, \lambda)\right\},$$
satisfies
$$f(x + \lambda u_0) - f(x) \le -\lambda\|z_0\| \le -\lambda\delta.$$
Proof. Since $z_0$ is the least norm element of the convex compact set $\bar{Q}_\xi(x, \lambda)$, the necessary condition for a minimum implies $\langle z_0, z - z_0 \rangle \ge 0$, or $\langle z_0, z \rangle \ge \|z_0\|^2$, $\forall z \in \bar{Q}_\xi(x, \lambda)$. Dividing both sides by $-\|z_0\|$ we have $\langle z, u_0 \rangle \le -\|z_0\|$, $\forall z \in \bar{Q}_\xi(x, \lambda)$. Then the proof follows from (28) and the fact that $\|z_0\| \ge \delta$.
It follows from Proposition 9 that if the point $x$ is not $(\lambda, \delta)$-stationary then the set $\bar{Q}_\xi(x, \lambda)$ can be used to find a direction of sufficient decrease of the function $f$ at $x$. However, the computation of the set $\bar{Q}_\xi(x, \lambda)$ is not always possible. Next we design an algorithm which uses a finite number of elements from $\bar{Q}_\xi(x, \lambda)$ to compute descent directions.
Let $\lambda > 0$, $\delta > 0$ be given numbers.
Algorithm 1 Computation of descent directions.
Step 1: (Initialization). Select a search control parameter $c \in (0, 1)$ and an initial direction $u^1 \in S_1$, compute the gradient $\nabla f_1(x + \lambda u^1)$ and a subgradient $\xi \in \partial f_2(x)$. Set $\bar{Q}_2^1 := \{\nabla f_1(x + \lambda u^1) - \xi\}$ and $j := 1$.
Step 2: (Computation of the least distance subgradient). Compute
$$z^j = \operatorname*{argmin}\left\{\tfrac{1}{2}\|z\|^2 : z \in \bar{Q}_2^j\right\}.$$
Step 3: (Stopping criterion). If $\|z^j\| \le \delta$ then stop: $x$ is a $(\lambda, \delta)$-stationary point.
Step 4: (Computation of a search direction). Compute $u^{j+1} = -\|z^j\|^{-1}z^j$. If
$$\|z^j\| > \delta \qquad (33)$$
and
$$f(x + \lambda u^{j+1}) - f(x) > -c\lambda\|z^j\| \qquad (34)$$
then set $\bar{Q}_2^{j+1} := \bar{Q}_2^j \cup \{\nabla f_1(x + \lambda u^{j+1}) - \xi\}$, $j := j + 1$ and go to Step 2. Otherwise stop: $u^{j+1}$ is a descent direction.
Proposition 10. Assume that there exists $M > \delta$ such that
$$\|\nabla f_1(x + \lambda u) - \xi\| \le M \quad \forall u \in S_1, \ \xi \in \partial f_2(x). \qquad (31)$$
Then Algorithm 1 terminates after at most $j_0$ iterations, where
$$j_0 = \left\lceil \frac{M^2 - \delta^2}{C(\delta)} \right\rceil, \qquad C(\delta) = \frac{(1 - c)^2\delta^4}{4M^2}. \qquad (32)$$
Proof. First, we show that if the algorithm does not terminate at the $j$-th iteration then $w^{j+1} = \nabla f_1(x + \lambda u^{j+1}) - \xi \notin \bar{Q}_2^j$. Since $z^j$ is the least norm element of $\bar{Q}_2^j$, it is obvious that
$$\langle z^j, z - z^j \rangle \ge 0 \quad \forall z \in \bar{Q}_2^j,$$
that is,
$$\langle z, z^j \rangle \ge \|z^j\|^2 \quad \forall z \in \bar{Q}_2^j. \qquad (36)$$
Since the algorithm does not terminate at the $j$-th iteration, the inequality (34) holds:
$$f(x + \lambda u^{j+1}) - f(x) > -c\lambda\|z^j\|. \qquad (37)$$
On the other hand, the convexity of $f_1$ and $f_2$ yields $f(x + \lambda u^{j+1}) - f(x) \le \lambda\langle w^{j+1}, u^{j+1} \rangle$, and since $u^{j+1} = -\|z^j\|^{-1}z^j$, (37) can be rewritten as:
$$\langle w^{j+1}, z^j \rangle < c\|z^j\|^2. \qquad (38)$$
Since $c \in (0, 1)$, the vector $w^{j+1}$ does not satisfy (36) and therefore $w^{j+1} \notin \bar{Q}_2^j$.
It is clear that for any $t \in [0, 1]$
$$\|z^{j+1}\|^2 \le \|z^j + t(w^{j+1} - z^j)\|^2 \le \|z^j\|^2 - 2t(1 - c)\|z^j\|^2 + 4t^2M^2, \qquad (39)$$
where the last inequality follows from (38) and from (31), which implies $\|w^{j+1} - z^j\| \le 2M$. Select $t$ as
$$t = \frac{(1 - c)\|z^j\|^2}{4M^2}.$$
It is clear that $t \in (0, 1)$. Putting this $t$ in (39) we get
$$\|z^{j+1}\|^2 \le \|z^j\|^2 - \frac{(1 - c)^2\|z^j\|^4}{4M^2},$$
which together with (33) implies that
$$\|z^{j+1}\|^2 \le \|z^j\|^2 - C(\delta).$$
For $j \ge 1$ we therefore have
$$\|z^{j+1}\|^2 \le \|z^1\|^2 - jC(\delta).$$
Since $C(\delta)$ is a positive constant, this means that the algorithm must terminate after a finite number of iterations. To estimate this number, notice that according to (31), $\|z^1\|^2 \le M^2$. Then the inequality $\|z^1\|^2 - jC(\delta) \le \delta^2$ is satisfied after at most $j_0$ iterations, where $j_0$ is given by (32).
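A compact sketch of Algorithm 1 follows. Two simplifications are assumed and not taken from the text: the min-norm subproblem of Step 2 is solved only approximately, by minimizing over the segment between the previous least-norm vector and the newest element of $\bar{Q}_2^j$ (the same bound the convergence proof uses), and the initial direction is fixed; `f`, `grad_f1` and `subgrad_f2` are user-supplied callables, so this is an illustration, not the authors' implementation:

```python
# Sketch of Algorithm 1: find a descent direction for f = f1 - f2 at x,
# or report that x is (lambda, delta)-stationary. Points are tuples.
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def axpy(x, a, u):  # x + a*u
    return tuple(xi + a * ui for xi, ui in zip(x, u))

def descent_direction(x, lam, delta, f, grad_f1, subgrad_f2, c=0.2, maxit=100):
    """Return ('stationary', None) or ('descent', u) with
    f(x + lam*u) - f(x) <= -c*lam*||z||."""
    xi = subgrad_f2(x)
    u = tuple([1.0] + [0.0] * (len(x) - 1))        # fixed initial direction in S1
    z = sub(grad_f1(axpy(x, lam, u)), xi)          # first element of Q2-bar
    for _ in range(maxit):
        nz = norm(z)
        if nz <= delta:
            return 'stationary', None              # Step 3
        u = tuple(-zi / nz for zi in z)            # Step 4: u = -z/||z||
        if f(axpy(x, lam, u)) - f(x) <= -c * lam * nz:
            return 'descent', u                    # sufficient decrease found
        w = sub(grad_f1(axpy(x, lam, u)), xi)      # new element of Q2-bar
        # approximate Step 2: min-norm point on the segment [z, w]
        d = sub(w, z)
        dd = sum(di * di for di in d)
        t = 1.0 if dd == 0 else max(0.0, min(1.0,
            -sum(zi * di for zi, di in zip(z, d)) / dd))
        z = axpy(z, t, d)
    return 'stationary', None                      # give up (approximation stalled)

# Example: f(x) = x^2 - |x| on R, so f1(x) = x^2 and f2(x) = |x|.
status, u = descent_direction(
    (0.3,), lam=0.1, delta=1e-3,
    f=lambda x: x[0] ** 2 - abs(x[0]),
    grad_f1=lambda x: (2 * x[0],),
    subgrad_f2=lambda x: (1.0 if x[0] >= 0 else -1.0,))
print(status, u)
```

For the example, moving from $x = 0.3$ toward larger $x$ decreases $f$, and the sketch returns a descent direction on its first pass through Step 4.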
Next we describe an algorithm for finding $(\lambda, \delta)$-stationary points, $\lambda, \delta > 0$.
Algorithm 2 Finding $(\lambda, \delta)$-stationary points.
Step 1: (Initialization). Select any starting point $x^1 \in \mathbb{R}^n$ and numbers $c_1 \in (0, 1)$ and $c_2 \in (0, c_1]$. Set $j := 1$.
Step 2: Apply Algorithm 1 with $c = c_1$ to find a search direction at the point $x^j$. This algorithm terminates after a finite number of iterations $l_j > 0$ either with $\|z^{l_j}\| \le \delta$, in which case $x^j$ is a $(\lambda, \delta)$-stationary point and the algorithm stops, or with a descent direction $u^{l_j} \in S_1$ such that
$$f(x^j + \lambda u^{l_j}) - f(x^j) \le -c_1\lambda\|z^{l_j}\|.$$
Step 3: (Line search). Compute
$$\alpha_j = \operatorname*{argmax}\left\{\alpha \ge 0 : f(x^j + \alpha u^{l_j}) - f(x^j) \le -c_2\alpha\|z^{l_j}\|\right\}. \qquad (40)$$
Step 4: Set $x^{j+1} := x^j + \alpha_j u^{l_j}$, $j := j + 1$ and go to Step 2.
Proposition 11. Assume that $f_* = \inf\{f(x) : x \in \mathbb{R}^n\} > -\infty$. Then Algorithm 2 finds a $(\lambda, \delta)$-stationary point in at most $K$ iterations, where
$$K = \frac{f(x^1) - f_*}{c_2\lambda\delta}. \qquad (41)$$
Proof. Assume the contrary, that is, the sequence $\{x^j\}$ is infinite and the points $x^j$ are not $(\lambda, \delta)$-stationary for all $j = 1, 2, \ldots$. This means that $\|z^{l_j}\| > \delta$, $j = 1, 2, \ldots$. Since $c_2 \le c_1$, it follows from (40) that $\alpha_j \ge \lambda$ for any $j > 0$. Then
$$f(x^{j+1}) - f(x^j) \le -c_2\lambda\delta,$$
and therefore
$$f(x^{j+1}) - f(x^1) \le -c_2 j\lambda\delta,$$
which means that $f(x^j) \to -\infty$ as $j \to \infty$. This contradiction shows that the algorithm is finitely convergent. Since $f_* \le f(x^{j+1})$, it is obvious that the maximum number of iterations $K$ is given by (41).
Corollary 1. For both the clustering and the auxiliary clustering problems $f_* \ge 0$, and the estimate (41) can be replaced by:
$$K = \frac{f(x^1)}{c_2\lambda\delta}.$$
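To get a feel for this bound, here is a hypothetical numerical instance (the values of $f(x^1)$, $\lambda$ and $\delta$ are made up for illustration; $c_2 = 0.01$ is the value used later in Section 8):

```python
# Worst-case iteration bound of Corollary 1 for hypothetical parameters.
f_x1 = 100.0        # value of f at the starting point (made up)
c2, lam, delta = 0.01, 1.0, 1e-7

K = round(f_x1 / (c2 * lam * delta))
print(K)  # 100000000000 -- the bound is extremely loose
```

The bound is pessimistic: it scales like $1/\delta$, whereas in practice far fewer iterations are observed.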
Now we can design an algorithm for finding Clarke stationary points of Problem (23).
Algorithm 3 Finding Clarke stationary points.
Step 1: (Initialization). Select any starting point $x^1 \in \mathbb{R}^n$, sequences $\lambda_j \downarrow 0$, $\delta_j \downarrow 0$ and a tolerance $\varepsilon \ge 0$. Set $j := 1$.
Step 2: Apply Algorithm 2 with $\lambda = \lambda_j$ and $\delta = \delta_j$, starting from the point $x^j$, to find a $(\lambda_j, \delta_j)$-stationary point $x^{j+1}$.
Step 3: If $\|z^{l_j}\| \le \varepsilon$ then stop. Otherwise set $j := j + 1$ and go to Step 2.
Proposition 12. Assume the level set $\mathcal{J}(x^1) = \{x \in \mathbb{R}^n : f(x) \le f(x^1)\}$ is bounded and $\varepsilon = 0$. Then all limit points of the sequence $\{x^j\}$ generated by Algorithm 3 are Clarke stationary points of Problem (23).
Proof. Since Algorithm 3 is a descent algorithm, the sequence $\{x^j\} \subset \mathcal{J}(x^1)$, and because $\mathcal{J}(x^1)$ is a compact set this sequence has at least one limit point. Assume $\bar{x}$ is a limit point of $\{x^j\}$ and the subsequence $\{x^{j_l}\}$ is such that $x^{j_l} \to \bar{x}$ as $l \to \infty$.
After each iteration $j_l$ we get a $(\lambda_{j_l}, \delta_{j_l})$-stationary point $x^{j_l+1}$, which means that there exists $\xi^{j_l+1} \in \partial f_2(x^{j_l+1})$ such that
$$\xi^{j_l+1} \in Q_1(x^{j_l+1}, \lambda_{j_l}) + B_{\delta_{j_l}}(0).$$
Replacing $j_l$ by $j_l - 1$ we have
$$\xi^{j_l} \in Q_1(x^{j_l}, \lambda_{j_l-1}) + B_{\delta_{j_l-1}}(0). \qquad (42)$$
Continuity of the gradient $\nabla f_1(x)$ implies that for any $\gamma > 0$ there exists $l_0 > 0$ such that
$$\|\nabla f_1(x^{j_l} + \lambda_{j_l-1}u) - \nabla f_1(\bar{x})\| < \gamma \quad \forall l > l_0, \ u \in S_1.$$
This means that
$$Q_1(x^{j_l}, \lambda_{j_l-1}) \subset \nabla f_1(\bar{x}) + B_\gamma(0) \quad \forall l > l_0. \qquad (43)$$
From (42) and (43) we get
$$\xi^{j_l} \in \nabla f_1(\bar{x}) + B_{\gamma+\delta_{j_l-1}}(0) \quad \forall l > l_0. \qquad (44)$$
The mapping $x \mapsto \partial f_2(x)$ is upper semicontinuous. Then for any $\theta > 0$ there exists $\bar{l} > 0$ such that
$$\xi^{j_l} \in \partial f_2(\bar{x}) + B_\theta(0) \quad \forall l > \bar{l}. \qquad (45)$$
Without loss of generality assume that there exists $\bar{\xi} \in \partial f_2(\bar{x})$ such that $\|\xi^{j_l} - \bar{\xi}\| < \theta$ for all $l > \bar{l}$. Then it follows from (44) that
$$\|\nabla f_1(\bar{x}) - \bar{\xi}\| < \theta + \gamma + \delta_{j_l-1} \quad \forall l > \hat{l} = \max\{l_0, \bar{l}\}.$$
Since $\gamma > 0$ and $\theta > 0$ are arbitrary and $\delta_{j_l} \downarrow 0$ as $l \to \infty$, we obtain $\nabla f_1(\bar{x}) = \bar{\xi} \in \partial f_2(\bar{x})$, and therefore $0 \in \partial f(\bar{x})$, that is, $\bar{x}$ is a Clarke stationary point of Problem (23).
Finally, we design an algorithm for finding inf-stationary points of Problem (23). This algorithm involves a special procedure to escape from Clarke stationary points which are not inf-stationary.
Assume that the point $x^*$ is a Clarke stationary point found by Algorithm 3. If the function $f_2$ is differentiable at this point then $x^*$ is an inf-stationary point. Otherwise one can compute a subgradient $\xi \in \partial f_2(x^*)$ such that $\|\xi - \nabla f_1(x^*)\| > \varepsilon$ for some $\varepsilon > 0$. If $\partial f_2(x^*) \subset \nabla f_1(x^*) + B_\varepsilon(0)$ for some sufficiently small $\varepsilon > 0$ then the point $x^*$ can be considered an approximate inf-stationary point.
Our main assumption is the following: if the subdifferential $\partial f_2(x)$ is not a singleton at a point $x \in \mathbb{R}^n$ then we can always compute two subgradients $\xi^1, \xi^2 \in \partial f_2(x)$ such that $\xi^1 \ne \xi^2$. We will show that this assumption is satisfied for Problems (6) and (11).
Proposition 13. Assume that the subdifferential $\partial f_2(x)$ is not a singleton at $x$ and the subgradients $\xi^1, \xi^2 \in \partial f_2(x)$ are such that $\xi^1 \ne \xi^2$. Consider the direction $\bar{u} = -\|\bar{v}\|^{-1}\bar{v}$, where
$$\bar{v} = \operatorname*{argmax}\left\{\|\nabla f_1(x) - \xi^i\| : i = 1, 2\right\}. \qquad (46)$$
Then
$$f'(x, \bar{u}) \le -\|\bar{v}\|.$$
Proof. Since the subdifferential $\partial f_2(x)$ is not a singleton and $\xi^1 \ne \xi^2$, it follows that $\bar{v} \ne 0$. For simplicity we assume that $\bar{v} = \nabla f_1(x) - \xi^2$. The convexity of the functions $f_1$ and $f_2$ implies that for $\alpha > 0$
$$f(x + \alpha\bar{u}) - f(x) \le \alpha\langle \nabla f_1(x + \alpha\bar{u}), \bar{u} \rangle - \alpha\langle \xi^2, \bar{u} \rangle,$$
and therefore
$$f'(x, \bar{u}) \le \langle \nabla f_1(x) - \xi^2, \bar{u} \rangle = \langle \bar{v}, \bar{u} \rangle = -\|\bar{v}\|.$$
Proposition 14. Let $x \in \mathbb{R}^n$ be a Clarke stationary point of Problem (23) and assume that the subdifferential $\partial f_2(x)$ is a singleton at this point. Then $x$ is an inf-stationary point of Problem (23).
Proof. The proof follows from (24) and the definition of inf-stationary points.
Corollary 2. If the subdifferential $\partial f_2(x)$ is not a singleton at a Clarke stationary point $x$ of Problem (23), then $x$ is not an inf-stationary point of this problem.
Remark 3. Propositions 13, 14 and Corollary 2 show how one can design an algorithm for finding inf-stationary points of Problem (23). Applying Algorithm 3 we can find a Clarke stationary point $x$ of the problem (23). If at this point the subdifferential $\partial f_2(x)$ is a singleton then according to Proposition 14 it is also inf-stationary. If this subdifferential is not a singleton then Corollary 2 implies that this point is not inf-stationary, and according to Proposition 13 we can find a descent direction from this point, which in turn allows us to find a new starting point for Algorithm 3.
Algorithm 4 Finding inf-stationary points.
Step 1: (Initialization). Choose numbers $c_1 \in (0, 1)$, $c_2 \in (0, c_1]$, $c_3 \in (0, 1/2]$ and an optimality tolerance $\varepsilon > 0$. Select any starting point $x^1 \in \mathbb{R}^n$ and set $j := 1$.
Step 2: Apply Algorithm 3 starting from the point $x^j$ and using the constants $c_1, c_2$ to find a Clarke stationary point $\bar{x}$ with the optimality tolerance $\varepsilon$.
Step 3: If $\partial f_2(\bar{x}) \subset \nabla f_1(\bar{x}) + B_\varepsilon(0)$ then stop: $\bar{x}$ is an (approximate) inf-stationary point.
Step 4: Compute subgradients $\xi^1, \xi^2 \in \partial f_2(\bar{x})$ such that $\xi^1 \ne \xi^2$, the vector $\bar{v}$ by (46) and the direction $\bar{u} = -\|\bar{v}\|^{-1}\bar{v}$.
Step 5: Compute $x^{j+1} := \bar{x} + \alpha_j\bar{u}$, where
$$\alpha_j = \operatorname*{argmax}\left\{\alpha \ge 0 : f(\bar{x} + \alpha\bar{u}) - f(\bar{x}) \le -c_3\alpha\|\bar{v}\|\right\},$$
set $j := j + 1$ and go to Step 2.
Proposition 15. Assume that $f_* > -\infty$ and the gradient $\nabla f_1 : \mathbb{R}^n \to \mathbb{R}^n$ satisfies the Lipschitz condition with a constant $L > 0$. Then Algorithm 4 terminates after a finite number of iterations at an inf-stationary point of Problem (23).
Proof. For simplicity assume that at the $j$-th iteration $\bar{u}^j = -\|\bar{v}^j\|^{-1}\bar{v}^j$ and $\bar{v}^j = \nabla f_1(\bar{x}) - \xi^1$, $\xi^1 \in \partial f_2(\bar{x})$. Applying the mean value theorem to the function $f_1$, we get that for some $\sigma_j \in (0, 1)$
$$f(\bar{x} + \alpha\bar{u}^j) - f(\bar{x}) = [f_1(\bar{x} + \alpha\bar{u}^j) - f_1(\bar{x})] - [f_2(\bar{x} + \alpha\bar{u}^j) - f_2(\bar{x})]$$
$$\le \alpha\langle \nabla f_1(\bar{x} + \alpha\sigma_j\bar{u}^j), \bar{u}^j \rangle - \alpha\langle \xi^1, \bar{u}^j \rangle$$
$$\le \alpha\langle \nabla f_1(\bar{x}) - \xi^1, \bar{u}^j \rangle + \alpha\langle \nabla f_1(\bar{x} + \alpha\sigma_j\bar{u}^j) - \nabla f_1(\bar{x}), \bar{u}^j \rangle$$
$$\le -\alpha r + \alpha^2 L,$$
where $r = \|\bar{v}^j\| > \varepsilon$ and the last inequality follows from the Lipschitz continuity of $\nabla f_1$. Taking $\bar{\alpha} = r/(2L)$ and using $c_3 \le 1/2$ we obtain
$$f(\bar{x} + \bar{\alpha}\bar{u}^j) - f(\bar{x}) \le -\frac{r^2}{4L} \le -c_3\bar{\alpha}r \le -c_3\bar{\alpha}\varepsilon.$$
This means that at each iteration of Algorithm 4, $\alpha_j \ge \bar{\alpha} \ge \varepsilon/(2L)$ and the function $f$ decreases by at least $c_3\varepsilon^2/(2L) > 0$. Since $f_* > -\infty$, the algorithm must stop after a finite number of iterations.
Next we show that the gradients of the functions $\bar{f}_{k1}$ and $f_{k1}$ are Lipschitz continuous.
Proposition 16. The gradient of the function $\bar{f}_{k1}$ satisfies the Lipschitz condition with the constant $L = 2$.
Proof. The gradient of $\bar{f}_{k1}$ is
$$\nabla \bar{f}_{k1}(y) = \frac{1}{m}\sum_{a \in A} 2(y - a) = 2(y - \bar{a}),$$
where $\bar{a}$ is the center of the set $A$. Then $\|\nabla \bar{f}_{k1}(y^1) - \nabla \bar{f}_{k1}(y^2)\| = 2\|y^1 - y^2\|$, that is, the gradient $\nabla \bar{f}_{k1}$ satisfies the Lipschitz condition on $\mathbb{R}^n$ with the constant $L = 2$.
Proposition 17. The gradient of the function $f_{k1}$ satisfies the Lipschitz condition with the constant $L = 2$.
Proof. The proof is similar to that of Proposition 16, using the expression $\nabla f_{k1}(x) = 2(x - \hat{A})$.
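The computation behind Proposition 16 can be checked numerically. The following sketch (with a hypothetical toy data set) evaluates $\nabla \bar{f}_{k1}(y) = 2(y - \bar{a})$ and verifies that gradient differences have exactly twice the norm of the point differences:

```python
# Numerical check that grad f_bar_k1(y) = 2(y - abar), so the gradient
# is Lipschitz with constant exactly L = 2. Data are made up.
import math
import random

A = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
m = len(A)
abar = tuple(sum(a[t] for a in A) / m for t in range(2))  # centroid of A

def grad_f_bar1(y):
    # the r-terms in f_bar_k1 are constants and do not affect the gradient
    return tuple(2.0 * (y[t] - abar[t]) for t in range(2))

random.seed(0)
y1 = (random.random(), random.random())
y2 = (random.random(), random.random())
gdiff = tuple(g1 - g2 for g1, g2 in zip(grad_f_bar1(y1), grad_f_bar1(y2)))
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(y1, y2)))
assert math.isclose(math.sqrt(sum(d * d for d in gdiff)), 2.0 * dist)
```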
Finally, we demonstrate how two different subgradients from $\partial \bar{f}_{k2}(y)$, $y \in \mathbb{R}^n$, and $\partial f_{k2}(x)$, $x \in \mathbb{R}^{nk}$, can be computed when these subdifferentials are not singletons. First consider the function $\bar{f}_{k2}$. We can choose $\xi^1, \xi^2 \in \partial \bar{f}_{k2}(y)$ using (14) as follows:
$$\xi^1 = \frac{2}{m}\sum_{a \in \bar{A}_2(y)} (y - a)$$
and
$$\xi^2 = \xi^1 + \frac{2}{m}(y - \bar{a}), \qquad \bar{a} = \operatorname*{argmax}_{a \in \bar{A}_3(y)} \|y - a\|.$$
Now consider the function $f_{k2}$. Define the sets
$$A_1 = \{a \in A : |R_a(x)| = 1\}, \qquad A_2 = \{a \in A : |R_a(x)| \ge 2\}.$$
(The sets $R_a(x)$ are defined in Section 5.) If $A_1 = A$ then $\partial f_{k2}(x)$ is a singleton. If $|A_2| \ge 1$ then $\partial f_{k2}(x)$ is not a singleton. Take any $a \in A_2$. Since $|R_a(x)| \ge 2$, the point is attracted by at least two cluster centers. Using two such cluster centers, we can compute two different subgradients for the function $\varphi_a$ defined by (19).
7 Incremental algorithm
In this section we present an incremental algorithm for solving Problems (6) and (11)
using the DC approach. An important part of this algorithm is a procedure for finding
starting points for the l-th cluster center where 1 ≤ l ≤ k. This procedure was described
in detail in [28].
Algorithm 5 An incremental clustering algorithm.
Step 1: (Initialization). Compute the center x1 ∈ Rn of the set A. Set l := 1.
Step 2: (Stopping criterion). Set l := l + 1. If l > k then stop. The k-partition problem
has been solved.
Step 3: (Computation of a set of starting points for the auxiliary clustering problem).
Apply the procedure from [28] to find the set S1 ⊂ Rn of starting points for solving
the auxiliary clustering problem (11) for k = l.
Step 4: (Computation of a set of starting points for the l-th cluster center). Apply
Algorithm 4 to solve Problem (11) starting from each point y ∈ S1 . This algorithm
generates a set S2 ⊂ Rn of starting points for the l-th cluster center.
Step 5: (Computation of a set of cluster centers). For each ȳ ∈ S2 apply Algorithm
4 to solve Problem (6) starting from the point (x1 , . . . , xl−1 , ȳ) and find a solution
(ŷ 1 , . . . , ŷ l ). Denote by S3 ⊂ Rnl a set of all such solutions.
Step 6: (Computation of the best solution). Compute
$$f_l^{\min} = \min\left\{f_l(\hat{y}^1, \ldots, \hat{y}^l) : (\hat{y}^1, \ldots, \hat{y}^l) \in S_3\right\}$$
and the collection of cluster centers $(\bar{y}^1, \ldots, \bar{y}^l)$ such that $f_l(\bar{y}^1, \ldots, \bar{y}^l) = f_l^{\min}$.
Step 7: (Solution to the l-partition problem). Set xj := ȳ j , j = 1, . . . , l as a solution to
the l-th partition problem and go to Step 2.
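A much-simplified, hypothetical sketch of this incremental scheme follows. Two substitutions are assumed and not taken from the text: the starting-point procedure of [28] is replaced by simply trying every data point, and Algorithm 4 is replaced by the DCA fixed-point iterations described later in Section 8; the data set is a toy example:

```python
# Simplified sketch of the incremental scheme of Algorithm 5.
def d2(x, a):
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def centroid(points):
    return tuple(sum(p[t] for p in points) / len(points)
                 for t in range(len(points[0])))

def f_k(centers, A):
    # cluster function (7)
    return sum(min(d2(x, a) for x in centers) for a in A) / len(A)

def dca_aux(y, centers, A, iters=50):
    # DCA fixed-point iteration for the auxiliary problem (11)
    r = {a: min(d2(x, a) for x in centers) for a in A}
    m = len(A)
    for _ in range(iters):
        near = [a for a in A if d2(y, a) <= r[a]]   # points a with d2(y,a) <= r_a
        n2 = m - len(near)
        y = tuple((n2 * yt + sum(a[t] for a in near)) / m
                  for t, yt in enumerate(y))
    return y

def dca_full(centers, A, iters=100):
    # DCA iteration for the clustering problem (6)
    m = len(A)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for a in A:
            j = min(range(len(centers)), key=lambda i: d2(centers[i], a))
            clusters[j].append(a)
        centers = [c if not cl else
                   tuple((1 - len(cl) / m) * ct + (len(cl) / m) * centroid(cl)[t]
                         for t, ct in enumerate(c))
                   for c, cl in zip(centers, clusters)]
    return centers

def incremental_clustering(A, k):
    centers = [centroid(A)]                          # Step 1
    for l in range(2, k + 1):                        # Steps 2-7
        candidates = [dca_aux(a, centers, A) for a in A]
        trials = [dca_full(centers + [y], A) for y in candidates]
        centers = min(trials, key=lambda c: f_k(c, A))
    return centers

A = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
centers = incremental_clustering(A, 2)
print(sorted(centers))
```

For this toy data set the two centers converge near the two obvious group centroids $(0.5, 0)$ and $(4.5, 4)$.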
8 Implementation of algorithms
We compare the proposed algorithm, DCClust (Algorithm 5), with the following algorithms:
1. The global k-means algorithm (GKM) [27].
8.1 Implementation of DCClust
This algorithm contains a special procedure to generate starting cluster centers (Step
3) which is described in detail in [9, 28]. The choice of parameters in this procedure
is the same as in [9]. Algorithm 4 is applied in Step 4 of the DCClust to solve the auxiliary clustering problem and in Step 5 to solve the clustering problem. The parameters in this algorithm are chosen as follows: $c_1 = 0.2$, $c_2 = 0.01$, $c_3 = 0.4$, $\varepsilon = 10^{-5}$, $\delta_i \equiv 10^{-7}$ and $\lambda_1 = 1$, $\lambda_{i+1} = 0.2\lambda_i$, $i = 1, 2, \ldots$. The algorithm from [35] is applied to solve the quadratic programming problem in Step 2 of Algorithm 1.
In general, the sequence $\{x^j\}$ generated by the DCA converges to critical points of the problem (23). Since for this problem the sets of Clarke stationary and critical points coincide, the limit points of the sequence $\{x^j\}$ are also Clarke stationary. At each iteration the DCA computes a subgradient $\xi^j \in \partial f_2(x^j)$ and finds $x^{j+1}$ as a solution to the convex problem
$$\text{minimize } f_1(x) - \langle \xi^j, x - x^j \rangle \text{ subject to } x \in \mathbb{R}^n. \qquad (47)$$
In order to apply the DCA to solve Problem (11), the subgradient $\xi^j$ in Step 2 is computed as (see (14)):
$$\xi^j = \frac{2}{m}\sum_{a \in \bar{A}_2(x^j)} (x^j - a), \quad x^j \in \mathbb{R}^n.$$
Then the solution $x^{j+1}$ to the problem (47) in Step 4 can be expressed as follows:
$$x^{j+1} = \frac{1}{m}\left(|\bar{A}_2(x^j)|\,x^j + \sum_{a \in \bar{A}_1(x^j) \cup \bar{A}_3(x^j)} a\right).$$
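This closed form can be verified numerically: it zeroes the gradient of the convex overestimate in (47). The sketch below uses a hypothetical 1-D data set, made-up $r^a$ values (from a previous center at $0.5$) and a made-up current iterate:

```python
# Check that x_{j+1} = (|A2bar| x_j + sum over A1bar and A3bar of a)/m
# zeroes the gradient of f_bar_k1(x) - <xi_j, x - x_j> (problem (47)).
import math

A = [0.0, 1.0, 4.0, 5.0]
m = len(A)
r = {0.0: 0.25, 1.0: 0.25, 4.0: 12.25, 5.0: 20.25}  # r^a for center 0.5
xj = 3.0                                            # current iterate (made up)

A2 = [a for a in A if (xj - a) ** 2 > r[a]]         # A2bar(x_j)
xi = 2.0 / m * sum(xj - a for a in A2)              # subgradient xi_j
x_next = (len(A2) * xj + sum(a for a in A if a not in A2)) / m

# gradient of the overestimate at x_next: (2/m) sum (x - a) - xi_j
grad = 2.0 / m * sum(x_next - a for a in A) - xi
assert math.isclose(grad, 0.0, abs_tol=1e-12)
print(x_next)  # 3.75 for this instance
```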
Next we describe the application of the DCA to solve the clustering problem (6). Let
xj = (x1j , . . . , xkj ) ∈ Rnk be a vector of cluster centers at the iteration j and A1 , . . . , Ak be
the cluster partition of the data set A given by these centers.
In order to compute the subgradient $\xi^j$ in Step 2, for each $a \in A$ we compute the set $R_a(x_j)$ given by (20), take any $p \in R_a(x_j)$ and compute the subgradient $v_a^j \in \partial\varphi_a(x_j)$ using (21). Then we apply (22) to compute the subgradient $\xi^j$. Thus we get the following formula for the subgradient $\xi^j$:
$$\xi^j = \frac{2}{m}\left(\sum_{a \in A \setminus A^1} (x_j^1 - a), \ldots, \sum_{a \in A \setminus A^k} (x_j^k - a)\right)$$
$$= \frac{2}{m}\Big((m - |A^1|)x_j^1 - (m\bar{a} - |A^1|\bar{a}^1), \ldots, (m - |A^k|)x_j^k - (m\bar{a} - |A^k|\bar{a}^k)\Big),$$
where $\bar{a}^l$ is the center of the cluster $A^l$, $l = 1, \ldots, k$, and $\bar{a}$ is the center of the whole set $A$. The solution $x_{j+1} = (x_{j+1}^1, \ldots, x_{j+1}^k)$ to the problem (47) in Step 4 is given by:
$$x_{j+1}^t = \left(1 - \frac{|A^t|}{m}\right)x_j^t + \frac{|A^t|}{m}\bar{a}^t, \quad t = 1, \ldots, k.$$
Finally, the stopping criterion in Step 3 can be given by
$$x_j^t = \left(1 - \frac{|A^t|}{m}\right)x_j^t + \frac{|A^t|}{m}\bar{a}^t, \quad t = 1, \ldots, k,$$
that is, the DCA stops when the current iterate is a fixed point of the update.
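One DCA step for the clustering problem is easy to illustrate: each center moves a fraction $|A^t|/m$ of the way toward the centroid of its cluster, so the fixed points are exactly the cluster centroids. The data and centers below are hypothetical:

```python
# One DCA step (Section 8) for the clustering problem in 1-D.
A = [0.0, 1.0, 4.0, 5.0]
m = len(A)
centers = [0.0, 6.0]   # current iterate (made up)

# assign each point to its closest center
clusters = [[], []]
for a in A:
    t = min((0, 1), key=lambda i: (centers[i] - a) ** 2)
    clusters[t].append(a)

# x_t <- (1 - |A_t|/m) x_t + (|A_t|/m) abar_t
new_centers = []
for x, cl in zip(centers, clusters):
    abar = sum(cl) / len(cl)
    new_centers.append((1 - len(cl) / m) * x + (len(cl) / m) * abar)
print(new_centers)  # [0.25, 5.25]: each center moves halfway to its centroid
```

Repeating the step drives the centers toward the centroids $0.5$ and $4.5$, the fixed points of the update.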
To design the version of Algorithm 5 with the DCA in Steps 4 and 5, Algorithm 4 is replaced by the above described version of the DCA.
Remark 4. Despite some similarities between the DCA and Algorithm 3, these two algorithms are designed in different ways. In the DCA a subgradient $\xi^j \in \partial f_2(x^j)$ is computed only once per iteration, and $x^{j+1}$ is a global minimizer of the overestimation $f_1(x) - \langle \xi^j, x - x^j \rangle$; in Algorithm 3, however, this subgradient is updated at each iteration.
9 Numerical results
To test the DCClust algorithm and compare it with other three clustering algorithms
numerical experiments with a number of real-world data sets have been carried out. Algo-
rithms were implemented in Fortran 95 and compiled using the gfortran compiler. Com-
putational results were obtained on a PC with the CPU Intel(R) Core(TM) i5-3470S 2.90
GHz and RAM 8 GB. Eight data sets have been used in numerical experiments. Their
brief description is given in Table 1. The detailed description can be found in [26]. All
data sets contain only numeric features and they do not have missing values. To get as
more comprehensive picture about the performance of the algorithms as possible the data
sets were chosen so that: (i) the number of attributes is ranging from very few (2) to large
(128); (ii) the number of data points is ranging from tens of thousands (smallest 13,910)
to hundred of thousands (largest 434,874).
We computed up to 25 clusters in all data sets. The CPU time used by the algorithms was limited to 20 hours. Since all algorithms compute clusters incrementally, we present results with the maximum number of clusters obtained by each algorithm within this time limit. Results for the cluster function values found by the different algorithms are presented in Tables 2 and 3. In these tables we use the following notation:
Table 1: Brief description of the data sets
• $f_{best}$ (multiplied by the number shown after the names of the data sets) is the best known value of the cluster function (7) (multiplied by $m$) for the corresponding number of clusters;

• $E_A$ is the error of the value $\bar{f}$ of the cluster function found by an algorithm relative to $f_{best}$:
$$E_A = \frac{\bar{f} - f_{best}}{f_{best}} \times 100\%;$$
• The sign “-” in the tables shows that an algorithm failed to compute clusters within the given time frame.
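As a small worked example of the accuracy measure $E_A$ (the values below are illustrative, not taken from the tables): a found value 1% above the best known value yields $E_A = 1\%$.

```python
def relative_error_percent(f_found, f_best):
    """E_A = (f_found - f_best) / f_best * 100, in percent."""
    return (f_found - f_best) / f_best * 100.0

# e.g. relative_error_percent(1.01e9, 1.0e9) gives 1.0 (percent)
```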
All data sets can be divided into two groups. The first group contains data sets with a small number of attributes (2 or 3): the D15112, Pla85900, Skin Segmentation and 3D Road Network data sets belong to this group. The number of points in these data sets ranges from 15,112 to 434,874. Results presented in Tables 2 and 3 demonstrate that on these data sets the performance of the algorithms is similar in terms of accuracy. All algorithms find at least near best known solutions on these data sets. The only exception is the case k = 25 in the 3D Road Network data set, where the DCClust algorithm failed to find the best solution. The results also demonstrate that the GKM and MS-MGKM algorithms are not efficient, within the given time frame, for solving clustering problems in data sets with hundreds of thousands of points.
The second group contains data sets with a relatively large number of attributes: the Gas Sensor Array Drift, EEG Eye State, KEGG Metabolic Relation Network and Shuttle Control data sets belong to this group. The number of attributes in these data sets ranges from 9 to 128. The results show that all algorithms are able to find (near) best known solutions in the Gas Sensor Array Drift and EEG Eye State data sets. The MS-MGKM algorithm failed to find such solutions in the KEGG Metabolic Relation Network data set for k = 15, and the GKM algorithm for k = 15, 20. Results for the Shuttle Control data set
show that the DCClust algorithm failed to find an accurate solution for k = 12, and the three other algorithms for k = 25. In this data set most points are very close to each other, and the clusters are not well separated when their number is large. In this situation some clustering algorithms may fail to find accurate solutions. In all other cases the algorithms are able to find such solutions.
Figures 1(a)–1(h) illustrate the dependence of the number of distance function evaluations ($N_d$) on the number of clusters for the four algorithms in all data sets. These figures demonstrate that the MS-MGKM algorithm requires the least number of distance function evaluations in all data sets except the 3D Road Network data set. In this data set the MS-MGKM algorithm computed only 6 clusters within the 20-hour time limit, and its $N_d$ is similar to that of the MS-DCA and DCClust algorithms. For the GKM algorithm $N_d$ depends linearly on the number of clusters; however, this algorithm requires significantly more distance function evaluations than the three other algorithms in the three largest data sets: Pla85900, Skin Segmentation and 3D Road Network. A comparison of the DCClust and MS-DCA algorithms shows that the latter requires more distance function evaluations than the former in all data sets except the 3D Road Network data set. The difference between these two algorithms is significant in data sets with fewer than 100,000 instances, which is not the case for the two largest data sets.
Figures 2(a)–2(h) illustrate the dependence of the CPU time on the number of clusters for the four algorithms in all data sets. It is obvious that the MS-MGKM algorithm requires less CPU time than any other algorithm for almost all numbers of clusters in five data sets: Gas Sensor Array Drift, D15112, EEG Eye State, KEGG Metabolic Relation Network and Shuttle Control. However, as the size (the number of data points) of a data set increases, this algorithm requires more CPU time than the DCClust and MS-DCA algorithms. The GKM algorithm is more time-consuming than the MS-MGKM algorithm, and it requires significantly more CPU time than the DCClust and MS-DCA algorithms in the three largest data sets. Both the MS-MGKM and GKM algorithms computed only 6 clusters in the 3D Road Network data set within the 20-hour time limit. Results for the two largest data sets show that these algorithms are not efficient for solving clustering problems in data sets with hundreds of thousands of data points. A comparison of the DCClust and MS-DCA algorithms shows that the former requires less CPU time than the latter in all data sets except D15112.
10 Conclusion
In this paper the minimum sum-of-squares clustering problems are studied using their DC representation. Inf-stationary points, stationary points in the sense of generalized subgradients and critical points of the minimum sum-of-squares clustering problems are characterized using such a representation. An incremental algorithm based on the DC representation is designed to solve the minimum sum-of-squares clustering problems. A special algorithm is designed to solve the nonsmooth optimization problems arising at each iteration of the incremental algorithm. It is proved that this algorithm converges to inf-stationary points of the clustering problems. A similar incremental algorithm is developed where the well-known DCA algorithm is applied to solve the optimization problems.
The proposed algorithms are tested using real-world data sets with the number of data points ranging from tens of thousands to hundreds of thousands. The results clearly demonstrate that the use of the DC representation of clustering problems significantly improves the ability of incremental algorithms to solve clustering problems on very large data sets in a reasonable time. Furthermore, the use of nonsmooth optimization algorithms can extend this ability to even larger data sets.
Acknowledgement. This research by Dr. Adil Bagirov and Dr. Sona Taheri was supported under the Australian Research Council's Discovery Projects funding scheme (Project No. DP140103213). The authors thank the two anonymous referees for their comments, which helped to improve the quality of the paper.
References
[1] K.S. Al-Sultan. A tabu search approach to the clustering problem. Pattern Recogni-
tion, 28(9):1443–1451, 1995.
[2] L.T.H. An, M.T. Belghiti, and P.D. Tao. A new efficient algorithm based on DC
programming and DCA for clustering. J. of Global Optim., 37(4):593–608, 2007.
[3] L.T.H. An, L.H. Minh, and P.D. Tao. New and efficient DCA based algorithms for
minimum sum-of-squares clustering. Pattern Recognition, 47:388–401, 2014.
[4] L.T.H. An, H.V. Ngai, and P.D. Tao. Exact penalty and error bounds in DC pro-
gramming. Journal of Global Optimization, 52(3):509–535, 2012.
[6] A.M. Bagirov. Modified global k-means algorithm for minimum sum-of-squares clus-
tering problems. Pattern Recognition, 41(10):3192–3199, 2008.
[8] A.M. Bagirov and E. Mohebi. Nonsmooth optimization based algorithms in cluster
analysis. In Emre M. Celebi, editor, Partitional Clustering Algorithms, pages 99–146.
Springer.
[9] A.M. Bagirov, B. Ordin, G. Ozturk, and A.E. Xavier. An incremental clustering algo-
rithm based on hyperbolic smoothing. Computational Optimization and Applications,
61:219–241, 2015.
[10] A.M. Bagirov, A.M. Rubinov, N.V. Soukhoroukova, and J. Yearwood. Unsupervised
and supervised data classification via nonsmooth and global optimization. Top, 11:1–
93, 2003.
[11] A.M. Bagirov and J. Ugon. An algorithm for minimizing clustering functions. Optimization, 54(4-5):351–368, 2005.
[12] A.M. Bagirov, J. Ugon, and D. Webb. Fast modified global k-means algorithm for
incremental cluster construction. Pattern Recognition, 44(4):866–876, April 2011.
[13] A.M. Bagirov and J. Yearwood. A new nonsmooth optimization algorithm for mini-
mum sum-of-squares clustering problems. European Journal of Operational Research,
170(2):578–596, 2006.
[14] L. Bai, J. Liang, C. Sui, and Ch. Dang. Fast global k-means clustering based on local geometrical information. Information Sciences, 245:168–180, 2013.
[15] F.H. Clarke. Optimization and Nonsmooth Analysis. Canadian Mathematical Society
series of monographs and advanced texts. Wiley, 1983.
[16] V.F. Demyanov, A.M. Bagirov, and A.M. Rubinov. A method of truncated codif-
ferential with application to some problems of cluster analysis. Journal of Global
Optimization, 23(1):63–80, 2002.
[17] V.F. Demyanov and A.M. Rubinov. Constructive Nonsmooth Analysis. Peter Lang,
Frankfurt am Main, 1995.
[18] G. Diehr. Evaluation of a branch and bound algorithm for clustering. SIAM J.
Scientific and Statistical Computing, (6):268–284, 1985.
[22] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Comput.
Surv., 31(3):264–323, 1999.
[23] A.K. Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters,
31(8):651–666, 2010.
[25] J.Z.C. Lai and T.-J. Huang. Fast global k-means clustering using cluster membership and inequality. Pattern Recognition, 43(5):1954–1963, 2010.
[27] A. Likas, N. Vlassis, and J. Verbeek. The global k-means clustering algorithm. Pattern
Recognition, 36(2):451–461, 2003.
[28] B. Ordin and A.M. Bagirov. A heuristic algorithm for solving the minimum sum-of-
squares clustering problems. Journal of Global Optimization, 61:341–361, 2015.
[30] R. Scitovski and S. Scitovski. A fast partitioning algorithm and its application to
earthquake investigation. Computers & Geosciences, 59:124–131, 2013.
[31] S.Z. Selim and K.S. Al-Sultan. A simulated annealing algorithm for the clustering problem. Pattern Recognition, 24(10):1003–1008, 1991.
[32] P.D. Tao and L.T.H. An. Convex analysis approach to DC programming: theory,
algorithms and applications. Acta Mathematica Vietnamica, 22(1):289–355, 1997.
[34] H. Tuy, A.M. Bagirov, and A.M. Rubinov. Clustering via DC optimization. In Advances in Convex Analysis and Global Optimization, pages 221–234. Springer, 2001.
[35] P.H. Wolfe. Finding the nearest point in a polytope. Mathematical Programming,
11(2):128–149, 1976.
[36] A.E. Xavier. The hyperbolic smoothing clustering method. Pattern Recognition,
43:731–737, 2010.
[37] A.E. Xavier and V.L. Xavier. Solving the minimum sum-of-squares clustering problem by hyperbolic smoothing and partition into boundary and gravitational regions. Pattern Recognition, 44(1):70–77, 2011.
Table 2: Cluster function values obtained by algorithms
Table 3: Cluster function values obtained by algorithms (cont.)
Figure 1: Number of distance function evaluations ($N_d$) versus the number of clusters for the DCClust, GKM, MS-MGKM and MS-DCA algorithms: (a) Gas Sensor Array Drift, (b) EEG Eye State, (c) D15112, (d) KEGG Metabolic Relation Network, (e) Shuttle Control, (f) Pla85900, (g) Skin Segmentation, (h) 3D Road Network.
Figure 2: CPU time versus the number of clusters for the DCClust, GKM, MS-MGKM and MS-DCA algorithms: (a) Gas Sensor Array Drift, (b) EEG Eye State, (c) D15112, (d) KEGG Metabolic Relation Network, (e) Shuttle Control, (f) Pla85900, (g) Skin Segmentation, (h) 3D Road Network.