Nonmonotone Gradient Method for Topology Optimization
NUMERICAL ALGEBRA, CONTROL AND OPTIMIZATION
Volume 2, Number 2, June 2012 pp. 395–412
Rouhollah Tavakoli1
Department of Material Science and Engineering
Sharif University of Technology
Tehran, P.O. Box 11365-9466, Iran
Hongchao Zhang
Department of Mathematics
Louisiana State University
Baton Rouge, LA, 70808, USA
connected by the current point and the projection point of a cyclic Barzilai-Borwein
(CBB) step onto the feasible set. Hence, the method is called the projected
cyclic Barzilai-Borwein (PCBB) method. Because of the special structure of the
feasible set of the problem, which consists of box constraints and a single linear
constraint, the projection onto the feasible set can be performed very efficiently,
in linear time. The main attractive features of the presented algorithm are: it
produces strictly feasible iterates; it uses close to one objective function and
gradient evaluation per iteration; it needs O(n) memory (6n working memory); it is
easy to implement; and it requires only first-order (gradient) information.
and
$$\alpha_k^{MG} = \arg\min_{\alpha \in \mathbb{R}} \|g(x_k - \alpha g_k)\|, \qquad (4)$$
where $\|\cdot\|$ denotes the Euclidean norm of a vector. However, it is well-known
that SD and MG methods can be very slow when the Hessian of f is singular or
nearly singular at the local minimum. In this case the iterates could approach the
minimum very slowly in a zigzag fashion [14].
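To make the zigzag effect concrete, the following toy comparison (illustrative only, not from the paper) runs exact-line-search steepest descent against a plain BB iteration, using the stepsize $\alpha^{BB2}$ of (8), on the ill-conditioned quadratic $f(x) = \tfrac{1}{2}(x_1^2 + 100x_2^2)$; the quadratic, starting point, and iteration counts are hypothetical choices.

```python
# Toy comparison (illustrative): steepest descent with exact line search
# zigzags on an ill-conditioned quadratic, while a plain BB iteration
# (stepsize alpha_BB2 = s^T y / y^T y) converges far faster.

def grad(x):
    # gradient of f(x) = 0.5*(x1^2 + 100*x2^2)
    return [x[0], 100.0 * x[1]]

def norm(v):
    return sum(vi * vi for vi in v) ** 0.5

def sd(x, iters):
    # exact line search on a quadratic: alpha = g^T g / g^T A g
    for _ in range(iters):
        g = grad(x)
        alpha = (g[0]**2 + g[1]**2) / (g[0]**2 + 100.0 * g[1]**2)
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    return x

def bb(x, iters):
    g = grad(x)
    alpha = 0.01  # conservative first step
    for _ in range(iters):
        if norm(g) < 1e-12:
            break
        x_new = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
        g_new = grad(x_new)
        s = [x_new[0] - x[0], x_new[1] - x[1]]
        y = [g_new[0] - g[0], g_new[1] - g[1]]
        sy = s[0] * y[0] + s[1] * y[1]
        yy = y[0] * y[0] + y[1] * y[1]
        if sy > 0.0 and yy > 0.0:
            alpha = sy / yy  # alpha_BB2 as in (8)
        x, g = x_new, g_new
    return x
```

Starting from $x_0 = (100, 1)$, after the same number of iterations the gradient norm of the BB iterates is typically many orders of magnitude smaller than that of steepest descent.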
The basic idea of the Barzilai-Borwein (BB) method [2] is to use the matrix
$D(\alpha_k) = \frac{1}{\alpha_k} I$, where $I$ denotes the identity matrix, to
approximate the Hessian $\nabla^2 f(x_k)$ by imposing a quasi-Newton condition
on $D(\alpha_k)$:
$$\alpha_k^{BB} = \arg\min_{\alpha \in \mathbb{R}} \|D(\alpha)\, s_{k-1} - y_{k-1}\|^2, \qquad (5)$$
which gives
$$\alpha_k^{BB2} = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}}. \qquad (8)$$
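A direct implementation may clarify the two BB stepsizes; the helper name below is illustrative. With $s = x_k - x_{k-1}$ and $y = g_k - g_{k-1}$, minimizing $\|s/\alpha - y\|$ as in (5) gives $\alpha^{BB1} = s^T s / s^T y$, while the symmetric condition gives the $\alpha^{BB2} = s^T y / y^T y$ of (8).

```python
# Both Barzilai-Borwein stepsizes from the quasi-Newton condition (5).
# s = x_k - x_{k-1}, y = g_k - g_{k-1}; requires s^T y > 0.

def bb_stepsizes(s, y):
    ss = sum(si * si for si in s)
    sy = sum(si * yi for si, yi in zip(s, y))
    yy = sum(yi * yi for yi in y)
    return ss / sy, sy / yy  # (alpha_BB1, alpha_BB2)
```

By the Cauchy-Schwarz inequality, $\alpha^{BB2} \le \alpha^{BB1}$ whenever $s^T y > 0$, and for a quadratic with Hessian $A$ (so $y = As$) both stepsizes lie between the reciprocals of the extreme eigenvalues of $A$.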
The R-linear convergence of the CBB method for a strongly convex quadratic objective
function has been proved in [9], while the local R-linear convergence of the
CBB method at a local minimizer for a general nonlinear objective function has been
established in [10].
Since the feasible set is convex, problem (12) always has a unique solution. However,
in general (12) is still a convex constrained quadratic programming problem,
which could be as difficult as the original problem. For general convex constrained
large-scale problems, this projection step at each iteration could be very
time consuming and is normally the most expensive part of gradient projection
methods. Hence, there is little interest in applying gradient projection methods
to large-scale problems unless the projection step can be performed very
efficiently. However, in some special cases where efficient algorithms for
calculating the projection (12) exist, for example when there are only box or
single ball constraints, gradient projection methods become attractive. Fortunately,
as we will see in the next section, the projection step can be performed very
efficiently for volume constrained topology optimization problems. This makes it
possible to design gradient projection type algorithms for volume constrained
topology optimization.
Combining the Barzilai-Borwein stepsize rule and the gradient projection method,
the projected Barzilai-Borwein (PBB) method was first introduced in [4]. In this
method the iteration updating formula (2) is modified to
$$x_{k+1} = x_k + \beta_k d_k^{\alpha}, \qquad k = 0, 1, \ldots, \qquad (13)$$
where $\beta_k \in \mathbb{R}_+$ and the search direction $d_k^{\alpha}$ (a descent
direction) is computed by connecting the current point to the projection of the
trial iterate (2) based on the BB stepsize, that is,
$$d_k^{\alpha} = P_D[x_k - \alpha_k^{BB} g_k] - x_k. \qquad (14)$$
In [4], $d_k^{\alpha}$ was called the spectral projected gradient. It is not
difficult to show that $d_k^{\alpha}$ is a descent direction (see Lemma 3.1).
This, together with the convexity of the feasible set $D$, implies that for
sufficiently small $\beta_k$, an iterate of (13) reduces the objective function
value while simultaneously preserving the feasibility of the iterates.
Lemma 3.1. For all $x_k \in D$ and $\alpha_k^{BB} > 0$,
(i) $\langle g_k(x), d_k^{\alpha}(x)\rangle \le -\frac{1}{\alpha_k^{BB}} \|d_k^{\alpha}(x)\|^2$;
(ii) $d_k^{\alpha}(x) = 0$ if and only if $x$ is a stationary point for (11).
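Lemma 3.1(i) is easy to check numerically. In the sketch below (a hypothetical simplification: the feasible set is a plain box, so the projection is componentwise clipping; the paper's $D$ adds a linear constraint, whose projection is treated in Section 5), the direction $d_k^{\alpha} = P_D[x_k - \alpha g_k] - x_k$ satisfies $\langle g, d\rangle \le -\|d\|^2/\alpha$.

```python
# Spectral projected gradient direction for a box feasible set
# (projection = componentwise clipping; illustrative simplification).

def clip_box(v, l, u):
    return [min(ub, max(lb, vi)) for vi, lb, ub in zip(v, l, u)]

def spg_direction(x, g, alpha, l, u):
    trial = [xi - alpha * gi for xi, gi in zip(x, g)]   # x - alpha*g
    return [pi - xi for pi, xi in zip(clip_box(trial, l, u), x)]
```

For example, with $x = (0.5, 0.9)$, $g = (1.0, -2.0)$ and $\alpha = 0.5$ on the unit box, $d = (-0.5, 0.1)$ and $\langle g, d\rangle = -0.7 \le -\|d\|^2/\alpha = -0.52$, so $d$ is indeed a descent direction.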
Parameters:
• $\epsilon \in [0, \infty)$, error tolerance.
• $\delta \in (0, 1)$, descent parameter used in the Armijo line search.
• $\eta \in (0, 1)$, decay factor for the stepsize in the Armijo line search.
• $\alpha_{\min}, \alpha_{\max} \in (0, \infty)$, safeguarding interval for the BB stepsize.
Initialization:
• $k = 0$, $x_0 = $ starting guess, and $f_{-1}^r = f(x_0)$.
The variable $f_k^r$ in Algorithm 4.1 denotes the so-called "reference" function
value in the nonmonotone line search. The traditional monotone line search simply
corresponds to setting $f_k^r = f(x_k)$ at each iteration, and the nonmonotone
line search developed in [16] corresponds to setting $f_k^r = f_k^{\max}$. In our
present study, $f_k^r$ is chosen based on Algorithm 4.2, adapted from [17]. Let
$f_k$ denote $f(x_k)$. In Algorithm 4.2, the integer $a$ counts the number of
consecutive iterations for which $\beta_k = 1$ in Algorithm 4.1 is accepted
and the Armijo line search in step 5 is skipped. The integer $l$ counts the number
of iterations since the function value was last strictly decreased by an amount
$\Delta > 0$.
Algorithm 4.2.
The variable $f_k^{maxmin}$ stores the maximum function value since the last new
minimum was recorded in $f_k^{\min}$.
The condition $f(x_k) < f_k^r$ in step 3 of Algorithm 4.1 guarantees that the Armijo
line search in step 5 can be satisfied. Notice that the requirement "$f_k^r < f_k^{\max}$
infinitely often" in step 3, which is required to ensure global convergence, is a
weaker condition. Besides Algorithm 4.2, this condition can be satisfied by many
other strategies; for example, one may set $f_k^r = f_k^{\max}$ every $L$ iterations.
In Algorithm 4.2, $f_k^r = f_k^{\max}$ if $f(x_{k-L}) - f(x_k) \le \Delta$ for a
given decrease parameter $\Delta > 0$ and integer $L > 0$.
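The bookkeeping behind $f_k^r$ can be sketched as follows. This is a simplified illustration consistent with the textual description above; the parameter names $L$ and $\Delta$ (here `delta`) match the text, but the exact Algorithm 4.2 from [17] has additional cases (e.g. the counter $a$ and the value $f_k^{\max}$), so this class is an assumption, not the paper's algorithm.

```python
# Simplified sketch of a nonmonotone reference-value update: fmin tracks the
# best value seen, fmaxmin the largest value since fmin was last improved by
# delta, and after L non-improving iterations f_r is relaxed to fmaxmin.

class ReferenceValue:
    def __init__(self, f0, L=10, delta=1e-4):
        self.fmin = f0       # best function value so far
        self.fmaxmin = f0    # max function value since the last new minimum
        self.l = 0           # iterations since the last sufficient decrease
        self.L = L
        self.delta = delta
        self.fr = f0         # current reference value

    def update(self, fk):
        self.fmaxmin = max(self.fmaxmin, fk)
        if fk < self.fmin - self.delta:   # new (sufficiently lower) minimum
            self.fmin = fk
            self.fmaxmin = fk
            self.l = 0
        else:
            self.l += 1
        if self.l >= self.L:              # relax the reference value
            self.fr = self.fmaxmin
            self.l = 0
        return self.fr
```

The key effect is that a monotone phase (strictly decreasing values) keeps the reference value fixed, while a plateau eventually raises it, allowing nonmonotone steps.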
Now let us give more details on the computation of $\bar\alpha_k$ in step 1 of
Algorithm 4.1. This parameter is computed by the safeguarded CBB scheme as follows.
Let $j$ be an integer that counts the number of times the current BB step has
been reused, and let $m$ be the CBB memory in (10), i.e., the maximum number of
times the BB step will be reused.
Algorithm 4.3.
S0. If $k = 0$, choose $\bar\alpha_0 \in [\alpha_{\min}, \alpha_{\max}]$ and a parameter $\theta < 1$ near 1; set $j = 0$ and flag = 1. If $k > 0$, set flag = 0.
S1. If $0 < |d_{ki}| < \bar\alpha_k |g_{ki}|$ for some component $i$, then set flag = 1.
S2. If $\beta_k = 1$ in Algorithm 4.1, then set $j = j + 1$; else ($\beta_k < 1$) set flag = 1.
S3. If $j > m$ or flag = 1 or $s_k^T y_k / (\|s_k\| \|y_k\|) > \theta$, then:
  S3.1. If $s_k^T y_k \le 0$, then:
    S3.1.1. If $j > 1.5m$, then set $t = \min\{\|x_k\|_\infty, 1\}/\|d^1(x_k)\|_\infty$, $\bar\alpha_{k+1} = \min\{\alpha_{\max}, \max\{\alpha_{\min}, t\}\}$ and $j = 0$, where $d^1(x_k) = P_D[x_k - g_k] - x_k$;
    S3.1.2. Else $\bar\alpha_{k+1} = \bar\alpha_k$.
  S3.2. Else set $\bar\alpha_{k+1} = \min\{\alpha_{\max}, \max\{\alpha_{\min}, \alpha_k^{BB}\}\}$ and $j = 0$.
In Algorithm 4.3, the former BB stepsize is reused for the current iterate unless
one of the following conditions occurs, in which case a new BB stepsize is computed
(see S3.2): (I) the previous BB stepsize was truncated by the projection step, i.e.,
the trial point based on the BB stepsize lay outside the feasible domain and the
gradient projection was performed (see S1); (II) the previous BB stepsize was
truncated by the line search step (see S2, where $\beta_k < 1$); (III) the number
of times the BB stepsize has been reused reaches its bound, i.e., $j > m$ (see S3);
(IV) $s_k^T y_k / (\|s_k\| \|y_k\|)$ is close to 1 (see section 4 of [10] for the
justification of this decision). The condition $s_k^T y_k \le 0$ (see S3.1) detects
negative curvature along the search direction. Assuming that the objective function
is well approximated by a quadratic in the vicinity of the current iterate, a
relatively large stepsize should be used in the next iteration (see S3.1.1) to
reduce the function as much as possible once negative curvature is detected. This
strategy is similar to the original SPG algorithm (see section 2 of [5]).
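The reuse logic above can be condensed into a small stepsize-update routine. This is a hedged sketch, not the paper's Algorithm 4.3: it keeps the reuse counter $j$, the memory $m$, the safeguards $[\alpha_{\min}, \alpha_{\max}]$, and the negative-curvature test $s^T y \le 0$, but simplifies the negative-curvature branch to returning $\alpha_{\max}$ (the paper instead uses the scale $t = \min\{\|x_k\|_\infty, 1\}/\|d^1(x_k)\|_\infty$) and omits the $\theta$-based test (IV).

```python
# Hedged sketch of safeguarded CBB stepsize reuse: recycle the previous BB
# step for up to m iterations unless it was truncated by the projection or
# the line search, or negative curvature (s^T y <= 0) appears.

def next_stepsize(alpha, j, m, s, y, truncated,
                  alpha_min=1e-30, alpha_max=1e30):
    sy = sum(si * yi for si, yi in zip(s, y))
    yy = sum(yi * yi for yi in y)
    if j < m and not truncated and sy > 0.0:
        return alpha, j + 1            # reuse the current BB step
    if sy <= 0.0:
        return alpha_max, 0            # negative curvature: take a large step
    alpha_bb = sy / yy                 # fresh BB2 step as in (8)
    return min(alpha_max, max(alpha_min, alpha_bb)), 0
```

Returning the pair `(stepsize, j)` makes the cyclic bookkeeping explicit: any event that invalidates the recycled step resets the counter.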
Let us now briefly review the convergence theory of Algorithm 4.1.
5. Projection onto the feasible set. The algorithm introduced in the previous
section is general and can be applied to any convex constrained optimization
problem satisfying the requirements of Theorem 4.4. However, the performance
of the method is significantly affected by the efficiency of the projection step,
see [7]. In this section, we exploit the special structure of the feasible set in
volume constrained topology optimization and introduce a very efficient algorithm
for the projection step.
After discretization, the feasible set in volume constrained topology optimization
problems has the following form (see Figure 1)
$$D = \{x \in \mathbb{R}^n : a^T x = b, \; l \le x \le u\}, \qquad (16)$$
where $a \in \mathbb{R}^n$ with $a_i \in \mathbb{R}_+$, $b \in \mathbb{R}_+$,
$l, u \in \mathbb{R}^n$, and $0 < l_i \le u_i < \infty$. Henceforth, in this
section we denote by $D$ the feasible domain defined in (16).
Considering (12) together with (16), for a given trial vector $x$, $P_D[x]$ is the
unique minimizer of the following box constrained Lagrangian:
$$L(z; \lambda) = \frac{1}{2}\|z\|_2^2 - x^T z + \frac{1}{2}\|x\|_2^2 + \lambda (a^T z - b), \qquad z \in B, \qquad (17)$$
From (20), it is obvious that for each $i$, $z_i(\lambda)$ is a continuous,
piecewise linear and non-increasing function of $\lambda$ (see Figure 2).
Therefore, since $a_i > 0$ for all $i$, $g(\lambda)$ is a continuous, piecewise
linear and non-increasing function of $\lambda$. Then, with some straightforward
calculations, we can see that the root $\lambda^*$ of $g(\lambda)$ is unique and
$\lambda^* \in [\lambda_{\min}, \lambda_{\max}]$ with $\lambda_{\min} \le 0 \le \lambda_{\max}$,
where $\lambda_{\min} = \min\{\lambda_i^u\}$ and $\lambda_{\max} = \max\{\lambda_i^l\}$.
So, starting from the interval $[\lambda_{\min}, \lambda_{\max}]$ and using a
classical one dimensional root finding method, it is possible to find $\lambda^*$
up to machine precision with a priori known computational complexity. However,
more advanced root finding methods can improve the performance of this step.
In our approach, Brent's root finding method [6] is employed to solve (19).
Brent's method does not assume differentiability of the function and copes very
well with the limited precision of computer arithmetic. In the worst case, its
convergence is never slower than that of the bisection method. Brent's method
has proved very efficient and robust in practice and is currently accepted as a
standard method for one dimensional root finding (cf. chapter 9 of [19]). Specific
implementation details of Brent's method are available in Numerical Recipes (see
chapter 9 of [19]).
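The projection step can be sketched as follows. This is a minimal illustration, assuming $a_i > 0$ and feasibility $a^T l \le b \le a^T u$: from the Lagrangian (17), each component of the minimizer is the clipped value $z_i(\lambda) = \mathrm{clip}(x_i - \lambda a_i, l_i, u_i)$, and $\lambda^*$ is the root of $g(\lambda) = a^T z(\lambda) - b$. Plain bisection stands in here for Brent's method (which would only converge faster); the function names are illustrative.

```python
# Projection P_D[x] onto D = {z : a^T z = b, l <= z <= u}, assuming a_i > 0.
# The root of the monotone function g(lam) = a^T z(lam) - b is bracketed and
# found by bisection (Brent's method is a drop-in, faster alternative).

def project(x, a, l, u, b, tol=1e-12, max_iter=200):
    def z(lam):
        # componentwise minimizer of the Lagrangian for fixed lam
        return [min(ui, max(li, xi - lam * ai))
                for xi, ai, li, ui in zip(x, a, l, u)]

    def g(lam):
        return sum(ai * zi for ai, zi in zip(a, z(lam))) - b

    # g is continuous, piecewise linear, non-increasing; its root lies in
    # [min_i (x_i - u_i)/a_i, max_i (x_i - l_i)/a_i].
    lo = min((xi - ui) / ai for xi, ui, ai in zip(x, u, a))
    hi = max((xi - li) / ai for xi, li, ai in zip(x, l, a))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return z(0.5 * (lo + hi))
```

Each evaluation of $g$ costs $O(n)$, so with a fixed-precision bracket the whole projection is linear in $n$, which is what makes gradient projection practical for the volume constraint.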
where $\theta \in H^1(\Omega)$ is the state variable, $p > 1$ is a penalization
factor and $w \in L^2(\Omega)$ is the control parameter (topology indicator field).
By some standard derivations, the first order optimality conditions of problem (P)
can be written as follows:
$$\text{(OC)} \quad \begin{aligned}
-\nabla \cdot (k(w)\nabla\theta) &= f(x) && \text{in } \Omega, \\
\theta(x) &= \theta_0(x) && \text{on } \partial\Omega, \\
-\nabla \cdot (k(w)\nabla\eta) &= -\nabla \cdot (\nabla\theta) && \text{in } \Omega, \\
\eta(x) &= 0 && \text{on } \partial\Omega, \\
k(w) &= w^p k_\beta + (1 - w^p) k_\alpha && \text{in } \Omega, \\
P_D[G](x) &= 0 && \text{in } \Omega, \\
G(x) &= -p\, w^{p-1} (k_\beta - k_\alpha)\, \nabla\theta \cdot \nabla\eta && \text{in } \Omega,
\end{aligned}$$
where $\eta \in H_0^1(\Omega)$ is the adjoint state, $G$ is the $L^2$ gradient of
the objective functional with respect to $w$, and $P_D(u)$ denotes the $L^2$
projection of the function $u$ onto the admissible set $D$,
$$D = \Big\{ w \in L^2(\Omega) \;\Big|\; \int_\Omega w(x)\,dx = R|\Omega|, \; 0 \le w \le 1 \Big\}, \qquad 0 < R < 1.$$
By discretizing $\Omega$ into $n$ control volumes, we obtain the finite dimensional
counterpart of problem (P). We also assume that the state variable and the design
parameter are defined at the center of each control volume. Under these assumptions,
the admissible design domain $D$ forms a simplex in $\mathbb{R}^n$ identical to the
continuous knapsack constraints in (16). At each iterate of the control parameter
$w$, we solve the associated state PDE. The discretized optimization problem then
has the general form of problem (11): a nonlinear objective function with convex
continuous knapsack constraints.
In our numerical experiments, problem (P) was solved in two and three dimensions
for $\Omega = [0, 1]^2$ and $\Omega = [0, 1]^3$. The spatial domain is divided into
$127^2$ and $31^3$ control volumes in two and three dimensions, respectively. In all
experiments, we set $k_\alpha = 1$, $f(x) = 1$ and $R = 0.4$. Two conductivity
ratios, 2 and 100, were tested, which corresponds to setting $k_\beta = 2$ and
$k_\beta = 100$, respectively. The penalization factor $p$ is taken to be 1 and 10
for conductivity ratios 2 and 100, respectively. The governing PDE is solved by a
cell centered finite volume method with a central difference scheme, and the related
systems of linear equations are solved by a preconditioned conjugate gradient method
with relative convergence threshold $10^{-20}$. The optimization is performed for 15
iterations in these numerical experiments. Notice that using the finite volume method
we do not observe any topological instability phenomena (checkerboard patterns),
which often occur when the finite element method is used for topology optimization
problems.
The input parameters of the optimization algorithms used in this study are
as follows: $\delta = 10^{-4}$, $\eta = 0.5$, $\alpha_{\min} = 10^{-30}$,
$\alpha_{\max} = 10^{30}$, $A = 40$, $L = 10$, $M = 20$, $m = 4$,
$\gamma_1 = \gamma_2 = 2$, $\theta = 0.975$. Moreover, the initial value $\alpha_0$
of the spectral stepsize was taken to be $1/\|P_D[x_0 - g_0] - x_0\|_\infty$. An
alternative choice for this parameter is $\alpha_0 = 1/\|P_D[x_0 - g_0] - x_0\|_2$.
However, for the test problems in our numerical experiments the former choice was
considerably more efficient.
To evaluate the efficiency of the presented method, we compare our results with
those obtained by the method of moving asymptotes (MMA) [22]. The implementation
of MMA available in the SCPIP code [26]2 is used in our experiments. All default
parameters of the SCPIP code are used, except the threshold for the constraint
violation, which is set to $10^{-7}$ in this study. The SCPIP implementation of
the MMA algorithm has two globalization strategies. The first is identical to that
of the original MMA by Svanberg [23], while the second is globalization by a
monotone line search method (see [25]). The first strategy is employed in our
experiments, since our results with the second strategy were significantly worse
in terms of computational cost. Note that in our procedure the PDE constraint is
solved up to the accuracy of the finite volume method applied to this PDE, and the
constraints (16) in the PCBB algorithm are satisfied up to machine precision.
The results of our numerical experiments, including the variation of the objective
functional during the optimization process and the final topology (w-field), are
shown in figures 3, 4, 5 and 6. The plots in figures 3 and 4 show that the presented
method successfully solves these topology optimization problems. Roughly speaking,
both methods are very competitive in terms of computational cost and final results.
The main differences arise for conductivity ratio 100, where PCBB performs better
and its final objective function values are considerably lower than those of MMA.
The differences are clearly visible in figures 5 and 6, which show the final
topologies. For conductivity ratio 2, MMA seems to behave slightly better than
PCBB; however, the differences in objective function values as well as final
topologies are almost negligible.
In all our numerical experiments, the final total number of function and gradient
evaluations was 15, which is equal to the number of optimization cycles. This result
2 The SCPIP code (in Fortran) is freely available through personal request from its original
[Figure 3. Scaled objective function versus iterations (2D experiments). Panels: PCBB with conductivity ratio = 2; MMA with conductivity ratio = 2; PCBB with conductivity ratio = 100; MMA with conductivity ratio = 100.]
shows that both methods used only one function and gradient evaluation per
iteration in practice. We believe this is a key property for the success of MMA
and one reason the method is well accepted in the engineering design community.
In fact, MMA behaves very conservatively and uses small steps to proceed toward
a local minimum. Specifically, it does not use an (expensive) line search
globalization strategy (unlike alternative methods), but takes reasonably small
steps such that the merit function hopefully decreases sufficiently at each
iteration. Of course, whenever the merit function increases (or the sufficient
decrease condition is violated), which rarely happens in practice, it performs
sub-cycles to ensure the desired monotonic behavior. The nonmonotone PCBB method
enjoys a similar property via an alternative strategy: in practice, thanks to the
nonmonotone line search, PCBB often uses only one function evaluation per
optimization cycle. As our results clearly show, this property makes the
nonmonotone PCBB method a very competitive alternative to MMA for this class of
problems.
It is important to note that in all the experiments presented here, the projection
step is active in the PCBB method. Therefore, the cyclic reuse of the stepsize was
never employed in our test problems (cf. Algorithm 4.3). Moreover, for conductivity
ratio 100, the PCBB method encountered directions of negative curvature after a
few iterations, and accordingly used large stepsizes which considerably accelerated
the convergence. This property plays an important role in the superior results of
PCBB compared with MMA in these cases (cf. Table 1).
[Figure 4. Scaled objective function versus iterations (3D experiments). Panels: PCBB with conductivity ratio = 2; MMA with conductivity ratio = 2; PCBB with conductivity ratio = 100; MMA with conductivity ratio = 100.]
Besides the numerical results presented here, we have also successfully applied
the presented method to many families of topology optimization problems with very
satisfactory results. We omit the details of these experiments due to space
limitations.
Table 1. Stepsizes $\bar\alpha_k$ during the first 10 iterations for conductivity ratios 2 and 100.

iter     1    2     3     4    5    6    7    8     9     10
α(2)     37   48    124   133  52   44   54   147   305   286
α(100)   53   1030  1030  243  164  100  104  1030  1030  1093
Acknowledgments. The authors would like to thank Prof. Christian Zillober for
sharing the SCPIP code with them. The work of the second author is supported by
the National Science Foundation under grant 1016204.
REFERENCES
[12] R. Fletcher, On the Barzilai-Borwein method, Optimization and Control with Applications,
96 (2005), 235–256.
[13] C. Fleury, CONLIN: An efficient dual optimizer based on convex approximation concepts,
Structural and Multidisciplinary Optimization, 1 (1989), 81–89.
[14] G. E. Forsythe, On the asymptotic directions of the s-dimensional optimum gradient method,
Numerische Mathematik, 11 (1968), 57–76.
[15] P. E. Gill, W. Murray, and M. A. Saunders, SNOPT: An SQP algorithm for large-scale
constrained optimization, SIAM J. Optim., 12 (2002), 979–1006.
[16] L. Grippo, F. Lampariello and S. Lucidi, A nonmonotone line search technique for Newton’s
method, SIAM Journal on Numerical Analysis, 23 (1986), 707–716.
[17] W. W. Hager and H. Zhang, A new active set algorithm for box constrained optimization,
SIAM Journal on Optimization, 17 (2006), 526–557.
[18] J. Nocedal and S. J. Wright, “Numerical Optimization,” Springer, 2006.
[19] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, “Numerical Recipes 3rd
edition: The Art of Scientific Computing,” 2007.
[20] M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained
minimization problem, SIAM Journal on Optimization, 7 (1997), 26–33.
[21] J. B. Rosen, The gradient projection method for nonlinear programming. Part I. Linear
constraints, Journal of the Society for Industrial and Applied Mathematics, 8 (1960), 181–217.
[22] K. Svanberg, The method of moving asymptotes: A new method for structural optimization,
International Journal for Numerical Methods in Engineering, 24 (1987), 359–373.
[23] K. Svanberg, A class of globally convergent optimization methods based on conservative con-
vex separable approximations, SIAM J. Optim., 12 (2002), 555–573.
[24] Y. X. Yuan, Recent advances in numerical methods for nonlinear equations and nonlinear
least squares, Numerical Algebra Control and Optimization, 1 (2011), 15–34.
[25] C. Zillober, A globally convergent version of the method of moving asymptotes, Structural
and Multidisciplinary Optimization, 6 (1993), 166–174.
[26] C. Zillober, SCPIP: an efficient software tool for the solution of structural optimization
problems, Struct. Multidisc. Optim., 24 (2002), 362–371.