This julia notebook reproduces Figure 2 in the paper:
"Adaptive restart of the optimized gradient method for convex optimization,"
by Donghwan Kim and Jeffrey A. Fessler,
J. Optim. Theory and Appl. 178:240-63, July 2018.
http://arxiv.org/abs/1703.04641
http://doi.org/10.1007/s10957-018-1287-4
2018-08-12 Julia 0.7.0
2019-01-20 Julia 1.0.3
2019-02-11 Julia 1.1.0

Get packages

In [1]:
using LinearAlgebra: I, svdvals, dot, cond
using Arpack: eigs
using Plots
using LaTeXStrings # for nice labels in plots

Define problem and common functions

Optimization Problem: Smooth Convex Minimization

$\min_x f(x)$

Optimization Algorithms: Accelerated First-order Algorithms [KF17]

Iterate as below for given coefficients $(\alpha, \beta_k, \gamma_k)$

for $k = 0,1,...$

  • $y_{k+1} = x_k - \alpha \, f'(x_k)$ : gradient update
  • $x_{k+1} = y_{k+1} + \beta_k \, (y_{k+1} - y_k) + \gamma_k (y_{k+1} - x_k)$ : momentum update

end

Variants:

  • Gradient method (GM): $\beta_k = \gamma_k = 0$.
  • Fast Gradient Method (FGM): $\gamma_k = 0$. [N83, BT09]
  • Optimized Gradient Method (OGM) - [KF16] [KF18]
  • FGM with Restart - [OC15]
  • OGM with Restart - [KF18]
In [2]:
# get main routine that does GM/FGM/OGM with restart
include("gm_restart.jl")
Out[2]:
gm_restart

Figure 2

Strongly convex example, Section 6.1.1 of [KF18]

In [3]:
# generate problem: simple unconstrained quadratic
using Random
Random.seed!(0) # matlab and julia output may differ because of this seed...
n = 500
M = randn(n,n)
A = M'*M
eps = 1e-4
rho = eigs(A; which=:LM, nev=1)[1][1]
#rho = svdvals(M)[1]^2
A = (1-eps) * A/rho + eps * I
b = randn(n,1)
@show cond(A)

# function for computing first-order information
f_cost(x) = dot(x, (1/2*A*x - b))
f_grad(x) = A*x - b

# function parameters
f_L = eigs(A; nev=1, which=:LM)[1][1]
f_mu = eigs(A; nev=1, which=:SM)[1][1]

# find optimum
xopt = A \ b;
fopt = f_cost(xopt)
cond(A) = 9951.556640639368
Out[3]:
-33797.8549585782
In [4]:
# sanity checks
print(extrema(A * xopt - b))
(-5.607736497381666e-13, 6.530331830845171e-13)
In [5]:
# generate initial
x0 = randn(n,1);
niter = 1500;
In [6]:
# GM with and without knowledge of μ
(xs0,fs0) = gm_restart(x0, f_cost, f_grad, f_L;
    niter=niter);
(xsq,fsq) = gm_restart(x0, f_cost, f_grad, f_L;
    f_mu=f_mu, niter=niter);
In [7]:
# FGM w/o restart
(xsf, fsf) = gm_restart(x0, f_cost, f_grad, f_L;
    mom=1, restart=0, niter=niter)
(xsfq, fsfq) = gm_restart(x0, f_cost, f_grad, f_L;
    f_mu=f_mu, mom=1, restart=0, niter=niter);
In [8]:
# OGM w/o restart
(xso, fso) = gm_restart(x0, f_cost, f_grad, f_L;
    mom=2, restart=0, niter=niter)
(xsoq, fsoq) = gm_restart(x0, f_cost, f_grad, f_L;
    f_mu=f_mu, mom=2, restart=0, niter=niter);
In [9]:
# FGM/OGM with function restart
(xsf_fr, fsf_fr, rsf_fr) = gm_restart(x0, f_cost, f_grad, f_L;
    mom=1, restart=1, niter=niter)
(xso_fr, fso_fr, rso_fr) = gm_restart(x0, f_cost, f_grad, f_L;
    mom=2, restart=1, niter=niter);
In [10]:
# FGM/OGM with gradient restart
(xsf_gr, fsf_gr, rsf_gr) = gm_restart(x0, f_cost, f_grad, f_L;
    mom=1, restart=2, niter=niter)
(xso_gr, fso_gr, rso_gr) = gm_restart(x0, f_cost, f_grad, f_L;
    mom=2, restart=2, niter=niter);
In [11]:
fp(x) = log10.(max.(x .- fopt, 1e-9)); # to plot with log scale
In [12]:
# plot with markers showing restarts
iter = 0:niter;
plot(iter, fp(fs0), label="GM", line=(:black),
    xlabel = "iteration", xticks=0:500:1500,
    yticks = -8:2:4,
    ylim = [-9, 5],
    ylabel = L"f(y_k) - f_*")

plot!(iter, fp(fsq), label="GM-q", line=(:black,:dash))
plot!(iter, fp(fsf), label="FGM", line=(:blue,2))
plot!(iter, fp(fsfq), label="FGM-q", line=(:blue,:dash,2))
plot!(iter, fp(fso), label="OGM", line=(:red,2))
plot!(iter, fp(fsoq), label="OGM-q", line=(:red,:dash,2))

plot!(iter, fp(fsf_fr), label="FGM-FR", line=(:cyan,:dot,2))
plot!(iter, fp(fsf_gr), label="FGM-GR", line=(:cyan,:dash,2))
plot!(iter, fp(fso_fr), label="OGM-FR", line=(:magenta,:dot,2))
plot!(iter, fp(fso_gr), label="OGM-GR", line=(:magenta,:dash,2))

#scatter!(rsf_fr, fp(fsf_fr[rsf_fr]), color=:cyan, label="")
scatter!(rsf_gr, fp(fsf_gr[rsf_gr]), color=:cyan, label="")
#scatter!(rso_fr, fp(fso_fr[rso_fr]), color=:magenta, label="fr")
scatter!(rso_gr, fp(fso_gr[rso_gr]), color=:magenta, label="")
Out[12]:
0 500 1000 1500 -8 -6 -4 -2 0 2 4 iteration GM GM-q FGM FGM-q OGM OGM-q FGM-FR FGM-GR OGM-FR OGM-GR