Description
(1) (Existence of x_opt) Do the (i) minimum and (ii) maximum of the function
f(x) = x_1^2 + e^{x_2} + e^{-x_2} + 3x_3^4
exist
(a) over the set X = {x ∈ R^3 : x_1^2 + 2x_2^2 + 3x_3^2 ≤ 1}?
(b) over the set R^3?
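Before reasoning about existence, it can help to transcribe f and evaluate it along a few rays through the origin; the function definition below is simply read off the problem statement, and the sampled radii are arbitrary. Growth in every coordinate direction is the kind of evidence that existence arguments formalize, but this is exploration, not a proof.

```python
import math

# Plain-Python transcription of f from Problem (1).
def f(x1, x2, x3):
    return x1**2 + math.exp(x2) + math.exp(-x2) + 3 * x3**4

# Evaluate f along each coordinate axis at a few radii.
for r in (0.0, 1.0, 5.0, 10.0):
    print(r, f(r, 0, 0), f(0, r, 0), f(0, 0, r))
```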
(2) (GM for a non-convex function) Consider the application of the Gradient Descent Method (GM) to f : R^2 → R given by
f(x) = (1/2) x_1^2 + (1/4) x_2^4 - (1/2) x_2^2.
(a) Plot the function in a 3-D figure and identify the local minima.
(b) Write the GM iteration, and consider the trajectory of GM with a small enough constant stepsize (identify how small it should be for guaranteed convergence) for the initial state x^{(0)} = (1, 0)^T. Does {x^{(t)}}_t converge to one of the identified minima? Does your observation conflict with the results claimed in class?
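The trajectory in part (b) can be traced numerically; a minimal Python sketch (the course tooling is MATLAB, and the stepsize gamma = 0.1 here is an illustrative choice, not the bound the problem asks you to derive):

```python
# Gradient of f(x) = x1^2/2 + x2^4/4 - x2^2/2 from Problem (2).
def grad(x):
    x1, x2 = x
    return (x1, x2**3 - x2)

# Constant-stepsize GM iteration x <- x - gamma * grad(x).
def gm(x, gamma=0.1, iters=200):
    for _ in range(iters):
        g = grad(x)
        x = (x[0] - gamma * g[0], x[1] - gamma * g[1])
    return x

print(gm((1.0, 0.0)))   # trajectory endpoint from the initial state in (b)
```

Comparing the printed endpoint against the minima found in part (a) is exactly the check the problem asks for.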
(3) (GM for a convex non-Lipschitz-gradient function) Consider the operation of GM with constant stepsize on the function f : R^2 → R given by
f(x) = ‖x‖^{3/2}.
(a) Show that this function does not have a Lipschitz-continuous gradient ∇f for any choice of Lipschitz constant C < ∞.
(b) Show that for any value of the constant stepsize γ > 0, the method either converges in a finite number of iterations to the unique minimizing point x^* = 0, or else it does not converge to x^*.
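A small numerical experiment can build intuition for part (b); this sketch uses the gradient ∇f(x) = (3/2)‖x‖^{-1/2} x (valid away from the origin), with an illustrative stepsize and iteration count:

```python
# One GM step on f(x) = ||x||^{3/2}; gamma is an example choice.
def gm_step(x, gamma):
    n = (x[0]**2 + x[1]**2) ** 0.5
    if n == 0.0:
        return x                       # already at the minimizer
    scale = 1.0 - 1.5 * gamma / n**0.5
    return (scale * x[0], scale * x[1])

x = (1.0, 0.0)
for _ in range(500):
    x = gm_step(x, gamma=0.1)
norm = (x[0]**2 + x[1]**2) ** 0.5
print(norm)   # observe whether the iterates actually reach x* = 0
```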
(4) (GM for Linear Regression) Recall the Least-Squares minimization problem formulated in Lecture 3 for Linear Regression with h_θ(x) = θ_0 + Σ_{j=1}^n θ_j x_j. Given a set {(x^{(m)}, y^{(m)})}_{m=1,…,M} of M input-output observations, suppose you want to apply steepest descent to find the optimal vector θ^* that minimizes the quadratic cost. Express the update rule for the vector θ under GM.
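The batch update the problem asks for can be prototyped; below is a minimal sketch for a single scalar feature (so θ = (θ_0, θ_1)), where the dataset, stepsize, and iteration count are all made-up illustrative choices:

```python
# Hypothetical tiny dataset: M = 4 scalar observations of y = 1 + 2x.
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]

def h(theta, x):                 # h_theta(x) = theta_0 + theta_1 * x
    return theta[0] + theta[1] * x

theta = [0.0, 0.0]
gamma = 0.05                     # illustrative constant stepsize
for _ in range(5000):
    # batch gradient of the quadratic cost (1/2) sum_m (h(x^(m)) - y^(m))^2
    g0 = sum(h(theta, x) - y for x, y in zip(X, Y))
    g1 = sum((h(theta, x) - y) * x for x, y in zip(X, Y))
    theta = [theta[0] - gamma * g0, theta[1] - gamma * g1]

print(theta)   # approaches the generating coefficients (1, 2)
```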
(5) (Convex Sets) Problem 2.11 from B & V.
(6) (Convex Functions) Problem 3.16 from B & V [specify: convex, concave, both, or neither.]
(7) (Logistic Regression convexity) Recall the Logistic Regression cost function expressed in Lecture 3. In particular, for the following function:
f(θ) = -y log(h_θ(x)) - (1 - y) log(1 - h_θ(x)), where h_θ(x) = 1 / (1 + exp(-θ^T x)),
prove that this cost is a convex function of the vector θ ∈ R^n for each x ∈ R^n and y ∈ [0, 1].
(You may use without proof the fact that the sum of two convex functions is also convex.)
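A numerical sanity check (not a substitute for the proof) is to sample pairs of points and verify midpoint convexity, f((a+b)/2) ≤ (f(a)+f(b))/2; the data point x, the label y, and the sampling ranges below are arbitrary choices for n = 2:

```python
import math, random

# Logistic cost for a single observation, theta and x in R^2 here.
def f(theta, x, y):
    z = sum(t * xi for t, xi in zip(theta, x))
    h = 1.0 / (1.0 + math.exp(-z))
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

random.seed(0)
x = [0.7, -1.3]          # arbitrary fixed data point
y = 0.25                 # y may be any value in [0, 1]
for _ in range(1000):
    a = [random.uniform(-3, 3) for _ in range(2)]
    b = [random.uniform(-3, 3) for _ in range(2)]
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    # midpoint convexity, up to floating-point rounding
    assert f(mid, x, y) <= (f(a, x, y) + f(b, x, y)) / 2 + 1e-12
print("midpoint convexity held on all sampled pairs")
```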
(8) (GM Implementation-Investigation) Recall that the steepest gradient descent algorithm with a constant stepsize is given as:
x^{k+1} = x^k - γ∇f(x^k).
We showed in class that if f is a function with a Lipschitz-continuous gradient with constant L, then
f(x^{k+1}) ≤ f(x^k) - γ(1 - γL/2)‖∇f(x^k)‖^2.  (1)
Consider the two-dimensional quadratic function f(x, y) = x^2 + 20y^2. Run the "gradient-constant" algorithm provided.
(a) Determine μ and L, where μ and L are the least and the greatest eigenvalues of the Hessian ∇^2 f(x, y), respectively. (That is, μI ≤ ∇^2 f(x, y) ≤ LI.) Set the variables mu and L in the MATLAB code "mainOpt.m" to these values.
(b) Recall the interval (A, B) of the step size γ for which convergence is feasible. Set γ_1 = B, the upper limit of the interval.
(c) Using the MATLAB code provided, investigate the behavior of convergence in an ε-neighborhood of B. Use the row vector gammaVec in "mainOpt.m" to assign the competing step sizes. What do you observe? Comment on these observations.
(Note: choose ε < 0.001 to make your observations.)
(d) Find the step size γ_3, using the upper bound (1) above in terms of L, that achieves the fastest convergence rate for the steepest descent algorithm. Compare and illustrate with a plot the convergence speed of this choice against the other extremes (in particular, compare to too-small and barely stabilizing choices of γ).
(e) Since ∇^2 f(z) ≥ μI for all z, we have
f(y) ≥ f(x) + ∇f(x)^T (y - x) + (μ/2)‖y - x‖^2.
Use the row vector gammaVec to assign the competing step sizes γ_1, γ_2 = 2/(L + μ), and γ_3. What do you observe? Do these observations align with your findings in (b)-(d)? Comment.
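Since the provided "gradient-constant" code is MATLAB ("mainOpt.m"), here is only an illustrative Python stand-in for the same iteration on f(x, y) = x^2 + 20y^2; the sampled stepsizes are example choices that straddle the stability boundary the problem asks you to identify, not the requested answers:

```python
# Gradient of f(x, y) = x^2 + 20y^2 from Problem (8).
def grad(p):
    x, y = p
    return (2 * x, 40 * y)

# Run constant-stepsize GM and return the distance to the minimizer (0, 0).
def run(gamma, iters=300, p=(1.0, 1.0)):
    for _ in range(iters):
        g = grad(p)
        p = (p[0] - gamma * g[0], p[1] - gamma * g[1])
    return (p[0]**2 + p[1]**2) ** 0.5

for gamma in (0.001, 0.025, 0.049, 0.051):   # from tiny to aggressive
    print(gamma, run(gamma))
```

Printing the final distance for each stepsize makes the slow/fast/unstable regimes of parts (b)-(e) visible at a glance.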

