Description
1. Suppose we have a data set {x_1, · · · , x_N} and our goal is to partition the data set into
K clusters, with µ_k representing the center of the k-th cluster. Recall that in K-means
clustering we are attempting to minimize an objective function defined as follows:
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|_2^2,
where r_nk ∈ {0, 1} and r_nk = 1 only if x_n is assigned to cluster k.
(a) What is the minimum value of the objective function when K = N (the number
of clusters equals the number of samples)?
(b) Adding a regularization term, the objective function now becomes:
J = \sum_{k=1}^{K} \left[ \lambda \| \mu_k \|_2^2 + \sum_{n=1}^{N} r_{nk} \| x_n - \mu_k \|_2^2 \right].
Consider the optimization of µ_k with all r_nk known. Find the optimal µ_k for
\operatorname{argmin}_{\mu_k} \; \lambda \| \mu_k \|_2^2 + \sum_{n=1}^{N} r_{nk} \| x_n - \mu_k \|_2^2.
Discuss your answer. How would the regularization affect each step of the K-means clustering algorithm?
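One candidate closed form, obtained by setting the gradient of this subproblem to zero, is µ_k = (Σ_n r_nk x_n)/(λ + Σ_n r_nk). A minimal NumPy sketch that checks this candidate against random perturbations on synthetic single-cluster data (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))   # synthetic points assigned to one cluster
r = np.ones(len(X))            # r_nk = 1 for every point in this cluster
lam = 2.0                      # regularization weight lambda

def J(mu):
    # lambda * ||mu||_2^2 + sum_n r_nk * ||x_n - mu||_2^2
    return lam * mu @ mu + (r * ((X - mu) ** 2).sum(axis=1)).sum()

# Candidate from setting the gradient to zero:
# mu_k = (sum_n r_nk x_n) / (lambda + sum_n r_nk)
mu_star = (r[:, None] * X).sum(axis=0) / (lam + r.sum())

# The objective is convex, so beating random perturbations is evidence
# the candidate is the global minimizer.
for _ in range(200):
    assert J(mu_star) <= J(mu_star + 0.1 * rng.normal(size=2)) + 1e-9
```

Note the λ in the denominator shrinks the center toward the origin relative to the ordinary cluster mean.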
2. We have unlabeled data x_n ∈ R^M, n = 1, · · · , N. Suppose we want to use the L1 distance
instead of the L2 distance to cluster the data into K clusters. The objective function we
are minimizing becomes:
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|_1,
where \| z \|_1 = \sum_{i=1}^{M} |z_i| for z ∈ R^M. The parameters r_nk ∈ {0, 1}, and r_nk = 1 only if
x_n is assigned to cluster k.
In the maximization step, with r_nk fixed, define C_k = {n | r_nk = 1} for the k-th cluster.
Then we need to find the µ_k that minimizes the following function:
f(\mu_k) = \sum_{n \in C_k} \| x_n - \mu_k \|_1.    (1)
Define x_ni to be the i-th element of x_n and µ_ki to be the i-th element of µ_k. We can
expand (1) as follows:
f(\mu_k) = \sum_{n \in C_k} \| x_n - \mu_k \|_1 = \sum_{n \in C_k} \sum_{i=1}^{M} | x_{ni} - \mu_{ki} | = \sum_{i=1}^{M} \sum_{n \in C_k} | x_{ni} - \mu_{ki} |.
The above expansion shows that we can optimize for each element of µ_k separately.
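A small NumPy check of this separability on a hypothetical toy cluster, using brute-force search over a grid: because f is a sum of per-coordinate terms, the coordinate-wise minimizers jointly minimize f over the grid's product.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 10, size=(7, 3)).astype(float)  # points in cluster C_k, M = 3
grid = np.linspace(0.0, 10.0, 101)                  # candidate values per coordinate

def f(mu):
    # f(mu_k) = sum_{n in C_k} ||x_n - mu_k||_1
    return np.abs(X - mu).sum()

# Minimize each coordinate independently over the grid.
mu_sep = np.array([grid[np.argmin([np.abs(X[:, i] - g).sum() for g in grid])]
                   for i in range(X.shape[1])])

# The coordinate-wise minimizer should beat any other grid candidate jointly.
for _ in range(300):
    cand = rng.choice(grid, size=X.shape[1])
    assert f(mu_sep) <= f(cand) + 1e-9
```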
(a) Consider first the problem of finding the \bar{y}^* that minimizes f(\bar{y}) = \sum_{j=1}^{N_k} |y_j - \bar{y}|
for y_j ∈ R. Because f(\bar{y}) is not differentiable everywhere, we need the notion of a
subgradient. We say g ∈ R is a subgradient of f at x ∈ dom f if, for all z ∈ dom f,
f(z) ≥ f(x) + g(z − x).
At a point x where f is differentiable, the subgradient equals the derivative
of f at x. A function f is called subdifferentiable at x if there exists at least one
subgradient at x. The set of subgradients of f at a point x is called the subdifferential
of f at x and is denoted ∂f(x). Find ∂f(x) of f(x) = |x| for x < 0, x > 0, and
x = 0.
(b) For simplicity, assume y_1 < y_2 < · · · < y_{N_k−1} < y_{N_k} and that N_k is odd. Also
assume that f(\bar{y}) is convex and subdifferentiable everywhere. Use the following
theorem:
A point x^* is a minimizer of a convex function f if and only if f is subdifferentiable
at x^* and 0 ∈ ∂f(x^*), i.e., g = 0 is a subgradient of f at x^*.
Show that the \bar{y}^* that minimizes \sum_{j=1}^{N_k} |y_j − \bar{y}| is the median of {y_1, · · · , y_{N_k}}, i.e.,
\operatorname{argmin}_{\bar{y}} f(\bar{y}) = y_{(N_k+1)/2}.
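A quick numerical sanity check of this claim on a hypothetical sorted sample, brute-forcing over candidate values with NumPy:

```python
import numpy as np

y = np.array([2.0, 3.0, 5.0, 11.0, 20.0])  # sorted, N_k = 5 (odd)

def f(ybar):
    # f(ybar) = sum_j |y_j - ybar|
    return np.abs(y - ybar).sum()

median = y[(len(y) + 1) // 2 - 1]  # y_{(N_k+1)/2} with 1-based indexing
# The median should do at least as well as any other candidate value.
for c in np.linspace(0.0, 25.0, 2501):
    assert f(median) <= f(c) + 1e-9
```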
(c) Write a two-step algorithm similar to the K-means algorithm that minimizes J.
Comment on the advantage of this algorithm compared to the K-means algorithm.
3. Consider the matrix
A = \begin{pmatrix} 3 & 2 \\ 2 & 0 \end{pmatrix}.
(a) Find the eigenvalues and eigenvectors of A. Show your steps. Make sure to
normalize your eigenvectors to have unit norm.
(b) Find the eigenvalue decomposition of A using the eigenvalues and eigenvectors
you found. Hint: A is a symmetric matrix.
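As a cross-check of the hand computation, the decomposition can be verified numerically. NumPy's `eigh` is the appropriate routine here because A is symmetric:

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 0.0]])

# eigh returns real eigenvalues in ascending order and orthonormal
# eigenvectors (the columns of V) for a symmetric matrix.
w, V = np.linalg.eigh(A)

assert np.allclose(V @ np.diag(w) @ V.T, A)          # A = V diag(w) V^T
assert np.allclose(np.linalg.norm(V, axis=0), 1.0)   # unit-norm eigenvectors
print(w)  # the two eigenvalues of A
```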
4. Answer the following questions regarding positive (semi-)definite matrices. A symmetric
real matrix M is said to be positive definite if the scalar z^T M z is positive for every
non-zero column vector z.
(a) Consider the matrix
A = \begin{pmatrix} 9 & 6 \\ 6 & a \end{pmatrix}.
What condition must a satisfy so that the matrix A is positive definite?
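The definiteness of this 2 × 2 family can be probed numerically via its smallest eigenvalue; a sketch with a few arbitrary sample values of a (the analytic route is Sylvester's criterion on the leading principal minors):

```python
import numpy as np

def eigmin(a):
    # Smallest eigenvalue of the symmetric matrix [[9, 6], [6, a]];
    # the matrix is positive definite iff this is strictly positive.
    return np.linalg.eigvalsh(np.array([[9.0, 6.0], [6.0, a]]))[0]

for a in (3.0, 4.0, 5.0):
    print(a, eigmin(a))  # sign flips as a crosses the definiteness boundary
```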
(b) Suppose we know the matrix B is positive definite. Show that B^{-1} is also positive
definite. Hint: use the definition and the fact that every positive definite matrix
is non-singular (invertible).
(c) Show that the data covariance matrix S in PCA is positive semi-definite.
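Part (c) reduces to the identity z^T S z = (1/N) Σ_n (z^T (x_n − x̄))² ≥ 0; a NumPy sketch checking this on synthetic data, for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))      # N = 50 samples in R^4
Xc = X - X.mean(axis=0)           # centered data
S = Xc.T @ Xc / len(X)            # data covariance matrix as in PCA

# z^T S z equals the mean of (z^T (x_n - xbar))^2, which is nonnegative.
for _ in range(100):
    z = rng.normal(size=4)
    assert np.isclose(z @ S @ z, np.mean((Xc @ z) ** 2))
    assert z @ S @ z >= -1e-12

assert np.linalg.eigvalsh(S).min() >= -1e-12  # all eigenvalues nonnegative
```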
5. One application of the K-means algorithm is image segmentation and image compression. The goal of image segmentation is to partition an image into regions that have
relatively similar visual appearance. Each pixel in an image can be viewed as a point
in a 3-dimensional space containing the intensities of the three colors red, green, and
blue. The K-means algorithm can be used to cluster the points in this 3-dimensional space
into K clusters, thereby achieving segmentation. After segmentation, compression is
achieved by replacing each pixel with the {R,G,B} triplet given by µ_k, the center of the
cluster to which it is assigned.
In this exercise, you will implement the K-means algorithm to segment and compress
the image UCLA Bruin.jpg. Note: for submission, you may turn in the required images
in black and white.
(a) Visualization. The picture of the famous Bruin bear contains 300 × 400 pixels.
Read the image into MATLAB using imread and show the image using imshow.
(b) K-means Algorithm with K = 4. Implement the K-means algorithm using all
of the following specifications:
• Partition the pixels into K = 4 clusters.
• To allow for a deterministic result, initialize the cluster centers using the
furthest-first heuristic on page 180 of A Course in Machine Learning. The
heuristic is sketched below:
– Pick the first pixel of the image, whose {R,G,B} values are [229, 249, 250],
as the center for the first cluster, i.e., µ1 = [229, 249, 250].
– For k = 2, · · · , K: find the example n^* that is as far as possible from
all previously selected means, namely
n^* = \operatorname{argmax}_n \min_{k' < k} \| x_n - \mu_{k'} \|_2.
Set µ_k = x_{n^*}.
• Run the K-means algorithm for 10 iterations. An iteration consists of the following two steps:
– Step 1: assign each example to the cluster whose center is the closest.
– Step 2: re-estimate the center of each cluster.
Calculate the K-means objective function
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|_2^2
at the end of each iteration. The parameters r_nk ∈ {0, 1}, and r_nk = 1 only if x_n is
assigned to cluster k. For this image, we have x_n ∈ R^3, n = 1, · · · , N, N = 120000.
Generate a plot showing J vs. the iteration number. Comment on the convergence of the
K-means algorithm.
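The specified loop can be sketched as follows (NumPy, on synthetic stand-in pixels rather than the actual image; variable names are illustrative). Recording J after each iteration exhibits the monotone decrease you are asked to comment on:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.integers(0, 256, size=(500, 3)).astype(float)  # stand-in for RGB pixels
K = 4

# Furthest-first initialization: the first center is the first point; each
# subsequent center is the point furthest from all centers chosen so far.
mu = [X[0]]
for _ in range(1, K):
    d = np.min([np.linalg.norm(X - m, axis=1) for m in mu], axis=0)
    mu.append(X[np.argmax(d)])
mu = np.array(mu)

Js = []
for _ in range(10):
    # Step 1: assign each point to the nearest center.
    dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    r = np.argmin(dist, axis=1)
    # Step 2: re-estimate each center as the mean of its cluster
    # (keep the old center if a cluster ends up empty).
    for k in range(K):
        if np.any(r == k):
            mu[k] = X[r == k].mean(axis=0)
    Js.append(((X - mu[r]) ** 2).sum())  # objective at end of iteration

# Each step can only lower the objective, so J is non-increasing.
assert all(Js[i + 1] <= Js[i] + 1e-6 for i in range(len(Js) - 1))
```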
(c) Compression with K = 4, 8, and 16. For K = 4, 8, and 16, compress the
UCLA Bruin image using your K-means algorithm. For compression, replace the
{R,G,B} values of each pixel with the center of the cluster to which it belongs.
Use the same specifications as in (b) and report the value of the objective function
after your last iteration. Show your compressed image using imshow. Comment
on how K affects the quality of the compressed image.
(d) Compression Ratio. In the original image, each of the 300 × 400 pixels comprises
{R,G,B} values, each of which is stored with 8 bits of precision, i.e., 0–255. How
many bits do you need to store the original image?
Now you have compressed your image using the K-means algorithm. For each
pixel, you store only the index of the cluster to which it is assigned. You also need
to store the values of the K centers with 8 bits of precision per color. How many
bits do you need to store the compressed image with K = 4, 8, and 16? What are
the compression ratios?
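The counts here are pure arithmetic; a small sketch of the bookkeeping under the storage scheme just described (⌈log₂ K⌉ index bits per pixel, plus the K stored centers):

```python
import math

H, W = 300, 400                 # image dimensions
orig_bits = H * W * 3 * 8       # 3 color channels at 8 bits per pixel

def compressed_bits(K):
    # ceil(log2(K)) bits per pixel for the cluster index,
    # plus K cluster centers stored at 3 * 8 bits each.
    return H * W * math.ceil(math.log2(K)) + K * 3 * 8

for K in (4, 8, 16):
    print(K, compressed_bits(K), round(orig_bits / compressed_bits(K), 2))
```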

