40.319 Statistical and Machine Learning Homework 3

Problem 1 (10 points)
In this problem, we will implement the EM algorithm for clustering. Start by importing the required
packages and preparing the dataset.
import numpy as np
import matplotlib.pyplot as plt
from numpy import linalg as LA
from matplotlib.patches import Ellipse
# make_blobs is imported from sklearn.datasets directly; the older
# sklearn.datasets.samples_generator module no longer exists in recent releases.
from sklearn.datasets import make_blobs
from scipy.stats import multivariate_normal
K = 3
NUM_DATAPTS = 150
X, y = make_blobs(n_samples=NUM_DATAPTS, centers=K, shuffle=False,
                  random_state=0, cluster_std=0.6)
# Linear maps used to shear the first two clusters
g1 = np.asarray([[2.0, 0], [-0.9, 1]])
g2 = np.asarray([[1.4, 0], [0.5, 0.7]])
mean1 = np.mean(X[:int(NUM_DATAPTS/K)])
mean2 = np.mean(X[int(NUM_DATAPTS/K):2*int(NUM_DATAPTS/K)])
X[:int(NUM_DATAPTS/K)] = np.einsum('nj,ij->ni',
    X[:int(NUM_DATAPTS/K)] - mean1, g1) + mean1
X[int(NUM_DATAPTS/K):2*int(NUM_DATAPTS/K)] = np.einsum('nj,ij->ni',
    X[int(NUM_DATAPTS/K):2*int(NUM_DATAPTS/K)] - mean2, g2) + mean2
X[:, 1] -= 4
(a) Randomly initialize a numpy array mu of shape (K, 2) to represent the mean of the clusters,
and initialize an array cov of shape (K, 2, 2) such that cov[k] is the identity matrix for each k.
cov will be used to represent the covariance matrices of the clusters. Finally, set π to be the
uniform distribution at the start of the program.
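One possible initialization for part (a) is sketched here. The seed and the sampling range of [-5, 5] are assumptions chosen for illustration; the problem only asks for a random start.

```python
import numpy as np

K = 3  # number of clusters, as in the setup code

# Hypothetical seed and range; any random initialization would do.
rng = np.random.default_rng(0)
mu = rng.uniform(low=-5.0, high=5.0, size=(K, 2))  # one 2-D mean per cluster
cov = np.tile(np.eye(2), (K, 1, 1))                # cov[k] is the 2x2 identity
pi = np.full(K, 1.0 / K)                           # uniform mixing weights

print(mu.shape, cov.shape, pi)
```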
(b) Write a function to perform the E-step:
def E_step():
    gamma = np.zeros((NUM_DATAPTS, K))
    ...
    ...
    return gamma
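As a rough illustration of what an E-step computes, the sketch here evaluates the responsibilities gamma[n, k], proportional to pi[k] times the Gaussian density of point n under cluster k, and normalizes each row. The names `gaussian_pdf` and `e_step_sketch` are hypothetical, and unlike the template above the function takes its inputs as arguments instead of reading global variables. The demo arrays are made up.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Density of N(mean, cov) evaluated at each row of X (shape (N, d))."""
    d = X.shape[1]
    diff = X - mean
    # Solve cov @ z = diff^T instead of forming the explicit inverse.
    maha = np.einsum('nd,dn->n', diff, np.linalg.solve(cov, diff.T))
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def e_step_sketch(X, mu, cov, pi):
    """Return responsibilities of shape (N, K); each row sums to 1."""
    K = len(pi)
    gamma = np.zeros((X.shape[0], K))
    for k in range(K):
        gamma[:, k] = pi[k] * gaussian_pdf(X, mu[k], cov[k])
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Made-up demo: two points near the first mean, one near the second.
X_demo = np.array([[0.0, 0.0], [0.1, -0.1], [4.0, 4.0]])
mu_demo = np.array([[0.0, 0.0], [4.0, 4.0]])
cov_demo = np.tile(np.eye(2), (2, 1, 1))
pi_demo = np.array([0.5, 0.5])
gamma_demo = e_step_sketch(X_demo, mu_demo, cov_demo, pi_demo)
print(gamma_demo.round(3))
```

The same densities could presumably be obtained from the `multivariate_normal` class imported in the setup code; the NumPy version is used here only to keep the sketch self-contained.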
(c) Write a function to perform the M-step:
def M_step(gamma):
    ...
    ...
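For orientation, a minimal sketch of the usual Gaussian-mixture M-step updates follows: weighted means, weighted covariances, and mixing weights proportional to the total responsibility of each cluster. The function name `m_step_sketch` and its explicit `X` argument are assumptions; the template above takes only `gamma` and would update global variables instead.

```python
import numpy as np

def m_step_sketch(X, gamma):
    """Update (mu, cov, pi) from data X (N, d) and responsibilities gamma (N, K)."""
    N, d = X.shape
    Nk = gamma.sum(axis=0)                  # effective number of points per cluster
    pi = Nk / N                             # mixing weights
    mu = (gamma.T @ X) / Nk[:, None]        # weighted means, shape (K, d)
    diff = X[None, :, :] - mu[:, None, :]   # shape (K, N, d)
    # Responsibility-weighted average of outer products, shape (K, d, d)
    cov = np.einsum('nk,kni,knj->kij', gamma, diff, diff) / Nk[:, None, None]
    return mu, cov, pi

# Made-up demo with hard (0/1) responsibilities, so results are easy to check by hand.
X_demo = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
gamma_demo = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mu_new, cov_new, pi_new = m_step_sketch(X_demo, gamma_demo)
print(mu_new)
print(pi_new)
```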
(d) Now write a loop that iterates through the E and M steps, and terminates after the change in
log-likelihood is below some threshold. At each iteration, print out the log-likelihood, and use
the following function to plot the progress of the algorithm:
def plot_result(gamma=None):
    ax = plt.subplot(111, aspect='equal')
    ax.set_xlim([-5, 5])
    ax.set_ylim([-5, 5])
    ax.scatter(X[:, 0], X[:, 1], c=gamma, s=50, cmap=None)
    for k in range(K):
        # Eigen-decomposition of cov[k] gives the ellipse axes and orientation
        l, v = LA.eig(cov[k])
        theta = np.arctan(v[1, 0] / v[0, 0])
        # angle is passed by keyword; recent matplotlib releases require this
        e = Ellipse((mu[k, 0], mu[k, 1]), 6*l[0], 6*l[1],
                    angle=theta * 180 / np.pi)
        e.set_alpha(0.5)
        ax.add_artist(e)
    plt.show()
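The iteration in part (d) could be organized roughly as in the sketch here, which stops once the log-likelihood improves by less than a tolerance. The function names, the tolerance `tol`, the cap `max_iter`, and the two-cluster demo data are all assumptions; the E and M updates are inlined only to keep the block self-contained. In the homework itself, `plot_result(gamma)` would be called inside the loop.

```python
import numpy as np

def weighted_densities(X, mu, cov, pi):
    """(N, K) array whose (n, k) entry is pi[k] * N(x_n | mu[k], cov[k])."""
    N, d = X.shape
    out = np.empty((N, len(pi)))
    for k in range(len(pi)):
        diff = X - mu[k]
        maha = np.sum(diff * np.linalg.solve(cov[k], diff.T).T, axis=1)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[k]))
        out[:, k] = pi[k] * np.exp(-0.5 * maha) / norm
    return out

def run_em_sketch(X, mu, cov, pi, tol=1e-6, max_iter=200):
    prev_ll, history = -np.inf, []
    for it in range(max_iter):
        dens = weighted_densities(X, mu, cov, pi)
        ll = np.log(dens.sum(axis=1)).sum()     # log-likelihood of current parameters
        history.append(ll)
        print(f"iteration {it}: log-likelihood = {ll:.4f}")
        if ll - prev_ll < tol:                  # improvement below threshold: stop
            break
        prev_ll = ll
        gamma = dens / dens.sum(axis=1, keepdims=True)   # E-step
        Nk = gamma.sum(axis=0)                           # M-step
        pi = Nk / len(X)
        mu = (gamma.T @ X) / Nk[:, None]
        diff = X[None] - mu[:, None]
        cov = np.einsum('nk,kni,knj->kij', gamma, diff, diff) / Nk[:, None, None]
    return mu, cov, pi, history

# Made-up data: two well-separated clusters of 100 points each.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal([-3.0, 0.0], 1.0, size=(100, 2)),
                    rng.normal([3.0, 0.0], 1.0, size=(100, 2))])
mu0 = np.array([[-1.0, 0.0], [1.0, 0.0]])
cov0 = np.tile(np.eye(2), (2, 1, 1))
pi0 = np.array([0.5, 0.5])
mu_fit, cov_fit, pi_fit, history = run_em_sketch(X_demo, mu0, cov0, pi0)
```

The printed log-likelihood should not decrease from one iteration to the next, which makes a convenient sanity check on an implementation.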
(e) Use sklearn’s KMeans module
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
to perform K-means clustering on the dataset, and compare both clustering results.
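A minimal usage sketch of `KMeans` is given here on freshly generated blobs, not on the transformed dataset from the setup code. The values of `n_init` and `random_state` are assumptions. K-means returns hard labels, so a one-hot encoding is one way to put them in the same (N, K) form as the EM responsibilities for comparison.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in data; in the homework, X from the setup code would be used instead.
X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)
labels = kmeans.labels_              # hard cluster index per point, shape (150,)
centers = kmeans.cluster_centers_    # cluster means, shape (3, 2)

# One-hot version of the labels, comparable in shape to gamma from EM
hard_gamma = np.eye(3)[labels]
print(centers)
```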
Problem 2 (3 points)
Let p and q be distributions on {1, 2, 3, 4, 5} such that $p_1 = \frac{1}{8}$, $p_2 = \frac{1}{2}$, $p_3 = p_4 = p_5 = \frac{1}{8}$, and
$q_1 = \frac{1}{4}$, $q_2 = q_3 = \frac{1}{8}$, $q_4 = q_5 = \frac{1}{4}$.
(a) Compute the cross-entropy H(p, q) in bits. Is H(q, p) = H(p, q)?
(b) Compute the entropies H(p) and H(q) in bits.
(c) Compute the KL-divergence $D_{\mathrm{KL}}(p \,\|\, q)$ in bits.
Show all working and leave your answers in fractions.
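Hand calculations of this kind can be checked with exact arithmetic. The sketch here uses a toy pair of distributions, deliberately not the p and q of this problem, and assumes every probability is a power of two so that the base-2 logarithm is an integer. All function names are hypothetical.

```python
from fractions import Fraction
from math import log2

def log2_exact(x):
    """log2(x) as an exact Fraction; assumes x is a power of two."""
    value = log2(x)
    if value != int(value):
        raise ValueError("probability is not a power of two")
    return Fraction(int(value))

def cross_entropy_bits(p, q):
    """H(p, q) = -sum_i p_i log2 q_i."""
    return -sum(p_i * log2_exact(q_i) for p_i, q_i in zip(p, q))

def entropy_bits(p):
    """H(p) = H(p, p)."""
    return cross_entropy_bits(p, p)

def kl_divergence_bits(p, q):
    """D_KL(p || q) = H(p, q) - H(p)."""
    return cross_entropy_bits(p, q) - entropy_bits(p)

# Toy distributions for illustration only
p_toy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
q_toy = [Fraction(1, 4)] * 4

print(cross_entropy_bits(p_toy, q_toy))   # 2
print(cross_entropy_bits(q_toy, p_toy))   # 9/4
print(entropy_bits(p_toy))                # 7/4
print(kl_divergence_bits(p_toy, q_toy))   # 1/4
```

In this toy case the two cross-entropies differ, which shows that cross-entropy is not symmetric in general.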
Problem 3 (8 points)
(a) Perform singular value decomposition (SVD) on the following matrix
$$X = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = U \Sigma V^T.$$
(b) For a general design matrix X, why are the columns of the transformed matrix T = XV
orthogonal?
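The claim in part (b) can be checked numerically before proving it. The sketch here uses a random matrix `A` as a stand-in design matrix (an assumption, not the matrix from part (a)) and inspects the Gram matrix of T = AV. This is a sanity check, not a substitute for the argument the question asks for.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))               # hypothetical design matrix

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T
T = A @ V                                 # transformed matrix

gram = T.T @ T                            # pairwise inner products of the columns of T
print(np.round(gram, 6))                  # off-diagonal entries should be numerically zero
print(s ** 2)                             # diagonal entries should match these values
```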
Problem 4 (4 points)
In this problem, we will perform principal component analysis (PCA) on sklearn’s diabetes dataset.
Start by importing the required packages and load the dataset.
import numpy as np
from sklearn import decomposition
from sklearn import datasets
X = datasets.load_diabetes().data
You can find out more on how to use sklearn’s PCA module from:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
For this problem, make sure the design matrix is first normalized to have zero mean and unit
standard deviation for each column.
(a) Write code to print the matrix V that will be used to transform the dataset, and print all the
singular values.
(b) Now perform PCA on the dataset and print out the 3 most important components for the first
10 data-points.
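One way the pieces of this problem might fit together is sketched here. The variable names are assumptions, and standardization is done with plain NumPy using the population standard deviation. `components_` and `singular_values_` are attributes of a fitted `PCA` object; the rows of `components_` are the principal directions, so its transpose plays the role of V.

```python
import numpy as np
from sklearn import datasets, decomposition

X_raw = datasets.load_diabetes().data

# Normalize each column to zero mean and unit standard deviation
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Keep all components so that every singular value is available
pca = decomposition.PCA()
scores = pca.fit_transform(X_std)     # data expressed in the principal-component basis

V = pca.components_.T                 # columns are the principal directions
singular_values = pca.singular_values_

print(V)
print(singular_values)
print(scores[:10, :3])                # first 3 components of the first 10 data points
```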
Problem 5 (5 points)
An AR(2) model assumes the form
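$$r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + a_t,$$
where $a_t$ is a white noise sequence. Show that if the model is stationary, then
(a) $E(r_t) = \dfrac{\phi_0}{1 - \phi_1 - \phi_2}$ (assume $\phi_1 + \phi_2 \neq 1$);
(b) the ACF is given by
$$\rho(1) = \frac{\phi_1}{1 - \phi_2}, \qquad \rho(s) = \phi_1 \rho(s-1) + \phi_2 \rho(s-2), \quad \forall\, s \geq 2.$$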
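The two identities can be sanity-checked by simulation before attempting the proof. The sketch here assumes Gaussian white noise with unit variance and the hypothetical coefficients 1.0, 0.5 and 0.2, which satisfy the usual AR(2) stationarity conditions. Sample estimates should land close to the theoretical values, though a simulation proves nothing.

```python
import numpy as np

phi0, phi1, phi2 = 1.0, 0.5, 0.2       # hypothetical stationary coefficients
n, burn_in = 200_000, 1_000            # burn-in discards the effect of the zero start
rng = np.random.default_rng(0)
a = rng.normal(size=n + burn_in)       # white noise a_t

r = np.zeros(n + burn_in)
for t in range(2, n + burn_in):
    r[t] = phi0 + phi1 * r[t - 1] + phi2 * r[t - 2] + a[t]
r = r[burn_in:]

def sample_acf(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

mean_theory = phi0 / (1 - phi1 - phi2)       # claim (a)
rho1_theory = phi1 / (1 - phi2)              # claim (b), lag 1
rho2_theory = phi1 * rho1_theory + phi2      # claim (b), recursion at s = 2, using rho(0) = 1

print(r.mean(), mean_theory)
print(sample_acf(r, 1), rho1_theory)
print(sample_acf(r, 2), rho2_theory)
```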