CSE 891-001: Deep Learning Homework 4


1 Reversible Architectures [3pts]: In this section, we will investigate a variant of the reversible block built from affine coupling layers. Consider the following reversible affine coupling block:
y1 = exp(G(x2)) ∘ x1 + F(x2)
y2 = exp(s) ∘ x2
(1)
where ∘ denotes element-wise multiplication, the inputs x1, x2 ∈ R^(d/2), and the functions F and G map from R^(d/2) to R^(d/2). This modified block is identical to the ordinary reversible block, except that x1 and x2 are multiplied element-wise by the vectors exp(G(x2)) and exp(s), respectively.
1. (1pt) Give the equations for inverting this block, i.e. computing x1 and x2 from y1 and y2. You may use / to denote element-wise division.
2. (1pt) Give a formula for the Jacobian ∂y/∂x, where y denotes the concatenation of y1 and y2. You may denote the solution as a block matrix, as long as you clearly define what each block of the matrix corresponds to.
3. (1pt) Give a formula for the determinant of the Jacobian from the previous part, i.e. compute det(∂y/∂x). Is this a volume-preserving transformation? Justify your answer.
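As a sanity check on your inversion equations from part 1, the block in equation (1) can be simulated numerically. The choices of F, G, and s below are arbitrary toy functions (not part of the assignment); any functions R^(d/2) → R^(d/2) would do. A correct inverse should recover x1 and x2 up to floating-point roundoff:

```python
# Toy sketch of the modified affine coupling block in equation (1).
# F, G, and the log-scale vector s are arbitrary illustrative choices.
import math
import random

d_half = 4
s = [0.3] * d_half  # fixed log-scale vector (arbitrary)

def F(x2):
    return [math.sin(v) for v in x2]

def G(x2):
    return [0.5 * math.tanh(v) for v in x2]

def forward(x1, x2):
    g = G(x2)
    y1 = [math.exp(gi) * x1i + fi for gi, x1i, fi in zip(g, x1, F(x2))]
    y2 = [math.exp(si) * x2i for si, x2i in zip(s, x2)]
    return y1, y2

def inverse(y1, y2):
    # Invert the second equation first: x2 is needed to undo the first one.
    x2 = [y2i / math.exp(si) for si, y2i in zip(s, y2)]
    g = G(x2)
    x1 = [(y1i - fi) / math.exp(gi) for gi, y1i, fi in zip(g, y1, F(x2))]
    return x1, x2

random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(d_half)]
x2 = [random.gauss(0, 1) for _ in range(d_half)]
r1, r2 = inverse(*forward(x1, x2))
err = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(err)  # should be tiny (floating-point roundoff)
```

Note the ordering in `inverse`: because y2 depends only on x2, x2 must be recovered first before the coupling in y1 can be undone.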
2 Variational Free Energy [6pts]: In this question you will derive some expressions related to the variational free energy, which is maximized to train a VAE. Recall that the VFE is defined as:
F(q) = E_q[log p(x|z)] − D_KL(q(z) || p(z))
where the KL divergence is defined as
D_KL(q(z) || p(z)) = E_q[log q(z) − log p(z)]
We will assume that the prior p(z) is a standard Gaussian:
p(z) = N(z; 0, I) = ∏_{i=1}^D p_i(z_i) = ∏_{i=1}^D N(z_i; 0, 1)
Similarly, we will assume that the variational approximation q(z) is a fully factorized (i.e., diagonal) Gaussian:
q(z) = N(z; μ, Σ) = ∏_{i=1}^D q_i(z_i) = ∏_{i=1}^D N(z_i; μ_i, σ_i)
1. (1pt) Show that:
F(q) = log p(x) − DKL(q(z)||p(z|x))
2. (1pt) Show that the KL term decomposes as a sum of KL terms for the individual dimensions. In particular,
D_KL(q(z) || p(z)) = ∑_i D_KL(q_i(z_i) || p_i(z_i))
3. (2pts) Give an explicit formula for the KL divergence D_KL(q_i(z_i) || p_i(z_i)). This should be a mathematical expression involving μ_i and σ_i.
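Whatever closed form you derive for part 3 can be sanity-checked against a Monte Carlo estimate of E_q[log q(z) − log p(z)]. The formula below is the standard result for D_KL(N(μ, σ²) || N(0, 1)), stated here only as a reference point; it reads σ_i as a standard deviation, which is an assumption about the handout's notation:

```python
# Monte Carlo sanity check of a closed-form per-dimension KL against its
# definition E_q[log q(z) - log p(z)]. The closed form below is the standard
# KL(N(mu, sigma^2) || N(0, 1)); sigma is taken to be a standard deviation.
import math
import random

def closed_form_kl(mu, sigma):
    return -math.log(sigma) + (sigma**2 + mu**2 - 1) / 2

def mc_kl(mu, sigma, n=200_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu, sigma)  # z ~ q
        log_q = (-0.5 * math.log(2 * math.pi) - math.log(sigma)
                 - (z - mu)**2 / (2 * sigma**2))
        log_p = -0.5 * math.log(2 * math.pi) - z**2 / 2
        total += (log_q - log_p) / n
    return total

print(closed_form_kl(0.7, 1.3), mc_kl(0.7, 1.3))  # should agree closely
```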
4. (2pts) One way to do gradient descent on the KL term is to apply the formula from above. Another approach is to compute stochastic gradients using the reparameterization trick:
∇_θ D_KL(q_i(z_i) || p_i(z_i)) = E_ε[∇_θ t_i]
where θ = (μ_i, σ_i) and
z_i = μ_i + σ_i ε_i
r_i = log q_i(z_i)
s_i = log p_i(z_i)
t_i = r_i − s_i
(2)
Show how to compute a stochastic estimate of ∇θDKL(qi(zi)||pi(zi)) by doing backpropagation on
the above equations. You may find it helpful to draw the computation graph.
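The backprop computation the question asks for can be checked numerically: averaging the per-sample gradients of t over many draws of ε should match the gradient of the closed-form KL. The sketch below hand-codes the reverse-mode pass through equations (2); the analytic targets (μ and σ − 1/σ) come from the standard closed form for D_KL(N(μ, σ²) || N(0, 1)), and σ is assumed to be a standard deviation:

```python
# Stochastic gradient of the per-dimension KL via manual backprop through
# equations (2). Comparison targets assume the standard Gaussian KL formula.
import random

mu, sigma = 0.8, 0.5
random.seed(0)

def grad_sample():
    eps = random.gauss(0, 1)
    z = mu + sigma * eps                # z_i = mu_i + sigma_i * eps_i
    # Forward: r = log q(z), s = log p(z), t = r - s (constants cancel in t).
    # Reverse-mode pass over the computation graph:
    dt_dz = -(z - mu) / sigma**2 + z    # dr/dz - ds/dz
    dr_dmu = (z - mu) / sigma**2        # direct dependence of r on mu
    dr_dsigma = -1 / sigma + (z - mu)**2 / sigma**3
    g_mu = dt_dz * 1.0 + dr_dmu         # dz/dmu = 1, plus direct path
    g_sigma = dt_dz * eps + dr_dsigma   # dz/dsigma = eps, plus direct path
    return g_mu, g_sigma

N = 100_000
gm = gs = 0.0
for _ in range(N):
    a, b = grad_sample()
    gm += a / N
    gs += b / N

print(gm, mu)               # Monte Carlo estimate vs analytic d(KL)/d(mu)
print(gs, sigma - 1/sigma)  # Monte Carlo estimate vs analytic d(KL)/d(sigma)
```

Note that r depends on μ and σ both through z and directly, so each gradient is a sum of a chained path and a direct path; missing the direct terms is the usual bug in this derivation.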
3 Feedback (1pt):
1. What aspects of the written and programming homeworks did you enjoy in this course?
2. What aspects of the written and programming homeworks did you dislike in this course?
3. Suggestions for what you would like to modify in the homeworks.
4. Suggestions for course content/lecture slides and topics.