Description
1. (2 marks) Consider the problem of computing a homography from point matches that include outliers. If 50% of the initial matches are correct, how many iterations of the RANSAC algorithm would we expect to have to run in order to have a 95% chance of computing the correct homography?

2. (1 mark) Consider a 3-layer neural network defined by

   $h_1 = \sigma(W_1 x)$, $\quad h_2 = \sigma(W_2 h_1)$, $\quad f(x) = \langle w_3, h_2 \rangle$,

   where $W$ denotes a matrix, $w$ denotes a vector, and $\sigma(\cdot)$ is applied elementwise. Compute $\frac{\partial f}{\partial (W_1)_{ij}}$.

3. (1 mark) You are training a 3-layer neural network and would like to use backprop to compute the gradient of the cost function. In the backprop algorithm, one of the steps is to update

   $\Delta^{(2)}_{ij} := \Delta^{(2)}_{ij} + \delta^{(3)}_i \cdot (a^{(2)})_j$

   for every $i, j$. Can you rewrite the above equation for all the weights in layer 2 in vector form? (HINT: $\Delta^{(2)} := \Delta^{(2)} + \cdots$?)

4. (2 marks) Consider a neural network with 1 hidden layer, $d$ inputs, $M$ hidden units and $c$ output units. Write down an expression for the total number of weights and biases in the network. Now consider the derivatives of the error function with respect to the weights for one input example only. Using the fact that these derivatives are given by equations of the form $\partial E_n / \partial w_{kj} = \delta_k z_j$, write down an expression for the number of independent derivatives.

5. (4 marks) It is possible to show mathematically that minimizing the sum-of-squared error in a neural network is equivalent to, and can be derived from, the principle of Maximum Likelihood Estimation. (Read the first few chapters of http://neuralnetworksanddeeplearning.com/ if you need help understanding this.) Now consider a model in which the target data has the form

   $y_n = f(x_n; w) + \epsilon_n$,

   where $\epsilon_n$ is drawn from a zero-mean Gaussian distribution with a fixed covariance matrix $\Sigma$. Derive the likelihood function for a dataset drawn from this distribution, and write down the corresponding error function. Such an error function is called generalized least squares.
   The usual sum-of-squares error function corresponds to the special case $\Sigma = \sigma^2 I$, where $I$ is the identity matrix.

6. (2.5 + 2.5 = 5 marks) One of the major concerns when training a large neural network is the presence of symmetries in the weight space. For a given set of model parameters $\{\theta\}$, scaling the weights in layer $l$ by $\gamma \in \mathbb{R}$ (i.e. $\{\gamma \theta_l\}$) and multiplying the next layer's weights by $1/\gamma$ leaves the loss function value unchanged between the two weight configurations. This is called scale symmetry.

   (a) Give one problem caused by scale symmetry while training deep networks. Substantiate your answer with a proper reason or example.

   (b) Provide one other symmetry (other than scale symmetry) that a deep neural network suffers from in its weight space. Substantiate your answer with a proper reason or example.

Programming (35 marks)

• The programming questions are shared in "Assignment 2.zip". Please follow the instructions in the notebook. Turn in the notebook via Google Classroom once you finish your work.

• The marks breakdown is as follows:

  Part 1
  – Question 1: 2 marks
  – Question 2: 9 marks
  – Question 3: 4 marks

  Part 2
  – Implementation of the network: 8 marks
  – Question 4: 2 marks
  – Question 5: 2 marks
  – Question 6: 2 marks
  – Question 7: 2 marks
  – Question 8: 2 marks
  – Question 9: 2 marks

  Part 3: Ungraded
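As an aside on Question 1: the standard RANSAC iteration-count formula is $N = \lceil \log(1-p) / \log(1 - w^s) \rceil$, where $p$ is the desired success probability, $w$ the inlier ratio, and $s$ the minimal sample size ($s = 4$ point correspondences for a homography). A minimal sketch for evaluating it (the function name is ours, not part of the assignment):

```python
import math

def ransac_iterations(p, w, s):
    """Expected RANSAC iterations N = log(1 - p) / log(1 - w^s).

    p: required probability of drawing at least one all-inlier sample,
    w: fraction of matches that are inliers,
    s: number of matches in a minimal sample (4 for a homography).
    """
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** s))
```

Plugging in the values from Question 1 (`p = 0.95`, `w = 0.5`, `s = 4`) lets you check your hand-derived answer.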

