Description
Theory Questions Question 1.1 (5pts) Suppose two cameras fixate on a point P (see Figure 1) in space such that their optical axes intersect at that point. Show that if the image coordinates are normalized so that the coordinate origin (0,0) coincides with the principal point, the F33 element of the fundamental matrix is zero. 1 Figure 1: Figure for Q1.1. C1 and C2 are the optical centers. The principal axes intersect at point P. Question 1.2 (10 pts) Consider the case of two cameras viewing an object such that the second camera differs from the first by a pure translation that is parallel to the x-axis. Show that the epipolar lines in the two cameras are also parallel to the x-axis. Question 1.3 (10 pts) Show that the image of an object and the image of the same object viewed in a mirror are related by a skew-symmetric fundamental matrix, i.e., F = −F T Question 1.4 (10 pts) Homework 1 discussed two cases where a one-to-one mapping (homography) exists between a pair of images: a) when all the points of interest lie on a plane; b) when two cameras are separated by a pure rotation. Consider a point P in 3D, and its projection on camera 1 (p1) and camera 2 (p2). Assume that we know the homography H to map p1 to p2, i.e. p2 = Hp1. • Assume that H is computed from a known plane where P is located, and the two cameras are separated by some translation and rotation (meaning that their centers are different). Given p1, express the corresponding epipolar line in the second camera as a function of p1, H, and e2 (epipole in the second image). • Assume that two cameras are separated by a pure rotation, and the H is computed from this relation. Is it possible to express the epipolar line similar to the previous case? If yes, express it. If not, explain the reason. 2 Implementation Questions The main goal of this homework is to generate novel views of the lovely Smith Hall (the home of Computer Vision and CMU). You will be given two images of Smith Hall as shown in Figure 2a and Figure 2a. Then, with the techniques learned in class, we can recover the 3D scene and generate novel views of Smith Hall as shown in Figure 2c and Figure 2d. We will prepare you for this task by guiding you to implement some basic functions that would be necessary for this task. However, it will be up to you to design your method to complete this task given the basic functions. 2 Figure 2: Smith Hall viewed from different angles. Your final results will not have the purple text box. (a) View 1 of Smith Hall (b) View 2 of Smith Hall (c) Novel view 1 of Smith Hall (d) Novel view 2 of Smith Hall In order to recover the 3D scene, we need to recover the relative positions and viewpoints of the two cameras. To recover this information, we start by computing the fundamental matrix F. Then, with F and the intrinsic parameters K, we can recover the essential matrix by E = KTFK. This is assuming that the intrinsic parameters for both cameras are the same. Once we have the essential matrix, there exists a method1 to recover the relative rotation R and translation t between the two cameras. With the relative rotation and translation, we can compute the camera matrices of the two cameras. Once we have the camera matrices, we can perform triangulation to location the point correspondences in 3D, as shown in Figure 3. In sum, computing the fundamental matrix is the first step to get this started. In this part you will begin by implementing the two different methods seen in class to estimate the fundamental matrix from point correspondences in two images. Fundamental matrix estimation The Eight Point Algorithm The 8-point algorithm (discussed in class, and outlined in Section 10.1 of Forsyth & Ponce) is arguably the simplest method for estimating the fundamental matrix. Question 2.1 (10 pts) Submit a function with the following signature for this portion of the assignment: F = eightpoint norm(pts1, pts2, normalization_constant) 1More details here (http://en.wikipedia.org/wiki/Essential_matrix) and here (lib/camera2.m). 3 Figure 3: 3D reconstruction point cloud of Smith Hall where pts1 and pts2 are 2 × N matrices corresponding to the (x, y) coordinates (one point per column) of the points in the first and second image respectively. normalization_constant is a normalization constant to avoid numerical issues. Numerical issues are caused by the finite precision of floating point numbers in computers. Therefore, to avoid numerical problems, it is best practice to scale the point locations so that they are between [0,1]. For example, for i1.jpg, the width of the image is 1920, so the x-axis value will range from [1,1920]. We should normalize all x-axis values so that it ranges from [0, 1] instead. In practice, we usually set the normalization constant to the larger image dimension, i.e. the width of the image in this case. Mathematically, we are not doing anything different, but implementation wise this is important to get stable results. Finally, remember that the x-coordinate of a point in the image is its column entry, and y-coordinate is the row entry. Also note that eight-point is just a figurative name; your algorithm should use an over-determined system. Run your function on the pts1 and pts2 stored in lib/clean_correspondences.mat. To visualize the correctness of your estimated F8,clean, use the function displayEpipolarF(i1, i2, F) in lib/displayEpipolarF.m. This is a GUI that lets you select a point in one of the images and visualize the corresponding epipolar line in the other image (see Fig.4). You can find the two images of Smith Hall at data/i1.jpg and data/i2.jpg. Do the epipolar lines look reasonable? The correspondences used now are manually selected by humans, so there is very little noise in this set of correspondences. However, in reality, point correspondences are found automatically, so there could potentially be lots of noise in the correspondences. Run your function on the pts1 and pts2 stored in lib/noisy correspondences.mat to compute F8,noisy and visualize the epipolar lines. How do they look? Comparing between using clean and noisy correspondences, which set of point correspondences have more reasonable epipolar lines? In your answer sheet: write your recovered F8,clean and F8,noisy. Also print an output of displayEpipolarF similar to Figure 4 for both fundamental matrices F8,clean and F8,noisy. Finally, in ≤ 2 sentences, please state which fundamental matrix estimate is more accurate, and why. To deal with noisy point correspondences, we mentioned in class that we can use RANSAC (RANdom SAmple Consensus) to help remove noisy point correspondences. As mentioned in class, to enable RANSAC to find a correct F quicker, we want to estimate F with as few points 4 Figure 4: Snapshot of displayEpipolarF as possible. Therefore, before we implement RANSAC, we want to implement the seven point algorithm first. The Seven Point Algorithm Question 2.2 (15 pts) Since the fundamental matrix only has seven degrees of freedom, it is possible to calculate F using only seven point correspondences. This requires solving a polynomial equation. In the section, you will implement the seven-point algorithm described in class, and outlined in Section 15.6 of Forsyth and Ponce. The function should have the signature: F = sevenpoint_norm(pts1, pts2, normalization_constant) where pts1 and pts2 are 2×7 matrices containing the correspondences, and F is a cell array of length either 1 or 3 containing Fundamental matrix/matrices. normalization_constant is the normalization constant to avoid numerical issues. Run your function on the first 7 correspondences of lib/clean_correspondences.mat to recover F7,clean. Use displayEpipolarF to visualize F7,clean and pick the correct one. In your answer sheet: write your recovered F7,clean and print an output of displayEpipolarF. Hints: you can use roots; the epipolar lines may not match exactly due to imperfectly selected correspondences. Now that we have the seven point algorithm, we can proceed to RANSAC. Computing F from Noisy Correspondences with RANSAC Question 2.3 (10 pts) To compute the fundamental matrix from noisy correspondences, we utilize both RANSAC and the seven point algorithm. In each iteration of RANSAC, 7 random point correspondences are chosen and provided to the seven point algorithm to compute a F. Then F is applied to each point correspondence (x1, x2) to check whether the geometric distance between point x2 and epipolar line Fx1 is within a threshold (ideally, the point should be on the line, i.e. x T 2 F x1 = 0). The point correspondences that pass the check are the “inliers”. The goal of RANSAC is to try find a F with the most inliers. Please implement the function below: 5 [F, inliers] = ransacF(pts1, pts2, normalization_constant) where pts1 and pts2 are 2 × N matrices containing the correspondences, and F is a 3 × 3 fundamental matrix found by RANSAC. inliers is a 1 × a vector which stores the indices of the a inliers of F. Run RANSAC on the pts1 and pts2 stored in lib/noisy_correspondences.mat to compute F7,RANSAC and visualize the epipolar lines. How do they look? In your answer sheet: write your recovered F7,RANSAC. Also print an output of displayEpipolarF for F7,RANSAC. Hint: About half of the points in the data are noisy. So you may need to iterate many times (thousands of times) to get a good result. Generating Novel Views of Smith Hall Question 2.4 (30 pts) Now that we have the basic tools for 3D reconstruction, you are ready to generate novel views of Smith Hall. You only need to focus on drawing the two main planes of Smith Hall. Here are a list of functions which might be useful for you. 1. Intrinsic parameters K are stored in data/K.mat. 2. M2 = camera2(F, K1, K2, pts1, pts2) in lib/camera2.m: assuming that the camera matrix M1 for camera 1 is K1 [I 0], it computes the camera matrix M2 for camera 2. It requires the intrinsic parameters K2 and the inlier point correspondences pts1, pts2 which are 2 × N matrices. 3. P = triangulate(M1, pts1, M2, pts2) in lib/triangulate.m: Given the two camera matrices M1, M2 and the inlier point correspondences, it computes the 3D location P of each point correspondence. pts1, pts2 are the point correspondences stored in 2 × N matrices. P is a 3 × N matrix. 4. frame = drawNovelView(smith_south_plane, smith_west_plane, M): Given the 3D plane parameters for the southern plane and the western plane of Smith Hall (as shown in Figure 5), it draws the scene viewed from camera matrix M. Figure 2c and Figure 2d are drawn with this function. smith_south_plane, smith_west_plane are 4 × 1 vectors which corresponds to [a b c d] for the plane equation ax + by + cz + d = 0. Hints: 1. If a group of points belong to the same plane in 3D, there exists a homography to map the points between the two images. Can you use this fact to find planes in the scene? 2. There are two very obvious planes in Smith Hall. Take advantage of it. 3. Camera matrix M ≡ K[Rt], and you can get the K, R, t of the original views using the provided functions above. Now your goal is to manipulate them and get novel Ms to generate novel views. For this task, please use the point correspondences data/noisy_correspondences.mat. You should not need to do any more manual annotation. Your code should be fully automatic and not require a human in the loop. 6 Figure 5: The southern and western plane of Smith Hall. Save your code in genNovelView.m. The script should run out-of-the-box. Dont forget to include all the .m files required by this script in your submission. In your answer sheet, please explain clearly the steps you did to create the 2 novel views. Also, please show 2 novel views of Smith Hall. At least one of the views should be viewing Smith Hall from a higher angle like the one shown in Figure 2d.

