. Problem 1: Version Spaces [30 points] In this problem we will investigate the geometry of version spaces for different hypotheses. For all parts of this question, training instances are vectors in Z 2 , i.e. points on a 2D lattice, with labels c ∈ {−1, 1}. (a) Rectangular hypothesis Consider a hypothesis of the form (pT L, pBR), where pi = {xi , yi |xi , yi are integers and 0 ≤ xi , yi ≤ 8} define the diagonally-opposite corners of a rectangle (Top Left, Bottom Right, respectively). An instance (x, y) is classified positive if it falls inside, or on the boundary of this rectangle, and negative otherwise. Note, that the definition allows for degenerate cases, where two or four of the corners overlap. In addition, allow for pT L and pBR to take on a special value ∅. If either pT L = ∅ or pBR = ∅, all instances are classified as negative. This hypothesis definition applies to problems 1a to 1d. What is the size of this hypothesis space? (b) Rectangular hypothesis A hypothesis h1 is considered more general than hypothesis 1-1 h2 (and h2 more specific than h1) if h2 implies h1 The most general (most specific) hypothesis h ∗ is a hypothesis, such that no other hypothesis is more general (more specific) than h ∗ . Draw the most general hypothesis that satisfies the training data D1 in Figure 1. Draw the most specific hypothesis. (c) Rectangular hypothesis What is the size of the version space for the training data D1? (d) Rectangular hypothesis In a form of Active Learning, a learner can query the teacher for more data. The goal of the learner would be to pick query instances, such that the size of the version space is reduced the most (which means that the greatest number of inconsitent hypotheses is pruned after observing the query instance). Consider the following 3 candidates for a query instance: (3,7), (4,3), (4,6). Compute, for each, the expected size of the version space after observing the training set D1 and the label for that query instance (considering the two possible labels have an equal chance of occuring). Between the three query candidates, is there a best choice? x y Figure 1: Training set D1 (e) Decision Tree hypothesis Now consider a hypothesis space formed by all 1-level decision trees. Since all of the attributes for our data are integer-valued, consider that splitting thresholds in our decision tree are also constrained to integers, and each node in the tree splits on a single attribute, x or y, with a splitting criterion attribute ≥ threshold. Give the size of the version space for a 1-level decision tree and the training set D2. Note that in general multiple trees can represent the same function, and we are interested in counting functions. 1-2 (f) Decision Tree hypothesis Now consider a hypothesis space formed by all decision trees with 3 leaf nodes, with the same splitting criteria as above. What is the size of the version-space now? Draw all hypotheses that belong to the version space in the (x, y) space of Figure 2. x y Figure 2: Training dataset D2 Problem 2: Regression with kNN [30 points] In this problem, we will investigate the application of kNN to regression in a setting of partial image completion. You have been hired by the NSA to help them reconstruct corrupted surveillance photos of possible criminals. Due to a glitch in their cameras, some images are partially blank. Your goal will be to provide the “best” completion of the image using kNN. Included with the assignment is a subset of the Olivetti face dataset. The dataset is a collection of 64×64 grayscale images of peoples faces. This dataset was processed into the SVMLight format for this assignment. This file format will be used for all programming assignments in this course. In this file, each line corresponds to a training instance, providing the label, and the feature-value pairs for that instance: