Description
1. Consult the AdaBoost algorithm given in Bishop, Chapter 14. Suppose there are two weak learners
h1 and h2, and a set of 17 points.
a) Say h1 makes one mistake and h2 makes four mistakes on the dataset. Which learner
will AdaBoost choose in the first iteration (namely m = 1)?
b) What is α1?
c) Calculate the data weighting coefficient w2 for the following two cases: (1) the points on
which the chosen learner made a mistake, and (2) the points on which the chosen learner did
not make a mistake. [10 Points]
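The quantities in parts (b) and (c) follow mechanically from the AdaBoost update rules in Bishop, Chapter 14: ε_m is the weighted error of the chosen learner, α_m = ln((1 − ε_m)/ε_m), and each misclassified point has its weight multiplied by exp(α_m). A minimal sketch of that arithmetic (assuming Bishop's α convention rather than the ½·ln variant, and renormalizing the weights to sum to 1):

```python
import math

n = 17                      # number of data points
w = [1.0 / n] * n           # initial weights w_1 are uniform: 1/17 each

# h1 makes only one mistake, so its weighted error is the smaller one:
eps1 = 1 * (1.0 / n)        # epsilon_1 = 1/17 for the chosen learner h1

# Bishop's weighting coefficient: alpha_m = ln((1 - eps_m) / eps_m)
alpha1 = math.log((1 - eps1) / eps1)   # = ln(16)

# Weight update w_2: multiply misclassified points by exp(alpha_1),
# then renormalize. Here exactly one point (index 0, say) was missed.
mistakes = [True] + [False] * (n - 1)
w2 = [wi * math.exp(alpha1) if m else wi for wi, m in zip(w, mistakes)]
Z = sum(w2)
w2 = [wi / Z for wi in w2]

print(alpha1)   # ln 16 ≈ 2.7726
print(w2[0])    # weight on the misclassified point: 1/2
print(w2[1])    # weight on each correctly classified point: 1/32
```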
2. The diagram shows training data for a binary concept where a heart denotes positive examples.
Also shown are three decision stumps (A, B and C) each of which consists of a linear decision
boundary. Suppose that AdaBoost chooses A as the first stump in an ensemble and it has to decide
between B and C as the next stump. Which will it choose? Explain. What will be the ε and α values
for the first iteration? [5 Points]
3. Consider clustering 1D data with a mixture of 2 Gaussians using the EM algorithm. You are given the
1D data points x = [1 10 20]. Suppose the output of the E step is the following matrix
R = [ 1    0
      0.4  0.6
      0    1   ]
where entry r_{i,c} is the probability of observation x_i belonging to cluster c (the responsibility of cluster c
for data point i). You have to compute the M step. You may state the equations for the maximum-likelihood
estimates of these quantities (which you should know) without proof; you just have to apply the
equations to this data set. You may leave your answer in fractional form.
a. Write down the likelihood function you are trying to optimize.
b. After performing the M step for the mixing weights π1, π2, what are the new values?
c. After performing the M step for the means μ1, μ2, what are the new values?
[10 Points]
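The M-step estimates in (b) and (c) are the standard responsibility-weighted averages, π_c = (Σ_i r_{i,c})/N and μ_c = (Σ_i r_{i,c} x_i)/(Σ_i r_{i,c}). A sketch applying them to the given R and x (it prints decimals; the question asks for fractions, which these correspond to):

```python
x = [1.0, 10.0, 20.0]
R = [[1.0, 0.0],
     [0.4, 0.6],
     [0.0, 1.0]]   # R[i][c] = responsibility of cluster c for point i

N = len(x)
# Effective counts N_c = sum_i r_{i,c}
Nc = [sum(R[i][c] for i in range(N)) for c in range(2)]
# Mixing weights pi_c = N_c / N
pi = [Nc[c] / N for c in range(2)]
# Means mu_c = (sum_i r_{i,c} * x_i) / N_c
mu = [sum(R[i][c] * x[i] for i in range(N)) / Nc[c] for c in range(2)]

print(pi)   # [1.4/3, 1.6/3] = [7/15, 8/15]
print(mu)   # [5/1.4, 26/1.6] = [25/7, 16.25]
```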
4. In the following figure, some data points are shown which lie on an integer grid. Suppose we apply
the K-means algorithm to this data, using K = 2 and with the centers initialized at the two circled
data points. Draw the final clusters obtained after K-means converges. [5 Points]
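Since the figure is not reproduced here, only the mechanics can be sketched: Lloyd's algorithm alternates assigning each point to its nearest center and recomputing each center as its cluster mean, until assignments stop changing. The grid points and initial centers below are placeholders, not the ones in the figure:

```python
# Hypothetical grid points; substitute the points from the figure.
points = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 5), (5, 4)]
centers = [(0, 0), (4, 4)]   # assumed: the two circled data points

def dist2(p, q):
    # squared Euclidean distance (enough for nearest-center comparisons)
    return (p[0] - q[0])**2 + (p[1] - q[1])**2

while True:
    # Assignment step: each point joins its nearest center.
    clusters = [[], []]
    for p in points:
        k = min(range(2), key=lambda c: dist2(p, centers[c]))
        clusters[k].append(p)
    # Update step: recompute each center as the mean of its cluster.
    new_centers = [
        (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
        for cl in clusters
    ]
    if new_centers == centers:   # converged
        break
    centers = new_centers

print(clusters)
```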
5. Consider training a two-input perceptron. Give an upper bound on the number of training
examples sufficient to assure, with 90% confidence, that the learned perceptron will have true
error of at most 5%. [5 Points]
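One standard route (a sketch; the intended bound depends on which result the course uses) is the VC-dimension sample-complexity bound of Blumer et al., m ≥ (1/ε)(4 log₂(2/δ) + 8·VC(H)·log₂(13/ε)), taking VC(H) = 3 for a linear threshold unit over two real-valued inputs (d + 1 with d = 2):

```python
import math

eps, delta = 0.05, 0.10   # true error at most 5%, confidence 90%
vc = 3                    # VC dimension of a 2-input perceptron: d + 1 = 3

# Blumer et al. sample-complexity bound (as stated in Mitchell, Ch. 7)
m = (1 / eps) * (4 * math.log2(2 / delta) + 8 * vc * math.log2(13 / eps))
print(math.ceil(m))
```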
6. The VC dimension is always less than the size of the hypothesis space. True/False?
[5 Points]
7. Computational Learning Theory
[10 Points]
(a) Consider the class C of concepts of the form (a ≤ x1 ≤ b) ∧ (c ≤ x2 ≤ d). Note that each
concept in this class corresponds to a rectangle in two dimensions. Let a, b be integers in the range
[0, 199] and c, d be integers in the range [0, 99]. Give an upper bound on the number of training
examples sufficient to assure that for any target concept c ∈ C, any consistent learner using H = C
will, with probability 0.99, output a hypothesis with error at most 0.05.
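Since H here is finite, the consistent-learner bound m ≥ (1/ε)(ln|H| + ln(1/δ)) applies. A sketch of the arithmetic, counting interval endpoints with a ≤ b (other counting conventions, e.g. including an empty concept, shift |H| only slightly):

```python
import math

def intervals(k):
    # number of pairs (lo, hi) with 0 <= lo <= hi <= k - 1
    return k * (k + 1) // 2

H = intervals(200) * intervals(100)   # 20100 * 5050 rectangles
eps, delta = 0.05, 0.01
m = (1 / eps) * (math.log(H) + math.log(1 / delta))
print(math.ceil(m))
```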
(b) Consider the class C of concepts of the form (a ≤ x1 ≤ b) ∧ (c ≤ x2 ≤ d) ∧ (e ≤ x3 ≤ f).
Note that each concept in this class corresponds to a hyper-rectangle in 3D. Now suppose that a,
b, c, d, e, f take on real values instead of integers. Give an upper bound on the number of training
examples sufficient to assure that for any target concept c ∈ C, a learner will, with probability 0.95,
output a hypothesis with error at most 0.01.
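With real-valued endpoints H is infinite, so the VC-based bound is the natural tool; the VC dimension of axis-aligned hyper-rectangles in d dimensions is 2d, giving VC(H) = 6 here. A sketch under those assumptions:

```python
import math

eps, delta = 0.01, 0.05
vc = 6   # axis-aligned boxes in 3D: VC dimension 2 * 3 = 6

# Blumer et al. sample-complexity bound for infinite H
m = (1 / eps) * (4 * math.log2(2 / delta) + 8 * vc * math.log2(13 / eps))
print(math.ceil(m))
```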

