IE 613: Online Machine Learning Assignment 1


Question 1 (Full information setting, 20 Points) Consider the problem of prediction with expert advice with d = 10 experts. Assume that the losses assigned to the experts are generated according to independent Bernoulli distributions. The adversary/environment generates the loss of experts 1 to 8 according to Ber(0.5) in each round. For the 9th expert, the loss is generated according to Ber(0.5 − ∆) in each round. The losses of the 10th expert are generated according to a different Bernoulli random variable depending on the round: for the first T/2 rounds they are generated according to Ber(0.5 + ∆), and for the remaining T/2 rounds according to Ber(0.5 − 2∆). Set ∆ = 0.1 and T = 10^5. Implement the Weighted Majority algorithm with η = c·√(2 log(d)/T) and generate (pseudo) regret values for different learning rates (η). Vary c in the interval [0.1, 2.1] in steps of size 0.2 to get different learning rates. The averages should be taken over at least 20 sample paths (more is better). Display 95% confidence intervals in each plot.
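As a starting point, the loss environment and the exponentially weighted forecaster used by Weighted Majority can be sketched as follows. This is a minimal sketch assuming numpy; the function names (`make_losses`, `weighted_majority`) are illustrative, not prescribed by the assignment:

```python
import numpy as np

def make_losses(T, d=10, delta=0.1, rng=None):
    """Bernoulli losses from Question 1: experts 1-8 ~ Ber(0.5),
    expert 9 ~ Ber(0.5 - delta), expert 10 ~ Ber(0.5 + delta) for the
    first T/2 rounds and Ber(0.5 - 2*delta) afterwards."""
    rng = np.random.default_rng() if rng is None else rng
    means = np.full((T, d), 0.5)
    means[:, 8] = 0.5 - delta
    means[: T // 2, 9] = 0.5 + delta
    means[T // 2 :, 9] = 0.5 - 2 * delta
    losses = (rng.random((T, d)) < means).astype(float)
    return losses, means

def weighted_majority(losses, eta):
    """Exponentially weighted forecaster; returns the learner's
    expected loss (under its own weights) in each round."""
    T, d = losses.shape
    log_w = np.zeros(d)                 # keep weights in log space for stability
    learner = np.empty(T)
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        learner[t] = p @ losses[t]      # expected loss under current distribution
        log_w -= eta * losses[t]        # multiplicative-weights update
    return learner
```

Returning the per-round means alongside the sampled losses is convenient for the pseudo-regret calculation described at the end of this handout.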
Question 2 (Bandit setting, 30 Points) Consider the problem of multi-armed bandits with K = 10 arms. Assume that the losses are generated as in Question 1. Generate (pseudo) regret for different learning rates (η) for each of the following algorithms. The averages should be taken over at least 50 sample paths (more is better). Display 95% confidence intervals in each plot. Vary c in the interval [0.1, 2.1] in steps of size 0.2 to get different learning rates.
• EXP3. Set η = c·√(2 log(K)/(KT)).
• EXP3.P. Set η = c·√(2 log(K)/(KT)), β = η, γ = Kη.
• EXP3-IX. Set η = c·√(2 log(K)/(KT)), γ = η/2.
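A minimal EXP3 sketch under these conventions, assuming numpy. The `gamma_ix` argument is one illustrative way to fold in the EXP3-IX loss estimator (l̂ = l/(p + γ)); EXP3.P's β and γ modifications are not shown:

```python
import numpy as np

def exp3(losses, eta, gamma_ix=0.0, rng=None):
    """EXP3 on a (T x K) loss matrix. With gamma_ix = 0 this is plain EXP3
    (unbiased importance-weighted estimates); gamma_ix > 0 gives the
    biased but lower-variance EXP3-IX estimator."""
    rng = np.random.default_rng() if rng is None else rng
    T, K = losses.shape
    log_w = np.zeros(K)
    pulls = np.empty(T, dtype=int)
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(K, p=p)        # sample an arm from the weights
        pulls[t] = arm
        # importance-weighted estimate of the chosen arm's loss
        l_hat = losses[t, arm] / (p[arm] + gamma_ix)
        log_w[arm] -= eta * l_hat
    return pulls
```

Recording the sequence of pulls {I_t} (rather than the incurred losses) is what the pseudo-regret formula at the end of this handout needs.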
Question 3 (5 Points) In Question 2, which one of EXP3, EXP3.P, and EXP3-IX performs best, and why?
Question 4 (10 Points) Show that for any deterministic policy π there exists an environment ν such that R_T(π, ν) ≥ T(1 − 1/K) for T rounds and K arms.
Question 5 (15 Points) Suppose we had defined the regret by

R^track_T(π, ν) = E[ Σ_{t=1}^T max_{i∈[K]} x_{ti} − Σ_{t=1}^T x_{t,I_t} ],

where I_t is the arm chosen by the policy π and x_{t,I_t} is the reward observed in round t. At first sight this definition seems like the right thing because it measures what you actually care about. Unfortunately,
however, it gives the adversary too much power. Show that for any policy π (randomized or not) there exists a ν ∈ [0, 1]^{K×T} such that

R^track_T(π, ν) ≥ T(1 − 1/K).
Question 6 (15 Points) Let p ∈ P_k be a probability vector and suppose X̂ : [k] × ℝ → ℝ is a function such that for all x ∈ ℝ^k, if A ∼ p,

E[X̂(A, x_A)] = Σ_{i=1}^k p_i X̂(i, x_i) = x_1.

Show that there exists an a ∈ ℝ^k such that Σ_{j=1}^k a_j p_j = 0 and X̂(i, x) = a_i + I{i = 1} x_1/p_1.
Question 7 (5 Points) Suppose we have a two-armed stochastic Bernoulli bandit with µ1 = 0.5 and µ2 = 0.55. Test your implementation of EXP3 from Question 2. What happens when T = 10^5 and the sequence of rewards is x_{t1} = I{t ≤ T/4} and x_{t2} = I{t > T/4}?
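The deterministic reward sequence in Question 7 can be generated as follows (a sketch assuming numpy). Since the EXP3 pseudocode above is stated for losses, one common convention, assumed here, is to feed the algorithm 1 − reward:

```python
import numpy as np

T = 10**5
t = np.arange(1, T + 1)
# Rewards from Question 7: arm 1 pays 1 only for the first T/4 rounds,
# arm 2 pays 1 only afterwards.
x = np.stack([(t <= T // 4).astype(float),
              (t > T // 4).astype(float)], axis=1)
losses = 1.0 - x   # convert rewards to losses for a loss-based EXP3
```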
Submission Format and Evaluation: You should submit a report along with your code. Please zip all your files and upload them via Moodle. The zipped folder should be named YourRegistrationNo.zip, e.g. '154290002.zip'. The report should contain two figures: the first figure should have one plot corresponding to the algorithm in Q.1, and the second should have 3 plots, one corresponding to each algorithm in Q.2. For each figure, write a brief summary of your observations. We may also call you to a face-to-face session to explain your code.
Note: Please calculate the (pseudo) regret for each algorithm in Q.2 for a given set of parameters as follows. Let µ^i_t denote the mean of arm i in round t. Suppose an adversary generates a sequence of loss vectors {l_t}_{t=1}^T and an algorithm generates a sequence of pulls {I_t}_{t=1}^T; the (pseudo) regret for this sample path is

Σ_{t=1}^T E[l_t(I_t)] − min_i Σ_{t=1}^T E[l_t(i)]   (1.1)
= Σ_{t=1}^T µ^{I_t}_t − min_i Σ_{t=1}^T µ^i_t.   (1.2)

Note that in this calculation we only consider the mean values of the losses, not the actual losses suffered. It is okay if this value turns out to be negative. There is no expectation over the random choices of I_t here. Now generate 20 such sample paths and take their average.
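The per-path calculation in (1.2) and the averaging over sample paths can be sketched as follows (assuming numpy; the normal-approximation 95% interval shown is one reasonable choice for the confidence intervals, not mandated by the handout):

```python
import numpy as np

def pseudo_regret(means, pulls):
    """Eq. (1.2) for one sample path: sum_t mu_t^{I_t} - min_i sum_t mu_t^i,
    given the (T x K) matrix of arm means and the length-T pull sequence."""
    T = len(pulls)
    incurred = means[np.arange(T), pulls].sum()   # sum of means of pulled arms
    best = means.sum(axis=0).min()                # best fixed arm in hindsight (in mean)
    return incurred - best

def mean_and_ci95(values):
    """Average over sample paths with a normal-approximation 95% half-width."""
    v = np.asarray(values, dtype=float)
    half = 1.96 * v.std(ddof=1) / np.sqrt(len(v))
    return v.mean(), half
```

For each value of c, one would collect `pseudo_regret` over the 20 (or 50) sample paths and plot the mean with error bars of the returned half-width.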