Description
Question 1 (5 points) Suppose that X is σ-subgaussian, and that X1 and X2 are independent and σ1-subgaussian and σ2-subgaussian respectively. Show that:
1. E[X] = 0 and Var[X] ≤ σ².
2. cX is |c|σ-subgaussian for all c ∈ R.
3. X1 + X2 is √(σ1² + σ2²)-subgaussian.
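For intuition (not a substitute for the proofs), the Gaussian case makes parts 2 and 3 easy to check numerically: for X ~ N(0, σ²), which is σ-subgaussian, the moment generating function is exactly exp(λ²σ²/2), so the scaling and additivity of the subgaussian parameter can be verified directly on the MGF exponents. The constants below are arbitrary illustrative choices.

```python
import math

def gauss_mgf_log(lam, sigma):
    """log E[exp(lam * X)] for X ~ N(0, sigma^2), which equals lam^2 sigma^2 / 2."""
    return lam**2 * sigma**2 / 2

c, sigma, lam = -3.0, 2.0, 0.7

# Part 2: the MGF exponent of cX matches that of a |c|*sigma-subgaussian variable.
assert math.isclose(gauss_mgf_log(lam * c, sigma), gauss_mgf_log(lam, abs(c) * sigma))

# Part 3: for independent X1, X2 the MGF factorises, so the exponents add,
# matching a sqrt(s1^2 + s2^2)-subgaussian variable.
s1, s2 = 1.0, 2.0
lhs = gauss_mgf_log(lam, s1) + gauss_mgf_log(lam, s2)
rhs = gauss_mgf_log(lam, math.sqrt(s1**2 + s2**2))
assert math.isclose(lhs, rhs)
print("MGF checks pass")
```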
Question 2 (10 points) Suppose that X is zero-mean and X ∈ [a, b] almost surely for constants a < b.
1. Show that X is ((b − a)/2)-subgaussian.
2. Using the Cramér-Chernoff method, show that if X1, X2, . . . , Xn are independent and Xt ∈ [at, bt] almost surely with at < bt for all t, then for every ε > 0,

P( Σ_{t=1}^n (Xt − E[Xt]) ≥ ε ) ≤ exp( −2ε² / Σ_{t=1}^n (bt − at)² ).
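Before attempting the proof of part 2, the inequality can be sanity-checked by simulation. This is a rough sketch, not part of the required solution; the Uniform[0, 1] distributions, the number of variables, and the trial count are arbitrary choices.

```python
import math
import random

def hoeffding_bound(eps, ranges):
    """Right-hand side of the inequality: exp(-2*eps^2 / sum_t (b_t - a_t)^2)."""
    return math.exp(-2 * eps**2 / sum((b - a) ** 2 for a, b in ranges))

def empirical_tail(eps, ranges, trials=20000, seed=0):
    """Monte Carlo estimate of P(sum_t (X_t - E[X_t]) >= eps)
    for independent X_t uniform on [a_t, b_t]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(a, b) - (a + b) / 2 for a, b in ranges)
        if s >= eps:
            hits += 1
    return hits / trials

ranges = [(0.0, 1.0)] * 10   # ten i.i.d. Uniform[0,1] variables
eps = 2.0
print(empirical_tail(eps, ranges), "<=", hoeffding_bound(eps, ranges))
```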
Question 3 (5 points) [Expectation of maximum] Let X1, . . . , Xn be a sequence of σ-subgaussian random
variables (possibly dependent) and Z = maxt∈[n] Xt. Prove that
1. E[Z] ≤ √(2σ² log n).
2. P(Z ≥ √(2σ² log(n/δ))) ≤ δ for any δ ∈ (0, 1).
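The expectation bound in part 1 can be checked numerically in the Gaussian special case (independent standard normals are 1-subgaussian, and the bound holds for dependent variables as well). A quick sketch with arbitrary n and trial count:

```python
import math
import random

def empirical_max_mean(n, sigma=1.0, trials=5000, seed=0):
    """Monte Carlo estimate of E[max of n i.i.d. N(0, sigma^2) variables]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, sigma) for _ in range(n))
    return total / trials

n, sigma = 64, 1.0
bound = math.sqrt(2 * sigma**2 * math.log(n))   # the claimed sqrt(2 sigma^2 log n)
print(empirical_max_mean(n, sigma), "<=", bound)
```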
Question 4 (20 points) [Bernstein’s inequality] Let X1, . . . , Xn be a sequence of independent random variables with Xt − E[Xt] ≤ b almost surely, and let S = Σ_{t=1}^n (Xt − E[Xt]) and v = Σ_{t=1}^n V[Xt].
1. Show that g(x) = 1/2 + x/3! + x²/4! + · · · = (exp(x) − 1 − x)/x² is increasing.
2. Let X be a random variable with E[X] = 0 and X ≤ b almost surely. Show that E[exp(X)] ≤ 1 + g(b)V[X].
3. Prove that (1 + α) log(1 + α) − α ≥ 3α²/(6 + 2α) for all α ≥ 0. Prove that this is the best possible approximation in the sense that the 2 in the denominator cannot be decreased.
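Part 3 can be sanity-checked on a grid before attempting the proof. This is a rough numerical sketch; the grid resolution and the perturbed constant 1.9 are arbitrary choices used only to probe the tightness claim.

```python
import math

def h(a):
    """h(alpha) = (1 + alpha) * log(1 + alpha) - alpha, the Bernstein exponent."""
    return (1 + a) * math.log(1 + a) - a

def lower(a):
    """The claimed lower bound 3*alpha^2 / (6 + 2*alpha)."""
    return 3 * a**2 / (6 + 2 * a)

grid = [i * 0.01 for i in range(1, 5001)]   # alpha in (0, 50]

# The inequality h(alpha) >= 3 alpha^2 / (6 + 2 alpha) holds across the grid.
assert all(h(a) >= lower(a) for a in grid)

# Tightness probe: with the smaller constant 1.9 in place of 2, the would-be
# bound is larger and the inequality fails for small alpha.
assert any(h(a) < 3 * a**2 / (6 + 1.9 * a) for a in grid)
print("both checks pass")
```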
Homework 2: March 19
4. Let ε > 0 and α = εb/v, and prove that

P(S ≥ ε) ≤ exp( −(v/b²)((1 + α) log(1 + α) − α) ) ≤ exp( −ε² / (2v(1 + εb/(3v))) ).
5. Use the previous result to show that

P( S ≥ √(2v log(1/δ)) + (2b/3) log(1/δ) ) ≤ δ.
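The relaxed tail bound in part 4 can also be checked by simulation. The sketch below uses centered Bernoulli(1/2) variables, for which b = 1/2 and v = n/4; the choices of n, ε, and trial count are illustrative only.

```python
import math
import random

def bernstein_bound(eps, v, b):
    """exp(-eps^2 / (2*v*(1 + b*eps/(3*v)))), the relaxed Bernstein tail bound."""
    return math.exp(-eps**2 / (2 * v * (1 + b * eps / (3 * v))))

def empirical_tail(eps, n, trials=20000, seed=0):
    """Monte Carlo estimate of P(S >= eps) for S a sum of n centered
    Bernoulli(1/2) variables (each taking values -1/2 or +1/2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum((1.0 if rng.random() < 0.5 else 0.0) - 0.5 for _ in range(n))
        if s >= eps:
            hits += 1
    return hits / trials

n = 20                      # Bernoulli(1/2): b = 0.5, V[X_t] = 0.25, so v = n/4
v, b, eps = n / 4, 0.5, 3.0
print(empirical_tail(eps, n), "<=", bernstein_bound(eps, v, b))
```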
Question 5 (10 points) Show that the bound

RT ≤ min{ T∆, ∆ + (4/∆)(1 + max{0, log(T∆²/4)}) }

implies that the regret of an optimally tuned Explore-then-Commit (ETC) algorithm for subgaussian 2-armed bandits with means µ1, µ2 ∈ R and ∆ = |µ1 − µ2| satisfies RT ≤ ∆ + C√T, where C > 0 is a universal constant.
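The ETC algorithm in question can be sketched as follows. This is a minimal illustrative implementation assuming the standard form of ETC (pull each arm m times, then commit to the empirically best arm for the remaining rounds), with unit-variance Gaussian rewards; the means, horizon, and m below are arbitrary choices, not the optimal tuning.

```python
import random

def etc_two_arms(mu1, mu2, T, m, seed=0):
    """Explore-then-Commit for two arms: pull each arm m times, then commit
    to the empirically better arm. Rewards are N(mu_i, 1). Returns the
    pseudo-regret T*max(mu) - (sum of means of the arms actually pulled)."""
    rng = random.Random(seed)
    mus = [mu1, mu2]
    sums = [0.0, 0.0]
    pulled_mean_total = 0.0
    for i in (0, 1):                      # exploration phase
        for _ in range(m):
            sums[i] += rng.gauss(mus[i], 1.0)
            pulled_mean_total += mus[i]
    best = 0 if sums[0] / m >= sums[1] / m else 1
    pulled_mean_total += (T - 2 * m) * mus[best]   # commit phase
    return T * max(mus) - pulled_mean_total

print(etc_two_arms(0.5, 0.0, T=1000, m=50))
```

Note that the exploration phase alone contributes m∆ to the pseudo-regret, which is the tension the optimal tuning of m resolves.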
Question 6 (5 points) Fix δ ∈ (0, 1). Modify the ETC algorithm to depend on δ and prove a bound on the pseudo-regret RT = Tµ⋆ − Σ_{t=1}^T µ_{At} of the ETC algorithm that holds with probability 1 − δ, where At is the arm chosen in round t.
Hint: choose m appropriately in the regret upper bound of the ETC algorithm proved in class.
Question 7 (5 points) Fix δ ∈ (0, 1). Prove a bound on the random regret RT = Tµ⋆ − Σ_{t=1}^T Xt of the ETC algorithm that holds with probability 1 − δ. Compare this to the bound derived for the pseudo-regret in Question 6. What can you conclude?
Question 8 (10 points) Assume the rewards are 1-subgaussian and there are k ≥ 2 arms. The ε-greedy algorithm depends on a sequence of parameters ε1, ε2, . . .. First it chooses each arm once, and subsequently chooses At = argmax_i µ̂i(t − 1) with probability 1 − εt and otherwise chooses an arm uniformly at random.
1. Prove that if εt = ε > 0, then

lim_{T→∞} RT/T = (ε/k) Σ_{i=1}^k ∆i.
2. Let ∆min = min{∆i : ∆i > 0} where ∆i = µ⋆ − µi, and let εt = min{ 1, Ck/(t∆min²) }, where C > 0 is a sufficiently large universal constant. Prove that there exists a universal constant C′ > 0 such that

RT ≤ C′ Σ_{i=1}^k ( ∆i + (∆i/∆min²) log max{ e, T∆min²/k } ).
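The algorithm described above can be sketched as follows. This is a minimal illustrative implementation assuming unit-variance Gaussian rewards and the decaying schedule from part 2; the means, the value C = 10, and the horizon are arbitrary choices, not tuned constants.

```python
import random

def eps_greedy(mus, T, C=10.0, seed=0):
    """epsilon-greedy with eps_t = min(1, C*k/(t*Dmin^2)) after an initial
    round-robin over the k arms. Rewards are N(mu_i, 1). Returns the
    pseudo-regret sum_t (max(mus) - mu_{A_t})."""
    rng = random.Random(seed)
    k = len(mus)
    best = max(mus)
    dmin = min(best - m for m in mus if best - m > 0)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(1, T + 1):
        if t <= k:                        # choose each arm once to start
            i = t - 1
        else:
            eps = min(1.0, C * k / (t * dmin**2))
            if rng.random() < eps:        # explore uniformly at random
                i = rng.randrange(k)
            else:                         # exploit the empirically best arm
                i = max(range(k), key=lambda j: sums[j] / counts[j])
        sums[i] += rng.gauss(mus[i], 1.0)
        counts[i] += 1
        regret += best - mus[i]
    return regret

print(eps_greedy([0.0, 0.5, 1.0], T=5000))
```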
Question 9 (10 points) Fix a 1-subgaussian k-armed bandit environment and a horizon T. Consider the version of UCB that works in phases of exponentially increasing lengths 1, 2, 4, . . .. In each phase, the algorithm uses the action that would have been chosen by UCB at the beginning of the phase.
1. State and prove a bound on the regret for this version of UCB.
2. How would the result change if the ℓ-th phase had a length of ⌈α^ℓ⌉ with α > 1?
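The phased variant can be sketched as below. This is an illustrative implementation, not the exact algorithm from class: it assumes unit-variance Gaussian rewards and a standard 1-subgaussian UCB index µ̂i + √(2 log t / Ti), with unplayed arms given priority; means and horizon are arbitrary.

```python
import math
import random

def phased_ucb(mus, T, seed=0):
    """UCB with doubling phases: the arm chosen by the UCB rule at the start
    of a phase is played for the whole phase (lengths 1, 2, 4, ...).
    Rewards are N(mu_i, 1). Returns the pseudo-regret."""
    rng = random.Random(seed)
    k = len(mus)
    counts = [0] * k
    sums = [0.0] * k
    best = max(mus)
    t = 0
    regret = 0.0
    length = 1
    while t < T:
        def index(i):
            if counts[i] == 0:            # unplayed arms get priority
                return float("inf")
            return sums[i] / counts[i] + math.sqrt(2 * math.log(max(t, 2)) / counts[i])
        i = max(range(k), key=index)      # UCB choice at the start of the phase
        for _ in range(min(length, T - t)):   # play it for the whole phase
            sums[i] += rng.gauss(mus[i], 1.0)
            counts[i] += 1
            regret += best - mus[i]
            t += 1
        length *= 2
    return regret

print(phased_ucb([0.0, 1.0], T=1000))
```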
Submission Format and Evaluation: You should submit a report along with your code. Please zip all your files and upload them via Moodle. The zipped folder should be named YourRegistrationNo.zip, e.g. ‘154290002.zip’. The report should contain one figure with four plots corresponding to each algorithm in Q.1. Write a brief summary of your observations. We may also call you to a face-to-face session to explain your code.

