Convex Optimization
-
Problem Setup
-
$$
\begin{array}{c|cccc}
\text{Time} & \text{choice } 1 & \text{choice } 2 & \cdots & \text{choice } n \\
\hline
1 & x_1^{(1)},\ l_1^{(1)} & x_2^{(1)},\ l_2^{(1)} & \cdots & x_n^{(1)},\ l_n^{(1)} \\
2 & x_1^{(2)},\ l_1^{(2)} & x_2^{(2)},\ l_2^{(2)} & \cdots & x_n^{(2)},\ l_n^{(2)} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
t & x_1^{(t)},\ l_1^{(t)} & x_2^{(t)},\ l_2^{(t)} & \cdots & x_n^{(t)},\ l_n^{(t)} \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
T & x_1^{(T)},\ l_1^{(T)} & x_2^{(T)},\ l_2^{(T)} & \cdots & x_n^{(T)},\ l_n^{(T)}
\end{array}
$$
Conditions:
$$\forall t:\quad x_i^{(t)} \ge 0, \qquad \sum_{i=1}^n x_i^{(t)} = 1$$
-
At step $t$ the algorithm has loss $\sum_{i=1}^n x_i^{(t)} \cdot l_i^{(t)}$
-
$l_i^{(t)}$ is unknown ahead of time
-
Regret: actual performance minus that of the best fixed strategy with hindsight
$$R_T = \left[\sum_{t=1}^T \sum_{i=1}^n x_i^{(t)} l_i^{(t)}\right] - \left[\min_{i=1,\dots,n} \sum_{t=1}^T l_i^{(t)}\right]$$
Comparing against someone who knew everything but is only able to make a single fixed choice (a small numeric sketch follows)
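A minimal sketch of this definition in Python/NumPy (the loss matrix and plays below are made-up numbers, not from the notes):

```python
import numpy as np

# Hypothetical toy instance: T = 3 rounds, n = 2 choices.
# x[t, i] = weight the algorithm puts on choice i at time t (rows sum to 1).
# l[t, i] = loss of choice i at time t (revealed only after playing).
x = np.array([[0.5, 0.5],
              [0.5, 0.5],
              [0.5, 0.5]])
l = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

algo_loss = np.sum(x * l)             # sum_t sum_i x_i^(t) * l_i^(t)
best_fixed = l.sum(axis=0).min()      # min_i sum_t l_i^(t)
regret = algo_loss - best_fixed       # R_T
print(algo_loss, best_fixed, regret)  # 1.5 1.0 0.5
```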
-
Offline optimum: regret = 0
-
The actions are performed online since we do not have access to the losses beforehand
-
Multiplicative Weights
-
Stores weights $w_i^{(t)}$, initialized to $w_i^{(1)} = 1$
-
parameterized by $0 < \varepsilon < \tfrac{1}{2}$
-
Update: $w_i^{(t+1)} = w_i^{(t)} \cdot (1-\varepsilon)^{l_i^{(t)}}$
The update penalizes choices that generate loss
-
At time $t$ play $x_i^{(t)} = \dfrac{w_i^{(t)}}{\sum_{j=1}^n w_j^{(t)}}$
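A minimal sketch of the full loop (my own code, assuming losses in $[0,1]$ and a fixed $\varepsilon$; the names and toy data are made up):

```python
import numpy as np

def multiplicative_weights(losses, eps=0.1):
    """Run MW on a (T, n) loss matrix and return the plays, shape (T, n)."""
    T, n = losses.shape
    w = np.ones(n)                      # w_i^(1) = 1
    plays = np.zeros((T, n))
    for t in range(T):
        plays[t] = w / w.sum()          # x_i^(t) = w_i^(t) / sum_j w_j^(t)
        w = w * (1 - eps) ** losses[t]  # w_i^(t+1) = w_i^(t) * (1-eps)^{l_i^(t)}
    return plays

# Toy usage with made-up random losses
rng = np.random.default_rng(0)
losses = rng.random((100, 3))
plays = multiplicative_weights(losses, eps=0.1)
print(np.sum(plays * losses))           # total loss of the algorithm
```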
-
Guarantees $R_T \le O(\sqrt{T \log n})$
And $\frac{R_T}{T} \le O\!\left(\sqrt{\frac{\log n}{T}}\right)$, so $\frac{R_T}{T} \to 0$ as $T \to \infty$
-
Regret bound:
$$\underbrace{\sum_{t=1}^T \sum_{i=1}^n x_i^{(t)} l_i^{(t)}}_{\text{loss of algorithm}} \;\le\; \underbrace{\min_{x^*} \sum_{t=1}^T \sum_{i=1}^n x_i^{*} l_i^{(t)}}_{\text{offline optimum}} \;+\; \underbrace{\varepsilon T + \frac{\ln n}{\varepsilon}}_{\text{regret}}$$
where $x^*$ is a fixed strategy
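Balancing the two regret terms recovers the $O(\sqrt{T \log n})$ guarantee above; a quick sketch, assuming $T$ is known in advance so $\varepsilon$ can be tuned:
$$\frac{d}{d\varepsilon}\left(\varepsilon T + \frac{\ln n}{\varepsilon}\right) = T - \frac{\ln n}{\varepsilon^2} = 0 \;\Rightarrow\; \varepsilon = \sqrt{\frac{\ln n}{T}} \;\Rightarrow\; \varepsilon T + \frac{\ln n}{\varepsilon} = 2\sqrt{T \ln n}$$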
-
Multiplicative Weights Analysis
-
Assume $l_i^{(t)} \in [0,1]$ and $0 < \varepsilon \le \tfrac{1}{2}$
-
Definitions:
$L_t = \sum_{i=1}^n x_i^{(t)} l_i^{(t)}$; loss at time $t$
$L^* = \min_i \sum_{t=1}^T l_i^{(t)}$; offline optimum
$\varepsilon T + \frac{\ln n}{\varepsilon}$; regret
$W_t = \sum_{i=1}^n w_i^{(t)}$; total weight at time $t$
-
Lemma 1:
$$W_{T+1} \ge (1-\varepsilon)^{L^*}$$
Proof:
$$w_i^{(T+1)} = (1-\varepsilon)^{l_i^{(1)} + l_i^{(2)} + l_i^{(3)} + \cdots + l_i^{(T)}}$$
$W_t \ge w_i^{(t)},\ \forall i$, since all weights are non-negative
$L^* = l_{i^*}^{(1)} + \cdots + l_{i^*}^{(T)}$ for some $i^*$, so $W_{T+1} \ge w_{i^*}^{(T+1)} = (1-\varepsilon)^{L^*}$
-
Lemma 2:
$$W_{T+1} \le n \cdot \prod_{t=1}^T (1 - \varepsilon L_t)$$
Proof:
$W_1 = n$ since all weights are initialized as 1
Claim that $W_{t+1} \le W_t \cdot (1 - \varepsilon L_t)$:
$$W_{t+1} = \sum_i w_i^{(t+1)} = \sum_{i=1}^n w_i^{(t)} \cdot (1-\varepsilon)^{l_i^{(t)}} = W_t \sum_{i=1}^n x_i^{(t)} \cdot (1-\varepsilon)^{l_i^{(t)}} \le W_t \sum_{i=1}^n x_i^{(t)} \left(1 - \varepsilon l_i^{(t)}\right) = W_t \left(\sum_{i=1}^n x_i^{(t)} - \varepsilon \sum_{i=1}^n x_i^{(t)} l_i^{(t)}\right) = W_t \cdot (1 - \varepsilon L_t)$$
(the inequality uses $(1-\varepsilon)^x \le 1 - \varepsilon x$ for $x \in [0,1]$, by convexity)
-
From lemmas 1 and 2:
$$(1-\varepsilon)^{L^*} \le n \prod_{t=1}^T (1 - \varepsilon L_t)$$
$$\ln\left((1-\varepsilon)^{L^*}\right) \le \ln\left(n \prod_{t=1}^T (1 - \varepsilon L_t)\right)$$
$$L^* \ln(1-\varepsilon) \le \ln n + \sum_{t=1}^T \ln(1 - \varepsilon L_t)$$
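One way to finish the derivation and recover the stated bound (a sketch, using the standard inequalities $\ln(1-x) \le -x$ for $x < 1$ and $-\ln(1-\varepsilon) \le \varepsilon + \varepsilon^2$ for $0 < \varepsilon \le \tfrac{1}{2}$, plus $L^* \le T$):
$$L^* \ln(1-\varepsilon) \le \ln n - \varepsilon \sum_{t=1}^T L_t \;\Rightarrow\; \sum_{t=1}^T L_t \le \frac{\ln n}{\varepsilon} - \frac{L^* \ln(1-\varepsilon)}{\varepsilon} \le \frac{\ln n}{\varepsilon} + L^*(1+\varepsilon) \le L^* + \varepsilon T + \frac{\ln n}{\varepsilon}$$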
-
Multiplicative Weights: More General Losses
-
Follow the Leader
-
A more general case / parent of MW
-
Setup:
$$
\begin{array}{c|c|c|c}
\text{Time} & \text{Algorithm} & \text{Loss} & \text{Loss of Algorithm} \\
\hline
t=1 & x^{(1)} \in K & f_1 : K \to \mathbb{R} & f_1(x^{(1)}) \\
t=2 & x^{(2)} \in K & f_2 : K \to \mathbb{R} & f_2(x^{(2)}) \\
\vdots & \vdots & \vdots & \vdots
\end{array}
$$
-
At time $t$: $x^{(t)} = \arg\min_{x \in K}\; f_1(x) + \cdots + f_{t-1}(x)$
-
FTL can do badly in an interesting way:
it keeps changing its mind and choosing sub-optimal choices (see the simulation sketch below)
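A hypothetical two-expert instance (made up here to illustrate the claim) with linear losses $f_t(x) = \langle l^{(t)}, x \rangle$ over the simplex: after a small first-round nudge, the experts' losses alternate, FTL's choice flips every round onto the expert about to incur loss 1, and its total loss grows like $T$ while the best fixed expert pays about $T/2$:

```python
import numpy as np

def ftl_simplex(losses):
    """Follow the Leader for linear losses on the simplex.

    For a linear loss the argmin over the simplex is a vertex, i.e. the
    single expert with smallest cumulative past loss (ties -> lowest index).
    Returns FTL's total loss. losses: array of shape (T, n).
    """
    T, n = losses.shape
    cum = np.zeros(n)
    total = 0.0
    for t in range(T):
        i = int(np.argmin(cum))  # leader so far
        total += losses[t, i]
        cum += losses[t]
    return total

T = 100
losses = np.zeros((T, 2))
losses[0] = [0.5, 0.0]               # round 1 nudges the lead to expert 1
losses[1:, 1] = np.arange(1, T) % 2  # afterwards the two experts alternate 0/1
losses[1:, 0] = 1 - losses[1:, 1]

print(ftl_simplex(losses))           # ~T   (99.5): FTL pays 1 every round after the first
print(losses.sum(axis=0).min())      # ~T/2 (49.5): best fixed expert
```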
-
Regret can be bounded with respect to how much the algorithm changes its mind
-
FTL Analysis
-
Theorem: $R_T \le \sum_{t=1}^T \left(f_t(x^{(t)}) - f_t(x^{(t+1)})\right)$
i.e. bounded by how much the loss changes as the algorithm updates its choice
-
Proof: induction on $T$ to show
$$\sum_{t=1}^T f_t(x^{(t+1)}) \le \min_{x \in K} \sum_{t=1}^T f_t(x)$$
Base case: $T=1$, true by definition of $x^{(2)}$
Assume true for $T$, prove for $T+1$:
$$\sum_{t=1}^{T+1} f_t(x^{(t+1)}) = f_{T+1}(x^{(T+2)}) + \sum_{t=1}^{T} f_t(x^{(t+1)}) \le f_{T+1}(x^{(T+2)}) + \sum_{t=1}^{T} f_t(x^{(T+2)}) = \sum_{t=1}^{T+1} f_t(x^{(T+2)}) = \min_{x \in K} \sum_{t=1}^{T+1} f_t(x)$$
The theorem follows since $R_T = \sum_{t=1}^T f_t(x^{(t)}) - \min_{x \in K} \sum_{t=1}^T f_t(x) \le \sum_{t=1}^T \left(f_t(x^{(t)}) - f_t(x^{(t+1)})\right)$
-
Follow the Regularized Leader
-
define a function $R : K \to \mathbb{R}$
-
Algorithm: $x^{(t)} = \arg\min_{x \in K}\; f_1(x) + \cdots + f_{t-1}(x) + R(x)$
-
Pick an $R$ that has a unique minimum and is very smooth
-
New regret bound:
$$\sum_{t=1}^T f_t(x^{(t)}) - \sum_{t=1}^T f_t(x) \le \left(\sum_{t=1}^T f_t(x^{(t)}) - f_t(x^{(t+1)})\right) + R(x) - R(x^{(1)})$$
i.e. $R_T \le \left(\sum_{t=1}^T f_t(x^{(t)}) - f_t(x^{(t+1)})\right) + \max_{x \in K} R(x) - R(x^{(1)})$
-
$R$ should favor spread-out points to fight the tendency of $x$ to become very concentrated:
$R$ is smaller if the distribution is spread out, larger if the distribution is concentrated
-
FTRL with entropy as regularizer = MW (a sketch of the correspondence follows)
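A sketch of that correspondence (assuming linear losses $f_t(x) = \langle l^{(t)}, x \rangle$ over the simplex $\Delta_n$ and the entropy regularizer $R(x) = \frac{1}{\varepsilon} \sum_i x_i \ln x_i$; the scaling by $\frac{1}{\varepsilon}$ is my choice of normalization): solving the FTRL minimization with a Lagrange multiplier for $\sum_i x_i = 1$ gives
$$x^{(t)} = \arg\min_{x \in \Delta_n} \left\{ \sum_{s=1}^{t-1} \langle l^{(s)}, x \rangle + \frac{1}{\varepsilon} \sum_{i=1}^n x_i \ln x_i \right\} \;\Longrightarrow\; x_i^{(t)} = \frac{\exp\!\left(-\varepsilon \sum_{s=1}^{t-1} l_i^{(s)}\right)}{\sum_{j=1}^n \exp\!\left(-\varepsilon \sum_{s=1}^{t-1} l_j^{(s)}\right)}$$
i.e. exponential weights, which matches the MW play $x_i^{(t)} \propto (1-\varepsilon)^{\sum_{s<t} l_i^{(s)}}$ up to replacing $1-\varepsilon$ by $e^{-\varepsilon}$.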