jeremykun.com
Bandits and Stocks
So far in this series we’ve seen two nontrivial algorithms for bandit learning in two different settings. The first was the UCB1 algorithm, which operated under the assumption that the reward…