Online Meta-Learning -
## Introduction Two distinct research paradigms have studied how prior tasks or experiences can be used by an agent to inform future learning. * Meta Learning: past experience is used to acquire a prior over model parameters or a learning procedure, and typically studies a setting where a set of meta-training tasks are made available together upfront * Online learning : a sequential setting where tasks are revealed one after another, but aims to attain zero-shot generalization without any task-specific adaptation. We argue that neither setting is ideal for studying continual lifelong learning. Meta-learning deals with learning to learn, but neglects the sequential and non-stationary aspects of the problem. Online learning offers an appealing theoretical framework, but does not generally consider how past experience can accelerate adaptation to a new task. ## Online Learning Online learning focuses on regret minimization. Most standard notion of regret is to compare to the cumulative loss of the best fixed model in hindsight: One way minimize regret is with Follow the Leader (FTL): ## Online Meta-learning Setting: let $U_t$ be the update procedure for task $t$ e.g. in MAML: The overall protocol for the setting is as follows: 1. At round t, the agent chooses a model defined by $w_t$ 2. The world simultaneously chooses task defined by $f_t$ 3. The agent obtains access to the update procedure $U_t$, and uses it to update parameters as $\tilde w_t = U_t(w_t)$ 4. The agent incurs loss $f_t(\tilde w_t )$. Advance to round t + 1. the goal for the agent is to minimize regrets over rounds. Achieving sublinear regrets means you're improving and converging to upper bound (joint training on all tasks) ## Algorithm and Analysis: Follow the meta-leader (FTML): FTML’s regret is sublinear (under some assumption)