**Experimental Set-up**

- Regret = optimal value - mean value
- Optimal value = sum of the maximum positive reward achievable at each step (20 for Horizon 2, 100 for Horizon 10)
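
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments. The function names, the `returns` argument (a list of total rewards, one per evaluation run), and the constant `max_step_reward=10.0` are assumptions; the value 10 is inferred from the stated optima (20 / 2 steps and 100 / 10 steps) and presumes the same maximum reward at every step.

```python
def optimal_value(horizon, max_step_reward=10.0):
    """Sum of the maximum achievable reward over all steps.

    Assumes the per-step maximum is constant (hypothetical value: 10).
    """
    return horizon * max_step_reward


def regret(returns, horizon, max_step_reward=10.0):
    """Regret = optimal value - mean value.

    `returns` is a hypothetical list of total rewards, one per run;
    the mean value is taken as their arithmetic mean.
    """
    mean_value = sum(returns) / len(returns)
    return optimal_value(horizon, max_step_reward) - mean_value


if __name__ == "__main__":
    # Made-up returns for illustration only, not experimental data.
    print(optimal_value(2))    # 20.0
    print(optimal_value(10))   # 100.0
    print(regret([20.0, 10.0, 15.0], horizon=2))  # 5.0
```

With these assumptions, a regret of 0 means the agent collected the maximum reward at every step in every run.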

**Results: Regret Plots**

[Figures: regret plots for Horizon 2, Horizon 2 (zoomed in), Horizon 10, and Horizon 10 (zoomed in)]