- Regret = optimal value - mean value
- Optimal value = sum of the maximum positive reward achievable at each step (20 for Horizon 2, 100 for Horizon 10)
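
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the plots. The names `optimal_value`, `regret`, and `MAX_STEP_REWARD` are hypothetical. The per-step maximum of 10 is an assumption inferred from the stated optimal values (20 / 2 and 100 / 10), and "mean value" is assumed to be the average return over a set of sampled episodes.

```python
from statistics import fmean

# Assumption: maximum positive reward per step, inferred from the stated
# optimal values (20 for Horizon 2, 100 for Horizon 10).
MAX_STEP_REWARD = 10.0


def optimal_value(horizon, max_step_reward=MAX_STEP_REWARD):
    """Sum of the maximum positive reward achievable at each step."""
    return horizon * max_step_reward


def regret(episode_returns, horizon, max_step_reward=MAX_STEP_REWARD):
    """Regret = optimal value - mean value.

    `episode_returns` is assumed to hold the total return of each sampled
    episode; its average plays the role of the "mean value".
    """
    mean_value = fmean(episode_returns)
    return optimal_value(horizon, max_step_reward) - mean_value


if __name__ == "__main__":
    # Hypothetical returns for three Horizon 2 episodes.
    print(regret([20, 10, 0], horizon=2))
```

A regret of 0 means every sampled episode collected the maximum reward at every step. Larger values mean the average return fell further below the optimum.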
Results: Regret Plots
Figure panels, by row:
- Horizon 2; Horizon 2 (zoomed in); Horizon 10; Horizon 10 (zoomed in)
- First 1000 samples; First 1000 samples (x-axis in log scale); First 200 samples; First 200 samples (x-axis in log scale)
- First 200 samples; First 200 samples (x-axis in log scale); First 500 samples; First 500 samples (x-axis in log scale)
- Horizon 10 without error bars; Horizon 10 with error bars; Horizon 2 without error bars; Horizon 2 with error bars