- Regret = optimal value - mean value
- Optimal value = sum of the maximum positive reward achievable at each step (20 for Horizon 2, 100 for Horizon 10)
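
The two definitions above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `optimal_value` and `regret` are hypothetical, "mean value" is read as the average return over a set of runs, and the per-step maximum reward of 10 is inferred from the stated totals (20 for Horizon 2, 100 for Horizon 10) rather than given explicitly.

```python
def optimal_value(max_reward_per_step):
    """Sum the maximum positive reward achievable at each step."""
    return sum(max_reward_per_step)


def regret(optimal, returns):
    """Regret = optimal value - mean value.

    `returns` is assumed to hold the total return of each run;
    the mean over those runs is taken as the "mean value".
    """
    mean_value = sum(returns) / len(returns)
    return optimal - mean_value


# Assumption: a maximum reward of 10 per step, inferred from the totals
# 20 (Horizon 2) and 100 (Horizon 10).
optimal_h2 = optimal_value([10] * 2)
optimal_h10 = optimal_value([10] * 10)

# Made-up returns, for illustration only (not results from the plots).
example_regret = regret(optimal_h2, [20, 10, 15])
```

With these assumptions, a regret of 0 means every run achieved the optimal value, and larger values mean the average run fell further short of it.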
Results: Regret Plots
Figure panels:
- Horizon 2
- Horizon 2 (zoomed in)
- Horizon 10
- Horizon 10 (zoomed in)
- First 1000 samples
- First 1000 samples (x-axis in log scale)
- First 200 samples
- First 200 samples (x-axis in log scale)
- First 200 samples
- First 200 samples (x-axis in log scale)
- First 500 samples
- First 500 samples (x-axis in log scale)
- Horizon 10 without error bars
- Horizon 10 with error bars
- Horizon 2 without error bars
- Horizon 2 with error bars