- Environment: Infinite Horizon Tiger Problem
- Discount factor: 0.5
- Discount horizon: 0.00001
- Number of episodes: 5000
- Number of steps: 2
- Algorithms:
  - Both Keep 1st Node variants use perfect recall as the splitting strategy
  - Keep 1st Node + Update also updates the "first node" during backpropagation
- Regret plots: the mean value achieved by the magic guesser is taken as the optimum
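To make the regret baseline concrete, here is a minimal sketch of how a regret curve could be derived from the magic guesser's mean value. It assumes regret at each episode is simply the optimum minus the algorithm's mean return at that episode. The function name `regret_curve` and all numbers are hypothetical and for illustration only; they are not taken from the experiments.

```python
from statistics import mean


def regret_curve(returns_per_episode, optimum):
    """Return one regret value per episode: optimum minus the mean return.

    returns_per_episode: list of lists, one inner list of returns
    (e.g. from repeated runs) per episode.
    optimum: reference value, here the magic guesser's mean value.
    """
    return [optimum - mean(runs) for runs in returns_per_episode]


# Hypothetical returns collected by the magic guesser (illustrative only).
magic_guesser_returns = [10.0, 10.0, 9.0, 11.0]
optimum = mean(magic_guesser_returns)

# Hypothetical returns of one algorithm: two runs for each of three episodes.
algo_returns = [[-5.0, 0.0], [4.0, 6.0], [9.0, 10.0]]

regret = regret_curve(algo_returns, optimum)
print(regret)
```

Under this definition, a curve that approaches zero means the algorithm's mean value is approaching that of the magic guesser.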
Plots (images not reproduced here):
- Regret: Variations of COMCTS
- Regret: Variations of TT
- Mean: Variations of COMCTS
- Mean: Variations of TT