- Thesis:
  - Wrote a section about related work
  - Started writing the conclusion chapter
  - Added experiments about performance in the infinite horizon Tiger problem
    - Performance
      - Sample based (# roll-outs vs mean)
        - Variations of COMCTS
        - Variations of TT
        - Best COMCTS variation against best TT variation
        - Variations with similar performance
      - Time based: same four experiments (time vs mean)
    - Practical time and space usage (time vs roll-outs)
    - Varying noise (standard deviation of noise vs mean)
Experiments: Time vs roll-outs, time vs mean
Set-up
- Environment: infinite horizon Tiger problem
- Discount factor: 0.5
- Discount horizon: 0.00001
- Number of episodes: 2,000
- Algorithms:
  - BFAIRMC (pronounced "be fair MC") = Belief-based Function Approximation using Incremental Regression trees and Monte Carlo = transposition tree
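To give a feeling for what the discount settings above imply, here is a minimal sketch. It assumes that the "discount horizon" is a threshold on the cumulative discount, i.e. a roll-out is cut off at the first depth d where discount^d drops below the threshold. That reading is an assumption, not something stated in the set-up, and the function name `rollout_depth` is made up for this illustration.

```python
import math


def rollout_depth(discount: float, threshold: float) -> int:
    """Return the smallest depth d such that discount**d < threshold.

    Assumption: the "discount horizon" is a cut-off on the cumulative
    discount factor; the actual experiment code may define it differently.
    """
    if not (0.0 < discount < 1.0):
        raise ValueError("discount must lie strictly between 0 and 1")
    if not (0.0 < threshold < 1.0):
        raise ValueError("threshold must lie strictly between 0 and 1")

    # Closed-form estimate: d = ceil(log(threshold) / log(discount)).
    depth = math.ceil(math.log(threshold) / math.log(discount))

    # Correct for floating-point rounding near exact powers.
    while discount ** depth >= threshold:
        depth += 1
    while depth > 0 and discount ** (depth - 1) < threshold:
        depth -= 1
    return depth


if __name__ == "__main__":
    # Values from the set-up: discount factor 0.5, discount horizon 0.00001.
    print(rollout_depth(0.5, 0.00001))  # 17
```

Under this reading, a discount factor of 0.5 makes rewards beyond 17 steps contribute less than 0.00001 of their undiscounted value, so roll-outs stay very short.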
Figures: time vs roll-outs; time vs mean; time vs mean (zoomed in); time vs regret (optimum - mean)
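The regret curve is derived from the mean curve as optimum minus mean. A minimal sketch of that transformation follows; the function name `regret_curve` and the numbers in the usage example are hypothetical placeholders, not results from these experiments.

```python
from typing import List, Sequence


def regret_curve(optimum: float, means: Sequence[float]) -> List[float]:
    """Turn a curve of mean returns into a regret curve (optimum - mean)."""
    return [optimum - mean for mean in means]


if __name__ == "__main__":
    # Hypothetical values for illustration only, NOT measured data.
    hypothetical_optimum = 2.0
    hypothetical_means = [0.5, 1.0, 1.5]
    print(regret_curve(hypothetical_optimum, hypothetical_means))  # [1.5, 1.0, 0.5]
```

Plotting regret instead of the mean makes it easier to see how close an algorithm gets to the optimum as more time is spent.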