Wednesday, July 11, 2012

Feedback 11-07-10

Experimental Set-up
  • Environment: Infinite Horizon Tiger Problem
  • Discount factor: 0.5
  • Discount horizon: 0.00001
  • Number of episodes: 5000
  • Number of steps: 2
  • Algorithms:
    • Both Keep 1st Node variants use perfect recall as the splitting strategy
    • Keep 1st Node + Update also updates the "first node" during backpropagation
  • Regret plots: the mean value achieved by the magic guesser is taken as the optimum
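
As a rough illustration of the set-up above, the following sketch shows how the regret values in the plots could be computed, assuming regret is simply the magic guesser's mean value minus the mean value achieved by the algorithm. It also shows one possible reading of the "discount horizon" setting, namely that the search is cut off once the discount factor raised to the current depth drops below that threshold. Both interpretations, and the function names `regret`, `regret_curve` and `effective_depth`, are assumptions made for illustration and are not taken from the actual experiment code.

```python
def effective_depth(discount, threshold):
    """Smallest depth d with discount**d < threshold.

    Assumption: 'discount horizon' is the cut-off on the accumulated
    discount weight. This is a guess at its meaning, not a confirmed one.
    """
    if not 0.0 < discount < 1.0:
        raise ValueError("discount must lie strictly between 0 and 1")
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    depth = 0
    weight = 1.0
    # Multiply step by step instead of using logarithms to avoid
    # rounding surprises near the boundary.
    while weight >= threshold:
        weight *= discount
        depth += 1
    return depth


def regret(optimum_mean, achieved_mean):
    """Regret relative to the optimum (here: the magic guesser's mean value)."""
    return optimum_mean - achieved_mean


def regret_curve(optimum_mean, achieved_means):
    """Regret for each data point of an algorithm's mean-value curve."""
    return [regret(optimum_mean, m) for m in achieved_means]
```

Under this reading, a discount factor of 0.5 with a discount horizon of 0.00001 would correspond to a cut-off depth of 17, since 0.5^16 is still above the threshold and 0.5^17 is below it.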
Results

Regret: Variations of COMCTS

Regret: Variations of TT

Mean: Variations of COMCTS

Mean: Variations of TT
