Monday, May 21, 2012

Updated Results

Progress
  • Implemented complete recall for COMCTS (see the sketch after this list):
    • Stores every example <observation(t), reward(t)> together with its "future" {<action(t+1), observation(t+1), reward(t+1)>, <action(t+2), observation(t+2), reward(t+2)>, ...}
    • Uses this stored knowledge to reconstruct the tree after a split
  • Implemented a 4x3 grid world (a minimal environment sketch follows further below):
    • Actions: up, down, left, right
    • Observations: number of adjacent walls, corrupted by normally distributed noise with standard deviation 0.965
    • Rewards: +1 for goal state, -1 for penalty state, movement costs -0.04
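
The post does not show COMCTS internals, so the sketch below only illustrates the complete-recall bookkeeping in Python: every example keeps its full future sequence, and after a split the new children are rebuilt by replaying the stored examples. All names (Example, ObservationNode, split) and the discount value are assumptions for illustration, not the actual implementation.

```python
# Minimal sketch of "complete recall": store each example together with its
# full future, and replay the stored examples to rebuild statistics after a
# split of the continuous observation space. Hypothetical names throughout.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Example:
    observation: float                       # observation(t)
    reward: float                            # reward(t)
    future: List[Tuple[int, float, float]]   # (action, observation, reward) for t+1, t+2, ...


@dataclass
class ObservationNode:
    examples: List[Example] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                       # running mean of the discounted return
    threshold: Optional[float] = None        # split point on the observation, once split
    children: List["ObservationNode"] = field(default_factory=list)

    def add(self, example: Example, discount: float = 0.95) -> None:
        """Store the complete example and update this node's statistics."""
        self.examples.append(example)
        ret = example.reward + sum(
            discount ** (k + 1) * r for k, (_, _, r) in enumerate(example.future)
        )
        self.visits += 1
        self.value += (ret - self.value) / self.visits

    def split(self, threshold: float, discount: float = 0.95) -> None:
        """Split the observation at `threshold` and rebuild both children
        from the stored examples (the complete-recall step)."""
        self.threshold = threshold
        self.children = [ObservationNode(), ObservationNode()]
        for ex in self.examples:
            side = 0 if ex.observation <= threshold else 1
            self.children[side].add(ex, discount)
```

Keeping the whole future with each example means a split never has to re-simulate anything: the affected part of the tree can be repopulated purely from memory.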
 
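The grid world itself is concrete enough to sketch directly. The version below assumes the standard Russell & Norvig 4x3 layout (obstacle at (1, 1), goal at (3, 2), penalty at (3, 1)), deterministic moves, and zero-mean observation noise, since the post does not specify those details; only the action set, the noisy wall-count observation (std 0.965), and the reward scheme are taken from the post.

```python
# Minimal 4x3 grid world sketch; layout and transition model are assumptions.
import random

WIDTH, HEIGHT = 4, 3
OBSTACLES = {(1, 1)}                 # assumed interior wall (standard 4x3 layout)
GOAL, PENALTY = (3, 2), (3, 1)       # assumed terminal cells
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
OBS_NOISE_STD = 0.965                # from the post


def _blocked(cell):
    """True if the cell lies outside the grid or on the obstacle."""
    x, y = cell
    return not (0 <= x < WIDTH and 0 <= y < HEIGHT) or cell in OBSTACLES


def step(state, action):
    """Apply one move (deterministic here; the post does not state the
    transition noise). Bumping into a wall leaves the agent in place."""
    dx, dy = MOVES[action]
    nxt = (state[0] + dx, state[1] + dy)
    if _blocked(nxt):
        nxt = state
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == PENALTY:
        return nxt, -1.0, True
    return nxt, -0.04, False         # movement cost


def observe(state):
    """Number of adjacent walls, corrupted by Gaussian noise (zero mean assumed, std 0.965)."""
    walls = sum(_blocked((state[0] + dx, state[1] + dy)) for dx, dy in MOVES.values())
    return walls + random.gauss(0.0, OBS_NOISE_STD)
```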

Tiger Problem: Horizon 2

Tiger Problem: Horizon 10

Grid World: max. 22 steps


1 comment:

  1. Very good. Can you also show the regret plots to zoom in on the behavior at the end?
