Master Thesis Artificial Intelligence: Updated Results

Monday, May 21, 2012

Updated Results

Progress

Implemented complete recall for COMCTS:

Stores all examples (observation(t), reward(t)) together with their "future": {<action(t+1), observation(t+1), reward(t+1)>, <action(t+2), observation(t+2), reward(t+2)>, ...}
Uses this knowledge to reconstruct the tree after a split
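
A rough sketch of how such a recall store could look. This is not the thesis code: the names `Example`, `split_examples` and `rebuild_action_stats` are hypothetical, observations are assumed to be single floats split at a threshold, and the return of a stored future is assumed to be the undiscounted sum of its rewards.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One step of a stored future: (action, observation, reward).
Step = Tuple[int, float, float]


@dataclass
class Example:
    """A stored sample (observation(t), reward(t)) plus everything that followed it."""
    observation: float
    reward: float
    future: List[Step] = field(default_factory=list)


def split_examples(examples: List[Example], threshold: float):
    """Assign every stored example to one side of a new split point (assumed scalar)."""
    left = [e for e in examples if e.observation < threshold]
    right = [e for e in examples if e.observation >= threshold]
    return left, right


def rebuild_action_stats(examples: List[Example]) -> Dict[int, Tuple[int, float]]:
    """Recompute (visit count, mean return) per first future action from stored futures.

    Assumption: the return is the undiscounted sum of the future rewards.
    """
    stats: Dict[int, Tuple[int, float]] = {}
    for e in examples:
        if not e.future:
            continue  # nothing was recorded after this example
        action = e.future[0][0]
        ret = sum(r for _, _, r in e.future)
        n, mean = stats.get(action, (0, 0.0))
        n += 1
        mean += (ret - mean) / n  # incremental mean update
        stats[action] = (n, mean)
    return stats
```

After a split, each child node would call `rebuild_action_stats` on its own share of the examples instead of starting from empty statistics. Deeper levels of the tree could be rebuilt the same way by recursing on the tails of the stored futures.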

Implemented a 4x3 grid world:

Actions: up, down, left, right
Observations: number of adjacent walls corrupted by some normally distributed noise with standard deviation = 0.965
Rewards: +1 for goal state, -1 for penalty state, movement costs -0.04
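
A minimal sketch of such an environment, under several assumptions that the post does not state: the layout follows the classic 4x3 textbook grid (one obstacle at (1, 1), goal at (3, 2), penalty state at (3, 1)), moves are deterministic, "adjacent walls" counts both the outer boundary and the obstacle, and the terminal reward replaces the movement cost. All names are hypothetical.

```python
import random

WIDTH, HEIGHT = 4, 3
OBSTACLE = {(1, 1)}              # assumed layout, not given in the post
GOAL, PENALTY = (3, 2), (3, 1)   # assumed positions
STEP_COST = -0.04
NOISE_STD = 0.965
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def blocked(cell):
    """True if the cell lies outside the grid or is an obstacle."""
    x, y = cell
    return not (0 <= x < WIDTH and 0 <= y < HEIGHT) or cell in OBSTACLE


def adjacent_walls(cell):
    """Number of the four neighbouring cells that are blocked."""
    x, y = cell
    return sum(blocked((x + dx, y + dy)) for dx, dy in MOVES.values())


def step(cell, action, rng):
    """Apply one deterministic move; return (next_cell, observation, reward, done)."""
    dx, dy = MOVES[action]
    nxt = (cell[0] + dx, cell[1] + dy)
    if blocked(nxt):
        nxt = cell  # bumping into a wall leaves the agent in place
    if nxt == GOAL:
        reward = 1.0
    elif nxt == PENALTY:
        reward = -1.0
    else:
        reward = STEP_COST
    # The true wall count is corrupted by zero-mean Gaussian noise.
    observation = adjacent_walls(nxt) + rng.gauss(0.0, NOISE_STD)
    done = nxt in (GOAL, PENALTY)
    return nxt, observation, reward, done
```

Passing an explicit `random.Random(seed)` as `rng` keeps episodes reproducible, which helps when comparing runs of the planner.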

Tiger Problem: Horizon 2

Tiger Problem: Horizon 10

Grid World: max. 22 steps

1 comment:

Michael Kaisers, May 21, 2012 at 6:55 AM
Very good, can you also show the regret plots to zoom in on the behavior in the end?