Wednesday, May 16, 2012

Feedback 16-05-12

Progress
  • Implemented a heuristic agent and a magic agent for the Tiger Problem (see below)
  • Changed the action choice in COMCTS to a more randomized one, both for the final action selection (the step executed in the "real world") and for the selection step inside the simulations (see the sketch after this list)
    • This made the algorithm more stable and removed many of the dips in the plots (see below)
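The post does not spell out how the "more randomized" choice is implemented; one plausible reading is uniform tie-breaking among children whose value estimates are (near-)equal, instead of always picking the first maximal child. A minimal Python sketch under that assumption (the children objects with mean and action fields are hypothetical):

    import random

    def select_action(children, tol=1e-9):
        # Value of the best child according to the current estimates.
        best = max(child.mean for child in children)
        # All children whose estimate is within `tol` of the best one.
        candidates = [child for child in children if child.mean >= best - tol]
        # Break ties uniformly at random instead of always taking the first one.
        return random.choice(candidates).action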
Experimental Set-up
  • Mean value is taken over:
    • 10,000 runs for Horizon 2
    • 5,000 runs for Horizon 10
    • 1,000 runs for Horizon 100
  • Error bars indicate standard error
  • Environment is continuing
  • Execution is stopped after Horizon=k actions
  • Discount factor = 0.5, discount horizon = 0.000001 (see the sketch below for the resulting planning depth)
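A side note on the last point: combined with the ε-accurate horizon time of Kearns, Mansour and Ng (cited under Algorithms below), a discount factor of 0.5 and a cut-off of 0.000001 fix the planning depth. A minimal sketch, assuming "discount horizon" simply means the threshold below which γ^H is treated as negligible (the Rmax/(1-γ) scaling from the paper is ignored here):

    import math

    def epsilon_accurate_horizon(gamma, eps):
        # Smallest H such that gamma**H < eps, i.e. rewards beyond
        # depth H contribute (almost) nothing to the discounted return.
        return math.ceil(math.log(eps) / math.log(gamma))

    print(epsilon_accurate_horizon(0.5, 1e-6))  # -> 20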
Algorithms
  • Random: computed by hand for Horizon 2; for Horizons 10 and 100, using an agent that selects actions randomly
  • MC: Monte Carlo with uniformly sampled actions
  • COMCTS with deletion: deletes everything below a node which is split in the regression tree
  • COMCTS with local recall
    • stores all examples
    • when a split occurs at a node w and two new children l and r are created, the values stored at the winning test for the split point at w are handed down to l and r, so that l and r start with an initial mean and count (see the first sketch after this list)
    • the stored examples are used to create new tests (for split points) at l and r
  • MC and COMCTS both use ε-accurate horizon time [Michael J. Kearns, Yishay Mansour, Andrew Y. Ng: A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. Machine Learning 49(2-3): 193-208 (2002)]
  • Heuristic: listens until its belief is sufficiently certain (b(s) < 0.1 or b(s) > 0.9) and then opens one of the doors (see the second sketch after this list)
  • Magic: can magically observe the real state of the environment, but still waits until its belief is sufficiently certain (b(s) < 0.1 or b(s) > 0.9) before opening one of the doors
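To make the local-recall bookkeeping concrete, here is a minimal Python sketch of the split step; the class and field names (Node, Test, mean, count, examples) are hypothetical and only illustrate how the winning test's per-side statistics seed the new children:

    class Test:
        # One candidate split point on a single state feature.
        def __init__(self, feature, threshold):
            self.feature, self.threshold = feature, threshold
            self.left_mean = self.right_mean = 0.0
            self.left_count = self.right_count = 0

        def goes_left(self, state):
            return state[self.feature] <= self.threshold

    class Node:
        def __init__(self, mean=0.0, count=0):
            self.mean, self.count = mean, count   # seeded from the winning test
            self.examples = []                    # all (state, return) pairs kept locally
            self.tests = []                       # candidate split points
            self.left = self.right = self.split_test = None

        def add_candidate_tests(self, state):
            # Hypothetical rule: propose one split point per feature of the example.
            for feature, value in enumerate(state):
                self.tests.append(Test(feature, value))

    def split(node, winning_test):
        # The children l and r start with the mean and count that the winning
        # test accumulated on its two sides, and the stored examples are
        # replayed to create new candidate tests at l and r.
        node.split_test = winning_test
        node.left = Node(winning_test.left_mean, winning_test.left_count)
        node.right = Node(winning_test.right_mean, winning_test.right_count)
        for state, ret in node.examples:
            child = node.left if winning_test.goes_left(state) else node.right
            child.examples.append((state, ret))
            child.add_candidate_tests(state)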
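The heuristic agent only needs the standard Tiger-problem belief update. A minimal sketch, assuming the classic listening accuracy of 0.85 (the post does not state the value actually used); b_left is the belief that the tiger is behind the left door:

    P_CORRECT = 0.85  # assumed probability that a listen observation is correct

    def update_belief(b_left, heard_left):
        # Bayes update of the belief that the tiger is behind the left door.
        if heard_left:
            num = P_CORRECT * b_left
            den = P_CORRECT * b_left + (1 - P_CORRECT) * (1 - b_left)
        else:
            num = (1 - P_CORRECT) * b_left
            den = (1 - P_CORRECT) * b_left + P_CORRECT * (1 - b_left)
        return num / den

    def heuristic_action(b_left):
        # Keep listening until the belief is sufficiently certain, then open a door.
        if b_left < 0.1:
            return "open-left"   # tiger is almost surely behind the right door
        if b_left > 0.9:
            return "open-right"  # tiger is almost surely behind the left door
        return "listen"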
Results
  • Horizon 2:
    (plot: Horizon 2 results)
  • Horizon 10:
    (plot: Horizon 10 results)
  • Horizon 100:
    (plot: Horizon 100 results)

1 comment:

  1. Dear Andreas,

    good progress, I now have the impression I can judge the performance of your algorithm. Unfortunately, it does not do much better than the heuristic. Maybe you can elicit some advantage in the long run by looking at the regret plot (see the last plot of Colin's blog entry from 27 April).

    You may also want to try varying the tiger reset probabilities. Now comes the exciting experimentation phase. Good luck :)

    Best regards, Michael
