Master Thesis Artificial Intelligence: Feedback 20-07-2012

Friday, July 20, 2012

Feedback 20-07-2012

Progress

Implemented bootstrapping for the transposition tree:

Updates the tree from bottom to top (same way as before)
Formula: $R_{k \ldots d-1} + \gamma^{d-k} \, V(b^d)$, where

$k$ is the depth of the current update,
$d$ is the horizon depth given by the ε-horizon time, and
$V(b^d) = \max_a Q(b^d, a)$

$Q(b^d, a)$ is the average outcome of action $a$ in belief state $b^d$ (given by the corresponding UCT value)
$R_{k \ldots d-1}$ is the return, i.e. the cumulative discounted sum of the step rewards from step $k$ to step $d-1$
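The bottom-up update with bootstrapping can be sketched in Python as follows. This is a minimal sketch under assumptions the post does not state: depths along the simulated path are numbered from 0, `rewards[k]` is the step reward at depth k, the return is discounted relative to step k (the reward at step t gets weight γ^(t−k)), a terminal belief at the horizon has value 0, and the ε-horizon time follows the usual definition d = ⌈log_γ(ε(1−γ)/R_max)⌉. The function names are hypothetical, and the tree and UCT bookkeeping are left out.

```python
import math
from typing import List, Sequence


def epsilon_horizon(gamma: float, epsilon: float, r_max: float) -> int:
    """Hypothetical helper: smallest d with gamma**d * r_max / (1 - gamma) <= epsilon.

    Assumes the usual epsilon-horizon definition; rewards beyond depth d can then
    change the discounted return by at most epsilon.
    """
    return max(0, math.ceil(math.log(epsilon * (1 - gamma) / r_max, gamma)))


def bootstrapped_targets(rewards: Sequence[float],
                         q_at_horizon: Sequence[float],
                         gamma: float) -> List[float]:
    """Hypothetical helper: target R_{k..d-1} + gamma**(d-k) * V(b^d) for k = 0..d-1.

    rewards[k] is the step reward at depth k, so d = len(rewards).
    q_at_horizon holds the UCT averages Q(b^d, a) of the actions at depth d.
    """
    d = len(rewards)
    # V(b^d) = max_a Q(b^d, a); assumed 0 if b^d has no actions (terminal belief).
    value = max(q_at_horizon) if q_at_horizon else 0.0
    targets = [0.0] * d
    # Sweep from the bottom (k = d-1) to the top (k = 0):
    # G_k = r_k + gamma * G_{k+1} with G_d = V(b^d), which unrolls to
    # r_k + gamma*r_{k+1} + ... + gamma**(d-1-k)*r_{d-1} + gamma**(d-k)*V(b^d).
    for k in range(d - 1, -1, -1):
        value = rewards[k] + gamma * value
        targets[k] = value
    return targets
```

For example, `bootstrapped_targets([1.0, 0.0, 2.0], [0.5, 3.0], 0.5)` returns `[1.875, 1.75, 3.5]`; the first entry equals 1 + 0.5·0 + 0.25·2 + 0.125·3, which matches the formula term by term. Computing all targets in a single bottom-up pass avoids recomputing the partial returns for every depth k, and `epsilon_horizon(0.95, 0.1, 1.0)` gives a horizon depth of 104 under the assumed definition.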

With bootstrapping, the deletion and insertion strategies now perform as well as the perfect recall strategy (see plots below).
Remark: in all results below, the first node is kept separate from the other nodes and is also updated during backpropagation; the remaining nodes form the transposition tree.

Results
