- Implemented bootstrapping for the transposition tree:
- Updates the tree from bottom to top (same way as before)
- Formula: $R_{k \ldots d-1} + \gamma^{d-k} \cdot V(b_d)$, where
- $k$ is the depth of the current update,
- $d$ is the horizon depth given by the ε-horizon time, and
- $V(b_d) = \max_a Q(b_d, a)$
- $Q(b_d, a)$ is the average outcome of action $a$ in belief state $b_d$ (given by the corresponding UCT value)
- $R_{k \ldots d-1}$ is the return, i.e. the cumulative discounted sum of the step rewards from step $k$ to step $d-1$
- With bootstrapping, the deletion and insertion strategies now perform as well as the perfect recall strategy (see plots below)
- Remark: the results below all keep the first node separate from the other nodes and also update this node during backpropagation; the remaining nodes form the transposition tree
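
The bootstrapped backup described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: the function name `bootstrapped_targets`, the plain list of step rewards, and the dictionary of Q-values at the horizon node are all hypothetical. It also assumes that the return is discounted relative to step k, i.e. that the reward at step t is weighted by γ^(t-k), and that a horizon node without any action statistics contributes a value of 0.

```python
from typing import Dict, List, Sequence


def bootstrapped_targets(
    rewards: Sequence[float],
    q_at_horizon: Dict[str, float],
    gamma: float,
) -> List[float]:
    """Compute R_{k..d-1} + gamma^(d-k) * V(b_d) for every depth k = 0..d-1.

    rewards[t] is the step reward received at step t (hypothetical layout),
    q_at_horizon maps each action a to Q(b_d, a) at the horizon node.
    """
    d = len(rewards)
    # V(b_d) = max_a Q(b_d, a); assumed to be 0 if the node has no statistics yet.
    v_d = max(q_at_horizon.values(), default=0.0)

    targets = [0.0] * d
    running = v_d
    # Walk from the bottom of the simulated path to the top, as in the tree update.
    for k in range(d - 1, -1, -1):
        # target_k = r_k + gamma * target_{k+1}, with target_d = V(b_d).
        running = rewards[k] + gamma * running
        targets[k] = running
    return targets


if __name__ == "__main__":
    example = bootstrapped_targets([1.0, 2.0, 3.0], {"a": 4.0, "b": 8.0}, 0.5)
    print(example)
```

Unrolling the recursion for depth k gives exactly the sum of the discounted step rewards from k to d-1 plus γ^(d-k) times V(b_d), so a single bottom-up pass produces the update target for every depth.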
Friday, July 20, 2012
Feedback 20-07-2012
Progress