Friday, July 20, 2012

Feedback 20-07-2012

Progress
  • Implemented bootstrapping for the transposition tree:
    • Updates the tree from bottom to top (same way as before)
    • Formula: $R_{k \ldots d-1} + \gamma^{d-k} \, V(b_d)$, where
      • $k$ is the depth of the current update,
      • $d$ is the horizon depth given by the ε-horizon time, and
      • $V(b_d) = \max_a Q(b_d, a)$
    • $Q(b_d, a)$ is the average outcome of action $a$ in belief state $b_d$ (given by the corresponding UCT value)
    • $R_{k \ldots d-1}$ is the return, i.e. the cumulative discounted reward of the step rewards from step $k$ to step $d-1$
  • With bootstrapping, the deletion and insertion strategies now perform as well as the perfect recall strategy (see plots below)
  • Remark: all results below keep the first node separate from the other nodes and also update this node during backpropagation; the remaining nodes form the transposition tree
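
The bootstrapped update target from the formula above can be sketched as follows. This is only an illustration, not the actual implementation: the function names, the `rewards` list and the `q_values_at_horizon` list are hypothetical, and the sketch assumes that the return is discounted relative to step k (the reward at step k has discount 1), which is what the factor γ^(d-k) in front of V(b_d) suggests.

```python
def bootstrapped_target(rewards, k, d, gamma, q_values_at_horizon):
    """Compute R_{k..d-1} + gamma^(d-k) * V(b_d) for a single depth k.

    rewards[i] is the step reward at step i (hypothetical layout);
    q_values_at_horizon holds Q(b_d, a) for every action a (assumed input).
    """
    # Discounted return of the step rewards from step k to step d-1,
    # discounted relative to step k (assumption, see lead-in).
    ret = sum(gamma ** (i - k) * rewards[i] for i in range(k, d))
    # V(b_d) = max_a Q(b_d, a)
    v_bd = max(q_values_at_horizon)
    return ret + gamma ** (d - k) * v_bd


def bottom_up_targets(rewards, d, gamma, q_values_at_horizon):
    """Compute the same targets for k = d-1, ..., 0, from bottom to top.

    Uses the equivalent recursion target_k = r_k + gamma * target_{k+1},
    starting from target_d = V(b_d).
    """
    target = max(q_values_at_horizon)
    targets = {}
    for k in range(d - 1, -1, -1):
        target = rewards[k] + gamma * target
        targets[k] = target
    return targets


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0]
    q_at_horizon = [4.0, 10.0]
    print(bootstrapped_target(rewards, k=0, d=3, gamma=0.5,
                              q_values_at_horizon=q_at_horizon))
    print(bottom_up_targets(rewards, d=3, gamma=0.5,
                            q_values_at_horizon=q_at_horizon))
```

The recursive form matches the bottom-to-top update order: each depth reuses the target of the depth below it, so one pass over the simulated path is enough.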
Results

