- One action performed in the real world, from different initial beliefs
- Initial belief = Probability(state = tiger left)
- Number of episodes: 5,000
- Algorithms:
- Re-use computes the belief range for each leaf of the observation tree. It splits a leaf only if the distance between the belief ranges of the two resulting children would be below some (very low) threshold
- Deletion is the standard COMCTS algorithm
Legend
- Re-use is the solid line
- Deletion is the dotted line
- The color of each line segment (p1,p2) is the RGB-mixture of the average percentages of the selected actions at the points p1 and p2
- Black markers indicate the data points (i.e., the initial beliefs)
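
The coloring rule for a line segment can be sketched as follows. This is only a minimal illustration, not the plotting code used for the figure. It assumes three actions, that each action is mapped to one RGB channel (the order is an arbitrary choice here), that percentages are given on a 0 to 100 scale, and that "mixture" means the plain average of the two endpoints. The function name `segment_color` is hypothetical.

```python
from typing import Sequence, Tuple


def segment_color(p1_pct: Sequence[float],
                  p2_pct: Sequence[float]) -> Tuple[float, ...]:
    """Return an RGB triple in [0, 1] for the segment (p1, p2).

    p1_pct and p2_pct hold the average selection percentages (0-100)
    of three actions at the two endpoints. The channel assignment
    (first action -> red, second -> green, third -> blue) is an
    assumption made for this sketch.
    """
    if len(p1_pct) != 3 or len(p2_pct) != 3:
        raise ValueError("expected exactly three action percentages per point")
    # Average the two endpoints and rescale from percent to [0, 1].
    return tuple((a + b) / 200.0 for a, b in zip(p1_pct, p2_pct))


# Example: the first action is always chosen at p1, the second at p2.
color = segment_color((100.0, 0.0, 0.0), (0.0, 100.0, 0.0))
```

With this mapping, a segment whose endpoints both select a single action almost exclusively gets a nearly pure channel color, while a segment between points with different preferred actions gets a blend.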