Thursday, July 5, 2012

Feedback 05-07-2012

Progress
  • Implemented the transposition tree
    • Starts from a single leaf node which represents the complete belief space ([0, 1] in the Tiger problem)
    • Uses an F-test for splitting in the belief space
    • Collects test statistics for each action for the two "sides" of each test
    • Splits if there is a significant difference for any action between the two sides (e.g., between action 0 on the "left" side and action 0 on the "right" side)
  • Re-use of knowledge after a split:
    • Deletion (implemented)
    • Insertion of information from winning test into new action nodes (implemented)
    • Insert split tests into new leaves (not implemented)
    • Other strategies (like perfect recall)?
  • Problems:
    • All of the following problems concern the agent's true belief in the real environment at the current time step, and they occur with both deletion and insertion
    • If a split occurs near the end of the episode at a leaf whose range includes the agent's true belief, the algorithm cannot gather enough new data to give reliable estimates of each action's expected value in that range.
    • Splitting sometimes creates a leaf with a large range, e.g., [0.5000, 0.9170]. If the agent's true belief is then 0.91, the expected values of the actions stored at that leaf are again not correct for that belief.
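
The split decision described under Progress could be sketched roughly as follows. This is only an illustration, not the actual implementation: it assumes the F-test is the one-way ANOVA F statistic for two groups (the notes do not say which form is used), and the names f_statistic, should_split and f_critical are hypothetical. The fixed threshold stands in for a proper critical-value lookup at a chosen significance level and degrees of freedom.

```python
from statistics import mean


def f_statistic(left, right):
    """One-way ANOVA F statistic for two groups of sampled returns (assumed form)."""
    n_l, n_r = len(left), len(right)
    if n_l < 2 or n_r < 2:
        return 0.0  # too little data on one side to test anything
    m_l, m_r = mean(left), mean(right)
    grand = (sum(left) + sum(right)) / (n_l + n_r)
    # Between-group sum of squares (1 degree of freedom for two groups)
    ss_between = n_l * (m_l - grand) ** 2 + n_r * (m_r - grand) ** 2
    # Within-group sum of squares (n_l + n_r - 2 degrees of freedom)
    ss_within = (sum((x - m_l) ** 2 for x in left)
                 + sum((x - m_r) ** 2 for x in right))
    if ss_within == 0.0:
        # No variance inside the groups: any difference in means is "infinite"
        return float("inf") if ss_between > 0.0 else 0.0
    return ss_between / (ss_within / (n_l + n_r - 2))


def should_split(stats, f_critical=4.0):
    """stats maps each action to (returns on the left side, returns on the right side).

    Split as soon as any single action differs significantly between the sides.
    f_critical is a hypothetical fixed threshold, not a looked-up critical value.
    """
    return any(f_statistic(left, right) > f_critical
               for left, right in stats.values())


# Usage: one candidate test at a leaf, with statistics for two actions
stats = {
    0: ([-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]),  # same on both sides
    1: ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),        # differs between sides
}
print(should_split(stats))
```

Because the test fires on any one action, a single action whose value changes across the split point is enough to split the leaf, even if all other actions look identical on both sides.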
Planning
  • Implement other strategies
  • Implement an option to keep the first node from being split
