Feedback 05-07-2012
Progress
- Implemented the transposition tree
  - Starts from a single leaf node that represents the complete belief space ([0, 1] in the Tiger problem)
  - Uses an F-test to decide where to split the belief space
  - Collects test statistics for each action on both "sides" of each test
  - Splits if any action shows a significant difference between the two sides (e.g., between action 0 on the "left" side and action 0 on the "right" side)
- Re-use of knowledge after a split:
  - Deletion (implemented)
  - Insertion of information from the winning test into the new action nodes (implemented)
  - Insertion of split tests into the new leaves (not implemented)
  - Other strategies (e.g., perfect recall)?
- Problems:
  - The following problems concern the agent's true belief in the real environment at the current time step, and they occur with both deletion and insertion.
  - If a leaf whose range contains the agent's true belief is split near the end of the episode, the algorithm cannot gather enough new data in the remaining time steps to estimate the expected value of each action in that range reliably.
  - A split sometimes creates a leaf with a large range, e.g., [0.5000, 0.9170]. If the agent's true belief is then 0.91, the expected values of the actions are again inaccurate, because the leaf's estimates average over returns from its whole range.
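The split test and the two implemented re-use strategies can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the post: each side of a test keeps only a running count, sum and sum of squares of returns; each leaf has three evenly spaced candidate thresholds (the hypothetical `make_thresholds`); the F-test is the two-group one-way ANOVA F statistic compared against a fixed cutoff (`F_CRITICAL = 3.84`, roughly the 5% critical value of F(1, df2) for large df2) rather than a proper p-value; and each side needs at least `MIN_SAMPLES` returns. `Stats`, `Leaf`, `find_split` and `split` are hypothetical names. `split(t, insert=False)` corresponds to deletion, and `split(t, insert=True)` copies the winning test's per-side statistics into the new leaves. The new leaves always start with fresh split tests, since inserting split tests into new leaves is not implemented.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical fixed cutoff: about the 5% critical value of F(1, df2) for
# large df2. A fuller version would compute a p-value from the F distribution.
F_CRITICAL = 3.84
MIN_SAMPLES = 5  # hypothetical minimum number of returns on each side of a test


@dataclass
class Stats:
    """Running count, sum and sum of squares of observed returns."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.s += x
        self.ss += x * x

    def mean(self) -> float:
        return self.s / self.n if self.n else 0.0


def f_statistic(a: Stats, b: Stats) -> float:
    """Two-group one-way ANOVA F statistic (df1 = 1, df2 = n_a + n_b - 2)."""
    n = a.n + b.n
    grand = (a.s + b.s) / n
    between = a.n * (a.mean() - grand) ** 2 + b.n * (b.mean() - grand) ** 2
    within = (a.ss - a.s ** 2 / a.n) + (b.ss - b.s ** 2 / b.n)
    if within <= 1e-12:  # (near-)zero variance inside both groups
        return float("inf") if between > 1e-12 else 0.0
    return between / (within / (n - 2))


def make_thresholds(lo: float, hi: float, k: int = 3) -> List[float]:
    """Hypothetical choice: k evenly spaced candidate split points inside (lo, hi)."""
    return [lo + (hi - lo) * i / (k + 1) for i in range(1, k + 1)]


@dataclass
class Leaf:
    """A leaf covering the belief interval from lo to hi."""
    lo: float
    hi: float
    n_actions: int
    stats: List[Stats] = field(default_factory=list)  # value estimate per action
    # (threshold, action) -> (returns with belief < threshold, returns with belief >= threshold)
    tests: Dict[Tuple[float, int], Tuple[Stats, Stats]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.stats:
            self.stats = [Stats() for _ in range(self.n_actions)]
        for t in make_thresholds(self.lo, self.hi):
            for a in range(self.n_actions):
                self.tests[(t, a)] = (Stats(), Stats())

    def update(self, belief: float, action: int, ret: float) -> None:
        """Record a return in the action's estimate and on one side of each of its tests."""
        self.stats[action].add(ret)
        for (t, a), (left, right) in self.tests.items():
            if a == action:
                (left if belief < t else right).add(ret)

    def find_split(self) -> Optional[float]:
        """Threshold of the most significant test over all actions, or None."""
        best_t, best_f = None, F_CRITICAL
        for (t, _), (left, right) in self.tests.items():
            if left.n >= MIN_SAMPLES and right.n >= MIN_SAMPLES:
                f = f_statistic(left, right)
                if f > best_f:
                    best_t, best_f = t, f
        return best_t

    def split(self, t: float, insert: bool) -> Tuple["Leaf", "Leaf"]:
        """insert=False: deletion; insert=True: copy the winning test's side statistics."""
        if insert:
            left = [copy.copy(self.tests[(t, a)][0]) for a in range(self.n_actions)]
            right = [copy.copy(self.tests[(t, a)][1]) for a in range(self.n_actions)]
        else:
            left = [Stats() for _ in range(self.n_actions)]
            right = [Stats() for _ in range(self.n_actions)]
        # Both children get fresh split tests from __post_init__.
        return Leaf(self.lo, t, self.n_actions, left), Leaf(t, self.hi, self.n_actions, right)
```

Keeping only the count, sum and sum of squares per side means each update costs one addition per test of the chosen action, and these three numbers are enough to compute the F statistic exactly.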
Planning
- Implement other knowledge re-use strategies
- Implement a variant that does not split the first node