Feedback 05-03-2012
Parser
- Since the problems do not contain continuous actions or observations, implementing an RL-Glue parser for such POMDP problems seems redundant
Paper by D. Silver and J. Veness ("Monte-Carlo Planning in Large POMDPs")
- Combination of:
- MCTS (UCT) for optimal action selection
- Particle filter for belief state approximation
- Same simulations used for both techniques
- Search tree is based on:
- Histories in the nodes
- Actions and observations in the edges (i.e. "action" edges followed by "observation" edges)
- Each node is coupled to a set of particles which approximate the belief state
- The algorithm (called POMCP) can make use of domain knowledge
- POMCP performs well in discrete state spaces with up to 10^56 states and beats other online and offline planning algorithms
- Possibility 1: Extend to continuous observations
- Possibility 2: Build the tree on action-observation pairs
- Possibility 3: Use a Bayes or a Kalman filter for the belief state approximation (although I think a particle filter is already the best choice)
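The two building blocks above (UCT action selection in the tree, plus the unweighted particle filter for the belief state) can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation: the node representation as a dict of per-action statistics, the function names, and the black-box simulator signature `step(s, a) -> (s', o)` are all my own assumptions.

```python
import math
import random

def uct_select(node, c=1.0):
    """Pick the action edge maximizing the UCB1 score, as in UCT/POMCP.
    `node` maps action -> {'n': visit count, 'v': mean value} (assumed layout)."""
    total = sum(stats['n'] for stats in node.values())
    def score(stats):
        if stats['n'] == 0:
            return float('inf')  # prefer untried actions first
        return stats['v'] + c * math.sqrt(math.log(total) / stats['n'])
    return max(node, key=lambda a: score(node[a]))

def update_belief(particles, action, observation, step, n=100):
    """Approximate belief update with an unweighted particle filter:
    resample a state from the current particle set, simulate one step with
    the black-box model `step(s, a) -> (s', o)`, and keep the successor
    state only if its simulated observation matches the real one
    (rejection sampling on observations)."""
    new_particles = []
    while len(new_particles) < n:
        s = random.choice(particles)
        s2, o = step(s, action)
        if o == observation:
            new_particles.append(s2)
    return new_particles
```

In the full algorithm these are interleaved: the same simulations that drive the UCT search also populate the particle sets at the nodes, which is the "same simulations used for both techniques" point above.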
Planning
- Preparation for meeting on Wednesday (March 7, 2012, 11:00)