Monday, March 5, 2012

Feedback 05-03-2012

Parser
  • Since the benchmark problems do not contain continuous actions or observations, I think it is unnecessary to implement an RL-Glue parser for such POMDP problems
Paper by D. Silver ("Monte-Carlo Planning in Large POMDPs", Silver & Veness, NIPS 2010)
  • Combination of: 
    • MCTS (UCT) for optimal action selection
    • Particle filter for belief state approximation
  • The same simulations are used for both techniques (see the sketch after this list)
  • Search tree is based on: 
    • Histories in the nodes
    • Actions and observations in the edges (i.e. "action" edges followed by "observation" edges)
  • Each node is coupled to a set of particles that approximates the belief state (see the belief-update sketch below)
  • The algorithm (called POMCP) can make use of domain knowledge
  • POMCP performs well in discrete state spaces with up to 10^56 states and outperforms other online and offline planning algorithms
  • Possibility 1: Extend to continuous observations
  • Possibility 2: Build the tree on action-observation pairs
  • Possibility 3: Use a Bayes filter or a Kalman filter for the belief state approximation (although I think a particle filter is already the best choice)
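
To make the combination concrete, here is a minimal Python sketch of one POMCP simulation. It assumes a black-box generative model step(state, action) -> (next_state, observation, reward) with discrete (hashable) observations; the names Node, simulate, rollout, C_UCB, GAMMA and EPS are mine, not taken from the paper.

import math
import random

C_UCB = 1.0    # UCT exploration constant (tuned per domain in the paper)
GAMMA = 0.95   # discount factor (assumed here)
EPS = 1e-2     # stop a simulation once GAMMA**depth < EPS

class Node:
    """Tree node for one history h; edges alternate actions and observations."""
    def __init__(self):
        self.N = 0           # visit count N(h)
        self.V = 0.0         # value estimate V(h)
        self.children = {}   # action -> Node on one level, observation -> Node on the next
        self.particles = []  # unweighted states approximating the belief at h

def ucb_action(node, actions):
    # UCT: maximize V(ha) + c * sqrt(log N(h) / N(ha)); untried actions first
    def score(a):
        child = node.children.get(a)
        if child is None or child.N == 0:
            return float("inf")
        return child.V + C_UCB * math.sqrt(math.log(node.N) / child.N)
    return max(actions, key=score)

def rollout(state, step, actions, depth):
    # Value estimate for a newly added leaf, using a uniformly random
    # rollout policy (the paper also allows a domain-informed one)
    if GAMMA ** depth < EPS:
        return 0.0
    a = random.choice(actions)
    next_state, _, reward = step(state, a)
    return reward + GAMMA * rollout(next_state, step, actions, depth + 1)

def simulate(state, node, step, actions, depth):
    # One simulation grows the UCT tree AND refills the particle sets,
    # which is exactly the "same simulations for both techniques" point
    if GAMMA ** depth < EPS:
        return 0.0
    a = ucb_action(node, actions)
    next_state, obs, reward = step(state, a)
    action_node = node.children.setdefault(a, Node())
    is_new = obs not in action_node.children
    obs_node = action_node.children.setdefault(obs, Node())
    obs_node.particles.append(next_state)  # simulation doubles as filtering
    if is_new:
        future = rollout(next_state, step, actions, depth + 1)
    else:
        future = simulate(next_state, obs_node, step, actions, depth + 1)
    R = reward + GAMMA * future
    node.N += 1
    action_node.N += 1
    action_node.V += (R - action_node.V) / action_node.N  # incremental mean
    return R

A planning step runs many such simulations from states sampled from the root's particle set and then executes the action whose child node has the highest value estimate.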
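For the particle sets themselves, the belief update after the real action and observation can be sketched as rejection sampling: push particles through the same simulator and keep the successors whose simulated observation matches the real one. The paper additionally reinvigorates the particle set when too few survive; the max_tries cutoff below is my own simplification (reusing the imports above).

def update_belief(particles, real_action, real_obs, step, n_target, max_tries=100000):
    # Rejection-style belief update B(h) -> B(hao), a minimal sketch
    survivors = []
    tries = 0
    while len(survivors) < n_target and tries < max_tries:
        s = random.choice(particles)       # sample from the current belief
        s2, obs, _ = step(s, real_action)  # propagate through the simulator
        if obs == real_obs:
            survivors.append(s2)           # keep only matching successors
        tries += 1
    return survivors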
Planning
  • Preparation for meeting on Wednesday (March 7, 2012, 11:00)
