Wednesday, February 29, 2012

Meeting 29-02-2012

  • We will use RL-Glue as the environment framework for the following reasons:
    • It allows partial observability
    • Colin and Lukas are using it as well so we can interchange environments and agents to compare our approaches
    • It's fairly simple to use and supports a variety of programming languages
  • I have implemented the Tiger Problem in RL-Glue with hard-coded world dynamics
  • A parser for the POMDP problem files on T. Cassandra's website:
    • Cassandra's own software can serve as a reference
    • Might not be necessary if the focus is on continuous actions/observations, because all problems on the website have only discrete actions/observations
    • POMDPs with continuous actions/observations could simply be hard-coded in RL-Glue
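As a concrete reference for the hard-coded dynamics mentioned above, a minimal sketch of the Tiger Problem in plain Python (the RL-Glue boilerplate is omitted, and the class and method names are illustrative, not those of the actual implementation; the standard parameters are assumed: listening costs -1 and reports the correct side with probability 0.85, opening the wrong door costs -100, the correct door yields +10):

```python
import random

# States, actions, and observations of the Tiger Problem.
TIGER_LEFT, TIGER_RIGHT = "tiger-left", "tiger-right"
LISTEN, OPEN_LEFT, OPEN_RIGHT = "listen", "open-left", "open-right"
HEAR_LEFT, HEAR_RIGHT = "hear-left", "hear-right"

LISTEN_ACCURACY = 0.85  # probability that a listen observation is correct

class TigerEnv:
    """Hard-coded world dynamics of the Tiger Problem."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])

    def step(self, action):
        """Return (observation, reward) for one action."""
        if action == LISTEN:
            correct = HEAR_LEFT if self.state == TIGER_LEFT else HEAR_RIGHT
            wrong = HEAR_RIGHT if correct == HEAR_LEFT else HEAR_LEFT
            obs = correct if self.rng.random() < LISTEN_ACCURACY else wrong
            return obs, -1.0
        # Opening a door: -100 if the tiger is behind it, +10 otherwise.
        tiger_door = OPEN_LEFT if self.state == TIGER_LEFT else OPEN_RIGHT
        reward = -100.0 if action == tiger_door else 10.0
        # The tiger is re-placed uniformly at random; the accompanying
        # observation carries no information.
        self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
        return self.rng.choice([HEAR_LEFT, HEAR_RIGHT]), reward
```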
  • Ideas for tree search in partially observable environments:
    1. Tree based on observations (unclear how to choose the action; hard to realize in a simple way)
    2. Tree based on action-observation pairs
    3. Tree in which each action is followed by an observation
    4. Relational learning to find similar histories in non-Markovian environments (e.g., a relational language which is able to deal with the similarities/differences between actions)
    5. POMDP (with discrete actions, discrete observations) as MDP with continuous belief state space
      • Work by D. Silver [1] is state-of-the-art and can serve as a benchmark
      • Could be extended to continuous observations later on
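Ideas 2 and 3 both amount to a tree indexed by action-observation histories; a minimal sketch of such a node (the names below are illustrative assumptions, not a committed design):

```python
class HistoryNode:
    """A search-tree node indexed by an action-observation history.

    Each node corresponds to one history h = (a1, o1, a2, o2, ...);
    a child is reached by appending one (action, observation) pair,
    i.e. an action edge followed by an observation edge (idea 3 above).
    """

    def __init__(self, history=()):
        self.history = history
        self.children = {}   # (action, observation) -> HistoryNode
        self.visits = 0
        self.value = 0.0     # running value estimate for this history

    def child(self, action, observation):
        """Return (creating it if needed) the node for history + (a, o)."""
        key = (action, observation)
        if key not in self.children:
            self.children[key] = HistoryNode(self.history + (key,))
        return self.children[key]
```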
  • Possible values to look at in the tree search:
    • Value of state V(s)
    • Value of state-action pair Q(s,a)
    • Value of belief state-action pair Q(b,a)
  • Kurt explained TLS (Tree Learning Search) using the sine-function example [2]
  • Michael discussed the Tiger Problem (slide 31)
  • Frans elaborated on the belief update and the value calculation in POMDPs (see above)
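For the Tiger Problem, the belief update discussed above reduces to Bayes' rule over the two states, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s); under `listen` the state does not change, so the transition sum drops out. A minimal sketch, assuming the standard Tiger parameters (0.85 listen accuracy, +10/-100 door rewards):

```python
LISTEN_ACCURACY = 0.85  # P(hear-left | tiger-left, listen)

def belief_update_listen(b_left, obs):
    """Posterior P(tiger-left) after one listen observation.

    b_left is the prior P(tiger-left); obs is "hear-left" or "hear-right".
    Since `listen` leaves the state unchanged, this is plain Bayes' rule:
        b'(s) = O(o | s) * b(s) / sum_s' O(o | s') * b(s')
    """
    p_obs_left = LISTEN_ACCURACY if obs == "hear-left" else 1 - LISTEN_ACCURACY
    p_obs_right = 1 - p_obs_left
    numerator = p_obs_left * b_left
    return numerator / (numerator + p_obs_right * (1 - b_left))

def q_open_left(b_left):
    """One-step value Q(b, open-left): -100 if the tiger is left, +10 if right."""
    return b_left * -100.0 + (1 - b_left) * 10.0
```

Starting from the uniform belief b = 0.5, one hear-left observation gives b' = 0.85, and a second consistent observation pushes it to 0.85² / (0.85² + 0.15²) ≈ 0.970.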
  • Create Blog (done)
  • Post minutes of meeting (done)
  • Work out an example of the algorithm for search in continuous belief space (proposal number 5 above): problem definition, goal, input, and output for the Tiger Problem with a small horizon
  • Read the paper by D. Silver [1]
  • Look into parser and use the software by T. Cassandra as a reference
  • Meeting with Frans next Tuesday (March 6, 2012, 15:00)
  • Meeting with Frans, Kurt, Michael, and with Lukas and Colin, who work on the same topic, next Wednesday (March 7, 2012, 11:00)
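As a starting point for the work item on searching the continuous belief space (proposal number 5), a small-horizon exhaustive search over Tiger beliefs can be sketched as follows: the belief b = P(tiger-left) is the continuous state, actions are maximized over, and observations are averaged over with a Bayes update in between. This is only an illustrative sketch under the standard Tiger parameters, not a finished algorithm (door actions are treated as terminal, and no discounting is used):

```python
ACC = 0.85  # listen accuracy, P(hear-left | tiger-left, listen)

def value(b_left, horizon):
    """Optimal value of belief b = P(tiger-left) with `horizon` steps left."""
    if horizon == 0:
        return 0.0
    # Opening a door ends the episode: -100 on the tiger's side, +10 otherwise.
    q_open_left = b_left * -100.0 + (1 - b_left) * 10.0
    q_open_right = b_left * 10.0 + (1 - b_left) * -100.0
    # Listening costs -1, then we take the expectation over both observations,
    # each followed by a Bayes belief update (the state is unchanged).
    p_hear_left = ACC * b_left + (1 - ACC) * (1 - b_left)
    b_after_left = ACC * b_left / p_hear_left
    b_after_right = (1 - ACC) * b_left / (1 - p_hear_left)
    q_listen = -1.0 + (p_hear_left * value(b_after_left, horizon - 1)
                       + (1 - p_hear_left) * value(b_after_right, horizon - 1))
    return max(q_open_left, q_open_right, q_listen)
```

From the uniform belief, one or two steps only buy listening (value -1 and -2), while three steps are enough for two consistent observations to make opening a door worthwhile, so the value turns positive.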
[1] David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems 23 (NIPS), 2010.
[2] Guy Van den Broeck and Kurt Driessens. Automatic discretization of actions and states in Monte-Carlo tree search. In Proceedings of the ECML/PKDD 2011 Workshop on Machine Learning and Data Mining in and around Games, pages 1–12, September 2011.