- RL-Glue as an environment for the following reasons:
  - It allows partial observability
  - Colin and Lukas are using it as well so we can interchange environments and agents to compare our approaches
  - It's fairly simple to use and supports a variety of programming languages
- I've implemented the Tiger Problem in RL-Glue with hard-coded world dynamics (a sketch of comparable dynamics is given below)
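As a point of reference, here is a minimal, self-contained sketch of hard-coded Tiger Problem dynamics in Python. The class, method, and parameter names are illustrative assumptions, not the actual RL-Glue implementation; in RL-Glue the same logic would sit behind the standard environment interface (env_init/env_start/env_step).

```python
import random

# Illustrative sketch of hard-coded Tiger Problem dynamics (names and
# structure are assumptions, not the actual RL-Glue environment code).
class TigerDynamics:
    STATES = ("tiger-left", "tiger-right")
    ACTIONS = ("listen", "open-left", "open-right")
    OBSERVATIONS = ("hear-left", "hear-right")

    def __init__(self, listen_accuracy=0.85, listen_cost=-1.0,
                 tiger_penalty=-100.0, treasure_reward=10.0):
        self.listen_accuracy = listen_accuracy
        self.listen_cost = listen_cost
        self.tiger_penalty = tiger_penalty
        self.treasure_reward = treasure_reward
        self.reset()

    def reset(self):
        # The tiger is placed behind the left or right door with equal probability.
        self.state = random.choice(self.STATES)

    def step(self, action):
        """Return (observation, reward); the problem resets after a door is opened."""
        if action == "listen":
            # Hear the tiger on the correct side with probability listen_accuracy.
            if random.random() < self.listen_accuracy:
                obs = "hear-left" if self.state == "tiger-left" else "hear-right"
            else:
                obs = "hear-right" if self.state == "tiger-left" else "hear-left"
            return obs, self.listen_cost

        opened_tiger_door = (
            (action == "open-left" and self.state == "tiger-left") or
            (action == "open-right" and self.state == "tiger-right"))
        reward = self.tiger_penalty if opened_tiger_door else self.treasure_reward
        self.reset()  # the episode restarts after opening a door
        # The observation after opening a door carries no information.
        return random.choice(self.OBSERVATIONS), reward
```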
- Parser for the POMDP problems on T. Cassandra's website (a parsing sketch is given below)
  - Cassandra's software can serve as a reference
  - Might not be necessary if the focus is on continuous actions/observations, because all problems on the website use only discrete actions/observations
  - POMDPs with continuous actions/observations could simply be hard-coded in RL-Glue
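As an illustration (not a full parser), the sketch below reads only the preamble of a `.pomdp` file in Cassandra's format, i.e. the `discount`, `values`, `states`, `actions`, and `observations` declarations. The `T:`/`O:`/`R:` entries that define the dynamics are left to a complete parser such as Cassandra's own software; the function name and the comment handling are assumptions.

```python
# Illustrative sketch: read the preamble of a .pomdp file (Cassandra's format).
# Transition/observation/reward entries (T:/O:/R: lines) are not parsed here.
def parse_pomdp_preamble(path):
    spec = {}
    with open(path) as f:
        for line in f:
            line = line.split("#")[0].strip()  # drop comments and blank lines
            if not line:
                continue
            for key in ("discount", "values", "states", "actions", "observations"):
                prefix = key + ":"
                if line.startswith(prefix):
                    value = line[len(prefix):].strip()
                    if key == "discount":
                        spec[key] = float(value)
                    elif key == "values":
                        spec[key] = value  # "reward" or "cost"
                    else:
                        # Either an integer count or a whitespace-separated list of names.
                        spec[key] = int(value) if value.isdigit() else value.split()
    return spec
```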
- Ideas for tree search in partially observable environments:
  - Tree based on observations (unclear how to choose the action; hard to realize in a simple way)
  - Tree based on action-observation pairs
  - Tree in which each action is followed by an observation (see the node sketch after this list)
  - Relational learning to find similar histories in non-Markovian environments (e.g., a relational language that can capture the similarities/differences between actions)
  - POMDP (with discrete actions and discrete observations) treated as an MDP with a continuous belief state space
    - Work by D. Silver [1] is state-of-the-art and can serve as a benchmark
    - Could be extended to continuous observations later on
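To make the third idea more concrete, here is a minimal sketch of a tree in which layers of action nodes and observation nodes alternate along the action-observation history. The node structure and all names are assumptions for illustration, not a specific algorithm discussed in the meeting.

```python
# Illustrative node structures for a search tree with alternating layers of
# action nodes and observation nodes (all names here are assumptions).
class ObservationNode:
    """Node reached after an observation; its children index the actions tried next."""
    def __init__(self):
        self.visits = 0
        self.action_children = {}        # action -> ActionNode

class ActionNode:
    """Node reached after taking an action; its children index the observations seen next."""
    def __init__(self):
        self.visits = 0
        self.total_return = 0.0          # running sum used to estimate the action value
        self.observation_children = {}   # observation -> ObservationNode

def descend(root, history):
    """Follow an alternating action/observation history from the root,
    creating the nodes along the way, and return the final observation node."""
    node = root
    for action, observation in history:
        action_node = node.action_children.setdefault(action, ActionNode())
        node = action_node.observation_children.setdefault(observation, ObservationNode())
    return node
```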
- Possible values to look at in the tree search:
  - Value of a state, V(s)
  - Value of a state-action pair, Q(s,a)
  - Value of a belief state-action pair, Q(b,a)
- Kurt explained TLS (Tree Learning Search) using the sine function example [2]
- Michael discussed the Tiger Problem (slide 31)
- Frans elaborated on the belief update and the value calculation in POMDPs (see above); the standard formulas are reproduced below for reference
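For reference, with transition model T(s'|s,a), observation model O(o|s',a), reward R(s,a), and discount factor γ, the standard belief update and the corresponding value recursion are:

```latex
% Belief update after taking action a and observing o:
b'_{a,o}(s') = \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}{P(o \mid b, a)},
\qquad
P(o \mid b, a) = \sum_{s'} O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)

% Value of a belief-action pair and of a belief:
Q(b, a) = \sum_{s} b(s)\, R(s, a) + \gamma \sum_{o} P(o \mid b, a)\, V(b'_{a,o}),
\qquad
V(b) = \max_{a} Q(b, a)
```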
- Create Blog (done)
- Post minutes of meeting (done)
- Work out an example of the algorithm for search in continuous belief space (proposal number 5 above), giving the problem definition, goal, input, and output for the Tiger Problem with a small horizon (see the belief-update sketch after this list)
- Read the paper by D. Silver [1]
- Look into the parser and use T. Cassandra's software as a reference
- Meeting with Frans next Tuesday (March 6, 2012, 15:00)
- Meeting with Frans, Kurt, Michael, and with Lukas and Colin, who work on the same topic, next Wednesday (March 7, 2012, 11:00)
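As a starting point for the belief-space example mentioned in the action items, here is a hypothetical helper that performs the exact belief update for the Tiger Problem, where the belief can be summarized by a single number p = P(tiger-left). The function name is an assumption; the 0.85 listening accuracy is the value commonly used for this problem.

```python
# Hypothetical helper: exact belief update for the Tiger Problem,
# where the belief is the single number p = P(tiger-left).
def tiger_belief_update(p, action, observation, listen_accuracy=0.85):
    """Return the updated P(tiger-left) after taking `action` and seeing `observation`."""
    if action != "listen":
        # Opening a door resets the problem, so the belief returns to uniform.
        return 0.5
    # Likelihood of the observation under each hidden state.
    like_left = listen_accuracy if observation == "hear-left" else 1.0 - listen_accuracy
    like_right = 1.0 - like_left
    evidence = like_left * p + like_right * (1.0 - p)  # P(observation | belief, listen)
    return like_left * p / evidence

# Example: starting from a uniform belief, hearing the tiger on the left twice
# pushes P(tiger-left) from 0.5 to about 0.85 and then to about 0.97.
p = 0.5
p = tiger_belief_update(p, "listen", "hear-left")   # ~0.85
p = tiger_belief_update(p, "listen", "hear-left")   # ~0.97
```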
[2] Guy Van den Broeck and Kurt Driessens. Automatic discretization of actions and states in Monte-Carlo tree search. In Proceedings of the ECML/PKDD 2011 Workshop on Machine Learning and Data Mining in and around Games, pages 1–12, September 2011.