Wednesday, February 29, 2012

Meeting 29-02-2012

  • We will use RL-Glue as the environment framework for the following reasons:
    • It allows partial observability
    • Colin and Lukas are using it as well so we can interchange environments and agents to compare our approaches
    • It's fairly simple to use and supports a variety of programming languages
  • I have implemented the Tiger Problem in RL-Glue with hard-coded world dynamics
  • A parser for the POMDP problem files on T. Cassandra's website:
    • Cassandra's own software can serve as a reference
    • Might not be necessary if the focus is on continuous actions/observations, because all problems on the website have only discrete actions/observations
    • POMDPs with continuous actions/observations could simply be hard-coded in RL-Glue
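As a concrete reference for the hard-coded dynamics mentioned above, a minimal sketch of the Tiger Problem in plain Python (the RL-Glue boilerplate is omitted, and the class and method names are illustrative, not those of the actual implementation; the standard parameters are assumed: listening costs -1 and reports the correct side with probability 0.85, opening the wrong door costs -100, the correct door yields +10):

```python
import random

# States, actions, and observations of the Tiger Problem.
TIGER_LEFT, TIGER_RIGHT = "tiger-left", "tiger-right"
LISTEN, OPEN_LEFT, OPEN_RIGHT = "listen", "open-left", "open-right"
HEAR_LEFT, HEAR_RIGHT = "hear-left", "hear-right"

LISTEN_ACCURACY = 0.85  # probability that a listen observation is correct

class TigerEnv:
    """Hard-coded world dynamics of the Tiger Problem."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])

    def step(self, action):
        """Return (observation, reward) for one action."""
        if action == LISTEN:
            correct = HEAR_LEFT if self.state == TIGER_LEFT else HEAR_RIGHT
            wrong = HEAR_RIGHT if correct == HEAR_LEFT else HEAR_LEFT
            obs = correct if self.rng.random() < LISTEN_ACCURACY else wrong
            return obs, -1.0
        # Opening a door: -100 if the tiger is behind it, +10 otherwise.
        tiger_door = OPEN_LEFT if self.state == TIGER_LEFT else OPEN_RIGHT
        reward = -100.0 if action == tiger_door else 10.0
        # The tiger is re-placed uniformly at random; the accompanying
        # observation carries no information.
        self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
        return self.rng.choice([HEAR_LEFT, HEAR_RIGHT]), reward
```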
  • Ideas for tree search in partially observable environments:
    1. Tree based on observations (unclear how to choose the action; hard to realize in a simple way)
    2. Tree based on action-observation pairs
    3. Tree in which each action is followed by an observation
    4. Relational learning to find similar histories in non-Markovian environments (e.g., a relational language which is able to deal with the similarities/differences between actions)
    5. POMDP (with discrete actions, discrete observations) as MDP with continuous belief state space
      • Work by D. Silver [1] is state-of-the-art and can serve as a benchmark
      • Could be extended to continuous observations later on
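Ideas 2 and 3 both amount to a tree indexed by action-observation histories; a minimal sketch of such a node (the names below are illustrative assumptions, not a committed design):

```python
class HistoryNode:
    """A search-tree node indexed by an action-observation history.

    Each node corresponds to one history h = (a1, o1, a2, o2, ...);
    a child is reached by appending one (action, observation) pair,
    i.e. an action edge followed by an observation edge (idea 3 above).
    """

    def __init__(self, history=()):
        self.history = history
        self.children = {}   # (action, observation) -> HistoryNode
        self.visits = 0
        self.value = 0.0     # running value estimate for this history

    def child(self, action, observation):
        """Return (creating it if needed) the node for history + (a, o)."""
        key = (action, observation)
        if key not in self.children:
            self.children[key] = HistoryNode(self.history + (key,))
        return self.children[key]
```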
  • Possible values to look at in the tree search:
    • Value of state V(s)
    • Value of state-action pair Q(s,a)
    • Value of belief state-action pair Q(b,a)
  • Kurt explained TLS (Tree Learning Search) using the sine-function example [2]
  • Michael discussed the Tiger Problem (slide 31)
  • Frans elaborated on the belief update and the value calculation in POMDPs (see above)
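For the Tiger Problem, the belief update discussed above reduces to Bayes' rule over the two states, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s); under `listen` the state does not change, so the transition sum drops out. A minimal sketch, assuming the standard Tiger parameters (0.85 listen accuracy, +10/-100 door rewards):

```python
LISTEN_ACCURACY = 0.85  # P(hear-left | tiger-left, listen)

def belief_update_listen(b_left, obs):
    """Posterior P(tiger-left) after one listen observation.

    b_left is the prior P(tiger-left); obs is "hear-left" or "hear-right".
    Since `listen` leaves the state unchanged, this is plain Bayes' rule:
        b'(s) = O(o | s) * b(s) / sum_s' O(o | s') * b(s')
    """
    p_obs_left = LISTEN_ACCURACY if obs == "hear-left" else 1 - LISTEN_ACCURACY
    p_obs_right = 1 - p_obs_left
    numerator = p_obs_left * b_left
    return numerator / (numerator + p_obs_right * (1 - b_left))

def q_open_left(b_left):
    """One-step value Q(b, open-left): -100 if the tiger is left, +10 if right."""
    return b_left * -100.0 + (1 - b_left) * 10.0
```

Starting from the uniform belief b = 0.5, one hear-left observation gives b' = 0.85, and a second consistent observation pushes it to 0.85² / (0.85² + 0.15²) ≈ 0.970.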
  • Create Blog (done)
  • Post minutes of meeting (done)
  • Work out an example of the algorithm for search in continuous belief space (proposal number 5 above): problem definition, goal, input, and output for the Tiger Problem with a small horizon
  • Read the paper by D. Silver [1]
  • Look into parser and use the software by T. Cassandra as a reference
  • Meeting with Frans next Tuesday (March 6, 2012, 15:00)
  • Meeting with Frans, Kurt, Michael, and with Lukas and Colin, who work on the same topic, next Wednesday (March 7, 2012, 11:00)
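As a starting point for the work item on searching the continuous belief space (proposal number 5), a small-horizon exhaustive search over Tiger beliefs can be sketched as follows: the belief b = P(tiger-left) is the continuous state, actions are maximized over, and observations are averaged over with a Bayes update in between. This is only an illustrative sketch under the standard Tiger parameters, not a finished algorithm (door actions are treated as terminal, and no discounting is used):

```python
ACC = 0.85  # listen accuracy, P(hear-left | tiger-left, listen)

def value(b_left, horizon):
    """Optimal value of belief b = P(tiger-left) with `horizon` steps left."""
    if horizon == 0:
        return 0.0
    # Opening a door ends the episode: -100 on the tiger's side, +10 otherwise.
    q_open_left = b_left * -100.0 + (1 - b_left) * 10.0
    q_open_right = b_left * 10.0 + (1 - b_left) * -100.0
    # Listening costs -1, then we take the expectation over both observations,
    # each followed by a Bayes belief update (the state is unchanged).
    p_hear_left = ACC * b_left + (1 - ACC) * (1 - b_left)
    b_after_left = ACC * b_left / p_hear_left
    b_after_right = (1 - ACC) * b_left / (1 - p_hear_left)
    q_listen = -1.0 + (p_hear_left * value(b_after_left, horizon - 1)
                       + (1 - p_hear_left) * value(b_after_right, horizon - 1))
    return max(q_open_left, q_open_right, q_listen)
```

From the uniform belief, one or two steps only buy listening (value -1 and -2), while three steps are enough for two consistent observations to make opening a door worthwhile, so the value turns positive.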
[1] David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems 23 (NIPS), 2010.
[2] Guy Van den Broeck and Kurt Driessens. Automatic discretization of actions and states in Monte-Carlo tree search. In Proceedings of the ECML/PKDD 2011 Workshop on Machine Learning and Data Mining in and around Games, pages 1–12, September 2011.