Friday, May 25, 2012

Minutes: Joint Meeting

Discussion 
  • Planning
    • We reviewed our planning based on the Gantt chart made for the thesis plan
      • Update chart if necessary (like in my case)
    • We discussed the graduation date and the review and hand-in of the thesis (and of its draft versions)
    • Hand-in of the thesis draft in three weeks
  • Results
    • We elaborated on our current results
    • I should make regret plots (optimal return minus mean return) to show the differences between the algorithms
    • Try to add optimal lines to the grid world plots
    • Michael wants to think about a simple environment to show the difference between MCTS and MC
  • Other
    • We discussed the following "transposition tree":
      • Assume the state / observation space to be Markovian
      • The algorithm builds a tree consisting of two parts:
        • An "observation tree" which provides a discretization of the observation space (or the state space). This tree is conceptually the same as the observation tree in COMCTS
        • For each leaf in the observation tree, there is an action node which keeps the UCB knowledge about the actions for the observation range represented by that leaf. (In the case of continuous actions, this action node is replaced by an action tree which provides a discretization of the action space.)
      • The algorithm starts with one leaf in the observation tree connected to one action node.
      • Splits can be decided based on a simple criterion such as a visit count [1] or a more informative one such as the statistics used in the TG algorithm [2]
      • Because of the Markov assumption made above, this tree can be re-used after a step in the real world is taken. However, in POMDPs the observation may depend on the action taken, which this algorithm does not consider.
    • Complete recall: reconstruct the tree only on the first level
      • Keep one list of samples, with pointers to the correct depth and to the correct examples
    • Backpropagation: 
      • first update all UCB knowledge (bottom to top)
      • after that check for a split (top to bottom)
      • this avoids splits deep in the tree that would become redundant if a split also occurs closer to the root
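The "transposition tree" and the two-pass backpropagation above can be sketched roughly as follows. This is a minimal illustration only, assuming a 1-D Markovian observation space, discrete actions, and the simple count-based split criterion of [1]; all class, function, and constant names are hypothetical, and for simplicity a split leaf's collected action knowledge is discarded rather than rebuilt from stored samples as in the "complete recall" idea:

```python
import math
import random

SPLIT_THRESHOLD = 20  # illustrative: split a leaf once it has this many visits

class ActionNode:
    """Keeps UCB statistics for the actions of one observation-tree leaf."""
    def __init__(self, n_actions):
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions

    def select(self, c=1.0):
        # Try each action once before applying the UCB formula.
        for a, n in enumerate(self.counts):
            if n == 0:
                return a
        total = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + c * math.sqrt(math.log(total) / self.counts[a]))

    def update(self, action, reward):
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

class ObsNode:
    """A node of the observation tree; each leaf owns an ActionNode."""
    def __init__(self, low, high, n_actions):
        self.low, self.high = low, high
        self.n_actions = n_actions
        self.action_node = ActionNode(n_actions)  # present only at leaves
        self.visits = 0
        self.children = None  # (left, right) after a split

    def leaf_for(self, obs):
        node = self
        while node.children is not None:
            mid = (node.low + node.high) / 2
            node = node.children[0] if obs < mid else node.children[1]
        return node

    def maybe_split(self):
        # Top-down pass: check the shallowest nodes first, so deeper splits
        # that a shallower split would make redundant are never created.
        if self.children is None:
            if self.visits >= SPLIT_THRESHOLD:
                mid = (self.low + self.high) / 2
                self.children = (ObsNode(self.low, mid, self.n_actions),
                                 ObsNode(mid, self.high, self.n_actions))
                self.action_node = None  # sketch: discard, rather than recall
        else:
            for child in self.children:
                child.maybe_split()

def simulate_step(root, obs, reward_fn):
    """One sample: select an action by UCB, then do the two-pass update."""
    leaf = root.leaf_for(obs)
    action = leaf.action_node.select()
    reward = reward_fn(obs, action)
    # Pass 1 (bottom to top): update UCB knowledge and visit counts.
    leaf.action_node.update(action, reward)
    node = root
    while node is not None:
        node.visits += 1
        if node.children is None:
            node = None
        else:
            mid = (node.low + node.high) / 2
            node = node.children[0] if obs < mid else node.children[1]
    # Pass 2 (top to bottom): check for splits.
    root.maybe_split()
    return action, reward
```

The tree starts, as described above, with a single observation-tree leaf connected to a single action node; `simulate_step` then grows the discretization as samples accumulate.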
Planning
  1. Update Gantt chart
  2. Plotting: 
    • Regret plots
    • Optimal curves for grid world
  3. Think further about "transposition trees" for POMDPs
  4. Think further about re-using and sharing trees based on belief state distance
  5. Adapt Wumpus world to continuous observations
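For items 1-2 of the plotting work, a minimal sketch of the regret plot (optimal return minus mean return, one curve per algorithm) could look as follows; the function names and the `regret.png` output file are illustrative, and matplotlib is assumed to be available:

```python
import numpy as np

def compute_regret(optimal, mean_returns):
    """Regret per episode: the optimal return minus the mean return achieved."""
    return optimal - np.asarray(mean_returns, dtype=float)

def plot_regret(optimal, results, out="regret.png"):
    """Plot one regret curve per algorithm.

    `results` maps an algorithm name to its per-episode mean returns.
    """
    import matplotlib
    matplotlib.use("Agg")  # render off-screen
    import matplotlib.pyplot as plt

    for name, means in results.items():
        regret = compute_regret(optimal, means)
        plt.plot(np.arange(1, len(regret) + 1), regret, label=name)
    plt.axhline(0.0, color="gray", linestyle="--")  # zero regret = optimal
    plt.xlabel("episode")
    plt.ylabel("regret (optimal - mean return)")
    plt.legend()
    plt.savefig(out)
    plt.close()
```

The same `axhline` call, with the optimal return as its value, can serve to mark the optimal lines in the grid-world plots.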


[1] A. Nouri and M. L. Littman. Multi-resolution exploration in continuous spaces. In Advances in Neural Information Processing Systems 21 (NIPS 2008), pp. 1209-1216, 2008.

[2] K. Driessens, J. Ramon, and H. Blockeel. Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. In Machine Learning: ECML 2001, pp. 97-108, 2001.
