Wednesday, December 19, 2012

Progress 19-12-2012

Discretization for POMCP
  • Equal width binning
    • Predefined number of cut points per dimension (m)
    • Number of dimensions (n)
    • For each sequence node, there are (m+1)^n history nodes => lots of nodes!
    • The same discretization is used for the tree and for updating the particle filter
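
As a rough illustration of the binning above, here is a minimal Python sketch. It assumes each observation dimension has known bounds (the `low` and `high` arguments, which these notes do not specify) and that out-of-range values are clamped to the outermost bins. The function names `discretize` and `num_history_nodes` are made up for this example and are not taken from the actual implementation.

```python
from typing import Sequence, Tuple


def discretize(obs: Sequence[float],
               low: Sequence[float],
               high: Sequence[float],
               m: int) -> Tuple[int, ...]:
    """Map a continuous observation to a tuple of bin indices.

    m cut points per dimension give m + 1 equal-width bins,
    indexed 0..m. Bounds (low, high) are an assumption of this sketch.
    """
    bins = []
    for x, lo, hi in zip(obs, low, high):
        width = (hi - lo) / (m + 1)      # width of one bin
        idx = int((x - lo) // width)     # raw bin index
        idx = min(max(idx, 0), m)        # clamp values outside [lo, hi]
        bins.append(idx)
    return tuple(bins)


def num_history_nodes(m: int, n: int) -> int:
    """Number of distinct discretized observations: (m + 1)^n."""
    return (m + 1) ** n
```

With one cut point per dimension, `discretize([0.0, 0.5, 1.0], [0, 0, 0], [1, 1, 1], 1)` gives `(0, 1, 1)`. With m = 3 and n = 5, `num_history_nodes(3, 5)` is already 1024 children per sequence node, which is why the tree grows so quickly.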
Hallway environment
  • Based on [1]
  • Actions: move forward, turn left, turn right, turn around
  • Reward: -1 per action
  • Observations: 4 wall detection sensors (with Gaussian noise), 1 landmark / goal detection sensor
  • Belief: location of agent (agent's orientation is known)
  • Initial belief distribution: uniform over all free cells
  • Currently available maps (0 = free, 1 = wall, G = goal, L = landmark):
    • 0000G
    • 00000000000
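
The uniform initial belief can be sketched as follows. This is only an illustration: it assumes a map is given as a list of row strings and that only cells marked 0 count as free (whether the goal and landmark cells should also be included is not stated above). The name `uniform_belief` is hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column)


def uniform_belief(grid_rows: List[str]) -> Dict[Cell, float]:
    """Return a uniform distribution over all free cells of a map.

    Assumption of this sketch: only '0' cells are free; walls ('1'),
    goal ('G') and landmark ('L') cells are excluded.
    """
    free = [(r, c)
            for r, row in enumerate(grid_rows)
            for c, ch in enumerate(row)
            if ch == '0']
    if not free:
        raise ValueError("map has no free cells")
    p = 1.0 / len(free)
    return {cell: p for cell in free}
```

For the first map above, `uniform_belief(["0000G"])` assigns probability 0.25 to each of the four free cells. Particles for the particle filter could then be drawn from this distribution, for example with `random.choices`.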

[1] Michael Littman, Anthony Cassandra, and Leslie Kaelbling. Learning policies for partially observable environments: Scaling up. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 362-370, San Francisco, CA, 1995. Morgan Kaufmann.

Tree Visualization