Monday, June 18, 2012

Feedback 18-06-12

Progress
  • Extended the incremental regression tree learner to multidimensional, continuous observations
  • Implemented a discrete state space version of the light-dark environment:
    • Agent A is placed in a grid world and has to reach a goal location G
    • The grid world is very dark except for a light source in one column, so the agent has to move away from its goal in order to localize itself before heading back
    • Actions: move up, down, left, right
    • Observations: the location (x, y) corrupted by zero-mean Gaussian noise whose variance grows quadratically with the distance to the light column (a minimal simulation sketch follows this list):
                STD = sqrt( 0.5 * (xbest - x)^2 + K )
      where xbest is the x-coordinate of the light column and K > 0 is a constant noise floor
    • Rewards: -1 for moving, +10 for reaching the goal
  • Finished the background chapter
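
For concreteness, here is a minimal Python sketch of the environment described above. The class name, the 0-based grid indices, the choice K = 0.5, and applying the same noise level to both coordinates are my assumptions; the post only fixes the grid size, the action set, the reward scheme, and the STD formula.

import numpy as np

class LightDarkGrid:
    # Actions: move up, down, left, right
    ACTIONS = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}

    def __init__(self, width=10, height=5, x_light=8, goal=(0, 0), K=0.5):
        # 0-based indices and K = 0.5 are assumptions; the post gives a
        # 5x10 grid with the light in column 9 and the goal G top-left.
        self.width, self.height = width, height
        self.x_light = x_light   # "xbest": column where observations are best
        self.goal = goal
        self.K = K               # constant noise floor in the variance

    def obs_std(self, x):
        # STD = sqrt(0.5 * (xbest - x)^2 + K): noise shrinks near the light
        return np.sqrt(0.5 * (self.x_light - x) ** 2 + self.K)

    def observe(self, state, rng):
        # Observation: true (x, y) plus zero-mean Gaussian noise; using the
        # same STD for both coordinates is an assumption.
        x, y = state
        std = self.obs_std(x)
        return (x + rng.normal(0.0, std), y + rng.normal(0.0, std))

    def step(self, state, action, rng):
        # Move (clipped to stay inside the grid), then observe and collect
        # the reward: -1 for moving, +10 for reaching the goal.
        dx, dy = self.ACTIONS[action]
        x = min(max(state[0] + dx, 0), self.width - 1)
        y = min(max(state[1] + dy, 0), self.height - 1)
        nxt = (x, y)
        reward = 10.0 if nxt == self.goal else -1.0
        return nxt, self.observe(nxt, rng), reward, nxt == self.goal

A random-policy rollout is then just repeated calls to step(), e.g. starting from the agent's position in the diagram below:

rng = np.random.default_rng(0)
env = LightDarkGrid()
state = (6, 2)                  # A's start in the diagram (0-based)
for _ in range(25):             # max. 25 steps per episode
    state, obs, reward, done = env.step(state, rng.integers(4), rng)
    if done:
        break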
Experimental set-up
  • Environment: 5x10 light-dark domain, light source in column 9:
G*******L*
********L*
******A*L*
********L*
********L*
  • Number of episodes: 1,000
  • Max. steps per episode: 25
  • UCT exploration constant C: 10 (it enters the action-selection rule sketched after this list)
  • Discount factor: 0.95
  • The "Optimal" baseline is the MDP solution, computed from the shortest distance between the agent's starting location and the goal location
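
For reference, the exploration constant enters UCT through the standard UCB1 action-selection rule; a sketch is below. The node fields (actions, counts, values) are hypothetical names, and the experiments may use a slightly different variant.

import math

def uct_select(node, C=10.0):
    # UCB1 inside the tree: argmax_a  Q(s,a) + C * sqrt(ln N(s) / N(s,a)),
    # where node.counts[a] and node.values[a] hold the visit count and the
    # mean discounted return of action a at this node.
    total = sum(node.counts[a] for a in node.actions)
    def ucb(a):
        if node.counts[a] == 0:
            return float("inf")   # ensure every action is tried once
        return node.values[a] + C * math.sqrt(math.log(total) / node.counts[a])
    return max(node.actions, key=ucb)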
Results
[Plot: results, normal scaling]

[Plot: the same results with the x-axis scaled to log]
Planning
  • Try a larger grid world (the agent does not yet show the expected behavior of first moving right, towards the light, for a few steps and then left, towards the goal)
  • Continue writing
  • Meeting tomorrow at 13:00
