- Extended the incremental regression tree learner to multidimensional, continuous observations
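A rough sketch of what such an extension can look like (all names and the variance-reduction split criterion here are assumptions, not the actual learner): an incremental tree node buffers multi-dimensional samples and, once enough have arrived, splits on the (feature, threshold) pair that most reduces the variance of the targets.

```python
import numpy as np

class IncrementalNode:
    """Hypothetical incremental regression tree node for continuous,
    multi-dimensional observations (x is an np.ndarray, y a scalar)."""
    def __init__(self, min_samples=20):
        self.samples = []          # buffered (x, y) pairs while still a leaf
        self.min_samples = min_samples
        self.split = None          # (feature_index, threshold) once split
        self.left = self.right = None

    def insert(self, x, y):
        if self.split is not None:           # route to the matching child
            f, t = self.split
            (self.left if x[f] <= t else self.right).insert(x, y)
            return
        self.samples.append((x, y))
        if len(self.samples) >= self.min_samples:
            self._try_split()

    def _try_split(self):
        X = np.array([s[0] for s in self.samples])
        y = np.array([s[1] for s in self.samples])
        best, best_gain = None, 0.0
        for f in range(X.shape[1]):              # every observation dimension
            for t in np.unique(X[:, f])[:-1]:    # candidate thresholds
                mask = X[:, f] <= t
                gain = y.var() - (mask.mean() * y[mask].var()
                                  + (~mask).mean() * y[~mask].var())
                if gain > best_gain:
                    best, best_gain = (f, t), gain
        if best is not None:                     # commit the best split
            self.split = best
            self.left, self.right = IncrementalNode(), IncrementalNode()
            for x, yv in self.samples:
                self.insert(x, yv)
            self.samples = []

    def predict(self, x):
        if self.split is None:
            return np.mean([s[1] for s in self.samples]) if self.samples else 0.0
        f, t = self.split
        return (self.left if x[f] <= t else self.right).predict(x)
```

Variance reduction is the standard regression-tree split criterion; the incremental variant simply defers the split decision until a leaf has seen enough data.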
- Implemented a discrete state space version of the light-dark environment:
- Agent A is placed in a grid world and has to reach a goal location G
- The grid world is very dark except for a light source in one column; the idea is that the agent has to move away from its goal in order to localize itself
- Actions: move up, down, left, right
- Observations: the agent's location (x, y), corrupted by zero-mean Gaussian noise whose standard deviation grows quadratically with the distance to the light column (see the sketch after this list)
- Rewards: -1 for moving, +10 for reaching the goal
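The quadratic noise profile can be sketched as follows (the coefficient 0.5 and the function name are assumptions, since the exact equation is not given here): the standard deviation is zero in the light column and grows with the squared horizontal distance to it, so observations are only reliable near the light.

```python
import numpy as np

rng = np.random.default_rng()
LIGHT_COLUMN = 9   # column of the light source (matches the grid below)

def observe(x, y):
    """Noisy (x, y) observation; noise vanishes in the light column."""
    sigma = 0.5 * (x - LIGHT_COLUMN) ** 2   # assumed quadratic noise profile
    return x + rng.normal(0.0, sigma), y + rng.normal(0.0, sigma)
```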
- Finished the background chapter
- Environment: 5x10 light-dark domain, light source in column 9:
********L*
******A*L*
********L*
********L*
- Number of episodes: 1,000
- Max. steps: 25
- UCT-C: 10
- Discount factor: 0.95
- Optimal is the MDP solution, based on the shortest distance from the agent's starting location to the goal location (a sketch of the UCT selection and backup with the settings above follows)
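A minimal sketch of the UCT pieces these settings configure (the node bookkeeping names are assumptions): UCB1 action selection with exploration constant C = 10, and a backup of the discounted return with gamma = 0.95.

```python
import math

UCT_C = 10.0   # exploration constant from the settings above
GAMMA = 0.95   # discount factor

def uct_select(node):
    """UCB1 action choice: Q(s, a) + C * sqrt(ln N(s) / N(s, a))."""
    total = sum(c.visits for c in node.children.values())
    def score(c):
        if c.visits == 0:
            return float("inf")   # expand unvisited actions first
        return c.value + UCT_C * math.sqrt(math.log(total) / c.visits)
    return max(node.children, key=lambda a: score(node.children[a]))

def backup(path, terminal_value=0.0):
    """Propagate the discounted return along a simulated path of
    (node, immediate_reward) pairs, from leaf back to the root."""
    G = terminal_value
    for node, reward in reversed(path):
        G = reward + GAMMA * G
        node.visits += 1
        node.value += (G - node.value) / node.visits   # running mean of returns
```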
[Plots: results shown with normal scaling and with the x-axis scaled to log]
Planning
- Try a larger grid world (the agent does not yet show the expected behavior of first moving right toward the light for a few steps and then left toward the goal)
- Continue writing
- Meeting tomorrow at 13:00