- Extended the incremental regression tree learner to multidimensional, continuous observations
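A rough sketch of what such an extension can look like (all names and the variance-reduction split criterion here are assumptions, not the actual learner): an incremental tree node buffers multi-dimensional samples and, once enough have arrived, splits on the (feature, threshold) pair that most reduces the variance of the targets.

```python
import numpy as np

class IncrementalNode:
    """Hypothetical incremental regression tree node for continuous,
    multi-dimensional observations (x is an np.ndarray, y a scalar)."""
    def __init__(self, min_samples=20):
        self.samples = []          # buffered (x, y) pairs while still a leaf
        self.min_samples = min_samples
        self.split = None          # (feature_index, threshold) once split
        self.left = self.right = None

    def insert(self, x, y):
        if self.split is not None:           # route to the matching child
            f, t = self.split
            (self.left if x[f] <= t else self.right).insert(x, y)
            return
        self.samples.append((x, y))
        if len(self.samples) >= self.min_samples:
            self._try_split()

    def _try_split(self):
        X = np.array([s[0] for s in self.samples])
        y = np.array([s[1] for s in self.samples])
        best, best_gain = None, 0.0
        for f in range(X.shape[1]):              # every observation dimension
            for t in np.unique(X[:, f])[:-1]:    # candidate thresholds
                mask = X[:, f] <= t
                gain = y.var() - (mask.mean() * y[mask].var()
                                  + (~mask).mean() * y[~mask].var())
                if gain > best_gain:
                    best, best_gain = (f, t), gain
        if best is not None:                     # commit the best split
            self.split = best
            self.left, self.right = IncrementalNode(), IncrementalNode()
            for x, yv in self.samples:
                self.insert(x, yv)
            self.samples = []

    def predict(self, x):
        if self.split is None:
            return np.mean([s[1] for s in self.samples]) if self.samples else 0.0
        f, t = self.split
        return (self.left if x[f] <= t else self.right).predict(x)
```

Variance reduction is the standard regression-tree split criterion; the incremental variant simply defers the split decision until a leaf has seen enough data.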
- Implemented a discrete state space version of the light-dark environment:
- Agent A is placed in a grid world and has to reach a goal location G
- The grid world is very dark except for a light source in one column; the idea is that the agent has to move away from its goal in order to localize itself
- Actions: move up, down, left, right
- Observations: the agent's location (x, y), corrupted by zero-mean Gaussian noise whose standard deviation grows quadratically with the distance to the light column (see the sketch after this list)
- Rewards: -1 for moving, +10 for reaching the goal
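The quadratic noise profile can be sketched as follows (the coefficient 0.5 and the function name are assumptions, since the exact equation is not given here): the standard deviation is zero in the light column and grows with the squared horizontal distance to it, so observations are only reliable near the light.

```python
import numpy as np

rng = np.random.default_rng()
LIGHT_COLUMN = 9   # column of the light source (matches the grid below)

def observe(x, y):
    """Noisy (x, y) observation; noise vanishes in the light column."""
    sigma = 0.5 * (x - LIGHT_COLUMN) ** 2   # assumed quadratic noise profile
    return x + rng.normal(0.0, sigma), y + rng.normal(0.0, sigma)
```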
- Finished the background chapter
- Environment: 5x10 light-dark domain, light source in column 9:
********L*
******A*L*
********L*
********L*
- Number of episodes: 1,000
- Max. steps: 25
- UCT-C: 10
- Discount factor: 0.95
- Optimal is the MDP solution, based on the shortest distance from the agent's starting location to the goal location (a sketch of the UCT selection and backup with the settings above follows)
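A minimal sketch of the UCT pieces these settings configure (the node bookkeeping names are assumptions): UCB1 action selection with exploration constant C = 10, and a backup of the discounted return with gamma = 0.95.

```python
import math

UCT_C = 10.0   # exploration constant from the settings above
GAMMA = 0.95   # discount factor

def uct_select(node):
    """UCB1 action choice: Q(s, a) + C * sqrt(ln N(s) / N(s, a))."""
    total = sum(c.visits for c in node.children.values())
    def score(c):
        if c.visits == 0:
            return float("inf")   # expand unvisited actions first
        return c.value + UCT_C * math.sqrt(math.log(total) / c.visits)
    return max(node.children, key=lambda a: score(node.children[a]))

def backup(path, terminal_value=0.0):
    """Propagate the discounted return along a simulated path of
    (node, immediate_reward) pairs, from leaf back to the root."""
    G = terminal_value
    for node, reward in reversed(path):
        G = reward + GAMMA * G
        node.visits += 1
        node.value += (G - node.value) / node.visits   # running mean of returns
```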
[Plots: results shown with normal scaling and with the x-axis scaled to log]
Planning
- Try a larger grid world (the agent does not yet show the expected behavior of first moving right toward the light for a few steps and then left toward the goal)
- Continue writing
- Meeting tomorrow at 13:00