Light-dark domain with a discrete state space (grid world)
If a move would end outside of the grid, the agent enters the grid again on the opposite side (e.g., agent leaves on the left side and enters on the right side)
Initial belief: uniform over all possible states (except for goal state)
Current set-up (A=agent, G=goal, L=light):
***L*
**AL*
*G*L*
***L*
***L*
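The wrap-around rule and the uniform initial belief can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes the set-up above is a 5x5 grid read row by row, that states are (row, col) pairs with row 0 at the top, and all function names are hypothetical.

```python
# Assumed reading of the set-up: 5 rows of 5 cells (A=agent, G=goal, L=light).
GRID = [
    "***L*",
    "**AL*",
    "*G*L*",
    "***L*",
    "***L*",
]
ROWS, COLS = len(GRID), len(GRID[0])

# Assumed convention: (row, col) offsets, with row 0 at the top.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state, action):
    """Apply an action; leaving the grid re-enters on the opposite side."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    return ((row + d_row) % ROWS, (col + d_col) % COLS)


def find(symbol):
    """Return the (row, col) of the first cell holding `symbol`."""
    for r, line in enumerate(GRID):
        c = line.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} not in grid")


def uniform_initial_belief(goal):
    """Uniform probability over every cell except the goal."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) != goal]
    p = 1.0 / len(states)
    return {s: p for s in states}


goal = find("G")
belief = uniform_initial_belief(goal)
```

With this grid, moving left from the leftmost column lands in the rightmost column of the same row, and the belief assigns 1/24 to each of the 24 non-goal cells.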
Behavior of the agent:
No tendency to go in a particular direction in the first step
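Belief update: something appears to be wrong either in how the probability density function is used to update the belief or in the density function itself, since it yields values > 1. A density value (unlike a probability) may exceed 1 when the standard deviation is small, so a missing normalization of the updated belief is one thing to check.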
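To help narrow down the belief-update problem, here is a small sketch of the observation correction step only (no transition step). It is not the code used in the experiments: the function names, the `std_of` callable, and the assumption that the x and y noise are independent with the same standard deviation are all hypothetical. It shows that the Gaussian density can be larger than 1 while the normalized belief still sums to 1.

```python
import math


def gaussian_pdf(value, mean, std):
    """Density of N(mean, std^2) at `value`; may exceed 1 when std is small."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, observation, std_of):
    """Weight each state by its observation likelihood, then normalize.

    belief      -- dict mapping state (x, y) to probability
    observation -- observed (x, y)
    std_of      -- hypothetical callable returning the noise std in a state
    """
    weighted = {}
    for (x, y), prior in belief.items():
        std = std_of((x, y))
        # Assumption: x and y noise are independent and share the same std.
        likelihood = (gaussian_pdf(observation[0], x, std)
                      * gaussian_pdf(observation[1], y, std))
        weighted[(x, y)] = prior * likelihood
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    # Dividing by the total turns the weighted densities into probabilities.
    return {s: w / total for s, w in weighted.items()}


# Toy example with two states and a small, constant std of 0.2.
prior = {(0, 0): 0.5, (1, 0): 0.5}
posterior = update_belief(prior, (0.1, 0.0), lambda s: 0.2)
peak_density = gaussian_pdf(0.0, 0.0, 0.2)  # larger than 1
```

If the belief values are above 1 only before the division by `total`, the density function itself is probably fine.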
Extended the incremental regression tree learner to multidimensional, continuous observations
Implemented a discrete state space version of the light-dark environment:
Agent A is placed in a grid world and has to reach a goal location G
It is very dark in this grid world but there is a light source in one column, so the idea is that the agent has to move away from its goal to localize itself
Actions: move up, down, left, right
Observations: location (x, y) corrupted by zero-mean Gaussian noise whose standard deviation is the square root of a quadratic in x:
STD = sqrt [ 0.5 * (xbest - x)^2 + K ]
Rewards: -1 for moving, +10 for reaching the goal
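The observation and reward model above can be sketched as below. The function names are hypothetical, and K = 0.5 and x_best = 3 in the example are placeholder values, not the values used in the experiments. The sketch also assumes that the same standard deviation is applied independently to both coordinates and that the +10 replaces the -1 step cost on the step that reaches the goal; neither is stated above.

```python
import math
import random


def observation_std(x, x_best, k):
    """STD = sqrt(0.5 * (x_best - x)^2 + K); smallest where x equals x_best."""
    return math.sqrt(0.5 * (x_best - x) ** 2 + k)


def sample_observation(state, x_best, k, rng=random):
    """Return the true (x, y) corrupted by zero-mean Gaussian noise."""
    x, y = state
    std = observation_std(x, x_best, k)
    # Assumption: the same std is used independently for both coordinates.
    return (rng.gauss(x, std), rng.gauss(y, std))


def reward(next_state, goal):
    """Assumption: +10 replaces the -1 step cost when the goal is reached."""
    return 10 if next_state == goal else -1


# Placeholder parameters for illustration only.
rng = random.Random(0)
obs = sample_observation((1, 2), x_best=3, k=0.5, rng=rng)
```

The noise grows with the distance from x_best, which is what forces the agent to trade off moving toward the goal against moving toward the light.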
Finished the background chapter
Experimental set-up
Environment: 5x10 light-dark domain, light source in column 9:
One action performed in the real world, from different initial beliefs
Probability(state = tiger left) = initial belief
Number of episodes: 5,000
Algorithms:
Re-use computes the belief range for each leaf of the observation tree and splits a leaf only if the distance between the belief ranges of the two resulting children is below some (very low) threshold
Deletion is the standard COMCTS algorithm
Legend
Re-use is the solid line
Deletion is the dotted line
The color of each line segment (p1, p2) is the RGB mixture of the average percentages of the selected actions at the points p1 and p2
Black markers indicate the data points (i.e., the initial beliefs)
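One possible reading of the color rule is sketched below. It assumes exactly three actions, each mapped to one of the R, G, B channels, with percentages given on a 0 to 100 scale; the function name and this mapping are hypothetical and may differ from how the plots were actually produced.

```python
def mix_segment_color(pct_p1, pct_p2):
    """Average two action-percentage triples and scale to RGB values in [0, 1]."""
    if len(pct_p1) != 3 or len(pct_p2) != 3:
        raise ValueError("expected one percentage per color channel")
    # (a + b) / 2 is the average percentage; dividing by 100 scales it to [0, 1].
    return tuple((a + b) / 200.0 for a, b in zip(pct_p1, pct_p2))


# Action 1 always chosen at p1, action 2 always chosen at p2.
color = mix_segment_color((100, 0, 0), (0, 100, 0))
```

Under these assumptions, a segment between a point where only the first action is chosen and a point where only the second is chosen is drawn in an equal mix of red and green.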