- Changed the light-dark domain so that the agent starts each episode at a randomly selected cell
- Implemented a heuristic for the light-dark domain:
  - Assumes the agent is in the grid cell with the highest belief
  - Takes the action whose resulting cell has the smallest Euclidean distance to the goal cell
- Light-dark domain with the following layout (G: goal cell, L: light cells):
********L*
********L*
********L*
********L*
***G****L*
********L*
********L*
********L*
********L*
- At the beginning of each episode, the agent is randomly placed at any cell except the goal G
- Initial belief: uniform over all possible states, except the goal state
- Number of episodes: 1000
- Discount factor: 0.95
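
The setup and heuristic above could be sketched roughly as follows. This is only an illustration in Python, not the implementation used for the experiments: the four-move action set, the rule that a move off the grid leaves the agent in place, the tie-breaking (first maximum wins), and all function names are assumptions that the notes above do not specify.

```python
import math
import random

# Layout copied from the notes: 9 rows x 10 columns, G = goal.
LAYOUT = [
    "********L*",
    "********L*",
    "********L*",
    "********L*",
    "***G****L*",
    "********L*",
    "********L*",
    "********L*",
    "********L*",
]

# Assumed action set: four unit moves given as (d_row, d_col).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find_goal(layout):
    """Return the (row, col) of the cell marked 'G'."""
    for r, row in enumerate(layout):
        c = row.find("G")
        if c != -1:
            return (r, c)
    raise ValueError("layout has no goal cell")


def non_goal_cells(layout):
    """All grid cells except the goal."""
    goal = find_goal(layout)
    return [
        (r, c)
        for r in range(len(layout))
        for c in range(len(layout[0]))
        if (r, c) != goal
    ]


def random_start(layout, rng=random):
    """Pick a start cell uniformly at random among the non-goal cells."""
    return rng.choice(non_goal_cells(layout))


def uniform_belief(layout):
    """Initial belief: uniform over every cell except the goal."""
    cells = non_goal_cells(layout)
    p = 1.0 / len(cells)
    return {cell: p for cell in cells}


def heuristic_action(belief, layout):
    """Act as if the agent were in the most likely cell and step toward the goal."""
    goal = find_goal(layout)
    rows, cols = len(layout), len(layout[0])
    # Most likely cell; on ties, the first maximum in dict order is used (assumption).
    r, c = max(belief, key=belief.get)
    best_action, best_dist = None, math.inf
    for name, (dr, dc) in ACTIONS.items():
        # Assumption: a move off the grid leaves the agent where it is.
        nr = min(max(r + dr, 0), rows - 1)
        nc = min(max(c + dc, 0), cols - 1)
        dist = math.hypot(nr - goal[0], nc - goal[1])
        if dist < best_dist:
            best_action, best_dist = name, dist
    return best_action
```

Calling `heuristic_action(uniform_belief(LAYOUT), LAYOUT)` at the start of an episode shows a weakness of this rule: under a uniform belief the "most likely" cell is decided purely by tie-breaking, so the first actions carry no real information until observations sharpen the belief.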