Feedback 03-04-12
Algorithm for Continuous Observations
- Implemented the black-box simulator for the POMDP
- Implemented discounted return
- Improved the algorithm's code
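
As a minimal sketch of the discounted return listed above: it assumes a finite list of per-step rewards from one episode and a discount factor. The function name `discounted_return` and the default `gamma=0.95` are illustrative and not taken from the actual implementation.

```python
def discounted_return(rewards, gamma=0.95):
    """Return sum_t gamma**t * rewards[t] for one episode.

    `rewards` is the sequence of per-step rewards in the order received.
    `gamma` is an assumed discount factor in [0, 1].
    """
    total = 0.0
    # Accumulate backwards so each step costs one multiply and one add:
    # G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For example, `discounted_return([1, 1, 1], gamma=0.5)` evaluates 1 + 0.5 + 0.25.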
Tiger Problem
- Can now be made episodic by setting a maximum number of steps the agent may take (necessary for correct stepping in RL-Glue)
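
The step cap can be sketched as follows. This is a hypothetical stand-alone simulator, not the RL-Glue environment itself: the class name `EpisodicTiger`, the `max_steps` parameter, and the reward and observation values (-1 for listening, +10 / -100 for opening a door, 85% listening accuracy) are assumptions based on the commonly used formulation of the Tiger Problem.

```python
import random

# Action and state encodings (illustrative choices).
LISTEN, OPEN_LEFT, OPEN_RIGHT = 0, 1, 2
TIGER_LEFT, TIGER_RIGHT = 0, 1


class EpisodicTiger:
    """Tiger Problem simulator that ends an episode after max_steps steps."""

    def __init__(self, max_steps=20, listen_accuracy=0.85, seed=None):
        self.max_steps = max_steps              # assumed cap on steps per episode
        self.listen_accuracy = listen_accuracy  # assumed P(correct growl side)
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Place the tiger behind a random door and restart the step counter."""
        self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
        self.steps = 0

    def step(self, action):
        """Apply one action and return (observation, reward, terminal)."""
        self.steps += 1
        if action == LISTEN:
            reward = -1.0
            # The growl is heard from the correct side with listen_accuracy.
            if self.rng.random() < self.listen_accuracy:
                observation = self.state
            else:
                observation = 1 - self.state
        else:
            opened = TIGER_LEFT if action == OPEN_LEFT else TIGER_RIGHT
            reward = -100.0 if opened == self.state else 10.0
            # After a door is opened the tiger is placed again at random,
            # and the observation carries no information.
            self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
            observation = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
        # The episode ends only because the step cap is reached.
        terminal = self.steps >= self.max_steps
        return observation, reward, terminal
```

With `max_steps=3`, the third call to `step` returns `terminal=True`, which gives an episodic interface a well-defined end of episode.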
Output of Results from Experiments
Thesis
- Gave it some structure
- Wrote some sections for the background chapter
Planning
- Finish Output of Results from Experiments
- After that: performance experiment for the Tiger Problem (number of samples vs. total reward)
- Continue writing the background chapter