Tuesday, April 3, 2012

Feedback 03-04-12

Algorithm for Continuous Observations
  • Implemented the black box simulator for POMDP
  • Implemented discounted return
  • Improved the algorithm's code
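For reference, the discounted return of a reward sequence r_0, r_1, ... is the sum of gamma^t * r_t. Below is a minimal sketch of that computation. The function name and the default discount factor of 0.95 are assumptions for illustration, not taken from the actual implementation.

```python
def discounted_return(rewards, gamma=0.95):
    """Compute sum over t of gamma**t * rewards[t].

    The function name and the default gamma are illustrative assumptions.
    """
    total = 0.0
    # Work backwards so each step needs one multiply and one add:
    # G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

Accumulating from the last reward backwards avoids computing powers of gamma explicitly.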
Tiger Problem
  • Can now be made episodic by setting a maximum number of steps the agent can take (necessary for correct stepping in RL-Glue)
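The step cap can be sketched as a small black-box simulator that reports a terminal flag once the limit is reached. This is only an illustration under assumptions: the class and constant names are hypothetical, the rewards (-1 for listening, -100 for the tiger door, +10 for the other door) and the 0.85 listening accuracy are the commonly used textbook values rather than the ones from these experiments, and observations are kept discrete for brevity.

```python
import random

# Hypothetical encodings for actions and states (assumptions for this sketch).
LISTEN, OPEN_LEFT, OPEN_RIGHT = 0, 1, 2
TIGER_LEFT, TIGER_RIGHT = 0, 1


class EpisodicTiger:
    """Tiger Problem simulator that ends an episode after max_steps actions."""

    def __init__(self, max_steps=20, listen_accuracy=0.85, seed=None):
        self.max_steps = max_steps
        self.listen_accuracy = listen_accuracy
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Start a new episode with the tiger behind a random door."""
        self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
        self.steps = 0

    def step(self, action):
        """Apply one action and return (observation, reward, terminal)."""
        self.steps += 1
        if action == LISTEN:
            reward = -1.0
            # The growl is heard on the correct side with listen_accuracy.
            correct = self.rng.random() < self.listen_accuracy
            observation = self.state if correct else 1 - self.state
        else:
            opened = TIGER_LEFT if action == OPEN_LEFT else TIGER_RIGHT
            reward = -100.0 if opened == self.state else 10.0
            # After a door is opened the tiger is placed again at random,
            # and the observation carries no information.
            self.state = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
            observation = self.rng.choice([TIGER_LEFT, TIGER_RIGHT])
        # The episode terminates only because the step cap is reached.
        terminal = self.steps >= self.max_steps
        return observation, reward, terminal
```

An RL-Glue environment wrapper could pass the terminal flag through in its step result so that the experiment program sees a properly finished episode.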
Output of Results from Experiments
  • Started working on it
Thesis
  • Gave it some structure
  • Wrote some sections for the background chapter
Planning
  • Finish Output of Results from Experiments
  • After that: performance experiment for the Tiger Problem (number of samples vs. total reward)
  • Continue writing the background chapter
