Saturday, April 28, 2012

Results: Runtime

Experimental Set-up
  • Mean value is taken over 1,000 episodes
  • Each data point is indicated by 'o'
Results
First 1,000 milliseconds

First 1,000 milliseconds (logarithmic scale)

Friday, April 27, 2012

Results: Horizon 100 Tiger Problem

Experimental Set-up
  • Mean value is taken over 1,000 episodes
  • Error bars represent standard error
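Each plotted point is the mean return over the episodes, with error bars showing the standard error of that mean. A minimal sketch of that computation (function name hypothetical, not the actual experiment code):

```python
import math

def mean_and_standard_error(returns):
    """Mean return over episodes and the standard error of that mean."""
    n = len(returns)
    mean = sum(returns) / n
    # Sample variance with Bessel's correction, then SE = s / sqrt(n)
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean, math.sqrt(variance) / math.sqrt(n)

# Example with a handful of hypothetical episode returns
m, se = mean_and_standard_error([10.0, -100.0, 10.0, 20.0, 10.0])
```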
Results
  • With random baseline
  • First 5,000 samples
  • Without random baseline
  • First 5,000 samples


Thursday, April 26, 2012

Results: Horizon 10 Tiger Problem

Experimental Set-up
  • Mean value is taken over 10,000 episodes
  • Error bars represent standard error
Results
  • With random baseline
First 10,000 samples

First 1,000 samples

  • Without random baseline
First 10,000 samples

First 1,000 samples

Monday, April 23, 2012

Feedback 23-04-12

Progress
  • Computed the maximum average reward that the random agent could achieve
  • Updated performance graphs (see below)
  • Implemented leaf visualization (see below)
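The random baseline can also be estimated by simulation. A sketch assuming the standard Tiger Problem rewards (listen -1, correct door +10, tiger -100); the episode structure and helper names are hypothetical, not the actual implementation:

```python
import random

# Action codes (hypothetical): listening, or opening one of the two doors
LISTEN, OPEN_LEFT, OPEN_RIGHT = 0, 1, 2

def random_agent_return(horizon, rng):
    """Total reward of a uniformly random agent over one episode."""
    tiger = rng.choice((OPEN_LEFT, OPEN_RIGHT))  # door hiding the tiger
    total = 0.0
    for _ in range(horizon):
        action = rng.choice((LISTEN, OPEN_LEFT, OPEN_RIGHT))
        if action == LISTEN:
            total += -1.0
        else:
            total += -100.0 if action == tiger else 10.0
            tiger = rng.choice((OPEN_LEFT, OPEN_RIGHT))  # problem resets
    return total

rng = random.Random(0)
episodes = 10_000
baseline = sum(random_agent_return(10, rng) for _ in range(episodes)) / episodes
```

With these rewards the expected per-step value of a uniform policy is (-1 - 45 - 45) / 3 ≈ -30.3, so the horizon-10 baseline should come out near -303.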

Results
First 1,000 samples

First 100 samples

Visualization of Leaves



Tuesday, April 17, 2012

Feedback 17-04-12

Progress
  • Implemented flat Monte Carlo (uniform sampling of the first action)
  • Experiments can now be run in two ways:
    • With graphical output (using RL-Viz, see below), or
    • With plain-text output (using .properties files for the settings)
  • Added RL-Viz to support:
    • Simple resetting of parameters
    • Simple selection of the agent and environment
    • Visualization of the agent and environment

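Flat Monte Carlo as described above can be sketched in a few lines: sample the first action uniformly, complete the episode with a random rollout, and pick the action with the highest mean sampled return. The simulator interface and names here are hypothetical:

```python
import random

def flat_monte_carlo(actions, simulate, num_samples, rng):
    """Flat Monte Carlo: sample first actions uniformly at random, then
    recommend the action with the highest mean sampled return.

    `simulate(action, rng)` is assumed to run one random rollout that
    starts with `action` and returns the episode's return.
    """
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(num_samples):
        a = rng.choice(actions)          # uniform first-action choice
        totals[a] += simulate(a, rng)    # random rollout afterwards
        counts[a] += 1
    return max(actions, key=lambda a: totals[a] / counts[a]
               if counts[a] else float("-inf"))

# Example with a dummy simulator that always rewards action "b"
best = flat_monte_carlo(
    ["a", "b", "c"],
    lambda a, rng: 1.0 if a == "b" else 0.0,
    num_samples=300,
    rng=random.Random(1),
)
```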
Friday, April 13, 2012

Feedback 13-04-12

Progress
  • Removed unnecessary information and functions from the observation tree (most were unused anyway)
  • Verified that the computation of the belief-state update is correct (I had simply misread a number)
  • Switched from separate runs to simply running more episodes (10,000) per experiment
  • Belief and action trajectories can now be shown for each episode
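For the Tiger Problem the belief-state update after a listen is a one-line Bayes rule. A minimal sketch assuming the standard 0.85 listening accuracy (constant and function names hypothetical):

```python
# Probability of hearing the tiger behind the correct door (assumed 0.85)
HEAR_CORRECT = 0.85

def update_belief(b_left, heard_left):
    """Bayes update of P(tiger behind left door) after one listen."""
    p_obs_left = HEAR_CORRECT if heard_left else 1.0 - HEAR_CORRECT
    p_obs_right = 1.0 - p_obs_left
    unnorm_left = p_obs_left * b_left
    unnorm_right = p_obs_right * (1.0 - b_left)
    return unnorm_left / (unnorm_left + unnorm_right)
```

Starting from the uniform belief, one "hear left" observation moves the belief to 0.85, and each consistent observation pushes it further toward 1.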
Thesis Structure

Wednesday, April 11, 2012

Monday, April 9, 2012

Feedback 09-04-12

Experiment Pipeline
  • Settings for experiments can now be loaded from files
  • Results of experiments ...
    • Can be plotted directly,
    • Can be stored in files suitable for Matlab / Octave, and
    • Can have their plots saved as image files.
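Storing results for Matlab / Octave needs no special library: a plain whitespace-separated numeric matrix can be read by `load` (or `dlmread`) in both. A sketch of such an export (function names hypothetical):

```python
def octave_rows(samples, mean_returns, std_errors):
    """Format one whitespace-separated ASCII row per data point; a plain
    numeric matrix like this loads in Matlab / Octave via `load`."""
    return [" ".join(repr(float(v)) for v in row)
            for row in zip(samples, mean_returns, std_errors)]

def save_for_octave(path, samples, mean_returns, std_errors):
    """Write the rows to a text file, one data point per line."""
    with open(path, "w") as f:
        f.write("\n".join(octave_rows(samples, mean_returns, std_errors)) + "\n")
```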
Planning
  • Finish experiment with Tiger Problem
  • Next meeting: April 11, 2012, 11 a.m.

Tuesday, April 3, 2012

Feedback 03-04-12

Algorithm for Continuous Observations
  • Implemented the black box simulator for POMDP
  • Implemented discounted return
  • Improved the algorithm's code
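A black-box (generative) simulator only has to sample a next state, observation, and reward for a given state and action. A minimal sketch of that interface for the Tiger Problem, together with the discounted return, assuming the usual parameters (0.85 listening accuracy, rewards -1 / +10 / -100); all names are hypothetical:

```python
import random

# States double as door labels; LISTEN is the third action (hypothetical codes)
LEFT, RIGHT, LISTEN = 0, 1, 2

def step(state, action, rng):
    """Sample (next_state, observation, reward) for one step."""
    if action == LISTEN:
        obs = state if rng.random() < 0.85 else 1 - state
        return state, obs, -1.0
    reward = -100.0 if action == state else 10.0
    # After a door is opened the problem resets: the tiger is re-placed
    # at random, and the observation carries no information.
    return rng.choice((LEFT, RIGHT)), rng.choice((LEFT, RIGHT)), reward

def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * reward_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

A planner that uses only `step` never needs the explicit transition and observation matrices, which is what makes the black-box formulation attractive.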
Tiger Problem
  • Can now be made episodic by setting a maximum number of steps the agent may take (necessary for correct stepping in RL-Glue)
Output of Results from Experiments
  • Started working on it
Thesis
  • Gave it some structure
  • Wrote some sections for the background chapter
Planning
  • Finish Output of Results from Experiments
  • After that: performance experiment for the Tiger Problem (#samples vs total reward)
  • Continue writing the background chapter