Saturday, April 28, 2012

Results: Runtime

Experimental Set-up
  • Mean value is taken over 1,000 episodes
  • Each data point is indicated by 'o'
Results
First 1,000 milliseconds

First 1,000 milliseconds (logarithmic scale)

Friday, April 27, 2012

Results: Horizon 100 Tiger Problem

Experimental Set-up
  • Mean value is taken over 1,000 episodes
  • Error bars represent standard error
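Each plotted point is the mean return over the episodes, with error bars showing the standard error of that mean. A minimal sketch of that computation (function name hypothetical, not the actual experiment code):

```python
import math

def mean_and_standard_error(returns):
    """Mean return over episodes and the standard error of that mean."""
    n = len(returns)
    mean = sum(returns) / n
    # Sample variance with Bessel's correction, then SE = s / sqrt(n)
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean, math.sqrt(variance) / math.sqrt(n)

# Example with a handful of hypothetical episode returns
m, se = mean_and_standard_error([10.0, -100.0, 10.0, 20.0, 10.0])
```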
Results
  • With random baseline
  • First 5,000 samples
  • Without random baseline
  • First 5,000 samples


Thursday, April 26, 2012

Results: Horizon 10 Tiger Problem

Experimental Set-up
  • Mean value is taken over 10,000 episodes
  • Error bars represent standard error
Results
  • With random baseline
First 10,000 samples

First 1,000 samples

  • Without random baseline
First 10,000 samples

First 1,000 samples

Monday, April 23, 2012

Feedback 23-04-12

Progress
  • Computed the maximum average reward that the random agent could achieve
  • Updated performance graphs (see below)
  • Implemented leaf visualization (see below)
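The random baseline can also be estimated by simulation. A sketch assuming the standard Tiger Problem rewards (listen -1, correct door +10, tiger -100); the episode structure and helper names are hypothetical, not the actual implementation:

```python
import random

# Action codes (hypothetical): listening, or opening one of the two doors
LISTEN, OPEN_LEFT, OPEN_RIGHT = 0, 1, 2

def random_agent_return(horizon, rng):
    """Total reward of a uniformly random agent over one episode."""
    tiger = rng.choice((OPEN_LEFT, OPEN_RIGHT))  # door hiding the tiger
    total = 0.0
    for _ in range(horizon):
        action = rng.choice((LISTEN, OPEN_LEFT, OPEN_RIGHT))
        if action == LISTEN:
            total += -1.0
        else:
            total += -100.0 if action == tiger else 10.0
            tiger = rng.choice((OPEN_LEFT, OPEN_RIGHT))  # problem resets
    return total

rng = random.Random(0)
episodes = 10_000
baseline = sum(random_agent_return(10, rng) for _ in range(episodes)) / episodes
```

With these rewards the expected per-step value of a uniform policy is (-1 - 45 - 45) / 3 ≈ -30.3, so the horizon-10 baseline should come out near -303.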

Results
First 1,000 samples

First 100 samples

Visualization of Leaves



Tuesday, April 17, 2012

Feedback 17-04-12

Progress
  • Implemented flat Monte Carlo (uniform sampling of the first action)
  • Experiments can now be run in two ways:
    • With graphical output (using RL-Viz, see below), or
    • With plain-text output (using .properties files for the settings)
  • Added RL-Viz to support:
    • Simple resetting of parameters
    • Simple selection of the agent and environment
    • Visualization of the agent and environment

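Flat Monte Carlo as described above can be sketched in a few lines: sample the first action uniformly, complete the episode with a random rollout, and pick the action with the highest mean sampled return. The simulator interface and names here are hypothetical:

```python
import random

def flat_monte_carlo(actions, simulate, num_samples, rng):
    """Flat Monte Carlo: sample first actions uniformly at random, then
    recommend the action with the highest mean sampled return.

    `simulate(action, rng)` is assumed to run one random rollout that
    starts with `action` and returns the episode's return.
    """
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(num_samples):
        a = rng.choice(actions)          # uniform first-action choice
        totals[a] += simulate(a, rng)    # random rollout afterwards
        counts[a] += 1
    return max(actions, key=lambda a: totals[a] / counts[a]
               if counts[a] else float("-inf"))

# Example with a dummy simulator that always rewards action "b"
best = flat_monte_carlo(
    ["a", "b", "c"],
    lambda a, rng: 1.0 if a == "b" else 0.0,
    num_samples=300,
    rng=random.Random(1),
)
```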
Friday, April 13, 2012

Feedback 13-04-12

Progress
  • Removed unnecessary information and functions from the observation tree (most were unused anyway)
  • Verified that the computation of the belief-state update is correct (I had simply misread a number)
  • Switched from separate runs to simply running more episodes (10,000) per experiment
  • Belief and action trajectories can now be shown for each episode
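For the Tiger Problem the belief-state update after a listen is a one-line Bayes rule. A minimal sketch assuming the standard 0.85 listening accuracy (constant and function names hypothetical):

```python
# Probability of hearing the tiger behind the correct door (assumed 0.85)
HEAR_CORRECT = 0.85

def update_belief(b_left, heard_left):
    """Bayes update of P(tiger behind left door) after one listen."""
    p_obs_left = HEAR_CORRECT if heard_left else 1.0 - HEAR_CORRECT
    p_obs_right = 1.0 - p_obs_left
    unnorm_left = p_obs_left * b_left
    unnorm_right = p_obs_right * (1.0 - b_left)
    return unnorm_left / (unnorm_left + unnorm_right)
```

Starting from the uniform belief, one "hear left" observation moves the belief to 0.85, and each consistent observation pushes it further toward 1.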
Thesis Structure

Wednesday, April 11, 2012

Monday, April 9, 2012

Feedback 09-04-12

Experiment Pipeline
  • Settings for experiments can now be loaded from files
  • Results of experiments ...
    • Can be plotted directly,
    • Can be stored in files suitable for Matlab / Octave, and
    • Can have their plots saved as image files.
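Storing results for Matlab / Octave needs no special library: a plain whitespace-separated numeric matrix can be read by `load` (or `dlmread`) in both. A sketch of such an export (function names hypothetical):

```python
def octave_rows(samples, mean_returns, std_errors):
    """Format one whitespace-separated ASCII row per data point; a plain
    numeric matrix like this loads in Matlab / Octave via `load`."""
    return [" ".join(repr(float(v)) for v in row)
            for row in zip(samples, mean_returns, std_errors)]

def save_for_octave(path, samples, mean_returns, std_errors):
    """Write the rows to a text file, one data point per line."""
    with open(path, "w") as f:
        f.write("\n".join(octave_rows(samples, mean_returns, std_errors)) + "\n")
```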
Planning
  • Finish experiment with Tiger Problem
  • Next meeting: April 11, 2012, 11 a.m.

Tuesday, April 3, 2012

Feedback 03-04-12

Algorithm for Continuous Observations
  • Implemented the black box simulator for POMDP
  • Implemented discounted return
  • Improved the algorithm's code
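A black-box (generative) simulator only has to sample a next state, observation, and reward for a given state and action. A minimal sketch of that interface for the Tiger Problem, together with the discounted return, assuming the usual parameters (0.85 listening accuracy, rewards -1 / +10 / -100); all names are hypothetical:

```python
import random

# States double as door labels; LISTEN is the third action (hypothetical codes)
LEFT, RIGHT, LISTEN = 0, 1, 2

def step(state, action, rng):
    """Sample (next_state, observation, reward) for one step."""
    if action == LISTEN:
        obs = state if rng.random() < 0.85 else 1 - state
        return state, obs, -1.0
    reward = -100.0 if action == state else 10.0
    # After a door is opened the problem resets: the tiger is re-placed
    # at random, and the observation carries no information.
    return rng.choice((LEFT, RIGHT)), rng.choice((LEFT, RIGHT)), reward

def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * reward_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

A planner that uses only `step` never needs the explicit transition and observation matrices, which is what makes the black-box formulation attractive.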
Tiger Problem
  • Can now be made episodic by setting a maximum number of steps the agent may take (necessary for correct stepping in RL-Glue)
Output of Results from Experiments
  • Started working on it
Thesis
  • Gave it some structure
  • Wrote some sections for the background chapter
Planning
  • Finish Output of Results from Experiments
  • After that: performance experiment for the Tiger Problem (#samples vs total reward)
  • Continue writing the background chapter