Friday, July 6, 2012

Feedback 06-07-2012

Progress
  • Implemented perfect recall for the transposition tree
    • Remembers all samples (belief, action, discounted reward)
    • When a leaf splits, replays the stored samples to recreate the mean and the visit count of the action nodes below the new leaves
    • Space for the samples is pre-allocated: 
      • #samples = #roll-outs * epsilon-horizon
  • All three variants (deletion, insertion, transposition tree) drop in performance after the first split(s) (in the plot below, there is a recurring dip at the beginning of each curve)
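The perfect-recall mechanism described above can be sketched as follows. This is only an illustration; all names (Sample, ActionNode, Leaf, the split predicate) are hypothetical and not taken from the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    belief: tuple              # belief point at which the action was taken
    action: int
    discounted_reward: float

@dataclass
class ActionNode:
    count: int = 0
    mean: float = 0.0

    def update(self, reward: float) -> None:
        # Incremental mean update.
        self.count += 1
        self.mean += (reward - self.mean) / self.count

class Leaf:
    def __init__(self):
        self.samples = []      # perfect recall: every sample is remembered
        self.actions = {}      # action -> ActionNode statistics

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)
        node = self.actions.setdefault(sample.action, ActionNode())
        node.update(sample.discounted_reward)

    def split(self, predicate):
        """Split this leaf in two; replay the stored samples so the
        action nodes below the new leaves get correct means and counts."""
        left, right = Leaf(), Leaf()
        for s in self.samples:
            (left if predicate(s.belief) else right).add(s)
        return left, right
```

Replaying the stored samples is what avoids restarting the new leaves' statistics from zero after a split.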
Experimental Set-up
  • Environment: Infinite Horizon Tiger Problem
  • Discount factor: 0.5
  • Discount horizon: 0.00001
  • Number of episodes: 5000
  • Number of steps: 2
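The epsilon horizon used for pre-allocating the sample buffer can be derived from the discount factor and the discount-horizon threshold. A minimal sketch, assuming the horizon is the smallest depth t with gamma**t <= epsilon (the comments below mention a depth of 25, so the actual implementation may compute it differently):

```python
import math

def epsilon_horizon(gamma: float, epsilon: float) -> int:
    """Smallest depth t with gamma**t <= epsilon, i.e. rewards beyond
    depth t contribute (almost) nothing to the discounted return."""
    return math.ceil(math.log(epsilon) / math.log(gamma))

# Parameters from the set-up above.
depth = epsilon_horizon(gamma=0.5, epsilon=0.00001)

# Pre-allocation rule from above: #samples = #roll-outs * epsilon-horizon.
preallocated = 5000 * depth
```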

Preliminary Results
[Plot: performance of the three variants over the first 100 samples]

2 comments:

  1. Dear Andreas,

    How come your algorithms are so much better than random with one single sample?
    I like the fact that we're looking at good behavior within 100 samples. Of course, now you will want to be better than that, but the implementation you suggested (having a dedicated first node) is a good step in that direction. It is actually the simplest and most pre-defined extension towards splitting on time as well.

  2. One sample in that plot actually corresponds to 25 updates of the tree: in one iteration, the algorithm takes 25 simulated steps and uses each of them to update the tree (25 is the depth implied here by the epsilon horizon). Maybe the plot's scaling should be changed ...
