Friday, July 6, 2012

Feedback 06-07-2012

Progress
  • Implemented perfect recall for the transposition tree
    • Remembers all samples (belief, action, discounted reward)
    • When a leaf splits, replays the stored samples to recreate the mean and the visit count of the action nodes below the new leaves
    • Space for the samples is pre-allocated: 
      • #samples = #roll-outs * epsilon-horizon
  • All three variants (deletion, insertion, transposition tree) drop in performance after the first split(s) (in the plot below, there is a recurring dip at the beginning of each curve)
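The perfect-recall mechanism described above can be sketched as follows. This is only an illustration; all names (Sample, ActionNode, Leaf, the split predicate) are hypothetical and not taken from the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    belief: tuple              # belief point at which the action was taken
    action: int
    discounted_reward: float

@dataclass
class ActionNode:
    count: int = 0
    mean: float = 0.0

    def update(self, reward: float) -> None:
        # Incremental mean update.
        self.count += 1
        self.mean += (reward - self.mean) / self.count

class Leaf:
    def __init__(self):
        self.samples = []      # perfect recall: every sample is remembered
        self.actions = {}      # action -> ActionNode statistics

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)
        node = self.actions.setdefault(sample.action, ActionNode())
        node.update(sample.discounted_reward)

    def split(self, predicate):
        """Split this leaf in two; replay the stored samples so the
        action nodes below the new leaves get correct means and counts."""
        left, right = Leaf(), Leaf()
        for s in self.samples:
            (left if predicate(s.belief) else right).add(s)
        return left, right
```

Replaying the stored samples is what avoids restarting the new leaves' statistics from zero after a split.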
Experimental Set-up
  • Environment: Infinite Horizon Tiger Problem
  • Discount factor: 0.5
  • Discount horizon: 0.00001
  • Number of episodes: 5000
  • Number of steps: 2
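The epsilon horizon used for pre-allocating the sample buffer can be derived from the discount factor and the discount-horizon threshold. A minimal sketch, assuming the horizon is the smallest depth t with gamma**t <= epsilon (the comments below mention a depth of 25, so the actual implementation may compute it differently):

```python
import math

def epsilon_horizon(gamma: float, epsilon: float) -> int:
    """Smallest depth t with gamma**t <= epsilon, i.e. rewards beyond
    depth t contribute (almost) nothing to the discounted return."""
    return math.ceil(math.log(epsilon) / math.log(gamma))

# Parameters from the set-up above.
depth = epsilon_horizon(gamma=0.5, epsilon=0.00001)

# Pre-allocation rule from above: #samples = #roll-outs * epsilon-horizon.
preallocated = 5000 * depth
```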

Preliminary Results
[Plot: performance of the three variants over the first 100 samples]

2 comments:

  1. Dear Andreas,

    How come your algorithms are so much better than random with one single sample?
    I like the fact that we're looking at good behavior within 100 samples. Of course, now you will want to be better than that, but the implementation you suggested (having a dedicated first node) is a good step in that direction. It is actually the simplest and most pre-defined extension towards splitting on time as well.

  2. One sample in that plot actually corresponds to 25 updates of the tree: in one iteration, the algorithm takes 25 simulated steps and uses each of them to update the tree (25 is the depth implied here by the epsilon horizon). Maybe the plot's scaling should be changed ...
