Tuesday, May 8, 2012

Updated Results: Number of Roll-outs vs Mean Value

Experimental Set-up
  • Mean value is taken over:
    • 10,000 runs for Horizon 2
    • 5,000 runs for Horizon 10
  • Error bars indicate the standard error of the mean (see the sketch after this list)
  • Environment is continuing
  • Execution is stopped after Horizon=k actions
  • Discount factor = 0.5; discount-horizon cut-off = 0.000001 (i.e. the ε used for the ε-accurate horizon below)
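For concreteness, here is a minimal sketch (Python, not the actual experiment code) of how the plotted quantities could be obtained under the set-up above, assuming "mean value" is the discounted return of a run truncated after Horizon = k actions, averaged over the runs, with error bars given by the standard error of that mean. The agent run_fn below is a hypothetical stand-in for the real agents.

  import math, random

  def discounted_return(rewards, gamma=0.5):
      # Discounted sum of the rewards collected during one run.
      return sum(gamma**t * r for t, r in enumerate(rewards))

  def mean_and_stderr(run_fn, n_runs, horizon, gamma=0.5):
      # Execute n_runs runs, each stopped after `horizon` actions, and
      # report the mean discounted return and its standard error
      # (sample std / sqrt(n_runs)), as used for the error bars.
      returns = [discounted_return(run_fn(horizon), gamma) for _ in range(n_runs)]
      mean = sum(returns) / n_runs
      var = sum((x - mean) ** 2 for x in returns) / (n_runs - 1)
      return mean, math.sqrt(var / n_runs)

  # Dummy agent (hypothetical): collects a random reward in [0, 1] per action.
  dummy_run = lambda horizon: [random.random() for _ in range(horizon)]
  print(mean_and_stderr(dummy_run, n_runs=10000, horizon=2))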
Algorithms
  • Random: computed by hand for Horizon 2; estimated using an agent that selects actions randomly for Horizon 10
  • Optimal: represents the MDP solution (assuming the correct action is always taken and a perfect observation model)
  • COMCTS with local recall: stores all examples and uses them to rebuild the regression tree
  • MC and COMCTS both use the ε-accurate horizon time (a sketch of this computation follows this list) [Michael J. Kearns, Yishay Mansour, Andrew Y. Ng: A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. Machine Learning 49(2-3): 193-208, 2002]
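One standard way to compute such a horizon, following the tail bound on the discounted return, is to take the smallest number of look-ahead steps H after which the ignored rewards can change the return by at most ε, i.e. gamma^H * R_max / (1 - gamma) <= ε. A minimal Python sketch is below; R_max = 1 is an assumption, not a value stated above.

  import math

  def eps_accurate_horizon(gamma, eps, r_max=1.0):
      # Smallest H with gamma**H * r_max / (1 - gamma) <= eps, i.e. the
      # tail of the discounted return beyond H contributes at most eps.
      return math.ceil(math.log(eps * (1.0 - gamma) / r_max) / math.log(gamma))

  # With the values from the set-up above (gamma = 0.5, eps = 1e-6) and
  # the assumed r_max = 1, this gives H = 21.
  print(eps_accurate_horizon(0.5, 1e-6))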
Results
  • Horizon 2:
[Figure: Horizon 2, first 1000 samples]
[Figure: Horizon 2, first 1000 samples (x-axis in log scale)]
[Figure: Horizon 2, first 200 samples]
[Figure: Horizon 2, first 200 samples (x-axis in log scale)]

  • Horizon 10:
[Figure: Horizon 10, first 200 samples]
[Figure: Horizon 10, first 200 samples (x-axis in log scale)]
[Figure: Horizon 10, first 500 samples]
[Figure: Horizon 10, first 500 samples (x-axis in log scale)]