Thursday, March 8, 2012

Minutes 07-03-2012

Assignment Pitch
  • Get to the point earlier
  • The existing technique already works on continuous states
  • Belief space is already a continuous space
Things that all of us (could) use
  • RL-Glue (framework)
  • Difference between rollouts (planning) and runs (execution)
  • Adaptive range exploration factor C (UCT)
  • Evaluation methods
    • Number of samples / maximum time vs. performance
    • Performance over time
    • Performance over horizon
    • Difference between approximate and exact results
    • Error
  • Continuous (multi-dimensional) action tree interface
    • Aim: Common interface for all our different trees
    • Methods: update, getGreedyAction, getUCTAction
  • Meta tree interface
    • Applicable to my tree?
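
A minimal sketch of what the common action tree interface could look like, in Python. Only the three method names (update, getGreedyAction, getUCTAction) come from the minutes; the signatures, the FixedBinTree class, the parameter c0 and the rule for adapting C (scaling it by the range of returns seen so far) are assumptions made for illustration, not the trees we actually use. The UCT choice here is the usual mean + C * sqrt(ln N / n) rule.

```python
import math
from abc import ABC, abstractmethod


class ActionTree(ABC):
    """Common interface; method names from the minutes, signatures assumed."""

    @abstractmethod
    def update(self, action, ret):
        """Record the return observed after trying `action`."""

    @abstractmethod
    def getGreedyAction(self):
        """Action with the best estimated value (for execution runs)."""

    @abstractmethod
    def getUCTAction(self):
        """Action maximising value plus exploration bonus (for rollouts)."""


class FixedBinTree(ActionTree):
    """Hypothetical stand-in: a 1-D action range cut into equal-width bins."""

    def __init__(self, low, high, n_bins, c0=1.0):
        self.low = low
        self.width = (high - low) / n_bins
        self.centres = [low + (i + 0.5) * self.width for i in range(n_bins)]
        self.counts = [0] * n_bins
        self.means = [0.0] * n_bins
        self.c0 = c0
        self.min_ret = math.inf
        self.max_ret = -math.inf

    def _bin(self, action):
        # Map an action to its bin index, clamped to the valid range.
        i = int((action - self.low) / self.width)
        return min(max(i, 0), len(self.centres) - 1)

    def update(self, action, ret):
        i = self._bin(action)
        self.counts[i] += 1
        # Incremental mean of the returns seen in this bin.
        self.means[i] += (ret - self.means[i]) / self.counts[i]
        self.min_ret = min(self.min_ret, ret)
        self.max_ret = max(self.max_ret, ret)

    def getGreedyAction(self):
        visited = [i for i, n in enumerate(self.counts) if n > 0]
        if not visited:
            return self.centres[0]
        return self.centres[max(visited, key=lambda i: self.means[i])]

    def getUCTAction(self):
        # Try every bin once before applying the UCT formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                return self.centres[i]
        total = sum(self.counts)
        # Assumed adaptive C: scale c0 by the range of returns seen so far.
        c = self.c0 * (self.max_ret - self.min_ret)

        def score(i):
            bonus = c * math.sqrt(math.log(total) / self.counts[i])
            return self.means[i] + bonus

        return self.centres[max(range(len(self.centres)), key=score)]
```

In this sketch, rollouts (planning) would call getUCTAction followed by update, while runs (execution) would call only getGreedyAction.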
Miscellaneous
  • Model-based vs model-free learning
  • Cloning of state information from the environment for the rollouts might be expensive
  • Is there an analytical way to calculate C in the UCT formula?
  • Expected mean value convergence vs behavioral convergence (variance of greedy actions)
  • Candidate split generation:
    • First n samples
    • Grid (equally spaced sample ranges)
  • Splitting: first optimality condition (SDR, F-test), then sufficiency condition (F-test)
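
The two candidate split generation options above could be sketched roughly as follows, for a single action dimension. The function names and the choice to skip duplicate samples are assumptions for illustration; the minutes only name the two strategies.

```python
def first_n_candidates(samples, n):
    """Use the first n distinct sampled values as split candidates, sorted."""
    seen = []
    for x in samples:
        if x not in seen:
            seen.append(x)
        if len(seen) == n:
            break
    return sorted(seen)


def grid_candidates(low, high, n):
    """Place n split points that cut [low, high] into n + 1 equal ranges."""
    step = (high - low) / (n + 1)
    return [low + (i + 1) * step for i in range(n)]
```

The first variant follows where the samples actually fall, the second is independent of the data; each candidate would then be checked against the splitting conditions.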
Planning
  • Write about adaptive C in LaTeX
  • (Prepare for) meeting with Frans and Michael tomorrow (08-03-2012, 14:00) (done)