Minutes 07-03-2012
Assignment Pitch
- Get to the point earlier
- The existing technique already works on continuous states
- Belief space is already a continuous space
Things that all of us (could) use
- RL-Glue (framework)
- Difference between rollouts (planning) and runs (execution)
- Adaptive range exploration factor C (UCT)
- Evaluation methods
  - Number of samples / maximum time vs. performance
  - Performance over time
  - Performance over horizon
  - Difference between approximate and exact results
    - Error
- Continuous (multi-dimensional) action tree interface
  - Aim: Common interface for all our different trees
  - Methods: update, getGreedyAction, getUCTAction
- Meta tree interface
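
The common tree interface discussed above could look roughly like the following Python sketch. Only the three method names (update, getGreedyAction, getUCTAction) come from the minutes; the class names, the signatures, and the discrete-action implementation are assumptions made for illustration. The exploration bonus uses one common form of the UCT rule, mean + C * sqrt(ln N / n_a), and a continuous action tree would replace the fixed action set with its own partitioning.

```python
import math
from abc import ABC, abstractmethod


class ActionTree(ABC):
    """Hypothetical common interface; only the method names come from the minutes."""

    @abstractmethod
    def update(self, action, ret):
        """Record the return `ret` observed after taking `action`."""

    @abstractmethod
    def getGreedyAction(self):
        """Return the action with the highest estimated mean return."""

    @abstractmethod
    def getUCTAction(self, c):
        """Return the action that maximises mean return plus exploration bonus."""


class DiscreteActionTree(ActionTree):
    """Simplest stand-in: a fixed, finite action set (an assumption)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = {a: 0 for a in self.actions}
        self.means = {a: 0.0 for a in self.actions}
        self.total = 0

    def update(self, action, ret):
        # Incremental mean update of the return estimate for this action.
        self.total += 1
        self.counts[action] += 1
        self.means[action] += (ret - self.means[action]) / self.counts[action]

    def getGreedyAction(self):
        return max(self.actions, key=lambda a: self.means[a])

    def getUCTAction(self, c):
        # Try every action once before applying the UCT formula.
        for a in self.actions:
            if self.counts[a] == 0:
                return a
        log_n = math.log(self.total)
        return max(
            self.actions,
            key=lambda a: self.means[a] + c * math.sqrt(log_n / self.counts[a]),
        )
```

In this sketch getUCTAction would be called during rollouts (planning) and getGreedyAction during runs (execution), with C passed in so that an adaptive scheme can change it between calls.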
Miscellaneous
- Model-based vs model-free learning
- Cloning of state information from the environment for the rollouts might be expensive
- Is there an analytical way to calculate C in the UCT-formula?
- Expected mean value convergence vs behavioral convergence (variance of greedy actions)
- Candidate split generation:
  - First n samples
  - Grid (equally spaced sample ranges)
- Splitting: first check the optimality condition (SDR, F-test), then the sufficiency condition (F-test)
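
The two candidate split generation options and the SDR score might be sketched as follows for a single action dimension. This assumes that SDR stands for standard deviation reduction, as used in regression trees, and that the grid option means equally spaced split points over the action range. All function names are hypothetical, and the F-test significance checks are not shown.

```python
import statistics


def first_n_candidates(samples, n):
    """Use the first n observed sample values as candidate split points."""
    return sorted(set(samples[:n]))


def grid_candidates(low, high, k):
    """Return k equally spaced interior split points in the range (low, high)."""
    step = (high - low) / (k + 1)
    return [low + step * i for i in range(1, k + 1)]


def sdr(xs, ys, split):
    """Standard deviation reduction of returns ys when xs is split at `split`.

    Returns 0.0 if either side has fewer than two samples (an arbitrary
    choice made for this sketch).
    """
    left = [y for x, y in zip(xs, ys) if x <= split]
    right = [y for x, y in zip(xs, ys) if x > split]
    if len(left) < 2 or len(right) < 2:
        return 0.0
    n = len(ys)
    return (
        statistics.pstdev(ys)
        - len(left) / n * statistics.pstdev(left)
        - len(right) / n * statistics.pstdev(right)
    )


def best_split(xs, ys, candidates):
    """Pick the candidate with the largest standard deviation reduction."""
    return max(candidates, key=lambda s: sdr(xs, ys, s))
```

A candidate chosen this way would still have to pass the optimality and sufficiency tests before the node is actually split.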
Planning
- Write about adaptive C in LaTeX
- (Prepare for) meeting with Frans and Michael tomorrow (08-03-2012, 14:00) (done)