Master Thesis Artificial Intelligence: Minutes 07-03-2012

Thursday, March 8, 2012

Assignment Pitch

Get to the point earlier
The existing technique already works on continuous states
Belief space is already a continuous space

Things that all of us (could) use

RL-Glue (framework)
Difference between rollouts (planning) and runs (execution)
Adaptive range exploration factor C (UCT)
Evaluation methods

Number of samples / maximum time vs. performance
Performance over time
Performance over horizon
Difference between approximate and exact results
Error

Continuous (multi-dimensional) action tree interface

Aim: Common interface for all our different trees
Methods: update, getGreedyAction, getUCTAction
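
A minimal sketch of what such a common interface could look like, in Python. Only the three method names come from the notes above. The signatures, the toy `FixedGridTree` class, and the idea of scaling C by the observed return range are assumptions made for illustration, not the agreed design.

```python
import math
from abc import ABC, abstractmethod


class ActionTree(ABC):
    """Hypothetical common interface; only the method names are from the minutes."""

    @abstractmethod
    def update(self, action, ret):
        """Record the return `ret` observed after taking `action`."""

    @abstractmethod
    def getGreedyAction(self):
        """Return the action with the highest estimated mean return."""

    @abstractmethod
    def getUCTAction(self):
        """Return the action maximising mean plus a UCB1-style exploration bonus."""


class FixedGridTree(ActionTree):
    """Toy stand-in for a real tree: a fixed grid over a 1-D range [low, high]."""

    def __init__(self, low, high, bins=5, c=1.0):
        self.low, self.bins = low, bins
        self.width = (high - low) / bins
        self.centres = [low + (i + 0.5) * self.width for i in range(bins)]
        self.counts = [0] * bins
        self.means = [0.0] * bins
        self.c = c
        # Observed return range, used to rescale C (an assumed adaptive scheme).
        self.r_min, self.r_max = math.inf, -math.inf

    def _index(self, action):
        i = int((action - self.low) / self.width)
        return min(max(i, 0), self.bins - 1)

    def update(self, action, ret):
        i = self._index(action)
        self.counts[i] += 1
        self.means[i] += (ret - self.means[i]) / self.counts[i]  # running mean
        self.r_min, self.r_max = min(self.r_min, ret), max(self.r_max, ret)

    def getGreedyAction(self):
        visited = [i for i in range(self.bins) if self.counts[i] > 0]
        if not visited:
            return self.centres[0]
        return self.centres[max(visited, key=lambda i: self.means[i])]

    def getUCTAction(self):
        # Try every unvisited bin once before applying the UCB1 rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return self.centres[i]
        total = sum(self.counts)
        span = (self.r_max - self.r_min) or 1.0  # fall back to 1 if all returns are equal

        def score(i):
            bonus = math.sqrt(math.log(total) / self.counts[i])
            return self.means[i] + self.c * span * bonus

        return self.centres[max(range(self.bins), key=score)]
```

In this sketch, planning (rollouts) would call `getUCTAction` and `update`, while execution (runs) would call `getGreedyAction` only.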

Meta tree interface

Applicable to my tree?

Miscellaneous

Model-based vs model-free learning
Cloning of state information from the environment for the rollouts might be expensive
Is there an analytical way to calculate C in the UCT-formula?
Expected mean value convergence vs behavioral convergence (variance of greedy actions)
Candidate split generation:

First n samples
Grid (equally spaced sample ranges)

Splitting: first check the optimality condition (SDR, F-test), then the sufficiency condition (F-test)
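
As a rough illustration of the two candidate-generation schemes and the SDR criterion noted above, the Python sketch below proposes split points along one dimension and scores them by standard deviation reduction. The function names are hypothetical. Using the first n sample positions directly as split points is an assumed reading of "first n samples". The F-test stages are left out entirely.

```python
import statistics


def first_n_candidates(xs, n):
    """Assumed reading of 'first n samples': use their positions as split points."""
    return sorted(set(xs[:n]))


def grid_candidates(low, high, k):
    """k equally spaced split points strictly inside (low, high)."""
    step = (high - low) / (k + 1)
    return [low + (i + 1) * step for i in range(k)]


def sdr(xs, ys, split):
    """Standard deviation reduction: sd(S) - sum_i |S_i| / |S| * sd(S_i)."""
    left = [y for x, y in zip(xs, ys) if x <= split]
    right = [y for x, y in zip(xs, ys) if x > split]
    if not left or not right:
        return 0.0  # a split that leaves one side empty reduces nothing
    n = len(ys)
    return (statistics.pstdev(ys)
            - len(left) / n * statistics.pstdev(left)
            - len(right) / n * statistics.pstdev(right))


def best_split(xs, ys, candidates):
    """Candidate with the largest SDR; a significance test would follow this."""
    return max(candidates, key=lambda s: sdr(xs, ys, s))
```

With sample positions `[0.1, 0.4, 0.6, 0.9]` and returns `[0, 0, 1, 1]`, the grid candidate that separates the two groups of returns gets the largest reduction.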

Planning

Write about adaptive C in LaTeX
(Prepare for) meeting with Frans and Michael tomorrow (08-03-2012, 14:00) (done)
