Monday, July 9, 2012

Feedback 09-07-2012

Progress
  • Transposition Tree: Implemented the option of not splitting the first node
    • The agent's initial belief is represented by an additional node
    • This "first node" is updated separately from all other nodes
    • A parameter makes it possible to also update it during backpropagation if the belief falls within the first node's range
  • Transposition Tree: Corrected the way an action is selected at the end of a simulation
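The "first node" handling described above can be sketched roughly as follows. This is only an illustration and not the actual implementation: it assumes a one-dimensional belief (as in the Tiger Problem, where the belief is a single probability), nodes that each cover a belief interval, and a running-mean value estimate. The names `Node`, `run_update`, and `update_first` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical tree node covering the belief interval [lo, hi]."""
    lo: float
    hi: float
    visits: int = 0
    value: float = 0.0  # running mean of observed returns (an assumption)

    def contains(self, belief: float) -> bool:
        return self.lo <= belief <= self.hi

    def update(self, ret: float) -> None:
        # Incremental mean update.
        self.visits += 1
        self.value += (ret - self.value) / self.visits


def run_update(first_node, tree_nodes, root_return, trajectory, update_first=False):
    """Apply the updates of one simulation.

    root_return: return observed from the initial belief.
    trajectory:  (belief, return) pairs for the beliefs visited afterwards.
    update_first: if True, the first node is also updated during
                  backpropagation whenever a belief falls in its range.
    """
    # The first node is never split and is updated separately from the rest.
    first_node.update(root_return)

    for belief, ret in trajectory:
        # Regular backpropagation: update the first tree node that covers the belief.
        for node in tree_nodes:
            if node.contains(belief):
                node.update(ret)
                break
        # Optional extra update of the first node.
        if update_first and first_node.contains(belief):
            first_node.update(ret)


# Usage example with made-up numbers.
first = Node(0.45, 0.55)
tree = [Node(0.0, 0.5), Node(0.5, 1.0)]
run_update(first, tree, root_return=1.0,
           trajectory=[(0.5, 2.0), (0.85, 4.0)], update_first=True)
```

With `update_first=False`, the first node would receive only the single update from the start of the simulation, regardless of which beliefs are visited later.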
Experimental Set-up
  • Environment: Infinite Horizon Tiger Problem
  • Discount factor: 0.5
  • Discount horizon: 0.00001 
    • (with this setting, one roll-out means 25 updates to the tree)
  • Number of episodes: 5000
  • Number of steps: 2
  • Algorithms:
    • Both Keep 1st Node variants use Perfect Recall as the splitting strategy
    • Keep 1st Node + Update also updates the "first node" during backpropagation 
Results


2 comments:

  1. Dear Andreas,

    can you provide a regret-like plot (although we don't know if the magic guesser reward is possible, take that as an optimum for now)?

    Also, a comparison to your non-TT version would be great, i.e., if you can compare TT vs OT. I like both variants of keep 1st, but perfect recall appears to be dominating them (better in all cases), so it does not have the problem we expected (dip if splitting just before the end)?

    If you perform a split, do you hand down the action tree of that split to the newly generated node?

    Looking forward to the comparison and regret-like plots. I think you are on to something here, considering that the performance after one playout is already 'close' to optimal. Best regards, Michael

  2. 'If you perform a split, do you hand down the action tree of that split to the newly generated node?'

    That is exactly what the 'insertion' strategy does.

    'I like both variants of keep 1st, but perfect recall appears to be dominating it (better in all cases), so it does not have the problem we expected (dip if splitting just before the end)?'

    I think that with 'perfect recall', this problem does not occur - only with 'deletion' and 'insertion'.
