Wednesday, January 23, 2013

Results: Light Dark 10x10 (Goal at Corner)

Setup

  • Same as before; grid looks as follows:
********L*
********L*
********L*
********L*
*****A**L*
********L*
********L*
********L*
********L*
G*******L*

Results














  • Seems to be simpler to solve than with goal in middle of grid
  • Update: Similar results for all other corners
Goal at upper left

Monday, January 21, 2013

Updated Results: Light Dark 10x10

Setup
  • 10x10 Light Dark domain (wrap-around) with continuous observations:
********L*
********L*
********L*
********L*
**G*****L*
********L*
********L*
*****A**L*
********L*
********L*
  • Actions: up, down, left, right
  • Observations: location (x,y) corrupted by Gaussian noise with STD based on distance to light
  • Rewards: -1 per move
  • Initial belief: uniform over all states except the goal
  • # runs: 1000
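To make the setup concrete, here is a minimal Python sketch of the domain as listed above. It is not the code used for the experiments. The names (`step`, `noise_std`, `observe`, `initial_belief`), the deterministic moves, and in particular the linear noise model with its `base` and `scale` parameters are assumptions for illustration; the post only states that the STD depends on the distance to the light.

```python
import random

SIZE = 10
LIGHT_X = 8      # column of the 'L' cells in the grid above (0-based)
GOAL = (2, 4)    # (x, y) of 'G', with y counted from the top row (0-based)
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Move one cell with wrap-around (assumed deterministic); reward is -1 per move."""
    dx, dy = MOVES[action]
    x, y = state
    return ((x + dx) % SIZE, (y + dy) % SIZE), -1

def noise_std(state, base=0.5, scale=0.5):
    """Hypothetical noise model: STD grows linearly with the horizontal
    distance to the light column (wrap-around is ignored here)."""
    return base + scale * abs(state[0] - LIGHT_X)

def observe(state, rng):
    """Return the (x, y) location corrupted by Gaussian noise."""
    std = noise_std(state)
    return (rng.gauss(state[0], std), rng.gauss(state[1], std))

def initial_belief():
    """Uniform distribution over all states except the goal."""
    states = [(x, y) for x in range(SIZE) for y in range(SIZE) if (x, y) != GOAL]
    p = 1.0 / len(states)
    return {s: p for s in states}
```

With these assumed parameters, an agent standing in the light column receives observations with STD 0.5, while one three columns away receives STD 2.0, which is what makes moving towards the light worthwhile before heading for the goal.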
Results: Automatic vs Fixed Discretization (1 cut point per dimension)













Results: No Discretization


Wednesday, January 9, 2013

Results: Light Dark 10x10

Setup
  • 10x10 Light Dark domain with continuous observations
  • Starts from initial uniform belief
  • # runs: 1000
Results



                     

Thursday, January 3, 2013

Progress 03-01-13

Hallway environment
  • Changed from MOMDP to POMDP
    • Orientation included in belief state
  • Forward action is stochastic, turn actions are deterministic
Parser for POMDP files
  • Started developing a parser for .POMDP files
  • Using / adapting parser from libpomdp

Wednesday, December 19, 2012

Progress 19-12-2012

Discretization for POMCP
  • Equal width binning
    • Predefined number of cut points per dimension (m)
    • Number of dimensions (n)
    • For each sequence node, there are (m+1)^n history nodes => lots of nodes!
    • Used for the tree and to update the particle filter
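The binning scheme above can be sketched in a few lines of Python. This is an illustration only, not the POMCP implementation itself: the function names and the choice to clamp out-of-range observations to the edge bins are assumptions. With m cut points there are m + 1 bins per dimension, so an n-dimensional observation maps to one of (m+1)^n discrete observations.

```python
import math

def bin_index(value, low, high, m):
    """Map a value to one of m + 1 equal-width bins on [low, high] (m cut points)."""
    bins = m + 1
    width = (high - low) / bins
    idx = math.floor((value - low) / width)
    # Assumption: noisy observations outside [low, high] fall into the edge bins.
    return min(max(idx, 0), bins - 1)

def discretize(obs, low, high, m):
    """Discretize each dimension of a continuous observation independently."""
    return tuple(bin_index(v, low, high, m) for v in obs)

def max_history_nodes(m, n):
    """Upper bound on the history nodes below one sequence node."""
    return (m + 1) ** n
```

For example, `discretize((2.3, 7.9), 0.0, 10.0, 1)` splits each dimension at 5.0 and yields `(0, 1)`. With one cut point and two dimensions there are at most 4 history nodes per sequence node; with three cut points there are already 16.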
Hallway environment
  • Based on [1]
  • Actions: move forward, turn left, turn right, turn around
  • Reward: -1 per action
  • Observations: 4 wall detection sensors (with Gaussian noise), 1 landmark / goal detection sensor
  • Belief: location of agent (agent's orientation is known)
  • Initial belief distribution: uniform over all free cells
  • Currently available maps (0 = free, 1 = wall, G = goal, L = landmark):
    • 0000G
    • 00000000000
      11L1L1L1G11
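As a rough illustration of the initial belief, the two maps listed above can be turned into a uniform distribution over free cells as follows. The names `MAP_SMALL`, `MAP_LARGE` and `uniform_belief` are hypothetical, and the sketch assumes that only cells marked 0 count as free (goal and landmark cells are excluded), which the notes above do not spell out.

```python
MAP_SMALL = ["0000G"]
MAP_LARGE = ["00000000000",
             "11L1L1L1G11"]

def uniform_belief(rows):
    """Uniform distribution over (row, column) positions of cells marked '0'.

    Assumption: 'G' and 'L' cells are not treated as free cells here.
    """
    free = [(r, c)
            for r, line in enumerate(rows)
            for c, ch in enumerate(line)
            if ch == "0"]
    p = 1.0 / len(free)
    return {cell: p for cell in free}
```

Under this assumption the small map gives 4 equally likely start cells and the large map gives 11.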

[1] Michael Littman, Anthony Cassandra, and Leslie Kaelbling. Learning policies for partially observable environments: Scaling up. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 362–370, San Francisco, CA, 1995. Morgan Kaufmann.

Tree Visualization


Sunday, July 29, 2012

Feedback 29-07-2012

Progress
  • Implemented visualization for light-dark domain in RL-Viz:
                     










                     



  • Finished time-based performance experiment for light-dark domain (see below)
  • Thesis:
    • Included the experiment mentioned before
    • Some small language / content corrections
Experimental Set-up
  • Environment: 10x10 discrete state space light-dark domain
  • Max. steps: 20
  • Discount factor: 0.95
  • Number of episodes: 2,000
Results
time vs roll-outs

time vs mean