Master Thesis Artificial Intelligence

Wednesday, January 23, 2013

Results: Light Dark 10x10 (Goal at Corner)

Setup

Same as before; grid looks as follows:

********L*
********L*
********L*
********L*
*****A**L*
********L*
********L*
********L*
********L*
G*******L*

Results

This configuration seems easier to solve than the one with the goal in the middle of the grid.
Update: results are similar for all other corners.

Goal at upper left

Monday, January 21, 2013

Updated Results: Light Dark 10x10

Setup

10x10 Light Dark domain (wrap-around) with continuous observations:

********L*

**G*****L*

********L*

*****A**L*

********L*

Actions: up, down, left, right
Observations: location (x,y) corrupted by Gaussian noise with STD based on distance to light
Rewards: -1 per move
Initial belief: uniform over all states except the goal
Number of runs: 1000
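The setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the light column index (8, read off the grid), the linear noise model in `noise_std` with its `base_std` and `std_per_cell` parameters, the use of wrap-around distance to the light, and the direction convention in `MOVES` are all hypothetical choices, not the values used in the experiments.

```python
import random

GRID_SIZE = 10
LIGHT_X = 8  # column of the 'L' cells in the grid above (0-based)

# Assumed convention: "up" decreases y, "left" decreases x.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def wrap_distance(a, b, size=GRID_SIZE):
    """Shortest distance between two coordinates on a wrap-around axis."""
    d = abs(a - b) % size
    return min(d, size - d)


def noise_std(x, base_std=0.5, std_per_cell=0.5):
    """Hypothetical noise model: the STD grows linearly with distance to the light."""
    return base_std + std_per_cell * wrap_distance(x, LIGHT_X)


def sample_observation(x, y, rng=random):
    """Return the true location (x, y) corrupted by Gaussian noise."""
    std = noise_std(x)
    return rng.gauss(x, std), rng.gauss(y, std)


def step(x, y, action):
    """Deterministic move on the wrap-around grid."""
    dx, dy = MOVES[action]
    return (x + dx) % GRID_SIZE, (y + dy) % GRID_SIZE


def initial_belief(goal):
    """Uniform distribution over all states except the goal."""
    states = [(x, y) for x in range(GRID_SIZE) for y in range(GRID_SIZE)
              if (x, y) != goal]
    p = 1.0 / len(states)
    return {s: p for s in states}
```

With these assumptions an agent standing in the light column gets the most precise observations, and the belief starts with equal mass on the 99 non-goal cells.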

Results: Automatic vs Fixed Discretization (1 cut point per dimension)

Results: No Discretization

Thursday, January 10, 2013

Results: Hallway

Wednesday, January 9, 2013

Results: Light Dark 10x10

Setup

10x10 Light Dark domain with continuous observations
Each run starts from a uniform initial belief
Number of runs: 1000

Results

Thursday, January 3, 2013

Progress 03-01-13

Hallway environment

Changed from MOMDP to POMDP

Orientation included in belief state

Forward action is stochastic, turn actions are deterministic
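A minimal sketch of such a transition function, with orientation as part of the state, is given below. The success probability `FORWARD_SUCCESS_PROB`, the "stay in place" failure mode, the heading names and the grid encoding are assumptions made for illustration only; the actual probabilities used in the hallway environment are not stated here.

```python
import random

HEADINGS = ["N", "E", "S", "W"]  # clockwise order
DELTAS = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}
FORWARD_SUCCESS_PROB = 0.8  # hypothetical value, not taken from the experiments


def turn(heading, action):
    """Turn actions are deterministic: rotate the heading in 90-degree steps."""
    offset = {"turn_left": -1, "turn_right": 1, "turn_around": 2}[action]
    return HEADINGS[(HEADINGS.index(heading) + offset) % 4]


def step(state, action, grid, rng=random):
    """state is (x, y, heading); grid is a list of strings where '1' is a wall."""
    x, y, heading = state
    if action != "forward":
        return (x, y, turn(heading, action))
    # The forward action is stochastic: assume it fails by leaving the agent in place.
    if rng.random() >= FORWARD_SUCCESS_PROB:
        return state
    dx, dy = DELTAS[heading]
    nx, ny = x + dx, y + dy
    inside = 0 <= ny < len(grid) and 0 <= nx < len(grid[ny])
    if inside and grid[ny][nx] != "1":
        return (nx, ny, heading)
    return state  # blocked by a wall or the map border
```

Because the heading is now part of the hidden state, each particle in the belief has to carry `(x, y, heading)` rather than only a location.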

Parser for POMDP files

Started developing a parser for .POMDP files
Using and adapting the parser from libpomdp

Wednesday, December 19, 2012

Progress 19-12-2012

Discretization for POMCP

Equal width binning

Predefined number of cut points per dimension (m)
Number of dimensions (n)

For each sequence node there are (m+1)ⁿ history nodes, so the tree grows exponentially with the number of dimensions.
The discretization is used both for the tree and for updating the particle filter
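Equal-width binning as described above could look roughly like the following. The function names, the clamping of out-of-range values into the end bins, and the observation bounds in the example are assumptions for illustration; they are not taken from the POMCP implementation itself.

```python
def equal_width_bin(value, low, high, num_cut_points):
    """Map a continuous value to a bin index in 0..num_cut_points."""
    num_bins = num_cut_points + 1
    width = (high - low) / num_bins
    index = int((value - low) // width)
    # Clamp so that values outside [low, high) fall into the first or last bin.
    return max(0, min(num_bins - 1, index))


def discretize(observation, bounds, num_cut_points):
    """Discretize an n-dimensional observation into a tuple of bin indices."""
    return tuple(
        equal_width_bin(value, low, high, num_cut_points)
        for value, (low, high) in zip(observation, bounds)
    )


def num_history_nodes(num_cut_points, num_dimensions):
    """Number of distinct discretized observations: (m+1)^n."""
    return (num_cut_points + 1) ** num_dimensions


bounds = [(0.0, 10.0), (0.0, 10.0)]  # assumed observation range per dimension
print(discretize((2.3, 7.9), bounds, num_cut_points=1))  # (0, 1)
print(num_history_nodes(1, 2))                            # 4
print(num_history_nodes(9, 2))                            # 100
```

The returned tuple can serve as the key for the child history node, which makes the branching factor explicit: with n = 2, going from 1 to 9 cut points per dimension raises it from 4 to 100.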

Hallway environment

Based on [1]
Actions: move forward, turn left, turn right, turn around
Reward: -1 per action
Observations: 4 wall detection sensors (with Gaussian noise), 1 landmark / goal detection sensor
Belief: location of agent (agent's orientation is known)
Initial belief distribution: uniform over all free cells
Currently available maps (0 = free, 1 = wall, G = goal, L = landmark):

0000G
00000000000
11L1L1L1G11

[1] Michael Littman, Anthony Cassandra, and Leslie Kaelbling. Learning policies for partially observable environments: Scaling up. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 362-370, San Francisco, CA, 1995. Morgan Kaufmann.

Tree Visualization

Sunday, July 29, 2012

Feedback 29-07-2012

Progress

Implemented visualization for light-dark domain in RL-Viz:

Finished time-based performance experiment for light-dark domain (see below)
Thesis:

Included the experiment mentioned before
Some small language / content corrections

Experimental Set-up

Environment: 10x10 discrete state space light-dark domain
Max. steps: 20
Discount factor: 0.95
Number of episodes: 2,000
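To give a sense of scale for the mean returns, the lowest possible discounted return per episode under these settings can be computed as below. This assumes a reward of -1 per move, which is stated for the later Light Dark setups but not for this experiment, so treat the number as a rough reference point only.

```python
def discounted_return(reward_per_step, discount, num_steps):
    """Sum of discount**t * reward_per_step for t = 0 .. num_steps - 1."""
    return sum(reward_per_step * discount ** t for t in range(num_steps))


# Assumed reward of -1 per move; discount factor and step limit as listed above.
worst_case = discounted_return(-1.0, 0.95, 20)
print(round(worst_case, 2))  # about -12.83
```

An episode that never reaches the goal within the 20-step limit would therefore score roughly -12.83 under this assumption.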

Results

time vs roll-outs

time vs mean
