
Simulating the FrozenLake environment

The optimal policies for the MDPs we have dealt with so far are pretty intuitive. However, it won't be that straightforward in most cases, such as the FrozenLake environment. In this recipe, let's play around with the FrozenLake environment and get ready for upcoming recipes where we will find its optimal policy.

FrozenLake is a typical Gym environment with a discrete state space. It is about moving an agent from the starting location to the goal location in a grid world, and at the same time avoiding traps. The grid is either four by four (https://gym.openai.com/envs/FrozenLake-v0/) or eight by eight (https://gym.openai.com/envs/FrozenLake8x8-v0/). The grid is made up of the following four types of tiles:

  • S: The starting location
  • G: The goal location, which terminates an episode
  • F: The frozen tile, which is a walkable location
  • H: The hole location, which terminates an episode

There are four actions: moving left (0), moving down (1), moving right (2), and moving up (3). The reward is +1 if the agent successfully reaches the goal location, and 0 otherwise. The observation space is discrete: in the four-by-four grid, an observation is a single integer from 0 to 15 that identifies the tile the agent is on, so there are 16 possible states.
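To make the tiles, state numbering, actions, and reward concrete, here is a minimal stand-alone sketch in plain Python. It does not use Gym. It assumes the default four-by-four layout and row-major state numbering (state = row * 4 + column), and it ignores slipperiness for now. The names GRID, move, and step are our own illustrative choices, not part of the Gym API.

```python
# Assumed default 4x4 layout (S=start, F=frozen, H=hole, G=goal).
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N_ROWS, N_COLS = len(GRID), len(GRID[0])

# Action encoding described in the text.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
# (row delta, column delta) for each action.
DELTAS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def tile(state):
    """Return the tile letter for a state index (row-major numbering)."""
    row, col = divmod(state, N_COLS)
    return GRID[row][col]


def move(state, action):
    """Move one tile in the given direction; the agent stays put at the border."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = DELTAS[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    return row * N_COLS + col


def step(state, action):
    """Hypothetical non-slippery step: returns (next_state, reward, done)."""
    next_state = move(state, action)
    landed_on = tile(next_state)
    reward = 1.0 if landed_on == "G" else 0.0   # +1 only at the goal
    done = landed_on in ("G", "H")              # goal and holes end the episode
    return next_state, reward, done


if __name__ == "__main__":
    # Follow one hand-picked safe path from the start (state 0) to the goal.
    state = 0
    for action in [DOWN, DOWN, RIGHT, DOWN, RIGHT, RIGHT]:
        state, reward, done = step(state, action)
        print(state, reward, done)
```

Running the script walks the agent through states 4, 8, 9, 13, 14, and 15, with the reward of 1.0 appearing only on the final step.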

What is tricky in this environment is that, as the ice surface is slippery, the agent won't always move in the direction it intends. For example, it may move to the left or to the right when it intends to move down.
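The slippery behavior can be sketched in plain Python as follows. This is a stand-alone illustration rather than Gym's own code. It assumes that the agent ends up moving in the intended direction or in one of the two perpendicular directions, each with probability 1/3, and it assumes the default four-by-four layout. The function names and the 100-step cap are our own illustrative choices.

```python
import random

# Assumed default 4x4 layout (S=start, F=frozen, H=hole, G=goal).
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
# (row delta, column delta) for left (0), down (1), right (2), up (3).
DELTAS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def move(state, action):
    """Move one tile in the given direction; the agent stays put at the border."""
    row, col = divmod(state, N)
    d_row, d_col = DELTAS[action]
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    return row * N + col


def possible_directions(action):
    """The intended direction plus its two perpendicular neighbors."""
    return [(action - 1) % 4, action, (action + 1) % 4]


def slippery_outcomes(state, action):
    """List of (probability, next_state) pairs, assuming 1/3 per direction."""
    return [(1.0 / 3.0, move(state, a)) for a in possible_directions(action)]


def slippery_step(state, action, rng):
    """Sample one slippery transition: returns (next_state, reward, done)."""
    actual = rng.choice(possible_directions(action))
    next_state = move(state, actual)
    landed_on = GRID[next_state // N][next_state % N]
    return next_state, float(landed_on == "G"), landed_on in ("G", "H")


def run_random_episode(rng, max_steps=100):
    """Play one episode with uniformly random actions; return the total reward."""
    state, total_reward = 0, 0.0
    for _ in range(max_steps):  # the cap is an arbitrary choice for this sketch
        state, reward, done = slippery_step(state, rng.randrange(4), rng)
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    # Intending to move down (1) from the start can also slide left or right.
    print(slippery_outcomes(0, 1))
    rng = random.Random(0)
    wins = sum(run_random_episode(rng) for _ in range(1000))
    print("Random-policy successes out of 1000 episodes:", int(wins))
```

Under these assumptions, intending to move down from the start tile leads to state 4 (down), state 1 (right), or state 0 (left, blocked by the border) with equal probability, which is why a sensible-looking action sequence can still end in a hole.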
