
Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. The chapter starts with the creation of a Markov chain and an MDP, which is the core of most reinforcement learning algorithms. You will also become more familiar with Bellman equations by performing policy evaluation. We will then move on to two approaches for solving an MDP: value iteration and policy iteration, using the FrozenLake environment as an example. At the end of the chapter, we will demonstrate, step by step, how to solve the coin-flipping gamble problem with dynamic programming.

The following recipes will be covered in this chapter:

  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem
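
As a taste of the first recipe, the following is a minimal sketch of representing a Markov chain in PyTorch; the two-state transition matrix and starting distribution here are hypothetical placeholders, not the values used in the recipe itself:

```python
import torch

# Hypothetical two-state Markov chain (illustrative only).
# T[i, j] is the probability of moving from state i to state j;
# each row sums to 1 (row-stochastic matrix).
T = torch.tensor([[0.4, 0.6],
                  [0.8, 0.2]])

# An arbitrary starting distribution over the two states.
v = torch.tensor([[0.7, 0.3]])

# Propagate the distribution k steps: v_k = v_0 @ T^k.
for _ in range(10):
    v = torch.matmul(v, T)

print(v)  # the distribution approaches the chain's stationary distribution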
```