
Markov Decision Processes and Dynamic Programming

In this chapter, we will continue our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. The chapter starts with the creation of a Markov chain and an MDP, which are at the core of most reinforcement learning algorithms. You will also become more familiar with Bellman equations by practicing policy evaluation. We will then move on to two approaches for solving an MDP: value iteration and policy iteration, using the FrozenLake environment as an example. At the end of the chapter, we will demonstrate how to solve the interesting coin-flipping gamble problem with dynamic programming, step by step.

The following recipes will be covered in this chapter:

  • Creating a Markov chain
  • Creating an MDP
  • Performing policy evaluation
  • Simulating the FrozenLake environment
  • Solving an MDP with a value iteration algorithm
  • Solving an MDP with a policy iteration algorithm
  • Solving the coin-flipping gamble problem
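
As a small preview of the first recipe, the sketch below shows one way a Markov chain could be represented with a PyTorch transition matrix and its state distribution propagated over time. The state names, probabilities, and number of steps are illustrative assumptions, not values from the recipes themselves:

```python
import torch

# Hypothetical two-state Markov chain (state 0: "sunny", state 1: "rainy").
# T[i][j] is the probability of moving from state i to state j; the
# probabilities below are made-up illustrative values.
T = torch.tensor([[0.9, 0.1],
                  [0.5, 0.5]])

# Start entirely in state 0 and propagate the distribution forward in time.
state = torch.tensor([[1.0, 0.0]])
for step in range(10):
    state = torch.matmul(state, T)

print(state)  # the distribution approaches the chain's stationary distribution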
```