- PyTorch 1.x Reinforcement Learning Cookbook
- Yuxi (Hayden) Liu
- 2021-06-24 12:34:43
Markov Decision Processes and Dynamic Programming
This chapter continues our practical reinforcement learning journey with PyTorch by looking at Markov decision processes (MDPs) and dynamic programming. We start by creating a Markov chain and then an MDP, the formal model at the core of most reinforcement learning algorithms. You will also become more familiar with the Bellman equations by practicing policy evaluation. We then apply two approaches to solving an MDP, value iteration and policy iteration, using the FrozenLake environment as an example. At the end of the chapter, we demonstrate step by step how to solve the coin-flipping gamble problem with dynamic programming.
The following recipes will be covered in this chapter:
- Creating a Markov chain
- Creating an MDP
- Performing policy evaluation
- Simulating the FrozenLake environment
- Solving an MDP with a value iteration algorithm
- Solving an MDP with a policy iteration algorithm
- Solving the coin-flipping gamble problem
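To give a feel for where these recipes are heading, here is a minimal sketch of value iteration on a tiny MDP. The two-state, two-action MDP, the names `T`, `R`, `value_iteration`, and `greedy_policy`, and the discount factor of 0.9 are all illustrative assumptions, not taken from the recipes. Plain Python lists are used to keep the sketch free of dependencies; the same update can be written with PyTorch tensors.

```python
# Hypothetical toy MDP (an assumption for illustration only).
# T[s][a][s2] is the probability of moving from state s to state s2
# when taking action a; R[s][a] is the immediate reward.
T = [
    [[1.0, 0.0], [0.2, 0.8]],   # state 0: action 0 stays, action 1 mostly moves to state 1
    [[0.0, 1.0], [0.9, 0.1]],   # state 1: action 0 stays, action 1 mostly moves to state 0
]
R = [
    [0.0, 0.0],                 # no reward in state 0
    [1.0, 0.0],                 # staying in state 1 pays 1
]


def q_values(T, R, V, s, gamma):
    """Return the one-step lookahead value of each action in state s."""
    return [
        R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(T[s][a]))
        for a in range(len(T[s]))
    ]


def value_iteration(T, R, gamma, threshold=1e-6):
    """Apply the Bellman optimality update until the values stop changing."""
    V = [0.0] * len(T)
    while True:
        V_new = [max(q_values(T, R, V, s, gamma)) for s in range(len(T))]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < threshold:
            return V


def greedy_policy(T, R, V, gamma):
    """Pick, in each state, the action with the highest lookahead value."""
    policy = []
    for s in range(len(T)):
        q = q_values(T, R, V, s, gamma)
        policy.append(max(range(len(q)), key=lambda a: q[a]))
    return policy


gamma = 0.9
values = value_iteration(T, R, gamma)
policy = greedy_policy(T, R, values, gamma)
print(values, policy)
```

In this toy MDP the best plan is to move from state 0 to state 1 and then stay there. Solving the Bellman equations by hand gives a value of 1 / (1 - 0.9) = 10 for state 1 and 7.2 / 0.82, roughly 8.78, for state 0, which the iteration should approach.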