- Python Reinforcement Learning
- Sudharsan Ravichandiran, Sean Saito, Rajalingappaa Shanmugamani, Yang Wenzhuo
- 2021-06-24 15:17:30
The Markov Decision Process and Dynamic Programming
The Markov Decision Process (MDP) provides a mathematical framework for solving the reinforcement learning (RL) problem. Almost all RL problems can be modeled as an MDP, and MDPs are also widely used to solve various optimization problems. In this chapter, we will look at what an MDP is and how we can use it to solve RL problems. We will also learn about dynamic programming, a technique for solving complex problems efficiently by breaking them into simpler subproblems.
In this chapter, you will learn about the following topics:
- The Markov chain and Markov process
- The Markov Decision Process
- Rewards and returns
- The Bellman equation
- Solving a Bellman equation using dynamic programming
- Solving the Frozen Lake problem using value iteration and policy iteration