Getting Started with the Q-Learning Algorithm

Q-learning is an algorithm designed to solve a class of control problems known as Markov decision processes (MDPs). In this chapter, we will look at what MDPs are, how they work, and how Q-learning goes about solving them. We will also explore some classic reinforcement learning (RL) problems and develop solutions to them using Q-learning.

We will cover the following topics in this chapter:

  • Understanding what an MDP is and how Q-learning is designed to solve an MDP
  • Learning how to define the states an agent can be in and the actions it can take from those states, using the OpenAI Gym Taxi-v2 environment from our first project
  • Becoming familiar with the learning rate (alpha), the discount factor (gamma), and the exploration rate (epsilon)
  • Diving into a classic RL problem, the multi-armed bandit problem (MABP), and putting it into a Q-learning context
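
As a preview of how the alpha, gamma, and epsilon values listed above fit together, here is a minimal sketch of the standard tabular Q-learning update with an epsilon-greedy action choice. It is an illustration, not the chapter's implementation: the function names `epsilon_greedy` and `q_update`, the toy `q_table`, and the numeric values are all assumptions chosen for the example, and it does not depend on OpenAI Gym.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Explore with probability epsilon; otherwise take the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: pick a random action index
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit: greedy action


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])           # value of the best action in the next state
    td_target = reward + gamma * best_next   # discounted estimate of the return
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical toy table with 2 states and 2 actions (values are illustrative only).
q_table = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(
    q_table, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9
)
greedy_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0)
```

In this toy call, the update moves Q(0, 1) halfway (alpha = 0.5) from 0 toward the target 1.0 + 0.9 × 1.0, giving a value of roughly 0.95. With epsilon set to 0, `epsilon_greedy` never explores and returns the index of the largest entry.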