Questions

The question list is as follows:

  1. What is the Markov property?
  2. Why do we need the Markov Decision Process?
  3. When do we prefer immediate rewards?
  4. What is the use of the discount factor?
  5. Why do we use the Bellman equation?
  6. How would you derive the Bellman equation for a Q function?
  7. How are the value function and Q function related?
  8. What is the difference between value iteration and policy iteration?