Questions

  1. What is the difference between a reward and a value?
  2. What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter. 
  3. Why will a Q-learning agent not always choose the highest Q-valued action for its current state?
  4. Explain one benefit of a decaying gamma.
  5. Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
  6. What kind of policy does Q-learning implicitly assume the agent is following?
  7. Under what circumstances will SARSA and Q-learning produce the same results?