Questions

  1. What is the difference between a reward and a value?
  2. What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter. 
  3. Why will a Q-learning agent not always choose the highest Q-valued action for its current state?
  4. Explain one benefit of a decaying gamma.
  5. Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
  6. What kind of policy does Q-learning implicitly assume the agent is following?
  7. Under what circumstances will SARSA and Q-learning produce the same results?