Book title: Hands-On Q-Learning with Python
Author: Nazia Habib
Chapter word count: 91
Last updated: 2021-06-24 15:13:13

Questions

What is the difference between a reward and a value?
What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter.
Why will a Q-learning agent not always choose the highest Q-valued action for its current state?
Explain one benefit of a decaying gamma.
Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
What kind of policy does Q-learning implicitly assume the agent is following?
Under what circumstances will SARSA and Q-learning produce the same results?
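For reference while working through these questions, here is a minimal sketch of tabular epsilon-greedy action selection and the Q-learning and SARSA update rules. It assumes a Q-table stored as a list of lists (`Q[state][action]`); the function names `epsilon_greedy`, `q_learning_update`, and `sarsa_update` are hypothetical, chosen for illustration, and are not taken from the book's code.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Hypothetical helper: with probability epsilon pick a random action,
    otherwise pick the action with the highest Q-value in this row."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Hypothetical helper: the Q-learning target bootstraps from the best
    action in s_next, regardless of which action the agent takes next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """Hypothetical helper: the SARSA target bootstraps from the action
    a_next that the agent actually takes in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


if __name__ == "__main__":
    # Two states, two actions; in state 1, action 1 has the higher value.
    Q_ql = [[0.0, 0.0], [1.0, 5.0]]
    Q_sarsa = [[0.0, 0.0], [1.0, 5.0]]
    # Suppose the agent explored and will take action 0 in the next state.
    q_learning_update(Q_ql, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9)
    sarsa_update(Q_sarsa, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.9)
    print(Q_ql[0][0])     # 2.75
    print(Q_sarsa[0][0])  # roughly 0.95
```

The two updates differ only in which next-state value enters the target: `max(Q[s_next])` for Q-learning versus `Q[s_next][a_next]` for SARSA. Comparing them on the same transition, with and without exploration in the next step, is one way to reason about the last three questions.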
