Identifying reward functions and the concept of discounted rewards

Rewards in RL are no different from real-world rewards: we all receive good rewards for doing well and bad rewards (penalties) for inferior performance. The environment provides a reward function to guide the agent's learning as it explores. Specifically, the reward is a measure of how well the agent is performing.

The reward function defines which events are good and which are bad for the agent. For instance, a mobile robot is rewarded for reaching its goal but penalized for crashing into obstacles. Likewise, an industrial robot arm is rewarded for putting a peg into a hole but penalized for entering undesired poses that could be catastrophic, causing ruptures or crashes. The reward function is the signal that tells the agent what is optimal and what is not. The agent's long-term goal is to maximize rewards and minimize penalties.
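The mobile robot example above can be sketched as a small reward function. This is only an illustrative sketch, not a definitive implementation: the `StepOutcome` type, the function names, and the reward magnitudes (+1, -1, 0) are assumptions chosen for the example and do not come from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StepOutcome:
    """Hypothetical summary of what happened to the robot in one time step."""
    reached_goal: bool
    crashed: bool


# Assumed reward magnitudes; the text does not specify any values.
GOAL_REWARD = 1.0
CRASH_PENALTY = -1.0
NEUTRAL_REWARD = 0.0


def mobile_robot_reward(outcome: StepOutcome) -> float:
    """Return the scalar reward for one step of the mobile robot."""
    if outcome.crashed:
        # Crashing into an obstacle is penalized.
        return CRASH_PENALTY
    if outcome.reached_goal:
        # Reaching the goal is rewarded.
        return GOAL_REWARD
    # Nothing notable happened in this step.
    return NEUTRAL_REWARD


def total_reward(outcomes: Iterable[StepOutcome]) -> float:
    """Sum the per-step rewards over an episode (no discounting applied)."""
    return sum(mobile_robot_reward(o) for o in outcomes)
```

An agent that reaches the goal without crashing collects a higher total than one that crashes along the way, which is the sense in which the reward signals what is optimal.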
