
Deep Q-learning 

In Q-learning, we generally work with a finite set of states and actions, so tables suffice to hold the Q-values and rewards. However, in practical applications, the number of states and applicable actions is often very large or infinite, and better function approximators are needed to represent and learn the Q-function. This is where deep neural networks come to the rescue, since they are universal function approximators. We can represent the Q-function with a neural network that takes a state and an action as input and provides the corresponding Q-value as output. Alternatively, we can train a neural network that takes only the state as input and outputs the Q-values for all of the actions. Both of these scenarios are illustrated in the following diagram. Since Q-values are continuous quantities (estimates of expected cumulative reward), we are dealing with regression in these networks:

Figure 1.17: Deep Q-learning function approximator network
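
To make the second variant concrete, the following is a minimal PyTorch sketch of a Q-network that maps a state to one Q-value per action and is trained as a regression against the bootstrapped target r + γ·max Q(s′, a′). This is only an illustration, not the book's race-car implementation; the state dimension, action count, hidden size, and the randomly generated batch are placeholder assumptions:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Second variant: state in, one Q-value per action out."""
    def __init__(self, state_dim, num_actions, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),  # one output per action
        )

    def forward(self, state):
        return self.net(state)

# Placeholder dimensions for illustration only
q_net = QNetwork(state_dim=4, num_actions=2)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
gamma = 0.99

# A dummy batch of transitions (s, a, r, s') standing in for real experience
state = torch.randn(32, 4)
action = torch.randint(0, 2, (32,))
reward = torch.randn(32)
next_state = torch.randn(32, 4)

# Q(s, a) predicted by the network for the actions actually taken
q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)

# Regression target: r + gamma * max_a' Q(s', a'), held fixed during the update
with torch.no_grad():
    q_target = reward + gamma * q_net(next_state).max(dim=1).values

loss = loss_fn(q_pred, q_target)  # mean squared error, as in regression
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The state-only design is what makes this form of deep Q-learning efficient in practice: a single forward pass yields the Q-values of every action, so selecting the greedy action does not require one network evaluation per action.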

In this book, we will use deep Q-learning, a form of reinforcement learning, to train a race car to drive by itself.
