TensorFlow Reinforcement Learning Quick Start Guide
Advances in reinforcement learning algorithms have made it possible to use them for optimal control in several different industrial applications. With this book, you will apply Reinforcement Learning to a range of problems, from computer games to autonomous driving. The book starts by introducing you to essential Reinforcement Learning concepts such as agents, environments, rewards, and advantage functions. You will also master the distinctions between on-policy and off-policy algorithms, as well as model-free and model-based algorithms. You will then learn about several Reinforcement Learning algorithms, such as SARSA, Deep Q-Networks (DQN), Deep Deterministic Policy Gradients (DDPG), Asynchronous Advantage Actor-Critic (A3C), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO). The book will also show you how to code these algorithms in TensorFlow and Python and apply them to solve computer games from OpenAI Gym. Finally, you will learn how to train a car to drive autonomously in the TORCS racing car simulator. By the end of the book, you will be able to design, build, train, and evaluate feed-forward neural networks and convolutional neural networks. You will also have mastered coding state-of-the-art algorithms and training agents for various control problems.
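As a taste of the tabular methods the book opens with (the Bellman equation, Q-learning, and the cliff-walking exercises), here is a minimal Q-learning sketch in plain Python. The 5-state chain environment and the hyperparameters (alpha, gamma, epsilon) are illustrative assumptions for this sketch, not values taken from the book:

```python
import random

# Tabular Q-learning on a tiny 5-state chain:
# states 0..4, actions 0 (left) / 1 (right), reward 1.0 on reaching state 4.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters
N_STATES, GOAL = 5, 4

def step(state, action):
    """Deterministic transition: action 1 moves right, action 0 moves left."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in (0, 1)}
random.seed(0)

for _ in range(200):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice((0, 1))
        else:
            a = max((0, 1), key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning (off-policy) update: bootstrap from the max over next actions
        target = r + (0.0 if done else GAMMA * max(Q[(s2, 0)], Q[(s2, 1)]))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# The learned greedy policy moves right from every non-goal state.
policy = [max((0, 1), key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)
```

SARSA, covered in the same chapter, differs only in the target: it bootstraps from the action the behavior policy actually takes next, rather than the max, which is what makes it on-policy.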
Contents (168 sections)
- Cover Page
- Title Page
- Copyright and Credits
- TensorFlow Reinforcement Learning Quick Start Guide
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Up and Running with Reinforcement Learning
- Why RL?
- Formulating the RL problem
- The relationship between an agent and its environment
- Defining the states of the agent
- Defining the actions of the agent
- Understanding policy, value, and advantage functions
- Identifying episodes
- Identifying reward functions and the concept of discounted rewards
- Rewards
- Learning the Markov decision process
- Defining the Bellman equation
- On-policy versus off-policy learning
- On-policy method
- Off-policy method
- Model-free and model-based training
- Algorithms covered in this book
- Summary
- Questions
- Further reading
- Temporal Difference, SARSA, and Q-Learning
- Technical requirements
- Understanding TD learning
- Relation between the value functions and state
- Understanding SARSA and Q-Learning
- Learning SARSA
- Understanding Q-learning
- Cliff walking and grid world problems
- Cliff walking with SARSA
- Cliff walking with Q-learning
- Grid world with SARSA
- Summary
- Further reading
- Deep Q-Network
- Technical requirements
- Learning the theory behind a DQN
- Understanding target networks
- Learning about the replay buffer
- Getting introduced to the Atari environment
- Summary of Atari games
- Pong
- Breakout
- Space Invaders
- LunarLander
- The Arcade Learning Environment
- Coding a DQN in TensorFlow
- Using the model.py file
- Using the funcs.py file
- Using the dqn.py file
- Evaluating the performance of the DQN on Atari Breakout
- Summary
- Questions
- Further reading
- Double DQN, Dueling Architectures, and Rainbow
- Technical requirements
- Understanding Double DQN
- Updating the Bellman equation
- Coding DDQN and training to play Atari Breakout
- Evaluating the performance of DDQN on Atari Breakout
- Understanding dueling network architectures
- Coding dueling network architecture and training it to play Atari Breakout
- Combining V and A to obtain Q
- Evaluating the performance of dueling architectures on Atari Breakout
- Understanding Rainbow networks
- DQN improvements
- Prioritized experience replay
- Multi-step learning
- Distributional RL
- Noisy nets
- Running a Rainbow network on Dopamine
- Rainbow using Dopamine
- Summary
- Questions
- Further reading
- Deep Deterministic Policy Gradient
- Technical requirements
- Actor-Critic algorithms and policy gradients
- Policy gradient
- Deep Deterministic Policy Gradient
- Coding ddpg.py
- Coding AandC.py
- Coding TrainOrTest.py
- Coding replay_buffer.py
- Training and testing the DDPG on Pendulum-v0
- Summary
- Questions
- Further reading
- Asynchronous Methods - A3C and A2C
- Technical requirements
- The A3C algorithm
- Loss functions
- CartPole and LunarLander
- CartPole
- LunarLander
- The A3C algorithm applied to CartPole
- Coding cartpole.py
- Coding a3c.py
- The AC class
- The Worker() class
- Coding utils.py
- Training on CartPole
- The A3C algorithm applied to LunarLander
- Coding lunar.py
- Training on LunarLander
- The A2C algorithm
- Summary
- Questions
- Further reading
- Trust Region Policy Optimization and Proximal Policy Optimization
- Technical requirements
- Learning TRPO
- TRPO equations
- Learning PPO
- PPO loss functions
- Using PPO to solve the MountainCar problem
- Coding the class_ppo.py file
- Coding the train_test.py file
- Evaluating the performance
- Full throttle
- Random throttle
- Summary
- Questions
- Further reading
- Deep RL Applied to Autonomous Driving
- Technical requirements
- Car driving simulators
- Learning to use TORCS
- State space
- Support files
- Training a DDPG agent to learn to drive
- Coding ddpg.py
- Coding AandC.py
- Coding TrainOrTest.py
- Training a PPO agent
- Summary
- Questions
- Further reading
- Assessment
- Chapter 1
- Chapter 3
- Chapter 4
- Chapter 5
- Chapter 6
- Chapter 7
- Chapter 8
- Other Books You May Enjoy
- Leave a review - let other readers know what you think (Last updated: 2021-06-24 15:29:32)