TensorFlow Reinforcement Learning Quick Start Guide
Advances in reinforcement learning algorithms have made it possible to use them for optimal control in several different industrial applications. With this book, you will apply Reinforcement Learning to a range of problems, from computer games to autonomous driving. The book starts by introducing you to essential Reinforcement Learning concepts such as agents, environments, rewards, and advantage functions. You will master the distinctions between on-policy and off-policy algorithms, as well as model-free and model-based algorithms. You will learn about several Reinforcement Learning algorithms, such as SARSA, Deep Q-Networks (DQN), Deep Deterministic Policy Gradients (DDPG), Asynchronous Advantage Actor-Critic (A3C), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO). The book will also show you how to code these algorithms in TensorFlow and Python and apply them to solve computer games from OpenAI Gym. Finally, you will learn how to train a car to drive autonomously in the TORCS racing car simulator. By the end of the book, you will be able to design, build, train, and evaluate feed-forward neural networks and convolutional neural networks. You will also have mastered coding state-of-the-art algorithms and training agents for various control problems.
Table of Contents (168 sections)
- Cover Page
- Title Page
- Copyright and Credits
- TensorFlow Reinforcement Learning Quick Start Guide
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Up and Running with Reinforcement Learning
- Why RL?
- Formulating the RL problem
- The relationship between an agent and its environment
- Defining the states of the agent
- Defining the actions of the agent
- Understanding policy, value, and advantage functions
- Identifying episodes
- Identifying reward functions and the concept of discounted rewards
- Rewards
- Learning the Markov decision process
- Defining the Bellman equation
- On-policy versus off-policy learning
- On-policy method
- Off-policy method
- Model-free and model-based training
- Algorithms covered in this book
- Summary
- Questions
- Further reading
- Temporal Difference, SARSA, and Q-Learning
- Technical requirements
- Understanding TD learning
- The relation between the value functions and state
- Understanding SARSA and Q-Learning
- Learning SARSA
- Understanding Q-learning
- Cliff walking and grid world problems
- Cliff walking with SARSA
- Cliff walking with Q-learning
- Grid world with SARSA
- Summary
- Further reading
- Deep Q-Network
- Technical requirements
- Learning the theory behind a DQN
- Understanding target networks
- Learning about the replay buffer
- Getting introduced to the Atari environment
- Summary of Atari games
- Pong
- Breakout
- Space Invaders
- LunarLander
- The Arcade Learning Environment
- Coding a DQN in TensorFlow
- Using the model.py file
- Using the funcs.py file
- Using the dqn.py file
- Evaluating the performance of the DQN on Atari Breakout
- Summary
- Questions
- Further reading
- Double DQN, Dueling Architectures, and Rainbow
- Technical requirements
- Understanding Double DQN
- Updating the Bellman equation
- Coding DDQN and training to play Atari Breakout
- Evaluating the performance of DDQN on Atari Breakout
- Understanding dueling network architectures
- Coding the dueling network architecture and training it to play Atari Breakout
- Combining V and A to obtain Q
- Evaluating the performance of dueling architectures on Atari Breakout
- Understanding Rainbow networks
- DQN improvements
- Prioritized experience replay
- Multi-step learning
- Distributional RL
- Noisy nets
- Running a Rainbow network on Dopamine
- Rainbow using Dopamine
- Summary
- Questions
- Further reading
- Deep Deterministic Policy Gradient
- Technical requirements
- Actor-Critic algorithms and policy gradients
- Policy gradient
- Deep Deterministic Policy Gradient
- Coding ddpg.py
- Coding AandC.py
- Coding TrainOrTest.py
- Coding replay_buffer.py
- Training and testing the DDPG on Pendulum-v0
- Summary
- Questions
- Further reading
- Asynchronous Methods - A3C and A2C
- Technical requirements
- The A3C algorithm
- Loss functions
- CartPole and LunarLander
- CartPole
- LunarLander
- The A3C algorithm applied to CartPole
- Coding cartpole.py
- Coding a3c.py
- The AC class
- The Worker() class
- Coding utils.py
- Training on CartPole
- The A3C algorithm applied to LunarLander
- Coding lunar.py
- Training on LunarLander
- The A2C algorithm
- Summary
- Questions
- Further reading
- Trust Region Policy Optimization and Proximal Policy Optimization
- Technical requirements
- Learning TRPO
- TRPO equations
- Learning PPO
- PPO loss functions
- Using PPO to solve the MountainCar problem
- Coding the class_ppo.py file
- Coding the train_test.py file
- Evaluating the performance
- Full throttle
- Random throttle
- Summary
- Questions
- Further reading
- Deep RL Applied to Autonomous Driving
- Technical requirements
- Car driving simulators
- Learning to use TORCS
- State space
- Support files
- Training a DDPG agent to learn to drive
- Coding ddpg.py
- Coding AandC.py
- Coding TrainOrTest.py
- Training a PPO agent
- Summary
- Questions
- Further reading
- Assessment
- Chapter 1
- Chapter 3
- Chapter 4
- Chapter 5
- Chapter 6
- Chapter 7
- Chapter 8
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-24 15:29:32