Python Reinforcement Learning
By Sudharsan Ravichandiran, Sean Saito, Rajalingappaa Shanmugamani, and Yang Wenzhuo
Updated: 2021-06-24 15:18:32
Reinforcement learning (RL) is one of the most active and promising branches of artificial intelligence. This Learning Path will help you master not only the basic reinforcement learning algorithms but also advanced deep reinforcement learning algorithms. The Learning Path starts with an introduction to RL, followed by OpenAI Gym and TensorFlow. You will then explore various RL concepts and algorithms, such as the Markov Decision Process, Monte Carlo methods, and dynamic programming, including value and policy iteration. You'll also work with various datasets, including image, text, and video data. This example-rich guide will introduce you to deep RL algorithms such as Dueling DQN, DRQN, A3C, PPO, and TRPO. You will gain experience in several domains, including gaming, image processing, and physical simulations. You'll use TensorFlow and OpenAI Gym to implement algorithms that predict stock prices, generate natural language, and even build other neural networks. You will also learn about imagination-augmented agents, learning from human preference, DQfD, HER, and many of the recent advancements in RL. By the end of the Learning Path, you will have the knowledge and experience needed to implement RL and deep RL in your projects and to apply artificial intelligence to various real-life problems.

This Learning Path includes content from the following Packt products:
- Hands-On Reinforcement Learning with Python by Sudharsan Ravichandiran
- Python Reinforcement Learning Projects by Sean Saito, Yang Wenzhuo, and Rajalingappaa Shanmugamani
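Brand: 中圖公司 (CNPIEC)
Listed: 2021-06-24 12:24:03
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司 (CNPIEC), which has authorized 上海閱文信息技術有限公司 (Shanghai Yuewen Information Technology Co., Ltd.) to produce and distribute it.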
- coverpage
- Title Page
- Copyright and Credits
- Python Reinforcement Learning
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the authors
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Conventions used
- Get in touch
- Reviews
- Introduction to Reinforcement Learning
- What is RL?
- RL algorithm
- How RL differs from other ML paradigms
- Elements of RL
- Agent
- Policy function
- Value function
- Model
- Agent environment interface
- Types of RL environment
- Deterministic environment
- Stochastic environment
- Fully observable environment
- Partially observable environment
- Discrete environment
- Continuous environment
- Episodic and non-episodic environment
- Single and multi-agent environment
- RL platforms
- OpenAI Gym and Universe
- DeepMind Lab
- RL-Glue
- Project Malmo
- ViZDoom
- Applications of RL
- Education
- Medicine and healthcare
- Manufacturing
- Inventory management
- Finance
- Natural Language Processing and Computer Vision
- Summary
- Questions
- Further reading
- Getting Started with OpenAI and TensorFlow
- Setting up your machine
- Installing Anaconda
- Installing Docker
- Installing OpenAI Gym and Universe
- Common error fixes
- OpenAI Gym
- Basic simulations
- Training a robot to walk
- OpenAI Universe
- Building a video game bot
- TensorFlow
- Variables, constants, and placeholders
- Variables
- Constants
- Placeholders
- Computation graph
- Sessions
- TensorBoard
- Adding scope
- Summary
- Questions
- Further reading
- The Markov Decision Process and Dynamic Programming
- The Markov chain and Markov process
- Markov Decision Process
- Rewards and returns
- Episodic and continuous tasks
- Discount factor
- The policy function
- State value function
- State-action value function (Q function)
- The Bellman equation and optimality
- Deriving the Bellman equation for value and Q functions
- Solving the Bellman equation
- Dynamic programming
- Value iteration
- Policy iteration
- Solving the frozen lake problem
- Value iteration
- Policy iteration
- Summary
- Questions
- Further reading
- Gaming with Monte Carlo Methods
- Monte Carlo methods
- Estimating the value of pi using Monte Carlo
- Monte Carlo prediction
- First visit Monte Carlo
- Every visit Monte Carlo
- Let's play Blackjack with Monte Carlo
- Monte Carlo control
- Monte Carlo exploration starts
- On-policy Monte Carlo control
- Off-policy Monte Carlo control
- Summary
- Questions
- Further reading
- Temporal Difference Learning
- TD learning
- TD prediction
- TD control
- Q learning
- Solving the taxi problem using Q learning
- SARSA
- Solving the taxi problem using SARSA
- The difference between Q learning and SARSA
- Summary
- Questions
- Further reading
- Multi-Armed Bandit Problem
- The MAB problem
- The epsilon-greedy policy
- The softmax exploration algorithm
- The upper confidence bound algorithm
- The Thompson sampling algorithm
- Applications of MAB
- Identifying the right advertisement banner using MAB
- Contextual bandits
- Summary
- Questions
- Further reading
- Playing Atari Games
- Introduction to Atari games
- Building an Atari emulator
- Getting started
- Implementation of the Atari emulator
- Atari simulator using gym
- Data preparation
- Deep Q-learning
- Basic elements of reinforcement learning
- Demonstrating basic Q-learning algorithm
- Implementation of DQN
- Experiments
- Summary
- Atari Games with Deep Q Network
- What is a Deep Q Network?
- Architecture of DQN
- Convolutional network
- Experience replay
- Target network
- Clipping rewards
- Understanding the algorithm
- Building an agent to play Atari games
- Double DQN
- Prioritized experience replay
- Dueling network architecture
- Summary
- Questions
- Further reading
- Playing Doom with a Deep Recurrent Q Network
- DRQN
- Architecture of DRQN
- Training an agent to play Doom
- Basic Doom game
- Doom with DRQN
- DARQN
- Architecture of DARQN
- Summary
- Questions
- Further reading
- The Asynchronous Advantage Actor Critic Network
- The Asynchronous Advantage Actor Critic
- The three As
- The architecture of A3C
- How A3C works
- Driving up a mountain with A3C
- Visualization in TensorBoard
- Summary
- Questions
- Further reading
- Policy Gradients and Optimization
- Policy gradient
- Lunar Lander using policy gradients
- Deep deterministic policy gradient
- Swinging a pendulum
- Trust Region Policy Optimization
- Proximal Policy Optimization
- Summary
- Questions
- Further reading
- Balancing CartPole
- OpenAI Gym
- Gym
- Installation
- Running an environment
- Atari
- Algorithmic tasks
- MuJoCo
- Robotics
- Markov models
- CartPole
- Summary
- Simulating Control Tasks
- Introduction to control tasks
- Getting started
- The classic control tasks
- Deterministic policy gradient
- The theory behind policy gradient
- DPG algorithm
- Implementation of DDPG
- Experiments
- Trust region policy optimization
- Theory behind TRPO
- TRPO algorithm
- Experiments on MuJoCo tasks
- Summary
- Building Virtual Worlds in Minecraft
- Introduction to the Minecraft environment
- Data preparation
- Asynchronous advantage actor-critic algorithm
- Implementation of A3C
- Experiments
- Summary
- Learning to Play Go
- A brief introduction to Go
- Go and other board games
- Go and AI research
- Monte Carlo tree search
- Selection
- Expansion
- Simulation
- Update
- AlphaGo
- Supervised learning policy networks
- Reinforcement learning policy networks
- Value network
- Combining neural networks and MCTS
- AlphaGo Zero
- Training AlphaGo Zero
- Comparison with AlphaGo
- Implementing AlphaGo Zero
- Policy and value networks
- preprocessing.py
- features.py
- network.py
- Monte Carlo tree search
- mcts.py
- Combining PolicyValueNetwork and MCTS
- alphagozero_agent.py
- Putting everything together
- controller.py
- train.py
- Summary
- References
- Creating a Chatbot
- The background problem
- Dataset
- Step-by-step guide
- Data parser
- Data reader
- Helper methods
- Chatbot model
- Training the data
- Testing and results
- Summary
- Generating a Deep Learning Image Classifier
- Neural Architecture Search
- Generating and training child networks
- Training the Controller
- Training algorithm
- Implementing NAS
- child_network.py
- cifar10_processor.py
- controller.py
- Method for generating the Controller
- Generating a child network using the Controller
- train_controller method
- Testing ChildCNN
- config.py
- train.py
- Additional exercises
- Advantages of NAS
- Summary
- Predicting Future Stock Prices
- Background problem
- Data used
- Step-by-step guide
- Actor script
- Critic script
- Agent script
- Helper script
- Training the data
- Final result
- Summary
- Capstone Project - Car Racing Using DQN
- Environment wrapper functions
- Dueling network
- Replay memory
- Training the network
- Car racing
- Summary
- Questions
- Further reading
- Looking Ahead
- The shortcomings of reinforcement learning
- Resource efficiency
- Reproducibility
- Explainability/accountability
- Susceptibility to attacks
- Upcoming developments in reinforcement learning
- Addressing the limitations
- Transfer learning
- Multi-agent reinforcement learning
- Summary
- References
- Assessments
- Chapter 1: Introduction to Reinforcement Learning
- Chapter 2: Getting Started with OpenAI and TensorFlow
- Chapter 3: The Markov Decision Process and Dynamic Programming
- Chapter 4: Gaming with Monte Carlo Methods
- Chapter 5: Temporal Difference Learning
- Chapter 6: Multi-Armed Bandit Problem
- Chapter 8: Atari Games with Deep Q Network
- Chapter 9: Playing Doom with a Deep Recurrent Q Network
- Chapter 10: The Asynchronous Advantage Actor Critic Network
- Chapter 11: Policy Gradients and Optimization
- Chapter 19: Capstone Project – Car Racing Using DQN
- Other Books You May Enjoy
- Leave a review - let other readers know what you think