PyTorch 1.x Reinforcement Learning Cookbook
By Yuxi (Hayden) Liu
Updated: 2021-06-24 12:35:24
Reinforcement learning (RL) is a branch of machine learning that has gained popularity in recent times. It allows you to train AI models that learn from their own actions and optimize their behavior. PyTorch has also emerged as the preferred tool for training RL models because of its efficiency and ease of use. With this book, you'll explore the important RL concepts and the implementation of algorithms in PyTorch 1.x. The recipes in the book, along with real-world examples, will help you master various RL techniques, such as dynamic programming, Monte Carlo simulations, temporal difference, and Q-learning. You'll also gain insights into industry-specific applications of these techniques. Later chapters will guide you through solving problems such as the multi-armed bandit problem and the cartpole problem using the multi-armed bandit algorithm and function approximation. You'll also learn how to use Deep Q-Networks to complete Atari games, along with how to effectively implement policy gradients. Finally, you'll discover how RL techniques are applied to Blackjack, Gridworld environments, internet advertising, and the Flappy Bird game. By the end of this book, you'll have developed the skills you need to implement popular RL algorithms and use RL techniques to solve real-world problems.
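To give a flavor of the tabular Q-learning the book covers in its Temporal Difference and Q-Learning chapter, here is a minimal, self-contained sketch on a toy five-state chain environment. The environment, function name, and hyperparameters are illustrative choices, not taken from the book, which works with OpenAI Gym environments instead.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1,
                     gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP: start in state 0; action 1
    moves right, action 0 moves left; reaching the last state ends the
    episode with reward 1, all other transitions give reward 0."""
    random.seed(seed)
    # Q-table: one row per state, one column per action (left, right).
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Epsilon-greedy action selection: explore with probability epsilon.
            if random.random() < epsilon:
                action = random.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Deterministic transition along the chain.
            next_state = (min(state + 1, n_states - 1) if action == 1
                          else max(state - 1, 0))
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Q-learning update: bootstrap from the greedy value of next_state.
            q[state][action] += alpha * (
                reward + gamma * max(q[next_state]) - q[state][action])
            state = next_state
    return q
```

After training, the greedy policy derived from the Q-table should prefer "move right" in every non-terminal state, since that is the shortest path to the reward; the same update rule carries over to the book's Gym-based recipes with only the environment swapped out.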
Brand: 中圖公司
Listed: 2021-06-24 12:06:31
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司 and licensed to 上海閱文信息技術有限公司 for production and distribution.
- coverpage
- Title Page
- Copyright and Credits
- PyTorch 1.x Reinforcement Learning Cookbook
- About Packt
- Why subscribe?
- Contributors
- About the author
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Sections
- Getting ready
- How to do it…
- How it works…
- There's more…
- See also
- Get in touch
- Reviews
- Getting Started with Reinforcement Learning and PyTorch
- Setting up the working environment
- How to do it...
- How it works...
- There's more...
- See also
- Installing OpenAI Gym
- How to do it...
- How it works...
- There's more...
- See also
- Simulating Atari environments
- How to do it...
- How it works...
- There's more...
- See also
- Simulating the CartPole environment
- How to do it...
- How it works...
- There's more...
- Reviewing the fundamentals of PyTorch
- How to do it...
- There's more...
- See also
- Implementing and evaluating a random search policy
- How to do it...
- How it works...
- There's more...
- Developing the hill-climbing algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Developing a policy gradient algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Markov Decision Processes and Dynamic Programming
- Technical requirements
- Creating a Markov chain
- How to do it...
- How it works...
- There's more...
- See also
- Creating an MDP
- How to do it...
- How it works...
- There's more...
- See also
- Performing policy evaluation
- How to do it...
- How it works...
- There's more...
- Simulating the FrozenLake environment
- Getting ready
- How to do it...
- How it works...
- There's more...
- Solving an MDP with a value iteration algorithm
- How to do it...
- How it works...
- There's more...
- Solving an MDP with a policy iteration algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Solving the coin-flipping gamble problem
- How to do it...
- How it works...
- There's more...
- Monte Carlo Methods for Making Numerical Estimations
- Calculating Pi using the Monte Carlo method
- How to do it...
- How it works...
- There's more...
- See also
- Performing Monte Carlo policy evaluation
- How to do it...
- How it works...
- There's more...
- Playing Blackjack with Monte Carlo prediction
- How to do it...
- How it works...
- There's more...
- See also
- Performing on-policy Monte Carlo control
- How to do it...
- How it works...
- There's more...
- Developing MC control with epsilon-greedy policy
- How to do it...
- How it works...
- Performing off-policy Monte Carlo control
- How to do it...
- How it works...
- There's more...
- See also
- Developing MC control with weighted importance sampling
- How to do it...
- How it works...
- There's more...
- See also
- Temporal Difference and Q-Learning
- Setting up the Cliff Walking environment playground
- Getting ready
- How to do it...
- How it works...
- Developing the Q-learning algorithm
- How to do it...
- How it works...
- There's more...
- Setting up the Windy Gridworld environment playground
- How to do it...
- How it works...
- Developing the SARSA algorithm
- How to do it...
- How it works...
- There's more...
- Solving the Taxi problem with Q-learning
- Getting ready
- How to do it...
- How it works...
- Solving the Taxi problem with SARSA
- How to do it...
- How it works...
- There's more...
- Developing the Double Q-learning algorithm
- How to do it...
- How it works...
- See also
- Solving Multi-armed Bandit Problems
- Creating a multi-armed bandit environment
- How to do it...
- How it works...
- Solving multi-armed bandit problems with the epsilon-greedy policy
- How to do it...
- How it works...
- There's more...
- Solving multi-armed bandit problems with the softmax exploration
- How to do it...
- How it works...
- Solving multi-armed bandit problems with the upper confidence bound algorithm
- How to do it...
- How it works...
- There's more...
- See also
- Solving internet advertising problems with a multi-armed bandit
- How to do it...
- How it works...
- Solving multi-armed bandit problems with the Thompson sampling algorithm
- How to do it...
- How it works...
- See also
- Solving internet advertising problems with contextual bandits
- How to do it...
- How it works...
- Scaling Up Learning with Function Approximation
- Setting up the Mountain Car environment playground
- Getting ready
- How to do it...
- How it works...
- Estimating Q-functions with gradient descent approximation
- How to do it...
- How it works...
- See also
- Developing Q-learning with linear function approximation
- How to do it...
- How it works...
- Developing SARSA with linear function approximation
- How to do it...
- How it works...
- Incorporating batching using experience replay
- How to do it...
- How it works...
- Developing Q-learning with neural network function approximation
- How to do it...
- How it works...
- See also
- Solving the CartPole problem with function approximation
- How to do it...
- How it works...
- Deep Q-Networks in Action
- Developing deep Q-networks
- How to do it...
- How it works...
- See also
- Improving DQNs with experience replay
- How to do it...
- How it works...
- Developing double deep Q-Networks
- How to do it...
- How it works...
- Tuning double DQN hyperparameters for CartPole
- How to do it...
- How it works...
- Developing Dueling deep Q-Networks
- How to do it...
- How it works...
- Applying Deep Q-Networks to Atari games
- How to do it...
- How it works...
- Using convolutional neural networks for Atari games
- How to do it...
- How it works...
- See also
- Implementing Policy Gradients and Policy Optimization
- Implementing the REINFORCE algorithm
- How to do it...
- How it works...
- See also
- Developing the REINFORCE algorithm with baseline
- How to do it...
- How it works...
- Implementing the actor-critic algorithm
- How to do it...
- How it works...
- Solving Cliff Walking with the actor-critic algorithm
- How to do it...
- How it works...
- Setting up the continuous Mountain Car environment
- How to do it...
- How it works...
- Solving the continuous Mountain Car environment with the advantage actor-critic network
- How to do it...
- How it works...
- There's more...
- See also
- Playing CartPole through the cross-entropy method
- How to do it...
- How it works...
- Capstone Project – Playing Flappy Bird with DQN
- Setting up the game environment
- Getting ready
- How to do it...
- How it works...
- Building a Deep Q-Network to play Flappy Bird
- How to do it...
- How it works...
- Training and tuning the network
- How to do it...
- How it works...
- Deploying the model and playing the game
- How to do it...
- How it works...
- Other Books You May Enjoy
- Leave a review - let other readers know what you think