Reinforcement Learning with TensorFlow
If you want to get started with reinforcement learning using TensorFlow in the most practical way, this book will be a useful resource. The book assumes prior knowledge of machine learning and neural network programming concepts, as well as some understanding of the TensorFlow framework. No previous experience with Reinforcement Learning is required.
Brand: 中圖公司
Listed: 2021-08-27 18:05:58
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司 and licensed to 上海閱文信息技術(shù)有限公司 for production and distribution.
- Cover
- Title Page
- Copyright and Credits
- Reinforcement Learning with TensorFlow
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Deep Learning – Architectures and Frameworks
- Deep learning
- Activation functions for deep learning
- The sigmoid function
- The tanh function
- The softmax function
- The rectified linear unit function
- How to choose the right activation function
- Logistic regression as a neural network
- Notation
- Objective
- The cost function
- The gradient descent algorithm
- The computational graph
- Steps to solve logistic regression using gradient descent
- What is Xavier initialization?
- Why do we use Xavier initialization?
- The neural network model
- Recurrent neural networks
- Long Short Term Memory Networks
- Convolutional neural networks
- The LeNet-5 convolutional neural network
- The AlexNet model
- The VGG-Net model
- The Inception model
- Limitations of deep learning
- The vanishing gradient problem
- The exploding gradient problem
- Overcoming the limitations of deep learning
- Reinforcement learning
- Basic terminologies and conventions
- Optimality criteria
- The value function for optimality
- The policy model for optimality
- The Q-learning approach to reinforcement learning
- Asynchronous advantage actor-critic
- Introduction to TensorFlow and OpenAI Gym
- Basic computations in TensorFlow
- An introduction to OpenAI Gym
- The pioneers and breakthroughs in reinforcement learning
- David Silver
- Pieter Abbeel
- Google DeepMind
- The AlphaGo program
- Libratus
- Summary
- Training Reinforcement Learning Agents Using OpenAI Gym
- The OpenAI Gym
- Understanding an OpenAI Gym environment
- Programming an agent using an OpenAI Gym environment
- Q-Learning
- The Epsilon-Greedy approach
- Using the Q-Network for real-world applications
- Summary
- Markov Decision Process
- Markov decision processes
- The Markov property
- The S state set
- Actions
- Transition model
- Rewards
- Policy
- The sequence of rewards - assumptions
- The infinite horizons
- Utility of sequences
- The Bellman equations
- Solving the Bellman equation to find policies
- An example of value iteration using the Bellman equation
- Policy iteration
- Partially observable Markov decision processes
- State estimation
- Value iteration in POMDPs
- Training the FrozenLake-v0 environment using MDP
- Summary
- Policy Gradients
- The policy optimization method
- Why policy optimization methods?
- Why stochastic policy?
- Example 1 - rock paper scissors
- Example 2 - state aliased grid-world
- Policy objective functions
- Policy Gradient Theorem
- Temporal difference rule
- TD(1) rule
- TD(0) rule
- TD(λ) rule
- Policy gradients
- The Monte Carlo policy gradient
- Actor-critic algorithms
- Using a baseline to reduce variance
- Vanilla policy gradient
- Agent learning pong using policy gradients
- Summary
- Q-Learning and Deep Q-Networks
- Why reinforcement learning?
- Model-based learning and model-free learning
- Monte Carlo learning
- Temporal difference learning
- On-policy and off-policy learning
- Q-learning
- The exploration-exploitation dilemma
- Q-learning for the mountain car problem in OpenAI gym
- Deep Q-networks
- Using a convolutional neural network instead of a single-layer neural network
- Use of experience replay
- Separate target network to compute the target Q-values
- Advancements in deep Q-networks and beyond
- Double DQN
- Dueling DQN
- Deep Q-network for mountain car problem in OpenAI gym
- Deep Q-network for Cartpole problem in OpenAI gym
- Deep Q-network for Atari Breakout in OpenAI gym
- The Monte Carlo tree search algorithm
- Minimax and game trees
- The Monte Carlo Tree Search
- The SARSA algorithm
- SARSA algorithm for mountain car problem in OpenAI gym
- Summary
- Asynchronous Methods
- Why asynchronous methods?
- Asynchronous one-step Q-learning
- Asynchronous one-step SARSA
- Asynchronous n-step Q-learning
- Asynchronous advantage actor-critic
- A3C for Pong-v0 in OpenAI gym
- Summary
- Robo Everything – Real Strategy Gaming
- Real-time strategy games
- Reinforcement learning and other approaches
- Online case-based planning
- Drawbacks to real-time strategy games
- Why reinforcement learning?
- Reinforcement learning in RTS gaming
- Deep autoencoder
- How is reinforcement learning better?
- Summary
- AlphaGo – Reinforcement Learning at Its Best
- What is Go?
- Go versus chess
- How did Deep Blue defeat Garry Kasparov?
- Why is the game tree approach no good for Go?
- AlphaGo – mastering Go
- Monte Carlo Tree Search
- Architecture and properties of AlphaGo
- Energy consumption analysis – Lee Sedol versus AlphaGo
- AlphaGo Zero
- Architecture and properties of AlphaGo Zero
- Training process in AlphaGo Zero
- Summary
- Reinforcement Learning in Autonomous Driving
- Machine learning for autonomous driving
- Reinforcement learning for autonomous driving
- Creating autonomous driving agents
- Why reinforcement learning?
- Proposed frameworks for autonomous driving
- Spatial aggregation
- Sensor fusion
- Spatial features
- Recurrent temporal aggregation
- Planning
- DeepTraffic – MIT simulator for autonomous driving
- Summary
- Financial Portfolio Management
- Introduction
- Problem definition
- Data preparation
- Reinforcement learning
- Further improvements
- Summary
- Reinforcement Learning in Robotics
- Reinforcement learning in robotics
- Evolution of reinforcement learning
- Challenges in robot reinforcement learning
- High dimensionality problem
- Real-world challenges
- Issues due to model uncertainty
- What's the final objective a robot wants to achieve?
- Open questions and practical challenges
- Open questions
- Practical challenges for robotic reinforcement learning
- Key takeaways
- Summary
- Deep Reinforcement Learning in Ad Tech
- Computational advertising challenges and bidding strategies
- Business models used in advertising
- Sponsored-search advertisements
- Search-advertisement management
- AdWords
- Bidding strategies of advertisers
- Real-time bidding by reinforcement learning in display advertising
- Summary
- Reinforcement Learning in Image Processing
- Hierarchical object detection with deep reinforcement learning
- Related works
- Region-based convolutional neural networks
- Spatial pyramid pooling networks
- Fast R-CNN
- Faster R-CNN
- You Only Look Once
- Single Shot Detector
- Hierarchical object detection model
- State
- Actions
- Reward
- Model and training
- Training specifics
- Summary
- Deep Reinforcement Learning in NLP
- Text summarization
- Deep reinforced model for Abstractive Summarization
- Neural intra-attention model
- Intra-temporal attention on input sequence while decoding
- Intra-decoder attention
- Token generation and pointer
- Hybrid learning objective
- Supervised learning with teacher forcing
- Policy learning
- Mixed training objective function
- Text question answering
- Mixed objective and deep residual coattention for Question Answering
- Deep residual coattention encoder
- Mixed objective using self-critical policy learning
- Summary
- Further topics in Reinforcement Learning
- Continuous action space algorithms
- Trust region policy optimization
- Deterministic policy gradients
- Scoring mechanism in sequential models in NLP
- BLEU
- What is BLEU score and what does it do?
- ROUGE
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think
Last updated: 2021-08-27 18:52:42