Hands-On Q-Learning with Python
Q-learning is a machine learning algorithm used to solve optimization problems in artificial intelligence (AI). It is one of the most popular fields of study among AI researchers. This book starts off by introducing you to reinforcement learning and Q-learning, in addition to helping you get familiar with OpenAI Gym as well as libraries such as Keras and TensorFlow. A few chapters into the book, you will gain insights into model-free Q-learning and use deep Q-networks and double deep Q-networks to solve complex problems. This book will guide you in exploring use cases such as self-driving vehicles and OpenAI Gym's CartPole problem. You will also learn how to tune and optimize Q-networks and their hyperparameters. As you progress, you will understand the reinforcement learning approach to solving real-world problems. You will also explore how to use Q-learning and related algorithms in real-world applications such as scientific research. Toward the end, you'll gain a sense of what's in store for reinforcement learning. By the end of this book, you will be equipped with the skills you need to solve reinforcement learning problems using Q-learning algorithms with OpenAI Gym, Keras, and TensorFlow.
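For readers wondering what the Q-learning workflow described above looks like in practice, here is a minimal tabular Q-learning sketch on Gym's Taxi-v2 environment (the environment explored in the book's early chapters). The hyperparameter values, the episode count, and the use of the classic four-value `env.step()` API are illustrative assumptions for this sketch, not the book's exact settings or code.

```python
# Minimal tabular Q-learning sketch (illustrative values; assumes the classic
# Gym API contemporaneous with the book, where Taxi-v2 exists and
# reset()/step() return integer states).
import gym
import numpy as np

env = gym.make("Taxi-v2")
q_table = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

for episode in range(1000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[state])
        next_state, reward, done, _ = env.step(action)
        # Off-policy Q-learning update toward the Bellman target.
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
```

The deep Q-network chapters later in the book replace this kind of table lookup with a Keras/TensorFlow network that approximates the same Q-values.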
Table of Contents (202 chapters)
- Cover Page
- Title Page
- Copyright and Credits
- Hands-On Q-Learning with Python
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Section 1: Q-Learning: A Roadmap
- Brushing Up on Reinforcement Learning Concepts
- What is RL?
- States and actions
- The decision-making process
- RL, supervised learning, and unsupervised learning
- States, actions, and rewards
- States
- Actions and rewards
- Bellman equations
- Key concepts in RL
- Value-based versus policy-based iteration
- Q-learning hyperparameters – alpha, gamma, and epsilon
- Alpha – deterministic versus stochastic environments
- Gamma – current versus future rewards
- Epsilon – exploration versus exploitation
- Decaying epsilon
- SARSA versus Q-learning – on-policy or off?
- SARSA and the cliff-walking problem
- When to choose SARSA over Q-learning
- Summary
- Questions
- Getting Started with the Q-Learning Algorithm
- Technical requirements
- Demystifying MDPs
- Control processes
- Markov chains
- The Markov property
- MDPs and state-action diagrams
- Solving MDPs with RL
- Your Q-learning agent in its environment
- Solving the optimization problem
- States and actions in Taxi-v2
- Fine-tuning your model – learning, discount, and exploration rates
- Decaying epsilon
- Decaying alpha
- Decaying gamma
- MABP – a classic exploration versus exploitation problem
- Setting up a bandit problem
- Bandit optimization strategies
- Other applications for bandit problems
- Optimal versus safe paths – revisiting SARSA
- Summary
- Questions
- Setting Up Your First Environment with OpenAI Gym
- Technical requirements
- Getting started with OpenAI Gym
- What is Gym?
- Setting up Gym
- Gym environments
- Setting up an environment
- Exploring the Taxi-v2 environment
- The state space and valid actions
- Choosing an action manually
- Setting a state manually
- Creating a baseline agent
- Stepping through actions
- Creating a task loop
- Baseline models in Q-learning and machine learning research
- Summary
- Questions
- Teaching a Smartcab to Drive Using Q-Learning
- Technical requirements
- Getting to know your learning agent
- Implementing your agent
- The value function – calculating the Q-value of a state-action pair
- Implementing Bellman equations
- The learning parameters – alpha, gamma, and epsilon
- Adding an updated alpha value
- Adding an updated epsilon value
- Model-tuning and tracking your agent's long-term performance
- Comparing your models and statistical performance measures
- Training your models
- Decaying epsilon
- Hyperparameter tuning
- Summary
- Questions
- Section 2: Building and Optimizing Q-Learning Agents
- Building Q-Networks with TensorFlow
- Technical requirements
- A brief overview of neural networks
- Extensional versus intensional definitions
- Taking a closer look
- Input, hidden, and output layers
- Perceptron functions
- ReLU functions
- Implementing a neural network with NumPy
- Feedforward
- Backpropagation
- Neural networks and Q-learning
- Policy agents versus value agents
- Building your first Q-network
- Defining the network
- Training the network
- Summary
- Questions
- Further reading
- Digging Deeper into Deep Q-Networks with Keras and TensorFlow
- Technical requirements
- Introducing CartPole-v1
- More about CartPole states and actions
- Getting started with the CartPole task
- Building a DQN to solve the CartPole problem
- Gamma
- Alpha
- Epsilon
- Building a DQN class
- Choosing actions with epsilon-greedy
- Updating the Q-values
- Running the task loop
- Testing and results
- Adding in experience replay
- About experience replay
- Implementation
- Experience replay results
- Building further on DQNs
- Calculating DQN loss
- Fixed Q-targets
- Double-deep Q-networks
- Dueling deep Q-networks
- Summary
- Questions
- Further reading
- Section 3: Advanced Q-Learning Challenges with Keras, TensorFlow, and OpenAI Gym
- Decoupling Exploration and Exploitation in Multi-Armed Bandits
- Technical requirements
- Probability distributions and ongoing knowledge
- Iterative probability distributions
- Revisiting a simple bandit problem
- A sample two-armed bandit iteration
- Multi-armed bandit strategy overview
- Greedy strategy
- Epsilon-greedy strategy
- Upper confidence bound
- Bandit regret
- Utility functions and optimal decisions
- Contextual bandits and state diagrams
- Thompson sampling and the Bayesian control rule
- Thompson sampling
- Bayesian control rule
- Solving a multi-armed bandit problem in Python – user advertisement clicks
- Epsilon-greedy selection
- Multi-armed bandits in experimental design
- The testing process
- Bandits with knapsacks – more multi-armed bandit applications
- Summary
- Questions
- Further reading
- Further Q-Learning Research and Future Projects
- Google's DeepMind and the future of Q-learning
- OpenAI Gym and RL research
- The standardization of RL research practice with Gym
- Tracking your scores with the Gym leaderboard
- More OpenAI Gym environments
- Pendulum
- Acrobot
- MountainCar
- Continuous control tasks – MuJoCo
- Continuous control tasks – Box2D
- Robotics research and development
- Algorithms
- Toy text
- Contextual bandits and probability distributions
- Probability and intelligence
- Updating probability distributions
- State spaces
- A/B testing versus multi-armed bandit testing
- Testing methodologies
- Summary
- Questions
- Further reading
- Assessments
- Chapter 1 – Brushing Up on Reinforcement Learning Concepts
- Chapter 2 – Getting Started with the Q-Learning Algorithm
- Chapter 3 – Setting Up Your First Environment with OpenAI Gym
- Chapter 4 – Teaching a Smartcab to Drive Using Q-Learning
- Chapter 5 – Building Q-Networks with TensorFlow
- Chapter 6 – Digging Deeper into Deep Q-Networks with Keras and TensorFlow
- Chapter 7 – Decoupling Exploration and Exploitation in Multi-Armed Bandits
- Chapter 8 – Further Q-Learning Research and Future Projects
- Other Books You May Enjoy
- Leave a review - let other readers know what you think