Hands-On Q-Learning with Python
Q-learning is a machine learning algorithm used to solve optimization problems in artificial intelligence (AI). It is one of the most popular fields of study among AI researchers. This book starts off by introducing you to reinforcement learning and Q-learning, in addition to helping you get familiar with OpenAI Gym as well as libraries such as Keras and TensorFlow. A few chapters into the book, you will gain insights into model-free Q-learning and use deep Q-networks and double deep Q-networks to solve complex problems. This book will guide you in exploring use cases such as self-driving vehicles and OpenAI Gym's CartPole problem. You will also learn how to tune and optimize Q-networks and their hyperparameters. As you progress, you will understand the reinforcement learning approach to solving real-world problems. You will also explore how to use Q-learning and related algorithms in real-world applications such as scientific research. Toward the end, you'll gain a sense of what's in store for reinforcement learning. By the end of this book, you will be equipped with the skills you need to solve reinforcement learning problems using Q-learning algorithms with OpenAI Gym, Keras, and TensorFlow.
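For orientation, the following is a minimal sketch of the kind of tabular Q-learning loop the book builds up (epsilon-greedy action selection plus a Bellman update on a Q-table), written against the Taxi-v2 environment mentioned in the contents. It assumes an older gym release in which Taxi-v2 is still registered and the classic reset/step signatures apply; the hyperparameter values shown are illustrative only, not the book's settings.

```python
# Minimal tabular Q-learning sketch on Taxi-v2, assuming an older gym release
# (pre-0.26) where Taxi-v2 is registered and reset()/step() use the classic
# return signatures. On newer releases use 'Taxi-v3' and the updated API.
import gym
import numpy as np

env = gym.make('Taxi-v2')
q_table = np.zeros([env.observation_space.n, env.action_space.n])

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative hyperparameter values

for episode in range(1000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection: explore with probability epsilon
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[state])
        next_state, reward, done, info = env.step(action)
        # Bellman update for the visited state-action pair
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
```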
Table of Contents (202 entries)
- Cover Page
- Title Page
- Copyright and Credits
- Hands-On Q-Learning with Python
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Section 1: Q-Learning: A Roadmap
- Brushing Up on Reinforcement Learning Concepts
- What is RL?
- States and actions
- The decision-making process
- RL, supervised learning, and unsupervised learning
- States, actions, and rewards
- States
- Actions and rewards
- Bellman equations
- Key concepts in RL
- Value-based versus policy-based iteration
- Q-learning hyperparameters – alpha, gamma, and epsilon
- Alpha – deterministic versus stochastic environments
- Gamma – current versus future rewards
- Epsilon – exploration versus exploitation
- Decaying epsilon
- SARSA versus Q-learning – on-policy or off?
- SARSA and the cliff-walking problem
- When to choose SARSA over Q-learning
- Summary
- Questions
- Getting Started with the Q-Learning Algorithm
- Technical requirements
- Demystifying MDPs
- Control processes
- Markov chains
- The Markov property
- MDPs and state-action diagrams
- Solving MDPs with RL
- Your Q-learning agent in its environment
- Solving the optimization problem
- States and actions in Taxi-v2
- Fine-tuning your model – learning, discount, and exploration rates
- Decaying epsilon
- Decaying alpha
- Decaying gamma
- MABP – a classic exploration versus exploitation problem
- Setting up a bandit problem
- Bandit optimization strategies
- Other applications for bandit problems
- Optimal versus safe paths – revisiting SARSA
- Summary
- Questions
- Setting Up Your First Environment with OpenAI Gym
- Technical requirements
- Getting started with OpenAI Gym
- What is Gym?
- Setting up Gym
- Gym environments
- Setting up an environment
- Exploring the Taxi-v2 environment
- The state space and valid actions
- Choosing an action manually
- Setting a state manually
- Creating a baseline agent
- Stepping through actions
- Creating a task loop
- Baseline models in Q-learning and machine learning research
- Summary
- Questions
- Teaching a Smartcab to Drive Using Q-Learning
- Technical requirements
- Getting to know your learning agent
- Implementing your agent
- The value function – calculating the Q-value of a state-action pair
- Implementing Bellman equations
- The learning parameters – alpha, gamma, and epsilon
- Adding an updated alpha value
- Adding an updated epsilon value
- Model-tuning and tracking your agent's long-term performance
- Comparing your models and statistical performance measures
- Training your models
- Decaying epsilon
- Hyperparameter tuning
- Summary
- Questions
- Section 2: Building and Optimizing Q-Learning Agents
- Building Q-Networks with TensorFlow
- Technical requirements
- A brief overview of neural networks
- Extensional versus intensional definitions
- Taking a closer look
- Input, hidden, and output layers
- Perceptron functions
- ReLU functions
- Implementing a neural network with NumPy
- Feedforward
- Backpropagation
- Neural networks and Q-learning
- Policy agents versus value agents
- Building your first Q-network
- Defining the network
- Training the network
- Summary
- Questions
- Further reading
- Digging Deeper into Deep Q-Networks with Keras and TensorFlow
- Technical requirements
- Introducing CartPole-v1
- More about CartPole states and actions
- Getting started with the CartPole task
- Building a DQN to solve the CartPole problem
- Gamma
- Alpha
- Epsilon
- Building a DQN class
- Choosing actions with epsilon-greedy
- Updating the Q-values
- Running the task loop
- Testing and results
- Adding in experience replay
- About experience replay
- Implementation
- Experience replay results
- Building further on DQNs
- Calculating DQN loss
- Fixed Q-targets
- Double-deep Q-networks
- Dueling deep Q-networks
- Summary
- Questions
- Further reading
- Section 3: Advanced Q-Learning Challenges with Keras, TensorFlow, and OpenAI Gym
- Decoupling Exploration and Exploitation in Multi-Armed Bandits
- Technical requirements
- Probability distributions and ongoing knowledge
- Iterative probability distributions
- Revisiting a simple bandit problem
- A sample two-armed bandit iteration
- Multi-armed bandit strategy overview
- Greedy strategy
- Epsilon-greedy strategy
- Upper confidence bound
- Bandit regret
- Utility functions and optimal decisions
- Contextual bandits and state diagrams
- Thompson sampling and the Bayesian control rule
- Thompson sampling
- Bayesian control rule
- Solving a multi-armed bandit problem in Python – user advertisement clicks
- Epsilon-greedy selection
- Multi-armed bandits in experimental design
- The testing process
- Bandits with knapsacks – more multi-armed bandit applications
- Summary
- Questions
- Further reading
- Further Q-Learning Research and Future Projects
- Google's DeepMind and the future of Q-learning
- OpenAI Gym and RL research
- The standardization of RL research practice with Gym
- Tracking your scores with the Gym leaderboard
- More OpenAI Gym environments
- Pendulum
- Acrobot
- MountainCar
- Continuous control tasks – MuJoCo
- Continuous control tasks – Box2D
- Robotics research and development
- Algorithms
- Toy text
- Contextual bandits and probability distributions
- Probability and intelligence
- Updating probability distributions
- State spaces
- A/B testing versus multi-armed bandit testing
- Testing methodologies
- Summary
- Questions
- Further reading
- Assessments
- Chapter 1 Brushing Up on Reinforcement Learning Concepts
- Chapter 2 Getting Started with the Q-Learning Algorithm
- Chapter 3 Setting Up Your First Environment with OpenAI Gym
- Chapter 4 Teaching a Smartcab to Drive Using Q-Learning
- Chapter 5 Building Q-Networks with TensorFlow
- Chapter 6 Digging Deeper into Deep Q-Networks with Keras and TensorFlow
- Chapter 7 Decoupling Exploration and Exploitation in Multi-Armed Bandits
- Chapter 8 Further Q-Learning Research and Future Projects
- Other Books You May Enjoy
- Leave a review - let other readers know what you think