Grokking Deep Reinforcement Learning (2020) [Morales] [9781617295454]


English · 472 pages · 2020




Table of contents :
foreword
preface
acknowledgments
about this book
about the author
Introduction to deep reinforcement learning
What is deep reinforcement learning?
The past, present, and future of deep reinforcement learning
The suitability of deep reinforcement learning
Setting clear two-way expectations
Mathematical foundations of reinforcement learning
Components of reinforcement learning
MDPs: The engine of the environment
Balancing immediate and long-term goals
The objective of a decision-making agent
Planning optimal sequences of actions
Balancing the gathering and use of information
The challenge of interpreting evaluative feedback
Strategic exploration
Evaluating agents’ behaviors
Learning to estimate the value of policies
Learning to estimate from multiple steps
Improving agents’ behaviors
The anatomy of reinforcement learning agents
Learning to improve policies of behavior
Decoupling behavior from learning
Achieving goals more effectively and efficiently
Learning to improve policies using robust targets
Agents that interact, learn, and plan
Introduction to value-based deep reinforcement learning
The kind of feedback deep reinforcement learning agents use
Introduction to function approximation for reinforcement learning
NFQ: The first attempt at value-based deep reinforcement learning
More stable value-based methods
DQN: Making reinforcement learning more like supervised learning
Double DQN: Mitigating the overestimation of action-value functions
Sample-efficient value-based methods
Dueling DDQN: A reinforcement-learning-aware neural network architecture
PER: Prioritizing the replay of meaningful experiences
Policy-gradient and actor-critic methods
REINFORCE: Outcome-based policy learning
VPG: Learning a value function
A3C: Parallel policy updates
GAE: Robust advantage estimation
A2C: Synchronous policy updates
Advanced actor-critic methods
DDPG: Approximating a deterministic policy
TD3: State-of-the-art improvements over DDPG
SAC: Maximizing the expected return and entropy
PPO: Restricting optimization steps
Toward artificial general intelligence
What was covered and what notably wasn’t?
More advanced concepts toward AGI
What happens next?
index
