This book constitutes the refereed proceedings of the Second International Conference on Distributed Artificial Intellig
English. Pages: 141 [149]. Year: 2020.
Table of contents:
Preface
Organization
Contents
Parallel Algorithm for Nash Equilibrium in Multiplayer Stochastic Games with Application to Naval Strategic Planning
1 Introduction
2 Hostility Game
3 Algorithm
4 Experiments
5 Conclusion
References
LAC-Nav: Collision-Free Multiagent Navigation Based on the Local Action Cells
1 Introduction
2 The Local Action Cells
3 Collision-Free Navigation
4 Experiments
5 Discussions
References
MGHRL: Meta Goal-Generation for Hierarchical Reinforcement Learning
1 Introduction
2 Related Work
3 Preliminaries
4 Algorithm
4.1 Two-Level Hierarchy
4.2 Meta Goal-Generation for Hierarchical Reinforcement Learning
5 Experiments
5.1 Environmental Setup
5.2 Results
6 Discussion and Future Work
References
D3PG: Decomposed Deep Deterministic Policy Gradient for Continuous Control
1 Introduction
2 Background
2.1 Reinforcement Learning (RL)
2.2 Deep Deterministic Policy Gradient (DDPG)
3 The D3PG Algorithm for Robotic Control
3.1 Structural Decomposition
3.2 The PCG Method
3.3 The D3PG Algorithm
4 Experiment
5 Related Work
6 Conclusions
A Appendix
A.1 Appendix
A.2 MuJoCo Platform
References
Lyapunov-Based Reinforcement Learning for Decentralized Multi-agent Control
1 Introduction
2 Preliminaries
2.1 Networked Markov Game
2.2 Soft Actor-Critic Algorithm
2.3 Lyapunov Stability in Control Theory
3 Multi-agent Reinforcement Learning with Lyapunov Stability Constraint
3.1 Multi-agent Soft Actor-Critic Algorithm
3.2 Lyapunov Stability Constraint
4 Experiment
5 Conclusion
References
Hybrid Independent Learning in Cooperative Markov Games
1 Introduction
2 Theoretical Framework
2.1 Markov Games
2.2 Policies and Nash Equilibria
2.3 Q-Learning
3 Hybrid Q-Learning
4 Pathologies in Multi-Agent RL
4.1 Relative Overgeneralization
4.2 The Stochasticity Problem
4.3 Miscoordination
4.4 The Alter-Exploration Problem
5 Independent Learner Baselines
5.1 Independent Q-Learning
5.2 Distributed Q-Learning
5.3 Hysteretic Q-Learning
5.4 LMRL2
5.5 Parameters
6 Experiments
6.1 Climb Games
6.2 Heaven and Hell Game
6.3 Common Interest Game
6.4 Meeting in a Grid
7 Conclusions
References
Efficient Exploration by Novelty-Pursuit
1 Introduction
2 Related Work
3 Background
4 Method
4.1 Selecting Goals from the Experience Buffer
4.2 Training Goal-Conditioned Policy Efficiently
4.3 Exploiting Experience Collected by Exploration Policy
5 Experiment
5.1 Comparison of Exploration Efficiency
5.2 Ablation Study of Training Techniques
5.3 Evaluation on Complicated Environments
6 Conclusion
A Appendix
A.1 Reward Shaping for Training Goal-Conditioned Policy
A.2 Additional Results
A.3 Environment Preprocessing
A.4 Network Architecture
A.5 Hyperparameters
References
Context-Aware Multi-agent Coordination with Loose Couplings and Repeated Interaction
1 Introduction
2 Motivation Scenario
3 Problem Description
4 Algorithms
4.1 Description of MACUCB
4.2 Description of VE
4.3 Extensions
5 Regret Analysis
6 Experiment
6.1 Experiment Setting
6.2 Experimental Results
7 Conclusion
References
Battery Management for Automated Warehouses via Deep Reinforcement Learning
1 Introduction
2 Related Work
3 Motivation Scenario
4 Problem Statement and MDP Formulation
4.1 Battery Management Problem
4.2 MDP Formulation
5 Solving the MDP
5.1 TD3
5.2 Enforcing State Dependent Exploration via Action Regulation Loss
6 Simulator Design
7 Empirical Evaluation
7.1 Experimental Configurations
7.2 Experimental Results
8 Conclusion
References
Author Index