...in reversed chronological order.
- ICRA: Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models. Christianos Filippos, Karkus Peter, Ivanovic Boris, Albrecht Stefano, and Pavone Marco. In IEEE International Conference on Robotics and Automation, 2023
Reasoning with occluded traffic agents is a significant open challenge for planning for autonomous vehicles. Recent deep learning models have shown impressive results for predicting occluded agents based on the behaviour of nearby visible agents; however, as we show in experiments, these models are difficult to integrate into downstream planning. To this end, we propose Bi-level Variational Occlusion Models (BiVO), a two-step generative model that first predicts likely locations of occluded agents, and then generates likely trajectories for the occluded agents. In contrast to existing methods, BiVO outputs a trajectory distribution which can then be sampled from and integrated into standard downstream planning. We evaluate the method in closed-loop replay simulation using the real-world nuScenes dataset. Our results suggest that BiVO can successfully learn to predict occluded agent trajectories, and these predictions lead to better subsequent motion plans in critical scenarios.
- AAMAS: Decoupling Exploration and Exploitation in Reinforcement Learning. Schäfer Lukas, Christianos Filippos, Hanna Josiah, and Albrecht Stefano. In International Conference on Autonomous Agents and Multiagent Systems, 2022
Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.
- AIC: Deep Reinforcement Learning for Multi-Agent Interaction. Ahmed Ibrahim H., Brewitt Cillian, Carlucho Ignacio, Christianos Filippos, Dunion Mhairi, Fosong Elliot, Garcin Samuel, Guo Shangmin, Gyevnar Balint, McInroe Trevor, Papoudakis Georgios, Rahman Arrasy, Schäfer Lukas, Tamborski Massimiliano, Vecchio Giuseppe, Wang Cheng, and Albrecht Stefano. 2022
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
- ICML: Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing. Christianos Filippos, Papoudakis Georgios, Rahman Arrasy, and Albrecht Stefano. In Proceedings of the 38th International Conference on Machine Learning, 2021
Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents. Parameter sharing between agents significantly decreases the number of trainable parameters, shortening training times to tractable levels, and has been linked to more efficient learning. However, having all agents share the same parameters can also have a detrimental effect on learning. We demonstrate the impact of parameter sharing methods on training speed and converged returns, establishing that when applied indiscriminately, their effectiveness is highly dependent on the environment. We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals. Our approach combines the increased sample efficiency of parameter sharing with the representational capacity of multiple independent networks to reduce training time and increase final returns.
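As a rough illustration of the partitioning idea in the abstract above, the sketch below groups agents whose feature vectors are close to each other and gives each group one shared parameter set. The feature vectors, the distance threshold, the greedy grouping rule, and the names `partition_agents` and `assign_shared_parameters` are all assumptions made for this example; the paper's actual procedure for identifying which agents should share parameters is not reproduced here.

```python
import math


def partition_agents(features, threshold):
    """Greedily group agents whose feature vectors lie within `threshold`
    of a group's first member (a hypothetical stand-in for a learned partition)."""
    groups = []  # each group is a list of agent indices
    for idx, vec in enumerate(features):
        for group in groups:
            anchor = features[group[0]]
            if math.dist(vec, anchor) <= threshold:
                group.append(idx)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([idx])
    return groups


def assign_shared_parameters(groups, make_params):
    """Create one parameter set per group and map every agent to its group's set."""
    agent_params = {}
    for group in groups:
        params = make_params()
        for idx in group:
            agent_params[idx] = params  # same object, so the parameters are shared
    return agent_params


# Hypothetical per-agent features, e.g. summaries of abilities and goals.
features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.2, 5.0)]
groups = partition_agents(features, threshold=1.0)
params = assign_shared_parameters(groups, make_params=lambda: {"weights": [0.0] * 4})
```

Agents in the same group hold a reference to the same parameter object, so an update made through one agent is visible to the others in that group, while agents in different groups stay independent.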
- NeurIPS: Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks. Papoudakis Georgios*, Christianos Filippos*, Schäfer Lukas, and Albrecht Stefano. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.
- NeurIPS: Agent Modelling under Partial Observability for Deep Reinforcement Learning. Papoudakis Georgios, Christianos Filippos, and Albrecht Stefano. In Advances in Neural Information Processing Systems, 2021
Modelling the behaviours of other agents (opponents) is essential for understanding how agents interact and making effective decisions. Existing methods for opponent modelling commonly assume knowledge of the local observations and chosen actions of the modelled opponents, which can significantly limit their applicability. We propose a new modelling technique based on variational autoencoders, which are trained to reconstruct the local actions and observations of the opponent based on embeddings which depend only on the local observations of the modelling agent (its observed world state, chosen actions, and received rewards). The embeddings are used to augment the modelling agent’s decision policy which is trained via deep reinforcement learning; thus the policy does not require access to opponent observations. We provide a comprehensive evaluation and ablation study in diverse multi-agent tasks, showing that our method achieves comparable performance to an ideal baseline which has full access to opponent’s information, and significantly higher returns than a baseline method which does not use the learned embeddings.
- ICML: Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning. Rahman Arrasy, Hopner Niklas, Christianos Filippos, and Albrecht Stefano. In Proceedings of the 38th International Conference on Machine Learning, 2021
Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with previously unknown teammates. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents of varying types to enter and leave the team without prior notification. Our solution builds on graph neural networks to learn agent models and joint action-value decompositions under varying team sizes, which can be trained with reinforcement learning using a discounted returns objective. We demonstrate empirically that our approach effectively models the impact of other agents' actions on the controlled agent’s returns to produce policies which can robustly adapt to dynamic team composition, and that it generalizes effectively to larger teams than were seen during training.
- NeurIPS: Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning. Christianos Filippos, Schäfer Lukas, and Albrecht Stefano. In Advances in Neural Information Processing Systems, 2020
Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework by combining the gradients of different agents. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms several baselines and state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
- ABER: A Low-Complexity Non-Intrusive Approach to Predict the Energy Demand of Buildings over Short-Term Horizons. Panagopoulos Athanasios Aris, Christianos Filippos, Katsigiannis Michail, Mykoniatis Konstantinos, Pritoni Marco, Panagopoulos Orestis P., Peffer Therese, Chalkiadakis Georgios, Culler David E., Jennings Nicholas R., and Lipman Timothy. 2020
Reliable, non-intrusive, short-term (of up to 12 h ahead) prediction of a building’s energy demand is a critical component of intelligent energy management applications. A number of such approaches have been proposed over time, utilizing various statistical and, more recently, machine learning techniques, such as decision trees, neural networks and support vector machines. Importantly, all of these works barely outperform simple seasonal auto-regressive integrated moving average models, while their complexity is significantly higher. In this work, we propose a novel low-complexity non-intrusive approach that improves the predictive accuracy of the state-of-the-art by up to ∼10%. The backbone of our approach is a K-nearest neighbours search method that exploits the demand pattern of the most similar historical days, and incorporates appropriate time-series pre-processing and easing. In the context of this work, we evaluate our approach against state-of-the-art methods and provide insights on their performance.
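To make the nearest-neighbour idea in the abstract above concrete, here is a minimal sketch that forecasts the next few readings of the current day from the k most similar historical days. It is an illustration under stated assumptions, not the paper's implementation: the Euclidean distance, the plain averaging of neighbours, and the name `knn_demand_forecast` are choices made for this example, and the pre-processing and easing steps mentioned in the abstract are omitted.

```python
import math


def knn_demand_forecast(history, observed, k, horizon):
    """Forecast the next `horizon` readings from the k most similar past days.

    history  : list of past days, each a list of demand readings (equal lengths)
    observed : today's readings so far
    """
    n = len(observed)
    if n + horizon > len(history[0]):
        raise ValueError("forecast window runs past the end of a day")
    # Rank past days by Euclidean distance over the hours observed so far.
    ranked = sorted(history, key=lambda day: math.dist(day[:n], observed))
    neighbours = ranked[:k]
    # Average the neighbours' demand over the hours that follow.
    return [
        sum(day[n + h] for day in neighbours) / len(neighbours)
        for h in range(horizon)
    ]


# Toy data: three past days of 24 hourly readings each (made-up values).
history = [[1.0] * 24, [2.0] * 24, [10.0] * 24]
forecast = knn_demand_forecast(history, observed=[1.4] * 6, k=2, horizon=3)
```

With the toy data, the two low-demand days are the nearest neighbours of the partially observed day, so the forecast is the average of their readings over the next three hours.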
- EUMAS: Efficient Multi-Criteria Coalition Formation Using Hypergraphs (with Application to the V2G Problem). Christianos Filippos and Chalkiadakis Georgios. In Multi-Agent Systems and Agreement Technologies, 2017
This paper proposes, for the first time in the literature, the use of hypergraphs for the efficient formation of effective coalitions. We put forward several formation methods that build on existing hypergraph pruning, transversal, and clustering algorithms, and exploit the hypergraph structure to identify agents with desirable characteristics. Our approach allows the near-instantaneous formation of high quality coalitions, adhering to multiple stated quality requirements. Moreover, our methods are shown to scale to tens of thousands of agents within fractions of a second, with one of them scaling even to millions of agents within seconds. We apply our approach to the problem of forming coalitions to provide (electric) vehicle-to-grid (V2G) services. Ours is the first approach able to deal with large-scale, real-time coalition formation for the V2G problem, while taking multiple criteria into account for creating the electric vehicle coalitions.
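One simple way to picture the hypergraph view described above is to treat each hyperedge as the set of agents sharing a characteristic, prune the hyperedges that are not wanted, and prefer agents that sit in many of the remaining ones. The sketch below does only that. The hyperedge names, the counting score, and the function `form_coalition` are hypothetical and chosen for illustration; the pruning, transversal, and clustering algorithms used in the paper are not reproduced.

```python
def form_coalition(hyperedges, desired, size):
    """Pick `size` agents that appear in the most desired hyperedges.

    hyperedges : dict mapping a characteristic name to the set of agents having it
    desired    : characteristic names the coalition should satisfy
    """
    # Prune: keep only the hyperedges for the requested characteristics.
    kept = {name: hyperedges[name] for name in desired if name in hyperedges}
    # Score each agent by how many kept hyperedges contain it.
    scores = {}
    for members in kept.values():
        for agent in members:
            scores[agent] = scores.get(agent, 0) + 1
    # Highest score first; ties are broken by agent name for determinism.
    ranked = sorted(scores, key=lambda agent: (-scores[agent], agent))
    return ranked[:size]


# Hypothetical electric vehicles grouped by made-up characteristics.
hyperedges = {
    "high_capacity": {"ev1", "ev2", "ev3"},
    "high_reliability": {"ev2", "ev3", "ev4"},
    "low_capacity": {"ev5"},
}
coalition = form_coalition(hyperedges, ["high_capacity", "high_reliability"], size=2)
```

Because the selection only touches the kept hyperedges and their members, agents outside every desired hyperedge (here `ev5`) are never considered.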
- ECAI: Employing Hypergraphs for Efficient Coalition Formation with Application to the V2G Problem. Christianos Filippos and Chalkiadakis Georgios. In Proceedings of the Twenty-Second European Conference on Artificial Intelligence, 2016
This paper proposes, for the first time in the literature, the use of hypergraphs for the efficient formation of effective coalitions. We put forward several formation methods that build on existing hypergraph algorithms, and exploit hypergraph structure to identify agents with desirable characteristics. Our approach allows the near-instantaneous formation of high quality coalitions, while adhering to multiple stated requirements regarding coalition quality. Moreover, our methods are shown to scale to tens of thousands of agents within fractions of a second, with one of them scaling even to millions of agents within seconds. We apply our approach to the problem of forming coalitions to provide (electric) vehicle-to-grid (V2G) services. Ours is the first approach able to deal with large-scale, real-time coalition formation for the V2G problem, while taking multiple criteria into account for creating electric vehicle coalitions.