Publications | Filippos Christianos

2023

TMLR

Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning

Christianos Filippos, Papoudakis Georgios, and Albrecht Stefano V.

Transactions on Machine Learning Research, 2023

This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal Nash equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and, therefore, is the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC, which is shown to efficiently scale in games with a large number of agents.
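The equilibrium-selection problem described above can be illustrated on a small matrix game. The sketch below is not Pareto-AC, which is a learning algorithm; it only enumerates the pure Nash equilibria of a two-player stag hunt (a no-conflict game chosen here as an assumption) and keeps the equilibrium that no other equilibrium Pareto-dominates. Both equilibria are stable, but only (stag, stag) maximises the returns of both agents, which is the outcome Pareto-AC is designed to reach.

```python
from itertools import product

def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a 2-player matrix game.

    payoffs maps a joint action (a1, a2) to the reward pair (r1, r2).
    """
    acts1 = sorted({a1 for a1, _ in payoffs})
    acts2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1, a2 in product(acts1, acts2):
        r1, r2 = payoffs[(a1, a2)]
        # Neither player can gain by deviating unilaterally.
        no_gain_1 = all(payoffs[(d, a2)][0] <= r1 for d in acts1)
        no_gain_2 = all(payoffs[(a1, d)][1] <= r2 for d in acts2)
        if no_gain_1 and no_gain_2:
            equilibria.append((a1, a2))
    return equilibria

def pareto_optimal(profiles, payoffs):
    """Keep the joint actions that no other profile in the list Pareto-dominates."""
    def dominates(p, q):
        rp, rq = payoffs[p], payoffs[q]
        return all(x >= y for x, y in zip(rp, rq)) and any(x > y for x, y in zip(rp, rq))
    return [p for p in profiles if not any(dominates(q, p) for q in profiles)]

# Stag hunt: a no-conflict game with two pure equilibria.
stag_hunt = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}
eqs = pure_nash_equilibria(stag_hunt)
print(eqs)                             # [('hare', 'hare'), ('stag', 'stag')]
print(pareto_optimal(eqs, stag_hunt))  # [('stag', 'stag')]
```

A learner that is uncertain about its partner may settle on (hare, hare), since "hare" is the safer response; the Pareto filter shows why that outcome is still sub-optimal for everyone.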
AAMAS (ws)

Revisiting the Gumbel-Softmax in MADDPG

Tilbury Callum Rhys, Christianos Filippos, and Albrecht Stefano

In Adaptive and Learning Agents Workshop (ALA), 2023
AAMAS (ws)

SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning

Michalski Adam, Christianos Filippos, and Albrecht Stefano

In Multiagent Sequential Decision Making under Uncertainty Workshop (MSDM), 2023
ICRA

Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models

Christianos Filippos, Karkus Peter, Ivanovic Boris, Albrecht Stefano, and Pavone Marco

In IEEE International Conference on Robotics and Automation, 2023

Reasoning with occluded traffic agents is a significant open challenge for planning for autonomous vehicles. Recent deep learning models have shown impressive results for predicting occluded agents based on the behaviour of nearby visible agents; however, as we show in experiments, these models are difficult to integrate into downstream planning. To this end, we propose Bi-level Variational Occlusion Models (BiVO), a two-step generative model that first predicts likely locations of occluded agents, and then generates likely trajectories for the occluded agents. In contrast to existing methods, BiVO outputs a trajectory distribution which can then be sampled from and integrated into standard downstream planning. We evaluate the method in closed-loop replay simulation using the real-world nuScenes dataset. Our results suggest that BiVO can successfully learn to predict occluded agent trajectories, and these predictions lead to better subsequent motion plans in critical scenarios.
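The two-step structure can be sketched as hierarchical sampling. Everything here is an assumption for illustration: the candidate locations and their probabilities stand in for BiVO's first stage, and a hypothetical constant-velocity generator stands in for its learned trajectory model. The point is only that the output is a set of sampled trajectories, which is the form a standard sampling-based planner can consume.

```python
import random

def sample_occluded_trajectories(location_probs, trajectory_model, n_samples, rng):
    """Two-step sampling: first where an occluded agent is, then how it moves.

    location_probs: hypothetical stage-one output mapping candidate (x, y)
    positions to probabilities.
    trajectory_model: hypothetical stage-two generator, called as
    trajectory_model(location, rng), returning a list of (x, y) waypoints.
    """
    locations = list(location_probs)
    weights = [location_probs[loc] for loc in locations]
    samples = []
    for _ in range(n_samples):
        loc = rng.choices(locations, weights=weights, k=1)[0]  # step 1: location
        samples.append(trajectory_model(loc, rng))             # step 2: trajectory
    return samples

def constant_velocity_model(loc, rng, steps=5, dt=0.5):
    """Placeholder for a learned trajectory decoder: straight-line motion with a noisy speed."""
    speed = rng.gauss(5.0, 1.0)
    x, y = loc
    return [(x + speed * dt * t, y) for t in range(1, steps + 1)]

rng = random.Random(0)
candidates = {(10.0, 2.0): 0.7, (25.0, -2.0): 0.3}
trajectories = sample_occluded_trajectories(candidates, constant_velocity_model, 4, rng)
# Each sampled trajectory is one possible future for the hidden agent.
```

A downstream planner could treat each sample as a hypothetical obstacle and score candidate motion plans against all of them.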

2022

AAMAS

Decoupling Exploration and Exploitation in Reinforcement Learning

Schäfer Lukas, Christianos Filippos, Hanna Josiah, and Albrecht Stefano

In International Conference on Autonomous Agents and Multiagent Systems, 2022

Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.
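The decoupling idea can be shown in tabular form. This is a simplification under stated assumptions, not the DeRL implementation: both policies are greedy Q-tables, the intrinsic reward is a count-based bonus β/√N(s') (an assumed choice), and the exploitation table is updated off-policy from transitions collected by the exploration policy. Because the exploitation values never see the intrinsic bonus, its scale β only affects which data are collected, not the exploitation target.

```python
import math
from collections import defaultdict

class DecoupledQ:
    """Tabular sketch of decoupled exploration and exploitation.

    The exploration table learns from extrinsic + intrinsic reward and drives
    behaviour; the exploitation table learns off-policy from the same
    transitions using only the extrinsic reward.
    """

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, beta=1.0):
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        self.q_explore = defaultdict(lambda: [0.0] * n_actions)
        self.q_exploit = defaultdict(lambda: [0.0] * n_actions)
        self.visits = defaultdict(int)

    def _td_update(self, q, s, a, r, s_next):
        target = r + self.gamma * max(q[s_next])
        q[s][a] += self.alpha * (target - q[s][a])

    def act_explore(self, s):
        # Behaviour policy: greedy w.r.t. the intrinsically-motivated values.
        values = self.q_explore[s]
        return values.index(max(values))

    def act_exploit(self, s):
        # Evaluation policy: greedy w.r.t. extrinsic-only values.
        values = self.q_exploit[s]
        return values.index(max(values))

    def update(self, s, a, r_ext, s_next):
        # Count-based novelty bonus for the next state.
        self.visits[s_next] += 1
        r_int = self.beta / math.sqrt(self.visits[s_next])
        self._td_update(self.q_explore, s, a, r_ext + r_int, s_next)
        self._td_update(self.q_exploit, s, a, r_ext, s_next)

agent = DecoupledQ(n_actions=2)
agent.update(s=0, a=1, r_ext=0.0, s_next=1)
print(agent.q_explore[0], agent.q_exploit[0])  # [0.0, 0.5] [0.0, 0.0]
```

After one novel transition, only the exploration values move, so the behaviour policy is drawn towards unvisited states while the evaluation policy stays unaffected by the bonus.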
AIC

Deep Reinforcement Learning for Multi-Agent Interaction

Ahmed Ibrahim H., Brewitt Cillian, Carlucho Ignacio, Christianos Filippos, Dunion Mhairi, Fosong Elliot, Garcin Samuel, Guo Shangmin, Gyevnar Balint, McInroe Trevor, Papoudakis Georgios, Rahman Arrasy, Schäfer Lukas, Tamborski Massimiliano, Vecchio Giuseppe, Wang Cheng, and Albrecht Stefano

2022

The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.

2021

ICML

Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing

Christianos Filippos, Papoudakis Georgios, Rahman Arrasy, and Albrecht Stefano

In Proceedings of the 38th International Conference on Machine Learning, 2021

Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents. Parameter sharing between agents significantly decreases the number of trainable parameters, shortening training times to tractable levels, and has been linked to more efficient learning. However, having all agents share the same parameters can also have a detrimental effect on learning. We demonstrate the impact of parameter sharing methods on training speed and converged returns, establishing that when applied indiscriminately, their effectiveness is highly dependent on the environment. We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals. Our approach combines the increased sample efficiency of parameter sharing with the representational capacity of multiple independent networks to reduce training time and increase final returns.
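Once each agent is summarised by a vector describing its abilities and goals, partitioning reduces to clustering. The sketch below assumes such embeddings are already given (how they are learned is outside its scope) and uses k-means, an assumed choice here, to assign agents to groups. Agents in the same group would then share one set of network parameters, while different groups keep separate networks. The embeddings and agent names are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def partition_agents(agent_embeddings, n_groups):
    """Map each parameter group to the agents that will share one network."""
    ids = list(agent_embeddings)
    labels = kmeans([agent_embeddings[i] for i in ids], n_groups)
    groups = defaultdict(list)
    for agent_id, lab in zip(ids, labels):
        groups[lab].append(agent_id)
    return dict(groups)

# Hypothetical embeddings: two agent types with different abilities/goals.
embeddings = {"a0": (0.0, 0.0), "a1": (0.1, 0.0), "a2": (5.0, 5.0), "a3": (5.1, 5.0)}
print(partition_agents(embeddings, 2))  # {0: ['a0', 'a1'], 1: ['a2', 'a3']}
```

Grouping keeps the sample-efficiency benefit of sharing within each group while avoiding a single network having to represent very different agent types.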
NeurIPS

Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks

Papoudakis Georgios *, Christianos Filippos *, Schäfer Lukas, and Albrecht Stefano

In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021

Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria, making comparisons between approaches difficult. In this work, we evaluate and compare three different classes of MARL algorithms (independent learners, centralised training with decentralised execution, and value decomposition) in a diverse range of multi-agent learning tasks. Our results show that (1) algorithm performance depends strongly on environment properties and no algorithm learns efficiently across all learning tasks; (2) independent learners often achieve equal or better performance than more complex algorithms; (3) tested algorithms struggle to solve multi-agent tasks with sparse rewards. We report detailed empirical data, including a reliability analysis, and provide insights into the limitations of the tested algorithms.
NeurIPS

Agent Modelling under Partial Observability for Deep Reinforcement Learning

Papoudakis Georgios, Christianos Filippos, and Albrecht Stefano

In Advances in Neural Information Processing Systems, 2021

Modelling the behaviours of other agents (opponents) is essential for understanding how agents interact and making effective decisions. Existing methods for opponent modelling commonly assume knowledge of the local observations and chosen actions of the modelled opponents, which can significantly limit their applicability. We propose a new modelling technique based on variational autoencoders, which are trained to reconstruct the local actions and observations of the opponent based on embeddings which depend only on the local observations of the modelling agent (its observed world state, chosen actions, and received rewards). The embeddings are used to augment the modelling agent’s decision policy which is trained via deep reinforcement learning; thus the policy does not require access to opponent observations. We provide a comprehensive evaluation and ablation study in diverse multi-agent tasks, showing that our method achieves comparable performance to an ideal baseline which has full access to the opponent’s information, and significantly higher returns than a baseline method which does not use the learned embeddings.
ICML

Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning

Rahman Arrasy, Hopner Niklas, Christianos Filippos, and Albrecht Stefano

In Proceedings of the 38th International Conference on Machine Learning, 2021

Ad hoc teamwork is the challenging problem of designing an autonomous agent which can adapt quickly to collaborate with previously unknown teammates. Prior work in this area has focused on closed teams in which the number of agents is fixed. In this work, we consider open teams by allowing agents of varying types to enter and leave the team without prior notification. Our solution builds on graph neural networks to learn agent models and joint action-value decompositions under varying team sizes, which can be trained with reinforcement learning using a discounted returns objective. We demonstrate empirically that our approach effectively models the impact of other agents’ actions on the controlled agent’s returns to produce policies which can robustly adapt to dynamic team composition and is able to effectively generalize to larger teams than were seen during training.

2020

NeurIPS

Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning

Christianos Filippos, Schäfer Lukas, and Albrecht Stefano

In Advances in Neural Information Processing Systems, 2020

Exploration in multi-agent reinforcement learning is a challenging problem, especially in environments with sparse rewards. We propose a general method for efficient exploration by sharing experience amongst agents. Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework by combining the gradients of different agents. We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms several baselines and state-of-the-art algorithms by learning in fewer steps and converging to higher returns. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
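Combining the gradients of different agents can be pictured as an actor loss for agent i that adds importance-weighted terms for the other agents' experience. The sketch below uses scalar stand-ins for network outputs: the ratio π_i(a_k|o_k)/π_k(a_k|o_k) corrects for agent k's action having been sampled from agent k's own policy, and λ (`lam`, an assumed weighting hyperparameter) scales the shared terms. In an autodiff framework the ratio would be treated as a constant weight and the advantage would come from agent i's critic; the `Sample` type is hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Sample:
    """One transition from agent k, scored by learner i's networks."""
    logp_learner: float    # log pi_i(a_k | o_k)
    logp_behaviour: float  # log pi_k(a_k | o_k), the policy that actually acted
    advantage: float       # r_k + gamma * V_i(o'_k) - V_i(o_k)

def shared_experience_actor_loss(i, samples, lam=1.0):
    """Policy-gradient loss for agent i using every agent's latest transition.

    samples[k] is agent k's transition; samples[i] is agent i's own.
    """
    loss = 0.0
    for k, s in enumerate(samples):
        if k == i:
            # Standard on-policy actor-critic term.
            loss -= s.logp_learner * s.advantage
        else:
            # Off-policy term from agent k, importance-weighted and scaled by lam.
            ratio = math.exp(s.logp_learner - s.logp_behaviour)
            loss -= lam * ratio * s.logp_learner * s.advantage
    return loss

samples = [
    Sample(math.log(0.5), math.log(0.5), 1.0),   # agent 0's own step
    Sample(math.log(0.25), math.log(0.5), 2.0),  # agent 1's step, scored by agent 0
]
print(shared_experience_actor_loss(0, samples))  # about 2.079 (= ln 8)
```

When one agent stumbles on a sparse reward, the shared term lets every other agent's policy receive a gradient from that transition, which is where the exploration benefit comes from.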
ABER

A Low-Complexity Non-Intrusive Approach to Predict the Energy Demand of Buildings over Short-Term Horizons

Panagopoulos Athanasios Aris, Christianos Filippos, Katsigiannis Michail, Mykoniatis Konstantinos, Pritoni Marco, Panagopoulos Orestis P., Peffer Therese, Chalkiadakis Georgios, Culler David E., Jennings Nicholas R., and Lipman Timothy

2020

Reliable, non-intrusive, short-term (of up to 12 h ahead) prediction of a building’s energy demand is a critical component of intelligent energy management applications. A number of such approaches have been proposed over time, utilizing various statistical and, more recently, machine learning techniques, such as decision trees, neural networks and support vector machines. Importantly, all of these works barely outperform simple seasonal auto-regressive integrated moving average models, while their complexity is significantly higher. In this work, we propose a novel low-complexity non-intrusive approach that improves the predictive accuracy of the state-of-the-art by up to ~10%. The backbone of our approach is a k-nearest neighbours search method, which exploits the demand pattern of the most similar historical days, and incorporates appropriate time-series pre-processing and easing. In the context of this work, we evaluate our approach against state-of-the-art methods and provide insights on their performance.
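The core nearest-neighbour step can be sketched in a few lines. This assumes daily demand profiles sampled at a fixed interval, Euclidean distance over the readings observed so far today as the similarity measure, and a plain average of the neighbours' subsequent values as the forecast. The paper's pre-processing and easing steps are omitted, so this is not its exact method.

```python
import math

def knn_forecast(history, today_so_far, horizon, k=3):
    """Forecast the next `horizon` values of today's demand.

    history: list of past daily profiles (equal-length lists of readings).
    today_so_far: readings observed today; each past day must cover
    len(today_so_far) + horizon readings.
    """
    n = len(today_so_far)
    scored = []
    for day in history:
        # Similarity on the part of the day observed so far.
        dist = math.dist(day[:n], today_so_far)
        scored.append((dist, day))
    scored.sort(key=lambda pair: pair[0])  # sort on distance only; ties keep history order
    neighbours = [day for _, day in scored[:k]]
    # Average what the most similar days did next.
    return [sum(day[n + h] for day in neighbours) / len(neighbours) for h in range(horizon)]

history = [[1, 1, 1, 1], [1, 1, 3, 3], [5, 5, 5, 5], [9, 9, 9, 9]]
print(knn_forecast(history, today_so_far=[1, 1], horizon=2, k=2))  # [2.0, 2.0]
```

The only tunable parts are k and the distance, which is what keeps this kind of forecaster cheap compared with trained models.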

2017

EUMAS

Efficient Multi-Criteria Coalition Formation Using Hypergraphs (with Application to the V2G Problem)

Christianos Filippos, and Chalkiadakis Georgios

In Multi-Agent Systems and Agreement Technologies, 2017

This paper proposes, for the first time in the literature, the use of hypergraphs for the efficient formation of effective coalitions. We put forward several formation methods that build on existing hypergraph pruning, transversal, and clustering algorithms, and exploit the hypergraph structure to identify agents with desirable characteristics. Our approach allows the near-instantaneous formation of high-quality coalitions, adhering to multiple stated quality requirements. Moreover, our methods are shown to scale to tens of thousands of agents within fractions of a second, with one of them scaling to even millions of agents within seconds. We apply our approach to the problem of forming coalitions to provide (electric) vehicle-to-grid (V2G) services. Ours is the first approach able to deal with large-scale, real-time coalition formation for the V2G problem, while taking multiple criteria into account for creating the electric vehicle coalitions.
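One way to picture the hypergraph view: agents are nodes, and each hyperedge groups all agents that share a value on some criterion (for example, a high reliability rating). Intersecting the hyperedges that encode the stated requirements yields the eligible agents, from which a coalition can be assembled. The sketch below is a simplified illustration under assumed criteria, hypothetical vehicle data, and a greedy capacity rule; it is not the pruning, transversal, or clustering methods the paper builds on.

```python
from collections import defaultdict

def build_hypergraph(agents, criteria):
    """One hyperedge per (criterion, value) pair, containing every agent with that value."""
    edges = defaultdict(set)
    for name, attrs in agents.items():
        for c in criteria:
            edges[(c, attrs[c])].add(name)
    return edges

def form_coalition(agents, edges, required_edges, capacity_needed):
    """Greedy coalition from agents lying in every required hyperedge."""
    eligible = set(agents)
    for e in required_edges:
        eligible &= edges.get(e, set())
    coalition, total = [], 0.0
    # Add the largest contributors first until the capacity target is met.
    for name in sorted(eligible, key=lambda n: (-agents[n]["capacity"], n)):
        if total >= capacity_needed:
            break
        coalition.append(name)
        total += agents[name]["capacity"]
    return coalition if total >= capacity_needed else []

# Hypothetical electric vehicles offering V2G capacity (kW).
evs = {
    "ev1": {"reliability": "high", "discharge": "fast", "capacity": 7.0},
    "ev2": {"reliability": "high", "discharge": "fast", "capacity": 11.0},
    "ev3": {"reliability": "low", "discharge": "fast", "capacity": 22.0},
    "ev4": {"reliability": "high", "discharge": "slow", "capacity": 3.0},
}
hg = build_hypergraph(evs, ["reliability", "discharge"])
print(form_coalition(evs, hg, [("reliability", "high"), ("discharge", "fast")], 15.0))
# ['ev2', 'ev1']
```

Filtering by set intersection over hyperedges discards unsuitable agents before any coalition is built, which hints at why hypergraph-based formation can scale to very large agent populations.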

2016

ECAI

Employing Hypergraphs for Efficient Coalition Formation with Application to the V2G Problem

Christianos Filippos, and Chalkiadakis Georgios

In Proceedings of the Twenty-Second European Conference on Artificial Intelligence, 2016

This paper proposes, for the first time in the literature, the use of hypergraphs for the efficient formation of effective coalitions. We put forward several formation methods that build on existing hypergraph algorithms, and exploit hypergraph structure to identify agents with desirable characteristics. Our approach allows the near-instantaneous formation of high-quality coalitions, while adhering to multiple stated requirements regarding coalition quality. Moreover, our methods are shown to scale to tens of thousands of agents within fractions of a second, with one of them scaling to even millions of agents within seconds. We apply our approach to the problem of forming coalitions to provide (electric) vehicle-to-grid (V2G) services. Ours is the first approach able to deal with large-scale, real-time coalition formation for the V2G problem, while taking multiple criteria into account for creating electric vehicle coalitions.