Source Themes

Gemini: A Family of Highly Capable Multimodal Models

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging …

Diversifying AI: Towards Creative Chess with AlphaZero

In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. …

Faster sorting algorithms discovered using deep reinforcement learning

Fundamental algorithms such as sorting or hashing are used trillions of times on any given day. As demand for computation grows, it has become critical for these algorithms to be as performant as possible. Whereas remarkable progress has been …

Optimizing Memory Mapping Using Deep Reinforcement Learning

Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time savings, …

Automated Planning for Robotic Guidewire Navigation in the Coronary Arteries

Soft continuum robots, and comparable instruments allow to perform some surgical procedures non-invasively. While safer, less morbid and more cost-effective, these medical interventions increase the complexity for the practitioners: the manipulation …

Safe and Efficient Reinforcement Learning for Behavioural Planning in Autonomous Driving

In this Ph.D. thesis, we study how autonomous vehicles can learn to act *safely* and avoid accidents, despite sharing the road with human drivers whose behaviours are uncertain. To explicitly account for this uncertainty, informed by online …

Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs

We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust). This problem has …

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for …

Monte-Carlo Graph Search: the Value of Merging Similar States

We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and limited computational budget. Despite the underlying MDP transitions having a graph structure, the popular Monte-Carlo Tree Search algorithms such as …

Fast active learning for pure exploration in reinforcement learning

Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent, and the agents may first choose to devote all their effort on exploring …