How should a driving robot make decisions?

Safe and Efficient Reinforcement Learning for Behavioural Planning in Autonomous Driving

How should a driving robot make decisions?

Safe and Efficient Reinforcement Learning for Behavioural Planning in Autonomous Driving

Abstract

In this Ph.D. thesis, we study how autonomous vehicles can learn to act safely and avoid accidents, despite sharing the road with human drivers whose behaviours are uncertain. To explicitly account for this uncertainty, informed by online observations of the environment, we construct a high-confidence region over the system dynamics, which we propagate through time to bound the possible trajectories of nearby traffic. To ensure safety under such uncertainty, we resort to robust decision-making and act by always considering the worst-case outcomes. This approach guarantees that the performance reached during planning is at least achieved for the true system, and we show by end-to-end analysis that the overall sub-optimality is bounded. Tractability is preserved at all stages, by leveraging sample-efficient tree-based planning algorithms. Another contribution is motivated by the observation that this pessimistic approach tends to produce overly conservative behaviours: imagine you wish to overtake a vehicle, what certainty do you have that they will not change lane at the very last moment, causing an accident? Such reasoning makes it difficult for robots to drive amidst other drivers, merge into a highway, or cross an intersection –an issue colloquially known as the “freezing robot problem”. Thus, the presence of uncertainty induces a trade-off between two contradictory objectives: safety and efficiency. How does one arbitrate this conflict? The question can be temporarily circumvented by reducing uncertainty as much as possible. For instance, we propose an attention-based neural network architecture that better accounts for interactions between traffic participants to improve predictions. But to actively embrace this trade-off, we draw on constrained decision-making to consider both the task completion and safety objectives independently. Rather than a unique driving policy, we train a whole continuum of behaviours, ranging from conservative to aggressive. This provides the system designer with a slider allowing them to adjust the level of risk assumed by the vehicle in real-time.

Type
Publication
PhD Thesis