Picture of Ben Agro

Ben Agro

Welcome to my website! My interests are real-world robotics and understanding (machine) intelligence. I have been a high-level rock climber for the majority of my life.

Experience

Tesla Logo
I am part of the Autopilot Fleet Learning Team working on multimodal foundation models under Phil Duan. Our goal is end-to-end driverless autonomy.
Waabi Logo
I was a researcher at Waabi working on next-generation autonomy systems under Sergio Casas and Raquel Urtasun (who is also my PhD supervisor).
RVL Logo
I was a research intern at the Robotics Vision and Learning Lab supervised by Florian Shkurti. We worked on learning methods for task and motion planning, and I developed a new algorithm for PDDLStream that learned task-specific heuristics for expanding the space of possible robot actions.
ASRL Logo
I was a research intern at the Autonomous Space and Robotics Lab supervised by Tim Barfoot. We worked on self-supervised semantic LiDAR segmentation for autonomous navigation. I developed a simulation of a complex indoor environment complete with dynamic actors and an augmented navigation stack used to train and evaluate our method.

Publications

An image showing MAD's object memory

MAD: Memory-Augmented Detection of 3D Objects

Ben Agro, Sergio Casas, Patrick Wang, Thomas Gilles, Raquel Urtasun

CVPR 2025, details coming soon!

How can we imbue object detectors with learned long-term memory? MAD achieves first place on the Waymo leaderboard among online, non-ensemble methods without test-time augmentations.
An image showing DIO's instance segmentation

DIO: Decomposable Implicit 4D Occupancy-Flow World Model

Christopher Diehl, Quinlan Sykora, Ben Agro, Thomas Gilles, Sergio Casas, Raquel Urtasun

CVPR 2025, details coming soon!

DIO extends UnO with a new sparse architecture for finer detail and better performance, and adds an understanding of instances.
A GIF showing DeTra's detections and forecasts

DeTra: A Unified Model for Object Detection and Trajectory Forecasting

Sergio Casas*, Ben Agro*, Jiageng Mao*, Thomas Gilles, Alexander Cui, Thomas Li, Raquel Urtasun

ECCV 2024

Previous works perform object detection and trajectory forecasting through cascading modules. We reformulate this as a single unified trajectory refinement task, which removes the problem of compounding errors. To fulfill this task, we propose a flexible transformer refinement architecture that is easily extensible to alternative input modalities and tasks.

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Ben Agro*, Quinlan Sykora*, Sergio Casas, Thomas Gilles, Raquel Urtasun

CVPR 2024 (Oral, top 0.7%)

We introduce a foundation model for the physical world that learns to perceive and forecast 4D (spatio-temporal) occupancy fields with self-supervision from LiDAR data. We show that this model can be transferred to a variety of downstream tasks, such as LiDAR forecasting and semantic bird's-eye-view occupancy forecasting. We came first in the CVPR 2024 Argoverse 2 LiDAR 4D Occupancy Challenge.
A cartoon describing certifiable optimization

Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations

Frederike Dümbgen, Connor Holmes, Ben Agro, Timothy D. Barfoot

Pre-print

While working on my undergraduate thesis, I spent a lot of time searching for redundant constraints to tighten an optimization problem such that it was globally optimal. Finding these redundant constraints by hand was time-consuming and tedious. This paper presents an automated method for finding redundant constraints for optimization problems.

Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

Ben Agro*, Quinlan Sykora*, Sergio Casas*, Raquel Urtasun

CVPR 2023 (Highlight)

A new approach to perception and motion forecasting for self-driving vehicles that uses a neural network to implicitly represent future occupancy and flow directly from sensor data.
Picture of a Franka Panda stacking blocks

Learning to Search in Task and Motion Planning with Streams

Mohammed Khodeir*, Ben Agro*, Florian Shkurti

CoRR 2021

Presents a new algorithm for PDDLStream that uses a graph neural network to search for geometrically feasible plans in a "best first" manner.
Diagram of the semantic segmentation pipeline

Self-Supervised Learning of Lidar Segmentation for Autonomous Indoor Navigation

Hugues Thomas, Ben Agro, Mona Gridseth, Jian Zhang, Timothy D. Barfoot

ICRA 2021

A self-supervised learning approach for semantic segmentation of LiDAR over repeated navigation sessions.

Personal Projects

BARFT Thumbnail

BARFT: Bundle Adjusting Neural Radiance Fields with Temporal Regularization

A framework for training NeRFs with unknown (learned) camera poses.
TaS Thumbnail

An explainer of "Transformers As Statisticians"

Zero-Shot Video Retrieval with Vision Language Models

A zero-shot video retrieval system leveraging open-source Vision-Language models.
Stereo Localization Thumbnail

Towards Globally Optimal Stereo Localization (Undergrad Thesis)

This is my undergrad thesis for Engineering Science at UofT under Prof. Tim Barfoot. We investigated how to make the problem of stereo localization (determining the pose of a stereo camera with respect to observed landmarks) globally optimal.
Video of Captor on CT4

Captor (Autonomous Drone)

I built and programmed an autonomous drone with reliable onboard SLAM and vision-only obstacle avoidance. The simulator I built for this project is here.
Video of Geometry Boy

Geometry Boy

I programmed a version of the popular game Geometry Dash that runs on original Game Boy hardware.

Climbing

I love bouldering, and my focus is on sending hard boulders outdoors. My current goal is to send V13 V14. Here is some of my climbing-related media:

News

Blog