The Perception team at Zoox is responsible for the robot’s understanding of the world, fusing data from Lidar, Radar, and Cameras to create a unified representation of the environment. As part of this team, you will contribute to the development of our next-generation 3D occupancy and segmentation networks. You will architect and optimize high-performance deep learning models that generate dense, temporally consistent voxel representations of the driving environment. This work is critical for enabling our vehicle to navigate complex urban scenarios, handle rare obstacles, and drive safely in tight spaces by providing precise geometry and motion estimates to downstream planners.
In this role, you will:
Design and implement state-of-the-art multi-modal sensor fusion architectures (Lidar, Camera, Radar) to predict 3D occupancy, semantic segmentation, and flow.
Develop "vision-first" fusion strategies to enhance geometric understanding and reduce dependency on sparse sensor modalities.
Engineer temporal processing modules to improve the stability and consistency of predictions over time.
Optimize model architectures for real-time on-vehicle inference, balancing high-fidelity range extension with strict latency constraints.
Collaborate with downstream consumers (Tracking, Prediction, Planner) to refine geometric outputs, such as contours and free-space estimations, for complex maneuvering.
Qualifications
MS or PhD in Computer Science, Robotics, Machine Learning, or related field with 6+ years of industry experience.
Deep expertise in 3D Computer Vision and Deep Learning, specifically with voxel-based or BEV (Bird's Eye View) architectures.
Strong proficiency in Python and deep learning frameworks (PyTorch) for model design and training, plus some experience with C++ for model integration.
Experience with multi-sensor fusion (Lidar, Camera, Radar) and handling temporal data sequences.
Experience with occupancy networks, implicit representations (NeRF/Gaussian Splats), or scene flow estimation.
Bonus Qualifications
Experience optimizing models for TensorRT/CUDA to achieve low-latency inference.
Familiarity with sparse convolutions or query-based architectures for efficient 3D processing.
Experience with Vision Language Models or multi-modal 3D foundation models.
Additional Information
About Zoox
Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.
If you need an accommodation to participate in the application or interview process, please reach out to [email protected] or your assigned recruiter.
A Final Note:
You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.