cuNRTO: GPU-Accelerated Nonlinear Robust Trajectory Optimization

[Figure: cuNRTO on a 7-DoF Franka manipulator with the GPU pipeline]

Robust Planning at GPU Speed

cuNRTO offloads the expensive inner-loop SOCP solves of nonlinear robust trajectory optimization onto the GPU, significantly accelerating robust planning for high-dimensional systems under bounded uncertainty.

Max speedup: 139.6×
Wall-clock speedup (Franka 7-DoF): 25.9×
Constraint satisfaction: 100%

Abstract

Robust trajectory optimization enables autonomous systems to operate safely under uncertainty by computing control policies that satisfy the constraints for all bounded disturbances. However, these problems often lead to large Second-Order Cone Programming (SOCP) subproblems, which are computationally expensive to solve. In this work, we propose the CUDA Nonlinear Robust Trajectory Optimization (cuNRTO) framework, which introduces two dynamic optimization architectures that apply directly to robust decision-making and are implemented in CUDA. The first architecture, NRTO-DR, leverages the Douglas-Rachford (DR) splitting method to solve the SOCP inner subproblems of NRTO, significantly reducing the computational burden through parallel SOC projections and sparse direct solves. The second architecture, NRTO-FullADMM, is a novel variant that further exploits the problem structure to improve scalability using the Alternating Direction Method of Multipliers (ADMM). Finally, we provide a GPU implementation of the proposed methods using custom CUDA kernels for the SOC projection steps and cuBLAS GEMM chains for the feedback gain updates. We validate the performance of cuNRTO through simulated experiments on unicycle, quadcopter, and Franka manipulator models, demonstrating speedups of up to 139.6×.

Method

cuNRTO addresses the computational bottleneck of Nonlinear Robust Trajectory Optimization by offloading the expensive inner-loop SOCP solves onto the GPU. The outer successive convexification loop runs on the CPU, while the inner-loop solver executes entirely on the GPU. We propose two architectures: NRTO-DR applies Douglas-Rachford splitting to decompose SOCP subproblems into parallel SOC projections and sparse affine-set projections. NRTO-FullADMM reformulates the entire inner loop using ADMM, moving all update blocks onto the GPU and achieving 86.5% GPU utilization (vs. 34.9% for NRTO-DR).
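To make the splitting idea concrete, here is a minimal CPU sketch in plain Python. It is not the cuNRTO implementation. It assumes a toy feasibility problem (find a point that lies in a product of two second-order cones and on a single hyperplane) and alternates the two kinds of projection named above. The function names, the block layout, and the one-row hyperplane standing in for the sparse affine-set projection are all hypothetical choices made for illustration.

```python
import math

def project_soc(z):
    """Euclidean projection of z = (t, x_1, ..., x_n) onto {||x||_2 <= t}."""
    t, x = z[0], z[1:]
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= t:                      # already inside the cone
        return list(z)
    if norm <= -t:                     # in the polar cone: nearest point is the origin
        return [0.0] * len(z)
    alpha = 0.5 * (t + norm)           # otherwise land on the cone boundary
    return [alpha] + [alpha / norm * v for v in x]

def project_cones(z, blocks):
    """Project each (start, end) block of z onto its own cone.
    Blocks do not interact, so each could be handled by a separate GPU thread."""
    out = list(z)
    for start, end in blocks:
        out[start:end] = project_soc(z[start:end])
    return out

def project_hyperplane(z, a, b):
    """Project z onto {z : a.z = b} (toy stand-in for a sparse affine solve)."""
    step = (sum(ai * zi for ai, zi in zip(a, z)) - b) / sum(ai * ai for ai in a)
    return [zi - step * ai for ai, zi in zip(a, z)]

def douglas_rachford(s, blocks, a, b, iters=200):
    """Douglas-Rachford iteration for a point in both the cones and the hyperplane."""
    for _ in range(iters):
        x = project_cones(s, blocks)                           # cone step
        reflected = [2 * xi - si for xi, si in zip(x, s)]
        y = project_hyperplane(reflected, a, b)                # affine step
        s = [si + yi - xi for si, yi, xi in zip(s, y, x)]      # governing update
    return project_cones(s, blocks)

# Assumed toy data: two 3-dimensional cones and one coupling hyperplane.
blocks = [(0, 3), (3, 6)]
a, b = [1.0, 1.0, 0.0, 1.0, 0.0, 0.0], 2.0
z = douglas_rachford([0.0, 2.0, 2.0, 0.0, 0.0, 1.0], blocks, a, b)
residual = sum(ai * zi for ai, zi in zip(a, z)) - b
print("hyperplane residual:", residual)
```

The cone step touches each block independently, which is the structure a GPU kernel can exploit. The sketch handles feasibility only; an actual SOCP subproblem also carries an objective, and its affine step needs a sparse linear solve rather than a single-row formula.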

The cuNRTO pipeline. The outer successive linearization loop runs on the Host (CPU), while the inner ADMM loop runs entirely on the Device (GPU) with parallel SOC projections.

Results

Franka Manipulator (7-DoF)

End-effector trajectory with obstacle avoidance validated in Isaac Sim. NRTO-FullADMM achieves 25.9× wall-clock speedup with 100% constraint satisfaction.

Unicycle & Quadcopter

Panels: Unicycle (NRTO-DR), Unicycle (NRTO-FullADMM), Quadcopter (NRTO-DR), Quadcopter (NRTO-FullADMM).

Real-World Robotarium Experiment

Hardware validation on the Georgia Tech Robotarium, comparing two runs (with NRTO feedback and nominal only): the NRTO disturbance-feedback policy successfully navigates around obstacles, while the nominal controller collides.

Citation

@inproceedings{wang2026cunrto,
  title     = {cuNRTO: GPU-Accelerated Nonlinear Robust
               Trajectory Optimization},
  author    = {Wang, Jiawei and Abdul, Arshiya Taj
               and Theodorou, Evangelos A.},
  booktitle = {Robotics: Science and Systems (RSS)},
  year      = {2026}
}