Compliant Residual DAgger

Improving Real-World Contact-Rich Manipulation with Human Corrections

Xiaomeng Xu*    Yifan Hou*    Zeyi Liu    Shuran Song

Stanford University

Paper | Code (Coming Soon)


We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks.
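To make the residual formulation concrete, the sketch below illustrates one way the pieces could fit together: the base policy is kept frozen, and a small residual network, trained only on the correction data, predicts a delta action from the same observation plus force feedback; the sum is sent to a compliant controller. This is a minimal illustration with hypothetical placeholder names (read_wrench, base_policy, residual_policy), not the released implementation.

import numpy as np

def read_wrench():
    """Hypothetical 6-D force/torque reading at the wrist (placeholder)."""
    return np.zeros(6)

def base_policy(obs):
    """Frozen base policy: observation -> Cartesian motion target (placeholder)."""
    return np.zeros(6)

def residual_policy(obs, wrench):
    """Small residual head trained only on human corrections; unlike the base
    policy, it also receives force feedback."""
    return np.zeros(6)

def act(obs):
    wrench = read_wrench()
    a_base = base_policy(obs)               # base action is left untouched
    a_delta = residual_policy(obs, wrench)  # learned delta-action correction
    return a_base + a_delta                 # combined target for the compliant controller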


System Overview

To improve a robot manipulation policy, we propose a Compliant Intervention Interface (a) for collecting human correction data, use that data to update a Compliant Residual Policy (b), and study the effects of both by deploying the updated policy on two real-world contact-rich manipulation tasks (c).


Compliant Intervention Interface

[Videos: demonstrations of the Compliant Intervention Interface]
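A rough sketch of how such an interface could record corrections without pausing the policy: the robot streams the base policy's targets through a compliant controller, a person gently pushes the tool when needed, and the offset between the commanded and the measured pose is logged as a delta-action label. The helper callables below (send_compliant_command, get_measured_pose, etc.) are hypothetical placeholders, not the released code.

import numpy as np

def collect_corrections(get_obs, base_policy, send_compliant_command,
                        get_measured_pose, horizon=200, threshold=1e-3):
    """Log (observation, delta-action) pairs while the base policy keeps running.

    All callables are hypothetical placeholders: send_compliant_command streams
    the policy's target to a compliant controller, so a person can gently push
    the tool away from that target without interrupting execution.
    """
    dataset = []
    for _ in range(horizon):
        obs = get_obs()
        a_cmd = np.asarray(base_policy(obs))        # policy execution is never paused
        send_compliant_command(a_cmd)
        measured = np.asarray(get_measured_pose())  # where the human steered the tool
        delta = measured - a_cmd                    # correction relative to the command
        if np.linalg.norm(delta) > threshold:
            dataset.append((obs, delta))            # keep only genuine interventions
    return dataset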


Findings & Results

CR-DAgger Results Preview

Finding 1: The Compliant Residual Policy improves the base policy by a large margin

Base Policy: Incomplete flipping

Base + Compliant Residual Policy (ours)

Base Policy: Missed insertion

Base + Compliant Residual Policy (ours)

Base Policy: Stuck on base

Base Policy: Missed the slot

Base + Compliant Residual Policy (ours)

Finding 2: The residual formulation allows an additional useful modality (force feedback) to be incorporated during correction

Residual w/o force: Incomplete flipping

Compliant Residual Policy (ours)

Residual w/o force: Missed the slot

Compliant Residual Policy (ours)

Finding 3: Smooth On-Policy Delta data makes training more stable

Trained with Take-Over correction: Insert too high

Trained with On-Policy Delta correction (ours)

Trained with Take-Over correction: Missed the slot

Trained with On-Policy Delta correction (ours)
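The contrast behind this finding can be sketched in a few lines (hypothetical helpers, not the released code): a Take-Over correction replaces the policy's action outright, so the logged targets can jump discontinuously, whereas an On-Policy Delta correction records only the smooth offset the human applies on top of the still-executing policy action.

import numpy as np

def log_take_over(obs, a_policy, a_human, dataset):
    # Take-Over: the human action replaces the policy output entirely, so the
    # logged target can jump far from what the base policy would have done.
    dataset.append((obs, np.asarray(a_human)))

def log_on_policy_delta(obs, a_policy, a_measured, dataset):
    # On-Policy Delta: the policy action keeps executing; only the smooth offset
    # the human adds on top of it is logged as the training label.
    dataset.append((obs, np.asarray(a_measured) - np.asarray(a_policy)))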

Finding 4: Retraining the base policy is stable but learns the correction behavior slowly

Retrain with correction: Incomplete flipping

Compliant Residual Policy (ours)

Retrain with correction: Stuck on base

Compliant Residual Policy (ours)

Finding 5: Finetuning the base policy is unstable

Finetune with correction: Unstable motion

Compliant Residual Policy (ours)

Finetune with correction: Unstable motion

Compliant Residual Policy (ours)

Citation

	
@misc{xu2025compliantresidualdaggerimproving,
  title={Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections},
  author={Xiaomeng Xu and Yifan Hou and Zeyi Liu and Shuran Song},
  year={2025},
  eprint={2506.16685},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.16685},
}

Contact

If you have any questions, please feel free to contact Xiaomeng Xu and Yifan Hou.

Acknowledgement

We would like to thank Eric Cousineau, Huy Ha, and Benjamin Burchfiel for thoughtful discussions on the proposed method, and Mandi Zhao, Maximillian Du, Mengda Xu, and all REALab members for their suggestions on the experiment setup and the manuscript. This work was supported in part by NSF Awards #2143601, #2037101, and #2132519, the Sloan Fellowship, and the Toyota Research Institute. We thank Google and TRI for the UR5 robot hardware. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.