Compliant Residual DAgger

Improving Real-World Contact-Rich Manipulation with Human Corrections

Xiaomeng Xu*    Yifan Hou*    Zeyi Liu    Shuran Song

Stanford University

Paper | Code (Coming Soon)


We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks.
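To make the residual formulation concrete, the sketch below illustrates one way the pieces could fit together: the base policy is kept frozen, and a small residual network, trained only on the correction data, predicts a delta action from the same observation plus force feedback; the sum is sent to a compliant controller. This is a minimal illustration with hypothetical placeholder names (read_wrench, base_policy, residual_policy), not the released implementation.

import numpy as np

def read_wrench():
    """Hypothetical 6-D force/torque reading at the wrist (placeholder)."""
    return np.zeros(6)

def base_policy(obs):
    """Frozen base policy: observation -> Cartesian motion target (placeholder)."""
    return np.zeros(6)

def residual_policy(obs, wrench):
    """Small residual head trained only on human corrections; unlike the base
    policy, it also receives force feedback."""
    return np.zeros(6)

def act(obs):
    wrench = read_wrench()
    a_base = base_policy(obs)               # base action is left untouched
    a_delta = residual_policy(obs, wrench)  # learned delta-action correction
    return a_base + a_delta                 # combined target for the compliant controller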


System Overview

To improve a robot manipulation policy, we propose a Compliant Intervention Interface (a) for collecting human correction data, use that data to update a Compliant Residual Policy (b), and study the effects of both by deploying the updated policy on two real-world contact-rich manipulation tasks (c).


Compliant Intervention Interface

[Videos: demonstrations of the Compliant Intervention Interface]
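A rough sketch of how such an interface could record corrections without pausing the policy: the robot streams the base policy's targets through a compliant controller, a person gently pushes the tool when needed, and the offset between the commanded and the measured pose is logged as a delta-action label. The helper callables below (send_compliant_command, get_measured_pose, etc.) are hypothetical placeholders, not the released code.

import numpy as np

def collect_corrections(get_obs, base_policy, send_compliant_command,
                        get_measured_pose, horizon=200, threshold=1e-3):
    """Log (observation, delta-action) pairs while the base policy keeps running.

    All callables are hypothetical placeholders: send_compliant_command streams
    the policy's target to a compliant controller, so a person can gently push
    the tool away from that target without interrupting execution.
    """
    dataset = []
    for _ in range(horizon):
        obs = get_obs()
        a_cmd = np.asarray(base_policy(obs))        # policy execution is never paused
        send_compliant_command(a_cmd)
        measured = np.asarray(get_measured_pose())  # where the human steered the tool
        delta = measured - a_cmd                    # correction relative to the command
        if np.linalg.norm(delta) > threshold:
            dataset.append((obs, delta))            # keep only genuine interventions
    return dataset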


Findings & Results

CR-DAgger Results Preview

Finding 1: The Compliant Residual Policy improves the base policy by a large margin

Base Policy: Incomplete flipping

Base + Compliant Residual Policy (ours)

Base Policy: Missed insertion

Base + Compliant Residual Policy (ours)

Base Policy: Stuck on base

Base Policy: Missed the slot

Base + Compliant Residual Policy (ours)

Finding 2: The residual formulation allows an additional useful modality (force feedback) to be incorporated during correction

Residual w/o force: Incomplete flipping

Compliant Residual Policy (ours)

Residual w/o force: Missed the slot

Compliant Residual Policy (ours)

Finding 3: Smooth On-Policy Delta data makes training more stable

Trained with Take-Over correction: Insert too high

Trained with On-Policy Delta correction (ours)

Trained with Take-Over correction: Missed the slot

Trained with On-Policy Delta correction (ours)
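The contrast behind this finding can be sketched in a few lines (hypothetical helpers, not the released code): a Take-Over correction replaces the policy's action outright, so the logged targets can jump discontinuously, whereas an On-Policy Delta correction records only the smooth offset the human applies on top of the still-executing policy action.

import numpy as np

def log_take_over(obs, a_policy, a_human, dataset):
    # Take-Over: the human action replaces the policy output entirely, so the
    # logged target can jump far from what the base policy would have done.
    dataset.append((obs, np.asarray(a_human)))

def log_on_policy_delta(obs, a_policy, a_measured, dataset):
    # On-Policy Delta: the policy action keeps executing; only the smooth offset
    # the human adds on top of it is logged as the training label.
    dataset.append((obs, np.asarray(a_measured) - np.asarray(a_policy)))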

Finding 4: Retraining the base policy is stable but learns the correction behavior slowly

Retrain with correction: Incomplete flipping

Compliant Residual Policy (ours)

Retrain with correction: Stuck on base

Compliant Residual Policy (ours)

Finding 5: Finetuning the base policy is unstable

Finetune with correction: Unstable motion

Compliant Residual Policy (ours)

Finetune with correction: Unstable motion

Compliant Residual Policy (ours)

Citation

	
@misc{xu2025compliantresidualdaggerimproving,
  title={Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections},
  author={Xiaomeng Xu and Yifan Hou and Zeyi Liu and Shuran Song},
  year={2025},
  eprint={2506.16685},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2506.16685},
}

Contact

If you have any questions, please feel free to contact Xiaomeng Xu and Yifan Hou.

Acknowledgement

We would like to thank Eric Cousineau, Huy Ha, and Benjamin Burchfiel for thoughtful discussions on the proposed method, and Mandi Zhao, Maximillian Du, Mengda Xu, and all REALab members for their suggestions on the experiment setup and the manuscript. This work was supported in part by NSF Awards #2143601, #2037101, and #2132519, the Sloan Fellowship, and the Toyota Research Institute. We thank Google and TRI for the UR5 robot hardware. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.