Research/May 17, 2026

Beyond DPO: OTPO Refines LLM Alignment via Optimal Transport

Researchers have introduced Optimal Transport Preference Optimization (OTPO), a method that improves Large Language Model alignment by assigning non-uniform weights to preference pairs. Unlike standard Direct Preference Optimization (DPO), which weights every pair equally, OTPO uses optimal transport theory to reduce the impact of noisy or low-quality data during fine-tuning.
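
To make the contrast with DPO concrete, here is a minimal sketch of a per-pair weighted DPO loss in plain Python. It is an illustration under stated assumptions, not the OTPO implementation: the summary above does not say how the optimal transport weights are computed, so the `weights` argument is a hypothetical placeholder for whatever weights such a scheme would produce. The function names and the default `beta=0.1` are likewise assumptions. With uniform weights the function reduces to the ordinary DPO average.

```python
import math


def dpo_pair_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference model.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def weighted_dpo_loss(pairs, weights=None, beta=0.1):
    """Weighted average of per-pair DPO losses.

    `weights` is a hypothetical stand-in for the non-uniform weights a
    scheme such as OTPO would supply; None means uniform weights (plain DPO).
    """
    losses = [dpo_pair_loss(*pair, beta=beta) for pair in pairs]
    if weights is None:
        weights = [1.0] * len(losses)
    if len(weights) != len(losses):
        raise ValueError("need exactly one weight per preference pair")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * loss for w, loss in zip(weights, losses)) / total


# Two toy pairs: (policy_chosen, policy_rejected, ref_chosen, ref_rejected)
pairs = [(-10.0, -12.0, -11.0, -11.0), (-9.0, -8.0, -8.5, -8.5)]
uniform = weighted_dpo_loss(pairs)                         # plain DPO average
downweighted = weighted_dpo_loss(pairs, weights=[1.0, 0.2])  # second pair treated as noisy
```

Lowering the weight of a pair suspected to be noisy shrinks its share of the loss, and therefore of the gradient, which is the general effect the summary attributes to non-uniform weighting.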

Original source: Towards Data Science