Beyond DPO: OTPO Refines LLM Alignment via Optimal Transport
Researchers have introduced Optimal Transport Preference Optimization (OTPO), a method that improves large language model (LLM) alignment by assigning non-uniform weights to preference pairs. Unlike standard Direct Preference Optimization (DPO), which treats every pair equally, OTPO uses optimal transport theory to reduce the influence of noisy or low-quality data during fine-tuning.
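To make the idea of non-uniform weighting concrete, here is a minimal sketch of a weighted DPO-style objective in plain Python. It is an illustration under stated assumptions, not the OTPO authors' implementation: the function names `dpo_pair_loss` and `weighted_dpo_loss` are hypothetical, the per-pair weights are taken as given inputs (this summary does not describe how OTPO derives them from optimal transport), and `beta=0.1` is an arbitrary illustrative value.

```python
import math


def dpo_pair_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    # Implicit reward margin between the chosen and rejected responses.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def weighted_dpo_loss(pairs, weights, beta=0.1):
    """Weighted average of per-pair DPO losses.

    `weights` are assumed to come from some external scheme; with all
    weights equal this reduces to the usual uniform DPO average.
    """
    if len(pairs) != len(weights):
        raise ValueError("pairs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    losses = [dpo_pair_loss(*pair, beta=beta) for pair in pairs]
    return sum(w * l for w, l in zip(weights, losses)) / total


# Hypothetical log-probabilities: (policy_chosen, policy_rejected, ref_chosen, ref_rejected)
clean_pair = (-1.0, -3.0, -2.0, -2.0)   # policy already prefers the chosen response
noisy_pair = (-3.0, -1.0, -2.0, -2.0)   # label disagrees with the policy, possibly mislabeled

uniform = weighted_dpo_loss([clean_pair, noisy_pair], [1.0, 1.0])
downweighted = weighted_dpo_loss([clean_pair, noisy_pair], [1.0, 0.1])
```

In this toy setup, shrinking the weight on the suspect pair lowers its contribution to the total loss, which is the general effect the summary attributes to OTPO's weighting.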