Mind Lab Toolkit (MinT)
Cookbook

chat-dpo

Eval-first DPO experiment for chat-quality preference pairs. The benchmark is a held-out pairwise preference set, not a generation benchmark with an external grader.
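Because the benchmark is a pairwise preference set rather than graded generations, the eval reduces to counting how often the trained policy's DPO implicit reward ranks the chosen response above the rejected one. A minimal sketch of that metric; the field names and precomputed sequence log-probs are illustrative assumptions, not the cookbook's actual schema:

```python
def eval_pair_accuracy(pairs):
    """Fraction of held-out preference pairs the policy ranks correctly.

    Each pair carries summed sequence log-probs under the trained policy
    and the frozen reference model (hypothetical field names).
    """
    correct = 0
    for p in pairs:
        # DPO implicit reward is beta * (log pi(y|x) - log pi_ref(y|x));
        # beta > 0 is a common scale factor, so it cancels when comparing.
        chosen_margin = p["chosen_logp"] - p["chosen_ref_logp"]
        rejected_margin = p["rejected_logp"] - p["rejected_ref_logp"]
        correct += chosen_margin > rejected_margin
    return correct / len(pairs)
```

Note that the metric depends only on the difference of margins, so the DPO beta hyperparameter does not affect it.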

At a glance

Algorithm: DPO (pairwise preference)
Base model: Qwen/Qwen3-4B-Instruct-2507
Training data: local data/train/full.jsonl
Benchmark: held-out pairwise preference eval on data/eval/full.jsonl
Primary metric: eval_pair_accuracy
Upstream README: Open in mint-cookbook →
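Both data files are JSONL, with one preference pair per line. The record layout below is a guess at a typical DPO schema (prompt/chosen/rejected), not confirmed by the cookbook:

```python
import json

# One hypothetical preference-pair record; field names are assumptions.
record = {
    "prompt": "Explain what a held-out eval set is.",
    "chosen": "A held-out set is data excluded from training so it can "
              "measure generalization.",
    "rejected": "idk",
}

line = json.dumps(record)   # one line of a file like data/train/full.jsonl
parsed = json.loads(line)
```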

For setup, runnable commands, and the full eval protocol, see the upstream README. The experiment follows the shared cookbook lifecycle: uv sync → --dry-run → --eval-only → train.
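For reference, the train step optimizes the standard DPO objective: a logistic loss on the difference of implicit-reward margins between the chosen and rejected responses. A scalar sketch, assuming precomputed sequence log-probs; the beta default is illustrative:

```python
import math

def dpo_loss(chosen_logp, chosen_ref_logp,
             rejected_logp, rejected_ref_logp, beta=0.1):
    """Standard DPO loss for one preference pair:

    loss = -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))
    """
    margin = (chosen_logp - chosen_ref_logp) - (rejected_logp - rejected_ref_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

A zero margin gives exactly log(2); pairs the policy already ranks correctly fall below that, which is why the eval above tracks the same quantity the loss pushes on.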
