# dapo-aime
A self-contained MinT experiment that trains with direct GRPO on a local materialization of BytedTsinghua-SIA/DAPO-Math-17k and evaluates on AIME 2024 as the fixed benchmark. AIME 2025 and AIME 2026 manifests ship under the same row contract for auxiliary evaluation. There is no SFT warm-start.
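The "row contract" means the train and eval manifests share one JSONL schema, so the same loader can validate both. A minimal validator sketch, with illustrative field names (`prompt`, `answer`) standing in for the actual schema, which this README does not specify:

```python
import io
import json

# Hypothetical row contract: each JSONL line is one problem with its
# reference answer. Field names are illustrative, not the real schema.
REQUIRED_FIELDS = {"prompt", "answer"}

def load_manifest(fp) -> list[dict]:
    """Parse a JSONL manifest, rejecting rows that break the contract."""
    rows = []
    for lineno, line in enumerate(fp, 1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = REQUIRED_FIELDS - row.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        rows.append(row)
    return rows

# Usage: the same check applies to train and eval files alike.
rows = load_manifest(io.StringIO('{"prompt": "2+2?", "answer": "4"}\n'))
```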
## At a glance
| Field | Value |
| --- | --- |
| Algorithm | direct GRPO (no SFT warm-start) |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Training data | local materialization of BytedTsinghua-SIA/DAPO-Math-17k (`data/train/full.jsonl`) |
| Benchmark | AIME 2024 (`data/eval/aime2024.jsonl`); auxiliary: AIME 2025 / 2026 |
| Primary metrics | `eval_accuracy`, `eval_greedy_accuracy`, `eval_pass_at_k` |
| Upstream README | Open in mint-cookbook → |
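For reference, pass@k is conventionally reported with the unbiased estimator 1 − C(n−c, k)/C(n, k) over n samples of which c are correct; whether `eval_pass_at_k` uses exactly this estimator is an assumption, not confirmed by this README. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct.

    Computes 1 - C(n-c, k) / C(n, k), the probability that at least one
    of k samples drawn without replacement is correct.
    """
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# Usage: 10 samples, 2 correct, k=1 reduces to plain accuracy (0.2).
estimate = pass_at_k(n=10, c=2, k=1)
```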
For setup, runnable commands, and the full eval protocol, see the upstream README. The experiment follows the shared cookbook lifecycle: `uv sync` → `--dry-run` → `--eval-only` → train.
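In a fresh checkout, that lifecycle might look like the following; the entrypoint name `train.py` is an assumption standing in for the real command (see the upstream README for the actual invocations):

```shell
# 1. Install the locked environment.
uv sync

# 2. Smoke-test config and data plumbing without training
#    (entrypoint name is illustrative, not the actual script).
uv run python train.py --dry-run

# 3. Score the base model on AIME 2024 for a pre-training baseline.
uv run python train.py --eval-only

# 4. Launch the full direct-GRPO run.
uv run python train.py
```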