# dapo-aime
A self-contained MinT experiment that trains with direct GRPO on a local materialization of BytedTsinghua-SIA/DAPO-Math-17k and evaluates on AIME 2024 as the fixed benchmark. AIME 2025 and AIME 2026 manifests ship under the same row contract for auxiliary evaluation. There is no SFT warm-start.
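The "row contract" means the train and eval manifests share one JSONL schema, so the same loader can validate both. A minimal validator sketch, with illustrative field names (`prompt`, `answer`) standing in for the actual schema, which this README does not specify:

```python
import io
import json

# Hypothetical row contract: each JSONL line is one problem with its
# reference answer. Field names are illustrative, not the real schema.
REQUIRED_FIELDS = {"prompt", "answer"}

def load_manifest(fp) -> list[dict]:
    """Parse a JSONL manifest, rejecting rows that break the contract."""
    rows = []
    for lineno, line in enumerate(fp, 1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = REQUIRED_FIELDS - row.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        rows.append(row)
    return rows

# Usage: the same check applies to train and eval files alike.
rows = load_manifest(io.StringIO('{"prompt": "2+2?", "answer": "4"}\n'))
```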
## At a glance
| Field | Value |
| --- | --- |
| Algorithm | direct GRPO (no SFT warm-start) |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Training data | local materialization of BytedTsinghua-SIA/DAPO-Math-17k (`data/train/full.jsonl`) |
| Benchmark | AIME 2024 (`data/eval/aime2024.jsonl`); auxiliary: AIME 2025 / 2026 |
| Primary metrics | `eval_accuracy`, `eval_greedy_accuracy`, `eval_pass_at_k` |
| Upstream README | Open in mint-cookbook → |
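For reference, pass@k is conventionally reported with the unbiased estimator 1 − C(n−c, k)/C(n, k) over n samples of which c are correct; whether `eval_pass_at_k` uses exactly this estimator is an assumption, not confirmed by this README. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct.

    Computes 1 - C(n-c, k) / C(n, k), the probability that at least one
    of k samples drawn without replacement is correct.
    """
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# Usage: 10 samples, 2 correct, k=1 reduces to plain accuracy (0.2).
estimate = pass_at_k(n=10, c=2, k=1)
```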
For setup, runnable commands, and the full eval protocol, see the upstream README. The experiment follows the shared cookbook lifecycle: `uv sync` → `--dry-run` → `--eval-only` → train.
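In a fresh checkout, that lifecycle might look like the following; the entrypoint name `train.py` is an assumption standing in for the real command (see the upstream README for the actual invocations):

```shell
# 1. Install the locked environment.
uv sync

# 2. Smoke-test config and data plumbing without training
#    (entrypoint name is illustrative, not the actual script).
uv run python train.py --dry-run

# 3. Score the base model on AIME 2024 for a pre-training baseline.
uv run python train.py --eval-only

# 4. Launch the full direct-GRPO run.
uv run python train.py
```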