# RL Resume

This page documents `advanced/resume.py` in `mint-quickstart`.
## Why resume is recommended for RL

RL runs can be interrupted by preemption, quota limits, network issues, or manual stops. This demo keeps resume logic at the quickstart layer and recommends resuming from explicit checkpoints instead of restarting from step 0.
## What this demo does

- Runs a minimal GRPO-style RL loop (`sample -> reward -> importance_sampling`).
- Saves full training state every `MINT_CHECKPOINT_EVERY_STEPS`.
- Resumes from an explicit `MINT_RESUME_PATH`.
- Restores optimizer state via `training_client.load_state_with_optimizer(...)`.
- Fails fast if `MINT_RESUME_PATH` cannot be loaded.
- Prints clear logs for `step`, checkpoint path, and resume source.
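The loop shape above can be sketched in plain Python. This is an illustrative, self-contained mock: `sample_group`, `reward`, and the advantage normalization below are assumptions for the sketch, not the quickstart's actual implementation.

```python
def sample_group(prompt: str, group_size: int) -> list[str]:
    # Stand-in for the policy sampler: returns group_size completions per prompt.
    return [f"{prompt}-completion-{i}" for i in range(group_size)]

def reward(completion: str) -> float:
    # Stand-in reward; a real run scores completions with a task-specific function.
    return float(len(completion) % 5)

def grpo_advantages(rewards: list[float]) -> list[float]:
    # GRPO-style baseline: each reward relative to its group's mean.
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

def rl_train_step_sketch(prompt: str, group_size: int = 4) -> dict:
    completions = sample_group(prompt, group_size)   # sample
    rewards = [reward(c) for c in completions]       # reward
    advantages = grpo_advantages(rewards)            # importance-sampled update would weight log-probs by these
    return {"mean_reward": sum(rewards) / len(rewards), "advantages": advantages}
```

By construction the group-relative advantages sum to zero, which is what makes the group mean act as a baseline.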
## Prerequisites

- Python >= 3.11
- `MINT_API_KEY` is set (or available via `.env`)
- `MINT_BASE_URL` is set if you are not using the default endpoint
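A minimal way to enforce the `MINT_API_KEY` prerequisite before doing any work — a sketch for this page, not code from `resume.py`:

```python
import os

def require_env(name: str) -> str:
    # Fail fast if a required variable (e.g. MINT_API_KEY) is missing or empty.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to .env")
    return value
```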
## First run (minimal command)

```bash
export MINT_API_KEY=sk-mint-...
export MINT_BASE_URL=https://mint.macaron.im/
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_TOTAL_STEPS=6
export MINT_CHECKPOINT_EVERY_STEPS=2
python advanced/resume.py
```

Expected log pattern:

```text
[run] model=... total_steps=... checkpoint_every=... resume_source=none start_step=0
[train] step=1/...
[checkpoint] step=2 reason=periodic path=...
...
[done] final_step=... resume_source=none latest_checkpoint=...
```

## Resume run (explicit checkpoint)
- Copy the latest checkpoint path from the `[checkpoint]` or `[done]` logs.
- Start a new process with `MINT_RESUME_PATH` set.
```bash
export MINT_API_KEY=sk-mint-...
export MINT_BASE_URL=https://mint.macaron.im/
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_TOTAL_STEPS=10
export MINT_CHECKPOINT_EVERY_STEPS=2
export MINT_RESUME_PATH="mint://.../rl-resume-periodic-step-000006"
python advanced/resume.py
```

If the checkpoint is valid, `global_step` continues from the inferred step and keeps increasing. If the path is invalid, startup fails immediately.
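The "inferred step" comes from the checkpoint name itself. A hedged sketch of that inference, assuming the `-step-NNNNNN` suffix convention visible in the logs above (the helper name is hypothetical):

```python
import re

def infer_start_step(resume_path: str) -> int:
    # Checkpoint paths in this demo end with "-step-NNNNNN"; parse that suffix.
    match = re.search(r"-step-(\d+)$", resume_path)
    if match is None:
        # Mirrors the fail-fast behavior: bad paths are rejected before training.
        raise ValueError(f"cannot infer step from checkpoint path: {resume_path}")
    return int(match.group(1))
```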
## Environment variables

- `MINT_BASE_URL`: default unset
- `MINT_BASE_MODEL`: default `Qwen/Qwen3-0.6B`
- `MINT_LORA_RANK`: default `16`
- `MINT_RL_LR`: default `5e-5`
- `MINT_GROUP_SIZE`: default `4`
- `MINT_MAX_TOKENS`: default `256`
- `MINT_TEMPERATURE`: default `1.0`
- `MINT_TOTAL_STEPS`: default `100`
- `MINT_CHECKPOINT_EVERY_STEPS`: default `20`
- `MINT_RESUME_PATH`: default unset
- `MINT_UPLOAD_ARCHIVE`: default unset
- `MINT_UPLOAD_ONLY`: default `false`
- `MINT_UPLOAD_TIMEOUT_S`: default `300`
- `MINT_API_KEY`: required
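One way to gather these variables with their documented defaults — a sketch only; the field and function names are hypothetical and the real config object in `resume.py` may differ:

```python
import os
from dataclasses import dataclass

@dataclass
class ResumeDemoConfig:
    # Defaults mirror the list above; MINT_API_KEY has no default.
    api_key: str
    base_model: str = "Qwen/Qwen3-0.6B"
    lora_rank: int = 16
    rl_lr: float = 5e-5
    group_size: int = 4
    max_tokens: int = 256
    temperature: float = 1.0
    total_steps: int = 100
    checkpoint_every_steps: int = 20

def config_from_env() -> ResumeDemoConfig:
    env = os.environ
    return ResumeDemoConfig(
        api_key=env["MINT_API_KEY"],  # KeyError here is the fail-fast path for the required key
        base_model=env.get("MINT_BASE_MODEL", "Qwen/Qwen3-0.6B"),
        lora_rank=int(env.get("MINT_LORA_RANK", "16")),
        rl_lr=float(env.get("MINT_RL_LR", "5e-5")),
        group_size=int(env.get("MINT_GROUP_SIZE", "4")),
        max_tokens=int(env.get("MINT_MAX_TOKENS", "256")),
        temperature=float(env.get("MINT_TEMPERATURE", "1.0")),
        total_steps=int(env.get("MINT_TOTAL_STEPS", "100")),
        checkpoint_every_steps=int(env.get("MINT_CHECKPOINT_EVERY_STEPS", "20")),
    )
```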
## Common failure cases

- `MINT_RESUME_PATH` points to a missing/invalid checkpoint: the script raises a fail-fast error before any training steps run.
- Model unavailable or permission denied: set `MINT_BASE_MODEL` to a model available to your account.
- Checkpoint cadence too sparse: decrease `MINT_CHECKPOINT_EVERY_STEPS` so interruption recovery loses fewer steps.
- Interrupted before a periodic save: use the latest `[checkpoint]` path or rerun from the previous valid checkpoint.
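The cadence tradeoff can be made concrete with a tiny helper (a sketch for this page, not part of `resume.py`): after an interruption, progress past the last periodic save is lost and must be re-run.

```python
def last_periodic_checkpoint(interrupted_at_step: int, checkpoint_every: int) -> int:
    # Most recent periodic save at or before the interrupted step;
    # (interrupted_at_step - result) steps of work are lost on resume.
    return (interrupted_at_step // checkpoint_every) * checkpoint_every
```

For example, interrupting at step 9 with `MINT_CHECKPOINT_EVERY_STEPS=2` resumes from step 8, while the default cadence of 20 would resume from step 0.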
## Key resume logic

`load_resume_state(...)` uses `training_client.load_state_with_optimizer(...)` so the checkpoint restores both weights and optimizer state.

```python
if resume_path:
    start_step = load_resume_state(training_client, resume_path)

for step in range(start_step + 1, cfg.total_steps + 1):
    stats = rl_train_step(training_client, cfg, step)
    if step % cfg.checkpoint_every == 0:
        latest_checkpoint = save_training_state(training_client, step, "periodic")
```

## Validation targets
Validation matrix targets:

- `Qwen/Qwen3-0.6B`
- `Qwen/Qwen3-235B-A22B-Instruct-2507`
Latest execution status (including upstream blockers) is tracked in `mint-quickstart-alpha/docs/rl_resume_test_record.md`.