RL Resume

This page documents advanced/resume.py in mint-quickstart.

RL runs can be interrupted by preemption, quota limits, network issues, or manual stops. This demo keeps resume logic at the quickstart layer and recommends resuming from explicit checkpoints instead of restarting from step 0.

What this demo does

  • Runs a minimal GRPO-style RL loop (sample -> reward -> importance_sampling).
  • Saves full training state every MINT_CHECKPOINT_EVERY_STEPS.
  • Resumes from explicit MINT_RESUME_PATH.
  • Restores optimizer state via training_client.load_state_with_optimizer(...).
  • Fails fast if MINT_RESUME_PATH cannot be loaded.
  • Prints clear logs for step, checkpoint path, and resume source.
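The GRPO-style loop in the first bullet can be sketched on toy data. Everything below (`grpo_advantages`, `importance_weight`, `rl_train_step`, the dummy log-probs) is illustrative, not the script's actual API:

```python
# Minimal sketch of one GRPO-style step on toy data; all names are
# illustrative, not the actual mint-quickstart API.
import math
import random

def grpo_advantages(rewards):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + 1e-8) for r in rewards]

def importance_weight(logp_new, logp_old):
    """Per-sequence importance-sampling ratio pi_new / pi_old."""
    return math.exp(logp_new - logp_old)

def rl_train_step(group_size=4):
    # sample: pretend we drew `group_size` completions and scored them
    rewards = [random.random() for _ in range(group_size)]
    advs = grpo_advantages(rewards)
    # importance_sampling: weight each advantage by the policy ratio
    # (dummy log-probs stand in for real policy evaluations)
    ratios = [importance_weight(-1.0, -1.1) for _ in range(group_size)]
    loss = -sum(w * a for w, a in zip(ratios, advs)) / group_size
    return {"loss": loss, "mean_reward": sum(rewards) / group_size}
```

The group normalization is what makes this "GRPO-style": advantages are computed relative to the group's own mean reward, so no separate value model is needed.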

Prerequisites

  • Python >= 3.11
  • MINT_API_KEY is set (or available via .env)
  • MINT_BASE_URL is set if you are not using the default endpoint

First run (minimal command)

export MINT_API_KEY=sk-mint-...
export MINT_BASE_URL=https://mint.macaron.im/
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_TOTAL_STEPS=6
export MINT_CHECKPOINT_EVERY_STEPS=2
python advanced/resume.py

Expected log pattern:

[run] model=... total_steps=... checkpoint_every=... resume_source=none start_step=0
[train] step=1/...
[checkpoint] step=2 reason=periodic path=...
...
[done] final_step=... resume_source=none latest_checkpoint=...
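Since the resume workflow copies a path out of these logs, a small scanner can automate that step. This helper is a sketch against the `path=` field shown in the pattern above, not part of the demo script:

```python
def latest_checkpoint_from_logs(lines):
    """Scan log lines for '[checkpoint] ... path=...' and return the
    last checkpoint path seen, or None if no checkpoint was logged."""
    path = None
    for line in lines:
        if line.startswith("[checkpoint]"):
            for field in line.split():
                if field.startswith("path="):
                    path = field[len("path="):]
    return path
```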

Resume run (explicit checkpoint)

  1. Copy the latest checkpoint path from [checkpoint] or [done] logs.
  2. Start a new process with MINT_RESUME_PATH set:

export MINT_API_KEY=sk-mint-...
export MINT_BASE_URL=https://mint.macaron.im/
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_TOTAL_STEPS=10
export MINT_CHECKPOINT_EVERY_STEPS=2
export MINT_RESUME_PATH="mint://.../rl-resume-periodic-step-000006"
python advanced/resume.py

If the checkpoint is valid, global_step continues from the inferred step and keeps increasing. If the path is invalid, startup fails immediately.

Environment variables

  • MINT_BASE_URL: default unset
  • MINT_BASE_MODEL: default Qwen/Qwen3-0.6B
  • MINT_LORA_RANK: default 16
  • MINT_RL_LR: default 5e-5
  • MINT_GROUP_SIZE: default 4
  • MINT_MAX_TOKENS: default 256
  • MINT_TEMPERATURE: default 1.0
  • MINT_TOTAL_STEPS: default 100
  • MINT_CHECKPOINT_EVERY_STEPS: default 20
  • MINT_RESUME_PATH: default unset
  • MINT_UPLOAD_ARCHIVE: default unset
  • MINT_UPLOAD_ONLY: default false
  • MINT_UPLOAD_TIMEOUT_S: default 300
  • MINT_API_KEY: required

Common failure cases

  • MINT_RESUME_PATH points to a missing/invalid checkpoint: script raises a fail-fast error before training steps.
  • Model unavailable or permission denied: set MINT_BASE_MODEL to an available model for your account.
  • Checkpoint cadence too sparse: decrease MINT_CHECKPOINT_EVERY_STEPS so interruption recovery loses fewer steps.
  • Interrupted before periodic save: use the latest [checkpoint] path or rerun from the previous valid checkpoint.
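The cadence trade-off in the third bullet can be quantified: with periodic saves every `checkpoint_every` steps, an interruption at step `s` loses at most `s % checkpoint_every` completed steps (worst case `checkpoint_every - 1`). A purely illustrative helper:

```python
def steps_lost_on_interrupt(interrupted_at: int, checkpoint_every: int) -> int:
    """Completed steps lost if the run dies at `interrupted_at`, assuming
    a checkpoint was saved at every multiple of `checkpoint_every`."""
    return interrupted_at % checkpoint_every
```

For example, with MINT_CHECKPOINT_EVERY_STEPS=20 a crash at step 39 loses 19 steps of work; halving the cadence halves the worst case at the cost of more frequent saves.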

Key resume logic

load_resume_state(...) uses training_client.load_state_with_optimizer(...) so the checkpoint restores both weights and optimizer state.

start_step = 0
if resume_path:
    # Restores weights + optimizer state; returns the step to resume from.
    start_step = load_resume_state(training_client, resume_path)

for step in range(start_step + 1, cfg.total_steps + 1):
    stats = rl_train_step(training_client, cfg, step)
    if step % cfg.checkpoint_every == 0:
        latest_checkpoint = save_training_state(training_client, step, "periodic")
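A minimal sketch of what `load_resume_state` could look like. It assumes the resume step can be parsed from the checkpoint path suffix (as in the `...-step-000006` example above); that parsing is illustrative, and the real script may track the step differently:

```python
import re

def load_resume_state(training_client, resume_path: str) -> int:
    """Restore weights + optimizer state, then return the step to resume
    from. Any load error propagates, so an invalid path fails fast
    before the first training step."""
    training_client.load_state_with_optimizer(resume_path)
    match = re.search(r"step-(\d+)$", resume_path)
    if match is None:
        raise ValueError(f"cannot infer resume step from {resume_path!r}")
    return int(match.group(1))
```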

Validation targets

The validation matrix covers these base models:

  • Qwen/Qwen3-0.6B
  • Qwen/Qwen3-235B-A22B-Instruct-2507

Latest execution status (including upstream blockers) is tracked in: mint-quickstart-alpha/docs/rl_resume_test_record.md.