RL Resume

This page documents advanced/resume.py in mint-quickstart.

RL runs can be interrupted by preemption, quota limits, network issues, or manual stops. This demo keeps resume logic at the quickstart layer and recommends resuming from explicit checkpoints instead of restarting from step 0.

What this demo does

  • Runs a minimal GRPO-style RL loop (sample -> reward -> importance_sampling).
  • Saves full training state every MINT_CHECKPOINT_EVERY_STEPS.
  • Resumes from explicit MINT_RESUME_PATH.
  • Restores optimizer state via training_client.load_state_with_optimizer(...).
  • Fails fast if MINT_RESUME_PATH cannot be loaded.
  • Prints clear logs for step, checkpoint path, and resume source.
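The GRPO-style loop in the first bullet can be sketched on toy data. Everything below (`grpo_advantages`, `importance_weight`, `rl_train_step`, the dummy log-probs) is illustrative, not the script's actual API:

```python
# Minimal sketch of one GRPO-style step on toy data; all names are
# illustrative, not the actual mint-quickstart API.
import math
import random

def grpo_advantages(rewards):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + 1e-8) for r in rewards]

def importance_weight(logp_new, logp_old):
    """Per-sequence importance-sampling ratio pi_new / pi_old."""
    return math.exp(logp_new - logp_old)

def rl_train_step(group_size=4):
    # sample: pretend we drew `group_size` completions and scored them
    rewards = [random.random() for _ in range(group_size)]
    advs = grpo_advantages(rewards)
    # importance_sampling: weight each advantage by the policy ratio
    # (dummy log-probs stand in for real policy evaluations)
    ratios = [importance_weight(-1.0, -1.1) for _ in range(group_size)]
    loss = -sum(w * a for w, a in zip(ratios, advs)) / group_size
    return {"loss": loss, "mean_reward": sum(rewards) / group_size}
```

The group normalization is what makes this "GRPO-style": advantages are computed relative to the group's own mean reward, so no separate value model is needed.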

Prerequisites

  • Python >= 3.11
  • MINT_API_KEY is set (or available via .env)
  • MINT_BASE_URL is set if you are not using the default endpoint

First run (minimal command)

export MINT_API_KEY=sk-mint-...
export MINT_BASE_URL=https://mint.macaron.im/
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_TOTAL_STEPS=6
export MINT_CHECKPOINT_EVERY_STEPS=2
python advanced/resume.py

Expected log pattern:

[run] model=... total_steps=... checkpoint_every=... resume_source=none start_step=0
[train] step=1/...
[checkpoint] step=2 reason=periodic path=...
...
[done] final_step=... resume_source=none latest_checkpoint=...
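Since the resume workflow copies a path out of these logs, a small scanner can automate that step. This helper is a sketch against the `path=` field shown in the pattern above, not part of the demo script:

```python
def latest_checkpoint_from_logs(lines):
    """Scan log lines for '[checkpoint] ... path=...' and return the
    last checkpoint path seen, or None if no checkpoint was logged."""
    path = None
    for line in lines:
        if line.startswith("[checkpoint]"):
            for field in line.split():
                if field.startswith("path="):
                    path = field[len("path="):]
    return path
```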

Resume run (explicit checkpoint)

  1. Copy the latest checkpoint path from [checkpoint] or [done] logs.
  2. Start a new process with MINT_RESUME_PATH set:

export MINT_API_KEY=sk-mint-...
export MINT_BASE_URL=https://mint.macaron.im/
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_TOTAL_STEPS=10
export MINT_CHECKPOINT_EVERY_STEPS=2
export MINT_RESUME_PATH="mint://.../rl-resume-periodic-step-000006"
python advanced/resume.py

If the checkpoint is valid, global_step continues from the inferred step and keeps increasing. If the path is invalid, startup fails immediately.

Environment variables

  • MINT_BASE_URL: default unset
  • MINT_BASE_MODEL: default Qwen/Qwen3-0.6B
  • MINT_LORA_RANK: default 16
  • MINT_RL_LR: default 5e-5
  • MINT_GROUP_SIZE: default 4
  • MINT_MAX_TOKENS: default 256
  • MINT_TEMPERATURE: default 1.0
  • MINT_TOTAL_STEPS: default 100
  • MINT_CHECKPOINT_EVERY_STEPS: default 20
  • MINT_RESUME_PATH: default unset
  • MINT_UPLOAD_ARCHIVE: default unset
  • MINT_UPLOAD_ONLY: default false
  • MINT_UPLOAD_TIMEOUT_S: default 300
  • MINT_API_KEY: required

Common failure cases

  • MINT_RESUME_PATH points to a missing/invalid checkpoint: script raises a fail-fast error before training steps.
  • Model unavailable or permission denied: set MINT_BASE_MODEL to an available model for your account.
  • Checkpoint cadence too sparse: decrease MINT_CHECKPOINT_EVERY_STEPS so interruption recovery loses fewer steps.
  • Interrupted before periodic save: use the latest [checkpoint] path or rerun from the previous valid checkpoint.
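The cadence trade-off in the third bullet can be quantified: with periodic saves every `checkpoint_every` steps, an interruption at step `s` loses at most `s % checkpoint_every` completed steps (worst case `checkpoint_every - 1`). A purely illustrative helper:

```python
def steps_lost_on_interrupt(interrupted_at: int, checkpoint_every: int) -> int:
    """Completed steps lost if the run dies at `interrupted_at`, assuming
    a checkpoint was saved at every multiple of `checkpoint_every`."""
    return interrupted_at % checkpoint_every
```

For example, with MINT_CHECKPOINT_EVERY_STEPS=20 a crash at step 39 loses 19 steps of work; halving the cadence halves the worst case at the cost of more frequent saves.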

Key resume logic

load_resume_state(...) uses training_client.load_state_with_optimizer(...) so the checkpoint restores both weights and optimizer state.

start_step = 0
if resume_path:
    # Restores weights + optimizer state; returns the step to resume from.
    start_step = load_resume_state(training_client, resume_path)

for step in range(start_step + 1, cfg.total_steps + 1):
    stats = rl_train_step(training_client, cfg, step)
    if step % cfg.checkpoint_every == 0:
        latest_checkpoint = save_training_state(training_client, step, "periodic")
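A minimal sketch of what `load_resume_state` could look like. It assumes the resume step can be parsed from the checkpoint path suffix (as in the `...-step-000006` example above); that parsing is illustrative, and the real script may track the step differently:

```python
import re

def load_resume_state(training_client, resume_path: str) -> int:
    """Restore weights + optimizer state, then return the step to resume
    from. Any load error propagates, so an invalid path fails fast
    before the first training step."""
    training_client.load_state_with_optimizer(resume_path)
    match = re.search(r"step-(\d+)$", resume_path)
    if match is None:
        raise ValueError(f"cannot infer resume step from {resume_path!r}")
    return int(match.group(1))
```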

Validation targets

The validation matrix covers these base models:

  • Qwen/Qwen3-0.6B
  • Qwen/Qwen3-235B-A22B-Instruct-2507

Latest execution status (including upstream blockers) is tracked in: mint-quickstart-alpha/docs/rl_resume_test_record.md.