# Resume Training from a Checkpoint

This page documents `advanced/checkpoint.py resume` in mint-quickstart.
## Recommended resume shape
For a true training resume, create a fresh LoRA training client with the same model/rank/options, then load the checkpoint with optimizer state:
```python
training_client = service_client.create_lora_training_client(
    base_model=model,
    rank=rank,
    train_mlp=True,
    train_attn=True,
    train_unembed=True,
)
training_client.load_state_with_optimizer(resume_path).result()
```

This is the shape used by `advanced/checkpoint.py resume --with-optimizer`. Do not treat `load_state(...)` as a full training resume; it loads weights only and resets optimizer state.
## Two resume modes
- With optimizer: recommended when you want to continue training from the same optimizer state. It uses `create_lora_training_client(...)` plus `load_state_with_optimizer(path)` and requires matching `MINT_BASE_MODEL`, `MINT_LORA_RANK`, and LoRA options.
- Weights only: useful when optimizer state does not matter. The script first tries `create_training_client_from_state(path)` for auto-detection. If the metadata lookup returns `404` for a raw checkpoint path, it falls back to `create_lora_training_client(...)` plus `load_state(path)` using `MINT_BASE_MODEL`/`MINT_LORA_RANK` (or their defaults), as sketched below.
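A minimal sketch of that weights-only fallback, using the defaults shown elsewhere on this page. The 404 detection via the exception message is an assumption; the script's exact error handling may differ:

```python
import os

def resume_weights_only(service_client, path):
    """Illustrative fallback: auto-detect first, then explicit client."""
    try:
        # Auto-detect base model, rank, and LoRA options from checkpoint metadata.
        return service_client.create_training_client_from_state(path)
    except Exception as err:  # the exact exception type is an assumption
        if "404" not in str(err):
            raise
        # Raw checkpoint path with no metadata: fall back to env vars/defaults.
        model = os.environ.get("MINT_BASE_MODEL", "Qwen/Qwen3-0.6B")
        rank = int(os.environ.get("MINT_LORA_RANK", "16"))
        training_client = service_client.create_lora_training_client(
            base_model=model, rank=rank
        )
        training_client.load_state(path).result()
        return training_client
```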
Use the MinT endpoint that matches your region:
- Mainland China: https://mint-cn.macaron.xin/
- Outside Mainland China: https://mint.macaron.xin/
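For example, the endpoint could be selected once and passed to whatever constructs your service client; `MINT_REGION` here is a hypothetical switch, and the constructor argument name depends on the SDK:

```python
import os

# Hypothetical region switch; adapt to however you configure the client.
MINT_BASE_URL = (
    "https://mint-cn.macaron.xin/"
    if os.environ.get("MINT_REGION") == "cn"
    else "https://mint.macaron.xin/"
)
# Pass MINT_BASE_URL to your service client's endpoint/base-URL argument.
```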
## Commands

```bash
# Preserve optimizer state
export MINT_API_KEY=sk-...
export MINT_BASE_MODEL=Qwen/Qwen3-0.6B
export MINT_LORA_RANK=16
python advanced/checkpoint.py resume tinker://<run-id>/weights/<checkpoint-name> --with-optimizer --steps 3
```

```bash
# Weights only; optimizer resets
export MINT_API_KEY=sk-...
python advanced/checkpoint.py resume tinker://<run-id>/weights/<checkpoint-name>
```

Useful flags:
- `--with-optimizer`: preserve optimizer state
- `--steps`: number of post-resume SFT steps to run
- `--lr`: learning rate for those steps
- `--save-name`: name of the checkpoint saved after the resume steps finish
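For reference, an argparse sketch that mirrors the documented flags; the defaults here are illustrative, not the script's actual values:

```python
import argparse

# Illustrative CLI mirroring the flags above; not the script's actual source.
parser = argparse.ArgumentParser(prog="advanced/checkpoint.py")
sub = parser.add_subparsers(dest="command", required=True)
resume = sub.add_parser("resume", help="resume training from a checkpoint")
resume.add_argument("path", help="tinker://<run-id>/weights/<checkpoint-name>")
resume.add_argument("--with-optimizer", action="store_true",
                    help="preserve optimizer state (needs matching env vars)")
resume.add_argument("--steps", type=int, default=0,
                    help="post-resume SFT steps to run")
resume.add_argument("--lr", type=float, default=1e-4,
                    help="learning rate for those steps (default is a guess)")
resume.add_argument("--save-name", default="resumed-checkpoint",
                    help="checkpoint name saved after the resume steps")
args = parser.parse_args()
```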
## Core APIs
```python
# Full training resume: weights + optimizer state
training_client = service_client.create_lora_training_client(
    base_model=model,
    rank=rank,
    train_mlp=True,
    train_attn=True,
    train_unembed=True,
)
training_client.load_state_with_optimizer(resume_path).result()
```
```python
# Weights-only load: optimizer state resets
training_client = service_client.create_lora_training_client(base_model=model, rank=rank)
training_client.load_state(resume_path).result()
```
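After loading, the script runs the requested SFT steps and then saves a fresh checkpoint under `--save-name`. A minimal sketch of that final save, assuming a Tinker-style `save_state(name=...)` that returns a future with a `.path` (check your SDK for the exact signature):

```python
# Run the post-resume SFT steps first (details depend on your training loop),
# then persist a new checkpoint. save_state(name=...) and result().path are
# assumptions modeled on Tinker-style clients.
save_future = training_client.save_state(name="resumed-checkpoint")
print(f"[resume] saved: {save_future.result().path}")
```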
## Expected output

```text
[resume] path=tinker://.../weights/my-ckpt-state with_optimizer=True steps=3
[resume] fallback to explicit training client: model=Qwen/Qwen3-0.6B rank=16
[resume] loading state from tinker://.../weights/my-ckpt-state...
[resume] loaded, running 3 SFT step(s)...
[resume] step 1/3 done
[resume] saved: tinker://.../weights/resumed-checkpoint
```

## Common failure cases
- the checkpoint path is missing or invalid
- `--with-optimizer` is used without matching `MINT_BASE_MODEL`/`MINT_LORA_RANK`
- the checkpoint was saved for a different adapter shape than the new client
- the base model is unavailable for your account
- `load_state(...)` is used when you expected an optimizer-preserving resume
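The env-var mismatch case is easy to catch up front; a small guard, as a sketch:

```python
import os

# Illustrative precondition check for an optimizer-preserving resume:
# both variables must match the run that produced the checkpoint.
if not (os.environ.get("MINT_BASE_MODEL") and os.environ.get("MINT_LORA_RANK")):
    raise SystemExit(
        "--with-optimizer requires MINT_BASE_MODEL and MINT_LORA_RANK "
        "to match the original training run"
    )
```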