SFT Hyperparameters
This recipe runs a real supervised fine-tuning sweep on MinT. It trains the same small arithmetic chat dataset with several hyperparameter configurations and compares the final training loss.
The important change: the recipe does not build Datum objects by hand. It uses recipe.supervised.conversation_to_datum() and the high-level recipe.supervised.train.main(config) loop.
Use Case
- Instruction tuning: Train a model to follow task-specific instructions.
- Domain adaptation: Fine-tune on in-domain Q&A or chat examples.
- Small grid search: Compare learning rates and LoRA ranks before a larger run.
- Recipe debugging: Verify your MinT API key, renderer, tokenizer, and SFT training loop all work together.
What the Recipe Sweeps
By default, recipes/sft_hyperparameters.py runs:
learning_rate: [1e-5, 5e-5]
lora_rank: [8, 16]
max_steps: 2 per config
configs: 2 x 2 = 4You can override these from the environment:
MINT_SFT_STEPS=1 \
MINT_SFT_LRS=1e-5,5e-5 \
MINT_LORA_RANKS=8,16 \
python recipes/sft_hyperparameters.pyCore Pattern
The recipe creates an in-memory supervised dataset:
class ArithmeticSFTDataset(recipe.supervised.types.SupervisedDataset):
def __init__(self, conversations, model_name, renderer_name, batch_size, max_length):
tokenizer = get_tokenizer(model_name)
renderer = recipe.renderers.get_renderer(renderer_name, tokenizer)
self.datums = [
recipe.supervised.conversation_to_datum(
conversation,
renderer,
max_length=max_length,
)
for conversation in conversations
]
self.batch_size = batch_size
def __len__(self):
return max(1, (len(self.datums) + self.batch_size - 1) // self.batch_size)
def get_batch(self, index):
start = (index * self.batch_size) % len(self.datums)
return self.datums[start : start + self.batch_size]Then each grid item calls the standard SFT trainer:
config = recipe.supervised.train.Config(
log_path="/tmp/mint-sft-sweep-lr1e-5-rank8",
model_name="Qwen/Qwen3-0.6B",
renderer_name="qwen3",
dataset_builder=ArithmeticSFTDatasetBuilder(...),
learning_rate=1e-5,
lora_rank=8,
max_steps=2,
save_every=999,
eval_every=999,
infrequent_eval_every=999,
ttl_seconds=3600,
)
await recipe.supervised.train.main(config=config)View full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/recipes/sft_hyperparameters.py
Run It
export MINT_API_KEY=sk-your-api-key
python recipes/sft_hyperparameters.pyExpected output ends with a table like:
=== Grid Search Summary ===
LR Rank Steps Final train NLL Log path
------------------------------------------------------------------------------------------
1e-05 8 2 10.4975 /tmp/mint-sft-sweep-...-1em05-rank8
1e-05 16 2 10.4919 /tmp/mint-sft-sweep-...-1em05-rank16
5e-05 8 2 10.1574 /tmp/mint-sft-sweep-...-5em05-rank8
5e-05 16 2 9.4928 /tmp/mint-sft-sweep-...-5em05-rank16Verified Run
Verified on MinT with Qwen/Qwen3-0.6B, 8 generated arithmetic conversations, 2 steps per config:
| Learning rate | LoRA rank | Steps | Final train NLL |
|---|---|---|---|
1e-5 | 8 | 2 | 10.4975 |
1e-5 | 16 | 2 | 10.4919 |
5e-5 | 8 | 2 | 10.1574 |
5e-5 | 16 | 2 | 9.4928 |
These numbers come from a tiny smoke dataset, so they are for API verification and shape checking, not for model quality claims. For real tuning, use your actual dataset, run more steps, and compare validation metrics.
Why This Shape Works
chat messages
│
▼
conversation_to_datum()
│ applies renderer + assistant-token loss mask
▼
SupervisedDataset.get_batch()
│
▼
recipe.supervised.train.main(config)
│ creates LoRA TrainingClient, runs forward/backward + optimizer steps
▼
metrics.jsonl + final checkpointThis keeps tokenization and loss masking aligned with the renderer used by MinT. It also uses the same high-level training loop that larger supervised recipes use, so the sweep checks the real path users will copy.