Mind Lab Toolkit (MinT)
CustomizeSFT

SFT Hyperparameters

This recipe runs a real supervised fine-tuning sweep on MinT. It trains the same small arithmetic chat dataset with several hyperparameter configurations and compares the final training loss.

The important change: the recipe does not build Datum objects by hand. It uses recipe.supervised.conversation_to_datum() and the high-level recipe.supervised.train.main(config) loop.

Use Case

  • Instruction tuning: Train a model to follow task-specific instructions.
  • Domain adaptation: Fine-tune on in-domain Q&A or chat examples.
  • Small grid search: Compare learning rates and LoRA ranks before a larger run.
  • Recipe debugging: Verify your MinT API key, renderer, tokenizer, and SFT training loop all work together.

What the Recipe Sweeps

By default, recipes/sft_hyperparameters.py runs:

learning_rate:  [1e-5, 5e-5]
lora_rank:      [8, 16]
max_steps:      2 per config
configs:        2 x 2 = 4

You can override these from the environment:

MINT_SFT_STEPS=1 \
MINT_SFT_LRS=1e-5,5e-5 \
MINT_LORA_RANKS=8,16 \
python recipes/sft_hyperparameters.py

Core Pattern

The recipe creates an in-memory supervised dataset:

class ArithmeticSFTDataset(recipe.supervised.types.SupervisedDataset):
    def __init__(self, conversations, model_name, renderer_name, batch_size, max_length):
        tokenizer = get_tokenizer(model_name)
        renderer = recipe.renderers.get_renderer(renderer_name, tokenizer)
        self.datums = [
            recipe.supervised.conversation_to_datum(
                conversation,
                renderer,
                max_length=max_length,
            )
            for conversation in conversations
        ]
        self.batch_size = batch_size

    def __len__(self):
        return max(1, (len(self.datums) + self.batch_size - 1) // self.batch_size)

    def get_batch(self, index):
        start = (index * self.batch_size) % len(self.datums)
        return self.datums[start : start + self.batch_size]

Then each grid item calls the standard SFT trainer:

config = recipe.supervised.train.Config(
    log_path="/tmp/mint-sft-sweep-lr1e-5-rank8",
    model_name="Qwen/Qwen3-0.6B",
    renderer_name="qwen3",
    dataset_builder=ArithmeticSFTDatasetBuilder(...),
    learning_rate=1e-5,
    lora_rank=8,
    max_steps=2,
    save_every=999,
    eval_every=999,
    infrequent_eval_every=999,
    ttl_seconds=3600,
)

await recipe.supervised.train.main(config=config)

View full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/recipes/sft_hyperparameters.py

Run It

export MINT_API_KEY=sk-your-api-key
python recipes/sft_hyperparameters.py

Expected output ends with a table like:

=== Grid Search Summary ===
LR           Rank     Steps    Final train NLL    Log path
------------------------------------------------------------------------------------------
1e-05        8        2        10.4975            /tmp/mint-sft-sweep-...-1em05-rank8
1e-05        16       2        10.4919            /tmp/mint-sft-sweep-...-1em05-rank16
5e-05        8        2        10.1574            /tmp/mint-sft-sweep-...-5em05-rank8
5e-05        16       2        9.4928             /tmp/mint-sft-sweep-...-5em05-rank16

Verified Run

Verified on MinT with Qwen/Qwen3-0.6B, 8 generated arithmetic conversations, 2 steps per config:

Learning rateLoRA rankStepsFinal train NLL
1e-58210.4975
1e-516210.4919
5e-58210.1574
5e-51629.4928

These numbers come from a tiny smoke dataset, so they are for API verification and shape checking, not for model quality claims. For real tuning, use your actual dataset, run more steps, and compare validation metrics.

Why This Shape Works

chat messages


conversation_to_datum()
    │  applies renderer + assistant-token loss mask

SupervisedDataset.get_batch()


recipe.supervised.train.main(config)
    │  creates LoRA TrainingClient, runs forward/backward + optimizer steps

metrics.jsonl + final checkpoint

This keeps tokenization and loss masking aligned with the renderer used by MinT. It also uses the same high-level training loop that larger supervised recipes use, so the sweep checks the real path users will copy.

On this page