Prompt Distillation
Prompt distillation uses a teacher model to generate training responses, then trains a smaller student model on those responses.
This page matches recipes/distillation.py. The recipe runs real MinT API calls:
- Create a SamplingClient for the teacher model.
- Sample one teacher response per prompt.
- Convert (prompt, teacher_response) pairs into supervised chat data with conversation_to_datum().
- Train the student model with recipe.supervised.train.main(config).
Use Case
- Model compression: Teach a smaller model to imitate responses from a larger model.
- Cost reduction: Use a larger model for data creation, then serve or iterate on a smaller student.
- Domain style transfer: Capture a teacher's format, tone, or reasoning style on your own prompts.
- SFT pipeline check: Verify teacher sampling and student SFT both work on MinT.
Configuration
Default models:
Teacher: Qwen/Qwen3-30B-A3B-Instruct-2507
Student: Qwen/Qwen3-0.6B
You can override them:
MINT_TEACHER_MODEL=Qwen/Qwen3-30B-A3B-Instruct-2507 \
MINT_STUDENT_MODEL=Qwen/Qwen3-0.6B \
MINT_SFT_STEPS=2 \
python recipes/distillation.py
If the requested teacher model is not listed in capabilities.supported_models, the recipe prints a warning and falls back to self-distillation with the student model.
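The override and fallback behavior described above can be sketched as follows. This is a minimal illustration, not the recipe's actual code: resolve_models is a hypothetical helper, and it assumes supported_models is a plain collection of model-name strings.

```python
import os

DEFAULT_TEACHER = "Qwen/Qwen3-30B-A3B-Instruct-2507"
DEFAULT_STUDENT = "Qwen/Qwen3-0.6B"


def resolve_models(supported_models, env=None):
    """Return (teacher, student, fell_back).

    Hypothetical helper: reads the MINT_* overrides and falls back to
    self-distillation when the teacher is not in supported_models.
    """
    env = os.environ if env is None else env
    teacher = env.get("MINT_TEACHER_MODEL", DEFAULT_TEACHER)
    student = env.get("MINT_STUDENT_MODEL", DEFAULT_STUDENT)
    if teacher not in supported_models:
        # Assumption: supported_models holds model-name strings.
        print(
            f"Warning: teacher {teacher!r} is not supported; "
            f"falling back to self-distillation with {student!r}"
        )
        return student, student, True
    return teacher, student, False
```

Passing env explicitly keeps the helper easy to test without touching the real process environment.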
Stage 1: Teacher Sampling
The recipe uses a real teacher SamplingClient:
sampling_client = service_client.create_sampling_client(base_model=teacher_model)
tokenizer = sampling_client.get_tokenizer()
result = sampling_client.sample(
    prompt=types.ModelInput.from_ints(tokens=prompt_tokens),
    num_samples=1,
    sampling_params=types.SamplingParams(
        max_tokens=64,
        temperature=0.2,
        stop=[tokenizer.eos_token_id],
    ),
).result()
teacher_response = tokenizer.decode(result.sequences[0].tokens).strip()
The sampled examples have this shape:
{
    "prompt": "Explain TCP in one sentence.",
    "teacher_response": "TCP is a reliable, connection-oriented protocol...",
}
Stage 2: Student SFT
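Before the sampled pairs are rendered into training data, it can help to drop empty teacher responses and convert each pair into a two-turn chat. The sketch below is illustrative only: to_conversations is a hypothetical helper that is not part of the recipe, and it assumes each example is a dict with the prompt and teacher_response keys shown above.

```python
def to_conversations(examples):
    """Turn sampled examples into two-turn chats, skipping empty responses."""
    conversations = []
    for item in examples:
        response = item["teacher_response"].strip()
        if not response:
            # An empty teacher response gives the student nothing to imitate.
            continue
        conversations.append(
            [
                {"role": "user", "content": item["prompt"]},
                {"role": "assistant", "content": response},
            ]
        )
    return conversations
```

Each returned conversation has the same message layout that is passed to conversation_to_datum() in the dataset class.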
The student dataset is built with the same supervised recipe path as other MinT SFT examples:
class DistilledSFTDataset(recipe.supervised.types.SupervisedDataset):
    def __init__(self, examples, model_name, renderer_name, batch_size, max_length):
        tokenizer = get_tokenizer(model_name)
        renderer = recipe.renderers.get_renderer(renderer_name, tokenizer)
        self.datums = [
            recipe.supervised.conversation_to_datum(
                [
                    {"role": "user", "content": item["prompt"]},
                    {"role": "assistant", "content": item["teacher_response"]},
                ],
                renderer,
                max_length=max_length,
            )
            for item in examples
        ]
        self.batch_size = batch_size
Then the high-level SFT loop trains the student:
config = recipe.supervised.train.Config(
    log_path="/tmp/mint-distillation-run",
    model_name="Qwen/Qwen3-0.6B",
    renderer_name="qwen3",
    dataset_builder=DistilledSFTDatasetBuilder(...),
    learning_rate=1e-5,
    lora_rank=16,
    max_steps=2,
    save_every=999,
    eval_every=999,
    infrequent_eval_every=999,
    ttl_seconds=3600,
)
await recipe.supervised.train.main(config=config)
View full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/recipes/distillation.py
Verified Run
Verified on MinT:
| Field | Value |
|---|---|
| Teacher | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| Student | Qwen/Qwen3-0.6B |
| Teacher prompts | 4 |
| Student SFT steps | 2 |
| Final train mean NLL | 2.277863025665283 |
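The "Final train mean NLL" row is a per-token average of negative log-likelihood. As a rough illustration, assuming the loss is averaged only over assistant tokens (weight 1 for assistant tokens, 0 for prompt tokens), the metric could be computed as below. The function name and inputs are hypothetical, and the recipe's exact reduction may differ.

```python
def mean_nll(logprobs, weights):
    """Weighted mean negative log-likelihood over the tokens that count.

    logprobs: per-token log-probabilities of the target tokens.
    weights:  per-token loss weights (assumed 0 for prompt, 1 for assistant).
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no weighted tokens to average over")
    weighted_sum = sum(lp * w for lp, w in zip(logprobs, weights))
    return -weighted_sum / total_weight
```

Lower values mean the student assigns higher probability to the teacher's tokens; with only 2 steps on 4 examples, the value in the table is a smoke-test number rather than a quality measure.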
The run completed both stages:
=== Stage 1: Teacher Sampling ===
Teacher model: Qwen/Qwen3-30B-A3B-Instruct-2507
Prompts: 4
[1/4] prompt: Give one practical tip for keeping regular backups.
teacher: One practical tip for keeping regular backups is ...
...
=== Stage 2: Student SFT ===
Student model: Qwen/Qwen3-0.6B
Examples: 4
Steps: 2
Training completed successfully
This recipe is a minimal verification recipe. It proves the teacher-sampling → supervised-dataset → student-SFT path. For quality distillation, increase prompt count, use domain-specific prompts, add validation, and compare generated outputs before and after training.
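One way to add validation is to hold out a slice of the distilled examples before training. The sketch below is a minimal, hypothetical helper (split_examples is not part of the recipe) that makes a deterministic train/validation split of the (prompt, teacher_response) dicts.

```python
import random


def split_examples(examples, val_fraction=0.1, seed=0):
    """Shuffle deterministically and return (train, validation) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    if len(shuffled) <= 1:
        # Too few examples to hold anything out.
        return shuffled, []
    # Hold out at least one example once there are two or more.
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

The held-out prompts can then be sampled from the student before and after training to compare outputs against the teacher responses.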
Why This Shape Works
prompts
│
▼
teacher SamplingClient
│ sample real teacher responses
▼
(prompt, teacher_response) pairs
│
▼
conversation_to_datum()
│ renderer + assistant-token mask
▼
recipe.supervised.train.main()
│
▼
student LoRA checkpoint
This avoids fake “distillation” printouts. Every stage is an actual MinT call.