Mind Lab Toolkit (MinT)

Prompt Distillation

Prompt distillation uses a teacher model to generate training responses, then trains a smaller student model on those responses.

This page follows recipes/distillation.py. The recipe makes real MinT API calls in four steps:

  1. Create a SamplingClient for the teacher model.
  2. Sample one teacher response per prompt.
  3. Convert (prompt, teacher_response) pairs into supervised chat data with conversation_to_datum().
  4. Train the student model with recipe.supervised.train.main(config).

Use Cases

  • Model compression: Teach a smaller model to imitate responses from a larger model.
  • Cost reduction: Use a larger model for data creation, then serve or iterate on a smaller student.
  • Domain style transfer: Capture a teacher's format, tone, or reasoning style on your own prompts.
  • SFT pipeline check: Verify teacher sampling and student SFT both work on MinT.

Configuration

Default models:

Teacher: Qwen/Qwen3-30B-A3B-Instruct-2507
Student: Qwen/Qwen3-0.6B

You can override them, along with the number of SFT steps, through environment variables:

MINT_TEACHER_MODEL=Qwen/Qwen3-30B-A3B-Instruct-2507 \
MINT_STUDENT_MODEL=Qwen/Qwen3-0.6B \
MINT_SFT_STEPS=2 \
python recipes/distillation.py

If the requested teacher model is not listed in capabilities.supported_models, the recipe prints a warning and falls back to self-distillation with the student model.
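The override and fallback behaviour described above can be sketched as follows. This is an illustration, not the recipe's code: resolve_models is a hypothetical helper, and supported_models is assumed to be a plain collection of model-name strings taken from capabilities.supported_models.

```python
import os
from typing import Collection, Mapping, Tuple

DEFAULT_TEACHER = "Qwen/Qwen3-30B-A3B-Instruct-2507"
DEFAULT_STUDENT = "Qwen/Qwen3-0.6B"


def resolve_models(
    env: Mapping[str, str],
    supported_models: Collection[str],
) -> Tuple[str, str]:
    """Hypothetical helper: choose (teacher, student) model names.

    Reads the two environment variables named on this page and falls back
    to self-distillation when the requested teacher is not supported.
    """
    teacher = env.get("MINT_TEACHER_MODEL", DEFAULT_TEACHER)
    student = env.get("MINT_STUDENT_MODEL", DEFAULT_STUDENT)
    if teacher not in supported_models:
        # Assumption: the warning text is illustrative, not the recipe's wording.
        print(
            f"Warning: teacher {teacher!r} is not supported; "
            f"falling back to self-distillation with {student!r}"
        )
        teacher = student
    return teacher, student
```

In practice this would be called as resolve_models(os.environ, supported_models) before creating the teacher SamplingClient.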

Stage 1: Teacher Sampling

The recipe uses a real teacher SamplingClient:

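# service_client, types, teacher_model, and prompt_tokens are defined
# earlier in the recipe and are not shown in this excerpt.
sampling_client = service_client.create_sampling_client(base_model=teacher_model)
tokenizer = sampling_client.get_tokenizer()

# Request one short, low-temperature completion for the tokenized prompt.
result = sampling_client.sample(
    prompt=types.ModelInput.from_ints(tokens=prompt_tokens),
    num_samples=1,
    sampling_params=types.SamplingParams(
        max_tokens=64,
        temperature=0.2,
        stop=[tokenizer.eos_token_id],
    ),
).result()

# Decode the first (and only) sampled sequence back to text.
teacher_response = tokenizer.decode(result.sequences[0].tokens).strip()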

The sampled examples have this shape:

{
    "prompt": "Explain TCP in one sentence.",
    "teacher_response": "TCP is a reliable, connection-oriented protocol...",
}
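A minimal sketch of how a list of such examples could be collected, one per prompt. The sampling call is injected as sample_fn, a hypothetical callable that would wrap the sampling_client.sample(...) and tokenizer.decode(...) calls shown above; collect_teacher_examples is likewise an illustrative name and not part of the recipe.

```python
from typing import Callable, Dict, List


def collect_teacher_examples(
    prompts: List[str],
    sample_fn: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build one {"prompt", "teacher_response"} dict per prompt.

    sample_fn is a hypothetical wrapper around the teacher sampling call;
    it takes a prompt string and returns the decoded teacher text.
    """
    examples: List[Dict[str, str]] = []
    for index, prompt in enumerate(prompts, start=1):
        response = sample_fn(prompt).strip()
        # Progress line in the same spirit as the recipe's Stage 1 log.
        print(f"[{index}/{len(prompts)}] prompt: {prompt}")
        examples.append({"prompt": prompt, "teacher_response": response})
    return examples
```

Keeping the sampler as a parameter lets the loop be exercised without a network call, while the real run passes a function backed by the teacher SamplingClient.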

Stage 2: Student SFT

The student dataset is built with the same supervised recipe path as other MinT SFT examples:

class DistilledSFTDataset(recipe.supervised.types.SupervisedDataset):
    def __init__(self, examples, model_name, renderer_name, batch_size, max_length):
        tokenizer = get_tokenizer(model_name)
        renderer = recipe.renderers.get_renderer(renderer_name, tokenizer)
        self.datums = [
            recipe.supervised.conversation_to_datum(
                [
                    {"role": "user", "content": item["prompt"]},
                    {"role": "assistant", "content": item["teacher_response"]},
                ],
                renderer,
                max_length=max_length,
            )
            for item in examples
        ]
        self.batch_size = batch_size
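The excerpt stores batch_size but does not show how batches are served. Below is a minimal sketch of fixed-size batching over a list of datums, assuming batches are addressed by index and a trailing partial batch is dropped. Both choices and both function names are assumptions for illustration, not the SupervisedDataset interface.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def num_batches(n_items: int, batch_size: int) -> int:
    """Number of full batches; a trailing partial batch is dropped (assumption)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return n_items // batch_size


def get_batch(items: Sequence[T], batch_size: int, index: int) -> List[T]:
    """Return the index-th full batch of items."""
    if not 0 <= index < num_batches(len(items), batch_size):
        raise IndexError(f"batch index {index} out of range")
    start = index * batch_size
    return list(items[start:start + batch_size])
```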

Then the high-level SFT loop trains the student:

config = recipe.supervised.train.Config(
    log_path="/tmp/mint-distillation-run",
    model_name="Qwen/Qwen3-0.6B",
    renderer_name="qwen3",
    dataset_builder=DistilledSFTDatasetBuilder(...),
    learning_rate=1e-5,
    lora_rank=16,
    max_steps=2,
    save_every=999,
    eval_every=999,
    infrequent_eval_every=999,
    ttl_seconds=3600,
)

# main() is awaited here, so this line must run inside an async function.
await recipe.supervised.train.main(config=config)

View full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/recipes/distillation.py

Verified Run

Verified on MinT:

Field                   Value
Teacher                 Qwen/Qwen3-30B-A3B-Instruct-2507
Student                 Qwen/Qwen3-0.6B
Teacher prompts         4
Student SFT steps       2
Final train mean NLL    2.277863025665283

The run completed both stages:

=== Stage 1: Teacher Sampling ===
Teacher model: Qwen/Qwen3-30B-A3B-Instruct-2507
Prompts:       4
[1/4] prompt: Give one practical tip for keeping regular backups.
          teacher: One practical tip for keeping regular backups is ...
...

=== Stage 2: Student SFT ===
Student model: Qwen/Qwen3-0.6B
Examples:      4
Steps:         2
Training completed successfully

This is a minimal verification recipe. It confirms that the teacher-sampling → supervised-dataset → student-SFT path runs end to end; with 4 prompts and 2 steps it is not a measure of distillation quality. For quality distillation, increase prompt count, use domain-specific prompts, add validation, and compare generated outputs before and after training.
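For the validation suggested above, one option is to track the same kind of metric the run reports, mean NLL, on held-out (prompt, teacher_response) pairs. The sketch below assumes per-token log-probabilities and a per-token weight (1 for assistant tokens, 0 otherwise) are available as plain lists. How MinT exposes these values is not shown on this page, and mean_nll and perplexity are hypothetical helpers.

```python
import math
from typing import Sequence


def mean_nll(logprobs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean negative log-likelihood over the weighted tokens."""
    if len(logprobs) != len(weights):
        raise ValueError("logprobs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one token must have non-zero weight")
    return -sum(lp * w for lp, w in zip(logprobs, weights)) / total_weight


def perplexity(nll: float) -> float:
    """Per-token perplexity, assuming the NLL is measured in nats."""
    return math.exp(nll)
```

If the reported train mean NLL of about 2.28 is in nats, it corresponds to a per-token perplexity of a little under 10. Comparing this value on held-out pairs before and after training gives a simple numeric check alongside reading the generated outputs.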

Why This Shape Works

prompts
  │
  ▼
teacher SamplingClient
  │  sample real teacher responses
  ▼
(prompt, teacher_response) pairs
  │
  ▼
conversation_to_datum()
  │  renderer + assistant-token mask
  ▼
recipe.supervised.train.main()
  │
  ▼
student LoRA checkpoint

This avoids fake “distillation” printouts. Every stage is an actual MinT call.
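The "assistant-token mask" in the diagram typically means that the loss is computed only on the teacher's response tokens, not on the prompt. Here is a toy sketch of that idea, using whitespace splitting in place of a real tokenizer and renderer. build_masked_example is a hypothetical function and does not reproduce what conversation_to_datum() does internally.

```python
from typing import List, Tuple


def build_masked_example(
    prompt: str,
    response: str,
    max_length: int,
) -> Tuple[List[str], List[int]]:
    """Return (tokens, weights) with weight 1 only on response tokens.

    Whitespace splitting stands in for a real tokenizer (assumption).
    Sequences longer than max_length are truncated from the right.
    """
    prompt_tokens = prompt.split()
    response_tokens = response.split()
    tokens = (prompt_tokens + response_tokens)[:max_length]
    weights = ([0] * len(prompt_tokens) + [1] * len(response_tokens))[:max_length]
    return tokens, weights
```

With this weighting, the student is trained to reproduce the teacher's answer given the prompt, rather than to reproduce the prompt itself.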
