Mind Lab Toolkit (MinT)

Prompt Distillation

Prompt distillation means using a teacher model to generate training responses, then training a smaller student model on those responses.

This page mirrors recipes/distillation.py. The recipe makes real MinT API calls:

  1. Create a SamplingClient for the teacher model.
  2. Sample one teacher response per prompt.
  3. Use conversation_to_datum() to convert each (prompt, teacher_response) pair into supervised chat data.
  4. Train the student model with recipe.supervised.train.main(config).
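The four steps compose into a simple two-stage pipeline. The sketch below is plain Python with no MinT calls: `distill`, `sample_teacher`, and `train_student` are hypothetical names, standing in for the teacher SamplingClient (steps 1-2) and for the `conversation_to_datum()` + `recipe.supervised.train.main()` stage (steps 3-4).

```python
from typing import Callable, Dict, List

# Hypothetical stage signatures; in the real recipe these are backed by MinT.
SampleFn = Callable[[str], str]
TrainFn = Callable[[List[Dict[str, str]]], None]


def distill(prompts: List[str], sample_teacher: SampleFn,
            train_student: TrainFn) -> List[Dict[str, str]]:
    # Steps 1-2: one teacher response per prompt (client creation is hidden
    # inside sample_teacher in this sketch).
    examples = [
        {"prompt": p, "teacher_response": sample_teacher(p)}
        for p in prompts
    ]
    # Steps 3-4: hand the (prompt, teacher_response) pairs to student SFT,
    # which would convert them to datums and run the training loop.
    train_student(examples)
    return examples


if __name__ == "__main__":
    seen: List[Dict[str, str]] = []
    distill(["Explain TCP in one sentence."], lambda p: "TCP is ...", seen.extend)
    print(seen)
```

Keeping the two stages behind plain callables makes it easy to swap the teacher (for example, the self-distillation fallback) without touching the student side.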

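Use Cases

  • Model compression: have a small model imitate a large model's responses.
  • Lower cost: generate data with a large model, then serve or iterate with a small student.
  • Domain style transfer: capture the teacher's format, tone, or reasoning style on your own prompts.
  • SFT pipeline check: verify that both teacher sampling and student SFT run end to end on MinT.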

Configuration

Default models:

Teacher: Qwen/Qwen3-30B-A3B-Instruct-2507
Student: Qwen/Qwen3-0.6B

Override them with environment variables:

MINT_TEACHER_MODEL=Qwen/Qwen3-30B-A3B-Instruct-2507 \
MINT_STUDENT_MODEL=Qwen/Qwen3-0.6B \
MINT_SFT_STEPS=2 \
python recipes/distillation.py
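A minimal sketch of how such overrides are commonly read, assuming the recipe falls back to this page's defaults when a variable is unset. `read_settings` is a hypothetical helper, and treating 2 as the default for MINT_SFT_STEPS is an assumption taken from the example command, not from the recipe source.

```python
import os
from typing import Mapping, Optional, Tuple

# Defaults copied from this page; the SFT-steps default is an assumption.
DEFAULT_TEACHER = "Qwen/Qwen3-30B-A3B-Instruct-2507"
DEFAULT_STUDENT = "Qwen/Qwen3-0.6B"
DEFAULT_SFT_STEPS = 2


def read_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str, int]:
    # Use the real process environment unless a mapping is passed in (handy for tests).
    env = os.environ if env is None else env
    teacher = env.get("MINT_TEACHER_MODEL", DEFAULT_TEACHER)
    student = env.get("MINT_STUDENT_MODEL", DEFAULT_STUDENT)
    # Environment values are strings, so the step count must be parsed.
    steps = int(env.get("MINT_SFT_STEPS", str(DEFAULT_SFT_STEPS)))
    return teacher, student, steps
```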

If the requested teacher model is not in capabilities.supported_models, the recipe prints a warning and falls back to self-distillation, using the student model as the teacher.
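The fallback rule can be sketched as a small pure function. This assumes `supported_models` is available as a list of model-name strings; the actual type of `capabilities.supported_models` in MinT may differ, and `resolve_teacher` is a hypothetical name.

```python
from typing import List


def resolve_teacher(requested: str, student: str, supported_models: List[str]) -> str:
    """Pick the teacher model, falling back to self-distillation."""
    if requested in supported_models:
        return requested
    # Mirror the described behavior: warn, then use the student as its own teacher.
    print(f"WARNING: teacher {requested!r} is not in supported_models; "
          f"falling back to self-distillation with {student!r}")
    return student
```

Falling back instead of failing keeps the pipeline check runnable on deployments that only serve small models, at the cost of a weaker (self-)teacher.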

Stage 1: Teacher Sampling

The recipe uses a real teacher SamplingClient:

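# service_client, types, and prompt_tokens (the tokenized prompt) are set up earlier in recipes/distillation.py.
sampling_client = service_client.create_sampling_client(base_model=teacher_model)
tokenizer = sampling_client.get_tokenizer()

result = sampling_client.sample(
    prompt=types.ModelInput.from_ints(tokens=prompt_tokens),
    num_samples=1,  # one teacher response per prompt
    sampling_params=types.SamplingParams(
        max_tokens=64,
        temperature=0.2,  # low temperature keeps teacher outputs focused
        stop=[tokenizer.eos_token_id],  # stop at end-of-sequence
    ),
).result()  # block until the sample is ready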

teacher_response = tokenizer.decode(result.sequences[0].tokens).strip()
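Calling `.result()` right after each `sample()` waits for one prompt before sending the next. Because `sample()` returns an object with `.result()`, it can likely be treated as a future. Assuming MinT starts the request when `sample()` is called (an assumption to verify), you can submit every prompt first and collect afterwards. The sketch below shows that pattern with `concurrent.futures` standing in for the client; `sample_all` and `fake_submit` are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


def sample_all(prompts: List[str], submit: Callable[[str], Future]) -> List[str]:
    """Submit every request first, then wait for the results in prompt order."""
    futures = [submit(p) for p in prompts]  # non-blocking submissions
    return [f.result() for f in futures]    # block on each response


# Stand-in for sampling_client.sample(...): a thread pool that returns Futures.
with ThreadPoolExecutor(max_workers=4) as pool:
    def fake_submit(prompt: str) -> Future:
        return pool.submit(lambda: f"teacher answer to: {prompt}")

    responses = sample_all(["q1", "q2"], fake_submit)
```

Results come back in the same order as the prompts, so they can be zipped straight into (prompt, teacher_response) pairs.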

Shape of a sampled example:

{
    "prompt": "Explain TCP in one sentence.",
    "teacher_response": "TCP is a reliable, connection-oriented protocol...",
}
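That shape can be pinned down with a `TypedDict` and mapped onto the two-turn chat that Stage 2 feeds to `conversation_to_datum()`. `DistillExample`, `to_conversation`, and `drop_empty` are hypothetical names. Dropping empty responses (for example, a teacher that emits EOS immediately) is an added suggestion, not something the recipe is documented to do.

```python
from typing import Dict, List, TypedDict


class DistillExample(TypedDict):
    prompt: str
    teacher_response: str


def to_conversation(example: DistillExample) -> List[Dict[str, str]]:
    # Same user/assistant message layout as the Stage 2 dataset code.
    return [
        {"role": "user", "content": example["prompt"]},
        {"role": "assistant", "content": example["teacher_response"]},
    ]


def drop_empty(examples: List[DistillExample]) -> List[DistillExample]:
    # An empty response contributes no assistant tokens to learn from.
    return [e for e in examples if e["teacher_response"].strip()]
```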

Stage 2: Student SFT

The student dataset uses the same supervised recipe path as the other MinT SFT examples:

class DistilledSFTDataset(recipe.supervised.types.SupervisedDataset):
    def __init__(self, examples, model_name, renderer_name, batch_size, max_length):
        tokenizer = get_tokenizer(model_name)
        renderer = recipe.renderers.get_renderer(renderer_name, tokenizer)
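        # One two-turn chat per example; conversation_to_datum() renders it with the
        # renderer and restricts the training loss to the assistant (teacher) tokens.
        self.datums = [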
            recipe.supervised.conversation_to_datum(
                [
                    {"role": "user", "content": item["prompt"]},
                    {"role": "assistant", "content": item["teacher_response"]},
                ],
                renderer,
                max_length=max_length,
            )
            for item in examples
        ]
        self.batch_size = batch_size
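The key idea inside `conversation_to_datum()` is the assistant-token mask: only the teacher's tokens are trained on. The toy function below illustrates that idea on integer token lists. It is not MinT's implementation. It assumes per-token loss weights (0 for the prompt, 1 for the response), truncation to `max_length`, and the usual one-position shift for next-token prediction. A real renderer would also insert chat-template tokens, which are omitted here. `build_masked_example` is a hypothetical name.

```python
from typing import List, Tuple


def build_masked_example(prompt_tokens: List[int], response_tokens: List[int],
                         max_length: int) -> Tuple[List[int], List[int], List[float]]:
    """Toy assistant-token mask for one (prompt, teacher_response) pair."""
    tokens = (prompt_tokens + response_tokens)[:max_length]
    # Weight 0 for prompt tokens, 1 for teacher tokens, truncated alongside tokens.
    weights = ([0.0] * len(prompt_tokens) + [1.0] * len(response_tokens))[:max_length]
    # Next-token prediction: position i predicts tokens[i + 1].
    inputs = tokens[:-1]
    targets = tokens[1:]
    target_weights = weights[1:]
    return inputs, targets, target_weights
```

For example, prompt `[1, 2, 3]` with response `[4, 5]` yields targets `[2, 3, 4, 5]` with weights `[0, 0, 1, 1]`, so the loss covers only the two teacher tokens.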

Then train the student with the high-level SFT loop:

config = recipe.supervised.train.Config(
    log_path="/tmp/mint-distillation-run",
    model_name="Qwen/Qwen3-0.6B",
    renderer_name="qwen3",
    dataset_builder=DistilledSFTDatasetBuilder(...),
    learning_rate=1e-5,
    lora_rank=16,
    max_steps=2,
    save_every=999,
    eval_every=999,
    infrequent_eval_every=999,
    ttl_seconds=3600,
)

await recipe.supervised.train.main(config=config)
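A bare top-level `await` only works in environments such as IPython or Jupyter. In a plain script it has to run inside a coroutine started with `asyncio.run`. The sketch below shows that wrapper. `fake_train_main` is a hypothetical stand-in for `recipe.supervised.train.main`, and the plain dict stands in for the real `Config` object.

```python
import asyncio


async def fake_train_main(config: dict) -> str:
    # Hypothetical stand-in for: await recipe.supervised.train.main(config=config)
    await asyncio.sleep(0)
    return f"trained {config['model_name']} for {config['max_steps']} steps"


async def run(config: dict) -> str:
    # `await` is valid here because we are inside a coroutine.
    return await fake_train_main(config)


if __name__ == "__main__":
    print(asyncio.run(run({"model_name": "Qwen/Qwen3-0.6B", "max_steps": 2})))
```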

Full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/recipes/distillation.py

Verified Run

Verified on MinT:

Field                  Value
Teacher                Qwen/Qwen3-30B-A3B-Instruct-2507
Student                Qwen/Qwen3-0.6B
Teacher prompts        4
Student SFT steps      2
Final train mean NLL   2.277863025665283
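To put the final NLL on a more intuitive scale, it can be converted to per-token perplexity. This assumes the reported value is a mean over target tokens in nats (natural log). With 4 examples and 2 steps, it is a smoke-test number, not a quality measure.

```python
import math

final_mean_nll = 2.277863025665283  # from the verified run above
per_token_ppl = math.exp(final_mean_nll)
print(f"{per_token_ppl:.2f}")  # 9.76
```

Read this way, a perplexity near 9.76 means the student is, on average, about as uncertain as a uniform choice among roughly 10 tokens at each teacher token.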

The run completed both stages:

=== Stage 1: Teacher Sampling ===
Teacher model: Qwen/Qwen3-30B-A3B-Instruct-2507
Prompts:       4
[1/4] prompt: Give one practical tip for keeping regular backups.
          teacher: One practical tip for keeping regular backups is ...
...

=== Stage 2: Student SFT ===
Student model: Qwen/Qwen3-0.6B
Examples:      4
Steps:         2
Training completed successfully

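This is a minimal verification recipe. It shows that the teacher-sampling → supervised-dataset → student-SFT path actually runs end to end. For quality distillation, use more prompts, use domain prompts, add validation, and compare the student's generations before and after training.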
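One way to add validation is to hold out part of the prompt set before Stage 1, so that before/after student generations are compared on prompts the student never trained on. The helper below is a hypothetical sketch of a deterministic split; the 10% fraction and fixed seed are illustrative choices.

```python
import random
from typing import List, Tuple


def split_prompts(prompts: List[str], val_fraction: float = 0.1,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Deterministically split prompts into (train, validation)."""
    shuffled = list(prompts)
    random.Random(seed).shuffle(shuffled)  # seeded, so reruns give the same split
    # Keep at least one validation prompt whenever there is more than one prompt.
    n_val = max(1, int(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
    return shuffled[n_val:], shuffled[:n_val]
```

Only the training prompts would go through teacher sampling and SFT. The validation prompts are then used to sample from the student checkpoint before and after training.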

Why this shape works

prompts
  │
  ▼
teacher SamplingClient
  │  samples real teacher responses
  ▼
(prompt, teacher_response) pairs
  │
  ▼
conversation_to_datum()
  │  renderer + assistant-token mask
  ▼
recipe.supervised.train.main()
  │
  ▼
student LoRA checkpoint

This way the recipe doesn't just print "distillation pseudocode": every stage is a real MinT call.
