Prompt Distillation
Prompt distillation uses a teacher model to generate training responses, then trains a smaller student model on those responses.
This page follows `recipes/distillation.py`. The recipe makes real MinT API calls to:
- Create a `SamplingClient` for the teacher model.
- Sample one teacher response per prompt.
- Convert each `(prompt, teacher_response)` pair into supervised chat data with `conversation_to_datum()`.
- Train the student model with `recipe.supervised.train.main(config)`.
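The steps above can be traced without any MinT calls. In this hypothetical sketch, a plain Python function stands in for the teacher `SamplingClient`, and `sample_teacher` and `to_conversations` are illustrative names, not part of the recipe. It only shows how data flows from prompts to the user/assistant chats that `conversation_to_datum()` consumes.

```python
from typing import Callable

# Hypothetical stand-in for the teacher SamplingClient: any function prompt -> response.
TeacherFn = Callable[[str], str]


def sample_teacher(prompts: list[str], teacher: TeacherFn) -> list[dict[str, str]]:
    """Stage 1: one teacher response per prompt, in the recipe's example shape."""
    return [{"prompt": p, "teacher_response": teacher(p).strip()} for p in prompts]


def to_conversations(examples: list[dict[str, str]]) -> list[list[dict[str, str]]]:
    """Stage 2 input: each pair becomes a two-turn user/assistant chat."""
    return [
        [
            {"role": "user", "content": ex["prompt"]},
            {"role": "assistant", "content": ex["teacher_response"]},
        ]
        for ex in examples
    ]


def fake_teacher(prompt: str) -> str:
    # Placeholder teacher; a real run would sample from the teacher model instead.
    return f" Answer to: {prompt} "


if __name__ == "__main__":
    examples = sample_teacher(["Explain TCP in one sentence."], fake_teacher)
    print(to_conversations(examples))
```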
Use Case
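- Model compression: have a small model imitate a large model's responses.
- Cost reduction: generate data with the large model, then serve or iterate with the small student.
- Domain style transfer: capture the teacher's format, tone, or reasoning style on your own prompts.
- SFT pipeline check: verify that both teacher sampling and student SFT run end to end on MinT.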
Configuration
Default models:
- Teacher: `Qwen/Qwen3-30B-A3B-Instruct-2507`
- Student: `Qwen/Qwen3-0.6B`
Override the models, and the number of student SFT steps, with environment variables:
```bash
MINT_TEACHER_MODEL=Qwen/Qwen3-30B-A3B-Instruct-2507 \
MINT_STUDENT_MODEL=Qwen/Qwen3-0.6B \
MINT_SFT_STEPS=2 \
python recipes/distillation.py
```
If the requested teacher model is not in `capabilities.supported_models`, the recipe prints a warning and falls back to self-distillation, using the student model as the teacher.
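The override and fallback behavior described above fits in a few lines. This is a hypothetical sketch, not the recipe's code: `resolve_models` is an illustrative name, the `supported` set stands in for `capabilities.supported_models`, and `MINT_SFT_STEPS` handling is left out.

```python
import os
import warnings
from collections.abc import Mapping

DEFAULT_TEACHER = "Qwen/Qwen3-30B-A3B-Instruct-2507"
DEFAULT_STUDENT = "Qwen/Qwen3-0.6B"


def resolve_models(env: Mapping[str, str], supported_models: set[str]) -> tuple[str, str]:
    """Return (teacher, student); fall back to self-distillation if the teacher is unsupported."""
    teacher = env.get("MINT_TEACHER_MODEL", DEFAULT_TEACHER)
    student = env.get("MINT_STUDENT_MODEL", DEFAULT_STUDENT)
    if teacher not in supported_models:
        warnings.warn(
            f"Teacher {teacher!r} is not supported; "
            f"falling back to self-distillation with {student!r}"
        )
        teacher = student  # the student model also acts as the teacher
    return teacher, student


if __name__ == "__main__":
    # Stand-in for capabilities.supported_models (hypothetical contents).
    supported = {DEFAULT_TEACHER, DEFAULT_STUDENT}
    print(resolve_models(os.environ, supported))
```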
Stage 1: Teacher Sampling
The recipe uses a real teacher `SamplingClient`:
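```python
# Excerpt: service_client, teacher_model, and prompt_tokens (the rendered prompt's
# token IDs) are set up earlier in the recipe; `types` is the MinT client's types module.
sampling_client = service_client.create_sampling_client(base_model=teacher_model)
tokenizer = sampling_client.get_tokenizer()

result = sampling_client.sample(
    prompt=types.ModelInput.from_ints(tokens=prompt_tokens),
    num_samples=1,  # one teacher response per prompt
    sampling_params=types.SamplingParams(
        max_tokens=64,  # cap on generated tokens per response
        temperature=0.2,  # low temperature keeps teacher output close to greedy
        stop=[tokenizer.eos_token_id],
    ),
).result()  # block until the sampling request completes
teacher_response = tokenizer.decode(result.sequences[0].tokens).strip()
```
The sampled example has this shape: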
```python
{
    "prompt": "Explain TCP in one sentence.",
    "teacher_response": "TCP is a reliable, connection-oriented protocol...",
}
```
Stage 2: Student SFT
The student dataset goes through the same supervised recipe path as the other MinT SFT examples:
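```python
# Excerpt: get_tokenizer and the recipe package are imported in the full source.
class DistilledSFTDataset(recipe.supervised.types.SupervisedDataset):
    def __init__(self, examples, model_name, renderer_name, batch_size, max_length):
        tokenizer = get_tokenizer(model_name)
        # The renderer applies the model's chat format (e.g. "qwen3").
        renderer = recipe.renderers.get_renderer(renderer_name, tokenizer)
        # One training datum per (prompt, teacher_response) pair, as a two-turn chat.
        self.datums = [
            recipe.supervised.conversation_to_datum(
                [
                    {"role": "user", "content": item["prompt"]},
                    {"role": "assistant", "content": item["teacher_response"]},
                ],
                renderer,
                max_length=max_length,
            )
            for item in examples
        ]
        self.batch_size = batch_size
    # Batching methods are omitted from this excerpt.
```
Then train the student with the high-level SFT loop: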
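```python
# DistilledSFTDatasetBuilder builds the dataset above; its arguments are elided here.
config = recipe.supervised.train.Config(
    log_path="/tmp/mint-distillation-run",
    model_name="Qwen/Qwen3-0.6B",  # student model
    renderer_name="qwen3",
    dataset_builder=DistilledSFTDatasetBuilder(...),
    learning_rate=1e-5,
    lora_rank=16,  # the student is trained as a LoRA adapter
    max_steps=2,  # minimal smoke-test run
    save_every=999,  # intervals far above max_steps effectively skip periodic saves/evals
    eval_every=999,
    infrequent_eval_every=999,
    ttl_seconds=3600,
)
# main() is a coroutine: await it inside an async function (e.g. one run with asyncio.run).
await recipe.supervised.train.main(config=config)
```
Full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/recipes/distillation.py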
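How `DistilledSFTDataset` turns `self.datums` into batches is not shown on this page. A minimal sketch of one plausible scheme, using a hypothetical standalone class `FixedBatches` (not the recipe's `SupervisedDataset` base class or its actual method set), where each training step consumes one equal-sized slice:

```python
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")


class FixedBatches(Generic[T]):
    """Hypothetical batching helper: serves consecutive, equal-sized slices of a datum list."""

    def __init__(self, datums: Sequence[T], batch_size: int):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.datums = list(datums)
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Drop a trailing partial batch so every step sees batch_size examples.
        return len(self.datums) // self.batch_size

    def get_batch(self, index: int) -> list[T]:
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        return self.datums[start : start + self.batch_size]
```

For example, four datums with a hypothetical `batch_size=2` give two batches. Dropping the trailing partial batch is a design choice; keeping a smaller last batch would also work.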
Verified Run
Verified on MinT:
| Field | Value |
|---|---|
| Teacher | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| Student | Qwen/Qwen3-0.6B |
| Teacher prompts | 4 |
| Student SFT steps | 2 |
| Final train mean NLL | 2.277863025665283 |
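The `Final train mean NLL` above is a negative log-likelihood on the student's training tokens. Assuming it is a natural-log mean weighted by the assistant-token mask (an assumption; the recipe's exact reduction is not shown here), the computation looks like this hypothetical helper `weighted_mean_nll`:

```python
import math


def weighted_mean_nll(logprobs: list[float], weights: list[float]) -> float:
    """Mean negative log-likelihood (natural log) over weighted tokens."""
    if len(logprobs) != len(weights):
        raise ValueError("logprobs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no tokens carry training weight")
    return -sum(lp * w for lp, w in zip(logprobs, weights)) / total_weight


if __name__ == "__main__":
    # First token is prompt context (weight 0); the last two are assistant tokens.
    logprobs = [math.log(0.9), math.log(0.5), math.log(0.25)]
    print(weighted_mean_nll(logprobs, [0.0, 1.0, 1.0]))  # (ln 2 + ln 4) / 2 = 1.5 * ln 2
    # Per-token perplexity implied by the verified run's mean NLL, under the same assumption.
    print(math.exp(2.277863025665283))
```

Under that assumption, a mean NLL of about 2.278 corresponds to a per-token perplexity of about 9.76.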
The run completed both stages:
```text
=== Stage 1: Teacher Sampling ===
Teacher model: Qwen/Qwen3-30B-A3B-Instruct-2507
Prompts: 4
[1/4] prompt: Give one practical tip for keeping regular backups.
teacher: One practical tip for keeping regular backups is ...
...
=== Stage 2: Student SFT ===
Student model: Qwen/Qwen3-0.6B
Examples: 4
Steps: 2
Training completed successfully
```
This is a minimal validation recipe: it shows that the teacher-sampling → supervised-dataset → student-SFT path runs end to end. For quality distillation, use more prompts, use domain-specific prompts, add validation, and compare the student's generations before and after training.
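For the validation step suggested above, one simple option is to hold out some prompts before teacher sampling, so the student is compared on prompts it never trained on. A minimal sketch with a hypothetical helper `split_prompts` (not part of the recipe):

```python
import random


def split_prompts(
    prompts: list[str], val_fraction: float = 0.2, seed: int = 0
) -> tuple[list[str], list[str]]:
    """Deterministically shuffle prompts and hold out a validation slice."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    if len(prompts) < 2:
        raise ValueError("need at least two prompts to split")
    shuffled = list(prompts)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split on every run
    # Keep at least one prompt on each side of the split.
    n_val = min(max(1, round(len(shuffled) * val_fraction)), len(shuffled) - 1)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    train, val = split_prompts([f"prompt {i}" for i in range(10)])
    print(len(train), len(val))  # 8 2
```

Sample teacher responses only for the training split; the held-out prompts can then be used to compare the student's generations before and after SFT.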
Why This Shape Works
```text
prompts
   │
   ▼
teacher SamplingClient
   │  samples real teacher responses
   ▼
(prompt, teacher_response) pairs
   │
   ▼
conversation_to_datum()
   │  renderer + assistant-token mask
   ▼
recipe.supervised.train.main()
   │
   ▼
student LoRA checkpoint
```
So the recipe does not just print "distillation pseudocode": both stages make real MinT calls.
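The `renderer + assistant-token mask` step in the diagram is what makes the student learn the teacher's reply rather than the prompt. A toy illustration that assumes nothing about MinT's actual renderer: the hypothetical `render_with_mask` splits on whitespace instead of using a tokenizer, uses made-up role markers such as `<|user|>`, and gives weight 1.0 only to assistant content.

```python
def render_with_mask(conversation: list[dict[str, str]]) -> tuple[list[str], list[float]]:
    """Toy renderer: whitespace 'tokens' with role headers; weight 1.0 only on assistant content."""
    tokens: list[str] = []
    weights: list[float] = []
    for message in conversation:
        header = [f"<|{message['role']}|>"]
        body = message["content"].split() + ["<|end|>"]
        tokens += header + body
        # Headers and user content are context (weight 0.0); only the assistant's
        # content, including its end marker, contributes to the training loss.
        is_assistant = message["role"] == "assistant"
        weights += [0.0] * len(header) + [1.0 if is_assistant else 0.0] * len(body)
    return tokens, weights


if __name__ == "__main__":
    conversation = [
        {"role": "user", "content": "Explain TCP"},
        {"role": "assistant", "content": "Reliable transport"},
    ]
    for token, weight in zip(*render_with_mask(conversation)):
        print(f"{weight:.0f} {token}")
```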