Human Quickstart

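Take a model from zero to a finished training run in 7 steps. Everything runs on the remote MinT server (mint.macaron.xin; use mint-cn.macaron.xin in mainland China). No local GPU is required.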

1. Install MinT

Install the MinT client SDK from the public toolkit repository. Python 3.11+ is required.

pip install git+https://github.com/MindLab-Research/mindlab-toolkit.git

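This step installs mint, the matching tinker>=0.15.0, and runtime helper functions. After installation, import mint patches Tinker's key validation so that MinT's sk-* keys work directly.

If you already have code written against the Tinker client, the cheapest migration is a one-line import change: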

import mint as tinker

The rest of your Tinker code stays unchanged; just set the following environment variables:

export TINKER_BASE_URL=https://mint.macaron.xin/        # mainland China: use mint-cn.macaron.xin
export TINKER_API_KEY=$MINT_API_KEY

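Why this works: the stock import tinker still accepts only keys with the tml- prefix, whereas MinT keys use the sk- prefix. import mint as tinker applies MinT's compatibility patch while keeping the Tinker call shapes you already know. Note: do not call zero_grad_async() in a MinT training loop; the server zeroes gradients automatically.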
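
If you would rather derive the TINKER_* variables inside Python than in the shell, the mapping above can be sketched as follows. The helper name tinker_env_from_mint and its fallback base URL are illustrative assumptions, not part of the MinT SDK; the sketch only mirrors the two export lines and the sk- prefix described above.

```python
def tinker_env_from_mint(
    env: dict[str, str],
    default_base_url: str = "https://mint.macaron.xin/",
) -> dict[str, str]:
    """Build the TINKER_* variables from MINT_* ones (hypothetical helper)."""
    key = env.get("MINT_API_KEY", "")
    # MinT keys use the sk- prefix; stock Tinker keys use tml-.
    if not key.startswith("sk-"):
        raise ValueError("expected a MinT key with the sk- prefix")
    return {
        "TINKER_BASE_URL": env.get("MINT_BASE_URL", default_base_url),
        "TINKER_API_KEY": key,
    }


# Typical use, before any Tinker-style code runs:
#   import os
#   os.environ.update(tinker_env_from_mint(dict(os.environ)))
```

The function takes a plain dict rather than reading os.environ directly, so it can be exercised without touching the real environment.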

2. Configure your API key

Sign up at macaron.im/mindlab/mint (self-service) to get a key. Then set it, together with the base URL, as environment variables:

export MINT_API_KEY=sk-your-api-key-here
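export MINT_BASE_URL=https://mint.macaron.xin/        # mainland China: mint-cn.macaron.xin

If your script reads .env, you can also put these in a .env file in the project root.

3. Pass in data

MinT trains on a sequence of mint.types.Datum objects. Each Datum is a token sequence plus the matching per-token loss weights. The standard conversion for chat data: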

import mint
from mint import types

def process_sft_example(example: dict, tokenizer) -> types.Datum:
    # example = {"prompt": "...", "response": "..."}
    prompt_ids = tokenizer.encode(example["prompt"])
    response_ids = tokenizer.encode(example["response"])
    return types.Datum(
        model_input_ids=prompt_ids + response_ids,
        loss_weights=[0.0] * len(prompt_ids) + [1.0] * len(response_ids),
    )
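
To see what the weights in process_sft_example do, here is a small sketch with no dependency on mint or a real tokenizer. The token IDs are made up and sft_fields is a hypothetical stand-in that returns the two lists instead of a Datum; it only illustrates that prompt tokens get weight 0.0 and response tokens get weight 1.0.

```python
def sft_fields(
    prompt_ids: list[int], response_ids: list[int]
) -> tuple[list[int], list[float]]:
    """Return (tokens, loss_weights) laid out as in process_sft_example."""
    tokens = prompt_ids + response_ids
    # Weight 0.0 masks the prompt; weight 1.0 trains on the response.
    weights = [0.0] * len(prompt_ids) + [1.0] * len(response_ids)
    return tokens, weights


# Made-up IDs: three prompt tokens followed by two response tokens.
tokens, weights = sft_fields([101, 102, 103], [7, 8])

# Only the tokens with a positive weight contribute to the loss.
trained = [t for t, w in zip(tokens, weights) if w > 0]
```

The two lists always have the same length, which is a cheap invariant to assert before sending a batch.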

For RL, sample first, then build each Datum from advantages computed from the rewards; see the RL overview for details.
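
As a rough illustration of "advantages computed from the rewards", the sketch below uses group-relative normalization, a formulation commonly associated with GRPO: each sample's reward is compared with the mean of its group and scaled by the group's standard deviation. This is an assumption about the general technique, not MinT's documented recipe, and it does not show how advantages are attached to a Datum.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four samples for one prompt, two of which passed a verifier.
advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
```

Samples above the group mean get a positive advantage and samples below it get a negative one; a group with identical rewards yields all zeros.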

4. Choose a model

Qwen/Qwen3-0.6B is the lightweight default, suited to fast iteration and a first end-to-end smoke test. For real training jobs, pick from the table below:

Model	Use case
`Qwen/Qwen3-0.6B`	Fast iteration, smoke tests
`Qwen/Qwen3-30B-A3B-Instruct-2507`	Mid-scale chat / instruction following
`Qwen/Qwen3-235B-A22B-Instruct-2507`	Large-scale instruction tuning
`Qwen/Qwen3-235B-A22B-Thinking-2507`	Reasoning / chain-of-thought

See Supported Models for the full list.

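5. Choose an algorithm

You have this data...	Use	MinT call
Labeled prompt → response pairs	SFT	`loss_fn="cross_entropy"`
Chosen / rejected preference pairs	DPO	`forward_backward_custom` with a custom preference loss
Reward / verifier / environment feedback	RL (GRPO)	`loss_fn="importance_sampling"`

If you have both supervised data and rewards, you can run SFT first and then RL, which is what the standard script quickstart.py does.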
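
For the DPO row, the "custom preference loss" is typically the standard DPO objective. The sketch below computes it for a single pair from sequence log-probabilities, in plain Python. It is a minimal illustration of the formula only: the function name and arguments are hypothetical, and how such a loss is wired into forward_backward_custom is not shown here because its signature is not given on this page.

```python
import math


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair; inputs are sequence log-probs."""
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy matches the reference the margin is zero and the loss equals log 2; the loss falls as the policy favors the chosen response more strongly than the reference does.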

6. Start training

A minimal SFT example:

import mint
from mint import types

service_client = mint.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-0.6B",
    rank=16,
    train_mlp=True,
    train_attn=True,
    train_unembed=True,
)

# data: the list[types.Datum] built in step 3
adam_params = types.AdamParams(learning_rate=5e-5)

for step, batch in enumerate(batches_of(data, batch_size=8)):
    fb_future = training_client.forward_backward(batch, loss_fn="cross_entropy")
    optim_future = training_client.optim_step(adam_params)
    fb_result = fb_future.result()
    optim_future.result()
    print(f"step={step} loss={fb_result.loss}")

Training actually runs on the remote MinT server. The script blocks only at .result(); none of the heavy work happens on your machine.
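
The training loop calls batches_of, which this page does not define. One plausible definition, assuming it simply slices the data into consecutive fixed-size chunks and keeps the final short batch, is sketched here; the real helper in quickstart.py may shuffle or drop the remainder.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def batches_of(items: Sequence[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive batches; the last one may be smaller (assumed behavior)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Five items in batches of two: the last batch holds the single leftover item.
example_batches = list(batches_of(list(range(5)), batch_size=2))
```

Because it is a generator, batches are built lazily, so the loop never materializes every batch at once.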

7. Start sampling

Once training finishes, sample from your LoRA:

sampling_client = training_client.save_weights_and_get_sampling_client(name="my-run-v1")

prompt_ids = tokenizer.encode("3 * 7 =")
samples = sampling_client.sample(
    prompt=types.ModelInput.from_ints(prompt_ids),
    sampling_params=types.SamplingParams(max_tokens=16, temperature=0.7),
    num_samples=4,
)
for s in samples.sequences:
    print(tokenizer.decode(s.tokens))

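For more on sampling logs and evaluation, see Concepts → Evaluations.

What to read next?

SFT overview: datasets, renderers, completers, distillation
DPO overview: preference pairs, β tuning
RL overview: GRPO, custom rewards, environments
Customize overview: cross-index of all parameters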

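Troubleshooting. If _require_api_key() raises an exception or the preflight times out, check that MINT_API_KEY and MINT_BASE_URL are set and that you can reach mint.macaron.xin (mainland China: mint-cn). For other issues, see the FAQ.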
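
The environment part of that checklist can be automated with a small local check like the one below. It is a sketch, not part of the SDK: check_mint_env is a hypothetical name, the https requirement is an assumption based on the URLs shown on this page, and it does not test network reachability.

```python
from urllib.parse import urlparse


def check_mint_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems with the MINT_* variables (empty if none)."""
    problems: list[str] = []

    key = env.get("MINT_API_KEY", "")
    if not key:
        problems.append("MINT_API_KEY is not set")
    elif not key.startswith("sk-"):
        problems.append("MINT_API_KEY does not start with sk-")

    url = env.get("MINT_BASE_URL", "")
    parsed = urlparse(url)
    if not url:
        problems.append("MINT_BASE_URL is not set")
    elif parsed.scheme != "https" or not parsed.hostname:
        # Assumption: the server is reached over https, as in the examples above.
        problems.append("MINT_BASE_URL is not an https URL")

    return problems


# Typical use:
#   import os
#   for problem in check_mint_env(dict(os.environ)):
#       print(problem)
```

An empty list means both variables look well formed; a timeout after that points at connectivity rather than configuration.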
