Supported Models
MinT serves base models from two lineup pools, separated by access plan, plus a broader technically-compatible set you can request access to:
- Community Lineup: available on the shared hosted endpoint at mint.macaron.xin via API-key request.
- Enterprise-only Lineup: available via Enterprise plan capacity reservation on a dedicated cluster.
Community Lineup
These are the Qwen3 base models with provisioned capacity on mint.macaron.xin and test coverage in mint-quickstart. The list mirrors the Qwen entries returned by service_client.get_server_capabilities().supported_models from a live preflight. Use one of these for a smoke run if you have no preference.
Qwen3 base models
| Variant | Training Type | Architecture | Size | Context | Default | Tested algorithms |
|---|---|---|---|---|---|---|
| Qwen/Qwen3-0.6B | Hybrid | Dense | Tiny | 32k | yes | SFT, GRPO |
| Qwen/Qwen3-4B-Instruct-2507 | Instruction | Dense | Compact | 32k | cookbook | SFT, DPO, GRPO |
| Qwen/Qwen3-4B-Thinking-2507 | Reasoning | Dense | Compact | 32k | no | SFT, GRPO |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Instruction | MoE | Medium | 32k | optional | SFT, GRPO |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | Instruction | MoE | Large | 32k | no | SFT, GRPO |
Per-model notes
- Qwen3-0.6B: lightweight default. Quickstart, custom_reward, custom_loss, and sampling_log all run on this.
- Qwen3-4B-Instruct-2507: base model for the four maintained cookbook recipes (dapo-aime, chat-dpo, fingpt, lawbench).
- Qwen3-4B-Thinking-2507: reasoning / chain-of-thought variant of the 4B model.
- Qwen3-30B-A3B-Instruct-2507: mid-scale instruction following.
- Qwen3-235B-A22B-Instruct-2507: large-scale instruction tuning. Volcano A800 cluster: inference_tp=16, train_tp=4, train_pp=1, train_ep=8.
Column legend
- Training Type: Hybrid supports both thinking and non-thinking modes; Instruction is chat-tuned without chain-of-thought; Reasoning always emits chain-of-thought before the visible response.
- Architecture: Dense activates all parameters per token; MoE is a sparse mixture-of-experts.
- Size: total parameters, not active. Tiny < 1B; Compact 1B–4B; Medium 30B–32B; Large 70B+.
- Context: native context window without YaRN extension.
- Default: yes is the quickstart default; cookbook is the default in maintained cookbook recipes; optional means scripts accept it via MINT_BASE_MODEL; no means request-only.
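As a rough illustration of how the Default column might translate into script behavior, the sketch below resolves a base model from MINT_BASE_MODEL and falls back to the quickstart default. The `DEFAULT_STATUS` lookup and the `resolve_base_model` helper are hypothetical names written for this page, not part of mint-quickstart, and the real scripts may read the variable differently.

```python
import os

# Default column copied from the Community Lineup table above.
# Hypothetical lookup for illustration; mint-quickstart does not export it.
DEFAULT_STATUS = {
    "Qwen/Qwen3-0.6B": "yes",
    "Qwen/Qwen3-4B-Instruct-2507": "cookbook",
    "Qwen/Qwen3-4B-Thinking-2507": "no",
    "Qwen/Qwen3-30B-A3B-Instruct-2507": "optional",
    "Qwen/Qwen3-235B-A22B-Instruct-2507": "no",
}

QUICKSTART_DEFAULT = "Qwen/Qwen3-0.6B"


def resolve_base_model(env=None):
    """Return (model_id, status) for the model a script would pick up.

    `env` defaults to os.environ; pass a dict to try other settings.
    Models outside the table are reported as "unlisted".
    """
    env = os.environ if env is None else env
    model = env.get("MINT_BASE_MODEL") or QUICKSTART_DEFAULT
    status = DEFAULT_STATUS.get(model, "unlisted")
    return model, status


model, status = resolve_base_model()
if status in ("no", "unlisted"):
    print(f"{model}: request access first (Default column: {status})")
else:
    print(f"{model}: accepted by scripts (Default column: {status})")
```

Keeping the lookup next to the environment read makes it easy to warn early when a request-only model is selected, before any training job is submitted.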
Override the default by setting MINT_BASE_MODEL before running any quickstart script:
export MINT_BASE_MODEL=Qwen/Qwen3-30B-A3B-Instruct-2507
python quickstart/quickstart.py

Embodied / VLA
| Variant | Tested algorithms | Notes |
|---|---|---|
| mintx.OPENPI_FAST_MODEL (constant) | VLA via SDK / HTTP | Embodied-agent track. See VLA. |
Enterprise-only Lineup
These models require an Enterprise plan. Capacity is provisioned per customer on a dedicated cluster — they are not available on the Community shared endpoint. Canonical Hugging Face IDs are confirmed at the time of capacity provisioning.
| Family | Variant | Tested algorithms | Notes |
|---|---|---|---|
| GLM | GLM-5 | SFT, RL | Zhipu GLM-5 family. Provisioned on customer cluster on request. |
| GLM | GLM-5.1 | SFT, RL | Successor to GLM-5; capacity reserved per customer. |
| Kimi | Kimi-K2 | SFT, RL | Moonshot Kimi-K2. Long-context workloads. |
| Kimi | Kimi-K2.5 | SFT, RL | Successor to Kimi-K2. |
| DeepSeek | DeepSeek-V3 | SFT, RL | DeepSeek V3 base model. |
To reserve capacity for any of these, email sales@mindlab.ltd or Schedule a Demo and mention the model and workload class.
Technically Compatible
The MinT server endpoint accepts any Hugging Face Hub-style model string that maps to one of the supported transformer architecture families below. The two lineups above are the explicitly tested subset. Families known to work in principle:
- Qwen series (Qwen2.5, Qwen3 — Instruct, Thinking, Coder variants)
- Llama 3.x family
- Gemma 2.x and 3.x
- DeepSeek family
There is no in-repo evidence of these being run end-to-end on MinT today. If you need one of them on the lineup with provisioned capacity, request access (below).
Request a Model
If a model you need is not listed:
- Email sales@mindlab.ltd with the model identifier and your intended workload (SFT / DPO / RL, batch size, expected weeks of usage).
- Or open an issue at the public mint-quickstart repo: github.com/MindLab-Research/mint-quickstart/issues.
VLM (Vision-Language Model) base models are tracked separately as a server capability — see the VLM page for the current state.
How to know what your endpoint actually serves. A successful preflight returns a capabilities.supported_models list. The default quickstart.py prints Auth preflight: OK (N supported models) — call service_client.get_server_capabilities() from your own script to enumerate them.
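A minimal sketch of that enumeration, assuming you already have a `service_client` constructed the way quickstart.py does it. Only `get_server_capabilities()` and `supported_models` come from this page; the `list_supported_models` helper is hypothetical, and the fallback to a `model_name` attribute is an assumption about how entries may be shaped, so adjust it to what your endpoint returns.

```python
def list_supported_models(service_client, prefix=""):
    """Return the sorted model identifiers reported by the endpoint.

    Each entry is assumed to be either a plain string or an object with a
    `model_name` attribute (an assumption, not confirmed by this page).
    Entries of any other shape are skipped.
    """
    capabilities = service_client.get_server_capabilities()
    names = []
    for entry in capabilities.supported_models:
        if isinstance(entry, str):
            name = entry
        else:
            name = getattr(entry, "model_name", None)
        # Keep only identifiers that match the requested prefix.
        if name and name.startswith(prefix):
            names.append(name)
    return sorted(names)
```

Calling `list_supported_models(service_client, prefix="Qwen/")` would keep only the Qwen entries, which is the set the Community Lineup table mirrors.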