Mind Lab Toolkit (MinT)

Rendering

Rendering converts a list of messages into a token sequence that a model can consume. Similar in spirit to HuggingFace chat templates, MinT's rendering system covers the full training lifecycle: supervised fine-tuning, reinforcement learning, and deployment. The renderer sits between your high-level conversation data and the low-level tokens the model sees.

Concept

A renderer maps between two worlds: structured message dicts (with roles like "user", "assistant", "system") and flat token sequences. During SFT, you use build_supervised_example() to separate prompt tokens (masked from loss) and completion tokens (where the model learns). During RL and sampling, you use build_generation_prompt() to construct a prompt, then parse_response() to decode the model's sampled tokens back into a message.

Renderers are model-family-specific. Qwen3, Llama 3, DeepSeek V3, and other families each have their own special tokens and format conventions. MinT ships renderers for the major families; each one applies the correct chat format:

Messages (list of dicts)  -->  Renderer  -->  Token IDs (list of ints)
                                              + loss masks (for SFT)
                                              + stop tokens (for sampling)
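
To make the SFT direction concrete, here is a minimal sketch that prints which tokens carry loss. It assumes the renderer and tokenizer setup shown under Pattern below, and that ModelInput exposes a to_ints() accessor for the raw token IDs (an assumption; check your MinT version):

model_input, weights = renderer.build_supervised_example(messages)
token_ids = model_input.to_ints()  # assumed accessor for the raw token IDs

# Weight 0.0 = masked from the loss (prompt); weight 1.0 = trained on (completion)
for tok, w in zip(token_ids, weights):
    marker = "TRAIN" if w > 0 else "mask"
    print(f"{marker:>5}  {tok:>6}  {tokenizer.decode([tok])!r}")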

Pattern

Here is the canonical pattern for rendering a conversation:

import mint
from mint import types

# Initialize renderer and tokenizer
service_client = mint.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-0.6B",
    rank=16,
)
tokenizer = training_client.get_tokenizer()
renderer = mint.renderers.get_renderer("qwen3", tokenizer)

# Build a conversation
messages = [
    {"role": "system", "content": "Answer concisely; at most one sentence per response"},
    {"role": "user", "content": "What is the longest-lived rodent species?"},
    {"role": "assistant", "content": "The naked mole rat, which can live over 30 years."},
    {"role": "user", "content": "How do they live so long?"},
    {
        "role": "assistant",
        "content": "They evolved protective mechanisms including special hyaluronic acid that prevents cancer, efficient DNA repair systems, and extremely stable proteins.",
    },
]

# For SFT: build tokens + loss weights
model_input, weights = renderer.build_supervised_example(messages)

# For sampling: build a prompt for the model to complete
prompt = renderer.build_generation_prompt(messages[:-1])  # Exclude last assistant message
stop_sequences = renderer.get_stop_sequences()  # When to stop sampling

# Parse sampled tokens back into a message
sampled_tokens = [45, 7741, 34651, 31410]  # (in practice, from model.sample)
message, success = renderer.parse_response(sampled_tokens)

View full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/concepts/rendering.py
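
Sampling can produce malformed output, for example a response that runs past the expected end-of-turn marker, so parse_response() returns a success flag alongside the message. A minimal sketch of checking it; the discard policy here is illustrative, not part of MinT:

message, success = renderer.parse_response(sampled_tokens)
if not success:
    # The tokens did not form a well-formed assistant turn. Common
    # policies: drop the sample, or assign it zero reward during RL.
    message = None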

API Surface

Method                                              Purpose                                              Returns
build_supervised_example(messages, train_on_what)   Convert messages to tokens + loss weights for SFT    (ModelInput, weights_array)
build_generation_prompt(messages)                   Convert messages to a prompt for model completion    ModelInput
get_stop_sequences()                                Tokens that signal end-of-generation                 list[int | str]
parse_response(tokens)                              Convert sampled tokens back to a message             (dict, bool), where bool is a success flag

Renderers (use get_renderer(name, tokenizer))

  • "qwen3" — Qwen3 with thinking enabled (default)
  • "qwen3_disable_thinking" — Qwen3 without thinking
  • "llama3" — Llama 3
  • "deepseekv3" — DeepSeek V3 (non-thinking)
  • "deepseekv3_thinking" — DeepSeek V3 (thinking mode)
  • "nemotron3" — NVIDIA Nemotron 3
  • "kimi_k2" — Kimi K2

Vision renderers (for VLMs like Qwen3-VL):

from mint.image_processing_utils import get_image_processor

image_processor = get_image_processor("Qwen/Qwen3-VL-235B-A22B-Instruct")
renderer = mint.renderers.get_renderer("qwen3_vl_instruct", tokenizer, image_processor=image_processor)
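
The image processor converts raw images into the patch features the vision tower expects; the renderer then interleaves the resulting image tokens with the text tokens, which is why vision models need both components.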

Caveats & Pitfalls

  • Renderer mismatch: Always use get_renderer() with the correct model family. A Llama 3 renderer paired with a Qwen3 model will emit the wrong special tokens.
  • Vision encoding: VL models require both a tokenizer and an image processor. Missing the image processor will cause shape mismatches on vision inputs.
  • Loss masking: By default, build_supervised_example() trains only on the final assistant message. Use train_on_what=TrainOnWhat.ALL_ASSISTANT_MESSAGES to train on all assistant turns (see the sketch after this list).
  • Stop tokens: get_stop_sequences() returns stop tokens (IDs or strings) specific to the model family. For sampling, always pass these to SamplingParams(stop=...), as sketched below.
  • Custom formats: For unlisted model families, use register_renderer() to define your own renderer before training. Falling back to a different family's renderer will silently apply the wrong special tokens.
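
The loss-masking and stop-token caveats are easy to miss in combination, so here is a sketch of both. It assumes TrainOnWhat and SamplingParams are importable from mint.types (an assumption; check your MinT version):

from mint.types import SamplingParams, TrainOnWhat

# Train on every assistant turn rather than only the final one
model_input, weights = renderer.build_supervised_example(
    messages,
    train_on_what=TrainOnWhat.ALL_ASSISTANT_MESSAGES,
)

# Thread the renderer's stop sequences into sampling so generation halts
# at the model family's end-of-turn markers (max_tokens is an assumed field)
sampling_params = SamplingParams(
    max_tokens=256,
    stop=renderer.get_stop_sequences(),
)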
