Planned Feature: The mint.renderers module described in this document is not yet implemented. This page serves as a design reference for the upcoming multi-turn training support.
Multi-turn Training
Multi-turn training allows you to train models on conversational data with multiple turns of interaction. MinT uses a renderer system to efficiently handle multi-turn conversations.
Why Renderers?
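In multi-turn RL, each turn builds on the previous turns. A naive approach reprocesses the entire conversation at every turn, so the total work over T turns grows as O(T^2). MinT’s renderer system relies on the extension property: each turn's rendered sequence extends the previous turn's sequence, so only the new tokens need to be processed and the total work is O(T).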
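To make the complexity claim concrete, here is a small arithmetic sketch that counts how many tokens each approach processes over an episode. The function names and the turn lengths are hypothetical and chosen only for illustration; they are not part of MinT.

```python
def naive_tokens_processed(turn_lengths):
    """Tokens processed when every turn re-encodes the whole conversation so far."""
    total = 0
    prefix = 0
    for n in turn_lengths:
        prefix += n      # conversation length after this turn
        total += prefix  # the full prefix is processed again
    return total


def extension_tokens_processed(turn_lengths):
    """Tokens processed when each turn only encodes its own new tokens."""
    return sum(turn_lengths)


turn_lengths = [100] * 10  # hypothetical: 10 turns of 100 tokens each
print(naive_tokens_processed(turn_lengths))      # 5500
print(extension_tokens_processed(turn_lengths))  # 1000
```

With a fixed number of tokens per turn, the naive total is proportional to T(T+1)/2, while the extension total is proportional to T.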
Naive approach (O(T^2)):
Turn 1: [system, user1, assistant1]
Turn 2: [system, user1, assistant1, user2, assistant2]
Turn 3: [system, user1, assistant1, user2, assistant2, user3, assistant3]
...
Extension property (O(T)):
Turn 1: [system, user1, assistant1]
Turn 2: [user2, assistant2] # extends Turn 1
Turn 3: [user3, assistant3] # extends Turn 2
Using Renderers
MinT provides built-in renderers for common chat formats:
from mint.renderers import get_renderer
# Get renderer for Qwen3 instruct format
renderer = get_renderer("qwen3_instruct")
# Render a conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "How are you?"},
]
# `tokenizer` is the model's tokenizer, e.g. from training_client.get_tokenizer()
tokens = renderer.render(messages, tokenizer)
Available Renderers
| Renderer | Model Family | Format |
|---|---|---|
| qwen3_instruct | Qwen3 | <\|im_start\|>role\ncontent<\|im_end\|> |
| kimi_k2 | Kimi-K2 | Similar to Qwen3 with thinking blocks |
| deepseekv3 | DeepSeek-V3 | DeepSeek chat format |
| role_colon | Generic | role: content\n |
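As a rough illustration of two of the formats in the table, the sketch below builds the text layout for the `role_colon` and Qwen3-style rows by hand. The helper functions are hypothetical and are not the `mint.renderers` implementation; in particular, the newline placed after each `<|im_end|>` is an assumption, since the table does not say how messages are joined.

```python
def render_role_colon(messages):
    # Generic layout from the table: one "role: content" line per message.
    return "".join(f"{m['role']}: {m['content']}\n" for m in messages)


def render_chatml(messages):
    # Qwen3-style layout from the table.
    # Assumption: each message is followed by a newline after <|im_end|>.
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
]

print(render_role_colon(messages), end="")
# user: Hello!
# assistant: Hi there!

print(render_chatml(messages), end="")
# <|im_start|>user
# Hello!<|im_end|>
# <|im_start|>assistant
# Hi there!<|im_end|>
```

A real renderer would additionally map this text to token IDs with the model's tokenizer.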
Multi-turn RL Example
import mint
from mint import types
from mint.renderers import get_renderer
service_client = mint.ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3-4B-Instruct-2507",
    rank=16,
)
tokenizer = training_client.get_tokenizer()
renderer = get_renderer("qwen3_instruct")
# Environment with multi-turn interaction.
# `sampling_client` and `compute_reward` are not defined on this page:
# pass in a sampling client for the model and your own reward function.
def run_episode(sampling_client, compute_reward):
    messages = [{"role": "system", "content": "Count down from the given number."}]
    total_reward = 0
    for turn in range(3):
        # Add user message
        messages.append({"role": "user", "content": f"Start from {5 - turn}"})
        # Get model response
        prompt_tokens = renderer.render(messages, tokenizer)
        result = sampling_client.sample(
            prompt=types.ModelInput.from_ints(tokens=prompt_tokens),
            sampling_params=types.SamplingParams(max_tokens=50, temperature=0.8),
        ).result()
        response = tokenizer.decode(result.sequences[0].tokens)
        messages.append({"role": "assistant", "content": response})
        # Compute reward for this turn
        reward = compute_reward(response, expected=f"{5 - turn}, {4 - turn}, ...")
        total_reward += reward
    return messages, total_reward
Extension Property Optimization
When training on multi-turn data, use the extension property to avoid redundant computation:
# Instead of recomputing full conversation each turn:
# Turn 1: forward([sys, u1, a1])
# Turn 2: forward([sys, u1, a1, u2, a2]) # redundant prefix
# Use extension:
# Turn 1: forward([sys, u1, a1])
# Turn 2: forward([u2, a2], extends=turn1_state) # only new tokens
This is handled automatically by MinT’s renderer system when you use the extend parameter.
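The idea of processing only the new tokens can be sketched with a small cache that remembers how many messages have already been rendered. Everything here is hypothetical and for illustration only: `IncrementalRenderer`, `toy_tokenize`, and the `role_colon`-style layout are stand-ins, not the planned `mint.renderers` API. The character-level tokenizer is chosen so that rendering message by message gives exactly the same tokens as rendering the whole conversation at once; with a real subword tokenizer this holds only if message boundaries are also token boundaries.

```python
def toy_tokenize(text):
    # Stand-in tokenizer: one token per character (its Unicode code point).
    return [ord(c) for c in text]


def render_full(messages):
    # Render the whole conversation in a "role: content" layout.
    tokens = []
    for m in messages:
        tokens.extend(toy_tokenize(f"{m['role']}: {m['content']}\n"))
    return tokens


class IncrementalRenderer:
    """Caches rendered tokens so each call only renders unseen messages."""

    def __init__(self):
        self.tokens = []       # all tokens rendered so far
        self.num_rendered = 0  # number of messages already rendered

    def extend(self, messages):
        new_tokens = render_full(messages[self.num_rendered:])
        self.tokens.extend(new_tokens)
        self.num_rendered = len(messages)
        return new_tokens


messages = [
    {"role": "system", "content": "Count down."},
    {"role": "user", "content": "Start from 3"},
    {"role": "assistant", "content": "3, 2, 1"},
]
state = IncrementalRenderer()
turn1_new = state.extend(messages)  # renders all three messages

messages += [
    {"role": "user", "content": "Start from 2"},
    {"role": "assistant", "content": "2, 1"},
]
turn2_new = state.extend(messages)  # renders only the two new messages

print(state.tokens == render_full(messages))  # True
print(len(turn2_new) < len(state.tokens))     # True
```

The cached sequence always matches a full re-render, but each call does work proportional to the new messages only, which is the source of the O(T) total.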