# Completers
Completers provide two levels of abstraction over the sampling client. `TokenCompleter` operates on token IDs and raw `ModelInput` objects — used by RL loops that work at the token level. `MessageCompleter` operates on message dicts with role and content — used by evaluators, LLM-as-judge patterns, and chat applications.
## Concept
The sampling client returns raw token IDs. For most training and evaluation workflows, you need higher-level abstractions:
- `TokenCompleter` — Input: `ModelInput` (tokens). Output: `TokensWithLogprobs` (tokens + log-probabilities). Used in RL rollouts, online reward collection, and token-level analysis.
- `MessageCompleter` — Input: `list[Message]` (role + content). Output: `Message` (structured response). Used in evaluators, LLM judges, multi-turn interactions, and production inference.
Both completers handle:
- **Stop tokens** — Generation stops automatically when a stop sequence is reached.
- **Sampling parameters** — Temperature, top-p, max tokens, and other standard LLM sampling controls.
- **Logprobs** — For RL and importance weighting, you can retrieve log-probabilities alongside tokens (see the sketch below).
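As a hedged illustration of the logprobs point above, sampler log-probabilities are typically paired with freshly computed policy log-probabilities to form per-token importance weights. The variable names below are illustrative, not part of the mint API:

```python
import math

# Illustrative values: sampler_logprobs would come from
# TokensWithLogprobs.logprobs; policy_logprobs would be recomputed for
# the same tokens by the trainer's forward pass.
sampler_logprobs = [-0.52, -1.17, -0.83]
policy_logprobs = [-0.49, -1.30, -0.80]

# Per-token importance weight: exp(logprob_policy - logprob_sampler).
weights = [math.exp(p - s) for p, s in zip(policy_logprobs, sampler_logprobs)]
```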
## Pattern
```python
import mint
from mint.completers import TinkerTokenCompleter, TinkerMessageCompleter
from mint.renderers import get_renderer

service_client = mint.ServiceClient()
sampling_client = service_client.create_sampling_client(base_model="Qwen/Qwen3-0.6B")
tokenizer = sampling_client.get_tokenizer()
renderer = get_renderer("qwen3", tokenizer)

# Example 1: TokenCompleter for RL rollouts
token_completer = TinkerTokenCompleter(sampling_client=sampling_client)

prompt_ids = tokenizer.encode("The capital of France is")
prompt = mint.types.ModelInput.from_ints(prompt_ids)
sampling_params = mint.types.SamplingParams(
    max_tokens=16,
    temperature=0.7,
    stop=renderer.get_stop_sequences(),
)
token_result = token_completer.complete(
    prompt=prompt,
    sampling_params=sampling_params,
)
print(f"Tokens: {token_result.tokens}")
print(f"Logprobs: {token_result.logprobs}")

# Example 2: MessageCompleter for evaluation
message_completer = TinkerMessageCompleter(
    sampling_client=sampling_client,
    renderer=renderer,
)
messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is 7 * 8?"},
]
message_result = message_completer.complete(
    messages=messages,
    sampling_params=mint.types.SamplingParams(max_tokens=32, temperature=0.0),
)
print(f"Response: {message_result}")  # {"role": "assistant", "content": "..."}
```

View full source: https://github.com/MindLab-Research/mint-quickstart/blob/main/concepts/completers.py
## API Surface
| Class | Input | Output | Use case |
|---|---|---|---|
| `TinkerTokenCompleter` | `ModelInput` (tokens) | `TokensWithLogprobs` | RL loops, token-level analysis |
| `TinkerMessageCompleter` | `list[Message]` (role/content) | `Message` (role/content) | Evaluation, LLM-as-judge, chat |
Common parameters:

- `sampling_client` — The underlying `SamplingClient` instance.
- `renderer` (`MessageCompleter` only) — Handles message → tokens and tokens → message conversion.
- `sampling_params` — A `SamplingParams` instance (`max_tokens`, `temperature`, `top_p`, `stop`, ...).
Return types:

- `TokensWithLogprobs` — Named tuple: `(tokens: list[int], logprobs: list[float])`.
- `Message` — Dict: `{"role": str, "content": str}`.
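A minimal sketch of consuming both return types, continuing from the Pattern example above (the `decode` call assumes the tokenizer exposes the standard Hugging Face interface):

```python
# TokensWithLogprobs: parallel lists of token IDs and log-probabilities.
tokens, logprobs = token_result.tokens, token_result.logprobs
assert len(tokens) == len(logprobs)       # one logprob per sampled token
completion_text = tokenizer.decode(tokens)

# Message: a plain role/content dict.
reply_text = message_result["content"]
```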
## Caveats & Pitfalls
- **Stop sequences:** Always pass `stop=renderer.get_stop_sequences()` to prevent over-generation. Missing stop tokens can cause the model to continue past the intended message boundary.
- **MessageCompleter needs a renderer:** A `MessageCompleter` without a renderer will fail on the first `.complete()` call. Always initialize with the correct renderer for your model family.
- **Logprobs cost:** Requesting logprobs adds a small computational overhead. In large-scale RL, consider batching completions or limiting logprob retrieval to important trajectories.
- **Async variants:** Use `complete_async()` for concurrent completions. Always gather futures before calling `.result()` to maximize throughput (see the sketch after this list).
- **Sampler desync:** After saving and reloading model weights, create a new `SamplingClient` and a new completer. A stale completer silently samples from old weights.
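A minimal sketch of the async pattern from the caveat above, continuing from the Pattern example and assuming `complete_async()` returns a future-like object with a `.result()` method:

```python
# Submit every request before collecting any result, so completions run
# concurrently instead of serially.
prompts = [
    mint.types.ModelInput.from_ints(tokenizer.encode(text))
    for text in ["2 + 2 =", "3 * 3 ="]
]
futures = [
    token_completer.complete_async(prompt=p, sampling_params=sampling_params)
    for p in prompts
]
results = [f.result() for f in futures]  # block only after all submissions
```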