# OpenPI VLA HTTP
This page documents `demos/embodied/openpi_vla_http.py` in mint-quickstart.
## What this demo does
- Uses direct HTTP instead of the higher-level `mintx` SDK helper, so the current OpenPI FAST wire format stays visible.
- Runs one minimal VLA training round-trip: `create_session` -> `create_model` -> `train_step` -> `save_weights_for_sampler` -> delete model.
- Shows the exact payload split between `model_input` and `loss_fn_inputs`.
- Uses three 1x1 PNG placeholders so the script stays self-contained.
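A 1x1 placeholder PNG like the ones the demo uses can be built with the standard library alone. This is a sketch for illustration; the actual script may simply embed a precomputed `ONE_PIXEL_PNG` constant instead:

```python
import base64
import struct
import zlib

def _chunk(tag: bytes, data: bytes) -> bytes:
    # PNG chunk: 4-byte length, type tag, data, CRC-32 over tag + data.
    return struct.pack(">I", len(data)) + tag + data + struct.pack(
        ">I", zlib.crc32(tag + data)
    )

def one_pixel_png(rgb=(0, 0, 0)) -> bytes:
    # Minimal 1x1 truecolor PNG: signature + IHDR + IDAT + IEND.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1, 8-bit, RGB
    raw = bytes([0, *rgb])  # one scanline: filter byte 0 + one RGB pixel
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )

# Base64 form, as the wire format expects for image chunk data.
ONE_PIXEL_PNG_B64 = base64.b64encode(one_pixel_png()).decode("ascii")
```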
## Wire shape
### `model_input.chunks`
1. image: `base_0_rgb`
2. image: `left_wrist_0_rgb`
3. image: `right_wrist_0_rgb`
4. encoded_text: `prefix_tokens`
### `loss_fn_inputs`
- `state`
- `target_tokens`
- `weights`
- `token_ar_mask`
- optional: `logprobs` + `advantages`

## How to run
```bash
pip install httpx python-dotenv
export MINT_API_KEY=sk-...
python demos/embodied/openpi_vla_http.py
```

Use the MinT endpoint that matches your region:
- Mainland China: `https://mint-cn.macaron.xin/`
- Outside Mainland China: `https://mint.macaron.xin/`
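For scripting, the regional choice can be expressed as a small helper. This is a sketch; the precedence of an explicit `MINT_BASE_URL` over the regional default is an assumption about the demo's behavior:

```python
import os

MAINLAND_URL = "https://mint-cn.macaron.xin/"
GLOBAL_URL = "https://mint.macaron.xin/"

def pick_base_url(in_mainland_china: bool) -> str:
    # An explicit MINT_BASE_URL wins; otherwise fall back to the regional default.
    return os.environ.get("MINT_BASE_URL") or (
        MAINLAND_URL if in_mainland_china else GLOBAL_URL
    )
```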
## Parameters (env vars)
- `MINT_API_KEY` / `TINKER_API_KEY`: auth
- `MINT_BASE_URL` / `TINKER_BASE_URL`: server endpoint
- `MINT_OPENPI_HTTP_BASE_MODEL`: default `openpi/pi0-fast-libero-low-mem-finetune`
- `MINT_OPENPI_HTTP_LORA_RANK`: default `16`
- `MINT_OPENPI_HTTP_LR`: default `0.003`
- `MINT_OPENPI_HTTP_SAMPLER_PATH`: default `mint-openpi-vla-http-example`
- `MINT_OPENPI_HTTP_CLIENT_TIMEOUT_SECONDS`: default `120`
- `MINT_OPENPI_HTTP_FUTURE_TIMEOUT_SECONDS`: default `1200`
## Key payload code
```python
def build_openpi_fast_datum_payload(
    *,
    prefix_tokens,
    image_bytes_by_camera,
    state,
    target_tokens,
    weights,
    token_ar_mask,
    logprobs=None,
    advantages=None,
):
    return {
        "model_input": {
            "chunks": [
                # One image chunk per camera, in the fixed order:
                # base_0_rgb, left_wrist_0_rgb, right_wrist_0_rgb.
                {
                    "data": "<base64 png>",
                    "format": "png",
                    "expected_tokens": 256,
                    "type": "image",
                },
                {
                    "data": "<base64 png>",
                    "format": "png",
                    "expected_tokens": 256,
                    "type": "image",
                },
                {
                    "data": "<base64 png>",
                    "format": "png",
                    "expected_tokens": 256,
                    "type": "image",
                },
                # Pre-tokenized text prompt.
                {"tokens": list(prefix_tokens), "type": "encoded_text"},
            ]
        },
        "loss_fn_inputs": {
            "state": {"data": list(state), "shape": [len(state)], "dtype": "float32"},
            "target_tokens": {"data": list(target_tokens), "shape": [len(target_tokens)], "dtype": "int64"},
            "weights": {"data": list(weights), "shape": [len(weights)], "dtype": "float32"},
            "token_ar_mask": {"data": list(token_ar_mask), "shape": [len(token_ar_mask)], "dtype": "int32"},
        },
    }
```

## Future polling code
```python
def poll_future(client, *, request_id, timeout_seconds=None, sleep=time.sleep):
    if timeout_seconds is None:
        # Matches the MINT_OPENPI_HTTP_FUTURE_TIMEOUT_SECONDS default.
        timeout_seconds = 1200
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.post("/api/v1/retrieve_future", json={"request_id": request_id})
        if response.status_code == 408:
            # 408 means "not ready yet": wait the server-suggested delay and retry.
            payload = response.json()
            delay_seconds = _poll_delay_seconds(payload, response.headers)
            if time.monotonic() + delay_seconds > deadline:
                raise TimeoutError(...)
            sleep(delay_seconds)
            continue
        response.raise_for_status()
        return response.json()
```

## Main flow
```python
def run_example():
    session = client.post("/api/v1/create_session", json=build_create_session_request())
    create_future = client.post("/api/v1/create_model", json=build_create_model_request(...))
    create_result = poll_future(client, request_id=create_future.json()["request_id"])
    step_future = client.post("/api/v1/train_step", json=build_train_step_request(...))
    step_result = poll_future(client, request_id=step_future.json()["request_id"])
    sampler_future = client.post(
        "/api/v1/save_weights_for_sampler",
        json=build_save_weights_for_sampler_request(...),
    )
    sampler_result = poll_future(client, request_id=sampler_future.json()["request_id"])
```

## Expected output shape
The script prints a Python dict with these keys:

- `session_id`
- `model_id`
- `model_info`
- `train_step`
- `sampler`
- `models`
- `delete_model`

## Important notes
- The main MinT SDK path now lives in `mint.mint/mintx`; see OpenPI VLA SDK.
- This example is intentionally raw HTTP: it documents the exact request shape directly instead of going through the SDK helper layer.
- The 1x1 PNGs are placeholders only. For a real task, replace them with real camera frames and real robot rollout tensors.
- `save_weights_for_sampler` confirms that the trained weights can be exported for later sampling or downstream evaluation.
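When swapping in real rollout data, the `loss_fn_inputs` tensors keep the same flat wire shape shown in the payload code above (data list, 1-D shape, dtype string). A helper sketch, with illustrative function names:

```python
def tensor(values, dtype: str) -> dict:
    # Flat wire tensor: data list, 1-D shape, and an explicit dtype string.
    data = list(values)
    return {"data": data, "shape": [len(data)], "dtype": dtype}

def loss_fn_inputs_from_rollout(state, target_tokens, weights, token_ar_mask) -> dict:
    # dtypes mirror the demo payload: float32 state/weights, int64 tokens, int32 mask.
    return {
        "state": tensor(state, "float32"),
        "target_tokens": tensor(target_tokens, "int64"),
        "weights": tensor(weights, "float32"),
        "token_ar_mask": tensor(token_ar_mask, "int32"),
    }
```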
## Next steps
- Use OpenPI VLA SDK for the main MinT SDK path.
- Replace `ONE_PIXEL_PNG` with real `base_0_rgb` / wrist camera frames.
- Replace the toy `state` and `target_tokens` with real rollout data.
- Use the saved sampler path in a later evaluation or rollout client.