# OpenPI VLA HTTP
This page documents `demos/embodied/openpi_vla_http.py` in mint-quickstart.
## What this demo does
- Uses direct HTTP instead of the higher-level `mintx` SDK helper, so the current OpenPI FAST wire format stays visible.
- Runs one minimal VLA training round-trip: `create_session` -> `create_model` -> `train_step` -> `save_weights_for_sampler` -> delete model.
- Shows the exact payload split between `model_input` and `loss_fn_inputs`.
- Uses three 1x1 PNG placeholders so the script stays self-contained.
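A 1x1 placeholder PNG like the ones the demo uses can be built with the standard library alone. This is a sketch for illustration; the actual script may simply embed a precomputed `ONE_PIXEL_PNG` constant instead:

```python
import base64
import struct
import zlib

def _chunk(tag: bytes, data: bytes) -> bytes:
    # PNG chunk: 4-byte length, type tag, data, CRC-32 over tag + data.
    return struct.pack(">I", len(data)) + tag + data + struct.pack(
        ">I", zlib.crc32(tag + data)
    )

def one_pixel_png(rgb=(0, 0, 0)) -> bytes:
    # Minimal 1x1 truecolor PNG: signature + IHDR + IDAT + IEND.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1, 8-bit, RGB
    raw = bytes([0, *rgb])  # one scanline: filter byte 0 + one RGB pixel
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )

# Base64 form, as the wire format expects for image chunk data.
ONE_PIXEL_PNG_B64 = base64.b64encode(one_pixel_png()).decode("ascii")
```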
## Wire shape
### `model_input.chunks`
1. image: `base_0_rgb`
2. image: `left_wrist_0_rgb`
3. image: `right_wrist_0_rgb`
4. encoded_text: `prefix_tokens`
### `loss_fn_inputs`
- `state`
- `target_tokens`
- `weights`
- `token_ar_mask`
- optional: `logprobs` + `advantages`

## How to run
```bash
pip install httpx python-dotenv
export MINT_API_KEY=sk-...
python demos/embodied/openpi_vla_http.py
```

Use the MinT endpoint that matches your region:
- Mainland China: `https://mint-cn.macaron.xin/`
- Outside Mainland China: `https://mint.macaron.xin/`
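For scripting, the regional choice can be expressed as a small helper. This is a sketch; the precedence of an explicit `MINT_BASE_URL` over the regional default is an assumption about the demo's behavior:

```python
import os

MAINLAND_URL = "https://mint-cn.macaron.xin/"
GLOBAL_URL = "https://mint.macaron.xin/"

def pick_base_url(in_mainland_china: bool) -> str:
    # An explicit MINT_BASE_URL wins; otherwise fall back to the regional default.
    return os.environ.get("MINT_BASE_URL") or (
        MAINLAND_URL if in_mainland_china else GLOBAL_URL
    )
```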
## Parameters (env vars)
- `MINT_API_KEY` / `TINKER_API_KEY`: auth
- `MINT_BASE_URL` / `TINKER_BASE_URL`: server endpoint
- `MINT_OPENPI_HTTP_BASE_MODEL`: default `openpi/pi0-fast-libero-low-mem-finetune`
- `MINT_OPENPI_HTTP_LORA_RANK`: default `16`
- `MINT_OPENPI_HTTP_LR`: default `0.003`
- `MINT_OPENPI_HTTP_SAMPLER_PATH`: default `mint-openpi-vla-http-example`
- `MINT_OPENPI_HTTP_CLIENT_TIMEOUT_SECONDS`: default `120`
- `MINT_OPENPI_HTTP_FUTURE_TIMEOUT_SECONDS`: default `1200`
## Key payload code
```python
def build_openpi_fast_datum_payload(
    *,
    prefix_tokens,
    image_bytes_by_camera,
    state,
    target_tokens,
    weights,
    token_ar_mask,
    logprobs=None,
    advantages=None,
):
    return {
        "model_input": {
            "chunks": [
                # One image chunk per camera, in the fixed order:
                # base_0_rgb, left_wrist_0_rgb, right_wrist_0_rgb.
                {
                    "data": "<base64 png>",
                    "format": "png",
                    "expected_tokens": 256,
                    "type": "image",
                },
                {
                    "data": "<base64 png>",
                    "format": "png",
                    "expected_tokens": 256,
                    "type": "image",
                },
                {
                    "data": "<base64 png>",
                    "format": "png",
                    "expected_tokens": 256,
                    "type": "image",
                },
                # Pre-tokenized text prompt.
                {"tokens": list(prefix_tokens), "type": "encoded_text"},
            ]
        },
        "loss_fn_inputs": {
            "state": {"data": list(state), "shape": [len(state)], "dtype": "float32"},
            "target_tokens": {"data": list(target_tokens), "shape": [len(target_tokens)], "dtype": "int64"},
            "weights": {"data": list(weights), "shape": [len(weights)], "dtype": "float32"},
            "token_ar_mask": {"data": list(token_ar_mask), "shape": [len(token_ar_mask)], "dtype": "int32"},
        },
    }
```

## Future polling code
```python
def poll_future(client, *, request_id, timeout_seconds=None, sleep=time.sleep):
    if timeout_seconds is None:
        # Matches the MINT_OPENPI_HTTP_FUTURE_TIMEOUT_SECONDS default.
        timeout_seconds = 1200
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.post("/api/v1/retrieve_future", json={"request_id": request_id})
        if response.status_code == 408:
            # 408 means "not ready yet": wait the server-suggested delay and retry.
            payload = response.json()
            delay_seconds = _poll_delay_seconds(payload, response.headers)
            if time.monotonic() + delay_seconds > deadline:
                raise TimeoutError(...)
            sleep(delay_seconds)
            continue
        response.raise_for_status()
        return response.json()
```

## Main flow
```python
def run_example():
    session = client.post("/api/v1/create_session", json=build_create_session_request())
    create_future = client.post("/api/v1/create_model", json=build_create_model_request(...))
    create_result = poll_future(client, request_id=create_future.json()["request_id"])
    step_future = client.post("/api/v1/train_step", json=build_train_step_request(...))
    step_result = poll_future(client, request_id=step_future.json()["request_id"])
    sampler_future = client.post(
        "/api/v1/save_weights_for_sampler",
        json=build_save_weights_for_sampler_request(...),
    )
    sampler_result = poll_future(client, request_id=sampler_future.json()["request_id"])
```

## Expected output shape
The script prints a Python dict with these keys:

- `session_id`
- `model_id`
- `model_info`
- `train_step`
- `sampler`
- `models`
- `delete_model`

## Important notes
- The main MinT SDK path now lives in `mint.mint/mintx`; see OpenPI VLA SDK.
- This example is intentionally raw HTTP: it documents the exact request shape directly instead of going through the SDK helper layer.
- The 1x1 PNGs are placeholders only. For a real task, replace them with real camera frames and real robot rollout tensors.
- `save_weights_for_sampler` confirms that the trained weights can be exported for later sampling or downstream evaluation.
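When swapping in real rollout data, the `loss_fn_inputs` tensors keep the same flat wire shape shown in the payload code above (data list, 1-D shape, dtype string). A helper sketch, with illustrative function names:

```python
def tensor(values, dtype: str) -> dict:
    # Flat wire tensor: data list, 1-D shape, and an explicit dtype string.
    data = list(values)
    return {"data": data, "shape": [len(data)], "dtype": dtype}

def loss_fn_inputs_from_rollout(state, target_tokens, weights, token_ar_mask) -> dict:
    # dtypes mirror the demo payload: float32 state/weights, int64 tokens, int32 mask.
    return {
        "state": tensor(state, "float32"),
        "target_tokens": tensor(target_tokens, "int64"),
        "weights": tensor(weights, "float32"),
        "token_ar_mask": tensor(token_ar_mask, "int32"),
    }
```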
## Next steps
- Use OpenPI VLA SDK for the main MinT SDK path.
- Replace `ONE_PIXEL_PNG` with real `base_0_rgb` / wrist camera frames.
- Replace the toy `state` and `target_tokens` with real rollout data.
- Use the saved sampler path in a later evaluation or rollout client.