Limits and Quotas

Session Timeout

Sampling sessions have a 30-minute inactivity timeout. If no sampling requests are made for 30 minutes, the session expires.

Create a new sampling client to continue:

sampling_client = training_client.save_weights_and_get_sampling_client()
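One way to avoid using an expired session is to track when the last sampling request was made and recreate the client once the idle gap approaches the limit. The sketch below is a minimal illustration, not part of the library: `RefreshingClient`, `make_client`, `max_idle_seconds`, and `clock` are hypothetical names, and the 25-minute default is simply an assumed buffer under the 30-minute timeout.

```python
import time
from typing import Any, Callable, Optional


class RefreshingClient:
    """Hypothetical helper: hands out a sampling client, rebuilding it after long idle gaps."""

    def __init__(
        self,
        make_client: Callable[[], Any],
        max_idle_seconds: float = 25 * 60,  # assumed buffer below the 30-minute timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._make_client = make_client
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._client: Optional[Any] = None
        self._last_used = 0.0

    def get(self) -> Any:
        """Return a client; call this right before each sampling request."""
        now = self._clock()
        if self._client is None or now - self._last_used >= self._max_idle:
            # First use, or idle for too long: create a new sampling client.
            self._client = self._make_client()
        self._last_used = now
        return self._client
```

In use, `make_client` would be `training_client.save_weights_and_get_sampling_client`, and each request would go through `holder.get()` instead of a long-lived client variable.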

Rate Limiting

When a request is rate limited, the API responds with HTTP 429 and the client raises RateLimitError. Retry with exponential backoff:

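import asyncio
from mint import RateLimitError

async def call_with_backoff(fn, max_retries=5):
    """Await fn(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries):
        try:
            return await fn()
        except RateLimitError:
            # Out of retries: let the error propagate to the caller.
            if attempt == max_retries - 1:
                raise
            # Wait 1, 2, 4, 8, ... seconds before the next attempt.
            wait_time = 2 ** attempt
            await asyncio.sleep(wait_time)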
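
To make the schedule concrete: with the default `max_retries=5`, the function above sleeps after each of the first four failures and re-raises on the fifth. The helper below only lists those waits; `backoff_delays` is an illustrative name, not part of the library.

```python
def backoff_delays(max_retries: int = 5, base: int = 2) -> list[int]:
    """Seconds slept after each failed attempt; the final attempt re-raises instead of sleeping."""
    return [base ** attempt for attempt in range(max_retries - 1)]


delays = backoff_delays()
print(delays)       # [1, 2, 4, 8]
print(sum(delays))  # 15
```

So a call that keeps hitting the rate limit gives up after roughly 15 seconds of waiting, plus the time spent on the requests themselves.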

Best Practices for Long Training Runs

For RL training or other long-running workloads:

  1. Create a fresh sampling client for each batch - Avoids session timeout issues
  2. Complete sampling within 25 minutes - Leaves a buffer before the 30-minute timeout
  3. Implement retry logic - Handle transient RequestFailedError with exponential backoff
  4. Record the request_id - Keep it for debugging if errors persist
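
# Uses await, so run this inside an async function.
# sample_with_retry is assumed to be your own retry wrapper around the sampling call.
for batch in range(num_batches):
    # A fresh client per batch avoids session timeout issues.
    sampling_client = training_client.save_weights_and_get_sampling_client()

    for step in range(max_steps):
        result = await sample_with_retry(sampling_client, prompt, params)

    await training_client.forward_backward_async(...)
    await training_client.optim_step_async(...)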
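
The loop above calls `sample_with_retry` without defining it. A minimal sketch of such a wrapper might look like the following. Several details are assumptions rather than documented API: `sample_async` is a placeholder for whatever sampling method the client exposes, the `request_id` attribute on the exception is guessed from the best-practice list, the `max_retries`, `base_delay`, and `retry_on` parameters are illustrative, and the local `RequestFailedError` class is a stand-in so the sketch runs on its own (in real code, pass the library's exception type via `retry_on`).

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class RequestFailedError(Exception):
    """Stand-in so the sketch runs on its own; use the library's exception in real code."""


async def sample_with_retry(sampling_client, prompt, params,
                            max_retries=5, base_delay=1.0,
                            retry_on=(RequestFailedError,)):
    """Retry a sampling call on transient errors, doubling the wait each time."""
    for attempt in range(max_retries):
        try:
            # ASSUMPTION: `sample_async` is a placeholder for the client's sampling method.
            return await sampling_client.sample_async(prompt, params)
        except retry_on as err:
            # ASSUMPTION: the error may carry a request_id attribute; log it for debugging.
            logger.warning("sampling attempt %d failed (request_id=%s)",
                           attempt + 1, getattr(err, "request_id", None))
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Logging the request identifier on every failure, rather than only on the last one, keeps a trail available if the errors persist across batches.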