Reliability

This page explains reliability guarantees and failure behavior in Vix Inference.

Inference in the real world fails for boring reasons:

provider rate limits
network timeouts
model cold starts
overloaded GPUs
partial responses during streaming
invalid JSON from clients
slow clients or disconnects

The goal is not "never fail". The goal is predictable behavior under failure.

Reliability goals

Vix Inference is designed around these principles:

Fast failure for invalid requests
Retries for transient provider errors
Clear error codes and messages
Safe streaming termination
Observability (logs + request id)
No silent partial success

Request lifecycle

A typical request goes through:

validate input (schema and limits)
select provider and model
execute inference with timeouts
stream or buffer output
finalize and emit metrics

If any step fails, Vix returns a structured error.
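
The lifecycle above can be sketched as one handler that turns any failing step into a structured error. This is a minimal illustration and not Vix's actual implementation: `handle`, `validate`, `StepError`, `run_inference`, and the `MAX_PROMPT_CHARS` limit are hypothetical names chosen for the example.

```python
MAX_PROMPT_CHARS = 8_000  # assumed limit, for illustration only


class StepError(Exception):
    """A lifecycle step failed in a way the client should be told about."""

    def __init__(self, code: str, message: str, retryable: bool = False):
        super().__init__(message)
        self.code = code
        self.message = message
        self.retryable = retryable


def validate(request: dict) -> None:
    # Step 1: reject malformed or oversized input before any provider call.
    prompt = request.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        raise StepError("invalid_request", "prompt must be a non-empty string")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise StepError("input_too_large", "prompt exceeds the size limit")


def handle(request: dict, request_id: str, run_inference) -> dict:
    """Run validation and inference; map any StepError to an error envelope."""
    try:
        validate(request)
        output = run_inference(request["prompt"])  # hypothetical provider call
    except StepError as err:
        return {
            "ok": False,
            "error": {
                "code": err.code,
                "message": err.message,
                "retryable": err.retryable,
            },
            "request_id": request_id,
        }
    return {"ok": True, "output": output, "request_id": request_id}
```

Keeping the error mapping in one place means every step fails the same way from the caller's point of view.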

Timeouts

Always set timeouts.

Recommended timeouts:

connect timeout: 2 to 5 seconds
first token timeout: 5 to 15 seconds
total request timeout: 30 to 120 seconds (depends on workload)
per-chunk flush interval: short and constant

Timeouts must be enforced at both levels:

at transport level (HTTP / WS)
at provider level (SDK / HTTP client)
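
As a rough sketch of the first-token and total-request timeouts, the wrapper below applies both to an async token stream using `asyncio.wait_for`. The function name `stream_with_timeouts` and its parameters are assumptions made for this example; a real deployment would also set connect timeouts on the HTTP client itself.

```python
import asyncio
from typing import AsyncIterator


async def stream_with_timeouts(
    tokens: AsyncIterator[str], first_token_s: float, total_s: float
) -> AsyncIterator[str]:
    """Yield tokens, enforcing a first-token timeout and a total deadline."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + total_s
    iterator = tokens.__aiter__()
    waiting_for_first = True
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise TimeoutError("total request timeout")
        # Before the first token, the stricter of the two limits applies.
        if waiting_for_first and first_token_s < remaining:
            limit, reason = first_token_s, "first token timeout"
        else:
            limit, reason = remaining, "total request timeout"
        try:
            token = await asyncio.wait_for(iterator.__anext__(), timeout=limit)
        except StopAsyncIteration:
            return  # the provider finished normally
        except asyncio.TimeoutError:
            raise TimeoutError(reason) from None
        waiting_for_first = False
        yield token
```

The total deadline is computed once up front, so a stream of many slow chunks cannot extend the request indefinitely.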

Retries

Retries should be limited and only for transient failures.

Good retry candidates:

429 Too Many Requests
503 Service Unavailable
network timeouts
connection reset

Bad retry candidates:

400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found
422 Validation errors

Recommended strategy:

max attempts: 2 or 3
exponential backoff
jitter
retry budget per request
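
The strategy above could look roughly like this. It is a sketch under assumptions: `ProviderError`, `is_retryable`, `backoff_delay`, and `call_with_retries` are hypothetical names, and the base delay, cap, and budget values are placeholders rather than recommendations from Vix. The jitter variant shown is "full jitter", where each delay is drawn uniformly between zero and the exponential bound.

```python
import random
import time

RETRYABLE_STATUS = {429, 503}


class ProviderError(Exception):
    """Hypothetical error carrying the provider's HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"provider returned {status}")
        self.status = status


def is_retryable(err: Exception) -> bool:
    if isinstance(err, ProviderError):
        return err.status in RETRYABLE_STATUS
    # Network timeouts and connection resets are treated as transient.
    return isinstance(err, (TimeoutError, ConnectionResetError))


def backoff_delay(attempt: int, base_s=0.5, cap_s=8.0, rng=random.random) -> float:
    """Full jitter: uniform in [0, min(cap_s, base_s * 2**attempt))."""
    return rng() * min(cap_s, base_s * 2 ** attempt)


def call_with_retries(fn, max_attempts=3, budget_s=10.0, sleep=time.sleep):
    """Call fn, retrying transient failures within an attempt and time budget."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = time.monotonic()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(err):
                raise
            delay = backoff_delay(attempt)
            # Retry budget: give up if the next wait would exceed it.
            if time.monotonic() - start + delay > budget_s:
                raise
            sleep(delay)
```

The `sleep` and `rng` parameters are injectable only so the policy can be tested without real waiting.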

Idempotency

Inference is usually safe to retry because:

it does not mutate server state
it returns generated text, so a failed attempt can be discarded and re-run

But retrying mid-stream can duplicate tokens the client has already received.

If you support retries on the client:

retry only before the first token
or use an idempotency key and server-side replay
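
The first option, retrying only before the first token, can be sketched as a small wrapper. The names `stream_with_safe_retry` and `open_stream` are hypothetical; `open_stream` stands for any callable that starts a fresh streaming attempt and returns an iterator of tokens.

```python
from typing import Callable, Iterable, Iterator


def stream_with_safe_retry(
    open_stream: Callable[[], Iterable[str]], max_attempts: int = 2
) -> Iterator[str]:
    """Yield tokens; retry a failed attempt only if nothing was yielded yet."""
    for attempt in range(max_attempts):
        sent_any = False
        try:
            for token in open_stream():
                sent_any = True
                yield token
            return  # stream completed normally
        except ConnectionError:
            # Once a token has reached the caller, a retry would duplicate
            # output, so the error is surfaced instead.
            if sent_any or attempt == max_attempts - 1:
                raise
```

A failure after the first token is passed to the caller, which can then decide whether to show partial output or an error.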

Streaming reliability

Streaming has extra failure modes:

client disconnects mid-stream
provider disconnects mid-stream
chunk parsing errors
backpressure and slow readers

Rules:

detect disconnect and cancel inference
always send an explicit end event when possible
do not keep streaming forever
cap total streamed bytes

Recommended events:

inference.start
inference.token
inference.error
inference.end

If the client disconnects, Vix must stop work quickly.
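
A minimal sketch of these rules, using a generator as a stand-in for the transport: closing the generator models a client disconnect. The function name `stream_events`, the event payload shape, the error codes, and the `max_bytes` default are assumptions for illustration only. Sending `inference.end` after `inference.error` is also a design choice of this sketch, not something the rules above require.

```python
from typing import Iterable, Iterator


def _error_event(code: str, message: str, request_id: str) -> dict:
    # The error envelope is carried inside the inference.error event.
    envelope = {
        "ok": False,
        "error": {"code": code, "message": message, "retryable": False},
        "request_id": request_id,
    }
    return {"event": "inference.error", "data": envelope}


def stream_events(
    tokens: Iterable[str], request_id: str, max_bytes: int = 65_536
) -> Iterator[dict]:
    """Wrap a token stream in start/token/error/end events with a byte cap."""
    yield {"event": "inference.start", "data": {"request_id": request_id}}
    sent = 0
    try:
        for token in tokens:
            sent += len(token.encode("utf-8"))
            if sent > max_bytes:
                yield _error_event("stream_too_large", "byte cap reached", request_id)
                break
            yield {"event": "inference.token", "data": {"text": token}}
    except Exception as err:  # provider failed mid-stream
        yield _error_event("provider_stream_failed", str(err), request_id)
    finally:
        # Runs on normal end, on error, and when the consumer closes this
        # generator (client disconnect): stop the upstream work promptly.
        close = getattr(tokens, "close", None)
        if close is not None:
            close()
    yield {"event": "inference.end", "data": {"request_id": request_id}}
```

When the consumer closes the generator, the `finally` block cancels the upstream iterator and no further events are produced.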

Circuit breaker

If a provider is failing repeatedly, stop sending traffic to it.

A minimal circuit breaker uses:

failure counter in a short window
open state for a cooldown period
half-open probe requests

This protects your system from cascading failure.
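
The three ingredients above fit in a small class. This is a single-threaded sketch with hypothetical names and placeholder thresholds (`max_failures`, `window_s`, `cooldown_s`), not a description of how Vix implements its breaker.

```python
import time


class CircuitBreaker:
    """Minimal breaker: failure window, open cooldown, single half-open probe."""

    def __init__(self, max_failures=5, window_s=10.0, cooldown_s=30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = []      # timestamps of recent failures
        self._opened_at = None   # None means the circuit is closed
        self._probing = False    # True while a half-open probe is in flight

    def allow(self) -> bool:
        """Return True if a request may be sent to the provider now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at < self.cooldown_s:
            return False  # open: reject without calling the provider
        if self._probing:
            return False  # half-open: only one probe at a time
        self._probing = True
        return True

    def record_success(self) -> None:
        # Any success (including a probe) closes the circuit.
        self._failures.clear()
        self._opened_at = None
        self._probing = False

    def record_failure(self) -> None:
        now = self._clock()
        if self._probing:
            # The probe failed: reopen and restart the cooldown.
            self._opened_at = now
            self._probing = False
            return
        recent = [t for t in self._failures if now - t < self.window_s]
        self._failures = recent + [now]
        if len(self._failures) >= self.max_failures:
            self._opened_at = now
```

Callers check `allow()` before each provider call and report the outcome with `record_success()` or `record_failure()`.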

Fallback providers

If you configure multiple providers, you can fail over.

Example behavior:

prefer local provider
if it fails with timeout or 503, try remote provider
if all fail, return an error with attempts info

Failover must be visible in logs and metrics.
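
The example behavior can be sketched as a loop over an ordered provider list that records every attempt. `ProviderFailure`, `infer_with_failover`, and the error codes `provider_timeout` and `provider_unavailable` (standing in for a 503) are illustrative assumptions.

```python
FAILOVER_CODES = {"provider_timeout", "provider_unavailable"}


class ProviderFailure(Exception):
    """Hypothetical provider error identified by a short code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def infer_with_failover(providers, prompt: str) -> dict:
    """Try (name, call) pairs in order; fail over only on transient codes."""
    attempts = []
    last_code = "no_providers_configured"
    for name, call in providers:
        try:
            output = call(prompt)
        except ProviderFailure as err:
            attempts.append({"provider": name, "error": err.code})
            last_code = err.code
            if err.code in FAILOVER_CODES:
                continue  # transient: try the next provider
            break  # not transient: another provider would not help
        attempts.append({"provider": name, "error": None})
        return {"ok": True, "output": output, "attempts": attempts}
    return {
        "ok": False,
        "error": {
            "code": last_code,
            "message": "all attempted providers failed",
            "retryable": last_code in FAILOVER_CODES,
        },
        "attempts": attempts,
    }
```

Returning the `attempts` list alongside the result gives logs and metrics a direct record of each failover.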

Error format

A reliable API returns predictable errors.

Recommended error envelope:

{
  "ok": false,
  "error": {
    "code": "provider_timeout",
    "message": "Provider did not respond in time",
    "retryable": true
  },
  "request_id": "...."
}

For streaming, send the same envelope inside inference.error.

Limits

Reliability includes protecting resources.

Common limits:

max input tokens
max output tokens
max request body bytes
max concurrent streams
max requests per minute per key
max streaming duration

Reject early with a clear 4xx error.
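
A sketch of early rejection, checked before any provider is contacted. The limit values, the error codes, and the choice of 413 versus 400 are assumptions for the example; `check_limits` is a hypothetical helper.

```python
from typing import Optional, Tuple

# Placeholder values; real limits depend on the deployment.
LIMITS = {
    "max_body_bytes": 1_000_000,
    "max_input_tokens": 8_192,
    "max_output_tokens": 2_048,
}


def check_limits(
    body: bytes, input_tokens: int, requested_output_tokens: int, limits=LIMITS
) -> Optional[Tuple[int, str]]:
    """Return (http_status, error_code) for the first violated limit, else None."""
    # Cheapest check first: body size needs no parsing or tokenization.
    if len(body) > limits["max_body_bytes"]:
        return 413, "body_too_large"
    if input_tokens > limits["max_input_tokens"]:
        return 400, "input_too_long"
    if requested_output_tokens > limits["max_output_tokens"]:
        return 400, "max_output_tokens_exceeded"
    return None
```

Ordering the checks from cheapest to most expensive keeps oversized requests from consuming tokenizer or GPU time.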

Observability

To debug production failures you need:

request id in every response
structured logs (provider, model, latency)
metrics (p50/p95/p99 latency, error rates)
retry counts and final reason

Minimal log fields:

request_id
provider
model
status
latency_ms
retry_count
streaming (true/false)
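
One way to emit these fields is a single JSON log line per request, using only the standard library. The helper name `log_request` and its signature are hypothetical.

```python
import json
import logging
import time


def log_request(
    logger: logging.Logger,
    *,
    request_id: str,
    provider: str,
    model: str,
    status: str,
    started_at: float,
    retry_count: int,
    streaming: bool,
    now=time.monotonic,
) -> dict:
    """Log the minimal fields as one JSON line and return them."""
    fields = {
        "request_id": request_id,
        "provider": provider,
        "model": model,
        "status": status,
        # started_at must come from the same clock as `now`.
        "latency_ms": round((now() - started_at) * 1000),
        "retry_count": retry_count,
        "streaming": streaming,
    }
    logger.info(json.dumps(fields, sort_keys=True))
    return fields
```

One line per request with fixed keys keeps the logs easy to filter by `request_id` or aggregate by `provider`.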

Practical checklist

Before shipping:

timeouts configured
input validation enforced
retry policy defined
streaming end event implemented
circuit breaker enabled
rate limit enabled
logs and metrics enabled

Reliability is not a feature. It is the default behavior.
