Appearance
Reliability
This page explains reliability guarantees and failure behavior in Vix Inference.
Inference in the real world fails for boring reasons:
- provider rate limits
- network timeouts
- model cold starts
- overloaded GPUs
- partial responses during streaming
- invalid JSON from clients
- slow clients or disconnects
The goal is not "never fail". The goal is predictable behavior under failure.
Reliability goals
Vix Inference is designed around these principles:
- Fast failure for invalid requests
- Retries for transient provider errors
- Clear error codes and messages
- Safe streaming termination
- Observability (logs + request id)
- No silent partial success
Request lifecycle
A typical request goes through:
- validate input (schema and limits)
- select provider and model
- execute inference with timeouts
- stream or buffer output
- finalize and emit metrics
If any step fails, Vix returns a structured error.
Timeouts
Always set timeouts.
Recommended timeouts:
- connect timeout: 2 to 5 seconds
- first token timeout: 5 to 15 seconds
- total request timeout: 30 to 120 seconds (depends on workload)
- per chunk flush interval: small and constant
Timeouts must be enforced both:
- at transport level (HTTP / WS)
- at provider level (SDK / HTTP client)
Retries
Retries should be limited and only for transient failures.
Good retry candidates:
- 429 Too Many Requests
- 503 Service Unavailable
- network timeouts
- connection reset
Bad retry candidates:
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 422 Validation errors
Recommended strategy:
- max attempts: 2 or 3
- exponential backoff
- jitter
- retry budget per request
Idempotency
Inference is usually safe to retry because:
- it does not mutate server state
- it returns generated text
But streaming can duplicate partial tokens if you retry mid-stream.
If you support retries on the client:
- retry only before the first token
- or use an idempotency key and server-side replay
Streaming reliability
Streaming has extra failure modes:
- client disconnects mid-stream
- provider disconnects mid-stream
- chunk parsing errors
- backpressure and slow readers
Rules:
- detect disconnect and cancel inference
- always send an explicit end event when possible
- do not keep streaming forever
- cap total streamed bytes
Recommended events:
inference.startinference.tokeninference.errorinference.end
If the client disconnects, Vix must stop work quickly.
Circuit breaker
If a provider is failing repeatedly, stop sending traffic to it.
A minimal circuit breaker uses:
- failure counter in a short window
- open state for a cooldown period
- half-open probe requests
This protects your system from cascading failure.
Fallback providers
If you configure multiple providers, you can fail over.
Example behavior:
- prefer local provider
- if it fails with timeout or 503, try remote provider
- if all fail, return an error with attempts info
Failover must be visible in logs and metrics.
Error format
A reliable API returns predictable errors.
Recommended error envelope:
json
{
"ok": false,
"error": {
"code": "provider_timeout",
"message": "Provider did not respond in time",
"retryable": true
},
"request_id": "...."
}For streaming, send the same envelope inside inference.error.
Limits
Reliability includes protecting resources.
Common limits:
- max input tokens
- max output tokens
- max request body bytes
- max concurrent streams
- max requests per minute per key
- max streaming duration
Reject early with a clear 4xx error.
Observability
To debug production failures you need:
- request id in every response
- structured logs (provider, model, latency)
- metrics (p50/p95/p99, error rates)
- retry counts and final reason
Minimal log fields:
- request_id
- provider
- model
- status
- latency_ms
- retry_count
- streaming (true/false)
Practical checklist
Before shipping:
- timeouts configured
- input validation enforced
- retry policy defined
- streaming end event implemented
- circuit breaker enabled
- rate limit enabled
- logs and metrics enabled
Reliability is not a feature. It is the default behavior.