API documentation
LLMApi is an OpenAI-compatible gateway in front of local AI engines (Codex and Claude). Point any OpenAI SDK at it, authenticate with an API key, and call chat completions, image generation and model listing — no client changes beyond the base URL and key.
Base URL & authentication
All endpoints live under /v1:
https://YOUR_HOST/v1
Authenticate with your API key as a Bearer token (create one in the console under API Keys):
Authorization: Bearer sk-...
Keys carry scopes (chat, image), an optional daily quota and max concurrency, and can be revoked instantly. The secret is shown exactly once at creation — store it safely.
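As a minimal sketch of the scheme above, the helper below builds an authenticated request using only the Python standard library. The function name `build_request`, the host `gateway.example.com` and the key `sk-example` are placeholders chosen for illustration; they are not part of LLMApi.

```python
import json
import urllib.request


def build_request(base_url, api_key, path, payload=None):
    """Build an authenticated request for a gateway endpoint.

    base_url is expected to end in /v1; path is e.g. "/models".
    With a payload this is a JSON POST; without one it is a GET.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers
    )


# Placeholder host and key; nothing is sent until urlopen() is called.
req = build_request("https://gateway.example.com/v1", "sk-example", "/models")
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Because the secret is shown only once, reading it from an environment variable rather than hard-coding it is usually the safer choice.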
Quickstart
Point any OpenAI SDK at your gateway — change the base URL and the key, nothing else.
curl https://YOUR_HOST/v1/chat/completions \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-6",
"messages": [{ "role": "user", "content": "Hello!" }]
}'
Chat completions
POST /v1/chat/completions — requires the chat scope.
| Field | Type | Notes |
|---|---|---|
| model | string | A model id from GET /v1/models. |
| messages | array | OpenAI chat messages (system / user / assistant). |
| stream | boolean | Stream tokens as SSE chunks. Default false. |
| temperature | number | Sampling temperature, 0–2. Optional. |
| max_tokens | integer | Upper bound on output tokens. Optional. |
| reasoning_effort | string | minimal · low · medium · high · xhigh (Codex). |
| web_search | boolean | Allow the model to use web search for this call. |
| response_format | object | { "type": "json_object" } or { "type": "json_schema", "json_schema": { ... } }. |
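The fields above could be assembled client-side along these lines. This is a sketch only: `build_chat_payload` is a hypothetical helper, and the local range and enum checks simply mirror the table; the gateway's own validation may differ.

```python
ALLOWED_EFFORT = {"minimal", "low", "medium", "high", "xhigh"}


def build_chat_payload(model, messages, *, stream=False, temperature=None,
                       max_tokens=None, reasoning_effort=None):
    """Return a request body for POST /v1/chat/completions.

    Optional fields are omitted unless set. The checks below are
    client-side assumptions taken from the field table.
    """
    payload = {"model": model, "messages": messages, "stream": stream}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if reasoning_effort is not None:
        if reasoning_effort not in ALLOWED_EFFORT:
            raise ValueError(f"unknown reasoning_effort: {reasoning_effort}")
        payload["reasoning_effort"] = reasoning_effort
    return payload


body = build_chat_payload(
    "claude-sonnet-4-6",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.2,
)
```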
Non-streaming response:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"model": "claude-sonnet-4-6",
"choices": [
{ "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }
],
"usage": { "prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46 }
}
For streaming, set "stream": true. The response is text/event-stream, each event is a line of the form data: { ...chat.completion.chunk... }, and the stream ends with data: [DONE]. Concatenate choices[0].delta.content across chunks to rebuild the full message.
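A minimal sketch of consuming such a stream, assuming the lines have already been decoded to text. `collect_stream_text` is a hypothetical helper, and the sample chunks are hand-written to follow the OpenAI chunk shape rather than captured from a real gateway.

```python
import json


def collect_stream_text(lines):
    """Join delta.content from SSE lines until the [DONE] sentinel."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Some chunks carry only a role or a finish_reason, no content.
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Hand-written sample events, for illustration only.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
text = collect_stream_text(sample)
```

When reading from an HTTP response object, each line arrives as bytes and would need `.decode("utf-8")` before being passed in.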
Models
GET /v1/models lists the models your account can actually call. The list is discovered live from each engine, so using an id taken from it avoids model_not_found errors.
curl https://YOUR_HOST/v1/models -H "Authorization: Bearer sk-..."
Codex serves the GPT-5 family (e.g. gpt-5.5, gpt-5.4); Claude serves opus, sonnet, haiku and dated ids such as claude-sonnet-4-6.
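One way to use the listing is to pick the first available id from an ordered preference list. The sketch below assumes the response follows the OpenAI list shape, `{ "object": "list", "data": [{ "id": ... }] }`, which this page does not spell out; `pick_model` and the sample listing are illustrative.

```python
def pick_model(listing, preferences):
    """Return the first preferred model id present in the listing, or None."""
    available = {m["id"] for m in listing.get("data", []) if "id" in m}
    for wanted in preferences:
        if wanted in available:
            return wanted
    return None


# Hand-written sample in the assumed OpenAI list shape.
listing = {
    "object": "list",
    "data": [{"id": "gpt-5.5"}, {"id": "claude-sonnet-4-6"}],
}
model = pick_model(listing, ["opus", "claude-sonnet-4-6", "gpt-5.5"])
```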
Image generation
POST /v1/images/generations — requires the image scope. Backed by Codex's image tool.
| Field | Type | Notes |
|---|---|---|
| prompt | string | The image brief. Required. |
| n | integer | Number of images, 1–4 (each is a separate run). |
| ratio | string | 1:1 · 16:9 · 9:16 · 4:3 · 3:4 · 3:2 · 2:3. |
| purpose | string | hero · logo · icon · product · illustration · … |
| background | string | transparent for a cut-out subject. |
| response_format | string | b64_json (default) or url. |
curl https://YOUR_HOST/v1/images/generations \
-H "Authorization: Bearer sk-..." \
-H "Content-Type: application/json" \
-d '{ "prompt": "a calm mountain lake at dawn", "ratio": "16:9", "n": 1 }'
Returns { "created": ..., "data": [{ "b64_json": "..." }] }.
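With the default b64_json format, each entry has to be base64-decoded before it can be written to disk. A minimal sketch, where `decode_images` is a hypothetical helper and the sample payload is a stand-in, not real image data:

```python
import base64


def decode_images(response):
    """Return raw bytes for each b64_json entry in the response's data list."""
    return [
        base64.b64decode(item["b64_json"])
        for item in response.get("data", [])
        if "b64_json" in item
    ]


# Stand-in payload for illustration; a real response carries image bytes.
fake = base64.b64encode(b"not-a-real-image").decode("ascii")
images = decode_images({"created": 0, "data": [{"b64_json": fake}]})
```

Each element of `images` can then be written with `open(path, "wb")`. The image file format is not stated on this page, so the file extension to use is left open here.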
Key status
GET /v1/key returns the calling key's own status — scopes, remaining daily quota, current concurrency and recent usage. Authenticate with the key itself; no special scope is required. Useful for showing your users their remaining limit before a call.
curl https://YOUR_HOST/v1/key -H "Authorization: Bearer sk-..."
{
"object": "api_key",
"prefix": "sk-1a2b3c4d",
"scopes": ["chat", "image"],
"enabled": true,
"expires_at": null,
"limits": {
"daily_quota": 1000,
"used_today": 12,
"remaining_today": 988,
"resets_at": "2026-06-18T00:00:00.000Z",
"max_concurrency": 5,
"in_flight": 1
},
"totals": { "requests": 1234, "errors": 3 },
"usage_30d": { "requests": 420, "total_tokens": 91234, "cost_usd": 0.0 }
}
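A pre-flight check against this payload might look like the sketch below. `can_send` is a hypothetical helper, and treating a null or missing limit as "unlimited" is an assumption: the page says the quota is optional but not how its absence is represented.

```python
def can_send(status):
    """Return True if the key looks usable for one more request right now."""
    if not status.get("enabled", False):
        return False
    limits = status.get("limits") or {}
    # Assumption: a missing or null value means no limit is configured.
    remaining = limits.get("remaining_today")
    if remaining is not None and remaining <= 0:
        return False
    max_concurrency = limits.get("max_concurrency")
    if max_concurrency is not None and limits.get("in_flight", 0) >= max_concurrency:
        return False
    return True


# Values copied from the sample response above.
status = {
    "enabled": True,
    "limits": {
        "daily_quota": 1000,
        "used_today": 12,
        "remaining_today": 988,
        "max_concurrency": 5,
        "in_flight": 1,
    },
}
ok = can_send(status)
```

The check is advisory only: another request may consume the last slot between this call and the next, so the 429 codes listed under Errors still need handling.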
Errors
Every error uses the OpenAI envelope:
{ "error": { "message": "Incorrect API key provided.", "type": "invalid_request_error", "code": "invalid_api_key" } }
| Code | HTTP | Meaning |
|---|---|---|
| missing_api_key | 401 | No bearer token was sent. |
| invalid_api_key | 401 | The key is unknown. |
| key_revoked | 401 | The key has been disabled. |
| key_expired | 401 | The key is past its expiry. |
| insufficient_scope | 403 | The key lacks the required scope. |
| quota_exceeded | 429 | The key's daily quota is spent. |
| concurrency_limit | 429 | Too many concurrent requests for the key. |
| model_not_found | 404 | Unknown or disabled model. |
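A client can branch on error.code rather than on the HTTP status alone, since both 429 codes call for different reactions. The sketch below is one possible policy, not something the gateway prescribes; `parse_error`, `next_step` and the action names are all illustrative.

```python
import json

CREDENTIAL_CODES = {
    "missing_api_key", "invalid_api_key", "key_revoked",
    "key_expired", "insufficient_scope",
}


def parse_error(body):
    """Return (code, message) from an error envelope, or (None, None)."""
    try:
        err = json.loads(body)["error"]
        return err.get("code"), err.get("message")
    except (ValueError, KeyError, TypeError, AttributeError):
        return None, None


def next_step(code):
    """Map an error code to a suggested client action (assumed policy)."""
    if code == "concurrency_limit":
        return "retry_with_backoff"  # slots free up as requests finish
    if code == "quota_exceeded":
        return "wait_for_reset"      # see limits.resets_at in GET /v1/key
    if code in CREDENTIAL_CODES:
        return "fix_credentials"     # retrying with the same key will not help
    if code == "model_not_found":
        return "refresh_models"      # re-read GET /v1/models
    return "fail"


body = ('{ "error": { "message": "Incorrect API key provided.", '
        '"type": "invalid_request_error", "code": "invalid_api_key" } }')
code, message = parse_error(body)
action = next_step(code)
```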
Rate limits & quotas
Each key may have a daily request quota and a max concurrency; both are enforced on every request. Usage (requests, tokens and cost) is metered and broken down per key and per model in the console under Usage.
For AI agents
This page is the canonical, machine-readable reference. The surface is OpenAI-compatible: set the base URL to https://YOUR_HOST/v1 and pass the key as a bearer token. Always call GET /v1/models to discover valid model ids before issuing a completion, and read errors from the error.code field described under Errors.

