API documentation

LLMApi is an OpenAI-compatible gateway in front of local AI engines (Codex and Claude). Point any OpenAI SDK at it, authenticate with an API key, and call chat completions, image generation and model listing — no client changes beyond the base URL and key.

Base URL & authentication

All endpoints live under /v1:

https://YOUR_HOST/v1

Authenticate with your API key as a Bearer token (create one in the console under API Keys):

Authorization: Bearer sk-...

Keys carry scopes (chat, image), an optional daily quota and max concurrency, and can be revoked instantly. The secret is shown exactly once at creation — store it safely.

Quickstart

The smallest working call is a single chat completion with curl. From an OpenAI SDK, the same call needs only the base URL and key changed.

curl https://YOUR_HOST/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
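If you are not using the OpenAI SDK, you can build the same request with Python's standard library. This is a minimal sketch. The environment variable names LLMAPI_BASE_URL and LLMAPI_KEY and the helpers build_chat_request and send are hypothetical, chosen for illustration. Only the URL, headers and body mirror the curl call above.

```python
import json
import os
import urllib.request

# Hypothetical variable names; the gateway itself does not define them.
BASE_URL = os.environ.get("LLMAPI_BASE_URL", "https://YOUR_HOST/v1")
API_KEY = os.environ.get("LLMAPI_KEY", "")


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL,
                       api_key: str = API_KEY) -> urllib.request.Request:
    """Build the POST /v1/chat/completions call from the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key sent as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_chat_request("claude-sonnet-4-6", "Hello!")) performs the request. On failure, the HTTPError body contains the error envelope described under Errors.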

Chat completions

POST /v1/chat/completions — requires the chat scope.

Field             Type      Notes
model             string    A model id from GET /v1/models.
messages          array     OpenAI chat messages (system / user / assistant).
stream            boolean   Stream tokens as SSE chunks. Default false.
temperature       number    Sampling temperature, 0–2. Optional.
max_tokens        integer   Upper bound on output tokens. Optional.
reasoning_effort  string    minimal · low · medium · high · xhigh (Codex).
web_search        boolean   Allow the model to use web search for this call.
response_format   object    { "type": "json_object" } or { "type": "json_schema", "json_schema": { ... } }.
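A client can assemble the optional fields itself, leaving out any value that is not set. The sketch below uses a hypothetical helper, build_chat_body. Its checks copy the table: temperature 0–2, the five reasoning_effort levels, and the two response_format types. These checks are a client-side convenience and do not describe how the gateway validates input.

```python
from __future__ import annotations

from typing import Any, Optional

# Allowed reasoning_effort values, copied from the field table.
REASONING_EFFORTS = {"minimal", "low", "medium", "high", "xhigh"}


def build_chat_body(
    model: str,
    messages: list[dict[str, str]],
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    reasoning_effort: Optional[str] = None,
    web_search: Optional[bool] = None,
    response_format: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a chat completions body, leaving out unset optional fields."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if reasoning_effort is not None and reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    if response_format is not None and response_format.get("type") not in (
        "json_object",
        "json_schema",
    ):
        raise ValueError("response_format type must be json_object or json_schema")

    body: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
        "web_search": web_search,
        "response_format": response_format,
    }
    # Send only the fields the caller actually set (False is kept, None is not).
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```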

Non-streaming response:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "claude-sonnet-4-6",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46 }
}
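Reading the reply from a non-streaming response uses only the fields shown above. This is a minimal sketch, and reply_and_usage is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Any


def reply_and_usage(response: dict[str, Any]) -> tuple[str, str, int]:
    """Return (content, finish_reason, total_tokens) from a chat.completion."""
    if response.get("object") != "chat.completion":
        raise ValueError("expected a chat.completion object")
    choice = response["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],  # "stop" in the example response
        response["usage"]["total_tokens"],
    )
```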

For streaming, set "stream": true. The response is text/event-stream: each event is data: { ...chat.completion.chunk... }, and the stream ends with data: [DONE]. To rebuild the full reply, concatenate choices[0].delta.content across all chunks.
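A small line-oriented parser is enough to consume this stream. The sketch assumes each event arrives as a single data: line, as described above, and its helper names are hypothetical. It skips blank separator lines, stops at data: [DONE], and ignores chunks whose delta has no content, such as a chunk that only sets the role.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield choices[0].delta.content from text/event-stream lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data SSE lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            content = (choices[0].get("delta") or {}).get("content")
            if content:
                yield content


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate all deltas into the complete reply."""
    return "".join(iter_deltas(lines))
```

urllib.request.urlopen yields response lines as bytes, so pass (line.decode("utf-8") for line in response) to collect_stream.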

Models

GET /v1/models lists the models your account can actually call. The list is discovered live from each engine, so choosing an id from it avoids unsupported-model errors.

curl https://YOUR_HOST/v1/models -H "Authorization: Bearer sk-..."

Codex serves the GPT-5 family (e.g. gpt-5.5, gpt-5.4); Claude serves the short names opus, sonnet and haiku as well as versioned ids such as claude-sonnet-4-6.
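Valid ids depend on the account, so a client can pick a model from the live list rather than hard-coding one. This page does not show the /v1/models response body. The sketch therefore assumes the OpenAI list shape, {"object": "list", "data": [{"id": ...}, ...]}. The helper pick_model and its preference order are hypothetical.

```python
from __future__ import annotations

from typing import Any, Sequence


def model_ids(listing: dict[str, Any]) -> list[str]:
    """Extract ids, assuming the OpenAI list shape {"data": [{"id": ...}]}."""
    return [entry["id"] for entry in listing.get("data", [])]


def pick_model(listing: dict[str, Any], preferred: Sequence[str]) -> str:
    """Return the first preferred id that the listing says is callable."""
    available = set(model_ids(listing))
    for candidate in preferred:
        if candidate in available:
            return candidate
    raise LookupError(f"none of {list(preferred)} is available; got {sorted(available)}")
```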

Image generation

POST /v1/images/generations — requires the image scope. Backed by Codex's image tool.

Field            Type      Notes
prompt           string    The image brief. Required.
n                integer   Number of images, 1–4 (each is a separate run).
ratio            string    1:1 · 16:9 · 9:16 · 4:3 · 3:4 · 3:2 · 2:3.
purpose          string    hero · logo · icon · product · illustration · …
background       string    transparent for a cut-out subject.
response_format  string    b64_json (default) or url.

curl https://YOUR_HOST/v1/images/generations \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "a calm mountain lake at dawn", "ratio": "16:9", "n": 1 }'

Returns { "created": ..., "data": [{ "b64_json": "..." }] }.
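Decoding the default b64_json response needs only the standard library. This sketch rests on two assumptions. First, the helper names are hypothetical. Second, this page does not state the image file type, so the code checks for the PNG signature and falls back to a .bin extension otherwise. The ratio and n checks mirror the field table.

```python
from __future__ import annotations

import base64
from pathlib import Path
from typing import Any, Optional

# Values copied from the field table above.
RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_image_body(prompt: str, *, n: int = 1, ratio: Optional[str] = None,
                     background: Optional[str] = None) -> dict[str, Any]:
    """Assemble a /v1/images/generations body with client-side checks."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    if ratio is not None and ratio not in RATIOS:
        raise ValueError(f"unsupported ratio: {ratio!r}")
    body: dict[str, Any] = {"prompt": prompt, "n": n, "response_format": "b64_json"}
    if ratio is not None:
        body["ratio"] = ratio
    if background is not None:
        body["background"] = background
    return body


def decode_images(response: dict[str, Any]) -> list[tuple[str, bytes]]:
    """Decode every data[i].b64_json entry into (file extension, bytes)."""
    images = []
    for item in response["data"]:
        raw = base64.b64decode(item["b64_json"])
        # Assumption: the file type is undocumented, so sniff for PNG.
        images.append((".png" if raw.startswith(PNG_SIGNATURE) else ".bin", raw))
    return images


def save_images(response: dict[str, Any], out_dir: Path, stem: str = "image") -> list[Path]:
    """Write decoded images as out_dir/<stem>-<i><ext> and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (ext, raw) in enumerate(decode_images(response)):
        path = out_dir / f"{stem}-{i}{ext}"
        path.write_bytes(raw)
        paths.append(path)
    return paths
```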

Key status

GET /v1/key returns the calling key's own status: scopes, remaining daily quota, current concurrency and recent usage. Authenticate with the key itself; no special scope is required. This is useful for showing your users their remaining limit before a call.

curl https://YOUR_HOST/v1/key -H "Authorization: Bearer sk-..."

Example response:

{
  "object": "api_key",
  "prefix": "sk-1a2b3c4d",
  "scopes": ["chat", "image"],
  "enabled": true,
  "expires_at": null,
  "limits": {
    "daily_quota": 1000,
    "used_today": 12,
    "remaining_today": 988,
    "resets_at": "2026-06-18T00:00:00.000Z",
    "max_concurrency": 5,
    "in_flight": 1
  },
  "totals": { "requests": 1234, "errors": 3 },
  "usage_30d": { "requests": 420, "total_tokens": 91234, "cost_usd": 0.0 }
}
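A client can use this payload to decide whether to send a call at all. The sketch below reads only fields that appear in the example response, and can_send is a hypothetical helper. Quotas and concurrency caps are optional per key. The sketch therefore assumes that a null remaining_today or max_concurrency means no such limit is set.

```python
from __future__ import annotations

from typing import Any


def can_send(status: dict[str, Any], scope: str, requests: int = 1) -> tuple[bool, str]:
    """Decide from a GET /v1/key payload whether `requests` more calls fit now."""
    if not status.get("enabled", False):
        return False, "key is disabled"
    if scope not in status.get("scopes", []):
        return False, f"key lacks the {scope!r} scope"

    limits = status.get("limits") or {}
    remaining = limits.get("remaining_today")
    # Assumption: null means the key has no daily quota configured.
    if remaining is not None and remaining < requests:
        return False, f"daily quota spent, resets at {limits.get('resets_at')}"

    max_concurrency = limits.get("max_concurrency")
    in_flight = limits.get("in_flight") or 0
    # Assumption: null means no concurrency cap is configured.
    if max_concurrency is not None and in_flight >= max_concurrency:
        return False, "concurrency limit reached"
    return True, "ok"
```

The result is only a snapshot. Another client sharing the key can use up the remaining quota or concurrency before your call arrives, so the call can still fail with a 429.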

Errors

Every error uses the OpenAI envelope:

{ "error": { "message": "Incorrect API key provided.", "type": "invalid_request_error", "code": "invalid_api_key" } }

Code                HTTP  Meaning
missing_api_key     401   No bearer token was sent.
invalid_api_key     401   The key is unknown.
key_revoked         401   The key has been disabled.
key_expired         401   The key is past its expiry.
insufficient_scope  403   The key lacks the required scope.
quota_exceeded      429   The key's daily quota is spent.
concurrency_limit   429   Too many concurrent requests for the key.
model_not_found     404   Unknown or disabled model.
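Every error shares one envelope, so a client can branch on error.code alone. The sketch below parses the envelope into a hypothetical ApiError type and applies one possible retry policy. That policy is an assumption, not documented gateway behavior: only concurrency_limit is treated as retryable with exponential backoff. quota_exceeded persists until the daily reset, and the 401, 403 and 404 codes need a different key, scope or model.

```python
from __future__ import annotations

import json
from typing import Any, Optional

# Assumed client-side policy, not documented gateway behavior.
RETRYABLE_CODES = {"concurrency_limit"}


class ApiError(Exception):
    """An error response decoded from the OpenAI-style envelope."""

    def __init__(self, status: int, code: Optional[str], message: str) -> None:
        super().__init__(f"HTTP {status} ({code}): {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from a response body, tolerating non-JSON bodies."""
    try:
        payload: Any = json.loads(body)
    except ValueError:
        payload = None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return ApiError(status, err.get("code"), err.get("message", body))


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff schedule, in seconds, for retryable errors."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```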

Rate limits & quotas

Each key may set a daily request quota and a maximum concurrency, both enforced on every request; a request over either limit fails with HTTP 429 and the code quota_exceeded or concurrency_limit. Usage (requests, tokens and cost) is metered and broken down per key and per model in the console under Usage.
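To stay under a key's max_concurrency, a client can cap its own in-flight requests. This sketch uses asyncio.Semaphore, and run_bounded is a hypothetical helper. The cap only covers calls made through this helper. Other processes sharing the same key still count against the server-side limit.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: Iterable[Callable[[], Awaitable[T]]],
                      max_concurrency: int) -> list[T]:
    """Run jobs with at most max_concurrency in flight; results keep job order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits while max_concurrency jobs are running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

You can add a blocking HTTP call to the batch by wrapping it with asyncio.to_thread. To match the key's setting, take max_concurrency from GET /v1/key.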

For AI agents

This page is the canonical, machine-readable reference. The surface is OpenAI-compatible: set the base URL to https://YOUR_HOST/v1 and pass the key as a bearer token. Always call GET /v1/models to discover valid model ids before issuing a completion, and read failures from the error.code field listed under Errors.
