API documentation

LLMApi is an OpenAI-compatible gateway in front of local AI engines (Codex and Claude). Point any OpenAI SDK at it, authenticate with an API key, and call chat completions, image generation and model listing — no client changes beyond the base URL and key.

Base URL & authentication

All endpoints live under /v1:

https://YOUR_HOST/v1

Authenticate with your API key as a Bearer token (create one in the console under API Keys):

Authorization: Bearer sk-...

Keys carry scopes (chat, image), an optional daily quota and max concurrency, and can be revoked instantly. The secret is shown exactly once at creation — store it safely.

Quickstart

Point any OpenAI SDK at your gateway — change the base URL and the key, nothing else.

curl https://YOUR_HOST/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
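
The same call can be made without an SDK. The sketch below uses only Python's standard library and builds the request without sending it, so it runs offline. `build_chat_request` and `send` are illustrative helper names, not part of the gateway, and `YOUR_HOST` and `sk-...` are the same placeholders as in the curl example.

```python
import json
import urllib.request

BASE_URL = "https://YOUR_HOST/v1"  # placeholder host, as in the curl example
API_KEY = "sk-..."                 # placeholder key


def build_chat_request(model: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


req = build_chat_request("claude-sonnet-4-6", "Hello!")
print(req.get_method(), req.full_url)
# reply = send(req)  # only works once the host and key are real
```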

Chat completions

POST /v1/chat/completions — requires the chat scope.

Field	Type	Notes
`model`	string	A model id from `GET /v1/models`.
`messages`	array	OpenAI chat messages (`system` / `user` / `assistant`).
`stream`	boolean	Stream tokens as SSE chunks. Default `false`.
`temperature`	number	Sampling temperature, 0–2. Optional.
`max_tokens`	integer	Upper bound on output tokens. Optional.
`reasoning_effort`	string	`minimal` · `low` · `medium` · `high` · `xhigh` (Codex).
`web_search`	boolean	Allow the model to use web search for this call.
`response_format`	object	`{ "type": "json_object" }` or `{ "type": "json_schema", "json_schema": { ... } }`.
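
As an illustration of how the fields in the table combine, here is a minimal sketch of a request-body builder. `chat_payload` is a hypothetical helper, and the client-side checks simply mirror the ranges listed above; the gateway may validate differently.

```python
from typing import Optional

# Values taken from the reasoning_effort row of the table.
REASONING_EFFORTS = {"minimal", "low", "medium", "high", "xhigh"}


def chat_payload(
    model: str,
    messages: list,
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    reasoning_effort: Optional[str] = None,
    web_search: Optional[bool] = None,
    response_format: Optional[dict] = None,
) -> dict:
    """Assemble a chat request body, leaving out optional fields that are unset."""
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if reasoning_effort is not None and reasoning_effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    body: dict = {"model": model, "messages": messages, "stream": stream}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
        "web_search": web_search,
        "response_format": response_format,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


payload = chat_payload(
    "claude-sonnet-4-6",
    [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "Name one primary colour."},
    ],
    temperature=0.2,
    max_tokens=100,
    response_format={"type": "json_object"},
)
print(sorted(payload))
```

Unset optional fields are dropped rather than sent as null, so the gateway's own defaults apply.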

Non-streaming response:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "claude-sonnet-4-6",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46 }
}
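
A short sketch of reading that response in Python. The helper names are illustrative, and treating any `finish_reason` other than `stop` as abnormal is an assumption; this page only documents the `stop` value.

```python
import json

# The sample response from above, parsed as a client would receive it.
completion = json.loads("""
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "claude-sonnet-4-6",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46 }
}
""")


def reply_text(completion: dict) -> str:
    """Return the assistant text of the first choice."""
    return completion["choices"][0]["message"]["content"]


def finished_normally(completion: dict) -> bool:
    """True when the first choice reports finish_reason == "stop"."""
    return completion["choices"][0].get("finish_reason") == "stop"


usage = completion["usage"]
print(reply_text(completion), finished_normally(completion), usage["total_tokens"])
```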

For streaming, set "stream": true. The response is text/event-stream: each event is data: { ...chat.completion.chunk... }, and the stream ends with data: [DONE]. Concatenate choices[0].delta.content from each chunk, in order, to rebuild the full message.
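
The concatenation step can be sketched as a small parser over the lines of the event stream. `iter_deltas` is a hypothetical helper, and the sample chunks are hand-written to follow the OpenAI chunk shape; real chunks will carry more fields.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from the lines of an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        piece = choices[0].get("delta", {}).get("content")
        if piece:
            yield piece


# Hand-written sample stream (an assumption about the exact chunk contents).
sample_stream = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_deltas(sample_stream))
print(text)  # Hello!
```

Against a live response, the same generator can be fed the decoded lines of the HTTP body as they arrive.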

Models

GET /v1/models lists the models your account can actually call. The list is discovered live from each engine, so any id it returns is callable; an id that is not on the list fails with model_not_found.

curl https://YOUR_HOST/v1/models -H "Authorization: Bearer sk-..."

Codex serves the GPT-5 family (e.g. gpt-5.5, gpt-5.4); Claude serves opus, sonnet, haiku and dated ids such as claude-sonnet-4-6.
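
A sketch of choosing a model from the listing before sending a completion. It assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`), which this page does not spell out; `model_ids` and `pick_model` are illustrative names.

```python
from typing import Optional


def model_ids(listing: dict) -> list:
    """Extract the model ids from a /v1/models response (assumed OpenAI list shape)."""
    return [entry["id"] for entry in listing.get("data", [])]


def pick_model(listing: dict, preferred: list) -> Optional[str]:
    """Return the first preferred id that the gateway actually lists, else None."""
    available = set(model_ids(listing))
    for candidate in preferred:
        if candidate in available:
            return candidate
    return None


# Hand-written sample listing using ids mentioned above.
sample_listing = {
    "object": "list",
    "data": [{"id": "gpt-5.5"}, {"id": "gpt-5.4"}, {"id": "claude-sonnet-4-6"}],
}

chosen = pick_model(sample_listing, ["not-listed-model", "claude-sonnet-4-6", "gpt-5.5"])
print(chosen)  # claude-sonnet-4-6
```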

Image generation

POST /v1/images/generations — requires the image scope. Backed by Codex's image tool.

Field	Type	Notes
`prompt`	string	The image brief. Required.
`n`	integer	Number of images, 1–4 (each is a separate run).
`ratio`	string	`1:1` · `16:9` · `9:16` · `4:3` · `3:4` · `3:2` · `2:3`.
`purpose`	string	`hero` · `logo` · `icon` · `product` · `illustration` · …
`background`	string	`transparent` for a cut-out subject.
`response_format`	string	`b64_json` (default) or `url`.

curl https://YOUR_HOST/v1/images/generations \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "a calm mountain lake at dawn", "ratio": "16:9", "n": 1 }'

Returns { "created": ..., "data": [{ "b64_json": "..." }] }.
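
With the default `b64_json` format, each entry has to be base64-decoded before it can be written to disk. A minimal sketch follows; `save_images` is a hypothetical helper, and the `.png` suffix is an assumption, since this page does not state the image encoding.

```python
import base64
import tempfile
from pathlib import Path


def save_images(response: dict, out_dir: str, stem: str = "image", suffix: str = ".png") -> list:
    """Decode every b64_json entry in the response and write it under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, item in enumerate(response.get("data", [])):
        if "b64_json" not in item:
            continue  # e.g. entries returned with response_format "url"
        path = out / f"{stem}-{i}{suffix}"
        path.write_bytes(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


# Stand-in response: the bytes are not a real image, they only exercise the decoding.
fake_bytes = b"not really an image"
response = {"created": 0, "data": [{"b64_json": base64.b64encode(fake_bytes).decode("ascii")}]}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_images(response, tmp)
    restored = saved[0].read_bytes()

print(saved[0].name, restored == fake_bytes)
```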

Key status

GET /v1/key returns the calling key's own status — scopes, remaining daily quota, current concurrency and recent usage. Authenticate with the key itself; no special scope is required. Useful for showing your users their remaining limit before a call.

curl https://YOUR_HOST/v1/key -H "Authorization: Bearer sk-..."

{
  "object": "api_key",
  "prefix": "sk-1a2b3c4d",
  "scopes": ["chat", "image"],
  "enabled": true,
  "expires_at": null,
  "limits": {
    "daily_quota": 1000,
    "used_today": 12,
    "remaining_today": 988,
    "resets_at": "2026-06-18T00:00:00.000Z",
    "max_concurrency": 5,
    "in_flight": 1
  },
  "totals": { "requests": 1234, "errors": 3 },
  "usage_30d": { "requests": 420, "total_tokens": 91234, "cost_usd": 0.0 }
}
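
One way to use this response is a pre-flight check before sending work. The sketch below is illustrative: `can_send` is a hypothetical helper, and it assumes that a missing or null limit means the limit is not set.

```python
def can_send(status: dict) -> bool:
    """Return True when the key looks able to accept one more request."""
    if not status.get("enabled", False):
        return False
    limits = status.get("limits") or {}
    remaining = limits.get("remaining_today")
    if remaining is not None and remaining <= 0:
        return False  # daily quota is spent
    max_concurrency = limits.get("max_concurrency")
    if max_concurrency is not None and limits.get("in_flight", 0) >= max_concurrency:
        return False  # every concurrency slot is busy
    return True


# Relevant fields from the sample response above.
status = {
    "enabled": True,
    "limits": {
        "daily_quota": 1000,
        "used_today": 12,
        "remaining_today": 988,
        "max_concurrency": 5,
        "in_flight": 1,
    },
}

print(can_send(status))  # True
```

The check is advisory only: another client using the same key can consume quota between the status call and the request, so the error codes still need handling.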

Errors

Every error uses the OpenAI envelope:

{ "error": { "message": "Incorrect API key provided.", "type": "invalid_request_error", "code": "invalid_api_key" } }

Code	HTTP	Meaning
`missing_api_key`	401	No bearer token was sent.
`invalid_api_key`	401	The key is unknown.
`key_revoked`	401	The key has been disabled.
`key_expired`	401	The key is past its expiry.
`insufficient_scope`	403	The key lacks the required scope.
`quota_exceeded`	429	The key's daily quota is spent.
`concurrency_limit`	429	Too many concurrent requests for the key.
`model_not_found`	404	Unknown or disabled model.
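
A sketch of turning the envelope into a typed exception and choosing a follow-up action from `error.code`. The `ApiError` class, the `next_step` helper and the action names are illustrative; the mapping is one possible client policy, not behaviour prescribed by the gateway.

```python
import json


class ApiError(Exception):
    """An error response decoded from the OpenAI-style envelope."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: str) -> ApiError:
    """Decode an error body; fall back to code "unknown" if it is not an envelope."""
    try:
        err = json.loads(body)["error"]
        return ApiError(status, err.get("code", "unknown"), err.get("message", ""))
    except (ValueError, KeyError, TypeError, AttributeError):
        return ApiError(status, "unknown", body)


# One possible client policy per documented code (an assumption, not gateway behaviour).
ACTIONS = {
    "missing_api_key": "fix_key",
    "invalid_api_key": "fix_key",
    "key_revoked": "fix_key",
    "key_expired": "fix_key",
    "insufficient_scope": "fix_key",
    "quota_exceeded": "wait_for_reset",
    "concurrency_limit": "retry_with_backoff",
    "model_not_found": "refresh_models",
}


def next_step(err: ApiError) -> str:
    """Map an error code to a follow-up action, defaulting to "give_up"."""
    return ACTIONS.get(err.code, "give_up")


sample_body = (
    '{ "error": { "message": "Incorrect API key provided.", '
    '"type": "invalid_request_error", "code": "invalid_api_key" } }'
)
err = parse_error(401, sample_body)
print(err.code, next_step(err))  # invalid_api_key fix_key
```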

Rate limits & quotas

Each key can carry a daily request quota and a max concurrency; both are enforced on every request. Usage (requests, tokens and cost) is metered and broken down per key and per model in the console under Usage.
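
A client can stay under the key's max concurrency by gating its own calls, which avoids most `concurrency_limit` responses. The sketch below uses a semaphore; `ConcurrencyGate`, `Tracker` and the simulated call are illustrative stand-ins, not part of the gateway.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyGate:
    """Allow at most `max_concurrency` calls to run at the same time."""

    def __init__(self, max_concurrency: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrency)

    def run(self, fn, *args, **kwargs):
        with self._slots:  # blocks until a slot is free
            return fn(*args, **kwargs)


class Tracker:
    """Stand-in for a real API call that records how many calls overlap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def call(self, i: int) -> int:
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.01)  # simulate network latency
        with self._lock:
            self.in_flight -= 1
        return i


gate = ConcurrencyGate(max_concurrency=2)
tracker = Tracker()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda i: gate.run(tracker.call, i), range(10)))

print(results, "peak in-flight:", tracker.peak)
```

The gate only limits this process. If several processes share one key, the server-side limit still applies and 429 responses need handling.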

For AI agents

This page is the canonical, machine-readable reference. The surface is OpenAI-compatible: set the base URL to https://YOUR_HOST/v1 and pass the key as a bearer token. Always call GET /v1/models to discover valid model ids before issuing a completion, and branch on the error.code values listed under Errors.
