AI Gateway API

Overview

The Swiftor AI Gateway acts as a unified interface to access a variety of Large Language Models (LLMs) from different providers. It utilizes LiteLLM to provide an OpenAI-compatible API endpoint, simplifying integration and allowing seamless switching between models based on user access tiers.

Key features include:

  • OpenAI Compatibility: Use familiar API request/response formats.
  • Unified Model Access: Access diverse models using a consistent swiftor/ prefix.
  • Tiered Access Control: Model availability is determined by the user's subscription tier (Starter, Hacker, Engineer).
  • Rate Limiting: Usage is subject to rate limits based on the user's tier.

Authentication

All requests to the AI Gateway must be authenticated. Include your Swiftor API Key in the Authorization header as a Bearer token.

http
Authorization: Bearer YOUR_SWIFTOR_API_KEY

Endpoints

The Swiftor AI Gateway provides OpenAI-compatible endpoints. Because it is built on LiteLLM, it supports multiple path variations for common operations to ensure broad compatibility.

Chat Completions

Used to generate text completions based on a conversation history (messages).

Supported Paths:

http
POST /v1/chat/completions
POST /chat/completions
POST /engines/{model}/chat/completions
POST /openai/deployments/{model}/chat/completions

The {model} in the path parameters should be the Swiftor model ID (e.g., swiftor/gemini-2.0-flash-exp).

The primary recommended path is /v1/chat/completions.

Request Format Example (/v1/chat/completions)

json
{
  "model": "swiftor/gemini-2.0-flash-exp", // Specify the desired Swiftor model
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Explain the concept of Large Language Models."
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false // Set to true for streaming responses
}

Response Format Example (Non-Streaming, /v1/chat/completions)

json
{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1678886400,
  "model": "swiftor/gemini-2.0-flash-exp", // The model actually used
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Large Language Models (LLMs) are advanced artificial intelligence systems..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 135,
    "total_tokens": 160
  }
}

For streaming responses (stream: true), you will receive a series of Server-Sent Events (SSE).
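A minimal sketch of consuming such a stream, assuming the gateway follows the OpenAI-style SSE convention (each event is a `data: <json>` line and the stream ends with `data: [DONE]`):

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Example with canned events (in practice these arrive over HTTP):
events = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(events)))  # prints "Hello"
```

The same parser works whether the lines come from a canned list, as here, or from iterating over a streaming HTTP response body.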


Completions (Legacy)

Used for generating text completions based on a single prompt string (older API style).

Supported Paths:

http
POST /v1/completions
POST /completions
POST /engines/{model}/completions
POST /openai/deployments/{model}/completions

The {model} in the path parameters should be the Swiftor model ID.

The primary recommended path is /v1/completions.

Request/Response format typically follows the legacy OpenAI /v1/completions structure (using prompt instead of messages).
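As a sketch of the difference, the legacy body carries a single prompt string where the chat endpoint carries a messages array (the helper function and default values here are illustrative, not part of the API):

```python
def legacy_completion_body(model: str, prompt: str, max_tokens: int = 150) -> dict:
    """Build a legacy /v1/completions request body (prompt, not messages)."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

body = legacy_completion_body("swiftor/mistral-7b-instruct", "Say hello.")
```

The response likewise returns plain text in choices[].text rather than a choices[].message object.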


Embeddings

Used to generate vector embeddings for input text, typically used for semantic search or clustering.

Supported Paths:

http
POST /v1/embeddings
POST /embeddings
POST /engines/{model}/embeddings
POST /openai/deployments/{model}/embeddings

The {model} in the path parameters should be the Swiftor model ID compatible with embeddings.

The primary recommended path is /v1/embeddings.

Request/Response format typically follows the OpenAI /v1/embeddings structure (using input and returning embedding vectors). Not every model supports embeddings; verify that your chosen Swiftor model ID accepts embedding requests before relying on it.
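A sketch of the corresponding request body, following the standard OpenAI embeddings shape, where input may be a single string or a list of strings (the helper and the model ID used below are illustrative; as noted above, embeddings support per model must be verified):

```python
def embeddings_body(model: str, texts) -> dict:
    """Build an OpenAI-style /v1/embeddings request body.

    Accepts a single string or a list of strings and always sends a list.
    """
    return {
        "model": model,
        "input": texts if isinstance(texts, list) else [texts],
    }

body = embeddings_body("swiftor/gemini-2.0-flash-exp", ["first doc", "second doc"])
```

The response contains one embedding vector per input item, in the same order as the input list.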

Model Routing and Naming

  • Model Names: Always specify the desired model using the swiftor/ prefix followed by the model identifier (e.g., swiftor/gemini-2.0-flash-exp, swiftor/claude-3.5-sonnet).
  • Routing: The gateway routes the request to the appropriate underlying model provider configured in config.yaml (primarily OpenRouter in this configuration).

Available Models & Tiers

Model access is restricted based on your subscription tier. Higher tiers include access to models from lower tiers.

🟒 Starter Tier Models

Accessible by Starter, Hacker, and Engineer users.

| Model ID (model parameter)    | Base Model / Provider Info  |
| ----------------------------- | --------------------------- |
| swiftor/gemini-2.0-flash-exp  | Google Gemini 2.0 Flash Exp |
| swiftor/mistral-7b-instruct   | Mistral 7B Instruct         |
| swiftor/llama-3.2-3b-instruct | Meta LLaMA 3.2 3B Instruct  |
| swiftor/qwen-2.5-7b-instruct  | Qwen 2.5 7B Instruct        |
| swiftor/deepseek-chat-v3      | DeepSeek Chat V3            |

πŸ”΅ Hacker Tier Models

Accessible by Hacker and Engineer users.

| Model ID (model parameter)    | Base Model / Provider Info     |
| ----------------------------- | ------------------------------ |
| swiftor/gemini-2.5-pro        | Google Gemini 2.5 Pro Exp      |
| swiftor/mistral-small-24b     | Mistral 3.1 Small 24B Instruct |
| swiftor/llama-3.2-vision-11b  | Meta LLaMA 3.2 Vision 11B      |
| swiftor/deepseek-r1-llama-70b | DeepSeek R1 Distill LLaMA 70B  |
| swiftor/qwen-2.5-72b          | Qwen 2.5 72B Instruct          |
| swiftor/gemma-3-12b           | Google Gemma 3 12B IT          |
| swiftor/dolphin-r1-24b        | Dolphin 3.0 R1 Mistral 24B     |
| swiftor/arliai-qwq-32b        | ARLIAI QWQ 32B RPR V1          |
| swiftor/mistral-nemo          | Mistral Nemo                   |

🧠 Engineer Tier Models

Accessible only by Engineer users.

| Model ID (model parameter)  | Base Model / Provider Info      |
| --------------------------- | ------------------------------- |
| swiftor/claude-3.5-sonnet   | Anthropic Claude 3.5 Sonnet     |
| swiftor/whiterabbitneo-70b  | WhiteRabbitNeo 2 70B            |
| swiftor/llama-3.3-70b       | Meta LLaMA 3.3 70B Instruct     |
| swiftor/nemotron-49b        | Nvidia Nemotron 3.3 Super 49B   |
| swiftor/nemotron-253b-ultra | Nvidia Nemotron 3.1 Ultra 253B  |
| swiftor/kimi-vl             | Moonshot AI Kimi VL A3B Thinking |
| swiftor/deepcoder-14b       | Agentica DeepCoder 14B Preview  |
| swiftor/llama-4-maverick    | Meta LLaMA 4 Maverick           |
| swiftor/llama-4-scout       | Meta LLaMA 4 Scout              |

Rate Limits (Budgets)

Usage of the AI Gateway is subject to limits based on your subscription tier to ensure fair usage and service stability. These include both request/token limits and daily spending caps.

Rate Limits (Approximate):

| Tier     | Requests per Minute (RPM) | Tokens per Minute (TPM) | Notes                                     |
| -------- | ------------------------- | ----------------------- | ----------------------------------------- |
| Starter  | 10                        | 10,000                  | Suitable for basic usage and testing      |
| Hacker   | 60                        | 100,000                 | Balanced for regular development & usage  |
| Engineer | 120                       | 500,000                 | Designed for intensive use & applications |

Note: Specific models may have lower individual RPM limits defined in the gateway configuration (e.g., many models currently have an rpm: 6 limit); these apply in addition to the overall tier limits.

Daily Spending Limits:

In addition to rate limits, daily spending budgets are enforced based on estimated costs (calculated using token counts and model-specific pricing):

| Tier     | Maximum Daily Spend |
| -------- | ------------------- |
| Starter  | $5 USD              |
| Hacker   | $25 USD             |
| Engineer | $50 USD             |

  • These budgets reset every 24 hours.
  • Once the daily spending limit is reached, further requests within that 24-hour period will be blocked until the budget resets.

Exceeding either the rate limits or the daily spending budget will result in 429 Too Many Requests errors.
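One common client-side way to absorb those 429 responses is exponential backoff. A minimal sketch that computes only the delay schedule (no network calls; the parameters are illustrative defaults, not gateway-mandated values):

```python
def backoff_schedule(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff delays (seconds) for retrying 429 responses.

    Doubles the delay each attempt, never exceeding `cap`.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In practice you would sleep for each delay between retries, add random jitter to avoid synchronized retries, and give up entirely once the daily budget is exhausted, since no amount of retrying will succeed before the 24-hour reset.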
