AI Gateway API

Overview

The Swiftor AI Gateway acts as a unified interface to access a variety of Large Language Models (LLMs) from different providers. It utilizes LiteLLM to provide an OpenAI-compatible API endpoint, simplifying integration and allowing seamless switching between models based on user access tiers.

Key features include:

  • OpenAI Compatibility: Use familiar API request/response formats.
  • Unified Model Access: Access diverse models using a consistent swiftor/ prefix.
  • Tiered Access Control: Model availability is determined by the user's subscription tier (Starter, Hacker, Engineer).
  • Rate Limiting: Usage is subject to rate limits based on the user's tier.

Authentication

All requests to the AI Gateway must be authenticated. Include your Swiftor API Key in the Authorization header as a Bearer token.

http
Authorization: Bearer YOUR_SWIFTOR_API_KEY

Endpoints

The Swiftor AI Gateway provides OpenAI-compatible endpoints. Because it is built on LiteLLM, it supports multiple path variations for common operations to ensure broad compatibility.

Chat Completions

Used to generate text completions based on a conversation history (messages).

Supported Paths:

http
POST /v1/chat/completions
POST /chat/completions
POST /engines/{model}/chat/completions
POST /openai/deployments/{model}/chat/completions

The {model} in the path parameters should be the Swiftor model ID (e.g., swiftor/gemini-2.0-flash-exp).

The primary recommended path is /v1/chat/completions.

Request Format Example (/v1/chat/completions)

json
{
  "model": "swiftor/gemini-2.0-flash-exp", // Specify the desired Swiftor model
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Explain the concept of Large Language Models."
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false // Set to true for streaming responses
}

Response Format Example (Non-Streaming, /v1/chat/completions)

json
{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1678886400,
  "model": "swiftor/gemini-2.0-flash-exp", // The model actually used
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Large Language Models (LLMs) are advanced artificial intelligence systems..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 135,
    "total_tokens": 160
  }
}

For streaming responses (stream: true), you will receive a series of Server-Sent Events (SSE).
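A minimal sketch of consuming such a stream, assuming the gateway follows the OpenAI-style SSE convention (each event is a `data: <json>` line and the stream ends with `data: [DONE]`):

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

# Example with canned events (in practice these arrive over HTTP):
events = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(events)))  # prints "Hello"
```

The same parser works whether the lines come from a canned list, as here, or from iterating over a streaming HTTP response body.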


Completions (Legacy)

Used for generating text completions based on a single prompt string (older API style).

Supported Paths:

http
POST /v1/completions
POST /completions
POST /engines/{model}/completions
POST /openai/deployments/{model}/completions

The {model} in the path parameters should be the Swiftor model ID.

The primary recommended path is /v1/completions.

Request/Response format typically follows the legacy OpenAI /v1/completions structure (using prompt instead of messages).
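As a sketch of the difference, the legacy body carries a single prompt string where the chat endpoint carries a messages array (the helper function and default values here are illustrative, not part of the API):

```python
def legacy_completion_body(model: str, prompt: str, max_tokens: int = 150) -> dict:
    """Build a legacy /v1/completions request body (prompt, not messages)."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

body = legacy_completion_body("swiftor/mistral-7b-instruct", "Say hello.")
```

The response likewise returns plain text in choices[].text rather than a choices[].message object.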


Embeddings

Used to generate vector embeddings for input text, typically used for semantic search or clustering.

Supported Paths:

http
POST /v1/embeddings
POST /embeddings
POST /engines/{model}/embeddings
POST /openai/deployments/{model}/embeddings

The {model} in the path parameters should be the Swiftor model ID compatible with embeddings.

The primary recommended path is /v1/embeddings.

Request/Response format typically follows the OpenAI /v1/embeddings structure (using input and returning embedding vectors). Not every model supports embeddings; verify that your chosen Swiftor model ID accepts embedding requests before relying on it.
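A sketch of the corresponding request body, following the standard OpenAI embeddings shape, where input may be a single string or a list of strings (the helper and the model ID used below are illustrative; as noted above, embeddings support per model must be verified):

```python
def embeddings_body(model: str, texts) -> dict:
    """Build an OpenAI-style /v1/embeddings request body.

    Accepts a single string or a list of strings and always sends a list.
    """
    return {
        "model": model,
        "input": texts if isinstance(texts, list) else [texts],
    }

body = embeddings_body("swiftor/gemini-2.0-flash-exp", ["first doc", "second doc"])
```

The response contains one embedding vector per input item, in the same order as the input list.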

Model Routing and Naming

  • Model Names: Always specify the desired model using the swiftor/ prefix followed by the model identifier (e.g., swiftor/gemini-2.0-flash-exp, swiftor/claude-3.5-sonnet).
  • Routing: The gateway routes the request to the appropriate underlying model provider configured in config.yaml (primarily OpenRouter in this configuration).

Available Models & Tiers

Model access is restricted based on your subscription tier. Higher tiers include access to models from lower tiers.

🟒 Starter Tier Models

Accessible by Starter, Hacker, and Engineer users.

| Model ID (model parameter)    | Base Model / Provider Info  |
| ----------------------------- | --------------------------- |
| swiftor/gemini-2.0-flash-exp  | Google Gemini 2.0 Flash Exp |
| swiftor/mistral-7b-instruct   | Mistral 7B Instruct         |
| swiftor/llama-3.2-3b-instruct | Meta LLaMA 3.2 3B Instruct  |
| swiftor/qwen-2.5-7b-instruct  | Qwen 2.5 7B Instruct        |
| swiftor/deepseek-chat-v3      | DeepSeek Chat V3            |

πŸ”΅ Hacker Tier Models

Accessible by Hacker and Engineer users.

| Model ID (model parameter)    | Base Model / Provider Info     |
| ----------------------------- | ------------------------------ |
| swiftor/gemini-2.5-pro        | Google Gemini 2.5 Pro Exp      |
| swiftor/mistral-small-24b     | Mistral 3.1 Small 24B Instruct |
| swiftor/llama-3.2-vision-11b  | Meta LLaMA 3.2 Vision 11B      |
| swiftor/deepseek-r1-llama-70b | DeepSeek R1 Distill LLaMA 70B  |
| swiftor/qwen-2.5-72b          | Qwen 2.5 72B Instruct          |
| swiftor/gemma-3-12b           | Google Gemma 3 12B IT          |
| swiftor/dolphin-r1-24b        | Dolphin 3.0 R1 Mistral 24B     |
| swiftor/arliai-qwq-32b        | ARLIAI QWQ 32B RPR V1          |
| swiftor/mistral-nemo          | Mistral Nemo                   |

🧠 Engineer Tier Models

Accessible only by Engineer users.

| Model ID (model parameter)  | Base Model / Provider Info      |
| --------------------------- | ------------------------------- |
| swiftor/claude-3.5-sonnet   | Anthropic Claude 3.5 Sonnet     |
| swiftor/whiterabbitneo-70b  | WhiteRabbitNeo 2 70B            |
| swiftor/llama-3.3-70b       | Meta LLaMA 3.3 70B Instruct     |
| swiftor/nemotron-49b        | Nvidia Nemotron 3.3 Super 49B   |
| swiftor/nemotron-253b-ultra | Nvidia Nemotron 3.1 Ultra 253B  |
| swiftor/kimi-vl             | Moonshot AI Kimi VL A3B Thinking |
| swiftor/deepcoder-14b       | Agentica DeepCoder 14B Preview  |
| swiftor/llama-4-maverick    | Meta LLaMA 4 Maverick           |
| swiftor/llama-4-scout       | Meta LLaMA 4 Scout              |

Rate Limits (Budgets)

Usage of the AI Gateway is subject to limits based on your subscription tier to ensure fair usage and service stability. These include both request/token limits and daily spending caps.

Rate Limits (Approximate):

| Tier     | Requests per Minute (RPM) | Tokens per Minute (TPM) | Notes                                     |
| -------- | ------------------------- | ----------------------- | ----------------------------------------- |
| Starter  | 10                        | 10,000                  | Suitable for basic usage and testing      |
| Hacker   | 60                        | 100,000                 | Balanced for regular development & usage  |
| Engineer | 120                       | 500,000                 | Designed for intensive use & applications |

Note: Specific models may have lower individual RPM limits defined in the gateway configuration (e.g., many models currently have an rpm: 6 limit); these apply in addition to the overall tier limits.

Daily Spending Limits:

In addition to rate limits, daily spending budgets are enforced based on estimated costs (calculated using token counts and model-specific pricing):

| Tier     | Maximum Daily Spend |
| -------- | ------------------- |
| Starter  | $5 USD              |
| Hacker   | $25 USD             |
| Engineer | $50 USD             |

  • These budgets reset every 24 hours.
  • Once the daily spending limit is reached, further requests within that 24-hour period will be blocked until the budget resets.

Exceeding either the rate limits or the daily spending budget will result in 429 Too Many Requests errors.
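One common client-side way to absorb those 429 responses is exponential backoff. A minimal sketch that computes only the delay schedule (no network calls; the parameters are illustrative defaults, not gateway-mandated values):

```python
def backoff_schedule(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff delays (seconds) for retrying 429 responses.

    Doubles the delay each attempt, never exceeding `cap`.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In practice you would sleep for each delay between retries, add random jitter to avoid synchronized retries, and give up entirely once the daily budget is exhausted, since no amount of retrying will succeed before the 24-hour reset.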
