The Swiftor AI Gateway acts as a unified interface for accessing a variety of Large Language Models (LLMs) from different providers. It uses LiteLLM to expose an OpenAI-compatible API endpoint, simplifying integration and allowing seamless switching between models based on user access tiers.
Key features include an OpenAI-compatible API, tier-based model access, and a unified swiftor/ model prefix.

All requests to the AI Gateway must be authenticated. Include your Swiftor API Key in the Authorization header as a Bearer token:
Authorization: Bearer YOUR_SWIFTOR_API_KEY
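In Python, for example, the required headers can be built once and reused across requests (the helper name and environment variable are illustrative, not part of the gateway API):

```python
import os

def auth_headers(api_key: str) -> dict:
    """Build the headers the Swiftor AI Gateway expects on every request."""
    return {
        "Authorization": f"Bearer {api_key}",   # Swiftor API Key as a Bearer token
        "Content-Type": "application/json",
    }

# Read the key from the environment rather than hard-coding it.
headers = auth_headers(os.environ.get("SWIFTOR_API_KEY", "YOUR_SWIFTOR_API_KEY"))
```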
The Swiftor AI Gateway provides OpenAI-compatible endpoints. Due to its foundation on LiteLLM, it supports multiple path variations for common operations to ensure broad compatibility.
Chat completions: used to generate text completions based on a conversation history (messages).
Supported Paths:
POST /v1/chat/completions
POST /chat/completions
POST /engines/{model}/chat/completions
POST /openai/deployments/{model}/chat/completions
The {model} in the path parameters should be the Swiftor model ID (e.g., swiftor/gemini-2.0-flash-exp).
The primary recommended path is /v1/chat/completions.

Example request body (POST /v1/chat/completions):

{
  "model": "swiftor/gemini-2.0-flash-exp", // Specify the desired Swiftor model
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Explain the concept of Large Language Models."
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false // Set to true for streaming responses
}
Example response (POST /v1/chat/completions):

{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1678886400,
  "model": "swiftor/gemini-2.0-flash-exp", // The model actually used
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Large Language Models (LLMs) are advanced artificial intelligence systems..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 135,
    "total_tokens": 160
  }
}
For streaming responses (stream: true), you will receive a series of Server-Sent Events (SSE).
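A minimal sketch of consuming that stream, assuming the standard OpenAI-style SSE framing (`data: {json}` chunks with incremental `delta` content, terminated by `data: [DONE]`):

```python
import json

def collect_stream(lines):
    """Accumulate assistant text from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# Two chunks in the shape the gateway would emit:
sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
result = collect_stream(sample)  # "Hello world"
```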
Legacy completions: used to generate text completions from a single prompt string (older API style).
Supported Paths:
POST /v1/completions
POST /completions
POST /engines/{model}/completions
POST /openai/deployments/{model}/completions
The {model} in the path parameters should be the Swiftor model ID. The primary recommended path is /v1/completions.
The request/response format typically follows the legacy OpenAI /v1/completions structure (using prompt instead of messages).
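For illustration, the legacy-style body differs from the chat body mainly in replacing the messages array with a single prompt string (a sketch, not an exhaustive parameter list):

```python
# Chat-style body (current API): conversation history under "messages".
chat_body = {
    "model": "swiftor/gemini-2.0-flash-exp",
    "messages": [{"role": "user", "content": "Explain LLMs."}],
    "max_tokens": 150,
}

# Legacy completion-style body: a single "prompt" string replaces "messages".
completion_body = {
    "model": "swiftor/gemini-2.0-flash-exp",
    "prompt": "Explain LLMs.",
    "max_tokens": 150,
}
```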
Embeddings: used to generate vector embeddings for input text, typically for semantic search or clustering.
Supported Paths:
POST /v1/embeddings
POST /embeddings
POST /engines/{model}/embeddings
POST /openai/deployments/{model}/embeddings
The {model} in the path parameters should be a Swiftor model ID that supports embeddings. The primary recommended path is /v1/embeddings.

The request/response format typically follows the OpenAI /v1/embeddings structure (an input field in the request, embedding vectors in the response). Model compatibility for embeddings needs to be verified.
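As a sketch of typical downstream use, the returned vectors can be compared with cosine similarity for semantic search. The model ID below is a placeholder, and as noted above, embedding support must be verified per model:

```python
import math

# Request body in the OpenAI /v1/embeddings shape.
embeddings_body = {
    "model": "swiftor/mistral-7b-instruct",  # placeholder; verify embedding support
    "input": ["semantic search", "vector lookup"],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# With the vectors the response would carry in data[i]["embedding"]:
sim = cosine([1.0, 0.0], [1.0, 0.0])  # identical vectors -> similarity 1.0
```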
All model IDs use the swiftor/ prefix followed by the model identifier (e.g., swiftor/gemini-2.0-flash-exp, swiftor/claude-3.5-sonnet). The mapping from each model ID to its underlying provider is defined in the gateway's config.yaml (primarily OpenRouter in this configuration).

Model access is restricted based on your subscription tier. Higher tiers include access to models from lower tiers.
Accessible by Starter, Hacker, and Engineer users.
Model ID (model parameter) | Base Model / Provider Info |
---|---|
swiftor/gemini-2.0-flash-exp | Google Gemini 2.0 Flash Exp |
swiftor/mistral-7b-instruct | Mistral 7B Instruct |
swiftor/llama-3.2-3b-instruct | Meta LLaMA 3.2 3B Instruct |
swiftor/qwen-2.5-7b-instruct | Qwen 2.5 7B Instruct |
swiftor/deepseek-chat-v3 | DeepSeek Chat V3 |
Accessible by Hacker and Engineer users.
Model ID (model parameter) | Base Model / Provider Info |
---|---|
swiftor/gemini-2.5-pro | Google Gemini 2.5 Pro Exp |
swiftor/mistral-small-24b | Mistral 3.1 Small 24B Instruct |
swiftor/llama-3.2-vision-11b | Meta LLaMA 3.2 Vision 11B |
swiftor/deepseek-r1-llama-70b | DeepSeek R1 Distill LLaMA 70B |
swiftor/qwen-2.5-72b | Qwen 2.5 72B Instruct |
swiftor/gemma-3-12b | Google Gemma 3 12B IT |
swiftor/dolphin-r1-24b | Dolphin 3.0 R1 Mistral 24B |
swiftor/arliai-qwq-32b | ARLIAI QWQ 32B RPR V1 |
swiftor/mistral-nemo | Mistral Nemo |
Accessible only by Engineer users.
Model ID (model parameter) | Base Model / Provider Info |
---|---|
swiftor/claude-3.5-sonnet | Anthropic Claude 3.5 Sonnet |
swiftor/whiterabbitneo-70b | WhiteRabbitNeo 2 70B |
swiftor/llama-3.3-70b | Meta LLaMA 3.3 70B Instruct |
swiftor/nemotron-49b | Nvidia Nemotron 3.3 Super 49B |
swiftor/nemotron-253b-ultra | Nvidia Nemotron 3.1 Ultra 253B |
swiftor/kimi-vl | Moonshot AI Kimi VL A3B Thinking |
swiftor/deepcoder-14b | Agentica DeepCoder 14B Preview |
swiftor/llama-4-maverick | Meta LLaMA 4 Maverick |
swiftor/llama-4-scout | Meta LLaMA 4 Scout |
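Because higher tiers include all lower-tier models, client-side access checks can be modeled as nested sets. The model lists below are abbreviated from the tables above, and the helper is illustrative, not part of the gateway API:

```python
STARTER = {
    "swiftor/gemini-2.0-flash-exp", "swiftor/mistral-7b-instruct",
    "swiftor/llama-3.2-3b-instruct", "swiftor/qwen-2.5-7b-instruct",
    "swiftor/deepseek-chat-v3",
}
HACKER_ONLY = {"swiftor/gemini-2.5-pro", "swiftor/qwen-2.5-72b"}      # abbreviated
ENGINEER_ONLY = {"swiftor/claude-3.5-sonnet", "swiftor/llama-4-maverick"}  # abbreviated

# Each tier is the union of its own models and every lower tier's models.
TIER_MODELS = {
    "starter": STARTER,
    "hacker": STARTER | HACKER_ONLY,
    "engineer": STARTER | HACKER_ONLY | ENGINEER_ONLY,
}

def can_use(tier: str, model: str) -> bool:
    """Check whether a subscription tier may call the given model ID."""
    return model in TIER_MODELS[tier]
```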
Usage of the AI Gateway is subject to limits based on your subscription tier to ensure fair usage and service stability. These include both request/token limits and daily spending caps.
Rate Limits (Approximate):
Tier | Requests per Minute (RPM) | Tokens per Minute (TPM) | Notes |
---|---|---|---|
Starter | 10 | 10,000 | Suitable for basic usage and testing |
Hacker | 60 | 100,000 | Balanced for regular development & usage |
Engineer | 120 | 500,000 | Designed for intensive use & applications |
Note: Specific models might have lower individual RPM limits as defined in the gateway configuration (e.g., many models currently have an rpm: 6 limit); these apply in addition to the overall tier limits.
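A client can avoid tripping these limits by pacing itself below the tighter of the tier RPM and any per-model RPM. A minimal sliding-window sketch (the rpm values mirror the numbers above, but check your actual configured limits):

```python
from collections import deque

class RequestPacer:
    """Sliding-window limiter: allow at most `rpm` requests per rolling 60 s."""
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = deque()  # timestamps of requests inside the current window

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False

# Pace to the tighter of the tier limit (Hacker: 60) and a per-model rpm of 6.
pacer = RequestPacer(rpm=min(60, 6))
results = [pacer.allow(0.0) for _ in range(7)]  # first 6 allowed, 7th denied
```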
Daily Spending Limits:
In addition to rate limits, daily spending budgets are enforced based on estimated costs (calculated using token counts and model-specific pricing):
Tier | Maximum Daily Spend |
---|---|
Starter | $5 USD |
Hacker | $25 USD |
Engineer | $50 USD |
Exceeding either the rate limits or the daily spending budget will result in 429 Too Many Requests errors.
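The spend caps can also be tracked client-side from the usage block each response returns. The per-token prices below are hypothetical placeholders; the real rates are model-specific and live in the gateway configuration:

```python
# Hypothetical pricing in dollars per 1K tokens (illustrative only).
PRICE_PER_1K = {
    "swiftor/gemini-2.0-flash-exp": {"prompt": 0.0001, "completion": 0.0004},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request from its token usage."""
    p = PRICE_PER_1K[model]
    return (prompt_tokens / 1000) * p["prompt"] + (completion_tokens / 1000) * p["completion"]

# Using the usage block from the example response (25 prompt, 135 completion tokens):
cost = request_cost("swiftor/gemini-2.0-flash-exp", 25, 135)

def within_budget(spent_today: float, daily_cap: float) -> bool:
    """True while today's estimated spend is still under the tier's daily cap."""
    return spent_today <= daily_cap
```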
Send a POST request to /v1/chat/completions with a JSON body like this:
{
  "model": "swiftor/gemini-2.0-flash-exp", // Specify the desired Swiftor model
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Explain the concept of Large Language Models."
    }
  ],
  "max_tokens": 150,
  "temperature": 0.7,
  "stream": false // Set to true for streaming responses
}
A successful response looks like this:

{
  "id": "chatcmpl-xxxxxxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1678886400,
  "model": "swiftor/gemini-2.0-flash-exp", // The model actually used
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Large Language Models (LLMs) are advanced artificial intelligence systems..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 135,
    "total_tokens": 160
  }
}
For streaming responses (stream: true), you will receive a series of Server-Sent Events (SSE).
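Putting the pieces together, here is a minimal Python sketch using only the standard library. The base URL is a placeholder for your gateway's address, and error handling is omitted; the request is built but not sent:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for the gateway."""
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "https://ai.swiftor.example",  # placeholder base URL
    "YOUR_SWIFTOR_API_KEY",
    {
        "model": "swiftor/gemini-2.0-flash-exp",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
# Send with: resp = urllib.request.urlopen(req); data = json.load(resp)
```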