API Reference

Complete guide to integrating the ApeKey API into your application

Quick Start

ApeKey provides a unified API for accessing multiple AI providers. Send requests in the familiar OpenAI-compatible format and get intelligent routing, caching, and prompt optimization on top.

Base URL: https://apekey.ai/v1

Authentication

All API requests require authentication using a Bearer token in the Authorization header.

Authorization: Bearer sk_live_xxx

Get your API key from the API Keys dashboard
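
For example, in JavaScript the header can be attached to any request like this (a minimal sketch; APEKEY_API_KEY is a placeholder environment variable name, not one this API defines):

// Keep the key out of source code; read it from the environment instead.
// APEKEY_API_KEY is a placeholder name -- use whatever your deployment provides.
const apiKey = process.env.APEKEY_API_KEY;

const res = await fetch('https://apekey.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${apiKey}`
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [{ role: 'user', content: 'ping' }]
  })
});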

Chat Completions

POST /v1/chat/completions

Request Body

{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "routing": {
    "prefer": "speed"
  }
}

model (required)

Recommended: Use "auto" for intelligent routing. We automatically select the best model and provider based on your request.

Advanced (Starter+ only): Pass an explicit model name (e.g., "gpt-4", "claude-3-opus") to control model selection yourself. Free plan users must use "auto".

messages (required)

Array of message objects, each with a "role" ("system", "user", or "assistant") and "content".

temperature (optional)

Sampling temperature between 0 and 2. Default: 1.

max_tokens (optional)

Maximum number of tokens to generate. Default: varies by model.

routing (optional)

Object controlling routing behavior via the fields below; a complete request example follows this list.

routing.prefer (optional)

Routing preference: "speed", "quality", or "cost". Default: cost-optimized routing.

routing.fallback (optional)

Enable automatic fallback to alternative providers if the primary provider fails. Default: true.
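
Putting the fields together, a request that pins routing to quality and disables provider fallback might look like this (a sketch built only from the parameters above):

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet."}
  ],
  "temperature": 0.5,
  "max_tokens": 500,
  "routing": {
    "prefer": "quality",
    "fallback": false
  }
}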

Response

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-3.1-8b-instant",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! I'm doing well, thank you..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "_optimization": {
    "cached": false,
    "original_tokens": 15,
    "optimized_tokens": 10,
    "tokens_saved": 5
  },
  "_routing": {
    "provider": "groq",
    "original_model": "auto",
    "mapped_model": "llama-3.1-8b-instant",
    "reason": "Best speed for this request"
  }
}
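
The underscore-prefixed "_optimization" and "_routing" objects are ApeKey-specific metadata returned alongside the standard OpenAI-compatible fields. A sketch of reading them in JavaScript, assuming "response" is the fetch result from the Code Examples section below:

const data = await response.json();

// Standard OpenAI-compatible fields
console.log(data.choices[0].message.content);
console.log(`Total tokens: ${data.usage.total_tokens}`);

// ApeKey-specific metadata (optional chaining in case the fields are absent)
if (data._optimization?.cached) {
  console.log('Response served from cache');
}
console.log(`Tokens saved by prompt optimization: ${data._optimization?.tokens_saved}`);
console.log(`Routed to ${data._routing?.provider} (${data._routing?.mapped_model}): ${data._routing?.reason}`);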

Routing Preferences

Control how requests are routed by specifying your preference for speed, quality, or cost.

By default, the system optimizes for cost. You can override this behavior by setting the routing preference.

Speed Priority

Prioritize fastest response times. Routes to Groq (fastest provider) when speed is preferred.

"prefer": "speed"

Quality Priority

Prioritize highest quality responses. Routes to providers with better models for complex tasks.

"prefer": "quality"

Cost Priority

Prioritize lowest cost. Routes to Together AI (cheapest provider) when cost is preferred. This is the default behavior.

"prefer": "cost"

Example Request:

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "routing": {
    "prefer": "quality"
  }
}

Code Examples

const response = await fetch('https://apekey.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk_live_xxx'
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    routing: {
      prefer: 'speed'  // 'speed', 'quality', or 'cost'
    }
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

Available Models

Free Plan: Only "auto" is available. Intelligent routing automatically selects the best model for you.

Starter+ Plans: You can use "auto" or specify any model below for explicit control.

auto
Recommended

Intelligent routing - automatically selects the best provider and model based on your request. Recommended for all users.

llama-3.3-70b-versatile
Starter+

Groq Llama 3.3 70B - Fast inference with high quality ($0.05 input / $0.08 output per 1M tokens)

llama-3.1-8b-instant
Starter+

Groq Llama 3.1 8B - Fastest inference ($0.05 input / $0.08 output per 1M tokens)

meta-llama/Llama-3.2-3B-Instruct-Turbo
Starter+

Together AI Llama 3.2 3B - Most cost-effective ($0.06 per 1M tokens)

meta-llama/Llama-3.1-8B-Instruct
Starter+

Together AI Llama 3.1 8B - Good balance ($0.06 per 1M tokens)

accounts/fireworks/models/llama-v3p1-8b-instruct
Starter+

Fireworks AI Llama 3.1 8B - Fast and reliable ($0.10 per 1M tokens)

llama-v3p1-8b-instruct
Starter+

Fireworks AI Llama 3.1 8B - Same model as above, accepted in short ID form ($0.10 per 1M tokens)

accounts/fireworks/models/llama-v3p3-70b-instruct
Starter+

Fireworks AI Llama 3.3 70B - High quality for complex tasks ($0.10 per 1M tokens)
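
On Starter+ plans, selecting one of the models above is just a different "model" value in the same request shape (a sketch):

{
  "model": "llama-3.1-8b-instant",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ]
}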

Rate Limits

Rate limits are applied per API key to ensure fair usage and system stability. Limits vary by plan.

Rate limits by plan: Free (10 requests/min), Starter (30/min), Pro (60/min), Scaling (300/min)
Free tier: 200,000 tokens per month
Overage limits can be adjusted in your dashboard settings

Error Handling

Code                        Status  Description
invalid_api_key             401     The API key provided is invalid or missing
missing_model               400     The model parameter is required
invalid_messages            400     Messages must be a non-empty array with valid role and content
invalid_json                400     Invalid JSON in request body
request_too_large           413     Request too large. Maximum size: 10 MB
too_many_messages           400     Too many messages. Maximum: 1,000
message_too_long            400     Message too long. Maximum length: 1 MB
deprecated_model            400     Model is no longer supported. Use "auto" or a supported model
model_selection_restricted  403     Model and provider selection requires a Starter+ plan; free plan users must use "auto"
rate_limit_exceeded         429     Too many requests. Check your rate limit settings
ip_rate_limit_exceeded      429     IP rate limit exceeded. Too many requests from this IP address
token_limit_exceeded        429     Monthly token limit exceeded. Upgrade your plan
provider_rate_limit         429     Provider rate limit reached. The AI provider is temporarily rate limiting requests
routing_error               400     Unable to route request to a provider
provider_error              500     Error communicating with the AI provider
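
A sketch of handling these errors in JavaScript. Note two assumptions: that the error body is JSON containing the code from the table, and that 429 responses may carry a standard Retry-After header; neither shape is specified here, so adjust to what you actually receive:

const response = await fetch('https://apekey.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.APEKEY_API_KEY}`  // placeholder env var
  },
  body: JSON.stringify({ model: 'auto', messages: [{ role: 'user', content: 'Hello' }] })
});

if (!response.ok) {
  // Assumed error shape: { "code": "rate_limit_exceeded", ... } -- adjust as needed.
  const err = await response.json().catch(() => ({}));

  if (response.status === 429) {
    // Back off before retrying; Retry-After is an assumption, so default to 1s.
    const waitSeconds = Number(response.headers.get('retry-after')) || 1;
    await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
    // ...retry the request here...
  } else {
    throw new Error(`ApeKey error ${response.status}: ${err.code ?? 'unknown'}`);
  }
}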

Features

Intelligent Routing

Automatically routes requests to the best provider based on your preferences (speed, quality, cost).

Smart Caching

Automatic response caching reduces costs by serving cached responses for identical requests.

Prompt Optimization

Automatically optimizes your prompts to reduce token usage while maintaining quality.

OpenAI Compatible

Drop-in replacement for OpenAI API. No code changes needed - just update the base URL.
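
For instance, with the official OpenAI Node SDK, pointing the client at ApeKey should be the only change needed (a sketch; it assumes the compatibility claim above holds and uses the placeholder APEKEY_API_KEY environment variable):

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.APEKEY_API_KEY,  // your ApeKey key, not an OpenAI key
  baseURL: 'https://apekey.ai/v1'      // route SDK traffic through ApeKey
});

const completion = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello, how are you?' }]
});

console.log(completion.choices[0].message.content);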

Ready to get started?

Create your free account and get 200,000 tokens to start building with ApeKey API.