API Reference

A complete guide to integrating the ApeKey API into your application.

Quick Start

ApeKey provides a unified API for accessing multiple AI providers. Use the same OpenAI-compatible format with intelligent routing, caching, and optimization.

Base URL: https://apekey.ai/v1

Authentication

All API requests require authentication using a Bearer token in the Authorization header.

Authorization: Bearer sk_live_xxx

Get your API key from the API Keys dashboard.
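In JavaScript, for example, the key is attached as a standard header (APEKEY_API_KEY is an illustrative environment variable name, not something ApeKey defines):

const headers = {
  'Content-Type': 'application/json',
  // Load the key from the environment rather than hard-coding a live
  // key in source control or client-side code.
  'Authorization': `Bearer ${process.env.APEKEY_API_KEY}`
};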

Chat Completions

POST /v1/chat/completions

Request Body

{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "routing": {
    "prefer": "speed"
  }
}
model (required)

Recommended: Use "auto" for intelligent routing. We automatically select the best model and provider based on your request.

Advanced (Starter+ only): Specify a model explicitly (e.g., "llama-3.3-70b-versatile", "meta-llama/Meta-Llama-3-8B-Instruct-Lite"). Free plan users must use "auto".

messages (required)

Array of message objects with "role" (user/assistant/system) and "content".

temperature (optional)

Sampling temperature between 0 and 2. Default: 1.

max_tokens (optional)

Maximum number of tokens to generate. Default: varies by model.

routing (optional)

Object that controls routing behavior.

prefer (optional)

Routing preference: "speed", "quality", or "cost". Default: cost-optimized routing.

fallback (optional)

Enable automatic fallback to alternative providers if the primary provider fails. Default: true.
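For example, to prefer quality while disabling automatic fallback:

{
  "routing": {
    "prefer": "quality",
    "fallback": false
  }
}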

Response

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-3.1-8b-instant",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! I'm doing well, thank you..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "_optimization": {
    "cached": false,
    "original_tokens": 15,
    "optimized_tokens": 10,
    "tokens_saved": 5
  },
  "_routing": {
    "provider": "groq",
    "original_model": "auto",
    "mapped_model": "llama-3.1-8b-instant",
    "reason": "Best speed for this request"
  }
}
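The underscore-prefixed fields (_optimization, _routing) are ApeKey extensions to the standard OpenAI response shape. A minimal sketch of reading them from a parsed response body, using the field names documented above (logApeKeyMetadata is an illustrative helper name):

function logApeKeyMetadata(data) {
  // Standard OpenAI-compatible field
  console.log(data.choices[0].message.content);

  // ApeKey extensions: cache status, token savings, and actual routing
  if (data._optimization?.cached) {
    console.log('Response served from cache');
  }
  console.log(`Tokens saved by optimization: ${data._optimization?.tokens_saved ?? 0}`);
  console.log(`Routed to ${data._routing?.provider} as ${data._routing?.mapped_model}`);
}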

Routing Preferences

Control how requests are routed by specifying your preference for speed, quality, or cost.

By default, the system optimizes for cost. You can override this behavior by setting the routing preference.

Speed Priority

Prioritize fastest response times. Routes to Groq (fastest provider) when speed is preferred.

"prefer": "speed"

Quality Priority

Prioritize highest quality responses. Routes to providers with better models for complex tasks.

"prefer": "quality"

Cost Priority

Prioritize lowest cost. Routes to Together AI (cheapest provider) when cost is preferred. This is the default behavior.

"prefer": "cost"

Example Request:

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "routing": {
    "prefer": "quality"
  }
}

Code Examples

const response = await fetch('https://apekey.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk_live_xxx'
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    routing: {
      prefer: 'speed'  // 'speed', 'quality', or 'cost'
    }
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

Available Models

Free Plan: Only "auto" is available. Intelligent routing automatically selects the best model for you.

Starter+ Plans: You can use "auto" or specify any model below for explicit control.

auto (Recommended)

Intelligent routing - automatically selects the best provider and model based on your request. Recommended for all users.

llama-3.3-70b-versatile (Starter+)

Groq Llama 3.3 70B - Fast inference with high quality ($0.05 input / $0.08 output per 1M tokens)

llama-3.1-8b-instant (Starter+)

Groq Llama 3.1 8B - Fastest inference ($0.05 input / $0.08 output per 1M tokens)

meta-llama/Meta-Llama-3-8B-Instruct-Lite (Starter+)

Together AI Llama 3 8B Lite - Most cost-effective ($0.10 per 1M tokens)

meta-llama/Llama-3.3-70B-Instruct-Turbo (Starter+)

Together AI Llama 3.3 70B - High quality ($0.88 per 1M tokens)

accounts/fireworks/models/llama-v3p1-8b-instruct (Starter+)

Fireworks AI Llama 3.1 8B - Fast and reliable ($0.10 per 1M tokens)

llama-v3p1-8b-instruct (Starter+)

Fireworks AI Llama 3.1 8B - Same model as above, accepted in short ID form ($0.10 per 1M tokens)

accounts/fireworks/models/llama-v3p3-70b-instruct (Starter+)

Fireworks AI Llama 3.3 70B - High quality for complex tasks ($0.10 per 1M tokens)
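On Starter+ plans, explicit selection is just a matter of replacing "auto" with one of the model IDs above, for example:

{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "user", "content": "Summarize the main themes of Hamlet"}
  ]
}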

Rate Limits

Rate limits are applied per API key to ensure fair usage and system stability. Limits vary by plan.

Rate limits by plan: Free (5 requests/min), Starter (30/min), Pro (60/min), Scaling (300/min)
Free tier: 20,000 tokens per month
Overage limits can be adjusted in your dashboard settings
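Requests beyond your per-minute limit return a 429 status (see rate_limit_exceeded below). A minimal backoff-and-retry sketch in JavaScript; the retry count and delays are illustrative choices, not ApeKey requirements:

async function withRetry(makeRequest, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const response = await makeRequest();
    if (response.status !== 429 || attempt === maxRetries) {
      return response;
    }
    // Exponential backoff before retrying: 1s, 2s, 4s, ...
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
}

// Usage: const response = await withRetry(() => fetch(/* request as shown above */));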

Error Handling

invalid_api_key (401)

The API key provided is invalid or missing.

missing_model (400)

The model parameter is required.

invalid_messages (400)

Messages must be a non-empty array with valid role and content.

invalid_json (400)

Invalid JSON in request body.

request_too_large (413)

Request too large. Maximum size: 10 MB.

too_many_messages (400)

Too many messages. Maximum: 1000.

message_too_long (400)

Message too long. Maximum length: 1 MB.

deprecated_model (400)

The model is no longer supported. Use "auto" or one of the currently supported models.

model_selection_restricted (403)

Specific model and provider selection is only available on Starter+ plans. Free plan users must use "auto" for intelligent routing.

rate_limit_exceeded (429)

Too many requests. Check your rate limit settings.

ip_rate_limit_exceeded (429)

IP rate limit exceeded. Too many requests from this IP address.

token_limit_exceeded (429)

Monthly token limit exceeded. Upgrade your plan.

provider_rate_limit (429)

Provider rate limit reached. The AI provider is temporarily rate limiting requests.

routing_error (400)

Unable to route the request to a provider.

provider_error (500)

Error communicating with the AI provider.
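A small helper can translate these into exceptions. The exact shape of the JSON error body is not specified above, so the OpenAI-style { "error": { "code", "message" } } envelope used here is an assumption:

async function assertOk(response) {
  if (response.ok) return response;

  // Assumed OpenAI-style error envelope; adjust if the body differs.
  const body = await response.json().catch(() => null);
  const code = body?.error?.code ?? 'unknown_error';
  const message = body?.error?.message ?? response.statusText;

  if (response.status === 401) {
    throw new Error(`Authentication failed (${code}): ${message}`);
  }
  if (response.status === 429) {
    throw new Error(`Rate or token limit hit (${code}): ${message}`);
  }
  throw new Error(`ApeKey error ${response.status} (${code}): ${message}`);
}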

Features

Intelligent Routing

Automatically routes requests to the best provider based on your preferences (speed, quality, cost).

Smart Caching

Automatic response caching reduces costs by serving cached responses for identical requests.

Prompt Optimization

Automatically optimizes your prompts to reduce token usage while maintaining quality.

OpenAI Compatible

Drop-in replacement for the OpenAI API. No code changes needed - just update the base URL.
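For example, if you already use the official openai Node SDK (the standard OpenAI client, not an ApeKey library), pointing it at ApeKey should only require changing the base URL and key - a sketch assuming full compatibility as described above:

import OpenAI from 'openai';

// Point the standard OpenAI client at ApeKey instead of api.openai.com.
const client = new OpenAI({
  apiKey: process.env.APEKEY_API_KEY,
  baseURL: 'https://apekey.ai/v1'
});

const completion = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello, how are you?' }]
});

console.log(completion.choices[0].message.content);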

Ready to get started?

Create your free account and get 20,000 tokens to start building with ApeKey API.