API Reference

A complete guide to integrating the ApeKey API into your application.

Quick Start

ApeKey provides a unified API for accessing multiple AI providers. Use the same OpenAI-compatible format with intelligent routing, caching, and optimization.

Base URL: https://apekey.ai/v1

Authentication

All API requests require authentication using a Bearer token in the Authorization header.

Authorization: Bearer sk_live_xxx

Get your API key from the API Keys dashboard.
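In JavaScript, for example, the key is attached as a standard header (APEKEY_API_KEY is an illustrative environment variable name, not something ApeKey defines):

const headers = {
  'Content-Type': 'application/json',
  // Load the key from the environment rather than hard-coding a live
  // key in source control or client-side code.
  'Authorization': `Bearer ${process.env.APEKEY_API_KEY}`
};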

Chat Completions

POST /v1/chat/completions

Request Body

{
  "model": "auto",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "routing": {
    "prefer": "speed"
  }
}
model (required)

Recommended: Use "auto" for intelligent routing. We automatically select the best model and provider based on your request.

Advanced (Starter+ only): Specify a model explicitly (e.g., "llama-3.3-70b-versatile", "meta-llama/Meta-Llama-3-8B-Instruct-Lite"). Free plan users must use "auto".

messages (required)

Array of message objects with "role" (user/assistant/system) and "content".

temperature (optional)

Sampling temperature between 0 and 2. Default: 1.

max_tokens (optional)

Maximum number of tokens to generate. Default: varies by model.

routing (optional)

Object that controls routing behavior.

prefer (optional)

Routing preference: "speed", "quality", or "cost". Default: cost-optimized routing.

fallback (optional)

Enable automatic fallback to alternative providers if the primary provider fails. Default: true.
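For example, to prefer quality while disabling automatic fallback:

{
  "routing": {
    "prefer": "quality",
    "fallback": false
  }
}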

Response

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama-3.1-8b-instant",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! I'm doing well, thank you..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "_optimization": {
    "cached": false,
    "original_tokens": 15,
    "optimized_tokens": 10,
    "tokens_saved": 5
  },
  "_routing": {
    "provider": "groq",
    "original_model": "auto",
    "mapped_model": "llama-3.1-8b-instant",
    "reason": "Best speed for this request"
  }
}
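The underscore-prefixed fields (_optimization, _routing) are ApeKey extensions to the standard OpenAI response shape. A minimal sketch of reading them from a parsed response body, using the field names documented above (logApeKeyMetadata is an illustrative helper name):

function logApeKeyMetadata(data) {
  // Standard OpenAI-compatible field
  console.log(data.choices[0].message.content);

  // ApeKey extensions: cache status, token savings, and actual routing
  if (data._optimization?.cached) {
    console.log('Response served from cache');
  }
  console.log(`Tokens saved by optimization: ${data._optimization?.tokens_saved ?? 0}`);
  console.log(`Routed to ${data._routing?.provider} as ${data._routing?.mapped_model}`);
}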

Routing Preferences

Control how requests are routed by specifying your preference for speed, quality, or cost.

By default, the system optimizes for cost. You can override this behavior by setting the routing preference.

Speed Priority

Prioritize fastest response times. Routes to Groq (fastest provider) when speed is preferred.

"prefer": "speed"

Quality Priority

Prioritize highest quality responses. Routes to providers with better models for complex tasks.

"prefer": "quality"

Cost Priority

Prioritize lowest cost. Routes to Together AI (cheapest provider) when cost is preferred. This is the default behavior.

"prefer": "cost"

Example Request:

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ],
  "routing": {
    "prefer": "quality"
  }
}

Code Examples

const response = await fetch('https://apekey.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk_live_xxx'
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [
      { role: 'user', content: 'Hello, how are you?' }
    ],
    routing: {
      prefer: 'speed'  // 'speed', 'quality', or 'cost'
    }
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

Available Models

Free Plan: Only "auto" is available. Intelligent routing automatically selects the best model for you.

Starter+ Plans: You can use "auto" or specify any model below for explicit control.

auto (Recommended)

Intelligent routing - automatically selects the best provider and model based on your request. Recommended for all users.

llama-3.3-70b-versatile (Starter+)

Groq Llama 3.3 70B - Fast inference with high quality ($0.05 input / $0.08 output per 1M tokens)

llama-3.1-8b-instant (Starter+)

Groq Llama 3.1 8B - Fastest inference ($0.05 input / $0.08 output per 1M tokens)

meta-llama/Meta-Llama-3-8B-Instruct-Lite (Starter+)

Together AI Llama 3 8B Lite - Most cost-effective ($0.10 per 1M tokens)

meta-llama/Llama-3.3-70B-Instruct-Turbo (Starter+)

Together AI Llama 3.3 70B - High quality ($0.88 per 1M tokens)

accounts/fireworks/models/llama-v3p1-8b-instruct (Starter+)

Fireworks AI Llama 3.1 8B - Fast and reliable ($0.10 per 1M tokens)

llama-v3p1-8b-instruct (Starter+)

Fireworks AI Llama 3.1 8B - Same model as above, accepted in short ID form ($0.10 per 1M tokens)

accounts/fireworks/models/llama-v3p3-70b-instruct (Starter+)

Fireworks AI Llama 3.3 70B - High quality for complex tasks ($0.10 per 1M tokens)
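On Starter+ plans, explicit selection is just a matter of replacing "auto" with one of the model IDs above, for example:

{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "user", "content": "Summarize the main themes of Hamlet"}
  ]
}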

Rate Limits

Rate limits are applied per API key to ensure fair usage and system stability. Limits vary by plan.

Rate limits by plan: Free (5 requests/min), Starter (30/min), Pro (60/min), Scaling (300/min)
Free tier: 20,000 tokens per month
Overage limits can be adjusted in your dashboard settings
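Requests beyond your per-minute limit return a 429 status (see rate_limit_exceeded below). A minimal backoff-and-retry sketch in JavaScript; the retry count and delays are illustrative choices, not ApeKey requirements:

async function withRetry(makeRequest, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const response = await makeRequest();
    if (response.status !== 429 || attempt === maxRetries) {
      return response;
    }
    // Exponential backoff before retrying: 1s, 2s, 4s, ...
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
}

// Usage: const response = await withRetry(() => fetch(/* request as shown above */));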

Error Handling

invalid_api_key (401)

The API key provided is invalid or missing.

missing_model (400)

The model parameter is required.

invalid_messages (400)

Messages must be a non-empty array with valid role and content.

invalid_json (400)

Invalid JSON in request body.

request_too_large (413)

Request too large. Maximum size: 10 MB.

too_many_messages (400)

Too many messages. Maximum: 1000.

message_too_long (400)

Message too long. Maximum length: 1 MB.

deprecated_model (400)

The model is no longer supported. Use "auto" or one of the currently supported models.

model_selection_restricted (403)

Specific model and provider selection is only available on Starter+ plans. Free plan users must use "auto" for intelligent routing.

rate_limit_exceeded (429)

Too many requests. Check your rate limit settings.

ip_rate_limit_exceeded (429)

IP rate limit exceeded. Too many requests from this IP address.

token_limit_exceeded (429)

Monthly token limit exceeded. Upgrade your plan.

provider_rate_limit (429)

Provider rate limit reached. The AI provider is temporarily rate limiting requests.

routing_error (400)

Unable to route the request to a provider.

provider_error (500)

Error communicating with the AI provider.
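A small helper can translate these into exceptions. The exact shape of the JSON error body is not specified above, so the OpenAI-style { "error": { "code", "message" } } envelope used here is an assumption:

async function assertOk(response) {
  if (response.ok) return response;

  // Assumed OpenAI-style error envelope; adjust if the body differs.
  const body = await response.json().catch(() => null);
  const code = body?.error?.code ?? 'unknown_error';
  const message = body?.error?.message ?? response.statusText;

  if (response.status === 401) {
    throw new Error(`Authentication failed (${code}): ${message}`);
  }
  if (response.status === 429) {
    throw new Error(`Rate or token limit hit (${code}): ${message}`);
  }
  throw new Error(`ApeKey error ${response.status} (${code}): ${message}`);
}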

Features

Intelligent Routing

Automatically routes requests to the best provider based on your preferences (speed, quality, cost).

Smart Caching

Automatic response caching reduces costs by serving cached responses for identical requests.

Prompt Optimization

Automatically optimizes your prompts to reduce token usage while maintaining quality.

OpenAI Compatible

Drop-in replacement for the OpenAI API. No code changes needed - just update the base URL.
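For example, if you already use the official openai Node SDK (the standard OpenAI client, not an ApeKey library), pointing it at ApeKey should only require changing the base URL and key - a sketch assuming full compatibility as described above:

import OpenAI from 'openai';

// Point the standard OpenAI client at ApeKey instead of api.openai.com.
const client = new OpenAI({
  apiKey: process.env.APEKEY_API_KEY,
  baseURL: 'https://apekey.ai/v1'
});

const completion = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello, how are you?' }]
});

console.log(completion.choices[0].message.content);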

Ready to get started?

Create your free account and get 20,000 tokens to start building with ApeKey API.