Skip to main content

COO-LLM Overview

COO-LLM is an intelligent reverse proxy and load balancer for Large Language Model (LLM) APIs. It provides a unified, OpenAI-compatible interface to multiple LLM providers while intelligently distributing requests across API keys and providers based on performance, cost, and rate limits.

What makes COO-LLM special? Unlike simple API gateways, COO-LLM uses advanced algorithms to automatically choose the best provider for each request, handle rate limits gracefully, and provide deep observability into your LLM usage.

Key Features

🚀 Core Capabilities

  • OpenAI API Compatibility: Drop-in replacement for OpenAI API with identical request/response formats
  • Multi-Provider Support: Seamlessly route requests to OpenAI, Google Gemini, Anthropic Claude, and custom providers
  • Intelligent Load Balancing: Advanced algorithms for optimal request distribution

💰 Cost & Performance Optimization

  • Real-time Cost Tracking: Monitor and optimize API costs across providers
  • Rate Limit Management: Automatic key rotation to avoid 429 errors
  • Performance Monitoring: Track latency, success rates, and token usage

🔧 Enterprise-Ready

  • Multi-Tenant Client Management: Dynamic API client registration with provider restrictions
  • Advanced Security: Rate limiting, audit logging, and access control
  • Extensible Architecture: Plugin system for custom providers, storage, and logging
  • Production Observability: Prometheus metrics, structured logging, and health checks
  • Configuration Management: YAML-based configuration with hot-reload capabilities

📊 Advanced Features

  • Model Aliases: Map custom model names to provider-specific models
  • Request Routing: Smart routing based on model availability and performance
  • Admin API: Runtime configuration and monitoring endpoints

Use Cases

  • Cost Optimization: Automatically choose the cheapest provider for each request
  • High Availability: Failover between providers and keys during outages
  • Rate Limit Scaling: Distribute load across multiple API keys
  • Multi-Cloud LLM: Unified interface to multiple cloud LLM services
  • Development: Easy switching between providers during development

Architecture Overview

Client Apps (OpenAI SDK)

COO-LLM Proxy
├── API Layer (OpenAI-compatible)
├── Load Balancer (Smart routing)
├── Provider Adapters (OpenAI, Gemini, Claude)
├── Storage (Redis/File/HTTP)
└── Logging (File/Prometheus/Webhook)

External LLM Providers

Quick Example

# Configure providers
cat > config.yaml << EOF
version: "1.0"
server:
listen: ":2906"

llm_providers:
- id: "openai-prod"
type: "openai"
api_keys: ["sk-your-key"]
base_url: "https://api.openai.com"
model: "gpt-4o"
pricing:
input_token_cost: 0.002
output_token_cost: 0.01
limits:
req_per_min: 200
tokens_per_min: 100000

# Use "openai-prod:gpt-4o" directly (model_aliases removed)
EOF

# Run COO-LLM
./coo-llm -config config.yaml

# Use like OpenAI API
curl -X POST http://localhost:2906/v1/chat/completions \
-H "Authorization: Bearer your-key" \
-d '{"model": "openai:gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

How to Use This Documentation

This documentation is organized into sections based on your role and needs:

📚 Documentation Structure

  • 🚀 Getting Started: Quick setup and basic concepts
  • 👤 User Guide: API usage, examples, and best practices
  • ⚙️ Administrator Guide: Configuration, monitoring, and troubleshooting
  • 🔧 Developer Guide: Architecture, API reference, and contributing
  • 📖 Reference: Complete schemas, error codes, and glossary

🔍 Search & Navigation

Search Box: Use the search box in the top navigation to find specific topics, functions, or error messages.

Keyboard Shortcuts:

  • Ctrl+K (Linux/Windows) or Cmd+K (Mac): Open search
  • /: Focus search box
  • ↑/↓: Navigate results
  • Enter: Open selected result

Tips for Effective Search:

  • Search for error messages: "rate limit exceeded"
  • Find configuration options: "req_per_min"
  • API endpoints: "chat/completions"
  • Provider setup: "openai configuration"

Browse by Category:

  • New users: Start with Getting Started → Quick Start
  • API integration: User Guide → API Usage → Examples
  • Production setup: Administrator Guide → Configuration → Monitoring
  • Code contributions: Developer Guide → Contributing

📖 Reading Tips

  • Follow the sidebar: Topics are ordered logically
  • Use cross-references: Links connect related concepts
  • Check examples: Code samples in multiple languages
  • Reference for details: Use Reference section for complete specs

🆘 Getting Help

  • GitHub Issues: Report bugs or request features
  • Discussions: Ask questions and share experiences
  • Contributing: Help improve the documentation

Getting Started

See Deployment for installation instructions and Configuration for setup details.