COO-LLM Overview
COO-LLM is an intelligent reverse proxy and load balancer for Large Language Model (LLM) APIs. It provides a unified, OpenAI-compatible interface to multiple LLM providers while intelligently distributing requests across API keys and providers based on performance, cost, and rate limits.
What makes COO-LLM special? Unlike simple API gateways, COO-LLM selects a provider and API key for each request based on cost, performance, and remaining rate-limit headroom, handles rate limits gracefully, and provides deep observability into your LLM usage.
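Because the interface is OpenAI-compatible, an existing application typically only needs its base URL changed. A minimal sketch, assuming COO-LLM is running locally on port 2906 (the port used in the quick example below) with a provider configured under the id "openai-prod":

from openai import OpenAI

# Point the standard OpenAI Python SDK at the COO-LLM proxy instead of api.openai.com
client = OpenAI(base_url="http://localhost:2906/v1", api_key="your-coo-llm-key")

# Model names use the "<provider-id>:<model>" form shown in the quick example below
reply = client.chat.completions.create(
    model="openai-prod:gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(reply.choices[0].message.content)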
Key Features
🚀 Core Capabilities
- OpenAI API Compatibility: Drop-in replacement for OpenAI API with identical request/response formats
- Multi-Provider Support: Seamlessly route requests to OpenAI, Google Gemini, Anthropic Claude, and custom providers
- Intelligent Load Balancing: Distributes requests across API keys and providers based on performance, cost, and rate limits
💰 Cost & Performance Optimization
- Real-time Cost Tracking: Monitor and optimize API costs across providers
- Rate Limit Management: Automatic key rotation to avoid 429 errors
- Performance Monitoring: Track latency, success rates, and token usage
🔧 Enterprise-Ready
- Multi-Tenant Client Management: Dynamic API client registration with provider restrictions
- Advanced Security: Rate limiting, audit logging, and access control
- Extensible Architecture: Plugin system for custom providers, storage, and logging
- Production Observability: Prometheus metrics, structured logging, and health checks
- Configuration Management: YAML-based configuration with hot-reload capabilities
📊 Advanced Features
- Model Aliases: Map custom model names to provider-specific models
- Request Routing: Smart routing based on model availability and performance
- Admin API: Runtime configuration and monitoring endpoints
Use Cases
- Cost Optimization: Automatically choose the cheapest provider for each request
- High Availability: Failover between providers and keys during outages
- Rate Limit Scaling: Distribute load across multiple API keys
- Multi-Cloud LLM: Unified interface to multiple cloud LLM services
- Development: Easy switching between providers during development
Architecture Overview
Client Apps (OpenAI SDK)
↓
COO-LLM Proxy
├── API Layer (OpenAI-compatible)
├── Load Balancer (Smart routing)
├── Provider Adapters (OpenAI, Gemini, Claude)
├── Storage (Redis/File/HTTP)
└── Logging (File/Prometheus/Webhook)
↓
External LLM Providers
Quick Example
# Configure providers
cat > config.yaml << EOF
version: "1.0"

server:
  listen: ":2906"

llm_providers:
  - id: "openai-prod"
    type: "openai"
    api_keys: ["sk-your-key"]
    base_url: "https://api.openai.com"
    model: "gpt-4o"
    pricing:
      input_token_cost: 0.002
      output_token_cost: 0.01
    limits:
      req_per_min: 200
      tokens_per_min: 100000

# Use "openai-prod:gpt-4o" directly (model_aliases removed)
EOF

# Run COO-LLM
./coo-llm -config config.yaml

# Use like OpenAI API
curl -X POST http://localhost:2906/v1/chat/completions \
  -H "Authorization: Bearer your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai-prod:gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
How to Use This Documentation
This documentation is organized into sections based on your role and needs:
📚 Documentation Structure
- 🚀 Getting Started: Quick setup and basic concepts
- 👤 User Guide: API usage, examples, and best practices
- ⚙️ Administrator Guide: Configuration, monitoring, and troubleshooting
- 🔧 Developer Guide: Architecture, API reference, and contributing
- 📖 Reference: Complete schemas, error codes, and glossary
🔍 Search & Navigation
Search Box: Use the search box in the top navigation to find specific topics, functions, or error messages.
Keyboard Shortcuts:
- Ctrl+K (Linux/Windows) or Cmd+K (Mac): Open search
- /: Focus search box
- ↑/↓: Navigate results
- Enter: Open selected result
Tips for Effective Search:
- Search for error messages: "rate limit exceeded"
- Find configuration options: "req_per_min"
- API endpoints: "chat/completions"
- Provider setup: "openai configuration"
Browse by Category:
- New users: Start with Getting Started → Quick Start
- API integration: User Guide → API Usage → Examples
- Production setup: Administrator Guide → Configuration → Monitoring
- Code contributions: Developer Guide → Contributing
📖 Reading Tips
- Follow the sidebar: Topics are ordered logically
- Use cross-references: Links connect related concepts
- Check examples: Code samples in multiple languages
- Reference for details: Use Reference section for complete specs
🆘 Getting Help
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share experiences
- Contributing: Help improve the documentation
Getting Started
See Deployment for installation instructions and Configuration for setup details.