
Access 300+ AI models from your containers. Hoody AI is a self-hosted gateway running on YOUR server that gives containers simple HTTP access to Claude, GPT, Gemini, Llama, and hundreds of other models.

The simplicity: Use container-X to authenticate from any container. Fully OpenAI-compatible API—works with any library or tool that supports OpenAI’s format (OpenAI SDK, LangChain, hoody-agent, Claude Code, Cline, Cursor, etc.). Zero configuration needed.

The security: No real API keys in containers—just container-X tokens that only work from your infrastructure. Safe freelancer onboarding, vibe coding, and AI-generated code without key exposure risk.

Transparent pricing: Hoody AI adds a 5% markup on model provider costs. See Models & Pricing for current model pricing and cost optimization strategies.
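The markup arithmetic can be sketched as follows. This is an illustration only: the function name, the use of integer micro-dollars, and the rounding rule are assumptions, and the provider cost in the example is a made-up figure, not a real price.

```javascript
// Hypothetical helper: apply the 5% markup to a provider cost.
// Amounts are integer micro-dollars (1 USD = 1,000,000) to avoid float drift.
const MARKUP_PERCENT = 5;

function billedMicros(providerCostMicros) {
  // Rounding to the nearest micro-dollar is an assumption, not documented behavior.
  return Math.round((providerCostMicros * (100 + MARKUP_PERCENT)) / 100);
}

// Example with an invented provider cost of $2.00:
const exampleBilled = billedMicros(2_000_000);
console.log(`$${(exampleBilled / 1e6).toFixed(2)}`); // prints "$2.10"
```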


Hoody AI is an AI gateway that runs on the HOST (your bare metal server), not inside containers.

Architecture:

Your Server
├── Hoody AI Gateway (Host Only)
│ ├── URL: https://ai.hoody.icu/api/v1
│ ├── Credits: Your Hoody AI credits
│ └── Accessible: Only from containers on this server
└── Container 1, 2, 3...
└── Auth: "container-1" (proves container identity)

Privacy by architecture: Hoody AI runs on YOUR server (the host). Traffic flows from your containers to the Hoody AI gateway on that host, then out to AI providers (OpenAI, Anthropic, Google, Meta, etc.), billed at provider cost plus the 5% markup. No Hoody-operated platform servers sit between your gateway and the inference providers: only your host and the provider. For complete control, you can proxy to your own AI providers via hoody-exec.


When you create a container, it automatically gets access to Hoody AI via a container identity token:

API Key: container-{containerName}

Example: a container named dev-env has the identity token container-dev-env. Containers created without a custom name are addressable by their numbered form (container-1, container-2, …). Both the name-derived and numbered forms are accepted and identify the same container. The token is an identity, not a secret API key: copying it off your server gains nothing.
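A minimal sketch of deriving the token and request headers from a container name. The helper names `containerToken` and `authHeaders` are hypothetical; only the `container-{containerName}` format and the Bearer scheme come from this page.

```javascript
// Hypothetical helpers: build the identity token and HTTP headers for a container.
function containerToken(containerName) {
  const name = String(containerName).trim();
  if (name === '') {
    throw new Error('container name must not be empty');
  }
  return `container-${name}`;
}

function authHeaders(containerName) {
  return {
    Authorization: `Bearer ${containerToken(containerName)}`,
    'Content-Type': 'application/json',
  };
}

console.log(containerToken('dev-env')); // prints "container-dev-env"
```

The same helper covers the numbered form: `containerToken(1)` yields `container-1`.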

Hoody AI is fully OpenAI-compatible, so it works with:

  • Any OpenAI SDK (Python, Node.js, Go, etc.)
  • AI frameworks (LangChain, LlamaIndex, etc.)
  • AI coding tools (Cursor, Windsurf, Claude Code, Cline, Continue.dev)
  • hoody-agent - Native integration for container orchestration

# Point hoody-agent at Hoody AI: set the base URL and key in ai_config
curl -X POST "https://{projectId}-{containerId}-workspaces-1.{node}.containers.hoody.icu/api/tasks" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Build a todo app",
    "ai_config": {
      "base_url": "https://ai.hoody.icu/api/v1",
      "api_key": "container-dev-env",
      "model": "anthropic/claude-sonnet-4.5"
    }
  }'

Once your container is running, call Hoody AI directly:

import { HoodyClient } from '@hoody-ai/hoody-sdk';

const client = new HoodyClient({ baseURL: 'https://api.hoody.icu', token: process.env.HOODY_TOKEN });

// List available AI models
const models = await client.api.ai.listModels();
console.log(models.data.models);

// For chat completions, call the AI gateway directly from your container:
const response = await fetch('https://ai.hoody.icu/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer container-dev-env',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'anthropic/claude-sonnet-4.5',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});

// Surface gateway errors (401, 429, ...) instead of failing on a missing `choices` field
if (!response.ok) {
  throw new Error(`Hoody AI request failed: ${response.status} ${await response.text()}`);
}

const data = await response.json();
console.log(data.choices[0].message.content);

AI access is enabled by default for all containers. Each container can immediately start making AI requests using its container-X authentication token.


300+ models from 15+ AI inference providers through one API. Your server communicates directly with these providers through Hoody AI’s gateway (with our 5% markup):

Major Inference Providers:

  • Anthropic - Claude Opus 4.1, Sonnet 4.5, Haiku 4.0
  • OpenAI - GPT-4o, GPT-4 Turbo, GPT-3.5
  • Google (Vertex AI) - Gemini 2.5 Pro Exp, Gemini 1.5 Pro, Gemini Flash
  • Meta (via providers) - Llama 3.3 70B, Llama 3.1 405B, Llama 3.1 70B
  • Mistral AI - Mistral Large, Mistral Medium, Mixtral 8x7B
  • Deepseek - Deepseek V3, Deepseek Coder V2
  • Qwen (Alibaba) - Qwen 2.5 72B, QwQ 32B Preview
  • Cohere - Command R+, Command R, Embed models
  • xAI - Grok 2, Grok 2 Vision
  • Perplexity AI - Sonar Pro, Sonar models
  • Together AI - Hosting platform for open models
  • Fireworks AI - Optimized inference for open models
  • And more providers…

Browse the complete model list: See Models & Pricing for all available models, current pricing, and provider-specific capabilities.

No provider management needed: One API, hundreds of models. No separate accounts or SDKs.

Bring Your Own Provider — 75+ Options, No Lock-In


You’re never forced into Hoody AI’s gateway. Connect any provider directly: set ANTHROPIC_API_KEY, OPENAI_API_KEY, or any provider’s environment variable inside a container and call them straight from your code. Point hoody-agent, Cursor, or Cline at any OpenAI-compatible endpoint — local Ollama, Azure OpenAI, Together AI, an enterprise proxy — and the entire stack works without modification. Switch models mid-conversation by swapping the active profile. A/B test Claude versus GPT-4o across two containers simultaneously. Today Claude Sonnet, tomorrow your own fine-tuned Llama, next week whatever model wins the benchmarks. It’s a config change, not a migration.

# Direct provider access — set in container environment
ANTHROPIC_API_KEY=sk-ant-... # Anthropic direct
OPENAI_API_KEY=sk-... # OpenAI direct
OPENAI_BASE_URL=http://localhost:11434/v1 # Local Ollama
# Or route through Hoody AI for key-less container auth
# API key: container-{containerName} → works from this server only
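One way application code might choose between these options at startup is sketched below. The function name `resolveAiConfig`, the precedence order, and the `'unused'` placeholder key are assumptions for illustration, not Hoody behavior. The environment variable names and the gateway URL are the ones shown on this page.

```javascript
// Hypothetical resolver: pick an OpenAI-compatible endpoint from the environment.
const HOODY_GATEWAY_URL = 'https://ai.hoody.icu/api/v1';
const OPENAI_DIRECT_URL = 'https://api.openai.com/v1';

function resolveAiConfig(env, containerName) {
  if (env.OPENAI_BASE_URL) {
    // Custom endpoint (e.g. local Ollama). Local servers typically ignore the key,
    // so a placeholder is used when none is set (assumption).
    return { baseURL: env.OPENAI_BASE_URL, apiKey: env.OPENAI_API_KEY ?? 'unused' };
  }
  if (env.OPENAI_API_KEY) {
    // A real provider key is present: call OpenAI directly.
    return { baseURL: OPENAI_DIRECT_URL, apiKey: env.OPENAI_API_KEY };
  }
  // No provider variables: fall back to the gateway with the container identity token.
  return { baseURL: HOODY_GATEWAY_URL, apiKey: `container-${containerName}` };
}
```

In a container this would be called as `resolveAiConfig(process.env, 'dev-env')`, and the result passed to whichever OpenAI-compatible client is in use.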

Built specifically for container-based workflows:

  • Each container gets automatic AI access
  • Use container-X format for authentication
  • Works immediately with hoody-agent, AI coding tools
  • No environment variable management
  • No credential rotation needed

Container-restricted authentication means:

  • No real API keys stored in containers
  • Access automatically tied to container lifecycle
  • Safe freelancer/contractor onboarding
  • Protection against key leakage in AI-generated code

The game-changer: Because Hoody AI requests flow through HTTP, you can intercept and modify everything using hoody-exec as a MITM (Man-In-The-Middle) proxy.

The simplicity: Deploy a MITM script once, then just change the URL in your AI client.

Without MITM: https://ai.hoody.icu/api/v1
With MITM: https://your-project-container-exec-1.node-us.containers.hoody.icu/api/v1
Switch on-demand. No code changes. Complete control.

What you gain:

  • 🎛️ Tool Call Tampering - Intercept and modify AI tool calls (redirect file writes, block dangerous commands, modify paths)
  • 👤 Human-in-the-Loop - Pause AI for approval on high-stakes operations (deployments, deletions, payments)
  • 🤖 Agent Cascades - Trigger other agent instances via HTTP, building multi-agent systems
  • 💰 Cost Optimization - Compress prompts, cache responses, route to cheaper models (40-70% savings)
  • 🧠 Context Injection - Auto-enhance AI with your knowledge base, company policies, codebase docs
  • 📊 Complete Observability - Log every prompt, response, and decision for debugging and compliance
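As an illustration of the human-in-the-loop item, a proxy could split a model response's tool calls into those that pass through and those held for approval. This is a sketch under assumptions: the tool names in `HIGH_STAKES` are invented, and the approval step itself (UI, webhook, queue) is left out. The `tool_calls` shape follows the OpenAI chat completions format.

```javascript
// Hypothetical policy: hold high-stakes tool calls for human approval.
const HIGH_STAKES = new Set(['deploy', 'delete_file', 'charge_card']); // invented tool names

function isHighStakes(call) {
  return HIGH_STAKES.has(call.function.name);
}

// Split an assistant message's tool calls into allowed and held lists.
function partitionToolCalls(message, needsApproval) {
  const allowed = [];
  const held = [];
  for (const call of message.tool_calls ?? []) {
    (needsApproval(call) ? held : allowed).push(call);
  }
  return { allowed, held };
}
```

A proxy would forward `allowed` immediately and release `held` only after a reviewer approves them.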

Quick example - Sandbox all AI file operations in 30 lines:

// /api/ai-proxy.js in hoody-exec
// @mode worker

// Forward the incoming request body to the gateway unchanged
const response = await fetch('https://ai.hoody.icu/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer container-1',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(req.body)
});
const data = await response.json();

// Intercept and redirect file operations to sandbox.
// Error responses carry no `choices`, so guard before indexing.
const toolCalls = data.choices?.[0]?.message?.tool_calls ?? [];
for (const call of toolCalls) {
  if (call.function.name === 'write_file') {
    const args = JSON.parse(call.function.arguments);
    // Join with exactly one slash, whether or not the original path is absolute
    args.path = '/sandbox/' + String(args.path).replace(/^\/+/, '');
    call.function.arguments = JSON.stringify(args);
  }
}
return res.json(data);

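AI can code freely: every write_file call is redirected under /sandbox before the client sees it. Other tools pass through unchanged, so extend the check to each write-capable tool you expose.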
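Prefixing a path with /sandbox does not by itself neutralise `..` segments: a path such as `/../etc/passwd` would still resolve outside the sandbox once the file system interprets it. A stricter rewrite might look like the sketch below. The function name `toSandboxPath` is hypothetical, and it handles POSIX-style paths only.

```javascript
// Hypothetical helper: map any requested path to a location inside the sandbox root.
// ".." can never climb above the root because the segment stack bottoms out at empty.
function toSandboxPath(requestedPath, root = '/sandbox') {
  const stack = [];
  for (const segment of String(requestedPath).split('/')) {
    if (segment === '' || segment === '.') continue; // skip empty and current-dir segments
    if (segment === '..') {
      stack.pop(); // no-op when the stack is already empty
      continue;
    }
    stack.push(segment);
  }
  return stack.length > 0 ? `${root}/${stack.join('/')}` : root;
}

console.log(toSandboxPath('/app/../../etc/passwd')); // prints "/sandbox/etc/passwd"
```

In the proxy script, the assignment to `args.path` would call this helper instead of concatenating strings.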

See the full guide: Intercept & Control AI → covers tool call interception, agent cascade orchestration, human-in-the-loop workflows, stalling patterns, context injection, cost optimization, and more.


Give contractors container access with a container-X API key. They can use AI tools (Cursor, Windsurf, Claude Code) without ever seeing your real keys. Delete the container when the project ends for instant revocation.

Build applications that use AI without embedding real API keys. Users can’t extract keys even with full source access. Keys only work from your infrastructure.

Let AI generate entire applications. Even if generated code tries to log/exfiltrate API keys, container-X is useless outside your server.

Each client gets their own container with isolated AI access. Enable/disable per client. Track usage per container.

Developers use AI coding assistants (Cursor, Cline) with the container-dev key. Production uses container-prod. No key is shared between environments.


Use descriptive container names—they become part of the API key:

  • container-prod-frontend (clear purpose)
  • container-dev-alice (per-developer)
  • container-client-acme (per-client)

Specify exact models in AI requests to control costs:

{
"model": "anthropic/claude-haiku-4.0"
}

Use cheaper models (Haiku) for simple tasks, more capable models (Opus) for complex ones.
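A routing rule of this kind could be sketched as below. The function name, the 4,000-character threshold, and the `complex` flag are assumptions for illustration. The two model IDs are the ones used elsewhere on this page; substitute whatever IDs your gateway's /models endpoint returns, including an Opus-class model for the hardest tasks.

```javascript
// Hypothetical router: cheap model by default, capable model for hard or long tasks.
const MODELS = {
  cheap: 'anthropic/claude-haiku-4.0',
  capable: 'anthropic/claude-sonnet-4.5',
};
const LONG_PROMPT_CHARS = 4000; // arbitrary threshold, tune for your workload

function pickModel({ prompt, complex = false }) {
  const isLong = prompt.length > LONG_PROMPT_CHARS;
  return complex || isLong ? MODELS.capable : MODELS.cheap;
}
```

The returned ID goes straight into the `model` field of the request body.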

Monitor which containers have AI access enabled:

# List all containers and their AI status
hoody containers list -o json | jq '.[] | {id, name, ai, status}'

Create a snapshot before letting AI generate large amounts of code. Restore it if the results are undesirable.


Can I use my own API keys or other AI gateways?


Absolutely. Because it’s YOUR infrastructure, you’re completely free to use any AI provider or gateway you want:

  • Set environment variables in your containers and use providers directly (OpenAI, Anthropic, etc.)
  • Use other AI gateways, Together AI, or self-hosted models
  • Proxy through hoody-exec to any external service (see MITM section above)
  • Mix multiple providers with custom routing logic

Trade-offs to consider:

  • Using raw API keys in containers exposes them to container processes (security risk if containers are compromised)
  • Hoody AI’s container-X authentication provides key isolation—keys only work from your infrastructure
  • With custom proxies via hoody-exec, you control everything but manage your own security

The freedom: It’s your bare metal server, your containers, your choice of AI provider. Hoody AI is convenient and secure, but not required.

What happens if I copy the container-X key elsewhere?


It won’t work. Keys are cryptographically bound to the specific container and your server. Copied keys fail authentication.

Can containers access each other’s AI requests?


No. Each container’s AI authentication is isolated. Container A cannot see or intercept Container B’s AI traffic.

Does this work with local models (Ollama, LM Studio)?


Hoody AI is for cloud providers. For local models, run them directly in containers and access via localhost.

See Models & Pricing for the complete list. Hoody AI supports 300+ models from providers including Anthropic, OpenAI, Google, Meta, Mistral, and more.


Problem: AI requests return 401 Unauthorized

Solutions:

  • Check API key format: container-{containerName} (exact match)
  • Confirm request is coming from the correct container
  • Ensure container is running (stopped containers can’t access AI)
  • Verify Hoody AI service is running on your server

Problem: Requested model doesn’t exist

Solution: Model IDs pass through from Hoody AI’s upstream catalog, so the live list is authoritative. Pull it from your gateway and use the exact ID returned:

curl -s https://ai.hoody.icu/api/v1/models -H "Authorization: Bearer container-$NAME" | jq -r '.data[].id'
{"model": "anthropic/claude-sonnet-4.5"} // ✅ exact ID returned by /models
{"model": "claude-sonnet"} // ❌ short names are not valid IDs
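The same check can be done in code before sending a request. The sketch below assumes the /models response has the `{ data: [{ id }] }` shape implied by the jq filter above. The function name `resolveModelId` and the sample list are illustrative, and `openai/gpt-4o` is a made-up entry for the example.

```javascript
// Hypothetical check: confirm a model ID exists in the /models response,
// and suggest close matches when it does not.
function resolveModelId(listResponse, requested) {
  const ids = listResponse.data.map((model) => model.id);
  if (ids.includes(requested)) {
    return { ok: true, id: requested };
  }
  const needle = requested.toLowerCase();
  const suggestions = ids.filter((id) => id.toLowerCase().includes(needle));
  return { ok: false, suggestions };
}

// Sample payload standing in for a real /models response.
const sampleModels = {
  data: [{ id: 'anthropic/claude-sonnet-4.5' }, { id: 'openai/gpt-4o' }],
};
console.log(resolveModelId(sampleModels, 'claude-sonnet').suggestions);
// prints [ 'anthropic/claude-sonnet-4.5' ]
```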

Problem: Too many requests from container

Solutions:

  • Implement request throttling in application code
  • Check your Hoody AI credits balance
  • Distribute workload across multiple containers
  • Use more efficient models (e.g., Haiku instead of Opus)
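Application-level throttling can be as simple as a sliding-window counter. The sketch below is one possible approach, not a Hoody feature: `createThrottle` is a hypothetical name, and the limits in the usage note are arbitrary. The clock is injectable so the behavior can be tested without waiting.

```javascript
// Hypothetical sliding-window throttle: allow at most `maxRequests` per `windowMs`.
function createThrottle(maxRequests, windowMs, now = Date.now) {
  const timestamps = [];
  return function tryAcquire() {
    const current = now();
    // Drop timestamps that have aged out of the window.
    while (timestamps.length > 0 && current - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= maxRequests) {
      return false; // over the limit: caller should wait or queue the request
    }
    timestamps.push(current);
    return true;
  };
}
```

For example, `const tryAcquire = createThrottle(30, 60_000)` permits 30 requests per minute; call `tryAcquire()` before each AI request and delay the request when it returns false.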

Problem: AI requests timing out

Solutions:

  • Verify container has network access (firewall rules)
  • Check if Hoody AI service is running on host
  • Ensure container isn’t being rate-limited at network level
  • Try with simpler prompt to isolate issue
