
Access 300+ AI models through Hoody AI. Your server communicates directly with 15+ inference providers through Hoody AI’s gateway (with our 5% markup on provider costs).


Your server connects directly to these providers through Hoody AI:

  • Anthropic - Claude models (Opus, Sonnet, Haiku)
  • OpenAI - GPT-4, GPT-3.5, Embeddings
  • Google (Vertex AI) - Gemini models, PaLM
  • Meta (via providers) - Llama 3.3, Llama 3.1
  • Mistral AI - Mistral Large, Medium, Mixtral
  • Deepseek - Deepseek V3, Deepseek Coder
  • Qwen (Alibaba) - Qwen 2.5, QwQ models
  • Cohere - Command R+, Embed models
  • xAI - Grok 2, Grok Vision
  • Perplexity AI - Sonar Pro, Sonar models
  • Together AI - Open model hosting platform
  • Fireworks AI - Optimized open model inference
  • And more providers…

How it works: Hoody AI adds a 5% markup (default, configurable via HOODY_AI_MODELS_MARKUP_BPS) on each provider’s base cost. Your prompts and responses flow through the Hoody AI gateway running on your own host and then out to these providers — no Hoody-operated platform servers sit between the gateway and the provider.
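
To make the markup arithmetic concrete, here is a minimal sketch. It assumes HOODY_AI_MODELS_MARKUP_BPS is expressed in basis points (so the 5% default would correspond to 500) and that the markup is applied multiplicatively to the provider's base cost. The helper names are hypothetical and are not part of Hoody AI; how the gateway actually parses the variable is not documented here.

```javascript
// Hypothetical helpers illustrating the markup arithmetic; not part of Hoody AI.
const DEFAULT_MARKUP_BPS = 500; // assumption: 500 basis points = the 5% default

// Parse a raw setting (e.g. process.env.HOODY_AI_MODELS_MARKUP_BPS).
// Falls back to the default when the value is missing or not a non-negative integer.
function parseMarkupBps(raw) {
  const bps = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(bps) && bps >= 0 ? bps : DEFAULT_MARKUP_BPS;
}

// Apply the markup to a provider's base cost (any currency unit).
function applyMarkup(baseCost, markupBps = DEFAULT_MARKUP_BPS) {
  return (baseCost * (10000 + markupBps)) / 10000;
}

const bps = parseMarkupBps(undefined); // no override set, so the default applies
console.log(applyMarkup(2.0, bps)); // 2.1 (a 2.00 base cost plus 5%)
```

Working in basis points keeps the configured value an integer, which avoids ambiguity between "5" meaning 5% and "0.05" meaning 5%.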

The Authorization: Bearer container-<name|N> shown in the HTTP examples below is a container identity/tracking token automatically minted for each container, not a Hoody API token you copy from a dashboard.


Hoody AI provides access to multiple categories of AI models:

Chat and completion models for conversations, code generation, analysis, and general-purpose tasks.

Leading Providers:

  • Anthropic - Claude Opus 4.1, Claude Sonnet 4.5, Claude Haiku 4.0
  • OpenAI - GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo
  • Google - Gemini 2.5 Pro Exp, Gemini 1.5 Pro, Gemini Flash
  • Meta - Llama 3.3 70B, Llama 3.1 405B
  • Mistral - Mistral Large, Mistral Medium, Mixtral
  • Deepseek - Deepseek V3, Deepseek Coder
  • Qwen - Qwen 2.5 72B, QwQ 32B Preview

Example usage:

# Chat completion from your container
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
-H "Authorization: Bearer container-1" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4.5",
"messages": [{"role": "user", "content": "Hello!"}]
}'
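
The same request can be sketched in JavaScript. This assumes a runtime with a global fetch (for example Node 18 or later) and that the response follows the OpenAI chat-completions shape (choices[0].message.content), which the OpenAI-compatible route suggests but this page does not spell out. buildChatRequest and chat are hypothetical helper names.

```javascript
const CHAT_URL = "https://ai.hoody.icu/api/v1/chat/completions";

// Build the fetch options for a chat completion request.
function buildChatRequest(model, userText, token = "container-1") {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: userText }],
    }),
  };
}

// Send the request and return the assistant's reply text.
// Assumes an OpenAI-style response body: choices[0].message.content.
async function chat(model, userText) {
  const res = await fetch(CHAT_URL, buildChatRequest(model, userText));
  if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}

// Usage: chat("anthropic/claude-sonnet-4.5", "Hello!").then(console.log);
```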

Generate images from text — through the same chat endpoint.

Hoody AI’s catalog is served from the gateway’s upstream model list, and image generation happens through the standard OpenAI-compatible /api/v1/chat/completions route, not a separate images endpoint. Models that can return images advertise "image" in their output_modalities in the /models catalog — request one of those models and the response message includes the generated image.

Example usage:

# Ask an image-capable model to generate an image (output via chat/completions)
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
-H "Authorization: Bearer container-1" \
-H "Content-Type: application/json" \
-d '{
"model": "<image-capable-model-id>",
"messages": [{"role": "user", "content": "A serene mountain landscape at sunset"}]
}'
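
One way to find a model ID for the placeholder above is to filter the catalog by output_modalities. The sketch below assumes the catalog has already been fetched and parsed into an array of objects with id and output_modalities fields. The exact response envelope of /models is not shown on this page, so the sample data and the imageCapableModelIds helper are illustrative only.

```javascript
// Return the IDs of models that list "image" among their output modalities.
function imageCapableModelIds(models) {
  return models
    .filter(
      (m) => Array.isArray(m.output_modalities) && m.output_modalities.includes("image")
    )
    .map((m) => m.id);
}

// Made-up catalog entries for illustration; real IDs come from the /models catalog.
const sampleCatalog = [
  { id: "example/text-only", output_modalities: ["text"] },
  { id: "example/text-and-image", output_modalities: ["text", "image"] },
  { id: "example/no-modalities" },
];

console.log(imageCapableModelIds(sampleCatalog)); // [ 'example/text-and-image' ]
```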

Convert text into vector embeddings for semantic search, similarity matching, and RAG applications.

Available Models:

  • OpenAI - text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002
  • Cohere - embed-english-v3.0, embed-multilingual-v3.0
  • Google - text-embedding-004
  • Voyage AI - voyage-large-2, voyage-code-2

Example usage:

# Generate text embeddings
curl -X POST "https://ai.hoody.icu/api/v1/embeddings" \
-H "Authorization: Bearer container-1" \
-H "Content-Type: application/json" \
-d '{"model": "openai/text-embedding-3-large", "input": "Search for similar documents"}'
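
For the semantic-search and similarity use cases mentioned above, embeddings are typically compared with cosine similarity. The following is a minimal sketch of that comparison. It operates on plain number arrays and assumes you have already extracted the vectors from the embeddings response, whose exact shape is not shown here; both helper names are hypothetical.

```javascript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine similarity is undefined for zero vectors; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents ({ id, vector }) by similarity to a query vector, best match first.
function rankBySimilarity(queryVec, docs) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

A linear scan like this is fine for small collections; larger ones usually call for a vector index.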

Code Generation & Analysis:

  • anthropic/claude-sonnet-4.5 - Best for complex code
  • anthropic/claude-opus-4.1 - Most capable, slower
  • openai/gpt-4o - Fast, good for most tasks
  • deepseek/deepseek-coder - Specialized for coding

Creative Writing:

  • openai/gpt-4o - Excellent creativity
  • anthropic/claude-opus-4.1 - Nuanced writing
  • google/gemini-2.5-pro-exp - Long-form content

Fast Responses:

  • anthropic/claude-haiku-4.0 - Ultra-fast, economical
  • openai/gpt-3.5-turbo - Quick responses
  • google/gemini-flash - Speed-optimized

Large Context:

  • anthropic/claude-sonnet-4.5 - 200K token context
  • google/gemini-2.5-pro-exp - 2M token context
  • openai/gpt-4-turbo - 128K token context

Most Economical:

  • anthropic/claude-haiku-4.0
  • meta/llama-3.3-70b
  • google/gemini-flash
  • openai/gpt-3.5-turbo

Balanced Cost/Performance:

  • anthropic/claude-sonnet-4.5
  • openai/gpt-4o
  • google/gemini-1.5-pro

Premium Capability:

  • anthropic/claude-opus-4.1
  • openai/gpt-4-turbo
  • google/gemini-2.5-pro-exp

All models use Hoody AI’s standard model identifier format:

{provider}/{model-name}

Examples:

  • anthropic/claude-sonnet-4.5
  • openai/gpt-4o
  • google/gemini-2.5-pro-exp
  • meta/llama-3.3-70b
  • deepseek/deepseek-v3

Important: Use the exact model identifier shown. Variations won’t work:

  • ✅ anthropic/claude-sonnet-4.5
  • ❌ claude-sonnet-4.5 (missing provider prefix)
  • ❌ claude-sonnet (missing provider prefix and version)
  • ❌ anthropic/claude (incomplete model name)
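
A client can catch the most common mistake, a missing provider prefix, before sending a request. The sketch below only checks the {provider}/{model-name} shape; it cannot tell whether the model actually exists (anthropic/claude passes the shape check but is still not a valid identifier). parseModelId is a hypothetical helper, not part of any Hoody SDK.

```javascript
// Split "provider/model-name" into its parts, or return null if the shape is wrong.
function parseModelId(id) {
  const slash = id.indexOf("/");
  // Reject IDs with no slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) return null;
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

console.log(parseModelId("anthropic/claude-sonnet-4.5"));
// { provider: 'anthropic', model: 'claude-sonnet-4.5' }
console.log(parseModelId("claude-sonnet-4.5")); // null
```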

SDK equivalent: client.api.ai.listModels() returns the same model list as the /models catalog, from any supported language.


All text models support streaming responses. Set "stream": true in the request body to receive Server-Sent Events (SSE) for real-time token streaming.
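
As a sketch of what consuming the stream can look like, the helper below extracts text from a buffered piece of SSE output. It assumes the OpenAI-style streaming format: lines of the form data: {json} carrying choices[0].delta.content, terminated by data: [DONE]. This page does not show the gateway's event payloads, so treat the payload shape as an assumption. A real client also has to handle events split across network chunks, which this omits.

```javascript
// Extract concatenated text from complete SSE lines (OpenAI-style payloads assumed).
function extractStreamText(sseText) {
  let out = "";
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines, comments, other fields
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    try {
      const event = JSON.parse(payload);
      out += event.choices?.[0]?.delta?.content ?? "";
    } catch {
      // Ignore payloads that are not valid JSON.
    }
  }
  return out;
}

// Hand-written sample events for illustration.
const sampleStream = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  "data: [DONE]",
].join("\n");

console.log(extractStreamText(sampleStream)); // Hello
```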

Models with function calling support:

  • anthropic/claude-sonnet-4.5 and newer
  • openai/gpt-4o and newer
  • google/gemini-2.5-pro-exp
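
A function-calling request body might look like the sketch below. It assumes the gateway accepts the OpenAI-style tools array on /api/v1/chat/completions, which is not shown on this page; the get_weather tool and the buildToolRequestBody helper are invented for illustration.

```javascript
// Build a chat request body that offers one tool to the model.
function buildToolRequestBody(model, userText) {
  return {
    model,
    messages: [{ role: "user", content: userText }],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather", // hypothetical tool
          description: "Get the current weather for a city",
          // JSON Schema describing the tool's arguments
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
  };
}

const toolBody = buildToolRequestBody("openai/gpt-4o", "What is the weather in Paris?");
console.log(JSON.stringify(toolBody, null, 2));
```

In the OpenAI-style format, a model that decides to call the tool replies with a tool call naming the function and its JSON arguments; your code runs the function and sends the result back in a follow-up message.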

Models with image understanding:

  • openai/gpt-4o
  • anthropic/claude-sonnet-4.5
  • google/gemini-2.5-pro-exp
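
A vision request can be sketched as a user message that mixes text and image parts. This assumes the gateway accepts OpenAI-style content parts (type "text" and type "image_url"), which this page does not show; the image URL is a placeholder and buildVisionMessage is a hypothetical helper.

```javascript
// Build a user message containing a text prompt and one image reference.
function buildVisionMessage(promptText, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: promptText },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

const visionBody = {
  model: "openai/gpt-4o",
  messages: [
    buildVisionMessage("What is in this image?", "https://example.com/photo.jpg"),
  ],
};

console.log(JSON.stringify(visionBody, null, 2));
```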

Start cheap, scale up:

  1. Prototype with anthropic/claude-haiku-4.0 or openai/gpt-3.5-turbo
  2. Test with anthropic/claude-sonnet-4.5 or openai/gpt-4o
  3. Use anthropic/claude-opus-4.1 only when needed

Match model to task complexity:

  • Simple tasks → Use fast, cheap models
  • Complex reasoning → Use premium models
  • Bulk operations → Batch requests with economical models

Example:

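// classifyWithModel and processWithModel are placeholders for your own helpers.
// Classification: use a cheap model
const category = await classifyWithModel('anthropic/claude-haiku-4.0', text);

// Based on the category, pick an appropriate model
const modelMap = {
  simple: 'anthropic/claude-haiku-4.0',
  moderate: 'anthropic/claude-sonnet-4.5',
  complex: 'anthropic/claude-opus-4.1',
};
// Fall back to the mid-tier model if the classifier returns an unexpected label
const response = await processWithModel(modelMap[category] ?? modelMap.moderate, text);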

Monitor AI usage per container:

# Check which containers have AI enabled
curl -s "https://api.hoody.icu/api/v1/containers" \
  | jq '.data.containers[] | select(.ai == true) | {id, name, ai}'

# Disable AI for a container to prevent usage (send "ai": true to re-enable)
curl -X PATCH "https://api.hoody.icu/api/v1/containers/{id}" \
  -H "Content-Type: application/json" \
  -d '{"ai": false}'

Note: Container-level quotas and rate limiting are not currently available. Cost management is achieved by enabling/disabling AI access per container.


Problem: Invalid model identifier

Solution: Verify exact model string:

# ❌ Wrong
"model": "claude-sonnet"
# ✅ Correct
"model": "anthropic/claude-sonnet-4.5"

Problem: 429 Too Many Requests

Solutions:

  • Implement exponential backoff
  • Use multiple containers to distribute load
  • Switch to faster models to reduce request count
  • Contact Hoody support for increased AI credit allocation
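
Exponential backoff can be sketched as follows. The base delay, cap, and retry count are arbitrary illustrative values, not limits documented by Hoody AI, and retryOn429 is a hypothetical wrapper around any function that returns a fetch-style response.

```javascript
// Delay ceiling before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `send` until it returns a non-429 response or the retries run out.
async function retryOn429(send, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Full jitter: wait a random time up to the exponential ceiling so that
    // many clients retrying at once do not all hit the gateway together.
    await sleep(Math.random() * backoffDelayMs(attempt));
  }
}

// Usage (inside an async function):
//   const res = await retryOn429(() => fetch(url, options));
```

If a 429 response carries a Retry-After header, honoring it is usually better than a computed delay; whether the gateway sends one is not stated here.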

Problem: Long wait times for responses

Solutions:

  • Use streaming ("stream": true) for immediate feedback
  • Switch to faster models (Haiku, GPT-3.5-Turbo, Gemini Flash)
  • Reduce max_tokens parameter
  • Simplify prompts

Dynamic Model Browser (Coming Soon):

  • Live model availability
  • Real-time pricing
  • Capability comparison
  • Performance benchmarks
  • Usage recommendations

Current Resources: