Hoody AI Models & Pricing
Access 300+ AI models through Hoody AI. Your server communicates directly with 15+ inference providers through Hoody AI’s gateway (with our 5% markup on provider costs).
Inference Providers
Your server connects directly to these providers through Hoody AI:
- Anthropic - Claude models (Opus, Sonnet, Haiku)
- OpenAI - GPT-4, GPT-3.5, Embeddings
- Google (Vertex AI) - Gemini models, PaLM
- Meta (via providers) - Llama 3.3, Llama 3.1
- Mistral AI - Mistral Large, Medium, Mixtral
- Deepseek - Deepseek V3, Deepseek Coder
- Qwen (Alibaba) - Qwen 2.5, QwQ models
- Cohere - Command R+, Embed models
- xAI - Grok 2, Grok Vision
- Perplexity AI - Sonar Pro, Sonar models
- Together AI - Open model hosting platform
- Fireworks AI - Optimized open model inference
- And more providers…
How it works: Hoody AI adds a 5% markup (default, configurable via HOODY_AI_MODELS_MARKUP_BPS) on each provider’s base cost. Your prompts and responses flow through the Hoody AI gateway running on your own host and then out to these providers — no Hoody-operated platform servers sit between the gateway and the provider.
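As a rough illustration of the markup arithmetic, the sketch below applies a basis-point markup to a provider's base cost. It assumes that HOODY_AI_MODELS_MARKUP_BPS is expressed in basis points (so the 5% default would correspond to 500) and that the markup is applied multiplicatively. The function name and the sample price are hypothetical.

```javascript
// Sketch only: assumes the markup is given in basis points (1 bps = 0.01%),
// so the 5% default corresponds to 500. `applyMarkup` is a hypothetical
// helper, not part of the Hoody SDK.
function applyMarkup(providerCostUsd, markupBps = 500) {
  if (providerCostUsd < 0 || markupBps < 0) {
    throw new RangeError('cost and markup must be non-negative');
  }
  return providerCostUsd * (1 + markupBps / 10000);
}

// Hypothetical request that costs $2.00 at the provider.
const billed = applyMarkup(2.0);
console.log(billed.toFixed(4)); // 2.1000
```

Under these assumptions, a markup of 250 would turn a $10.00 provider cost into $10.25.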
The Authorization: Bearer container-<name|N> shown in the HTTP examples below is a container identity/tracking token automatically minted for each container, not a Hoody API token you copy from a dashboard.
Model Categories
Hoody AI provides access to multiple categories of AI models:
Text Generation Models
Chat and completion models for conversations, code generation, analysis, and general-purpose tasks.
Leading Providers:
- Anthropic - Claude Opus 4.1, Claude Sonnet 4.5, Claude Haiku 4.0
- OpenAI - GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo
- Google - Gemini 2.5 Pro Exp, Gemini 1.5 Pro, Gemini Flash
- Meta - Llama 3.3 70B, Llama 3.1 405B
- Mistral - Mistral Large, Mistral Medium, Mixtral
- Deepseek - Deepseek V3, Deepseek Coder
- Qwen - Qwen 2.5 72B, QwQ 32B Preview
Example usage:
HTTP:

    # Chat completion from your container
    curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
      -H "Authorization: Bearer container-1" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "anthropic/claude-sonnet-4.5",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

    # Streaming
    curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
      -H "Authorization: Bearer container-1" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "anthropic/claude-sonnet-4.5",
        "messages": [{"role": "user", "content": "Explain AI"}],
        "stream": true
      }'

SDK:

    import { HoodyClient } from '@hoody-ai/hoody-sdk';

    const client = new HoodyClient({ baseURL: 'https://api.hoody.icu', token: process.env.HOODY_TOKEN });

    // List available models
    const models = await client.api.ai.listModels();

    // Chat completion (use the HTTP endpoint directly; see the HTTP example)
    // The SDK provides model listing; for chat completions,
    // call the AI gateway endpoint from your container:
    // POST https://ai.hoody.icu/api/v1/chat/completions

Image-Capable Models
Generate images from text through the same chat endpoint.
Hoody AI’s catalog is served from the gateway’s upstream model list, and image generation happens through the standard OpenAI-compatible /api/v1/chat/completions route, not a separate images endpoint. Models that can return images advertise "image" in their output_modalities in the /models catalog — request one of those models and the response message includes the generated image.
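To make the selection step concrete, the sketch below filters a list of catalog entries down to the ones that can return images. It assumes each entry exposes an `id` and a top-level `output_modalities` array; the real /models response may nest these fields differently, and the sample entries are made up.

```javascript
// Sketch only: assumes each catalog entry exposes `id` and a top-level
// `output_modalities` array. The real /models response may differ.
function imageCapableModelIds(models) {
  return models
    .filter((m) => Array.isArray(m.output_modalities) && m.output_modalities.includes('image'))
    .map((m) => m.id);
}

// Hypothetical catalog entries, for illustration only.
const sampleCatalog = [
  { id: 'example/text-only', output_modalities: ['text'] },
  { id: 'example/text-and-image', output_modalities: ['text', 'image'] },
  { id: 'example/no-modalities' },
];

console.log(imageCapableModelIds(sampleCatalog)); // [ 'example/text-and-image' ]
```

Any ID returned this way can then be used as the "model" value in a chat completions request.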
Example usage:
HTTP:

    # Ask an image-capable model to generate an image (output via chat/completions)
    curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
      -H "Authorization: Bearer container-1" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "<image-capable-model-id>",
        "messages": [{"role": "user", "content": "A serene mountain landscape at sunset"}]
      }'

JavaScript:

    // Image-capable models return images in the chat response.
    // Pick a model whose output_modalities include "image" (see /models).
    const response = await fetch('https://ai.hoody.icu/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer container-1',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: '<image-capable-model-id>',
        messages: [{ role: 'user', content: 'A serene mountain landscape at sunset' }]
      })
    });
    const data = await response.json();
    console.log(data.choices[0].message);

Embedding Models
Convert text into vector embeddings for semantic search, similarity matching, and RAG applications.
Available Models:
- OpenAI - text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002
- Cohere - embed-english-v3.0, embed-multilingual-v3.0
- Google - text-embedding-004
- Voyage AI - voyage-large-2, voyage-code-2
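For the semantic search and similarity matching use cases, embedding vectors are commonly compared with cosine similarity. The sketch below is a generic illustration rather than Hoody-specific code: the helper names are hypothetical, and it works on any equal-length numeric vectors, such as the `embedding` arrays an embeddings endpoint returns.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must be non-empty and the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents (each with an `id` and an `embedding`) against a query vector,
// most similar first.
function rankBySimilarity(queryVec, docs) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(queryVec, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

In a RAG pipeline, the top-ranked documents would typically be passed to a chat model as context.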
Example usage:
HTTP:

    # Generate text embeddings
    curl -X POST "https://ai.hoody.icu/api/v1/embeddings" \
      -H "Authorization: Bearer container-1" \
      -H "Content-Type: application/json" \
      -d '{"model": "openai/text-embedding-3-large", "input": "Search for similar documents"}'

JavaScript:

    // Generate embeddings: call the AI gateway directly from your container
    const response = await fetch('https://ai.hoody.icu/api/v1/embeddings', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer container-1',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'openai/text-embedding-3-large',
        input: 'Search for similar documents'
      })
    });
    const data = await response.json();
    console.log(data.data[0].embedding.length, 'dimensions');

Model Selection Guide
By Use Case
Code Generation & Analysis:
- anthropic/claude-sonnet-4.5 - Best for complex code
- anthropic/claude-opus-4.1 - Most capable, slower
- openai/gpt-4o - Fast, good for most tasks
- deepseek/deepseek-coder - Specialized for coding
Creative Writing:
- openai/gpt-4o - Excellent creativity
- anthropic/claude-opus-4.1 - Nuanced writing
- google/gemini-2.5-pro-exp - Long-form content
Fast Responses:
- anthropic/claude-haiku-4.0 - Ultra-fast, economical
- openai/gpt-3.5-turbo - Quick responses
- google/gemini-flash - Speed-optimized
Large Context:
- anthropic/claude-sonnet-4.5 - 200K token context
- google/gemini-2.5-pro-exp - 2M token context
- openai/gpt-4-turbo - 128K token context
By Cost
Most Economical:
- anthropic/claude-haiku-4.0
- meta/llama-3.3-70b
- google/gemini-flash
- openai/gpt-3.5-turbo
Balanced Cost/Performance:
- anthropic/claude-sonnet-4.5
- openai/gpt-4o
- google/gemini-1.5-pro
Premium Capability:
- anthropic/claude-opus-4.1
- openai/gpt-4-turbo
- google/gemini-2.5-pro-exp
Model Format
All models use Hoody AI’s standard model identifier format:

    {provider}/{model-name}

Examples:
- anthropic/claude-sonnet-4.5
- openai/gpt-4o
- google/gemini-2.5-pro-exp
- meta/llama-3.3-70b
- deepseek/deepseek-v3
Important: Use the exact model identifier shown. Variations won’t work:
- ✅ anthropic/claude-sonnet-4.5
- ❌ claude-sonnet-4.5
- ❌ claude-sonnet
- ❌ anthropic/claude
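A cheap client-side check can catch the most common mistake, a missing provider prefix, before a request is sent. The sketch below is an assumption-laden illustration: the allowed character set is a guess, and a well-formed identifier such as anthropic/claude can still be absent from the catalog, so this does not replace checking the model list.

```javascript
// Sketch only: checks the {provider}/{model-name} shape. The allowed character
// set is an assumption, and passing this check does not mean the model exists.
const MODEL_ID_PATTERN = /^[a-z0-9][a-z0-9._-]*\/[a-z0-9][a-z0-9._:-]*$/i;

function hasProviderPrefix(modelId) {
  return typeof modelId === 'string' && MODEL_ID_PATTERN.test(modelId);
}

console.log(hasProviderPrefix('anthropic/claude-sonnet-4.5')); // true
console.log(hasProviderPrefix('claude-sonnet-4.5'));           // false
```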
Checking Model Availability
Query the gateway’s /models catalog to see which models are currently available. SDK equivalent: client.api.ai.listModels() returns the same data from any supported language.
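A minimal sketch of an availability check from a container follows. It makes two assumptions that the text above does not confirm: that the catalog is served at /api/v1/models on the gateway host, and that the response uses an OpenAI-style body of the form `{ data: [{ id }] }`. The helper names are hypothetical.

```javascript
// Sketch only. Assumptions: the catalog lives at /api/v1/models on the gateway
// host and returns an OpenAI-style body { data: [{ id: "..." }, ...] }.
const MODELS_URL = 'https://ai.hoody.icu/api/v1/models';

// Pure helper: does the parsed catalog contain this exact model ID?
function catalogHasModel(catalog, modelId) {
  const entries = Array.isArray(catalog && catalog.data) ? catalog.data : [];
  return entries.some((m) => m.id === modelId);
}

// Network wrapper; `fetchImpl` is injectable so the logic can be tested offline.
async function isModelAvailable(modelId, token = 'container-1', fetchImpl = fetch) {
  const res = await fetchImpl(MODELS_URL, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`catalog request failed: ${res.status}`);
  return catalogHasModel(await res.json(), modelId);
}
```

Calling `isModelAvailable('anthropic/claude-sonnet-4.5')` before a batch job is one way to fail early instead of hitting a "Model not found" error mid-run.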
Model-Specific Features
Streaming Support
All text models support streaming responses. Set "stream": true in the request body and the gateway returns Server-Sent Events (SSE) for real-time token streaming.
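To show what consuming such a stream can look like, the sketch below extracts the generated text from a block of SSE lines. It assumes the gateway emits OpenAI-style chunks (`data: {"choices":[{"delta":{"content":"..."}}]}` followed by `data: [DONE]`), which the OpenAI-compatible route suggests but this page does not spell out. The function name and sample payload are illustrative.

```javascript
// Sketch only: assumes OpenAI-style stream chunks, i.e. lines of the form
// `data: {"choices":[{"delta":{"content":"..."}}]}` ended by `data: [DONE]`.
function extractStreamText(sseText) {
  let out = '';
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank lines, comments, other fields
    const payload = line.slice(5).trim();
    if (payload === '') continue;
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    const delta = chunk.choices && chunk.choices[0] && chunk.choices[0].delta;
    if (delta && typeof delta.content === 'string') out += delta.content;
  }
  return out;
}

// Made-up stream body for illustration.
const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  '',
  'data: [DONE]',
  '',
].join('\n');

console.log(extractStreamText(sample)); // Hello
```

A real client would also need to buffer partial lines, since a network chunk can end in the middle of a `data:` line.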
Function Calling
Models with function calling support:
- anthropic/claude-sonnet-4.5 and newer
- openai/gpt-4o and newer
- google/gemini-2.5-pro-exp
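As a sketch of what a function calling request might look like, the code below builds a chat completions body with one tool and decodes tool calls from a response message. It assumes the gateway accepts the OpenAI-style `tools` array and returns calls in `message.tool_calls`; `get_weather` and both helper names are hypothetical.

```javascript
// Sketch only: assumes the gateway accepts the OpenAI-style `tools` array on
// /api/v1/chat/completions. `get_weather` is a hypothetical tool.
function buildToolRequest(model, userText) {
  return {
    model,
    messages: [{ role: 'user', content: userText }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather',
          description: 'Look up the current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
          },
        },
      },
    ],
  };
}

// In OpenAI-style responses, requested calls arrive in message.tool_calls,
// with the arguments encoded as a JSON string.
function parseToolCalls(message) {
  return (message.tool_calls || []).map((c) => ({
    name: c.function.name,
    args: JSON.parse(c.function.arguments),
  }));
}
```

The request body would be sent with `JSON.stringify(buildToolRequest(...))`, and your code would run the named function and return its result in a follow-up message.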
Vision Capabilities
Models with image understanding:
- openai/gpt-4o
- anthropic/claude-sonnet-4.5
- google/gemini-2.5-pro-exp
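For image understanding, a request usually mixes text and image parts in one user message. The sketch below assumes the gateway accepts OpenAI-style multimodal content parts (`text` and `image_url`), which this page does not confirm; the helper name and the image URL are placeholders.

```javascript
// Sketch only: assumes the gateway accepts OpenAI-style multimodal content
// parts (`text` and `image_url`). The image URL used below is a placeholder.
function buildVisionRequest(model, question, imageUrl) {
  return {
    model,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

const visionBody = buildVisionRequest(
  'openai/gpt-4o',
  'What is in this picture?',
  'https://example.com/photo.jpg'
);
console.log(visionBody.messages[0].content.length); // 2
```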
Best Practices
Model Selection
Start cheap, scale up:
- Prototype with anthropic/claude-haiku-4.0 or openai/gpt-3.5-turbo
- Test with anthropic/claude-sonnet-4.5 or openai/gpt-4o
- Use anthropic/claude-opus-4.1 only when needed
Performance Optimization
Match model to task complexity:
- Simple tasks → Use fast, cheap models
- Complex reasoning → Use premium models
- Bulk operations → Batch requests with economical models
Example:
    // classifyWithModel and processWithModel stand for your own helpers
    // that wrap the chat completions endpoint.

    // Classification: use a cheap model
    const category = await classifyWithModel('anthropic/claude-haiku-4.0', text);

    // Based on the category, use an appropriate model
    const modelMap = {
      'simple': 'anthropic/claude-haiku-4.0',
      'moderate': 'anthropic/claude-sonnet-4.5',
      'complex': 'anthropic/claude-opus-4.1'
    };

    const response = await processWithModel(modelMap[category], text);

Cost Management
Monitor AI usage per container:
    # Check which containers have AI enabled
    curl "https://api.hoody.icu/api/v1/containers" \
      | jq '.data.containers[] | select(.ai == true) | {id, name, ai}'

    # Enable/disable AI per container to control access
    curl -X PATCH "https://api.hoody.icu/api/v1/containers/{id}" \
      -H "Content-Type: application/json" \
      -d '{"ai": false}'  # Disable AI to prevent usage

Note: Container-level quotas and rate limiting are not currently available. Cost management is achieved by enabling/disabling AI access per container.
Troubleshooting
“Model not found” Error
Problem: Invalid model identifier
Solution: Verify exact model string:
    # ❌ Wrong
    "model": "claude-sonnet"

    # ✅ Correct
    "model": "anthropic/claude-sonnet-4.5"

Rate Limiting
Problem: 429 Too Many Requests
Solutions:
- Implement exponential backoff
- Use multiple containers to distribute load
- Switch to faster models to reduce request count
- Contact Hoody support for increased AI credit allocation
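The first suggestion, exponential backoff, can be sketched as follows. This is a generic pattern rather than a Hoody-provided helper: the function names, the 500 ms base delay, the 30 s cap, and the use of random jitter are all illustrative choices.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry `sendRequest` (any function returning a fetch-style Response) while it
// answers HTTP 429, up to `maxRetries` retries. `sleep` is injectable so the
// loop can be tested without real waiting.
async function withBackoff(
  sendRequest,
  maxRetries = 5,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  for (let attempt = 0; ; attempt++) {
    const res = await sendRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Jitter: wait a random time up to the computed delay to avoid retry bursts.
    await sleep(Math.random() * backoffDelayMs(attempt));
  }
}
```

A call such as `withBackoff(() => fetch(url, options))` returns the first non-429 response, or the last 429 once the retry budget is spent.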
Slow Responses
Problem: Long wait times for responses
Solutions:
- Use streaming ("stream": true) for immediate feedback
- Switch to faster models (Haiku, GPT-3.5-Turbo, Gemini Flash)
- Reduce the max_tokens parameter
- Simplify prompts
What’s Next
Dynamic Model Browser (Coming Soon):
- Live model availability
- Real-time pricing
- Capability comparison
- Performance benchmarks
- Usage recommendations
Current Resources:
- Usage Guide - Integration examples
- Security - Key-less operation
- Hoody AI Overview - Gateway features and pricing