Hoody AI Models & Pricing

Access 300+ AI models through Hoody AI. Your server reaches 15+ inference providers through the Hoody AI gateway, which adds a 5% markup to provider costs.

Inference Providers

Your server connects directly to these providers through Hoody AI:

Anthropic - Claude models (Opus, Sonnet, Haiku)
OpenAI - GPT-4, GPT-3.5, Embeddings
Google (Vertex AI) - Gemini models, PaLM
Meta (via providers) - Llama 3.3, Llama 3.1
Mistral AI - Mistral Large, Medium, Mixtral
Deepseek - Deepseek V3, Deepseek Coder
Qwen (Alibaba) - Qwen 2.5, QwQ models
Cohere - Command R+, Embed models
xAI - Grok 2, Grok Vision
Perplexity AI - Sonar Pro, Sonar models
Together AI - Open model hosting platform
Fireworks AI - Optimized open model inference
And more providers…

How it works: Hoody AI adds a 5% markup (default, configurable via HOODY_AI_MODELS_MARKUP_BPS) on each provider’s base cost. Your prompts and responses flow through the Hoody AI gateway running on your own host and then out to these providers — no Hoody-operated platform servers sit between the gateway and the provider.
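To make the markup arithmetic concrete, here is a minimal sketch. It assumes HOODY_AI_MODELS_MARKUP_BPS is expressed in basis points (1 bps = 0.01%), so the 5% default would correspond to 500. The function name and the default constant are illustrative and not part of the Hoody API.

```javascript
// Assumption: the markup is configured in basis points (1 bps = 0.01%),
// so the documented 5% default corresponds to 500 bps.
const DEFAULT_MARKUP_BPS = 500;

// Returns the billed cost for a given provider base cost (hypothetical helper).
function applyMarkup(providerCostUsd, markupBps = DEFAULT_MARKUP_BPS) {
  return providerCostUsd * (1 + markupBps / 10000);
}

// A request with a $2.00 provider cost is billed at $2.10 under the default markup.
console.log(applyMarkup(2.0).toFixed(4)); // 2.1000
```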

The Authorization: Bearer container-<name|N> shown in the HTTP examples below is a container identity/tracking token automatically minted for each container, not a Hoody API token you copy from a dashboard.

Model Categories

Hoody AI provides access to multiple categories of AI models:

Text Generation Models

Chat and completion models for conversations, code generation, analysis, and general-purpose tasks.

Leading Providers:

Anthropic - Claude Opus 4.1, Claude Sonnet 4.5, Claude Haiku 4.0
OpenAI - GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo
Google - Gemini 2.5 Pro Exp, Gemini 1.5 Pro, Gemini Flash
Meta - Llama 3.3 70B, Llama 3.1 405B
Mistral - Mistral Large, Mistral Medium, Mixtral
Deepseek - Deepseek V3, Deepseek Coder
Qwen - Qwen 2.5 72B, QwQ 32B Preview

Example usage:

# Chat completion from your container
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

import { HoodyClient } from '@hoody-ai/hoody-sdk';

const client = new HoodyClient({ baseURL: 'https://api.hoody.icu', token: process.env.HOODY_TOKEN });

// List available models
const models = await client.api.ai.listModels();

// The SDK covers model listing. For chat completions, call the AI gateway
// endpoint directly from your container:
//   POST https://ai.hoody.icu/api/v1/chat/completions

# Streaming
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Explain AI"}],
    "stream": true
  }'

Image-Capable Models

Generate images from text — through the same chat endpoint.

Hoody AI’s catalog is served from the gateway’s upstream model list, and image generation happens through the standard OpenAI-compatible /api/v1/chat/completions route, not a separate images endpoint. Models that can return images advertise "image" in their output_modalities in the /models catalog — request one of those models and the response message includes the generated image.

Discover image-capable models from the live catalog. Don’t hard-code a model name from this page — pull /models and filter on output_modalities. The exact set of image-output models depends on what the upstream catalog currently offers.

curl -s https://ai.hoody.icu/api/v1/models -H "Authorization: Bearer container-1" \
  | jq -r '.data[] | select((.output_modalities // []) | index("image")) | .id'
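The same filter can be applied in application code once the catalog has been fetched. The sketch below assumes the /models response has the shape { data: [{ id, output_modalities }] }, as implied by the jq filter above. The sample entries are made up for illustration and are not real model ids.

```javascript
// Returns the ids of catalog entries whose output_modalities include "image".
// Entries without an output_modalities array are skipped.
function imageModelIds(catalog) {
  return (catalog.data ?? [])
    .filter((m) => Array.isArray(m.output_modalities) && m.output_modalities.includes('image'))
    .map((m) => m.id);
}

// Hypothetical catalog entries, for illustration only.
const sampleCatalog = {
  data: [
    { id: 'example/text-only', output_modalities: ['text'] },
    { id: 'example/image-gen', output_modalities: ['text', 'image'] },
    { id: 'example/no-modalities' }
  ]
};

console.log(imageModelIds(sampleCatalog)); // [ 'example/image-gen' ]
```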

Example usage:

# Ask an image-capable model to generate an image (output via chat/completions)
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<image-capable-model-id>",
    "messages": [{"role": "user", "content": "A serene mountain landscape at sunset"}]
  }'

// Image-capable models return images in the chat response.
// Pick a model whose output_modalities include "image" (see /models).
const response = await fetch('https://ai.hoody.icu/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer container-1',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: '<image-capable-model-id>',
    messages: [{ role: 'user', content: 'A serene mountain landscape at sunset' }]
  })
});
const data = await response.json();
console.log(data.choices[0].message);

Embedding Models

Convert text into vector embeddings for semantic search, similarity matching, and RAG applications.

Available Models:

OpenAI - text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002
Cohere - embed-english-v3.0, embed-multilingual-v3.0
Google - text-embedding-004
Voyage AI - voyage-large-2, voyage-code-2

Example usage:

# Generate text embeddings
curl -X POST "https://ai.hoody.icu/api/v1/embeddings" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/text-embedding-3-large", "input": "Search for similar documents"}'

// Generate embeddings — call the AI gateway directly from your container
const response = await fetch('https://ai.hoody.icu/api/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer container-1',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/text-embedding-3-large',
    input: 'Search for similar documents'
  })
});
const data = await response.json();
console.log(data.data[0].embedding.length, 'dimensions');
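Once you have embeddings, semantic search usually comes down to comparing vectors. The sketch below shows cosine similarity and a simple ranking helper. The function names and the tiny two-dimensional vectors are illustrative only; real embeddings have many more dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Sorts documents ({ id, embedding }) from most to least similar to the query.
function rankBySimilarity(queryEmbedding, docs) {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors standing in for real embeddings.
const ranked = rankBySimilarity([1, 0], [
  { id: 'b', embedding: [0, 1] },
  { id: 'a', embedding: [0.9, 0.1] }
]);
console.log(ranked.map((r) => r.id)); // [ 'a', 'b' ]
```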

Model Selection Guide

By Use Case

Code Generation & Analysis:

anthropic/claude-sonnet-4.5 - Best for complex code
anthropic/claude-opus-4.1 - Most capable, slower
openai/gpt-4o - Fast, good for most tasks
deepseek/deepseek-coder - Specialized for coding

Creative Writing:

openai/gpt-4o - Excellent creativity
anthropic/claude-opus-4.1 - Nuanced writing
google/gemini-2.5-pro-exp - Long-form content

Fast Responses:

anthropic/claude-haiku-4.0 - Ultra-fast, economical
openai/gpt-3.5-turbo - Quick responses
google/gemini-flash - Speed-optimized

Large Context:

anthropic/claude-sonnet-4.5 - 200K token context
google/gemini-2.5-pro-exp - 1M token context
openai/gpt-4-turbo - 128K token context

By Cost

Most Economical:

anthropic/claude-haiku-4.0
meta/llama-3.3-70b
google/gemini-flash
openai/gpt-3.5-turbo

Balanced Cost/Performance:

anthropic/claude-sonnet-4.5
openai/gpt-4o
google/gemini-1.5-pro

Premium Capability:

anthropic/claude-opus-4.1
openai/gpt-4-turbo
google/gemini-2.5-pro-exp

Model Format

All models use Hoody AI’s standard model identifier format:

{provider}/{model-name}

Examples:

anthropic/claude-sonnet-4.5
openai/gpt-4o
google/gemini-2.5-pro-exp
meta/llama-3.3-70b
deepseek/deepseek-v3

Important: Use the exact model identifier shown. Variations won’t work:

✅ anthropic/claude-sonnet-4.5
❌ claude-sonnet-4.5
❌ claude-sonnet
❌ anthropic/claude
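A quick shape check can catch a missing provider prefix before a request is sent. The sketch below assumes identifiers consist of one provider segment and one model segment made of letters, digits, dots, hyphens, underscores, or colons; the exact character set is a guess. A shape check cannot detect incomplete names such as anthropic/claude, so the live catalog remains the authority.

```javascript
// Assumed pattern for {provider}/{model-name}; the allowed characters are a guess.
const MODEL_ID_PATTERN = /^[a-z0-9][a-z0-9._-]*\/[a-z0-9][a-z0-9._:-]*$/i;

// Returns true when the string has the provider/model shape.
// It does not confirm that the model actually exists in the catalog.
function looksLikeModelId(id) {
  return typeof id === 'string' && MODEL_ID_PATTERN.test(id);
}

console.log(looksLikeModelId('anthropic/claude-sonnet-4.5')); // true
console.log(looksLikeModelId('claude-sonnet-4.5'));           // false
```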

Checking Model Availability

Live model list: Query the Hoody API for the current list of available models, including real-time availability.

curl "https://api.hoody.icu/api/v1/ai/models" \
  -H "Authorization: Bearer $HOODY_TOKEN"

SDK equivalent: client.api.ai.listModels() returns the same data from any supported language.

Model-Specific Features

Streaming Support

All text models support streaming responses. Set "stream": true on a request to https://ai.hoody.icu/api/v1/chat/completions, and the gateway returns Server-Sent Events (SSE) for real-time token streaming.
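As a rough illustration of consuming the stream, the sketch below pulls text deltas out of a block of SSE lines. It assumes the gateway emits OpenAI-style chunks (data: lines carrying JSON with choices[0].delta.content, ending with data: [DONE]), which follows from the endpoint being described as OpenAI-compatible but is not confirmed here. The helper name is hypothetical.

```javascript
// Extracts text deltas from a string of SSE lines.
// Assumption: OpenAI-style chunks, terminated by "data: [DONE]".
function extractDeltas(sseText) {
  const deltas = [];
  for (const line of sseText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blank lines and comments
    const payload = trimmed.slice(5).trim();
    if (payload === '[DONE]') break;
    try {
      const chunk = JSON.parse(payload);
      const text = chunk.choices?.[0]?.delta?.content;
      if (typeof text === 'string') deltas.push(text);
    } catch {
      // Ignore lines that are not complete JSON.
    }
  }
  return deltas;
}

// Hand-written sample stream, for illustration only.
const sampleStream = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  '',
  'data: [DONE]',
  ''
].join('\n');

console.log(extractDeltas(sampleStream).join('')); // Hello
```

In a real client, network reads can split an event across chunks, so buffer incoming text and only parse up to the last complete line.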

Function Calling

Models with function calling support:

anthropic/claude-sonnet-4.5 and newer
openai/gpt-4o and newer
google/gemini-2.5-pro-exp
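A function calling request might look like the sketch below. It assumes the gateway accepts the OpenAI-style tools array and returns tool_calls on the assistant message, which is an inference from the endpoint being OpenAI-compatible. The get_weather tool and both helper names are hypothetical.

```javascript
// Builds a chat completion body that offers one tool to the model.
// Assumption: the gateway accepts the OpenAI-style "tools" schema.
function buildToolRequest(model, userMessage) {
  return {
    model,
    messages: [{ role: 'user', content: userMessage }],
    tools: [
      {
        type: 'function',
        function: {
          name: 'get_weather', // hypothetical tool, for illustration only
          description: 'Look up the current weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city']
          }
        }
      }
    ]
  };
}

// Reads tool calls from a response, parsing each JSON-encoded argument string.
function readToolCalls(responseJson) {
  const calls = responseJson.choices?.[0]?.message?.tool_calls ?? [];
  return calls.map((call) => ({
    name: call.function.name,
    args: JSON.parse(call.function.arguments)
  }));
}

const toolRequest = buildToolRequest('anthropic/claude-sonnet-4.5', 'Weather in Oslo?');
console.log(toolRequest.tools[0].function.name); // get_weather
```

Send the body with the same POST and headers used in the chat completion examples, run the requested function yourself, and return its result to the model in a follow-up message.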

Vision Capabilities

Models with image understanding:

openai/gpt-4o
anthropic/claude-sonnet-4.5
google/gemini-2.5-pro-exp
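A vision request typically sends the image alongside the text in a single user message. The sketch below assumes the gateway accepts OpenAI-style content parts (type "text" and type "image_url"). The helper name and the example URL are placeholders.

```javascript
// Builds a chat completion body that pairs a question with an image.
// Assumption: the gateway accepts OpenAI-style multimodal content parts.
function buildVisionRequest(model, question, imageUrl) {
  return {
    model,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } }
        ]
      }
    ]
  };
}

// Placeholder URL; substitute an image your container can reference.
const visionRequest = buildVisionRequest(
  'openai/gpt-4o',
  'What is in this picture?',
  'https://example.com/photo.jpg'
);
console.log(visionRequest.messages[0].content.length); // 2
```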

Best Practices

Model Selection

Start cheap, scale up:

Prototype with anthropic/claude-haiku-4.0 or openai/gpt-3.5-turbo
Test with anthropic/claude-sonnet-4.5 or openai/gpt-4o
Use anthropic/claude-opus-4.1 only when needed

Performance Optimization

Match model to task complexity:

Simple tasks → Use fast, cheap models
Complex reasoning → Use premium models
Bulk operations → Batch requests with economical models

Example:

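// classifyWithModel and processWithModel stand in for your own helpers
// that call the chat completions endpoint with the given model.

// Step 1: classify the input with a cheap model
const category = await classifyWithModel('anthropic/claude-haiku-4.0', text);

// Step 2: map each category to a model that matches its complexity
const modelMap = {
  simple: 'anthropic/claude-haiku-4.0',
  moderate: 'anthropic/claude-sonnet-4.5',
  complex: 'anthropic/claude-opus-4.1'
};

// Fall back to the mid-tier model if the classifier returns an unexpected label
const model = modelMap[category] ?? modelMap.moderate;
const response = await processWithModel(model, text);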

Cost Management

Monitor AI usage per container:

# Check which containers have AI enabled
curl -s "https://api.hoody.icu/api/v1/containers" \
  -H "Authorization: Bearer $HOODY_TOKEN" \
  | jq '.data.containers[] | select(.ai == true) | {id, name, ai}'

# Disable AI on a container to prevent usage (set "ai": true to re-enable)
curl -X PATCH "https://api.hoody.icu/api/v1/containers/<container-id>" \
  -H "Authorization: Bearer $HOODY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ai": false}'

Note: Container-level quotas and rate limiting are not currently available. Cost management is achieved by enabling/disabling AI access per container.

Troubleshooting

“Model not found” Error

Problem: Invalid model identifier

Solution: Verify exact model string:

# ❌ Wrong
"model": "claude-sonnet"

# ✅ Correct
"model": "anthropic/claude-sonnet-4.5"

Rate Limiting

Problem: 429 Too Many Requests

Solutions:

Implement exponential backoff
Use multiple containers to distribute load
Switch to faster models so each request completes sooner and fewer requests overlap
Contact Hoody support for increased AI credit allocation
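The first suggestion, exponential backoff, can be sketched as follows. The base delay, cap, jitter fraction, and attempt count are arbitrary illustrative values, and both helper names are hypothetical. sendRequest is any function you supply that returns a promise resolving to a response with a status field, such as a fetch call.

```javascript
// Delay before retry number `attempt` (0-based): doubles each time, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries sendRequest while it returns HTTP 429, waiting longer between attempts.
async function withBackoff(sendRequest, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429) return response;
    if (attempt === maxAttempts - 1) break; // no point waiting after the final try
    // Add up to 25% random jitter so parallel clients do not retry in lockstep.
    const delay = backoffDelayMs(attempt);
    const jitter = Math.random() * 0.25 * delay;
    await new Promise((resolve) => setTimeout(resolve, delay + jitter));
  }
  throw new Error('Still rate limited after ' + maxAttempts + ' attempts');
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 500, 1000, 2000, 4000 ]
```

Wrap the fetch call from the earlier examples in a function and pass it as sendRequest to apply the retry policy.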

Slow Responses

Problem: Long wait times for responses

Solutions:

Use streaming ("stream": true) for immediate feedback
Switch to faster models (Haiku, GPT-3.5-Turbo, Gemini Flash)
Reduce max_tokens parameter
Simplify prompts

What’s Next

Dynamic Model Browser (Coming Soon):

Live model availability
Real-time pricing
Capability comparison
Performance benchmarks
Usage recommendations

Current Resources:

Usage Guide - Integration examples
Security - Key-less operation
Hoody AI Overview - Gateway features and pricing