# Hoody AI Models & Pricing


**Access 300+ AI models through Hoody AI.** Your server communicates directly with 15+ inference providers through Hoody AI's gateway (with our 5% markup on provider costs).


**Dynamic Model List Coming Soon:** This page will soon feature a live model browser that fetches the current list of available models directly from Hoody AI, including real-time pricing, capabilities, and performance metrics.


---

## Inference Providers

**Your server connects directly to these providers through Hoody AI:**

- **Anthropic** - Claude models (Opus, Sonnet, Haiku)
- **OpenAI** - GPT-4, GPT-3.5, Embeddings
- **Google (Vertex AI)** - Gemini models, PaLM
- **Meta (via providers)** - Llama 3.3, Llama 3.1
- **Mistral AI** - Mistral Large, Medium, Mixtral
- **DeepSeek** - DeepSeek V3, DeepSeek Coder
- **Qwen (Alibaba)** - Qwen 2.5, QwQ models
- **Cohere** - Command R+, Embed models
- **xAI** - Grok 2, Grok Vision
- **Perplexity AI** - Sonar Pro, Sonar models
- **Together AI** - Open model hosting platform
- **Fireworks AI** - Optimized open model inference
- **And more providers...**

**How it works:** Hoody AI adds a 5% markup (default, configurable via `HOODY_AI_MODELS_MARKUP_BPS`) on each provider's base cost. Your prompts and responses flow through the Hoody AI gateway running on your own host and then out to these providers — no Hoody-operated platform servers sit between the gateway and the provider.
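As a rough illustration of the markup arithmetic: a basis point is one hundredth of a percent, so the 5% default corresponds to 500 bps. The sketch below assumes the markup is applied multiplicatively to the provider's base cost. `applyMarkup` and the sample cost are hypothetical and not part of Hoody AI.

```typescript
// Hypothetical helper: apply a markup given in basis points (1 bps = 0.01%).
// Assumption: the markup multiplies the provider's base cost. The gateway's
// actual billing may round or aggregate differently.
function applyMarkup(baseCostUsd: number, markupBps: number = 500): number {
  return baseCostUsd * (1 + markupBps / 10000);
}

// Illustrative numbers only: a request that costs the provider $2.00
console.log(applyMarkup(2.0).toFixed(4)); // "2.1000"
```

With a different `HOODY_AI_MODELS_MARKUP_BPS` value, pass it as the second argument, for example `applyMarkup(2.0, 250)` for a 2.5% markup.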

The `Authorization: Bearer container-<name|N>` header shown in the HTTP examples below carries a container identity/tracking token that is minted automatically for each container. It is not a Hoody API token that you copy from a dashboard.

---

## Model Categories

Hoody AI provides access to multiple categories of AI models:

### Text Generation Models

**Chat and completion models** for conversations, code generation, analysis, and general-purpose tasks.

**Leading Providers:**
- **Anthropic** - Claude Opus 4.1, Claude Sonnet 4.5, Claude Haiku 4.0
- **OpenAI** - GPT-4o, GPT-4 Turbo, GPT-3.5 Turbo
- **Google** - Gemini 2.5 Pro Exp, Gemini 1.5 Pro, Gemini Flash
- **Meta** - Llama 3.3 70B, Llama 3.1 405B
- **Mistral** - Mistral Large, Mistral Medium, Mixtral
- **DeepSeek** - DeepSeek V3, DeepSeek Coder
- **Qwen** - Qwen 2.5 72B, QwQ 32B Preview

**Example usage:**


  
```bash
# Chat completion from your container
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Streaming
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Explain AI"}],
    "stream": true
  }'
```

```typescript
import { HoodyClient } from '@hoody-ai/hoody-sdk';

const client = new HoodyClient({ baseURL: 'https://api.hoody.icu', token: process.env.HOODY_TOKEN });

// List available models
const models = await client.api.ai.listModels();

// The SDK provides model listing. For chat completions, call the
// AI gateway endpoint directly from your container:
//   POST https://ai.hoody.icu/api/v1/chat/completions
```
  




### Image-Capable Models

**Generate images from text — through the same chat endpoint.**

Hoody AI's catalog is served from the gateway's upstream model list, and image generation happens through the standard OpenAI-compatible `/api/v1/chat/completions` route, not a separate images endpoint. Models that can return images advertise `"image"` in their `output_modalities` in the [`/models`](#checking-model-availability) catalog — request one of those models and the response message includes the generated image.


**Discover image-capable models from the live catalog.** Don't hard-code a model name from this page — pull `/models` and filter on `output_modalities`. The exact set of image-output models depends on what the upstream catalog currently offers.

```bash
curl -s https://ai.hoody.icu/api/v1/models -H "Authorization: Bearer container-1" \
  | jq -r '.data[] | select(.output_modalities | index("image")) | .id'
```
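The same filter can be sketched in TypeScript. This assumes the catalog response has the shape implied by the `jq` filter above (a `data` array whose entries carry `id` and `output_modalities`); the `ModelEntry` type and both function names are hypothetical.

```typescript
// Shape inferred from the jq filter above; all other fields are omitted.
interface ModelEntry {
  id: string;
  output_modalities?: string[];
}

// Pure filter, so it can be reused on any already-fetched catalog.
function imageCapableIds(models: ModelEntry[]): string[] {
  return models
    .filter((m) => (m.output_modalities ?? []).includes('image'))
    .map((m) => m.id);
}

// Fetch the live catalog from inside a container and apply the filter.
async function listImageModels(token: string = 'container-1'): Promise<string[]> {
  const res = await fetch('https://ai.hoody.icu/api/v1/models', {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Catalog request failed: ${res.status}`);
  const body = (await res.json()) as { data: ModelEntry[] };
  return imageCapableIds(body.data);
}
```

Keeping the filter separate from the fetch makes it easy to cache the catalog and re-run the selection without another request.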


**Example usage:**


  
```bash
# Ask an image-capable model to generate an image (output via chat/completions)
curl -X POST "https://ai.hoody.icu/api/v1/chat/completions" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<image-capable-model-id>",
    "messages": [{"role": "user", "content": "A serene mountain landscape at sunset"}]
  }'
```

```typescript
// Image-capable models return images in the chat response.
// Pick a model whose output_modalities include "image" (see /models).
const response = await fetch('https://ai.hoody.icu/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer container-1',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: '<image-capable-model-id>',
    messages: [{ role: 'user', content: 'A serene mountain landscape at sunset' }]
  })
});
const data = await response.json();
console.log(data.choices[0].message);
```
  


### Embedding Models

**Convert text into vector embeddings** for semantic search, similarity matching, and RAG applications.

**Available Models:**
- **OpenAI** - text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002
- **Cohere** - embed-english-v3.0, embed-multilingual-v3.0
- **Google** - text-embedding-004
- **Voyage AI** - voyage-large-2, voyage-code-2

**Example usage:**


  
```bash
# Generate text embeddings
curl -X POST "https://ai.hoody.icu/api/v1/embeddings" \
  -H "Authorization: Bearer container-1" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-large",
    "input": "Search for similar documents"
  }'
```

```typescript
// Generate embeddings by calling the AI gateway directly from your container
const response = await fetch('https://ai.hoody.icu/api/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer container-1',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/text-embedding-3-large',
    input: 'Search for similar documents'
  })
});
const data = await response.json();
console.log(data.data[0].embedding.length, 'dimensions');
```
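Once vectors come back from the embeddings endpoint, similarity matching usually reduces to comparing them with cosine similarity. The following is a minimal sketch of that step, not part of the Hoody SDK; `cosineSimilarity`, `rankBySimilarity`, and the `Doc` type are hypothetical names.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as unrelated to everything.
  return denom === 0 ? 0 : dot / denom;
}

interface Doc {
  id: string;
  embedding: number[];
}

// Rank stored documents against a query embedding, most similar first.
function rankBySimilarity(query: number[], docs: Doc[]): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

For a real corpus you would typically store the embeddings in a vector index rather than scanning them linearly; the linear scan here just keeps the idea visible.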
  




---

## Model Selection Guide

### By Use Case

**Code Generation & Analysis:**
- `anthropic/claude-sonnet-4.5` - Best for complex code
- `anthropic/claude-opus-4.1` - Most capable, slower
- `openai/gpt-4o` - Fast, good for most tasks
- `deepseek/deepseek-coder` - Specialized for coding

**Creative Writing:**
- `openai/gpt-4o` - Excellent creativity
- `anthropic/claude-opus-4.1` - Nuanced writing
- `google/gemini-2.5-pro-exp` - Long-form content

**Fast Responses:**
- `anthropic/claude-haiku-4.0` - Ultra-fast, economical
- `openai/gpt-3.5-turbo` - Quick responses
- `google/gemini-flash` - Speed-optimized

**Large Context:**
- `anthropic/claude-sonnet-4.5` - 200K token context
- `google/gemini-2.5-pro-exp` - 2M token context
- `openai/gpt-4-turbo` - 128K token context

### By Cost

**Most Economical:**
- `anthropic/claude-haiku-4.0`
- `meta/llama-3.3-70b`
- `google/gemini-flash`
- `openai/gpt-3.5-turbo`

**Balanced Cost/Performance:**
- `anthropic/claude-sonnet-4.5`
- `openai/gpt-4o`
- `google/gemini-1.5-pro`

**Premium Capability:**
- `anthropic/claude-opus-4.1`
- `openai/gpt-4-turbo`
- `google/gemini-2.5-pro-exp`

---

## Model Format

All models use Hoody AI's standard model identifier format:

```
{provider}/{model-name}
```

**Examples:**
- `anthropic/claude-sonnet-4.5`
- `openai/gpt-4o`
- `google/gemini-2.5-pro-exp`
- `meta/llama-3.3-70b`
- `deepseek/deepseek-v3`

**Important:** Use the exact model identifier shown. Variations won't work:
- ✅ `anthropic/claude-sonnet-4.5`
- ❌ `claude-sonnet-4.5`
- ❌ `claude-sonnet`
- ❌ `anthropic/claude`
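A quick client-side shape check can catch a missing provider prefix before a request is sent. The sketch below only validates the `{provider}/{model-name}` shape by splitting on the first slash (an assumption about how identifiers are structured); it cannot tell whether a well-formed identifier such as `anthropic/claude` actually exists in the catalog. `parseModelId` is a hypothetical helper.

```typescript
interface ModelId {
  provider: string;
  name: string;
}

// Returns the two parts of an identifier, or null if the shape is wrong.
// Assumption: the provider is everything before the first "/".
function parseModelId(id: string): ModelId | null {
  const slash = id.indexOf('/');
  // Reject a missing slash, an empty provider, or an empty model name.
  if (slash <= 0 || slash === id.length - 1) return null;
  return { provider: id.slice(0, slash), name: id.slice(slash + 1) };
}

console.log(parseModelId('anthropic/claude-sonnet-4.5')); // { provider: 'anthropic', name: 'claude-sonnet-4.5' }
console.log(parseModelId('claude-sonnet-4.5'));           // null
```

To confirm that an identifier is actually available, compare it against the live model list rather than relying on the shape check alone.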

---

## Checking Model Availability


**Live model list:** Query the Hoody API for the current list of available models, including real-time availability.

```bash
curl "https://api.hoody.icu/api/v1/ai/models" \
  -H "Authorization: Bearer $HOODY_TOKEN"
```


**SDK equivalent:** `client.api.ai.listModels()` returns the same data from any supported language.

---

## Model-Specific Features

### Streaming Support

All text models support streaming responses. Set `"stream": true` in the request body and the gateway returns Server-Sent Events (SSE) for real-time token streaming.
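A minimal sketch of consuming that SSE stream from a container follows. It assumes the gateway emits OpenAI-style chunks (`data: {...}` lines carrying `choices[0].delta.content`, terminated by `data: [DONE]`), which is a reasonable guess for an OpenAI-compatible route but is not confirmed on this page. `extractDeltas` and `streamChat` are hypothetical names.

```typescript
// Parse complete SSE lines and collect any text deltas they carry.
// Assumption: OpenAI-style chunk format with a "[DONE]" sentinel.
function extractDeltas(sseText: string): { tokens: string[]; done: boolean } {
  const tokens: string[] = [];
  let done = false;
  for (const line of sseText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blanks and SSE comments
    const payload = trimmed.slice(5).trim();
    if (payload === '[DONE]') {
      done = true;
      break;
    }
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (typeof delta === 'string') tokens.push(delta);
  }
  return { tokens, done };
}

// Stream a chat completion and hand each token to a callback as it arrives.
async function streamChat(prompt: string, onToken: (token: string) => void): Promise<void> {
  const res = await fetch('https://ai.hoody.icu/api/v1/chat/completions', {
    method: 'POST',
    headers: { Authorization: 'Bearer container-1', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'anthropic/claude-sonnet-4.5',
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`Stream request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Only parse up to the last newline; keep any partial line for the next read.
    const lastNewline = buffer.lastIndexOf('\n');
    if (lastNewline === -1) continue;
    const parsed = extractDeltas(buffer.slice(0, lastNewline));
    buffer = buffer.slice(lastNewline + 1);
    parsed.tokens.forEach(onToken);
    if (parsed.done) return;
  }
}
```

A typical call would be `await streamChat('Explain AI', (t) => process.stdout.write(t));`.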

### Function Calling

Models with function calling support:
- `anthropic/claude-sonnet-4.5` and newer
- `openai/gpt-4o` and newer
- `google/gemini-2.5-pro-exp`
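As a hedged illustration, a function-calling request on an OpenAI-compatible route usually carries a `tools` array, and the model replies with `tool_calls` instead of plain text. The sketch below assumes the gateway passes that OpenAI-style format through unchanged, which this page does not confirm. The `get_weather` tool and both helper functions are hypothetical.

```typescript
// Hypothetical tool definition in the OpenAI-style `tools` format.
const weatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
};

// Build the JSON body for POST /api/v1/chat/completions.
function buildToolRequest(model: string, userMessage: string) {
  return {
    model,
    messages: [{ role: 'user', content: userMessage }],
    tools: [weatherTool],
  };
}

interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ChatResponse {
  choices?: { message?: { tool_calls?: ToolCall[] } }[];
}

// Extract requested tool calls; arguments arrive as a JSON-encoded string.
function readToolCalls(response: ChatResponse): { name: string; args: unknown }[] {
  const calls = response.choices?.[0]?.message?.tool_calls ?? [];
  return calls.map((c) => ({ name: c.function.name, args: JSON.parse(c.function.arguments) }));
}
```

Your code is responsible for running the requested function and sending its result back in a follow-up message; the model only proposes the call.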



### Vision Capabilities

Models with image understanding:
- `openai/gpt-4o`
- `anthropic/claude-sonnet-4.5`
- `google/gemini-2.5-pro-exp`
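A vision request on an OpenAI-compatible route typically sends the image as a content part alongside the text. The sketch below assumes the gateway accepts OpenAI-style `image_url` content parts; that format is an assumption, and `buildVisionRequest` and the example URL are hypothetical.

```typescript
// OpenAI-style multimodal content parts (assumed to be accepted by the gateway).
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: 'user'; content: ContentPart[] }[];
}

// Build the JSON body for POST /api/v1/chat/completions with one image attached.
function buildVisionRequest(model: string, question: string, imageUrl: string): VisionRequest {
  return {
    model,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

const body = buildVisionRequest('openai/gpt-4o', 'What is in this image?', 'https://example.com/photo.png');
console.log(JSON.stringify(body, null, 2));
```

The resulting object is sent with the same headers as any other chat completion request from the container.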



---

## Best Practices

### Model Selection

**Start cheap, scale up:**
1. Prototype with `anthropic/claude-haiku-4.0` or `openai/gpt-3.5-turbo`
2. Test with `anthropic/claude-sonnet-4.5` or `openai/gpt-4o`
3. Use `anthropic/claude-opus-4.1` only when needed

### Performance Optimization

**Match model to task complexity:**
- Simple tasks → Use fast, cheap models
- Complex reasoning → Use premium models
- Bulk operations → Batch requests with economical models

**Example:**
```typescript
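// classifyWithModel and processWithModel are placeholders for your own
// wrappers around POST /api/v1/chat/completions.

// Classification: use a cheap model
const category: string = await classifyWithModel('anthropic/claude-haiku-4.0', text);

// Route to a model that matches the task's complexity
const modelMap: Record<string, string> = {
  simple: 'anthropic/claude-haiku-4.0',
  moderate: 'anthropic/claude-sonnet-4.5',
  complex: 'anthropic/claude-opus-4.1'
};

// Fall back to the mid-tier model if the classifier returns an unknown label
const model = modelMap[category] ?? 'anthropic/claude-sonnet-4.5';
const response = await processWithModel(model, text);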
```

### Cost Management

**Monitor AI usage per container:**
```bash
# Check which containers have AI enabled
curl -s "https://api.hoody.icu/api/v1/containers" \
  -H "Authorization: Bearer $HOODY_TOKEN" \
  | jq '.data.containers[] | select(.ai == true) | {id, name, ai}'

# Disable AI on a container to prevent usage (send "ai": true to re-enable)
curl -X PATCH "https://api.hoody.icu/api/v1/containers/{id}" \
  -H "Authorization: Bearer $HOODY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ai": false}'
```

**Note:** Container-level quotas and rate limiting are not currently available. Cost management is achieved by enabling/disabling AI access per container.

---

## Troubleshooting

### "Model not found" Error

**Problem:** Invalid model identifier

**Solution:** Verify exact model string:
```bash
# ❌ Wrong
"model": "claude-sonnet"

# ✅ Correct
"model": "anthropic/claude-sonnet-4.5"
```

### Rate Limiting

**Problem:** `429 Too Many Requests`

**Solutions:**
- Implement exponential backoff
- Use multiple containers to distribute load
- Switch to faster models so fewer requests are in flight at once
- Contact Hoody support for increased AI credit allocation
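The exponential backoff suggestion can be sketched as follows. The delays, retry count, and jitter are illustrative defaults rather than values recommended by Hoody, and `backoffDelayMs` and `withBackoff` are hypothetical helpers.

```typescript
// Delay doubles with each attempt (500ms, 1s, 2s, ...) up to a ceiling.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry a request while it keeps returning 429, waiting longer each time.
// `send` is any function that performs the request and resolves to an
// object with a numeric `status` (a fetch Response qualifies).
async function withBackoff<T extends { status: number }>(
  send: () => Promise<T>,
  maxRetries: number = 5,
  baseMs: number = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    // Return on success, on non-429 errors, or once retries are exhausted.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Random jitter keeps parallel containers from retrying in lockstep.
    const jitter = Math.random() * baseMs;
    await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt, baseMs) + jitter));
  }
}
```

A call might look like `await withBackoff(() => fetch(url, options))`, where `url` and `options` are the same values you would pass to a single chat completion request.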

### Slow Responses

**Problem:** Long wait times for responses

**Solutions:**
- Use streaming (`"stream": true`) for immediate feedback
- Switch to faster models (Haiku, GPT-3.5-Turbo, Gemini Flash)
- Reduce `max_tokens` parameter
- Simplify prompts

---

## What's Next

**Dynamic Model Browser** (Coming Soon):
- Live model availability
- Real-time pricing
- Capability comparison
- Performance benchmarks
- Usage recommendations

**Current Resources:**
- [Usage Guide →](/foundation/hoody-ai/usage/) - Integration examples
- [Security →](/foundation/hoody-ai/security/) - Key-less operation
- [Hoody AI Overview →](/foundation/hoody-ai/) - Gateway features and pricing