<!--
hoody-display Subskill (sdk)
Auto-generated by Hoody Skills Generator
Generated: 2026-05-06T20:11:35.251Z
Model: mimo-v2.5-pro + fixer:mimo-v2.5-pro
Mode: sdk


Tokens: 9685

DO NOT EDIT MANUALLY - Changes will be overwritten on next generation
-->

# hoody-display Subskill

## Overview

The `hoody-display` service provides embeddable, multiplayer desktop environments that are accessible via URL. AI agents and users interact with GUI applications running in containerized displays through a REST API and an HTML5 web client.

### When to Use

- **GUI Automation**: Click buttons, type text, navigate menus in desktop applications
- **Visual Verification**: Capture screenshots to verify UI state after actions
- **Window Management**: Focus, move, resize, and search for application windows
- **Clipboard Operations**: Read and write clipboard content between agent and display
- **Desktop Monitoring**: Track display state, mouse position, and active windows

### Hoody Philosophy Fit

hoody-display embodies the "Desktop environments, fully embeddable and multiplayer" philosophy by providing:
- **Embeddable**: Access any display via URL with the HTML5 client (`GET /api/v1/display/`)
- **Multiplayer**: Multiple agents or users can interact with the same display instance
- **API-First**: Every desktop interaction is a REST endpoint, enabling programmatic control

### Service Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    hoody-display                         │
├─────────────────────────────────────────────────────────┤
│  HTML5 Client  │  REST API  │  Screenshot Engine        │
├─────────────────────────────────────────────────────────┤
│  Mouse Control │  Keyboard  │  Window Management        │
├─────────────────────────────────────────────────────────┤
│  Clipboard     │  Input     │  Display Geometry         │
└─────────────────────────────────────────────────────────┘
```

### Base URL Pattern

```
https://{projectId}-{containerId}-display-{serviceId}.{node}.containers.hoody.icu
```

All endpoints use the `/api/v1/display/` prefix. The service is accessed through the Hoody SDK client:

```
import { HoodyClient } from '@hoody-ai/hoody-sdk'

const client = new HoodyClient({ baseURL: 'https://api.hoody.icu', token: 'TOKEN' })
```
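The base URL pattern above can be filled in mechanically. The following is a minimal sketch: `DisplayAddress`, `buildDisplayBaseUrl`, and `buildDisplayApiUrl` are hypothetical helpers for illustration and are not part of the Hoody SDK.

```typescript
// Hypothetical helper types and functions; not part of the Hoody SDK.
interface DisplayAddress {
  projectId: string
  containerId: string
  serviceId: string
  node: string
}

// Fills the documented pattern:
// https://{projectId}-{containerId}-display-{serviceId}.{node}.containers.hoody.icu
function buildDisplayBaseUrl(addr: DisplayAddress): string {
  const host =
    `${addr.projectId}-${addr.containerId}-display-${addr.serviceId}` +
    `.${addr.node}.containers.hoody.icu`
  return `https://${host}`
}

// Appends the shared /api/v1/display/ prefix and an optional endpoint path.
// Leading slashes on `path` are dropped so the result never contains "//".
function buildDisplayApiUrl(addr: DisplayAddress, path = ""): string {
  const trimmed = path.replace(/^\/+/, "")
  return `${buildDisplayBaseUrl(addr)}/api/v1/display/${trimmed}`
}
```

Calling `buildDisplayApiUrl(addr)` with no path yields the prefix itself, which is the address of the HTML5 client.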

---

## Common Workflows

### Workflow 1: Health Check and Service Verification

Always verify service availability before performing operations.

```
// Step 1: Check service health
const health = await client.display.health.check()

// Step 2: Get display information
const info = await client.display.getInformation()

// Step 3: List available windows
const windows = await client.display.listWindows()
```

**Expected Health Response:**
```
{
  "status": "healthy",
  "timestamp": "2025-01-15T10:30:00Z",
  "service": "hoody-display",
  "version": "1.0.0",
  "uptime": 3600,
  "checks": {
    "display": "ok",
    "input": "ok",
    "screenshot": "ok"
  }
}
```
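A health response like the one above can be reduced to a single go/no-go decision before automation starts. This sketch assumes the field names shown in the example; treating `checks` as optional and `"ok"` as the only passing value are assumptions.

```typescript
// Shape taken from the example health response; optionality is an assumption.
interface HealthResponse {
  status: string
  checks?: Record<string, string>
}

// Returns the names of sub-checks whose value is anything other than "ok".
function failingChecks(health: HealthResponse): string[] {
  const checks = health.checks ?? {}
  return Object.keys(checks).filter((name) => checks[name] !== "ok")
}

// True only when the overall status is "healthy" and no sub-check is failing.
function isDisplayReady(health: HealthResponse): boolean {
  return health.status === "healthy" && failingChecks(health).length === 0
}
```

An agent could call `isDisplayReady(health)` on the result of the health check and log `failingChecks(health)` when it returns `false`.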

### Workflow 2: Screenshot Capture and Analysis

Capture visual state for verification or monitoring.

```
// Step 1: Capture a fresh screenshot
const screenshot = await client.display.screenshots.capture()

// Step 2: Get screenshot metadata without image data
const metadata = await client.display.screenshots.captureMetadata()

// Step 3: Retrieve the latest screenshot (cached)
const latest = await client.display.screenshots.getLatest()

// Step 4: Get metadata for latest screenshot
const latestMeta = await client.display.screenshots.getLatestMetadata()

// Step 5: Retrieve screenshot by timestamp
const timestamp = "1705312200"
const historical = await client.display.screenshots.getByTimestamp(timestamp)
```

**Screenshot Response Format:**
```
{
  "timestamp": "2025-01-15T10:30:00Z",
  "format": "png",
  "width": 1920,
  "height": 1080,
  "data": "base64-encoded-image-data"
}
```

**Metadata Response Format:**
```
{
  "timestamp": "2025-01-15T10:30:00Z",
  "format": "png",
  "width": 1920,
  "height": 1080,
  "size": 245760
}
```
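The responses above carry ISO 8601 timestamps, while the `getByTimestamp` example passes `"1705312200"`, which looks like Unix time in seconds. Assuming that reading is correct (it is not confirmed by this document), a hypothetical conversion helper might look like this:

```typescript
// Assumption: getByTimestamp expects Unix seconds as a string,
// as suggested by the example value "1705312200".
function isoToUnixSeconds(iso: string): string {
  const ms = Date.parse(iso)
  if (isNaN(ms)) {
    throw new Error(`Invalid ISO timestamp: ${iso}`)
  }
  // Date.parse returns milliseconds; drop the sub-second part.
  return Math.floor(ms / 1000).toString()
}
```

With this, the `timestamp` field of a metadata response could be fed back into a later lookup.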

### Workflow 3: Mouse Interaction

Perform precise mouse operations for GUI automation.

```
// Step 1: Get current mouse position
const location = await client.display.input.mouseLocation()

// Step 2: Move mouse to absolute position
await client.display.input.mouseMove({
  x: 500,
  y: 300
})

// Step 3: Click at current position
await client.display.input.mouseClick({
  button: "left"
})

// Step 4: Double-click
await client.display.input.mouseDoubleClick({
  button: "left"
})

// Step 5: Move relative to current position
await client.display.input.mouseMoveRelative({
  x: 100,
  y: 50
})

// Step 6: Scroll
await client.display.input.mouseScroll({
  direction: "down"
})

// Step 7: Drag operation
await client.display.input.drag({
  startX: 100,
  startY: 100,
  endX: 500,
  endY: 300
})
```

**Mouse Location Response:**
```
{
  "x": 500,
  "y": 300
}
```
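Two small calculations come up often around the mouse calls above: keeping a target inside the display, and deriving the offset for a relative move from the current location. The helpers below are an illustrative sketch; their names are hypothetical, and 0-based pixel coordinates are an assumption.

```typescript
interface Point { x: number; y: number }
interface Size { width: number; height: number }

// Clamps a target to the visible display area (0-based pixel coordinates assumed).
function clampToDisplay(target: Point, display: Size): Point {
  return {
    x: Math.min(Math.max(Math.round(target.x), 0), display.width - 1),
    y: Math.min(Math.max(Math.round(target.y), 0), display.height - 1),
  }
}

// Offset to pass to a relative move so the pointer ends up at `target`.
function relativeOffset(current: Point, target: Point): Point {
  return { x: target.x - current.x, y: target.y - current.y }
}
```

For example, the result of `mouseLocation()` and a desired target could be passed to `relativeOffset` to build the arguments for `mouseMoveRelative`.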

### Workflow 4: Keyboard Input

Type text and send key combinations.

```
// Step 1: Type a string of text
await client.display.input.keyboardType({
  text: "Hello, World!"
})

// Step 2: Press key combination (Ctrl+C)
await client.display.input.keyboardKey({
  keys: ["ctrl+c"]
})

// Step 3: Press Enter
await client.display.input.keyboardKey({
  keys: ["Return"]
})

// Step 4: Hold a key down
await client.display.input.keyboardKeyDown({
  key: "Shift_L"
})

// Step 5: Release the key
await client.display.input.keyboardKeyUp({
  key: "Shift_L"
})
```
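Key combinations in the example use a `+`-joined string such as `"ctrl+c"`. A small builder can keep that format consistent. This is a sketch under assumptions: `keyCombo` is a hypothetical helper, and modifier names other than `ctrl` are not confirmed by this document.

```typescript
// Joins modifiers and a key into the "ctrl+c" style used in the keyboardKey example.
// Modifier names other than "ctrl" are assumptions; check what the service accepts.
function keyCombo(key: string, ...modifiers: string[]): string {
  if (key.trim() === "") {
    throw new Error("key must not be empty")
  }
  // Normalize modifiers to lowercase, then drop blanks and duplicates.
  const cleaned = modifiers
    .map((m) => m.trim().toLowerCase())
    .filter((m, i, all) => m !== "" && all.indexOf(m) === i)
  // The key itself is left untouched because names like "Return" are case-sensitive.
  return cleaned.concat(key).join("+")
}
```

The result can be placed in the `keys` array, for example `keys: [keyCombo("c", "ctrl")]`.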

### Workflow 5: Combined Click and Type Operations

Efficiently interact with input fields.

```
// Step 1: Click at specific coordinates and type
await client.display.input.clickAt({
  x: 400,
  y: 200,
  button: "left"
})

// Step 2: Type at specific coordinates (move, click, type)
await client.display.input.typeAt({
  x: 400,
  y: 200,
  text: "username@example.com",
  button: "left"
})

// Step 3: Select text range
await client.display.input.select({
  startX: 100,
  startY: 200,
  endX: 500,
  endY: 200
})
```

### Workflow 6: Window Management

Manage application windows programmatically.

```
// Step 1: List all windows
const windows = await client.display.listWindows()

// Step 2: Get active window
const active = await client.display.input.windowActive()

// Step 3: Search for windows by pattern
const searchResults = await client.display.input.windowSearch({
  pattern: "Firefox"
})

// Step 4: Focus a window
await client.display.input.windowFocus({
  windowId: "0x04000003"
})

// Step 5: Move window
await client.display.input.windowMove({
  windowId: "0x04000003",
  x: 100,
  y: 100
})

// Step 6: Resize window
await client.display.input.windowResize({
  windowId: "0x04000003",
  width: 800,
  height: 600
})

// Step 7: Minimize window
await client.display.input.windowMinimize({
  windowId: "0x04000003"
})

// Step 8: Raise window to top
await client.display.input.windowRaise({
  windowId: "0x04000003"
})

// Step 9: Close window
await client.display.input.windowClose({
  windowId: "0x04000003"
})

// Step 10: Get window properties
const props = await client.display.getWindowProperties({ windowId: "0x04000003" })

// Step 11: Get window geometry
const geometry = await client.display.input.windowGeometry({ windowId: "0x04000003" })

// Step 12: Get window name
const name = await client.display.input.windowName({ windowId: "0x04000003" })
```

**Window List Response:**
```
{
  "windows": [
    {
      "id": "0x04000003",
      "name": "Firefox",
      "visible": true
    },
    {
      "id": "0x04000007",
      "name": "Terminal",
      "visible": true
    }
  ]
}
```
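When the window list is already in hand, a local filter avoids a second round trip to the search call. The sketch below assumes the response shape shown above; `findVisibleWindows` is a hypothetical helper, not an SDK method.

```typescript
// Shapes mirror the example window list response.
interface WindowInfo { id: string; name: string; visible: boolean }
interface WindowList { windows: WindowInfo[] }

// Case-insensitive substring match on the window name, visible windows only.
function findVisibleWindows(list: WindowList, pattern: string): WindowInfo[] {
  const needle = pattern.toLowerCase()
  return list.windows.filter(
    (w) => w.visible && w.name.toLowerCase().indexOf(needle) !== -1
  )
}
```

The `id` of a match can then be passed as `windowId` to calls such as `windowFocus`.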

### Workflow 7: Clipboard Operations

Transfer text between agent and display.

```
// Step 1: Read clipboard content
const clipboard = await client.display.getClipboard()

// Step 2: Write to clipboard
await client.display.setClipboard({
  text: "Text to copy to clipboard"
})

// Step 3: Read clipboard with specific selection
const selection = await client.display.getClipboard({
  selection: "PRIMARY"
})
```

**Clipboard Read Response:**
```
{
  "text": "Clipboard content here",
  "format": "text/plain"
}
```

---

## Advanced Operations

### Standard Response Patterns

Most action endpoints return a common response structure:

```
{
  "success": true,
  "timestamp": "2025-01-15T10:30:00Z"
}
```

When the request includes `screenshot: true`, the response adds a `screenshot` object matching the format in [Workflow 2](#workflow-2-screenshot-capture-and-analysis). Error responses follow the format in [Error Responses](#error-responses).

### Workflow 8: Batch Action Execution

Execute multiple actions in sequence for complex interactions.

```
// Execute a batch of actions
const batchResult = await client.display.input.batch({
  actions: [
    {
      type: "mouseMove",
      x: 100,
      y: 200
    },
    {
      type: "mouseClick",
      button: "left"
    },
    {
      type: "keyboardType",
      text: "Hello"
    },
    {
      type: "keyboardKey",
      keys: ["Return"]
    }
  ]
})
```

**Batch Response:**
```
{
  "success": true,
  "actionsExecuted": 4,
  "results": [
    { "type": "mouseMove", "success": true },
    { "type": "mouseClick", "success": true },
    { "type": "keyboardType", "success": true },
    { "type": "keyboardKey", "success": true }
  ]
}
```
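A batch response reports one result per action, so a caller can locate the first failure and decide what to retry. This sketch assumes that `results` lines up index-for-index with the submitted `actions`, which this document does not state explicitly; the helper names are hypothetical.

```typescript
// Shapes mirror the example batch response.
interface BatchActionResult { type: string; success: boolean }
interface BatchResponse {
  success: boolean
  actionsExecuted: number
  results: BatchActionResult[]
}

// Index of the first failed action, or -1 when every action succeeded.
function firstFailureIndex(batch: BatchResponse): number {
  for (let i = 0; i < batch.results.length; i++) {
    if (!batch.results[i].success) return i
  }
  return -1
}

// Actions from the original request, starting at the first failure.
// Assumes results[i] corresponds to actions[i].
function remainingActions<A>(actions: A[], batch: BatchResponse): A[] {
  const failedAt = firstFailureIndex(batch)
  return failedAt === -1 ? [] : actions.slice(failedAt)
}
```

After an input reset, the output of `remainingActions` could be submitted as a new batch.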

### Workflow 9: Action with Screenshot Verification

Execute an action and capture the resulting state.

```
// Perform action and get screenshot in one call
const result = await client.display.input.act({
  action: {
    type: "mouseClick",
    x: 500,
    y: 300,
    button: "left"
  },
  screenshot: true
})
```

**Action with Screenshot Response:** Returns `success: true` with a `screenshot` object (see [Workflow 2](#workflow-2-screenshot-capture-and-analysis) for format).

### Workflow 10: Wait with Screenshot

Wait for a duration and optionally capture state.

```
// Wait 2 seconds and capture screenshot
const waitResult = await client.display.input.wait({
  duration: 2000,
  screenshot: true
})
```

**Wait Response:** Returns `success: true`, `waited` (duration in ms), and optionally a `screenshot` object (see [Workflow 2](#workflow-2-screenshot-capture-and-analysis) for format).

### Workflow 11: Emergency Input Reset

Release all stuck inputs when automation fails.

```
// Emergency reset - release all inputs
await client.display.input.reset()
```

**Reset Response:** Returns `success: true` with a confirmation message.

### Workflow 12: Display Geometry and Thumbnails

Get display dimensions and efficient preview images.

```
// Get display dimensions
const geometry = await client.display.input.geometry()

// Capture thumbnail (320x180 scaled)
const thumbnail = await client.display.thumbnails.capture()

// Get latest thumbnail
const latestThumb = await client.display.thumbnails.getLatest()

// Get thumbnail by timestamp
const historicalThumb = await client.display.thumbnails.getByTimestamp({ timestamp: "1705312200" })
```

**Display Geometry Response:**
```
{
  "width": 1920,
  "height": 1080
}
```
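If an agent picks a point on a thumbnail, that point has to be scaled up before it can be used with the mouse calls. The sketch below takes the 320x180 thumbnail size from the comment in the example above and treats it as an assumption; `thumbnailToDisplay` is a hypothetical helper.

```typescript
interface PixelSize { width: number; height: number }
interface PixelPoint { x: number; y: number }

// Default taken from the "320x180 scaled" comment; treat it as an assumption.
const THUMBNAIL_SIZE: PixelSize = { width: 320, height: 180 }

// Maps a point picked on a thumbnail to full-display coordinates.
function thumbnailToDisplay(
  point: PixelPoint,
  display: PixelSize,
  thumbnail: PixelSize = THUMBNAIL_SIZE
): PixelPoint {
  return {
    x: Math.round((point.x * display.width) / thumbnail.width),
    y: Math.round((point.y * display.height) / thumbnail.height),
  }
}
```

The `display` argument would come from the geometry response, so the mapping stays correct if the display is not 1920x1080.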

### Error Recovery Pattern

```
async function safeAction<T>(action: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await action()
    } catch (error) {
      lastError = error
      console.error(`Attempt ${attempt} failed:`, error)

      // No recovery needed after the final attempt
      if (attempt === maxRetries) break

      // Reset inputs on failure
      await client.display.input.reset()

      // Wait before retry
      await client.display.input.wait({ duration: 1000 })
    }
  }
  // All attempts failed: surface the last error to the caller
  throw lastError
}

// Usage
await safeAction(async () => {
  await client.display.input.typeAt({
    x: 400,
    y: 200,
    text: "test input"
  })
})
```

### Performance Considerations

1. **Use Thumbnails**: For monitoring, use thumbnails instead of full screenshots
2. **Batch Operations**: Combine multiple actions into batch calls
3. **Cache Screenshots**: Use `getLatest()` instead of `capture()` when recent state is sufficient
4. **Metadata Only**: Use `captureMetadata()` when you only need dimensions/timestamp
5. **Reset on Failure**: Always call `reset()` after errors to release stuck inputs

---

## Quick Reference

### Most Common Operations

| Operation | SDK Method |
|-----------|------------|
| Health Check | `client.display.health.check()` |
| Capture Screenshot | `client.display.screenshots.capture()` |
| Get Latest Screenshot | `client.display.screenshots.getLatest()` |
| Mouse Click | `client.display.input.mouseClick()` |
| Mouse Move | `client.display.input.mouseMove()` |
| Type Text | `client.display.input.keyboardType()` |
| Press Key | `client.display.input.keyboardKey()` |
| List Windows | `client.display.listWindows()` |
| Focus Window | `client.display.input.windowFocus()` |
| Get Clipboard | `client.display.getClipboard()` |
| Set Clipboard | `client.display.setClipboard()` |
| Reset Inputs | `client.display.input.reset()` |

### Essential Parameters

**Mouse Operations:**
- `x`, `y`: Integer coordinates (absolute position; for `mouseMoveRelative`, an offset from the current position)
- `button`: `"left"`, `"right"`, or `"middle"`
- `direction`: `"up"`, `"down"`, `"left"`, `"right"` (for scroll)

**Keyboard Operations:**
- `text`: String to type
- `keys`: Array of key combinations (e.g., `["ctrl+c"]`, `["Return"]`)
- `key`: Single key name (e.g., `"Shift_L"`, `"ctrl"`)

**Window Operations:**
- `windowId`: Window identifier (e.g., `"0x04000003"`)
- `pattern`: Search pattern for window names

### Error Responses

```
{
  "success": false,
  "error": "Window not found",
  "code": "WINDOW_NOT_FOUND"
}
```
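Because success and error responses share the `success` flag, they can be modeled as a discriminated union and narrowed with a type guard. The sketch below assumes the two shapes shown in this document; the type and function names are hypothetical.

```typescript
// Shapes mirror the standard success response and the error response above.
interface ActionSuccess { success: true; timestamp?: string }
interface ActionError { success: false; error: string; code: string }
type ActionResponse = ActionSuccess | ActionError

// Type guard that narrows a response to the error shape.
function isActionError(resp: ActionResponse): resp is ActionError {
  return resp.success === false
}

// Throws for error responses; returns success responses unchanged.
function unwrap(resp: ActionResponse): ActionSuccess {
  if (isActionError(resp)) {
    throw new Error(`${resp.code}: ${resp.error}`)
  }
  return resp
}
```

Wrapping calls in `unwrap` lets a retry helper treat `success: false` the same way as a thrown exception.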

### Display Client Access

Access the HTML5 web interface directly:

```
// Get display client URL with options
const clientUrl = await client.display.accessClient({
  readonly: false,
  toolbar: true,
  keyboard: true,
  clipboard: true
})
```

This returns the URL to open in a browser for direct visual interaction with the desktop environment.