Vision Support
Emby supports vision-enabled models that can understand, analyze, describe, and transform images. You can send images as HTTPS URLs or inline base64, fully compatible with the OpenAI Messages format your tools already use. Vision models such as Qwen-VL, DeepSeek-Vision, GLM-VL, and others are available directly through the Emby API.
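The examples below use the standard OpenAI Python SDK as a sketch; the base URL, API key, and model ids are placeholders, so substitute the real values from your Emby account.

```python
from openai import OpenAI

# Placeholder credentials -- replace with your real Emby base URL and key.
client = OpenAI(
    base_url="https://YOUR-EMBY-HOST/v1",
    api_key="YOUR_EMBY_API_KEY",
)
```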
Vision-Enabled Models
Any model with"vision": true in the Emby model list can:
- Analyze images
- Describe objects, scenes, and text
- Perform comparisons
- Extract information (OCR-like)
- Modify or process image inputs
- Work with mixed text + image messages
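A minimal lookup sketch, assuming Emby exposes an OpenAI-style `GET /models` endpoint whose entries carry the `"vision"` flag (the endpoint path and response shape are assumptions, not confirmed API details):

```python
import requests

# Hypothetical endpoint shape: an OpenAI-style model list with an extra
# "vision" capability flag on each entry.
resp = requests.get(
    "https://YOUR-EMBY-HOST/v1/models",
    headers={"Authorization": "Bearer YOUR_EMBY_API_KEY"},
    timeout=30,
)
resp.raise_for_status()

vision_models = [m["id"] for m in resp.json()["data"] if m.get("vision")]
print(vision_models)
```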
Sending Images to Vision Models
Images can be passed in two formats:

HTTPS URL
Provide a publicly reachable HTTPS image URL.
Base64 Data
Send an inline base64-encoded image directly in the request body.
1. Using HTTPS URLs
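For example, reusing the `client` from the setup sketch above (the model id and image URL are illustrative):

```python
response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative; any vision-enabled Emby model id works
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```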
2. Using Base64 Inline Data
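A sketch of the same request with a local file, encoded as a base64 data URL (the file name and model id are illustrative):

```python
import base64

# Read the local image and embed it as a data URL in the request body.
with open("photo.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```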
Perfect when the image is local or cannot be publicly hosted.

Content Format (Mixed Input)
Vision models require `content` to be an array when sending images.
Types:

- Text block: `{"type": "text", "text": "..."}`
- Image block: `{"type": "image_url", "image_url": {"url": "..."}}`
Sending Multiple Images
You can include any number of images to compare, combine, or analyze together.
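For instance, a comparison request built from a list of URLs (the URLs and model id are illustrative):

```python
urls = [
    "https://example.com/before.png",
    "https://example.com/after.png",
]

# One text block followed by one image block per URL.
content = [{"type": "text", "text": "Compare these two images."}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in urls]

response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```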
Simple Text-Only Messages Still Work

If no images are included, you may still send simple string content:
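For example, a plain string `content` with no image blocks:

```python
response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative
    messages=[{"role": "user", "content": "Summarize why the sky is blue."}],
)
print(response.choices[0].message.content)
```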
Supported Image Types

Vision models typically support:

- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- WebP (`.webp`)
- GIF (`.gif`)
Error Handling
If an image fails (wrong format, unreachable URL, base64 error), Emby handles it gracefully:

Graceful Fallback
Emby returns a helpful error message instead of failing your entire request.
Per-model Handling
Some models may attempt inference with remaining images if one fails.
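A defensive sketch using the OpenAI SDK's exception types (which exception Emby raises for a bad image is an assumption; adjust to what you observe):

```python
from openai import APIError, BadRequestError

try:
    response = client.chat.completions.create(
        model="Qwen-VL",  # illustrative
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/missing.png"},
                    },
                ],
            }
        ],
    )
    print(response.choices[0].message.content)
except BadRequestError as err:
    # e.g. malformed base64 or an unsupported image format
    print("Image rejected:", err)
except APIError as err:
    print("Request failed:", err)
```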
Need help testing vision models?
We’re here to help:

- WhatsApp us: https://wa.absolum.nl
- Book a call: https://cal.com/absolum/30min

