Vision Support
Emby supports vision-enabled models that can understand, analyze, describe, and transform images. You can send images as HTTPS URLs or inline base64, fully compatible with the OpenAI Messages format your tools already use. Vision models such as Qwen-VL, DeepSeek-Vision, GLM-VL, and others are available directly through the Emby API.
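The examples below use the standard OpenAI Python SDK as a sketch; the base URL, API key, and model ids are placeholders, so substitute the real values from your Emby account.

```python
from openai import OpenAI

# Placeholder credentials -- replace with your real Emby base URL and key.
client = OpenAI(
    base_url="https://YOUR-EMBY-HOST/v1",
    api_key="YOUR_EMBY_API_KEY",
)
```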
Vision-Enabled Models
Any model with"vision": true in the Emby model list can:
- Analyze images
- Describe objects, scenes, and text
- Perform comparisons
- Extract information (OCR-like)
- Modify or process image inputs
- Work with mixed text + image messages
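A minimal lookup sketch, assuming Emby exposes an OpenAI-style `GET /models` endpoint whose entries carry the `"vision"` flag (the endpoint path and response shape are assumptions, not confirmed API details):

```python
import requests

# Hypothetical endpoint shape: an OpenAI-style model list with an extra
# "vision" capability flag on each entry.
resp = requests.get(
    "https://YOUR-EMBY-HOST/v1/models",
    headers={"Authorization": "Bearer YOUR_EMBY_API_KEY"},
    timeout=30,
)
resp.raise_for_status()

vision_models = [m["id"] for m in resp.json()["data"] if m.get("vision")]
print(vision_models)
```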
Sending Images to Vision Models
Images can be passed in two formats:

HTTPS URL
Provide a publicly reachable HTTPS image URL.
Base64 Data
Send an inline base64-encoded image directly in the request body.
1. Using HTTPS URLs
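For example, reusing the `client` from the setup sketch above (the model id and image URL are illustrative):

```python
response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative; any vision-enabled Emby model id works
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```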
2. Using Base64 Inline Data
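A sketch of the same request with a local file, encoded as a base64 data URL (the file name and model id are illustrative):

```python
import base64

# Read the local image and embed it as a data URL in the request body.
with open("photo.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```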
Perfect when the image is local or cannot be publicly hosted.

Content Format (Mixed Input)
Vision models require `content` to be an array when sending images.
Types:

- Text block: `{"type": "text", "text": "..."}`
- Image block: `{"type": "image_url", "image_url": {"url": "..."}}`
Sending Multiple Images
You can include any number of images to compare, combine, or analyze together.
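For instance, a comparison request built from a list of URLs (the URLs and model id are illustrative):

```python
urls = [
    "https://example.com/before.png",
    "https://example.com/after.png",
]

# One text block followed by one image block per URL.
content = [{"type": "text", "text": "Compare these two images."}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in urls]

response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```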
Simple Text-Only Messages Still Work

If no images are included, you may still send simple string content:
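For example, a plain string `content` with no image blocks:

```python
response = client.chat.completions.create(
    model="Qwen-VL",  # illustrative
    messages=[{"role": "user", "content": "Summarize why the sky is blue."}],
)
print(response.choices[0].message.content)
```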
Supported Image Types

Vision models typically support:

- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- WebP (`.webp`)
- GIF (`.gif`)
Error Handling
If an image fails (wrong format, unreachable URL, base64 error), Emby handles it gracefully:

Graceful Fallback
Emby returns a helpful error message instead of failing your entire request.
Per-model Handling
Some models may attempt inference with remaining images if one fails.
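A defensive sketch using the OpenAI SDK's exception types (which exception Emby raises for a bad image is an assumption; adjust to what you observe):

```python
from openai import APIError, BadRequestError

try:
    response = client.chat.completions.create(
        model="Qwen-VL",  # illustrative
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/missing.png"},
                    },
                ],
            }
        ],
    )
    print(response.choices[0].message.content)
except BadRequestError as err:
    # e.g. malformed base64 or an unsupported image format
    print("Image rejected:", err)
except APIError as err:
    print("Request failed:", err)
```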
Need help testing vision models?
We’re here to help:

- WhatsApp us: https://wa.absolum.nl
- Book a call: https://cal.com/absolum/30min

