Tongyi-Embedding-Vision-Flash

Alibaba Cloud · Embedding
POST /v1/embeddings

Speed-optimised multimodal embedding model. Same input/output shape as Tongyi-Embedding-Vision-Plus but with cheaper image/video pricing — ideal for high-volume image and video indexing workflows. For text-only embeddings, use Text-Embedding-v4 instead.

At a glance

| Field | Value |
| --- | --- |
| Model id | tongyi-embedding-vision-flash |
| Input modalities | text, image, video |
| Output modalities | embedding |
| Context window | 1024 tokens |
| Region | Singapore |
| Features | multimodal, independent vectors, low cost |
| New | Yes |
| Native inference | No |
| Supported endpoints | POST /v1/embeddings |

Pricing

| Charge | Spec | Rate |
| --- | --- | --- |
| Text input | per 1M tokens | $0.09 |
| Image / video input | per 1M tokens | $0.03 |
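
The two rates above reduce to simple arithmetic, sketched here in Python. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes you already know the billed token counts; how image and video inputs are converted to tokens is not described on this page.

```python
# Rates copied from the pricing table (USD per 1M tokens).
TEXT_RATE_PER_M = 0.09
MEDIA_RATE_PER_M = 0.03


def estimate_cost_usd(text_tokens: int, media_tokens: int) -> float:
    """Return the estimated charge in USD for the given token counts."""
    if text_tokens < 0 or media_tokens < 0:
        raise ValueError("token counts must be non-negative")
    text_cost = text_tokens / 1_000_000 * TEXT_RATE_PER_M
    media_cost = media_tokens / 1_000_000 * MEDIA_RATE_PER_M
    return text_cost + media_cost


if __name__ == "__main__":
    # 2M text tokens plus 10M image/video tokens.
    print(f"${estimate_cost_usd(2_000_000, 10_000_000):.2f}")
```

For a media-heavy index the image/video rate dominates the bill, so the text rate mostly matters when captions or queries are embedded alongside the media.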

Example request

curl https://api.empiriolabs.ai/v1/embeddings \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "tongyi-embedding-vision-flash", "input": [{"type": "text", "text": "Hello"}]}'
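
For callers who prefer Python, the following is a minimal sketch of the same kind of request using only the standard library. It targets the POST /v1/embeddings endpoint listed under Supported endpoints and sends an `input` array in the OpenAI-style shape from the Parameters table. The helper name `build_embedding_request` is hypothetical, and the response layout is not documented on this page, so the sketch simply prints whatever JSON comes back.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/embeddings"
MODEL_ID = "tongyi-embedding-vision-flash"


def build_embedding_request(parts: list[dict], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = json.dumps({"model": MODEL_ID, "input": parts}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_embedding_request(
        [{"type": "text", "text": "Hello"}],
        api_key=os.environ["EMPIRIOLABS_API_KEY"],
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # Response fields are not documented here, so print the raw JSON.
        print(json.dumps(json.load(response), indent=2))
```

Keeping request construction separate from sending makes it easy to inspect the payload before any tokens are billed.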

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input | array | yes | | Array of content parts. Either OpenAI shape [{type:"image",url:"..."},{type:"text",text:"..."}] or DashScope shape {contents:[{image:"..."},{text:"..."}]}. Up to 8 images of at most 3 MB each, video up to 10 MB, text up to 1024 tokens. |
| user | string | no | | |
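
The two accepted `input` shapes carry the same information, so one can be derived from the other. The sketch below converts OpenAI-style parts into the DashScope-style object. The function name `to_dashscope_shape` is hypothetical, and handling a `video` part the same way as an `image` part is an assumption, since the table only shows image and text examples.

```python
def to_dashscope_shape(parts: list[dict]) -> dict:
    """Convert OpenAI-style content parts into the DashScope-style object."""
    contents = []
    for part in parts:
        kind = part.get("type")
        if kind == "text":
            contents.append({"text": part["text"]})
        elif kind in ("image", "video"):
            # "video" parts are an assumption; only image and text are shown in the table.
            contents.append({kind: part["url"]})
        else:
            raise ValueError(f"unsupported content part type: {kind!r}")
    return {"contents": contents}


if __name__ == "__main__":
    openai_parts = [
        {"type": "image", "url": "https://example.com/cat.jpg"},
        {"type": "text", "text": "a cat on a sofa"},
    ]
    print(to_dashscope_shape(openai_parts))
```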

Notes

Embedding dimension: fixed at 768.

Per-input limits:

- Text: up to 1,024 tokens
- Images: up to 8 per request, max 3 MB each (JPG, PNG, BMP)
- Video: up to 10 MB per file (MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV)

Pricing: image/video tokens bill at $0.03/1M (3x cheaper than the Plus variant). Text tokens bill at $0.09/1M, the same as Plus, so prefer Text-Embedding-v4 for text-only workloads.

Languages: Chinese and English.
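
The per-input limits in these notes can be checked on the client before uploading, which avoids paying for a round trip that would be rejected. This is a sketch under stated assumptions: the helper name `check_media` is hypothetical, 1 MB is taken to mean 1024 * 1024 bytes, file type is inferred from the extension only, and `.jpeg` is accepted as a JPG alias.

```python
from pathlib import PurePath

MAX_IMAGES = 8
MAX_IMAGE_BYTES = 3 * 1024 * 1024    # assumes 1 MB = 1024 * 1024 bytes
MAX_VIDEO_BYTES = 10 * 1024 * 1024
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # ".jpeg" alias is an assumption
VIDEO_EXTS = {".mp4", ".mpeg", ".mov", ".mpg", ".webm", ".avi", ".flv", ".mkv"}


def check_media(files: list[tuple[str, int]]) -> list[str]:
    """Return problems found for (filename, size_in_bytes) pairs; empty means OK."""
    problems = []
    image_count = 0
    for name, size in files:
        ext = PurePath(name).suffix.lower()
        if ext in IMAGE_EXTS:
            image_count += 1
            if size > MAX_IMAGE_BYTES:
                problems.append(f"{name}: image exceeds 3 MB")
        elif ext in VIDEO_EXTS:
            if size > MAX_VIDEO_BYTES:
                problems.append(f"{name}: video exceeds 10 MB")
        else:
            problems.append(f"{name}: unsupported extension {ext!r}")
    if image_count > MAX_IMAGES:
        problems.append(f"{image_count} images given, limit is {MAX_IMAGES}")
    return problems


if __name__ == "__main__":
    print(check_media([("photo.png", 2_000_000), ("clip.mkv", 20_000_000)]))
```

The text limit is not checked here because counting tokens requires the model's tokenizer, which this page does not describe.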


Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/tongyi-embedding-vision-flash.