Tongyi Embedding Vision Flash

Alibaba Cloud · Embeddings
POST /v1/embeddings

Speed-optimised multimodal embedding — same shape as Vision-Plus, 3× cheaper image/video tokens.

At a glance

| Field | Value |
| --- | --- |
| Model id | tongyi-embedding-vision-flash |
| Model release date | 2025-09-23 |
| Input modalities | Text, Image, Video |
| Output modalities | Embedding |
| Context window | 1,024 tokens |
| Weight precision | - |
| Region | Singapore |
| Features | multimodal, independent vectors, low cost |
| Native inference | No |
| New | Yes |
| Supported endpoints | POST /v1/embeddings |

Pricing

| Charge | Unit | Rate (USD) |
| --- | --- | --- |
| Text input | per 1M tokens | $0.09 |
| Image / video input | per 1M tokens | $0.03 |
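
As a rough illustration of how the two rates combine, the sketch below estimates the charge for a batch from token counts. The function name `estimate_cost` is hypothetical, and how image and video inputs are converted into billable tokens is not stated on this page, so the media token count is taken as a given input.

```python
# Rates copied from the pricing table above (USD per 1M tokens).
TEXT_RATE_PER_M = 0.09
MEDIA_RATE_PER_M = 0.03


def estimate_cost(text_tokens: int, media_tokens: int) -> float:
    """Estimate the charge in USD for the given token counts.

    Assumption: text and image/video tokens are billed independently
    and linearly at the listed rates.
    """
    if text_tokens < 0 or media_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = text_tokens * TEXT_RATE_PER_M + media_tokens * MEDIA_RATE_PER_M
    return total / 1_000_000
```

For example, `estimate_cost(2_000_000, 5_000_000)` adds two million text tokens at the text rate to five million media tokens at the image/video rate.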

Example request

curl https://api.empiriolabs.ai/v1/embeddings \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "tongyi-embedding-vision-flash", "input": [{"type":"text","text":"Embed me."},{"type":"image","url":"https://media.empiriolabs.ai/example.jpg"}]}'
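
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `send` are hypothetical, and the response is assumed to be JSON because its exact shape is not documented on this page.

```python
import json
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/embeddings"
MODEL_ID = "tongyi-embedding-vision-flash"


def build_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "model": MODEL_ID,
        "input": [
            {"type": "text", "text": "Embed me."},
            {"type": "image", "url": "https://media.empiriolabs.ai/example.jpg"},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would typically read the key from the environment, for example `send(build_request(os.environ["EMPIRIOLABS_API_KEY"]))` after `import os`.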

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input | array or object | yes | - | Either an OpenAI-style part array, e.g. [{type:'image',url:...},{type:'text',text:...}], or a native part list, e.g. {contents:[{image:'...'},{text:'...'}]}. Limits: up to 8 images at 3 MB each, video up to 10 MB, text up to 1,024 tokens. |
| user | string | no | - | Optional caller identifier. |
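
To make the two accepted `input` shapes concrete, the sketch below builds each from the same text and image URLs. The helper names are hypothetical, and the placement of the native `{contents: [...]}` object directly under `input` is an assumption read from the parameter description, not a confirmed schema.

```python
MODEL_ID = "tongyi-embedding-vision-flash"


def openai_style_input(text=None, image_urls=()):
    """OpenAI-style part array: [{type:'image',url:...},{type:'text',text:...}]."""
    parts = [{"type": "image", "url": url} for url in image_urls]
    if text is not None:
        parts.append({"type": "text", "text": text})
    return parts


def native_input(text=None, image_urls=()):
    """Native part list: {contents:[{image:'...'},{text:'...'}]}."""
    contents = [{"image": url} for url in image_urls]
    if text is not None:
        contents.append({"text": text})
    return {"contents": contents}


def build_body(input_value, user=None):
    """Assemble the JSON body; `user` is included only when provided."""
    body = {"model": MODEL_ID, "input": input_value}
    if user is not None:
        body["user"] = user
    return body
```

Either `build_body(openai_style_input("a cat", ["https://media.empiriolabs.ai/example.jpg"]))` or the `native_input` equivalent yields a body that can be serialised with `json.dumps` and posted to the endpoint.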

Notes

Output

  • Fixed 768-dim vector per input
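
Because every input yields a vector of the same fixed length, two returned vectors can be compared directly. The following is a minimal sketch of cosine similarity with a dimension check; the function name is hypothetical and the vectors are assumed to arrive as plain lists of floats.

```python
import math

EMBEDDING_DIM = 768  # fixed output size stated above


def cosine_similarity(a, b):
    """Cosine similarity between two 768-dim embedding vectors."""
    if len(a) != EMBEDDING_DIM or len(b) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dim vectors")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```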

Per-input limits

  • Text: up to 1,024 tokens
  • Image: up to 8 per request, 3 MB each (JPG, PNG, BMP)
  • Video: up to 10 MB per file (MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV)
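
The media limits above can be checked on the client before uploading. This is a sketch under stated assumptions: `check_media` is a hypothetical helper, "MB" is interpreted as 2**20 bytes, and file type is inferred from the extension only, using exactly the formats listed above.

```python
import os

MAX_IMAGES = 8
MAX_IMAGE_BYTES = 3 * 1024 * 1024   # assumption: 1 MB = 2**20 bytes
MAX_VIDEO_BYTES = 10 * 1024 * 1024  # same assumption
IMAGE_EXTS = {".jpg", ".png", ".bmp"}
VIDEO_EXTS = {".mp4", ".mpeg", ".mov", ".mpg", ".webm", ".avi", ".flv", ".mkv"}


def check_media(files):
    """Return a list of problems for (filename, size_in_bytes) pairs.

    An empty list means every file is within the documented limits.
    """
    problems = []
    image_count = 0
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        if ext in IMAGE_EXTS:
            image_count += 1
            if size > MAX_IMAGE_BYTES:
                problems.append(f"{name}: image exceeds 3 MB")
        elif ext in VIDEO_EXTS:
            if size > MAX_VIDEO_BYTES:
                problems.append(f"{name}: video exceeds 10 MB")
        else:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
    if image_count > MAX_IMAGES:
        problems.append(f"{image_count} images exceeds the limit of {MAX_IMAGES}")
    return problems
```

Text length is not checked here because counting tokens requires the model's tokenizer, which this page does not describe.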

Languages

  • Chinese, English

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/tongyi-embedding-vision-flash.