Tongyi Embedding Vision Plus

Alibaba Cloud · Embeddings
POST /v1/embeddings

Multimodal embedding producing independent vectors for text, image, and video inputs.

At a glance

Field                Value
Model id             tongyi-embedding-vision-plus
Input modalities     Text, Image, Video
Output modalities    Embedding
Context window       1,024 tokens
Weight precision     -
Region               Singapore
Features             multimodal, independent vectors
Native inference     No
New                  Yes
Supported endpoints  POST /v1/embeddings

Pricing

Charge               Spec           Rate
Text input           per 1M tokens  $0.09
Image / video input  per 1M tokens  $0.09
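
Because both rows share one rate, a rough cost estimate is a single multiplication. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, and how the service counts tokens for image or video input is not stated here, so the caller has to supply that figure.

```python
# Rate from the pricing table: $0.09 per 1M tokens for every input type.
RATE_USD_PER_MILLION_TOKENS = 0.09


def estimate_cost_usd(text_tokens: int, media_tokens: int = 0) -> float:
    """Estimate the charge for a request (hypothetical helper).

    media_tokens is the billed token count for image/video input. The way
    media is converted into tokens is not documented in this section, so
    this function does not try to derive it.
    """
    if text_tokens < 0 or media_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_tokens = text_tokens + media_tokens
    return total_tokens * RATE_USD_PER_MILLION_TOKENS / 1_000_000
```

For instance, `estimate_cost_usd(250_000, 50_000)` prices 300,000 billed tokens at the table rate.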

Example request

curl https://api.empiriolabs.ai/v1/embeddings \
  -H "Authorization: Bearer $EMPIRIOLABS_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model": "tongyi-embedding-vision-plus", "input": [{"type":"text","text":"Embed me."},{"type":"image","url":"https://media.empiriolabs.ai/example.jpg"}]}'
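
The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: `build_embedding_request`, `send`, and `main` are hypothetical names, and because the response layout is not described in this section, the code returns the decoded JSON without interpreting it.

```python
import json
import os
import urllib.request

API_URL = "https://api.empiriolabs.ai/v1/embeddings"
MODEL_ID = "tongyi-embedding-vision-plus"


def build_embedding_request(api_key, parts):
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"model": MODEL_ID, "input": parts}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body unchanged."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    # Reads the key from the same environment variable as the curl example.
    parts = [
        {"type": "text", "text": "Embed me."},
        {"type": "image", "url": "https://media.empiriolabs.ai/example.jpg"},
    ]
    request = build_embedding_request(os.environ["EMPIRIOLABS_API_KEY"], parts)
    print(send(request))
```

Calling `main()` performs the network request; building the request separately makes it possible to inspect the headers and body first.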

Parameters

Parameter  Type    Required  Default  Description
input      string  yes       -        Either an OpenAI-style part array [{type:'image',url:...},{type:'text',text:...}] or a native part list {contents:[{image:'...'},{text:'...'}]}. Up to 8 images at 3 MB each, video up to 10 MB, text up to 1024 tokens.
user       string  no        -        Optional caller identifier.
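
The two accepted shapes for `input` carry the same information, so one can be derived from the other. The sketch below converts OpenAI-style parts into the native part list. `to_native_contents` is a hypothetical helper, and it assumes the native `image` value is the same URL string used in the OpenAI-style `url` field. Video parts are rejected because their part shape is not shown in this section.

```python
def to_native_contents(parts):
    """Convert OpenAI-style parts into the native {contents: [...]} form.

    Only the text and image shapes shown in the parameter description are
    handled; anything else raises ValueError rather than guessing.
    """
    contents = []
    for part in parts:
        kind = part.get("type")
        if kind == "text":
            contents.append({"text": part["text"]})
        elif kind == "image":
            # Assumption: the native form accepts the same URL string.
            contents.append({"image": part["url"]})
        else:
            raise ValueError(f"unsupported part type: {kind!r}")
    return {"contents": contents}
```

Given the two parts from the example request, this returns `{"contents": [{"text": "Embed me."}, {"image": "https://media.empiriolabs.ai/example.jpg"}]}`.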

Notes

Output

  • Each input yields its own fixed 1,152-dimensional vector; vectors are not fused across modalities
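
Since every input gets its own vector of the same length, two returned vectors can be compared directly. The sketch below computes cosine similarity and checks the dimension first. `cosine_similarity` is a hypothetical helper, and whether text, image, and video vectors from this model are meant to be compared with each other is an assumption, not something this section states.

```python
import math

EMBEDDING_DIM = 1152  # fixed output size listed under Output


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of the documented size."""
    if len(a) != EMBEDDING_DIM or len(b) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM}-dimensional vectors")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The dimension check catches vectors that came from a different model before they produce a misleading score.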

Per-input limits

  • Text: up to 1,024 tokens
  • Image: up to 8 per request, 3 MB each (JPG, PNG, BMP)
  • Video: up to 10 MB per file (MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV)
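
The image and video limits above can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: `check_media` is a hypothetical helper, "MB" is taken to mean 1024 × 1024 bytes (the exact unit is not stated), and file type is inferred from the extension alone. The text limit is not checked because the tokenizer is not described here.

```python
import os

MB = 1024 * 1024  # assumption: the documented "MB" is 1024 * 1024 bytes
MAX_IMAGES = 8
MAX_IMAGE_BYTES = 3 * MB
MAX_VIDEO_BYTES = 10 * MB
# Extensions mirror the listed formats; add ".jpeg" if your files use it.
IMAGE_EXTS = {".jpg", ".png", ".bmp"}
VIDEO_EXTS = {".mp4", ".mpeg", ".mov", ".mpg", ".webm", ".avi", ".flv", ".mkv"}


def check_media(files):
    """Return a list of problems for (filename, size_in_bytes) pairs.

    An empty list means every file fits the documented per-input limits.
    """
    problems = []
    image_count = 0
    for name, size in files:
        ext = os.path.splitext(name)[1].lower()
        if ext in IMAGE_EXTS:
            image_count += 1
            if size > MAX_IMAGE_BYTES:
                problems.append(f"{name}: image exceeds 3 MB")
        elif ext in VIDEO_EXTS:
            if size > MAX_VIDEO_BYTES:
                problems.append(f"{name}: video exceeds 10 MB")
        else:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
    if image_count > MAX_IMAGES:
        problems.append(f"{image_count} images in one request; limit is {MAX_IMAGES}")
    return problems
```

Sizes can come from `os.path.getsize(path)` for local files; the server remains the authority on what it accepts.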

Languages

  • Chinese, English

Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/tongyi-embedding-vision-plus.