Tongyi-Embedding-Vision-Flash

POST /v1/chat/completions

Speed-optimised multimodal embedding model. Same input/output shape as Tongyi-Embedding-Vision-Plus but with cheaper image/video pricing; ideal for high-volume image and video indexing workflows. For text-only embeddings, use Text-Embedding-v4 instead.
Notes
Embedding dimension: fixed at 768.

Per-input limits:

- Text: up to 1,024 tokens
- Images: up to 8 per request, max 3 MB each (JPG, PNG, BMP)
- Video: up to 10 MB per file (MP4, MPEG, MOV, MPG, WEBM, AVI, FLV, MKV)

Pricing: image/video tokens bill at 0.09/1M, the same as Plus, so prefer Text-Embedding-v4 for text-only workloads.

Languages: Chinese and English.
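The image and video limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function names `check_media` and `_check_file` are hypothetical, the conversion assumes 1 MB = 1024 * 1024 bytes, and accepting `.jpeg` alongside `.jpg` is an assumption. The 1,024-token text limit is not checked because the tokenizer is not specified here.

```python
from pathlib import PurePath

# Limits copied from the notes above; the byte conversion assumes 1 MB = 1024 * 1024 bytes.
MB = 1024 * 1024
MAX_IMAGES_PER_REQUEST = 8
MAX_IMAGE_BYTES = 3 * MB
MAX_VIDEO_BYTES = 10 * MB
# ".jpeg" is an assumption: the notes list JPG only.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}
VIDEO_EXTS = {".mp4", ".mpeg", ".mov", ".mpg", ".webm", ".avi", ".flv", ".mkv"}


def _check_file(name, size, allowed_exts, max_bytes, kind):
    """Check one file's extension and size; return a list of problems."""
    found = []
    ext = PurePath(name).suffix.lower()
    if ext not in allowed_exts:
        found.append(f"{name}: unsupported {kind} format {ext or '(none)'}")
    if size > max_bytes:
        found.append(f"{name}: {size} bytes exceeds the {max_bytes // MB} MB {kind} limit")
    return found


def check_media(images, videos):
    """Return a list of problems; an empty list means every check passed.

    images and videos are lists of (filename, size_in_bytes) tuples.
    """
    problems = []
    if len(images) > MAX_IMAGES_PER_REQUEST:
        problems.append(
            f"{len(images)} images exceeds the limit of {MAX_IMAGES_PER_REQUEST} per request"
        )
    for name, size in images:
        problems.extend(_check_file(name, size, IMAGE_EXTS, MAX_IMAGE_BYTES, "image"))
    for name, size in videos:
        problems.extend(_check_file(name, size, VIDEO_EXTS, MAX_VIDEO_BYTES, "video"))
    return problems
```

Calling `check_media([("cat.jpg", 2 * MB)], [("clip.mp4", 9 * MB)])` returns an empty list, while an oversized or unsupported file produces one message per violated limit. The server remains the authority on what it accepts; this only catches obvious violations early.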
Machine-readable schema: GET https://api.empiriolabs.ai/v1/models/tongyi-embedding-vision-flash.
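
As a rough illustration of reading the schema endpoint above, the sketch below builds the GET request with Python's standard library. Several things here are assumptions rather than documented behaviour: that the response body is JSON, that authentication (if any is required) uses a Bearer token in the `Authorization` header, and the helper names `build_schema_request` and `fetch_schema`, which are hypothetical.

```python
import json
import urllib.request

SCHEMA_URL = "https://api.empiriolabs.ai/v1/models/tongyi-embedding-vision-flash"


def build_schema_request(api_key=None):
    """Build the GET request for the model schema without sending it."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: Bearer-token auth; the source does not say whether a key is needed.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(SCHEMA_URL, headers=headers, method="GET")


def fetch_schema(api_key=None, timeout=10):
    """Send the request and parse the body, assuming the response is JSON."""
    with urllib.request.urlopen(build_schema_request(api_key), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes the headers easy to inspect or adjust if the service turns out to expect a different authentication scheme.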
