Alibaba Cloud

Models from Alibaba Cloud

HappyHorse-1.0
Video model offering Text-to-Video, Image-to-Video, Reference-to-Video, and Video Edit modes with high-fidelity, motion-smooth output.
Qwen-Image-2.0
Unified image generation and editing model with class-leading complex Chinese/English text rendering, realistic textures, and multi-image fusion.
Qwen3-Max
Flagship model with a 256K context window and major improvements in reasoning, instruction following, and multilingual support, plus higher accuracy on coding and math.
Qwen3-Max-Preview
Preview release with major gains over the 2.5 series in Chinese-English understanding, complex instructions, multilingual ability, and tool use.
Qwen3-Max-Thinking
Reasoning model with adaptive tool use (search, memory, code interpreter) and test-time scaling for higher accuracy on complex tasks.
Qwen3.5-Flash
Vision-language model with a hybrid linear-attention and sparse MoE architecture, a 1M context window, and fast multimodal inference over text, image, and video.
Qwen3.5-Omni-Flash
Cost-efficient omni-modal model handling text, image, audio, and video, with up to 3 hours of audio and 1 hour of video across 90+ languages.
Qwen3.5-Omni-Plus
Flagship omni-modal model for text, image, audio, and video, with up to 3 hours of audio and 1 hour of video, 90+ input languages, 30+ output languages, and 55 voice timbres.
Qwen3.5-Plus
Multimodal model with a hybrid architecture for efficient deep thinking and visual understanding across text, image, and video, with a 1M context window.
Qwen3.6-Max-Preview
Largest preview variant in the 3.6 series (text-only): improved coding agent execution, stronger front-end skills, and broader long-tail knowledge.
Qwen3.6-Plus
Vision-language model with major upgrades over 3.5: agentic and front-end coding, multimodal recognition, OCR, and object localization.
Wan-2.6
Multimodal video generation model for cinematic, multi-shot stories with native audio-visual sync (lip-sync, dialogue, music, SFX).
Wan-2.7
Multimodal video model supporting text-to-video, image-to-video, video editing, and reference-to-video, with high-fidelity output from text, image, or video inputs.
Wan2.7-Image
Image generation and editing companion model: text-to-image, bounding-box edits, and cohesive image sets, with up to 4K output on Pro.