Summary
- Qwen Max is a powerful AI model that supports 29 languages and can handle up to 128,000 tokens in a single conversation.
- Specialized versions of the AI model like Qwen2.5-Coder-32B excel at coding, while QwQ-32B is built for thinking and reasoning.
- Qwen outperforms DeepSeek in benchmarks, offering quicker responses and better alignment with human preferences.
Before DeepSeek could fade from the headlines, another Chinese AI model has come to unseat it. Chinese e-commerce giant Alibaba has announced a new version of its Qwen family of AI models, and there’s a lot to like: it’s better than DeepSeek and ChatGPT in some key areas.
What Is Qwen Max?
Qwen Max is the latest and the most powerful AI model in Alibaba’s Qwen AI family. Currently, the following Qwen AI models are available:
- Qwen2.5-Plus
- Qwen2.5-Max
- Qwen2.5-VL-72B-Instruct
- Qwen2.5-14B-Instruct-1M
- QVQ-72B-Preview
- QwQ-32B-Preview
- Qwen2.5-Coder-32B-Instruct
- Qwen2.5-Turbo
- Qwen2.5-72B-Instruct
All of the models above are free to use once you’ve created an account via email, Google, or GitHub. Many of Qwen’s AI models are also open-source, meaning you can find them on GitHub or Hugging Face. You can also install those models locally on your device (depending on its specs), allowing you to run the AI offline.
Qwen2.5-Max is a 72-billion parameter Mixture-of-Experts (MoE) model that supports 29 languages and was trained on over 20 trillion tokens. It can also handle up to 128,000 tokens in a single conversation, so running lengthy documents through the AI should not be an issue. If you’re working with data, Qwen can process structured formats like tables, CSVs, and JSON files.
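To get a feel for what a 128,000-token limit means in practice, here is a minimal sketch that estimates whether a document would fit. It assumes roughly four characters per token, which is a crude rule of thumb for English text and not Qwen’s actual tokenizer; the function names and the reply budget are hypothetical and only for illustration.

```python
# Rough check of whether a document fits in a 128,000-token context window.
# ASSUMPTION: ~4 characters per token is a crude heuristic for English text,
# not Qwen's real tokenizer, so actual counts will differ.

CONTEXT_WINDOW_TOKENS = 128_000  # figure quoted in the article
CHARS_PER_TOKEN = 4              # hypothetical rule of thumb


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_reply: int = 2_000) -> bool:
    """Return True if the text plus a reply budget stays within the window."""
    return estimate_tokens(text) + reserved_for_reply <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    document = "word " * 50_000  # 250,000 characters of placeholder text
    print(estimate_tokens(document), fits_in_context(document))
```

By this estimate, a 250,000-character document comes to about 62,500 tokens, which leaves plenty of room for follow-up prompts in the same conversation.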
As the names suggest, some Qwen models are better at specific tasks. For example, Qwen2.5-Coder-32B-Instruct excels at coding tasks, while QwQ-32B-Preview is capable of thinking and reasoning. Not all of the models can do everything, but most models can handle text prompts, as well as image and video generation.
Another unusual feature is the ability to combine two models. In my experience, you can get slightly better results when pairing two versions. For example, pairing Qwen2.5-Max with Qwen2.5-Coder-32B-Instruct helped me generate code with fewer prompts and fewer issues in the output.
Qwen’s website is the only place to officially access the AI model. You can type in prompts and work with text, and there are image and video generation capabilities as well, with multiple aspect ratios. There’s also a Web Search feature that has yet to be launched.
On the downside, Qwen sometimes takes a while to process your prompts, so much so that, at first, I thought the website wasn’t functioning correctly. I found that the first prompt you send in a conversation can take about 30 seconds to generate a response, after which the responses speed up.
Images and videos are generated faster than I expected. They aren’t top-of-the-line when it comes to quality or realism, but if you need to generate a quick image in a pinch, they’ll do. You can expect a fair amount of random artifacts in most generated media as well.
The slow responses could just be down to server load, as was the case with almost every AI chatbot at launch, including DeepSeek and ChatGPT. From time to time, I also received errors connecting to Qwen because there were too many requests in the queue.
Is Qwen Better Than DeepSeek?
On benchmarks, Qwen is better than DeepSeek across the board. Alibaba’s model also feels more natural to interact with and runs ever so slightly faster. However, if you were to ignore benchmark results, you’d be hard-pressed to find differences between the two.
Qwen’s major advantage over DeepSeek is its better alignment with human preferences, making it easier to type in more complicated prompts and get accurate responses without much tweaking. Even simple one-liners can generate quite a detailed response with lots of information. DeepSeek, while a terrific AI model, can take a few tries and some prompt engineering before you get your desired results.
On general knowledge and factual accuracy, both models perform similarly, though Qwen has a slight edge in factual consistency.
One area where DeepSeek is the clear winner, though, is the usage cost. DeepSeek costs $0.25 per million tokens, while Qwen costs $0.38. That said, Qwen is still significantly cheaper than the $5 and $3 rates offered by GPT-4o and Claude 3.5, respectively.
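To see what those rates mean for a real workload, here is a small sketch that turns the per-million-token prices quoted above into a cost for a given number of tokens. It assumes a single flat rate per model, which is a simplification since providers typically price input and output tokens separately; the dictionary and function names are made up for this example.

```python
# Cost comparison at the per-million-token rates quoted in the article.
# ASSUMPTION: one flat rate per model; real pricing usually distinguishes
# input tokens from output tokens.

PRICE_PER_MILLION_USD = {
    "DeepSeek": 0.25,
    "Qwen": 0.38,
    "Claude 3.5": 3.00,
    "GPT-4o": 5.00,
}


def cost_usd(model: str, tokens: int) -> float:
    """Return the cost in US dollars of processing `tokens` tokens."""
    return PRICE_PER_MILLION_USD[model] * tokens / 1_000_000


if __name__ == "__main__":
    workload = 10_000_000  # hypothetical monthly usage
    for model in PRICE_PER_MILLION_USD:
        print(f"{model}: ${cost_usd(model, workload):.2f}")
```

At 10 million tokens, the quoted rates work out to $2.50 for DeepSeek, $3.80 for Qwen, $30 for Claude 3.5, and $50 for GPT-4o. Qwen costs about 1.5 times as much as DeepSeek, but the absolute difference is small next to the gap with the Western models.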
Benchmark Comparisons
As mentioned before, Qwen outperforms DeepSeek pretty much across the board when it comes to benchmarks.
| Benchmark | Qwen 2.5 Max | DeepSeek V3 R1 |
|---|---|---|
| Arena-Hard | 89.4 | 85.5 |
| MMLU-Pro | 76.1 | 75.9 |
| GPQA-Diamond | 60.1 | 59.1 |
| LiveCodeBench | 38.7 | 37.6 |
| LiveBench | 62.2 | 60.5 |
The widest gap is on Arena-Hard (89.4 versus 85.5), which points to Qwen’s better alignment with human preferences. Apart from that, in terms of knowledge and reasoning, general knowledge, coding, and overall ability, it’s only slightly better than DeepSeek, with margins of less than two points.
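If you want to check the size of those gaps yourself, this short sketch computes Qwen’s lead on each benchmark from the scores in the table above. The variable and function names are my own, and the numbers are simply copied from the table.

```python
# Qwen's lead over DeepSeek on each benchmark, using the scores from the table.
# Each entry is (Qwen 2.5 Max score, DeepSeek score).

SCORES = {
    "Arena-Hard": (89.4, 85.5),
    "MMLU-Pro": (76.1, 75.9),
    "GPQA-Diamond": (60.1, 59.1),
    "LiveCodeBench": (38.7, 37.6),
    "LiveBench": (62.2, 60.5),
}


def margins(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return Qwen's lead in points per benchmark, rounded to one decimal."""
    return {name: round(qwen - deepseek, 1) for name, (qwen, deepseek) in scores.items()}


if __name__ == "__main__":
    for name, lead in margins(SCORES).items():
        print(f"{name}: +{lead}")
```

The leads come out to 3.9 points on Arena-Hard, 0.2 on MMLU-Pro, 1.0 on GPQA-Diamond, 1.1 on LiveCodeBench, and 1.7 on LiveBench, so Arena-Hard is the only benchmark where the difference is more than two points.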
These two AI models from China have set a new bar for AI development. There are security and privacy concerns, though, especially considering DeepSeek has already suffered its first data breach. Still, Qwen and DeepSeek’s AI models are clearly better than their Western counterparts in terms of performance and have really put the AI world on edge.