DeepSeek is back: China's AI claims to surpass ChatGPT and Gemini in key benchmarks

DeepSeek has introduced its new DeepSeek-V4 AI models, comprising Pro and Flash versions. The company claims the new models compete with ChatGPT and Gemini on several key benchmarks.

DeepSeek has released its latest V4 series models to take on ChatGPT and Gemini

Chinese AI startup DeepSeek has officially released preview versions of its highly anticipated DeepSeek-V4 models. The much-awaited update comes more than a year after the company's R1 and V3 models went viral and upended notions of US supremacy in the AI race.

The latest models from DeepSeek come with significant architectural upgrades, multiple reasoning modes, and a one-million-token context window.

DeepSeek's new AI model:

The new DeepSeek-V4 series is split into Pro and Flash models. The flagship DeepSeek-V4-Pro has 1.6 trillion total parameters, while the V4-Flash is a smaller model with 284 billion parameters.

Both models support an ultra-long context length of one million tokens (approximately 750,000 words).
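The token-to-word conversion above is simple arithmetic, and a minimal sketch makes it easy to check whether a document of a given length would fit. The 0.75 words-per-token ratio is only the one implied by the article's own figures (750,000 words for 1,000,000 tokens); real ratios depend on the tokenizer and the language, and the function names here are hypothetical.

```python
import math

# Ratio implied by the article: 750,000 words / 1,000,000 tokens.
# This is an approximation, not a property of any specific tokenizer.
WORDS_PER_TOKEN = 0.75
CONTEXT_TOKENS = 1_000_000  # context length reported for both V4 models


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words a given token budget covers."""
    return round(tokens * words_per_token)


def words_to_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a text of `words` words needs (rounded up)."""
    return math.ceil(words / words_per_token)


def fits_in_context(words: int, context_tokens: int = CONTEXT_TOKENS) -> bool:
    """Return True if the estimated token count fits in the context window."""
    return words_to_tokens(words) <= context_tokens


if __name__ == "__main__":
    print(tokens_to_words(CONTEXT_TOKENS))
    print(fits_in_context(800_000))
```

Under this assumed ratio, a text of about 800,000 words would need more than one million tokens and would not fit in a single context window.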

The new DeepSeek-V4 models come in three reasoning modes: Non-think, Think High and Think Max. DeepSeek says the Non-think mode is aimed at daily tasks and low-risk decisions, while Think High is for questions that require complex problem-solving and planning. Think Max, meanwhile, is for the hardest coding and math problems.

On the model's Hugging Face page, DeepSeek says that the V4-Pro-Max and V4-Pro “significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks.”

DeepSeek vs ChatGPT vs Gemini vs Claude:

DeepSeek also published benchmark data comparing its new model with rival models such as OpenAI's GPT-5.4, Anthropic's Claude Opus 4.6, and Google's Gemini 3.1 Pro.

DeepSeek-V4-Pro-Max leads in coding and mathematical performance, topping the Apex Shortlist, a benchmark focused on high-difficulty reasoning and problem-solving, with a score of 90.2%. It also achieves a Codeforces rating of 3206, which indicates strong competitive programming ability. On SWE Verified, a benchmark that evaluates performance on practical software engineering tasks, it scores 80.6%, level with Gemini 3.1 Pro and just behind Claude Opus 4.6 at 80.8%.

However, the model lags behind its American counterparts in general knowledge and broader reasoning. Gemini 3.1 Pro leads on SimpleQA-Verified, a benchmark designed to test factual accuracy and question answering, while GPT-5.4 ranks highest on Terminal Bench 2.0, which measures how effectively models can use tools and operate in agent-like environments.

DeepSeek says the V4-Pro-Max achieves these results while being far more efficient, using nearly 10 times less memory than its V3.2 model when handling long inputs.

Benchmark (Category)	DeepSeek-V4-Pro Max	GPT-5.4 xHigh	Claude Opus 4.6 Max	Gemini 3.1 Pro High
Codeforces Rating (Coding)	3206	3168	-	3052
Apex Shortlist (Math/Coding)	90.2%	78.1%	85.9%	89.1%
SWE Verified (Agentic Coding)	80.6%	-	80.8%	80.6%
MMLU-Pro (Knowledge)	87.5%	87.5%	89.1%	91.0%
SimpleQA-Verified (Accuracy)	57.9%	45.3%	46.2%	75.6%
GPQA Diamond (Reasoning)	90.1%	93.0%	91.3%	94.3%
Terminal Bench 2.0 (Agentic)	67.9%	75.1%	65.4%	68.5%
Toolathlon (Tool Use)	51.8%	54.6%	47.2%	48.8%
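
For readers who want to check the comparison themselves, the table above can be turned into a small script that reports which model leads each benchmark. This is a minimal sketch: the numbers are copied from the table, a "-" cell is treated as "not reported", and it assumes a higher score is better on every benchmark. The function names are illustrative only.

```python
from collections import Counter

# Column order matches the table header.
MODELS = (
    "DeepSeek-V4-Pro Max",
    "GPT-5.4 xHigh",
    "Claude Opus 4.6 Max",
    "Gemini 3.1 Pro High",
)

# Scores copied from the table; None marks a "-" (not reported) cell.
TABLE = {
    "Codeforces Rating": (3206, 3168, None, 3052),
    "Apex Shortlist": (90.2, 78.1, 85.9, 89.1),
    "SWE Verified": (80.6, None, 80.8, 80.6),
    "MMLU-Pro": (87.5, 87.5, 89.1, 91.0),
    "SimpleQA-Verified": (57.9, 45.3, 46.2, 75.6),
    "GPQA Diamond": (90.1, 93.0, 91.3, 94.3),
    "Terminal Bench 2.0": (67.9, 75.1, 65.4, 68.5),
    "Toolathlon": (51.8, 54.6, 47.2, 48.8),
}


def leaders(table):
    """Map each benchmark to the model(s) with the highest reported score."""
    result = {}
    for benchmark, row in table.items():
        # Skip models with no reported score for this benchmark.
        reported = [(m, s) for m, s in zip(MODELS, row) if s is not None]
        best = max(score for _, score in reported)
        # Keep every model that matches the best score, so ties are visible.
        result[benchmark] = [m for m, score in reported if score == best]
    return result


def win_counts(table):
    """Count how many benchmarks each model leads (ties count for each)."""
    return Counter(m for names in leaders(table).values() for m in names)


if __name__ == "__main__":
    for benchmark, names in leaders(TABLE).items():
        print(f"{benchmark}: {', '.join(names)}")
```

On the reported figures, DeepSeek-V4-Pro Max leads on Codeforces and Apex Shortlist, Gemini 3.1 Pro High on MMLU-Pro, SimpleQA-Verified and GPQA Diamond, GPT-5.4 xHigh on Terminal Bench 2.0 and Toolathlon, and Claude Opus 4.6 Max on SWE Verified.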

About the Author

Aman Gupta

Aman Gupta is a Digital Content Producer at LiveMint with over 3.5 years of experience covering the technology landscape. He specializes in artificial intelligence and consumer technology, reporting on everything from the ethical debates around AI models to shifts in the smartphone market.

His reporting is grounded in first-hand testing, independent analysis, and a focus on how technology impacts everyday users. He holds a PG Diploma in Radio and Television Journalism from the Indian Institute of Mass Communication, Delhi (Class of 2022).

Outside the newsroom, he spends his time reading biographies, hunting for the perfect coffee beans, or planning his next trip.

You can find Aman on LinkedIn (https://www.linkedin.com/in/aman-gupta-894180214) and on X at @nobugsfound (https://x.com/nobugsfound), or reach him via email at aman.gupta@htdigital.in.
