Cerebras Sets New AI Speed Record with MBZUAI and G42’s K2 Think at 2,000 Tokens/Second Inference Performance


SUNNYVALE, Calif. & ABU DHABI, United Arab Emirates--(BUSINESS WIRE)--Sep 10, 2025--

Cerebras Systems, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42 today announced that K2 Think, a leading open-source advanced reasoning model, is now available at the industry’s fastest speeds of 2,000 tokens/second on Cerebras Inference.

With just 32 billion parameters, K2 Think rivals the performance of much larger models like OpenAI’s GPT-4 and DeepSeek-V3.1, while running over 6x faster at 2,000 tokens per second on Cerebras’ wafer-scale infrastructure, redefining speed, cost, and efficiency in frontier AI reasoning.

“K2 Think on Cerebras represents a breakthrough moment for open-source advanced AI reasoning models,” said Andrew Feldman, CEO and Co-Founder of Cerebras. “MBZUAI and G42 have proven that you don’t need 100B+ parameters to achieve top-tier performance; you need the right architecture and the right infrastructure. And when you pair an ultra-efficient model like K2 Think with Cerebras’ world’s-fastest inference engine, the result is nothing short of revolutionary. It’s the new standard for fast, smart, open AI.”

Built for advanced chain-of-thought reasoning, agentic planning, and hard-problem solving in math, science, and code, K2 Think is trained with verifiable reward signals and optimized for speculative decoding, a combination that allows it to decompose complex tasks and reason through them with precision. With Cerebras’ wafer-scale inference architecture, enterprises and developers worldwide can access this power instantly, affordably, and at unprecedented scale.

A New Era of Reasoning AI: Fast, Open, Scalable

K2 Think has achieved top math reasoning performance on competitive benchmarks including AIME ’24/’25, HMMT ’25, and OMNI-Math-HARD. Its compact 32B architecture makes it easier to deploy, fine-tune, and scale, while full open-source transparency (weights, training data, code, and test-time optimizations) invites global collaboration and scientific reproducibility.

Peng Xiao, Board Member, MBZUAI, and Group CEO, G42, said: “K2 Think reflects our commitment to building advanced AI that is both open and efficient. By pairing our compact, reasoning-first architecture with Cerebras’ record-breaking inference platform, we’re enabling the global community to access frontier reasoning at unprecedented speed and scale.”

Now running on Cerebras Inference Cloud in collaboration with G42, K2 Think is available to developers and researchers via API with no refactoring needed. A simple endpoint swap delivers instant access to world-class reasoning at industry-leading speeds.
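As an illustration of that endpoint-swap workflow, the minimal sketch below assumes Cerebras Inference exposes an OpenAI-compatible chat completions API and that K2 Think is served under a model identifier such as "k2-think"; the base URL, API key placeholder, and model name are assumptions for illustration and are not confirmed by this release.

  # Minimal sketch (Python): point an OpenAI-compatible client at an assumed
  # Cerebras Inference endpoint instead of the default, with no other code changes.
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.cerebras.ai/v1",   # assumed Cerebras Inference endpoint
      api_key="YOUR_CEREBRAS_API_KEY",         # free API key available at cerebras.ai
  )

  response = client.chat.completions.create(
      model="k2-think",  # assumed model identifier for K2 Think
      messages=[{"role": "user", "content": "Show that the sum of two odd integers is even."}],
  )
  print(response.choices[0].message.content)

Because the client, request shape, and response handling stay the same, only the endpoint and model name change, which is what "no refactoring needed" refers to in practice.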

Use cases include:

  • Live math tutors and code assistants
  • Real-time Q&A on long documents and technical content
  • Agentic multi-step planners and reasoning chains
  • Scientific research copilots for fields like physics and biology

“K2 Think sets a new benchmark for reproducible, high-performance, advanced AI reasoning in an extraordinarily parameter-efficient package,” said Prof. Eric Xing, President and University Professor at MBZUAI. “In partnership with Cerebras, we are delivering not just a model, but a fully open, end-to-end system for advanced reasoning that the global AI community can use, inspect, and extend. This marks a leap forward in transparency, speed, and opportunities for real-world application.”

Developers and enterprises can access K2 Think today. A free API key is available at https://www.cerebras.ai/.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building from the ground up a new class of AI supercomputer. Our flagship product, the CS-3 system, is powered by the world’s largest and fastest commercially available AI processor, our Wafer-Scale Engine-3. CS-3s are quickly and easily clustered together to make the largest AI supercomputers in the world, and make placing models on the supercomputers dead simple by avoiding the complexity of distributed computing. Cerebras Inference delivers breakthrough inference speeds, empowering customers to create cutting-edge AI applications. Leading corporations, research institutions, and governments use Cerebras solutions for the development of pathbreaking proprietary models, and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on-premises. For further information, visit cerebras.ai or follow us on LinkedIn, X and/or Threads.

View source version on businesswire.com: https://www.businesswire.com/news/home/20250910137362/en/

CONTACT: Media Contact

[email protected]

KEYWORD: UNITED STATES UNITED ARAB EMIRATES NORTH AMERICA MIDDLE EAST CALIFORNIA

INDUSTRY KEYWORD: SOFTWARE INTERNET HARDWARE ARTIFICIAL INTELLIGENCE DATA MANAGEMENT TECHNOLOGY SEMICONDUCTOR MOBILE/WIRELESS

SOURCE: Cerebras Systems

Copyright Business Wire 2025.

