Cerebras Systems Sets Record for Largest AI Models Ever Trained on a Single Device

Single CS-2 system trains multi-billion parameter NLP models including GPT-3XL 1.3B, GPT-J 6B, GPT-3 13B and GPT-NeoX 20B; provides simple setup, faster training and enables switching between models with a few keystrokes

SUNNYVALE, Calif.--(BUSINESS WIRE)--Cerebras Systems, the pioneer in high performance artificial intelligence (AI) computing, today announced, for the first time ever, the ability to train models with up to 20 billion parameters on a single CS-2 system – a feat not possible on any other single device. By enabling a single CS-2 to train these models, Cerebras reduces the system engineering time necessary to run large natural language processing (NLP) models from months to minutes. It also eliminates one of the most painful aspects of NLP — namely the partitioning of the model across hundreds or thousands of small graphics processing units (GPU).

“In NLP, bigger models are shown to be more accurate. But traditionally, only a very select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of graphics processing units,” said Andrew Feldman, CEO and Co-Founder of Cerebras Systems. “As a result, only very few companies could train large NLP models – it was too expensive, time-consuming and inaccessible for the rest of the industry. Today we are proud to democratize access to GPT-3XL 1.3B, GPT-J 6B, GPT-3 13B and GPT-NeoX 20B, enabling the entire AI ecosystem to set up large models in minutes and train them on a single CS-2.”

“GSK generates extremely large datasets through its genomic and genetic research, and these datasets require new equipment to conduct machine learning,” said Kim Branson, SVP of Artificial Intelligence and Machine Learning at GSK. “The Cerebras CS-2 is a critical component that allows GSK to train language models using biological datasets at a scale and size previously unattainable. These foundational models form the basis of many of our AI systems and play a vital role in the discovery of transformational medicines.”

These world first capabilities are made possible by a combination of the size and computational resources available in the Cerebras Wafer Scale Engine-2 (WSE-2) and the Weight Streaming software architecture extensions available via release of version R1.4 of the Cerebras Software Platform, CSoft.

When a model fits on a single processor, AI training is easy. But when a model has either more parameters than can fit in memory, or a layer requires more compute than a single processor can handle, complexity explodes. The model must be broken up and spread across hundreds or thousands of GPU. This process is painful, often taking months to complete. To make matters worse, the process is unique to each network compute cluster pair, so the work is not portable to different compute clusters, or across neural networks. It is entirely bespoke.

The Cerebras WSE-2 is the largest processor ever built. It is 56 times larger, has 2.55 trillion more transistors, and has 100 times as many compute cores as the largest GPU. The size and computational resources on the WSE-2 enables every layer of even the largest neural networks to fit. The Cerebras Weight Streaming architecture disaggregates memory and compute allowing memory (which is used to store parameters) to grow separately from compute. Thus a single CS-2 can support models with hundreds of billions even trillions of parameters.

Graphics processing units on the other hand have a fixed amount of memory per GPU. If the model requires more parameters than fit in memory, one needs to buy more graphics processors and then spread work over multiple GPUs. The result is an explosion of complexity. The Cerebras solution is far simpler and more elegant: by disaggregating compute from memory, the Weight Streaming architecture allows support for models with any number of parameters to run on a single CS-2.

Powered by the computational capacity of the WSE-2 and the architectural elegance of the Weight Streaming architecture, Cerebras is able to support, on a single system, the largest NLP networks. By supporting these networks on a single CS-2, Cerebras reduces set up time to minutes and enables model portability. One can switch between GPT-J and GPT-Neo, for example, with a few keystrokes, a task that would take months of engineering time to achieve on a cluster of hundreds of GPUs.

“Cerebras’ ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI. It gives organizations that can’t spend tens of millions an easy and inexpensive on-ramp to major league NLP,” said Dan Olds, Chief Research Officer, Intersect360 Research. “It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.”

With customers in North America, Asia, Europe and the Middle East, Cerebras is delivering industry leading AI solutions to a growing roster of customers in the enterprise, government, and high performance computing (HPC) segments including GlaxoSmithKline, AstraZeneca, TotalEnergies, nference, Argonne National Laboratory, Lawrence Livermore National Laboratory, Pittsburgh Supercomputing Center, Leibniz Supercomputing Centre, National Center for Supercomputing Applications, Edinburgh Parallel Computing Centre (EPCC), National Energy Technology Laboratory, and Tokyo Electron Devices.

For more information about the Cerebras Software Platform, please visit https://www.cerebras.net/product-software/.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computer system, designed for the singular purpose of accelerating AI and changing the future of AI work forever. Our flagship product, the CS-2 system is powered by the world’s largest processor – the 850,000 core Cerebras WSE-2, enables customers to accelerate their deep learning work by orders of magnitude over graphics processing units.

Contacts

Media Contact
Kim Ziesemer
pr@zmcommunications.com

Industry:

More News From Cerebras Systems

Cerebras Systems Announces Launch of Initial Public Offering

SUNNYVALE, Calif.--(BUSINESS WIRE)--Cerebras Systems, Inc. (“Cerebras”) today announced that it plans to commence the roadshow for its proposed initial public offering of its Class A common stock. Cerebras has filed a registration statement on Form S-1 with the U.S. Securities and Exchange Commission (the “SEC”) to offer an aggregate of 28,000,000 shares of its Class A common stock to the public. In addition, Cerebras intends to grant the underwriters a 30-day option to purchase up to an additi...

Cerebras Systems Announces Filing of Registration Statement for Proposed Initial Public Offering

SUNNYVALE, Calif.--(BUSINESS WIRE)--Cerebras Systems Inc. (“Cerebras”) today announced that it has filed a registration statement on Form S-1 with the U.S. Securities and Exchange Commission (“SEC”) relating to a proposed initial public offering of its Class A common stock. The number of shares of Class A common stock to be offered and the price range for the proposed offering have not yet been determined. The offering is subject to market conditions, and there can be no assurance as to whether...

Cerebras Systems Closes $850 Million Revolving Credit Facility

SUNNYVALE, Calif.--(BUSINESS WIRE)--Cerebras Systems, makers of the fastest AI infrastructure in the industry, today announced the closing of a new five-year syndicated revolving credit facility for up to $850 million. This follows the company’s $1 billion Series G financing closed in September 2025, and an additional $1 billion Series H in January 2026. “We are pleased to have closed our inaugural credit facility with the support of a syndicate of leading financial institutions,” said Bob Komi...

Back to Newsroom

Services & Solutions

Services

Solutions For

Resources

Education

Why Business Wire

Cerebras Systems Sets Record for Largest AI Models Ever Trained on a Single Device

Contacts