
Cerebras, Petuum, and MBZUAI Announce New Open-Source CrystalCoder and LLM360 Methodology to Accelerate Development of Transparent and Responsible AI Models

CrystalCoder-7B and LLM360 Release Enable Next Era of Open-Source Contributions Through Reproducible Methodology Available to All AI Researchers and Practitioners

SUNNYVALE, Calif.--(BUSINESS WIRE)--Cerebras Systems, the pioneer in accelerating generative AI, and Petuum, the generative AI company focused on building transparent LLMs, in partnership with MBZUAI today launched CrystalCoder, a new 7 billion parameter model designed for English language and coding tasks. While previous models were suitable for either English or coding, CrystalCoder achieves high accuracy on both tasks simultaneously. Trained on Condor Galaxy 1, the AI supercomputer built by G42 and Cerebras, CrystalCoder-7B has been released under the new reproducible LLM360 methodology, which promotes open-source, transparent, and responsible use. CrystalCoder and the LLM360 release methodology are available now on Hugging Face.

As open-source models gain parity with closed-source LLMs in performance and accuracy, leading open-source developers are increasingly releasing checkpoints and recipes to help others study LLM training and to promote collaboration and education across the research community. The LLM360 methodology, developed by Petuum, MBZUAI, and Cerebras, is a novel approach to advancing transparency and safety by open-sourcing more of the model's ingredients so that the work is reproducible by others. In addition to the weights (released under the Apache 2.0 license), the model recipe, and a paper, the LLM360 methodology open-sources the training code, up to 360 training checkpoints, pre-processing scripts, data buckets, and analytics tools:

  • Model: Releases 360 checkpoints across the training run
  • Data: Provides access to the data buckets for each checkpoint
  • Code: Provides pre-processing, training, inference code, and analysis code (if applicable)
  • Metrics: All training logs, evaluations, and analysis results collected during training are publicly disclosed, indexed to the corresponding training steps and data sequence
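As a rough illustration of how the released checkpoints might be consumed, the sketch below maps a training step to a checkpoint identifier and shows (in comments) how an intermediate checkpoint could be loaded with the standard `transformers` API. The repository id `LLM360/CrystalCoder` and the `ckpt_<step>` revision naming scheme are illustrative assumptions, not details confirmed by this release.

```python
# Sketch: selecting one of the ~360 training checkpoints released under
# the LLM360 methodology. The revision naming scheme ("ckpt_<step>") and
# the Hugging Face repository id below are illustrative assumptions.

REPO_ID = "LLM360/CrystalCoder"  # assumed Hugging Face repository id


def checkpoint_revision(step: int) -> str:
    """Map a training step to a hypothetical checkpoint revision tag."""
    return f"ckpt_{step}"


# With the real revision tags, an intermediate checkpoint could be loaded
# via the standard `transformers` API, e.g.:
#
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       REPO_ID, revision=checkpoint_revision(100000)
#   )

print(checkpoint_revision(100000))  # -> ckpt_100000
```

Because every checkpoint is paired with its data bucket and metrics, a researcher could iterate over such revisions to study how capabilities emerge across the training run.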

“Cerebras is proud to be the inaugural hardware partner for LLM360 and, in partnership with Petuum, to release the first model under this methodology, CrystalCoder-7B,” said Andrew Feldman, CEO and co-founder, Cerebras Systems. “We believe that transparency and reproducibility matter as much as model quality for the safe advancement of AI. We look forward to seeing more models released to the open source in this manner.”

In coding tasks, CrystalCoder approaches StarCoder-base in accuracy, while in language tasks it is comparable to Llama and MPT-7B. Its significance is that it performs well on both fronts: better at coding than the best language models, and better at language than the best coding models. Where developers previously had to choose between coding and language, CrystalCoder handles both.

"Petuum and MBZUAI are excited to announce the release of the CrystalCoder-7B large language model (LLM). This groundbreaking collaboration, strengthened by our partnership with Cerebras, marks a significant milestone in the field of advanced open-source LLMs. CrystalCoder stands out due to its meticulously balanced and carefully selected data sets, unparalleled performance and reliability on language and code tasks,” said Hector Liu, Head of Engineering at Petuum and LLM Team Lead at MBZUAI. “Additionally, the unique design of Cerebras’s Condor Galaxy 1 transforms previously daunting large-scale training challenges into manageable tasks, setting new standards for efficiency and effectiveness in LLM training."

CrystalCoder-7B is the latest in a family of leading open-source models co-developed by Cerebras, including Jais 13B and Jais 30B, the best bilingual Arabic models in the world, created in partnership with Core42 and now available on Azure Cloud. In June, Cerebras released BTLM-3B-8K, the leading 3B model on Hugging Face, offering 7B-parameter performance in a light 3B-parameter model for inference. Med42, developed with M42 and Core42, is a leading clinical LLM, trained on Condor Galaxy 1 in a weekend and surpassing MedPaLM in performance and accuracy. In March, Cerebras released the first open-source family of GPT models, named Cerebras-GPT, followed by the release of SlimPajama, the best LLM dataset for training efficiency.

For more information on CrystalCoder-7B and the LLM360 methodology, please visit llm360.ai.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computer system, designed for the singular purpose of accelerating generative AI. Our flagship product, the CS-2 system, powered by the world’s largest and fastest AI processor, makes training large models simple and easy, by avoiding the complexity of distributed computing. Cerebras solutions are available in the cloud, through the Cerebras AI Model Studio or on premises. For further information, visit https://www.cerebras.net.

Contacts

Cerebras Systems

