Cerebras Selects Qualcomm to Deliver Unprecedented Performance in AI Inference
Best-in-class solution developed with the Qualcomm® Cloud AI 100 Ultra offers up to 10x the tokens per dollar, radically lowering the operating costs of AI deployment
Bangalore, India – March 15, 2024 – Cerebras Systems, a pioneer in accelerating generative artificial intelligence (AI), today announced the company’s plans to deliver groundbreaking performance and value for production AI.
By pairing Cerebras’ industry-leading CS-3 AI accelerators for training with the AI 100 Ultra, a product of Qualcomm Technologies, Inc., for inference, production-grade deployments can realize up to a 10x price-performance improvement.
“These joint efforts are aimed at ushering in a new era of high-performance, low-cost inference, and the timing couldn’t be better. Our customers are focused on training the highest-quality state-of-the-art models that won’t break the bank at time of inference,” said Andrew Feldman, CEO and co-founder of Cerebras. “Utilizing the AI 100 Ultra from Qualcomm Technologies, we can radically reduce the cost of inference – without sacrificing model quality – leading to the most efficient deployments available today.”
Leveraging cutting-edge ML techniques and world-class AI expertise, Cerebras will work with Qualcomm Technologies’ AI 100 Ultra to speed up AI inference. Some of the advanced techniques to be used are as follows (minimal illustrative sketches of each appear after the list):
- Unstructured Sparsity: Cerebras and Qualcomm Technologies solutions can perform training and inference using unstructured, dynamic sparsity – a hardware-accelerated AI technique that dramatically improves performance efficiency. For example, a Llama 13B model trained on Cerebras hardware with 85% sparsity trains 3-4x faster, and with AI 100 Ultra inference generates tokens at 2-3x higher throughput.
- Speculative Decoding: This advanced AI technique marries the high throughput of a small LLM with the accuracy of a large LLM. The Cerebras Software Platform can automatically train and generate both models, which are seamlessly ingested via the Qualcomm® AI Stack, a product of Qualcomm Technologies. The resulting model can output tokens at up to 2x the throughput with uncompromised accuracy.
- Efficient MX6 inference: The AI 100 Ultra supports MX6, an industry-standard micro-exponent format that performs high-accuracy inference using half the memory footprint and twice the throughput of FP16.
- NAS service from Cerebras: Using Network Architecture Search (NAS) for targeted use cases, the Cerebras platform can deliver models that are optimized for the Qualcomm AI architecture, leading to up to 2x higher inference performance.
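To make the sparsity idea concrete, here is a minimal NumPy sketch of unstructured magnitude pruning – an illustrative stand-in only, not Cerebras’ or Qualcomm Technologies’ actual training or inference software; the matrix size, pruning rule, and the "skip zeros" cost model are assumptions:

```python
# Illustrative sketch only: unstructured sparsity via magnitude pruning.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so `sparsity` of them are zero."""
    k = int(weights.size * sparsity)                 # how many weights to drop
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.85)         # 85% of weights become zero

# On hardware that skips zero weights, multiply-accumulate work scales with
# density, so an 85%-sparse matrix needs roughly 1/0.15 ≈ 6.7x fewer MACs.
density = np.count_nonzero(w_sparse) / w_sparse.size
print(f"density: {density:.2%} -> ~{1 / density:.1f}x fewer MACs if zeros are skipped")
```

The quoted 3-4x training and 2-3x inference gains are smaller than the raw MAC reduction, as real systems also pay memory and scheduling costs.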
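Speculative decoding can likewise be sketched in a few lines. The greedy variant below uses toy stand-in callables for the draft and target models; the production pipeline (LLMs trained on Cerebras and served via the Qualcomm AI Stack) is of course far more involved:

```python
# Minimal sketch of greedy speculative decoding with toy stand-in models.
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]   # maps a context to its greedy next token

def speculative_step(ctx: List[Token], draft: NextToken, target: NextToken,
                     k: int = 4) -> List[Token]:
    """One round: the cheap draft model proposes k tokens; the expensive target
    model verifies them (in parallel on real hardware, sequentially here for
    clarity) and keeps the longest agreeing prefix plus one corrected token."""
    proposal, c = [], list(ctx)
    for _ in range(k):                       # cheap draft pass
        t = draft(c)
        proposal.append(t)
        c.append(t)
    accepted, c = [], list(ctx)
    for t in proposal:                       # target-side verification
        expected = target(c)
        accepted.append(expected)
        if expected != t:                    # first disagreement: stop, keep fix
            break
        c.append(t)
    return accepted

# Toy models over a vocabulary of 10 tokens: the draft agrees with the target
# most of the time, so several tokens are accepted per expensive target step.
def target(ctx: List[Token]) -> Token:
    return (sum(ctx) + len(ctx)) % 10

def draft(ctx: List[Token]) -> Token:
    return target(ctx) if len(ctx) % 5 else (target(ctx) + 1) % 10

ctx: List[Token] = [1, 2, 3]
for _ in range(3):
    ctx += speculative_step(ctx, draft, target)
print(ctx)
```

Because the target model verifies every proposal, the output is identical to decoding with the large model alone; the speedup comes from accepting several cheap draft tokens per expensive target step.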
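The memory claim for MX6 follows from how micro-exponent (block floating-point) formats store data: many elements share one exponent, so each element needs only a few bits. The block size, bit widths, and rounding below are illustrative assumptions, not the MX6 specification:

```python
# Hedged sketch of block ("micro-exponent") quantization in the spirit of MX6.
# BLOCK, ELEM_BITS, and the shared 8-bit exponent are assumptions.
import numpy as np

BLOCK = 32          # elements sharing one exponent (assumed)
ELEM_BITS = 6       # bits stored per element (assumed signed mantissa)

def mx_quantize(x: np.ndarray):
    """Quantize a 1-D array (length divisible by BLOCK) blockwise: one shared
    power-of-two scale per block, a signed small-integer mantissa per element."""
    x = x.reshape(-1, BLOCK)
    max_exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30))
    scale = 2.0 ** (max_exp - (ELEM_BITS - 1))       # one scale per block
    q = np.clip(np.round(x / scale), -(2 ** (ELEM_BITS - 1)),
                2 ** (ELEM_BITS - 1) - 1)
    return q.astype(np.int8), scale

def mx_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q * scale).ravel()

x = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
q, scale = mx_quantize(x)
err = np.abs(mx_dequantize(q, scale) - x).max()
bits_per_value = ELEM_BITS + 8 / BLOCK               # shared exponent amortized
print(f"max abs error: {err:.4f}; ~{bits_per_value:.2f} bits/value vs 16 for FP16")
```

At roughly 6.25 bits per value versus 16 for FP16, the footprint lands near the "half the memory" figure quoted above.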
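Finally, network architecture search can be illustrated with a toy random-search loop. The search space and scoring proxy here are invented for illustration and do not describe Cerebras’ actual NAS service:

```python
# Toy random-search NAS: sample candidate configurations and keep the best
# under a latency-aware score. Space and score are hypothetical.
import random

SPACE = {                                    # hypothetical search space
    "layers": [12, 24, 32],
    "hidden": [1024, 2048, 4096],
    "ffn_mult": [2, 4],
}

def proxy_score(cfg: dict) -> float:
    """Stand-in objective: reward capacity, penalize an estimated latency."""
    params = cfg["layers"] * cfg["hidden"] ** 2 * cfg["ffn_mult"]
    latency = params / 1e9                   # pretend latency grows with params
    return params ** 0.5 / (1.0 + latency)

random.seed(0)
best = max(
    ({k: random.choice(v) for k, v in SPACE.items()} for _ in range(200)),
    key=proxy_score,
)
print("selected architecture:", best)
```

A production NAS service would score candidates against the actual target hardware rather than a proxy, which is what ties the searched models to the Qualcomm AI architecture.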
A combination of these and other advanced techniques is designed to allow the Cerebras and Qualcomm Technologies solutions to deliver an order-of-magnitude performance improvement at model release, resulting in inference-ready models that can be deployed on Qualcomm cloud instances anywhere.
“The combination of Cerebras’ AI training solution with the AI 100 Ultra helps deliver industry-leading perf/TCO$ for AI inference, as well as optimized and deployment-ready AI models to customers, helping reduce time to deployment and time to ROI,” said Rashid Attar, Vice President, Cloud Computing, Qualcomm Technologies, Inc.
By training on Cerebras, customers can now unlock massive performance and cost advantages through inference-aware training. Models trained on Cerebras are optimized to run inference on the AI 100 Ultra, leading to friction-free deployments.
“AI has become a key part of pharmaceutical research and development, and the cost of operating models is a critical consideration in the research budget,” said Kim Branson, Sr. Vice President and Global Head of AI/ML at GlaxoSmithKline. “Techniques like sparsity and speculative decoding that make inference faster while lowering operating costs are critical: this allows everyone to integrate and experiment with AI.”
For more information on the Qualcomm Technologies and Cerebras AI training and inference solutions, please visit the Cerebras blog. The combined solution – Cerebras CS-3 for AI training and Qualcomm AI 100 Ultra for inference at scale – will be available in Q2/Q3 2024.
About Cerebras Systems
Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building a new class of computer system. Our flagship product, the CS-2 system, is powered by the world’s largest and fastest AI processor, our Wafer-Scale Engine. It makes training large models simple and easy by avoiding the complexity of distributed computing. Cerebras CS-2s are clustered together to make the largest AI supercomputers in the world, which are used by leading corporations for proprietary models, and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on premises. For further information, visit https://www.cerebras.net.
Qualcomm Cloud AI and Qualcomm AI Stack are products of Qualcomm Technologies, Inc., and/or its subsidiaries.