Amazon Signs Inference Chip Deal With Cerebras Systems

Amazon said it has reached a deal with Cerebras Systems to bring Cerebras’ artificial intelligence inference chips to Amazon Web Services, expanding the menu of specialized hardware available to customers building and running generative AI applications in the cloud.
The arrangement will make Cerebras chips accessible through AWS, according to reports from The Wall Street Journal and Reuters and statements from Cerebras. The companies are positioning the partnership around AI inference, the stage where trained models generate outputs such as text, images, or code in response to user prompts.
Amazon, through AWS, is a major provider of cloud computing infrastructure for businesses, governments, and developers. Cerebras Systems is known for its wafer-scale processors, unusually large chips designed to accelerate machine learning workloads. The companies said the deal focuses on improving performance for inference workloads, which have become a central cost and capacity concern as generative AI tools are deployed more broadly.
The move adds another option for AWS customers seeking higher speed and efficiency as they serve AI-powered features to end users. Inference can require massive compute resources when applications scale, creating demand for hardware optimized for running models quickly and economically.
For AWS, offering Cerebras hardware alongside other cloud compute options is a way to compete in a market where customers increasingly evaluate cloud providers based on their ability to deliver AI performance and availability. For Cerebras, the deal places its chips within one of the largest cloud platforms, potentially widening access to its technology for enterprises that prefer to consume AI infrastructure as a cloud service rather than buying and operating their own systems.
The partnership also underscores the growing specialization inside cloud data centers, where providers are expanding beyond general-purpose computing into tailored systems built for AI. As more companies move from experimentation to production AI, inference performance and cost have become practical constraints that can shape user experience, latency, and operating budgets.
Details about customer availability, pricing, and the specific AWS service path for accessing Cerebras inference hardware have not yet been disclosed. The companies have framed the agreement as an expansion of infrastructure choices for customers running generative AI workloads.
Looking ahead, AWS customers and developers will be watching for concrete rollout timelines, the regions where capacity will be offered, and technical documentation showing how Cerebras-based inference integrates with existing AWS tools and model deployment workflows. Enterprises evaluating the offering will also seek benchmarks and service-level commitments once the capability is generally available.
The deal marks another step in the intensifying push by cloud providers and chipmakers to supply the infrastructure that will run the next wave of AI applications at scale.
