DigitalOcean’s Inference Cloud Platform, Powered by AMD Instinct GPUs, Delivers 2X Production Inference Performance for Character.ai
Platform-level inference optimization delivers higher throughput, lower latency, and 50% better cost efficiency for Character.ai
BROOMFIELD, Colo.--(BUSINESS WIRE)--DigitalOcean (NYSE: DOCN) today announced that its Inference Cloud Platform is delivering 2X production inference throughput for Character.ai, a leading AI entertainment platform that operates one of the most demanding production inference workloads in the market, handling over a billion queries per day. The gains come from a tightly integrated software and hardware collaboration with AMD.
Character.ai leverages both proprietary and open-source models to power its high-volume, high-concurrency, latency-sensitive applications. By migrating these workloads to DigitalOcean’s Inference Cloud Platform, Character.ai achieved significantly higher request throughput while adhering to rigorous latency targets. Compared to standard, non-optimized GPU infrastructure, this transition reduced the cost per token by 50% and substantially expanded usable capacity for its end users.
David Brinker, Senior Vice President of Partnerships at Character.ai, said the results exceeded expectations. “We pushed DigitalOcean aggressively on performance, latency, and scale. DigitalOcean delivered reliable performance that unlocked higher sustained throughput and improved economics, which directly supports the growth of our platform.”
This performance milestone builds on DigitalOcean’s growing momentum with large-scale AI customers like Character.ai, supporting platform expansion and richer multimodal experiences.
Platform and Hardware, Working Together
DigitalOcean worked closely with Character.ai and AMD to deploy AMD Instinct™ GPUs optimized specifically for inference workloads. Rather than treating GPUs as interchangeable infrastructure, DigitalOcean’s platform integrates hardware-aware scheduling and optimized inference runtimes to extract higher sustained performance per node. AMD has invested heavily in ROCm™, its open, end-to-end AI software stack. Through this deep collaboration, the teams optimized ROCm with vLLM and AITER (AMD’s inference-focused runtime and optimization framework for transformer workloads), along with deployment configurations for Character.ai’s workloads on DigitalOcean AMD Instinct™ MI300X and MI325X GPUs, contributing to the throughput improvement.
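For illustration only, a minimal sketch of serving an open-source model with vLLM on ROCm-backed AMD Instinct GPUs is shown below; the model name and parallelism settings are assumptions for this example, not Character.ai’s or DigitalOcean’s production configuration.

```python
# Minimal illustrative sketch; the model and settings are assumptions,
# not Character.ai's or DigitalOcean's production configuration.
from vllm import LLM, SamplingParams

# vLLM's ROCm build runs on AMD Instinct GPUs such as MI300X and MI325X.
# tensor_parallel_size shards the model's weights across 8 GPUs on one node.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # hypothetical open-source model
    tensor_parallel_size=8,
    gpu_memory_utilization=0.90,  # fraction of GPU memory vLLM may claim for weights and KV cache
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a short greeting."], params)
print(outputs[0].outputs[0].text)
```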
"This work demonstrates what’s possible when platform and silicon teams partner deeply to solve real production challenges for customers, as our collaboration with DigitalOcean helped Character.ai unlock higher sustained inference throughput and improved efficiency,” said Vamsi Boppana, Senior Vice President of Artificial Intelligence at AMD. "By combining AMD Instinct™ GPUs, the open ROCm™ software stack and platform-level optimization, DigitalOcean’s Inference Cloud is delivering a scalable, cost-effective foundation for running large-scale, latency-sensitive AI workloads in production. Together, we are accelerating the builders who are defining the next generation of AI applications.”
In collaboration with Character.ai, DigitalOcean engineers tuned distributed inference configurations to balance latency, throughput, and concurrency. In some production scenarios, these optimizations increased throughput by 2X under the same latency constraints, directly improving total cost of ownership.
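As a rough illustration of the kinds of knobs involved in that balancing act, the sketch below uses vLLM engine arguments that trade batch size and concurrency against per-request latency; the specific values are assumptions, not the tuned production settings.

```python
# Illustrative sketch of latency/throughput/concurrency trade-off knobs;
# the values here are assumptions, not the tuned production configuration.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # hypothetical model
    tensor_parallel_size=8,        # more GPUs per replica lowers per-token latency
    max_num_seqs=256,              # cap on concurrent sequences per scheduling step
    max_num_batched_tokens=8192,   # larger batches raise throughput but can add latency
    enable_prefix_caching=True,    # reuse KV-cache entries for shared prompt prefixes
)
```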
Operating Large-Scale AI Inference Under Real Production Constraints
This approach reflects DigitalOcean’s broader strategy: GPUs matter, but outcomes matter more. DigitalOcean designs, operates, and optimizes systems that deliver significantly more reliable performance for its customers.
Unlike traditional cloud approaches that emphasize GPU availability alone, DigitalOcean’s Inference Cloud is designed to operate AI applications in production. The platform provides a unified hardware-software approach in which orchestration and system-level tuning work together to deliver cost efficiency, observability, and operational simplicity across production AI workloads at scale.
“Character.ai runs one of the most demanding real-time inference workloads in the market,” said Paddy Srinivasan, Chief Executive Officer of DigitalOcean. “This work shows what happens when advanced hardware meets a platform designed specifically for production inference. We’re not just delivering faster models, we’re making large-scale AI applications easier and more economical to run.”
The Character.ai deployment reflects a broader shift in how AI infrastructure is built and evaluated. As inference workloads scale, customers are prioritizing predictable performance, operational simplicity, and cost efficiency over raw hardware specifications. For additional information on the specific testing methodologies, hardware configurations, and performance benchmarks used to achieve these results, as well as important information regarding performance variability, please see our technical deep-dive here.
Learn more about the DigitalOcean Inference Cloud.
About DigitalOcean
DigitalOcean is an inference cloud platform that helps AI and Digital Native Businesses build, run, and scale intelligent applications with speed, simplicity, and predictable economics. The platform combines production-ready GPU infrastructure, a full-stack cloud, model-first inference workflows, and an agentic experience layer to reduce operational complexity and accelerate time to production. More than 640,000 customers trust DigitalOcean to deliver the cloud and AI infrastructure they need to build and grow. To learn more, visit www.digitalocean.com.
Contacts
Media Relations
Julie Wolf: press@digitalocean.com
Investor Relations
Melanie Strate: investors@digitalocean.com
