ScaleFlux, FarmGPU, and Lightbits Labs Preview Solution to Long-Context AI Inference Challenges at NVIDIA GTC

NVIDIA GTC – San Jose | March 16–19, 2026 | Booth 7006

SAN JOSE, Calif.--(BUSINESS WIRE)--ScaleFlux, FarmGPU, and Lightbits Labs today announced the public debut of a collaborative architecture designed to solve one of AI inference’s most persistent challenges: the memory and I/O constraints created by long-context workloads.


At NVIDIA GTC San Jose in March, the companies will debut an implementation that brings together ScaleFlux high-performance NVMe, FarmGPU’s managed inference environment, and Lightbits LightInferra™ software. The joint architecture enables KV-cache data to be persisted, reused, and streamed efficiently across inference sessions, reducing the GPU stalls caused by repeated context recomputation and opening the door to more predictable, scalable performance and greater infrastructure efficiency.

“We’re transforming inference memory from a reactive cache into an intelligent, streamed data layer,” said Arthur Rassmuson, Director of AI Architecture at Lightbits Labs. “By prefetching only the data that matters and delivering it to GPUs over high-speed RDMA before it's needed, we eliminate the stalls that traditionally limit long-context performance. The result is lower Time-to-First-Token (TTFT), more stable throughput under real-world load, and significantly higher effective GPU utilization. For enterprises, that means serving larger models and longer conversations at lower infrastructure cost — and for end users, it means faster, smoother, more responsive AI experiences.”

“Fast networked storage from Lightbits unlocks a lot of new use cases for long-context inference,” said Jonmichael Hands, Chief Executive Officer at FarmGPU. “By pairing our managed service with Lightbits’ high-performance storage running on ScaleFlux NVMe, we are able to lower time-to-first-token and increase utilization on GPUs, drastically lowering the TCO for inference.”

Key areas under exploration include:

  • Higher GPU Utilization and Inference Throughput: Extending and sharing the KV cache beyond limited GPU memory, enabling the same GPUs to serve up to 3X more inference requests by eliminating redundant computation.
  • Reduced Latency and Increased Stability: Lowering TTFT and Time Per Output Token (TPOT) by retrieving attention states from storage instead of recomputing them, mitigating inference stalls as context windows expand.
  • AI-Native Security and Isolation: Providing end-to-end security, including encryption for KV cache blocks, tenant isolation, and integration with Key Management Systems (KMS) and Trusted Platform Modules (TPM) for shared inference environments.
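The KV-cache reuse described above can be illustrated with a toy sketch: persist the attention states for a prompt prefix once, then serve repeat requests from the store instead of recomputing them on the GPU. This is an illustrative sketch only; the class and function names here (`KVCacheStore`, `compute_attention_states`) are hypothetical and do not reflect the actual LightInferra API.

```python
import hashlib

def compute_attention_states(tokens):
    """Stand-in for the expensive GPU prefill that builds KV-cache entries."""
    return [f"kv({t})" for t in tokens]  # placeholder for real tensors

class KVCacheStore:
    """Toy store that persists KV blocks so shared prefixes are reused."""
    def __init__(self):
        self._blocks = {}  # prefix hash -> cached attention states
        self.hits = 0
        self.misses = 0

    def _key(self, tokens):
        # Key blocks by a hash of the token prefix they were computed from.
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def get_or_compute(self, tokens):
        key = self._key(tokens)
        if key in self._blocks:
            self.hits += 1            # reuse: no recomputation needed
        else:
            self.misses += 1          # first request pays the prefill cost once
            self._blocks[key] = compute_attention_states(tokens)
        return self._blocks[key]

store = KVCacheStore()
system_prompt = ["you", "are", "a", "helpful", "assistant"]
store.get_or_compute(system_prompt)   # session 1: computed and persisted
store.get_or_compute(system_prompt)   # session 2: served from the store
print(store.hits, store.misses)       # -> 1 1
```

In a real deployment the store would live on networked NVMe and stream blocks to GPU memory over RDMA ahead of need, but the cache-hit economics are the same: every hit is a prefill the GPU never has to repeat.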

“As members of the NVIDIA Magnum IO GPU Direct Network, we see this as an opportunity to collaborate openly with the ecosystem,” said Keith McKay, Senior Director of Solutions Architecture and Technical Partnerships at ScaleFlux. “What we’re showing at GTC is an early look at how smarter data placement and persistent attention state management could help inference systems stay responsive as context windows grow. This is very much a collaboration we want to shape alongside real operators.”

This announcement marks the beginning of a design-partner-driven effort, with the companies actively seeking feedback from AI infrastructure teams, platform builders, and service providers running large-scale or long-context inference workloads. To learn more about the TTFT and long-context inference performance improvements delivered by LightInferra, read the blog “Introducing LightInferra: 280x Improved AI Token Economy” from Lightbits Labs, ScaleFlux, and FarmGPU.

Conference attendees are invited to visit ScaleFlux booth 7006 to view live demonstrations, speak with engineers from all three companies, and discuss participation as design partners in the next phase of development.

About ScaleFlux

ScaleFlux advances Flash Storage and CXL Memory with breakthrough performance, efficiency, security, and scalability for AI/ML workloads and demanding applications in data center, enterprise, and edge infrastructure.

About FarmGPU

FarmGPU is redefining the future of GPU-powered cloud computing by offering cost-effective, scalable, and high-performance GPU resources tailored specifically for AI developers, innovative startups, and enterprises worldwide. To learn more about FarmGPU, visit docs.farmgpu.com.

About Lightbits Labs

Lightbits Labs® (Lightbits) invented the NVMe over TCP storage protocol and embedded it natively in their software-defined block storage to deliver ultra-low latency and exceptional throughput while leveraging commodity infrastructure—essential for reducing the cost and complexity of data infrastructure at scale. Built from the ground up for high performance, scalability, resiliency, and cost efficiency, Lightbits software delivers the best price-performance value for real-time analytics, transactional, and AI workloads. Lightbits Labs is backed by enterprise technology leaders Cisco Investments, Dell Technologies Capital, Intel Capital, Lenovo, and Micron, and is on a mission to deliver best-in-class block storage for performance-sensitive workloads.

To learn more about Lightbits Labs, visit https://www.lightbitslabs.com/ and follow Lightbits Labs on LinkedIn, X, Facebook, Instagram, and YouTube.

Lightbits and Lightbits Labs are registered trademarks of Lightbits Labs, Ltd.

Contacts

Media Contact
Carol Platz
Lightbits Labs
+1 408.688.4679
pr@lightbitslabs.com
