ISC 2026: KAYTUS Launches Rack-Scale KSManage Ultra for AI Factories

KSManage Ultra delivers full-stack visibility across GPUs, racks, and data centers, integrating in-band and out-of-band system management to address performance bottlenecks and maximize AI Factory operational efficiency.

KSManage Ultra for AI Factories

FRANKFURT, Germany--(BUSINESS WIRE)--KAYTUS, a leading provider of AI infrastructure and liquid cooling solutions, has launched KSManage Ultra, a next-generation intelligent infrastructure management platform purpose-built for AI Factories, at ISC 2026. Designed for the latest high-density AI racks, KSManage Ultra provides unified, intelligent management of key rack-level components, including compute trays, switch trays, power distribution units (PDUs), and cooling distribution units (CDUs). Through end-to-end visibility, performance-level diagnostics, and automated operations, the platform moves the management of highly coupled AI infrastructure from fragmented oversight to integrated, system-level operations, helping enterprises build more efficient, reliable, and sustainable AI infrastructure.

KSManage Ultra shifts customers from fragmented, component-level monitoring to unified operational control.

Three Key Challenges Facing Traditional AI Operations

Compared with traditional data centers, AI data centers face significantly greater operational complexity, including rack-scale AI system management, intricate network topologies, difficult fault isolation, and liquid-cooling safety requirements. As a result, traditional operational approaches are increasingly constrained by three key challenges:

First, management complexity is soaring: the basic unit of an AI Factory is no longer an individual server, but a resource-coupled, high-density AI rack. A single rack integrates multiple tightly coordinated subsystems, including compute, networking, power supply, and liquid cooling. Compared with traditional 4U 8-GPU deployments, NVL72 rack-scale systems integrate 72 GPUs and 36 CPUs along with thousands of high-speed interconnects. In a 100 kW-class rack, power density can be 2 to 3 times higher than in a rack of traditional 8-GPU servers¹, while thermal management becomes significantly more complex, involving coolant distribution, CDUs, flow rates, and related safety controls. As AI Factories continue to scale, operational complexity rises sharply, and fluctuations in any single component can affect the performance and stability of the entire rack.

Second, fault identification has moved beyond the hardware layer. AI training and inference workloads are highly sensitive to performance fluctuations, and hidden anomalies can significantly reduce operational efficiency. Unlike traditional downtime-related failures, performance degradation in AI systems often occurs silently. Because these performance issues are closely linked to underlying hardware and infrastructure conditions, identifying the true root cause can be difficult when relying on isolated data from either the workload or infrastructure side alone.

Third, AI Factories face a growing operational efficiency crisis as deployments scale. Traditional device-by-device onboarding is inefficient, slowing deployment and increasing the risk of configuration inconsistencies. At the same time, conventional configuration methods are time-consuming and error-prone. With multiple device types integrated within each AI rack, even minor configuration deviations can lead to cluster-wide performance degradation or service interruptions.

KAYTUS Builds an Integrated Intelligent Operations Platform for AI Factories

Against this backdrop, traditional operations models that depend on manual processes or fragmented tools often result in delayed deployment, difficult troubleshooting, and inconsistent configurations, limiting the development and large-scale adoption of AI applications. To simplify the operation and management of AI data centers, KAYTUS has introduced KSManage Ultra. The platform delivers integrated management across the full infrastructure stack, spanning components, nodes, racks, clusters, and the data center, by connecting in-band and out-of-band management paths and correlating IT infrastructure status with physical infrastructure conditions. It marks a shift from reactive operations to proactive alerting, helping customers build intelligent operations capabilities for monitoring, diagnosis, fault isolation, and full recovery in complex AI Factory environments.

Single-Pane Global Visibility into AI Data Center Operational Status

KSManage Ultra delivers full-stack unified management across both traditional infrastructure and advanced AI rack systems. The platform provides centralized management for GPUs, CPUs, memory, high-speed switching modules, management networks, power shelves, CDUs, liquid cooling systems, racks, and cluster resources. By breaking down management boundaries between IT and physical infrastructure, as well as between individual components and full racks, KSManage Ultra creates a multi-level resource view spanning components, nodes, racks, clusters, and the entire data center.

With a single unified platform, customers no longer need to switch repeatedly between multiple systems and can quickly assess resource health, rack availability, and cluster readiness for efficient production deployment and operation.
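Conceptually, a multi-level resource view of this kind is a tree in which health propagates upward, so that one degraded GPU shows up as a degraded node, rack, cluster, and data center. The sketch below is a hypothetical illustration, not KSManage Ultra's data model: the `Resource` class, the three-state `Health` scale, and the "worst descendant wins" rollup rule are all assumptions chosen to show the idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Health(IntEnum):
    # Ordered by severity so that max() picks the worst state.
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Resource:
    # Hypothetical tree node: component -> node -> rack -> cluster -> data center.
    name: str
    level: str
    own_health: Health = Health.OK
    children: list[Resource] = field(default_factory=list)

    def rollup(self) -> Health:
        # A resource is only as healthy as its least healthy descendant.
        return max([self.own_health, *(child.rollup() for child in self.children)])

    def problems(self) -> list[str]:
        # List every resource in this subtree whose own state is not OK.
        found = [self.name] if self.own_health != Health.OK else []
        for child in self.children:
            found.extend(child.problems())
        return found


gpu0 = Resource("gpu0", "component")
gpu1 = Resource("gpu1", "component", Health.WARNING)
cdu = Resource("cdu-A", "component")
node = Resource("node-01", "node", children=[gpu0, gpu1])
rack = Resource("rack-01", "rack", children=[node, cdu])
cluster = Resource("cluster-1", "cluster", children=[rack])
datacenter = Resource("dc-1", "datacenter", children=[cluster])

print(datacenter.rollup().name)  # WARNING
print(datacenter.problems())     # ['gpu1']
```

The rollup answers "is anything wrong here?" at any level, while `problems()` answers "where?", which is the pairing a single-pane view needs.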

Integrated In-Band and Out-of-Band System Management for Proactive Remediation

KSManage Ultra consolidates in-band data (operating systems, drivers, applications, and performance metrics), out-of-band data (BMC, firmware, power, temperature, and hardware logs), and infrastructure data into a single unified management system. It enables correlation analysis across operating status, hardware health, link topology, power supply, and liquid cooling conditions, shifting operations from reactive response to proactive alerting. When the system detects GPU anomalies, degraded link quality, liquid cooling fluctuations, or declining node health, it can proactively identify at-risk nodes and guide customers to isolate, maintain, or reconfigure resources, keeping faulty nodes out of critical jobs.
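One way to picture this cross-layer correlation is a join of in-band and out-of-band samples on the node name. In the minimal sketch below, the record types, field names, and thresholds (a 20% step-time slowdown, 85 °C GPU temperature, 1.0 L/min coolant flow) are hypothetical values chosen for illustration, not product defaults. A node is flagged only when a workload-side slowdown coincides with a hardware-side anomaly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InBandSample:
    # Hypothetical OS-side view: how fast the workload is progressing.
    node: str
    step_time_ms: float


@dataclass(frozen=True)
class OutOfBandSample:
    # Hypothetical BMC-side view: what the hardware reports.
    node: str
    gpu_temp_c: float
    coolant_flow_lpm: float
    link_errors: int


def at_risk_nodes(
    in_band: list[InBandSample],
    out_of_band: list[OutOfBandSample],
    baseline_step_ms: float,
    slowdown: float = 1.2,
    max_temp_c: float = 85.0,
    min_flow_lpm: float = 1.0,
) -> dict[str, list[str]]:
    """Return {node: [hardware reasons]} for nodes where a workload slowdown
    coincides with an out-of-band anomaly. All thresholds are illustrative."""
    hw_by_node = {s.node: s for s in out_of_band}
    flagged: dict[str, list[str]] = {}
    for sample in in_band:
        hw = hw_by_node.get(sample.node)
        if hw is None:
            continue  # no out-of-band data, so nothing to correlate against
        if sample.step_time_ms <= slowdown * baseline_step_ms:
            continue  # workload looks normal
        reasons = []
        if hw.gpu_temp_c > max_temp_c:
            reasons.append("gpu_over_temp")
        if hw.coolant_flow_lpm < min_flow_lpm:
            reasons.append("low_coolant_flow")
        if hw.link_errors > 0:
            reasons.append("link_errors")
        if reasons:
            flagged[sample.node] = reasons
    return flagged


in_band = [InBandSample("n1", 410.0), InBandSample("n2", 530.0), InBandSample("n3", 520.0)]
out_of_band = [
    OutOfBandSample("n1", 70.0, 2.0, 0),
    OutOfBandSample("n2", 88.0, 0.6, 0),
    OutOfBandSample("n3", 72.0, 2.1, 0),
]
print(at_risk_nodes(in_band, out_of_band, baseline_step_ms=400.0))
# {'n2': ['gpu_over_temp', 'low_coolant_flow']}
```

Node n3 is slow but shows no hardware anomaly, so this sketch leaves it out: a slowdown without an out-of-band cause points toward the software side rather than the infrastructure.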

Using liquid cooling monitoring as an example, KSManage Ultra supports three-level leak detection at the node, rack, and loop levels. Once a leakage risk is detected, the platform can coordinate safety shutdown, solenoid valve closure, and node isolation, while also triggering email alerts, work order generation, and closed-loop remediation. This helps customers build system-level proactive operations capabilities for AI rack systems.
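The three-level leak response can be sketched as an escalation table that widens containment as the detected scope widens. The mapping of levels to actions below is an illustrative assumption, not KAYTUS's documented policy, and the action names are hypothetical strings rather than real control commands.

```python
from enum import Enum


class LeakLevel(Enum):
    NODE = "node"
    RACK = "rack"
    LOOP = "loop"


# Hypothetical escalation policy: the wider the leak scope, the wider the containment.
CONTAINMENT = {
    LeakLevel.NODE: ["isolate_node", "close_node_valve"],
    LeakLevel.RACK: ["safety_shutdown_rack", "close_rack_valve"],
    LeakLevel.LOOP: ["close_loop_valve", "safety_shutdown_all_racks_on_loop"],
}

# Follow-up steps applied at every level (alerting, work order, closed-loop tracking).
FOLLOW_UP = ["send_email_alert", "open_work_order", "track_until_closed"]


def leak_response(level: LeakLevel, location: str) -> list[str]:
    """Return the ordered action plan for a leak detected at `location`."""
    return [f"{action}:{location}" for action in CONTAINMENT[level] + FOLLOW_UP]


for step in leak_response(LeakLevel.RACK, "rack07"):
    print(step)
```

Keeping containment and follow-up as separate lists mirrors the paragraph's split between immediate safety actions and the alerting and remediation that follow.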

Real-Time Agile System Health Monitoring and Compute Power Resource Allocation

Designed for multi-rack deployment scenarios, KSManage Ultra provides resource health identification and fault isolation capabilities. The platform continuously evaluates node and rack health based on indicators such as GPU status, memory and PCIe status, network link quality, firmware consistency, liquid cooling conditions, and power supply status. When abnormal nodes or high-risk components are detected, the system can apply intelligent tagging, analyze the potential impact scope, and initiate isolation actions, helping prevent faulty nodes from entering critical task runs.

KSManage Ultra gives customers a clear view of available resources: which nodes should be removed from service, which racks remain suitable for combined use, which resources are ready for training and inference workloads, and which resources should enter maintenance. As a result, customers can move beyond reactive repairs after failures occur and continuously maintain a stable, healthy compute zone, improving AI Factory business continuity and resource utilization.
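Health evaluation followed by resource classification, as described in this section, can be sketched as a small rule-based classifier. Everything here is hypothetical: the indicator fields, the split between hard faults (remove from service) and soft faults (maintenance), the 0.99 link-quality threshold, and the rule that a rack qualifies for combined use only when every node in it is ready.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeHealth:
    # Hypothetical per-node indicators, mirroring the list in the text.
    node: str
    rack: str
    gpu_ok: bool
    mem_pcie_ok: bool
    link_quality: float  # fraction of link checks passed, 0.0 to 1.0
    firmware: str
    coolant_ok: bool
    power_ok: bool


def classify(n: NodeHealth, baseline_fw: str, min_link: float = 0.99) -> str:
    # Hard faults: take the node out of service right away.
    if not (n.gpu_ok and n.coolant_ok and n.power_ok):
        return "remove_from_service"
    # Soft faults: the node still works but should go through maintenance.
    if not n.mem_pcie_ok or n.link_quality < min_link or n.firmware != baseline_fw:
        return "maintenance"
    return "ready"


def rack_view(nodes: list[NodeHealth], baseline_fw: str) -> dict[str, dict]:
    """Classify every node, then decide per rack whether it can join multi-rack jobs."""
    states: dict[str, str] = {}
    by_rack: dict[str, list[str]] = defaultdict(list)
    for n in nodes:
        states[n.node] = classify(n, baseline_fw)
        by_rack[n.rack].append(n.node)
    return {
        rack: {
            "nodes": {name: states[name] for name in names},
            # Offer a rack for combined use only if every node in it is ready.
            "combined_use_ok": all(states[name] == "ready" for name in names),
        }
        for rack, names in by_rack.items()
    }


fleet = [
    NodeHealth("n1", "r1", True, True, 1.00, "2.1", True, True),
    NodeHealth("n2", "r1", True, True, 0.97, "2.1", True, True),   # degraded links
    NodeHealth("n3", "r2", True, True, 1.00, "2.0", True, True),   # firmware drift
    NodeHealth("n4", "r2", False, True, 1.00, "2.1", True, True),  # GPU fault
    NodeHealth("n5", "r3", True, True, 1.00, "2.1", True, True),
]
for rack, info in rack_view(fleet, baseline_fw="2.1").items():
    print(rack, info["combined_use_ok"], info["nodes"])
```

In this example only rack r3 would be offered for combined use; r1 and r2 each contain a node that needs maintenance or removal.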

Minute-Level Onboarding, Configuration, and Full-Stack Automated Operations

KSManage Ultra supports one-click batch scanning and automatic node addition. By intelligently identifying device serial numbers and IP addresses, the platform automatically builds topology mappings between nodes and racks, reducing single-rack onboarding time from 50 minutes with traditional methods to under 3 minutes. KSManage Ultra also supports one-click batch stress testing at the L10 and L11 levels, reducing fault root-cause localization from hours to minutes. The platform further enables rack-level automated initialization and configuration, including driver installation, hardware configuration, and software deployment, all of which can be delivered in batches from templates. By significantly improving operational efficiency while helping keep hardware environments consistent across a cluster, KSManage Ultra reduces the risk of performance fluctuations or task failures caused by configuration drift.
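Serial-number-based topology mapping and template-based drift detection are both simple to sketch. In the example below, the rack manifest (a serial number to rack-and-slot table), the scan format, and the configuration keys and values are hypothetical; the sketch only illustrates the kind of matching and comparison the paragraph describes, not how KSManage Ultra implements it.

```python
from __future__ import annotations

from collections import defaultdict


def build_topology(
    scan: list[tuple[str, str]],
    manifest: dict[str, tuple[str, int]],
) -> tuple[dict[str, dict[int, dict[str, str]]], list[str]]:
    """Place each discovered (ip, serial) pair into its rack and slot.

    `manifest` is a hypothetical serial -> (rack, slot) table, for example the
    bill of materials shipped with a pre-integrated rack.
    """
    topology: dict[str, dict[int, dict[str, str]]] = defaultdict(dict)
    unknown: list[str] = []
    for ip, serial in scan:
        if serial not in manifest:
            unknown.append(serial)  # not in any manifest: flag for manual review
            continue
        rack, slot = manifest[serial]
        topology[rack][slot] = {"serial": serial, "ip": ip}
    return dict(topology), unknown


def config_drift(
    nodes: dict[str, dict[str, str]],
    template: dict[str, str],
) -> dict[str, dict[str, tuple[str, str | None]]]:
    """Return {node: {key: (expected, actual)}} for every deviation from the template."""
    drift = {}
    for node, cfg in nodes.items():
        diffs = {key: (want, cfg.get(key)) for key, want in template.items() if cfg.get(key) != want}
        if diffs:
            drift[node] = diffs
    return drift


scan = [("10.0.7.11", "SN-A1"), ("10.0.7.12", "SN-A2"), ("10.0.7.99", "SN-ZZ")]
manifest = {"SN-A1": ("rack07", 1), "SN-A2": ("rack07", 2)}
topology, unknown = build_topology(scan, manifest)
print(unknown)  # ['SN-ZZ']

template = {"gpu_driver": "1.2", "bios": "3.0"}
nodes = {"n1": {"gpu_driver": "1.2", "bios": "3.0"}, "n2": {"gpu_driver": "1.1", "bios": "3.0"}}
print(config_drift(nodes, template))  # {'n2': {'gpu_driver': ('1.2', '1.1')}}
```

Reporting both the expected and the actual value makes each drift entry directly actionable: re-apply the template value to the listed node.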

As a comprehensive, unified platform for AI Factories, KSManage Ultra features an open, highly compatible architecture. Through open APIs, it integrates with upper-layer systems such as scheduling platforms and CMDBs, while also providing unified management of lower-layer heterogeneous devices, including servers, networking equipment, power infrastructure, and cooling systems. This enables centralized management across the entire data center environment. KSManage Ultra is designed to help enterprises achieve unified management and intelligent operations for heterogeneous infrastructure, providing a solid foundation for the stable and efficient operation of AI Factories.

Source:

¹ Traditional HGX H100/H200 4U 8-GPU servers are typically deployed at 4 to 8 servers per 42U rack, resulting in rack-level power consumption of approximately 40 to 80 kW. In contrast, GB200 NVL72 racks can exceed 120 kW, driving a roughly 2x to 3x increase in power density.

About KAYTUS

KAYTUS is a leading provider of AI infrastructure and liquid cooling solutions, delivering a diverse range of innovative, open, and eco-friendly products for cloud, AI, edge computing, and other emerging applications. With a customer-centric approach, KAYTUS is agile and responsive to user needs through its adaptable business model. Discover more at KAYTUS.com and follow us on LinkedIn and X.

Contacts

Media Contacts
media@kaytus.com
