Oracle Debuts OCI Zettascale10 Cloud AI Supercomputer

Oracle has unveiled Oracle Cloud Infrastructure (OCI) Zettascale10, a new generation of AI supercomputing infrastructure that it calls the largest AI supercomputer in the cloud. The system connects hundreds of thousands of NVIDIA GPUs across multiple Oracle data centers to create multi-gigawatt superclusters capable of delivering up to 16 zettaFLOPS of peak performance.

OCI Zettascale10 forms the computational foundation of Stargate, the flagship supercluster built in collaboration with OpenAI in Abilene, Texas. The system represents a major leap in cloud-based AI performance, fusing Oracle’s next-generation Acceleron RoCE networking architecture with NVIDIA’s full-stack AI infrastructure. Together, the companies are setting new benchmarks for scale, energy efficiency, and reliability in distributed AI computing.

According to Mahesh Thiagarajan, Executive Vice President at Oracle Cloud Infrastructure, Zettascale10 redefines what’s possible for enterprise-scale AI. “With OCI Zettascale10, we’re combining OCI’s groundbreaking Acceleron RoCE network with NVIDIA’s latest AI infrastructure to deliver multi-gigawatt AI capacity at unmatched scale,” he said. “Customers can build, train, and deploy their largest AI models into production with less power per unit of performance and with high reliability. They’ll also benefit from strong data and AI sovereignty controls across Oracle’s distributed cloud.”

The system builds on Oracle’s first Zettascale cluster, launched in 2024, but scales dramatically in size and performance. Each Zettascale10 cluster is housed in a gigawatt-class data center campus engineered for extreme density within a two-kilometer radius – an architectural design that minimizes GPU-to-GPU latency, a critical factor in large-scale AI model training. The Abilene Stargate site serves as the pilot deployment for the new architecture, offering a real-world testbed for next-generation AI infrastructure.

Peter Hoeschele, Vice President of Infrastructure and Industrial Compute at OpenAI, emphasized the scale of the project. “The OCI Zettascale10 network and cluster fabric was developed and deployed first at our joint supercluster in Abilene,” he said. “The highly scalable RoCE design maximizes performance at gigawatt scale while ensuring most of the power is focused on compute. We’re excited to continue scaling Abilene and the broader Stargate program together.”

Delivering AI at Extraordinary Scale

Oracle’s Zettascale10 clusters are designed to handle workloads of extraordinary complexity, targeting deployments of up to 800,000 NVIDIA GPUs with consistent, predictable performance. The combination of Oracle’s low-latency RoCEv2 networking and NVIDIA’s AI infrastructure stack enables enterprises to scale from research environments to industrialized AI production with minimal friction and strong cost efficiency. 
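Taking the headline figures at face value, a quick back-of-the-envelope division shows what the quoted aggregate peak implies per accelerator. This is only a rough sanity check under the assumption that the 16 zettaFLOPS figure is the combined low-precision peak of a full 800,000-GPU deployment:

```python
# Back-of-the-envelope check (assumption: 16 zettaFLOPS is the aggregate
# peak of a full 800,000-GPU deployment, presumably at a low-precision
# format; the per-GPU split is our inference, not an Oracle figure).
peak_flops = 16e21      # 16 zettaFLOPS
gpu_count = 800_000

per_gpu_flops = peak_flops / gpu_count
print(f"Implied per-GPU peak: {per_gpu_flops / 1e15:.0f} petaFLOPS")
# → Implied per-GPU peak: 20 petaFLOPS
```

Twenty petaFLOPS per GPU is plausible only for dense low-precision (e.g., FP4-class) throughput on current accelerators, which suggests the headline number describes that regime rather than FP32 performance.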

Ian Buck, Vice President of Hyperscale at NVIDIA, said the partnership brings together the best of both companies’ technologies. “Oracle and NVIDIA are uniting OCI’s distributed cloud and our full-stack AI infrastructure to deliver AI at extraordinary scale,” Buck said. “OCI Zettascale10 provides the compute fabric needed to advance state-of-the-art AI research and help organizations move from experimentation to production-grade AI.”

Central to Oracle’s new design is the Acceleron RoCE networking system, which leverages the switching capabilities built into modern GPU network interface cards (NICs). Each GPU’s NIC can connect to multiple switches simultaneously, with each switch sitting on an independent network plane. This architecture boosts scalability and reliability by automatically rerouting traffic when a network plane experiences congestion or failure, eliminating costly job restarts during AI training.

The networking system includes several key innovations designed to support hyperscale AI workloads. Its wide, shallow fabric allows customers to build larger clusters faster and at lower cost by using GPU NICs as mini-switches that connect across multiple isolated planes, cutting power use and reducing infrastructure tiers. Enhanced reliability prevents job interruptions by isolating data flows and dynamically shifting traffic away from unstable areas. Oracle’s use of Linear Pluggable Optics (LPO) and Linear Receiver Optics (LRO) also reduces network and cooling energy consumption without compromising high-throughput connectivity at 400G or 800G speeds.
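The multi-plane failover behavior described above can be illustrated with a toy model. This is purely a sketch loosely inspired by the article's description; the class name, health-tracking scheme, and plane-selection policy are assumptions for illustration, not Oracle's actual Acceleron implementation:

```python
import random

class MultiPlaneNic:
    """Toy model of a NIC attached to several independent network planes.

    Illustrative only: names and selection policy are assumed, not
    Oracle's actual Acceleron RoCE design.
    """

    def __init__(self, num_planes: int):
        # Track which planes are currently usable.
        self.healthy = {p: True for p in range(num_planes)}

    def mark_unhealthy(self, plane: int) -> None:
        # Congestion or failure detected on this plane.
        self.healthy[plane] = False

    def send(self, payload: str) -> int:
        # Route over any healthy plane; traffic shifts away from
        # degraded planes instead of aborting the training job.
        candidates = [p for p, ok in self.healthy.items() if ok]
        if not candidates:
            raise RuntimeError("all planes down")
        plane = random.choice(candidates)
        # ... transmit payload on `plane` ...
        return plane

nic = MultiPlaneNic(num_planes=4)
nic.mark_unhealthy(2)
plane = nic.send("gradient shard")
assert plane != 2  # traffic rerouted around the degraded plane
```

The point of the sketch is the failure mode it avoids: because each GPU has independent paths, losing one plane degrades bandwidth rather than killing the job, which is why the article highlights the elimination of costly training restarts.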

Zettascale10 is now open for preorders and is expected to be available in the second half of next year. Oracle plans to make multi-gigawatt-scale deployments accessible to customers across its distributed cloud regions, enabling organizations to train and deploy massive AI models while maintaining regional data control.

With the introduction of OCI Zettascale10, Oracle is positioning itself as a direct competitor in the race to build the most powerful cloud AI infrastructure – a race currently dominated by hyperscalers like Amazon, Microsoft, and Google. The combination of Oracle’s proprietary networking, NVIDIA’s AI compute platforms, and OpenAI’s early deployment signals a major shift toward industrial-scale AI computing, where efficiency, sustainability, and sovereignty are as important as raw power.

