Google Cloud is expanding its NVIDIA lineup at GTC 2026, most notably with fractional G4 VMs built on the NVIDIA RTX PRO 6000 Blackwell Server Edition GPU. The new option lets customers rent 1/2, 1/4, or 1/8 GPU slices for workloads such as inference, simulation, rendering, remote desktops, and mainstream graphics. It extends the G4 line beyond its existing fixed configurations of 1, 2, 4, and 8 GPUs, giving smaller teams a more affordable entry point into Blackwell-class capability.
The target audience for this initiative includes enterprises building agentic AI systems and large mixture-of-experts workloads, where latency, throughput, and scheduling are critical infrastructure challenges. Google reports that G4 is already being used for fine-tuning and inference on models ranging from roughly 30 billion to over 100 billion parameters. Notable customers include General Motors, ElevenLabs, Otto Group One.O, Imgix, and Schrödinger. On Google Kubernetes Engine (GKE), G4 is generally available with up to 384 vCPUs, 1,440 GB of memory, 12 TiB of Titanium SSD, and up to 400 Gbps networking; the fractional offering carves this same full-size hardware into smaller, rentable units.
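For teams that want to try G4 on GKE, provisioning typically means creating a GPU node pool. The sketch below follows the standard `gcloud container node-pools create` pattern for GPU node pools; the machine type (`g4-standard-48`) and accelerator label (`nvidia-rtx-pro-6000`) are illustrative assumptions for the G4 family, not confirmed values, so check the current GKE machine-type and accelerator documentation before use.

```shell
# Hypothetical sketch: create a G4 GPU node pool on an existing GKE cluster.
# Machine type and accelerator name are assumptions; verify against
# Google Cloud's published G4 machine types before running.
gcloud container node-pools create g4-pool \
  --cluster=my-cluster \
  --region=us-central1 \
  --machine-type=g4-standard-48 \
  --accelerator=type=nvidia-rtx-pro-6000,count=1 \
  --num-nodes=1
```

Workloads then request the GPU the usual Kubernetes way, via a `nvidia.com/gpu` resource limit in the pod spec.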
We're expanding our partnership with @nvidia! Check out this wave of new announcements, showcasing a co-engineered AI infrastructure foundation, announced at #GTC26 ↓ https://t.co/ijvaHAEff9
— Google Cloud (@googlecloud) March 16, 2026
For Google Cloud, this initiative is about turning AI Hypercomputer into a comprehensive commercial stack rather than merely a hardware label. NVIDIA Dynamo is being integrated into GKE Inference Gateway, Vertex AI training is expanding to A4X domains based on NVIDIA GB200 NVL72 systems, and Model Garden is adding more NVIDIA Nemotron 3 models, including Super 120B. Google is also extending its scheduler options, Calendar Mode and Flex Start; Flex Start is now coming to G4, letting customers reserve or obtain scarce GPU capacity with fewer long-term commitments.
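In practice, Flex Start on Compute Engine is exposed through a provisioning model on the VM, paired with a maximum run duration. A minimal sketch, assuming the `FLEX_START` provisioning model and a G4 machine type (the machine-type name and zone are illustrative; flag values should be checked against the current `gcloud compute instances create` reference):

```shell
# Hypothetical sketch: request a G4 VM under Flex Start provisioning.
# The machine type, zone, and duration value are assumptions for illustration.
gcloud compute instances create g4-flex-demo \
  --zone=us-central1-a \
  --machine-type=g4-standard-48 \
  --provisioning-model=FLEX_START \
  --max-run-duration=7d \
  --instance-termination-action=DELETE
```

The design trade-off is the one the article describes: capacity is queued and time-bounded rather than reserved long-term, which suits bursty fine-tuning or inference jobs better than always-on serving.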
The longer-term strategy will unfold in the second half of 2026, when Google plans to be among the first cloud providers to offer NVIDIA Vera Rubin NVL72 rack-scale systems. NVIDIA is positioning Rubin as its next platform for agentic AI, combining 72 Rubin GPUs and 36 Vera CPUs in a single rack-scale system designed for large-scale training and inference clusters. Google is aligning this roadmap with a year-long accelerator program for a select group of public sector AI startups, signaling a clear message: Blackwell slices for immediate workload needs and a pathway to Rubin-class infrastructure for customers gearing up for a significant AI expansion.