NVIDIA GTC 2026: The Blueprint, Not the Chips · Issue #44

Opening

Dear subscriber, NVIDIA GTC 2026, held this week at the SAP Center in San Jose, wrapped up yesterday, March 19. At an event that drew more than 30,000 attendees, CEO Jensen Huang poured out dozens of products and strategies in a keynote that ran over two hours. In today’s newsletter, I want to pull the whole of GTC 2026 together.

The media headlines were, predictably, plastered with keywords like Vera Rubin, the Groq integration, and $1 trillion in revenue visibility. But I think the real message of this GTC wasn’t in any individual product — it was somewhere else entirely.

What Jensen Huang spent two hours explaining comes down to one thing. It was a declaration that “AI is now an industrial production system, tokens are the output of that factory, and NVIDIA is the factory’s architect”. Today I’ll walk through the three points from GTC 2026 that really deserved our full attention.

What $1 Trillion Means: The Structural Shift Behind the Number

The most-quoted number at GTC was, without question, “$1 trillion”. Jensen Huang revealed that purchase orders for the Blackwell and Vera Rubin platforms will reach $1 trillion through 2027. That is exactly double the $500 billion of visibility he presented at last year’s GTC.

What makes this number interesting is less its size than its composition. According to Jensen Huang, this $1 trillion includes only GPU systems and networking revenue. The newly announced standalone Vera CPU, the Groq LPX solutions, and storage are not in that figure. In Huang’s own words, adding Groq to inference workloads increases compute spending by about 25%, so the effective market size is upwards of $1.25 trillion.
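As a rough check on that arithmetic, the sketch below applies the roughly 25% uplift to the $1 trillion base. It assumes, purely for illustration, that the uplift applies to the entire figure rather than only to its inference share, and the constant and function names are hypothetical.

```python
# Back-of-the-envelope check of the "$1.25 trillion" figure.
# Assumption (not from the keynote): the ~25% Groq uplift applies to the
# whole $1 trillion base, not only to its inference share.

BASE_VISIBILITY_USD_BN = 1_000   # $1 trillion, GPU systems + networking only
GROQ_UPLIFT = 0.25               # "about 25%" extra compute spending


def effective_market_bn(base_bn: float, uplift: float) -> float:
    """Return the base figure scaled up by the given fractional uplift."""
    return base_bn * (1 + uplift)


total_bn = effective_market_bn(BASE_VISIBILITY_USD_BN, GROQ_UPLIFT)
print(f"Effective market: ${total_bn / 1000:.2f} trillion")
```

If the uplift applied only to the inference portion of spending, the total would land somewhere between $1 trillion and $1.25 trillion.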

The revenue mix deserves attention too. About 60% of current revenue comes from the top five hyperscalers (AWS, Google, Azure, Meta, and Oracle), while 40% comes from enterprise, industrial, sovereign AI¹, and other sources. Huang projected that once ‘Physical AI’ takes off in earnest, this ratio will more than invert, with the non-hyperscaler share climbing to 70%.

Why does this matter? NVIDIA’s revenue concentration in a handful of Big Tech companies has long been a risk factor. But as Physical AI (robotics, autonomous driving, manufacturing) spreads, the customer base diversifies structurally. NVIDIA has already secured automakers that together produce 18 million vehicles a year, including BYD, Hyundai, Nissan, and Geely, as autonomous-driving partners, and it announced an integration with Uber’s dispatch network as well.

For reference, NVIDIA’s most recent fiscal year (FY2026, ended January 2026) brought in $215.9 billion in revenue, up 65% year over year. Data center revenue alone was $197.3 billion. Guidance for the next quarter (FY27 Q1) is $78 billion, far above the Wall Street consensus of $72.6 billion.

The Groq Integration: Completing a $20 Billion Puzzle

The technical highlight of this GTC was, hands down, the Groq integration. The $20 billion acquisition of Groq’s assets, announced on Christmas Eve last December, showed up as a concrete product just three months later. Why NVIDIA paid $20 billion — roughly 3x Groq’s prior valuation — finally got its answer at this GTC.

The key is ‘Disaggregated Inference.’ The process by which AI generates an answer (inference) splits into two big stages: the Prefill stage, where the input is understood, and the Decode stage, where tokens are actually produced one by one.

By Huang’s explanation, Vera Rubin is optimized for high throughput — processing massive volumes of data at once. Groq’s LPU², by contrast, is an ultra-low-latency machine that packs on-chip SRAM³ at scale. NVIDIA has fused these two fundamentally different processors into a single system.

Vera Rubin handles prefill, and Groq takes over token generation in decode. Even inside decode, the division of labor gets extraordinarily fine-grained: attention computation⁴ runs on Vera Rubin while the FFN (feed-forward network) computation runs on Groq. The two systems are linked over Ethernet in a special mode, cutting latency nearly in half.
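The division of labor described above can be sketched as a toy scheduler. This is only an illustration of the idea, not NVIDIA's or Groq's software: the device labels, the `generate` function, and the trace format are all hypothetical, and the per-layer hand-off is an assumption based on the fact that each transformer layer contains both an attention block and an FFN block.

```python
# Toy model of disaggregated inference as described above.
# Everything here is hypothetical: the device labels, function names, and
# the trace format are illustrative, not an NVIDIA or Groq API.
from dataclasses import dataclass, field

THROUGHPUT_DEVICE = "vera_rubin"   # high-throughput side
LATENCY_DEVICE = "groq_lpu"        # low-latency side


@dataclass
class Trace:
    """Records which device ran each step, in order."""
    steps: list[tuple[str, str]] = field(default_factory=list)

    def run(self, op: str, device: str) -> None:
        self.steps.append((op, device))


def generate(prompt_tokens: int, new_tokens: int, layers: int) -> Trace:
    trace = Trace()
    # Prefill: the whole prompt is processed at once on the throughput side.
    trace.run(f"prefill[{prompt_tokens} tokens]", THROUGHPUT_DEVICE)
    # Decode: one token at a time, with each layer split across devices.
    for t in range(new_tokens):
        for layer in range(layers):
            trace.run(f"token{t}/layer{layer}/attention", THROUGHPUT_DEVICE)
            trace.run(f"token{t}/layer{layer}/ffn", LATENCY_DEVICE)
    return trace


trace = generate(prompt_tokens=512, new_tokens=2, layers=2)
for op, device in trace.steps:
    print(f"{device:>10}  {op}")
```

The trace makes the cost of this design visible: every decoded token crosses between the two devices at least once per layer, which is why the latency of the link between them matters so much.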

The result? Up to a 35x improvement in inference performance per watt. This isn’t a mere spec bump — in Huang’s words, it opens “a new tier of inference performance the world has never seen.”

It helps to think of this through a car-industry analogy. Normally a single engine has to handle every driving condition. What NVIDIA did is put a highway engine and a city engine into one car and made them switch automatically depending on the situation. Costs go up, but energy efficiency and performance improve dramatically.

The Groq LP30 chip is being manufactured by Samsung, with shipments slated for Q3 2026. Vera Rubin also looks set to begin volume shipments in the same window, and the first samples have already been delivered to Microsoft.

‘Tokenomics’: NVIDIA Redefines the Value of Computing

At this GTC, the frame I paid the most attention to was Tokenomics (token economics). When concerns about margin sustainability came up on the analyst call, Huang’s answer was unambiguous: “Customers aren’t buying an expensive computer — they’re buying the equipment with the lowest cost per token produced per watt, per second.”

This redefines the computer not as a consumable but as manufacturing equipment. The logic mirrors ASML’s lithography equipment⁵ or TSMC’s wafers: if the productivity gain is overwhelming, customers will gladly pay up for the latest model.

In this context, Huang compared the computing paradigms of the past and the present. In the past, software was written in advance, content was recorded in advance, and video was stored in advance. It was a model of playing back pre-made output, like spinning up a DVD. Generative AI, however, generates everything in real time. Because it produces a fresh result every time — factoring in the user, the context, and the intent — the compute required is hundreds to thousands of times higher than before.

Layer AI’s evolutionary stages on top of that. Huang defined AI in three stages.

Stage 1: Generative AI. You enter a prompt, and it generates an output.
Stage 2: Reasoning. It thinks through problems logically.
Stage 3: Agentic systems. It autonomously sets goals, uses tools, and spawns other agents to get work done.

We are now entering Stage 3. In agentic AI, a single question spawns dozens of sub-agents, and each agent reasons independently while consuming tokens. That is why token demand is surging along a parabolic curve. When Huang said the era has arrived of allocating engineers a Token Budget alongside their laptops, it wasn’t just a metaphor: it means enterprise IT cost structures are actually changing.
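To see why the agentic stage multiplies token demand, here is a minimal accounting sketch. The token counts and the number of sub-agents are made-up placeholders, not figures from the keynote, and the assumption that every agent spends the same number of tokens is a simplification.

```python
# Illustrative token accounting for the three stages described above.
# All numbers are made-up placeholders, not figures from the keynote.

def generative_tokens(answer_tokens: int) -> int:
    """Stage 1: one prompt, one answer."""
    return answer_tokens


def reasoning_tokens(answer_tokens: int, thinking_tokens: int) -> int:
    """Stage 2: the model also spends tokens thinking before it answers."""
    return thinking_tokens + answer_tokens


def agentic_tokens(answer_tokens: int, thinking_tokens: int, sub_agents: int) -> int:
    """Stage 3: an orchestrator plus sub-agents, each reasoning on its own."""
    per_agent = reasoning_tokens(answer_tokens, thinking_tokens)
    return per_agent * (1 + sub_agents)   # +1 for the orchestrating agent


ANSWER, THINKING, SUB_AGENTS = 500, 4_000, 24   # hypothetical values

stage1 = generative_tokens(ANSWER)
stage2 = reasoning_tokens(ANSWER, THINKING)
stage3 = agentic_tokens(ANSWER, THINKING, SUB_AGENTS)
print(f"stage 1: {stage1:>7,} tokens")
print(f"stage 2: {stage2:>7,} tokens ({stage2 / stage1:.0f}x)")
print(f"stage 3: {stage3:>7,} tokens ({stage3 / stage1:.0f}x)")
```

With these placeholder inputs, the same question costs several hundred times more tokens at Stage 3 than at Stage 1, which is the kind of jump a per-engineer token budget would have to absorb.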

NVIDIA’s pricing strategy makes sense within this frame. When a competitor says “our chip is 30% cheaper,” NVIDIA’s rebuttal is that “what matters isn’t the chip price but the factory’s productivity.” Just as you judge an iPhone not by its bill of materials but by the total experience of using it, the argument is that AI infrastructure should be evaluated in a new unit: Cost per Token.
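The cost-per-token argument can be worked through with a small calculation. Every number in this sketch is a hypothetical placeholder chosen for illustration (prices, power draw, throughput, electricity rate), and the cost model is deliberately simplified to straight-line depreciation plus energy.

```python
# Cost-per-token comparison. Every number below is a hypothetical
# placeholder chosen for illustration; none comes from NVIDIA or a rival.
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass(frozen=True)
class System:
    name: str
    price_usd: float          # purchase price
    lifetime_years: float     # straight-line depreciation period
    power_kw: float           # average draw
    tokens_per_second: float  # sustained output

    def cost_per_million_tokens(self, usd_per_kwh: float) -> float:
        capex_per_hour = self.price_usd / (self.lifetime_years * HOURS_PER_YEAR)
        energy_per_hour = self.power_kw * usd_per_kwh
        tokens_per_hour = self.tokens_per_second * 3600
        return (capex_per_hour + energy_per_hour) / tokens_per_hour * 1_000_000


incumbent = System("incumbent", 3_000_000, 4, 120, 1_000_000)
cheaper = System("30% cheaper", 2_100_000, 4, 120, 500_000)

for s in (incumbent, cheaper):
    print(f"{s.name:>12}: ${s.cost_per_million_tokens(usd_per_kwh=0.08):.4f} per 1M tokens")
```

In this made-up scenario, the system that costs 30% less but produces half as many tokens per second ends up with the higher cost per token. The conclusion depends entirely on the throughput gap, which is exactly the number a buyer would need to verify independently.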

Oswarld’s Take

To be honest, I think this GTC was not a product launch but a business-model declaration. What NVIDIA is doing is the completed form of a classic platform lock-in strategy. Hardware (GPU + CPU + LPU + DPU + NIC + switches), software (CUDA + Dynamo + NemoClaw), and now even the architecture of the inference pipeline itself — NVIDIA designs all of it. It’s a structure in which, once a customer steps into this ecosystem, getting out becomes extraordinarily difficult.

The Groq acquisition is the heart of it. Groq originally built inference-focused chips that could have become an alternative to NVIDIA GPUs. The company was, after all, founded by Jonathan Ross, who built Google’s TPU. For $20 billion, NVIDIA removed a potential competitor and shored up its own weakness (inference latency) in one move. Bernstein’s Stacy Rasgon nailed it when he described the deal as “a structure that is effectively an acquisition while maintaining the fiction of competition.”

One more thing worth watching: the China strategy. The latest Vera Rubin can’t be sold in China because of export controls, but NVIDIA pulled out two cards. First, it obtained an export license from the Trump administration and restarted production of the H200. Second, it plans to launch a China-only inference chip based on Groq technology in May. This is a strategy of building optimized products within the rules rather than circumventing them — and it’s a signal that NVIDIA has no intention of ever giving up the Chinese market.

And SK Hynix’s moves can’t be left out either. It disclosed a ₩21.6 trillion (~$15 billion) infrastructure investment for stages 2–6 of phase 1 of its Yongin cluster, and JP Morgan estimates total capital expenditure could reach ₩72–86 trillion. Chairman Chey Tae-won’s remark on the GTC floor that “the chip wafer shortage will persist until 2030” shows that the supply side, too, judges this demand to be structural.

That said, I hold on to a few healthy doubts. First, the $1 trillion of visibility is ‘purchase intent,’ not ‘confirmed revenue.’ If the hyperscalers’ CapEx cycle corrects, that number can move. Second, while NVIDIA’s margin-defense logic is persuasive, the monopolistic dependence on the CUDA ecosystem could eventually come back as regulatory risk. Third, for tokenomics to work, the end consumers of tokens (the businesses and consumers actually using AI services) have to be able to bear that cost.

Closing

There are three things to remember from GTC 2026.

First, NVIDIA is no longer a chip company. It has transformed into a company that designs a full-stack ‘AI factory’ built from seven chips. Second, the Groq integration could be a game changer for the inference market. The up-to-35x gain in inference performance per watt from marrying high throughput with ultra-low latency is a structural advantage that goes beyond a spec race. Third, ‘Tokenomics’ is a new evaluation frame for AI infrastructure investment. In a world where the yardstick is cost per token rather than chip price, NVIDIA’s logic is, for now, the most persuasive.

If you want to go deeper on this topic, I recommend watching Jensen Huang’s full keynote (about 2 hours 30 minutes) yourself. Woven between the product announcements is a frame for reading the structure of the AI industry.

References & Further Reading

NVIDIA, “NVIDIA Announces Financial Results for Fourth Quarter and Fiscal 2026”, 2026.2.25. : The primary source for the financial data — FY2026 annual revenue of $215.9 billion, Q1 FY27 guidance of $78 billion, and more.
NVIDIA, “NVIDIA Kicks Off the Next Generation of AI With Rubin — Six New Chips, One Incredible AI Supercomputer”, 2026.1.5. : The official announcement of the Vera Rubin platform, laying out the composition and specs of its six chips.
CNBC, “Nvidia buying AI chip startup Groq’s assets for about $20 billion in its largest deal on record”, 2025.12.24. : Essential reading for understanding the background and structure of the Groq acquisition.
NVIDIA Technical Blog, “Inside the NVIDIA Vera Rubin Platform”, 2026.1 (Updated 2026.3.16). : The technical deep dive, updated after the Groq LPX addition. The best resource for understanding the disaggregated inference architecture.
The Next Platform, “Nvidia Finally Admits Why It Shelled Out $20 Billion For Groq”, 2026.3.17. : A technical analysis of the strategic meaning of the Groq acquisition and the Vera Rubin integration process.
Reuters, “South Korea’s SK Group chairman expects chip wafer shortage to last until 2030”, 2026.3.16. : Covers SK Group Chairman Chey Tae-won’s remarks at GTC and the news that an ADR listing is under review.
Tom’s Hardware, “Nvidia GTC 2026 keynote live blog”, 2026.3.16. : A real-time rundown of the keynote. Good for quickly grasping the flow of the full presentation.

The author, Kwangseob Ahn, is a professor of business administration at Sejong University and lead consultant at OBF (Oswarld Boutique Consulting Firm). At the university he teaches statistics and data analysis — business data management, business analytics — while in the field he leads GTM strategy and AI strategy consulting, designing the interface between technology and business. He has published academic research on memory architecture for AI dialogue systems (HEMA), and runs Daily Arxiv, a project curating global AI papers every day. He completed a master’s program at Korea University’s Graduate School of Management of Technology and holds a KMBA. He is the author of Those Who Outsource Their Thinking: Homo Brainless.

Footnotes

¹ Sovereign AI: The movement by individual nations to build their own AI infrastructure within their borders. It is a strategy to avoid depending on US Big Tech, framed around data sovereignty and national security. NVIDIA’s sovereign AI revenue topped $30 billion in FY2026.
² LPU (Language Processing Unit): An AI inference-only processor developed by Groq. Unlike a GPU, it stores data in on-chip SRAM and is specialized for generating tokens at ultra-low latency. Put simply, if a GPU is an all-purpose chef, an LPU is a master artisan who makes nothing but sushi at blinding speed.
³ SRAM (Static Random-Access Memory): Ultra-fast memory embedded inside a chip. It is far faster than HBM or DRAM but small in capacity and expensive. The Groq LP30 achieves its low latency by packing roughly 500MB of SRAM on chip.
⁴ Attention Computation: The core operation of transformer AI models. It is the process of calculating how strongly each word in the input text relates to every other word. It is essential for understanding context, but extremely compute-intensive.
⁵ Lithography Equipment: The machines that etch circuit patterns onto wafers in semiconductor manufacturing. The Netherlands’ ASML holds a de facto monopoly on cutting-edge EUV lithography equipment. A single machine costs hundreds of billions of won (hundreds of millions of dollars), but customers gladly buy it because leading-edge chips cannot be made without it.