Wall Street Commentary on GTC: In Nvidia's Definition, Computing Power Equals Revenue, Token is the New Commodity
NVIDIA’s annual GTC conference sends a core signal: the business logic of AI computing power is undergoing a fundamental restructuring—Tokens have become the new commodity, and compute power equals revenue.
At this GTC, NVIDIA management significantly raised data center sales visibility, from the previously disclosed $500 billion (through 2026) to over $1 trillion (cumulative for 2025-2027), and explicitly stated that sales of the standalone Vera CPU and of LPX rack solutions will be counted in addition. Wall Street views the conference as strong validation of NVIDIA’s ongoing AI cycle.
According to Chase Trade Desk, the latest JPMorgan report indicates this figure implies at least $50 billion to $70 billion of upside relative to Wall Street’s current consensus for data center revenue in 2026-2027.
Bank of America Securities directly quotes NVIDIA management—“Tokens are the new commodities, compute power equals revenue”—and notes that the Blackwell system has reduced the cost per token by up to 35 times compared to the previous Hopper generation, with the upcoming Rubin series expected to reduce costs by another 2 to 35 times, depending on workload types and architecture configurations.
Within NVIDIA’s narrative framework, this ongoing compression of token costs is the fundamental driver for demand scale expansion.
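The compounding effect of the quoted multipliers can be checked with simple arithmetic. The sketch below uses only the "up to" ratios cited above (Hopper to Blackwell: 35x; Blackwell to Rubin: another 2x to 35x); the normalized costs are illustrative, not NVIDIA pricing:

```python
# Illustrative only: compounds the cost-per-token multipliers quoted above.
hopper = 1.0                 # baseline cost per token (normalized)
blackwell = hopper / 35      # "up to 35x" cheaper than Hopper
rubin_low = blackwell / 2    # Rubin low end: another 2x reduction
rubin_high = blackwell / 35  # Rubin high end: another 35x reduction

print(f"Blackwell vs Hopper: {blackwell:.4f}")  # ~0.0286 of Hopper's cost
print(f"Rubin vs Hopper: {rubin_high:.6f} to {rubin_low:.4f}")
```

At the high end this implies a cumulative reduction of more than 1,000x versus Hopper, which is the scale of compression behind the "demand expansion" argument.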
Demand visibility doubles, driven by both large-scale cloud customers and enterprise markets
NVIDIA management disclosed that high-confidence purchase orders for Blackwell and Vera Rubin systems have exceeded $1 trillion, doubling from the $500 billion announced at the October 2025 GTC Data Center Conference. They also stated that additional orders and backlogs for 2027 are expected to continue accumulating over the next 6 to 9 months.
Demand structure is diversified: about 60% from hyperscale cloud providers (internal AI consumption shifting from recommendation/search workloads to large language models), with the remaining 40% from CUDA-native AI enterprises, NVIDIA cloud partners, sovereign AI, and industrial/enterprise clients.
BofA notes that the new $1 trillion outlook aligns closely with Wall Street’s prior expectation of roughly $970 billion in data center revenue over the three-year period, mirroring the October 2025 pattern, when the $500 billion outlook landed near a consensus estimate of about $450 billion.
It’s noteworthy that NVIDIA management dedicated considerable discussion at this conference to accelerating traditional enterprise workloads.
NVIDIA announced collaborations with IBM (accelerating WatsonX), Google Cloud (BigQuery acceleration, ~76% cost savings with Snap), Dell (AI data platform), and launched two major CUDA-X libraries, cuDF and cuVS.
JPMorgan believes this direction is “seriously underestimated” by the market: with Moore’s Law slowing, domain-specific acceleration is the only viable alternative, which expands NVIDIA’s addressable market beyond the AI training/inference cycle.
Groq LPU integration: the most important new product release at the architecture level
JPMorgan regards the integration of Groq 3 LPU with Vera Rubin as the “most important architecture-level new product” at this GTC.
This decoupled inference architecture pairs the Rubin GPU (high throughput, 288GB HBM4, 22TB/s bandwidth, 50 PFLOPS NVFP4) with the Groq LPU (low-latency decoding, 500MB on-chip SRAM, 150TB/s SRAM bandwidth, 1.2 PFLOPS FP8): prefill runs on Vera Rubin, attention decoding also runs on Rubin, while feedforward networks/token generation are offloaded to the Groq LPU.
The LPX rack integrates 256 LPUs, providing 128GB aggregated SRAM, 40PB/s memory bandwidth, and 315 PFLOPS inference power, expected to launch in Q3 2026.
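As a sanity check, the quoted rack totals are close to the per-chip Groq LPU figures cited above multiplied by 256; the small gaps versus the quoted 40PB/s and 315 PFLOPS likely reflect rounding or rack-level specifications:

```python
# Sanity-checking LPX rack totals against the per-chip LPU figures
# quoted above (500 MB SRAM, 150 TB/s bandwidth, 1.2 PFLOPS FP8 per LPU).
lpus_per_rack = 256

sram_gb = lpus_per_rack * 0.5               # -> 128 GB (matches 128 GB quoted)
bandwidth_pbs = lpus_per_rack * 150 / 1000  # -> 38.4 PB/s (~40 PB/s quoted)
pflops = lpus_per_rack * 1.2                # -> 307.2 PFLOPS (~315 quoted)

print(sram_gb, bandwidth_pbs, pflops)
```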
NVIDIA management stated that workloads requiring ultra-high token speeds (code generation, engineering calculations, long-context inference) will allocate about 25% of data center power to LPX, with the remaining 75% dedicated to pure Vera Rubin NVL72 configurations.
BofA data shows that after pairing Rubin systems with SRAM LPX racks, high-end low-latency workloads can see up to 35 times efficiency improvements over the previous generation. JPMorgan notes that this architecture directly addresses the fundamental contradiction where a single processor cannot optimize throughput (limited by FLOPS) and latency (limited by bandwidth) simultaneously, enabling NVIDIA to effectively compete in the high-end inference market traditionally dominated by ASIC vendors.
Copper and CPO advance in parallel, with no single-route bet
NVIDIA management directly addressed the copper cable versus co-packaged optics (CPO) debate at the conference, confirming both routes will be pursued simultaneously.
In the current Vera Rubin generation, Oberon racks use copper for scale-up to NVL72 and optics to reach NVL576; Spectrum-6 SPX co-packaged optical Ethernet switches, jointly developed with TSMC, are in mass production, with management claiming 5 times better power efficiency and 10 times higher resilience than traditional pluggable transceivers.
For Rubin Ultra (late 2027), Kyber racks will use copper NVLink scale-up (up to 144 GPUs), with CPO-based NVLink switch options as an alternative. Feynman (2028) will support both copper and CPO scale-up, with Spectrum-7 (204T, CPO) for scale-out.
BofA emphasizes that adoption of CPO-based switches and scale-out is optional for customers, who can continue using copper cables for as long as they see fit. JPMorgan agrees, expecting copper scale-up to dominate NVL72/NVL144 configurations at least until 2027, with CPO gradually gaining share in scale-out and NVL576+ configurations.
Vera CPU: a new multi-billion-dollar revenue stream targeting agentic AI
NVIDIA management explicitly stated that Vera CPU’s standalone business “has already been confirmed to become a multi-billion-dollar business,” and BofA notes this revenue stream is not yet reflected in current market consensus, representing incremental contribution.
Vera CPU features 88 self-developed Olympus ARM cores, LPDDR5X memory subsystem providing 1.2TB/s bandwidth (half the power consumption of traditional server CPUs), and connects to GPUs via NVLink-C2C at 1.8TB/s (7 times PCIe Gen 6). The Vera CPU rack integrates 256 liquid-cooled CPUs supporting over 22,500 concurrent CPU environments.
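The "7 times PCIe Gen 6" claim is consistent with comparing the 1.8TB/s NVLink-C2C link against a PCIe 6.0 x16 slot at roughly 256 GB/s; note the x16 lane count is an assumption, since the article does not specify it:

```python
# Assumes "PCIe Gen 6" means an x16 slot at ~256 GB/s raw throughput;
# the article does not state the lane count.
nvlink_c2c_gbs = 1800    # 1.8 TB/s, per the Vera CPU specs above
pcie_gen6_x16_gbs = 256  # ~256 GB/s for PCIe 6.0 x16

ratio = nvlink_c2c_gbs / pcie_gen6_x16_gbs
print(round(ratio, 1))   # ~7.0
```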
Management emphasized that CPUs are becoming a bottleneck for AI agent expansion: reinforcement learning and agent workflows require large CPU environments to test and verify GPU model outputs. Meta has deployed the previous-generation Grace CPU at scale, with Vera expected to succeed it in 2027.
JPMorgan characterizes this CPU revenue stream as high-margin and repeatable (deployed alongside GPU racks in AI factories), and structurally tied to the growth curve of agentic AI, which NVIDIA is actively promoting.
Product roadmap extended to 2028, with continuous annual architecture updates
NVIDIA reaffirmed its annual platform release rhythm: Blackwell (2024) → Blackwell Ultra (2025) → Rubin (2026) → Rubin Ultra (2027) → Feynman (2028).
Rubin Ultra will feature a 4-chip GPU configuration with 1TB HBM4e, and introduce the LP35 LPU chip (first to include NVFP4 compute). Kyber racks will support up to 144 GPUs per NVLink domain (7th gen NVLink, 3.6Tb/s per GPU, aggregate bandwidth 1.5Pb/s for NVL576).
Details of Feynman exceed market expectations:
New GPUs will use TSMC’s A16 (1.6nm) process, with chip stacking and custom HBM; new CPU named Rosa (after Rosalind Franklin), designed for orchestrating intelligent agent workloads across GPUs, LPUs, storage, and networking; new LPU named LP40, jointly developed by NVIDIA’s Groq team; also includes BlueField-5 DPU, ConnectX-10 super network card, NVLink 8, and Spectrum-7 (204T, CPO).
JPMorgan believes NVIDIA’s vertically integrated platform (now spanning seven chips, five rack systems, and supporting software stacks) is difficult to replicate, and that accelerating inference demand, along with the structural expansion of addressable markets driven by traditional workload acceleration and expanding customer base, supports a more sustained AI capital expenditure cycle than current market expectations.