Business Issue #125

Why Apple Pays Google $1 Billion a Year

Distilling Gemini instead of building it: the strategy of betting on 2 billion pockets over benchmark crowns.

Opening

Hello, subscribers. This is Oswarld’s Knowledge Talking. “Disaster.” That was the word Wedbush analyst Dan Ives attached to Apple’s AI strategy in December 2025. Needham’s Laura Martin said Apple was “1–2 years behind its competitors,” and Siri was still meme material. Then on June 8, 2026, on the WWDC stage, Apple unveiled 5 of its own foundation models[1] all at once. What deserves attention here isn’t model performance. The way Apple built them, and the structure behind it, is the real news. Personally, I think this is the most important part of WWDC 2026, but it seems to be getting buried, so I wanted to bring it up.

Bottom line up front: Apple didn’t lose the AI arms race — it was running an entirely different race all along.

🏗️ The One Company That Sat Out a $600 Billion Arms Race

The scale of Big Tech’s AI infrastructure investment right now is unprecedented in history. As of 2026, look at the annual CapEx[2] of the major players.

  • Amazon: about $200 billion
  • Microsoft: about $190 billion
  • Alphabet (Google): about $180 billion
  • Meta: about $125 billion

These 4 companies alone add up to more than $600 billion a year. That’s roughly ₩830 trillion (~$600 billion), about 1.2x the annual budget of the Korean government. Where does the money go? Buying GPUs, building data centers, and training ever-larger models. Microsoft even disclosed in a recent earnings call that about $25 billion of its annual CapEx was due to “more expensive memory component prices.” The largest infrastructure buildout in history is happening all at once.

But Apple? Its total CapEx for 2025 was $12.7 billion. The 2026 forecast is about $14 billion. That’s roughly 1/40 of the Big Tech 4 combined. Apple’s annual investment is less than what Google spends in a single quarter.
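For readers who want to check the arithmetic, here is a quick sketch using only the rounded figures quoted above. The even split of Alphabet’s spending across four quarters is my simplifying assumption, not a reported number. With the four listed figures the total comes to about $695 billion, which puts Apple nearer 1/50; using the rounded $600 billion floor instead gives about 1/43, which is close to the “roughly 1/40” above.

```python
# Annual CapEx figures quoted in the text, in billions of USD (approximate).
big_tech_capex = {
    "Amazon": 200,
    "Microsoft": 190,
    "Alphabet": 180,
    "Meta": 125,
}
apple_capex_2026 = 14  # 2026 forecast quoted in the text

total = sum(big_tech_capex.values())
ratio = total / apple_capex_2026

# Assumption: Alphabet spends evenly across four quarters.
alphabet_per_quarter = big_tech_capex["Alphabet"] / 4

print(f"Big Tech 4 total: ${total}B")
print(f"Apple spends about 1/{ratio:.0f} of that total")
print(f"Alphabet per quarter: ${alphabet_per_quarter:.0f}B, Apple per year: ${apple_capex_2026}B")
```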

Wall Street’s reaction was brutal. Wedbush’s Dan Ives called Apple’s AI strategy a “disaster,” and went as far as saying that “no one on Wall Street believes Apple is delivering meaningful AI results compared to Microsoft, Google, Meta, and OpenAI.” Quartz even reported an analysis that Apple had internally judged itself to be 2 years behind OpenAI and Google.

But when you dissect the WWDC 2026 announcements, the picture changes. It’s not that this company didn’t spend money — it spent it in a completely different way. Same destination, entirely different route.

💰 A ₩1 Trillion Tutoring Bill — Look at the Structure

In January 2026, Apple and Google officially announced a multi-year AI partnership. According to Bloomberg’s Mark Gurman, the scale is about $1 billion a year (about ₩1.4 trillion). Apple secured access to Google’s Gemini, a frontier model with 1.2 trillion parameters. Here’s where a lot of people got it wrong. The reaction was, “So Apple just outsourced its AI to Google after all?” In reality, the structure is entirely different.

The key is a technique called distillation[3]. You train a small, efficient “student model” (AFM) by referencing the outputs of a giant “teacher model” (Gemini). Right after WWDC, Apple’s senior vice president Craig Federighi put it this way.

“The amount of Google Assistant we use is zero.”

Amar Subramanya, Apple’s VP of AI, was more precise: “All models are custom builds for Apple silicon, refined using the outputs of the Gemini frontier model.”

To sum up, the structure looks like this.

  • Teacher: Google Gemini (1.2 trillion parameters)
  • Student: Apple AFM 3 series (5 models)
  • Lessons: distillation ≈ using Gemini’s outputs as training signal
  • The exam (runtime): AFM infers alone, without Gemini

Take the lessons, but sit the exam alone. By analogy: pay ₩1 trillion a year to learn from the best private tutor, but the diploma carries your own name.
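To make the teacher/student picture concrete, here is a minimal sketch of the textbook form of distillation, where the student is penalized for diverging from the teacher’s softened output distribution. This is not Apple’s pipeline: the reporting does not say whether Apple trains on Gemini’s probabilities or on its generated text, and the logits, the temperature, and the function names below are all made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three candidate next tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]     # prefers a different token

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student loss: {loss_close:.4f}")
print(f"far student loss:   {loss_far:.4f}")
```

Training would nudge the student’s weights to push this loss down. At runtime only the student is needed, which is the “sit the exam alone” part.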

This is becoming a pattern across the AI industry in 2026. Only a handful of companies build frontier models directly; everyone else distills their outputs and reworks them for their own products. Apple is the largest and most explicit adopter of this pattern. Distillation as an industry(?) standard also came up in the recent Elon Musk vs. OpenAI lawsuit.

It turns out that the same American Big Tech companies that spent years condemning DeepSeek were doing this routinely at home, too.

There’s one more thing worth noting: what this deal means from Google’s side. According to The Information’s reporting, Apple secured full access to the Gemini model inside Apple’s own data centers. This isn’t just calling an API: Apple takes the model weights themselves and runs its distillation pipeline on them. For Google, that means $1 billion a year in stable revenue, plus AFM 3 Cloud Pro running on Google’s cloud infrastructure (Google Cloud + NVIDIA GPUs). It’s a rational deal for both sides.

Fortune raised an interesting question about this structure. “Will AI models become interchangeable commodities, or a source of durable competitive advantage?” Apple is betting on the former. If models become commodities, there’s no need to spend hundreds of billions of dollars building your own — you just take the best model and distill it.

📱 The Real Weapon Isn’t the Model — It’s 2 Billion Pockets

So what did Apple pour its saved money into? On-device AI.

AFM 3 Core Advanced is the technical highlight of this announcement. It’s a model with 20 billion parameters, and Apple runs it on an iPhone with only 12GB of RAM. Ordinarily, that would be impossible, because a typical LLM has to load all of its weights into DRAM. What this model enables is “natural, emotionally expressive TTS (text-to-speech) voices, more accurate dictation, multimodal processing that understands images”; in other words, Siri-like features that all run on the device without the cloud. Technically, this is a huge deal.
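A back-of-the-envelope calculation shows why 20 billion parameters and 12GB of RAM don’t normally go together. The bit widths below are my assumptions for illustration; Apple has not, as far as this article’s sources go, stated the precision it uses, and the sketch ignores activations and other runtime memory.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


PHONE_RAM_GB = 12  # figure quoted in the text

for bits in (16, 8, 4):  # hypothetical precisions
    size = weight_footprint_gb(20, bits)
    verdict = "fits" if size < PHONE_RAM_GB else "does not fit"
    print(f"20B parameters at {bits}-bit: {size:.0f} GB ({verdict} in {PHONE_RAM_GB} GB)")
```

Even the most aggressive case here would leave only a sliver of RAM for the operating system and every other app, so keeping the whole model resident in DRAM is not a realistic option.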

When Apple first announced Apple Intelligence in 2024, the on-device model was in the 3-billion-parameter class. In 2 years the scale grew nearly 7x, and what made that possible is a technique Apple researchers developed called IFP (Instruction-Following Pruning)[4]. Here’s how it works.

  • All 20 billion parameters are stored in NAND flash[5], the very same memory where your photos and apps live.
  • When a user asks a question, a lightweight prediction block picks only the “experts” needed for that request and loads them into DRAM.
  • Only 1–4 billion parameters are actually activated per prompt. The remaining 16–19 billion stay asleep in flash.

A typical MoE[6] model has to swap experts on every token, so memory bandwidth becomes the bottleneck, but IFP makes the routing decision just once per prompt. It’s an architectural workaround for the real-world constraint of slow NAND-to-DRAM bandwidth.
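The difference between routing per token and routing per prompt is easier to see in a toy simulation that counts how many times an expert has to be pulled from flash into DRAM. This is a sketch of the idea only, not Apple’s implementation: the expert count, the number of active experts, and the random router standing in for a learned one are all hypothetical.

```python
import random

NUM_EXPERTS = 16     # hypothetical number of experts stored in flash
ACTIVE_EXPERTS = 2   # hypothetical number of experts held in DRAM at once


def loads_with_per_token_routing(num_tokens, rng):
    """Count flash-to-DRAM loads when every token picks its own experts."""
    resident = set()  # experts currently held in DRAM
    loads = 0
    for _ in range(num_tokens):
        chosen = set(rng.sample(range(NUM_EXPERTS), ACTIVE_EXPERTS))
        loads += len(chosen - resident)  # fetch only the experts not yet in DRAM
        resident = chosen                # DRAM keeps just the active set
    return loads


def loads_with_per_prompt_routing(rng):
    """Count loads when experts are picked once and reused for the whole prompt."""
    chosen = set(rng.sample(range(NUM_EXPERTS), ACTIVE_EXPERTS))
    return len(chosen)


rng = random.Random(0)  # fixed seed so runs are repeatable
per_token = loads_with_per_token_routing(num_tokens=200, rng=rng)
per_prompt = loads_with_per_prompt_routing(rng)
print(f"per-token routing:  {per_token} expert loads over 200 tokens")
print(f"per-prompt routing: {per_prompt} expert loads over 200 tokens")
```

A real token-level router is far less random than this, so the gap is exaggerated, but the shape of the trade-off is the same: the per-prompt approach pays the slow flash read once, then generates every token from what is already in DRAM.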

Here’s why this matters. Whether it’s ChatGPT, Gemini, or Claude, cloud-based AI ultimately has to send a request to a server. You need a network, and your data leaves the device. Apple’s on-device model has no such step at all. Privacy is guaranteed structurally. Heavier workloads get handed off to Apple’s own servers under Private Cloud Compute, where user data is never stored and no one — including Apple — can access it.

And there’s one number to underline here. Apple’s active device count is more than 2 billion. ChatGPT has 400 million weekly active users (as of December 2025), and Gemini around 350 million monthly. Apple isn’t building yet another AI app — it’s deploying AI via OS update to 2 billion devices already sitting in people’s pockets.

Judged purely on model performance, AFM 3 is unlikely to beat GPT-4o or Gemini 2.5 Pro. But the generational leap in Apple’s own benchmarks is clear. In Apple’s human evaluations, AFM 3 Cloud was preferred over its predecessor 64.7% to 8.7% on text tasks, and 37.8% to 9.6% on image understanding. And for the first time this year, the Foundation Models framework supports image input, so third-party app developers can integrate on-device multimodal AI directly into their Swift apps.

What this means is that Apple isn’t going for 1st place on benchmarks. It’s about delivering a “good enough” model to “the most people,” and building an app ecosystem that runs on top of it. If ChatGPT and Gemini are apps, Apple’s strategy is to embed AI into the OS those apps run on.

Oswarld’s Take

Honestly, when I watched this announcement, I saw the distribution strategy before I saw the technology.

There’s a pattern I’ve confirmed repeatedly while building technology management strategies. In tech markets, “the best product” wins less often than you’d think. What wins more often is the combination of “a good-enough product + overwhelming distribution.” That’s how VHS beat Betamax, and how Android took 70% of the smartphone market.

I think what’s happening in the AI market right now is at a similar inflection point. The performance gap between frontier models keeps narrowing. GPT-4o, Gemini 2.5, and Claude 4 trade places on benchmarks by 1–2 points (convergence at the top). The smaller the perceived difference for users, the more accessibility and integration — not model performance — become the basis of choice. That’s a signal that models are entering the commoditization stage.

If this direction is right — if the winner is decided not by the model itself but by who deploys it, where, and how — then Apple’s $14 billion strategy isn’t waste; it’s foresight. Of course, that’s a big “if.”

The risks are just as clear. At launch, AFM 3 won’t be available on iPhones and iPads in the EU or mainland China. Given that a significant share of Apple’s 2 billion active devices sits in those two markets, there’s a hole in the premise of the “maximum distribution” strategy. And whether the “good enough” model is truly good enough can only be verified once real-world use begins this fall.

Closing

To sum up: Apple didn’t lose the AI arms race. It chose the game of reworking and distributing the arms race’s output. For about $1 billion a year (more than ₩1 trillion), it distills Google’s frontier model, and it poured the savings into on-device architecture.

For this strategy to work, AI models have to keep commoditizing. If one model opens up an overwhelming lead, the strategy collapses — but judging by the trend so far, that likelihood keeps shrinking. The real question in the AI race may not be “who builds the smartest model” but “who reaches 2 billion pockets first.”

💬 Which do you think matters more in AI — “model performance” or “distribution scale”? Leave your thoughts in the comments.

The author, Kwangseob Ahn, is a professor of business administration at Sejong University and lead consultant at OBF (Oswarld Boutique Consulting Firm). He teaches statistics and data analysis — including business data management and business analytics — at the university, while leading GTM strategy and AI strategy consulting in the field, designing the interface between technology and business. He has published academic research on memory architecture for AI dialogue systems (HEMA) and runs Daily Arxiv, a project curating global AI papers every day. He completed a master’s program at Korea University’s Graduate School of Management of Technology and its KMBA. He is the author of Those Who Outsource Their Thinking: Homo Brainless.

Footnotes

  1. Foundation Model: An AI model pre-trained on large-scale data that can be used as a general-purpose base for a wide range of tasks. GPT, Gemini, and Claude are representative examples.

  2. CapEx (Capital Expenditure): The money a company invests in facilities, infrastructure, and equipment. In the AI era, GPU purchases and data center construction account for most of it.

  3. Distillation: A training technique that transfers the knowledge of a giant “teacher model” into a small “student model.” The student learns by imitating the teacher’s outputs, achieving high performance despite its small size.

  4. IFP (Instruction-Following Pruning): A dynamic pruning technique developed by Apple. It analyzes the user’s input (prompt) and activates only the parameters needed for that task. Unlike conventional pruning, which shrinks the model permanently, it can activate different parts for each request.

  5. NAND flash: The non-volatile memory that stores data in smartphones and SSDs. It’s slower than DRAM but has far greater capacity and retains data when the power is off. Apple turned this storage space into a “warehouse” for its AI model.

  6. MoE (Mixture of Experts): An architecture that places multiple “expert” networks inside a model and activates only some of them depending on the input. It reduces actual compute relative to total parameter count, making large models more efficient to run.