Why Starbucks Pulled Its AI Out of 11,000 Stores · Issue #109

Opening

Dear subscriber, picture a bottle of peppermint syrup sitting on a shelf. The vanilla syrup right next to it was recognized perfectly, but this one bottle was processed as if it didn’t exist. That was the reality of the AI inventory management system Starbucks in the US had been so proud of.

On May 19, Starbucks scrapped the AI inventory counting tool it had rolled out to 11,000 stores across North America under the Deep Brew banner, less than nine months after deployment. Let me give you the conclusion upfront: this wasn’t a failure of AI technology. It was the failure of never asking, “Is AI the right tool for this problem?”

Last year, Starbucks made a big publicity push about bringing artificial intelligence into its stores, and yet, barely more than half a year later, it pulled the whole thing out.

📦 What Happened Across 11,000 Stores

In September 2025, Starbucks deployed inventory AI from Seattle-based startup NomadGo across every store in North America. It was a system that used LiDAR¹ sensors and tablet cameras to automatically count the syrups, milk, and drink ingredients on the shelves. NomadGo claimed 8x the speed of manual counting and 99% accuracy, and Starbucks CTO Deb Hall Lefevre introduced it as something that “lets partners focus on crafting beverages and connecting with customers instead of counting inventory.”

But the field told a different story.

According to Reuters reporting, the system repeatedly confused similar-looking types of milk and failed altogether to recognize products that were plainly sitting on the shelf. Even in the promotional video Starbucks released at launch, a bottle of peppermint syrup can be seen going unrecognized between the products on either side of it.

The biggest problem was that the system fell below the trust threshold². Employees had to double-check every result the AI produced. A tool introduced to replace manual work ended up creating duplicate work instead. Before, one manual count and you were done. Now the AI counted, and then a human counted again. The tool didn’t reduce work. It only added cognitive load.
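To make the trust threshold concrete, here is a rough back-of-the-envelope sketch in Python. The 8x speedup is NomadGo’s own claim. The 60-minute manual count, and the idea that recheck time scales linearly with the share of items rechecked, are my assumptions for illustration, not reported figures.

```python
def staff_minutes(manual_minutes: float, speedup: float, recheck_share: float) -> float:
    """Staff time for one automated count followed by a manual recheck.

    recheck_share is the fraction of items a human re-counts by hand (0.0 to 1.0).
    """
    automated = manual_minutes / speedup        # time spent running the automated count
    recheck = manual_minutes * recheck_share    # time spent re-counting by hand
    return automated + recheck


def break_even_recheck_share(speedup: float) -> float:
    """Recheck share above which the tool costs more time than counting by hand."""
    return 1.0 - 1.0 / speedup


MANUAL_MINUTES = 60.0   # hypothetical length of one manual count
CLAIMED_SPEEDUP = 8.0   # vendor-claimed speedup

for share in (0.0, 0.5, 1.0):
    total = staff_minutes(MANUAL_MINUTES, CLAIMED_SPEEDUP, share)
    print(f"recheck {share:.0%} of items: {total:.1f} min vs. {MANUAL_MINUTES:.1f} min manual")

print(f"break-even recheck share: {break_even_recheck_share(CLAIMED_SPEEDUP):.1%}")
```

Under these assumptions, a tool that really is 8x faster still loses once employees recheck more than 87.5% of the items. If they recheck everything, the count takes 67.5 minutes instead of 60.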

Let me add some context. This tool was a central pillar of CEO Brian Niccol’s “Back to Starbucks” turnaround strategy. Niccol, who came over from Chipotle in September 2024, diagnosed inventory shortages as eating into sales and fast-tracked this technology — which had been in testing since his predecessor’s tenure — into every store. Meanwhile, North American operating margin had fallen from 18% two years earlier to 9.9%, and the turnaround needed speed.

NomadGo said it counted more than 186 million items across 11,000 stores over the course of 2025. The number sounds impressive on its own, but nobody disclosed how many of those counts had to be re-verified by employees.

On May 19, Starbucks declared the official shutdown in a company-wide memo. “Automated counting is retired as of today. Beverage ingredients and milk will be counted the same way as other inventory items.” The blog post from the original launch has already been deleted. Starbucks told Reuters the decision was made “to focus on consistency and execution across stores,” but it never used the word failure.

🔍 The Technology Wasn’t Wrong — the Question Was

Why did it fail? Much of the coverage blames NomadGo, the AI company. In other words, the focus is on ‘insufficient AI accuracy,’ but as I see it, there’s a more fundamental problem.

First, full-scale deployment without validation. NomadGo’s ‘99% accuracy’ was a self-measured figure, and the system was rolled out to 11,000 stores at once without independent third-party verification. Tech Times’ analysis hits the mark: “The 99% accuracy claim was not independently verified before the 11,000-store deployment.” The number 99% itself is also worth examining. With 20 items on a shelf, 99% accuracy means an expected 0.2 items wrong per count. Sounds fine, right? But across 11,000 stores, that 0.2 adds up to roughly 2,200 miscounted items in every counting round, and the total multiplies with each additional round per day. And accuracy in the field fell far short of 99%.
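How large the daily total gets depends on what you assume about counting frequency and real-world accuracy, so here is a small Python sketch for checking the arithmetic yourself. The 20 items, 99%, and 11,000 stores come from the example above. The rounds per day and the 90% field accuracy are purely hypothetical inputs of mine, not reported numbers.

```python
def expected_miscounts(items_per_count: int, accuracy: float,
                       stores: int, rounds_per_day: int) -> float:
    """Expected number of miscounted items per day across all stores."""
    error_rate = 1.0 - accuracy
    return items_per_count * error_rate * stores * rounds_per_day


ITEMS_PER_COUNT = 20   # shelf size used in the example above
STORES = 11_000

# Hypothetical scenarios: (accuracy, counting rounds per day)
scenarios = [(0.99, 1), (0.99, 3), (0.90, 1), (0.90, 3)]

for accuracy, rounds in scenarios:
    total = expected_miscounts(ITEMS_PER_COUNT, accuracy, STORES, rounds)
    print(f"accuracy {accuracy:.0%}, {rounds} round(s)/day: about {total:,.0f} miscounts/day")
```

The point of the sketch is the sensitivity. Every percentage point of lost accuracy adds another 2,200 expected miscounts per round at this shelf size, so a vendor figure that was never checked in the field can be off by an order of magnitude in practice.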

According to RAND Corporation’s analysis of more than 2,400 enterprise AI projects, 80% of AI projects fail to achieve their intended business value. That 80% breaks down as follows: 34% of all projects are halted before reaching production, 28% are completed but never generate the expected value, and 18% generate some value but can’t justify the investment. MIT’s 2025 study is even more blunt: 95% of enterprise generative AI pilots produced no measurable revenue impact.

Gartner likewise projected in a 2025 report that 60% of projects without AI-ready data would be abandoned by 2026. According to S&P Global, large enterprises (10,000+ employees) abandoned an average of 2.3 AI projects in 2025, with an average sunk cost of $7.2 million (about ₩10 billion) per abandoned project.

The distribution of failure causes is worth noting too. In an analysis of 140 enterprise AI implementations, only 23% failed due to model performance or technical integration issues. The remaining 77% were failures of strategy, governance³, and change management. It’s a people-and-organization problem, not a technology problem.

Second, the problem they set out to solve was misdefined in the first place. Starbucks’ inventory problem was never caused by inaccuracy in the act of counting. According to an in-depth Reuters report earlier this year, fewer than a third of Starbucks deliveries arrive on time, and more than 1,500 cup-and-lid combinations were tangling up the supply chain. The problem wasn’t that manual counting lacked accuracy — it was the structure of the supply chain itself.

This is a pattern that repeated under the previous CEO as well. During Laxman Narasimhan’s tenure, Starbucks partnered with o9 Solutions to introduce an ‘automated ordering’ system — and that machine learning system consistently recommended quantities below what was needed. The technology changed, but the mistake — ‘adopt a solution before defining the problem’ — stayed the same.

🎯 How to Pick the Right Tool for the Problem

What the Starbucks case shows is a principle that’s simple but frequently forgotten: not every problem needs the same grade of technology.

A cafe shelf is not a controlled environment. The lighting changes, product placement shifts constantly, and similar-looking milk cartons sit side by side. When a computer vision⁴ system is trained under uniform conditions and then deployed into a high-variability environment, its performance can degrade sharply. A logistics warehouse — where SKUs are fixed and shelf positions are standardized — and a cafe shelf that baristas rearrange throughout the day are fundamentally different environments.

People, by contrast, adapt flexibly to this environment. A barista can tell oat milk from low-fat milk at a glance and instantly account for a shelf layout that changed yesterday. Retraining⁵ an algorithm costs time and money, whereas a person processes context in real time. On top of that, a person can make the judgment “this syrup is almost out” in the same glance. That’s not simple counting. It’s judgment with context baked in.

Feedback from Starbucks store employees pinpoints exactly this. “Thank you for getting rid of automated counting. The intent was good, but the execution was difficult.”

The key point here is not ‘humans are better than AI.’ It’s that choosing a tool matched to the nature of the problem comes first. Let’s look at Starbucks’ inventory counting problem again. For this task — distinguishing visually similar, small-quantity products in a high-variability environment — there were multiple options. Beyond expensive computer vision AI, they could have mounted weight sensors on the shelves to detect changes in stock levels. A barcode scanner hooked up to a simple database might have been enough. Or, as they do now, having people count by hand might be the most accurate option of all.
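To show how little logic the weight-sensor option needs, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the gram values, and the reorder threshold are mine for illustration, and I’m not describing any system Starbucks or a vendor actually runs.

```python
def estimate_units(gross_grams: float, tare_grams: float, unit_grams: float) -> int:
    """Estimate how many units sit on a shelf scale from a single weight reading.

    gross_grams: what the scale reads right now
    tare_grams:  weight of the empty shelf or tray
    unit_grams:  weight of one full unit of the product
    """
    if unit_grams <= 0:
        raise ValueError("unit_grams must be positive")
    net = max(gross_grams - tare_grams, 0.0)   # ignore readings below the empty weight
    return round(net / unit_grams)


def needs_reorder(units: int, reorder_point: int) -> bool:
    """Flag the product once the estimated count drops to the reorder point."""
    return units <= reorder_point


# Hypothetical shelf: 500 g tray holding syrup bottles of about 1,200 g each
units = estimate_units(gross_grams=5300.0, tare_grams=500.0, unit_grams=1200.0)
print(units, needs_reorder(units, reorder_point=2))
```

A scale doesn’t care about lighting or which way the label faces, which is exactly where the camera struggled. The trade-off is that it needs one product per scale and can’t tell a full bottle from two half-empty ones, so it’s a sketch of one option, not a recommendation.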

By contrast, if the task is analyzing sales data across thousands of stores to optimize ordering patterns, that is clearly territory where AI does better. Even within the same problem of ‘inventory management,’ the optimal tool differs at every stage.

To a person holding a hammer, everything looks like a nail. If you start from the premise “we must adopt AI,” it’s easy for every problem to look like one AI can solve.

Oswarld’s Take

To be honest, the moment I saw this news, I thought of what I say every single time in my consulting work.

I currently consult on AI adoption for a range of companies, and I never recommend deep learning or generative AI unconditionally. In some cases, attaching a single sensor is far more efficient. A surprising number of tasks are perfectly solved with Excel VBA. And even when you do use AI, most of the time you don’t need a top-spec frontier model⁶. From embedding models⁷ to lightweight classifiers, placing the right model in the right spot is what real skill looks like.

There’s a pattern I’ve seen countless times while building go-to-market (GTM) strategies: the binary thinking that “digital is unconditionally superior to analog, agile beats waterfall, flat organizations are better than hierarchical ones.” Reality doesn’t work that way. For each situation and each problem, there is a tool and a methodology that fits better and works more efficiently. Finding that best fit is our real task. Starbucks just proved this simple principle at the scale of 11,000 stores.

Closing

From this case of Starbucks scrapping its AI inventory tool, there are three things to remember.

One, rolling out a vendor’s unverified 99% claim at full scale means paying for the work twice. Two, 77% of AI project failures stem from strategy, governance, and change management, not technology. Three, there’s no need for computer vision where a sensor will do, and no reason to attach a frontier model to a job VBA can handle.

When your organization is considering AI adoption, ask this question first: “What technology does this problem actually need?” And if you’re wrestling with it, reach out to me. I’ll work through the best answer with you and deliver the optimal solution. Oswarld Boutique Consulting Firm is always open. :) contact@oswarld.com

References & Further Reading

Primary sources

Reuters, “Starbucks scraps AI inventory tool across North America”, 2026.05.21. : The primary source for this newsletter. Includes the internal memo text and employee interviews.
Reuters, “Starbucks Can’t Keep Your Favorite Drink in Stock”, 2026.01.28. : An in-depth report on the structural causes of Starbucks’ supply chain problems. Rich in background data, including delivery delay rates and the 1,500+ cup-and-lid combinations.
RAND Corporation, “Enterprise AI Implementation Analysis”, 2025. : The report analyzing more than 2,400 enterprise AI projects. The original source of the 80% AI project failure figure.
MIT Sloan / Project NANDA, “The GenAI Divide: State of AI in Business”, 2025. : The study finding that 95% of enterprise generative AI pilots produced no measurable revenue impact.

Background

NomadGo, “NomadGo’s Inventory AI Brings Automated Counting to More than 11,000 Starbucks Locations”, Business Wire, 2025.09.03. : NomadGo’s original deployment announcement press release. The original source of the 99% accuracy and 8x speed claims.
Starbucks, FY2026 Q2 Earnings Release (SEC Filing), 2026.04.28. : Financial context, including the 9.9% North American operating margin and the 170bp year-over-year decline.

The author, Kwangseob Ahn, is a professor of business administration at Sejong University and lead consultant at OBF (Oswarld Boutique Consulting Firm). At the university he teaches statistics and data analysis, including business data management and business analytics, while in the field he leads GTM strategy and AI strategy consulting, designing the interface between technology and business. He has published academic research on memory architecture for AI dialogue systems (HEMA) and runs Daily Arxiv, a project curating global AI papers every day. He completed a master’s program at Korea University’s Graduate School of Management of Technology and its KMBA. He is the author of Outsourced Minds: Homo Brainless.

Footnotes

¹ LiDAR (Light Detection and Ranging): A technology that fires lasers and measures the reflected light to determine an object’s distance and shape. It’s the 3D spatial-recognition sensor found on the back of iPhone Pro models.
² Trust Threshold: The minimum accuracy level at which users can accept a system’s output without separate verification. Below this line, humans have to re-verify everything the system produces, and the tool’s efficiency disappears.
³ Governance: The decision-making framework and management structure an organization uses when adopting and operating technology or projects. It covers who decides, by what criteria things are evaluated, and how problems are handled when they arise.
⁴ Computer Vision: Technology in which AI analyzes images or video captured by cameras to recognize and classify objects. It’s used in facial recognition, autonomous vehicles’ surroundings detection, and more.
⁵ Retraining: The process of additionally training an AI model so it can adapt to new environments or data. It’s required every time field conditions change, and it costs time and money.
⁶ Frontier Model: The highest-performing class of large-scale AI models, like GPT-5.5, Claude, and Gemini. Their performance is outstanding, but in terms of cost and processing speed they aren’t suitable for every task.
⁷ Embedding Model: A lightweight model specialized in converting text or images into numerical vectors to compute similarity. For tasks like search, classification, and recommendation, it’s faster and cheaper than a frontier model.