AI for Marketing

Your ROAS Measures What Ads Touched, Not What They Caused

By Alexa Matveeva

When eBay shut off its brand-keyword search ads, roughly 99.5 percent of the clicks paid search had been buying simply moved to the free organic link below. When it switched off non-brand search ads in a controlled geo experiment, the measured return was not merely low but negative: the best estimate on the experimental variation came in at negative 63 percent, statistically different from zero. Separately, Uber's former head of performance marketing turned off about $100 million of a roughly $150 million annual budget and saw basically no change in rider app installs. The installs the dashboards credited to paid reappeared as organic.

The number your ad platform calls ROAS is not a measure of what your advertising caused. It is a measure of what your advertising touched. That is correlation reported as return, and it runs systematically high, with the largest errors on the channels marketers trust most: branded search and retargeting. The fix is not a better dashboard; it is measuring cause directly, and as of 2025 that is cheaper and easier to automate than ever.

Why the dashboard runs high

The overstatement is the predictable output of how the system works. Optimization engines deliberately serve ads to users already predicted to convert, so exposure correlates with conversion no matter what the ad did. Gordon and colleagues call this targeting-induced endogeneity. Retargeting goes further, chasing cart abandoners who have already revealed intent, then claiming last-touch credit for purchases that were going to happen anyway. Branded and navigational search cannibalizes organic traffic the brand would have captured for free. And because each walled garden credits the conversions it touched within its own view, reported conversions across Meta, Google and the rest routinely sum to more than the conversions that actually occurred. Better data does not fix a biased method either. The same researchers ran 15 randomized trials at Facebook, covering 500 million observations and 1.6 billion impressions, and found observational attribution off by a factor of three in half of them, even with samples of up to 140 million users.
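To see how targeting alone manufactures apparent lift, here is a toy simulation. It is not Gordon and colleagues' method, and every parameter (the intent distribution, the 0.7 targeting cutoff, the 10 percent holdout) is a hypothetical assumption. The ad's true effect is set to zero, yet a naive exposed-versus-unexposed comparison still reports a large lift, while a randomized holdout drawn from the same targeted audience does not.

```python
import random


def simulate_targeting(n_users=200_000, true_effect=0.0, holdout_share=0.1, seed=7):
    """Toy model (hypothetical parameters): the optimizer serves the ad only to
    high-intent users. With true_effect=0.0 the ad changes nothing at all."""
    rng = random.Random(seed)
    counts = {"exposed": [0, 0], "unexposed": [0, 0], "holdout": [0, 0]}  # [conversions, users]

    for _ in range(n_users):
        intent = rng.random()                 # latent purchase intent in [0, 1)
        base_p = 0.01 + 0.09 * intent         # baseline conversion rises with intent
        if intent > 0.7:                      # optimizer targets predicted converters
            if rng.random() < holdout_share:  # randomly withheld, never sees the ad
                group, p = "holdout", base_p
            else:
                group, p = "exposed", base_p + true_effect
        else:
            group, p = "unexposed", base_p
        counts[group][0] += int(rng.random() < p)
        counts[group][1] += 1

    rate = {g: conv / n for g, (conv, n) in counts.items()}
    return {
        # Observational "lift": exposed users vs. everyone the targeter skipped
        "naive_lift": rate["exposed"] / rate["unexposed"] - 1,
        # Experimental lift: exposed users vs. a random holdout of the same audience
        "experimental_lift": rate["exposed"] / rate["holdout"] - 1,
    }


result = simulate_targeting()
print(result)
```

With these assumptions the naive lift lands around 100 percent and the experimental lift near zero, which is the gap the Facebook trials found in the wild.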

Branded search is the textbook case, but do not copy the number

The eBay study (Blake, Nosko and Tadelis, Econometrica 2015) is the anchor. Brand-keyword ads showed no measurable short-term benefit, paid search returns came in at a fraction of the non-experimental estimates, and the geo design on non-brand keywords produced that negative 63 percent figure. The instinct is to apply eBay's results to your own brand spend. Do not. A replication at Edmunds.com found that less than half of paid search traffic was recovered through organic when ads went dark, and in paid-heavy local markets up to 72 percent of branded traffic was lost. Same tactic, opposite conclusion. Incrementality is firm- and context-specific, which is exactly why you measure your own instead of importing someone else's multiplier.
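Measuring your own substitution rate is simple arithmetic once ads go dark in a test market. The sketch below assumes you have click counts for the test market before and after the pause, plus the organic trend in untreated control markets over the same window; the function name and all figures are hypothetical, not eBay's or Edmunds' data.

```python
def organic_recovery(paid_clicks_before, organic_before, organic_after, control_trend=1.0):
    """Share of paid clicks that reappear as organic clicks once ads go dark.

    control_trend is how much organic traffic moved in untreated control
    markets over the same window (1.0 = flat), so seasonality is netted out.
    """
    expected_organic = organic_before * control_trend  # organic clicks had nothing changed
    recovered = organic_after - expected_organic        # extra organic clicks after the pause
    recovery_rate = recovered / paid_clicks_before
    return {
        "recovery_rate": recovery_rate,
        "incremental_share": 1 - recovery_rate,  # share of paid clicks the ads actually caused
    }


# Two hypothetical markets: same tactic, very different answers
high = organic_recovery(paid_clicks_before=10_000, organic_before=40_000, organic_after=49_950)
low = organic_recovery(paid_clicks_before=10_000, organic_before=40_000, organic_after=44_500)
print(high)  # recovery near 0.995: paid search was mostly buying clicks you already owned
print(low)   # recovery near 0.45: more than half of the paid clicks were incremental
```

The first market looks like eBay, the second like Edmunds; only your own test tells you which one you are.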

The effect holds at nine-figure scale

This is not a curiosity visible only in clean experiments. P&G cut more than $100 million in digital in a single quarter of 2017, roughly $200 million across the year, and the CFO reported no reduction in the growth rate, calling the cut spend largely ineffective. JPMorgan Chase cut the sites running its programmatic display ads from about 400,000 to 5,000, a 99 percent reduction, after finding only 3 percent of sites drove any action beyond an impression, and its CMO saw no deterioration in performance. Uber's episode added a darker layer, exposing click-flooding, install hijacking and attribution fraud beneath the credited installs. Three of the largest advertisers in the world cut deep, two of them by nine figures, and watched their business outcomes hold.

This is not a reason to distrust every ad

Blanket cynicism gets it wrong. The same discipline that exposes the overstatement also vindicates real channels. Haus analyzed 640 Meta incrementality tests run since 2024 by brands spending roughly $14 million a year on average, and found Meta drove about 19 percent average lift to the primary KPI, with 77 of the 100 highest-lift experiments in its dataset coming from Meta. More telling, the same data showed Meta's 7-day-click in-platform attribution under-reporting its own incrementality by about 15 percent among DTC brands. The dashboard is wrong in both directions depending on funnel stage. You cannot tell which of yours works without measuring cause. One honesty flag: these benchmarks come largely from a single vendor's DTC dataset, so treat them as directional, not constants.
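Lift and under- or over-reporting both fall out of one holdout test. The sketch below is not Haus's methodology: it assumes a randomized user-level test and holdout, uses a normal-approximation interval on the difference in conversion rates, and compares the incremental conversions with what the platform claimed. The function name, the "incrementality factor" label and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def holdout_lift(conv_test, n_test, conv_holdout, n_holdout,
                 platform_attributed=None, confidence=0.95):
    """Relative lift of a test group over a randomized holdout, with a
    normal-approximation confidence interval on the absolute difference."""
    p_t, p_h = conv_test / n_test, conv_holdout / n_holdout
    diff = p_t - p_h
    se = sqrt(p_t * (1 - p_t) / n_test + p_h * (1 - p_h) / n_holdout)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    incremental = diff * n_test                       # conversions the ads caused in the test group
    result = {
        "relative_lift": diff / p_h,
        "abs_diff_ci": (diff - z * se, diff + z * se),
        "incremental_conversions": incremental,
    }
    if platform_attributed is not None:
        # Above 1: the platform under-credits itself. Below 1: it over-credits.
        result["incrementality_factor"] = incremental / platform_attributed
    return result


# Hypothetical 90/10 test where the platform claims 1,500 conversions
test = holdout_lift(10_710, 900_000, 1_000, 100_000, platform_attributed=1_500)
print(test)
```

Here the relative lift is about 19 percent, roughly 1,710 conversions are incremental, and the factor of about 1.14 says this platform under-reported itself. The same code returns a factor well below 1 for a channel that over-claims.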

Why this matters more in 2026 than in 2021

The signal platforms used to fake good attribution has eroded while the cost of guessing wrong has climbed. Apple's App Tracking Transparency cut user-level iOS signal to a minority, with opt-in rising only from around 16 percent in 2021 to roughly 25 percent by mid-2022, and Meta told investors the iOS hit was on the order of $10 billion for 2022. Google reversed Chrome third-party cookie deprecation in April 2025 and wound down its Privacy Sandbox APIs by October, but Safari and Firefox, about 36 percent of traffic, still block cookies by default, so deterministic tracking keeps decaying. Meanwhile ecommerce customer acquisition cost rose roughly 40 to 60 percent from 2023 to 2025, Google Shopping CPCs jumped about 33.7 percent in 2025, and Meta Q4 CPMs hit a record high of about $22.98. Attribution is less accurate exactly as each misallocated dollar gets more expensive.

How to measure what your ads actually cause

The replacement for trust is triangulation. Establish ground truth with experiments, run always-on allocation with a model calibrated to those experiments, and demote platform metrics to in-flight optimization. Four designs do the work.

  • User-level conversion lift with ghost ads is the cleanest, splitting users into exposed and holdout groups. Ghost ads (Johnson, Lewis and Nubbemeyer, JMR 2017) beat older PSA holdouts on cost and precision while surviving algorithmic delivery.
  • Geo experiments with synthetic control are the post-ATT workhorse. They randomize by region, so the platform does not pick who is exposed, and they capture offline outcomes too. Google's open-source CausalImpact (Brodersen et al., 2015) implements a Bayesian structural time-series version of this counterfactual approach.
  • Switchback tests toggle spend on and off over time, for marketplaces where geo splits are not feasible.
  • Marketing mix modeling triangulates the full mix across online and offline channels, and is best calibrated with experimental priors using Google Meridian, Meta Robyn or PyMC-Marketing.

Google cut its minimum incrementality-test budget to about $5,000 in May 2025, down from the $50,000 a month these tests once required.
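The geo-experiment logic can be sketched in a few lines. This is a deliberately simple scaled-control estimator for one test market and one control market, not synthetic control and not CausalImpact's Bayesian structural time-series model, both of which weight many control regions and report uncertainty. The function name, inputs and weekly figures are hypothetical.

```python
def geo_lift(test_pre, control_pre, test_post, control_post):
    """Scaled-control counterfactual for a single test/control market pair.

    Assumes the pre-period ratio of test to control sales would have held in
    the post period had nothing changed, so ratio * control_post is the test
    market's counterfactual.
    """
    ratio = sum(test_pre) / sum(control_pre)
    counterfactual = [ratio * c for c in control_post]
    # Pre-period fit check: mean absolute percentage error of the scaled control
    pre_fit = [ratio * c for c in control_pre]
    mape = sum(abs(t - f) / t for t, f in zip(test_pre, pre_fit)) / len(test_pre)
    incremental = sum(test_post) - sum(counterfactual)
    return {
        "incremental": incremental,
        "relative_lift": incremental / sum(counterfactual),
        "pre_period_mape": mape,
    }


# Hypothetical weekly sales; spend changed only in the test market after week 4
geo = geo_lift(test_pre=[100, 110, 105, 95], control_pre=[198, 224, 206, 192],
               test_post=[120, 115], control_post=[210, 200])
print(geo)
```

In a go-dark test, where spend is cut in the test market, a negative `incremental` is the volume the ads had been driving. A large pre-period error is a warning that the control market is a poor match and the estimate should not move budget.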

AI changes the operating model here, not the math. Claude can draft the analysis plan, pick matched test and control markets, run the power analysis before you spend, interpret CausalImpact or Meridian output, and translate iROAS and iCAC into reallocation calls. Anthropic reports that adding structured Skills lifted Claude's analytics accuracy from no better than 21 percent on its evals to consistently above 95 percent. The caveat every practitioner repeats still holds: LLMs hallucinate numbers, so treat the output as a junior analyst's draft and verify every figure against one governed source. No fully audited end-to-end case of an LLM running an incrementality test exists yet; the workflow is real, but the public evidence is still vendor-led.
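Whoever runs it, the power analysis answers one question before you spend: how many users a holdout test needs to detect the lift you care about. This sketch uses the textbook sample-size formula for a two-sided two-proportion z-test with two equal arms; it is not what Claude or the skill computes internally, unequal splits such as 90/10 and geo tests need different calculations, and the baseline and lift values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def users_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Users needed in each of two equal arms to detect a relative lift in a
    conversion rate with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n_10 = users_per_arm(0.02, 0.10)  # 2% baseline, detect a 10% relative lift
n_5 = users_per_arm(0.02, 0.05)   # same baseline, detect a 5% relative lift
print(n_10, n_5)
```

Halving the detectable lift roughly quadruples the required sample, which is why a test sized for a big effect quietly fails to find a small one.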

The reallocation rules (copy this)

Use measured incremental ROAS, not platform ROAS, to gate budget. Then act on four thresholds.

  • If a channel's measured iROAS comes back 30 to 50 percent or more below its platform ROAS, reallocate away from it.
  • If branded-search substitution stays under about 50 percent in your own test (the Edmunds pattern, not eBay), keep defensive brand bidding.
  • If a holdout shows near-zero lift (the Uber and P&G pattern), cut the spend and redeploy it.
  • If your experiment and your MMM disagree by more than the model's credible interval, trust the experiment and re-specify the model.
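The four rules above can be encoded as a small decision function so every channel review applies them the same way. This is a sketch under stated assumptions: the class and function names are hypothetical, the 30 percent gap is the lower end of the stated 30 to 50 percent range, the 2 percent "near-zero" band is an assumed value, and the MMM rule is read as "the experiment's iROAS falls outside the model's credible interval".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ChannelResult:
    platform_roas: float                     # ROAS the ad platform reports
    measured_iroas: float                    # incremental ROAS from your own experiment
    lift: float                              # relative lift from the holdout test
    lift_ci: Tuple[float, float]             # confidence interval on that lift
    brand_recovery: Optional[float] = None   # organic recovery rate (branded search only)
    mmm_iroas_interval: Optional[Tuple[float, float]] = None  # MMM credible interval


def reallocation_calls(r: ChannelResult, max_gap=0.30, near_zero=0.02):
    """Apply the four reallocation rules; thresholds are adjustable assumptions."""
    calls = []
    # Rule 1: measured iROAS at least max_gap below platform ROAS
    if r.measured_iroas <= (1 - max_gap) * r.platform_roas:
        calls.append("reallocate_away")
    # Rule 2: organic recovers under about half of branded clicks, so ads defend real traffic
    if r.brand_recovery is not None and r.brand_recovery < 0.5:
        calls.append("keep_brand_bidding")
    # Rule 3: lift is small and its interval includes zero
    low, high = r.lift_ci
    if low <= 0 <= high and abs(r.lift) < near_zero:
        calls.append("cut_and_redeploy")
    # Rule 4: experiment lands outside the MMM's credible interval
    if r.mmm_iroas_interval is not None:
        lo_i, hi_i = r.mmm_iroas_interval
        if not lo_i <= r.measured_iroas <= hi_i:
            calls.append("trust_experiment_respecify_mmm")
    return calls


# Hypothetical channels
retargeting = ChannelResult(platform_roas=6.0, measured_iroas=1.2, lift=0.01,
                            lift_ci=(-0.02, 0.04), mmm_iroas_interval=(2.0, 3.5))
brand_search = ChannelResult(platform_roas=10.0, measured_iroas=8.0, lift=0.25,
                             lift_ci=(0.15, 0.35), brand_recovery=0.45)
print(reallocation_calls(retargeting))   # all three warning calls fire
print(reallocation_calls(brand_search))  # only keep_brand_bidding
```

Returning a list rather than a single verdict keeps conflicting signals visible, for example a channel that both over-claims and disagrees with the MMM.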

The full test-design workflow (picking matched markets, running the power analysis, sizing the holdout, setting duration and reading the lift output) is packaged as a reusable Claude skill.

What to do Monday

Stop calling the dashboard number a return. Relabel platform ROAS as an in-platform optimization signal, and add incremental ROAS and incremental CAC as the metrics that move budget. Then run your first holdout on the highest-spend, most-suspect channel you have, almost certainly branded search or retargeting, before touching anything else. The companies that cut nine figures and lost nothing did not get lucky. They measured cause, found they were paying for demand they already owned, and kept the money.
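The relabeling amounts to reporting three numbers side by side and letting only two of them gate budget. A minimal sketch, assuming incremental revenue and incremental customers come from a holdout or geo test; the function name and figures are hypothetical.

```python
def channel_scorecard(spend, platform_attributed_revenue, incremental_revenue, incremental_customers):
    """Keep platform ROAS for in-flight optimization; gate budget on iROAS and iCAC."""
    return {
        "platform_roas": platform_attributed_revenue / spend,  # optimization signal only
        "iroas": incremental_revenue / spend,                  # revenue the spend caused, per dollar
        # cost per customer the spend caused; unbounded if it caused none
        "icac": spend / incremental_customers if incremental_customers > 0 else float("inf"),
    }


# Hypothetical branded-search line: the dashboard says 5.0, the holdout says 1.8
card = channel_scorecard(spend=50_000, platform_attributed_revenue=250_000,
                         incremental_revenue=90_000, incremental_customers=600)
print(card)
```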

Sources: Blake, Nosko and Tadelis, Econometrica 2015 (eBay field experiments); Coviello and Simonov, paid search replication at Edmunds.com; Gordon, Zettelmeyer, Bhargava and Chapsky, Marketing Science 2019 (15 Facebook RCTs); Johnson, Lewis and Nubbemeyer, JMR 2017 (Ghost Ads); Brodersen et al., 2015 (Google CausalImpact); Haus incrementality benchmark, 640 Meta tests since 2024 (single-vendor DTC data, directional); public statements from Kevin Frisch (Uber), Jon Moeller (P&G) and Kristin Lemkau (JPMorgan Chase); Google Meridian, Meta Robyn, PyMC-Marketing; Anthropic engineering blog on Claude Skills and analytics accuracy.
