Why Most Companies Can't Tell If Their Decisions Actually Work
Updated: December 20, 2025
In 1854, London physician John Snow faced a puzzle. Cholera was ravaging Soho, and the medical establishment blamed "miasma" – bad air. Snow suspected something else. He mapped every cholera death and noticed a pattern: they clustered around a single water pump on Broad Street. Remove the pump handle, he argued, and the deaths would stop.
The local officials were skeptical. Correlation wasn't causation. Maybe sick people just happened to live near that pump. But Snow had done something revolutionary: he'd reasoned backward from effect to cause, accounting for alternative explanations. When the handle came off, the outbreak ended within days.
Today's business leaders face Snow's problem at scale. Your marketing spend correlates with revenue, but does it cause it? Engaged users churn less, but does engagement prevent churn or do happy customers simply engage more? Get this backward and your interventions waste millions.
The mathematical tools for separating cause from correlation – what statisticians call causal inference – have moved from academic curiosity to business infrastructure. Market estimates put the category at $40-80 billion in 2024, with projected annual growth of 40%. Tech giants report 15-25% efficiency gains in marketing, 5-10% margin improvements in pricing, and 20% increases in customer lifetime value.
Yet 60% of organizations remain unable to use these methods at scale. The gap isn't about statistics. It's about misunderstanding what problem these tools solve – and treating an organizational capability as a software deployment.
Machine learning taught us a seductive lesson: throw enough data at an algorithm and truth emerges. The new wave of causal analysis tools appears to promise the same. Companies now deploy platforms that claim to automatically discover what drives their business outcomes from observational data alone.
This is mostly fiction.
The fundamental barrier is mathematical, not computational. From observational data, multiple causal explanations generate identical statistical patterns. Does A cause B, or does B cause A, or does some hidden factor C cause both? The data cannot tell you. Three completely different causal structures produce the same correlations, the same distributions, the same prediction accuracy.
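A few lines of simulation make the barrier concrete. The sketch below (the variable names and coefficients are purely illustrative) generates data from three different structures – A causes B, B causes A, and a hidden C drives both – and all three yield essentially the same correlation between A and B, even though only the first implies that intervening on A would move B.

```python
# A minimal sketch: three data-generating processes with different causal
# structures produce near-identical observed correlations between A and B.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structure 1: A causes B
a1 = rng.normal(size=n)
b1 = 0.7 * a1 + rng.normal(scale=np.sqrt(1 - 0.7**2), size=n)

# Structure 2: B causes A
b2 = rng.normal(size=n)
a2 = 0.7 * b2 + rng.normal(scale=np.sqrt(1 - 0.7**2), size=n)

# Structure 3: a hidden confounder C drives both A and B
c = rng.normal(size=n)
a3 = np.sqrt(0.7) * c + rng.normal(scale=np.sqrt(1 - 0.7), size=n)
b3 = np.sqrt(0.7) * c + rng.normal(scale=np.sqrt(1 - 0.7), size=n)

for label, a, b in [("A -> B", a1, b1), ("B -> A", a2, b2), ("A <- C -> B", a3, b3)]:
    print(f"{label}: corr(A, B) = {np.corrcoef(a, b)[0, 1]:.3f}")
# All three print roughly 0.70, yet "raise A" changes B only in the first structure.
```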
Discovery algorithms – with names like NOTEARS, PC, and GES – market themselves as solving this problem. In practice, they make strong assumptions that fail silently in real data: that you've measured all relevant variables (you haven't), that causal relationships don't coincidentally cancel out (they do), that the world is linear (it isn't). Run the same algorithm twice with slightly different settings and you get different answers. Which one is true? The algorithm cannot say.
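The instability is easy to reproduce. The sketch below is a deliberately simplified stand-in for a constraint-based discovery step – not NOTEARS, PC, or GES themselves, just significance-thresholded correlation tests on simulated data – showing how an analyst-chosen setting determines which edges "exist", with nothing in the output to say which answer to trust.

```python
# Toy illustration of setting-sensitivity in structure recovery: the same data
# can yield different "discovered" edges when the significance threshold changes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 300  # modest sample, as in many real business datasets

# Ground truth: X1 -> X2 -> X3, with a deliberately weak second edge
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
x3 = 0.12 * x2 + rng.normal(size=n)
data = {"X1": x1, "X2": x2, "X3": x3}

def skeleton(data, alpha):
    """Keep an undirected edge wherever the marginal correlation is 'significant'."""
    names = list(data)
    edges = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r, p = stats.pearsonr(data[u], data[v])
            if p < alpha:
                edges.append((u, v))
    return edges

for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: edges = {skeleton(data, alpha)}")
# Depending on the threshold (and the random draw), the weak X2-X3 link flips in
# and out of the recovered skeleton, and nothing here orients the edges it keeps.
```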
The uncomfortable reality: understanding causality requires human judgment about how the world works. AI now accelerates this process by extracting causal relationships from documentation and generating hypotheses comparable to those of expert scholars. But it doesn't eliminate the judgment – it makes it faster and more systematic. A 2024 Nature study found that hybrid workflows combining AI with human validation outperformed either alone.
These tools automate calculation, not understanding. Companies treating them as black boxes will systematically encode their own misconceptions into production systems.
Companies treat A/B testing and causal analysis as alternatives. This is the wrong frame. They solve different problems.
A/B testing creates measurements – you can log entirely new outcome metrics from experimental groups, discover behaviors you weren't tracking. Causal analysis works with measurements you already have. It cannot discover new phenomena; it explains patterns in historical records.
Netflix, Uber, and Amazon use both. A/B testing for micro decisions where experiments run fast (feature changes, pricing tweaks). Causal analysis for strategic questions where experiments are impossible (platform architecture, market entry, infrastructure investment worth tens of millions).
The distinction determines what you can learn. If you need to know whether something new works, experiment. If you need to understand what already happened and simulate alternatives, analyze causally. Companies forcing a choice will make systematically worse decisions than those combining both.
Even when companies understand these distinctions, most underestimate how statistical assumptions break in production.
Standard causal analysis rests on four pillars: no unmeasured confounders (you've captured everything that matters), no interference between units (treating one customer doesn't affect others), overlap (every type of customer could receive each treatment), and clean measurement. Textbooks address each violation separately. Production systems violate all four simultaneously.
Consider a marketplace platform analyzing seller pricing elasticity. Unmeasured confounders are guaranteed – you don't capture seller cash flow pressures, competitor moves, inventory constraints. Interference is pervasive – one seller's price affects others' sales through marketplace dynamics. Overlap is thin – luxury sellers rarely drop prices to discount levels. Measurement is noisy – logged prices don't capture actual transaction terms.
Methods exist to handle each problem individually: sensitivity analysis for confounding (quantifying how strong unmeasured variables would need to be to flip conclusions), network models for interference, trimming techniques for overlap violations, error correction for measurement issues. But when all four operate together, their effects multiply in ways not yet fully understood.
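As a sketch of what handling just one of these violations looks like in practice, the code below estimates propensity scores, checks how much of the sample falls outside a common-support band, and trims the rest. The marketplace variables and the 0.05/0.95 cutoffs are illustrative choices, not a recipe.

```python
# Overlap diagnostic and trimming: estimate propensity scores, flag units that
# almost never (or almost always) receive treatment, and report what trimming costs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

# Toy marketplace: luxury sellers almost never run discounts -> thin overlap
luxury = rng.binomial(1, 0.3, size=n)
volume = rng.normal(size=n)
p_discount = np.where(luxury == 1, 0.02, 0.45)  # near-violation of overlap
discount = rng.binomial(1, p_discount)

X = np.column_stack([luxury, volume])
propensity = LogisticRegression(max_iter=1000).fit(X, discount).predict_proba(X)[:, 1]

# Diagnostic: how much of the sample sits outside a common-support band?
low, high = 0.05, 0.95
outside = (propensity < low) | (propensity > high)
print(f"share outside [{low}, {high}] propensity band: {outside.mean():.1%}")

# Trimming keeps the estimate honest but shrinks the population it describes:
# here the causal claim no longer covers luxury sellers at all.
kept = ~outside
print(f"units retained after trimming: {kept.mean():.1%}")
```

Handling any single violation this way is tractable; it is the interaction of all four at once that remains poorly characterized.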
This isn't theoretical. Recent work on flexible causal methods – techniques that handle complex relationships – shows they become black boxes when assumptions break. Good prediction performance gives false confidence that causal estimates are valid. They're not the same thing.
A 2025 study found that agreement between optimal model selection and ground truth drops from 90% to 60% as these violations accumulate. The degradation isn't linear – it's a spiral. Companies running causal inference without systematic sensitivity testing are building on sand.
Most companies skip this work. They get a causal estimate, put it in a dashboard, and make decisions. When the intervention fails eighteen months later, they blame execution rather than analysis. The pattern repeats because nobody connects the dots backward.
The frontier organizations – quantitative hedge funds, pharmaceutical companies running observational studies, tech platforms operating at billion-user scale – distinguish themselves not by avoiding violations (impossible) but by quantifying them systematically. They run sensitivity analyses: how strong would unmeasured confounding need to be to flip our conclusion? If the answer is "weak," the causal claim is fragile. They validate observational analysis with small parallel experiments where feasible. When the two conflict, they investigate why rather than choosing the result they prefer.
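One standard way to make that question quantitative is the E-value of VanderWeele and Ding: the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both treatment and outcome to explain away an observed effect. A minimal sketch with made-up numbers:

```python
# E-value sensitivity analysis for unmeasured confounding. The risk ratios
# below are invented for illustration.
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (point estimate or CI limit)."""
    rr = 1 / rr if rr < 1 else rr        # symmetric treatment of protective effects
    return rr + math.sqrt(rr * (rr - 1))

observed_rr = 1.30          # e.g. "the campaign raised conversion by 30%"
ci_lower = 1.05             # confidence limit closest to the null

print(f"E-value (estimate): {e_value(observed_rr):.2f}")   # ~1.92
print(f"E-value (CI bound): {e_value(ci_lower):.2f}")      # ~1.28
# Reading: a confounder tied to both treatment and outcome by a risk ratio of
# ~1.9 could fully explain the estimate; one of ~1.3 could push the confidence
# interval to include "no effect". If confounders of that strength are
# plausible (they often are), the causal claim is fragile.
```

Frontier teams run this kind of calculation as a matter of routine; most organizations never do.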
This gap between frontier firms and everyone else will widen over the next decade, but not for the reasons most people expect.
Three forces will reshape how companies understand causality.
AI will remove one bottleneck while creating another. Specifying causal relationships historically took domain experts months of collaborative mapping. AI now parses documentation, extracts candidate relationships, and generates testable hypotheses in days. A 2024 Nature study found hybrid workflows – AI extraction plus human validation – produce insights comparable to those of expert scholars. But this accelerates the wrong part. AI makes graph construction faster; it doesn't make the graphs more correct. Organizations now encode their misconceptions about causality at 10x speed. Without systematic validation, they're automating errors rather than insights. The bottleneck shifts from expert time to expert judgment – and most companies have far less of the latter than they think.
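A minimal sketch of the validation half of that workflow, assuming a hypothetical extraction step has already proposed candidate cause-and-effect pairs (the edge list and sign-off status below are illustrative): structural checks run automatically, and every edge the downstream analysis relies on has to trace back to an explicit expert decision.

```python
# Validating a machine-proposed causal graph before anyone estimates anything.
import networkx as nx

# Candidate edges proposed by a hypothetical extraction step
candidate_edges = [
    ("ad_spend", "site_traffic"),
    ("site_traffic", "signups"),
    ("signups", "revenue"),
    ("revenue", "ad_spend"),        # budget feedback loop -> creates a cycle
]
# Edges a domain expert has actually reviewed and approved
approved = {("ad_spend", "site_traffic"), ("site_traffic", "signups")}

graph = nx.DiGraph(candidate_edges)

# Structural check: a causal DAG cannot contain cycles; feedback has to be
# modelled explicitly (e.g. with time-indexed variables), not left implicit.
if not nx.is_directed_acyclic_graph(graph):
    print("candidate graph contains a cycle -- restructure before any estimation")

# Judgment check: every edge the analysis relies on should be traceable to an
# explicit human decision, not just to the extraction step.
unreviewed = [e for e in candidate_edges if e not in approved]
print("edges awaiting expert sign-off:", unreviewed)
```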
Regulatory pressure will create a compliance-driven adoption wave separate from the performance-driven one. The EU AI Act requires explainability for high-stakes decisions. Black-box machine learning struggles here; causal models naturally generate explanations in terms of interventions and counterfactuals. Financial services, healthcare, and government will adopt these methods not because they're better analytically (though they often are) but because regulators increasingly require them.
Yet organizational readiness will remain the binding constraint. The limiting factor isn't methodology or tooling – it's whether you can build the competency. Success requires domain scientists who understand causal structure, data scientists who understand methods, and decision-makers who understand quantified uncertainty. All three must collaborate routinely on assumption validation and sensitivity analysis. This takes 18-36 months to build, minimum. Most companies skip it and treat causal analysis as a dashboard deployment. They get confident-looking estimates from platforms that hide the assumptions underneath. Nobody validates whether measured confounders are sufficient, whether interference exists, whether overlap holds. The numbers look precise. The decisions fail.
The market will bifurcate. Frontier firms – primarily large tech platforms, pharmaceutical companies, and quantitative hedge funds – will achieve 50%+ of decisions informed by causal reasoning. They already have the prerequisites: rich instrumentation, experimentation culture, statistical talent, and high-stakes decisions where causal understanding justifies investment.
Everyone else will struggle. Mid-market enterprises will deploy platforms but lack the expertise to validate assumptions or interpret sensitivity analyses. They'll get confident-looking causal estimates that are quietly wrong. Some will catch this through failed interventions; most won't, because they'll lack the experimental infrastructure to validate claims.
The probable scenario (60% likelihood): institutional adoption accelerates through 2027 as AI-augmented tools make setup faster and regulatory requirements create pull. By 2030, causal analysis becomes table stakes for regulated industries and large platforms. But the majority of companies remain unable to execute well, creating a capability gap similar to what we saw with machine learning a decade ago.
Alternative scenarios: persistent fragmentation (20% likelihood) where organizational readiness improves slowly and causal methods remain a specialized capability, or backlash (20% likelihood) where high-profile failures from unvalidated models trigger regulatory restrictions and organizational retreat to simpler methods.
Early signals distinguishing these paths: watch for documented cases of causal model failures causing material business harm, regulatory acceptance or rejection of causal evidence in compliance decisions, and whether open-source tooling achieves production-grade reliability or remains research-oriented.
If you're deciding whether to invest in causal analysis capability, ask three questions before looking at any technology:
First: Do you have high-stakes decisions where experiments are infeasible? If you can A/B test quickly and cheaply, do that instead. Causal analysis pays off when decision impact is large (pricing strategy, product roadmap, capital allocation), experimentation is slow or impossible, and mistakes are costly. For minor UI tweaks or reversible decisions, it's overkill.
Second: Can you measure your confounders? The irreducible requirement is data quality. You need validated records of the variables that jointly influence treatment and outcome. Garbage in, garbage out – but with causal analysis, the garbage is disguised as confident estimates. Audit your data completeness before deploying any causal method.
Third: Do you have the organizational capability? This isn't about hiring data scientists – it's about building cross-functional competency. Domain experts must map causal structure. Statisticians must validate assumptions and run sensitivity analyses. Decision-makers must understand uncertainty quantification. All three must collaborate routinely. If you can't assemble this, the technology won't help.
For organizations meeting these criteria, the opportunity is genuine. Start with 2-3 pilot projects on bounded problems with clear confounding (marketing attribution, demand forecasting with seasonality, pricing elasticity). Run them in parallel with small validation experiments. When observational causal analysis contradicts A/B results, investigate the discrepancy systematically – that's where you learn whether your assumptions hold.
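A sketch of that validation loop on simulated data: compute a regression-adjusted effect from the observational records, compute a difference in means from the small parallel experiment, and flag disagreement for investigation rather than quietly preferring either number. The tolerance and data-generating process below are illustrative.

```python
# Compare an observational (regression-adjusted) estimate against a parallel
# randomized experiment; disagreement is a signal to audit assumptions.
import numpy as np

rng = np.random.default_rng(3)

# --- Observational data: treatment assignment depends on a confounder ---
n_obs = 50_000
segment = rng.normal(size=n_obs)                       # confounder (e.g. customer value)
treated = rng.binomial(1, 1 / (1 + np.exp(-segment)))  # higher-value customers treated more often
outcome = 2.0 * treated + 1.5 * segment + rng.normal(size=n_obs)

# Regression adjustment: regress outcome on treatment plus the measured confounder
X = np.column_stack([np.ones(n_obs), treated, segment])
obs_effect = np.linalg.lstsq(X, outcome, rcond=None)[0][1]

# --- Small parallel experiment: randomized, so a difference in means suffices ---
n_exp = 2_000
t_exp = rng.binomial(1, 0.5, size=n_exp)
seg_exp = rng.normal(size=n_exp)
y_exp = 2.0 * t_exp + 1.5 * seg_exp + rng.normal(size=n_exp)
exp_effect = y_exp[t_exp == 1].mean() - y_exp[t_exp == 0].mean()

print(f"observational estimate: {obs_effect:.2f}")
print(f"experimental estimate:  {exp_effect:.2f}")
if abs(obs_effect - exp_effect) > 0.25:                # tolerance is a judgment call
    print("estimates disagree -- audit confounders and assumptions before acting")
```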
Invest in data governance before methodology. High-quality confounders are the foundation; without them, sophisticated algorithms just encode bias more confidently. Document your causal assumptions explicitly and revisit them annually as your business evolves.
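A first pass at that governance work can be an automated completeness check over the confounders your domain experts say matter. A minimal sketch, assuming a hypothetical events table and illustrative column names:

```python
# Confounder completeness audit over a hypothetical events extract.
import pandas as pd

required_confounders = ["customer_tenure", "prior_spend", "region", "acquisition_channel"]

df = pd.read_csv("events.csv")   # hypothetical path

for col in required_confounders:
    if col not in df.columns:
        print(f"MISSING ENTIRELY: {col} -- adjustment for it is impossible")
    else:
        null_rate = df[col].isna().mean()
        flag = "OK" if null_rate < 0.05 else "TOO SPARSE"
        print(f"{col}: {null_rate:.1%} null ({flag})")
```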
Most importantly: treat causal estimates as inputs to judgment, not replacements for it. The methods quantify uncertainty; they don't eliminate it. Organizations that remain humble about limitations while systematically improving their causal reasoning will compound advantages over those seeking algorithmic certainty.
John Snow removed the Broad Street pump handle based on causal reasoning from observational data, not a randomized trial. He mapped the deaths, weighed alternative explanations, accounted for confounding (maybe sick people just happened to live near that pump), and validated his reasoning through intervention. The methodology has gotten more sophisticated – we now have flexible machine learning methods, sensitivity bounds, network models – but the underlying discipline hasn't changed.
Most companies will skip that discipline in pursuit of automated answers. They'll deploy causal AI platforms, get confident-looking estimates, and make multi-million dollar decisions based on assumptions they never validated. Some will catch their errors through failed interventions. Most won't, because they lack the experimental infrastructure to know they were wrong.
The companies that embrace systematic causal reasoning – separating signal from noise, quantifying uncertainty, validating assumptions, updating beliefs based on evidence – will make better decisions than their competitors. Not perfect decisions. Just systematically better ones.
Over time, that's the only advantage that compounds.