Everything You Know About Nutrition Is Based on Bad Evidence
Why the science behind eggs, red meat, coffee, and saturated fat keeps changing its mind
Eggs will kill you. No wait, eggs are fine. Hold on, eggs are a superfood. Scratch that, eggs will kill you again.
If you have the vague sense that nutritional advice keeps reversing itself every few years, you are not imagining things. And the problem is not that scientists are stupid or that the media distorts their findings (though both happen). The problem is deeper, more structural, and far more interesting than that. For many of the questions we most want answered, the dominant methods used to study what you should eat are not capable of producing reliable answers.
This is not a fringe position. It is the considered view of some of the most influential researchers in meta-science and epidemiology. And once you understand why nutrition research is broken in these specific ways, the entire history of dietary advice starts making an uncomfortable amount of sense.
The crisis has a name (sort of)
You have heard of the “replication crisis” in psychology: studies on priming, ego depletion, and power posing failing to replicate when independent labs tried to run them again. Nutrition has a version of this problem, but it is worse in a way that sounds like good news and isn’t: nobody can afford to run the replications.
A proper replication of a large nutrition study requires following thousands of people for years or decades, controlling or carefully measuring what they eat, and waiting for enough of them to develop heart disease or cancer or die. This costs hundreds of millions of dollars and takes a generation. So instead of direct replication failures, what nutrition has is something more insidious: a literature full of contradictions, flip-flops, and findings that quietly evaporate when better methods arrive.
John Ioannidis, the Stanford meta-researcher famous for arguing that most published research findings are false, turned his attention to nutrition in a 2013 editorial that effectively crystallized the critique. He highlighted a striking finding from an earlier review: when observational claims from nutrition studies were subsequently tested in randomized trials, the success rate was 0 for 52. Not “mixed results.” Not “smaller effects.” Zero.
That same year, he and Jonathan Schoenfeld published what many consider to be the most devastating single paper in the field. They called it “Is everything we eat associated with cancer? A systematic cookbook review.” The experiment was simple: pick 50 common ingredients from random recipes in a cookbook and search the medical literature for cancer associations. Eighty percent of those ingredients had published studies linking them to cancer risk, either increasing it or decreasing it. Beef, veal, pork, lamb, tomato, lemon, onion, celery, carrot, parsley, mace, pepper, butter, mustard, cinnamon, sugar, wine, rum, and tea all had at least one study claiming a connection to cancer.
(I might have paused at “mace” too.)
But here is the key detail: 75% of these claimed associations had weak or statistically non-significant evidence behind them. And when meta-analyses pooled the data from multiple studies, the effects shrank toward nothing. The paper was not literally claiming that “everything causes cancer.” It was demonstrating something more troubling: the research ecosystem can mass-produce food-disease associations that look publishable but are not credible.
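A quick bit of arithmetic shows how cheap such associations are to produce. The numbers here are invented for illustration (the paper did not model it this way), but suppose each ingredient attracts 20 independent analyses of a truly null effect, each tested at p < 0.05:

```python
# Illustrative multiplicity arithmetic (invented numbers, not from the
# paper): 50 ingredients, 20 independent null analyses each, alpha = 0.05.
p_hit = 1 - 0.95 ** 20              # P(an ingredient gets >= 1 "significant" result)
print(f"P(at least one 'finding') per ingredient: {p_hit:.0%}")        # ~64%
print(f"Expected ingredients 'linked' to cancer: {50 * p_hit:.0f}/50") # ~32
```

Add publication bias, which favors the significant analyses over the null ones, and 80% of ingredients acquiring a cancer link stops looking mysterious.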
Why nutrition research is structurally broken
So what, specifically, goes wrong? The answer is not one thing. It is a cascade of methodological problems that reinforce each other, and virtually all of them get worse the smaller and more long-term the effect you are trying to measure.
And there is a distinction worth making explicit here. Observational studies can do a fine job of finding patterns: people who eat X tend to have more of outcome Y. The part that breaks is the leap from pattern to cause. “People who drink coffee live longer” is an observation. “Coffee makes you live longer” is a causal claim. Nearly everything in this essay is about why nutrition science struggles with that second sentence.
You cannot accurately measure what people eat
This is the big one, and it is surprisingly hard to fix.
Most large nutrition studies measure diet through food frequency questionnaires: surveys asking people to recall, from memory, how often they ate various foods over the past month or year. If you have ever tried to remember exactly what you ate for dinner last Tuesday, you will immediately see why this might be a problem.
The OPEN biomarker studies, published in 2003 by Kipnis, Subar, and colleagues, quantified how bad this problem is. They compared what people said they ate (on both food frequency questionnaires and 24-hour dietary recalls) to objective biomarkers of actual intake: doubly labeled water for energy and urinary nitrogen for protein. These biomarkers do not lie. The results were grim. One companion paper found that men underreported their calorie intake by 31 to 36 percent on the questionnaires; women, by 34 to 38 percent. And the errors were not random. They were systematic, correlated with body weight and other personal characteristics, and large enough to destroy the statistical power of most diet-disease studies.
The worst part? A true relative risk of 2.0 (a doubling of risk) could appear as 1.1 or smaller when measured through food frequency questionnaires. The signal gets buried in noise. That means genuine effects can vanish, and spurious effects can emerge after statistical adjustment, because the adjustments are operating on bad data.
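To see how this attenuation works, here is a toy simulation, with invented numbers rather than the OPEN study’s: a true odds ratio of 2.0 per standard deviation of intake, measured through a questionnaire whose error is twice as large as the real variation.

```python
# Toy simulation of regression dilution (invented numbers, not OPEN data):
# a true odds ratio of 2.0 per SD of intake, measured through a noisy
# questionnaire, shrinks toward the null.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000

true_intake = rng.normal(0, 1, n)                 # actual diet (standardized)
reported = true_intake + rng.normal(0, 2, n)      # questionnaire: error SD twice the signal

# Disease depends on TRUE intake: odds ratio 2.0 per SD
p = 1 / (1 + np.exp(-(-3 + np.log(2.0) * true_intake)))
disease = rng.binomial(1, p)

def fitted_or(exposure):
    fit = sm.Logit(disease, sm.add_constant(exposure)).fit(disp=0)
    return np.exp(fit.params[1])

print(f"OR from true intake:     {fitted_or(true_intake):.2f}")  # ~2.0
print(f"OR from reported intake: {fitted_or(reported):.2f}")     # ~1.15
```

The reliability ratio here is 0.2, so the estimated effect on the log-odds scale shrinks by roughly a factor of five: a doubling of risk comes out of the analysis looking like almost nothing.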
Diet is not one thing
When you eat more saturated fat, something else has to give: you eat less carbohydrate, protein, or unsaturated fat, or you simply eat more total calories. So asking “is saturated fat bad?” is actually an incomplete question. Bad compared to what?
This sounds like a pedantic methodological point, but it is the reason entire decades of dietary advice have been confused. A study about “low fat” may in practice be testing a combination of lower fat, more refined grains, more sugar, different overall calorie intake, and different levels of medical attention. A study about “red meat” may be partly measuring what red-meat eaters eat instead of red meat.
Modern causal-inference papers in nutrition emphasize that many classic analyses were targeting unclear or meaningless causal quantities because they never specified what was being substituted for what. If you do not make the replacement explicit, the causal claim dissolves into incoherence.
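One standard way to make the substitution explicit is the leave-one-out model: regress the outcome on total energy and every calorie source except the one being displaced, so the exposure’s coefficient reads as “swap these calories for the omitted ones.” A minimal sketch on synthetic data (the variable names and effect sizes here are invented):

```python
# Sketch of a leave-one-out substitution model on synthetic data.
# With total energy and every calorie source EXCEPT carbohydrate in the
# model, the saturated-fat coefficient reads as the effect of swapping
# carbohydrate calories for saturated-fat calories at constant energy.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
df = pd.DataFrame({
    "sat_fat_kcal":   rng.normal(250, 60, n),
    "unsat_fat_kcal": rng.normal(350, 80, n),
    "protein_kcal":   rng.normal(400, 90, n),
    "carb_kcal":      rng.normal(1100, 200, n),
})
df["total_kcal"] = df.sum(axis=1)
# Invented outcome: worse with saturated fat, better with unsaturated
df["risk_score"] = (0.004 * df.sat_fat_kcal
                    - 0.002 * df.unsat_fat_kcal
                    + rng.normal(0, 1, n))

X = sm.add_constant(df[["sat_fat_kcal", "unsat_fat_kcal",
                        "protein_kcal", "total_kcal"]])  # carbs left out
fit = sm.OLS(df["risk_score"], X).fit()
print(fit.params["sat_fat_kcal"])  # "sat fat instead of carbs" effect
```

Change which column you leave out and the same data answer a different question. Many older analyses never said which column was left out, which is exactly the incoherence the causal-inference critics are pointing at.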
The true effects are tiny, but confounding is enormous
People who eat lots of red meat tend to differ from people who do not in a hundred ways that have nothing to do with meat: they smoke more, exercise less, have less education, earn less money, visit doctors less often, drink more alcohol, and eat fewer vegetables. You can try to statistically adjust for all of this, but you can never adjust for all of it perfectly. There is always leftover (“residual”) confounding.
This is tolerable when the effect you are looking for is large. Nobody doubts that smoking causes lung cancer, because the relative risk is on the order of 10-to-20-fold. But most nutrition effects, if real, involve relative risks of 1.1 to 1.3. At that scale, a small amount of residual confounding can create, destroy, or reverse an association. You cannot tell whether the red-meat-eaters’ slightly higher mortality came from the meat, or from the dozen other ways their lives differed.
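This fragility can even be quantified. VanderWeele and Ding’s E-value asks how strongly an unmeasured confounder would have to be associated with both exposure and outcome, on the risk-ratio scale, to fully explain an observed association:

```python
# E-value (VanderWeele & Ding, 2017): how strong an unmeasured confounder
# must be, on the risk-ratio scale with both exposure and outcome, to
# fully explain away an observed risk ratio.
from math import sqrt

def e_value(rr: float) -> float:
    rr = max(rr, 1 / rr)  # symmetric for protective associations
    return rr + sqrt(rr * (rr - 1))

print(f"Observed RR 1.2  -> E-value {e_value(1.2):.2f}")   # ~1.69
print(f"Observed RR 10.0 -> E-value {e_value(10.0):.2f}")  # ~19.49
```

A confounder of strength 1.7 is easy to imagine (smoking, exercise, income). One of strength 19 is not, which is why the smoking result survives and most RR-1.2 nutrition findings are up for grabs.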
Reverse causation is everywhere
People at higher risk of disease often change their diets: they stop drinking coffee, start eating “healthier,” cut back on red meat, or lose weight. In the data, this makes unhealthy foods look protective, because the sickest people have already cut them out, leaving the relatively healthy behind as the visible consumers. Coffee is the classic example: early observational studies found that heavy coffee drinkers had worse outcomes, but this was tangled up with smoking (coffee and cigarettes went together) and with sick people quitting coffee.
Analytic flexibility can flip the sign
This is the one that should keep researchers up at night. In 2024, Wang and colleagues took a single data set (NHANES 2007-2014, 10,661 participants) and a single research question (does unprocessed red meat affect all-cause mortality?) and ran it through every defensible combination of analytical choices: different statistical models, different sets of covariates, different ways of measuring red meat intake, and different inclusion criteria. They identified 1,208 unique specifications, all of which could have appeared in a published paper.
The results ranged from a hazard ratio of 0.51 (red meat cuts your mortality risk in half) to 1.75 (red meat nearly doubles it). Out of 1,208 analyses, only 48 were statistically significant, and among those, 40 said red meat was protective and 8 said it was harmful. The median hazard ratio was 0.94, with 96% of specifications finding no significant association either way.
The same data, the same question, and the choice of which knobs to turn on the statistical model determined whether you concluded red meat is a miracle food, a lethal poison, or completely irrelevant. This is not fraud. Every one of those 1,208 analyses was defensible from some perspective. It is simply what happens when the true effect is small (or nonexistent) and the methodological degrees of freedom are large.
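You can reproduce the flavor of this in miniature. The sketch below is not the Wang analysis, just a toy with invented data: the exposure has no true effect on the outcome, but it is correlated with covariates that do, and we scan every combination of covariate adjustment and exposure coding.

```python
# Toy specification search: the exposure has NO true effect, yet scanning
# many defensible specifications yields a wide spread of estimates.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 2_000
covs = rng.normal(size=(n, 5))  # stand-ins for age, income, smoking, ...
meat = 0.4 * covs[:, 0] - 0.3 * covs[:, 2] + rng.normal(size=n)
# Outcome depends on the covariates, but NOT on the exposure
y = covs @ np.array([0.5, -0.2, 0.4, 0.1, -0.3]) + rng.normal(size=n)

results = []
for k in range(6):
    for subset in itertools.combinations(range(5), k):
        for coding in (meat, (meat > np.median(meat)).astype(float)):
            cols = (np.column_stack([coding, covs[:, list(subset)]])
                    if subset else coding)
            fit = sm.OLS(y, sm.add_constant(cols)).fit()
            results.append((fit.params[1], fit.pvalues[1]))

effs = np.array(results)
sig = effs[effs[:, 1] < 0.05]
print(f"{len(effs)} specifications, {len(sig)} 'significant', "
      f"estimates from {effs[:, 0].min():.2f} to {effs[:, 0].max():.2f}")
```

Depending on which covariates you “defensibly” adjust for and how you dichotomize the exposure, the same null data produce significant estimates in both directions.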
Even the gold standard has cracks
The standard response to all this is “just run a randomized controlled trial.” And randomized trials are better. But in nutrition, they are hard.
You cannot blind people to what they eat. Adherence is terrible over long periods (try keeping 40,000 people on a specific diet for eight years). Participants change their behavior in ways you did not intend. And the trials are so expensive that only a handful have ever been run for hard clinical endpoints like heart attacks and death.
Consider the Women’s Health Initiative (WHI), the largest dietary randomized trial ever conducted. It enrolled 48,835 postmenopausal women and randomized 40 percent of them to a “low fat” diet (targeting 20% of calories from fat, with more fruits, vegetables, and grains). After 8.1 years, the low-fat diet produced no significant reduction in coronary heart disease, stroke, or total cardiovascular disease. Fat intake fell sharply at first, reaching about 25% of calories in year one, then drifted back upward to about 29% by year six, well above the 20% target. The study also did not distinguish between types of fat being reduced or what replaced it. Hundreds of millions of dollars later, the result was essentially a shrug.
Or take PREDIMED, the poster child for Mediterranean diet research and one of the most-cited nutrition trials in history. In 2018, it was retracted and republished in the New England Journal of Medicine after investigators discovered that roughly 1,588 of its 7,447 participants had not been properly randomized. At one site, patients were allocated by clinic rather than individually. At another, household members were assigned to the same group without randomization. The re-analysis, which attempted to statistically correct for these problems, still showed benefit. But the episode demonstrated that even “gold standard” nutrition trials can have structural problems that undermine confidence, and that these problems may go undetected for years.
Four foods, four decades, zero consensus
Now let me use these structural problems to explain why the specific foods in the headlines keep bouncing around.
Eggs: the whiplash food
Eggs are the canonical example of nutritional flip-flopping. They were condemned for decades because of their cholesterol content, partially rehabilitated when the 2015 Dietary Guidelines dropped the old numeric cholesterol cap, then partially re-condemned by a 2019 pooled analysis of six US cohorts that found higher egg intake associated with increased cardiovascular disease and mortality. Then a 2020 meta-analysis of 23 prospective studies, with over 1.4 million participants, found no increased risk of overall cardiovascular events from eating more than one egg per day.
Multiple meta-analyses, same era, opposite conclusions. Why? Because the effect, if it exists, is tiny; the studies rely on dietary questionnaires with all the problems I described above; egg-eaters and non-egg-eaters differ in confounding ways; and the precise definition of “high” versus “low” egg intake varies wildly across studies. Some studies define “high” as one egg per day; others define it as three.
The honest answer on eggs is not “they’re fine” or “they’ll kill you.” It is: the evidence is not strong enough to support confident causal claims for the general population, likely varies by metabolic status and background diet, and has been grotesquely oversimplified by every headline ever written about it.
Red meat: the question that is really four questions
Red meat is at least four separate questions masquerading as one. Is processed red meat (bacon, hot dogs, sausage) the same as unprocessed (steak, ground beef)? What about cancer versus cardiovascular disease versus all-cause mortality? Does the amount matter? Does the cooking method matter? Does what you eat instead matter?
The 2019 NutriRECS review, published in the Annals of Internal Medicine, tried to answer these questions using the GRADE framework designed for clinical evidence evaluation. Their conclusion: the evidence linking meat consumption to adverse health outcomes was “low certainty” or “very low certainty,” and the absolute risk reductions from eating less meat were “very small.” They recommended that most adults could continue their current consumption.
This set off a firestorm. Harvard’s nutrition department called the recommendations “irresponsible.” Other researchers pointed out, correctly, that the NutriRECS meta-analyses did show statistically significant associations, and that calling the effects “small” partly depended on the chosen unit of exposure (three servings per week). The counterargument, also correct, was that “statistically significant” does not mean “clinically meaningful” when the effect size is ~1.1 and the evidence is confounded.
The Wang specification-curve analysis, described above, settled the matter in an important way, if not the way anyone wanted: for unprocessed red meat and all-cause mortality, you can get nearly any answer you want depending on which defensible analytical choices you make.
If you want the least controversial meat warning in the literature, it is processed meat and colorectal cancer. That one has held up well enough that the WHO classified processed meat as a Group 1 carcinogen for colorectal cancer in 2015. If you are going to worry about one thing, the epidemiology says worry about your bacon habit, not your steak habit.
Coffee: the confounding poster child
Large observational studies consistently find that moderate coffee consumption (three to four cups per day) is associated with lower all-cause mortality and lower cardiovascular disease risk. A 2017 BMJ umbrella review by Poole and colleagues synthesized hundreds of meta-analyses and concluded that coffee was “more often associated with benefit than harm.”
Sounds pretty great, right? But there is a problem. When Mendelian randomization studies (which use genetic variants that predispose people to drink more or less coffee as a quasi-natural experiment, sidestepping many confounding problems) examine the same question, they find limited evidence for a causal protective effect. A 2021 MR study using data from the UK Biobank and FinnGen found that genetic predisposition to higher coffee consumption was not associated with any of 15 cardiovascular outcomes studied.
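The machinery behind these studies is conceptually simple. Here is a minimal sketch of the two most basic estimators, the Wald ratio and inverse-variance weighting (IVW), on invented summary statistics:

```python
# Bare-bones two-sample Mendelian randomization: Wald ratios per genetic
# variant, pooled by inverse-variance weighting (IVW). The summary
# statistics below are invented for illustration, not real GWAS output,
# and real analyses add checks for pleiotropy and weak instruments.
import numpy as np

beta_exposure = np.array([0.10, 0.08, 0.12, 0.05])   # SNP -> coffee intake
beta_outcome  = np.array([0.01, -0.01, 0.02, 0.00])  # SNP -> CVD (log odds)
se_outcome    = np.array([0.010, 0.012, 0.015, 0.009])

wald = beta_outcome / beta_exposure             # per-variant causal estimates
se_wald = se_outcome / np.abs(beta_exposure)    # first-order standard errors
w = 1 / se_wald**2                              # inverse-variance weights

ivw = np.sum(w * wald) / np.sum(w)
print(f"IVW causal estimate: {ivw:.3f} (SE {np.sqrt(1 / np.sum(w)):.3f})")
# An estimate near zero argues against a causal effect on the outcome.
```

Because the genetic variants are fixed at conception, they cannot be caused by income, education, or illness, which is what lets this design dodge much of the confounding and reverse causation that plague the observational cohorts.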
What this probably means is that at least part of the observational benefit of coffee reflects confounding. In earlier studies, coffee drinkers also tended to smoke more (coffee and cigarettes went together); in modern cohorts, they tend to be wealthier, better-educated, and more physically active. Past or present, the “benefit” may be partly the benefit of being a particular kind of person, not of the coffee itself.
The realistic summary: coffee appears safe for most adults, the observational associations with better outcomes are real but at least partially confounded, and if you are drinking it because you think it will make you live longer, the causal evidence for that specific claim is thin.
Saturated fat: the wrong question for forty years
This is where the oversimplification has done the most damage. For decades, the message was simple: saturated fat is bad, it raises cholesterol, cholesterol causes heart disease, so eat less saturated fat. Then, around the 2010s, contrarian articles started appearing: “Saturated fat is fine! The science was wrong all along!”
Both sides were, predictably, wrong in the same way: by asking an incomplete question.
Several meta-analyses of observational data found no clear association between saturated fat intake and cardiovascular disease or mortality. But that absence of association in observational data does not mean saturated fat is harmless, for all the methodological reasons described above (measurement error, residual confounding, lack of substitution specification).
When you look at randomized evidence, the picture clarifies considerably, but only once you ask the right question. The Cochrane Collaboration’s systematic review (15 randomized trials, over 59,000 participants) found that reducing saturated fat intake for at least two years reduced combined cardiovascular events by about 21%. But the benefit came specifically from replacing saturated fat with polyunsaturated fat or starchy foods. When people replaced saturated fat with refined carbohydrates or sugar, the benefit largely vanished.
So the old slogan “fat is bad” was not exactly wrong, but it was fatally incomplete. The question was never just “is saturated fat harmful?” It was “harmful compared to what?” Replaced by polyunsaturated fat? The evidence favors the switch. Replaced by apple juice and white bread? Not so much. This distinction was buried for decades because the mainstream message collapsed a conditional claim into a simple one.
What nutrition can still tell us
After all this, it would be easy to conclude that nutrition research is worthless. That conclusion is tempting, satisfying, and wrong.
The field is much better at answering some questions than others. Where it can produce robust findings, the effects are large, the mechanisms are clear, the exposures can be objectively measured, or randomized evidence is feasible.
Deficiency diseases are the clearest example: we know with complete confidence that scurvy comes from vitamin C deficiency, that rickets comes from vitamin D deficiency, and that pellagra comes from niacin deficiency. These were worked out decades to centuries ago, because the effects are dramatic and the connection is tight.
Industrial trans fats are a modern example. Randomized trials show they worsen lipid profiles and increase inflammation; prospective studies show higher coronary risk; the mechanism is well-understood; and the WHO has pushed for global elimination. Nobody seriously disputes this.
And Kevin Hall’s 2019 ultra-processed food trial is an example of what good nutrition research looks like. He locked 20 participants into an NIH facility (so compliance was not an issue), randomly assigned them to eat either ultra-processed or unprocessed diets for two weeks each (with diets matched for macronutrients, fiber, sugar, and sodium), and measured what happened with precision. The ultra-processed group ate an extra 508 calories per day and gained about two pounds in two weeks. The unprocessed group lost about two pounds. Because it was a tightly controlled crossover feeding trial, this is a genuine causal finding.
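The weight change is roughly what back-of-envelope energy arithmetic predicts, using the crude 3,500-kcal-per-pound heuristic (a simplification, but serviceable over two weeks):

```python
# Back-of-envelope check on the Hall trial numbers, using the crude
# ~3,500 kcal-per-pound heuristic (a simplification, but serviceable
# over two weeks).
surplus_kcal = 508 * 14                                 # extra intake over 14 days
print(f"Predicted gain: {surplus_kcal / 3500:.1f} lb")  # ~2.0 lb, as observed
```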
Okay, so the pattern is clear: nutrition science works when the question is narrow, the design is controlled, and the effect is large enough to see through the noise. It fails when it tries to answer “does food X cause disease Y over 20 years in free-living populations?” using self-reported dietary data and statistical adjustment, because for those questions, the signal-to-noise ratio is buried under measurement error, confounding, and analytical flexibility.
So what should you do with all this?
I am not going to pretend this essay gives you a tidy answer to “what should I eat.” If it has done its job, you should now be skeptical of anyone who confidently offers one.
But here are three principles I think survive the methodological gauntlet:
First: the more specific and dramatic a claim is (“eat two eggs a day and your heart attack risk rises 17%”), the less you should trust it. These numbers come from the kind of evidence I have spent 3,000 words explaining is unreliable.
Second: the claims that do hold up tend to be broad, obvious, and boring. Eat more vegetables. Eat less processed food. Do not eat so much that you gain weight. These are supported by converging evidence from multiple study designs, have plausible mechanisms, and produce effects large enough to survive the noise. They are also the advice your grandmother would have given you, which is both reassuring and slightly embarrassing for a multi-billion-dollar research enterprise.
Third: when someone tells you that everything you thought you knew about nutrition is wrong, they are probably overcorrecting. The field does have a real credibility problem. Many specific causal claims are far less secure than headlines imply. But “credibility problem” is not the same as “everything is wrong.” It’s more like: the tools were too blunt, the questions were too vague, and the answers were stated with more confidence than the evidence could support. The fix is not despair. It is better tools, better questions, and more honest confidence intervals.
The replication crisis in nutrition is not a story about scientists lying to you. It is a story about what happens when an entire field depends on methods that cannot deliver what we are asking of them. The methods are not malicious. They are just inadequate for small, long-term, causally ambiguous questions embedded in a high-dimensional, correlated exposure that literally everyone has an emotional attachment to.
And it is a story, I suppose, about the human desire for simple answers to complicated questions. “Is butter good or bad?” is the kind of question a child asks. The adult answer, frustratingly, is: it depends on what you replace it with, how you measure it, who is eating it, what their background diet looks like, what outcome you care about, and whether you’re willing to wait 30 years for a definitive answer that may never come.
That answer will never make a good headline. But it has the advantage of being true.