The Scientific Method

I. The Schoolroom Version, and Why It Is Not Quite Right

II. The Deep Logic: Disprove, Don’t Prove

III. Where Falsification Itself Needs Refining

IV. The Engine of Reliability: Controlled Comparison

V. The Social Machinery: Why Science Is Bigger Than the Scientist

VI. When the Machine Catches Itself: The Replication Crisis

VII. The Second Tool: Reductionism and Its Limits

VIII. What the Method Is, and Is Not

IX. Cross-Links

How a self-deceiving animal learned to stop fooling itself.

There is a famous line from the physicist Richard Feynman: the first principle is that you must not fool yourself, and you are the easiest person to fool. The scientific method is not, at bottom, a procedure for discovering truth. It is a procedure for catching yourself in the act of believing something because you want to, because your tribe does, because it flatters you, or because it simply feels right. Everything else, the hypotheses, the controls, the statistics, the peer review, is machinery built around that single problem: the creature doing the investigating cannot trust its own mind, and knows it.

The previous section described that creature in detail: a threat-tuned, status-seeking, pattern-hungry social ape that reaches conclusions first and reasons toward them afterwards, that sees agency in the weather and faces in the clouds, that remembers the hits and forgets the misses. The scientific method is the most powerful corrective our species has built for that animal. It is worth understanding properly, not as a ritual to recite, but as a way of thinking you can actually use, and it is worth understanding honestly, including the places where the tidy version taught in school quietly misleads.

I. The Schoolroom Version, and Why It Is Not Quite Right

Most of us learned a tidy sequence of steps. Observe something. Form a hypothesis. Make a prediction. Run an experiment. Analyse the results. Confirm or reject the hypothesis. Repeat. This is not wrong, exactly, and as a first approximation it captures a good deal. But it presents science as a single fixed recipe, and that picture falls apart the moment you look closely at how science is actually done, or at what philosophers of science have spent the last century arguing about.

The uncomfortable truth, which working scientists and philosophers of science largely agree on, is that there is no single method that all science follows. The astronomer who cannot run an experiment on a galaxy, the field biologist watching a population over decades, the theoretical physicist working with equations, and the lab chemist isolating a reaction are all doing science, and none of them is following the same recipe. Even Karl Popper, whose name is synonymous with one influential account of scientific method, held that there is no unique step-by-step procedure that defines science; he saw it instead as a form of disciplined problem-solving. So rather than memorise a sequence that does not survive contact with reality, it is better to understand the deeper logic that the various methods share, the handful of moves that make any of them work.

II. The Deep Logic: Disprove, Don’t Prove

The single most important move is also the least intuitive, and it rests on a logical asymmetry.

You can never fully prove a general claim true by piling up confirming examples. The classic illustration, the one that gave the manual’s recurring black swan its name, is the claim “all swans are white.” You could observe a million white swans and never prove it, because the next one might be black. But a single black swan disproves it instantly and completely. This is the asymmetry at the heart of scientific reasoning, sharpened by Popper into a principle: confirmation is weak, but disconfirmation is powerful. No amount of evidence can make a theory certainly true, but the right piece of evidence can show it false.

The natural human move, the one the biased animal makes by default, is to seek out evidence that confirms what we already believe, a habit so reliable it has a name, confirmation bias. The scientific move is the opposite and deeply unnatural one: to actively try to break your own idea, to ask not “what would confirm this?” but “what would prove this wrong, and have I genuinely looked for it?” A claim that cannot in principle be proven wrong by any conceivable observation is not strong; it is, in this specific sense, not a scientific one at all. This is Popper’s criterion of falsifiability, and it separates “this medicine outperforms placebo in a controlled trial,” which makes a risky prediction that could fail, from “invisible energies flow through the body in ways that conveniently cannot be measured,” which is constructed so that nothing could ever count against it. The first is science whether or not the medicine works. The second is not science whether or not the energies exist, because it has been built to be unfalsifiable.

The strongest test of an idea is a risky prediction: not “my theory says the sun will rise tomorrow” (which almost any theory predicts) but “my theory, and only my theory, says this specific surprising thing will happen, and if it doesn’t, I am wrong.” A theory that sticks its neck out and survives has earned something. A theory that only ever predicts what we already knew, or that can explain any outcome after the fact, has not.

III. Where Falsification Itself Needs Refining

Falsification is the core logic, but applied naively it misleads, and two complications matter. The first is that, in real science, a single contradicting result rarely kills a theory outright, and shouldn’t. Popper himself acknowledged this. When an experiment contradicts a well-established theory, the result might be wrong; the equipment might be faulty; some hidden assumption might be off. A good scientist does not abandon a theory that explains a thousand things the first time one measurement disagrees. The genuine work is in judging when an anomaly is noise to be set aside and when it is the black swan that should bring the whole structure down, and that judgement is not itself a mechanical rule. This is part of why the contrarian who shouts “but that one study contradicts you, so your theory is falsified” has usually misunderstood the tool: falsification is a logical principle, not a licence to topple a robust theory with a single inconvenient data point.

The second complication is called the Duhem-Quine problem. You never test a hypothesis in isolation. You test it bundled together with a whole network of background assumptions: that your instruments work, that your statistics are appropriate, that the conditions were controlled, that countless auxiliary beliefs hold. When the prediction fails, strict logic tells you only that something in the entire bundle is wrong, not which thing. You can always save the central hypothesis by adjusting an assumption somewhere else in the network. Sometimes that is exactly the right move: it is how unknown planets have been discovered, by refusing to abandon a theory and positing something new instead. Sometimes it is intellectual cowardice, propping up a dying idea with ad hoc excuses. Again, no mechanical rule tells you which. Judgement, exercised by a community over time, does.

This is where the work of Thomas Kuhn enters. Kuhn observed that science does not actually proceed as a steady accumulation of falsified-and-replaced theories. Most of the time, scientists work within a shared paradigm that they do not try to falsify at all; they use it to solve puzzles, treating anomalies as problems to be worked out rather than as refutations. Only when anomalies pile up unbearably does the field undergo a wrenching paradigm shift to a new framework, as when Newton’s physics gave way to Einstein’s. Kuhn’s account is a more accurate description of how science actually behaves than the textbook falsification story, and it has an important and double-edged implication. Used carefully, it is a sober corrective to the naive view. Pushed too far, as Kuhn’s more radical successors pushed it, it slides toward the claim that science is just one belief system among others, with no special claim on reality. The lesson to take is the balanced view: science is a human, social, sometimes messy and conservative enterprise, and it is also, over time, the most reliable method we have ever found for correcting our errors about the physical world.

IV. The Engine of Reliability: Controlled Comparison

If disproof is the logic, controlled comparison is the engine. The reason a well-run experiment tells you something a casual observation cannot is that it isolates variables.

The world is a tangle of intertwined causes, and our pattern-hungry minds leap to conclusions from that tangle constantly. You felt better after taking the remedy, but you also slept well, the illness was already passing, the weather changed, and you expected to improve. Which caused the recovery? Unaided, you cannot know, and your bias will hand the credit to whatever you already believed. The controlled experiment exists to cut through this. Change one thing while holding everything else as constant as possible, compare against a group that did not get the change, and you can begin to isolate what actually caused what. This is the logic behind the control group, the placebo (to separate the effect of a treatment from the effect of expecting a treatment), randomisation (to scatter unknown confounding factors evenly), and blinding (so the experimenter’s hopes cannot leak into the result). Each is a device for removing a specific way the biased animal fools itself.
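The value of randomisation can be seen in a toy simulation. What follows is a minimal sketch under invented assumptions, not a model of any real trial: the remedy does nothing at all, a hidden `health` score drives recovery, and the function name `apparent_effect`, the sample size and the noise levels are all made up for illustration. When healthier people choose the remedy for themselves, it appears to work; when a coin flip decides who takes it, the illusion disappears.

```python
import random
from statistics import mean


def apparent_effect(n, randomised, rng):
    """Treated-minus-untreated mean outcome for a remedy with zero true effect."""
    treated, untreated = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden confounder (hypothetical score)
        if randomised:
            takes_remedy = rng.random() < 0.5  # a coin flip decides
        else:
            takes_remedy = health > 0  # healthier people opt in
        # The outcome depends on health and noise only; the remedy adds nothing.
        outcome = health + rng.gauss(0, 1)
        if takes_remedy:
            treated.append(outcome)
        else:
            untreated.append(outcome)
    return mean(treated) - mean(untreated)


rng = random.Random(0)  # fixed seed so the run is repeatable
self_selected = apparent_effect(20_000, randomised=False, rng=rng)
randomised = apparent_effect(20_000, randomised=True, rng=rng)

print(f"self-selected groups: apparent effect {self_selected:+.2f}")
print(f"randomised groups:    apparent effect {randomised:+.2f}")
```

In the self-selected comparison the two groups differ in health before anyone takes anything, so the gap between them measures the confounder rather than the remedy. Random assignment spreads the hidden factor evenly across both groups, which is why the second figure should sit close to zero.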

From this comes the single most useful piece of thinking discipline in the whole section, which I’m sure you’ve heard screamed by online debaters: correlation is not causation. That two things move together tells you almost nothing on its own about whether one causes the other. They might be coincidence; they might both be caused by some third thing; the causation might run the opposite way to the obvious reading. Ice cream sales and drowning deaths rise together, not because ice cream drowns people but because both rise in summer. A staggering proportion of bad reasoning, in health especially, is the simple failure to notice that a correlation has been quietly promoted to a cause. The controlled experiment is, at heart, the machinery for earning the right to say “cause,” and when an experiment cannot ethically or practically be run, careful scientists reach for other tools designed to approximate that machinery rather than just assuming the causal story they prefer.
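The ice cream example can be sketched in a few lines. This is an illustration only, with every number invented: daily temperature drives both an ice cream sales figure and a drowning-risk index (arbitrary units), and neither has any effect on the other. The helper `pearson` is written out by hand so the block needs nothing beyond the standard library.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(1)  # fixed seed so the run is repeatable
days = []
for _ in range(5000):
    temp = rng.uniform(0, 35)  # the hidden third factor
    ice_cream = 10 * temp + rng.gauss(0, 40)  # depends on temperature only
    drowning_index = 0.2 * temp + rng.gauss(0, 1.5)  # depends on temperature only
    days.append((temp, ice_cream, drowning_index))

# Across all days the two series move together strongly.
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Hold the confounder roughly constant and the link should mostly vanish.
band = [d for d in days if 24 <= d[0] < 26]
within_band = pearson([d[1] for d in band], [d[2] for d in band])

print(f"all days:       r = {overall:.2f}")
print(f"24 to 26 C only: r = {within_band:.2f}")
```

Restricting the comparison to days of similar temperature is a crude form of statistical control rather than an experiment, and it only works here because the confounder is known and measured. In real data the third factor is often neither.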

V. The Social Machinery: Why Science Is Bigger Than the Scientist

Here is the part the lone-genius mythology obscures, and it matters enormously for the anti-scientism theme of this section. The reliability of science does not come from scientists being unusually rational or honest people. They are the same biased animals as everyone else, with the added pressures of careers, egos, funding, and the hunger for status described in the previous section. The reliability comes from the social machinery that sits on top of the individuals and catches what any individual misses.

Peer review subjects a claim to hostile expert scrutiny before publication. Replication, the requirement that others be able to repeat your work and get the same result, is a great safeguard, because a real effect should show up again in other hands, while a fluke or a fudge usually will not. Open publication of methods lets the whole community check the work. The result is a system that is more reliable than any of its members, because it is built to expose the errors and self-deceptions that individuals cannot see in themselves. This is the genuine answer to the question of why you should trust scientific findings more than one clever person’s opinion: not because scientists are wise, but because the process is adversarial and self-correcting in a way no individual mind can be.

And this same machinery is why the worship of individual scientists, the appeal to “a genius said so,” is itself a misunderstanding of how science works. The authority was never supposed to live in the person. It lives in the method and the community that checks the person. A Nobel laureate making an unreplicated claim outside their field deserves exactly the scrutiny the process gives anyone; their brilliance elsewhere is not evidence. To defer to the credential rather than the evidence is to make, in a lab coat, precisely the tribal-authority move that science was invented to escape.

VI. When the Machine Catches Itself: The Replication Crisis

The most important thing to happen in science in recent decades is also the best possible illustration of all of the above, and it is worth understanding because it is so often weaponised by both the dogmatists and the scientism-peddlers, each misreading it.

In the 2010s, researchers, especially in psychology and parts of medicine, began systematically trying to reproduce well-known published findings, and discovered that an alarming fraction would not replicate. In one landmark project, an effort to repeat 100 prominent psychology studies, 97 of the original 100 had reported statistically significant effects, but only around 36 of the replications did. Many “established” results, some of them famous, taught in textbooks, built into other research, turned out to be statistical mirages. The causes were partly straightforward dishonesty but mostly subtler: small samples, the pressure to publish exciting positive results, and a cluster of practices known as p-hacking, where researchers (often without conscious fraud) slice and re-slice their data until a publishable result appears.
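A rough sense of how much damage slicing and re-slicing can do comes from simulating studies in which there is, by construction, nothing to find. The sketch below is a simplification under stated assumptions: both groups are drawn from the same distribution, the test is a plain two-sided z-test at the 5% level, and each extra "look" is modelled as a fresh independent comparison, whereas real re-analyses of one dataset overlap and are not fully independent. The function names and all the numbers are hypothetical.

```python
import random
from math import sqrt


def null_comparison_is_significant(rng, n=30):
    """Compare two groups drawn from the same distribution (no real effect)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Variance is known to be 1 in this toy setup, so a z-test suffices.
    z = (sum(a) / n - sum(b) / n) / sqrt(2 / n)
    return abs(z) > 1.96  # two-sided 5% threshold


def false_positive_rate(rng, looks, studies=2000):
    """Share of null studies reporting an effect after `looks` attempts each."""
    hits = 0
    for _ in range(studies):
        # The study "finds something" if any one of its looks crosses the line.
        if any(null_comparison_is_significant(rng) for _ in range(looks)):
            hits += 1
    return hits / studies


rng = random.Random(2)  # fixed seed so the run is repeatable
honest = false_positive_rate(rng, looks=1)   # one pre-specified test
hacked = false_positive_rate(rng, looks=20)  # keep slicing until something shows

print(f"one planned test:    {honest:.0%} of null studies look significant")
print(f"twenty looks, any:   {hacked:.0%} of null studies look significant")
```

With twenty independent looks, the chance of at least one false alarm is 1 - 0.95**20, about 64%, so the second figure should land near that and the first near 5%. Pre-registration works against exactly this: it fixes the number of looks before the data arrive.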

Now, how you read this crisis is a test of whether you have absorbed this section. The scientism-peddler cannot account for it at all because they treated published science as settled authority, and here is published science being wrong at scale. The lazy cynic reads it as proof that science is broken and nothing can be trusted, which is equally wrong and rather lazier. The correct reading, the one this section has been building toward, is the third one: the replication crisis is science’s self-correction machinery working, painfully and in public. The errors were caught not by outside critics but by scientists, using the method’s own deepest tool, replication, to expose the field’s own failures. And the response, larger samples, pre-registration of hypotheses before data is collected, and open data, has been the field repairing its own engine. A belief system cannot do this. A process that is genuinely built around catching its own errors can, and did. The replication crisis is not the counterexample to trusting science; it is the single best demonstration of why, in the long run and with eyes open, the method earns a trust no individual or institution does.

VII. The Second Tool: Reductionism and Its Limits

Everything so far, the controlled experiment, the isolating of variables, describes one immensely powerful way of understanding the world, called reductionism: break a thing into its parts, study the parts in isolation, and build understanding upward. This approach is the engine of the scientific method’s greatest triumphs, and its power is not in doubt. It is also, on its own, not enough, and knowing where it runs out is the second great thinking skill this section teaches.

Reductionism works by deliberately, artificially restricting what you look at, holding most of the world constant so you can isolate one cause-and-effect relationship. That artificial restriction is exactly the source of its power and exactly the source of its blind spot. For many systems, the interesting behaviour does not live in the parts at all; it lives in the interactions between them. As the manual’s Emergence & Complexity page explores, complex systems, such as an ecosystem, a brain, an economy, a human body, or the climate, exhibit emergent properties that cannot be read off from their components studied in isolation, because they arise from the web of relationships, feedback loops, and non-linear dynamics among the parts. Take such a system apart to study the pieces, and the very thing you were trying to understand, which existed only in the assembled whole, vanishes in your hands.

This is where systems thinking comes in, as the complement to reductionism rather than its replacement. Where reductionist thinking asks “what are the parts, and what does each one do?”, systems thinking asks “how do the parts interact, where are the feedback loops, what behaviour emerges from the whole that no part possesses?” It attends to circular causation rather than just linear cause-and-effect, to the way a change in one place ripples through a network with delays and amplifications, to the way cause and effect can loop back on each other. The body makes the point concretely: you can learn an enormous amount by isolating a single hormone or gene or nutrient, the reductionist triumph of modern biology, and you can also be badly misled, because that hormone exists in a web of feedback with dozens of others, and a system that compensates and adapts will often respond to your clean intervention in ways the isolated study never predicted. This is the insight behind the work of biologists like Denis Noble, discussed in Emergence & Complexity: causation in living systems runs not only upward from genes to traits but downward from the whole system to its parts, and a purely bottom-up, reductionist picture misses half the story.

The mature thinker holds both tools and knows which fits the problem. Reach for a reductive experiment when a question can be isolated, and most of the spectacular successes of science come from exactly that. Reach for systems thinking when the behaviour you care about is a property of the whole, when feedback and emergence dominate, when isolating a variable destroys the phenomenon. A great deal of bad thinking, including a great deal of bad health science, comes from using the wrong one: reducing a complex, multi-causal, feedback-rich condition to a single magic variable because that is what the reductive method can test, or, in the opposite error, hand-waving vaguely about “the whole system” and “balance” to dodge a question that an experiment could settle. Neither tool is superior.

VIII. What the Method Is, and Is Not

Pull this together, and the scientific method is the disciplined practice of trying not to fool yourself: seeking disproof rather than comfortable confirmation, isolating causes through controlled comparison, refusing to promote correlation to causation without earning it, submitting your thinking to a community built to catch your errors, holding both the reductive and the systemic lens and knowing which to use, and treating every conclusion as provisional, the best current account rather than a final truth.

What it is not is a source of certainty, a body of facts to be believed on authority, or a possession of a special class of people called scientists. It does not deliver final answers; it delivers progressively less wrong ones. It does not eliminate bias; it builds external structures that partly compensate for a bias it assumes will always be present. And it confers no authority on the individuals who practise it beyond the authority their evidence earns under scrutiny. To treat science as a faith, with scientists as its priesthood and “studies show” as its scripture, is scientism, and it is a betrayal of the actual thing, which was built precisely to dethrone authority and force every claim to defend itself. To reject science wholesale because it is fallible, run by flawed humans, and sometimes wrong is the mirror-image error, throwing out the one tool we have for systematically becoming less wrong because it is not magically never wrong.

The method asks something genuinely difficult of the animal using it: to value being corrected over being right, to go looking for the evidence that would break your favourite idea, to hold your conclusions loosely enough to update them and firmly enough to act. That is not a natural way for a human to think. It is a trained one. And it is, in the end, the same discipline this entire manual asks of you: to think clearly, to follow the evidence past where it is comfortable, and to remain the author of your own conclusions rather than the believer in someone else’s.

IX. Cross-Links

Science for the section overview
Understanding Statistics for reading the numbers without being fooled
The History of Science for how the method developed and changed
Scientific Methodology Cheat Sheet for the practical everyday toolkit
The Science Rabbit Hole for the deeper questions about scientific knowledge
Resources for the reading list

Resources

Popper, K. (1959). The logic of scientific discovery. Hutchinson.
Kuhn, T.S. (1962). The structure of scientific revolutions. University of Chicago Press.
Feynman, R.P. (1974). Cargo cult science. Caltech commencement address; repr. in Surely You’re Joking, Mr. Feynman! (1985). W.W. Norton.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Chalmers, A.F. (2013). What is this thing called science? (4th ed.). University of Queensland Press.
Noble, D. (2006). The music of life: Biology beyond genes. Oxford University Press.
Ioannidis, J.P.A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.