
Evaluating Epistemic Underdetermination

Research Question: To what extent does the Quine–Duhem thesis challenge theory falsification in science?

An Extended Essay in Philosophy

June 2025

3817 words

Citation style: Chicago author-date

Introduction

Science largely rests on the assumption that individual hypotheses can be tested to determine whether a given theory is true: the iterative scientific process depends on our ability to formulate scientific theories as individual hypotheses, and to falsify or verify them in order to decide whether they are worth incorporating into our body of scientific knowledge (Creath 2023).

Twentieth-century logical positivism and logical empiricism sought to rigorously define what counts as meaningful scientific discourse. These movements largely depended on the distinction between analytic statements, whose “validity depends solely on the definitions of the symbols it contains”, and synthetic statements, whose truth is “determined by facts of experience”, together with the ability to judge that a statement is at least weakly verifiable (Ayer 1946; Creath 2023).

The Quine–Duhem thesis, as presented by the philosophers Pierre Duhem and W. V. O. Quine, challenges the empirical testability of isolated hypotheses: they argue that theories cannot be tested in isolation. It also rejects the narrative that there is a clear and distinct separation between analytic and synthetic statements (Stanford 2023). But since the empirical testability of isolated hypotheses forms the foundation for empirical judgement between competing theories, the Quine–Duhem thesis challenges whether scientific knowledge can be objectively grounded at all.

In this EE I shall therefore answer: to what extent does the Quine–Duhem thesis challenge theory falsification in science?

By performing a literature review of primary sources written by various philosophers and of reputable secondary sources such as the Stanford Encyclopedia of Philosophy, along with my own thought experiments, I will evaluate the Quine–Duhem thesis and whether it is able to undermine the way people gain scientific knowledge. I will consider Duhem’s original thesis, Quine’s extensions, and critiques from other philosophers such as Adolf Grünbaum and Rudolf Carnap.

Duhem’s holism and the limits of crucial experiments

In conventional parlance, crucial experiments are empirical tests that are supposed to decisively confirm one scientific theory while refuting opposing ones. Duhem rejects this picture, arguing that no experiment can isolate a hypothesis without relying on auxiliary assumptions such as initial conditions and background theories (Duhem 1975, 6–7). This view is known as holism.

He begins by explaining how a crucial experiment is typically framed: if theory T predicts one outcome and its negation ¬T predicts another, then the crucial experiment offers an empirical observation that cleanly decides between T and ¬T. But Duhem argues that when scientists present the derivation of an experimental prediction, they often fail to consider the full chain of inference: the prediction follows not from the theory of interest T alone, but from T in conjunction with many auxiliary assumptions. These auxiliary assumptions are usually indispensable on the route from theory to prediction. A failed prediction therefore tells us only that something in the whole conjunction is wrong, not that T is necessarily false on its own: we might blame any of the auxiliary assumptions, hypothesize an unnoticed experimental error, or question whatever model is applied to interpret the data. Or perhaps T really is wrong. There is no simple test that could be conducted to determine which of them is at fault (Duhem 1975).
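Schematically (my own formalization of the point, not Duhem’s notation), the failed prediction only licenses a disjunction of possible culprits:

$$(T \land A_1 \land A_2 \land \dots \land A_n) \vdash O$$
$$\lnot O \;\Rightarrow\; \lnot(T \land A_1 \land \dots \land A_n) \;\equiv\; \lnot T \lor \lnot A_1 \lor \dots \lor \lnot A_n$$

Modus tollens reaches only the conjunction as a whole; logic alone does not single out T.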

Consider the Michelson–Morley experiment as an example. Physicists hypothesized that the luminiferous ether exists: a hypothetical medium that permeates space and serves as the medium that carries light waves, just as air carries sound waves and the Earth carries seismic waves. The central hypothesis was that, since the Earth moves through this ether, the speed of light should vary depending on the direction of measurement relative to the ether wind. Michelson and Morley’s experiment was designed to detect such variation using an interferometer, comparing the speed of light along two perpendicular paths. The core hypothesis being tested was the existence of the ether itself (Schreiber, n.d.). However, the test relied on several auxiliary assumptions: that the interferometer was sensitive enough to detect the expected differences in light speed, that no unknown and unrelated physical effects were affecting the measurements, and that light behaved like a classical wave. When the experiment produced a negative result, i.e. no measurable difference in light speed, physicists did not immediately abandon the ether hypothesis. Instead, they investigated ways to revise the auxiliary assumptions, for example the FitzGerald–Lorentz contraction hypothesis: that objects physically contract in the direction of motion through the ether, thereby masking the expected result (Staley 2009). Although this was ultimately displaced by special relativity, the episode illustrates Duhem’s claim that the failure of a prediction derived from a theoretical system does not individually falsify any single component thereof. Rather, the entire structure, including core hypotheses and supporting assumptions, must be reconsidered.

Duhem also argues that there are rarely just two candidate theories. In reality, there may be countless theoretical variants, as well as multiple ways to revise auxiliary assumptions in response to anomalous data (Duhem 1975, 11).

Duhem’s thesis, in summary, is that physical theories are not modular and cannot be tested piece by piece, because they function holistically: when a theory is called into question, a reevaluation of the entire system is necessary. This is, of course, a radical departure from classical falsificationism, on which scientific statements must be independently verifiable or falsifiable.

I would generally agree with Duhem’s thesis. In essentially all physics experiments that I have participated in, there are auxiliary conditions such as the accuracy of the equipment, the ability of our senses to reflect external reality, the assumption that we are awake and not dreaming, the correctness of fundamental constants and theorems, and so on. When we observe that a car rolling down a slope does not preserve all of its mechanical energy as expected, we generally hypothesize that there is significant friction, but nothing logically prevents us from instead casting doubt on the conservation of energy or suspecting that our measuring equipment is broken, since those could also lead to our observation. So, although by experience we would probably attribute the observation to friction, the observation alone does not conclusively establish this.

Quine and his rejection of the analytic–synthetic split

In Two Dogmas of Empiricism, W. V. O. Quine both challenged the analytic–synthetic distinction and extended Duhem’s holism beyond physics to all of epistemology, including logic and mathematics. Most distinctively, Quine argued that the analytic–synthetic split depends on a problematic notion of synonymy, which is ultimately circular: the idea of analyticity depends on unspoken conventions about meanings and definitions (Quine 1951, 23–24).

Leibniz’s definition of analytic statements as those that are “true in all possible worlds” (Quine 1951, 24) depends on each such world including sufficient definitions and laws of logic to give the statement meaning and a truth value; for example, if the only language in a given world is Simplified Chinese, then “one plus one equals two” written in English is not a meaningful statement in that world, and thus cannot be assigned a truth value. According to Quine, our belief system forms a web of belief in which statements face “the tribunal of experience not individually, but as an entire body” (Quine 1951, 38). Quine’s argument is as follows.

Let us accept the definition of an analytic statement as one that is true solely in virtue of its meaning, i.e. true in all possible external worlds, without needing to consult empirical facts (Quine 1951, 21). For example, if “bachelor” is defined as “unmarried man”, then “all bachelors are unmarried” is true regardless of how people marry, whether the government officially recognizes marriages, or any other external facts about the world. This statement is true because the words in it are synonymous.

But what sort of synonymy is required to establish analyticity? Quine calls the desired kind cognitive synonymy: it must be not just verbal or conventional, but capable of establishing the grounds for the analyticity of whole statements (Quine 1951, 28–31). He then looks for ways to satisfy this requirement.

He first considers whether definitions could yield cognitive synonymy. He finds that defining one term using another simply presupposes that the terms are already synonymous in some way. For example, defining “bachelor” as “unmarried man” only works if we are already prepared to take those terms to mean the same thing (Quine 1951, 26).

Quine also considers a criterion on which two expressions are synonymous if and only if they can be substituted for one another salva veritate (preserving truth) in all contexts (Quine 1951, 29). But this works only in extensional contexts, i.e. contexts in which expressions with the same reference can be interchanged without affecting the truth value, such as “all bachelors are men”. It fails in intensional contexts, i.e. those involving notions like necessity or belief, such as “necessarily, all bachelors are men”, where substitution breaks down unless we already understand “necessarily”, a notion usually defined in terms of analyticity, which makes the criterion circular (Quine 1951, 30).

The third approach Quine considers is essentially Carnap’s, which is discussed in the section on Carnap’s defense below.

Quine concludes that none of these approaches could succeed, as each relies on a notion of synonymy or necessity that itself circularly depends on the concept of analyticity and therefore cannot be used to define it. He accordingly rejects the analytic–synthetic distinction altogether.

Additionally, Quine rejects reductionism, i.e. the idea that theories may be cleanly verified or falsified by a finite set of observations; this holism is more radical than what Duhem envisioned, encompassing all of epistemology rather than just physics (Quine 1951, 34–43).

To illustrate the application of Quine’s rejection of the two dogmas, consider these thought experiments. Firstly, consider the statement that the internal angles of a triangle add up to 180 degrees. While this statement may seem obvious at first, it relies on a plethora of auxiliary assumptions: that the triangle lies in a Euclidean plane, that a triangle is a shape with three sides, that 180 degrees is the angle of a straight line, and so on. These might not hold in all contexts: a triangle whose vertices are placed on the surface of a sphere may even have an angle sum of 270 degrees. So even apparently analytic statements depend on a variety of auxiliary assumptions that may be falsified depending on the context.
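For instance (a standard result of spherical geometry, added here for concreteness), the triangle bounded by the equator and two meridians 90 degrees of longitude apart has three right angles, so its angle sum is

$$90^\circ + 90^\circ + 90^\circ = 270^\circ,$$

exceeding 180 degrees by the spherical excess, which on a sphere of radius $R$ equals the triangle’s area divided by $R^2$ (in radians).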

Let us then compare “all bachelors are unmarried” and “the moon does not have a significant atmosphere”. The boundary between the synthetic and the analytic lies in what we consider to be part of our existing language and logic, i.e. what context is considered already established. Now consider the latter statement in a computer-simulated game where all properties of all entities are deterministically computed from predefined, known, finitely computable rules: if we consider these rules to be part of our pre-established context, the statement ceases to require empirical observation of external reality; its truth or falsity depends entirely on whether the game’s rules have been programmed in a way that makes it true. If we could assume the position of an omniscient entity, taking all known states of the physical world as given context, such a statement would likewise count as analytic.
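To make the simulated-game case concrete, here is a minimal sketch, assuming a hypothetical rule table and entity names of my own invention: once the rules are taken as given context, the statement’s truth value is fixed by computation over those rules rather than by any empirical observation.

    # A minimal sketch of a deterministically simulated "world".
    # The rule table below is hypothetical and chosen purely for illustration.
    WORLD_RULES = {
        "moon": {"has_significant_atmosphere": False},
        "earth": {"has_significant_atmosphere": True},
    }

    def has_significant_atmosphere(entity: str) -> bool:
        # The truth value is determined entirely by the predefined rules,
        # taken as part of the established context; no measurement of an
        # external world is involved.
        return WORLD_RULES[entity]["has_significant_atmosphere"]

    # With the rules treated as given, the statement "the moon does not have
    # a significant atmosphere" is settled by computation alone, much as an
    # analytic statement is settled by meaning.
    assert has_significant_atmosphere("moon") is False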

In summary, Quine posits that any statement can be retained or rejected depending on what adjustments we are willing to make elsewhere in the system. No statement is immune to revision, and the analytic–synthetic divide is at best a pragmatic distinction that draws an arbitrary line around what counts as external observation.

Adolf Grünbaum’s critique of Duhem

Grünbaum refuted Duhem by arguing that the claim that no hypothesis H can ever be decisively falsified, because auxiliary assumptions can always be adjusted to accommodate any experimental outcome, is not generally applicable. It is logically true that the failure of a predicted observation O derived from the conjunction (H and A) could indict the core hypothesis, the auxiliary hypotheses, or both. But it is unreasonable to conclude that H is always preservable by modifying A, i.e. to claim that for any empirical finding O′ inconsistent with the originally expected observation O, there always exists some revised auxiliary set A′ such that (H and A′) entails O′, thereby preserving H. Grünbaum holds that this is unjustified, since it assumes the a priori existence of such an A′ without demonstrating it case by case; its existence must be established empirically or through logical construction, and cannot simply be assumed as a logical principle (Grünbaum 1975, 117–25).
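In schematic form (my own rendering of Grünbaum’s point, using his letters H, A, and O), the uncontroversial logical claim and the stronger Duhemian claim he attacks are:

$$(H \land A) \vdash O, \qquad \lnot O \;\Rightarrow\; \lnot H \lor \lnot A$$
$$\text{Duhemian claim: for every } O' \text{ incompatible with } O, \ \exists A' \text{ such that } (H \land A') \vdash O'$$

The first line is ordinary logic; the second asserts the existence of a suitable A′, which Grünbaum argues cannot be guaranteed a priori.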

To illustrate the falsifiability of a hypothesis independent of auxiliary assumptions, Grünbaum presents the example of testing physical geometry. According to Duhem, and later Einstein, the measurement of geometrical properties depends on correcting physical instruments, such as solid rods, for effects such as thermal expansion and electromagnetic distortion. Einstein argues that these corrections rely on laws that themselves presuppose a particular geometry (e.g., Euclidean geometry), thereby making the determination of geometry circular and non-empirical (Grünbaum 1975, 120–21).

Grünbaum refutes this by analyzing regions in which there are no deformations, i.e. regions free from perturbing gradients, fields, and other environmental influences. In such ideal regions, if two rods of differing chemical composition “coincide” at all points during transport, one can conclude from the observations alone, without invoking any other theory, that the region is free from perturbations. In this setting, the auxiliary assumption A (freedom from perturbations) can be directly confirmed by observation. Consequently, if an experimental result O contradicts (H and A), and A is independently verified, then H can be decisively falsified (Grünbaum 1975, 122).

The example above is relatively technical. Consider the more practical example of COVID-19 testing via polymerase chain reaction (PCR). The hypothesis H is that the testing protocol reliably detects SARS-CoV-2 infection when it is present in a patient, i.e. that the testing method is fully sensitive to SARS-CoV-2. A positive result is predicted if the hypothesis H holds for a patient who carries the virus, in conjunction with many auxiliary assumptions A: that the PCR machine is calibrated correctly, that the sample was properly collected and preserved, that the reagents are uncontaminated, and so on. These auxiliary assumptions can be independently verified, for example by running the machine on known-positive control samples and by confirming them through other laboratory tests. If a patient with a laboratory-confirmed infection (e.g. by full viral genome sequencing, along with other supporting evidence that viral material is present in the collected sample) yields a negative PCR result, and all the components of A have been thoroughly checked and found sound, then what is falsified is not the infection itself but H: the test’s ability to detect infection under the verified assumptions.

Conversely, consider the OPERA experiment in 2011, which reported that neutrinos arrived at Gran Sasso faster than light would have over the same distance from CERN, seemingly violating special relativity. The hypothesis H is that neutrinos travel faster than light. But the conclusion relied on several auxiliary assumptions A: that the distance was precisely known, that the GPS clocks at both sites were synchronized, that the timing equipment was calibrated, and that signal delays in cables and electronics were negligible. When the result became public, extensive efforts were made to test each component of A. Eventually, researchers discovered a faulty fiber optic cable and a miscalibrated oscillator, both of which introduced timing errors; when these were corrected, the anomaly disappeared. H was rejected and special relativity was preserved (Orzel 2020).

In summary, and as illustrated by these two examples, Grünbaum concludes that hypotheses like the geometry of physical space can, in some cases, be uniquely and empirically determined, providing a counterexample to the claim of universally inconclusive falsifiability. He argues that underdetermination (i.e. the claim that hypotheses cannot be conclusively verified or falsified) is not a general feature of scientific practice, and that hypotheses can sometimes be tested in isolation and falsified decisively.

While Grünbaum provides compelling counterarguments to Duhem’s holism, I argue that his argument appeals too much to pragmatism and is not theoretically sound. The assertion that auxiliary assumptions can be “independently verified” presumes that the processes used for their verification are themselves reliable, which has not been established. For example, in the case of COVID-19 PCR, the standards used to verify the equipment depend on other instruments and protocols, each with their own auxiliary assumptions, creating a risk of circular reasoning: the very observations used to confirm the auxiliaries may be theory-dependent and may even trace back to the original hypothesis being verified. Finally, even if an auxiliary is confirmed in one context, such as a lab-controlled trial, it may not generalize to other environments; its reliability is as context-sensitive as the test used to confirm it. Taken to the extreme, Grünbaum’s analysis would cease to work on a PCR machine that only functions inside its own testing facility. These points do not refute Grünbaum’s argument outright on practical terms, so his argument still undermines Duhem’s when we assess the effect of the Quine–Duhem thesis on the practical falsification of scientific theories; but they do show that it rests on practical limits of doubt rather than on any formal guarantee of epistemic soundness. That said, I find Grünbaum’s argument compelling when reformulated as the claim that certain auxiliary statements can be independently verified beyond a reasonable doubt: some auxiliary hypotheses warrant more trust than others, and we are unlikely to reject the conservation of energy, for example, even if we cannot verify it in every conceivable context.

Carnap’s defense of the analytic–synthetic split

Quine’s attack on the analytic–synthetic split relied on the absence of a formal boundary marking where logic ends and external observation begins. Carnap attempts to defend the split by formalizing scientific language into systems consisting of axioms, statements of observation, and “meaning postulates” that capture definitions or conventions (Murzi, n.d.). The meaning postulates are essentially stipulations that make certain sentences true by definition. For example, we could have a meaning postulate saying that x is unmarried for all x that is a bachelor, and “all bachelors are unmarried” simply becomes a logical consequence of the definition.
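In Carnap’s style such a postulate can be written as a quantified conditional (my rendering of the standard textbook form, not a quotation from Carnap):

$$\forall x \, (\mathrm{Bachelor}(x) \rightarrow \mathrm{Unmarried}(x))$$

and “all bachelors are unmarried” then counts as analytic in a language L precisely because it follows from L’s meaning postulates alone.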

I find Carnap’s argument to be ineffective, as it merely relocates the burden of defining what an analytic statement means to the moment the language system is defined. Having a meaning postulate that says x is unmarried for all x that is a bachelor sounds reasonable. But suppose we define a meaning postulate saying that x is less dense than uranium for all x that is hydrogen; then “all hydrogen is less dense than uranium” would be analytic in the system in which we consider the statement. The results of empirical observation are clearly encapsulated within the meaning postulate, making the statement intuitively non-analytic, yet a system could be constructed in Carnap’s framework on which it counts as analytic. So Carnap’s framework essentially just moves the burden of judging analyticity onto the point at which a language system is defined.

I would also argue that Carnap’s approach exceeds the structural limits it imposes on itself, owing to the expressiveness of such formal systems. Firstly, essentially all formal systems of mathematics that allow for basic arithmetic suffer from the effects of Gödel’s First Incompleteness Theorem and related undecidability results (Gödel 1931), which blur the line between statements that can be proven within the system and true statements whose establishment requires something beyond it. It is therefore not always possible to determine, within a formal system constructed in this way, whether a statement is analytic or synthetic. Alternatively, we may construct new statements asserting that a given substatement is analytic, and by demonstrating that such a statement may be undecidable, conclude that there is no general deterministic procedure for proving that the substatement is analytic or synthetic. Secondly, Tarski’s undefinability theorem states that, in general, a formal language cannot define its own truth predicate without contradiction; instead, a distinct meta-language is required to talk about truth (Tarski 1956). I find this to be another formulation of Quine’s rebuttal that definitions made in such systems are inherently circular: if meaning or analyticity is defined entirely within the same linguistic system, it becomes unavoidably self-referential, just as it is virtually impossible to teach children a language simply by speaking it, without correlation or reference to the external objects that the language’s constructs represent. I therefore mostly agree with Quine, and contend that any attempt to segregate definitions from factual claims within one unified framework ultimately fails because of the impossibility of internally specifying what truth or synonymy means.

Conclusion

The Quine–Duhem thesis challenges the dominant narrative of simple scientific falsification based on crucial experiments. Duhem argued that no hypothesis is tested in isolation, but only as part of an entire theoretical system with all of its auxiliary assumptions. Quine extended this beyond physics into all of epistemology and rejected the analytic–synthetic distinction that had essentially served as the backbone of logical positivism. Together these refute the belief that empirical testing can provide conclusive verdicts on isolated propositions or theories; our statements face the “tribunal of experience” only as a whole.

This, however, does not render empirical science powerless. Grünbaum offers convincing counterexamples that demonstrate practical limits to underdetermination: when auxiliary assumptions can in practice be independently verified, we can isolate and falsify a hypothesis. The OPERA neutrino anomaly and the COVID-19 PCR example show how science, in practice, often overcomes underdetermination by verifying or falsifying auxiliary assumptions.

Nevertheless, these rebuttals tend to rely on a notion of independent verification that is grounded in practical possibility. Every method of verification depends on further assumptions, and those assumptions in turn depend on other assumptions, introducing an infinite regress that cannot be escaped by purely practical means. I agree with Duhem’s and Quine’s analysis that empirical claims cannot be tested in a vacuum; they are inevitably based on auxiliary assumptions that we must take as given, even if those assumptions are so basic that reaching the hypothesis of interest requires a long chain of tests of auxiliary hypotheses.

Carnap’s formal systems allow for a formally defined boundary between analytic and synthetic truths, but only within a particular stipulated language with significant expressive constraints. This ultimately places the burden of defining analyticity on the moment the formal system is created, and fails to reliably distinguish between analytic and synthetic statements because of the inherent incompleteness of such formal systems.

If we reconsider the extent to which the Quine–Duhem thesis challenges the epistemic foundations of theory falsification in science, I argue that it presents little challenge in practical terms, since practical doubts can be resolved by methodically testing auxiliary hypotheses beyond a reasonable doubt; rather, it challenges the analytic–synthetic distinction and the established epistemic assumption that statements may be independently falsified.

In conclusion, the epistemic threat presented by the Quine–Duhem thesis is subtle but significant: its ultimate effect lies in undermining the assumption that we can test scientific statements with absolute certainty, not in undermining practical scientific knowledge.

Bibliography

Ayer, Alfred Jules. 1946. Language, Truth and Logic. 2nd ed. London: Penguin Classics.

Creath, Richard. 2023. “Logical Empiricism.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta and Uri Nodelman, Winter 2023. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/win2023/entries/logical-empiricism/.

Duhem, Pierre. 1975. “Physical Theory and Experiment.” In Can Theories Be Refuted? Essays on the Duhem-Quine Thesis, edited by Sandra G. Harding, translated by Philip Wiener, 1–40. Reidel.

Gödel, Kurt. 1931. “On Formally Undecidable Propositions of Principia Mathematica and Related Systems.” Monatshefte für Mathematik und Physik 38: 173–98.

Grünbaum, Adolf. 1975. “The Duhemian Argument.” In Can Theories Be Refuted? Essays on the Duhem-Quine Thesis, edited by Sandra G. Harding, 116–31. Reidel.

Murzi, Mauro. n.d. “Rudolf Carnap (1891–1970).” Internet Encyclopedia of Philosophy. https://iep.utm.edu/rudolf-carnap/.

Orzel, Chad. 2020. “The OPERA Experiment and the Value of High-Profile Scientific Blunders.” The MIT Press Reader. June 23, 2020. https://thereader.mitpress.mit.edu/when-science-fails-opera-neutrinos/.

Quine, W. V. 1951. “Main Trends in Recent Philosophy: Two Dogmas of Empiricism.” The Philosophical Review 60 (1): 20–43. https://www.jstor.org/stable/2181906.

Schreiber, B. n.d. “Michelson–Morley Experiment.” Encyclopædia Britannica. https://www.britannica.com/science/Michelson-Morley-experiment.

Staley, Richard. 2009. Albert Michelson, the Velocity of Light, and the Ether Drift. Chicago: University of Chicago Press.

Stanford, Kyle. 2023. “Underdetermination of Scientific Theory.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta and Uri Nodelman, Summer 2023. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/sum2023/entries/scientific-underdetermination/.

Tarski, Alfred. 1956. “The Concept of Truth in Formalized Languages.” In Logic, Semantics, Metamathematics, edited by Alfred Tarski, 152–278. Oxford: Clarendon Press.