The International Studies Association, Wiley, Oxford University Press International Studies Quarterly
The International Studies Association, Wiley, Oxford University Press are collaborating with
JSTOR to digitize, preserve and extend access to International Studies Quarterly
This content downloaded from 128.122.149.154 on Mon, 14 Nov 2016 21:31:11 UTC
All use subject to http://about.jstor.org/terms
International Studies Quarterly (1985) 29, 121-136
University of Rochester
Toward a Scientific Understanding of International Conflict

Authors' note: Portions of this research were funded by grants from the Scaife Family Charitable Trusts, the Sarah Scaife Foundation, and the Hoover Institution at Stanford University. I would like to thank John Ferejohn and Grace Iusi for their many helpful comments on earlier drafts.

Five standards for evaluating scientific 'knowledge' at one time or another have enjoyed widespread acceptance. Other standards, no doubt, can also lay claim to being the means of evaluating 'How do we know' questions. The five are:
1. Justificationism: Acceptance of a theory only if it has been proven true.
2. Neojustificationism: Acceptance of the probability that a theory is true, given the
preponderance of evidence.
3. Dogmatic Falsificationism: Rejection of a theory because it has been disproven by
the demonstration of a single counterexample.
4. Naive Methodological Falsificationism: Rejection of a theory, while recognizing
that it is not logically falsifiable, if it fails to satisfy pre-established criteria of
'significance'.
5. Sophisticated Methodological Falsificationism: Rejection of a theory if it has been
superseded by a more powerful one.1
The two 'justificationist' standards generally are in disrepute, largely because any
finite number of observations of an event, or relationship among variables, cannot be
taken as conclusive evidence regarding the relationship among a potentially infinitely
large number of instances of the event-category being explained. Still, many in the
international conflict field persist in offering empirical instances of support for their
hypotheses as demonstrations that their arguments are true.
'Falsificationist' standards are the most widely used basis for evaluating theory.
Debate has raged over the scope of falsification since Popper's (1959) suggestion that a
scientific theory requires the specification of the evidence that would constitute its
falsification. The application of dogmatic criteria such as the stipulation that a single
contradictory example or observation is sufficient to falsify a theory seems excessively
stringent. By those standards, even as important a scientific development as Newton's
explanation of the motion of heavenly bodies would have been rejected outright,
leaving chaos to reign where some order had emerged. After all, counterexamples to
the Newtonian system abound, and were articulated from the initial publication of
Newton's theory. So too in international relations research, dogmatic falsificationist
criteria, especially when our tools of observation are so primitive, seem excessively
stringent.
Methodological falsificationism seems, then, to be the dominant theme in the
contemporary practice of science. The significance testing perspective dominated the
'behavioral revolution' in international relations, with such scholars as Deutsch and
Singer at the vanguard, and with many, myself included, often having followed that
path. However, such an approach seems to have at least two important limitations.
First, what Lakatos terms naive methodological falsification forces us to accept some
mix of 'type 1' and 'type 2' errors. We run the risk of too readily falsely rejecting
correct hypotheses, or incorrectly failing to reject false hypotheses. Secondly, and I
think more importantly, rejection alone does not facilitate scientific progress.
Although the rejection of hypotheses stimulates research in new and hopefully more
fruitful directions, in the absence of the fruits of those new directions, rejection leaves
only chaos. By way of analogy recall that, however flawed Ptolemaic astronomy was,
some order was brought to celestial observation by that very astronomy for nearly
2000 years. It was abandoned only when an alternative established itself as a clearly
superior theory. Abandonment of Ptolemaic astronomy before the 'paradigm shift'
brought about by the research of Copernicus, Kepler, Galileo, and Newton, certainly
would have served no practical, and little scientific, purpose. So too in international
relations research: abandonment of a perspective, however flawed, is not likely to
enhance our understanding if it is not supplanted by a demonstrably superior
BRUCE BUENO DE MESQUITA 123
understanding of the world. For instance, whatever the severe logical and empirical
limitations of the post-World War I school known as the 'idealist approach', its
dismissal would not have served a beneficial purpose in the absence of some 'better'
framework for studying international conflict. So too whatever the flaws of the 'realist'
paradigm, its abandonment in the absence of a demonstrably better alternative does
not seem warranted.2
That is not to say that individuals should not pursue such alternatives, even as their
discipline continues to cling to its received wisdom or knowledge. We need not accept
the Kuhnian notion that transitions from one framework to another occur only in
situations of intellectual crisis (Kuhn, 1962). There is no reason why evolutionary
change or 'microrevolutions' cannot bring about the transition from one framework to
another (Kuhn, 1962; Vasquez, 1983). But I subscribe strongly to the notion that
progress is best made when one explanation is shown to supplant another. Thus I
propose adherence to the standard of knowledge suggested by Lakatos (1978: 32):
A scientific theory T is falsified if and only if another theory T' has been
proposed with the following characteristics: (1) T' has excess empirical content
over T: that is, it predicts novel facts, that is, facts improbable in the light of, or
even forbidden by, T; (2) T' explains the previous success of T, that is, all the
unrefuted content of T is included (within the limits of observational error) in
the content of T'; and (3) some of the excess content of T' is corroborated.
In short, knowledge in its most stringent sense and in its highest form is gained when
one explanation is replaced with another, broader, and apparently more accurate one.
Disciplinary progress toward such knowledge is generally made through the often
uncoordinated fits and starts of individual researchers. What is more, the replacement
of one theory with a superior one is a rare, 'culminating' event in the scientific
enterprise. Still, whatever the heuristic conditions are that move a field in the
direction of this highest form of growth in knowledge, I suggest that satisfaction of
Lakatos's criteria is a sufficient condition for stating that knowledge has been gained.
Reflections on the implications of Lakatos's criteria for the study of international
conflict motivate the remainder of this essay.
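Lakatos's three conditions can be restated as simple bookkeeping over sets of predicted and corroborated facts. The sketch below is purely illustrative: the `supersedes` function, the fact labels, and the toy 'theories' are inventions for the example, not anything proposed in the literature discussed here.

```python
# Toy sketch: Lakatos's three conditions as set operations over the
# 'facts' each theory predicts and the facts actually corroborated.
# Purely illustrative bookkeeping, not a formal reconstruction.
def supersedes(T, T_prime, corroborated):
    novel = T_prime["predicts"] - T["predicts"]       # (1) excess empirical content
    absorbs = T["unrefuted"] <= T_prime["predicts"]   # (2) explains T's successes
    confirmed = bool(novel & corroborated)            # (3) some excess corroborated
    return bool(novel) and absorbs and confirmed

newton = {"predicts": {"ellipses", "tides", "precession"},
          "unrefuted": {"ellipses", "tides"}}
kepler = {"predicts": {"ellipses"}, "unrefuted": {"ellipses"}}
print(supersedes(kepler, newton, corroborated={"tides"}))   # → True
print(supersedes(newton, kepler, corroborated={"tides"}))   # → False
```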
Such a perspective, of course, is not universally accepted, particularly as it carries
important methodological and epistemological implications. Although my concerns
are primarily with the rigor of theorizing, so much debate has centered around
methodological questions that I believe it will prove most useful to begin with an
examination of some of those methodological issues.
Lakatos's (1978) criteria for advancing scientific knowledge. Let me be clear. I am not
here drawing a distinction between those who subscribe to nonquantitative analysis
and those who subscribe to quantitative analysis. To be sure, quantitative research
encourages, but need not require, many cases. By way of illustration, remember that
the project to study the 1914 crisis leading to World War I is an instance of
quantitative analyses of a single event.3 Nonquantitative empirical research, of course,
can follow the path of a single case analysis as in Taylor's (1961) The Origins of the
Second World War, of a small number of cases as in Stoessinger's (1978) Why Nations Go
to War, or of many cases as in Blainey's (1973) Causes of War.
Why, then, is the decision to focus on very few or on many cases important? In
order for one explanation to supplant another, it is necessary that the new explanation
yield a net increase in knowledge. Both hitherto unexplained facts and previously
explained facts must be accounted for, within the limits of measurement error. This
requirement alone indicates a need to 'test' hypotheses against more than one case.
With one case it simply is not possible both to account for previously unexplained facts
(thus satisfying Lakatos's requirement of excess empirical content) and previously
accounted-for facts. It is difficult to see how excess content over previous explanations
may be attained with a single observation. Recognition of this limitation has led some
to propose the examination of a small number of directed case studies or focused
comparisons. However, even with more than one, but still few, cases, it is difficult to
attain the standard for advancing knowledge proposed above, although important
heuristic and pedagogical contributions may be made. Here we run into the problem
of establishing that the 'new' theory explains the unrefuted content of the theory it is
proposed to supersede. Two difficulties are likely to arise. First, if it was already
demonstrated that the first theory explained many facts, then a test on a small number
of facts, or cases, can not by itself establish the ability of the new theory to explain
those same facts. Second, with few cases the 'limits of measurement error' referred to
above are very wide indeed.
The first limitation may, with a sufficiently well-specified theory, be finessed by
demonstrating that the old theory is a subset, or special case, of the new theory.
Consider, for instance, the following example. Organski and Kugler (1980) argue that
when two dominant nations fight each other, allies are unimportant. Their reasoning
is that the dominant states are so much stronger than their allies that the allies are
unable to make a meaningful contribution to the war effort and so are irrelevant.
Without any empirical evidence, it is possible to establish on purely mathematical
grounds that the Organski-Kugler hypothesis is a subset of the expected utility
approach to war as proposed by Altfeld and Bueno de Mesquita (1979), with further
refinements in Bueno de Mesquita (1981, 1985). To see this, I restate the basic form of
the expected utility argument as:
Focusing on the component that calculates the expected utility contribution of third
parties
∑_{k ≠ i,j} (P_ik + P_jk − 1)(U_ki − U_kj),
we see that if k is assumed to have virtually no power (with the 'P' terms referring to the
probability of success of the subscripted actors, and the 'U' terms referring to the
utility contribution of k to the relevant subscripted actor), then P_ik approaches P_i and P_jk approaches P_j. Since P_i + P_j = 1 in a bilateral contest, each weight (P_ik + P_jk − 1) approaches zero, and the entire third-party component vanishes: precisely the Organski-Kugler claim that the allies of dominant states are irrelevant.
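The limiting argument is easy to check numerically. In the sketch below the probabilities are generated from a crude capabilities ratio; that ratio, and all the numbers, are assumptions made for illustration only, not the operationalization used in the expected utility research program.

```python
# Toy version of the third-party component
#   sum over k != i,j of (P_ik + P_jk - 1)(U_ki - U_kj),
# with the P's built from a simple capabilities ratio (my assumption).
def third_party_component(cap_i, cap_j, third_parties):
    total = 0.0
    for cap_k, u_ki, u_kj in third_parties:
        p_ik = (cap_i + cap_k) / (cap_i + cap_j + cap_k)  # i wins with k's help
        p_jk = (cap_j + cap_k) / (cap_i + cap_j + cap_k)  # j wins with k's help
        total += (p_ik + p_jk - 1.0) * (u_ki - u_kj)
    return total

# Two dominant states (capability 100 each); first with very weak allies,
# then with strong ones. The weak allies contribute almost nothing.
weak = third_party_component(100, 100, [(1, 1, -1), (0.5, 1, -1)])
strong = third_party_component(100, 100, [(50, 1, -1), (40, 1, -1)])
print(round(weak, 3), round(strong, 3))   # → 0.015 0.733
```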
[Figure 1, a geometric 'proof' that obtuse angles equal right angles, is not reproduced in this copy.]
FIG. 1. Do right angles equal obtuse angles? A case study. Source: D. Jolly (1978)
The Lost Theorem of Euclid. The Journal of Irreproducible Results 23(4).
forming line segment AD. Draw the perpendicular bisectors of line segments AD at E
and BC at F. Draw lines from points A and D to their perpendicular bisector such that
they intersect it at point O. Note that, if drawn correctly, AO and DO must be of
equal length. Similarly, draw lines from points B and C to their perpendicular bisector
such that they intersect it at O. Note that BO and CO must be of equal length. As can
be seen from Figure 1, triangles ABO and DCO are congruent, so that angle ABO
must equal angle DCO. Similarly, angle CBO must equal angle BCO. Then, by
subtraction, angle ABC, which equals ABO − CBO, must equal angle DCB, which
equals DCO − BCO. Therefore, obtuse angles equal right angles.
This particular result is developed on the basis of a total measurement error of less
than 1 per cent of the data, where the data are the locations of the 10 line segments in
the figure, and the drawing of accompanying angles. Put somewhat differently, the
conclusion is derived by a measurement error in the location of point E along line
segment AD of less than 8 per cent. Such errors of observation, of course, are well
within the norm for social science. With a single case study, then, small measurement
error can lead to results that are flatly preposterous. To be sure, with a large number
of cases the improvement in our assessment of the evidence would be real, but
inadequate. With many unbiased observations we would conclude that, on average,
right angles do not equal obtuse angles, although there is variance around this
average. That is, we would conclude that under some (as yet unexplained)
circumstances, obtuse angles do equal right angles. Many cases bring us a little closer
to the truth than does a single case, but we still have a long way to go. Of course, if we
had a rigorous, internally consistent argument to rely on, such as the Pythagorean
Theorem, we would know without any observation that the empirical relationship
observed between obtuse and right angles is false. Before concluding that this example
is excessively silly, remember that many highly respected researchers in the 1920s and
1930s claimed that World War I resulted from the establishment of rigid alliance
systems. So influential was this inference from a single event that many national
leaders explicitly strove to avoid such alliances following the war. Other scholars have
since argued that the absence of clear alliance networks prior to World War II was a
prime cause of that war. NATO and the Warsaw Pact are consequences of this
perspective. Similarly, it was fashionable after World War I to conclude that the
prewar arms race promoted war. This again led many significant national leaders to
neglect rearmament. What were the consequences? As Goldblat (1982: 12) puts it:
When the League Covenant was written many believed that World War I was
caused by the arms race prior to the war, whereas a few decades later the
prevalent feeling was that World War II could have been avoided if the great
powers had maintained an adequate military potential as well as a readiness to
use it.
Surely these single case inferences are of sufficient import to remind us of the dangers
of propositions buttressed by too few analyses, and too much measurement error.
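The geometric example can itself be checked with exact coordinates, which expose the flaw in the 'proof': the intersection point O falls far below the base, so the subtraction step is invalid at one of the two vertices. The particular coordinates and the 100 degree obtuse angle below are arbitrary choices for illustration.

```python
import math

def ang(v, w):  # unsigned angle (degrees) between two rays from a vertex
    dot = v[0] * w[0] + v[1] * w[1]
    cos = dot / (math.hypot(*v) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def ray(p, q):  # direction of the ray from p through q
    return (q[0] - p[0], q[1] - p[1])

B, C = (0.0, 0.0), (4.0, 0.0)
A = (0.0, 2.0)                          # AB perpendicular to BC: angle ABC = 90
t = math.radians(80.0)
D = (C[0] + 2 * math.cos(t), C[1] + 2 * math.sin(t))  # CD = AB, angle DCB = 100

# O: intersection of the perpendicular bisectors of BC (x = 2) and AD
mx, my = (A[0] + D[0]) / 2, (A[1] + D[1]) / 2
dx, dy = D[0] - A[0], D[1] - A[1]
O = (2.0, my + (mx - 2.0) * dx / dy)    # solve (O - M) . (D - A) = 0

ABO, DCO = ang(ray(B, A), ray(B, O)), ang(ray(C, D), ray(C, O))
CBO, BCO = ang(ray(B, C), ray(B, O)), ang(ray(C, B), ray(C, O))
print(abs(ABO - DCO) < 1e-6, abs(CBO - BCO) < 1e-6)  # → True True (congruences hold)
print(round(ABO - CBO, 6), round(DCO - BCO, 6))      # → 90.0 90.0
print(round(ang(ray(C, D), ray(C, B)), 6), round(O[1], 1))  # → 100.0 -22.9
```

Both subtractions yield 90 degrees, yet angle DCB is actually 100 degrees: the decomposition DCB = DCO − BCO fails because O lies roughly 23 units below the base, outside both angles.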
A many-case analysis of angles would have alerted us to the fact that the case study
result was in serious doubt. However, in the absence of a well-specified theory of
relations among variables, it is unlikely that we would escape the unfortunate
conclusion that obtuse angles sometimes equal right angles. Large Ns alone are a poor
substitute for rigorous theorizing. Here is where the interaction between theory and
evidence, between logic and observation, is so important. Too often we confuse
empiricism with theory construction. We would do well to remember Kant's (1950:
53) contention that 'no conditions of judgements of experience are higher than those
... pure concepts of the understanding, which render the empirical judgement
objectively valid'.
In some larger sense, of course, theory construction always follows on from our
previous experience and observation. Which axioms we choose, whether the choice is
made implicitly or explicitly, is largely a function of our unscientific personal judgment
about the workings of the world. Our individual experiences and observations lead
some students of international conflict to accept the axioms of such decisionmaking
frameworks as stimulus-response, action-reaction cycles (Richardson, 1960), bureau-
cratic politics (Allison, 1971), or expected utility maximization (Bueno de Mesquita,
1981). Others reject decisionmaking approaches, preferring to seek explanations of
conflict from the perspective of system-wide structural constraints (Morgenthau, 1966;
Singer et al., 1972; Waltz, 1979). Still others strive to understand conflict from the
perspective of cybernetics (Deutsch, 1964), asymmetric exchange relations (Keohane
and Nye, 1977), national attributes (Organski and Kugler, 1980), cognition, and so
on.
Indeed, while disciplines make rare leaps forward as a consequence of the
establishment of new theories superseding old theories, individual researchers
generally contribute to the incremental progress that makes great leaps possible by
providing the building blocks for the selection and evaluation of alternative
approaches. Thus, individual choices from among competing paradigms are informed
by personal knowledge of history and experience. Such personal knowledge results
from the examination of individual case studies, the perusal of data collections, ad hoc
inquiries and analyses, as well as from individual efforts at axiomatic, deductive
theorizing. In this way, the many seemingly unrelated efforts of independent
researchers provide the knowledge base from which the pieces of the conflict puzzle
may be assembled into a body of coherent scientific knowledge.
So far, we do not know enough to make our knowledge coherent or our many
theoretical perspectives commensurable. Consequently, we cannot say that one
approach or another is better in general. Usually, the competing perspectives in the
field of international conflict provide frameworks for explanations of different, though
related, dependent variables. However, this does not mean that there are not objective
criteria by which to evaluate hypotheses within and sometimes across these
frameworks.
Whether one is a Marxist, mercantilist, or capitalist; a quantifier or nonquantifier; a
single-case study analyst or a many-cases analyst, we should all be able to agree that
internal, logical consistency is a fundamental requirement of all hypotheses. To the
extent that logical consistency is accepted as an elemental requirement of all research,
formal, explicit theorizing takes intellectual, if not temporal, precedence over empiricism. Rigorous
'tests' of casual hunches seem to me to carry little more weight than do casual 'tests' of
those same hunches. In the absence of the careful specification of the exact logical
linkages among the terms in one's hypotheses, even the most rigorous empirical
analysis is doomed to be inchoate. Our main problem is not a lack of facts to marshal
in support of hypotheses, but rather a lack of rigorously derived hypotheses that can
render our facts informative.
This is not to say that all researchers must in all their endeavors do axiomatic,
I am being asked from time to time why I do not justify my position against
what appears to be at present the prevailing trend in the field. I do not intend
to do this; for I have learned both from historic and personal experience that
academic polemics generally do not advance the cause of truth but leave things
very much as they found them (Morgenthau, 1966: ix).4
analyses that do not serve as particularly helpful tests of competing, but plausible,
rival hypotheses. Consider, for instance, the argument between those who contend
that bipolar systems tend to be at peace while multipolar systems tend to be at war
(Waltz, 1964), and those who contend that multipolar systems tend to be at peace,
while bipolar systems tend to be at war (Deutsch and Singer, 1964). Close scrutiny of
the competing arguments reveals an important underlying assumption that the
competing sides accept, and an important underlying difference in the implications
that have been drawn from that assumption. Both 'schools' assume that multipolarity
induces uncertainty in the international system, while bipolarity induces clarity. But,
Deutsch and Singer (and others) seem to assume that uncertainty provokes cautious
behavior, while Waltz (and others), with equal plausibility, argue that uncertainty
provokes reckless, or at least miscalculated, behavior. The two contending arguments,
then, really are about how people with the power to wage war respond to uncertainty,
and not about the effects of polarity at all. Yet the analyses generated by these
competing hypotheses generally focus on how war and bi- or multipolarity are
empirically associated (Singer and Small, 1968; Wallace, 1973b; Bueno de Mesquita,
1975; Ostrom and Aldrich, 1978). The critical differences between these rival
perspectives can produce discriminating empirical results only if decisionmaker
responses to uncertainty are heavily skewed in favor of one or the other point of view.
If decisionmaker responses to uncertainty are symmetrically distributed (whether
normally, uniformly, or multimodally) then neither hypothesis can be true in general,
although each can be correct under specifiable circumstances (Bueno de Mesquita,
1978). Despite this logically inescapable conclusion, the authors of the rival hypotheses
have, nevertheless, been committed to a systemic approach to studying the linkage
between polarity and conflict. Waltz (1979) seems to deny actively the potential value
of a decisionmaking perspective, while Singer, in a forthcoming work, moves toward
accepting a decisionmaking framework.
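The point about symmetric responses can be illustrated with a small simulation. Everything in it is invented for the example: the linear 'propensity' index, the uncertainty levels, and the response distribution are assumptions, not measurements.

```python
import random
random.seed(1)

# Toy sketch: each leader draws a response r to uncertainty from a
# symmetric distribution (r > 0 = reckless, r < 0 = cautious). The war
# 'propensity' is a linear index, purely illustrative, not a probability.
def propensity(r, uncertainty, base=0.2):
    return base + 0.3 * r * uncertainty

leaders = [random.gauss(0, 1) for _ in range(100_000)]
multi = sum(propensity(r, 1.0) for r in leaders) / len(leaders)  # high uncertainty
bi = sum(propensity(r, 0.2) for r in leaders) / len(leaders)     # low uncertainty
print(round(multi, 2), round(bi, 2))   # → 0.2 0.2: no aggregate polarity effect

reckless = [propensity(r, 1.0) for r in leaders if r > 0]
cautious = [propensity(r, 1.0) for r in leaders if r <= 0]
print(sum(reckless) / len(reckless) >
      sum(cautious) / len(cautious))   # → True: effects exist within subgroups
```

With symmetric responses the aggregate association between uncertainty and conflict washes out, even though each hypothesis holds for a specifiable subgroup of decisionmakers.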
Does it matter whether our research proceeds inductively or deductively, so long as
we, as a group engaged in a collective enterprise, satisfy the requirements of rigorous
theory construction and rigorous empirical investigation? I think not, at least in terms
of the value of the final product. The logic of discovery apparently is not laid out along
a single, neat path. Many routes seem capable of leading to an advancement of
knowledge. But perhaps there is an important difference between beginning a research
endeavor inductively or deductively when viewed from the perspective of research
efficiency. Consider the following scenario. One observes relations among variables.
From these observations, one constructs a tentative explanation. The result is a model
of behavior with appropriate empirical referents. Now, consider the potential sources
of error or inefficiency. First, the set of variables selected initially may not prove to
yield 'interesting' relations. This may be because, in the absence of an explicit theory,
the researchers made some bad choices. Or it may be that the wrong relations among
the variables were examined. That is, the researcher may have chosen well in terms of
variables, but poorly in terms of the functional form of their relationships. This is most
likely to happen when the hypotheses being analysed are ad hoc efforts to fit variables
together. Finally, the 'right' functional relationship may have been specified, in the
sense that goodness of fit is maximized, but the post hoc explanation attached to the
relationship may possess some internal inconsistency. Now, had the researcher begun
deductively, the final problem would have been discovered long before the time and
effort was ever made to gather data.
Second, although the hypotheses derived deductively may prove just as vacuous or
just as powerful as those derived inductively, still by taking a deductive approach the
researcher is guided by some logical structure concerning the selection of relevant
variables and the specification of the expected functional form of the relationships
among the variables. An ad hoc search for the best fit is not undertaken. Rather, the
question is far simpler: Do the data fit together in the manner specified in the theory? In
short, the deductive approach tends to short-circuit certain kinds of dead ends more
quickly and efficiently than does the inductive approach. Of course, to the extent that
one thinks more creatively one way or the other, that creativity may outweigh the
structural advantages of deductivism over inductivism. In any event, it is a matter of
taste whether efficiency is to be valued as a methodological goal.
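The inefficiency of an ad hoc search for the best-fitting functional form can be illustrated directly: even when two variables are pure, unrelated noise, trying enough transformations will turn up one that fits the sample better than a single pre-specified linear form. The candidate forms and sample size below are arbitrary choices for the illustration.

```python
import math
import random
random.seed(7)

# Toy sketch: x and y are unrelated noise; an ad hoc search over many
# candidate functional forms still selects whichever happens to fit best.
n = 30
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

forms = {
    "linear": lambda v: v,
    "square": lambda v: v * v,
    "cube":   lambda v: v ** 3,
    "abs":    lambda v: abs(v),
    "sin":    lambda v: math.sin(3 * v),
    "exp":    lambda v: math.exp(v),
}

def r_squared(xs, ys):  # squared correlation between xs and ys
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy * sxy / (sxx * syy)

best = max(forms, key=lambda f: r_squared([forms[f](v) for v in x], y))
print(best, round(r_squared([forms[best](v) for v in x], y), 2))
```

The winning form's fit is spurious by construction; a theory that specified the functional relationship in advance would never have entered this search.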
Gathering evidence and evaluating how strongly it tends to falsify our
hypotheses is one of the most difficult, and clearly subjective, tasks faced by
researchers. Our hypotheses rarely address the exact same problem from different
theoretical perspectives. Rather, our hypotheses tend to necessitate the specification of
definitionally unique, if not unrelated, dependent variables. And even should we agree
on the definition of a crucial variable, say war, still we seem to disagree on which
aspects of war we expect our hypotheses to explain. Studies of the causes of war, for
instance, have focused on war's frequency in the international system, in certain
nation subsets (e.g., democracies, Third World, socialist economies, the European
balance of power framework, major powers), and in certain historic periods (e.g.,
Peloponnesian, 19th century, nuclear age). But studies have also focused on war's
intensity, magnitude, severity, duration, periodicity, and so forth. Rarely are two
studies' dependent variables sufficiently similar to permit a direct comparison of
results. Without such comparisons, of course, Lakatos's (1978) criteria set out earlier
can not be applied effectively, thus largely precluding judgments about advancements
in scientific knowledge. But even if these criteria can be applied only loosely, still there
are arguments for doing empirical research of one sort rather than another.
Here I would like to address some of the advantages and disadvantages of
quantification and nonquantification as methods of addressing evidence. I do so with
an obvious bias which should be stated at the outset. My own research is quantitative,
suggesting, correctly, that I will conclude that quantification is generally, though not
always, preferable to nonquantification. Let me begin with the weaknesses, as I see
them, of quantitative analysis.
The construction of indicators of important phenomena based on rigid coding rules
usually means a significant sacrifice in internal validity. Let me take my own research
on expected utility and war as a case in point. Cardinal utilities, as far as I know, can
be measured accurately only through the application of Von Neumann experiments.
Any other technique is likely to distort the shape of utility functions, producing in
particular instances serious misestimates of the factors motivating or retarding
behavior. Of particular seriousness in this regard is that we have no way of knowing
which cases are distorted, by how much, or even in what direction, except through the
application of post hoc, revealed-preference criteria (and even then we can address only
ordinal issues of direction of distortion). While it is impossible to make use of Von
Neumann experiments in studying most past wars (since the relevant decisionmakers
are dead and, if they are alive, they almost surely would not subject themselves to such
experiments), it may be possible to come closer to a proper estimate of utilities by
studying individual cases in great depth than by applying fixed, rigid coding rules to
all actors at all times. Although we cannot verify in any strict sense that close, careful
case studies would produce better estimates (since we have no benchmark values for
comparison), it seems eminently reasonable to believe that close scrutiny of individual
decisions yields better estimates of utilities than do gross applications of general
evaluative criteria. Indeed, this is a statement that could be verified in general, if not
in this particular instance, through experimental research. In any event, I accept that
specific events can be described more accurately by careful nonquantitative research
than by quantitative analyses. In short, the description of individual events as
manifested by 'values' on specific variables is likely to be more accurate in
nonquantitative studies than in quantitative ones.
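For readers unfamiliar with the procedure, a Von Neumann-Morgenstern 'standard gamble' can be sketched as follows. The actor's true preferences are simulated here by a hidden utility function, which is an assumption of the illustration; in a real experiment the choices themselves would be the data.

```python
# Toy sketch of a standard-gamble utility elicitation: find the
# probability p at which the actor is indifferent between outcome x for
# sure and a lottery giving the best outcome with probability p and the
# worst with 1 - p. That indifference probability is u(x).
def hidden_utility(x):            # concave: a risk-averse actor (assumed)
    return x ** 0.5

def prefers_lottery(x, p, best=1.0, worst=0.0):
    eu_lottery = p * hidden_utility(best) + (1 - p) * hidden_utility(worst)
    return eu_lottery > hidden_utility(x)

def elicit_utility(x, steps=40):
    lo, hi = 0.0, 1.0
    for _ in range(steps):        # bisect on the indifference probability
        p = (lo + hi) / 2
        lo, hi = (p, hi) if not prefers_lottery(x, p) else (lo, p)
    return (lo + hi) / 2

print(round(elicit_utility(0.25), 3))   # → 0.5, i.e. the hidden sqrt(0.25)
```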
But what is the price paid for accuracy in estimating the 'values' on relevant
variables? First, a trade-off generally exists between the achievement of internal
validity and the attainment of external validity. As the quality of explanation of a few
events increases, the quality of generalization to other events typically decreases. This
is because the quality of explanation is enhanced by the inclusion of factors unique to
the few cases studied in nonquantitative depth. But, such unique factors, by definition,
cannot contribute to a general understanding or explanation of the event-class studied.5
Furthermore, even when general explanations are sought, and unique factors are
specifically excluded, it is difficult in the absence of precise coding criteria to assess
whether the evaluation of the role of specific variables is biased or confounded by the
specific circumstances being studied. Quantitative analysis insists on explicit rules for
defining variables, and explicit assumptions concerning the way variables relate to
each other. The selection of one or another statistical technique, for instance,
automatically imposes precise assumptions on the data. Nonquantitative research does
not insist on such explicitness. Of course, one can select poor rules or inappropriate
assumptions in doing quantitative research (just as one can in nonquantitative
research). But, with quantitative research it is much easier to identify those poor
decisions. Debate about research and the possibility of careful replication, or
modification, is enhanced by the explicitness of assumptions. In that sense, it is easier
to recognize spurious results in quantitative analysis than in nonquantitative analysis.
Consequently, quantitative research facilitates judgments about the general accuracy of
explanations and the precision with which methods of evaluation are applied, while
nonquantitative research sacrifices such precision in order to enhance the accuracy of
description with respect to particular events. A commitment to science seems to argue
strongly in favor of emphasizing general understanding, rather than particularistic
understanding. After all, the power of particularistic understanding is precisely that it
can emphasize details that are not likely to recur, and that are therefore likely to
escape entirely the purview of quantitative analysis.
Some will object to the argument by asking 'How can external validity be achieved
when internal validity is in doubt?' There are, of course, at least two responses. First,
both internal and external validity are in doubt in quantitative and in nonquantitative
analysis. It is rather a question of the trade-off between explaining the
characteristics of forests or the characteristics of individual trees. Second, internal
validity comes in large measure from sound, explicit theory, and not just from
observation. A well-specified theory dictates what relationships to look for, and in
what form to look for them. In short, good theory specifies the essential facts needed to explain
certain classes of events. Neither theory, nor empirical analysis is expected to replicate
reality fully, nor should it. The replication of reality requires attention to an infinite
number of facts and factors; no study, no matter how detailed, can replicate reality.
Indeed, even the most detailed of analyses must remain infinitely removed from a full
specification of factors that impinged upon the event being studied. But, quantitative
analyses at least permit us the opportunity to evaluate the quality of inferences that
are drawn from the available evidence in a broader, more readily replicated or
refuted, analytic framework than do nonquantitative analyses. This is not to suggest
that rigor is not possible in nonquantitative research. Several examples of rigorous
research that is nonquantitative come to mind, including for instance Gulick's (1955)
Europe's Classical Balance of Power, Blainey's (1973) Causes of War, and many others. The
point is that it is easier to spot poor quantitative analysis than poor
nonquantitative analysis, because the former is by nature more open and explicit than
is the latter. Consequently, quantitative research is likely to reveal that a given
approach is wrong more quickly and clearly than is nonquantitative analysis. Thus,
efficiency seems to argue for quantification, just as it argues for explicit, deductive
theorizing.
A Personal Prescription
Rather than review 'the literature' I have chosen to focus on a few epistemological
issues that seem to me to be central to our ability to answer the question 'How do we
know when we know something about international conflict?' Undoubtedly such a
focus will disappoint many readers. However, there are many fine reviews of the
literature and tours d'horizon in our field that have been written during the past few
years, so my choosing not to add another should not leave anyone lacking sources
on the subject. Let me turn now, in this concluding section, to my personal
prescriptions for future research. I do so with considerable trepidation because it is not
anyone's place to instruct others about how to do research. I take the view that what is
said here represents more a challenge to myself than an agenda for others.
Furthermore, I do not have in mind that researchers must satisfy all the tasks I believe
are required for knowledge to be advanced. Rather, I have in mind what I believe we,
as a discipline, should strive for.
What should not be emphasized about the means for achieving scientific progress in
training future researchers are the following:
Rather, the means for achieving scientific progress when training future researchers
should include explicit theorizing, whether verbal or mathematical, grounded in
axiomatic logic, from which hypotheses with empirical referents may be extracted.
These should be followed by rigorous empirical analysis (whether quantitative or not)
in which operational assumptions and procedures for evaluating evidence are
explicitly stated. Such research should be careful to note whether the relevant
BRUCE BUENO DE MESQUITA 135
Notes
1. For a particularly clear and precise description of each of these standards, along with an evaluation of
their logical limitations, see Lakatos (1978).
2. Of course, while utter rejection of a theory on the grounds of naive methodological falsificationism may
be excessive, still it is important to know the strengths and weaknesses of our theories. Thus, even in the
absence of an alternative explanation that is superior to the realist paradigm, it is useful to know
whether a naive, null model yields results superior to alternative hypotheses, and to know where
theories possess their weaknesses so that we may ascertain whether further theorizing (such as
constructing auxiliary hypotheses) is required or if improved measurement instruments are needed.
3. Of course, the studies that resulted from the 1914 project did investigate with time series many small
events in the hope that patterns underlying those small events would lead to an understanding of the
single, big event: the eruption of World War I (Holsti et al., 1968; Zinnes, 1968).
4. Note how Morgenthau equates criticism with polemics.
5. A possible exception to this line of argument arises when a very small number of 'most different systems'
are selected to evaluate the merits of competing, precise, deterministic arguments. If the instruments of
observation are sufficiently fine that measurement distortion is not a serious concern, then even one
qualitative departure from a hypothesized necessary, sufficient, or necessary and sufficient relationship
can stand as an important challenge to the survival of the hypothesis. Where the hypothesis is not
deterministic either in theory, or in its operational form (because of limitations on the instruments or
observations), most different systems do not represent exceptions to the argument delineated above.
References
ALLISON, G. (1971) Essence of Decision: Explaining the Cuban Missile Crisis. Boston, MA: Little, Brown.
ALTFELD, M., AND B. BUENO DE MESQUITA. (1979) Choosing Sides in Wars. International Studies Quarterly 23 (1):
87-112.
BLAINEY, G. (1973) The Causes of War. New York, NY: The Free Press.
BUENO DE MESQUITA, B. (1975) Measuring Systemic Polarity. Journal of Conflict Resolution 19 (2): 187-215.
BUENO DE MESQUITA, B. (1978) Systemic Polarization and the Occurrence and Duration of War. Journal of
Conflict Resolution 22 (2): 241-267.
BUENO DE MESQUITA, B. (1980) 'Theories of International Conflict: An Analysis and an Appraisal'. In The
Handbook of Political Conflict, edited by T. R. Gurr, 361-398. New York, NY: The Free Press.
BUENO DE MESQUITA, B. (1981) The War Trap. New Haven, CT: Yale University Press.
BUENO DE MESQUITA, B. (1985) The War Trap Revisited. American Political Science Review 79 (1): 156-177.
CLAUDE, I. (1962) Power and International Relations. New York, NY: Random House.
DEUTSCH, K. (1964) The Nerves of Government. New York, NY: The Free Press.
DEUTSCH, K., AND J. D. SINGER. (1964) Multipolar Power Systems and International Stability. World Politics 16
(3): 390-406.
GALTUNG, J. (1964) A Structural Theory of Aggression. Journal of Peace Research 1 (1): 95-119.
GOLDBLAT, J. (1982) Agreements for Arms Control: A Critical Survey. London, UK: Taylor and Francis.
GULICK, E. (1955) Europe's Classical Balance of Power. Ithaca, NY: Cornell University Press.
HOLSTI, O., R. NORTH, AND R. BRODY. (1968) 'Perceptions and Action in the 1914 Crisis'. In Quantitative
International Politics, edited by J. D. Singer, 123-158. New York, NY: The Free Press.
JOLLY, D. (1978) The Lost Theorem of Euclid. The Journal of Irreproducible Results 23 (4): 8-10.
KANT, I. (1950) Prolegomena to Any Future Metaphysics That Will Be Able to Present Itself as a Science. Translated
by L. W. Beck. New York, NY: Library of Liberal Arts.
KEOHANE, R., AND J. NYE, Jr. (1977) Power and Interdependence: World Politics in Transition. Boston, MA: Little,
Brown.
KISSINGER, H. (1979) The White House Years. Boston, MA: Little, Brown.
KUHN, T. S. (1962) The Structure of Scientific Revolutions. Chicago, IL: University of Chicago Press.
LAKATOS, I. (1978) The Methodology of Scientific Research Programmes, Vol. 1. Cambridge, UK: Cambridge
University Press.
MIDLARSKY, M. (1975) On War. New York, NY: The Free Press.
MORGENTHAU, H. J. (1966) Politics among Nations, 4th and 5th ed. New York, NY: Alfred A. Knopf.
MOST, B., AND H. STARR. (1984) International Relations Theory, Foreign Policy Substitutability, and 'Nice'
Laws. World Politics 36 (3): 383-406.
ORGANSKI, A. F. K., AND J. KUGLER. (1980) The War Ledger. Chicago, IL: University of Chicago Press.
OSTROM, C., AND J. ALDRICH. (1978) The Relationship between Size and Stability in the Major Power
International System. American Journal of Political Science: 743-771.
POPPER, K. (1959) The Logic of Scientific Discovery. London, UK: Hutchinson.
RICHARDSON, L. F. (1960) Arms and Insecurity. Chicago, IL: Quadrangle Books.
RIKER, W. (1962) The Theory of Political Coalitions. New Haven, CT: Yale University Press.
ROSENAU, J. (1969) Linkage Politics. New York, NY: The Free Press.
RUMMEL, R. (1968) 'The Relationship between National Attributes and Foreign Conflict Behavior'. In
Quantitative International Politics, edited by J. D. Singer, 187-214. New York, NY: The Free Press.
SINGER, J. D., S. BREMER, AND J. STUCKEY. (1972) 'Capability Distribution, Uncertainty, and Major Power
War, 1820-1965'. In Peace, War, and Numbers, edited by B. Russett, 19-48. Beverly Hills, CA: Sage
Publications.
SINGER, J. D., AND M. SMALL. (1968) 'Alliance Aggregation and the Onset of War, 1815-1945'. In Quantitative
International Politics, edited by J. D. Singer, 247-286. New York, NY: The Free Press.
STOESSINGER, J. (1978) Why Nations Go to War. New York, NY: St Martin's Press.
TAYLOR, A. J. P. (1961) The Origins of the Second World War. London, UK: H. Hamilton.
VASQUEZ, J. (1983) The Power of Power Politics: A Critique. New Brunswick, NJ: Rutgers University Press.
WALLACE, M. (1973a) War and Rank among Nations. Lexington, MA: D.C. Heath.
WALLACE, M. (1973b) 'Alliance Polarization, Cross-Cutting, and International War: 1815-1964'. Journal of
Conflict Resolution 17 (4): 575-604.
WALTZ, K. (1964) The Stability of a Bipolar World. Daedalus 93 (2): 881-909.
WALTZ, K. (1979) Theory of International Politics. Reading, MA: Addison-Wesley.
ZINNES, D. (1968) 'Expression and Perception of Hostility in Pre-War Crisis: 1914'. In Quantitative International
Politics, edited by J. D. Singer, 85-119. New York, NY: The Free Press.