Automated Reasoning
Frederic Portoraro

PDF version of the entry from the Winter 2014 Edition of the Stanford Encyclopedia of Philosophy
http://plato.stanford.edu/archives/win2014/entries/reasoning-automated/
First published Wed Jul 18, 2001; substantive revision Thu Nov 20, 2014

Reasoning is the ability to make inferences, and automated reasoning is concerned with the building of computing systems that automate this process. Although the overall goal is to mechanize different forms of reasoning, the term has largely been identified with valid deductive reasoning as practiced in mathematics and formal logic. In this respect, automated reasoning is akin to mechanical theorem proving. Building an automated reasoning program means providing an algorithmic description to a formal calculus so that it can be implemented on a computer to prove theorems of the calculus in an efficient manner. Important aspects of this exercise involve defining the class of problems the program will be required to solve, deciding what language will be used by the program to represent the information given to it as well as new information inferred by the program, specifying the mechanism that the program will use to conduct deductive inferences, and figuring out how to perform all these computations efficiently. While basic research work continues in order to provide the necessary theoretical framework, the field has reached a point where automated reasoning programs are being used by researchers to attack open questions in mathematics and logic, and to solve problems in engineering.

1. Introduction
    1.1 Problem Domain
    1.2 Language Representation
2. Deduction Calculi
    2.1 Resolution
    2.2 Sequent Deduction
    2.3 Natural Deduction
    2.4 The Matrix Connection Method
    2.5 Term Rewriting
    2.6 Mathematical Induction
3. Other Logics
    3.1 Higher-Order Logic
    3.2 Non-classical Logics
4. Applications
    4.1 Logic Programming
    4.2 SAT Solvers
    4.3 Deductive Computer Algebra
    4.4 Formal Verification of Hardware
    4.5 Formal Verification of Software
    4.6 Logic and Philosophy
    4.7 Mathematics
    4.8 Artificial Intelligence
5. Conclusion
Bibliography
Academic Tools
Other Internet Resources
Related Entries

1. Introduction

A problem being presented to an automated reasoning program consists of two main items, namely a statement expressing the particular question being asked called the problem's conclusion, and a collection of statements expressing all the relevant information available to the program—the problem's assumptions. Solving a problem means proving the conclusion from the given assumptions by the systematic application of rules of deduction embedded within the reasoning program. The problem solving process ends when one such proof is found, when the program is able to detect the non-existence of a proof, or when it simply runs out of resources.

1.1 Problem Domain

A first important consideration in the design of an automated reasoning program is to delineate the class of problems that the program will be required to solve—the problem domain. The domain can be very large, as would be the case for a general-purpose theorem prover for first-order logic, or be more restricted in scope as in a special-purpose theorem prover for Tarski's geometry, or the modal logic K. A typical approach in the design of an automated reasoning program is to provide it first with sufficient logical power (e.g., first-order logic) and then further demarcate its scope to the particular domain of interest defined by a set of domain axioms. To illustrate, EQP, a theorem-proving program for equational logic, was used to solve an open question in Robbins algebra (McCune 1997): Are all Robbins algebras Boolean? For this, the program was provided with the axioms defining a Robbins algebra:

(A1) x + y = y + x (commutativity)
(A2) (x + y) + z = x + (y + z) (associativity)
(A3) −(−(x + y) + −(x + −y)) = x (Robbins equation)

The program was then used to show that a characterization of Boolean algebra that uses Huntington's equation,

−(−x + y) + −(−x + −y) = x,

follows from the axioms. We should remark that this problem is non-trivial since deciding whether a finite set of equations provides a basis for Boolean algebra is undecidable, that is, it does not permit an algorithmic representation; also, the problem was attacked by Robbins, Huntington, Tarski and many of his students with no success. The key step was to
establish that all Robbins algebras satisfy

∃x∃y(x + y = x),

since it was known that this formula is a sufficient condition for a Robbins algebra to be Boolean. When EQP was supplied with this piece of information, the program provided invaluable assistance by completing the proof automatically.

A special-purpose theorem prover does not draw its main benefit from restricting its attention to the domain axioms but from the fact that the domain may enjoy particular theorem-proving techniques which can be hardwired—coded—within the reasoning program itself and which may result in a more efficient logic implementation. Much of EQP's success at settling the Robbins question can be attributed to its built-in associative-commutative inference mechanisms.

1.2 Language Representation

A second important consideration in the building of an automated reasoning program is to decide (1) how problems in its domain will be presented to the reasoning program; (2) how they will actually be represented internally within the program; and, (3) how the solutions found—completed proofs—will be displayed back to the user. There are several formalisms available for this, and the choice is dependent on the problem domain and the underlying deduction calculus used by the reasoning program. The most commonly used formalisms include standard first-order logic, typed λ-calculus, and clausal logic. We take up clausal logic here and assume that the reader is familiar with the rudiments of first-order logic; for the typed λ-calculus the reader may want to check Church 1940. Clausal logic is a quantifier-free variation of first-order logic and has been the most widely used notation within the automated reasoning community. Some definitions are in order: A term is a constant, a variable, or a function whose arguments are themselves terms. For example, a, x, f(x), and h(c,f(z),y) are all terms. A literal is either an atomic formula, e.g. F(x), or the negation of an atomic formula, e.g. ~R(x,f(a)). Two literals are complementary if one is the negation of the other. A clause is a (possibly empty) finite disjunction of literals l1 ∨ … ∨ ln where no literal appears more than once in the clause (that is, clauses can be alternatively treated as sets of literals). Ground terms, ground literals, and ground clauses have no variables. The empty clause, [ ], is the clause having no literals and, hence, is unsatisfiable—false under any interpretation. Some examples: ~R(a,b), and F(a) ∨ ~R(f(x),b) ∨ F(z) are both examples of clauses but only the former is ground. The general idea is to be able to express a problem's formulation as a set of clauses or, equivalently, as a formula in conjunctive normal form (CNF), that is, as a conjunction of clauses.

For formulas already expressed in standard logic notation, there is a systematic two-step procedure for transforming them into conjunctive normal form. The first step consists in re-expressing a formula into a semantically equivalent formula in prenex normal form, (Θx1)…(Θxn)α(x1,…,xn), consisting of a string of quantifiers (Θx1)…(Θxn) followed by a quantifier-free expression α(x1,…,xn) called the matrix. The second step in the transformation first converts the matrix into conjunctive normal form by using well-known logical equivalences such as DeMorgan's laws, distribution, double-negation, and others; then, the quantifiers in front of the matrix, which is now in conjunctive normal form, are dropped according to certain rules. In the presence of existential quantifiers, this latter step does not always preserve equivalence and requires the introduction of Skolem functions whose role is to “simulate” the behaviour of existentially quantified variables. For example, applying the skolemizing process to the formula

∀x∃y∀z∃u∀v[R(x,y,v) ∨ ~K(x,z,u,v)]
requires the introduction of one-place and two-place Skolem functions, f and g respectively, resulting in the formula

∀x∀z∀v[R(x,f(x),v) ∨ ~K(x,z,g(x,z),v)]

The universal quantifiers can then be removed to obtain the final clause, R(x,f(x),v) ∨ ~K(x,z,g(x,z),v) in our example. The Skolemizing process may not preserve equivalence but maintains satisfiability, which is enough for clause-based automated reasoning.
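The skolemizing step is mechanical enough to be captured in a few lines of code. The following Python sketch is only an illustration under our own ad hoc conventions (quantifier prefixes as lists of pairs, Skolem terms as pairs of a function symbol and its argument tuple); it computes the Skolem substitution for a formula already in prenex normal form and is not the transformation procedure of any particular prover.

# Minimal sketch: compute the Skolem substitution for a prenex prefix.
# Each existential variable is mapped to a fresh function symbol applied to
# the universally quantified variables that precede it in the prefix.

def skolemize(prefix):
    """prefix: list of ('forall' | 'exists', variable) pairs.
    Returns (remaining_universal_variables, skolem_substitution)."""
    fresh_symbols = iter(['f', 'g', 'h', 'k'])   # enough for small examples
    universals, substitution = [], {}
    for quantifier, variable in prefix:
        if quantifier == 'forall':
            universals.append(variable)
        else:  # 'exists': replace the variable by a Skolem term
            symbol = next(fresh_symbols)
            substitution[variable] = (symbol, tuple(universals))
    return universals, substitution

# The formula of the example: forall x exists y forall z exists u forall v [...]
prefix = [('forall', 'x'), ('exists', 'y'), ('forall', 'z'),
          ('exists', 'u'), ('forall', 'v')]
print(skolemize(prefix))
# (['x', 'z', 'v'], {'y': ('f', ('x',)), 'u': ('g', ('x', 'z'))})

Applying the resulting substitution to the matrix and then dropping the remaining universal quantifiers yields the clause R(x,f(x),v) ∨ ~K(x,z,g(x,z),v) obtained above.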
Although clausal form provides a more uniform and economical notation—there are no quantifiers and all formulas are disjunctions—it has certain disadvantages. One drawback is the increase in the size of the resulting formula when transformed from standard logic notation into clausal form. The increase in size is accompanied by an increase in cognitive complexity that makes it harder for humans to read proofs written with clauses. Another disadvantage is that the syntactic structure of a formula in standard logic notation can be used to guide the construction of a proof but this information is completely lost in the transformation into clausal form.

2. Deduction Calculi

A third important consideration in the building of an automated reasoning program is the selection of the actual deduction calculus that will be used by the program to perform its inferences. As indicated before, the choice is highly dependent on the nature of the problem domain and there is a fair range of options available: General-purpose theorem proving and problem solving (first-order logic, simple type theory), program verification (first-order logic), distributed and concurrent systems (modal and temporal logics), program specification (intuitionistic logic), hardware verification (higher-order logic), logic programming (Horn logic), and so on.

A deduction calculus consists of a set of logical axioms and a collection of deduction rules for deriving new formulas from previously derived formulas. Solving a problem in the program's problem domain then really means establishing a particular formula α—the problem's conclusion—from the extended set Γ consisting of the logical axioms, the domain axioms, and the problem assumptions. That is, the program needs to determine if Γ ⊨ α. How the program goes about establishing this semantic fact depends, of course, on the calculus it implements. Some programs may take a very direct route and attempt to establish that Γ ⊨ α by actually constructing a step-by-step proof of α from Γ. If successful, this shows of course that Γ derives—proves—α, a fact we denote by writing Γ ⊢ α. Other reasoning programs may instead opt for a more indirect approach and try to establish that Γ ⊨ α by showing that Γ ∪ {~α} is inconsistent which, in turn, is shown by deriving a contradiction, ⊥, from the set Γ ∪ {~α}. Automated systems that implement the former approach include natural deduction systems; the latter approach is used by systems based on resolution, sequent deduction, and matrix connection methods.

Soundness and completeness are two (metatheoretical) properties of a calculus that are particularly important for automated deduction. Soundness states that the rules of the calculus are truth-preserving. For a direct calculus this means that if Γ ⊢ α then Γ ⊨ α. For indirect calculi, soundness means that if Γ∪{~α} ⊢ ⊥ then Γ ⊨ α. Completeness in a direct calculus states that if Γ ⊨ α then Γ ⊢ α. For indirect calculi, the completeness property is expressed in terms of refutations since one establishes that Γ ⊨ α by showing the existence of a proof, not of α from Γ, but of ⊥ from Γ∪{~α}. Thus, an indirect calculus is refutation complete if Γ ⊨ α implies Γ∪{~α} ⊢ ⊥. Of the two properties, soundness is the most desirable. An incomplete calculus indicates that there are entailment relations that cannot be established within the calculus. For an automated reasoning program this means, informally, that there are true statements that the program cannot prove.
Incompleteness may be an unfortunate affair but lack of soundness is a truly problematic situation since an unsound reasoning program would be able to generate false conclusions from perfectly true information.

It is important to appreciate the difference between a logical calculus and its corresponding implementation in a reasoning program. The implementation of a calculus invariably involves making some modifications to the calculus and this results, strictly speaking, in a new calculus. The most important modification to the original calculus is the “mechanization” of its deduction rules, that is, the specification of the systematic way in which the rules are to be applied. In the process of doing so, one must exercise care to preserve the metatheoretical properties of the original calculus.

Two other metatheoretical properties of importance to automated deduction are decidability and complexity. A calculus is decidable if it admits an algorithmic representation, that is, if there is an algorithm that, for any given Γ and α, can determine in a finite amount of time the answer, “Yes” or “No”, to the question “Does Γ ⊨ α?” A calculus may be undecidable, in which case one needs to determine which decidable fragment to implement. The time-space complexity of a calculus specifies how efficient its algorithmic representation is. Automated reasoning is made the more challenging because many calculi of interest are not decidable and have poor complexity measures, forcing researchers to seek tradeoffs between deductive power and algorithmic efficiency.

2.1 Resolution

Of the many calculi used in the implementation of reasoning programs, the ones based on the resolution principle have been the most popular. Resolution is modeled after the chain rule (of which Modus Ponens is a special case) and essentially states that from p ∨ q and ~q ∨ r one can infer p ∨ r. More formally, let C − l denote the clause C with the literal l removed. Assume that C1 and C2 are ground clauses containing, respectively, a positive literal l1 and a negative literal ~l2 such that l1 and ~l2 are complementary. Then, the rule of ground resolution states that, as a result of resolving C1 and C2, one can infer (C1 − l1) ∨ (C2 − ~l2):

  C1        C2
  ---------------------------  (ground resolution)
  (C1 − l1) ∨ (C2 − ~l2)

Herbrand's theorem (Herbrand 1930) assures us that the non-satisfiability of any set of clauses, ground or not, can be established by using ground resolution. This is a very significant result for automated deduction since it tells us that if a set Γ is not satisfied by any of the infinitely many interpretations, this fact can be determined in finitely many steps. Unfortunately, a direct implementation of ground resolution using Herbrand's theorem requires the generation of a vast number of ground terms making this approach hopelessly inefficient. This issue was effectively addressed by generalizing the ground resolution rule to binary resolution and by introducing the notion of unification (Robinson 1965a). Unification allows resolution proofs to be “lifted” and be conducted at a more general level; clauses only need to be instantiated at the moment where they are to be resolved. Moreover, the clauses resulting from the instantiation process do not have to be ground instances and may still contain variables. The introduction of binary resolution and unification is considered one of the most important developments in the field of automated reasoning.
Unification

A unifier of two expressions—terms or clauses—is a substitution that when applied to the expressions makes them equal. For example, the substitution σ given by

σ := {x ← b, y ← b, z ← f(a,b)}

is a unifier for

R(x,f(a,y)) and R(b,z)

since when applied to both expressions it makes them equal:

R(x,f(a,y))σ = R(b,f(a,b)) = R(b,z)σ

A most general unifier (mgu) produces the most general instance shared by two unifiable expressions. In the previous example, the substitution {x ← b, y ← b, z ← f(a,b)} is a unifier but not an mgu; however, {x ← b, z ← f(a,y)} is an mgu. Note that unification attempts to “match” two expressions and this fundamental process has become a central component of most automated deduction programs, resolution-based and otherwise. Theory-unification is an extension of the unification mechanism that includes built-in inference capabilities. For example, the clauses R(g(a,b),x) and R(g(b,a),d) do not unify but they AC-unify, where AC-unification is unification with built-in associative and commutative rules such as g(a,b) = g(b,a). Shifting inference capabilities into the unification mechanism adds power but at a price: The mgu of two unifiable expressions may no longer be unique (there could actually be infinitely many), and the unification process becomes undecidable in general.
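Syntactic first-order unification itself is straightforward to implement. In the Python sketch below (an illustrative toy under our own representation, not the algorithm of any particular system) variables are plain strings, while constants and compound terms are tuples of the form (symbol, argument, ...); the occurs check prevents a variable from being bound to a term that contains it.

def unify(s, t, subst=None):
    """Return an mgu of terms s and t as a dict, or None if they do not unify.
    Variables are Python strings; constants and compound terms are tuples
    (symbol, arg1, ..., argn)."""
    if subst is None:
        subst = {}
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):                      # s is a variable
        return bind(s, t, subst)
    if isinstance(t, str):                      # t is a variable
        return bind(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):        # clash of symbols or arities
        return None
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

def walk(term, subst):
    """Follow variable bindings until an unbound variable or non-variable term."""
    while isinstance(term, str) and term in subst:
        term = subst[term]
    return term

def bind(var, term, subst):
    if occurs(var, term, subst):                # occurs check
        return None
    new = dict(subst)
    new[var] = term
    return new

def occurs(var, term, subst):
    term = walk(term, subst)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, a, subst) for a in term[1:])

# The example from the text: R(x, f(a, y)) and R(b, z)
print(unify(('R', 'x', ('f', ('a',), 'y')), ('R', ('b',), 'z')))
# {'x': ('b',), 'z': ('f', ('a',), 'y')}

The substitution returned is an mgu in the sense of the text, matching the {x ← b, z ← f(a,y)} computed above; production provers use considerably more efficient, near-linear unification algorithms, but the logical content is the same.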
shown in steps 1 to 4 in the refutation below. The refutation is constructed
Binary resolution

Let C1 and C2 be two clauses containing, respectively, a positive literal l1 and a negative literal ~l2 such that l1 and l2 unify with mgu θ. Then, one can infer (C1θ − l1θ) ∨ (C2θ − ~l2θ) by binary resolution:

  C1        C2
  ------------------------------  (binary resolution)
  (C1θ − l1θ) ∨ (C2θ − ~l2θ)

The clause (C1θ − l1θ) ∨ (C2θ − ~l2θ) is called a binary resolvent of C1 and C2.

Factoring

If two or more literals occurring in a clause C share an mgu θ then Cθ is a factor of C. For example, in R(x,a) ∨ ~K(f(x),b) ∨ R(c,y) the literals R(x,a) and R(c,y) unify with mgu {x ← c, y ← a} and, hence, R(c,a) ∨ ~K(f(c),b) is a factor of the original clause.

The Resolution Principle

Let C1 and C2 be two clauses. Then, a resolvent obtained by resolution from C1 and C2 is defined as: (a) a binary resolvent of C1 and C2; (b) a binary resolvent of C1 and a factor of C2; (c) a binary resolvent of a factor of C1 and C2; or, (d) a binary resolvent of a factor of C1 and a factor of C2.

Resolution proofs, more precisely refutations, are constructed by deriving the empty clause [ ] from Γ ∪ {~α} using resolution; this will always be possible if Γ ∪ {~α} is unsatisfiable since resolution is refutation complete (Robinson 1965a). As an example of a resolution proof, we show that the set {∀x(P(x) ∨ Q(x)), ∀x(P(x) ⊃ R(x)), ∀x(Q(x) ⊃ R(x))}, denoted by Γ, entails the formula ∃xR(x). The first step is to find the clausal form of Γ ∪ {~∃xR(x)}; the resulting clause set, denoted by S0, is shown in steps 1 to 4 in the refutation below. The refutation is constructed by using a level-saturation method: Compute all the resolvents of the initial set, S0, add them to the set and repeat the process until the empty clause is derived. (This produces the sequence of increasingly larger sets: S0, S1, S2, …) The only constraint that we impose is that we do not resolve the same two clauses more than once.
S0   1   P(x) ∨ Q(x)     Assumption
     2   ~P(x) ∨ R(x)    Assumption
     3   ~Q(x) ∨ R(x)    Assumption
     4   ~R(a)           Negation of the conclusion
S1   5   Q(x) ∨ R(x)     Res 1 2
     6   P(x) ∨ R(x)     Res 1 3
     7   ~P(a)           Res 2 4
     8   ~Q(a)           Res 3 4
S2   9   Q(a)            Res 1 7
     10  P(a)            Res 1 8
     11  R(x)            Res 2 6
     12  R(x)            Res 3 5
     13  Q(a)            Res 4 5
     14  P(a)            Res 4 6
     15  R(a)            Res 5 8
     16  R(a)            Res 6 7
S3   17  R(a)            Res 2 10
     18  R(a)            Res 2 14
     19  R(a)            Res 3 9
     20  R(a)            Res 3 13
     21  [ ]             Res 4 11

Although the resolution proof is successful in deriving [ ], it has some significant drawbacks. To start with, the refutation is too long as it takes 21 steps to reach the contradiction, [ ]. This is due to the naïve brute-force nature of the implementation. The approach not only generates too many formulas but some are clearly redundant. Note how R(a) is derived six times; also, R(x) has more “information content” than R(a) and one should keep the former and disregard the latter. Resolution, like all other automated deduction methods, must be supplemented by strategies aimed at improving the efficiency of the deduction process. The above sample derivation has 21 steps but research-type problems command derivations with thousands or hundreds of thousands of steps.
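For the ground (variable-free) case the level-saturation procedure just illustrated fits in a few lines. The Python sketch below represents ground clauses as frozensets of string literals, with '~' marking negation, and saturates the ground instances of the example until the empty clause appears; the representation and helper names are ours, not those of any particular prover.

def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolvents(c1, c2):
    """All ground resolvents of two clauses given as frozensets of literals."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def saturate(clauses):
    """Level-saturation loop: True iff the empty clause is derivable."""
    clauses = {frozenset(c) for c in clauses}
    tried = set()                        # pairs already resolved, as in the text
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                if c1 == c2 or (c1, c2) in tried:
                    continue
                tried.add((c1, c2))
                for r in resolvents(c1, c2):
                    if not r:
                        return True      # empty clause: refutation found
                    new.add(r)
        if new <= clauses:               # nothing new: saturated, no refutation
            return False
        clauses |= new

# Ground instances of the running example, with ~R(a) as the negated conclusion
S0 = [{'P(a)', 'Q(a)'}, {'~P(a)', 'R(a)'}, {'~Q(a)', 'R(a)'}, {'~R(a)'}]
print(saturate(S0))    # True: the empty clause is derivable

The sketch exhibits exactly the brute-force behaviour criticized above: it keeps every resolvent, redundant or not, which is why practical systems add the strategies discussed next.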

Resolution Strategies

The successful implementation of a deduction calculus in an automated reasoning program requires the integration of search strategies that reduce the search space by pruning unnecessary deduction paths. Some strategies remove redundant clauses or tautologies as soon as they appear in a derivation. Another strategy is to remove more specific clauses in the presence of more general ones by a process known as subsumption (Robinson 1965a). Unrestricted subsumption, however, does not preserve the refutation completeness of resolution and, hence, there is a need to restrict its applicability (Loveland 1978). Model elimination (Loveland 1969) can discard a sentence by showing that it is false in some model of the axioms. The subject of model generation has received much attention as a complementary process to theorem proving. The method has been used successfully by automated reasoning programs to show the independence of axiom sets and to determine the existence of discrete mathematical structures meeting some given criteria.

Instead of removing redundant clauses, some strategies prevent the generation of useless clauses in the first place. The set-of-support strategy (Wos, Carson and Robinson 1965) is one of the most powerful strategies of this kind. A subset T of the set S, where S is initially Γ ∪ {~α}, is called a set of support of S iff S − T is satisfiable. Set-of-support resolution dictates that the resolved clauses are not both from S − T. The motivation behind set-of-support is that since the set Γ is usually satisfiable it might be wise not to resolve two clauses from Γ against each other. Hyperresolution (Robinson 1965b) reduces the number of intermediate resolvents by combining several resolution steps into a single inference step.
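As a schematic illustration of how such a restriction reshapes the search, the set-of-support idea can be grafted onto the saturation sketch given earlier: only pairs in which at least one clause is supported are resolved, and every resolvent becomes supported. The function below reuses the resolvents helper defined in that sketch and is, again, only our own rendering of the strategy, not the original formulation.

def saturate_with_sos(axioms, support):
    """Level saturation in which two unsupported clauses are never resolved.
    `axioms` plays the role of S - T (assumed satisfiable); `support` is T."""
    unsupported = {frozenset(c) for c in axioms}
    supported = {frozenset(c) for c in support}
    while True:
        new = set()
        for c1 in supported:                     # one parent must be supported
            for c2 in supported | unsupported:
                if c1 == c2:
                    continue
                for r in resolvents(c1, c2):
                    if not r:
                        return True
                    new.add(r)
        if new <= supported:
            return False
        supported |= new                         # resolvents inherit support

Gamma = [{'P(a)', 'Q(a)'}, {'~P(a)', 'R(a)'}, {'~Q(a)', 'R(a)'}]
print(saturate_with_sos(Gamma, [{'~R(a)'}]))     # True, with a smaller search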

Independently co-discovered, linear resolution (Loveland 1970, Luckham 1970) always resolves a clause against the most recently derived resolvent. This gives the deduction a simple “linear” structure affording a straightforward implementation; yet, linear resolution preserves refutation completeness. Using linear resolution we can derive the empty clause in the above example in only eight steps:

1   P(x) ∨ Q(x)     Assumption
2   ~P(x) ∨ R(x)    Assumption
3   ~Q(x) ∨ R(x)    Assumption
4   ~R(a)           Negation of the conclusion
5   ~P(a)           Res 2 4
6   Q(a)            Res 1 5
7   R(a)            Res 3 6
8   [ ]             Res 4 7

With the exception of unrestricted subsumption, all the strategies mentioned so far preserve refutation completeness. Efficiency is an important consideration in automated reasoning and one may sometimes be willing to trade completeness for speed. Unit resolution and input resolution are two such refinements of linear resolution. In the former, one of the resolved clauses is always a literal; in the latter, one of the resolved clauses is always selected from the original set to be refuted. Albeit efficient, neither strategy is complete. Ordering strategies impose some form of partial ordering on the predicate symbols, terms, literals, or clauses occurring in the deduction. Ordered resolution treats clauses not as sets of literals but as sequences—linear orders—of literals. Ordered resolution is extremely efficient but, like unit and input resolution, is not refutation complete. To end, it must be noted that some strategies improve certain aspects of the deduction process at the expense of others. For instance, a strategy may reduce the size of the proof search space at the expense of increasing, say, the length of the shortest refutations. A taxonomy and detailed presentation of theorem-proving strategies can be found in (Bonacina 1999).

There are several automated reasoning programs that are based on resolution, or refinements of resolution. Prover9 (formerly Otter) is one of the most versatile among these programs and is being used in a growing number of applications (Wos, Overbeek, Lusk and Boyle 1984). Resolution also provides the underlying logico-computational mechanism for the popular logic programming language Prolog (Clocksin and Mellish 1981).

2.2 Sequent Deduction

Hilbert-style calculi (Hilbert and Ackermann 1928) have been traditionally used to characterize logic systems. These calculi usually consist of a few axiom schemata and a small number of rules that typically include modus ponens and the rule of substitution. Although they meet the required theoretical requisites (soundness, completeness, etc.) the approach at proof construction in these calculi is difficult and does not reflect standard practice. It was Gentzen's goal “to set up a formalism that reflects as accurately as possible the actual logical reasoning involved in mathematical proofs” (Gentzen 1935). To carry out this task, Gentzen analyzed the proof-construction process and then devised two deduction calculi for classical logic: the natural deduction calculus, NK, and the sequent calculus, LK. (Gentzen actually designed NK first and then introduced LK to pursue metatheoretical investigations.) The calculi met his goal to a large extent while at the same time managing to secure soundness and completeness. Both calculi are characterized by a relatively larger number of deduction rules and a simple axiom schema.

Of the two calculi, LK is the one that has been most widely used in implementations of automated reasoning programs, and it is the one that we will discuss first; NK will be discussed in the next section.

Although the application of the LK rules affects logic formulas, the rules are seen as manipulating not logic formulas themselves but sequents. Sequents are expressions of the form Γ → Δ, where both Γ and Δ are (possibly empty) sets of formulas. Γ is the sequent's antecedent and Δ its succedent. Sequents can be interpreted thus: Let I be an interpretation. Then,

    I satisfies the sequent Γ → Δ (written as: I ⊨ Γ → Δ) iff either I ⊭ α (for some α ∈ Γ) or I ⊨ β (for some β ∈ Δ).

In other words,

    I ⊨ Γ → Δ iff I ⊨ (α1 & … & αn) ⊃ (β1 ∨ … ∨ βn), where α1 & … & αn is the iterated conjunction of the formulas in Γ and β1 ∨ … ∨ βn is the iterated disjunction of those in Δ.

If Γ is empty the sequent asserts that (the disjunction of) Δ is valid; if Δ is empty, it asserts that Γ is unsatisfiable. An axiom of LK is a sequent Γ → Δ where Γ ∩ Δ ≠ ∅. Thus, the requirement that the same formula occurs at each side of the → sign means that the axioms of LK are valid, for no interpretation can then make all the formulas in Γ true and, simultaneously, make all those in Δ false. LK has two rules per logical connective, plus one extra rule: the cut rule. In the listing below, the sequents from which a rule is applied are the rule's premises and the sequent it licenses is the rule's conclusion.

Axioms:
    Γ, α → Δ, α

Cut Rule:
    from Γ → Δ, α and α, λ → Σ, infer Γ, λ → Δ, Σ

Antecedent Rules (Θ→):
    &→:  from Γ, α, β → Δ, infer Γ, α & β → Δ
    ∨→:  from Γ, α → Δ and Γ, β → Δ, infer Γ, α ∨ β → Δ
    ⊃→:  from Γ → Δ, α and Γ, β → Δ, infer Γ, α ⊃ β → Δ
    ≡→:  from Γ, α, β → Δ and Γ → Δ, α, β, infer Γ, α ≡ β → Δ
    ~→:  from Γ → Δ, α, infer Γ, ~α → Δ
    ∃→:  from Γ, α(a/x) → Δ, infer Γ, ∃xα(x) → Δ
    ∀→:  from Γ, α(t/x), ∀xα(x) → Δ, infer Γ, ∀xα(x) → Δ

Succedent Rules (→Θ):
    →&:  from Γ → Δ, α and Γ → Δ, β, infer Γ → Δ, α & β
    →∨:  from Γ → Δ, α, β, infer Γ → Δ, α ∨ β
    →⊃:  from Γ, α → Δ, β, infer Γ → Δ, α ⊃ β
    →≡:  from Γ, α → Δ, β and Γ, β → Δ, α, infer Γ → Δ, α ≡ β
    →~:  from Γ, α → Δ, infer Γ → Δ, ~α
    →∃:  from Γ → Δ, α(t/x), ∃xα(x), infer Γ → Δ, ∃xα(x)
    →∀:  from Γ → Δ, α(a/x), infer Γ → Δ, ∀xα(x)

The quantification rules ∃→ and →∀ have an eigenvariable condition that restricts their applicability, namely that a must not occur in Γ, Δ or in the quantified sentence. The purpose of this restriction is to ensure that the choice of parameter, a, used in the substitution process is completely “arbitrary”.

Proofs in LK are represented as trees where each node in the tree is labeled with a sequent, and where the original sequent sits at the root of the tree. The children of a node are the premises of the rule being applied at that node. The leaves of the tree are labeled with axioms. Here is the LK-proof of ∃xR(x) from the set {∀x(P(x) ∨ Q(x)), ∀x(P(x) ⊃ R(x)), ∀x(Q(x) ⊃ R(x))}. In the outline below, Γ stands for this set, the root sequent is listed first, and the premises of each rule application are indented beneath its conclusion; all the leaves are axioms:

Γ → ∃xR(x)
    Γ → R(a), ∃xR(x)
        Γ, P(a) ∨ Q(a) → R(a), ∃xR(x)
            Γ, P(a) → R(a), ∃xR(x)
                Γ, P(a), P(a) ⊃ R(a) → R(a), ∃xR(x)
                    Γ, P(a) → P(a), R(a), ∃xR(x)
                    Γ, P(a), R(a) → R(a), ∃xR(x)
            Γ, Q(a) → R(a), ∃xR(x)
                Γ, Q(a), Q(a) ⊃ R(a) → R(a), ∃xR(x)
                    Γ, Q(a) → Q(a), R(a), ∃xR(x)
                    Γ, Q(a), R(a) → R(a), ∃xR(x)

In our example, all the leaves in the proof tree are labeled with axioms. This establishes the validity of Γ → ∃xR(x) and, hence, the fact that Γ ⊨ ∃xR(x). LK takes an indirect approach at proving the conclusion and this is an important difference between LK and NK. While NK constructs an actual proof (of the conclusion from the given assumptions), LK instead constructs a proof that proves the existence of a proof (of the conclusion from the assumptions). For instance, to prove that α is entailed by Γ, NK constructs a step-by-step proof of α from Γ (assuming that one exists); in contrast, LK first constructs the sequent Γ → α, which it then attempts to prove valid by showing that it cannot be made false. This is done by searching for a counterexample that makes (all the sentences in) Γ true and makes α false: If the search fails then a counterexample does not exist and the sequent is therefore valid. In this respect, proof trees in LK are actually refutation proofs. Like resolution, LK is refutation complete: If Γ ⊨ α then the sequent Γ → α has a proof tree.

As it stands, LK is unsuitable for automated deduction and there are some obstacles that must be overcome before it can be efficiently implemented. The reason is, of course, that the statement of the completeness of LK only has to assert, for each entailment relation, the existence of a proof tree but a reasoning program has the more difficult task of actually having to construct one. Some of the main obstacles: First, LK does not specify the order in which the rules must be applied in the construction of a proof tree. Second, and as a particular case of the first problem, the premises in the ∀→ and →∃ rules inherit the quantificational formula to which the rule is applied, meaning that the rules can be applied repeatedly to the same formula, sending the proof search into an endless loop. Third, LK does not indicate which formula must be selected next in the application of a rule. Fourth, the quantifier rules provide no indication as to what terms or free variables must be used in their deployment. Fifth, and as a particular case of the previous problem, the application of a quantifier rule can lead into an infinitely long tree branch because the proper term to be used in the instantiation never gets chosen. Fortunately, as we will hint at below, each of these problems can be successfully addressed.

Axiom sequents in LK are valid, and the conclusion of a rule is valid iff its premises are. This fact allows us to apply the LK rules in either direction, forwards from axioms to conclusion, or backwards from conclusion to axioms. Also, with the exception of the cut rule, all the rules' premises are subformulas of their respective conclusions. For the purposes of automated deduction this is a significant fact and we would want to dispense with the cut rule; fortunately, the cut-free version of LK preserves its refutation completeness (Gentzen 1935). These results provide a strong case for constructing proof trees in a backwards fashion; indeed, by working this way a refutation in cut-free LK gets increasingly simpler as it progresses since subformulas are simpler than their parent formulas.

Moreover, and as far as propositional rules go, the new subformulas entered into the tree are completely dictated by the cut-free LK rules. Furthermore, and assuming the proof tree can be brought to completion, branches eventually end up with atoms and the presence of axioms can be quickly determined. Another reason for working backwards is that the truth-functional fragment of cut-free LK is confluent in the sense that the order in which the non-quantifier rules are applied is irrelevant: If there is a proof, regardless of what you do, you will run into it! To bring the quantifier rules into the picture, things can be arranged so that all rules have a fair chance of being deployed: Apply, as far as possible, all the non-quantifier rules before applying any of the quantifier rules. This takes care of the first and second obstacles, and it is not too difficult to see how the third one would now be handled. The fourth and fifth obstacles can be addressed by requiring that the terms to be used in the substitutions be suitably selected from the Herbrand universe (Herbrand 1930).
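For the propositional fragment this backward regime is compact enough to write out. The Python sketch below is a schematic decision procedure for propositional cut-free LK under our own representation choices (sequents as pairs of frozensets, formulas as nested tuples tagged 'not', 'and', 'or' and 'imp'; the ≡ rules are omitted for brevity), and it relies on the confluence just mentioned: it simply decomposes whatever compound formula it happens to find first.

def provable(ant, suc):
    """Backward proof search for propositional cut-free LK.
    ant, suc: collections of formulas; atoms are strings, compound formulas are
    tuples ('not', A), ('and', A, B), ('or', A, B) or ('imp', A, B)."""
    ant, suc = frozenset(ant), frozenset(suc)
    if ant & suc:                                        # axiom: shared formula
        return True
    for f in ant:                                        # antecedent rules
        if isinstance(f, tuple):
            rest = ant - {f}
            if f[0] == 'not':
                return provable(rest, suc | {f[1]})
            if f[0] == 'and':
                return provable(rest | {f[1], f[2]}, suc)
            if f[0] == 'or':
                return (provable(rest | {f[1]}, suc) and
                        provable(rest | {f[2]}, suc))
            if f[0] == 'imp':
                return (provable(rest, suc | {f[1]}) and
                        provable(rest | {f[2]}, suc))
    for f in suc:                                        # succedent rules
        if isinstance(f, tuple):
            rest = suc - {f}
            if f[0] == 'not':
                return provable(ant | {f[1]}, rest)
            if f[0] == 'and':
                return (provable(ant, rest | {f[1]}) and
                        provable(ant, rest | {f[2]}))
            if f[0] == 'or':
                return provable(ant, rest | {f[1], f[2]})
            if f[0] == 'imp':
                return provable(ant | {f[1]}, rest | {f[2]})
    return False                                         # only atoms left, no axiom

# The propositional core of the running example: P ∨ Q, P ⊃ R, Q ⊃ R → R
gamma = {('or', 'P', 'Q'), ('imp', 'P', 'R'), ('imp', 'Q', 'R')}
print(provable(gamma, {'R'}))    # True

Because every propositional rule of cut-free LK is invertible, decomposing any compound formula first is safe; the quantifier rules, as discussed above, are the part that genuine provers must handle with far more care.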

The use of sequent-type calculi in automated theorem proving was initiated by efforts to mechanize mathematics (Wang 1960). At the time, resolution captured most of the attention of the automated reasoning community but during the 1970's some researchers started to further investigate non-resolution methods (Bledsoe 1977), prompting a fruitful and sustained effort to develop more human-oriented theorem proving systems (Bledsoe 1975, Nevins 1974). Eventually, sequent-type deduction gained momentum again, particularly in its re-incarnation as analytic tableaux (Fitting 1990). The method of deduction used in tableaux is essentially cut-free LK's, with sets used in lieu of sequents.

2.3 Natural Deduction

Although LK and NK are both commonly labeled as “natural deduction” systems, it is the latter which better deserves the title due to its more natural, human-like, approach to proof construction. The rules of NK are typically presented as acting on standard logic formulas in an implicitly understood context, but they are also commonly given in the literature as acting more explicitly on “judgements”, that is, expressions of the form Γ ⊢ α where Γ is a set of formulas and α is a formula. This form is typically understood as making the metastatement that there is a proof of α from Γ (Kleene 1962). Following Gentzen 1935 and Prawitz 1965 here we take the former approach. The system NK has no logical axioms and provides two introduction-elimination rules for each logical connective:

Introduction Rules (ΘI):
    &I:  from α and β, infer α & β
    ∨I:  from αi (for i = 1,2), infer α1 ∨ α2
    ⊃I:  from [α — β], infer α ⊃ β
    ≡I:  from [α — β] and [β — α], infer α ≡ β
    ~I:  from [α — ⊥], infer ~α
    ∃I:  from α(t/x), infer ∃xα(x)
    ∀I:  from α(a/x), infer ∀xα(x)

Elimination Rules (ΘE):
    &E:  from α1 & α2, infer αi (for i = 1,2)
    ∨E:  from α ∨ β, [α — γ] and [β — γ], infer γ
    ⊃E:  from α and α ⊃ β, infer β
    ≡E:  from αi (i = 0,1) and α0 ≡ α1, infer α1−i
    ~E:  from [~α — ⊥], infer α
    ∃E:  from ∃xα(x) and [α(a/x) — β], infer β
    ∀E:  from ∀xα(x), infer α(t/x)
A few remarks: First, the expression [α — γ] represents the fact that α is an auxiliary assumption in the proof of γ that eventually gets discharged, i.e. discarded. For example, ∃E tells us that if in the process of constructing a proof one has already derived ∃xα(x) and also β with α(a/x) as an auxiliary assumption then the inference to β is allowed. Second, the eigenparameter, a, in ∃E and ∀I must be foreign to the premises, to the undischarged—“active”—assumptions, to the rule's conclusion and, in the case of ∃E, to ∃xα(x). Third, ⊥ is shorthand for two contradictory formulas, β and ~β. Finally, NK is complete: If Γ ⊨ α then there is a proof of α from Γ using the rules of NK.

As in LK, proofs constructed in NK are represented as trees with the proof's conclusion sitting at the root of the tree, and the problem's assumptions sitting at the leaves. (Proofs are also typically given as sequences of judgements, Γ ⊢ α, running from the top to the bottom of the printed page.) Here is a natural deduction proof tree of ∃xR(x) from ∀x(P(x) ∨ Q(x)), ∀x(P(x) ⊃ R(x)) and ∀x(Q(x) ⊃ R(x)), with the conclusion listed first and the premises of each rule application indented beneath it; [α — R(a)] marks a subproof of R(a) from the discharged assumption α:

∃xR(x)                      by ∃I
    R(a)                    by ∨E
        P(a) ∨ Q(a)         by ∀E from ∀x(P(x) ∨ Q(x))
        [P(a) — R(a)]       by ⊃E from P(a) ⊃ R(a), itself by ∀E from ∀x(P(x) ⊃ R(x))
        [Q(a) — R(a)]       by ⊃E from Q(a) ⊃ R(a), itself by ∀E from ∀x(Q(x) ⊃ R(x))

As in LK, a forward-chaining strategy for proof construction is not well focused. So, although proofs are read forwards, that is, from leaves to root or, logically speaking, from assumptions to conclusion, that is not the way in which they are typically constructed. A backward-chaining strategy implemented by applying the rules in reverse order is more effective. Many of the obstacles that were discussed above in the implementation of sequent deduction are applicable to natural deduction as well. These issues can be handled in a similar way, but natural deduction introduces some issues of its own. For example, as suggested by the ⊃-Introduction rule, to prove a goal of the form α ⊃ β one could attempt to prove β on the assumption that α. But note that although the goal α ⊃ β does not match the conclusion of any other introduction rule, it matches the conclusion of all elimination rules and the reasoning program would need to consider those routes too. Similarly to forward-chaining, here there is the risk of setting goals that are irrelevant to the proof and that could lead the program astray. To wit: What prevents a program from entering the never-ending process of building, say, larger and larger conjunctions? Or, what is there to prevent an uncontrolled chain of backward applications of, say, ⊃-Elimination? Fortunately, NK enjoys the subformula property in the sense that each formula entering into a natural deduction proof can be restricted to being a subformula of Γ ∪ Δ ∪ {α}, where Δ is the set of auxiliary assumptions made by the ~-Elimination rule. By exploiting the subformula property a natural deduction automated theorem prover can drastically reduce its search space and bring the backward application of the elimination rules under control (Portoraro 1998, Sieg and Byrnes 1996). Further gains can be realized if one is willing to restrict the scope of NK's logic to its intuitionistic fragment where every proof has a normal form in the sense that no formula is obtained by an introduction rule and then is eliminated by an elimination rule (Prawitz 1965).

Implementations of automated theorem proving systems using NK deduction have been motivated by the desire to have the program reason with precisely the same proof format and methods employed by the human user. This has been particularly true in the area of education where the student is engaged in the interactive construction of formal proofs in an NK-like calculus working under the guidance of a theorem prover ready to provide assistance when needed (Portoraro 1994, Suppes 1981). Other, research-oriented, theorem provers true to the spirit of NK exist (Pelletier 1998) but are rare.

2.4 The Matrix Connection Method
The name of the matrix connection method (Bibel 1981) is indicative of the way it operates. The term “matrix” refers to the form in which the set of logic formulas expressing the problem is represented; the term “connection” refers to the way the method operates on these formulas. To illustrate the method at work, we will use an example from propositional logic and show that R is entailed by P ∨ Q, P ⊃ R and Q ⊃ R. This is done by establishing that the formula

(P ∨ Q) & (P ⊃ R) & (Q ⊃ R) & ~R

is unsatisfiable. To do this, we begin by transforming it into conjunctive normal form:

(P ∨ Q) & (~P ∨ R) & (~Q ∨ R) & ~R

This formula is then represented as a matrix, one conjunct per row and, within a row, one disjunct per column:

P    Q
~P   R
~Q   R
~R

The idea now is to explore all the possible vertical paths running through this matrix. A vertical path is a set of literals selected from each row in the matrix such that each literal comes from a different row. The vertical paths:

Path 1: P, ~P, ~Q and ~R
Path 2: P, ~P, R and ~R
Path 3: P, R, ~Q and ~R
Path 4: P, R, R and ~R
Path 5: Q, ~P, ~Q and ~R
Path 6: Q, ~P, R and ~R
Path 7: Q, R, ~Q and ~R
Path 8: Q, R, R and ~R

A path is complementary if it contains two literals which are complementary. For example, Path 2 is complementary since it has both P and ~P, but so is Path 6 since it contains both R and ~R. Note that as soon as a path includes two complementary literals there is no point in pursuing the path further since it has itself become complementary. This typically allows for a large reduction in the number of paths to be inspected. In any event, all the paths in the above matrix are complementary and this fact establishes the unsatisfiability of the original formula. This is the essence of the matrix connection method. The method can be extended to predicate logic but this demands additional logical apparatus: Skolemization, variable renaming, quantifier duplication, complementarity of paths via unification, and simultaneous substitution across all matrix paths (Bibel 1981, Andrews 1981). Variations of the method have been implemented in reasoning programs in higher-order logic (Andrews 1981) and non-classical logics (Wallen 1990).
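For the propositional case the whole procedure amounts to checking every element of a Cartesian product, as the following small Python sketch makes explicit; the representation of literals as strings prefixed with '~' is our own choice for the illustration.

from itertools import product

def complement(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def is_complementary(path):
    return any(complement(lit) in path for lit in path)

def matrix_unsatisfiable(matrix):
    """True iff every vertical path through the matrix is complementary."""
    return all(is_complementary(set(path)) for path in product(*matrix))

# The matrix for (P ∨ Q) & (~P ∨ R) & (~Q ∨ R) & ~R, one row per clause
matrix = [['P', 'Q'], ['~P', 'R'], ['~Q', 'R'], ['~R']]
print(matrix_unsatisfiable(matrix))    # True: R follows from P ∨ Q, P ⊃ R, Q ⊃ R

The early-termination observation in the text corresponds to pruning the product as soon as a partial path becomes complementary; the sketch omits that refinement for brevity.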

2.5 Term Rewriting

Equality is an important logical relation whose behavior within automated deduction deserves its own separate treatment.
Equational logic and, more generally, term rewriting treat equality-like equations as rewrite rules, also known as reduction or demodulation rules. An equality statement like f(a) = a allows the simplification of a term like g(c,f(a)) to g(c,a). However, the same equation also has the potential to generate an unboundedly large term: g(c,f(a)), g(c,f(f(a))), g(c,f(f(f(a)))), … . What distinguishes term rewriting from equational logic is that in term rewriting equations are used as unidirectional reduction rules as opposed to equality which works in both directions. Rewrite rules have the form t1 ⇒ t2 and the basic idea is to look for terms t occurring in expressions e such that t unifies with t1 with unifier θ so that the occurrence t1θ in eθ can be replaced by t2θ. For example, the rewrite rule x + 0 ⇒ x allows the rewriting of succ(succ(0) + 0) as succ(succ(0)).

To illustrate the main ideas in term rewriting, let us explore an example involving symbolic differentiation (the example and ensuing discussion are adapted from Chapter 1 of Baader and Nipkow 1998). Let der denote the derivative with respect to x, let y be a variable different from x, and let u and v be variables ranging over expressions. We define the rewrite system:

R1: der(x) ⇒ 1
R2: der(y) ⇒ 0
R3: der(u + v) ⇒ der(u) + der(v)
R4: der(u × v) ⇒ (u × der(v)) + (der(u) × v)

Again, the symbol ⇒ indicates that a term matching the left-hand side of a rewrite rule should be replaced by the rule's right-hand side. To see the differentiation system at work, let us compute the derivative of x × x with respect to x, der(x × x):

der(x × x) ⇒ (x × der(x)) + (der(x) × x)   by R4
           ⇒ (x × 1) + (der(x) × x)        by R1
           ⇒ (x × 1) + (1 × x)             by R1

At this point, since none of the rules (R1)–(R4) applies, no further reduction is possible and the rewriting process ends. The final expression obtained is called a normal form, and its existence motivates the following question: Is there an expression whose reduction process will never terminate when applying the rules (R1)–(R4)? Or, more generally: Under what conditions will a set of rewrite rules always stop, for any given expression, at a normal form after finitely many applications of the rules? This fundamental question is called the termination problem of a rewrite system, and we state without proof that the system (R1)–(R4) meets the termination condition.
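A toy rewriting engine for (R1)–(R4) makes the notions of rewrite step and normal form concrete. In the Python sketch below, which is our own schematic rendering and not the system of Baader and Nipkow, terms are nested tuples such as ('der', ('*', 'x', 'x')), the four rules are coded directly, and rewriting proceeds at the leftmost-outermost position until no rule applies.

def step(t):
    """Apply one of (R1)-(R4) at the leftmost-outermost applicable position,
    or return None if no rule applies anywhere in t."""
    if t == ('der', 'x'):
        return '1'                                              # R1
    if t == ('der', 'y'):
        return '0'                                              # R2
    if (isinstance(t, tuple) and t[0] == 'der'
            and isinstance(t[1], tuple) and t[1][0] in ('+', '*')):
        op, u, v = t[1]
        if op == '+':
            return ('+', ('der', u), ('der', v))                # R3
        return ('+', ('*', u, ('der', v)), ('*', ('der', u), v))  # R4
    if isinstance(t, tuple):                                    # rewrite inside
        for i, arg in enumerate(t[1:], start=1):
            s = step(arg)
            if s is not None:
                return t[:i] + (s,) + t[i+1:]
    return None

def normal_form(t):
    """Keep rewriting until no rule applies (termination holds for (R1)-(R4))."""
    while True:
        s = step(t)
        if s is None:
            return t
        t = s

print(normal_form(('der', ('*', 'x', 'x'))))
# ('+', ('*', 'x', '1'), ('*', '1', 'x'))   i.e. (x × 1) + (1 × x)

Running the sketch on der(x × x) reproduces the normal form (x × 1) + (1 × x) computed above; since (R1)–(R4) terminates, the while loop always halts.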
There is the possibility that when reducing an expression, the set of rules of a rewrite system could be applied in more than one way. This is actually the case in the system (R1)–(R4) where in the reduction of der(x × x) we could have applied R1 first to the second sub-expression in (x × der(x)) + (der(x) × x), as shown below:

der(x × x) ⇒ (x × der(x)) + (der(x) × x)   by R4
           ⇒ (x × der(x)) + (1 × x)        by R1
           ⇒ (x × 1) + (1 × x)             by R1

Following this alternative course of action, the reduction terminates with the same normal form as in the previous case. This fact, however, should not be taken for granted: A rewriting system is said to be (globally) confluent if and only if independently of the order in which its rules are applied every expression always ends up being reduced to its one and only normal form. It can be shown that (R1)–(R4) is confluent and, hence, we are entitled to say: “Compute the derivative of an expression” (as opposed to simply “a” derivative). Adding more rules to a system in an effort to make it more practical can have undesired consequences.
For example, if we add the rule

R5: u + 0 ⇒ u

to (R1)–(R4) then we will be able to further reduce certain expressions but at the price of losing confluency. The following reductions show that der(x + 0) now has two normal forms: the computation

der(x + 0) ⇒ der(x) + der(0)   by R3
           ⇒ 1 + der(0)        by R1

gives one normal form, and

der(x + 0) ⇒ der(x)   by R5
           ⇒ 1        by R1

gives another. Adding the rule

R6: der(0) ⇒ 0

would allow the further reduction of 1 + der(0) to 1 + 0 and then, by R5, to 1. Although the presence of this new rule actually increases the number of alternative paths—der(x + 0) can now be reduced in four possible ways—they all end up with the same normal form, namely 1. This is no coincidence as it can be shown that (R6) actually restores confluency. This motivates another fundamental question: Under what conditions can a non-confluent system be made into an equivalent confluent one? The Knuth-Bendix completion algorithm (Knuth and Bendix 1970) gives a partial answer to this question.

Term rewriting, like any other automated deduction method, needs strategies to direct its application. Rippling (Bundy, Stevens and Harmelen 1993, Basin and Walsh 1996) is a heuristic that has its origins in inductive theorem-proving and that uses annotations to selectively restrict the rewriting process. The superposition calculus is a calculus of equational first-order logic that combines notions from first-order resolution and Knuth-Bendix ordering equality. Superposition is refutation complete (Bachmair and Ganzinger 1994) and is at the heart of a number of theorem provers, most notably the E equational theorem prover (Schulz 2004) and Vampire (Voronkov 1995).

2.6 Mathematical Induction

Mathematical induction is a very important technique of theorem proving in mathematics and computer science. Problems stated in terms of objects or structures that involve recursive definitions or some form of repetition invariably require mathematical induction for their solving. In particular, reasoning about the correctness of computer systems requires induction and an automated reasoning program that effectively implements induction will have important applications.

To illustrate the need for mathematical induction, assume that a property φ is true of the number zero and also that if it is true of a number then it is true of its successor. Then, with our deductive systems, we can deduce that for any given number n, φ is true of it, φ(n). But we cannot deduce that φ is true of all numbers, ∀xφ(x); this inference step requires the rule of mathematical induction:

  α(0)    [α(n) — α(succ(n))]
  ----------------------------  (mathematical induction)
  ∀xα(x)

In other words, to prove that ∀xα(x) one proves that α(0) is the case, and that α(succ(n)) follows from the assumption that α(n). The implementation of induction in a reasoning system presents very challenging search control problems.
The most important of these is the ability to determine the particular way in which induction will be applied during the proof, that is, finding the appropriate induction schema. Related issues include selecting the proper variable of induction, and recognizing all the possible cases for the base and the inductive steps.

Nqthm (Boyer and Moore 1979) has been one of the most successful implementations of automated inductive theorem proving. In the spirit of Gentzen, Boyer and Moore were interested in how people prove theorems by induction. Their theorem prover is written in the functional programming language Lisp, which is also the language in which theorems are represented. For instance, to express the commutativity of addition the user would enter the Lisp expression (EQUAL (PLUS X Y) (PLUS Y X)). Everything defined in the system is a functional term, including its basic “predicates”: T, F, EQUAL X Y, IF X Y Z, AND, NOT, etc. The program operates largely as a black box, that is, the inner working details are hidden from the user; proofs are conducted by rewriting terms that possess recursive definitions, ultimately reducing the conclusion's statement to the T predicate. The Boyer-Moore theorem prover has been used to check the proofs of some quite deep theorems (Boyer, Kaufmann, and Moore 1995). Lemma caching, problem statement generalization, and proof planning are techniques particularly useful in inductive theorem proving (Bundy, Harmelen and Hesketh 1991).
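The flavour of proof by rewriting with the induction hypothesis available as an extra rule can be suggested with a small Python sketch. It is only schematic: the recursive definition of plus, the fixed constant n standing for an arbitrary number, and the hand-chosen induction schema are all assumptions of the illustration, not features of Nqthm itself. The sketch shows the two proof obligations for the conjecture plus(X, 0) = X being discharged by rewriting.

def reduce_once(t, use_ih):
    """One rewrite step using the recursive definition of plus and, in the
    induction step only, the induction hypothesis plus(n, 0) => n."""
    if isinstance(t, tuple) and t[0] == 'plus':
        a, b = t[1], t[2]
        if a == '0':
            return b                               # plus(0, Y)    => Y
        if isinstance(a, tuple) and a[0] == 's':
            return ('s', ('plus', a[1], b))        # plus(s(X), Y) => s(plus(X, Y))
        if use_ih and a == 'n' and b == '0':
            return 'n'                             # induction hypothesis
    if isinstance(t, tuple):                       # otherwise rewrite inside
        for i, arg in enumerate(t[1:], start=1):
            r = reduce_once(arg, use_ih)
            if r is not None:
                return t[:i] + (r,) + t[i+1:]
    return None

def normalize(t, use_ih=False):
    while True:
        r = reduce_once(t, use_ih)
        if r is None:
            return t
        t = r

# Conjecture: plus(X, 0) = X.  Base case X = 0; step case X = s(n), with the
# hypothesis plus(n, 0) = n usable as a rewrite rule.
base_ok = normalize(('plus', '0', '0')) == '0'
step_ok = normalize(('plus', ('s', 'n'), '0'), use_ih=True) == ('s', 'n')
print(base_ok and step_ok)    # True: both proof obligations reduce to identities

What the sketch leaves out is precisely what the text identifies as hard: choosing the induction variable and schema automatically, generalizing the statement when needed, and caching the lemmas that make the rewriting go through.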
3. Other Logics

3.1 Higher-Order Logic

Higher-order logic differs from first-order logic in that quantification over functions and predicates is allowed. The statement “Any two people are related to each other in one way or another” can be legally expressed in higher-order logic as ∀x∀y∃R R(x,y) but not in first-order logic. Higher-order logic is inherently more expressive than first-order logic and is closer in spirit to actual mathematical reasoning. For example, the notion of set finiteness cannot be expressed as a first-order concept. Due to its richer expressiveness, it should not come as a surprise that implementing an automated theorem prover for higher-order logic is more challenging than for first-order logic. This is largely due to the fact that unification in higher-order logic is more complex than in the first-order case: unifiable terms do not always possess a most general unifier, and higher-order unification is itself undecidable. Finally, given that higher-order logic is incomplete, there are always proofs that will be entirely out of reach for any automated reasoning program.

Methods used to automate first-order deduction can be adapted to higher-order logic. TPS (Andrews et al. 1996) is a theorem proving system for higher-order logic that uses Church's typed λ-calculus as its logical representation language and is based on a connection-type deduction mechanism that incorporates Huet's unification algorithm (Huet 1975). As a sample of the capabilities of TPS, the program has proved automatically that a subset of a finite set is finite, the equivalence among several formulations of the Axiom of Choice, and Cantor's Theorem that a set has more subsets than members. The latter was proved by the program by asserting that there is no onto function from individuals to sets of individuals, with the proof proceeding by a diagonal argument. HOL (Gordon and Melham 1993) is another higher-order proof development system primarily used as an aid in the development of hardware and software safety-critical systems. HOL is based on the LCF approach to interactive theorem proving (Gordon, Milner and Wadsworth 1979), and it is built on the strongly typed functional programming language ML. HOL, like TPS, can operate in automatic and interactive mode. Availability of the latter mode is welcomed since the most useful automated reasoning systems may well be those which place an emphasis on interactive theorem proving (Farmer, Guttman and Thayer 1993) and can be used as assistants operating under human guidance.
verification of floating-point algorithms and the non-trivial mathematical properties that are proved by HOL under the guidance of the user. Isabelle (Paulson 1994) is a generic, higher-order framework for rapid prototyping of deductive systems. Object logics can be formulated within Isabelle's metalogic by using its many syntactic and deductive tools. Isabelle also provides some ready-made theorem proving environments, including Isabelle/HOL, Isabelle/ZF and Isabelle/FOL, which can be used as starting points for applications and further development by the user. Isabelle/ZF has been used to prove equivalent formulations of the Axiom of Choice, formulations of the Well-Ordering Principle, as well as the key result about cardinal arithmetic that, for any infinite cardinal κ, κ · κ = κ (Paulson and Grabczewski 1996).

3.2 Non-classical Logics

Non-classical logics (Haack 1978) such as modal logics, intuitionistic logic, multi-valued logics, autoepistemic logics, non-monotonic reasoning, commonsense and default reasoning, relevance logic, paraconsistent logic, and so on, have been increasingly gaining the attention of the automated reasoning community. One of the reasons has been the natural desire to extend automated deduction techniques to new domains of logic. Another reason has been the need to mechanize non-classical logics as an attempt to provide a suitable foundation for artificial intelligence. A third reason has been the desire to attack some problems that are combinatorially too large to be handled by paper and pencil. Indeed, some of the work in automated non-classical logic provides a prime example of automated reasoning programs at work. To illustrate, the Ackermann Constant Problem asks for the number of non-equivalent formulas in the relevance logic R. There are actually 3,088 such formulas (Slaney 1984) and the number was found by "sandwiching" it between a lower and an upper limit, a task that involved constraining a vast universe of 20^400 20-element models in search of those models that rejected non-theorems in R. It is safe to say that this result would have been impossible to obtain without the assistance of an automated reasoning program.

There have been three basic approaches to automate the solving of problems in non-classical logic (McRobie 1991). One approach has been, of course, to try to mechanize the non-classical deductive calculi. Another has been to simply provide an equivalent formulation of the problem in first-order logic and let a classical theorem prover handle it. A third approach has been to formulate the semantics of the non-classical logic in a first-order framework where resolution or connection-matrix methods would apply.

Modal logic

Modal logics find extensive use in computing science as logics of knowledge and belief, logics of programs, and in the specification of distributed and concurrent systems. Thus, a program that automates reasoning in a modal logic such as K, K4, T, S4, or S5 would have important applications. With the exception of S5, these logics share some of the important metatheoretical results of classical logic, such as cut-elimination, and hence cut-free (modal) sequent calculi can be provided for them, along with techniques for their automation. Connection methods (Andrews 1981, Bibel 1981) have played an important role in helping to understand the source of redundancies in the search space induced by the modal sequent calculi and have provided a unifying framework not only for modal logics but also for intuitionistic and classical logic as well (Wallen 1990).

Intuitionistic logic

There are different ways in which intuitionistic logic can be automated. One is to directly implement the intuitionistic versions of Gentzen's


sequent and natural deduction calculi, LJ and NJ respectively. This approach inherits the stronger normalization results enjoyed by these calculi allowing for a more compact mechanization than their classical counterparts. Another approach at mechanizing intuitionistic logic is to exploit its semantic similarities with the modal logic S4 and piggy back on an automated implementation of S4. Automating intuitionistic logic has applications in software development since writing a program that meets a specification corresponds to the problem of proving the specification within an intuitionistic logic (Martin-Löf 1982). A system that automated the proof construction process would have important applications in algorithm design but also in constructive mathematics. Nuprl (Constable et al. 1986) is a computer system supporting a particular mathematical theory, namely constructive type theory, and whose aim is to provide assistance in the proof development process. The focus is on logic-based tools to support programming and on implementing formal computational mathematics. Over the years the scope of the Nuprl project has expanded from "proofs-as-programs" to "systems-as-theories".

4. Applications

4.1 Logic Programming

Logic programming, particularly represented by the language Prolog (Colmerauer et al. 1973), is probably the most important and widespread application of automated theorem proving. During the early 1970s, it was discovered that logic could be used as a programming language (Kowalski 1974). What distinguishes logic programming from other traditional forms of programming is that logic programs, in order to solve a problem, do not explicitly state how a specific computation is to be performed; instead, a logic program states what the problem is and then delegates the task of actually solving it to an underlying theorem prover. In Prolog, the theorem prover is based on a refinement of resolution known as SLD-resolution. SLD-resolution is a variation of linear input resolution that incorporates a special rule for selecting the next literal to be resolved upon; SLD-resolution also takes into consideration the fact that, in the computer's memory, the literals in a clause are actually ordered, that is, they form a sequence as opposed to a set. A Prolog program consists of clauses stating known facts and rules. For example, the following clauses make some assertions about flight connections:

flight(toronto,london).
flight(london,rome).
flight(chicago,london).
flight(X,Y) :– flight(X,Z) , flight(Z,Y).

The clause flight(toronto,london) is a fact while flight(X,Y) :– flight(X,Z) , flight(Z,Y) is a rule, written by convention as a reversed conditional (the symbol ":–" means "if"; the comma means "and"; terms starting in uppercase are variables). The former states that there is a flight connection between Toronto and London; the latter states that there is a flight between cities X and Y if, for some city Z, there is a flight between X and Z and one between Z and Y. Clauses in Prolog programs are a special type of Horn clauses having precisely one positive literal: Facts are program clauses with no negative literals while rules have at least one negative literal. (Note that in standard clause notation the program rule in the previous example would be written as flight(X,Y) ∨ ~flight(X,Z) ∨ ~flight(Z,Y).) The specific form of the program rules serves to express statements of the form: "If these conditions over here are jointly met then this other fact will follow". Finally, a goal is a Horn clause with no positive literals. The idea is that, once a Prolog program Π has been written, we can then try to determine if a new clause γ, the goal, is entailed by Π, Π ⊨ γ; the Prolog prover does this by attempting to derive a contradiction from Π ∪ {~γ}. We should remark that program facts and rules alone cannot produce a contradiction; a goal must enter into the process. Like input


resolution, SLD-resolution is not refutation complete for first-order logic but it is complete for the Horn logic of Prolog programs. The fundamental theorem: If Π is a Prolog program and γ is the goal clause then Π ⊨ γ iff Π ∪ {~γ} ⊢ [ ] by SLD-resolution (Lloyd 1984).

For instance, to find out if there is a flight from Toronto to Rome one asks the Prolog prover to see if the clause flight(toronto,rome) follows from the given program. To do this, the prover adds ~flight(toronto,rome) to the program clauses and attempts to derive the empty clause, [ ], by SLD-resolution:

1  flight(toronto,london)                      Program clause
2  flight(london,rome)                         Program clause
3  flight(chicago,london)                      Program clause
4  flight(X,Y) ∨ ~flight(X,Z) ∨ ~flight(Z,Y)   Program clause
5  ~flight(toronto,rome)                       Negation of the conclusion
6  ~flight(toronto,Z) ∨ ~flight(Z,rome)        Res 5 4
7  ~flight(london,rome)                        Res 6 1
8  [ ]                                         Res 7 2

The conditional form of rules in Prolog programs adds to their readability and also allows reasoning about the underlying refutations in a more friendly way: To prove that there is a flight between Toronto and Rome, flight(toronto,rome), unify this clause with the consequent flight(X,Y) of the fourth clause in the program which itself becomes provable if both flight(toronto,Z) and flight(Z,rome) can be proved. This can be seen to be the case under the substitution {Z ← london} since both flight(toronto,london) and flight(london,rome) are themselves provable. Note that the theorem prover not only establishes that there is a flight between Toronto and Rome but it can also come up with an actual itinerary, Toronto-London-Rome, by extracting it from the unifications used in the proof.
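To make the mechanism just described concrete, here is a small illustrative Python sketch—not part of the entry or of any Prolog system—of a goal-directed, SLD-style interpreter run on the flight program above. The predicate and variable names simply mirror the example; like most Prolog implementations discussed below, the unification routine omits the occurs-check.

import itertools

counter = itertools.count()

def is_var(t):
    # Variables are strings starting with an uppercase letter, as in Prolog.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings recorded in the substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    # Return an extended substitution unifying a and b, or None on failure.
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(clause):
    # Rename the clause variables apart before each use.
    n = next(counter)
    def r(t):
        if is_var(t):
            return t + str(n)
        if isinstance(t, tuple):
            return tuple(r(x) for x in t)
        return t
    head, body = clause
    return r(head), [r(g) for g in body]

def solve(goals, program, s):
    # SLD-resolution: resolve the leftmost goal against the program clauses.
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for clause in program:
        head, body = rename(clause)
        s1 = unify(first, head, s)
        if s1 is not None:
            yield from solve(body + rest, program, s1)

program = [
    (("flight", "toronto", "london"), []),
    (("flight", "london", "rome"), []),
    (("flight", "chicago", "london"), []),
    (("flight", "X", "Y"), [("flight", "X", "Z"), ("flight", "Z", "Y")]),
]

goal = ("flight", "toronto", "rome")
print(next(solve([goal], program, {})))   # the goal succeeds

The first answer substitution returned contains the binding of the (renamed) variable Z to london, which is just the observation made above: the itinerary can be read off the unifications used in the refutation.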
There are at least two broad problems that Prolog must address in order to achieve the ideal of a logic programming language. Logic programs consist of facts and rules describing what is true; anything that is not provable from a program is deemed to be false. With regard to our previous example, flight(toronto,boston) is not true since this literal cannot be deduced from the program. The identification of falsity with non-provability is further exploited in most Prolog implementations by incorporating an operator, not, that allows programmers to explicitly express the negation of literals (or even subclauses) within a program. By definition, not l succeeds if the literal l itself fails to be deduced. This mechanism, known as negation-by-failure, has been the target of criticism. Negation-by-failure does not fully capture the standard notion of negation and there are significant logical differences between the two. Standard logic, including Horn logic, is monotonic, which means that enlarging an axiom set by adding new axioms simply enlarges the set of theorems derivable from it; negation-by-failure, however, is non-monotonic and the addition of new program clauses to an existing Prolog program may cause some goals to cease to be theorems. A second issue is the control problem. Currently, programmers need to provide a fair amount of control information if a program is to achieve acceptable levels of efficiency. For example, a programmer must be careful with the order in which the clauses are listed within a program, or how the literals are ordered within a clause. Failure to do a proper job can result in an inefficient or, worse, non-terminating program. Programmers must also embed hints within the program clauses to prevent the prover from revisiting certain paths in the search space (by using the cut operator) or to prune them altogether (by using fail). Last but not least, in order to improve their efficiency, many implementations of Prolog do not


implement unification fully and bypass a time-consuming yet critical test—the so-called occurs-check—responsible for checking the suitability of the unifiers being computed. This results in an unsound calculus and may cause a goal to be entailed by a Prolog program (from a computational point of view) when in fact it should not (from a logical point of view).

There are variations of Prolog intended to extend its scope. By implementing a model elimination procedure, the Prolog Technology Theorem Prover (PTTP) (Stickel 1992) extends Prolog into full first-order logic. The implementation achieves both soundness and completeness. Moving beyond first-order logic, λProlog (Miller and Nadathur 1988) bases the language on higher-order constructive logic.

4.2 SAT Solvers

The problem of determining the satisfiability of logic formulas has received much attention from the automated reasoning community due to its important applicability in industry. A propositional formula is satisfiable if there is an assignment of truth-values to its variables that makes the formula true. For example, the assignment (P ← true, Q ← true, R ← false) does not make (P ∨ R) & ~Q true but (P ← true, Q ← false, R ← false) does and, hence, the formula is satisfiable. Determining whether a formula is satisfiable or not is called the Boolean Satisfiability Problem—SAT for short—and for a formula with n variables SAT can be settled thus: Inspect each of the 2^n possible assignments to see if there is at least one assignment that satisfies the formula, i.e. makes it true. This method is clearly complete: If the original formula is satisfiable then we will eventually find one such satisfying assignment; but if the formula is contradictory (i.e. non-satisfiable), we will be able to determine this too. Just as clearly, and particularly in this latter case, this search takes an exponential amount of time, and the desire to conceive more efficient algorithms is well justified particularly because many computing problems of great practical importance such as graph-theoretic problems, network design, storage and retrieval, scheduling, program optimization, and many others (Garey and Johnson 1979) can be expressed as SAT instances, i.e. as the SAT question of some propositional formula representing the problem. Given that SAT is NP-complete (Cook 1971) it is very unlikely that a polynomial algorithm exists for it; however, this does not preclude the existence of sufficiently efficient algorithms for particular cases of SAT problems.

The Davis-Putnam-Logemann-Loveland (DPLL) algorithm was one of the first SAT search algorithms (Davis and Putnam 1960; Davis, Logemann and Loveland 1962) and is still considered one of the best complete SAT solvers; many of the complete SAT procedures in existence today can be considered optimizations and generalizations of DPLL. In essence, DPLL search procedures proceed by considering ways in which assignments can be chosen to make the original formula true. For example, consider the formula in CNF

P & ~Q & (~P ∨ Q ∨ R) & (P ∨ ~S)

Since P is a conjunct, but also a unit clause, P must be true if the entire formula is to be true. Moreover, the value of ~P does not contribute to the truth of ~P ∨ Q ∨ R and P ∨ ~S is true regardless of S. Thus, the whole formula reduces to

~Q & (Q ∨ R)

Similarly, ~Q must be true and the formula further reduces to

R

which forces R to be true. From this process we can recover the assignment (P ← true, Q ← false, R ← true, S ← false) proving that the


original formula is satisfiable. A formula may cause the algorithm to branch; the search through a branch reaches a dead end the moment a clause is deemed false—a conflicting clause—and all variations of the assignment that has been partially constructed up to this point can be discarded. To illustrate:

1  R & (P ∨ Q) & (~P ∨ Q) & (~P ∨ ~Q)   Given
2  (P ∨ Q) & (~P ∨ Q) & (~P ∨ ~Q)       By letting R ← true
3  Q & ~Q                               By letting P ← true
4  ?                                    Conflict: Q and ~Q cannot both be true
5  (P ∨ Q) & (~P ∨ Q) & (~P ∨ ~Q)       Backtrack to (2): R ← true still holds
6  ~P                                   By letting Q ← true
7  true                                 By letting ~P be true, i.e., P ← false

Hence, the formula is satisfiable by the existence of (P ← false, Q ← true, R ← true). DPLL algorithms are made more efficient by strategies such as term indexing (ordering of the formula variables in an advantageous way), chronological backtracking (undoing work to a previous branching point if the process leads to a conflicting clause), and conflict-driven learning (determining the information to keep and where to backtrack). The combination of these strategies results in significant pruning of the search space; for a more extensive discussion the interested reader is directed to (Zhang and Malik 2002).
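Stripped of the indexing, learning and backtracking refinements just mentioned, the core of the procedure fits in a few lines of Python. The following sketch is purely illustrative (it is not taken from any of the systems discussed) and is run on the example formula above; literals are written "P" and "~P".

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def simplify(clauses, lit):
    # Assume lit is true: drop satisfied clauses, remove the opposite literal.
    out = []
    for c in clauses:
        if lit in c:
            continue                       # clause already satisfied
        reduced = c - {neg(lit)}
        if not reduced:
            return None                    # empty clause: a conflict
        out.append(reduced)
    return out

def dpll(clauses, assignment):
    # Unit propagation: a unit clause forces the value of its literal.
    units = [next(iter(c)) for c in clauses if len(c) == 1]
    while units:
        lit = units[0]
        assignment = assignment + [lit]
        clauses = simplify(clauses, lit)
        if clauses is None:
            return None                    # conflict: caller must backtrack
        units = [next(iter(c)) for c in clauses if len(c) == 1]
    if not clauses:
        return assignment                  # every clause satisfied
    # Branch on some literal of the formula, trying both truth values.
    lit = next(iter(clauses[0]))
    for choice in (lit, neg(lit)):
        reduced = simplify(clauses, choice)
        if reduced is not None:
            result = dpll(reduced, assignment + [choice])
            if result is not None:
                return result
    return None

# The example from the text: P & ~Q & (~P v Q v R) & (P v ~S)
formula = [{"P"}, {"~Q"}, {"~P", "Q", "R"}, {"P", "~S"}]
print(dpll(formula, []))   # ['P', '~Q', 'R']; S is left unconstrained

Here unit propagation alone settles the example, mirroring the reduction steps shown above; branching and backtracking only come into play for formulas like the one in the trace that follows.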
A quick back-of-the-envelope calculation reveals the staggering computing times of (algorithms for) SAT-type problems represented by formulas with as few as, say, 60 variables. To wit: A problem represented as a Boolean formula with 10 variables that affords a linear solution taking one hundredth of a second to complete would take just four hundredths and six hundredths of a second to complete if the formula had instead 40 and 60 variables respectively. In dramatic contrast, if the solution to the problem were exponential (say 2^n) then the times to complete the job for 10, 40 and 60 variables would be respectively one thousandth of a second, 13 days, and 365 centuries. It is a true testament to the ingenuity of the automated reasoning community and the power of current SAT-based search algorithms that real-world problems with thousands of variables can be handled with reasonable efficiency. (Küchlin and Sinz 2000) discusses a SAT application in the realm of industrial automotive product data management where 18,000 (elementary) Boolean formulas and 17,000 variables are used to express constraints on orders placed by customers. As another example, (Massacci and Marraro 2000) discusses an application in logical cryptanalysis, that is, the verification of properties of cryptographic algorithms expressed as SAT problems. They demonstrate how finding a key with a cryptographic attack is analogous to finding a model—assignment—for a Boolean formula; the formula in their application encodes the commercial version of the U.S. Data Encryption Standard (DES) with the encoding requiring 60,000 clauses and 10,000 variables.

Although SAT is conceptually very simple, its inner nature is not well understood—there are no criteria that can be generally applied to answer as to why one SAT problem is harder than another. It should then come as no surprise that algorithms that tend to do well on some SAT instances do not perform so well on others, and efforts are being spent in designing hybrid algorithmic solutions that combine the strength of complementary approaches—see (Prasad, Biere and Gupta 2005) for an application of this hybrid approach in the verification of hardware design.

The DPLL search procedure has been extended to quantified logic. MACE is a popular program based on the DPLL algorithm that searches for finite models of first-order formulas with equality. As an example (McCune


2001), to show that not all groups are commutative one can direct MACE to look for a model of the group axioms that also falsifies the commutation law or, equivalently, to look for a model of:

(G1) e·x = x            (left identity)
(G2) i(x)·x = e          (left inverse)
(G3) (x·y)·z = x·(y·z)   (associativity)
(DC) a·b ≠ b·a           (denial of commutativity)

MACE finds a six-element model of these axioms, where · is defined as:

·   0  1  2  3  4  5
0   0  1  2  3  4  5
1   1  0  4  5  2  3
2   2  3  0  1  5  4
3   3  2  5  4  0  1
4   4  5  1  0  3  2
5   5  4  3  2  1  0

and where i is defined as:

x     0  1  2  3  4  5
i(x)  0  1  2  4  3  5
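The structure is small enough that, once found, it can be checked exhaustively by a few lines of code. The following Python snippet is only an after-the-fact sanity check (it is not MACE, which has to search for the model rather than merely verify it): it confirms that the tables above satisfy (G1)–(G3) while falsifying commutativity.

table = [
    [0, 1, 2, 3, 4, 5],
    [1, 0, 4, 5, 2, 3],
    [2, 3, 0, 1, 5, 4],
    [3, 2, 5, 4, 0, 1],
    [4, 5, 1, 0, 3, 2],
    [5, 4, 3, 2, 1, 0],
]
inv = [0, 1, 2, 4, 3, 5]
e = 0
elems = range(6)

def op(x, y):
    return table[x][y]

assert all(op(e, x) == x for x in elems)                        # (G1) left identity
assert all(op(inv[x], x) == e for x in elems)                   # (G2) left inverse
assert all(op(op(x, y), z) == op(x, op(y, z))                   # (G3) associativity
           for x in elems for y in elems for z in elems)
assert any(op(x, y) != op(y, x) for x in elems for y in elems)  # (DC) non-commutativity
print("the six-element structure satisfies G1-G3 and is not commutative")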
This example also illustrates the benefits of using an automated deduction system: How long would it have taken the human researcher to come up with the above or a similar model? For more challenging problems, the program is being used as a practical complement to the resolution-based theorem prover Prover9 (formerly Otter), with Prover9 searching for proofs and MACE jointly looking for (counter) models. To find such models, MACE converts the first-order problem into a set of "flattened" clauses which, for increasing model sizes, are instantiated into propositional clauses and solved as a SAT problem. The method has been implemented in other automated reasoning systems as well, most notably in the Paradox model finder where the MACE-style approach has been enhanced by four additional techniques resulting in some significant efficiency improvements (Claessen and Sörensson 2003): term definitions (to reduce the number of variables in flattened clauses), static symmetric reduction (to reduce the number of isomorphic models), sort inference (to apply symmetric reduction at a finer level) and incremental SAT (to reuse search information between consecutive model sizes).

An approach of great interest at solving SAT problems in first-order logic is Satisfiability Modulo Theory (SMT) where the interpretation of symbols in the problem's formulation is constrained by a background theory. For example, in linear arithmetic the function symbols are restricted to + and -. As another example, in the extensional theory of arrays (McCarthy 1962) the array function read(a, i) returns the value of the array a at index i, and write(a, i, x) returns the array identical to a but where the value of a at i is x. More formally,

∀a : Array . ∀i,j : Index . ∀x : Value . i = j → read(write(a, i, x), j) = x                 (read-write axiom 1)
∀a : Array . ∀i,j : Index . ∀x : Value . i ≠ j → read(write(a, i, x), j) = read(a, j)        (read-write axiom 2)
∀a,b : Array . ∀i : Index . a = b → read(a, i) = read(b, i)                                  (extensionality)

In the context of these axioms, an SMT solver would attempt to establish the satisfiability (or, dually, the validity) of a given first-order formula, or thousands of formulas for that matter, such as


i − j = 1 & f(read(write(a, i, 2), j + 1)) = read(write(a, i, f(i − j + 1)), i)

(Ganzinger et al. 2004) discusses an approach to SMT called DPLL(T) consisting of a general DPLL(X) engine that works in conjunction with a solver SolverT for the background theory T. (Bofill et al. 2008) presents the approach in the setting of the theory of arrays, where the DPLL engine is responsible for enumerating propositional models for the given formula whereas SolverT checks whether these models are consistent with the theory of arrays. Their approach is sound and complete, and can be smoothly extended to multidimensional arrays.

SMT is particularly successful in verification applications, most notably software verification. Having improved the efficiency of SAT solvers with SMT, the effort is now on designing more efficient SMT solvers (de Moura 2007).
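As a contemporary illustration of this kind of reasoning—using the Python bindings of the Z3 SMT solver, a tool not itself discussed in this entry—the following sketch checks the displayed array formula: under the arithmetic constraint i − j = 1, the equation between the two read/write terms is a consequence of the array axioms, so asserting its negation yields an unsatisfiable problem.

# pip install z3-solver
from z3 import Array, Function, Ints, IntSort, Select, Store, Solver, Not, And, unsat

i, j = Ints("i j")
a = Array("a", IntSort(), IntSort())          # arrays indexed by, and holding, integers
f = Function("f", IntSort(), IntSort())       # an uninterpreted function symbol

lhs = f(Select(Store(a, i, 2), j + 1))        # f(read(write(a, i, 2), j + 1))
rhs = Select(Store(a, i, f(i - j + 1)), i)    # read(write(a, i, f(i - j + 1)), i)

s = Solver()
s.add(And(i - j == 1, Not(lhs == rhs)))       # negate the claimed consequence
print(s.check() == unsat)                     # True: the equation follows from i - j = 1

Behind the scenes the solver is doing exactly what the DPLL(T) description above says: the propositional skeleton is handled by a DPLL-style engine while the theory solvers for arithmetic, arrays and uninterpreted functions check the candidate models.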
4.3 Deductive Computer Algebra

To prove automatically even the simplest mathematical facts requires a significant amount of domain knowledge. As a rule, automated theorem provers lack such rich knowledge and attempt to construct proofs from first principles by the application of elementary deduction rules. This approach results in very lengthy proofs (assuming a proof is found) with each step being justified at a most basic logical level. Larger inference steps and a significant improvement in mathematical reasoning capability can be obtained, however, by having a theorem prover interact with a computer algebra system, also known as a symbolic computation system. A computer algebra system is a computer program that assists the user with the symbolic manipulation and numeric evaluation of mathematical expressions. For example, when asked to compute the improper integral

∫_0^∞ e^(−a^2 t^2) cos(2bt) dt

a competent computer algebra system would quickly reply with the answer

(√π / (2a)) e^(−b^2/a^2)
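(The closed form can be spot-checked numerically for illustrative values of a and b, say with SciPy—an assumption of available tooling, not something the systems under discussion rely on:

from scipy.integrate import quad
import numpy as np

a, b = 1.0, 1.0
numeric, _ = quad(lambda t: np.exp(-(a * t) ** 2) * np.cos(2 * b * t), 0, np.inf)
closed = np.sqrt(np.pi) / (2 * a) * np.exp(-(b / a) ** 2)
print(numeric, closed)        # both approximately 0.3260

Numerical agreement is of course no substitute for the symbolic derivation the computer algebra system performs.)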
Essentially, the computer algebra system operates by taking the input expression entered by the user and successively applies to it a series of transformation rules until the result no longer changes (see the section Term Rewriting in this article for more details). These transformation rules encode a significant amount of domain (mathematical) knowledge making symbolic systems powerful tools in the hands of applied mathematicians, scientists, and engineers trying to attack problems in a wide variety of fields ranging from calculus and the solving of equations to combinatorics and number theory.

Problem solving in mathematics involves the interplay of deduction and calculation, with decision procedures being a reminder of the fuzzy division between the two; hence, the integration of deductive and symbolic systems, which we coin here as Deductive Computer Algebra (DCA), is bound to be a fruitful combination. Analytica (Bauer, Clarke and Zhao 1998) is a theorem prover built on top of Mathematica, a powerful and popular computer algebra system. Besides supplying the deductive engine, Analytica also extends Mathematica's capabilities by defining a number of rewrite rules—more precisely, identities about summations and inequalities—that are missing in the system, as well as providing an implementation of Gosper's algorithm for finding closed forms of indefinite hypergeometric summations. Equipped with this extended knowledge, Analytica can prove semi-automatically some


nontrivial theorems from real analysis, including a series of lemmas directly leading to a proof of the Bernstein Approximation Theorem. Here is the statement of the theorem simply to give the reader a sense of the level of the mathematical richness we are dealing with:

Bernstein Approximation Theorem.
Let I = [0, 1] be the closed unit interval, f a real continuous function on I, and Bn(x, f) the nth Bernstein polynomial for f defined as

Bn(x, f) = ∑_{k=0}^{n} (n choose k) f(k/n) x^k (1 − x)^(n−k)

Then, on the interval I, the sequence of Bernstein polynomials for f converges uniformly to f.
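(A purely numerical illustration of the object just defined—no part of Analytica's proof—can be had by watching the maximum error of Bn(x, f) shrink as n grows for a sample continuous function:

import numpy as np
from math import comb

def bernstein(n, x, f):
    return sum(comb(n, k) * f(k / n) * x**k * (1 - x)**(n - k) for k in range(n + 1))

f = lambda x: abs(x - 0.5)                    # continuous but not smooth at 1/2
xs = np.linspace(0.0, 1.0, 201)
for n in (10, 100, 500):
    print(n, max(abs(bernstein(n, x, f) - f(x)) for x in xs))

The printed maximum errors decrease with n, as uniform convergence demands.)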
To be frank, the program is supplied with key information to establish the lemmas that lead to this theorem but the amount and type of deductive work done by the program is certainly nontrivial. (Clarke and Zhao 1994) provides examples of fully automated proofs using problems in Chapter 2 of Ramanujan's Notebooks (Berndt 1985) including the following example that the reader is invited to try. Show that:

∑_{k=n+1}^{A_r} 1/k = r + 2 ( ∑_{k=1}^{r} (r − k) ( ∑_{j=A_{k−1}+1}^{A_k} 1/((3j)^3 − 3j) ) ) + 2r·φ(3, A_0)

where A_0 = 1, A_{n+1} = 3A_n + 1 and φ(x, n) is Ramanujan's abbreviation for

φ(x, n) =df ∑_{k=1}^{n} 1/((kx)^3 − kx)

Analytica's proof of this identity proceeds by simplifying both the left- and right-hand sides of the equality and showing that both sides reduce to the same expression, −H_n + H_{A_r}. The simplification uses the added summation identities mentioned before as well as some elementary properties of the harmonic numbers,

H_n = ∑_{k=1}^{n} 1/k

The resulting proof has 28 steps (some of which are nontrivial) taking about 2 minutes to find.

In (Kerber, Kohlhase and Sorge 1998) the authors use the Ωmega planning system as the overall way to integrate theorem proving and symbolic computation. (Harrison and Théry 1998) is an example of the integration of a higher-order logic theorem proving system (HOL) with a computer algebra system (Maple).

Their great power notwithstanding, symbolic algebra systems do not enforce the same level of rigor and formality that is the essence of automated deduction systems. In fact, the mathematical semantics of some of the knowledge rules in most algebra systems is not entirely clear and is, in some cases, logically unsound (Harrison and Théry 1998). The main reason for this is an over-aggressiveness to provide the user with an answer in a timely fashion at whatever cost, bypassing the checking of required assumptions even if it means sacrificing the soundness of the calculation. (This is strongly reminiscent of most Prolog implementations that bypass the so-called "occurs-check", also abandoning logical soundness in the name of efficiency.) This serious problem opens the opportunity for a deduction system to provide a service to the computer algebra system: Use its deductive capabilities to verify that the computer algebra's computational steps meet the required assumptions. There is a catch in this, however: For sufficiently large calculation steps, verifying is tantamount to proving and, to check these steps, the deduction system may well need the assistance of the very same system that is in need of


verification! The solution to the soundness problem may then well require an extensive modification of the chosen symbolic algebra system to make it sound; an alternative approach is to develop a new system, entirely from scratch, in conjunction with the development of the automated theorem prover. In either case, the resulting combined deductive computer algebra system should display a much improved ability for automated mathematical reasoning.

4.4 Formal Verification of Hardware

Automated reasoning has reached the level of maturity where theorem proving systems and techniques are being used for industrial-strength applications. One such application area is the formal verification of hardware and software systems. The cost of defects in hardware can easily run into the millions. In 1994, the Pentium processor was shipped with a defect in its floating-point unit and the subsequent offer by Intel to replace the flawed chip (which was taken up only by a small fraction of all Pentium owners) cost the company close to $500 million. To guard against situations like this, the practice of testing chip designs is now considered insufficient and more formal methods of verification have not only gained large attention in the microprocessor industry but have become a necessity. The idea behind formal verification is to rigorously prove with mathematical certainty that the system functions as specified. Common applications to hardware design include formally establishing that the system functions correctly on all inputs, or that two different circuits are functionally equivalent.

Depending on the task at hand, one can draw from a number of automated formal verification techniques, including SAT solvers in propositional logic, symbolic simulation using binary decision diagrams (BDDs), model checking in temporal logic, or conducting proofs in higher-order logic. In the latter case, using an automated theorem prover like HOL—see Section 10—has proven invaluable in practice. Proof construction in a system like HOL proceeds semi-automatically with the user providing a fair amount of guidance as to how the proof should proceed: The user tries to find a proof while being assisted by the theorem prover which, on request, can either automatically fill in a proof segment or verify proof steps given to it. Although some of the techniques mentioned above provide decision procedures which higher-order logic lacks, higher-order logic has the advantage of being very expressive. The tradeoff is justified since proving facts about floating-point arithmetic requires the formalization of a large body of real analysis, including many elementary statements such as:

|- (!x. a <= x /\ x <= b ==> (f diffl (f' x)) x) /\
   f(a) <= K /\
   f(b) <= K /\
   (!x. a <= x /\ x <= b /\ (f'(x) = 0) ==> f(x) <= K) ==>
   (!x. a <= x /\ x <= b ==> f(x) <= K)

This statement from (Harrison 2000) written in HOL says that if a function f is differentiable with derivative f′ in an interval [a, b] then a sufficient condition for f(x) ≤ K throughout the interval is that f(x) ≤ K at the endpoints a, b and at all points of zero derivative. The result is used to determine error bounds when approximating transcendental functions by truncated power series. Conducting proofs in such a "painstakingly foundational system" (Harrison 2006) has some significant benefits. First, one achieves a high degree of assurance that the proofs are valid since, though admittedly lengthy, they are composed of small error-free deductive steps. Second, the formalization of these elementary statements and intermediate results can be reused in other tasks or projects. For example, a library of formal statements and proven results in floating-point division can be reused when proving other results of floating-point algorithms for square roots or transcendental functions. To further illustrate, different versions


of the square root algorithm for the Intel Itanium share many similarities and the proof of correctness for one version of the algorithm can be carried over to another version after minor tweaking of the proof. A third benefit of using a prover like HOL is, of course, that such lengthy proofs are carried out mechanically and are deductively certain; the likelihood of introducing a human error if they were carried out manually would be just as certain.

4.5 Formal Verification of Software

Society is becoming increasingly dependent on software systems for critical services such as safety and security. Serious adverse effects of malfunctioning software include loss of human life, threats to security, unauthorized access to sensitive information, large financial losses, denial of critical services, and risk to safety. One way to increase the quality of critical software is to supplement traditional methods of testing and validation with techniques of formal verification. The basic approach to formal verification is to generate a number of conditions that the software must meet and to verify—establish—them by mathematical proof. As with hardware, automated formal verification (simply formal verification, hereafter) is concerned with discharging these proof obligations using an automated theorem prover.

The formal verification of security protocols is an almost ideal application of automated theorem proving in industry. Security protocols are small distributed programs aimed at ensuring that transactions take place securely over public networks. The specification of a security protocol is relatively small and well defined but its verification is certainly non-trivial. We have already mentioned in a previous section the use of SAT-based theorem provers in the verification of the U.S. Data Encryption Standard (DES). As another example, the Mondex "electronic purse" is a smart card electronic cash system that was originally developed by National Westminster Bank and subsequently sold to MasterCard International. (Schmitt and Tonin 2007) describes a Java Card implementation of the Mondex protocol for which the security properties were reformulated in the Java Modeling Language (JML) following closely the original Z specification. Proof of correctness was conducted using the KeY tool (Beckert, Hähnle and Schmitt 2007), an interactive theorem proving environment for first-order dynamic logic that allows the user to prove properties of imperative and object-oriented sequential programs. This application of automated reasoning demonstrates, in the words of the authors, that "it is possible to bridge the gap between specification and implementation ensuring a fully verified result".

(Denney, Fischer and Schumann 2004) describes a system to automate the certification of safety properties of data-analysis aerospace software at NASA. Using Hoare-style program verification techniques, their system generates proof obligations that are then handled by an automated theorem prover. The process is not fully automated, however, since many of the obligations must be simplified first in order to improve the ability of the theorem prover to solve the proof tasks. For example, one such class of obligations makes a statement about a matrix, r, that needs to remain symmetric after updates along its diagonal have been made, and has the form:

Original form:

symm(r) → symm(diag-updates(r))

Simplified form (when r is 2x2):

(∀i)(∀j)(0 ≤ i, j ≤ 1 → sel(r, i, j) = sel(r, j, i)) →
  (∀k)(∀l)(0 ≤ k, l ≤ 1 →
    sel(upd(upd(r, 1, 1, r11), 0, 0, r00), k, l) = sel(upd(upd(r, 1, 1, r11), 0, 0, r00), l, k)))
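For a sense of what the obligation amounts to in its smallest instance, here is an illustrative sketch—not part of the NASA system—that discharges the 2×2 case with the Z3 SMT solver by flattening the matrix into scalar variables, thereby sidestepping the quantified sel/upd array reasoning that the actual obligations are written in.

# pip install z3-solver
from z3 import Reals, Solver, Not, Implies, And, unsat

r00, r01, r10, r11 = Reals("r00 r01 r10 r11")    # entries of the matrix r
d00, d11 = Reals("d00 d11")                      # new values written on the diagonal

r       = [[r00, r01], [r10, r11]]
updated = [[d00, r01], [r10, d11]]               # upd(upd(r, 1, 1, d11), 0, 0, d00)

symm = lambda m: And([m[i][j] == m[j][i] for i in range(2) for j in range(2)])

s = Solver()
s.add(Not(Implies(symm(r), symm(updated))))
print(s.check() == unsat)                        # True: symmetry is preserved by the updates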


Even after the simplification, current theorem provers find the proof task challenging. The task becomes intractable for larger matrices and numbers of updates (e.g. a 6x6 matrix with 36 updates) and further preprocessing and simplification on the obligation is required before the task eventually falls within the reach of state-of-the-art theorem provers. But it is worth remarking that proofs are found without using any specific features or configuration parameters of the theorem provers which would improve their chances at completing the proofs. This is important since the everyday application of theorem provers in industry cannot presuppose such deep knowledge of the prover from their users. The formal verification of software remains a demanding task but it is difficult to see how the certification of properties could happen without the assistance of automated deduction when one faces the humanly impossible task of establishing thousands of such obligations.

In the field of nuclear engineering, techniques of automated reasoning are deemed mature enough to assist in the formal verification of the safety-critical software responsible for controlling a nuclear power plant's reactor prevention systems (RPS). The RPS component of the digital control system of the APR-1400 nuclear reactor is specified using NuSCR, a formal specification language customized for nuclear applications (Yoo, Jee and Cha 2009). Model checking in computation tree logic is used to check the specifications for completeness and consistency. After this, nuclear engineers generate function block designs via a process of automatic synthesis and formally verify the designs also using techniques of model checking in linear temporal logic; the techniques are also used to verify the equivalence of the multiple revisions and releases of the design. These model-checking tools were implemented to make their use as easy and intuitive as possible, in a way that did not require a deep knowledge of the techniques, and used notations familiar to nuclear engineers. The use of automated reasoning tools not only helps the design engineers to establish the desired results but it also raises the confidence of the government's regulatory personnel that need to approve the RPS software before the reactor can be certified for operation.

4.6 Logic and Philosophy

In the spirit of (Wos, Overbeek, Lusk and Boyle 1992) we pose the question: What do the following statements about different systems of formal logic and exact philosophy have in common?

The implicational fragments of the modal logics S4 and S5 have been studied extensively over the years. Posed as an open question, it was eventually shown that there is a single axiom for implicational S4 as well as several new shortest axioms for implicational S5 (Ernst, Fitelson, Harris and Wos 2002).

The L combinator is defined as (Lx)y = x(yy). Although it was known that the L-based combinator E12 = ((L(LL))(L(LL)))((L(LL))(L(LL))) satisfies E12E12 = E12, the question remained whether a shorter L-based combinator satisfying this property existed. (Glickfeld and Overbeek 1986) showed this to be the case with E8 = ((LL)(L(LL)))(L(LL)).

Thirteen shortest single axioms of length eleven for classical equivalence had been discovered, and XCB = e(x, e(e(e(x, y), e(z, y)), z)) was the only remaining formula of that length whose status was undetermined—was it an axiom? For a quarter of a century this question remained open despite intense study by various researchers. It was finally settled that XCB is indeed such a single axiom, thus ending the search for shortest single axioms for the equivalential calculus (Wos, Ulrich and Fitelson 2002).

Saint Anselm of Canterbury offered in his Proslogium a famous argument for the existence of God. But, quite recently, a simpler proof has been discovered in the sense that it is shorter and uses fewer assumptions (Oppenheimer and Zalta 2011).


In the axioms defining a Robbins algebra, Huntington's equation −(−x + y) + −(−x + −y) = x can be replaced by a simpler one, namely the Robbins equation −(−(x + y) + −(x + −y)) = x. This conjecture went unproved for more than 50 years, resisting the attacks of many logicians including Tarski, until it was eventually proved in (McCune 1997).

We ask again, what do these results have in common? The answer is that each has been proved with the help of an automated reasoning program. Having disclosed the answer to this question prompts a new one: How much longer would it have taken to settle these open problems without the application of such an automated reasoning tool?

The strict implicational fragments of the logical systems S4 and S5 of modal logic are known as C4 and C5, respectively, and their Hilbert-style axiomatizations presuppose condensed detachment as their sole rule of inference. With insight from Kripke, (Anderson and Belnap 1962) published the first axiomatization of C4 using the following 3-axiom basis, where the Polish notation 'Cpq' stands for 'p → q'.

(1) Cpp  CCpqCrCpq  CCpCqrCCpqCpr

A question was posed sometime after: Is there a shorter axiomatization for C4, using a 2-axiom basis or even a single axiom? Using the automated reasoning program Otter, (Ernst, Fitelson, Harris and Wos 2001) settled both questions in the affirmative. In fact, several 2-axiom bases were discovered of which the following turned out to be shortest:

(2) CpCqq  CCpCqrCCpqCsCpr

Further rounds of automated reasoning work were rewarded with the discovery of a single axiom for C4; the axiom is 21 symbols long and it was also proved that it is the shortest such axiom:

(3) CCpCCqCrrCpsCCstCuCpt

To show that each of (2) and (3) is necessary and sufficient for (1), a circle of proofs was produced using the automated reasoning tool: (1) ⇒ (3) ⇒ (2) ⇒ (1). As for C5, its axiomatization was originally published in (Lemmon, A. Meredith, D. Meredith, Prior and Thomas 1957) giving several 4-, 3-, 2- and 1-axiom bases for C5, including the following 3-axiom basis:

(4) CqCpp  CCpqCCqrCpr  CCCCpqrCpqCpq

The publication also included the shortest known 2-axiom bases for C5 (actually two of them, containing 20 symbols each) but the shortest single axiom for C5 was later discovered by (Meredith and Prior 1964) and has 21 symbols:

(5) CCCCCppqrCstCCtqCsCsq

Applying automated reasoning strategies again, (Ernst, Fitelson, Harris and Wos 2001) discovered several new bases, including the following 2-axiom basis of length 18 and six 1-axiom bases matching Meredith's length of 21 (only one of these is given below):

(6) Cpp  CCpqCCCCqrsrCpr
(7) CCCCpqrCCuuqCCqtCsCpt

To show that each of (6) and (7) is necessary and sufficient for (4), a circle of proofs was also produced with the theorem prover: (6) ⇒ (4) ⇒ (7) ⇒ (6).

A charming foray into combinatory logic is presented in (Smullyan 1985,


Glickfeld and Overbeek 1986) where we learn about a certain enchanted forest inhabited by talking birds. Given any birds A and B, if the name of bird B is spoken to bird A then A will respond with the name of some bird in the forest, AB, and this response to B from A will always be the same. Here are some definitions about enchanted birds:

B1 A mockingbird M mimics any bird in the sense that M's response to a bird x is the same as x's response to itself, Mx = xx.
B2 A bird C composes birds A and B if A(Bx) = Cx, for any bird x. In other words, C's response to x is the same as A's response to B's response to x.
B3 A bird A is fond of a bird B if A's response to B is B; that is, AB = B.

And here are two facts about this enchanted forest:

F1 For any birds A and B in the forest there is a bird C that composes them.
F2 There is a mockingbird in the forest.

There have been rumors that every bird in the forest is fond of at least one bird, and also that there is at least one bird that is not fond of any bird. The challenge to the reader now is, of course, to settle these rumors using only F1 and F2, and the given definitions (B1)–(B3). (Glickfeld and Overbeek 1986) do this in mere seconds with an automated reasoning system using paramodulation, demodulation and subsumption. For a more challenging problem, consider the additional definitions:

B4 A bird is egocentric if it is fond of itself: EE = E.
B5 A bird L is a lark if for any birds x and y the following holds: (Lx)y = x(yy).

Smullyan challenges us to prove a most surprising thing about larks: Suppose we are not given any other information except that the forest contains a lark. Then, show that at least one bird in the forest must be egocentric! Below we give the salient steps in the proof found by the automated reasoning system, where 'S(x, y)' stands for 'xy' and where clauses (2) and (3) are, respectively, the definition of a lark and the denial of the theorem; numbers on the right are applications of paramodulation:

1   (x1 = x1)
2   (S(S(L, x1), x2) = S(x1, S(x2, x2)))
3   -(S(x1, x1) = x1)
6   (S(x1, S(S(L, S(x2, x2)), x2)) = S(S(L, x1), S(x2, x2)))             2 2
8   (S(x1, S(S(x2, x2), S(x2, x2))) = S(S(L, S(L, x1)), x2))             2 2
9   (S(S(S(L, L), x1), x2) = S(S(x1, x1), S(x2, x2)))                    2 2
18  -(S(S(L, S(S(L, S(L, L)), x1)), x1) = S(S(L, S(x1,x1)), x1))         6 3 6 9 8 8
19  []                                                                   18 1

Closer inspection of the left and right hand sides of (18) under the application of unification revealed the discovery of a 10-L bird, i.e. a 10-symbol bird expressed solely in terms of larks, which was a strong candidate for egocentricity. This discovery was exciting because the shortest egocentric L-bird known to Smullyan was of length 12. A subsequent run of the automated reasoning system produced a proof of this fact as well as another new significant bird: A possible egocentric 8-L bird! A few more runs of the system eventually produced a 22-line proof (with terms with as many as 50 symbols, excluding commas and parentheses) of the fact that ((LL)(L(LL)))(L(LL)) is indeed egocentric. The natural questions to ask next are, of course, whether there are other 8-L egocentric birds and whether there are shorter ones. The reader may


want to attempt this with paper and pencil but, given that there are 429 such birds, it may be wiser to try it instead (or in conjunction) with an automated reasoning program; both approaches are explored in (Glickfeld and Overbeek 1986). For a more formal, but admittedly less colorful, introduction to combinatory logic and lambda-conversion the reader is referred to (Hindley and Seldin 1986).

Formulas in the classical equivalential calculus are written using sentential variables and a two-place function symbol, e, for equivalence. The calculus has two rules of inference, detachment (modus ponens) and substitution; the rules can be combined into the single rule of condensed detachment: Obtain tθ from e(s,t) and r where sθ = rθ with mgu θ. The calculus can be axiomatized with the formulas:

(E1) e(x,x)                        (reflexivity)
(E2) e(e(x,y),e(y,x))              (symmetry)
(E3) e(e(x,y),e(e(y,z),e(x,z)))    (transitivity)
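To see the rule in action, here is a small Python sketch—an illustration only, not taken from the article or from Otter—that implements condensed detachment over e-terms and applies it twice to (E2) and (E3); the second application already delivers an instance of reflexivity, which hints at why (E1) is redundant.

import itertools

fresh = itertools.count()

def mk_e(a, b):
    return ("e", a, b)          # every compound term is an e-term

def is_var(t):
    return isinstance(t, str)

def walk(t, s):
    # Follow variable bindings recorded in the substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    # Most general unifier of a and b, extending s (no occurs-check).
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    s = unify(a[1], b[1], s)
    return None if s is None else unify(a[2], b[2], s)

def substitute(t, s):
    t = walk(t, s)
    return t if is_var(t) else mk_e(substitute(t[1], s), substitute(t[2], s))

def rename(t, n):
    # Give every variable of a formula the suffix n, renaming premises apart.
    return t + "_" + str(n) if is_var(t) else mk_e(rename(t[1], n), rename(t[2], n))

def condensed_detachment(major, minor):
    # From e(s, t) and r, obtain t under the mgu of s and r.
    major, minor = rename(major, next(fresh)), rename(minor, next(fresh))
    theta = unify(major[1], minor, {})
    return None if theta is None else substitute(major[2], theta)

def show(t):
    return t if is_var(t) else "e(%s,%s)" % (show(t[1]), show(t[2]))

E2 = mk_e(mk_e("x", "y"), mk_e("y", "x"))                        # symmetry
E3 = mk_e(mk_e("x", "y"), mk_e(mk_e("y", "z"), mk_e("x", "z")))  # transitivity

d1 = condensed_detachment(E3, E2)
d2 = condensed_detachment(d1, E2)
print(show(d1))   # a new theorem of the calculus
print(show(d2))   # an instance of reflexivity: e(u,u) for a compound term u

A real prover such as Otter applies the same operation millions of times under the strategies described below; the point of the sketch is only to make the rule itself tangible.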
We can dispense with reflexivity since it is derivable from the other two formulas. This brings the number of axioms down to two and a natural question to ask is whether there is a single axiom for the equivalential calculus. In 1933, Łukasiewicz found three formulas of length eleven that each could act as a single axiom for the calculus—here's one of them: e(e(x,y),e(e(z,y),e(x,z)))—and he also showed that no shorter single axiom existed. Over time, other single axioms also of length eleven were found and the list kept growing with additions by Meredith, Kalman and Peterson to a total of 14 formulas of which 13 were known to be single axioms and one formula with a yet undetermined status: the formula XCB = e(x, e(e(e(x, y), e(z, y)), z)). (Actually, the list grew to 18 formulas but (Wos, Winker, Veroff, Smith and Henschen 1983) reduced it to 14.) Resisting the intense study of various researchers, it remained as an open question for many years whether the 14th formula, XCB, was a single axiom for the equivalential calculus (Peterson 1977). One way to answer the question in the affirmative would be to show that at least one of the 13 known single axioms is derivable from XCB alone; another approach would be to derive from XCB the 3-axiom set (E1)–(E3). While (Wos, Ulrich and Fitelson 2002) take shots at the former, their line of attack concentrates on the latter with the most challenging task being the proving of symmetry. Working with the assistance of a powerful automated reasoning program, Otter, they conducted a concerted, persistent and very aggressive assault on the open question. (Their article sometimes reads like a military briefing from the front lines!) For simpler problems, proofs can be found by the reasoning program automatically; deeper and more challenging ones like the one at hand require the guidance of the user. The relentless application of the reasoning tool involved much guidance in the setting of lemmas as targets and the deployment of an arsenal of strategies, including the set of support, forward and backward subsumption, lemma adjunction, formula complexity, hints strategy, ratio strategy, term avoidance, level saturation, and others. After much effort and CPU time, the open question finally succumbed to the combined effort of man and machine and a 61-step proof of symmetry was found, followed by one for transitivity after 10 more applications of condensed detachment. Subsequent runs of the theorem prover using demodulation blocking and the so-called cramming strategy delivered shorter proofs. Here are the last lines of their 25-step proof which in this case proves transitivity first followed by symmetry:

123 [hyper,51,106,122] P(e(e(e(e(x,y),e(z,y)),z),x)).
124 [hyper,51,53,123]  P(e(e(e(e(e(e(e(x,y),e(z,y)),z),x),u),e(v,u)),v)).
125 [hyper,51,124,123] P(e(e(e(x,y),x),y)).


127 [hyper,51,124,108] P(e(e(e(e(x,e(e(e(x,y),e(z,y)),z)),e(e(e(e(e(u,v),e(w,v)),w),u),v6)),v7),e(v6,v7))).
128 [hyper,51,127,123] P(e(e(x,y),e(e(y,z),e(x,z)))).
130 [hyper,51,128,125] P(e(e(x,y),e(e(e(z,x),z),y))).
131 [hyper,51,128,130] P(e(e(e(e(e(x,y),x),z),u),e(e(y,z),u))).
132 [hyper,51,131,123] P(e(e(x,y),e(y,x))).

With an effective methodology and a strategy that included the assistance of an automated reasoning program in a crucial way, the search for shortest single axioms for the equivalential calculus came to an end.

Fitelson & Zalta 2007 and Oppenheimer & Zalta 2011 describe several applications of automated reasoning in computational metaphysics. By representing formal metaphysical claims as axioms and premises in an automated reasoning environment using programs like Prover9, Mace4, the E-prover system and Paradox, the logical status of metaphysical arguments is investigated. After the suitable formalization of axioms and premises, the model finder program Mace4 is used to help verify their consistency. Then, using Prover9, proofs are automatically generated for a number of theorems of the Theory of Plato's Forms, twenty-five fundamental theorems of the Theory of Possible Worlds, the theorems described in Leibniz's unpublished paper of 1690, and a fully automated construction of Saint Anselm's Ontological Argument. In the latter application, Saint Anselm is understood in Oppenheimer and Zalta 2011 as having found a way of inferring God's existence from His mere being as opposed to inferring God's actuality from His mere possibility. This allows for a formalization that is free of modal operators, involving an underlying logic of descriptions, three non-logical premises, and a definition of God. Here are two key definitions in the formalization, as inputted into Prover9, that helped express the concept of God:

Definition of none_greater:

all x (Object(x) -> (Ex1(none_greater,x) <->
    (Ex1(conceivable,x) &
    -(exists y (Object(y) & Ex2(greater_than,y,x) &
    Ex1(conceivable,y)))))).

Definition of God:

Is_the(g,none_greater).

Part of the challenge when representing in Prover9 these and other statements from axiomatic metaphysics was to circumvent some of the prover's linguistic limitations. For example, Prover9 does not have definite descriptions so statements of this kind as well as second-order concepts had to be expressed in terms of Prover9's existing first-order logic. But the return is worth the investment since Prover9 not only delivered a proof of Ex1(e,g)—there is one and only one God—but does so with an added bonus. A close inspection of the output provides yet another example of an automated theorem prover "outreasoning" its users, revealing that some of the logical machinery is actually redundant: The proof can be constructed using only two of the logical theorems of the theory of descriptions (called "Theorem 2" and "Theorem 3" in their article), one of the non-logical premises (called "Premise 2"), and the definition of God. We cannot help but include here Prover9's shorter proof, written in the more elegant notation of standard logic (from Oppenheimer and Zalta 2011):

1. ~E!ιxφ1                     Assumption, for Reductio
2. ∃y(Gyιxφ1 & Cy)             from (1), by Premise 2 and MP
3. Ghιxφ1 & Ch                 from (2), by ∃E, 'h' arbitrary
4. Ghιxφ1                      from (3), by &E


5. ∃y(y = ιxφ1)                from (4), by Theory of Descriptions, Theorem 3
6. Cιxφ1 & ~∃y(Gyιxφ1 & Cy)    from (5), by Theory of Descriptions, Theorem 2
7. ~∃y(Gyιxφ1 & Cy)            from (6), by &E
8. E!ιxφ1                      from (1), (2), (7), by Reductio
9. E!g                         from (8), by the definition of 'g'

Leibniz's dream was to have a characteristica universalis that would allow us to reason in metaphysics and morals in much the same way as we do in geometry and analysis; that is to say, to settle disputes between philosophers as accountants do: "To take pen in hand, sit down at the abacus and, having called in a friend if they want, say to each other: Let us calculate!" From the above applications of automated reasoning, one would agree with the researchers when they imply that these results achieve, to some extent, Leibniz's goal of a computational metaphysics (Fitelson and Zalta 2007).

A nonmonotonic theorem prover can provide the basis for a "computational laboratory" in which to explore and experiment with different models of artificial rationality; the theorem prover can be used to equip an artificial rational agent with an inference engine to reason and gain information about the world. In such procedural epistemology, a rational agent is defeasible (i.e. nonmonotonic) in the sense that new reasoning leads to the acceptance of new beliefs but also to the retraction of previously held beliefs in the presence of new information. At any given point in time, the agent holds a set of justified beliefs but this set is open to revision and is in a continuous state of flux as further reasoning is conducted. This model better reflects our accepted notion of rationality than a model in which all the beliefs are warranted, i.e. beliefs that once are attained are never retracted. Actually, a set of warranted beliefs can be seen as justified beliefs "in the limit", that is, as the ultimate epistemic goal in the agent's search for true knowledge about its world. (Pollock 1995) offers the following definition:

A set A is defeasibly enumerable iff there is an effectively computable function f such that for each n, f(n) is a recursive set and the following two conditions hold:

1. (∀x)(x ∈ A → (∃n)(∀m > n) x ∈ f(m))
2. (∀x)(x ∉ A → (∃n)(∀m > n) x ∉ f(m))

To compare the concepts, if A is recursively enumerable then there is a sequence of recursive sets Ai such that each Ai is a subset of A with each Ai growing monotonically, approaching A in the limit. But if A is only defeasibly enumerable then the Ai's still approach A in the limit but may not be subsets of A and approach A intermittently from above and below. The goal of the OSCAR Project (Pollock 1989) is to construct a general theory of rationality and implement it in an artificial computer-based rational agent. As such, the system uses a defeasible automated reasoner that operates according to the maxim that the set of warranted beliefs should be defeasibly enumerable. OSCAR has been in the making for some time and automated nonmonotonic reasoning has also been used to extend its capabilities to reason defeasibly about perception and time, causation, and decision-theoretic planning (Pollock 2006).

4.7 Mathematics

One of the main goals of automated reasoning has been the automation of mathematics. An early attempt at this was Automath (de Bruijn 1968) which was the first computer system used to check the correctness of proofs and whole books of mathematics, including Landau's Grundlagen der Analysis (van Benthem Jutting 1977). Automath has been superseded


by more modern and capable systems, most notably Mizar. The Mizar system (Trybulec 1979, Muzalewski 1993) is based on Tarski-Grothendieck set theory and, like Automath, consists of a formal language which is used to write mathematical theorems and their proofs. Once a proof is written in the language, it can be checked automatically by Mizar for correctness. Mizar proofs are formal but quite readable, can refer to definitions and previously proved theorems and, once formally checked, can be added to the growing Mizar Mathematical Library (MML) (Bancerek and Rudnicki 2003). As of July 2012, MML contained about 10,000 definitions and 52,000 theorems. The Mizar language is a subset of standard English as used in mathematical texts and is highly structured to ensure the production of rigorous and semantically unambiguous texts. Here's a sample proof in Mizar of the existence of a rational number x^y where x and y are irrational:

theorem T2:
  ex x, y st x is irrational & y is irrational &
  x.^.y is rational
proof
  set w = √2;
  H1: w is irrational by INT_2:44,T1;
  w>0 by AXIOMS:22,SQUARE_1:84;
  then (w.^.w).^.w = w.^.(w•w) by POWER:38
    .= w.^.(w^2) by SQUARE_1:58
    .= w.^.2 by SQUARE_1:88
    .= w^2 by POWER:53
    .= 2 by SQUARE_1:88;
  then H2: (w.^.w).^.w is rational by RAT_1:8;
  per cases;
  suppose H3: w.^.w is rational;
    take w, w;
    thus thesis by H1,H3;
  suppose H4: w.^.w is irrational;
    take w.^.w, w;
    thus thesis by H1,H2,H4;
end;
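(The computation behind hypothesis H2 can be spot-checked numerically outside Mizar, though of course such a check is no proof:

w = 2 ** 0.5
print(w, w ** w, (w ** w) ** w)   # 1.414..., 1.632..., and 2 up to floating-point rounding

The case analysis then observes that either w.^.w is already the desired rational power of irrationals, or (w.^.w).^.w is.)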
Examples of proofs that have been checked by Mizar include the Hahn-Banach theorem, the Brouwer fixed-point theorem, König's lemma, the Jordan curve theorem, and Gödel's completeness theorem. (Rudnicki 2004) discusses the challenges of formalizing Witt's proof of the Wedderburn theorem: Every finite division ring is commutative. The theorem was formulated easily using the existing formalizations available in MML but the proof demanded further entries into the library to formalize notions and facts from algebra, complex numbers, integers, roots of unity, cyclotomic polynomials, and polynomials in general. It took several months of effort to supply the missing material to the MML library but, once in place, the proof was formalized and checked correct in a matter of days. Clearly, a repository of formalized mathematical facts and definitions is a prerequisite for more advanced applications. The QED Manifesto (Boyer et al. 1994, Wiedijk 2007) has such an aim in mind and there is much work to do: Mizar has the largest such repository but even after 30 years of work "it is miniscule with respect to the body of established mathematics" (Rudnicki 2004). This last remark should be construed as a call to increase the effort toward this important aspect in the automation of mathematics.

Mizar's goal is to assist the practitioner in the formalization of proofs and to help check their correctness; other systems aim at finding the proofs themselves. Geometry has been a target of early automated proof-finding efforts. (Chou 1987) proves over 500 geometry theorems using the

(Chou 1987) proves over 500 geometry theorems using the algebraic approach offered by Wu's method and the Gröbner basis method by representing hypotheses and conclusions as polynomial equations. (Quaife 1992) is another early effort to find proofs in first-order mathematics: over 400 theorems in Neumann-Bernays-Gödel set theory, over 1,000 theorems in arithmetic, a number of theorems in Euclidean geometry, and Gödel's incompleteness theorems. The approach is best described as semi-automatic or “interactive”, with the user providing a significant amount of input to guide the theorem-proving effort. This is no surprise since, as one applies automated reasoning systems to richer areas of mathematics, the systems increasingly take on the role of proof assistants rather than theorem provers. This is because in richer mathematical domains the systems need to reason about theories and higher-order objects, which in general takes them deeper into the undecidable.

Different proof assistants offer different capabilities measured by their power at automating reasoning tasks, supported logic, object typing, size of mathematical library, and readability of input and output. A “canonical” proof which is not too trivial but not too complex either can be used as a baseline for system comparison, as done in (Wiedijk 2006) where the authors of seventeen reasoning systems are tasked with establishing the irrationality of √2. The systems discussed are certainly more capable than this and some have been used to assist in the formalization of far more advanced proofs such as the Erdös-Selberg proof of the Prime Number Theorem (about 30,000 lines in Isabelle), the formalization of the Four Color Theorem (60,000 lines in Coq), and the Jordan Curve Theorem (75,000 lines in HOL Light).

The above notwithstanding, automated reasoning has had a small impact on the practice of doing mathematics, and there are a number of reasons given for this. One reason is that automated theorem provers are not sufficiently powerful to attempt the kind of problems mathematicians typically deal with; their current power is, at best, at the level of first-year undergraduate mathematics and still far from leading-edge mathematical research. While it is true that current systems cannot, entirely on their own, prove problems at this level of difficulty, we should remember that the goal is to build reasoning systems so that “eventually machines are to be an aid to mathematical research and not a substitute for it” (Wang 1960). With this in mind, and while the automated reasoning community continues to try to meet the grand challenge of building increasingly powerful theorem provers, mathematicians can already draw some of the benefits offered by current systems, including assistance in completing proof gaps or formalizing and checking the correctness of proposed proofs. Indeed, the latter may be an application that could help address some real issues currently being faced by the mathematical community. Consider the announcement by Daniel Goldston and Cem Yildirim of a proof of the Twin Prime Conjecture where, although experts initially agreed that the proof was correct, an insurmountable error was found shortly after. Or think about the case of Hales' proof of the Kepler Conjecture, which asserts that no packing of congruent balls in Euclidean 3-space has density greater than the face-centered cubic packing. Hales' proof consists of about 300 pages of text and a large number of computer calculations. After four years of hard work, the 12-person panel assigned by Annals of Mathematics to the task of verifying the proof still had genuine doubts about its correctness. Thomas Hales, for one, has taken it upon himself to formalize his proof and have it checked by an automated proof assistant with the aim of convincing others of its correctness (Hales 2005b, in Other Internet Resources). His task is admittedly heavy but the outcome is potentially very significant to both the mathematical and automated reasoning communities. At the time of this writing (August 2014), all eyes are on Hales and his formal proof as he has just announced the completion of the Flyspeck project (Hales 2014, in Other Internet Resources), having constructed a formal proof of the conjecture using the Isabelle and HOL Light automated proof assistants.
(Church 1936a, 1936b) and (Turing 1936) imply the existence of theorems whose shortest proof is very large, and the proof of the Four Color Theorem in (Appel and Haken 1977), the Classification of Simple Groups in (Gorenstein 1982), and the proof of the Kepler Conjecture in (Hales 2005a) may well be just samples of what is yet to come. As (Bundy 2011) puts it: “As important theorems requiring larger and larger proofs emerge, mathematics faces a dilemma: either these theorems must be ignored or computers must be used to assist with their proofs.”

The above remarks also counter another argument given for not using automated theorem provers: mathematicians enjoy proving theorems, so why let machines take away the fun? The answer to this is, of course, that mathematicians can have even more fun by letting the machine do the more tedious and menial tasks: “It is unworthy of excellent men to lose hours like slaves in the labour of calculation which could safely be relegated to anyone else if machines were used” (G. W. Leibniz, New Essays Concerning Human Understanding). If still not convinced, just consider the sobering prospect of having to manually check the 23,000 inequalities used in Hales' proof!

Another reason that is given for the weak acceptance of automated reasoning by the mathematical community is that the programs are not to be trusted since they may contain bugs—software defects—and hence may produce erroneous results. Formally verifying automated reasoning programs will help ameliorate this, particularly in the case of proof checkers. Proving programs correct is no easy task, but the same is true about proving theorems in advanced mathematics: Gonthier proved correct the programs used in the formalization of his proof of the Four Color Theorem, but he spent far more effort formalizing all the graph theory that was part of the proof. So, ironically enough, it turns out that at least in this case, and surely there are others, “it is actually easier to verify the correctness of the program than to verify the correctness of the pen-and-paper mathematics” (Wiedijk 2006). For theorem provers and model finders, a complementary strategy would be to verify the programs' results as opposed to the programs themselves. Paraphrasing (Slaney 1994): it does not matter to the mathematician how many defects a program may have as long as the proof (or model) it outputs is correct. So the onus is on the verification of results, whether produced by machine or man, and checking them by independent parties (where of course the effort may well use automated checkers) should increase the confidence in the validity of the proofs.
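To make the result-checking strategy concrete, here is a minimal sketch of an independent checker for one common kind of prover output, a propositional resolution refutation. The step format and the function check_refutation are hypothetical, invented for illustration rather than taken from any system cited in this entry; the point is only that the checker is far simpler than the prover, since it merely confirms that each claimed step is an input clause or a correct resolvent and that the empty clause is eventually derived.

    def check_refutation(axioms, steps):
        """Check a claimed propositional resolution refutation step by step.
        Clauses are frozensets of integers (a negative integer is a negated atom).
        A step is ('input', (clause,)), which must be one of the axioms, or
        ('resolve', (i, j, lit)), resolving step i (containing lit) with step j
        (containing -lit).  Accept only if every step checks out and the last
        derived clause is empty."""
        derived = []
        for kind, data in steps:
            if kind == 'input':
                (clause,) = data
                if clause not in axioms:
                    return False
                derived.append(clause)
            elif kind == 'resolve':
                i, j, lit = data
                ci, cj = derived[i], derived[j]
                if lit not in ci or -lit not in cj:
                    return False
                derived.append((ci - {lit}) | (cj - {-lit}))
            else:
                return False
        return bool(derived) and len(derived[-1]) == 0

    # Tiny example: {p}, {not-p or q}, {not-q} is unsatisfiable.
    axioms = [frozenset({1}), frozenset({-1, 2}), frozenset({-2})]
    proof = [('input', (axioms[0],)),
             ('input', (axioms[1],)),
             ('input', (axioms[2],)),
             ('resolve', (0, 1, 1)),   # derives {q}
             ('resolve', (3, 2, 2))]   # derives the empty clause
    print(check_refutation(axioms, proof))   # True

On this division of labour the prover itself can remain untrusted: if it has bugs, the worst it can do is emit a certificate that the checker rejects.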
It is often argued that automated proofs are too long and detailed. That a proof can be expressed in more elementary steps is in principle very beneficial, since this allows a mathematician to ask a proof assistant to justify its steps in terms of simpler ones. But proof assistants should also allow the opposite, namely to abstract detail and present results and their justifications using the higher-level concepts, language, and notation mathematicians are accustomed to. Exploiting the hierarchical structure of proofs as done in (Denney 2006) is a step in this direction but more work along these lines is needed. Having the proof assistant work at the desired level of granularity provides more opportunity for insight during the proof discovery process. This is an important consideration since mathematicians are equally interested in gaining understanding from their proofs as in establishing facts.

(Bundy 2011) alludes to a deadlock that is preventing the wider adoption of theorem provers by the mathematical community: on the one hand, mathematicians need to use the proof assistants to build a large formal library of mathematical results; but, on the other hand, they do not want to use the provers since there is no such library of previously proved results they can build upon. To break the impasse, a number of applications are proposed, of which assisting the mathematician in the search for previously proved theorems is of particular promise.
During its history, mathematics has accumulated a huge number of theorems and the number of mathematical results continues to grow dramatically. In 2010, Zentralblatt MATH covered about 120,000 new publications (Wegner 2011). Clearly, no individual researcher can be acquainted with all this mathematical knowledge, and it will be increasingly difficult to cope with one's ever-growing area of specialty unless assisted with automated theorem-proving tools that can search in intelligent ways for previously proved results of interest. An alternative approach to this problem is for mathematicians to tap into each other's knowledge as enabled in computational social systems like Polymath and MathOverflow. The integration of automated reasoning tools into such social systems would increase the effectiveness of their collective intelligence by supporting “the combination of precise formal deductions and the more informal loose interaction seen in mathematical practice” (Martin and Pease 2013, in Other Internet Resources).

Due to real pressing needs from industry, some applications of automated reasoning in pure and applied mathematics are more of necessity than choice. After having worked on the formalization of some elementary real analysis to verify hardware-based floating-point trigonometric functions, (Harrison 2006, Harrison 2000) mentions the further need to formalize more pure mathematics—italics are his—to extend his formalization to power series for trigonometric functions and basic theorems about Diophantine approximations. Harrison finds it surprising that “such extensive mathematical developments are used simply to verify that a floating point tangent function satisfies a certain error bound” and, from this remark, one would expect there are other industrial applications that will demand more extensive formalizations.

Albeit not at the rate originally anticipated, automated reasoning is finding applications in mathematics. As the use of automated reasoning assistants becomes more widespread, one can envision their use following a certain methodical order: first, automated reasoning tools are used for theory exploration and discovery; then, having identified some target problem, the practitioner works interactively with an automated assistant to find proofs and establish facts; finally, an automated proof checker is used to check the correctness of all final proofs prior to their being submitted for publication and being made available to the rest of the mathematical community via the creation of new entries in a repository of formalized mathematics. It is indeed a matter of time before the application of automated proof assistants becomes an everyday affair in the life of the mathematician; it is the grand challenge of the automated reasoning community to make it happen sooner rather than later.

4.8 Artificial Intelligence

Since its inception, the field of automated theorem proving has had important applications in the larger field of artificial intelligence (AI). Automated deduction is at the heart of AI applications like logic programming (see section 4.1 Logic Programming, in this article), where computation is equated with deduction; robotics and problem solving (Green 1969), where the steps to achieve goals are steps extracted from proofs; deductive databases (Das 1992), where factual knowledge is expressed as atomic clauses and inference rules, and new facts are inferred by deduction; expert systems (Giarratano and Riley 1998), where human expertise in a given domain (e.g. blood infections) is captured as a collection of IF-THEN deduction rules and where conclusions (e.g. diagnoses) are obtained by the application of the inference rules; and many others.
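The style of deduction behind deductive databases and IF-THEN expert systems can be pictured with a deliberately small sketch; the facts, rules, and the helper forward_chain below are invented for illustration and are not drawn from any of the cited systems. Facts are atomic assertions, rules are IF-THEN clauses, and forward chaining applies the rules repeatedly until no new facts follow.

    def forward_chain(facts, rules):
        """Repeatedly apply IF-THEN rules (premises, conclusion) until no new
        facts are inferred: the naive bottom-up evaluation used in deductive
        databases and many rule-based expert systems."""
        known = set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in rules:
                if conclusion not in known and all(p in known for p in premises):
                    known.add(conclusion)
                    changed = True
        return known

    # Hypothetical toy knowledge base in the spirit of a medical expert system.
    facts = {"fever", "elevated_white_cell_count"}
    rules = [
        ({"fever", "elevated_white_cell_count"}, "suspected_infection"),
        ({"suspected_infection", "positive_blood_culture"}, "blood_infection"),
        ({"suspected_infection"}, "order_blood_culture"),
    ]
    print(forward_chain(facts, rules))
    # {'fever', 'elevated_white_cell_count', 'suspected_infection', 'order_blood_culture'}

Real deductive databases and expert-system shells add variables, negation, and conflict-resolution strategies, but the basic inferential step, deriving new facts from rules and previously established facts, is the same.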
Restricting the proof search space has always been a key consideration in the implementation of automated deduction, and traditional AI approaches to search have been an integral part of theorem provers. The main idea is to prevent the prover from pursuing unfruitful reasoning paths.
A dual aspect of search is to look for a previously proved result that could be useful in the completion of the current proof. Automatically identifying those results is no easy task, and it becomes less easy as the size of the problem domain, and the number of already established results, grows. This is not a happy situation, particularly in light of the growing trend to build large libraries of theorems such as the Mizar Problems for Theorem Proving (MPTP) (Urban et al. 2010, Bancerek and Rudnicki 2003) or the Isabelle/HOL mathematical library (Meng and Paulson 2008), so developing techniques for the discovery, evaluation, and selection of suitable existing definitions, premises and lemmas in large libraries of formal mathematics, as discussed in (Kühlwein et al. 2012), is an important line of research.

Among many other methods, and in stark contrast to automated provers, mathematicians combine induction heuristics with deductive techniques when attacking a problem. The former helps them guide the proof-finding effort while the latter allows them to close proof gaps. And of course all this happens in the presence of the very large body of knowledge that the human possesses. For an automated prover, the analogous counterpart to the mathematician's body of knowledge is a large library like MPTP. An analogous approach to using inductive heuristics would be to endow the theorem prover with inductive, data-driven, machine-learning abilities. (Urban and Vyskocil 2012) runs a number of experiments to determine any gains that may result from such an approach. For this, they use MPTP and theorem provers like E and SPASS enhanced with symbol-based machine-learning mechanisms. A detailed presentation and statistical results can be found in the above reference but, in summary and quoting the authors, “this experiment demonstrates a very real and quite unique benefit of large formal mathematical libraries for conducting novel integration of AI methods. As the machine learner is trained on previous proofs, it recommends relevant premises from the large library that (according to the past experience) should be useful for proving new conjectures.” (Urban 2007) discusses MaLARea (a Machine Learner for Automated Reasoning), a meta-system that also combines inductive and deductive reasoning methods. MaLARea is intended to be used in large theories, i.e. problems with a large number of symbols, definitions, premises, lemmas, and theorems. The system works in cycles where results proved deductively in a given iteration are then used by the inductive machine-learning component to place restrictions on the search space for the next theorem-proving cycle. Albeit simple in design, the first version of MaLARea solved 142 problems out of 252 in the MPTP Challenge, outperforming the more seasoned provers E (89 problems solved) and SPASS (81 problems solved).
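The data-driven premise selection just described can be pictured with a deliberately naive sketch; the scoring scheme below (weighted symbol overlap) and the function rank_premises are invented for illustration, and systems such as MaLARea or those surveyed in (Kühlwein et al. 2012) use considerably more sophisticated learners trained on previous proofs. The toy "library" names echo the MML references used in the Mizar example above, but the symbol sets attached to them are made up. The idea is simply that, out of a large library, only the premises whose symbols best match the conjecture are handed to the prover.

    from collections import Counter
    import math

    def rank_premises(conjecture_symbols, library, k=3):
        """Rank library facts for relevance to a conjecture by weighted symbol
        overlap; rare symbols count for more (an IDF-style weight).  A crude
        stand-in for learned premise selection."""
        df = Counter()
        for symbols in library.values():
            df.update(set(symbols))
        n = len(library)

        def weight(sym):
            return math.log((n + 1) / (1 + df[sym]))

        scores = {name: sum(weight(s) for s in conjecture_symbols & symbols)
                  for name, symbols in library.items()}
        return sorted(scores, key=scores.get, reverse=True)[:k]

    # Hypothetical miniature library: fact name -> symbols occurring in it.
    library = {
        "SQUARE_1:88": {"sqrt", "power", "2"},
        "POWER:53":    {"power", "square"},
        "INT_2:44":    {"sqrt", "irrational", "2"},
        "XBOOLE_1:1":  {"subset", "union"},
    }
    goal = {"sqrt", "irrational", "power"}
    print(rank_premises(goal, library))   # ['INT_2:44', 'SQUARE_1:88', 'POWER:53']

In a MaLARea-style loop, the weights would be re-estimated after each round of proving, so that premises which actually contributed to successful proofs are ranked higher in the next cycle.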
Besides using large mathematical libraries, tapping into web-based semantic ontologies is another possible source of knowledge. (Pease and Sutcliffe 2007) discusses ways of making the SUMO ontology suitable for first-order theorem proving, and describes work on translating SUMO into TPTP. An added benefit of successfully reasoning over large semantic ontologies is that this promotes the application of automated reasoning to other fields of science. Tapping into its full potential, however, will require a closer alignment of methods from automated reasoning and artificial intelligence.

5. Conclusion

Automated reasoning is a growing field that provides a healthy interplay between basic research and application. Automated deduction is being conducted using a multiplicity of theorem-proving methods, including resolution, sequent calculi, natural deduction, matrix connection methods, term rewriting, mathematical induction, and others. These methods are implemented using a variety of logic formalisms such as first-order logic, type theory and higher-order logic, clause and Horn logic, non-classical logics, and so on.
Automated reasoning programs are being applied to solve a growing number of problems in formal logic, mathematics and computer science, logic programming, software and hardware verification, circuit design, and many others. One of the results of this variety of formalisms and automated deduction methods has been the proliferation of a large number of theorem proving programs. To test the capabilities of these different programs, selections of problems have been proposed against which their performance can be measured (McCharen, Overbeek and Wos 1976, Pelletier 1986). The TPTP (Sutcliffe and Suttner 1998) is a library of such problems that is updated on a regular basis. There is also a competition among automated theorem provers held regularly at the CADE conference (Pelletier, Sutcliffe and Suttner 2002, Sutcliffe 2014, in Other Internet Resources); the problems for the competition are selected from the TPTP library.

Initially, computers were used to aid scientists with their complex and often tedious numerical calculations. The power of the machines was then extended from the numeric into the symbolic domain where infinite-precision computations performed by computer algebra programs have become an everyday affair. The goal of automated reasoning has been to further extend the machine's reach into the realm of deduction where they can be used as reasoning assistants in helping their users establish truth through proof.

Bibliography

Anderson, A. R. and N. D. Belnap, 1962, “The Pure Calculus of Entailment”, Journal of Symbolic Logic, 27: 19–52.
Andrews, P. B., M. Bishop, S. Issar, D. Nesmith, F. Pfenning and H. Xi, 1996, “TPS: A Theorem-Proving System for Classical Type Theory”, Journal of Automated Reasoning, 16 (3): 321–353.
Andrews, P. B., 1981, “Theorem-Proving via General Matings”, Journal of the Association for Computing Machinery, 28 (2): 193–214.
Appel, K. and W. Haken, 1977, “Every Planar Map is Four Colorable Part I. Discharging”, Illinois Journal of Mathematics, 21: 429–490.
Baader, F. and T. Nipkow, 1998, Term Rewriting and All That, Cambridge: Cambridge University Press.
Bachmair, L. and H. Ganzinger, 1994, “Rewrite-Based Equational Theorem Proving with Selection and Simplification”, Journal of Logic and Computation, 4 (3): 217–247.
Bancerek, G. and P. Rudnicki, 2003, “Information Retrieval in MML”, Proceedings of the Second International Conference on Mathematical Knowledge Management (LNCS 2594), Heidelberg: Springer-Verlag, pp. 119–132.
Basin, D. A. and T. Walsh, 1996, “A Calculus for and Termination of Rippling”, Journal of Automated Reasoning, 16 (1–2): 147–180.
Bauer, A., E. Clarke and X. Zhao, 1998, “Analytica: An Experiment in Combining Theorem Proving and Symbolic Computation”, Journal of Automated Reasoning, 21: 295–325.
Beckert, B., R. Hähnle and P. H. Schmitt (eds.), 2007, “Verification of Object-Oriented Software: The KeY Approach”, Lecture Notes in Artificial Intelligence (Volume 4334), Berlin: Springer-Verlag.
Berndt, B., 1985, Ramanujan's Notebooks (Part I), Berlin: Springer-Verlag, pp. 25–43.
Bibel, W., 1981, “On Matrices with Connections”, Journal of the Association of Computing Machinery, 28 (4): 633–645.
Bledsoe, W. W., 1977, “Non-resolution Theorem Proving”, Artificial Intelligence, 9: 1–35.
Bledsoe, W. W. and M. Tyson, 1975, “The UT Interactive Prover”, Memo ATP-17A, Department of Mathematics, University of Texas.
Bofill, M., R. Nieuwenhuis, A. Oliveras, E. Rodriguez-Carbonell and A. Rubio, 2008, “A Write-Based Solver for SAT Modulo the Theory of Arrays”, Formal Methods in Computer-Aided Design (FMCAD'08), pp. 1–8.
Bonacina, M. P., 1999, “A Taxonomy of Theorem-Proving Strategies”, Artificial Intelligence Today (Lecture Notes in Computer Science: Volume 1600), Berlin: Springer-Verlag, pp. 43–84.
Boyer, R. S., M. Kaufmann and J. S. Moore, 1995, “The Boyer-Moore Theorem Prover and its Interactive Enhancement”, Computers and Mathematics with Applications, 29: 27–62.
Boyer, R. S. and J. S. Moore, 1979, A Computational Logic, New York: Academic Press.
Boyer, R., et al., 1994, “The QED Manifesto”, CADE-12: Proceedings of the 12th International Conference on Automated Deduction (Lecture Notes in Artificial Intelligence: Volume 814), A. Bundy (ed.), Berlin: Springer-Verlag, pp. 238–251.
Bundy, A., 2011, “Automated theorem proving: a practical tool for the working mathematician?”, Annals of Mathematics and Artificial Intelligence, 61 (1): 3–14.
Bundy, A., F. van Harmelen, J. Hesketh and A. Smaill, 1991, “Experiments with Proof Plans for Induction”, Journal of Automated Reasoning, 7 (3): 303–324.
Bundy, A., A. Stevens, F. van Harmelen, A. Ireland and A. Smaill, 1993, “Rippling: A Heuristic for Guiding Inductive Proofs”, Artificial Intelligence, 62: 185–253.
Church, A., 1936a, “An unsolvable problem of elementary number theory”, American Journal of Mathematics, 58 (2): 345–363.
Church, A., 1936b, “A note on the Entscheidungsproblem”, Journal of Symbolic Logic, 1 (1): 40–41.
Church, A., 1940, “A Formulation of the Simple Theory of Types”, Journal of Symbolic Logic, 5: 56–68.
Chang, C. L. and R. C. T. Lee, 1973, Symbolic Logic and Mechanical Theorem Proving, New York: Academic Press.
Chou, S., 1987, Mechanical Geometry Theorem Proving, Dordrecht: Kluwer Academic Publishers.
Claessen, K. and N. Sörensson, 2003, “New Techniques that Improve MACE-style Finite Model Finding”, Proceedings of the CADE-19 Workshop: Model Computation – Principles, Algorithms, Applications, P. Baumgartner and C. Fermueller (eds.).
Clarke, E. and X. Zhao, 1994, “Combining Symbolic Computation and Theorem Proving: Some Problems of Ramanujan”, CADE-12: Proceedings of the 12th International Conference on Automated Deduction (Lecture Notes in Artificial Intelligence: Volume 814), A. Bundy (ed.), Berlin: Springer-Verlag, pp. 758–763.
Clocksin, W. F. and C. S. Mellish, 1981, Programming in Prolog, Berlin: Springer-Verlag.
Colmerauer, A., H. Kanoui, R. Pasero and P. Roussel, 1973, Un Système de Communication Homme-machine en Français, Rapport, Groupe Intelligence Artificielle, Université d'Aix Marseille.
Constable, R. L., S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki and S. F. Smith, 1986, Implementing Mathematics with the Nuprl Proof Development System, Englewood Cliffs, NJ: Prentice Hall.
Cook, S. A., 1971, “The complexity of Theorem-Proving Procedures”, Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, pp. 151–158.
Das, S. K., 1992, Deductive Databases and Logic Programming, Addison-Wesley.
Davis, M., G. Logemann and D. Loveland, 1962, “A Machine Program for Theorem-Proving”, Communications of the Association for Computing Machinery, 5 (7): 394–397.
Davis, M. and H. Putnam, 1960, “A Computing Procedure for Quantification Theory”, Journal of the Association for Computing Machinery, 7 (3): 201–215.
de Bruijn, N. G., 1968, “Automath, a Language for Mathematics”, in Automation of Reasoning (Volume 2), J. Siekmann and G. Wrightson (eds.), Berlin: Springer-Verlag, 1983, pp. 159–200.
de Moura, L., 2007, “Developing Efficient SMT Solvers”, Proceedings of the CADE-21 Workshop on Empirically Successful Automated Reasoning in Large Theories, G. Sutcliffe, J. Urban and S. Schulz (eds.), Bremen.
Denney, E., B. Fischer and J. Schumann, 2004, “Using Automated Theorem Provers to Certify Auto-generated Aerospace Software”, Automated Reasoning: Second International Joint Conference (IJCAR) (Lecture Notes in Artificial Intelligence: Volume 3097), D. Basin and M. Rusinowitch (eds.), Berlin: Springer-Verlag, pp. 198–212.
Denney, E., J. Power and K. Tourlas, 2006, “Hiproofs: A Hierarchical Notion of Proof Tree”, Proceedings of the 21st Annual Conference on Mathematical Foundations of Programming Semantics (MFPS XXI) (Electronic Notes in Theoretical Computer Science: Volume 155), pp. 341–359.
Ernst, Z., B. Fitelson, K. Harris and L. Wos, 2002, “Shortest Axiomatizations of Implicational S4 and S5”, Notre Dame Journal of Formal Logic, 43 (3): 169–179.
Fitelson, B. and E. Zalta, 2007, “Steps Toward a Computational Metaphysics”, Journal of Philosophical Logic, 36 (2): 227–247.
Fitting, M., 1990, First-Order Logic and Automated Theorem Proving, Berlin: Springer-Verlag.
Farmer, W. M., J. D. Guttman and F. J. Thayer, 1993, “IMPS: An Interactive Mathematical Proof System”, Journal of Automated Reasoning, 11 (2): 213–248.
Furbach, U., 1994, “Theory Reasoning in First Order Calculi”, Management and Processing of Complex Data Structures (Lecture Notes in Computer Science: Volume 777), pp. 139–156.
Ganzinger, H., G. Hagen, R. Nieuwenhuis, A. Oliveras and C. Tinelli, 2004, “DPLL(T): Fast Decision Procedures”, Computer Aided Verification (Lecture Notes in Computer Science: Volume 3114), pp. 175–188.
Gentzen, G., 1935, “Investigations into Logical Deduction”, in Szabo 1969, pp. 68–131.
Giarratano, J. and G. Riley, 1998, Expert Systems: Principles and Programming, Boston, MA: PWS Publishing Co.
Gordon, M. J. C. and T. F. Melham (eds.), 1993, Introduction to HOL: A Theorem Proving Environment for Higher Order Logic, Cambridge: Cambridge University Press.
Gordon, M. J. C., A. J. Milner and C. P. Wadsworth, 1979, Edinburgh LCF: A Mechanised Logic of Computation (LNCS 78), Berlin: Springer-Verlag.
Gorenstein, D., 1982, Finite Simple Groups: An Introduction to their Classification (University Series in Mathematics), New York: Plenum Press.
Green, C., 1969, “Application of Theorem Proving to Problem Solving”, IJCAI'69: Proceedings of the 1st International Joint Conference on Artificial Intelligence, San Francisco: Morgan Kaufmann, pp. 219–239.
Haack, S., 1978, Philosophy of Logics, Cambridge: Cambridge University Press.
Hales, T. C., 2005a, “A proof of the Kepler Conjecture”, Annals of Mathematics, 162 (3): 1065–1185.
Harrison, J., 2000, “High-Level Verification Using Theorem Proving and Formalized Mathematics”, CADE-17: Proceedings of the 17th International Conference on Automated Deduction (Lecture Notes in Artificial Intelligence: Volume 1831), D. McAllester (ed.), Berlin: Springer-Verlag, pp. 1–6.
Harrison, J., 2006, “Verification: Industrial Applications”, Proof Technology and Computation, H. Schwichtenberg and K. Spies (eds.), Amsterdam: IOS Press, pp. 161–205.
Harrison, J., 2009, “Formalizing an Analytic Proof of the Prime Number Theorem”, Journal of Automated Reasoning (Special Issue: A Festschrift for Michael J. C. Gordon), 43 (3): 243–261.
Harrison, J. and L. Théry, 1998, “A Skeptic's Approach to Combining HOL and Maple”, Journal of Automated Reasoning, 21: 279–294.
Hilbert, D. and W. Ackermann, 1928, Principles of Mathematical Logic, L. Hammond, G. Leckie, and F. Steinhardt (trans.), New York: Chelsea Publishing Co., 1950.
Herbrand, J., 1930, Recherches sur la Théorie de la Démonstration, Travaux de la Société des Sciences et des Lettres de Varsovie, Classe III, Science Mathématique et Physique, No. 33, 128.
Huet, G. P., 1975, “A Unification Algorithm for Typed λ-calculus”, Theoretical Computer Science, 1: 27–57.
Kerber, M., M. Kohlhase and V. Sorge, 1998, “Integrating Computer Algebra into Proof Planning”, Journal of Automated Reasoning, 21: 327–355.
Knuth, D. and P. B. Bendix, 1970, “Simple Word Problems in Universal Algebras”, in Computational Problems in Abstract Algebra, J. Leech (ed.), Oxford, New York: Pergamon Press, pp. 263–297.
Kleene, S. C., 1962, Introduction to Metamathematics, Amsterdam: North-Holland.
Kowalski, R., 1974, “Predicate Logic as a Programming Language”, Proceedings of the International Federation for Information Processing (Proc. IFIP '74), Amsterdam: North Holland, pp. 569–574.
Küchlin, W. and C. Sinz, 2000, “Proving Consistency Assertions for Automotive Product Data Management”, Journal of Automated Reasoning (Special Issue: Satisfiability in the Year 2000), I. P. Gent and T. Walsh (eds.), 24 (1–2): 145–163.
Kühlwein, D., T. van Laarhoven, E. Tsivtsivadze, J. Urban and T. Heskes, 2012, “Overview and Evaluation of Premise Selection Techniques for Large Theory Mathematics”, Automated Reasoning: 6th International Joint Conference, IJCAR 2012 (Lecture Notes in Computer Science: Volume 7364), B. Gramlich, D. Miller and U. Sattler (eds.), Manchester, UK: Springer-Verlag, pp. 378–392.
Lemmon, E. J., C. A. Meredith, D. Meredith, A. N. Prior and I. Thomas, 1957, Calculi of Pure Strict Implication, Philosophy Dept., Canterbury University, Christchurch, New Zealand.
Lloyd, J. W., 1984, Foundations of Logic Programming, Berlin: Springer-Verlag.
Loveland, D. W., 1969, “A Simplified Format for the Model Elimination Procedure”, Journal of the Association for Computing Machinery, 16: 349–363.
Loveland, D. W., 1970, “A Linear Format for Resolution”, Proceedings of the IRIA Symposium on Automatic Demonstration, New York: Springer-Verlag, pp. 147–162.
Loveland, D. W., 1978, Automated Theorem Proving: A Logical Basis, Amsterdam: North Holland.
Luckham, D., 1970, “Refinements in Resolution Theory”, Proceedings of the IRIA Symposium on Automatic Demonstration, New York: Springer-Verlag, pp. 163–190.
Martin-Löf, P., 1982, “Constructive Mathematics and Computer Programming”, Logic, Methodology and Philosophy of Science (Volume IV), Amsterdam: North-Holland, pp. 153–175.
Massacci, F. and L. Marraro, 2000, “Logical Cryptanalysis: Encoding and Analysis of the U.S. Data Encryption Standard”, Journal of Automated Reasoning (Special Issue: Satisfiability in the Year 2000), I. P. Gent and T. Walsh (eds.), 24 (1–2): 165–203.
McCarthy, J., 1962, “Towards a Mathematical Science of Computation”, International Federation for Information Processing Congress (Munich, 1962), Amsterdam: North Holland, pp. 21–28.
McCune, W., 1997, “Solution of the Robbins Problem”, Journal of Automated Reasoning, 19 (3): 263–276.
McCune, W., 2001, MACE 2.0 Reference Manual and Guide, Mathematics and Computer Science Division, ANL/MSC-TM-249, Argonne National Laboratory.
McRobie, M. A., 1991, “Automated Reasoning and Nonclassical Logics: Introduction”, Journal of Automated Reasoning, 7 (4): 447–451.
Meng, J. and L. C. Paulson, 2008, “Translating higher-order clauses to first-order clauses”, Journal of Automated Reasoning, 40 (1): 35–60.
Meredith, C. A. and A. N. Prior, 1964, “Investigations into Implicational S5”, Z. Math. Logik Grundlagen Math., 10: 203–220.
Miller, D. and G. Nadathur, 1988, “An Overview of λProlog”, Proceedings of the Fifth International Logic Programming Conference — Fifth Symposium in Logic Programming, R. Bowen and R. Kowalski (eds.), Cambridge, MA: MIT Press.
McCharen, J. D., R. A. Overbeek and L. A. Wos, 1976, “Problems and Experiments for and with Automated Theorem-Proving Programs”, IEEE Transactions on Computers, 8: 773–782.
Muzalewski, M., 1993, An Outline of PC Mizar, Fondation Philippe le Hodey, Brussels.
Nevins, A. J., 1974, “A Human-Oriented Logic for Automatic Theorem Proving”, Journal of the Association of Computing Machinery, 21 (4): 606–621.
Oppenheimer, P. and E. Zalta, 2011, “A Computationally-Discovered Simplification of the Ontological Argument”, Australasian Journal of Philosophy, 89 (2): 333–349.
Paulson, L. C., 1994, Isabelle: A Generic Theorem Prover (Lecture Notes in Computer Science: Volume 828), Berlin: Springer-Verlag.
Paulson, L. C. and K. Grabczewski, 1996, “Mechanizing Set Theory”, Journal of Automated Reasoning, 17 (3): 291–323.
Pease, A. and G. Sutcliffe, 2007, “First Order Reasoning on a Large Ontology”, Proceedings of the CADE-21 Workshop on Empirically Successful Automated Reasoning in Large Theories (Volume 257), G. Sutcliffe and J. Urban (eds.), Bremen.
Pelletier, F. J., 1986, “Seventy-Five Problems for Testing Automatic Theorem Provers”, Journal of Automated Reasoning, 2 (2): 191–216.
Pelletier, F. J., 1998, “Natural Deduction Theorem Proving in THINKER”, Studia Logica, 60 (1): 3–43.
Pelletier, F. J., G. Sutcliffe and C. Suttner, 2002, “The Development of CASC”, AI Communications, 15 (2–3): 79–90.
Peterson, J. G., 1977, The Possible Shortest Single Axiom for EC-Tautologies, Report 105, Department of Mathematics, University of Auckland.
Pollock, J., 1989, “OSCAR: A General Theory of Rationality”, Journal of Experimental & Theoretical Artificial Intelligence, 1 (3): 209–226.
Pollock, J., 1995, Cognitive Carpentry, Cambridge, MA: Bradford/MIT Press.
Pollock, J., 2006, “Against Optimality: Logical Foundations for Decision-Theoretic Planning in Autonomous Agents”, Computational Intelligence, 22 (1): 1–25.
Portoraro, F. D., 1994, “Symlog: Automated Advice in Fitch-style Proof Construction”, CADE-12: Proceedings of the 12th International Conference on Automated Deduction (Lecture Notes in Artificial Intelligence: Volume 814), A. Bundy (ed.), Berlin: Springer-Verlag, pp. 802–806.
Portoraro, F. D., 1998, “Strategic Construction of Fitch-style Proofs”, Studia Logica, 60 (1): 45–66.
Prasad, M., A. Biere and A. Gupta, 2005, “A Survey of Recent Advances in SAT-Based Formal Verification”, International Journal on Software Tools for Technology Transfer, 7 (2): 156–173.
Prawitz, D., 1965, Natural Deduction: A Proof Theoretical Study, Stockholm: Almqvist & Wiksell.
Quaife, A., 1992, Automated Development of Fundamental Mathematical Theories, Kluwer Academic Publishers.
Robinson, J. A., 1965, “A Machine Oriented Logic Based on the Resolution Principle”, Journal of the Association of Computing Machinery, 12: 23–41.
Robinson, J. A., 1965, “Automatic Deduction with Hyper-resolution”, Internat. J. Comput. Math., 1: 227–234.
Robinson, J. A. and A. Voronkov (eds.), 2001, Handbook of Automated Reasoning: Volumes I and II, Cambridge, MA: MIT Press.
Schmitt, P. and I. Tonin, 2007, “Verifying the Mondex Case Study”, Proceedings of the Fifth IEEE International Conference on Software Engineering and Formal Methods, IEEE Computer Society, pp. 47–58.
Schulz, S., 2004, “System Abstract: E 0.81”, Proceedings of the 2nd International Joint Conference on Automated Reasoning (Lecture Notes in Artificial Intelligence: Volume 3097), D. Basin and M. Rusinowitch (eds.), Berlin: Springer-Verlag, pp. 223–228.
Sieg, W. and J. Byrnes, 1996, Normal Natural Deduction Proofs (in Classical Logic), Report CMU-PHIL 74, Department of Philosophy, Carnegie-Mellon University.
Slaney, J. K., 1984, “3,088 Varieties: A Solution to the Ackermann Constant Problem”, Journal of Symbolic Logic, 50: 487–501.
Stickel, M. E., 1992, “A Prolog Technology Theorem Prover: A New Exposition and Implementation in Prolog”, Theoretical Computer Science, 104: 109–128.
Suppes, P., et al., 1981, “Part I: Interactive Theorem Proving in CAI Courses”, University-Level Computer-Assisted Instruction at Stanford: 1968–1980, P. Suppes (ed.), Institute for the Mathematical Study of the Social Sciences, Stanford University.
Sutcliffe, G. and C. Suttner, 1998, “The TPTP Problem Library – CNF Release v1.2.1”, Journal of Automated Reasoning, 21 (2): 177–203.
Szabo, M. E. (ed.), 1969, The Collected Papers of Gerhard Gentzen, Amsterdam: North-Holland.
Trybulec, A., 1978, “The Mizar Logic Information Language”, Bulletin of the Association for Literary and Linguistic Computing, 6 (2): 136–140.
Trybulec, A. and H. Blair, 1985, “Computer Assisted Reasoning with Mizar”, Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI-85: Volume 1), Los Angeles, pp. 26–28.
Turing, A., 1936, “On computable numbers, with an application to the Entscheidungsproblem”, Proceedings of the London Mathematical Society, 42 (2): 230–265.
Urban, J., 2007, “MaLARea: A Metasystem for Automated Reasoning in Large Theories”, Proceedings of the CADE-21 Workshop on Empirically Successful Automated Reasoning in Large Theories, J. Urban, G. Sutcliffe and S. Schulz (eds.), pp. 45–58.
Urban, J., K. Hoder and A. Voronkov, 2010, “Evaluation of Automated Theorem Proving on the Mizar Mathematical Library”, Mathematical Software – ICMS 2010: Proceedings of the Third International Congress on Mathematical Software, Kobe, Japan (Lecture Notes in Computer Science: Volume 6327), pp. 155–166.
Urban, J. and J. Vyskocil, 2012, “Theorem Proving in Large Formal Mathematics as an Emerging AI Field”, arXiv:1209.3914 [cs.AI], Report No. DPA-12271, Cornell University.
van Benthem Jutting, L. S., 1977, Checking Landau's “Grundlagen” in the Automath System, PhD Thesis, Eindhoven University of Technology. (Published as Mathematical Centre Tracts nr. 83, Amsterdam: Mathematisch Centrum, 1979.)
Voronkov, A., 1995, “The Anatomy of Vampire: Implementing Bottom-Up Procedures with Code Trees”, Journal of Automated Reasoning, 15 (2): 237–265.
Wallen, L. A., 1990, Automated Deduction in Nonclassical Logics, Cambridge, MA: MIT Press.
Wang, H., 1960, “Proving Theorems by Pattern Recognition – I”, in Automation of Reasoning (Volume 1), J. Siekmann and G. Wrightson (eds.), Berlin: Springer-Verlag, 1983, pp. 229–243.
Wang, H., 1960, “Toward Mechanical Mathematics”, in Automation of Reasoning (Volume 1), J. Siekmann and G. Wrightson (eds.), Berlin: Springer-Verlag, 1983, pp. 244–264.
Wegner, B., 2011, “Completeness of reference databases, old-fashioned or not?”, Newsletter of the European Mathematical Society, 80: 50–52.
Wiedijk, F., 2006, The Seventeen Provers of the World (Lecture Notes in Artificial Intelligence: Volume 3600), F. Wiedijk (ed.), New York: Springer-Verlag.
Wiedijk, F., 2007, “The QED Manifesto Revisited”, Studies in Logic, Grammar and Rhetoric, 10 (23): 121–133.
Wos, L. (ed.), 2001, Journal of Automated Reasoning (Special Issue: Advances in Logic Through Automated Reasoning), 27 (2).
Wos, L., D. Carson and G. R. Robinson, 1965, “Efficiency and Completeness of the Set of Support Strategy in Theorem Proving”, Journal of the Association of Computing Machinery, 12: 698–709.
Wos, L., R. Overbeek, E. Lusk and J. Boyle, 1984, Automated Reasoning: Introduction and Applications, Englewood Cliffs, NJ: Prentice-Hall.
Wos, L., D. Ulrich and B. Fitelson, 2002, “Vanquishing the XCB Question; The Methodological Discovery of the Last Shortest Single Axiom for the Equivalential Calculus”, Journal of Automated Reasoning, 29 (2): 107–124.
Wos, L., S. Winker, R. Veroff, B. Smith and L. Henschen, 1983, “Questions Concerning Possible Shortest Single Axiom for the Equivalential Calculus: An Application of Automated Theorem Proving to Infinite Domains”, Notre Dame Journal of Formal Logic, 24: 205–223.
Yoo, J., E. Jee and S. Cha, 2009, “Formal Modeling and Verification of Safety-Critical Software”, IEEE Software, 26 (3): 42–49.
Zhang, L. and S. Malik, 2002, “The Quest for Efficient Boolean Satisfiability Solvers”, CADE-18: Proceedings of the 18th International Conference on Automated Deduction (Lecture Notes in Artificial Intelligence: Volume 2392), A. Voronkov (ed.), Berlin: Springer-Verlag, pp. 295–313.

Academic Tools

How to cite this entry.
Preview the PDF version of this entry at the Friends of the SEP Society.
Look up this entry topic at the Indiana Philosophy Ontology Project (InPhO).
Enhanced bibliography for this entry at PhilPapers, with links to its database.

Other Internet Resources

Publications

Hales, T. C., 2005b, The Flyspeck Project Fact Sheet, http://code.google.com/p/flyspeck/wiki/FlyspeckFactSheet
Hales, T. C., 2014, Flyspeck, http://code.google.com/p/flyspeck/wiki/AnnouncingCompletion
Martin, U. and A. Pease, 2013, “What does mathoverflow tell us about the production of mathematics?”, Computing Research Repository, at arxiv.org.
Sutcliffe, G., 2014, Proceedings of the 7th IJCAR Automated Theorem Proving System Competition (CASC-J7), available online, pp. 1–36.
Web Sites

ACL2: A Computational Logic
Alfa/Agda
The Coq Proof Assistant
CVC4
E Theorem Prover
EQP Equational Prover
HOL Automated Reasoning Group
IMPS: An Interactive Mathematical Proof System
iProver
Isabelle/Isar
leanCoP
LEO-II
Metamath
MetiTarski
The Minlog System
The Mizar Project
The Nuprl Project
Paradox
Prover 9 and Mace 4
PVS Specification and Verification System
Satallax
SPASS
TPS Theorem Proving System
Vampire
Waldmeister
The CADE ATP System Competition
The TPTP Problem Library for Automated Theorem Proving
The QED Manifesto
CADE: The Conference on Automated Deduction
IJCAR: The International Joint Conference on Automated Reasoning

Related Entries

artificial intelligence: logic and | logic: classical | logic: modal | reasoning: defeasible

Copyright © 2014 by the author
Frederic Portoraro