Automated theorem proving has two goals: (1) to prove theorems and (2) to do it automatically. Fully automated theorem provers for first-order logic have been developed, starting in the 1960’s, but as theorems get more complicated, the time that theorem provers spend tends to grow exponentially. As a result, no really interesting theorems of mathematics can be proved this way- the human life span is not long enough. Therefore a major problem is to prove interesting theorems and the solution is to give the theorem provers heuristics, rules of thumb for knowledge and wisdom.

Some heuristics are fairly general, for example, in a proof that is about t break into several cases do as much as possible that will be of broad applicability before the division into cases occurs. But many heuristics are area-specific; for instance, heuristics appropriate for plane geometry will probably not be appropriate for group theory. The development of good heuristics is a major area of research and requires much experience and insight. Brief History

In 1930 Kurt Godel and Jaques Herbrand proved the first version of what is now the completeness of predicate calculus. Godel and Herbrand both demonstrated that the proof machinery of the predicate calculus can provide a formal proof for every logically true proposition, while also giving a constructive method for finding the proof, given the proposition. In 1936 Alonzo Church and Alain Turing independently discovered a fundamental negative property of the predicate calculus. Until then, there had been an intense search for a positive solution to what was called the decision problem – which was to create an algorithm for the predicate calculus which would correctly determine, for any formal sentence B and any set A of formal sentences, whether or not B is a logical consequence of A.

Church and Turing found that despite the existence of the proof procedure, which correctly recognizes (by constructing the proof of B from A) all cases where B is in fact a logical consequence of A, there is not and cannot be an algorithm which can similarly correctly recognize all cases in which B is not a logical consequence of A. It means that it is pointless to try to program a computer to answer ‘yes’ or ‘no’ correctly to every question of the form ‘is this a logically true sentence ? ‘” Church and Turing proved that it was impossible to find a general decision to verify the inconsistency of a formula. However, there are proof procedures that will verify that a formula is valid if the formula is in fact valid. For invalid formulas, these general procedures will never terminate . The most significant to automated theorem proving occurred in the 1930’s and 1960’s.

In 1930, Herbrand proved an important theorem that changed the idea of a mechanical theorem prover into a more practicable one. He developed an algorithm to find an interpretation that can falsify a given formula. Since, a valid formula is one that is true under all interpretations, this interpretation cannot exist if the formula is indeed valid, and his algorithm will halt after trying out a finite amount of interpretations. However, it was impossible to implement the refutation procedure which Herbrand’s theorem led to because going through the procedure by hand would take forever.

In 1960, Paul Gilmore implemented Herbrand’s procedure for the first time on a computer. Later on, however, Martin Davis and Putnam followed with a more efficient procedure. However, the most important breakthrough, during the 60’s, in the automated theorem proving world occurred when Robinson reviewed a paper by Davis and Putnam, proposing a predicate-calculus proof procedure that seemed better than Gilmore’s, but they still had onto turned it into a practical computer program. Robinson then used programming skills to implement Davis and Putnam’s procedure on the Argonne IBM 704.

Robinson quickly found that their procedure remained very inefficient. However, while implementing a different procedure also suggested in 1960 by Dag Prawitz, Robinson came to see how the two sets of ideas could be combined into a new, far more efficient, automated proof procedure for first-order predicate logic: ”resolution. ” Herbrand’s Theorem Herbrand’s theorem dealt with clauses and literals. A set of clauses (for example P(x) v R(x), Q(x) v R(x)}) is unsatisfiable if and only if it is false under all interpretations over all domains (by definition). However, it is impossible to consider all the interpretations over all the domains.

This is the importance of Herbrand’s contribution. He developed a special domain H such that a set of clauses S is unsatisfiable if and only if S is false under all the interpretations over this domain. This Herbrand universe of S is defined as follows Let Ho be the set of constants appearing in S. If no constant appears in S, then H0 is to consist of a single constant, say Ho = a}. For i = 0, 1, 2,… , let Hi+1 be the union of Hi and the set of all terms of the form f^n(t1, …. tn) for all n-place functions f^n occuring in S, where tj, j= 1, … , n, are members of the set Hi.

Then each Hi is called the i-level constant set of S, and H is called the Herbrand universe of S. Example: Let S = P(a), P(x) v P(f(x))}. Then Ho = a} H1 = a, f(a)} H2 = a, f(a), f(f(a))} ….. H = a, f(a), f(f(a)), f(f(f(a))), … } Herbrand’s Theorem: A set S of clauses is unsatisfiable if and only if there is a finite unsatisfiable set S’ of ground instances (A ground instance of a clause C of a set S of clauses obtained by replacing variables in C by members of the Herbrand universe of S) of clauses of S. The above information basically contains the essence of Herbrand’s theorem.

I will not discuss Herbrands theorem in detail. Herbrand’s contribution opened a world of automated theorem proving that was not possible before. Unlike before where there is an infinite of different domains to test, now the domain is confined by Herbrand’s universe. This breakthrough is the basis of Resolution, which made automated theorem proving more effective. Resolution Even though Herbrand’s procedure was a major breakthrough, it has a major drawback. It requires generation of sets of ground instances of clauses, which grows exponentially. It will literally take forever to prove interesting theorems.

Resolution principle is the idea that will decrease the number of generation of sets of ground instances required by Herbrand’s procedure. The resolution principle is easier to understand in propositional logic than in first-order logic. The basic concept of propositional logic is to combine the literals that are complementary to each other so that they cancel out (e. g. P and P are complementary). Below is a quick example that demonstrates the concept. Example: Consider the set S 1. Q v P 2. P 3. Q From (1) and (2), we obtain a resolvent 4. Q From (4) and (3), we obtain a resolvent 5. } Definition of resolution principle:

For any two clauses C and D, if there is a literal L1 in C that is complementary to a literal in L2 in D, then delete L1 and L2 from C and D, respectively, and construct the disjunction of the remaining clauses. The constructed clause is a resolvent of C and D. The resolution principle is a very powerful inference rule. It is complete, that is: as set S of clauses is unsatisfiable if and only if there is a resolution deduction of the empty clause } from S. The resolution principle for first-order logic requires understanding of two principles: substitution and unification for most general applications.

However, a first-order resolution example will be used here that does not require those two technical concepts. Substitution is required for first-order resolution because clauses now contain variables. Look at the following clauses: C1: P(x) v Q(x) C2: P(f(x)) v R(x) There is no literal in C1 that is complementary in C2 so that they can cancel out. Even though the predicates P and P look like they should cancel. Substitution is a procedure that manipulates the clauses so that they do cancel out in the end. It might remind readers to some of the ideas of algebra and calculus they might have studied.

Unification is also used to combine functions in clauses are not exactly complementary. However, this report will not go in depth into these two ideas. Further information can be found in Symbolic Logic and Mechanical Theorem Proving. Most of the application of the resolution principle in first-order logic will require the use of substitution and unification. However, in order to give a general concept instead of an in-depth math lesson, we will not show how those two concepts work in the resolution example. The following example is taken from Chang and Lee’s Symbolic Logic and Mechanical Theorem Proving.

I think it is a great example to show off the concept of resolution without concentrating on the math. Example Show that alternate interior angles formed by a diagonal of a trapezoid are equal. Let T(x, y, u, v) mean that xyuv is a trapezoid with upper-left vertex x, upper-right vertex y, lower-right vertex u, and lower-left vertex v. Let P(x, y, u, v) mean that the line segment xy is parallel to the line segment uv. Let E(x,y,z,u,v,w) mean that the angle xyz is equal to the angle uvw. Then we have the following axioms:

During the 1970’s and 1980s the Argonne National Laboratory, where Robinson had the introduction to automated theorem proving, did much research on the resolution principle . At Argonne, Larry Wos and George Robinson continued theoretical work on proof procedures, for example combining unification (see appendix) with an inference rule for equality, to yield the powerful rules of “demodulation” and “paramodulation. ” To these theoretical advances were added continuous practical improvements (focusing on matters such as indexing, and the storage and retrieval of clauses), notably by Ross Overbeek and Ewing Lusk.

The Argonne systems – AURA (AUtomated Reasoning Assistant); now replaced by OTTER, developed by William McCune – are quite different from the simple goal-directed inference engine at the heart of Prolog. Fast and sophisticated, AURA and OTTER usually proceed in a “best-first” search, focusing on relevant parts of the theory, special hypotheses and the goal,lxxxv rather than simply searching from the goal backwards in a “depth-first” manner (as in the Prolog prover). For several years the Argonne systems attracted relatively little attention.

At the end of the 1970s, and in the early 1980s, however, they began successfully to be applied, not just to theorems whose proofs were known, but to the proof of open conjectures in several specialized fields of mathematics (conjectures that humans had formed but which they had been unable themselves to prove): see the articles by Winker and Wos in 1978,lxxxvi by Winker, Wos and Lusk in 1981,lxxxvii by Winker in 1982,lxxxviii by Wos in 1982,lxxxix and Wos and colleagues in 1983. xc