
Some Pumping Lemmas

Conrad T. Miller Math 5358 April 30, 2007

Abstract In the theory of computation, a regular language is any set of strings accepted by a deterministic finite automaton (DFA). Similarly, a context-free language may be defined as any set of strings accepted by a pushdown automaton (PDA). In each case there is a result that gives a necessary condition for a set of strings to be regular or context-free, respectively. Each of these results is called a pumping lemma.

Preliminary Definitions

A decision problem is a function whose codomain contains only two possible outputs: 0 or 1. That is, $f : X \to \{0, 1\}$ is a decision problem. We think of a decision problem as the problem of deciding which elements of $X$ have a particular property. In this sense 1 corresponds to the answer yes and 0 to the answer no. To specify a decision problem, one must give the domain of interest $X$ and the set $f^{-1}(\{1\}) \subseteq X$ of elements having the desired property.

Throughout this paper we limit ourselves to abstract computing machines that calculate a decision function $f$. Although such a perspective might seem narrow, decision problems represent a wide variety of problems in computer science. For instance, determining whether a graph in the set of all graphs of size $n$ is connected is a decision problem.

Another important abstraction is the concept of a string. Suppose $\Sigma$ is a finite set of symbols or characters, called an alphabet. A string over the alphabet $\Sigma$ is any finite list of elements of $\Sigma$. This list may be empty, in which case we denote the empty string by $\varepsilon$. $\Sigma^*$ is defined as the set of all strings over $\Sigma$. Note that $\varepsilon \in \Sigma^*$, and in particular $\emptyset^* = \{\varepsilon\}$, where $\emptyset$ denotes the empty set. If $x \in \Sigma^*$, then the length of $x$, written $|x|$, is the number of symbols from $\Sigma$ in $x$, counted with repetition. For consistency, $|\varepsilon| = 0$.

Example 1. Let $\Sigma = \{a, b\}$. Then $\Sigma^* = \{\varepsilon, a, b, aa, ab, ba, bb, aaa, aab, \ldots\}$, and $|\varepsilon| = 0$, $|a| = |b| = 1$, $|aa| = |ab| = |ba| = |bb| = 2$, etc.

Concatenation is a binary operation that forms a new string $x \cdot y = xy$ from two previously defined strings $x$ and $y$. The new string $xy$ is read by reading the string $x$ first and then the string $y$. In general $xy \neq yx$. Suppose $x$, $y$, $z$ are strings. Then the operation of concatenation obeys the axioms

1. associativity: $(xy)z = x(yz)$,
2. $\varepsilon$ is a left and right identity: $\varepsilon x = x \varepsilon = x$, and
3. $|xy| = |x| + |y|$.

The associativity of the concatenation operator allows us to write $xyz = (xy)z = x(yz)$ unambiguously. For the more abstractly minded, we note that $\Sigma^*$ under concatenation is a monoid (see [Koz97]).
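The definitions above can be tried out directly. The following is a minimal sketch, assuming Python strings stand in for strings over $\Sigma$ and the Python empty string stands in for $\varepsilon$; the helper name `strings_up_to` is hypothetical. It lists the short elements of $\Sigma^*$ from Example 1 and checks the three concatenation axioms on them.

```python
from itertools import product

EPSILON = ""  # assumption: Python's empty string plays the role of the empty string


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


sigma = ["a", "b"]
words = list(strings_up_to(sigma, 2))
# words == ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# Check the three concatenation axioms on every triple of short strings.
for x, y, z in product(words, repeat=3):
    assert (x + y) + z == x + (y + z)         # associativity
    assert EPSILON + x == x + EPSILON == x    # the empty string is an identity
    assert len(x + y) == len(x) + len(y)      # length is additive

# Concatenation is not commutative in general.
assert "ab" + "ba" != "ba" + "ab"
```

Such a check only covers finitely many strings, so it illustrates the axioms rather than proving them.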

Deterministic Finite Automata

Suppose we wish to model some physical system. This could be any physical system: a pendulum, a mixture of chemicals, a computing machine, etc. To describe the progression of that system through time we require a description of the important aspects of the system's current configuration, and we require a method for determining its future configurations from its present one. In the example of the pendulum, two initial conditions (the current angle of the pendulum and its time rate of change) together with a differential equation from which we may calculate the angle of the pendulum at any future time provide a complete description of the system's time evolution.

Similarly, a complete description of a computing machine requires both a description of its present condition and some method of determining its future configurations. To model a computing system with a finite amount of memory, one only needs a finite number of states. Each state represents all relevant information required to determine the computer's time evolution from that point onwards. Furthermore, since we are only concerned with decision problems, consider the decision problem defined by $f : \Sigma^* \to \{0, 1\}$, where $\Sigma^*$ is a set of finite length strings. Our computing machine will take as input some $s \in \Sigma^*$ and decide if $s \in f^{-1}(\{1\})$; i.e. our computing machine will compute $f$.

2.1

Definition of a DFA

The preceding is an intuitive description of a deterministic finite automaton (DFA). In short, a DFA is a machine with a finite number of states and a finite number of transitions between states. This machine has a predetermined state in which it starts, a stream of input in the form of a string, and a set of final states which indicate that the machine has accepted its input.

Definition 2.1 (Deterministic Finite Automaton). Formally a deterministic finite automaton (DFA) is a structure $M = (Q, \Sigma, \delta, q_0, F)$, where

1. $Q$ is a finite set of states,
2. $\Sigma$ is a finite alphabet,
3. $\delta : Q \times \Sigma \to Q$ is the transition function,
4. $q_0 \in Q$ is the start state, and
5. $F \subseteq Q$ is a set of final accepting states.

The input to $M$ is in the form of a string $s \in \Sigma^*$. The set $Q = \{q_0, q_1, \ldots, q_n\}$ is the finite set of all possible states. The first state in our list, $q_0$, is singled out as the start state. This is the state in which our machine starts before any input. $M$ reads the input string $s$ in order, one character at a time, and the transition function $\delta$ determines $M$'s new state from this character along with $M$'s current state. Since $\delta$ is a function from the finite set $Q \times \Sigma$ to another finite set $Q$, one could define $\delta$ by listing all possible inputs and outputs. After reading all characters of $s$, if the final state of $M$ is a state $r \in F$, we say that the DFA accepts the string $s$. That is, if $M$ calculates the decision function $f : \Sigma^* \to \{0, 1\}$, and the final state of $M$ is $r \in F$, then $f(s) = 1$.

Example 2. As an example of a three state automaton, let $M = (Q, \Sigma, \delta, q_0, F)$ with alphabet $\Sigma = \{a, b\}$, states $Q = \{q_0, q_1, q_2\}$, start state $q_0$, and $F = \{q_2\}$ as the set of accepting states. Lastly, for $\delta : Q \times \Sigma \to Q$ let

$\delta(q_0, a) = q_1$, $\delta(q_0, b) = q_0$,
$\delta(q_1, a) = q_2$, $\delta(q_1, b) = q_0$,
$\delta(q_2, a) = q_2$, $\delta(q_2, b) = q_2$.

All parts of the DFA are specified; however, there exist more compact methods for defining $M$. Either the transition diagram in figure 1 or the table below will suffice.

Figure 1: The Transition Diagram for M

            a    b
    → 0     1    0
      1     2    0
      2F    2    2

In the table the input characters are listed across the top row, the current state is listed in the leftmost column, and the next state is read off just as one would read a multiplication table. For further notational convenience, in both cases each state is numbered, the start state is indicated by → and any final states by F (or a double circle in the case of figure 1) [Koz97]. Before we progress any further, the following definition will prove useful.

Definition 2.2 (Regular Language). A regular language over $\Sigma$ is any set of strings in $\Sigma^*$ that is accepted by a DFA. If $M$ is a DFA we denote the language accepted by $M$ as $L(M)$.

Later the reader will see that not every subset of $\Sigma^*$ is a regular language, but for now we give an example of a language that is regular.

2.1.1

An Example of a Regular Language

Proposition 2.0.1. The set of strings over $\Sigma = \{a, b\}$ given by $A = \{xaay \mid x, y \in \Sigma^*\}$ is a regular language.

Proof To show that a language is regular using definition 2.2, one needs to construct a DFA that accepts a string $s$ if and only if $s \in A$. Define $M$ as in example 2. Our claim is that $L(M) = A$.

Suppose that $s \in A$. Then $s = xaay$ for some $x, y \in \Sigma^*$. After reading $x$, $M$ must be in one of the states $q_0$, $q_1$, or $q_2$. Once $M$ reads the substring $aa$, $M$ will be in state $q_2$ regardless of which of these states it was in; the reader may check each of the three cases separately. Finally, after reading $y$, $M$ will still be in state $q_2$ since $\delta(q_2, a) = \delta(q_2, b) = q_2$. State $q_2$ is an accepting state. Therefore $A \subseteq L(M)$.

Conversely, suppose $s \in L(M)$. Then $M$ is in the final state $q_2$ after reading $s$. Once in the state $q_2$, $M$ remains in that state, so $s$ only needs to place $M$ in state $q_2$ once. If at any point $M$ is not in state $q_2$ and $M$ reads the character $b$, $M$ returns to the start state $q_0$. The only way to enter $q_2$ from another state is $\delta(q_1, a) = q_2$, and the only way to enter $q_1$ is $\delta(q_0, a) = q_1$. So $M$ must read two consecutive $a$'s, that is the substring $aa$, to get to $q_2$ and accept $s$. Hence $L(M) \subseteq A$. Therefore $L(M) = A$ and $A$ is a regular language.
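The claim $L(M) = A$ can also be sanity-checked by simulating $M$. The sketch below assumes the transition table of example 2 with states written as the integers 0, 1, 2; the names `DELTA`, `accepts`, `in_A` and `agree_up_to` are hypothetical. It compares the machine's answer with a direct substring test on every string up to a given length, which is a finite check and not a substitute for the proof.

```python
from itertools import product

# Transition table of Example 2: DELTA[state][symbol] -> next state.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 0},
    2: {"a": 2, "b": 2},
}
START, ACCEPTING = 0, {2}


def accepts(s):
    """Run the DFA on s and report whether it ends in an accepting state."""
    state = START
    for ch in s:
        state = DELTA[state][ch]
    return state in ACCEPTING


def in_A(s):
    """Membership in A: s has the form x aa y, i.e. it contains 'aa'."""
    return "aa" in s


def agree_up_to(max_len):
    """Check accepts(s) == in_A(s) for every string over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if accepts(s) != in_A(s):
                return False
    return True
```

For instance, `accepts("baab")` holds while `accepts("abab")` does not, and `agree_up_to(10)` checks all 2047 strings of length at most 10.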

2.2

A Pumping Lemma for Regular Languages

At this juncture we remind the reader that a DFA can at best only have a finite memory. Although the DFA's current state acts as a type of memory, a DFA is limited by the fact that the set of possible states $Q$ is finite. In fact this memory has the further limitation that the process which placed the DFA in its current state is not recallable; any of the possible sequences of states that reach the current state might have occurred. These limitations restrict the types of sets of strings that are regular languages. The following theorem states the restriction precisely.

Theorem 2.1 (The Pumping Lemma). Let $A$ be a regular set. Then there is a $p \in \mathbb{N}$ (the pumping length) such that for any strings $x$, $y$, and $z$ with $xyz \in A$ and $|y| \ge p$, there exist strings $u$, $v$, and $w$ such that $y = uvw$, $v \neq \varepsilon$, and $xuv^i wz \in A$ for all $i \ge 0$. (This version of the theorem is from [Koz97]; most often a slightly weaker version is given.)

Proof Let $M = (Q, \Sigma, \delta, q_0, F)$ be a DFA which recognizes the regular language $A$, and let $p = |Q|$. Suppose that $xyz \in A$ with $y = y_1 y_2 \cdots y_n$, where $|y| = n \ge p$. Let $r_1$ be the state of $M$ after it has read $x$, and let $r_1, r_2, \ldots, r_{n+1}$ be the states that $M$ enters as it processes $y$. That is, $r_{i+1} = \delta(r_i, y_i)$ for every $i = 1, \ldots, n$. This list of states has length $n + 1 \ge p + 1$. By the pigeonhole principle, among the first $p + 1$ states in the list at least two must be the same state, since there are only $p$ states. Call these $r_j$ and $r_k$, where $j < k$; thus $r_j = r_k$ and $k \le p + 1$. Denote $u = y_1 \cdots y_{j-1}$, $v = y_j \cdots y_{k-1}$, and $w = y_k \cdots y_n$. Then the string $u$ takes $M$ from $r_1$ to $r_j$, the string $v$ takes $M$ from $r_j$ back to $r_j$, and the string $w$ takes $M$ from $r_j$ to $r_{n+1}$. Since $v$ does not change which state $M$ is in, we may repeat $v$ as many times as we like, or omit it, and $M$ still reaches $r_{n+1}$ after reading $xuv^i w$; reading $z$ then leads to the same accepting state as for $xyz$. Hence $M$ accepts $xuv^i wz$ for every $i \ge 0$. Furthermore $j < k \le p + 1$, so $|u| = j - 1 < p$ and $|v| = k - j > 0$ (that is, $v \neq \varepsilon$). Hence the result. (Proof adapted from [Sip97].)

The pumping lemma is of interest mainly when $A$ is infinite. If $A$ is a finite set of strings it is always regular, and the pumping lemma holds vacuously, since for $p \in \mathbb{N}$ such that $p > \max\{|s| : s \in A\}$ there are no strings in $A$ of length at least $p$ [Lin01].
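The pigeonhole step of the proof is constructive, and a short sketch may make it concrete. The code below assumes the three state DFA of example 2, again with states written as integers; `run` and `split_for_pumping` are hypothetical helper names. It records the position at which each state is first seen while reading $y$ and cuts $y$ at the first repeat.

```python
# Transition table of Example 2: DELTA[state][symbol] -> next state.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 0},
    2: {"a": 2, "b": 2},
}
START, ACCEPTING = 0, {2}


def run(delta, state, s):
    """Return the state reached by reading s starting from `state`."""
    for ch in s:
        state = delta[state][ch]
    return state


def split_for_pumping(delta, state, y):
    """Split y into (u, v, w) with v nonempty such that reading v from the
    state reached after u leads back to that same state.

    `state` is the state the DFA is in just before reading y. Returns None
    if no state repeats, which can only happen when len(y) is smaller than
    the number of states.
    """
    seen = {state: 0}  # state -> number of symbols of y read when first seen
    for pos, ch in enumerate(y, start=1):
        state = delta[state][ch]
        if state in seen:  # pigeonhole: this state occurred before
            j = seen[state]
            return y[:j], y[j:pos], y[pos:]
        seen[state] = pos
    return None


# Usage: x y z = "a" + "bab" + "aa" is in A, and |y| = 3 = number of states.
x, y, z = "a", "bab", "aa"
u, v, w = split_for_pumping(DELTA, run(DELTA, START, x), y)
# (u, v, w) == ('', 'ba', 'b')
pumped_ok = all(run(DELTA, START, x + u + v * i + w + z) in ACCEPTING
                for i in range(6))
```

Here `pumped_ok` only tests $i = 0, \ldots, 5$; the proof above is what guarantees acceptance for every $i \ge 0$.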

2.2.1

An Example of a Nonregular Language

The most useful form of the pumping lemma turns out to be the contrapositive restatement. So consider

Corollary 2.1.1. Let $A$ be a set of strings, and suppose that for every $p \in \mathbb{N}$ there are strings $x$, $y$, and $z$ with $xyz \in A$ and $|y| \ge p$ such that for every $u$, $v$, and $w$ with $y = uvw$ and $v \neq \varepsilon$, there exists an $i \ge 0$ such that $xuv^i wz \notin A$. Then $A$ is not regular [Koz97].

The contrapositive form of the pumping lemma allows us to prove sets nonregular.

Proposition 2.1.1. The set of all strings of the form $A = \{a^n b^n \mid n \ge 0\}$ is an example of a nonregular set.

Proof We will show the result using corollary 2.1.1. Pick $p \ge 0$ arbitrarily. Choose $x = a^p$, $y = b^p$, and $z = \varepsilon$. With these choices $xyz = a^p b^p \in A$ and $|y| = p \ge p$, as required by corollary 2.1.1. Now arbitrarily pick $u$, $v \neq \varepsilon$, and $w$ of lengths $j$, $k > 0$, and $l$ respectively such that $y = uvw$. Then $p = |y| = |u| + |v| + |w| = j + k + l$. Hence for $i = 2$,

$xuv^2 wz = a^p b^j b^k b^k b^l = a^p b^{j+2k+l} = a^p b^{p+k}$, where $k > 0$.

Hence $xuv^2 wz \notin A$. Therefore $A$ is not regular.
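The quantifier structure of the corollary (for every $p$, some $x, y, z$, every split, some $i$) is easy to misread, so the following sketch spells it out for $A = \{a^n b^n \mid n \ge 0\}$ and the choice $x = a^p$, $y = b^p$, $z = \varepsilon$ used in the proof. The function names are hypothetical, and the check covers only the values of $p$ and $i$ it is given, so it illustrates the argument without replacing it.

```python
def in_A(s):
    """Membership in A = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def every_split_fails(p, max_i=3):
    """For x = a^p, y = b^p, z = '', check that every split y = u v w with
    v nonempty has some i in 0..max_i with x u v^i w z outside A."""
    x, y, z = "a" * p, "b" * p, ""
    for j in range(p):                    # |u| = j
        for k in range(1, p - j + 1):     # |v| = k >= 1
            u, v, w = y[:j], y[j:j + k], y[j + k:]
            if all(in_A(x + u + v * i + w + z) for i in range(max_i + 1)):
                return False              # this split pumps inside A
    return True
```

Calling `every_split_fails(p)` for small `p` returns `True` each time: already $i = 0$ or $i = 2$ changes the number of $b$'s without changing the number of $a$'s.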

Nondeterministic Pushdown Automata

Now that we have seen the limitations of DFAs, the reader might wonder if we could do better. Maybe one could add some form of memory to expand the possible sets of acceptable input strings? This is exactly what a pushdown automaton (PDA) does. But first we need to loosen our definition of a DFA. What if our finite automata were nondeterministic? Informally, a nondeterministic finite automaton is a machine in which the next state need not be uniquely determined by the current state and input character.

Definition 3.1 (Nondeterministic Finite Automaton). Formally a nondeterministic finite automaton (NDFA) is a structure $N = (Q, \Sigma, \Delta, S, F)$, where

1. $Q$ is a finite set of states,
2. $\Sigma$ is a finite input alphabet,
3. $\Delta \subseteq Q \times \Sigma \times Q$ is the transition relation,
4. $S \subseteq Q$ is the set of possible start states, and
5. $F \subseteq Q$ is a set of final accepting states.

The only differences between this definition and definition 2.1 are that now $N$ has a set of possible start states $S$ instead of the single start state $q_0$, and a transition relation $\Delta$ instead of a transition function $\delta$. The way one interprets these differences is that $N$ may start in any one of the alternative start states given by $S$, and for each input character in $\Sigma$, $N$ may progress from its current state to one of many choices for the next state. So the time evolution of $N$ may not be unique given some input $s \in \Sigma^*$. Specifically, the final state of $N$ is not predetermined, and we require a new method of deciding which strings $N$ accepts. A string $s$ is accepted by $N$ if, after reading $s$, it is possible that $N$ is in an accepting state $r \in F$. Only one of all the possible sequences of choices, from the initialization of $N$ to the final transition, must lead to an accepting state for $N$ to accept $s$.

The literature contains several variants of definition 3.1. In many cases, instead of a transition relation the author defines a transition function $\delta : Q \times \Sigma \to 2^Q$ (see [Sip97] and [Lin01]). These definitions are equivalent. Surprisingly, it can be shown that nondeterminism does not enable the machine $N$ to accept any new sets [Koz97]; i.e. if $L(N)$ is the language accepted by $N$, then $L(N)$ is a regular language and is accepted by some DFA. Hence NDFAs suffer from the same limitations as DFAs.

We wish to further expand our new definition by adding some form of memory. The addition of memory will allow our abstract machine to overcome some of the limitations seen thus far. Conceptually a PDA is an NDFA with the addition of an infinite depth stack. A stack is a type of memory that allows access to only a single item at a time. In particular, a stack gives access to the last item placed on the stack, and once this item is removed the PDA has access to the item placed on the stack before it.
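The acceptance rule for an NDFA ("it is possible that $N$ is in an accepting state") can be simulated by tracking the set of all states $N$ could be in after each character. The sketch below uses a hypothetical three state NDFA, chosen only for illustration, that guesses where the substring $aa$ begins; it is not a machine taken from the references. Tracking sets of states in this way is also the idea behind the equivalence of NDFAs and DFAs mentioned above.

```python
# Hypothetical NDFA over {a, b}: triples (state, symbol, next_state).
DELTA = {
    (0, "a", 0), (0, "b", 0),   # stay in 0 and keep waiting
    (0, "a", 1),                # or guess that this 'a' starts the 'aa'
    (1, "a", 2),                # the guess was right
    (2, "a", 2), (2, "b", 2),   # once in 2, stay there
}
STARTS, ACCEPTING = {0}, {2}


def nfa_accepts(s):
    """Accept s if at least one sequence of choices ends in an accepting state."""
    current = set(STARTS)       # every state the machine might be in
    for ch in s:
        current = {r for (q, c, r) in DELTA if q in current and c == ch}
    return bool(current & ACCEPTING)
```

With this relation, `nfa_accepts("baab")` holds and `nfa_accepts("abab")` does not: from state 1 there is no transition on $b$, so a wrong guess simply dies out.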

3.1

Definition of a PDA

Definition 3.2 (Pushdown Automaton). Formally a pushdown automaton (PDA) is a structure $M = (Q, \Sigma, \Gamma, \delta, q_0, \bot, F)$, where

1. $Q$ is a finite set of states.
2. $\Sigma$ is a finite input alphabet.
3. $\Gamma$ is the finite stack alphabet.
4. $\delta \subseteq (Q \times (\Sigma \cup \{\varepsilon\}) \times \Gamma) \times Q \times \Gamma$ is the transition relation.
5. $q_0 \in Q$ is the start state.
6. $\bot \in \Gamma$ is the initial stack symbol.
7. $F \subseteq Q$ is a set of final accepting states.

Recall from the discussion of DFAs that the input to $M$ is in the form of a string $s \in \Sigma^*$. $Q$ is the finite set of all possible states, $q_0$ is the start state, and $F$ is the set of accepting states. If, after $M$ is done processing $s \in \Sigma^*$, it is possible that $M$ is in a state $r \in F$, then $f(s) = 1$, where $f$ is the decision function calculated by $M$.

The only feature that is essentially new is the addition of the stack structure. $\Gamma$ is the alphabet used by the stack. Only one character from $\Gamma$ is placed on the stack during any one transition. The bottom of the stack is initially represented by $\bot$, but the symbol $\bot$ may be placed on the stack or removed from the stack during the course of any transition, just like any other symbol in the set $\Gamma$. The addition of the stack also forces a redesign of the transition relation $\delta$: it now includes the possibility of reading a symbol from the stack, and how this might alter $M$'s transition from one state to the next.

Following the outline of the development of theorem 2.1, we can state the definition of a context-free language in a manner analogous to the definition of a regular language.

Definition 3.3 (Context-Free Languages). A context-free language over $\Sigma$ is any set of strings in $\Sigma^*$ that is accepted by a PDA. If $M$ is a PDA we denote the language accepted by $M$ as $L(M)$.

The reader should be aware that the above definition is not standard. Historically, scientists studying the theory of computation gave context-free languages a definition quite different from the one provided here. Most textbooks provide our definition as a theorem (see any of [Koz97], [Lin01] or [Sip97]).
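To see what the stack buys, consider strings of the form $a^n b^n$, which no DFA accepts. The sketch below is a hand-written deterministic recognizer in the spirit of a PDA, not a general simulator for definition 3.2. Several details are assumptions made for illustration: the marker `"$"` stands in for the initial stack symbol, the stack symbol `"A"` and the two state names are invented, and the machine accepts when the input is exhausted with only the marker left on the stack.

```python
BOTTOM = "$"  # assumption: stand-in for the initial stack symbol


def accepts_anbn(s):
    """Recognize a^n b^n (n >= 0) with a two state control and a stack."""
    stack = [BOTTOM]
    state = "push"                 # 'push' while reading a's, 'pop' while reading b's
    for ch in s:
        if state == "push" and ch == "a":
            stack.append("A")      # remember one more unmatched 'a'
        elif ch == "b" and stack[-1] == "A":
            stack.pop()            # match this 'b' against the latest 'a'
            state = "pop"
        else:
            return False           # no transition applies: reject
    return stack == [BOTTOM]       # every 'a' was matched by a 'b'
```

The control has only two states; the unbounded count of unmatched $a$'s lives entirely on the stack, which is exactly the information a finite set of states cannot hold.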

3.2

A Pumping Lemma for Context-Free Languages

Unlike DFAs, PDAs do not suffer from the limitation imposed by theorem 2.1. However, PDAs have their own limitation based on the structure of their memory. This limitation is the subject of our next theorem.

Theorem 3.1 (The Pumping Lemma). Let $A$ be a context-free language. Then there is a $p \in \mathbb{N}$ (the pumping length) such that every string $z \in A$ of length at least $p$ can be broken up into five substrings, $z = uvwxy$, such that $vx \neq \varepsilon$, $|vwx| \le p$, and $uv^i wx^i y \in A$ for all $i \ge 0$ ([Koz97]).

In this one case we will dispense with the proof, since the proof requires a characterization of context-free languages other than the one given above. The interested reader should consult any of the references supplied in the reference section. Finally, as with regular languages, we restate theorem 3.1 in contrapositive form, since this form is the most useful for showing that a language is not context-free.

Corollary 3.1.1. Let $A$ be a set of strings, and suppose that for every $p \in \mathbb{N}$ there is a $z \in A$ with length at least $p$ such that for every $u$, $v$, $w$, $x$, and $y$ with $z = uvwxy$, $vx \neq \varepsilon$, and $|vwx| \le p$, there exists an $i \ge 0$ such that $uv^i wx^i y \notin A$. Then $A$ is not context-free [Koz97].

With corollary 3.1.1 one can show that a set of strings is not a context-free language.
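As an illustration of how the contrapositive form is used, the sketch below checks its hypothesis by brute force for one language that is not discussed above and is introduced here only as an assumption for the example: $B = \{a^n b^n c^n \mid n \ge 0\}$, with the choice $z = a^p b^p c^p$. The function names are hypothetical. For a given $p$ it enumerates every split $z = uvwxy$ with $vx \neq \varepsilon$ and $|vwx| \le p$ and looks for an $i$ that pumps the string out of $B$.

```python
def in_B(s):
    """Membership in B = { a^n b^n c^n : n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def every_split_fails(p, max_i=2):
    """For z = a^p b^p c^p, check that every split z = u v w x y with
    vx nonempty and |vwx| <= p has some i in 0..max_i with
    u v^i w x^i y outside B."""
    z = "a" * p + "b" * p + "c" * p
    n = len(z)
    for start in range(n + 1):                            # |u| = start
        for end in range(start, min(start + p, n) + 1):   # vwx = z[start:end]
            for m1 in range(start, end + 1):              # v = z[start:m1]
                for m2 in range(m1, end + 1):             # w = z[m1:m2], x = z[m2:end]
                    u, v, w = z[:start], z[start:m1], z[m1:m2]
                    x, y = z[m2:end], z[end:]
                    if v + x == "":
                        continue                          # vx must be nonempty
                    if all(in_B(u + v * i + w + x * i + y)
                           for i in range(max_i + 1)):
                        return False                      # this split pumps inside B
    return True
```

Because $|vwx| \le p$, the window $vwx$ can touch at most two of the three letter blocks, so taking $i = 0$ already removes some letters of one kind while leaving another kind untouched. The function reports this for each small `p` it is run on; a proof for all $p$ would follow the same observation.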

References
[FB94] Robert W. Floyd and Richard Beigel, The language of machines, W. H. Freeman and Company, New York, NY, 1994.
[Koz97] Dexter C. Kozen, Automata and computability, Springer-Verlag, New York, NY, 1997.
[Lin01] Peter Linz, An introduction to formal languages and automata, third ed., Jones and Bartlett Publishers, Boston, MA, 2001.
[Sip97] Michael Sipser, Introduction to the theory of computation, PWS Publishing Company, Boston, MA, 1997.
