ToA - Lecture 03 04 - Language Preliminaries Regular Expressions
• Searching in Text:
• Searching for a given text pattern
• Searching for a part of text specified by a regular expression
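Both kinds of search can be sketched with Python's standard library. This is only an illustration: the sample text and the pattern `ab*a` are made up for the example and do not come from the lecture.

```python
import re

text = "xx abba aa aba"

# Searching for a given text pattern: a plain substring search.
position = text.find("abba")          # index of the first occurrence, or -1

# Searching for parts of text specified by a regular expression:
# an 'a', followed by any number of 'b's, followed by an 'a'.
matches = re.findall(r"ab*a", text)

print(position)
print(matches)
```

`str.find` handles the fixed-pattern case, while `re.findall` returns every non-overlapping part of the text that matches the regular expression.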
Alphabet, Word
• Alphabet – a nonempty finite set of symbols
• Example:
• Word – a finite sequence of symbols from the given alphabet
• Example:
• The set of all words over an alphabet $\Sigma$ is denoted by $\Sigma^*$.
• For variables whose values are words, we will use names such as $w$, $u$, $v$, etc., possibly with indexes
(e.g., $w_1$, $w_2$)
• So, when we write, for example, $w = 0110$, it means that the value of variable $w$ is the word $0110$.
• Similarly, the notation $w \in \Sigma^*$ means that the value of variable $w$ is some word consisting of
symbols belonging to alphabet $\Sigma$.
Formal Languages
• A (formal) language $L$ over an alphabet $\Sigma$ is a subset of $\Sigma^*$, i.e., $L \subseteq \Sigma^*$
• Example:
• Language
• Language
Encoding of Input and Output
• Inputs and outputs of an algorithm can be encoded as words over some alphabet $\Sigma$.
• Remark: It is often the case that only some words over the given alphabet represent valid
input or output.
• Example: If the input for a given problem is a graph, it could be represented as a pair of
lists – a list of nodes and a list of edges:
• For example, the following graph
• over alphabet .
Correspondence between Recognizing Formal Languages
and Decision Problems
• There is a close correspondence between recognizing words from a given language and
decision problems:
• For each language $L$ over some alphabet $\Sigma$ there is a corresponding decision problem: given a word $w \in \Sigma^*$, does $w$ belong to $L$?
• For each decision problem $P$ where inputs are encoded as words over alphabet $\Sigma$, there is a corresponding
language:
• The language consisting of exactly those words over alphabet $\Sigma$ for which the answer to the question stated in
problem $P$ is “Yes”.
• Example: The following decision problem can be viewed as the language L given below and
vice versa.
• Problem
• Input: A word $w$ over alphabet $\Sigma$.
• Question: Does the word $w$ contain an even number of occurrences of symbol b?
• Language $L = \{ w \in \Sigma^* \mid w \text{ contains an even number of occurrences of } b \}$
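As a minimal sketch of this correspondence, assuming words are represented as Python strings over an alphabet that contains the symbols a and b, answering the decision problem is the same as testing membership in L. The function name `in_language` is illustrative and not part of the lecture.

```python
def in_language(word: str) -> bool:
    """Return True iff `word` contains an even number of occurrences of 'b'.

    Answering "Yes" to the decision problem is the same as `word` being in L.
    """
    return word.count("b") % 2 == 0


for w in ["", "abba", "ab", "bbb"]:
    answer = "Yes" if in_language(w) else "No"
    print(repr(w), answer)
```

The empty word has zero occurrences of b, so it belongs to L.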
Models of Computation
• We can consider different types of machines that are able to perform an algorithm.
• There can be many kinds of differences between these types of machines:
• What types of instructions they can execute
• What types of data they can store in their memory and how this memory is organized
• Examples:
• Let $L$ be the language of all strings consisting of $n$ 0’s followed by $n$ 1’s: $L = \{\varepsilon, 01, 0011, 000111, \ldots\}$
• Example:
• Let $w$ be a word over the alphabet of $L$
• Is $w \in L$?
Concatenation of Words
• One of the operations we can perform on words is concatenation:
• For example, the concatenation of words $ab$ and $ba$ is the word $abba$.
• The operation of concatenation is denoted by the symbol $\cdot$ (it is similar to multiplication). This
symbol can be omitted.
• So, for $u, v \in \Sigma^*$, the concatenation of words $u$ and $v$ is written as $u \cdot v$ or just $uv$.
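• Concatenation is associative, i.e., $(u \cdot v) \cdot w = u \cdot (v \cdot w)$ for arbitrary words $u, v, w$,
which means that we can omit parentheses when we write multiple concatenations. For example, we
can write $u \cdot v \cdot w$ instead of $(u \cdot v) \cdot w$.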
• The empty word $\varepsilon$ is a neutral element for the operation of concatenation, so for every word $w$ we also
have: $\varepsilon \cdot w = w \cdot \varepsilon = w$
• Remark: If the given alphabet contains at least two different symbols $a$ and $b$, the
operation of concatenation is not commutative, e.g., $a \cdot b \neq b \cdot a$
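These properties can be checked on concrete words. The sketch below assumes words are Python strings, so `+` plays the role of concatenation and `""` the role of the empty word. The sample words are chosen only for illustration.

```python
u, v, w = "ab", "ba", "a"
empty = ""  # stands in for the empty word

# Associativity: the placement of parentheses does not matter.
associative = (u + v) + w == u + (v + w)

# The empty word is a neutral element.
neutral = (empty + u == u) and (u + empty == u)

# Concatenation is not commutative in general.
commutes = (u + v == v + u)

print(u + v, v + u)          # the two orders give different words
print(associative, neutral, commutes)
```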
Power of a Word
• For an arbitrary word $w \in \Sigma^*$ and arbitrary $k \in \mathbb{N}$ we can define the word $w^k$ as the word obtained by concatenating
$k$ copies of the word $w$.
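• Example:
• For $w = ab$ it is $w^0 = \varepsilon$, $w^1 = ab$, $w^2 = abab$, $w^3 = ababab$.
• Formally: $w^0 = \varepsilon$ and $w^{k+1} = w^k \cdot w$ for $k \in \mathbb{N}$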
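A small sketch of the power of a word, assuming words are Python strings. The helper name `power` is hypothetical. It builds the result by repeated concatenation, starting from the empty word.

```python
def power(word: str, k: int) -> str:
    """Return the word obtained by concatenating k copies of `word`."""
    if k < 0:
        raise ValueError("k must be a natural number (k >= 0)")
    result = ""                    # zeroth power: the empty word
    for _ in range(k):
        result = result + word     # one more copy per step
    return result


print([power("ab", k) for k in range(4)])
```

In Python the same word is also produced by the built-in expression `"ab" * k`.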
Reverse of a Word
• The reverse of a word $w$ is the word $w$ written backwards (in the opposite order).
• The reverse of a word $w$ is denoted $w^R$.
• Example:
• Suffix
• A word $x$ is a suffix of a word $y$ if there exists a word $v$ such that $y = vx$.
• For example, the suffixes of the word $abaab$ are $\varepsilon, b, ab, aab, baab, abaab$.
• Subword
• A word $x$ is a subword of a word $y$ if there exist words $u$ and $v$ such that $y = uxv$.
• For example, the subwords of the word $abaab$ are $\varepsilon, a, b, aa, ab, ba, aab, aba, baa, abaa, baab, abaab$.
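The reverse, the suffixes, and the subwords of a word can be computed directly from the definitions. This is a sketch assuming words are Python strings. The function names are illustrative and the sample word is chosen only for the example.

```python
def reverse(w: str) -> str:
    """Return w written backwards."""
    return w[::-1]


def suffixes(w: str) -> set[str]:
    """All x such that w = v + x for some word v."""
    return {w[i:] for i in range(len(w) + 1)}


def subwords(w: str) -> set[str]:
    """All x such that w = u + x + v for some words u and v."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


word = "abaab"
print(reverse(word))
print(sorted(suffixes(word), key=len))
print(sorted(subwords(word), key=lambda x: (len(x), x)))
```

Every suffix is also a subword (take $v = \varepsilon$ in the subword definition), and the empty word is a suffix and a subword of every word.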
Operations on Languages
• Let us say we have already described some languages. We can create new languages from
these languages using different operations on languages.
• So, the description of a complicated language can be decomposed: the language is described as the
result of applying some operations to some simpler languages.
• Examples of important operations on languages:
• Union
• Intersection
• Complement
• Concatenation
• Iteration
• Remark: It is assumed that the languages involved in these operations use the same alphabet $\Sigma$.
Set Operations on Languages
• Since languages are sets, we can apply any set operations to them:
• Union:
• $L_1 \cup L_2$ is the language consisting of the words belonging to language $L_1$ or to language $L_2$ (or to both).
• Intersection
• $L_1 \cap L_2$ is the language consisting of the words belonging to language $L_1$ and to language $L_2$.
• Complement
• $\overline{L_1}$ is the language containing those words from $\Sigma^*$ that do not belong to $L_1$.
• Difference
• $L_1 - L_2$ is the language containing those words of $L_1$ that do not belong to $L_2$.
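For finite languages these operations map directly onto Python sets. The sketch below makes two assumptions that are not in the lecture: the sample languages are invented, and because the set of all words is infinite, the complement is taken only relative to the words of length at most 2 (the hypothetical helper `words_up_to`).

```python
from itertools import product


def words_up_to(alphabet: set[str], n: int) -> set[str]:
    """All words over `alphabet` of length at most n (a finite slice of all words)."""
    return {"".join(p) for k in range(n + 1)
            for p in product(sorted(alphabet), repeat=k)}


sigma = {"a", "b"}
universe = words_up_to(sigma, 2)     # finite stand-in for the set of all words

L1 = {"", "a", "ab"}
L2 = {"a", "b", "bb"}

union = L1 | L2
intersection = L1 & L2
difference = L1 - L2
complement = universe - L1           # complement of L1, restricted to length <= 2

print(sorted(union), sorted(intersection), sorted(difference), sorted(complement))
```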
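• Remark: Note that the concatenation of languages, $L_1 \cdot L_2 = \{ uv \mid u \in L_1, v \in L_2 \}$, is associative, i.e., for arbitrary
languages $L_1, L_2, L_3$ it holds that: $(L_1 \cdot L_2) \cdot L_3 = L_1 \cdot (L_2 \cdot L_3)$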
Power of a Language
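• Notation $L^k$, where $L \subseteq \Sigma^*$ and $k \in \mathbb{N}$, denotes the concatenation of the form $L \cdot L \cdots L$ where the language $L$ occurs
$k$ times, i.e., $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = L \cdot L$, and so on.
• Formally, the power of a language $L$, denoted $L^k$, can be defined using the following inductive
definition:
• $L^0 = \{\varepsilon\}$ and $L^{k+1} = L^k \cdot L$ for $k \in \mathbb{N}$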
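The inductive definition translates into a short loop. This is a sketch for finite languages represented as Python sets of strings. The names `concat` and `lang_power` and the sample language are assumptions made for the example.

```python
def concat(L1: set[str], L2: set[str]) -> set[str]:
    """Concatenation of languages: every word of L1 followed by every word of L2."""
    return {u + v for u in L1 for v in L2}


def lang_power(L: set[str], k: int) -> set[str]:
    """k-th power of L: start from the language containing only the empty word,
    then concatenate with L once per step."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result


L = {"a", "bb"}
for k in range(3):
    print(k, sorted(lang_power(L, k)))
```

Note that the zeroth power is the language containing only the empty word, not the empty language.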
Finite Automata
• Some Applications
• Software for designing and checking the behavior of digital circuits
• Lexical analyzer of a typical compiler
• Software for scanning large bodies of text (e.g., web pages) for pattern finding
• Software for verifying systems of all types that have a finite number of states (e.g., stock market
transaction, communication/network protocol)
Defining Languages
• Languages can be defined in different ways, such as by a descriptive definition, by a recursive
definition, by a regular expression (RE), or by a finite automaton.
• Descriptive Definition of Language:
• The language is defined by describing the conditions imposed on its words.
• Example:
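• The language L of strings of odd length, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid |w| \text{ is odd} \}$
• The language L of strings that do not start with a, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid w \text{ does not start with } a \}$
• The language L of strings of length 2, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid |w| = 2 \}$
• The language L of strings of length 3 ending in 0, defined over $\Sigma$, can be written as $L = \{ w \in \Sigma^* \mid |w| = 3 \text{ and } w \text{ ends in } 0 \}$
• The language EQUAL of strings with number of a’s equal to number of b’s, defined over $\Sigma$, can be written as $\text{EQUAL} = \{ w \in \Sigma^* \mid |w|_a = |w|_b \}$, where $|w|_x$ denotes the number of occurrences of symbol $x$ in $w$
• The language EVEN-EVEN of strings with even number of a’s and even number of b’s, defined over $\Sigma$, can be written
as $\text{EVEN-EVEN} = \{ w \in \Sigma^* \mid |w|_a \text{ is even and } |w|_b \text{ is even} \}$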
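A descriptive definition is a condition on words, so it can be sketched as a predicate. The example below assumes the alphabet {a, b} and Python strings as words. The function names and the helper `words_up_to` are hypothetical and serve only to list the shortest members of each language.

```python
from itertools import product


def is_equal(w: str) -> bool:
    """EQUAL: the number of a's equals the number of b's."""
    return w.count("a") == w.count("b")


def is_even_even(w: str) -> bool:
    """EVEN-EVEN: an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


def words_up_to(alphabet: str, n: int) -> list[str]:
    """All words over `alphabet` of length at most n, shortest first."""
    return ["".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)]


candidates = words_up_to("ab", 2)
print([w for w in candidates if is_equal(w)])
print([w for w in candidates if is_even_even(w)])
```

Both languages are infinite, so the lists show only the members of length at most 2.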
Palindrome
• The language consisting of $\varepsilon$ and the strings $s$ defined over $\Sigma$ such that $s^R = s$, where $s^R$ is the reverse of $s$
• Note that the words of PALINDROME are called palindromes.
• English language example:
• EYE, RADAR, LEVEL, NOON, etc.
• Example:
• For $\Sigma = \{a, b\}$, PALINDROME $= \{\varepsilon, a, b, aa, bb, aaa, aba, bab, bbb, \ldots\}$
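The defining condition (a word equals its own reverse) can be tested directly. The sketch below assumes Python strings as words and, for the enumeration, the alphabet {a, b}. The names `is_palindrome` and `palindromes_up_to` are illustrative.

```python
from itertools import product


def is_palindrome(s: str) -> bool:
    """True iff s reads the same forwards and backwards."""
    return s == s[::-1]


def palindromes_up_to(alphabet: str, n: int) -> list[str]:
    """Palindromes over `alphabet` of length at most n, shortest first."""
    words = ("".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k))
    return [w for w in words if is_palindrome(w)]


print([is_palindrome(w) for w in ["EYE", "RADAR", "LEVEL", "NOON", "WORD"]])
print(palindromes_up_to("ab", 3))
```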
Regular Expressions
• Regular expressions offer a declarative way to express the pattern of the strings we want to accept
• Example:
• Automata are more machine-like
• <input: string, output: [accept/reject]>
• Regular expressions are more program syntax-like
• Unix environments heavily use regular expressions
• bash shell, grep, vi & other editors, sed
• Perl scripting – good for string processing
• Lexical analyzers such as Lex or Flex
Regular Expressions - Definitions
• Regular expressions are:
• An algebraic way to describe languages (they describe exactly the regular languages).
• If E is a regular expression, then L(E) is the language it defines.
• We’ll describe RE’s and their languages recursively.
• Basis
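• Basis 1: If $a$ is any symbol, then $a$ is a RE, and $L(a) = \{a\}$.
• Basis 2: $\varepsilon$ is a RE, and $L(\varepsilon) = \{\varepsilon\}$.
• Basis 3: $\emptyset$ is a RE, and $L(\emptyset) = \emptyset$.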
• Induction
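• Induction 1: If $E_1$ and $E_2$ are regular expressions, then $E_1 + E_2$ is a regular expression, and $L(E_1 + E_2) = L(E_1) \cup L(E_2)$
• Induction 2: If $E_1$ and $E_2$ are regular expressions, then $E_1 E_2$ is a regular expression, and $L(E_1 E_2) = L(E_1) L(E_2)$
• Induction 3: If $E$ is a RE, then $E^*$ is a RE, and $L(E^*) = (L(E))^*$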
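The recursive definition can be followed case by case in code. The sketch below is an illustration under stated assumptions, not the lecture's own construction: regular expressions are encoded as nested tuples (a made-up representation), and because a language may be infinite, the function returns only the words of length at most `max_len`.

```python
def language(expr: tuple, max_len: int) -> set[str]:
    """Words of the language of `expr` that have length <= max_len."""
    kind = expr[0]
    if kind == "sym":                     # Basis 1: a single symbol
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "eps":                     # Basis 2: the empty word
        return {""}
    if kind == "empty":                   # Basis 3: the empty language
        return set()
    if kind == "union":                   # Induction 1
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":                  # Induction 2
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                    # Induction 3
        base = language(expr[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                   # extend until no new short word appears
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")


# (a + b)* a : all words over {a, b} that end in a
ends_in_a = ("concat",
             ("star", ("union", ("sym", "a"), ("sym", "b"))),
             ("sym", "a"))
print(sorted(language(ends_in_a, 2)))
```

Cutting off at `max_len` is safe here because every word of length at most `max_len` in a concatenation or a closure is built from pieces that are themselves no longer than `max_len`.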
Language Operators
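• Union
• $L \cup M$ = all strings that are either in $L$ or in $M$
• Concatenation
• $L \cdot M$ = all strings that are of the form $xy$ such that $x \in L$ and $y \in M$
• Kleene Closure (the * operator)
• $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$ = all strings formed by concatenating zero or more strings from $L$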
Kleene Closure – Example
• Let
• …
Kleene Closure – Special Notes
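• $L^*$ is an infinite set iff $|L| \geq 1$ and $L \neq \{\varepsilon\}$. Why?
• If $L = \{\varepsilon\}$, then $L^* = \{\varepsilon\}$. Why?
• If $L = \emptyset$, then $L^* = \{\varepsilon\}$. Why?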
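The three notes can be explored experimentally. The sketch below assumes finite languages given as Python sets of strings and computes only the part of the closure with words of length at most `max_len`. The helper name `kleene_up_to` is hypothetical.

```python
def kleene_up_to(L: set[str], max_len: int) -> set[str]:
    """Words of the Kleene closure of L that have length <= max_len."""
    result, frontier = {""}, {""}         # zero concatenations give the empty word
    while frontier:
        frontier = {u + v for u in frontier for v in L
                    if len(u + v) <= max_len} - result
        result |= frontier
    return result


for name, L in [("empty language", set()),
                ("only the empty word", {""}),
                ("one nonempty word", {"a"})]:
    sizes = [len(kleene_up_to(L, n)) for n in (2, 4, 8)]
    print(name, sizes)
```

For the first two languages the closure stays at the single word $\varepsilon$ however large the bound is. As soon as L contains a nonempty word, raising the bound keeps producing new words, which is the reason the closure is infinite in that case.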