ToA - Lecture 03 04 - Language Preliminaries Regular Expressions

Lecture – 03 & 04

Language Preliminaries & Regular Expressions


Theory of Formal Languages
• An area of theoretical computer science dealing with questions concerning syntax.
• Language – a set of words
• Word – a sequence of symbols from some alphabet
• Alphabet – a set of symbols (or letters)
• Words and languages appear in computer science on many levels:
• Representation of input and output data
• Representation of programs
• Manipulation with character strings or files
Theory of Formal Languages - Motivation
• Examples of problem types, where theory of formal languages is useful:
• Construction of Compilers:
• Lexical analysis
• Syntactic analysis

• Searching in Text:
• Searching for a given text pattern
• Searching for a part of text specified by a regular expression
Alphabet, Word
• Alphabet – a nonempty finite set of symbols
• Example:
• Word – a finite sequence of symbols from the given alphabet
• Example:
• The set of all words over an alphabet $\Sigma$ is denoted $\Sigma^*$.
• For variables whose values are words, we will use names such as $w$, $u$, $v$, etc., possibly with indices (e.g., $w_1$, $w_2$)
• So, when we write $w = abba$, it means that the value of variable $w$ is the word $abba$.
• Similarly, the notation $w \in \Sigma^*$ means that the value of variable $w$ is some word consisting of
symbols belonging to alphabet $\Sigma$.
Formal Languages
• A (formal) language $L$ over an alphabet $\Sigma$ is a subset of $\Sigma^*$, i.e., $L \subseteq \Sigma^*$

• Example:
• Language
• Language
Encoding of Input and Output
• Inputs and outputs of an algorithm can be encoded as words over some alphabet $\Sigma$.

• Example: for the “Sorting” problem, we can take the alphabet $\Sigma = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, \text{“,”}\}$ (the decimal digits and a comma).


• An example of input data (as a word over alphabet $\Sigma$):

• and the corresponding output data (as a word over alphabet $\Sigma$)

• Remark: It is often the case that only some words over the given alphabet represent valid
input or output.
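
The encoding idea for the sorting problem can be sketched in a few lines. This is only an illustration: the alphabet of digits plus a comma, and the helper names `is_word_over`, `decode`, `encode`, and `solve_sorting`, are assumptions made here, not taken from the slides.

```python
# Assumed alphabet for illustration: decimal digits and a comma separator.
SIGMA = set("0123456789,")

def is_word_over(word, alphabet):
    """Return True if every symbol of `word` belongs to `alphabet`."""
    return all(symbol in alphabet for symbol in word)

def decode(word):
    """Turn a word such as '82,3,150' into a list of integers."""
    return [int(part) for part in word.split(",")]

def encode(numbers):
    """Turn a list of integers back into a word over SIGMA."""
    return ",".join(str(n) for n in numbers)

def solve_sorting(word):
    """Map an input word to the output word of the sorting problem."""
    if not is_word_over(word, SIGMA):
        raise ValueError("input is not a word over the alphabet")
    return encode(sorted(decode(word)))

print(solve_sorting("82,3,150"))
```

A word such as `,,5` is a word over this alphabet but does not encode a valid input (here `decode` raises `ValueError`), which is the situation the remark describes.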
Encoding of Input and Output
• Example: If an input for a given problem is a graph, it can be represented as a pair of
lists, a list of nodes and a list of edges:
• For example, the following graph

• could be represented as a word

• over alphabet .
Correspondence between Recognizing Formal Languages
and Decision Problems
• There is a close correspondence between recognizing words from a given language and
decision problems:
• For each language $L$ over some alphabet $\Sigma$ there is a corresponding decision problem:

• Input: A word $w$ over alphabet $\Sigma$.


• Question: Does $w$ belong to $L$?

• For each decision problem $P$ where inputs are encoded as words over alphabet $\Sigma$, there is a corresponding
language:
• The language consisting of exactly those words over alphabet $\Sigma$ for which the answer to the question stated in
problem $P$ is “Yes”.
Correspondence between Recognizing Formal Languages
and Decision Problems
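• Example: The following decision problem can be viewed as the language $L$ given below, and
vice versa.

• Problem
• Input: A word $w$ over alphabet $\{a, b\}$.
• Question: Does the word $w$ contain an even number of occurrences of symbol $b$?

• Language $L = \{w \in \{a, b\}^* \mid |w|_b \bmod 2 = 0\}$, where $|w|_b$ is the number of occurrences of $b$ in $w$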
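
The correspondence between a decision problem and a language can be sketched as a pair of functions. The alphabet `{a, b}` and the names `has_even_number_of_b` and `in_language` are assumptions chosen for this illustration.

```python
def has_even_number_of_b(word):
    """Decision problem: answer True ('Yes') iff `word` has an even count of 'b'."""
    return word.count("b") % 2 == 0

def in_language(word, alphabet=frozenset("ab")):
    """Membership test for the matching language: words over `alphabet`
    for which the decision problem answers 'Yes'."""
    is_word = all(symbol in alphabet for symbol in word)
    return is_word and has_even_number_of_b(word)

for w in ["", "abba", "ab", "bab"]:
    print(repr(w), in_language(w))
```

Solving the problem for an input word and testing membership of that word in the language are the same computation, which is the point of the correspondence.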
Models of Computation
• We can consider different types of machines that are able to perform an algorithm.
• There can be many kinds of differences between these types of machines:
• What types of instructions they can execute
• What types of data they can store in their memory and how this memory is organized

• Different kinds of such machines are called models of computation.


• In the case of very simple kinds of such machines, they are usually called automata in the
formal language theory.
• In this course we will see several types of such automata.
Models of Computation
• For different models of computation, we analyze, for example:
• What algorithmic problems can be solved by such machines and what languages they can recognize.
• How efficiently they can execute different algorithms
• How machines of a certain type can simulate the computations of some other type of machines
• How the number of instructions that are executed by the machine in such simulation grows compared
to the original machine
• …
Alphabet
• An alphabet is a finite, non-empty set of symbols
• We use the symbol $\Sigma$ (sigma) to denote an alphabet
• Examples:
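• Binary: $\Sigma = \{0, 1\}$
• All lower-case letters: $\Sigma = \{a, b, c, \ldots, z\}$
• Alphanumeric: $\Sigma = \{a\text{-}z, A\text{-}Z, 0\text{-}9\}$
• DNA molecule letters: $\Sigma = \{a, c, g, t\}$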
• …
Strings
• A string or word is a finite sequence of symbols chosen from $\Sigma$
• Empty string is $\varepsilon$ (or “epsilon”)
• Length of a string $w$, denoted by “$|w|$”, is equal to the number of (non-$\varepsilon$) characters in the string

• $xy$ = concatenation of two strings $x$ and $y$


Powers of an Alphabet
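• Let $\Sigma$ be an alphabet.
• $\Sigma^k$ = the set of all strings of length $k$ over $\Sigma$
• $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots$
• $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \Sigma^3 \cup \cdots$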
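
As a small sketch, the k-th power of an alphabet (all strings of length k over it) can be enumerated directly. The function names `alphabet_power` and `words_up_to` are hypothetical; since the set of all words is infinite, only a length-bounded part of it can be listed.

```python
from itertools import product

def alphabet_power(sigma, k):
    """Return the set of all strings of length k over the alphabet `sigma`."""
    return {"".join(symbols) for symbols in product(sorted(sigma), repeat=k)}

def words_up_to(sigma, n):
    """Return all strings over `sigma` of length 0..n (a finite slice of Sigma*)."""
    result = set()
    for k in range(n + 1):
        result |= alphabet_power(sigma, k)
    return result

print(sorted(alphabet_power({"0", "1"}, 2)))
print(len(words_up_to({"0", "1"}, 3)))
```

Note that the power with k = 0 contains exactly one string, the empty string.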
Languages
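• $L$ is said to be a language over alphabet $\Sigma$ only if $L \subseteq \Sigma^*$
• this is because $\Sigma^*$ is the set of all strings (of all possible lengths, including 0) over the given alphabet $\Sigma$

• Examples:
• Let $L$ be the language of all strings consisting of $n$ 0's followed by $n$ 1's:
$L = \{\varepsilon, 01, 0011, 000111, \ldots\}$
• Let $L$ be the language of all strings with an equal number of 0's and 1's:
$L = \{\varepsilon, 01, 10, 0011, 1100, 0101, 1010, 1001, \ldots\}$
• Definition: $\emptyset$ denotes the Empty Language


• Let $L = \{\varepsilon\}$; Is $L = \emptyset$?
• No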
The Membership Problem
• Given a string $w \in \Sigma^*$ and a language $L$ over $\Sigma$, decide whether $w \in L$.

• Example:
• Let $w = 100011$
• Is $w \in L$, where $L$ is the language of strings with an equal number of 0's and 1's?
Concatenation of Words
• One of the operations we can perform on words is concatenation:
• For example, the concatenation of words $ab$ and $ba$ is the word $abba$.
• The operation of concatenation is denoted by the symbol $\cdot$ (it is similar to multiplication). This
symbol can be omitted.
• So, for $u, v \in \Sigma^*$, the concatenation of words $u$ and $v$ is written as $u \cdot v$ or just $uv$.

• Remark: Formally, the concatenation of words over alphabet $\Sigma$ is a function of type $\Sigma^* \times \Sigma^* \to \Sigma^*$


Concatenation of Words
• Concatenation is associative, i.e., for every three words $u$, $v$, and $w$, we have
$(u \cdot v) \cdot w = u \cdot (v \cdot w)$
• which means that we can omit parentheses when we write multiple concatenations. For example, we
can write $u \cdot v \cdot w$ instead of $(u \cdot v) \cdot w$.
• Word $\varepsilon$ is a neutral element for the operation of concatenation, so for every word $w$ we also
have:
$\varepsilon \cdot w = w \cdot \varepsilon = w$
• Remark: It is obvious that if the given alphabet contains at least two different symbols, the
operation of concatenation is not commutative, e.g., $a \cdot b \neq b \cdot a$
Power of a Word
• For an arbitrary word $w \in \Sigma^*$ and arbitrary $k \in \mathbb{N}$ we can define the word $w^k$ as the word obtained by concatenating
$k$ copies of the word $w$.
• Example:
• For $w = abb$ it is $w^3 = abbabbabb$.

• A slightly more formal definition looks as follows:

$w^0 = \varepsilon$, $w^{k+1} = w^k \cdot w$ for $k \in \mathbb{N}$
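
The inductive definition of the power of a word (the zeroth power is the empty word, and each further power appends one more copy) translates directly into a recursive function. This is a sketch; the name `word_power` is hypothetical.

```python
def word_power(word, k):
    """Return `word` concatenated with itself k times, by induction on k."""
    if k < 0:
        raise ValueError("k must be a natural number")
    if k == 0:
        return ""                           # base case: the empty word
    return word_power(word, k - 1) + word   # inductive step: one more copy

print(word_power("abb", 3))
```

In Python the same result is available as `word * k`; the recursion is written out only to mirror the inductive definition.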
Reverse of a Word
• The reverse of a word $w$ is the word $w$ written backwards (in the opposite order).
• The reverse of a word $w$ is denoted $w^R$.
• Example:

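• So, if $w = a_1 a_2 \cdots a_n$ (where $a_i \in \Sigma$) then $w^R = a_n a_{n-1} \cdots a_1$.


• We can define $w^R$ using the following inductively defined function
• $rev : \Sigma^* \to \Sigma^*$, as the value $rev(w)$.
• The function $rev$ is defined as follows:
• $rev(\varepsilon) = \varepsilon$
• for $a \in \Sigma$ and $w \in \Sigma^*$ it holds that $rev(a \cdot w) = rev(w) \cdot a$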
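
A minimal sketch of the inductively defined reverse function: the empty word reverses to itself, and a word that starts with a symbol a followed by a rest reverses to the reversed rest followed by a. Python strings stand in for words here.

```python
def rev(word):
    """Reverse a word by induction on its structure."""
    if word == "":
        return ""               # reverse of the empty word is the empty word
    first, rest = word[0], word[1:]
    return rev(rest) + first    # reverse the rest, then append the first symbol

print(rev("abc"))
```

The slicing idiom `word[::-1]` gives the same result; the recursive form is shown because it follows the definition step by step.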


Prefix, Suffix & Subword
• Prefix
• A word $x$ is a prefix of a word $y$ if there exists a word $v$ such that $y = xv$.
• Prefixes of the word $abaab$ are $\varepsilon, a, ab, aba, abaa, abaab$.

• Suffix
• A word $x$ is a suffix of a word $y$ if there exists a word $u$ such that $y = ux$.
• Suffixes of the word $abaab$ are $\varepsilon, b, ab, aab, baab, abaab$.

• Subword
• A word $x$ is a subword of a word $y$ if there exist words $u$ and $v$ such that $y = uxv$.
• Subwords of the word $abaab$ are $\varepsilon, a, b, ab, ba, aa, aba, baa, aab, abaa, baab, abaab$.
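
The three definitions can be sketched with string slicing: a prefix is what remains after cutting off some tail, a suffix is what remains after cutting off some head, and a subword is what remains after cutting off both. The function names are hypothetical.

```python
def prefixes(word):
    """All x such that word = x + v for some v."""
    return {word[:i] for i in range(len(word) + 1)}

def suffixes(word):
    """All x such that word = u + x for some u."""
    return {word[i:] for i in range(len(word) + 1)}

def subwords(word):
    """All x such that word = u + x + v for some u and v."""
    n = len(word)
    return {word[i:j] for i in range(n + 1) for j in range(i, n + 1)}

print(sorted(prefixes("aba"), key=len))
print(sorted(suffixes("aba"), key=len))
print(sorted(subwords("aba"), key=lambda s: (len(s), s)))
```

Every prefix and every suffix is also a subword, and the empty word and the word itself belong to all three sets.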
Operations of Languages
• Let us say we have already described some languages. We can create new languages from
these languages using different operations on languages.
• So, a description of a complicated language can be decomposed in such a way that it is described as the
result of applying some operations to some simpler languages.
• Examples of important operations on languages:
• Union
• Intersection
• Complement
• Concatenation
• Iteration

• Remark: It is assumed that the languages involved in these operations use the same alphabet $\Sigma$.
Set Operation of Languages
• Since languages are sets, we can apply any set operations to them:
• Union:
• $L_1 \cup L_2$ is the language consisting of the words belonging to language $L_1$ or to language $L_2$ (or to both).

• Intersection
• $L_1 \cap L_2$ is the language consisting of the words belonging to language $L_1$ and to language $L_2$.

• Complement
• $\overline{L_1}$ is the language containing those words from $\Sigma^*$ that do not belong to $L_1$.

• Difference
• $L_1 - L_2$ is the language containing those words of $L_1$ that do not belong to $L_2$.

• Remark: We assume that $L_1, L_2 \subseteq \Sigma^*$ for some given alphabet $\Sigma$
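
For finite languages these operations map onto Python's built-in set operators. The sketch below uses two small example languages chosen here for illustration. The complement of a language is in general infinite, so the hypothetical helper `bounded_complement` only lists the complement among words up to a given length.

```python
from itertools import product

def words_up_to(sigma, n):
    """All words over `sigma` of length 0..n."""
    return {"".join(t) for k in range(n + 1) for t in product(sorted(sigma), repeat=k)}

def bounded_complement(language, sigma, n):
    """Words over `sigma` of length at most n that are NOT in `language`."""
    return words_up_to(sigma, n) - language

L1 = {"a", "ab", "ba"}   # example languages (assumed, not from the slides)
L2 = {"ab", "b"}

print(sorted(L1 | L2))   # union
print(sorted(L1 & L2))   # intersection
print(sorted(L1 - L2))   # difference
print(sorted(bounded_complement(L1, "ab", 2)))
```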


Concatenation of Languages
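• Concatenation of languages $L_1$ and $L_2$, where $L_1, L_2 \subseteq \Sigma^*$, is the language $L \subseteq \Sigma^*$ such that for each $w \in \Sigma^*$ it holds that
$w \in L \iff (\exists u \in L_1)(\exists v \in L_2)(w = u \cdot v)$
• The concatenation of languages $L_1$ and $L_2$ is denoted $L_1 \cdot L_2$.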


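• Example: $L_1 = \{abb, ba\}$, $L_2 = \{a, ab, bbb\}$

• The language $L_1 \cdot L_2$ contains the following words: $abba$, $abbab$, $abbbbb$, $baa$, $baab$, $babbb$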

• Remark: Note that the concatenation of languages is associative, i.e., for arbitrary
languages $L_1, L_2, L_3$ it holds that: $L_1 \cdot (L_2 \cdot L_3) = (L_1 \cdot L_2) \cdot L_3$
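
For finite languages, concatenation is a set comprehension over all pairs of words, one from each language. The function name `concat` and the sample languages are assumptions for this sketch.

```python
def concat(l1, l2):
    """Return every word u + v with u taken from l1 and v taken from l2."""
    return {u + v for u in l1 for v in l2}

L1 = {"abb", "ba"}
L2 = {"a", "ab", "bbb"}
L3 = {"", "b"}

print(sorted(concat(L1, L2)))
# Associativity, checked on these three sample languages:
print(concat(L1, concat(L2, L3)) == concat(concat(L1, L2), L3))
```

Two edge cases are worth noting: concatenating with the language that contains only the empty word leaves a language unchanged, while concatenating with the empty language yields the empty language.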
Power of a Language
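• Notation $L^k$, where $L \subseteq \Sigma^*$ and $k \in \mathbb{N}$, denotes the concatenation of the form $L \cdot L \cdots L$ where the language $L$ occurs
$k$ times, i.e., $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = L \cdot L$, $L^3 = L \cdot L \cdot L$, and so on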

• Example: For $L = \{aa, b\}$, the language $L^3$ contains the following words: $aaaaaa$, $aaaab$, $aabaa$, $aabb$, $baaaa$, $baab$, $bbaa$, $bbb$

• Formally, the power of a language $L$, denoted $L^k$, can be defined using the following inductive
definition:
• $L^0 = \{\varepsilon\}$, $L^{k+1} = L^k \cdot L$ for $k \in \mathbb{N}$
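
The inductive definition of the power of a language can be sketched as a loop that starts from the language containing only the empty word and concatenates the language onto it k times. The names `concat` and `language_power` are hypothetical, and the sample language is chosen for illustration.

```python
def concat(l1, l2):
    """Return every word u + v with u taken from l1 and v taken from l2."""
    return {u + v for u in l1 for v in l2}

def language_power(language, k):
    """Return the k-th power of a finite language."""
    result = {""}                    # zeroth power: only the empty word
    for _ in range(k):
        result = concat(result, language)
    return result

print(sorted(language_power({"aa", "b"}, 3)))
```

The zeroth power is the language containing only the empty word for every language, including the empty language.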
Finite Automata
• Some Applications
• Software for designing and checking the behavior of digital circuits
• Lexical analyzer of a typical compiler
• Software for scanning large bodies of text (e.g., web pages) for pattern finding
• Software for verifying systems of all types that have a finite number of states (e.g., stock market
transaction, communication/network protocol)
Defining Languages
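• Languages can be defined in different ways, such as by a descriptive definition, by a recursive
definition, using a Regular Expression (RE), or using Finite Automata
• Descriptive Definition of Language:
• The language is defined by describing the conditions imposed on its words.
• Example:
• The language L of strings of odd length, defined over $\Sigma$, can be written as $L = \{w \in \Sigma^* \mid |w| \text{ is odd}\}$
• The language L of strings that do not start with a, defined over $\Sigma$, can be written as $L = \{w \in \Sigma^* \mid w \text{ does not start with } a\}$
• The language L of strings of length 2, defined over $\Sigma$, can be written as $L = \{w \in \Sigma^* \mid |w| = 2\}$
• The language L of strings of length 3 ending in 0, defined over $\Sigma$, can be written as $L = \{w \in \Sigma^* \mid |w| = 3 \text{ and } w \text{ ends in } 0\}$
• The language EQUAL of strings with number of a’s equal to number of b’s, defined over $\Sigma = \{a, b\}$, can be written as $\text{EQUAL} = \{w \in \Sigma^* \mid |w|_a = |w|_b\}$
• The language EVEN-EVEN of strings with even number of a’s and even number of b’s, defined over $\Sigma = \{a, b\}$, can be written
as $\text{EVEN-EVEN} = \{w \in \Sigma^* \mid |w|_a \text{ and } |w|_b \text{ are both even}\}$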
Palindrome
• The language consisting of $\varepsilon$ and the strings $s$ defined over $\Sigma$ such that $rev(s) = s$
• Note that the words of PALINDROME are called palindromes.
• English language example:
• EYE, RADAR, LEVEL, NOON, etc.
• Example:
• $\Sigma = \{a, b\}$, PALINDROME = $\{\varepsilon, a, b, aa, bb, aaa, aba, bab, bbb, \ldots\}$
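
A word is a palindrome when it equals its own reverse, which gives a one-line membership test. The sketch below also lists the palindromes over a two-letter alphabet up to a bounded length; the names `is_palindrome` and `palindromes_up_to` are hypothetical.

```python
from itertools import product

def is_palindrome(word):
    """A word belongs to PALINDROME iff it equals its reverse."""
    return word == word[::-1]

def palindromes_up_to(sigma, n):
    """List the palindromes over `sigma` of length 0..n, shortest first."""
    words = ("".join(t) for k in range(n + 1) for t in product(sorted(sigma), repeat=k))
    return sorted((w for w in words if is_palindrome(w)), key=lambda w: (len(w), w))

print(palindromes_up_to("ab", 3))
print(is_palindrome("RADAR"), is_palindrome("RADIO"))
```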
Regular Expressions
• Offers a declarative way to express the pattern of any string we want to accept
• Example:
• Automata more machine-like
• <input: string, output: [accept/reject]>
• Regular expressions more program syntax-like
• Unix environments heavily use regular expressions
• bash shell, grep, vi & other editors, sed
• Perl scripting – good for string processing
• Lexical analyzers such as Lex or Flex
Regular Expressions

• Diagram: regular expressions and finite automata (DFA, NFA) both describe the same class of languages, the regular languages.
Regular Expressions - Definitions
• Regular expressions are:
• An algebraic way to describe languages (they describe exactly the regular languages).
• If E is a regular expression, then L(E) is the language it defines.
• We’ll describe RE’s and their languages recursively.

• Basis
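• Basis 1: If $a$ is any symbol, then $a$ is a RE, and $L(a) = \{a\}$.
• Basis 2: $\varepsilon$ is a RE, and $L(\varepsilon) = \{\varepsilon\}$.
• Basis 3: $\emptyset$ is a RE, and $L(\emptyset) = \emptyset$.
• Induction
• Induction 1: If $E_1$ and $E_2$ are regular expressions, then $E_1 + E_2$ is a regular expression, and $L(E_1 + E_2) = L(E_1) \cup L(E_2)$
• Induction 2: If $E_1$ and $E_2$ are regular expressions, then $E_1 E_2$ is a regular expression, and $L(E_1 E_2) = L(E_1) L(E_2)$
• Induction 3: If $E$ is a RE, then $E^*$ is a RE, and $L(E^*) = (L(E))^*$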
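
The recursive definition can be turned into a small evaluator that computes the language of a regular expression, one case per basis or induction rule. This is a sketch under stated assumptions: expressions are encoded as nested tuples (a representation invented here), symbols are single characters, and because the language of a starred expression is usually infinite, the hypothetical function `lang` returns only the words of length at most `n`.

```python
def lang(expr, n):
    """Return the words of L(expr) that have length at most n."""
    kind = expr[0]
    if kind == "sym":                       # Basis 1: a single symbol
        return {expr[1]} if n >= 1 else set()
    if kind == "eps":                       # Basis 2: the empty word
        return {""}
    if kind == "empty":                     # Basis 3: the empty language
        return set()
    if kind == "union":                     # Induction 1: union
        return lang(expr[1], n) | lang(expr[2], n)
    if kind == "concat":                    # Induction 2: concatenation
        left, right = lang(expr[1], n), lang(expr[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":                      # Induction 3: Kleene closure
        base = lang(expr[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:                     # extend by one more factor each round
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError("unknown expression kind: %r" % (kind,))

# The expression 0*1, encoded as nested tuples:
zero_star_one = ("concat", ("star", ("sym", "0")), ("sym", "1"))
print(sorted(lang(zero_star_one, 3), key=len))
```

Cutting off at length `n` is safe because a word of length at most `n` can only be built from parts that are themselves no longer than `n`.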
Language Operators
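• Union
• $L \cup M$ = all strings that are either in $L$ or $M$
• Concatenation
• $L \cdot M$ = all strings that are of the form $xy$ such that $x \in L$ and $y \in M$
• Kleene Closure (the * operator)
• $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$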
Kleene Closure – Example
• Let

• …
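
The Kleene closure of a language is the union of all its powers, starting with the zeroth power that contains only the empty word. Since this union is usually infinite, the sketch below collects only words of length at most `n`. The function names and the sample language `{1, 00}` are assumptions made for this illustration.

```python
def concat(l1, l2):
    """Return every word u + v with u taken from l1 and v taken from l2."""
    return {u + v for u in l1 for v in l2}

def kleene_up_to(language, n):
    """Return the words of the Kleene closure of `language` with length <= n."""
    result = set()
    power = {""}                         # zeroth power of the language
    for _ in range(n + 1):               # powers 0..n cover all words of length <= n
        result |= power
        # Next power, dropping words that are already too long.
        power = {w for w in concat(power, language) if len(w) <= n}
    return result

print(sorted(kleene_up_to({"1", "00"}, 3), key=lambda w: (len(w), w)))
```

Powers 0 to `n` are enough because a word of length at most `n` can be split into at most `n` non-empty factors.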
Kleene Closure – Special Notes
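• $L^*$ is an infinite set iff $|L| \geq 1$ and $L \neq \{\varepsilon\}$. Why?
• If $L = \{\varepsilon\}$, then $L^* = \{\varepsilon\}$. Why?
• If $L = \emptyset$, then $L^* = \{\varepsilon\}$. Why?

• $\Sigma^*$ denotes the set of all words over an alphabet $\Sigma$


• Therefore, an abbreviated way of saying that $L$ is an arbitrary language over an alphabet $\Sigma$ is: $L \subseteq \Sigma^*$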
Thank You 
Any Questions?
