PY Mod3@AzDOCUMENTS - in

MODULE 3 PYTHON

CHAPTER 01: LISTS


1. A list is a sequence
 A list is a sequence of values.
 In a string, the values are characters; in a list, they can be any type.
 The values in a list are called elements or sometimes items.
 There are several ways to create a new list; the simplest is to enclose the elements in square brackets
([ and ]):
Ex 1: [10, 20, 30, 40] - list of four integers
Ex 2: ['crunchy frog', 'ram bladder', 'lark vomit'] - list of three strings
Ex 3: ['spam', 2.0, 5, [10, 20]] - list containing different types (a list within another list is nested)
 A list that contains no elements is called an empty list; it can be created with empty brackets, [].
 We can assign list values to variables:
>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> numbers = [17, 123]
>>> empty = []
>>> print(cheeses, numbers, empty)
['Cheddar', 'Edam', 'Gouda'] [17, 123] []

2. Lists are mutable

 To access the elements of a list, the bracket operator is used; list indices start at 0.

>>> print(cheeses[0])
Cheddar
 Lists are mutable because we can change the order of items in a list or reassign an item in a list.
 When the bracket operator appears on the left side of an assignment, it identifies the element of the
list that will be assigned.

>>> numbers = [17, 123]
>>> numbers[1] = 5
>>> print(numbers)
[17, 5]

 In a list the relationship between indices and elements is called a mapping; each index “maps to” one
of the elements.
 List indices work the same way as string indices:

* Any integer expression can be used as an index.


* If we try to read or write an element that does not exist, we get an IndexError.

 If an index has a negative value, it counts backward from the end of the list.
 The in operator also works on lists.

>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> 'Edam' in cheeses
True
>>> 'Brie' in cheeses
False
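
The index rules above (negative indices and IndexError) can be tried out with a small sketch. The variable names last, second_last and out_of_range are chosen here only for illustration.

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']

# Negative indices count backward from the end of the list.
last = cheeses[-1]          # 'Gouda'
second_last = cheeses[-2]   # 'Edam'

# Reading an index that does not exist raises IndexError.
try:
    cheeses[3]
    out_of_range = False
except IndexError:
    out_of_range = True

print(last, second_last, out_of_range)
```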

3. Traversing a list
 The most common way to traverse the elements of a list is with a for loop. The syntax is the same as
for strings:

1 Dept. of CSE, CBIT



for cheese in cheeses:
    print(cheese)


 This works well if you only need to read the elements of the list. But if we want to write or update
the elements, we need the indices. A common way to do that is to combine the functions range and
len:
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2

 This loop traverses the list and updates each element. len returns the number of elements in the list.
range returns a sequence of indices from 0 to n − 1, where n is the length of the list.
 Each time through the loop, i gets the index of the next element. The assignment statement in the
body uses i to read the old value of the element and to assign the new value.
 A for loop over an empty list never executes the body:

for x in empty:
    print('This never happens.')

 Although a list can contain another list, the nested list still counts as a single element. The length of
this list is four:

['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]

4. List operations
 The + operator concatenates lists:

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> print(c)
[1, 2, 3, 4, 5, 6]

 Similarly, the * operator repeats a list a given number of times:


>>> [0] * 4
[0, 0, 0, 0]
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]

The first example repeats [0] four times. The second example repeats the list [1, 2, 3] three times.

5. List slices
 The slice operator also works on lists:

>>> t = ['a', 'b', 'c', 'd', 'e', 'f']


>>> t[1:3]
['b', 'c']
>>> t[:4]
['a', 'b', 'c', 'd']
>>> t[3:]
['d', 'e', 'f']

>>> t[:]
['a', 'b', 'c', 'd', 'e', 'f']


 Since lists are mutable, it is often useful to make a copy before performing operations that fold,
spindle, or mutilate lists.
 A slice operator on the left side of an assignment can update multiple elements:

>>> t = ['a', 'b', 'c', 'd', 'e', 'f']


>>> t[1:3] = ['x', 'y']
>>> print(t)
['a', 'x', 'y', 'd', 'e', 'f']
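
As a small sketch of the advice about copying before modifying: a full slice t[:] builds a new list, so later changes to t do not touch the copy. The name backup is used here only for illustration.

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']

backup = t[:]           # a full slice builds a new list with the same elements
t[1:3] = ['x', 'y']     # slice assignment modifies t in place

print(t)        # ['a', 'x', 'y', 'd', 'e', 'f']
print(backup)   # ['a', 'b', 'c', 'd', 'e', 'f']
```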

6. List methods
 Python provides methods that operate on lists.
i. Append: append adds a new element to the end of a list:

>>> t = ['a', 'b', 'c']


>>> t.append('d')
>>> print(t)
['a', 'b', 'c', 'd']

ii. Extend: extend takes a list as an argument and appends all of the elements:

>>> t1 = ['a', 'b', 'c']


>>> t2 = ['d', 'e']
>>> t1.extend(t2)
>>> print(t1)
['a', 'b', 'c', 'd', 'e']

iii. Sort: sort arranges the elements of the list from low to high:

>>> t = ['d', 'c', 'e', 'b', 'a']


>>> t.sort()
>>> print(t)
['a', 'b', 'c', 'd', 'e']

 Most list methods are void; they modify the list and return None. If we accidentally write
t = t.sort(), we will be disappointed with the result, because t then refers to None instead of the sorted list.
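
The following sketch shows why assigning the result of sort() loses the list. It also uses the built-in function sorted(), which is not covered in this section, as one way to get a sorted copy instead.

```python
t = ['d', 'c', 'e', 'b', 'a']

result = t.sort()        # sorts t in place; the method itself returns None
print(result)            # None
print(t)                 # ['a', 'b', 'c', 'd', 'e']

# sorted() leaves its argument alone and returns a new sorted list.
letters = ['d', 'c', 'e']
new_list = sorted(letters)
print(letters, new_list)
```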

7. Deleting elements
 There are several ways to delete elements from a list. If we know the index of the element we want,
we can use pop:

>>> t = ['a', 'b', 'c']


>>> x = t.pop(1)
>>> print(t)
['a', 'c']
>>> print(x)
b
 pop modifies the list and returns the element that was removed. If we don’t provide an index,
it deletes and returns the last element.
 If we don't need the removed value, we can use the del statement:

>>> t = ['a', 'b', 'c']


>>> del t[1]
>>> print(t)
['a', 'c']

 If we know the element we want to remove (but not the index), we can use remove:

>>> t = ['a', 'b', 'c']


>>> t.remove('b')
>>> print(t)
['a', 'c']

 The return value from remove is None.


 To remove more than one element, we can use del with a slice index:

>>> t = ['a', 'b', 'c', 'd', 'e', 'f']


>>> del t[1:5]
>>> print(t)
['a', 'f']

 As usual, the slice selects all the elements up to, but not including, the second index.

8. Lists and functions


 There are a number of built-in functions that can be used on lists that allow us to quickly look
through a list without writing our own loops:

>>> nums = [3, 41, 12, 9, 74, 15]


>>> print(len(nums))
6
>>> print(max(nums))
74
>>> print(min(nums))
3
>>> print(sum(nums))
154
>>> print(sum(nums)/len(nums))
25.666666666666668

 The sum() function only works when the list elements are numbers. The functions max() and min()
also work with lists of strings and other types that can be compared, and len() works with any list.

The program to compute an average without a list:

total = 0
count = 0
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    total = total + value
    count = count + 1
average = total / count
print('Average:', average)

The program to compute an average with a list:

numlist = list()
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    numlist.append(value)
average = sum(numlist) / len(numlist)
print('Average:', average)

 We make an empty list before the loop starts, and then each time we have a number, we append it to
the list.
 At the end of the program, we simply compute the sum of the numbers in the list and divide it by the
count of the numbers in the list to come up with the average.

9. Lists and strings


 A string is a sequence of characters and a list is a sequence of values, but a list of characters is not
the same as a string.
 To convert from a string to a list of characters, we can use list:

>>> s = 'spam'
>>> t = list(s)
>>> print(t)
['s', 'p', 'a', 'm']

 The list function breaks a string into individual letters. If we want to break a string into words, we
can use the split method:

>>> s = 'pining for the fjords'


>>> t = s.split()
>>> print(t)
['pining', 'for', 'the', 'fjords']
>>> print(t[2])
the

 We can call split with an optional argument called a delimiter that specifies which characters to use
as word boundaries.
 The following example uses a hyphen as a delimiter:

>>> s = 'spam-spam-spam'
>>> delimiter = '-'
>>> s.split(delimiter)
['spam', 'spam', 'spam']

 join is the inverse of split. It takes a list of strings and concatenates the elements.
 join is a string method, so you have to invoke it on the delimiter and pass the list as a parameter:

>>> t = ['pining', 'for', 'the', 'fjords']


>>> delimiter = ' '
>>> delimiter.join(t)
'pining for the fjords'
 In this case the delimiter is a space character, so join puts a space between words.
 To concatenate strings without spaces, we can use the empty string, '', as a delimiter.
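
A short sketch of split and join used together. The variable names rejoined and squeezed are illustrative only.

```python
s = 'pining for the fjords'

words = s.split()             # break the string into words
rejoined = ' '.join(words)    # a space as delimiter rebuilds the sentence
squeezed = ''.join(words)     # the empty string joins with nothing in between
fields = 'spam-spam-spam'.split('-')

print(rejoined == s)          # True
print(squeezed)               # piningforthefjords
print(fields)                 # ['spam', 'spam', 'spam']
```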

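10. Parsing lines
 Suppose we want to print the day of the week from each line that starts with "From ", such as:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
 We can find such lines with startswith, break each one into words with split, and print the third word: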


fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From '):
        continue
    words = line.split()
    print(words[2])

Output:
Sat Fri Fri Fri
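
The same parsing steps can be tried without a file. In this sketch the list lines is made-up sample data standing in for the contents of a mail file, and the second address is hypothetical.

```python
# Made-up sample lines standing in for the contents of a mail file.
lines = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
    'Subject: hello\n',
    'From someone@example.com Fri Jan  4 18:10:48 2008\n',
]

days = []
for line in lines:
    line = line.rstrip()
    # 'From ' (with a trailing space) skips header lines such as 'From:'.
    if not line.startswith('From '):
        continue
    words = line.split()      # split() with no argument collapses runs of spaces
    days.append(words[2])     # the third word is the day of the week

print(days)
```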

11. Objects and values


 If we execute these assignment statements:
a = 'banana'
b = 'banana'
 We know that a and b both refer to a string, but we don’t know whether they refer to the same string.
There are two possible states:

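[Figure: Variables and Objects. Left: a and b each refer to a separate 'banana' string object. Right: a and b both refer to the same 'banana' string object.]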

 In one case, a and b refer to two different objects that have the same value. In the second case, they
refer to the same object.
 To check whether two variables refer to the same object, we can use the is operator.
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
 But when we create two lists, we get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
 In this case we would say that the two lists are equivalent, because they have the same elements, but
not identical, because they are not the same object.
 If two objects are identical, they are also equivalent, but if they are equivalent, they are not
necessarily identical.

12. Aliasing
 If a refers to an object and you assign b = a, then both variables refer to the same object:

>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True

 The association of a variable with an object is called a reference.


 In this example, there are two references to the same object.
 An object with more than one reference has more than one name, so we say that the object is
aliased.
 If the aliased object is mutable, changes made with one alias affect the other:
>>> b[0] = 17
>>> print(a)
[17, 2, 3]


 In general, it is safer to avoid aliasing when you are working with mutable objects.
 For immutable objects like strings, aliasing is not as much of a problem.

13. List arguments
 When you pass a list to a function, the function gets a reference to the list. If the function modifies a
list parameter, the caller sees the change.
 For example, delete_head removes the first element from a list:

def delete_head(t):
    del t[0]

>>> letters = ['a', 'b', 'c']


>>> delete_head(letters)
>>> print(letters)
['b', 'c']

 The parameter t and the variable letters are aliases for the same object.
 It is important to distinguish between operations that modify lists and operations that create new
lists. For example, the append method modifies a list, but the + operator creates a new list:
>>> t1 = [1, 2]
>>> t2 = t1.append(3)
>>> print(t1)
[1, 2, 3]
>>> print(t2)
None

>>> t3 = t1 + [3]
>>> print(t3)
[1, 2, 3, 3]
>>> t2 is t3
False
 This difference is important when you write functions that are supposed to modify lists. For
example, this function does not delete the head of a list:

def bad_delete_head(t):
    t = t[1:] # WRONG!

 The slice operator creates a new list and the assignment makes t refer to it, but none of that has any
effect on the list that was passed as an argument.
 An alternative is to write a function that creates and returns a new list. For example, tail returns all
but the first element of a list:

def tail(t):
    return t[1:]

 This function leaves the original list unmodified. Here’s how it is used:

>>> letters = ['a', 'b', 'c']


>>> rest = tail(letters)
>>> print(rest)
['b', 'c']
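
The three functions from this section can be compared side by side in one sketch. The name after_bad is introduced here only to record the state of the list after the call.

```python
def delete_head(t):
    del t[0]            # modifies the caller's list in place

def bad_delete_head(t):
    t = t[1:]           # rebinds the local name only; the caller's list is unchanged

def tail(t):
    return t[1:]        # returns a new list; the original is unchanged

letters = ['a', 'b', 'c']

bad_delete_head(letters)
after_bad = letters[:]      # snapshot: still ['a', 'b', 'c']

rest = tail(letters)        # ['b', 'c'], letters is untouched
delete_head(letters)        # now letters itself is ['b', 'c']

print(after_bad, rest, letters)
```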


CHAPTER 02: DICTIONARIES


 A dictionary is like a list, but more general.
 In a list, the index positions have to be integers; in a dictionary, the indices can be (almost) any type.
 We can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values.
 Each key maps to a value.
 The association of a key and a value is called a key-value pair or an item.
 The function dict creates a new dictionary with no items.
 Because dict is the name of a built-in function, we should avoid using it as a variable name.

>>> eng2sp = dict()


>>> print(eng2sp)
{}

 The curly brackets, {}, represent an empty dictionary.


 To add items to the dictionary, we can use square brackets:
>>> eng2sp['one'] = 'uno'

 This line creates an item that maps from the key 'one' to the value 'uno'.
 If we print the dictionary again, we see a key-value pair with a colon between the key and value:
>>> print(eng2sp)
{'one': 'uno'}

 This output format is also an input format.


 For example, we can create a new dictionary with three items and print it:
>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
>>> print(eng2sp)
{'one': 'uno', 'two': 'dos', 'three': 'tres'}

 Since Python 3.7, a dictionary keeps its items in the order in which they were inserted. In older
versions the order of items was unpredictable, so the printed order could differ from the order in which the items were written.
 Either way, the elements of a dictionary are never indexed with integer indices.
Instead, we use the keys to look up the corresponding values:
>>> print(eng2sp['two'])
dos
 The key 'two' always maps to the value 'dos', so the order of the items doesn't matter.
 If the key isn’t in the dictionary, we get an exception:
>>> print(eng2sp['four'])
KeyError: 'four'

 The len function works on dictionaries; it returns the number of key-value pairs:
>>> len(eng2sp)
3
 The in operator works on dictionaries; it tells us whether something appears as a key in the
dictionary (appearing as a value is not good enough).
>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False
 To see whether something appears as a value in a dictionary, we can use the method values, which
returns the values of the dictionary; we convert them to a list and then use the in operator:
>>> vals = list(eng2sp.values())
>>> 'uno' in vals
True
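 The in operator uses different algorithms for lists and dictionaries. For lists, it uses a linear search
algorithm, so the search time grows with the length of the list. For dictionaries, Python uses a hash table,
so the in operator takes about the same amount of time no matter how many items the dictionary holds.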

1. Dictionary as a set of counters


 Suppose there is a string and we want to count how many times each letter appears. There are
several ways we could do it:

1. We could create 26 variables, one for each letter of the alphabet. Then we could traverse the string
and, for each character, increment the corresponding counter, probably using a chained conditional.
2. We could create a list with 26 elements. Then we could convert each character to a number (using the
built-in function ord), use the number as an index into the list, and increment the appropriate counter.
3. We could create a dictionary with characters as keys and counters as the corresponding values. The
first time we see a character, we would add an item to the dictionary. After that we would increment the
value of an existing item.

 Each of these options performs the same computation, but each of them implements that
computation in a different way.
 An implementation is a way of performing a computation; some implementations are better than
others.
 For example, an advantage of the dictionary implementation is that we don’t have to know ahead of
time which letters appear in the string and we only have to make room for the letters that do appear.
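
For comparison, the second option (a list of 26 counters) might look like the sketch below. It assumes the text contains only the lowercase letters a to z; any other character would give a wrong or out-of-range index.

```python
word = 'brontosaurus'

# A list of 26 counters, one per letter, all starting at zero.
# Assumption: the text contains only lowercase ASCII letters a-z.
counters = [0] * 26
for c in word:
    index = ord(c) - ord('a')   # 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
    counters[index] += 1

o_count = counters[ord('o') - ord('a')]
z_count = counters[ord('z') - ord('a')]
print(o_count, z_count)
```

Here room is reserved for all 26 letters even though only 8 of them occur. The dictionary implementation of the same count follows.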

word = 'brontosaurus'
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1
print(d)

Output: {'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}

 The histogram indicates that the letters 'a' and 'b' appear once; 'o' appears twice, and so on.
 Dictionaries have a method called get that takes a key and a default value.
 If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the
default value.

>>> counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}


>>> print(counts.get('jan', 0))
100
>>> print(counts.get('tim', 0))
0

 We can use get to write our histogram loop more concisely. Because the get method automatically
handles the case where a key is not in a dictionary, we can reduce four lines down to one and
eliminate the if statement.

word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c,0) + 1
print(d)


2. Dictionaries and files


 One of the common uses of a dictionary is to count the occurrence of words in a file with some
written text.
Program: counting words in a file whose text contains no punctuation.

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print(counts)

3. Looping and dictionaries


 If we use a dictionary as the sequence in a for statement, it traverses the keys of the
dictionary.
 This loop prints each key and the corresponding value:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    print(key, counts[key])

Output:
chuck 1
annie 42
jan 100

 This loop prints keys having value more than 10:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    if counts[key] > 10 :
        print(key, counts[key])

Output:
annie 42
jan 100


 This loop prints each key and the corresponding value in sorted order:
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = list(counts.keys())
print(lst)
lst.sort()
for key in lst:
    print(key, counts[key])

Output:
['chuck', 'annie', 'jan']
annie 42
chuck 1
jan 100

4. Advanced Text Parsing


 Python strings provide a method called translate, which is used together with str.maketrans. The syntax is:
line.translate(str.maketrans(fromstr, tostr, deletestr))
 This replaces each character in fromstr with the character in the same position in tostr and deletes
all characters that are in deletestr.
 The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.
 To delete all punctuation, we do not need to list every character ourselves; the string module
provides them in the constant string.punctuation:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
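
A small sketch of the translate call described above, applied to a made-up sample sentence. The names no_punct and swapped are illustrative only.

```python
import string

line = 'Hello, World! How are you?'

# Empty fromstr and tostr: nothing is replaced, all punctuation is deleted.
no_punct = line.translate(str.maketrans('', '', string.punctuation))

# Replace 'l' with 'L' and 'o' with '0', and delete '!' and '?'.
swapped = line.translate(str.maketrans('lo', 'L0', '!?'))

print(no_punct)   # Hello World How are you
print(swapped)    # HeLL0, W0rLd H0w are y0u
```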

Program: counting words in a file whose text contains punctuation.

import string
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:
    line = line.rstrip()
    # Delete all punctuation characters, then convert to lower case.
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print(counts)


CHAPTER 03: TUPLES


1. Tuples are immutable

 A tuple is a sequence of values much like a list.
 The values stored in a tuple can be any type, and they are indexed by integers.
 The important difference is that tuples are immutable.
 Tuples are also comparable and hashable, so we can sort lists of them and use tuples as keys in
Python dictionaries.
 Syntactically, a tuple is a comma-separated list of values, usually enclosed in parentheses: t = ('a', 'b', 'c', 'd', 'e')
 To create a tuple with a single element, you have to include the final comma:
>>> t1 = ('a',)
>>> type(t1)
<class 'tuple'>

 Without the comma, Python treats ('a') as an expression with a string in parentheses that evaluates to a
string:
>>> t2 = ('a')
>>> type(t2)
<class 'str'>

 Another way to create a tuple is the built-in function tuple(). With no argument, it creates an empty tuple:


>>> t = tuple()
>>> print(t)
()

 If we provide a sequence (a list, a string or a tuple) as an argument to the function tuple(), a tuple
with the elements of the given sequence is created:
>>> t = tuple('lupins')
>>> print(t)
('l', 'u', 'p', 'i', 'n', 's')
Eg:
Create tuple using string:
>>> t = tuple('Hello')
>>> print(t)
('H', 'e', 'l', 'l', 'o')
Create tuple using list:
>>> t = tuple([3, [12, 5], 'Hi'])
>>> print(t)
(3, [12, 5], 'Hi')
Create tuple using another tuple:
>>> t = ('Mango', 34, 'hi')
>>> t1 = tuple(t)
>>> print(t1)
('Mango', 34, 'hi')
>>> t is t1
True

 Elements in the tuple can be extracted using square-brackets with the help of indices. Similarly, slicing
also can be applied to extract required number of items from tuple.
>>> t=('Mango', 'Banana', 'Apple')
>>> print(t[1])
Banana
>>> print(t[1:])
('Banana', 'Apple')

>>> print(t[-1])
Apple

 Trying to modify an element of a tuple raises an error, because tuples are immutable:
>>> t[0]='Kiwi'
TypeError: 'tuple' object does not support item assignment

2. Comparing tuples

 The comparison operators work with tuples and other sequences.


 Tuples can be compared using operators like >, <, >=, == etc.
 Python starts by comparing the first element from each sequence. If they are equal, it goes on to the
next element, and so on, until it finds elements that differ. Subsequent elements are not considered.

>>> (0, 1, 2) < (0, 3, 4)


True
>>> (0, 1, 2000000) < (0, 3, 4)
True
>>> (1,2,3)==(1,2,5)
False
>>> (3,4)==(3,4)
True

 The sort() method works the same way: it sorts primarily by the first element and, in case of a
tie, by the second element, and so on. This feature lends itself to a pattern called DSU:
 Decorate a sequence by building a list of tuples with one or more sort keys
preceding the elements from the sequence,
 Sort the list of tuples using the Python built-in sort(), and
 Undecorate by extracting the sorted elements of the sequence.

 For example, suppose you have a list of words and you want to sort them from longest to shortest:
txt = 'but soft what light in yonder window breaks'
words = txt.split()
t = list()
for word in words:
    t.append((len(word), word))

t.sort(reverse=True)
res = list()

for length, word in t:
    res.append(word)

print(res)
 The output of the program is as follows:
['yonder', 'window', 'breaks', 'light', 'what', 'soft', 'but', 'in']


 In the above program, we split the sentence into a list of words. Then, for each word, a tuple containing the length
of the word and the word itself is created and appended to a list.
 The list t is therefore a list of tuples. We then sort this list in descending order.
The length of the word is compared first, because it is the first element of each tuple; ties are broken by the word itself.
 At the end, we unpack the length and the word from each tuple, and create another list containing only the words and
print it.
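
As a side note, the same ordering can be obtained without building the decorated list by hand. The sketch below uses the built-in sorted() with a key function, which is a different technique from the DSU pattern shown above; the names dsu_result and key_result are illustrative.

```python
txt = 'but soft what light in yonder window breaks'
words = txt.split()

# DSU, written compactly: decorate, sort, undecorate.
decorated = [(len(word), word) for word in words]
decorated.sort(reverse=True)
dsu_result = [word for length, word in decorated]

# Alternative: let sorted() compute the same (length, word) key for each word.
key_result = sorted(words, key=lambda w: (len(w), w), reverse=True)

print(dsu_result == key_result)   # True
```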

3. Tuple assignment

 Python allows a tuple of variables on the left-hand side (LHS) of an assignment statement. This allows us to assign values
to multiple variables at a time.
>>> m = [ 'have', 'fun' ]
>>> x, y = m
>>> x
'have'
>>> y
'fun'

 Python roughly translates the tuple assignment syntax to be the following:
>>> m = [ 'have', 'fun' ]
>>> x = m[0]
>>> y = m[1]
>>> x
'have'
>>> y
'fun'

 When we use a tuple on the left side of the assignment statement, we usually omit the parentheses, but the
following is an equally valid syntax:
>>> m = [ 'have', 'fun' ]
>>> (x, y) = m
>>> x
'have'
>>> y
'fun'

 The best known example of assignment of tuples is swapping two values as below:
>>> a=10
>>> b=20
>>> a, b = b, a
>>> print(a, b)
20 10

 Giving more values than variables generates ValueError –


>>> a, b=10,20,5
ValueError: too many values to unpack (expected 2)


 While doing assignment of multiple variables, the RHS can be any type of sequence like list, string or
tuple. Following example extracts user name and domain from an email ID.
>>> addr = 'monty@python.org'
>>> uname, domain = addr.split('@')
>>> print(uname)
monty
>>> print(domain)
python.org

4. Dictionaries and tuples

 Dictionaries have a method called items() that returns the key-value pairs as tuples. We can convert
the result to a list of tuples as shown below:
>>> d = {'b':1, 'a':10, 'c':22}
>>> t = list(d.items())
>>> print(t)
[('b', 1), ('a', 10), ('c', 22)]

 A dictionary lists its items in insertion order (before Python 3.7 the order was unpredictable), not in
key order. To print them in key order, we can use sort() on the list:
>>> d = {'b':1, 'a':10, 'c':22}
>>> t = list(d.items())
>>> print(t)
[('b', 1), ('a', 10), ('c', 22)]
>>> t.sort()
>>> print(t)
[('a', 10), ('b', 1), ('c', 22)]

 The new list is sorted in ascending alphabetical order by key.

5. Multiple assignment with dictionaries

 We can combine the method items(), tuple assignment and a for-loop to get a pattern for traversing
a dictionary:
d={'a':10, 'b':1, 'c':22}
for key, val in list(d.items()):
    print(val,key)

The output of this loop is:
10 a
1 b
22 c

 This loop has two iteration variables because items() yields a sequence of tuples, and key, val is a tuple
assignment that successively iterates through each of the key-value pairs in the dictionary.

 For each iteration through the loop, both key and val are advanced to the next key-value pair in the
dictionary (in insertion order since Python 3.7).

 Once we get a key-value pair, we can create a list of tuples and sort them:
>>> d = {'a':10, 'b':1, 'c':22}
>>> l = list()
>>> for key,val in d.items():
...     l.append((val,key))
...
>>> l
[(10, 'a'), (1, 'b'), (22, 'c')]
>>> l.sort(reverse=True)
>>> l
[(22, 'c'), (10, 'a'), (1, 'b')]

6. The most common words

 We will apply what we have learned so far about strings, tuples, lists and dictionaries to solve a
problem: write a program to find the most commonly used words in a text file.
 The logic of the program is:
 Open a file.
 Use a loop to iterate through every line of the file.
 Remove all punctuation marks and convert the letters to lower case.
 Use a loop to iterate over every word in a line.
 If the word is not in the dictionary, treat that word as a key and initialize its value to 1. If the word
is already in the dictionary, increment the value.
 Once all the lines in the file are processed, we have a dictionary containing the distinct words and their
frequencies. Now, take a list and append each (frequency, word) pair to it.
 Sort the list in descending order and display only 10 (or any number of) elements from the list to get
the most frequent words.

import string
fhand = open('test.txt')
counts = dict()
for line in fhand:
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

# Build a list of (count, word) tuples and sort it from most to least frequent.
lst = list()
for key, val in list(counts.items()):
    lst.append((val, key))
lst.sort(reverse=True)

# Print the ten most frequent words, count first.
for val, key in lst[:10]:
    print(val, key)
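
To try the logic without a file, the sketch below runs the same steps on a made-up sample text held in a string. It uses get() to shorten the counting step and shows the top three words instead of ten.

```python
import string

# Made-up sample text standing in for the contents of a file.
text = """The quick brown fox jumps over the lazy dog.
The dog barks; the fox runs away!
A quick fox is a clever fox."""

counts = dict()
for line in text.splitlines():
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1

# (count, word) tuples sort by count first, then by word.
lst = [(val, key) for key, val in counts.items()]
lst.sort(reverse=True)
top3 = lst[:3]

print(top3)   # [(4, 'the'), (4, 'fox'), (2, 'quick')]
```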

7. Using tuples as keys in dictionaries

 Because tuples are hashable and lists are not, we use tuples when we want a dictionary with composite
keys. For example, we may need to create a telephone directory where the key is a
(last name, first name) pair and the value is the telephone number.

directory[last,first] = number
 The expression in brackets is a tuple. We could use tuple assignment in a for loop to traverse this
dictionary.
for last, first in directory:
    print(first, last, directory[last,first])

 This loop traverses the keys in directory, which are tuples. It assigns the elements of each tuple to
last and first, then prints the name and corresponding telephone number.
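
A runnable sketch of such a directory follows. The names and telephone numbers are made up for illustration, and the final try block only demonstrates that a list is rejected as a key.

```python
# Made-up names and numbers, used only for illustration.
directory = dict()
directory['Cleese', 'John'] = '555-0101'
directory['Idle', 'Eric'] = '555-0102'

entries = []
for last, first in directory:
    entries.append((first, last, directory[last, first]))
    print(first, last, directory[last, first])

# A list cannot be used as a key because lists are not hashable.
try:
    directory[['Palin', 'Michael']] = '555-0103'
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(list_key_ok)   # False
```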

8. Sequences: Strings, Lists, and Tuples –Oh My!

 Strings are more limited than other sequences such as lists and tuples, because the elements of a
string must be characters. Moreover, strings are immutable. Hence, if we need to modify the
characters in a sequence, it is better to use a list of characters than a string.
 As lists are mutable, they are used more often than tuples. But in some situations, as given
below, tuples are preferable.
 In some contexts, like a return statement, it is syntactically simpler to create a tuple than a list.
 When a dictionary key must be a sequence of elements, we must use an immutable type like a string
or a tuple.
 When a sequence of elements is passed to a function as an argument, using a tuple reduces the potential for
unexpected behavior due to aliasing.
 As tuples are immutable, they do not provide methods like sort() and reverse(). But Python
provides the built-in functions sorted() and reversed(), which take a sequence as an argument:
sorted() returns a new sorted list, and reversed() returns an iterator over the elements in reverse order.
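
A brief sketch of sorted() and reversed() applied to a tuple. Wrapping the result in tuple() is one way to get a tuple back; the variable names are illustrative.

```python
t = (3, 1, 2)

sorted_list = sorted(t)               # a new list: [1, 2, 3]
sorted_tuple = tuple(sorted(t))       # converted back to a tuple: (1, 2, 3)
reversed_tuple = tuple(reversed(t))   # reversed() gives an iterator: (2, 1, 3)

print(sorted_list, sorted_tuple, reversed_tuple)
print(t)                              # the original tuple is unchanged: (3, 1, 2)
```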


9. Debugging

 Reading: Examine your code, read it again and check that it says what you meant to say.
 Running: Experiment by making changes and running different versions. Often if you display the
right thing at the right place in the program, the problem becomes obvious, but sometimes you have to
spend some time to build scaffolding.
 Ruminating: Take some time to think! What kind of error is it: syntax, runtime,
semantic? What information can you get from the error messages, or from the output of the program?
What kind of error could cause the problem you’re seeing? What did you change last, before the
problem appeared?
 Retreating: At some point, the best thing to do is back off, undoing recent changes, until you get back
to a program that works and that you understand. Then you can start rebuilding.



MODULE 4 PYTHON

CHAPTER 04: REGULAR EXPRESSIONS

 Searching for required patterns and extracting only the lines/words matching the pattern is
a very common task in solving problems programmatically. We have done such tasks
earlier using string slicing and string methods like split(), find() etc.
 As the task of searching and extracting is very common, Python provides a powerful
library called regular expressions to handle these tasks elegantly. Though they have quite
complicated syntax, they provide an efficient way of searching for patterns.
 Regular expressions are themselves little programs to search and parse strings.
 To use them in our program, the library/module re must be imported.
 There is a search() function in this module, which is used to find a particular substring
within a string. Consider the following example:

import re
fhand = open('myfile.txt')
for line in fhand:
    line = line.rstrip()
    if re.search('how', line):
        print(line)

For a myfile.txt that contains these lines, the output would be:

hello, how are you?
how about you?

In the above program, the search() function is used to find the lines that contain the string
how.

 Regular expressions make use of special characters with specific meanings. In the
following example, we make use of the caret (^) symbol, which matches the beginning of the
line.
import re
hand = open('myfile.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line):
        print(line)
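
The effect of the caret can be seen by running both patterns on the same lines. In this sketch the list lines is made-up sample data standing in for the contents of myfile.txt.

```python
import re

# Made-up sample lines standing in for the contents of myfile.txt.
lines = [
    'From: alice@example.com',
    'Subject: hello, how are you?',
    'Reply sent From: the office',
]

# Without the caret, the pattern may match anywhere in the line.
contains_from = [line for line in lines if re.search('From:', line)]

# With the caret, the pattern must match at the beginning of the line.
starts_with_from = [line for line in lines if re.search('^From:', line)]

print(len(contains_from))    # 2
print(starts_with_from)      # ['From: alice@example.com']
```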

1. Character Matching in Regular Expressions

 Regular expressions provide a set of metacharacters for describing search patterns. Table 4.1 lists
a few important metacharacters, and Table 4.2 gives some examples for quick and easy
understanding of regular expressions.


Table 4.1 List of Important Meta-Characters

Character    Meaning
^ (caret)    Matches beginning of the line
$            Matches end of the line
. (dot)      Matches any single character except newline. With the re.DOTALL flag (option s), newline can also be matched
[…]          Matches any single character in brackets
[^…]         Matches any single character NOT in brackets
re*          Matches 0 or more occurrences of preceding expression.
re+          Matches 1 or more occurrences of preceding expression.
re?          Matches 0 or 1 occurrence of preceding expression.
re{n}        Matches exactly n occurrences of preceding expression.
re{n,}       Matches n or more occurrences of preceding expression.
re{n,m}      Matches at least n and at most m occurrences of preceding expression.
a|b          Matches either a or b.
(re)         Groups regular expressions and remembers matched text.
\d           Matches digits. Equivalent to [0-9] for ASCII text.
\D           Matches non-digits.
\w           Matches word characters.
\W           Matches non-word characters.
\s           Matches whitespace. Equivalent to [ \t\n\r\f\v] for ASCII text.
\S           Matches non-whitespace.
\A           Matches beginning of string.
\Z           Matches end of string. In Python's re module it matches only at the very end, not before a trailing newline.
\z           Matches end of string in other regex flavors; in Python's re module, \Z is the usual way to write this.
\b           Matches the empty string, but only at the start or end of a word.
\B           Matches the empty string, but not at the start or end of a word.
( )          When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall()

Table 4.2 Examples for Regular Expressions


Expression Description
[Pp]ython Match "Python" or "python"
rub[ye] Match "ruby" or "rube"
[aeiou] Match any one lowercase vowel
[0-9] Match any digit; same as [0123456789]
[a-z] Match any lowercase ASCII letter
[A-Z] Match any uppercase ASCII letter
[a-zA-Z0-9] Match any uppercase letter, lowercase letter or digit
[^aeiou] Match anything other than a lowercase vowel
[^0-9] Match anything other than a digit
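
A few of the character classes in Table 4.2 can be checked with a short sketch. The string text below is made up for illustration and is not taken from any of the files used in these notes.

```python
import re

# Made-up sample text for illustration.
text = 'Python 3 and python 2; ruby, rube, ruba.'

# [Pp] matches either capitalisation of the first letter.
py = re.findall('[Pp]ython', text)

# [ye] accepts 'ruby' and 'rube' but rejects 'ruba'.
rub = re.findall('rub[ye]', text)

# [0-9] matches one digit at a time.
digits = re.findall('[0-9]', text)

# A leading ^ inside brackets negates the class: here, anything that is
# not a letter, a digit or a space.
punct = re.findall('[^a-zA-Z0-9 ]', text)

print(py, rub, digits, punct)
```

Note that ^ means negation only as the first character inside square brackets; outside brackets it anchors the match to the beginning of the line.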


 The most commonly used metacharacter is the dot, which matches any single character.
Consider the following example, where the regular expression searches for lines that
start with F, then have any two characters (one for each dot), then the character m
and then a colon.
import re
fhand = open('myfile.txt')
for line in fhand:
    line = line.rstrip()
    if re.search('^F..m:', line):
        print(line)

 Note that the regular expression ^F..m: would match any of the strings ‘From:’, ‘Fxxm:’,
‘F12m:’ and so on. That is, between F and m there can be any two characters.
 In the previous program, we knew that there are exactly two characters between F and m,
so we could give two dots. When we do not know the exact number of
characters between two characters (or strings), we can use the dot and + symbols
together: .+ matches one or more of any character. Consider the program given below:

# Search for lines that start with From: and have an at sign
import re
hand = open('myfile.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:.+@', line):
        print(line)
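
To see which lines the pattern ^From:.+@ accepts and rejects, here is a small sketch. The list candidates is hypothetical test data chosen to cover each way a line can fail to match.

```python
import re

pattern = '^From:.+@'

# Hypothetical candidate lines, one per case.
candidates = [
    'From: stephen.marquard@uct.ac.za',  # starts with From:, has text before '@'
    'From:@uct.ac.za',                   # nothing between ':' and '@', so .+ fails
    'From: no at sign here',             # no '@' at all
    'Mail From: a@b.c',                  # 'From:' is not at the beginning
]

matched = [s for s in candidates if re.search(pattern, s)]
print(matched)
```

The second candidate is rejected because + demands at least one character; with * in place of + it would have been accepted.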

2. Extracting Data using Regular Expressions

 The re module provides a function findall() to extract all of the substrings matching a
regular expression. This function returns a list of all non-overlapping matches in the string.
If no match is found, the function returns an empty list.
 Consider an example of extracting anything that looks like an email address from a line.
import re
s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'
lst = re.findall(r'\S+@\S+', s)
print(lst)

The output would be –


['csev@umich.edu', 'cwen@iupui.edu']

Here, the pattern requires at least one non-whitespace character (\S) before @ and at
least one non-whitespace character after @. Hence, it will not match @2PM, because
there is a white-space before @.

 Program to extract all email IDs from a file:

import re
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    x = re.findall(r'\S+@\S+', line)
    if len(x) > 0:
        print(x)

 Here, the condition len(x) > 0 is checked because we want to print only the lines
that contain an email ID.
 If a line has no match for the given pattern, the findall() function returns
an empty list. The length of an empty list is zero, so we print the
result only when its length is greater than 0.

The output of above program will be something as below –

['stephen.marquard@uct.ac.za']
['<postmaster@collab.sakaiproject.org>']
['<200801051412.m05ECIaH010327@nakamura.uits.iupui.edu>']
['<source@collab.sakaiproject.org>;']
['<source@collab.sakaiproject.org>;']
['<source@collab.sakaiproject.org>;']
['apache@localhost)']
……………………………….
………………………………..

 Some of the strings extracted above carry unwanted characters such as <, > or ; at
either end. To avoid this, we require the email ID to start with a letter or digit and to
end with a letter. Hence, the statement would be:
x = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)
eg:
import re
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    x = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)
    if len(x) > 0:
        print(x)
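
The effect of the stricter pattern can be compared with the loose one on a few strings. This is a sketch, assuming the list raw holds three of the strings from the earlier output; the variable names are illustrative.

```python
import re

# Three strings taken from the earlier output, each with stray characters.
raw = [
    '<postmaster@collab.sakaiproject.org>',
    '<source@collab.sakaiproject.org>;',
    'apache@localhost)',
]

loose = r'\S+@\S+'                       # any non-whitespace around '@'
strict = r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]'   # letter/digit first, letter last

loose_result = [re.findall(loose, s) for s in raw]
strict_result = [re.findall(strict, s) for s in raw]

print(loose_result)
print(strict_result)
```

The strict pattern drops the leading < because the match cannot start there, and drops trailing >, ; and ) because the match must end on a letter.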

3. Combining Searching and Extracting

 Assume that we need to extract data that follows a particular syntax. For example, we need
to extract lines of the following format:
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
 The line should start with X-, followed by 0 or more characters. Then we need a colon
and a white-space, which are written as they are. Then there must be a number made of one
or more digits, with or without a decimal point. Note that inside square brackets the dot
is a literal character, not a metacharacter. The pattern for the regular expression would be:
^X-.*: [0-9.]+

The complete program, which writes the part before the colon as X\S* (X followed by
non-whitespace characters), is:


# Search for lines that start with 'X' followed by any non-whitespace characters and ':',
# followed by a space and any number. The number can include a decimal.

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search(r'^X\S*: [0-9.]+', line):
        print(line)

The output lines will be as below:


X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6178
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6961
X-DSPAM-Probability: 0.0000
……………………………………………………
……………………………………………………

 Assume that we want only the numbers (representing confidence, probability etc.) from the
above output. We could use the split() function on each extracted string, but it is better to
refine the regular expression. To do so, we need the help of parentheses.

 When we add parentheses to a regular expression, they are ignored when matching the
string. But when we are using findall(), parentheses indicate that while we want the whole
expression to match, we are interested in extracting only the portion of the match that
lies inside the parentheses.

# Search for lines that start with 'X-' followed by any non-whitespace characters and
# ':' followed by a space and any number. The number can include a decimal. Then
# print the extracted number if one was found.

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall(r'^X-\S*: ([0-9.]+)', line)
    if len(x) > 0:
        print(x)

 Because of the parentheses, findall() still matches the whole pattern starting
with X-, but returns only the number portion. Now, the output would be:
['0.8475']
['0.0000']
['0.6178']
['0.0000']
['0.6961']
…………………
………………..
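
Since findall() returns strings, the extracted numbers must be converted before any arithmetic. The following is a minimal sketch, assuming a hand-made list sample_lines in place of mbox-short.txt. It also assumes each captured string is a well-formed number, since [0-9.]+ alone would accept something like 1.2.3, on which float() would fail.

```python
import re

# Hypothetical lines standing in for the file contents.
sample_lines = [
    'X-DSPAM-Confidence: 0.8475',
    'X-DSPAM-Probability: 0.0000',
    'X-DSPAM-Confidence: 0.6178',
    'Subject: not a header of interest',
]

values = []
for line in sample_lines:
    # Only the part inside the parentheses is returned by findall().
    found = re.findall(r'^X-\S*: ([0-9.]+)', line)
    if found:
        # found is a list of strings; convert the single capture to float.
        values.append(float(found[0]))

print(values)
if values:
    print('average:', sum(values) / len(values))
```

Keeping the numbers in a list makes it easy to compute a sum, an average, or a maximum afterwards.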

 Another example of similar form: The file mbox-short.txt contains lines like –


Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

# Search for lines that start with 'Details:' and contain 'rev=' followed by digits and '.'
# Then print the extracted number if one was found.

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('^Details:.*rev=([0-9.]+)', line)
    if len(x) > 0:
        print(x)

 The regex here indicates that the line must start with Details:, followed by anything, then
rev= and then digits. As we want only those digits, we put parentheses around that portion
of the expression.
 Note that the expression [0-9.]+ is greedy: it keeps grabbing digits (and dots) until it
finds any other character, so it can match a very long number.
 The output of the above program is a set of revision numbers as given below:
['39772']
['39771']
['39770']
['39769']
………………………
………………………
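
Greedy matching can be seen by varying only the quantifier on the sample line. This sketch uses the Details: line shown earlier. The lazy form +? is a standard re qualifier that is not otherwise covered in these notes and is included only for contrast.

```python
import re

line = 'Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772'

# + is greedy: it takes as many digits (and dots) as it can.
greedy = re.findall(r'rev=([0-9.]+)', line)

# Without a quantifier, the class matches exactly one character.
single = re.findall(r'rev=([0-9.])', line)

# +? is the lazy form: it stops as soon as the rest of the pattern is
# satisfied, which here is after one character.
lazy = re.findall(r'rev=([0-9.]+?)', line)

print(greedy, single, lazy)
```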

 Consider another example – we may be interested in knowing time of a day of each email.
The file mbox-short.txt has lines like –

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

# Search for lines that start with 'From ' and have a space followed by a two-digit
# number between 00 and 99 followed by ':'. Then print the number if one was found.

import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('^From .* ([0-9][0-9]):', line)
    if len(x) > 0:
        print(x)
 Here, [0-9][0-9] indicates that exactly two digits must appear. An alternative way of
writing this would be:
x = re.findall('^From .* ([0-9]{2}):', line)
 The number 2 within braces indicates that the preceding expression must appear
exactly two times. Hence [0-9]{2} matches exactly two digits. Now, the
output would be:

['09']


['18']
['16']
['15']
…………………
…………………
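
The two ways of writing the hour pattern can be checked against the sample From line. This is a small sketch; the names hour_a and hour_b are illustrative.

```python
import re

line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

# Two digit classes written out in full.
hour_a = re.findall('^From .* ([0-9][0-9]):', line)

# The same requirement written with a repetition count.
hour_b = re.findall('^From .* ([0-9]{2}):', line)

print(hour_a, hour_b)

# The minutes '14' are not captured: the pattern requires a space before
# the two digits, and '14' is preceded by ':'.
```

Both patterns extract the same two-digit hour, which can be converted with int() when it has to be counted or sorted.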

4. Escape Character

 Characters like dot, plus, question mark, asterisk and dollar are metacharacters in
regular expressions. Sometimes we need these characters themselves as part of the
string to match. Then we need to escape them using a backslash.
eg:
import re
x = 'We just received $10.00 for cookies.'
y = re.findall(r'\$[0-9.]+', x)
print(y)
Output:
['$10.00']

Here, we want to extract only the price $10.00. As the $ symbol is a metacharacter, we need
to put \ before it, so that $ is treated as a literal part of the string to match and not as a
metacharacter.
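
The difference the backslash makes can be seen by running the pattern with and without it. This is a sketch using the same sentence; re.escape() is a standard re function that adds the backslashes automatically, shown here as an optional convenience.

```python
import re

x = 'We just received $10.00 for cookies.'

# Escaped: '$' is a literal dollar sign.
escaped = re.findall(r'\$[0-9.]+', x)

# Unescaped: '$' anchors to the end of the string, and no digits can
# follow the end, so nothing matches.
unescaped = re.findall(r'$[0-9.]+', x)

# re.escape() backslash-escapes the metacharacters in a literal string.
literal = re.findall(re.escape('$10.00'), x)

print(escaped, unescaped, literal)
```

re.escape() is useful when the text to search for comes from user input and may contain metacharacters.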



CBIT, KOLAR Page 1 of 4

DEPARTMENT: COMPUTER SCIENCE AND ENGINEERING


SUBCODE: 15CS664/17CS664
SEMESTER: VI
SUBJECT: PYTHON APPLICATION PROGRAMMING

MODULE 3

VTU QUESTIONS
1. What are lists? Lists are mutable. Justify the statement with examples. (5 marks)
2. How are tuples created in Python? Explain different ways of accessing and creating
   them. (5 marks)
3. Write a Python program to read all the lines in a file accepted from the user and print
   all email addresses contained in it. Assume the email addresses contain only
   non-white space characters. (6 marks)
4. Implement a Python program using lists to store and display the average of N
   integers accepted from the user. (5 marks)
5. Explain dictionaries. Demonstrate with a Python program. (5 marks)
6. Write a Python program to search for lines that start with the word ‘FROM’ and a
   character followed by a two-digit number between 00 and 99 followed by ‘:’. Print the
   number if it is greater than zero. Assume any input file. (6 marks)
7. Describe any two list operations and list methods. Write a Python program to accept
   ‘n’ numbers from the user, and find the sum of all even numbers and the product of all
   odd numbers in the entered list. (8 marks)
8. List the merits of a dictionary over a list. Write a Python program to accept USN and
   marks obtained, find the maximum and minimum, and list the USNs of students who have
   scored in the ranges 100-85, 85-75, 75-60 and below 60 marks separately. (8 marks)
9. Compare and contrast tuples with lists. Explain the following operations in tuples:
   i) Sum of two tuples
   ii) Slicing operators
   iii) Comparison of two tuples
   iv) Assignments to variables. (8 marks)
10. Explain extracting data using regular expressions. Implement a Python program to
    find lines having an ‘@’ sign between characters in a text file that is read. (8 marks)
11. What are the ways of traversing a list? Explain with an example for each. (4 marks)
12. Differentiate pop and remove methods on lists. How to delete more than one
    element from a list. (6 marks)
13. Write a Python program that accepts a sentence and builds a dictionary with
    LETTERS, DIGITS, UPPER CASE, LOWER CASE as keys and their count
    in the sentence as values.
    Ex: Sentence = VTU@123.e-Learning
    D={“LETTERS” : 12, “DIGITS”: 3, “UPPERCASE”: 4, “LOWERCASE”: 8} (6 marks)
14. Compare and contrast lists and tuples. (4 marks)
15. Write a program to check the validity of a password read by users. The following
    criteria should be used to check the validity. Password should have at least
    i) One lower case letter
    ii) One digit
    iii) One upper case letter
    iv) One special character from [$ # @ !]
    v) Six characters (8 marks)
16. Demonstrate: i) How dictionary items can be represented as a list of tuples.
    ii) How tuples can be used as keys in dictionaries. (4 marks)
17. What is a list? Explain the concept of list slicing and list traversing with example. (5 marks)
18. Explain the concept of comparing tuples. Describe the working of sort function with
    Python code. (6 marks)
19. Write a Python program to search for lines that start with “F” followed by 2
    characters, followed by ‘m’. (5 marks)
20. What is a dictionary? How is it different from a list? Write a Python program to count
    the occurrence of characters in a string and print the count. (6 marks)
21. With an example program, illustrate how to pass function arguments to list. (5 marks)
22. Write a Python program to search lines that start with ‘X’ followed by any
    non-whitespace characters, followed by ‘:’, ending with a number. Display the sum of
    all these numbers. (5 marks)
MODEL PAPER QUESTIONS
1. Write Pythonic code that implements and returns the functionality of histogram
   using dictionaries. Also, write the function print_hist to print the keys and their
   values in alphabetical order from the values returned by the histogram function. (10 marks)
2. Explain join( ), split( ) and append( ) methods in a list with examples. Write Pythonic
   code to input information about 20 students as given below:
   1) Roll number
   2) Name
   3) Total Marks
   Get the input from the user for a student name. The program should display the
   Roll number and total marks for the given student name. Also, find the average
   marks of all the students. Use dictionaries. (10 marks)
3. Define tuple. Explain DSU pattern. Write Pythonic code to demonstrate tuples by
   sorting a list of words from longest to shortest using loops. (10 marks)
4. Why do you need regular expressions in Python? Consider a line “From
   stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008” in the file email.txt. Write
   Pythonic code to read the file and extract the email address from the lines starting with
   the word “From”. Use regular expressions to match the email address. (10 marks)
5. Consider a file “ait.txt”. Write a Python program to read the file and look for lines
   of the form
   X-DSPAM-Confidence: 0.875
   X-DSPAM-Probability: 0.458
   Extract the number from each of the lines using a regular expression. Compute the
   average of the numbers and print out the average. (10 marks)
6. How are dictionaries and tuples used together? Demonstrate the use of tuple
   assignment with dictionaries to traverse the keys and values of a dictionary. (10 marks)
7. Write Pythonic code to create a function called most_frequent that takes a string and
   prints the letters in decreasing order of frequency. Use dictionaries. (10 marks)
8. Consider the string ‘brontosaurus’. Write Pythonic code that implements and returns
   the functionality of histogram using dictionaries for the given string. Also, write the
   function print_hist to print the keys and their values in alphabetical order from the
   values returned by the histogram function. (10 marks)

TEXTBOOK EXERCISE PROGRAMS


1. Write a program to open the file romeo.txt and read it line by line. For each line,
   split the line into a list of words using the split function.
   For each word, check to see if the word is already in a list. If the word is not in the
   list, add it to the list.
   When the program completes, sort and print the resulting words in alphabetical
   order.
2. Write a program to read through the mail box data and when you find a line that starts
   with “From”, you will split the line into words using the split function. Parse the
   lines and print out the second word for each From line, then you will also count the
   number of From lines and print out a count at the end.
3. Rewrite the program that prompts the user for a list of numbers and prints out the
   maximum and minimum of the numbers at the end when the user enters “done”.
   Write the program to store the numbers the user enters in a list and use the max( )
   and min( ) functions to compute the maximum and minimum numbers after the loop
   completes.
4. Write a program that categorizes each mail message by which day of the week the
   commit was done. To do this look for lines that start with “From”, then look for the
   third word and keep a running count of each of the days of the week. At the end of the
   program print out the contents of your dictionary.
5. Write a program to read through a mail log, build a histogram using a dictionary to
   count how many messages have come from each email address, and print the
   dictionary.
6. Add code to the above program (Program 5) to figure out who has the most messages
   in the file. After all the data has been read and the dictionary has been created, look
   through the dictionary using a maximum loop to find who has the most messages
   and print how many messages the person has.
7. This program records the domain name (instead of the address) where the message was
   sent from instead of who the mail came from (i.e., the whole email address). At the end
   of the program, print out the contents of your dictionary.
8. Read and parse the “From” lines and pull out the addresses from the line. Count the
   number of messages from each person using a dictionary.
   After all the data has been read, print the person with the most commits by creating a
   list of (count, email) tuples from the dictionary. Then sort the list in reverse order and
   print out the person who has the most commits.
9. This program counts the distribution of the hour of the day for each of the messages.
   You can pull the hour from the “From” line by finding the time string and then splitting
   that string into parts using the colon character. Once you have accumulated the counts
   for each hour, print out the counts, one per line, sorted by hour.
10. Read and parse the “From” lines and pull out the addresses from the line. Count the
    number of messages from each person using a dictionary.
    After all the data has been read, print the person with the most commits by creating a
    list of (count, email) tuples from the dictionary. Then sort the list in reverse order and
    print out the person who has the most commits.
11. This program counts the distribution of the hour of the day for each of the messages.
    You can pull the hour from the “From” line by finding the time string and then splitting
    that string into parts using the colon character. Once you have accumulated the counts
    for each hour, print out the counts, one per line, sorted by hour.
12. Write a program that reads a file and prints the letters in descending order of
    frequency. Your program should convert all the input to lower case and only count the
    letters a-z. Your program should not count spaces, digits, punctuation or anything
    other than the letters a-z.
13. Write a simple program to simulate the operation of the grep command on Unix. Ask
    the user to enter a regular expression and count the number of lines that matched the
    regular expression.
14. Write a program to look for lines of the form:
    ‘New Revision: 39772’
    and extract the number from each of the lines using a regular expression and the
    findall( ) method. Compute the average of the numbers and print out the average.
