
File Handling in Python

Python has built-in functions to create and manipulate files. The built-in open() function
(also available as io.open() in the io module) needs no import. Calling
open(filename, access_mode) returns a file object, often called a "handle". You
can use this handle to read from or write to a file.
Python treats the file as an object, which has its own attributes and methods.

There are two types of files, text and binary:

1. Text files use an End-Of-Line (EOL) character to mark the end of each line.
In Python, the newline character (\n) is the default EOL terminator.

2. Binary files store raw bytes, with no character decoding and no EOL translation.
Reading a file in binary mode returns bytes objects. Use binary mode
when dealing with non-text files such as images or executables.
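
The difference between the two file types can be seen by writing one line and reading it back in both modes. This is a minimal sketch; the scratch file name demo.txt and the sample text are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# newline="\n" keeps the EOL as a single byte on every platform.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("café\n")

with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()       # str: decoded characters

with open(path, "rb") as f:
    as_bytes = f.read()      # bytes: the raw encoded data

print(type(as_text).__name__, len(as_text))     # str 5
print(type(as_bytes).__name__, len(as_bytes))   # bytes 6
```

The text read counts five characters, while the binary read counts six bytes because "é" occupies two bytes in UTF-8.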
open()

The built-in Python function open() has the following arguments:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None,
     closefd=True, opener=None)

The two important arguments are file and mode. Only file is mandatory;
all other arguments have default values.
1. file
This argument is the path to your file. If the file is in the current
working directory, you can provide just the filename:
my_file_handle = open("mynewtextfile.txt")
If the file resides in another directory, you must provide the full path with the
file name:

my_file_handle = open("D:\\new_dir\\anotherfile.txt")
my_file_handle.read()
Make sure the file name and path are correct; otherwise, opening the file for reading
raises a FileNotFoundError.

1 Semester 2 Python Programming Maya Nair


2. Access Modes

The access mode defines how you want to open a file: for reading only, for writing
only, or for both. It also determines where in the file reading or writing starts.
You specify the access mode through the mode argument. Use 'r', the default mode,
to read the file. Use 'w' to write or 'a' to append.

Character   Function

r           Open file for reading only. Starts reading from the beginning of the file.
            This is the default mode.
rb          Open a file for reading only in binary format. Starts reading from the
            beginning of the file.
r+          Open file for reading and writing. File pointer placed at the beginning
            of the file.
rb+         Same as rb but also allows writing to the file.
w           Open file for writing only. File pointer placed at the beginning of the
            file. Truncates the file if it exists and creates a new one if it does
            not exist.
wb          Same as w but opens in binary mode.
w+          Same as w but also allows reading from the file.
wb+         Same as wb but also allows reading from the file.
a           Open a file for appending. Starts writing at the end of the file. Creates
            a new file if the file does not exist.
ab          Same as a but in binary format. Creates a new file if the file does not
            exist.
a+          Same as a but also open for reading.
ab+         Same as ab but also open for reading.
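
The practical difference between 'w' and 'a' is easiest to see on a small file. The following is a sketch under the assumption of a scratch file named modes.txt (a name chosen only for this illustration).

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:      # 'w' creates the file
    f.write("first\n")
with open(path, "a") as f:      # 'a' keeps existing content and writes at the end
    f.write("second\n")
with open(path) as f:           # default mode 'r'
    after_append = f.read()

with open(path, "w") as f:      # 'w' again: the old content is truncated
    f.write("third\n")
with open(path) as f:
    after_rewrite = f.read()

print(repr(after_append))       # 'first\nsecond\n'
print(repr(after_rewrite))      # 'third\n'
```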

close() function

Every opened file should be closed once you are done with it, which is achieved by
calling close() on the file object.

e.g. file1.close() # closes the associated file and frees the resources held by it

with statement (recommended method for file handling)

The with statement wraps a block of code in a context manager, which makes
resource-handling code cleaner and much more readable. No explicit close() call is
needed: the file is closed automatically when we leave the with block, even if an
exception occurs inside it.

Syntax:

with open(filename, mode) as fileobj:
    block of statements that use fileobj

e.g.:

with open("C:\\Data\\marks.txt", "w") as file2:
    print(file2.name)  # name is a file object attribute that holds the name of the file
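
To check that the with block really closes the file, the closed attribute can be inspected inside and outside the block. This is a small sketch; the scratch file and the simulated ValueError are assumptions for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "marks.txt")

with open(path, "w") as file2:
    file2.write("50\n")
    inside = file2.closed        # False: still open inside the block
outside = file2.closed           # True: closed on leaving the block

# The file is also closed when the block exits through an exception.
try:
    with open(path) as file3:
        raise ValueError("simulated failure")
except ValueError:
    closed_after_error = file3.closed

print(inside, outside, closed_after_error)   # False True True
```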

Reading from a file


There are three ways to read from a file.

• read([n])

• readline([n])

• readlines()

Note: n is the maximum number of characters to read in text mode (bytes in binary
mode).


The read() method returns the entire file if n is not given.
If you execute my_file.read(3), you get back the first three
characters of the file.
readline(n) returns at most n characters of a single line of the file. It never reads
more than one line.
The readlines() method returns a list containing each line in the file.
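
The three reading methods share one file position, so each call continues where the previous one stopped. A minimal sketch, assuming a three-line scratch file created just for this example:

```python
import os
import tempfile

# Hypothetical scratch file with three short lines.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path) as f:
    first_three = f.read(3)       # 'alp'
    rest_of_line = f.readline()   # 'ha\n': continues from the current position
    capped = f.readline(2)        # 'be': at most 2 characters of the next line
    remaining = f.readlines()     # ['ta\n', 'gamma\n']

print(first_three, repr(rest_of_line), capped, remaining)
```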

A file can also be read line by line using a for loop:
name = open("filename")
for line in name:
    statements
e.g.
>>> infile = open("hours.txt")
>>> for line in infile:
...     print(line.strip())  # strip() removes the trailing \n
...
123 Susan 12.5 8.1 7.6 3.2
456 Brad 4.0 11.6 6.5 2.7 12
789 Jenn 8.0 8.0 8.0 8.0 7.5

Splitting file contents into Variables


A line read from a file is a single string that may contain several values.
If you know the number of tokens, you can split them directly into a sequence of
variables.
var1, var2, ..., varN = string.split()

Each variable is still a string; convert it to the required type with a conversion
function such as int(value) or float(value) before using it.
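
As an illustration, one line in the style of the hours.txt sample can be split and converted as follows. The variable names are assumptions, and the starred target is used here because the number of hour values varies between lines.

```python
# One line in the format of the hours.txt sample.
line = "123 Susan 12.5 8.1 7.6 3.2\n"

# split() breaks on whitespace; the starred name collects the remaining tokens.
emp_id, name, *hours = line.split()

emp_id = int(emp_id)                  # "123" -> 123
hours = [float(h) for h in hours]     # ["12.5", ...] -> [12.5, ...]
total = sum(hours)

print(emp_id, name, hours)            # 123 Susan [12.5, 8.1, 7.6, 3.2]
```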

Writing and appending files:


write(string): Writes the string to the file. (The newline character '\n' must be
explicitly part of the string.)
e.g.
with open("C:\\Data\\marks.txt", "w") as file1:
    marks = [50, 89, 75, 45]
    for mark in marks:
        file1.write(str(mark) + "\n")  # writes each mark on its own line

writelines(): Writes each string in a list to the file, in order. It does not add
newlines, so each string must carry its own '\n'. Used to write multiple strings at once.
with open("C:\\Data\\names.txt", "w") as file1:
    names = ["Sam\n", "Lucy\n", "Amanda\n"]
    file1.writelines(names)  # writes the entire list to the file, one name per line
To append to an existing file, the same functions are used; only the mode changes
from "w" to "a".

e.g.
with open("C:\\Data\\names.txt", "a") as file1:
    names = ["Andrews\n", "Samanta\n", "Bob\n"]
    file1.writelines(names)  # appends the new names to the existing file of names

tell() and seek() methods of the file object
▪ The tell() method returns the current position within the file; in other words, the
next read or write will occur at that many bytes from the beginning of the file.
▪ The seek(offset[, whence]) method changes the current file position. The offset
argument indicates the number of bytes to move. The whence argument specifies
the reference position from which the bytes are counted.
▪ If whence is 0 (the default), the beginning of the file is the reference position;
1 means the current position is the reference position; and 2 means the end of the
file is the reference position.
In Python 3, files opened in text mode support only seeks relative to the beginning of
the file (apart from seek(0, 1) and seek(0, 2), which move by zero bytes). Seeking
relative to the current position or the end with a nonzero offset requires binary mode.
Example:
fo = open("foo.txt", "r+")
text = fo.read(10)
print("Read String is : ", text)
position = fo.tell()
print("Current file position : ", position)
position = fo.seek(0, 0)
text = fo.read(10)
print("Again read String is : ", text)
fo.close()
This would produce the following result (the text shown depends on the contents of foo.txt):
Read String is : Python is
Current file position : 10
Again read String is : Python is
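
Seeking relative to the current position or the end of the file works when the file is opened in binary mode. The sketch below assumes a ten-byte scratch file and uses the os.SEEK_CUR and os.SEEK_END constants, which are the named equivalents of 1 and 2 for the reference-position argument.

```python
import os
import tempfile

# Hypothetical ten-byte scratch file.
path = os.path.join(tempfile.mkdtemp(), "digits.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)                    # reference 0: offset 3 from the beginning
    a = f.read(2)                # b'34', position is now 5
    f.seek(2, os.SEEK_CUR)       # reference 1: skip 2 bytes forward, to 7
    b = f.read(1)                # b'7'
    f.seek(-3, os.SEEK_END)      # reference 2: 3 bytes before the end
    c = f.read()                 # b'789'
    end_position = f.tell()      # 10

print(a, b, c, end_position)
```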

Reading as well as writing:


A file can be opened for both reading as well as writing by using “r+” mode.

To move to an arbitrary position in the file, the seek() method is used. seek(n) moves
the file pointer to byte offset n; offsets start from 0, so seek(0) takes the file
pointer to the first character in the file. Consider the example below:
with open("C:\\Data\\names.txt", "r+") as file1:
    print(file1.read())      # reads the complete file; file pointer is at the end of file
    file1.write("Treasa\n")  # writes the content at the end of file
    file1.seek(0)            # takes the file pointer to the start of file
    print(file1.read())      # reads the file from the start
    file1.seek(4)            # takes the file pointer to offset 4 (the fifth byte)
    file1.write("Ancy\n")    # overwrites the existing content starting at offset 4
    file1.seek(0)            # takes the file pointer to the start of file
    print(file1.read())      # reads the file from the start

Python File Object Attributes


File attributes give information about the file and file state.

▪ Example:
fo = open("foo.txt", "wb")
print("Name of the file: ", fo.name)
print("Closed or not : ", fo.closed)
print("Opening mode : ", fo.mode)
fo.close()
▪ This would produce the following result:
Name of the file:  foo.txt
Closed or not :  False
Opening mode :  wb

What is a Directory in Python?
If there are a large number of files to handle in your Python program, you can arrange
your code within different directories to make things more manageable.

A directory or folder is a collection of files and sub directories. Python has the os module
which provides us with many useful methods to work with directories (and files as well).

Get Current Directory


We can get the present working directory using the getcwd() method.

This method returns the current working directory as a string. We can also
use the getcwdb() method to get it as a bytes object.

>>> import os
>>> os.getcwd()
'C:\\Program Files\\Python'
The doubled backslash is an escape sequence shown in the string's representation. The
print() function renders the path with single backslashes.

>>> print(os.getcwd())
C:\Program Files\Python

Changing Directory
We can change the current working directory using the chdir() method.

The new path that we want to change to must be supplied as a string to this method. We
can use either the forward slash (/) or the backward slash (\) to separate path elements.

When using the backward slash, it is safer to escape it (\\) or to use a raw string.

>>> os.chdir('C:\\Python33')
>>> print(os.getcwd())
C:\Python33

List Directories and Files
All files and subdirectories inside a directory can be listed using the listdir() method.

This method takes a path and returns a list of the subdirectories and files in that path.
If no path is specified, it lists the current working directory.

>>> print(os.getcwd())
C:\Python33
>>> os.listdir()
['DLLs',
'Doc',
'include',
'Lib',
'libs',
'LICENSE.txt',
'NEWS.txt',
'python.exe',
'pythonw.exe',
'README.txt',
'Scripts',
'tcl',
'Tools']

>>> os.listdir('G:\\')
['$RECYCLE.BIN',
'Movies', 'Music',
'Photos',
'Series',
'System Volume Information']

Making a New Directory
We can make a new directory using the mkdir() method.

This method takes in the path of the new directory. If the full path is not specified, the
new directory is created in the current working directory.

>>> os.mkdir('test')
>>> os.listdir()
['test']

Renaming a Directory or a File

The rename() method can rename a directory or a file.

The first argument is the old name, and the new name must be supplied as the second
argument.

>>> os.listdir()
['test']
>>> os.rename('test','new_one')
>>> os.listdir()
['new_one']

Removing Directory or File

A file can be removed (deleted) using the remove() method.

Similarly, the rmdir() method removes an empty directory.

>>> os.listdir()
['new_one', 'old.txt']
>>> os.remove('old.txt')
>>> os.listdir()
['new_one']
>>> os.rmdir('new_one')
>>> os.listdir()
[]

However, note that rmdir() method can only remove empty directories.

In order to remove a non-empty directory we can use the rmtree() method inside the
shutil module.

>>> os.listdir()
['test']
>>> os.rmdir('test')
Traceback (most recent call last):
...
OSError: [WinError 145] The directory is not empty: 'test'
>>> import shutil
>>> shutil.rmtree('test')
>>> os.listdir()
[]
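
The directory operations above can be combined into one script. The following is a sketch that works inside a temporary scratch directory (an assumption made so that nothing in a real working directory is touched) and passes full paths instead of calling chdir().

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                        # scratch directory for the demo
old_dir = os.path.join(base, "test")
new_dir = os.path.join(base, "new_one")

os.mkdir(old_dir)                                # create a directory
os.rename(old_dir, new_dir)                      # rename it
with open(os.path.join(new_dir, "old.txt"), "w") as f:
    f.write("x")                                 # make the directory non-empty

try:
    os.rmdir(new_dir)                            # fails: directory is not empty
    rmdir_failed = False
except OSError:
    rmdir_failed = True

listing_before = os.listdir(base)                # ['new_one']
shutil.rmtree(new_dir)                           # removes the directory and its contents
listing_after = os.listdir(base)                 # []
os.rmdir(base)                                   # base is empty now, so rmdir() works

print(rmdir_failed, listing_before, listing_after)
```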

Exception Handling in Python

What is Exception?
An exception is an event, which occurs during the execution of a program that disrupts
the normal flow of the program's instructions. In general, when a Python script encounters
a situation that it cannot cope with, it raises an exception. An exception is a Python object
that represents an error.
When a Python script raises an exception, it must handle the exception immediately;
otherwise it terminates and quits.
Syntax
Here is the simple syntax of try...except...else blocks −

try:
    You do your operations here
    ......................
except ExceptionI:
    If there is ExceptionI, then execute this block.
except ExceptionII:
    If there is ExceptionII, then execute this block.
    ......................
else:
    If there is no exception then execute this block.


• A single try statement can have multiple except statements. This is useful when the
try block contains statements that may throw different types of exceptions.

• You can also provide a generic except clause, which handles any exception.

• After the except clause(s), you can include an else-clause. The code in the else-
block executes if the code in the try: block does not raise an exception.

• The else-block is a good place for code that does not need the try: block's
protection.
try:
    fh = open("testfile", "w")
    fh.write("This is my test file for exception handling!!")
except IOError:
    print("Error: can't find file or read data")
else:
    print("Written content in the file successfully")
    fh.close()

This produces the following result −


Written content in the file successfully
The next example tries to write to a file opened in read-only mode, so the write raises
an exception −
#!/usr/bin/python3
try:
    fh = open("testfile", "r")
    fh.write("This is my test file for exception handling!!")
except IOError:
    print("Error: can't find file or read data")
else:
    print("Written content in the file successfully")



This produces the following result −
Error: can't find file or read data
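
A single try statement with several except clauses and an else clause can be sketched as follows. The function name parse_ratio and its return values are hypothetical, chosen only to show which clause runs for which input.

```python
def parse_ratio(text):
    """Return a/b for text like '6/3', or a short error message."""
    try:
        a, b = text.split("/")        # ValueError if there are not exactly two parts
        value = int(a) / int(b)       # ValueError or ZeroDivisionError
    except ValueError:
        return "not two integers"
    except ZeroDivisionError:
        return "division by zero"
    else:
        # Runs only when the try block raised nothing.
        return value


print(parse_ratio("6/3"))    # 2.0
print(parse_ratio("6"))      # not two integers
print(parse_ratio("1/0"))    # division by zero
```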

The except Clause with No Exceptions


You can also use the except statement with no exceptions defined as follows −

try:
    You do your operations here
    ......................
except:
    If there is any exception, then execute this block.
    ......................
else:
    If there is no exception then execute this block.

The try-finally Clause


You can use a finally: block along with a try: block. The finally: block is a place to put
any code that must execute, whether the try-block raised an exception or not.
The syntax of the try-finally statement is this −

try:
    You do your operations here;
    ......................
    Due to any exception, this may be skipped.
finally:
    This would always be executed.
    ......................
#!/usr/bin/python3
try:
    fh = open("testfile", "w")
    try:
        fh.write("This is my test file for exception handling!!")
    finally:
        # runs whether or not write() raised an exception
        print("Going to close the file")
        fh.close()
except IOError:
    print("Error: can't find file or read data")



This produces the following result −
Going to close the file

When an exception is thrown in the try block, the execution immediately passes to the
finally block. After all the statements in the finally block are executed, the exception is
raised again and is handled in the except statements if present in the next higher layer of
the try-except statement.
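
The order described above (finally block first, then the except clause of the enclosing layer) can be recorded in a list. This is a small sketch; the function name risky and the simulated IOError are assumptions for illustration.

```python
events = []


def risky():
    try:
        events.append("try")
        raise IOError("simulated failure")
    finally:
        # Runs before the exception leaves risky().
        events.append("finally")


try:
    risky()
except IOError as exc:
    # The re-raised exception is handled in the next higher layer.
    events.append("except: " + str(exc))

print(events)   # ['try', 'finally', 'except: simulated failure']
```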

Raising an Exception
You can raise exceptions in several ways by using the raise statement. The general syntax
for the raise statement in Python 3 is as follows −
Syntax
raise [ExceptionType[(args)]]

Here, ExceptionType is the type of exception (for example, NameError) and args is a
value for the exception argument. The argument is optional; if not supplied, the
exception's args tuple is empty.
The older Python 2 form, raise Exception, args, traceback, is a syntax error in Python 3.
A traceback object is rarely supplied in practice; when needed, it is attached with the
exception's with_traceback() method.
def functionName(x, y):
    if y == 0:
        raise Exception(y)
        # The code below this point is not executed
        # if we raise the exception
    return x / y

try:
    result = functionName(3, 0)
    print("Quotient = ", result)
except Exception as e:
    print("error in divisor argument", e.args[0])


This produces the following result −

error in divisor argument 0
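
In practice it is common to raise a more specific built-in exception type with a descriptive message. The sketch below is one possible variant of the example above; the choice of ValueError and the name quotient are assumptions, not part of the original example.

```python
def quotient(x, y):
    if y == 0:
        # Both values end up in the exception's args tuple.
        raise ValueError("divisor must be non-zero", y)
    return x / y


try:
    quotient(3, 0)
except ValueError as e:
    message, bad_value = e.args

print(message, bad_value)    # divisor must be non-zero 0
print(quotient(6, 3))        # 2.0
```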

The AssertionError Exception

Instead of waiting for a program to crash midway, you can also start by making an assertion
in Python. We assert that a certain condition is met. If this condition turns out to be True,
then that is excellent! The program can continue. If the condition turns out to be False, you
can have the program throw an AssertionError exception.

Look at the example:

a = int(input("Enter a number"))
assert a < 10, "Size must be less than 10"
print(a)
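
A failed assertion can be caught like any other exception. The following sketch replaces the keyboard input with a function argument so that it runs unattended; the name check_size is hypothetical. Note that assert statements are skipped when Python runs with the -O option, so they suit internal sanity checks more than validation of user input.

```python
def check_size(a):
    # Raises AssertionError with the given message when the condition is False.
    assert a < 10, "Size must be less than 10"
    return a


print(check_size(5))         # 5

try:
    check_size(12)
except AssertionError as e:
    message = str(e)

print(message)               # Size must be less than 10
```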

Iterables and iterators in Python
Iterators are everywhere in Python. They are elegantly implemented within for loops,
comprehensions, etc., but hidden in plain sight. An iterator in Python is simply an object
that can be iterated upon: an object which returns data one element at a time.
A Python iterator object must implement two special methods, __iter__() and
__next__(), collectively called the iterator protocol.

An object is called iterable if we can get an iterator from it. Most built-in containers
in Python, such as list, tuple, and string, are iterables.

The iter() function (which in turn calls the __iter__() method) returns an iterator from
them.

Iterating Through an Iterator in Python


We use the next() function to manually iterate through all the items of an iterator. When
we reach the end and there is no more data to be returned, it raises StopIteration.
Following is an example.

# define a list
my_list = [4, 7, 0, 3]
# get an iterator using iter()
my_iter = iter(my_list)
## iterate through it using next()
#prints 4
print(next(my_iter))
#prints 7
print(next(my_iter))
## next(obj) is same as obj.__next__()
#prints 0
print(my_iter.__next__())
#prints 3
print(my_iter.__next__())
## This will raise error, no items left
next(my_iter)
Output:

4
7
0
3
Traceback (most recent call last):
  File "<stdin>", line 24, in <module>
    next(my_iter)
StopIteration

How for loop actually works?

A more elegant way of automatically iterating is by using the for loop. Using this, we
can iterate over any object that can return an iterator, for example list, string, file etc. A
for loop as given below

for element in iterable:
    # do something with element

is actually implemented as
# create an iterator object from that iterable
iter_obj = iter(iterable)

# infinite loop
while True:
    try:
        # get the next item
        element = next(iter_obj)
        # do something with element
    except StopIteration:
        # if StopIteration is raised, break from loop
        break
So internally, the for loop creates an iterator object, iter_obj, by calling iter() on the
iterable.

Ironically, this for loop is actually an infinite while loop. Inside the loop, it calls
next() to get the next element and runs the loop body with that value. When the items are
exhausted, StopIteration is raised, caught internally, and the loop ends. Any other kind
of exception passes through.
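
The iterator protocol can also be implemented in a user-defined class, which then works with for loops and list() like any built-in iterable. A minimal sketch, with the class name Countdown chosen only for illustration:

```python
class Countdown:
    """Iterator that yields start, start-1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals that no items are left
        value = self.current
        self.current -= 1
        return value


collected = []
for n in Countdown(3):               # the for loop calls iter() and next() for us
    collected.append(n)

print(collected)                     # [3, 2, 1]
```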

Regular Expressions
A regular expression (abbreviated regex or regexp) is a search pattern, mainly for use in
pattern matching with strings, i.e. "find and replace"-like operations. Each character in a
regular expression is either understood to be a metacharacter with its special meaning, or
a regular character with its literal meaning.

Metacharacters

1. . (dot)- Matches any single character (many applications exclude newlines). Within
POSIX bracket expressions, the dot character matches a literal dot. For example,
a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".
2. +- Matches the preceding element one or more times. For example, ab+c matches
"abc", "abbc", "abbbc", and so on, but not "ac"
3. ?- Matches the preceding pattern element zero or one times
4. *- Matches the preceding element zero or more times. For example, ab*c matches
"ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy",
and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
5. ^- Matches the beginning of a line or string.
6. $- Matches the end of a line or string.
7. [ ]- A bracket expression. Matches a single character that is contained within the
brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which
matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z]
matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].
8. [^ ]- Matches a single character that is not contained within the brackets. For
example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches
any single character that is not a lowercase letter from "a" to "z". Likewise, literal
characters and ranges can be mixed.
9. ()- Defines a marked subexpression (a group).
10. {m,n}- Matches the preceding element at least m and not more than n times.
For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa".
11. {m}- Matches the preceding element exactly m times.

Some special sequences beginning with '\' represent predefined sets of
characters that are often useful, such as the set of digits, the set of letters, or the set
of anything that isn't whitespace.

\d   Matches any decimal digit; this is equivalent to the class [0-9].
\D   Matches any non-digit character; this is equivalent to the class [^0-9].
\s   Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].
\S   Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].
\w   Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].
\W   Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].
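
The metacharacters and special sequences above can be tried out with Python's re module. The sample sentence below is made up for illustration; re.findall() and re.fullmatch() are functions of the standard re module.

```python
import re

# Made-up sample text for the demonstration.
text = "Order 66 shipped on 2024-01-15 to room B12"

dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)   # {m}: exactly m digits
numbers = re.findall(r"\d+", text)               # +: one or more digits
capitalized = re.findall(r"[A-Z]\w*", text)      # bracket class, then zero or more \w

# {m,n}: a{3,5} matches only runs of three to five "a" characters.
lengths = [n for n in range(1, 8) if re.fullmatch(r"a{3,5}", "a" * n)]

print(dates)         # ['2024-01-15']
print(numbers)       # ['66', '2024', '01', '15', '12']
print(capitalized)   # ['Order', 'B12']
print(lengths)       # [3, 4, 5]
```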

The re module provides an interface to the regular expression engine, allowing you to
compile REs into objects and then perform matches with them.

Like any other module, you start by importing it.


>>> import re

Finding All Matches in a String


re.findall() method- It takes two arguments: (1) the regular expression pattern, and (2)
the target string to find matches in.
It returns all matched string portions as a list. If there are no matches, it simply
returns an empty list:
>>> wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'
>>> re.findall(r'wo\w+', wood)  # r'...' for raw string
['wood', 'would', 'woodchuck', 'woodchuck', 'wood']
>>> re.findall(r'o+', wood)
['o', 'oo', 'o', 'oo', 'oo', 'o', 'oo']
>>> re.findall(r'e+', wood)
[]

What if you want to ignore case in your matches? You can specify it as a third optional
argument: re.IGNORECASE.

>>> foo = 'This and that and those'


>>> re.findall(r'th\w+', foo)
['that', 'those']
>>> re.findall(r'th\w+', foo, re.IGNORECASE) # case is ignored while matching
['This', 'that', 'those']

Substituting All Matches in a String

Replacing all matching portions with something else can be done using the
re.sub() method. Below, we find all vowel sequences and replace them with '-'.
The method returns the result as a new string.
>>> wood
'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'
>>> re.sub(r'[aeiou]+', '-', wood)  # 3 args: regex, replacer string, target string
'H-w m-ch w-d w-ld - w-dch-ck ch-ck -f - w-dch-ck c-ld ch-ck w-d?'

Removing the matching portions can also be achieved through re.sub(): just make the
"replacer" string an empty string ''.
>>> re.sub(r'[aeiou]+', '', wood) # substitute with an empty string
'Hw mch wd wld wdchck chck f wdchck cld chck wd?'

Compiling a Regular Expression Object


If you have to match a regular expression on many different strings, it is a good idea to
construct the regular expression as a Python object. That way, the internal matcher for
the regular expression is compiled once and reused. Since compiling a pattern is
relatively expensive, this lightens processing loads. To do this, use the
re.compile() function:
>>> myre = re.compile(r'\w+ou\w+')  # compiling myre as a reg ex
>>> myre.findall(wood)  # calling .findall() directly on myre
['would', 'could']
>>> myre.findall('Colorless green ideas sleep furiously')
['furiously']
>>> myre.findall('The thirty-three thieves thought that they thrilled the throne throughout Thursday.')
['thought', 'throughout']
Once compiled, you call a re method directly on the regular expression object. In the
example above, myre is the compiled regular expression object corresponding to
r'\w+ou\w+', and you call .findall() on it as myre.findall(). In doing so, you need to
specify one fewer argument: the target string, as in myre.findall(wood), is the only
thing needed.
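
A compiled pattern object offers the same operations as the module-level functions, so one object can be reused for several tasks. A brief sketch, assuming a vowel pattern and two sample phrases taken from the examples in this section:

```python
import re

vowels = re.compile(r"[aeiou]+")     # compiled once, reused below

lines = ["How much wood", "Colorless green ideas"]

# sub() and findall() are called on the compiled object; no pattern argument is needed.
masked = [vowels.sub("-", line) for line in lines]
counts = [len(vowels.findall(line)) for line in lines]

print(masked)    # ['H-w m-ch w-d', 'C-l-rl-ss gr-n -d-s']
print(counts)    # [3, 6]
```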
Testing if a Match Exists

There are two functions for this: match() and search(). Both return a match
object on a successful match and None if no match is found. The match() function
only checks whether the RE matches at the beginning of the string, while search() scans
forward through the string for a match. It's important to keep this distinction in mind.
Remember, match() only reports a successful match that starts at position 0; if the match
wouldn't start at zero, match() will not report it.
>>> print(re.match('super', 'superstition').span())
(0, 5)
>>> print(re.match('super', 'insuperable'))
None

Here the span() method of the match object returns a tuple containing the (start, end)
positions of the match. Other methods are:

group()   Return the string matched by the RE
start()   Return the starting position of the match
end()     Return the ending position of the match

On the other hand, search() will scan forward through the string, reporting the first
match it finds.

>>> print(re.search('super', 'superstition').span())
(0, 5)
>>> print(re.search('super', 'insuperable').span())
(2, 7)
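
The match object methods can be combined with marked subexpressions. The following sketch uses a made-up sample string; because match() and search() return None on failure, the result is checked before its methods are called.

```python
import re

# Made-up sample string for the demonstration.
m = re.search(r"(\d+)-(\d+)", "pages 12-27 of the notes")

if m is not None:                      # search() returns None when nothing matches
    whole = m.group()                  # '12-27': the entire match
    first, last = m.group(1), m.group(2)   # contents of the two () groups
    start, end = m.start(), m.end()    # 6 and 11, the same values as m.span()

no_match = re.match(r"\d+", "pages 12")    # None: the string does not start with a digit

print(whole, first, last, start, end, no_match)
```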

Compilation Flags

Compilation flags let you modify some aspects of how regular expressions work.

Flag            Meaning

ASCII, A        Makes several escapes like \w, \b, \s and \d match only on ASCII
                characters with the respective property.
DOTALL, S       Make . match any character, including newlines.
IGNORECASE, I   Do case-insensitive matches.
LOCALE, L       Do a locale-aware match.
MULTILINE, M    Multi-line matching, affecting ^ and $.
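
The effect of two of these flags can be checked on a two-line string. This is a short sketch with a made-up sample text; flags are passed as an extra argument, and several flags can be combined with the | operator.

```python
import re

text = "first line\nsecond line"       # made-up two-line sample

# Without MULTILINE, ^ matches only at the very start of the string.
plain = re.findall(r"^\w+", text)
multi = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, . does not match the newline between the two lines.
no_dotall = re.search(r"line.second", text)
with_dotall = re.search(r"line.second", text, re.DOTALL)

# Flags combine with |, here case-insensitive and multi-line together.
combined = re.findall(r"^SECOND", text, re.IGNORECASE | re.MULTILINE)

print(plain)                  # ['first']
print(multi)                  # ['first', 'second']
print(no_dotall)              # None
print(with_dotall.group())
print(combined)               # ['second']
```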
