
UNIT - 3

System tools
To begin our exploration of the systems domain, we will take a quick tour through the
standard library sys and os modules in this chapter, before moving on to larger system
programming concepts. As you can tell from the length of their attribute lists, both of these are
large modules—the following reflects Python 3.1 running on Windows 7 outside IDLE:

C:\...\PP4E\System> python

Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (...)] on win32

Type "help", "copyright", "credits" or "license" for more information.

>>> import sys, os

>>> len(dir(sys)) # 65 attributes

65

>>> len(dir(os)) # 122 on Windows, more on Unix

122

>>> len(dir(os.path)) # a nested module within os

52

The content of these two modules may vary per Python version and platform. For example, os is
much larger under Cygwin after building Python 3.1 from its source code there (Cygwin is a
system that provides Unix-like functionality on Windows; it is discussed further in More on
Cygwin Python for Windows):

$ ./python.exe

Python 3.1.1 (r311:74480, Feb 20 2010, 10:16:52)

[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin

Type "help", "copyright", "credits" or "license" for more information.

>>> import sys, os

>>> len(dir(sys))

64
>>> len(dir(os))

217

>>> len(dir(os.path))

51

1. The os module (and sys, and os.path)


 The os and sys modules provide numerous tools for dealing with filenames, paths, and directories. The os module exposes os.sys (the same object as sys) and the nested module os.path, which is dedicated to path and directory manipulation.
 Whenever possible, you should use the functions provided by these modules for file,
directory, and path manipulations. These modules are wrappers for platform-specific
modules, so functions like os.path.split work on UNIX, Windows, Mac OS, and any
other platform supported by Python.
1.1. Quick start
You can build multi-platform paths using the proper separator symbol:
>>> import os
>>> import os.path
>>> os.path.join(os.sep, 'home', 'user', 'work')
'/home/user/work'

>>> os.path.split('/usr/bin/python')
('/usr/bin', 'python')

1.2. Functions
The os module has lots of functions. We will not cover all of them thoroughly, but the following should be a good starting point for using the module.
1.2.1. Manipulating Directories
The getcwd() function returns the current working directory (in Python 2, getcwdu() returned it as unicode).
The current directory can be changed using chdir():
os.chdir(path)
The listdir() function returns the contents of a directory. Note, however, that it mixes directories and files in a single list.
The mkdir() function creates a directory. It raises an error if the parent directory does not exist. If you want to create the parent directories as well, use makedirs() instead:
>>> os.mkdir('temp') # creates temp directory inside the current directory
>>> os.makedirs('/tmp/temp/temp')

Once created, you can delete an empty directory with rmdir():


>>> import os
>>> os.mkdir('/tmp/temp')
>>> os.rmdir('/tmp/temp')
You can remove a chain of empty directories with os.removedirs(); it raises an error if any of them is not empty.
If you want to delete a non-empty directory, use shutil.rmtree() (with caution).
1.2.2. Removing a file
To remove a file, use os.remove(). It raises an OSError exception if the file cannot be removed. os.unlink() is an equivalent alias.
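As a short sketch, using a throwaway file created with the standard tempfile module so the example is self-contained, removing a file and handling the failure case looks like this:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)

os.remove(path)                   # delete the file
assert not os.path.exists(path)

try:
    os.remove(path)               # removing it a second time fails
except OSError as exc:            # FileNotFoundError is a subclass of OSError
    print("cannot remove:", exc.__class__.__name__)
```

Catching OSError (rather than letting it propagate) is the usual pattern when the file may already be gone.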
1.2.3. Renaming files or directories
You can rename a file from an old name to a new one by using os.rename(). See
also os.renames().
1.2.4. Permission
You can change the mode of a file using chmod(). See also chown, chroot, fchmod, and fchown.
The os.access() function verifies the access permissions specified in the mode argument. It returns True if the access is granted, False otherwise. The mode can be:

os.F_OK: value to pass as the mode parameter of access() to test the existence of path.
os.R_OK: value to include in the mode parameter of access() to test the readability of path.
os.W_OK: value to include in the mode parameter of access() to test the writability of path.
os.X_OK: value to include in the mode parameter of access() to determine if path can be executed.

>>> os.access("validFile", os.F_OK)


True
You can set the process's file-creation mask using the os.umask() function. The mask is a number whose bits are cleared from the permissions of newly created files; it does not change the permissions of existing files. The mask is conventionally written in octal:
os.umask(0o022)
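A minimal sketch of working with the mask; note that os.umask() returns the previous mask, which is the usual way to read it:

```python
import os

# os.umask sets the process-wide file-creation mask and returns the old one;
# it does NOT change the permissions of files that already exist.
old_mask = os.umask(0o022)    # new files will lack group/other write permission
current = os.umask(old_mask)  # restore the previous mask; returns 0o022
```

There is no separate "get" function: setting the mask is the only way to read it, hence the set-and-restore idiom above.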

1.2.5. Using more than one process


On Unix systems, os.fork() tells the computer to copy everything about the currently running
program into a newly created program that is separated, but almost entirely identical. The newly
created process is the child process and gets the data and code of the parent process. The child
process gets a process number known as pid. The parent and child processes are independent.
The following code works on Unix and Unix-like systems only:

import os

pid = os.fork()
if pid == 0:                            # the child
    print("this is the child")
elif pid > 0:                           # the parent
    print("the child is pid %d" % pid)
else:
    print("An error occurred")
Here, the fork happens within the executed script, but most of the time you would want the child to go on and run another program.

One of the most common things to do after an os.fork call is to call os.execl immediately
afterward to run another program. os.execl is an instruction to replace the running program with
a new program, so the calling program goes away, and a new program appears in its place:

import os

pid = os.fork()                  # fork and exec together
print("second test")
if pid == 0:                     # this is the child
    print("this is the child")
    print("I'm going to exec another program now")
    os.execl('/bin/cat', 'cat', '/etc/motd')
else:
    print("the child is pid %d" % pid)
    os.wait()
The os.wait function instructs Python that you want the parent to not do anything until the child
process returns. It is important to know that this mechanism works only under Unix
and Unix-like platforms such as Linux.

Windows also has a mechanism for starting up new processes. To make the common task of
starting a new program easier, Python offers a single family of functions that combines os.fork
and os.exec on Unix-like systems, and enables you to do something similar on Windows
platforms.

When you want to just start up a new program, you can use the os.spawn family of functions.

The difference between the spawn variants:

 v - takes a list (vector) of parameters. This allows a command to be run with very
different arguments from one instance to the next without needing to alter the program at
all.
 l - takes the parameters as a simple series of positional arguments.
 e - takes an additional dictionary containing names and values to replace the current
environment.
 p - uses the value of the PATH environment variable to find the program.

The p variants are available only on Unix-like platforms. This means that on
Windows your programs must use a completely qualified path to be usable by the os.spawn
calls, or you have to search the path yourself:

import os, sys

if sys.platform == 'win32':
    print("Running on a Windows platform")
    command = "C:\\winnt\\system32\\cmd.exe"
    params = ['cmd']
if sys.platform.startswith('linux'):
    print("Running on a Linux system, identified by %s" % sys.platform)
    command = '/bin/uname'
    params = ['uname', '-a']
print("Running %s" % command)
os.spawnv(os.P_WAIT, command, params)
The exec function comes in different flavours:

 execl(path, arg0, arg1, ...) or execle(path, arg0, arg1, ..., env), where env is a dict of environment variables.
 execlp(file, arg0, arg1, ...) or execlpe(file, arg0, arg1, ..., env), which search for file on the PATH.

todo

 os.getloadavg os.setegid
 os.getlogin os.seteuid
 os.abort os.getpgid os.setgid
 os.getpgrp os.setgroups
 os.setpgid os.setpgrp
 os.UserDict os.getresgid os.setregid
 os.getresuid os.setresgid os.getsid
 os.setresuid os.setreuid
 os.closerange os.initgroups os.setsid
 os.confstr os.isatty os.setuid
 os.confstr_names os.ctermid
 os.defpath os.devnull
 os.link os.dup os.dup2
 os.errno os.major
 os.error os.makedev os.stat_float_times
 os.execl
 os.execle os.minor os.statvfs
 os.execlp os.statvfs_result
 os.execlpe os.mkfifo os.strerror
 os.execv os.mknod os.symlink
 os.execve
 os.execvp os.sysconf
 os.execvpe os.open os.sysconf_names
 os.extsep os.openpty os.system
 os.fchdir os.pardir os.tcgetpgrp
 os.tcsetpgrp os.pathconf os.tempnam
 os.fdatasync os.pathconf_names os.times
 os.fdopen os.tmpfile
 os.pipe os.tmpnam
 os.forkpty os.popen os.ttyname
 os.fpathconf os.popen2 os.popen3
 os.fstatvfs os.popen4
 os.fsync os.putenv os.unsetenv
 os.ftruncate os.read os.urandom
 os.readlink os.utime
 os.wait os.wait3
 os.getenv os.wait4
 os.waitpid os.getgroups
 The os.walk() function recursively scans a directory and yields tuples
of (dirpath, dirnames, filenames), where dirnames is a list of the directories found in
dirpath, and filenames is the list of files found in dirpath.
 Alternatively, the os.path.walk() function could be used in Python 2, but it worked in a different way (see below); it was removed in Python 3.

1.2.6. user id and processes


 os.getuid() returns the current process’s user id.
 os.getgid() returns the current process’s group id.
 os.geteuid() and os.getegid() return the effective user id and effective group id
 os.getpid() returns the current process id
 os.getppid() returns the parent’s process id
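A quick sketch of these calls (os.getuid() and os.getgid() exist only on Unix-like systems, so the example sticks to the process-id functions, which are available everywhere):

```python
import os

# Identity of the current process and its parent.
print("pid :", os.getpid())     # id of this process
print("ppid:", os.getppid())    # id of the process that launched it
```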
1.3. Cross platform os attributes
An alternative character used by the OS to separate pathname components is provided
by os.altsep.
os.curdir refers to the current directory: '.' for Unix and Windows, ':' for classic Mac OS.
Another multi-platform attribute that can be useful is the line separator. Indeed, the
character sequence that ends a line is coded differently under Linux ('\n'), Windows ('\r\n'),
and classic Mac OS ('\r'). Using os.linesep guarantees that you use the platform's
line-ending string.
The os.uname() function gives more information about your system:

>>> os.uname()
('Linux',
'localhost.localdomain',
'3.3.4-5.fc17.x86_64',
'#1 SMP Mon May 7 17:29:34 UTC 2012',
'x86_64')
The os.name attribute gives the name of the OS-dependent module in use (e.g., posix, nt, mac, ...).
os.pardir refers to the parent directory ('..' for Unix and Windows, '::' for classic Mac OS).
The os.pathsep attribute (not to be confused with os.sep) is the character that separates
search paths in the PATH environment variable (':' under Linux, ';' under Windows).
Finally, os.sep is the character that separates pathname components ('/' for Unix, '\\' for
Windows, ':' for classic Mac OS). It is also available as os.path.sep:
>>> # under linux
>>> os.path.sep
'/'
Another function related to multi-platform situations is os.path.normcase(), which is
useful under Windows where the OS ignores case. To compare two filenames reliably, you need
this function.
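A small sketch of such a comparison; same_path is a hypothetical helper name, and the key point is that normcase is a no-op on POSIX but lowercases and converts '/' to '\\' on Windows:

```python
import os.path

def same_path(a, b):
    # Compare two paths the way the current platform would:
    # case-insensitively on Windows, exactly on POSIX.
    return os.path.normcase(a) == os.path.normcase(b)

print(same_path('/tmp/Data.txt', '/tmp/Data.txt'))   # True on every platform
```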
1.3.1. More about directories and files
os.path provides methods to extract information about path and file names:
>>> os.path.curdir # the current directory ('.')
>>> os.path.isdir(dir) # returns True if dir exists and is a directory
>>> os.path.isfile(file) # returns True if file exists and is a regular file
>>> os.path.islink(link) # returns True if link exists and is a symbolic link
>>> os.path.exists(path) # returns True if path exists (full pathname or filename)
>>> os.path.getsize(filename) # returns the size of a file without opening it
You can access the time when a file was last modified. Nevertheless, the output is not user
friendly: under Unix it is the number of seconds since Jan 1, 1970 (GMT), and under classic Mac OS
since Jan 1, 1904 (GMT). Use the time module to make it easier to read:

>>> import time


>>> mtime = os.path.getmtime(filename) # returns time when the file was last modified
The output is not really meaningful since it is expressed in seconds. You can use
the time module to get a better layout of that time:
>>> print(time.ctime(mtime))
Tue Jan 01 02:02:02 2000
Similarly, the function os.path.getatime() returns the last access time of a file
and os.path.getctime() the metadata change time of a file.
Finally, you can get a whole set of information using os.stat(), such as the file's size, access
time and so on. stat() returns a tuple of numbers, which give you information about a file (or directory).
>>> import os, stat, time
>>> def dump(st):
...     mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st
...     print("- size:", size, "bytes")
...     print("- owner:", uid, gid)
...     print("- created:", time.ctime(ctime))
...     print("- last accessed:", time.ctime(atime))
...     print("- last modified:", time.ctime(mtime))
...     print("- mode:", oct(mode))
...     print("- inode/dev:", ino, dev)
>>> dump(os.stat("todo.txt"))
- size: 0 bytes
- owner: 1000 1000
- created: Wed Dec 19 19:40:02 2012
- last accessed: Wed Dec 19 19:40:02 2012
- last modified: Wed Dec 19 19:40:02 2012
- mode: 0100664
- inode/dev: 23855323 64770
There are other similar functions: os.lstat() for symbolic links and os.fstat() for file descriptors.
You can determine whether a path is a mount point using os.path.ismount(). Under Unix, it checks
whether a path or file is mounted on another device (e.g., an external hard disk).
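As a one-line sketch: on Unix-like systems the root directory is by definition a mount point, so it makes a convenient sanity check:

```python
import os.path

# '/' is always a mount point on Unix-like systems.
print(os.path.ismount('/'))    # True on Unix
```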
1.3.2. Splitting paths
To get the base name of a path (last component):

>>> import os
>>> os.path.basename("/home/user/temp.txt")
'temp.txt'
To get the directory name of a path, use os.path.dirname():
>>> import os
>>> os.path.dirname("/home/user/temp.txt")
'/home/user'
The os.path.abspath() function returns the absolute path of a file:
>>> import os
>>> os.path.abspath('temp.txt')
'/home/user/temp.txt'
In summary, consider a file temp.txt in /home/user, with 'temp.txt' passed as the argument:

function     Output

basename     'temp.txt'
dirname      ''
split        ('', 'temp.txt')
splitdrive   ('', 'temp.txt')
splitext     ('temp', '.txt')
abspath      '/home/user/temp.txt'

Other os.path attributes and functions: os.path.extsep, os.path.genericpath, os.path.realpath,
os.path.relpath, os.path.samefile, os.path.sameopenfile, os.path.samestat, os.path.isabs,
os.path.commonprefix, os.path.defpath, os.path.supports_unicode_filenames, os.path.devnull,
os.path.lexists, os.path.expanduser, os.path.expandvars.
Split the basename and directory name in one function call using os.path.split().
The split function only splits off the last part of a component. In order to split off all parts, you
need to write your own function:
Note

the path should not end with ‘/’, otherwise the name is empty.

os.path.split(‘/home/user’) is not the same as os.path.split(‘/home/user/’)

>>> def split_all(path):
...     parent, name = os.path.split(path)
...     if name == '':
...         return (parent, )
...     else:
...         return split_all(parent) + (name,)
>>> split_all('/home/user/Work')
('/', 'home', 'user', 'Work')
The os.path.splitext() function splits off the extension of a file:
>>> os.path.splitext('image.png')
('image', '.png')
For Windows users, os.path.splitdrive() returns a tuple of two strings, the first
one being the drive.

Conversely, the os.path.join function joins several directory names to create a full path name:
>>> os.path.join('/home', 'user')
'/home/user'
In Python 2, os.path.walk() scanned a directory recursively and applied a function to each item
found (see also os.walk() above); it was removed in Python 3, where os.walk() should be used instead:
def print_info(arg, dir, files):
    for file in files:
        print(dir + ' ' + file)
os.path.walk('.', print_info, 0)

1.4. Accessing environment variables


You can easily access the environment variables:

import os
os.environ.keys()
and if you know what you are doing, you can add or replace a variable:

os.environ[NAME] = VALUE
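A minimal sketch of both reading and writing; MY_APP_MODE is a made-up variable name used purely for illustration, and .get() is the dictionary-style accessor that avoids a KeyError when a variable is unset:

```python
import os

# os.environ behaves like a dictionary of the process environment.
os.environ['MY_APP_MODE'] = 'debug'              # add or replace a variable
mode = os.environ.get('MY_APP_MODE', 'off')      # read it back, with a default
missing = os.environ.get('NO_SUCH_VAR', 'fallback')  # unset: returns the default
```

Changes made through os.environ are inherited by child processes started afterwards, which is the usual reason for setting variables this way.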

Python sys Module

The sys module in Python provides various functions and variables that are used to manipulate
different parts of the Python runtime environment. It allows operating on the interpreter as it
provides access to the variables and functions that interact strongly with the interpreter. Let’s
consider the below example.
Example:

Python3

import sys
print(sys.version)

Output:
3.6.9 (default, Oct 8 2020, 12:12:24)

[GCC 8.4.0]

 In the above example, sys.version is used which returns a string containing the version
of Python Interpreter with some additional information.
 This shows how the sys module interacts with the interpreter. Let us dive into the
article to get more information about the sys module.
Input and Output using sys
The sys module provides variables for better control over input and output. We can even redirect
the input and output to other devices. This can be done using three variables:

 stdin
 stdout
 stderr
stdin: It can be used to read input from the command line directly; it stands for standard input.
The built-in input() function reads from it internally. When you iterate over sys.stdin, each
line keeps its trailing '\n'.
Example:

Python3

import sys
for line in sys.stdin:
    if 'q' == line.rstrip():
        break
    print(f'Input : {line}')
print("Exit")

stdout: A built-in file object that is analogous to the interpreter's standard output stream in
Python. stdout is used to display output directly to the screen console. Output can be of any
form: output from a print statement, an expression statement, or even a direct prompt for
input. By default, streams are in text mode. In fact, wherever a print function is
called within the code, the text is first written to sys.stdout and then finally onto the screen.
Example:
Python3

import sys
sys.stdout.write('Geeks')

Output:

Geeks

stderr: Whenever an exception occurs in Python it is written to sys.stderr.


Example:

Python3

import sys

def print_to_stderr(*a):
    # a is the tuple holding the objects
    # passed as arguments to the function
    print(*a, file=sys.stderr)

print_to_stderr("Hello World")

Output:

Hello World
Command Line Arguments


Command-line arguments are those which are passed during the calling of the program along
with the calling statement. To achieve this, the sys module provides a
variable called sys.argv. Its main properties are:
 It is a list of command-line arguments.
 len(sys.argv) provides the number of command-line arguments.
 sys.argv[0] is the name of the current Python script.
Example: Consider a program for adding numbers and the numbers are passed along with the
calling statement.

Python3

# Python program to demonstrate
# command line arguments

import sys

# total arguments
n = len(sys.argv)
print("Total arguments passed:", n)

# Arguments passed
print("\nName of Python script:", sys.argv[0])
print("\nArguments passed:", end=" ")
for i in range(1, n):
    print(sys.argv[i], end=" ")

# Addition of numbers
Sum = 0
for i in range(1, n):
    Sum += int(sys.argv[i])
print("\n\nResult:", Sum)

Exiting the Program


sys.exit([arg]) can be used to exit the program. The optional argument arg can be an integer
giving the exit status or another type of object. If it is an integer, zero is considered "successful
termination".
Note: A string can also be passed to the sys.exit() method.
Example:

Python3

# Python program to demonstrate
# sys.exit()

import sys

age = 17

if age < 18:
    # exits the program
    sys.exit("Age less than 18")
else:
    print("Age is not less than 18")

Output:
An exception has occurred, use %tb to see the full traceback.
SystemExit: Age less than 18
Working with Modules
sys.path is a built-in variable within the sys module that returns the list of directories that the
interpreter will search for the required module.
When a module is imported within a Python file, the interpreter first searches for the specified
module among its built-in modules. If not found it looks through the list of directories defined
by sys.path.
Note: sys.path is an ordinary list and can be manipulated.
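Since sys.path is an ordinary list, you can prepend a directory so that your own modules are found before the standard locations; '/opt/mylibs' below is a hypothetical path chosen for illustration:

```python
import sys

# Prepend a (hypothetical) directory to the module search path.
extra = '/opt/mylibs'
if extra not in sys.path:
    sys.path.insert(0, extra)   # searched before the standard locations
```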
Example 1: Listing out all the paths

Python3

import sys
print(sys.path)

Example 2: Truncating the value of sys.path


 Python3
import sys

# Removing the values
sys.path = []

# importing pandas after removing values
import pandas

Output:
ModuleNotFoundError: No module named 'pandas'

sys.modules is a dictionary mapping the names of the modules that the current interpreter
session has already imported to the module objects themselves.
Example:

 Python3

import sys
print(sys.modules)

Reference Count
sys.getrefcount() method is used to get the reference count of any given object. Python uses this
value internally: when it drops to 0, the memory for that object is freed.
Example:

Python3

import sys
a = 'Geeks'

print(sys.getrefcount(a))

More Functions in Python sys

sys.setrecursionlimit(): sets the maximum depth of the Python interpreter stack to the required limit.

sys.getrecursionlimit(): returns the current recursion limit of the interpreter, i.e., the maximum depth of the Python interpreter stack.

sys.settrace(): used for implementing debuggers, profilers and coverage tools. It is thread-specific, and the trace must be registered per thread using threading.settrace(). At a higher level, sys.settrace() registers the traceback with the Python interpreter.

sys.setswitchinterval(): sets the interpreter's thread switch interval (in seconds).

sys.maxsize: the largest value a variable of data type Py_ssize_t can store; note that this is an attribute, not a function.

sys.maxint: (Python 2 only) the highest value that can be represented by a plain integer (INT_MAX); it was removed in Python 3, where integers are unbounded.

sys.getdefaultencoding(): returns the current default string encoding used by the Unicode implementation.
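A minimal sketch of the recursion-limit pair, reading the limit, raising it, and restoring it afterwards (the default is commonly 1000, though that is implementation-dependent):

```python
import sys

old = sys.getrecursionlimit()      # read the current ceiling
sys.setrecursionlimit(old + 500)   # allow deeper recursion
new = sys.getrecursionlimit()
sys.setrecursionlimit(old)         # restore the original limit
```

Restoring the old value is good practice, since the limit is global to the interpreter.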

Directory traversal tool


os.walk() method of the OS module can be used for listing out all the directories. This method
basically generates the file names in the directory tree either top-down or bottom-up. For each
directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath,
dirnames, filenames).

dirpath: a string, the path to the directory.

dirnames: the names of the sub-directories in dirpath.

filenames: the names of the non-directory files in dirpath.

Syntax: os.walk(top, topdown=True, onerror=None, followlinks=False)

Parameters:
top: Starting directory for os.walk().
topdown: If this optional argument is True then the directories are scanned from top-down
otherwise from bottom-up. This is True by default.
onerror: It is a function that handles errors that may occur.
followlinks: This visits directories pointed to by symlinks, if set to True.

Return Type: For each directory in the tree rooted at directory top (including top itself), it
yields a 3-tuple (dirpath, dirnames, filenames).
We want to list out all the subdirectories and files inside the directory Test. Below is the
implementation.

# Python program to list out
# all the sub-directories and files

import os

# List to store all
# directories
L = []

# Traversing through Test
for root, dirs, files in os.walk('Test'):
    # Adding the (root, dirs, files) tuple to the list
    L.append((root, dirs, files))

print("List of all sub-directories and files:")
for i in L:
    print(i)

Output:

List of all sub-directories and files:

('Test', ['B', 'C', 'D', 'A'], [])

('Test/B', [], [])

('Test/C', [], ['test2.txt'])

('Test/D', ['E'], [])

('Test/D/E', [], [])

('Test/A', ['A2', 'A1'], [])

('Test/A/A2', [], [])

('Test/A/A1', [], ['test1.txt'])

The above code can be shortened using List Comprehension which is a more Pythonic way.
Below is the implementation.
# Python program to list out
# all the sub-directories and files

import os

# List comprehension to gather
# all directories into a list
L = [(root, dirs, files) for root, dirs, files in os.walk('Test')]

print("List of all sub-directories and files:")
for i in L:
    print(i)

Output:

List of all sub-directories and files:

('Test', ['B', 'C', 'D', 'A'], [])

('Test/B', [], [])

('Test/C', [], ['test2.txt'])

('Test/D', ['E'], [])

('Test/D/E', [], [])

('Test/A', ['A2', 'A1'], [])

('Test/A/A2', [], [])

('Test/A/A1', [], ['test1.txt'])

Parallel System Tools


 Most computers spend a lot of time doing nothing. If you start a system monitor tool and
watch the CPU utilization,it’s rare to see one hit 100 percent, even when you are running
multiple programs.
 There are just too many delays built into software: disk accesses, network traffic,
database queries, waiting for users to click a button, and so on.
 In fact, the majority of a modern CPU’s capacity is often spent in an idle state; faster
chips help speed up performance demand peaks, but much of their power can go largely
unused.
 Early on in computing, programmers realized that they could tap into such unused
processing power by running more than one program at the same time.
 By dividing the CPU’s attention among a set of tasks, its capacity need not go to waste
while any given task is waiting for an external event to occur.
 The technique is usually called parallel processing (and sometimes “multiprocessing” or
even “multitasking”) because many tasks seem to be performed at once, overlapping and
parallel in time.
 It’s at the heart of modern operating systems, and it gave rise to the notion of multiple-
active-window computer interfaces we’ve all come to take for granted.
 Even within a single program, dividing processing into tasks that run in parallel can make
the overall system faster, at least as measured by the clock on your wall.
 Just as important is that modern software systems are expected to be responsive to users
regardless of the amount of work they must perform behind the scenes.
 It’s usually unacceptable for a program to stall while busy carrying out a request.
Consider an email-browser user interface, for example; when asked to fetch email from a
server, the program must download text from a server over a network.
 If you have enough email or a slow enough Internet link, that step alone can take minutes
to finish. But while the download task proceeds, the program as a whole shouldn’t stall—
it still must respond to screen redraws, mouse clicks, and so on.
 Parallel processing comes to the rescue here, too. By performing such long-running tasks
in parallel with the rest of the program, the system at large can remain responsive no
matter how busy some of its parts may be.
 Moreover, the parallel processing model is a natural fit for structuring such programs and
others; some tasks are more easily conceptualized and coded as components running as
independent, parallel entities.
 There are two fundamental ways to get tasks running at the same time in Python—
process forks and spawned threads. Functionally, both rely on underlying operating
system services to run bits of Python code in parallel.
 Procedurally, they are very different in terms of interface, portability, and
communication. For instance, at this writing direct process forks are not supported on
Windows under standard Python (though they are under Cygwin Python on Windows).
 By contrast, Python’s thread support works on all major platforms.
 Moreover, the os.spawn family of calls provides additional ways to launch programs in a
platform-neutral way that is similar to forks, and the os.popen and os.system calls
and the subprocess module can be used to start programs and shell commands portably.
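As a portable sketch of the last point: the subprocess module starts another program on any platform, and sys.executable (the running Python interpreter itself) makes the example self-contained:

```python
import subprocess, sys

# Run a child Python process and capture its output; this works the same
# on Unix and Windows, unlike os.fork.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],
    capture_output=True, text=True)
print(result.stdout.strip())   # hello from child
```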

Forking Processes
 Forked processes are a traditional way to structure parallel tasks, and they are a
fundamental part of the Unix tool set.
 Forking is a straightforward way to start an independent program, whether it is different
from the calling program or not.
 Forking is based on the notion of copying programs: when a program calls the fork
routine, the operating system makes a new copy of that program and its process in
memory and starts running that copy in parallel with the original.
 Some systems don’t really copy the original program (it’s an expensive operation), but
the new copy works as if it were a literal copy.
 After a fork operation, the original copy of the program is called the parent process, and
the copy created by os.fork is called the child process.
 In general, parents can make any number of children, and children can create child
processes of their own; all forked processes run independently and in parallel under the
operating system’s control, and children may continue to run after their parent exits.
 This is probably simpler in practice than in theory, though.
 The Python script in Example 5-1 forks new child processes until you type the letter q at
the console.
Example 5-1. PP4E\System\Processes\fork1.py
"forks child processes until you type 'q'"
import os
def child():
print('Hello from child', os.getpid())
os._exit(0) # else goes back to parent loop
def parent():
while True:
newpid = os.fork()
if newpid == 0:
child()
else:
print('Hello from parent', os.getpid(), newpid)
if input() == 'q': break
parent()
 Python’s process forking tools, available in the os module, are simply thin wrappers over
standard forking calls in the system library also used by C language programs.
 To start a new, parallel process, call the os.fork built-in function.
 Because this function generates a copy of the calling program, it returns a different value
in each copy: zero in the child process and the process ID of the new child in the parent.
 Programs generally test this result to begin different processing in the child only; this
script, for instance, runs the child function in child processes only.
 Because forking is ingrained in the Unix programming model, this script works well on
Unix, Linux, and modern Macs. Unfortunately, this script won’t work on the standard
version of Python for Windows today, because fork is too much at odds with the
Windows model.
 Python scripts can always spawn threads on Windows, and the multiprocessing module
described later in this chapter provides an alternative for running processes portably,
which can obviate the need for process forks on Windows in contexts that conform to its
constraints (albeit at some potential cost in low-level control).
 The script in Example 5-1 does work on Windows, however, if you use the Python
shipped with the Cygwin system (or build one of your own from source-code with
Cygwin’s libraries). Cygwin is a free, open source system that provides full Unix-like
functionality for Windows (and is described further in More on Cygwin Python for
Windows).
 You can fork with Python on Windows under Cygwin, even though its behavior is not
exactly the same as true Unix forks. Because it’s close enough for this book’s examples,
though, let’s use it to run our script live:
[C:\...\PP4E\System\Processes]$ python fork1.py
Hello from parent 7296 7920
Hello from child 7920

Hello from parent 7296 3988


Hello from child 3988

Hello from parent 7296 6796


Hello from child 6796
q
 These messages represent three forked child processes; the unique identifiers of all the
processes involved are fetched and displayed with the os.getpid call. A subtle point:
the child process function is also careful to exit explicitly with an os._exit call. We’ll
discuss this call in more detail later in this chapter, but if it’s not made, the child process
would live on after the child function returns (remember, it’s just a copy of the original
process).
 The net effect is that the child would go back to the loop in parent and start forking
children of its own (i.e., the parent would have grandchildren). If you delete the exit call
and rerun, you’ll likely have to type more than one q to stop, because multiple processes
are running in the parent function.
 In Example 5-1, each process exits very soon after it starts, so there’s little overlap in
time. Let’s do something slightly more sophisticated to better illustrate multiple forked
processes running in parallel. Example 5-2 starts up 5 copies of itself, each copy counting
up to 5 with a one-second delay between iterations.
 The time.sleep standard library call simply pauses the calling process for a number of
seconds (you can pass a floating-point value to pause for fractions of seconds).
Example 5-2. PP4E\System\Processes\fork-count.py
"""
fork basics: start 5 copies of this program running in parallel with
the original; each copy counts up to 5 on the same stdout stream--forks
copy process memory, including file descriptors; fork doesn't currently
work on Windows without Cygwin: use os.spawnv or multiprocessing on
Windows instead; spawnv is roughly like a fork+exec combination;
"""
import os, time
def counter(count):                           # run in new process
    for i in range(count):
        time.sleep(1)                         # simulate real work
        print('[%s] => %s' % (os.getpid(), i))

for i in range(5):
    pid = os.fork()
    if pid != 0:
        print('Process %d spawned' % pid)     # in parent: continue
    else:
        counter(5)                            # else in child/new process
        os._exit(0)                           # run function and exit

print('Main process exiting.')                # parent need not wait

When run, this script starts 5 processes immediately and exits. All 5 forked processes check in
with their first count display one second later and every second thereafter. Notice that child
processes continue to run, even if the parent process that created them terminates:
[C:\...\PP4E\System\Processes]$ python fork-count.py
Process 4556 spawned
Process 3724 spawned
Process 6360 spawned
Process 6476 spawned
Process 6684 spawned
Main process exiting.
[4556] => 0
[3724] => 0
[6360] => 0
[6476] => 0
[6684] => 0
[4556] => 1
[3724] => 1
[6360] => 1
[6476] => 1
[6684] => 1
[4556] => 2
[3724] => 2
[6360] => 2
[6476] => 2
[6684] => 2
 The output of all of these processes shows up on the same screen, because all of them
share the standard output stream (and a system prompt may show up along the way, too).
 Technically, a forked process gets a copy of the original process’s global memory,
including open file descriptors.
 Because of that, global objects like files start out with the same values in a child process,
so all the processes here are tied to the same single stream.
 But it’s important to remember that global memory is copied, not shared; if a child
process changes a global object, it changes only its own copy. (As we’ll see, this works
differently in threads, the topic of the next section.)
THE FORK/EXEC COMBINATION
 In Examples 5-1 and 5-2, child processes simply ran a function within the Python
program and then exited. On Unix-like platforms, forks are often the basis of starting
independently running programs that are completely different from the program that
performed the fork call.
 For instance, Example 5-3 forks new processes until we type q again, but child processes
run a brand-new program instead of calling a function in the same file.
 Example 5-3. PP4E\System\Processes\fork-exec.py
"starts programs until you type 'q'"
import os

parm = 0
while True:
    parm += 1
    pid = os.fork()
    if pid == 0:                                                  # copy process
        os.execlp('python', 'python', 'child.py', str(parm))      # overlay program
        assert False, 'error starting program'                    # shouldn't return
    else:
        print('Child is', pid)
        if input() == 'q': break
 If you’ve done much Unix development, the fork/exec combination will probably look
familiar. The main thing to notice is the os.execlp call in this code. In a nutshell, this call
replaces (overlays) the program running in the current process with a brand new program.
 Because of that, the combination of os.fork and os.execlp means start a new process and
run a new program in that process—in other words, launch a new program in parallel
with the original program.
 os.exec call formats
 The arguments to os.execlp specify the program to be run by giving command-line
arguments used to start the program (i.e., what Python scripts know as sys.argv).
 If successful, the new program begins running and the call to os.execlp itself never
returns (since the original program has been replaced, there’s really nothing to return to).
 If the call does return, an error has occurred, so we code an assert after it that will always
raise an exception if reached.
 There are a handful of os.exec variants in the Python standard library; some allow us to
configure environment variables for the new program, pass command-line arguments in
different forms, and so on.
 All are available on both Unix and Windows, and they replace the calling program (i.e.,
the Python interpreter). exec comes in eight flavors, which can be a bit confusing unless
you generalize:
os.execv(program, commandlinesequence)
 The basic “v” exec form is passed an executable program’s name, along with a list or
tuple of command-line argument strings used to run the executable (that is, the words you
would normally type in a shell to start a program).
os.execl(program, cmdarg1, cmdarg2,... cmdargN)
 The basic “l” exec form is passed an executable’s name, followed by one or more
command-line arguments passed as individual function arguments. This is the same
as os.execv(program, (cmdarg1, cmdarg2,...)).
os.execlp
os.execvp
Adding the letter p to the execv and execl names means that Python will locate the executable’s
directory using your system search-path setting (i.e., PATH).
os.execle
os.execve
Adding a letter e to the execv and execl names means an extra, last argument is a dictionary
containing shell environment variables to send to the program.
os.execvpe
os.execlpe
Adding the letters p and e to the basic exec names means to use the search path and to accept a
shell environment settings dictionary.
 So when the script in Example 5-3 calls os.execlp, the individually passed parameters specify
a command line for the program to be run, and the word python maps to an executable
file according to the underlying system search-path environment variable (PATH).
 It’s as if we were running a command of the form python child.py 1 in a shell, but with a
different command-line argument on the end each time.
 Spawned child program
Just as when typed at a shell, the string of arguments passed to os.execlp by the fork-exec script
in Example 5-3 starts another Python program file, as shown in Example 5-4.
Example 5-4. PP4E\System\Processes\child.py
import os, sys
print('Hello from child', os.getpid(), sys.argv[1])
 Here is this code in action on Linux. It doesn’t look much different from the
original fork1.py, but it’s really running a new program in each forked process.
 More observant readers may notice that the child process ID displayed is the same in the
parent program and the launched child.py program; os.execlp simply overlays a program
in the same process:
[C:\...\PP4E\System\Processes]$ python fork-exec.py
Child is 4556
Hello from child 4556 1

Child is 5920
Hello from child 5920 2

Child is 316
Hello from child 316 3
q

Threading and Queue

What Is a Thread?

A thread is a separate flow of execution. This means that your program can have two things
happening at once. But in most Python 3 implementations (notably CPython, because of the
Global Interpreter Lock), the different threads do not actually execute at the same time:
they merely appear to.

Starting a Thread
Now that you’ve got an idea of what a thread is, let’s learn how to make one. The Python
standard library provides threading, which contains most of the primitives covered in this
section. The Thread class in this module nicely encapsulates threads, providing a clean
interface to work with them.

To spawn another thread at the lowest level, you call the following function from the thread
module (renamed _thread in Python 3) −
_thread.start_new_thread ( function, args[, kwargs] )
This call enables a fast and efficient way to create new threads on both Linux and
Windows.
The method call returns immediately and the child thread starts and calls function with the
passed list of args. When function returns, the thread terminates.
Here, args is a tuple of arguments; use an empty tuple to call function without passing any
arguments. kwargs is an optional dictionary of keyword arguments.
Example
#!/usr/bin/python

import _thread    # named "thread" in Python 2
import time

# Define a function for the thread
def print_time(threadName, delay):
    count = 0
    while count < 5:
        time.sleep(delay)
        count += 1
        print("%s: %s" % (threadName, time.ctime(time.time())))

# Create two threads as follows
try:
    _thread.start_new_thread(print_time, ("Thread-1", 2,))
    _thread.start_new_thread(print_time, ("Thread-2", 4,))
except:
    print("Error: unable to start thread")

while 1:
    pass
When the above code is executed, it produces the following result −
Thread-1: Thu Jan 22 15:42:17 2009
Thread-1: Thu Jan 22 15:42:19 2009
Thread-2: Thu Jan 22 15:42:19 2009
Thread-1: Thu Jan 22 15:42:21 2009
Thread-2: Thu Jan 22 15:42:23 2009
Thread-1: Thu Jan 22 15:42:23 2009
Thread-1: Thu Jan 22 15:42:25 2009
Thread-2: Thu Jan 22 15:42:27 2009
Thread-2: Thu Jan 22 15:42:31 2009
Thread-2: Thu Jan 22 15:42:35 2009
Although it is very effective for low-level threading, the thread module is very limited
compared to the newer threading module.
The Threading Module
The higher-level threading module provides much more powerful, high-level support for
threads than the low-level thread module discussed in the previous section.
The threading module exposes all the methods of the thread module and provides some
additional methods −
 threading.activeCount() − Returns the number of thread objects that are active.
 threading.currentThread() − Returns the Thread object for the caller's thread of
control.
 threading.enumerate() − Returns a list of all thread objects that are currently active.

In addition to the methods, the threading module has the Thread class that implements
threading. The methods provided by the Thread class are as follows −
 run() − The run() method is the entry point for a thread.
 start() − The start() method starts a thread by calling the run method.
 join([timeout]) − The join() method waits for the thread to terminate (or for the
optional timeout to elapse).
 isAlive() − The isAlive() method checks whether a thread is still executing (spelled
is_alive() in current Python 3).
 getName() − The getName() method returns the name of a thread.
 setName() − The setName() method sets the name of a thread.
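A minimal sketch exercising these Thread methods, using the modern snake_case spellings (the thread name 'demo-thread' is arbitrary):

```python
import threading
import time

def task():
    time.sleep(0.2)               # simulate some work

t = threading.Thread(target=task, name='demo-thread')
t.start()
print(t.name, t.is_alive())       # name/is_alive(): current spellings of getName()/isAlive()
t.join()                          # wait for the thread to terminate
print(t.is_alive())               # False once the thread has finished
```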
Creating Thread Using Threading Module
To implement a new thread using the threading module, you have to do the following −
 Define a new subclass of the Thread class.
 Override the __init__(self [,args]) method to add additional arguments.
 Then, override the run(self [,args]) method to implement what the thread should do when
started.
Once you have created the new Thread subclass, you can create an instance of it and then start a
new thread by invoking the start(), which in turn calls run() method.
Example
#!/usr/bin/python

import threading
import time

exitFlag = 0

class myThread (threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting " + self.name)
        print_time(self.name, 5, self.counter)
        print("Exiting " + self.name)

def print_time(threadName, counter, delay):
    while counter:
        if exitFlag:
            raise SystemExit            # terminate this thread
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

# Create new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Start new Threads
thread1.start()
thread2.start()

print("Exiting Main Thread")


When the above code is executed, it produces the following result −
Starting Thread-1
Starting Thread-2
Exiting Main Thread
Thread-1: Thu Mar 21 09:10:03 2013
Thread-1: Thu Mar 21 09:10:04 2013
Thread-2: Thu Mar 21 09:10:04 2013
Thread-1: Thu Mar 21 09:10:05 2013
Thread-1: Thu Mar 21 09:10:06 2013
Thread-2: Thu Mar 21 09:10:06 2013
Thread-1: Thu Mar 21 09:10:07 2013
Exiting Thread-1
Thread-2: Thu Mar 21 09:10:08 2013
Thread-2: Thu Mar 21 09:10:10 2013
Thread-2: Thu Mar 21 09:10:12 2013
Exiting Thread-2

Thread Objects

The simplest way to use a Thread is to instantiate it with a target function and call start() to let
it begin working.
import threading

def worker():
    """thread worker function"""
    print('Worker')

threads = []
for i in range(5):
    t = threading.Thread(target=worker)
    threads.append(t)
    t.start()

The output is five lines with "Worker" on each:

$ python threading_simple.py

Worker
Worker
Worker
Worker
Worker

It is useful to be able to spawn a thread and pass it arguments to tell it what work to do. This
example passes a number, which the thread then prints.

import threading

def worker(num):
    """thread worker function"""
    print('Worker: %s' % num)

threads = []
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

The integer argument is now included in the message printed by each thread:

$ python -u threading_simpleargs.py

Worker: 0
Worker: 1
Worker: 2
Worker: 3
Worker: 4

Determining the Current Thread

Using arguments to identify or name the thread is cumbersome and unnecessary.


Each Thread instance has a name with a default value that can be changed as the thread is
created. Naming threads is useful in server processes with multiple service threads handling
different operations.

import threading
import time

def worker():
    print(threading.currentThread().getName(), 'Starting')
    time.sleep(2)
    print(threading.currentThread().getName(), 'Exiting')

def my_service():
    print(threading.currentThread().getName(), 'Starting')
    time.sleep(3)
    print(threading.currentThread().getName(), 'Exiting')

t = threading.Thread(name='my_service', target=my_service)
w = threading.Thread(name='worker', target=worker)
w2 = threading.Thread(target=worker)    # use default name

w.start()
w2.start()
t.start()

The debug output includes the name of the current thread on each line. The lines with "Thread-
1" in the thread name column correspond to the unnamed thread w2.

$ python -u threading_names.py

worker Thread-1 Starting


my_service Starting
Starting
Thread-1worker Exiting
Exiting
my_service Exiting
Most programs do not use print to debug. The logging module supports embedding the thread
name in every log message using the formatter code %(threadName)s. Including thread names in
log messages makes it easier to trace those messages back to their source.

The logging module defines a standard API for reporting errors and status information from
applications and libraries. The key benefit of having the logging API provided by a standard
library module is that all Python modules can participate in logging, so an application’s log can
include messages from third-party modules.

Logging in Applications

There are two perspectives for examining logging. Application developers set up
the logging module, directing the messages to appropriate output channels. It is possible to log
messages with different verbosity levels or to different destinations. Handlers for writing log
messages to files, HTTP GET/POST locations, email via SMTP, generic sockets, or OS-specific
logging mechanisms are all included, and it is possible to create custom log destination classes
for special requirements not handled by any of the built-in classes.

Logging to a File

Most applications are probably going to want to log to a file. Use the basicConfig() function to
set up the default handler so that debug messages are written to a file.

import logging
import threading
import time

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',
                    )

def worker():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

def my_service():
    logging.debug('Starting')
    time.sleep(3)
    logging.debug('Exiting')

t = threading.Thread(name='my_service', target=my_service)
w = threading.Thread(name='worker', target=worker)
w2 = threading.Thread(target=worker)    # use default name

w.start()
w2.start()
t.start()

logging is also thread-safe, so messages from different threads are kept distinct in the output.

$ python threading_names_log.py

[DEBUG] (worker ) Starting


[DEBUG] (Thread-1 ) Starting
[DEBUG] (my_service) Starting
[DEBUG] (worker ) Exiting
[DEBUG] (Thread-1 ) Exiting
[DEBUG] (my_service) Exiting

Daemon Threads
In computer science, a daemon is a process that runs in the background.

Python threading has a more specific meaning for daemon. A daemon thread will shut down
immediately when the program exits. One way to think about these definitions is to consider
the daemon thread a thread that runs in the background without worrying about shutting it down.

If a program is running Threads that are not daemons, then the program will wait for those threads
to complete before it terminates. Threads that are daemons, however, are just killed wherever they
are when the program is exiting.

Consider a program whose main thread prints an "all done" message while a non-daemon
worker thread is still sleeping. When you run it, you’ll notice a pause (of about 2 seconds)
after the main thread has printed its message and before the program finishes.

This pause is Python waiting for the non-daemonic thread to complete. When your Python
program ends, part of the shutdown process is to clean up the threading routine.

If you look at the source for Python threading, you’ll see that threading._shutdown() walks through all
of the running threads and calls .join() on every one that does not have the daemon flag set.

So the program waits to exit because the thread itself is waiting in a sleep. As soon as the
thread has completed and printed its message, .join() will return and the program can exit.

Frequently, this behavior is what you want, but there are other options available to us. Let’s first
repeat the program with a daemon thread. You do that by changing how you construct the Thread,
adding the daemon=True flag.
The default is for threads not to be daemons, so passing True turns the daemon mode on.

import threading
import time
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)s) %(message)s',
                    )

def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon, daemon=True)

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)

d.start()
t.start()

Notice that the output does not include the "Exiting" message from the daemon thread, since all
of the non-daemon threads (including the main thread) exit before the daemon thread wakes up
from its two second sleep.

$ python threading_daemon.py

(daemon ) Starting
(non-daemon) Starting
(non-daemon) Exiting

join() a Thread
Daemon threads are handy, but what about when you want to wait for a thread to stop? What
about when you want to do that without exiting your program?

To tell one thread to wait for another thread to finish, you call .join() on it: the calling
thread pauses and waits for that thread to complete running.

It doesn’t matter whether you .join() a daemon thread or a regular thread: the statement
will wait until either kind of thread is finished.

To wait until a daemon thread has completed its work, use the join() method.

import threading
import time
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)s) %(message)s',
                    )

def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon, daemon=True)

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)

d.start()
t.start()

d.join()
t.join()

Waiting for the daemon thread to exit using join() means it has a chance to produce
its "Exiting" message.

$ python threading_daemon_join.py

(daemon ) Starting
(non-daemon) Starting
(non-daemon) Exiting
(daemon ) Exiting

Working With Many Threads


The example code so far has only been working with two threads: the main thread and one you
started with the threading.Thread object.

Frequently, you’ll want to start a number of threads and have them do interesting work. Let’s
start by looking at the harder way of doing that, and then you’ll move on to an easier method.

Using a ThreadPoolExecutor
There’s an easier way to start up a group of threads than the one you saw above. It’s called
a ThreadPoolExecutor, and it’s part of the standard library in concurrent.futures (as of Python 3.2).

The easiest way to create it is as a context manager, using the with statement to manage the
creation and destruction of the pool.

Race Conditions
Before you move on to some of the other features tucked away in Python threading, let’s talk a bit
about one of the more difficult issues you’ll run into when writing threaded programs: race
conditions.

Once you’ve seen what a race condition is and looked at one happening, you’ll move on to some
of the primitives provided by the standard library to prevent race conditions from happening.

Race conditions can occur when two or more threads access a shared piece of data or resource. In
this example, you’re going to create a large race condition that happens every time, but be aware
that most race conditions are not this obvious. Frequently, they only occur rarely, and they can
produce confusing results. As you can imagine, this makes them quite difficult to debug.
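As a minimal sketch (the counter and lock names are illustrative): two threads each increment a shared counter many times. The update counter += 1 is a read-modify-write sequence that can interleave between threads; holding a Lock around it prevents the race.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:          # remove this lock and the final count may come up short
            counter += 1    # read-modify-write: not atomic across threads

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 200000 with the lock held around each update
```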

Queue – A thread-safe FIFO implementation

The Queue module (renamed queue in Python 3) provides a FIFO implementation suitable for
multi-threaded programming. It can be used to pass messages or other data between producer
and consumer threads safely. Locking is handled for the caller, so it is simple to have as many
threads as you want working with the same Queue instance. A Queue’s size (number of
elements) may be restricted to throttle memory usage or processing.
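A short sketch of that size restriction, using the Python 3 module name:

```python
import queue    # named "Queue" in Python 2

q = queue.Queue(maxsize=2)   # restrict the queue to two elements
q.put('a')
q.put('b')
print(q.full())       # True: a further blocking put() would now wait
print(q.get())        # 'a' -- items come back in FIFO order
print(q.full())       # False: there is room again
```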
Basic FIFO Queue

The Queue class implements a basic first-in, first-out container. Elements are added to one “end”
of the sequence using put(), and removed from the other end using get().

import queue    # named "Queue" in Python 2

q = queue.Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())

This example uses a single thread to illustrate that elements are removed from the queue in the
same order they are inserted.

$ python Queue_fifo.py

0
1
2
3
4

LIFO Queue

In contrast to the standard FIFO implementation of Queue, the LifoQueue uses last-in, first-out
ordering (normally associated with a stack data structure).

import queue

q = queue.LifoQueue()

for i in range(5):
    q.put(i)

while not q.empty():
    print(q.get())

The item most recently put() into the queue is removed by get().
$ python Queue_lifo.py

4
3
2
1
0

Multithreaded Priority Queue


The Queue module allows you to create a new queue object that can hold a specific number of
items. There are following methods to control the Queue −
 get() − The get() method removes and returns an item from the queue.
 put(item) − The put() method adds an item to the queue.
 qsize() − The qsize() method returns the number of items currently in the queue.
 empty() − The empty() method returns True if the queue is empty; otherwise, False.
 full() − The full() method returns True if the queue is full; otherwise, False.
Example
#!/usr/bin/python

import queue
import threading
import time

exitFlag = 0

class myThread (threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q

    def run(self):
        print("Starting " + self.name)
        process_data(self.name, self.q)
        print("Exiting " + self.name)

def process_data(threadName, q):
    while not exitFlag:
        queueLock.acquire()
        if not workQueue.empty():
            data = q.get()
            queueLock.release()
            print("%s processing %s" % (threadName, data))
        else:
            queueLock.release()
            time.sleep(1)

threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = ["One", "Two", "Three", "Four", "Five"]
queueLock = threading.Lock()
workQueue = queue.Queue(10)
threads = []
threadID = 1

# Create new threads
for tName in threadList:
    thread = myThread(threadID, tName, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1

# Fill the queue
queueLock.acquire()
for word in nameList:
    workQueue.put(word)
queueLock.release()

# Wait for queue to empty
while not workQueue.empty():
    pass

# Notify threads it's time to exit
exitFlag = 1

# Wait for all threads to complete
for t in threads:
    t.join()
print("Exiting Main Thread")
When the above code is executed, it produces the following result −
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-1 processing One
Thread-2 processing Two
Thread-3 processing Three
Thread-1 processing Four
Thread-2 processing Five
Exiting Thread-3
Exiting Thread-1
Exiting Thread-2
Exiting Main Thread

Program Exits

The functions quit(), exit(), and sys.exit() have almost the same functionality: they raise
the SystemExit exception, by which the Python interpreter exits with no stack traceback printed.
We can catch the exception to intercept early exits and perform cleanup activities; if uncaught,
the interpreter exits as usual. os._exit() is different: it terminates the process immediately,
without raising SystemExit at all.

When we run a program in Python, we simply execute all the code in file, from top to bottom.
Scripts normally exit when the interpreter reaches the end of the file, but we may also call for the
program to exit explicitly with the built-in exit functions.

1. The quit function is used to raise the SystemExit exception and it gives you a message:

>>> print(quit)
Use quit() or Ctrl-Z plus Return to quit
>>>

This is for beginners trying to learn python. But you need to remember that you must not use the
quit function in production code.
2. The exit function is more like a synonym for the quit function. Like the quit function, the exit
function is also used to make python more user-friendly and it too does display a message:

>>> print(exit)
Use exit() or Ctrl-Z plus Return to quit
>>>

And also it must not be used in production code.


3. The sys.exit function raises the SystemExit exception in the background. This function
has an advantage over the other two functions as it can be used in production code.
4. The os._exit function exits the process without flushing stdio buffers, calling cleanup
handlers, etc. Hence, it is only used in special cases, such as in a child process after a fork.
All four of these methods exit the program, but you shouldn't use the first two in production
code, and os._exit is reserved for special cases.
So mostly we use the sys.exit function.
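Because sys.exit raises SystemExit, a caller can intercept an early exit and perform cleanup before the interpreter shuts down. A minimal sketch (the main function and the status value 2 are illustrative):

```python
import sys

def main():
    sys.exit(2)               # raises SystemExit(2) rather than killing the process outright

try:
    main()
except SystemExit as exc:     # intercept the early exit and clean up
    status = exc.code         # the exception carries the requested exit status
print(status)   # 2
```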

os._exit():
Exits the process immediately, without calling cleanup handlers or flushing stdio buffers.

exit(0):
A clean exit without any errors / problems.

exit(1):
An exit with a nonzero status, used when some issue / error / problem is the reason for
exiting the program.

sys.exit():
Raises SystemExit; the standard way for a script to request shutdown, optionally with an
exit status or message.

quit():
An interactive-interpreter convenience that raises SystemExit; intended for the REPL, not
for production scripts.

System interfaces by focusing on tools and techniques


An interface is essentially an abstract class that can contain only abstract methods.

Interfaces in Python are handled differently than in most other languages, and they can vary in
their design complexity.

At a high level, an interface acts as a blueprint for designing classes. Like classes, interfaces
define methods. Unlike classes, these methods are abstract. An abstract method is one that the
interface simply defines. It doesn’t implement the methods. This is done by classes, which
then implement the interface and give concrete meaning to the interface’s abstract methods.

Python’s approach to interface design is somewhat different when compared to languages
like Java, Go, and C++. These languages all have an interface keyword, while Python does
not; Python also doesn’t require the class that’s implementing the interface to define all of
the interface’s abstract methods.

There are two ways in Python to create and implement an interface:

 Informal Interfaces
 Formal Interfaces

1. Informal Interfaces
 A Python informal interface is a class that defines methods that can be overridden, but
without forced enforcement.
 An informal interface is also called a protocol, or duck typing. With duck typing, we
simply call the methods we expect an object to have, instead of checking the type of
the object.
 An informal interface in Python is termed a protocol because it is informal and cannot
be formally enforced. It is mostly defined by templates or demonstrated in the
documentation.

2. Formal Interfaces

A formal interface is an interface that is enforced formally. In some situations, protocols or
duck typing create confusion: consider two classes FourWheelVehicle and TwoWheelVehicle,
both with a method SpeedUp( ). Objects of both classes can speed up, but the objects are not
the same kind of thing, even though both classes implement the same interface. To resolve
this confusion, we can use a formal interface. To create a formal interface, we need to use
ABCs (Abstract Base Classes).

An ABC is simply an interface, or base class, that is abstract in nature and contains some
abstract methods. If any class derives from such a base class, it is forced to implement all of
those methods. Note that the interface cannot be instantiated, which means that we cannot
create an object of the interface itself. Instead we instantiate an implementing class, and we
say that its object implements the interface. We can use isinstance and issubclass to confirm
whether an object implements a particular interface.

Program: Interface having two abstract methods and one sub class

from abc import ABC, abstractmethod

class Bank(ABC):

    @abstractmethod
    def balance_check(self):
        pass

    @abstractmethod
    def interest(self):
        pass

class SBI(Bank):

    def balance_check(self):
        print("Balance is 100 rupees")

    def interest(self):
        print("SBI interest is 5 rupees")

s = SBI()

s.balance_check()

s.interest()

Output:

Balance is 100 rupees
SBI interest is 5 rupees

When should we go for interfaces?


Since interfaces do not contain implemented methods, when we don't know anything about
the implementation of the requirements, we should go for interfaces.

When should we go for abstract class?


An abstract class is a class that can contain a few implemented methods as well as a few
unimplemented ones. When we know the requirements partially, but not completely, we
should go for an abstract class.
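A small sketch of this partial-knowledge case (the Vehicle/Bike names and methods are made up for illustration): the abstract class implements what it already knows and leaves the unknown part abstract for subclasses.

```python
from abc import ABC, abstractmethod

class Vehicle(ABC):                       # abstract class: partially implemented
    def describe(self):                   # known requirement: implemented here
        return 'vehicle with %d wheels' % self.wheels()

    @abstractmethod
    def wheels(self):                     # unknown requirement: left abstract
        pass

class Bike(Vehicle):                      # concrete subclass fills in the gap
    def wheels(self):
        return 2

print(Bike().describe())   # vehicle with 2 wheels
```

Instantiating Vehicle directly raises a TypeError, since it still has an abstract method.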

When should we go for concrete class?


A concrete class is a class that is fully implemented: it contains only implemented methods.
When we know the complete implementation of the requirements, we should go for a
concrete class.
ABCs and Virtual Subclass

We can also register a class as a virtual subclass of an ABC. In that case, even if that class
doesn’t subclass our ABC, it will still be treated as a subclass of the ABC (and thus accepted to
have implemented the interface). An example demonstrates this better:

from abc import ABC

class Bird(ABC):
    pass

@Bird.register
class Robin:
    pass

r = Robin()

And then:

>>> issubclass(Robin, Bird)

True

>>> isinstance(r, Bird)

True

>>>

In this case, even if Robin does not subclass our ABC or define the abstract method, we
can register it as a Bird. The issubclass and isinstance behavior can be customized by defining
the relevant magic methods (__subclasscheck__ and __instancecheck__ on a metaclass, or
__subclasshook__ on the ABC itself).
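A sketch of the __subclasshook__ approach (the Quacker/Duck names are made up for illustration): any class that defines the expected method is treated as a subclass, without inheritance or registration.

```python
from abc import ABC, abstractmethod

class Quacker(ABC):
    @abstractmethod
    def quack(self):
        pass

    @classmethod
    def __subclasshook__(cls, C):
        # treat any class that defines quack() as an implicit subclass
        return hasattr(C, 'quack') or NotImplemented

class Duck:                  # never subclasses or registers with Quacker
    def quack(self):
        return 'quack'

print(issubclass(Duck, Quacker))     # True
print(isinstance(Duck(), Quacker))   # True
```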
Binary files, tree walkers:
Trees are non-linear data structures that represent nodes connected by edges. Each tree consists
of a root node as the Parent node, and the left node and right node as Child nodes.
Binary tree
A tree whose elements have at most two children is called a binary tree. In a binary search
tree, the nodes are additionally ordered: a node’s left child must have a value less than its
parent’s value, and the node’s right child must have a value greater than its parent’s value.
The examples below build a binary search tree.
A sample binary search tree:

        27
       /  \
     14    35
    /  \  /  \
  10  19 31  42
Implementation
Here we have created a node class and assigned a value to the node.

# node class
class Node:
    def __init__(self, data):
        self.left = None      # left child
        self.right = None     # right child
        self.data = data      # node's value

    # print function
    def PrintTree(self):
        print(self.data)

root = Node(27)
root.PrintTree()

The above code will create node 27 as parent node.
Insertion
The insert method compares the value of the node to the parent node and decides whether to add
it as a left node or right node.
Remember: if the node is greater than the parent node, it is inserted as a right
node; otherwise, it’s inserted left.
Finally, the PrintTree method is used to print the tree.
class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # Compare the new value with the parent node
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # Print the tree
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data)
        if self.right:
            self.right.PrintTree()

Run
The above code creates 27 as the root node, 14 as its left child, and 35 as its right child.
Searching
While searching for a value in the tree, we traverse left or right from each node, comparing
the target value with the node's value.
 Python

class Node:
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

    # Insert method to create nodes
    def insert(self, data):
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = Node(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = Node(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    # findval method to compare the value with nodes
    def findval(self, lkpval):
        if lkpval < self.data:
            if self.left is None:
                return str(lkpval) + " is not Found"
            return self.left.findval(lkpval)
        elif lkpval > self.data:
            if self.right is None:
                return str(lkpval) + " is not Found"
            return self.right.findval(lkpval)
        else:
            return str(self.data) + " is Found"
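To tie the pieces together, here is a self-contained sketch (the Node class mirrors the one above, with an added inorder generator) that builds a small tree and walks it with an in-order traversal; for a binary search tree, this visits the values in sorted order:

```python
class Node:
    def __init__(self, data):
        self.left = self.right = None
        self.data = data

    def insert(self, data):
        # smaller values go left, larger values go right
        if data < self.data:
            if self.left:
                self.left.insert(data)
            else:
                self.left = Node(data)
        elif data > self.data:
            if self.right:
                self.right.insert(data)
            else:
                self.right = Node(data)

    def inorder(self):
        # in-order traversal: left subtree, node, right subtree
        if self.left:
            yield from self.left.inorder()
        yield self.data
        if self.right:
            yield from self.right.inorder()

root = Node(27)
for v in (14, 35, 10, 19, 31, 42):
    root.insert(v)
print(list(root.inorder()))  # [10, 14, 19, 27, 31, 35, 42]
```

This sorted output is a quick sanity check that the insertions preserved the binary-search-tree ordering property.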

Python’s library support for running programs in parallel

Parallel processing increases the number of tasks your program can perform at once, which
reduces overall processing time and helps tackle large-scale problems.
In this section we will cover the following topics:
 Introduction to parallel processing
 Multi Processing Python library for parallel processing
 IPython parallel framework
Introduction to parallel processing
For parallelism, it is important to divide the problem into sub-units that do not depend (or
depend only weakly) on other sub-units. A problem whose sub-units are totally independent of
one another is called embarrassingly parallel.

An example is an element-wise operation on an array: each operation only needs to be aware
of the particular element it is handling at the moment.

In another scenario, the sub-units of a problem have to share some data to perform their
operations. This results in a performance penalty because of the communication cost.

There are two main ways to handle parallel programs:


 Shared Memory
In shared memory, the sub-units can communicate with each other through the same memory
space. The advantage is that you don't need to handle communication explicitly, because every
sub-unit can simply read from or write to the shared memory. A problem arises, however, when
multiple processes access and change the same memory location at the same time; this
conflict can be avoided using synchronization techniques.

 Distributed memory
In distributed memory, each process is totally separated and has its own memory space. In this
scenario, communication is handled explicitly between the processes. Since the communication
happens through a network interface, it is costlier compared to shared memory.
Threads are one way to achieve parallelism with shared memory: they are independent sub-tasks
that originate from a process and share its memory. Due to the Global Interpreter Lock (GIL),
however, threads cannot be used to increase the performance of CPU-bound Python code. The GIL
is a mechanism by which the Python interpreter allows only one thread to execute Python
bytecode at a time. The GIL limitation can be avoided entirely by using processes instead of
threads. Using processes has a few disadvantages, such as less efficient inter-process
communication compared to shared memory, but this approach is more flexible and explicit.
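A quick way to observe the GIL is to time a CPU-bound function run sequentially and then in two threads; a minimal sketch (exact timings vary by machine, and on standard CPython the threaded run is typically no faster than the sequential one):

```python
import threading
import time

def count(n):
    # purely CPU-bound loop: no I/O, so threads gain nothing under the GIL
    while n > 0:
        n -= 1

N = 5_000_000

# run twice sequentially
start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

# run the same work in two threads
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

With an I/O-bound task (e.g. replacing the loop with time.sleep), the threaded version would show a clear speedup, since the GIL is released while waiting.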
Multiprocessing for parallel processing
Using the standard multiprocessing module, we can efficiently parallelize simple tasks by
creating child processes. This module provides an easy-to-use interface and contains a set of
utilities to handle task submission and synchronization.
Process and Pool Class
Process
By subclassing multiprocessing.Process, you can create a process that runs independently. By
extending the __init__ method you can initialize resources, and by implementing
the Process.run() method you can write the code for the subprocess. In the code below,
we see how to create a process that prints its assigned id:

To spawn the process, we need to initialize our Process object and invoke Process.start() method.
Here Process.start() will create a new process and will invoke the Process.run() method.

The code after p.start() is executed immediately, without waiting for process p to complete its
task. To wait for task completion, you can use Process.join().

Here’s the full code:

import multiprocessing
import time

class Process(multiprocessing.Process):
    def __init__(self, id):
        super(Process, self).__init__()
        self.id = id

    def run(self):
        time.sleep(1)
        print("I'm the process with id: {}".format(self.id))

if __name__ == '__main__':
    p = Process(0)
    p.start()
    p.join()
    p = Process(1)
    p.start()
    p.join()

Output:

I'm the process with id: 0
I'm the process with id: 1
Pool class
The Pool class can be used for the parallel execution of a function over different input data.
The multiprocessing.Pool() class spawns a set of processes called workers; tasks can be submitted
using the methods apply/apply_async and map/map_async. For parallel mapping, you should first
initialize a multiprocessing.Pool() object. Its first argument is the number of workers; if it is
not given, the number of workers defaults to the number of cores in the system.
Let's look at an example: a function that computes the square of a number. Using Pool.map(),
you can map the function over a list by passing the function and the list of inputs as
arguments, as follows:

import multiprocessing
import time

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    inputs = [0, 1, 2, 3, 4]
    outputs = pool.map(square, inputs)
    print("Input: {}".format(inputs))
    print("Output: {}".format(outputs))

Output:

Input: [0, 1, 2, 3, 4]
Output: [0, 1, 4, 9, 16]
When we use the normal map method, execution of the program is blocked until all the workers
have completed the task. With map_async(), an AsyncResult object is returned immediately
without stopping the main program, and the task is done in the background. The result can be
retrieved at any time using the AsyncResult.get() method, as shown below:

import multiprocessing
import time

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    inputs = [0, 1, 2, 3, 4]
    outputs_async = pool.map_async(square, inputs)
    outputs = outputs_async.get()
    print("Output: {}".format(outputs))

Output:

Output: [0, 1, 4, 9, 16]
Pool.apply_async assigns a task consisting of a single function to one of the workers. It takes the
function and its arguments and returns an AsyncResult object.

import multiprocessing
import time

def square(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    result_async = [pool.apply_async(square, args=(i,)) for i in range(10)]
    results = [r.get() for r in result_async]
    print("Output: {}".format(results))

Output:

Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

IPython Parallel Framework


 The IPython parallel package provides a framework to set up and execute tasks on single
machines, multi-core machines, and multiple nodes connected to a network.
 In IPython.parallel, you have to start a set of workers called Engines, which are
managed by the Controller. The controller is an entity that mediates communication
between the client and the engines. In this approach, the worker processes are started
separately, and they wait indefinitely for commands from the client.
 The ipcluster shell command is used to start the controller and the engines.
$ ipcluster start
After the above step, we can use an IPython shell to perform tasks in parallel. IPython comes
with two basic interfaces:
 Direct Interface
 Task-based Interface
Direct Interface
The direct interface allows you to send commands explicitly to each of the computing units. It
is flexible and easy to use. To interact with the units, you start the engines and then an
IPython session in a separate shell. You can establish a connection to the controller by creating
a client. In the code below, we import the Client class and create an instance:
from IPython.parallel import Client
rc = Client()
rc.ids
Here, Client.ids returns a list of integers identifying the available engines.
Using a DirectView instance, you can issue commands to the engines. There are two ways to get a
DirectView instance:
 By indexing the client instance
dview = rc[0]

 By calling the Client.direct_view method

dview = rc.direct_view('all')

As a final step, you can execute commands by using the DirectView.execute method.
dview.execute('a = 1')
The above command is executed individually by each engine. These calls return AsyncResult
objects; using the get method you can retrieve the result.
dview.pull('a').get()
dview.push({'a': 2})
As shown above, you can retrieve data by using the DirectView.pull method and send data by
using the DirectView.push method.
Task-based interface
The task-based interface provides a smarter way to handle computing tasks. From the user's
point of view it is less flexible, but it is efficient at load balancing across the engines and
can resubmit failed jobs, thereby increasing performance.
The LoadBalancedView class provides the task-based interface; an instance is obtained with the
Client.load_balanced_view method.
from IPython.parallel import Client
rc = Client()
tview = rc.load_balanced_view()
Using the map and apply methods, we can run tasks. With a LoadBalancedView, task assignment
depends on how much load is present on each engine at the time, which ensures that all engines
work without downtime.
