DataScienceWithPython Ed2018

EUGENIA BAHIT
DATA SCIENCE
WITH PYTHON
STUDY MATERIAL
Information and registration:

Course: http://escuela.eugeniabahit.com | Certifications: http://python.laeci.org
Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit
CONTENTS
VARIABLE MANIPULATION METHODS...........................................................................................................5
STRING MANIPULATION.....................................................................................................................................5
FORMATTING METHODS.................................................................................................................................5
CAPITALIZE THE FIRST LETTER................................................................................................................5
CONVERT A STRING TO LOWERCASE.....................................................................................................5
CONVERT A STRING TO UPPERCASE.......................................................................................................6
CONVERT UPPERCASE TO LOWERCASE AND VICE VERSA...............................................................6
CONVERT A STRING TO TITLE FORMAT.................................................................................................6
CENTER A TEXT.............................................................................................................................................6
ALIGN TEXT TO THE LEFT..........................................................................................................................6
ALIGN TEXT TO THE RIGHT.......................................................................................................................7
FILL IN A TEXT BY PREFIXING IT WITH ZEROS....................................................................................7
SEARCH METHODS........................................................................................................................................7
COUNT NUMBER OF OCCURRENCES OF A SUBSTRING......................................................................7
SEARCH FOR A SUBSTRING WITHIN A STRING.....................................................................................7
VALIDATION METHODS..................................................................................................................................8
TO KNOW IF A STRING BEGINS WITH A GIVEN SUBSTRING.............................................................8
TO KNOW IF A STRING ENDS WITH A GIVEN SUBSTRING.................................................................8
TO KNOW IF A STRING IS ALPHANUMERIC...........................................................................................8
TO KNOW IF A STRING IS ALPHABETIC..................................................................................................8
TO KNOW IF A STRING IS NUMERIC.........................................................................................................9
TO KNOW IF A STRING CONTAINS ONLY LOWERCASE LETTERS....................................................9
TO KNOW IF A STRING CONTAINS ONLY UPPERCASE LETTERS......................................................9
TO KNOW IF A STRING CONTAINS ONLY BLANKS............................................................................10
TO KNOW IF A STRING HAS A TITLE FORMAT....................................................................................10
SUBSTITUTION METHODS............................................................................................................................10
FORMATTING A STRING, DYNAMICALLY SUBSTITUTING TEXT...................................................10
REPLACE TEXT IN A STRING....................................................................................................................11
REMOVE CHARACTERS TO THE LEFT AND RIGHT OF A STRING...................................................11
REMOVE CHARACTERS TO THE LEFT OF A STRING..........................................................................11
REMOVE CHARACTERS TO THE RIGHT OF A STRING.......................................................................11
JOINING AND SPLITTING METHODS..........................................................................................................11
ITERATIVELY JOIN A STRING....................................................................................................................11
(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution 4.0
SPLITTING A STRING INTO THREE PARTS, USING A SEPARATOR..................................................12
SPLITTING A STRING INTO SEVERAL PARTS, USING A SEPARATOR............................................12
SPLIT A STRING INTO LINES.....................................................................................................................12
MANIPULATION OF LISTS AND TUPLES.......................................................................................................14
AGGREGATION METHODS............................................................................................................................14
ADD AN ITEM TO THE END OF THE LIST...............................................................................................14
ADD SEVERAL ITEMS TO THE END OF THE LIST................................................................................14
ADD AN ELEMENT IN A GIVEN POSITION.............................................................................................14
ELIMINATION METHODS..............................................................................................................................14
DELETE THE LAST ITEM IN THE LIST....................................................................................................14
DELETE AN ELEMENT BY ITS INDEX.....................................................................................................15
DELETE AN ITEM BY ITS VALUE.............................................................................................................15
ORDER METHODS...........................................................................................................................................15
REVERSE THE ORDER OF A LIST.............................................................................................................15
SORT A LIST IN ASCENDING ORDER......................................................................................................15
SORT A LIST IN DESCENDING ORDER....................................................................................................15
SEARCH METHODS......................................................................................................................................15
COUNT THE NUMBER OF OCCURRENCES OF AN ELEMENT.................................................................15
GET INDEX NUMBER..................................................................................................................................16
ANNEX ON LISTS AND TUPLES...................................................................................................................16
TYPE CONVERSION.....................................................................................................................................16
CONCATENATION OF COLLECTIONS.....................................................................................................17
MAXIMUM AND MINIMUM VALUE........................................................................................................20
COUNT ITEMS...............................................................................................................................................20
DICTIONARY MANIPULATION........................................................................................................................22
ELIMINATION METHODS..............................................................................................................................22
EMPTY A DICTIONARY..............................................................................................................................22
AGGREGATION AND CREATION METHODS.............................................................................................22
COPY A DICTIONARY.................................................................................................................................22
CREATE A NEW DICTIONARY FROM THE KEYS OF A SEQUENCE..........................................................23
CONCATENATE DICTIONARIES...............................................................................................................23
SET A DEFAULT KEY AND VALUE..........................................................................................................23
RETURN METHODS.........................................................................................................................................24
GET THE VALUE OF A KEY.......................................................................................................................24
TO KNOW IF A KEY EXISTS IN THE DICTIONARY...............................................................................24
OBTAIN THE KEYS AND VALUES OF A DICTIONARY........................................................................24
OBTAIN THE KEYS OF A DICTIONARY..................................................................................................24
GET THE VALUES OF A DICTIONARY....................................................................................................25
OBTAIN THE NUMBER OF ITEMS IN A DICTIONARY.........................................................................25
FILE HANDLING AND MANIPULATION.........................................................................................................27
WAYS TO OPEN A FILE..................................................................................................................................27
SOME METHODS OF THE FILE OBJECT......................................................................................................29
CSV FILE HANDLING.........................................................................................................................................30
SOME EXAMPLES OF CSV FILES.................................................................................................................30
WORKING WITH CSV FILES FROM PYTHON............................................................................................32
READING CSV FILES...................................................................................................................................32
WRITING CSV FILES....................................................................................................................................37
PROBABILITY AND STATISTICS WITH PYTHON.........................................................................................40
PROBABILITY OF MUTUALLY EXCLUSIVE SIMPLE AND COMPOUND EVENTS IN PYTHON......40
SAMPLE SPACE............................................................................................................................................40
SIMPLE AND COMPOUND EVENTS.........................................................................................................40
PROBABILITY ASSIGNMENT....................................................................................................................41
SIMPLE MUTUALLY EXCLUSIVE EVENTS.........................................................................................41
EVENTS COMPOSED OF MUTUALLY EXCLUSIVE SIMPLE EVENTS...........................................42
FUNCTIONS...................................................................................................................................................43
CONDITIONAL PROBABILITY IN PYTHON................................................................................................43
FUNCTIONS...................................................................................................................................................44
DEPENDENT EVENTS..................................................................................................................................44
SET THEORY IN PYTHON.......................................................................................................................46
INDEPENDENT EVENTS.............................................................................................................................46
BAYES THEOREM IN PYTHON.....................................................................................................................47
BAYES' THEOREM AND PROBABILITY OF CAUSES............................................................................47
DATA: CASE STUDY................................................................................................................................47
ANALYSIS..................................................................................................................................................48
PROCEDURE..............................................................................................................................................49
FUNCTIONS...................................................................................................................................................54
COMPLEMENTARY BIBLIOGRAPHY.......................................................................................................54
ANNEX I: COMPLEX CALCULATIONS............................................................................................................60
POPULATION AND SAMPLING STATISTICS: CALCULATION OF VARIANCE AND STANDARD DEVIATION.........60
SCALAR PRODUCT OF TWO VECTORS......................................................................................................61
RELATIVE, ABSOLUTE AND CUMULATIVE FREQUENCY CALCULATIONS.....................................61
ANNEX II: CREATION OF A MENU OF OPTIONS..........................................................................................63
VARIABLE MANIPULATION METHODS

In Python, every variable is an object. Each object provides a set of actions, called methods, that can be performed on it. Methods are functions that belong to an object, so they are accessed through the variable using the syntax:
variable.method()
In some cases, these methods (functions of an object) accept parameters, like any other function:
variable.method(parameter)
STRING MANIPULATION
The main methods that can be applied to a text string, organized by category, are described below.
FORMATTING METHODS
CAPITALIZE THE FIRST LETTER

Method: capitalize()
Returns: a copy of the string with the first letter capitalized
>>> string = "welcome to my application"
>>> result = string.capitalize()
>>> result
'Welcome to my application'
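Every "Returns: a copy" line in this chapter follows from the fact that Python strings are immutable: a method never changes the original string, it builds a new one. A minimal sketch to check this (the variable names are only illustrative):

```python
# Strings are immutable: capitalize() builds and returns a NEW string
original = "welcome to my application"
result = original.capitalize()

print(original)  # unchanged: welcome to my application
print(result)    # new copy: Welcome to my application

# To keep the change, rebind the name to the returned value
original = original.capitalize()
print(original)  # Welcome to my application

# capitalize() also lowercases every character after the first one
print("hELLO wORLD".capitalize())  # Hello world
```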
CONVERT A STRING TO LOWERCASE

Method: lower()
Returns: a copy of the string in lowercase letters
>>> string = "Hello World"
>>> string.lower()
'hello world'
CONVERT A STRING TO UPPERCASE

Method: upper()
Returns: a copy of the string in uppercase letters
>>> string.upper()
'HELLO WORLD'
CONVERT UPPERCASE TO LOWERCASE AND VICE VERSA

Method: swapcase()
Returns: a copy of the string with uppercase letters converted to lowercase and vice versa
>>> string.swapcase()
'hELLO wORLD'
CONVERT A STRING TO TITLE FORMAT

Method: title()
Returns: a copy of the string with the first letter of each word capitalized
>>> string = "hello world"
>>> string.title()
'Hello World'
CENTER A TEXT
Method: center(length[, "fill character"])
Returns: a copy of the string centered in a string of the given length, padded with the fill character (a space by default)
>>> string = "welcome to my application".capitalize()
>>> string.center(50, "=")
'============Welcome to my application============='
>>> string.center(50, " ")
'            Welcome to my application             '

ALIGN TEXT TO THE LEFT

Method: ljust(length[, "fill character"])
Returns: a copy of the string, left-aligned in a string of the given length
>>> string.ljust(50, "=")
'Welcome to my application========================='
ALIGN TEXT TO THE RIGHT

Method: rjust(length[, "fill character"])
Returns: a copy of the string, right-aligned in a string of the given length
>>> string = "welcome to my application".capitalize()
>>> string.rjust(50, "=")
'=========================Welcome to my application'
>>> string.rjust(50, " ")
'                         Welcome to my application'
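The alignment methods are handy for printing plain-text reports in fixed-width columns. A minimal sketch, assuming a hypothetical list of products and prices and arbitrary column widths of 20 and 10 characters:

```python
# Hypothetical data: (product, price) pairs
products = [("T-shirt", 174.25), ("Cap", 45.0), ("Sweater", 1250.5)]

# Header: a 20-character left-aligned column and a 10-character right-aligned one
lines = ["Product".ljust(20) + "Price".rjust(10)]
for name, price in products:
    # Names are aligned to the left, prices to the right
    lines.append(name.ljust(20) + str(price).rjust(10))

print("Price list".center(30, "="))
print("\n".join(lines))
```

Because every line has the same total width (30 characters), the columns line up when printed in a monospaced font.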

FILL IN A TEXT BY PREFIXING IT WITH ZEROS

Method: zfill(length)
Returns: a copy of the string padded with leading zeros until the specified final length is reached
>>> invoice_number = 1575
>>> str(invoice_number).zfill(12)
'000000001575'
SEARCH METHODS
COUNT NUMBER OF OCCURRENCES OF A SUBSTRING

Method: count("substring"[, start_position, end_position])
Returns: an integer representing the number of non-overlapping occurrences of the substring within the string
>>> string.count("a")
2
SEARCH FOR A SUBSTRING WITHIN A STRING

Method: find("substring"[, start_position, end_position])
Returns: an integer representing the position where the substring starts within the string. If it is not found, returns -1
>>> string.find("my")
11
>>> string.find("my", 0, 10)
-1
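Strings also provide index(), which takes the same arguments as find() but raises a ValueError instead of returning -1, and the in operator, which is enough when only a yes/no answer is needed. A short comparison (the string is the one used above):

```python
string = "Welcome to my application"

# find() signals "not found" with -1
print(string.find("my"))      # 11
print(string.find("python"))  # -1

# index() behaves like find() but raises ValueError when not found
try:
    string.index("python")
except ValueError as error:
    print("Not found:", error)

# The in operator only answers "is it there?"
print("my" in string)  # True
```

Note that a condition such as `if string.find(x):` is misleading: -1 (not found) is truthy and 0 (found at the start) is falsy, which is why `in` is usually preferred for simple checks.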
VALIDATION METHODS
TO KNOW IF A STRING BEGINS WITH A GIVEN SUBSTRING

Method: startswith("substring"[, start_position, end_position])
Returns: True or False
>>> string.startswith("Welcome")
True
>>> string.startswith("application")
False
>>> string.startswith("application", 14)
True
TO KNOW IF A STRING ENDS WITH A GIVEN SUBSTRING

Method: endswith("substring"[, start_position, end_position])
Returns: True or False
>>> string.endswith("application")
True
>>> string.endswith("Welcome")
False
>>> string.endswith("Welcome", 0, 7)
True
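Both startswith() and endswith() also accept a tuple of substrings and return True if the string matches any of them. A sketch, assuming a hypothetical list of file names:

```python
# Hypothetical file names
files = ["data.csv", "notes.txt", "report.CSV", "image.png"]

# endswith() with a tuple checks several suffixes at once;
# lower() makes the comparison case-insensitive
csv_files = [name for name in files if name.lower().endswith((".csv", ".tsv"))]
print(csv_files)  # ['data.csv', 'report.CSV']

url = "https://www.eugeniabahit.com"
print(url.startswith(("http://", "https://")))  # True
```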
TO KNOW IF A STRING IS ALPHANUMERIC

Method: isalnum()
Returns: True if all characters are letters or digits (and the string is not empty), otherwise False
>>> string = "pepegrillo 75"
>>> string.isalnum()
False
>>> string = "pepegrillo"
>>> string.isalnum()
True
>>> string = "pepegrillo75"
>>> string.isalnum()
True
TO KNOW IF A STRING IS ALPHABETIC

Method: isalpha()
Returns: True if all characters are letters (and the string is not empty), otherwise False
>>> string.isalpha()
False
>>> string = "pepegrillo"
>>> string.isalpha()
True
>>> string = "pepe grillo"
>>> string.isalpha()
False
TO KNOW IF A STRING IS NUMERIC

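Method: isdigit()
Returns: True if all characters are digits (and the string is not empty), otherwise False
>>> string.isdigit()
False
>>> string = "7584"
>>> string.isdigit()
True
>>> string = "75 84"
>>> string.isdigit()
False
>>> string = "75.84"
>>> string.isdigit()
False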
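As the "75.84" example shows, isdigit() rejects decimal numbers, and it also rejects negative numbers such as "-5". When the real question is whether a string can be converted to a number, a common approach is to attempt the conversion and catch the error. A minimal sketch (the function name is_number is hypothetical):

```python
def is_number(text):
    """Return True if text can be converted to a float (hypothetical helper)."""
    try:
        float(text)
    except ValueError:
        return False
    return True

for value in ["7584", "75.84", "-5", "75 84", "abc"]:
    # Compare the strict digit check with the conversion check
    print(value, value.isdigit(), is_number(value))
```

Keep in mind that float() is more permissive than it may seem: it also accepts surrounding whitespace and special values such as "nan" and "inf".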
TO KNOW IF A STRING CONTAINS ONLY LOWERCASE LETTERS

Method: islower()
Returns: True if all cased characters are lowercase (and there is at least one), otherwise False
>>> string = "pepe grillo"
>>> string.islower()
True
>>> string = "Pepe Grillo"
>>> string.islower()
False
>>> string = "Pepegrillo"
>>> string.islower()
False
>>> string = "pepegrillo"
>>> string.islower()
True
TO KNOW IF A STRING CONTAINS ONLY UPPERCASE LETTERS

Method: isupper()
Returns: True if all cased characters are uppercase (and there is at least one), otherwise False
>>> string = "PEPE GRILLO"
>>> string.isupper()
True
>>> string = "Pepe Grillo"
>>> string.isupper()
False
>>> string = "Pepegrillo"
>>> string.isupper()
False
>>> string = "PEPEGRILLO"
>>> string.isupper()
True
TO KNOW IF A STRING CONTAINS ONLY BLANKS

Method: isspace()
Returns: True if the string contains only whitespace characters (and is not empty), otherwise False
>>> string = "pepe grillo"
>>> string.isspace()
False
>>> string = " "
>>> string.isspace()
True
TO KNOW IF A STRING HAS A TITLE FORMAT

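Method: istitle()
Returns: True if each word starts with an uppercase letter followed only by lowercase letters, otherwise False
>>> string = "Pepe Grillo"
>>> string.istitle()
True
>>> string = "Pepe grillo"
>>> string.istitle()
False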
SUBSTITUTION METHODS
FORMATTING A STRING, DYNAMICALLY SUBSTITUTING TEXT

Method: format(*args, **kwargs)
Returns: a copy of the string in which each replacement field ({0}, {1}, {name}...) is replaced by the corresponding argument
>>> string = "welcome to my application {0}"
>>> string.format("in Python")
'welcome to my application in Python'
>>> string = "Gross amount: ${0} + VAT: ${1} = Net amount: {2}"
>>> string.format(100, 21, 121)
'Gross amount: $100 + VAT: $21 = Net amount: 121'
>>> string = "Gross amount: ${gross} + VAT: ${vat} = Net amount: {net}"
>>> string.format(gross=100, vat=21, net=121)
'Gross amount: $100 + VAT: $21 = Net amount: 121'
>>> string.format(gross=100, vat=100 * 21 / 100, net=100 * 21 / 100 + 100)
'Gross amount: $100 + VAT: $21.0 = Net amount: 121.0'
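In the last example, 100 * 21 / 100 produces a float (21.0), because the / operator always returns a float in Python 3. A format specification written after a colon inside the braces controls how each value is displayed; for instance, .2f fixes two decimal places. A short sketch reusing the amounts of the example (the $ sign before the net amount is an arbitrary choice):

```python
gross = 100
vat = gross * 21 / 100  # 21.0: true division returns a float
net = gross + vat       # 121.0

# ".2f" formats each value as a fixed-point number with two decimals
template = "Gross amount: ${gross:.2f} + VAT: ${vat:.2f} = Net amount: ${net:.2f}"
print(template.format(gross=gross, vat=vat, net=net))
# Gross amount: $100.00 + VAT: $21.00 = Net amount: $121.00

# Since Python 3.6, f-strings accept the same format specifications
print(f"Net amount: ${net:.2f}")  # Net amount: $121.00
```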
REPLACE TEXT IN A STRING

Method: replace("substring to search for", "substring to replace with")
Returns: a copy of the string with every occurrence of the substring replaced
>>> search = "first name last name"
>>> replace_by = "John Smith"
>>> "Dear Mr. first name last name:".replace(search, replace_by)
'Dear Mr. John Smith:'
REMOVE CHARACTERS TO THE LEFT AND RIGHT OF A STRING

Method: strip(["character"])
Returns: a copy of the string with the given characters (whitespace by default) removed from both ends
>>> string = " www.eugeniabahit.com "
>>> string.strip()
'www.eugeniabahit.com'
>>> string.strip(' ')
'www.eugeniabahit.com'
REMOVE CHARACTERS TO THE LEFT OF A STRING

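Method: lstrip(["character"])
Returns: a copy of the string with the given characters (whitespace by default) removed from the left
>>> string = "www.eugeniabahit.com"
>>> string.lstrip("w.")
'eugeniabahit.com'
>>> string = " www.eugeniabahit.com"
>>> string.lstrip()
'www.eugeniabahit.com'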
REMOVE CHARACTERS TO THE RIGHT OF A STRING

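Method: rstrip(["character"])
Returns: a copy of the string with the given characters (whitespace by default) removed from the right
>>> string = "www.eugeniabahit.com "
>>> string.rstrip()
'www.eugeniabahit.com'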
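The argument of strip(), lstrip() and rstrip() is not a prefix or suffix but a set of characters: every leading (or trailing) character that belongs to the set is removed, in any order. With lstrip("w.") this can remove more than intended. A sketch with a different, illustrative domain, assuming Python 3.9 or later for removeprefix():

```python
domain = "www.wikipedia.org"

# "w." is treated as the set {'w', '.'}: the leading 'w' of "wikipedia" goes too
print(domain.lstrip("w."))  # ikipedia.org

# removeprefix() (Python 3.9+) removes the exact prefix, only once
print(domain.removeprefix("www."))  # wikipedia.org

# On older versions, an explicit check does the same job
prefix = "www."
clean = domain[len(prefix):] if domain.startswith(prefix) else domain
print(clean)  # wikipedia.org
```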
JOINING AND SPLITTING METHODS
ITERATIVELY JOIN A STRING

Method: join(iterable)
Returns: a new string made of the elements of the iterable, with the string on which the method is called inserted between each pair of elements
>>> invoice_number_format = ("No. 0000-0", "-0000 (ID: ", ")")
>>> number = "275"
>>> invoice_number = number.join(invoice_number_format)
>>> invoice_number
'No. 0000-0275-0000 (ID: 275)'
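join() only accepts iterables whose elements are strings; joining numbers raises a TypeError. A common idiom is to convert the elements first with map(str, ...). A minimal sketch with hypothetical values:

```python
values = [2018, 11, 5]  # hypothetical integers

try:
    "-".join(values)  # fails: the elements are not strings
except TypeError as error:
    print("Error:", error)

# Convert each element to str before joining
date = "-".join(map(str, values))
print(date)  # 2018-11-5

# The separator can be any string, including an empty one
print("".join(["a", "b", "c"]))  # abc
```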
SPLITTING A STRING INTO THREE PARTS, USING A SEPARATOR

Method: partition("separator")
Returns: a tuple of three elements where the first is the contents of the string before the separator, the second is
the separator itself and the third is the contents of the string after the separator.
>>> result = "http://www.eugeniabahit.com".partition("www.")
>>> result
('http://', 'www.', 'eugeniabahit.com')
>>> protocol, separator, domain = result
>>> print("Protocol: {0}\nDomain: {1}".format(protocol, domain))
Protocol: http://
Domain: eugeniabahit.com
SPLITTING A STRING INTO SEVERAL PARTS, USING A SEPARATOR

Method: split("separator")
Returns: a list of all elements found by dividing the string by a separator
>>> keywords = "python, guide, course, tutorial".split(", ")
>>> keywords
['python', 'guide', 'course', 'tutorial']
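When split() is called without arguments, it splits on any run of whitespace and discards empty strings, which is not the same as splitting on a single space. The optional second argument, maxsplit, limits the number of splits. A short comparison with illustrative strings:

```python
text = "python   guide  course"

print(text.split(" "))  # ['python', '', '', 'guide', '', 'course']
print(text.split())     # ['python', 'guide', 'course']

# maxsplit=1: split only at the first separator
print("key=value=other".split("=", 1))  # ['key', 'value=other']
```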
SPLIT A STRING INTO LINES

Method: splitlines()
Returns: a list of the lines in the string, without the line break characters
>>> text = """Line 1
Line 2
Line 3
Line 4"""
>>> text.splitlines()
['Line 1', 'Line 2', 'Line 3', 'Line 4']
>>> text = "Line 1\nLine 2\nLine 3"
>>> text.splitlines()
['Line 1', 'Line 2', 'Line 3']
MANIPULATION OF LISTS AND TUPLES
This chapter covers the methods of the list object. Some of them (count() and index()) are also available for tuples.
AGGREGATION METHODS
ADD AN ITEM TO THE END OF THE LIST

Method: append("new element")
>>> male_names = ["Alvaro", "Jacinto", "Miguel", "Edgardo", "David"]
>>> male_names.append("Jose")
>>> male_names
['Alvaro', 'Jacinto', 'Miguel', 'Edgardo', 'David', 'Jose']
ADD SEVERAL ITEMS TO THE END OF THE LIST

Method: extend(other_list)
>>> male_names.extend(["Jose", "Gerardo"])
>>> male_names
['Alvaro', 'Jacinto', 'Miguel', 'Edgardo', 'David', 'Jose', 'Jose', 'Gerardo']
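The difference between append() and extend() matters when the argument is itself a list: append() adds it as a single nested element, while extend() adds each of its elements. A short sketch with hypothetical names:

```python
names_a = ["Alvaro", "David"]
names_b = ["Alvaro", "David"]

names_a.append(["Jose", "Gerardo"])  # adds ONE element: a list
names_b.extend(["Jose", "Gerardo"])  # adds TWO elements

print(names_a)  # ['Alvaro', 'David', ['Jose', 'Gerardo']]
print(names_b)  # ['Alvaro', 'David', 'Jose', 'Gerardo']

# extend() accepts any iterable: a string is iterated character by character
letters = []
letters.extend("abc")
print(letters)  # ['a', 'b', 'c']
```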
ADD AN ELEMENT IN A GIVEN POSITION

Method: insert(position, "new element")
>>> male_names.insert(0, "Ricky")
>>> male_names
['Ricky', 'Alvaro', 'Jacinto', 'Miguel', 'Edgardo', 'David', 'Jose', 'Jose', 'Gerardo']
ELIMINATION METHODS
DELETE THE LAST ITEM IN THE LIST

Method: pop()
Returns: the deleted element
>>> male_names.pop()
'Gerardo'
>>> male_names
['Ricky', 'Alvaro', 'Jacinto', 'Miguel', 'Edgardo', 'David', 'Jose', 'Jose']
DELETE AN ELEMENT BY ITS INDEX

Method: pop(index)
Returns: the deleted element
>>> male_names.pop(3)
'Miguel'
>>> male_names
['Ricky', 'Alvaro', 'Jacinto', 'Edgardo', 'David', 'Jose', 'Jose']
DELETE AN ITEM BY ITS VALUE

Method: remove("value")
Deletes the first occurrence of the value.
>>> male_names.remove("Jose")
>>> male_names
['Ricky', 'Alvaro', 'Jacinto', 'Edgardo', 'David', 'Jose']
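pop() and remove() raise an exception when they cannot do their job: pop() raises IndexError on an empty list (or an out-of-range index), and remove() raises ValueError when the value is not in the list. A sketch of how to guard against both, with hypothetical names:

```python
names = ["Ricky", "Alvaro"]

# remove() raises ValueError if the value is absent
try:
    names.remove("Jose")
except ValueError as error:
    print("Error:", error)

# Checking with the in operator first avoids the exception
if "Jose" in names:
    names.remove("Jose")

# pop() raises IndexError when the list is empty
names.clear()
try:
    names.pop()
except IndexError as error:
    print("Error:", error)
```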
ORDER METHODS
REVERSE THE ORDER OF A LIST

Method: reverse()
>>> male_names.reverse()
>>> male_names
['Jose', 'David', 'Edgardo', 'Jacinto', 'Alvaro', 'Ricky']
SORT A LIST IN ASCENDING ORDER

Method: sort()
>>> male_names.sort()
>>> male_names
['Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky']
SORT A LIST IN DESCENDING ORDER

Method: sort(reverse=True)
>>> male_names.sort(reverse=True)
>>> male_names
['Ricky', 'Jose', 'Jacinto', 'Edgardo', 'David', 'Alvaro']
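sort() modifies the list in place and returns None. The built-in sorted() function returns a new sorted list instead and leaves the original untouched; both accept a key function. Note that the default order compares character codes, so uppercase letters sort before lowercase ones. A short sketch with hypothetical names:

```python
names = ["ricky", "Alvaro", "david", "Jacinto"]

print(sorted(names))                 # ['Alvaro', 'Jacinto', 'david', 'ricky']
print(sorted(names, key=str.lower))  # ['Alvaro', 'david', 'Jacinto', 'ricky']
print(names)                         # unchanged: ['ricky', 'Alvaro', 'david', 'Jacinto']

result = names.sort()  # sorts in place
print(result)          # None
```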
SEARCH METHODS
COUNT THE NUMBER OF OCCURRENCES OF AN ELEMENT

Method: count(element)
>>> male_names = ["Alvaro", "Miguel", "Edgardo", "David", "Miguel"]
>>> male_names.count("Miguel")
2
>>> male_names = ("Alvaro", "Miguel", "Edgardo", "David", "Miguel")
>>> male_names.count("Miguel")
2
GET INDEX NUMBER

Method: index(element[, start_index, end_index])
>>> male_names.index("Miguel")
1
>>> male_names.index("Miguel", 2, 5)
4
ANNEX ON LISTS AND TUPLES
TYPE CONVERSION
Python's built-in functions include two that convert lists into tuples and vice versa: list(), which converts a tuple into a list, and tuple(), which converts a list into a tuple.
One of the most frequent uses is converting a tuple into a list when its contents need to be modified, since tuples are immutable. This is often the case with results obtained from a database query.
>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> list(my_tuple)
[1, 2, 3, 4]
>>> my_list = [1, 2, 3, 4]
>>> my_list
[1, 2, 3, 4]
>>> tuple(my_list)
(1, 2, 3, 4)
CONCATENATION OF COLLECTIONS
Two or more lists, or two or more tuples, can be concatenated (joined) with the + operator. A list cannot be joined to a tuple: the collections to be joined must be of the same type.
>>> list1 = [1, 2, 3, 4]
>>> list2 = [3, 4, 5, 6, 7, 8]
>>> list3 = list1 + list2
>>> list3
[1, 2, 3, 4, 3, 4, 5, 6, 7, 8]
>>> tuple4 = tuple1 + tuple2 + tuple3
>>> tuple4
(1, 2, 3, 4, 5, 4, 6, 8, 10, 3, 5, 7, 9)
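As stated above, a list and a tuple cannot be concatenated directly; Python raises a TypeError. Converting one of them with list() or tuple() first makes the operation possible. A minimal sketch with illustrative values:

```python
numbers_list = [1, 2, 3]
numbers_tuple = (4, 5)

try:
    numbers_list + numbers_tuple  # mixed types: TypeError
except TypeError as error:
    print("Error:", error)

# Convert first so that both operands have the same type
print(numbers_list + list(numbers_tuple))   # [1, 2, 3, 4, 5]
print(tuple(numbers_list) + numbers_tuple)  # (1, 2, 3, 4, 5)
```

One exception worth knowing: the augmented assignment `numbers_list += numbers_tuple` does work, because for lists `+=` behaves like extend(), which accepts any iterable.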
MAXIMUM AND MINIMUM VALUE

The max() and min() built-in functions return the largest and the smallest value of a list or a tuple:
>>> max(tuple4)
10
>>> max(tuple1)
5
>>> min(tuple1)
1
>>> max(list3)
8
>>> min(list1)
1
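max() and min() also accept a key function that decides what is compared, and (since Python 3.4) a default value returned when the collection is empty instead of raising a ValueError. A short sketch with hypothetical names:

```python
names = ("Alvaro", "Ricky", "Edgardo", "David")

print(max(names))           # Ricky (alphabetical order)
print(max(names, key=len))  # Edgardo (longest name)
print(min(names, key=len))  # Ricky (first of the shortest names)

# Without default=, max([]) would raise ValueError
print(max([], default=0))   # 0
```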
COUNT ITEMS
The built-in len() function counts the elements of a list or tuple, as well as the characters of a text string:
>>> len(list3)
10
>>> len(list1)
4
DICTIONARY MANIPULATION
ELIMINATION METHODS
EMPTY A DICTIONARY
Method: clear()
>>> dictionary = {"color": "violet", "size": "XS", "price": 174.25}
>>> dictionary
{'color': 'violet', 'size': 'XS', 'price': 174.25}
>>> dictionary.clear()
>>> dictionary
{}
AGGREGATION AND CREATION METHODS
COPY A DICTIONARY
Method: copy()
Returns: a new dictionary with the same keys and values (a shallow copy). A plain assignment (=), by contrast, does not create a new dictionary: both names refer to the same object, so clearing one also clears the other.
>>> dictionary = {"color": "violet", "size": "XS", "price": 174.25}
>>> t_shirt = dictionary.copy()
>>> dictionary
{'color': 'violet', 'size': 'XS', 'price': 174.25}
>>> t_shirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}
>>> dictionary.clear()
>>> dictionary
{}
>>> t_shirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}
>>> tank_top = t_shirt
>>> tank_top
{'color': 'violet', 'size': 'XS', 'price': 174.25}
>>> t_shirt.clear()
>>> t_shirt
{}
>>> tank_top
{}
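copy() makes a shallow copy: the new dictionary is independent, but mutable values such as lists are shared between both dictionaries. When the values themselves must be copied, the standard copy module provides deepcopy(). A minimal sketch with hypothetical data:

```python
import copy

t_shirt = {"color": "violet", "sizes": ["S", "M"]}

shallow = t_shirt.copy()      # new dict, same inner list
deep = copy.deepcopy(t_shirt) # new dict, new inner list

t_shirt["sizes"].append("L")  # modify the nested list in place

print(shallow["sizes"])  # ['S', 'M', 'L']: shares the list with t_shirt
print(deep["sizes"])     # ['S', 'M']: independent copy
```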
CREATE A NEW DICTIONARY FROM THE KEYS OF A SEQUENCE

Method: dict.fromkeys(sequence[, default_value])
>>> sequence = ["color", "size", "brand"]
>>> dictionary1 = dict.fromkeys(sequence)
>>> dictionary1
{'color': None, 'size': None, 'brand': None}
>>> dictionary2 = dict.fromkeys(sequence, 'default value')
>>> dictionary2
{'color': 'default value', 'size': 'default value', 'brand': 'default value'}
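When the default value passed to dict.fromkeys() is mutable, such as a list, every key receives the same object, so modifying it through one key affects all of them. A dictionary comprehension creates a separate object for each key. A sketch reusing the sequence above:

```python
sequence = ["color", "size", "brand"]

# All keys share ONE list object
shared = dict.fromkeys(sequence, [])
shared["color"].append("pink")
print(shared)  # {'color': ['pink'], 'size': ['pink'], 'brand': ['pink']}

# A comprehension creates one new list per key
separate = {key: [] for key in sequence}
separate["color"].append("pink")
print(separate)  # {'color': ['pink'], 'size': [], 'brand': []}
```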
CONCATENATE DICTIONARIES
Method: update(dictionary)
Adds the keys and values of another dictionary to the current one; keys that already exist are overwritten.
>>> dictionary1 = {"color": "green", "price": 45}
>>> dictionary2 = {"size": "M", "brand": "Lacoste"}
>>> dictionary1.update(dictionary2)
>>> dictionary1
{'color': 'green', 'price': 45, 'size': 'M', 'brand': 'Lacoste'}
SET A DEFAULT KEY AND VALUE

Method: setdefault("key"[, None|default_value])
If the key does not exist, it creates it with the default value (None if no default is given). It always returns the value of the key passed as a parameter; if the key already exists, its value is left unchanged.
>>> t_shirt = {"color": "pink", "brand": "Zara"}
>>> key = t_shirt.setdefault("size", "U")
>>> key
'U'
>>> t_shirt
{'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> t_shirt2 = t_shirt.copy()
>>> t_shirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> key = t_shirt2.setdefault("print")
>>> key
>>> t_shirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U', 'print': None}
>>> key = t_shirt2.setdefault("brand", "Lacoste")
>>> key
'Zara'
>>> t_shirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U', 'print': None}
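A common use of setdefault() is grouping items under a key that may not exist yet. A sketch, assuming a hypothetical list of (brand, color) pairs:

```python
garments = [("Zara", "pink"), ("Lacoste", "green"), ("Zara", "violet")]

by_brand = {}
for brand, color in garments:
    # the first time a brand appears, an empty list is created and returned;
    # on later calls the existing list is returned unchanged
    by_brand.setdefault(brand, []).append(color)

print(by_brand)  # {'Zara': ['pink', 'violet'], 'Lacoste': ['green']}
```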
RETURN METHODS
GET THE VALUE OF A KEY

Method: get(key[, "default value if the key does not exist"])
>>> t_shirt.get("color")
'pink'
>>> t_shirt.get("stock")
>>> t_shirt.get("stock", "out of stock")
'out of stock'
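Because get() returns a default value instead of raising a KeyError, it fits naturally into counting loops. A sketch with hypothetical data:

```python
words = "red green red blue red".split()

frequency = {}
for word in words:
    # get() returns 0 the first time a word is seen
    frequency[word] = frequency.get(word, 0) + 1

print(frequency)  # {'red': 3, 'green': 1, 'blue': 1}

t_shirt = {"color": "pink", "brand": "Zara", "size": "U"}
print(t_shirt.get("stock", "out of stock"))  # out of stock
```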
TO KNOW IF A KEY EXISTS IN THE DICTIONARY

Operator: 'key' in dictionary
>>> exists = 'price' in t_shirt
>>> exists
False
>>> exists = 'color' in t_shirt
>>> exists
True
OBTAIN THE KEYS AND VALUES OF A DICTIONARY

Method: items()
dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
for key, value in dictionary.items():
    key, value
Output:
('color', 'pink')
('brand', 'Zara')
('size', 'U')
OBTAIN THE KEYS OF A DICTIONARY
Method: keys()
dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
for key in dictionary.keys():
    key
Output:
'color'
'brand'
'size'
Get the keys as a list:
>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> keys = list(dictionary.keys())
>>> keys
['color', 'brand', 'size']
GET THE VALUES OF A DICTIONARY

Method: values()
dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
for value in dictionary.values():
    value
Output:
'pink'
'Zara'
'U'
Get the values as a list:
>>> values = list(dictionary.values())
>>> values
['pink', 'Zara', 'U']
OBTAIN THE NUMBER OF ITEMS IN A DICTIONARY

To count the elements of a dictionary, as with lists and tuples, the built-in function len() is used.
>>> len(dictionary)
3
FILE HANDLING AND MANIPULATION
Python lets you work with the file system at two different levels.
The first is the os module, which works with files and directories at the level of the operating system itself.
The second works on the contents of individual files, reading and writing them from the application or script itself and treating each open file as an object.
WAYS TO OPEN A FILE

The way a file is opened depends on the final goal, which answers the question "what is this file being opened for?" The answer can be: to read, to write, or to read and write.
Each time a file is opened, a pointer is created in memory.
This pointer positions a cursor (or access point) at a specific location; more simply put, it positions the cursor on a specific byte of the file contents.
This cursor moves within the file as the file is read or written to.
When a file is opened in read mode, the cursor is positioned at byte 0 of the file (i.e. at the beginning of the file).
Once the file has been read, the cursor moves to the final byte of the file (equivalent to the total number of bytes in the file). The same happens when it is opened in write mode: the cursor moves forward as content is written.
When you want to write at the end of a non-empty file, the append mode is used: the file is opened with the cursor at the end of the file.
The + symbol, used as a mode suffix, adds the complementary operation to the opening mode. For example, the r (read) mode with the + suffix (r+) opens the file for both reading and writing, with the cursor at byte 0.
The following table shows the different ways of opening a file:

Indicator | Opening mode | Pointer location
r | Read only | At the beginning of the file
rb | Read only, in binary mode | At the beginning of the file
r+ | Reading and writing | At the beginning of the file
rb+ | Reading and writing, in binary mode | At the beginning of the file
w | Writing only. Overwrites the file if it exists; creates it if it does not exist | At the beginning of the file
wb | Writing only, in binary mode. Overwrites the file if it exists; creates it if it does not exist | At the beginning of the file
w+ | Writing and reading. Overwrites the file if it exists; creates it if it does not exist | At the beginning of the file
wb+ | Writing and reading, in binary mode. Overwrites the file if it exists; creates it if it does not exist | At the beginning of the file
a | Append (add content). Creates the file if it does not exist | At the end of the file if it exists; at the beginning if it does not
ab | Append (add content), in binary mode. Creates the file if it does not exist | At the end of the file if it exists; at the beginning if it does not
a+ | Append (add content) and read. Creates the file if it does not exist | At the end of the file if it exists; at the beginning if it does not
ab+ | Append (add content) and read, in binary mode. Creates the file if it does not exist | At the end of the file if it exists; at the beginning if it does not
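A small sketch of how the w, a and r+ modes from the table above behave, run inside a temporary directory so that no real file is touched (the file name is hypothetical). It uses tell(), which returns the current cursor position:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file.txt")  # hypothetical file name

    with open(path, "w") as file:    # creates the file (or empties it)
        file.write("first\n")

    with open(path, "a") as file:    # cursor at the end: content is added
        file.write("second\n")

    with open(path, "r+") as file:   # read and write, cursor at byte 0
        start = file.tell()          # 0
        content = file.read()        # reading moves the cursor to the end

    with open(path, "w") as file:    # "w" discards the previous content
        file.write("third\n")

    with open(path) as file:         # the default mode is "r"
        final = file.read()

print(start)          # 0
print(repr(content))  # 'first\nsecond\n'
print(repr(final))    # 'third\n'
```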
SOME METHODS OF THE FILE OBJECT

The file object has, among others, the following methods:
Method | Description
read([size]) | Reads the entire contents of the file. If a size is passed, it reads at most that many characters (bytes in binary mode)
readlines() | Reads all the lines of the file and returns them as a list
write(string) | Writes string to the file
writelines(sequence) | Writes each element of sequence (any iterable of strings) to the file. No line separators are added, so each element must end with "\n" to be written on its own line
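A sketch of these methods working together, in a temporary directory with a hypothetical file name. It shows that writelines() and write() add no line separators, and that readlines() keeps the "\n" at the end of each line:

```python
import os
import tempfile

lines = ["color;size\n", "pink;U\n"]  # each element already ends with "\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "garments.txt")  # hypothetical file name
    with open(path, "w") as file:
        file.writelines(lines)   # writes the elements exactly as they are
        file.write("green;M\n")  # write() does not add "\n" either

    with open(path) as file:
        first_chars = file.read(5)  # reads at most 5 characters
        rest = file.readlines()     # list with the remaining lines

print(first_chars)  # color
print(rest)         # [';size\n', 'pink;U\n', 'green;M\n']
```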
ACCESSING FILES THROUGH THE WITH STRUCTURE
With the with structure and the open() function, you can open a file in any mode and work with it without having to close it explicitly: the with structure closes the file automatically when the block ends.
Read a file:
with open("file.txt", "r") as file:
    content = file.read()
Write to a file:
content = """
This will be the content of the new file.
The file will have several lines.
"""
with open("file.txt", "w") as file:
    file.write(content)
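To see the with structure close the file, the closed attribute of the file object can be checked inside and after the block. A minimal sketch in a temporary directory (the file name is hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file.txt")  # hypothetical file name

    with open(path, "w") as file:
        file.write("Some content\n")
        closed_inside = file.closed   # False: the block is still running

    closed_after = file.closed        # True: with closed the file

    with open(path, "r") as file:
        content = file.read()

print(closed_inside, closed_after)  # False True
print(repr(content))                # 'Some content\n'
```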
CSV FILE HANDLING

The CSV format takes its name from "comma-separated values" and is described in RFC 4180. CSV files are plain text files intended for storing large volumes of data, and the format is one of the simplest to work with for data analysis. In fact, files in many proprietary (or open but more complex) formats are often converted to CSV before applying data science techniques in various languages.
A CSV file usually begins with a header row that defines the column names; each of the following rows holds the data for each column, separated by a comma. However, many other symbols can be used as cell separators; among them, the tab and the semicolon are just as common as the comma.
SOME EXAMPLES OF CSV FILES

Weather data (separated by ;)
ID;DATA;VV;DV;T;HR;PPT;RS;P
0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;
Scores obtained by players in a tournament (separated by ,)
name,number,year
Maria,858,1930
Jose,665,1930
Rosa,591,1930
Juan Carlos,522,1930
Antonio,509,1930
Maria Esther,495,1930
Maria Luisa,470,1930
Joan,453,1930
John,436,1930
Companies registered with the General Inspectorate of Justice of Argentina (separated by , and data in quotation marks)
"correlative_number","company_type","company_type_description","company_reason_of","company_name","deregistration_code","deregistration_detail"
"10","10","GENERAL PARTNERSHIP","A A VALLE Y COMPAÑIA","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"11","10","GENERAL PARTNERSHIP","A LUCERO Y H CARATOLI","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"12","10","GENERAL PARTNERSHIP","A PUIG E HIJOS","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"13","10","GENERAL PARTNERSHIP","A C I C A","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"14","10","GENERAL PARTNERSHIP","AÑON BEATRIZ S Y CIA","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"15","10","GENERAL PARTNERSHIP","ABA DIESEL","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"16","10","GENERAL PARTNERSHIP","ABADAL JOSE Y JORGE JOSE ABADAL","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"17","10","GENERAL PARTNERSHIP","ABADAL JOSE E HIJO","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"18","10","GENERAL PARTNERSHIP","ABATE Y MACIAS","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
Data is also often stored in plain text (TXT) files whose layout closely resembles that of a CSV. In such cases, a small formatting script can usually convert them into a proper CSV file.
Meteorological observations in TXT
DATE TMAX TMIN NAME
--------------------------------------------------------------------
07122017 28.0 19.0 AEROPARQUE AERO
07122017 26.8 12.4 AZUL AERO
07122017 29.6 7.8 BAHIA BLANCA AERO
07122017 22.7 6.7 BARILOCHE AERO
07122017 3.0 -8.5 BELGRANO II BASE
07122017 2.4 -0.2 CARLINI BASE (EX JUBANY)
07122017 3.9 -0.6 ESPERANZA BASE
07122017 0.7 -3.6 MARAMBIO BASE
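A possible formatting script for a TXT layout like the one above, sketched under two assumptions: columns are separated by whitespace, and only the last column (the station NAME) may contain spaces. Splitting each line at most three times keeps multi-word names intact. The helper name txt_to_rows is hypothetical, and io.StringIO stands in for a real output file (which should be opened with newline="").

```python
import csv
import io

# Hypothetical sample in the same layout as the TXT above
txt = """DATE TMAX TMIN NAME
--------------------------------------------------------------------
07122017 28.0 19.0 AEROPARQUE AERO
07122017 29.6 7.8 BAHIA BLANCA AERO
07122017 0.7 -3.6 MARAMBIO BASE
"""

def txt_to_rows(text):
    """Turn the whitespace-separated TXT layout into a list of rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("---"):
            continue  # skip blank lines and the dashed separator
        # split at most 3 times: the station NAME may contain spaces
        rows.append(line.split(None, 3))
    return rows

output = io.StringIO()  # stands in for open("file.csv", "w", newline="")
csv.writer(output, delimiter=";").writerows(txt_to_rows(txt))
print(output.getvalue())
```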
WORKING WITH CSV FILES FROM PYTHON

Python provides a standard library module called csv, which makes it easy to parse data from CSV files, both for reading and for writing.
This module is used in combination with the with structure and the open() function, which read or create the file, while the csv module handles the parsing.
READING CSV FILES
Contents of .csv file
0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;
8;2016-03-01 04:00:00;;;8.7;39;;;
from csv import reader

with open("file.csv", "r", newline="") as file:
    document = reader(file, delimiter=';', quotechar='"')
    for row in document:
        ' '.join(row)
Output:
'0 2016-03-01 00:00:00   9.9 73   '
'1 2016-03-01 00:30:00   9.0 67   '
'2 2016-03-01 01:00:00   8.3 64   '
'3 2016-03-01 01:30:00   8.0 61   '
'4 2016-03-01 02:00:00   7.4 62   '
'5 2016-03-01 02:30:00   8.3 47   '
'6 2016-03-01 03:00:00   7.7 50   '
'7 2016-03-01 03:30:00   9.0 39   '
'8 2016-03-01 04:00:00   8.7 39   '
When the CSV file has a header, it is necessary to skip the header:
Contents of .csv file
ID;DATA;VV;DV;T;HR;PPT;RS;P
0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;
8;2016-03-01 04:00:00;;;8.7;39;;;
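from csv import reader

with open("file.csv", "r", newline="") as file:
    document = reader(file, delimiter=';', quotechar='"')
    headers = next(document)  # reads (and skips) the header row
    for row in document:
        ' '.join(row)
Output:
'0 2016-03-01 00:00:00   9.9 73   '
'1 2016-03-01 00:30:00   9.0 67   '
'2 2016-03-01 01:00:00   8.3 64   '
'3 2016-03-01 01:30:00   8.0 61   '
'4 2016-03-01 02:00:00   7.4 62   '
'5 2016-03-01 02:30:00   8.3 47   '
'6 2016-03-01 03:00:00   7.7 50   '
'7 2016-03-01 03:30:00   9.0 39   '
'8 2016-03-01 04:00:00   8.7 39   '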
Another way to read CSV files with headers is to use a DictReader object instead of reader, and thus access only the values of the desired columns, by name:
from csv import DictReader

with open("file.csv", "r", newline="") as file:
    document = DictReader(file, delimiter=';', quotechar='"')
    for row in document:
        row['DATA']
Output:
'2016-03-01 00:00:00'
'2016-03-01 00:30:00'
'2016-03-01 01:00:00'
'2016-03-01 01:30:00'
'2016-03-01 02:00:00'
'2016-03-01 02:30:00'
'2016-03-01 03:00:00'
'2016-03-01 03:30:00'
'2016-03-01 04:00:00'
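Values returned by reader and DictReader are always strings (empty cells arrive as ''), so numeric columns must be converted before computing with them. A sketch using io.StringIO in place of a real file, with a few rows of the weather data above and assuming that column T holds the temperature:

```python
import csv
import io

data = io.StringIO(
    "ID;DATA;VV;DV;T;HR;PPT;RS;P\n"
    "0;2016-03-01 00:00:00;;;9.9;73;;;\n"
    "1;2016-03-01 00:30:00;;;9.0;67;;;\n"
    "2;2016-03-01 01:00:00;;;8.3;64;;;\n"
)

temperatures = []
for row in csv.DictReader(data, delimiter=';', quotechar='"'):
    if row['T']:  # skip empty cells
        temperatures.append(float(row['T']))  # str -> float

average = sum(temperatures) / len(temperatures)
print(temperatures)       # [9.9, 9.0, 8.3]
print(round(average, 2))  # 9.07
```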
WRITING CSV FILES
Writing a CSV without a header:
from csv import writer

with open("data.csv", "w", newline="") as file:
    document = writer(file, delimiter=';', quotechar='"')
    document.writerows(matrix)
In the above example, matrix could be a list of lists, each with the same number of elements. For example:
matrix = [
    ['Juan', 373, 1970],
    ['Ana', 124, 1983],
    ['Pedro', 901, 1650],
    ['Rosa', 300, 2000],
    ['Juana', 75, 1975],
]
This would generate a file named data.csv with the following content:
eugenia@bella:~$ cat data.csv
Juan;373;1970
Ana;124;1983
Pedro;901;1650
Rosa;300;2000
Juana;75;1975
Writing a CSV with a header:
In this case, the matrix to be written must be a list of dictionaries whose keys match the indicated headers.
from csv import DictWriter

matrix = [
    dict(player='Juan', points=373, year=1970),
    dict(player='Ana', points=124, year=1983),
    dict(player='Pedro', points=901, year=1650),
    dict(player='Rosa', points=300, year=2000),
    dict(player='Juana', points=75, year=1975),
]
headers = ['player', 'points', 'year']

with open("data.csv", "w", newline="") as file:
    document = DictWriter(file, delimiter=';', quotechar='"', fieldnames=headers)
    document.writeheader()
    document.writerows(matrix)
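To check what DictWriter produces without creating a file, the same calls can be made on an io.StringIO buffer (a sketch using two of the rows above). Reading the buffer back with DictReader also shows that every value returns as a string:

```python
import csv
import io

headers = ['player', 'points', 'year']
matrix = [
    dict(player='Juan', points=373, year=1970),
    dict(player='Ana', points=124, year=1983),
]

buffer = io.StringIO()  # stands in for open("data.csv", "w", newline="")
document = csv.DictWriter(buffer, fieldnames=headers, delimiter=';', quotechar='"')
document.writeheader()
document.writerows(matrix)

print(buffer.getvalue().splitlines())
# ['player;points;year', 'Juan;373;1970', 'Ana;124;1983']

buffer.seek(0)  # rewind before reading the same "file" again
rows = list(csv.DictReader(buffer, delimiter=';'))
print(rows[0]['points'], type(rows[0]['points']))  # 373 <class 'str'>
```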
Simple statistical functions
Simple statistical functions such as the following can be applied to lists and tuples, whether or not they were obtained from a CSV file:
Count the elements | len(collection)
Add up the elements | sum(collection)
Get the largest number | max(collection)
Get the smallest number | min(collection)
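A sketch applying these functions to one column of the matrix from the writing example. It also shows why values read from a CSV must be converted first: strings are compared character by character, so max() on strings does not compare numbers.

```python
matrix = [
    ['Juan', 373, 1970],
    ['Ana', 124, 1983],
    ['Pedro', 901, 1650],
    ['Rosa', 300, 2000],
    ['Juana', 75, 1975],
]

points = [row[1] for row in matrix]  # extract the "points" column

count = len(points)             # 5
total = sum(points)             # 1773
highest = max(points)           # 901
lowest = min(points)            # 75
average = total / float(count)  # 354.6

# Values read with the csv module are strings: convert them first
from_csv = ['373', '124', '901']
numbers = [int(value) for value in from_csv]
print(sum(numbers))          # 1398
print(max(['75', '373']))    # 75 (compared as text, not as numbers)
```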
PROBABILITY AND STATISTICS WITH PYTHON
PROBABILITY OF MUTUALLY EXCLUSIVE SIMPLE AND COMPOUND
EVENTS IN PYTHON
SAMPLE SPACE
A sample space is the set of all possible outcomes of an experiment, such as those that could result from rolling a die:
$$E = \{1, 2, 3, 4, 5, 6\}$$
sample_space = [1, 2, 3, 4, 5, 6]
Each element in a sample space is referred to as a sample point. The number of sample points is denoted by n, such that for the sample space $E = \{1, 2, 3, 4, 5, 6\}$, $n = 6$.
n = len(sample_space)
SIMPLE AND COMPOUND EVENTS

An event is a set of outcomes within a sample space. For example:
• rolling a die is the random experiment whose possible outcomes form the sample space
• the number 5 coming up in this throw is the simple event $A = \{5\}$, and it is exclusive: if 5 comes up, no other number can come up at the same time.
• an odd number being thrown is the composite event $B = \{1, 3, 5\}$, which in turn depends on the simple exclusive events $B_1 = \{1\}$, $B_2 = \{3\}$ and $B_3 = \{5\}$
PROBABILITY ASSIGNMENT
Probability assignment provides the mathematical models used to calculate the chances of specific events occurring or not occurring.
The probability of an event is denoted by $P(\text{event})$.
The events can be:
• simple or compound
• mutually exclusive or independent
SIMPLE MUTUALLY EXCLUSIVE EVENTS

If we consider a sample space E, each of its k sample points will be denoted by $A_k$, and the probability of each, designated as $P(A_k)$, will be determined by:
$$P(A_k) = \frac{1}{n}$$
probability = 1.0 / n
In Python 2, at least one operand of a division must be a real number (float) for the result to be a real number, which is why 1.0 is used instead of 1. In Python 3, the / operator always returns a real number.
For a fair die, every sample point, being a mutually exclusive event, has the same probability:
$$P(6) = P(5) = P(4) = P(3) = P(2) = P(1) = \frac{1}{6}$$
EVENTS COMPOSED OF MUTUALLY EXCLUSIVE SIMPLE EVENTS

When the simple events that make up the composite event A are mutually exclusive, the probability of the composite event is given by the sum of the probabilities of each simple event $P(A_k)$, such that:
$$P(A) = P(A_1) + P(A_2) + \ldots + P(A_k)$$
For example, to estimate the probability that a single throw of a die will produce an even number, we take the event $A = \{2, 4, 6\}$, whose probability is the sum of the probabilities of the simple events $P(2)$, $P(4)$ and $P(6)$ of the sample space $E = \{1, 2, 3, 4, 5, 6\}$, such that:
$$P(A) = P(2) + P(4) + P(6)$$
$$P(A) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$$
$$P(A) = \frac{1}{2}$$
In the first result, $\frac{3}{6}$ (the second step, before dividing by the greatest common divisor (GCD) and reducing the fraction to $\frac{1}{2}$), the numerator, 3, is the number of simple events within the composite event "even numbers" and is denoted by h. The denominator, 6, is n, the total number of events in the sample space. Thus, the probability of a composite event A made up of mutually exclusive events is given by the quotient of h and n, such that:
$$P(A) = \frac{h}{n}$$
even_numbers = [i for i in sample_space if i % 2 == 0]
h = len(even_numbers)
probability = float(h) / n
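The fractions module from the standard library can reproduce the arithmetic above exactly, without decimal rounding. A sketch showing that summing the simple events and computing h / n give the same result:

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
n = len(sample_space)

even_numbers = [i for i in sample_space if i % 2 == 0]

# Sum of the probabilities of the simple events (1/n each)...
by_sum = sum(Fraction(1, n) for _ in even_numbers)
# ...which is the same as the quotient h / n
h = len(even_numbers)
by_quotient = Fraction(h, n)

print(by_sum, by_quotient)  # 1/2 1/2
```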
A composite event can be denoted by the union of its simple events (symbol $\cup$, read as "or"), such that:
$$P(A_1 \cup A_2 \cup \ldots \cup A_k) = P(A_1) + P(A_2) + \ldots + P(A_k)$$
For example, for the event "even numbers", we obtain:
$$P(2 \cup 4 \cup 6) = P(2) + P(4) + P(6)$$
$$P(2 \cup 4 \cup 6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$$
$$P(2 \cup 4 \cup 6) = \frac{1}{2}$$
Here $2 \cup 4 \cup 6$ is an event, and $P(2)$, $P(4)$ and $P(6)$ are the probabilities of the three events that compose it. In a new context, $2 \cup 4 \cup 6$ can be treated as an event A.
FUNCTIONS
# Probability of mutually exclusive simple events
pssme = lambda e: 1.0 / len(e)

# Probability of mutually exclusive compound events
def pscme(e, sc):
    n = len(e)
    return len(sc) / float(n)
CONDITIONAL PROBABILITY IN PYTHON
FUNCTIONS
# Conditional probability: dependent events
def pscd(e, a, b):
    i = list(set(a).intersection(b))
    pi = pscme(e, i)
    pa = pscme(e, a)
    return pi / pa

# Conditional probability: independent events
def psci(e, a, b):
    pa = pscme(e, a)
    pb = pscme(e, b)
    return pa * pb
DEPENDENT EVENTS
This refers to the probability of two events occurring together, where the second event depends on the occurrence of the first.
The probability of B occurring if A occurs is denoted by $P(B \mid A)$ and is read as "the probability of B given A", such that:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
where $P(A \cap B)$ is the probability of the intersection of events A and B, defined as $P(A \cap B) = P(A)\,P(B \mid A)$, such that the intersection is a new event composed of simple events. In the following example, it equals $\{1, 3\}$ (because 1 and 3 are in both A and B).
Example: when rolling a die, what is the probability of getting a number less than 4, given that the number is odd?
Rolling the die is the experiment. We wish to find the probability of $B = \{1, 2, 3\}$ (number less than 4) given that $A = \{1, 3, 5\}$ (odd number) occurred, in the sample space $E = \{1, 2, 3, 4, 5, 6\}$.
sample_space = [1, 2, 3, 4, 5, 6]
a = [i for i in sample_space if i % 2 != 0]
b = [i for i in sample_space if i < 4]
To calculate the probability of the intersection, the intersection is obtained first:
$$A \cap B = \{1, 3\}$$
intersec = [i for i in a if i in b]
Then the probability of the new composite event is calculated:
$$P(A \cap B) = P(1) + P(3) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$$
or, in other words:
$$P(A \cap B) = \frac{h}{n} = \frac{2}{6} = \frac{1}{3}$$
It is also necessary to obtain the probability of A, taking into account that it is also a compound event:
$$P(A) = \frac{h}{n} = \frac{3}{6} = \frac{1}{2}$$
Finally, we obtain:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
$$P(B \mid A) = \frac{1/3}{1/2}$$
$$P(B \mid A) = \frac{2}{3} \approx 0.67$$
e = sample_space = [1, 2, 3, 4, 5, 6]
a = [i for i in e if i % 2 != 0]  # odd numbers
b = [i for i in e if i < 4]  # numbers less than 4
intersec = [i for i in a if i in b]  # intersection of A and B
n = len(e)  # total of the sample
ha = len(a)  # total number of simple events in A
hintersec = len(intersec)  # total number of simple events in the intersection
# probability of the intersection
probability_intersec = float(hintersec) / n
# probability of A
probability_a = float(ha) / n
# conditional probability
probability_b_given_a = probability_intersec / probability_a
SET THEORY IN PYTHON
When obtaining the intersection of two compound events above, a manual method was used, which amounts to saying: return 'i' for each 'i' in list 'a' if it is in list 'b'.
However, since each compound event is a set and Python provides a data type called set, it is possible to obtain the intersection by handling compound events as Python sets. With set() you can convert any iterable into a set and perform set operations such as union and intersection when necessary:
intersec = list(set(a).intersection(b))
Here the resulting set is converted into a list in order to be consistent with the rest of the code and to ensure that the result supports the usual list operations. When in doubt as to whether to use lists or sets, the principle of simplicity should be applied and the simplest solution implemented.
INDEPENDENT EVENTS
Unlike the previous case, here the probability of occurrence of B is not affected by the occurrence of A. For example, the probability of rolling a die and obtaining an even number (event B) is not affected by the fact that an odd number was obtained in a previous throw (event A). The probability of B is independent of A, and the probability that both occur is given by the product of the probabilities of both events:
$$P(A \cap B) = P(A)\,P(B)$$
Here the intersection is the probability that both events occur together.
Once the probability of each independent event is calculated, they are multiplied, obtaining:
a. Sample space (for both events):
$$E = \{1, 2, 3, 4, 5, 6\}$$
b. Probability of A:
$$A = \{1, 3, 5\}, \quad P(A) = \frac{h}{n} = \frac{3}{6} = \frac{1}{2}$$
c. Probability of B:
$$B = \{2, 4, 6\}, \quad P(B) = \frac{h}{n} = \frac{3}{6} = \frac{1}{2}$$
d. Probability of the intersection:
$$P(A \cap B) = P(A)\,P(B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$
e = sample_space = [1, 2, 3, 4, 5, 6]
n = len(e)  # total of the sample
# probability of A
a = [i for i in e if i % 2 != 0]
pa = len(a) / float(n)
# probability of B
b = [i for i in e if i % 2 == 0]
pb = len(b) / float(n)
# probability of the intersection of events
pi = pa * pb
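A compact check of both worked examples (dependent and independent events), using Python sets for the intersection and fractions.Fraction so the results stay exact. This is a sketch; the helper name probability is hypothetical and assumes equally likely outcomes.

```python
from fractions import Fraction

E = {1, 2, 3, 4, 5, 6}

def probability(event, space=E):
    """P(event) = h / n for equally likely, mutually exclusive outcomes."""
    return Fraction(len(event), len(space))

# Dependent events: B = "less than 4" given A = "odd"
a = {i for i in E if i % 2 != 0}  # {1, 3, 5}
b = {i for i in E if i < 4}       # {1, 2, 3}
p_b_given_a = probability(a & b) / probability(a)
print(p_b_given_a)  # 2/3

# Independent events: odd in one throw, even in another throw
odd = {1, 3, 5}
even = {2, 4, 6}
p_both = probability(odd) * probability(even)
print(p_both)  # 1/4
```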
BAYES THEOREM IN PYTHON
BAYES' THEOREM AND PROBABILITY OF CAUSES

Given a series of mutually exclusive events $A_k$ that together make up a sample space E, and any event B, Bayes' theorem gives the probability that each event $A_k$ of E is the cause of B. For this reason, it is also known as the probability of causes.
DATA: CASE STUDY

Given a city of 50,000 inhabitants, with the following distribution:
Girls | Boys | Women | Men
11000 | 9000 | 16000 | 14000
And a report of 9,000 cases of influenza, distributed as follows:
Girls | Boys | Women | Men
2000 | 1500 | 3000 | 2500
The aim is to obtain the probability that the cause of contracting influenza is belonging to a certain demographic sector (for example, the sector made up of boys or girls).
ANALYSIS
From what has been stated above, it follows that:
• The city (the absolute total of inhabitants) is the sample space E.
• The numbers of girls, boys, women and men are each of the events $A_k$ of the sample space E.
• The value of n is taken as the total of the sample space (the sum of all the $A_k$), such that $n = 50000$.
• The value of h for each event $A_k$ is the corresponding value given in the population distribution table.
• Having the flu is event B.
• The distribution table of influenza cases corresponds to the intersections of event B with each event $A_k$, i.e. each $A_k \cap B$.
Depending on the probability calculation applied, the following can be obtained:
• The probability of being a girl, boy, woman or man in the city, given by $P(A_k)$. It is considered an a priori probability.
• The probability of having influenza given that one is a girl, boy, woman or man, obtained with $P(B \mid A_k)$ and considered a conditional probability.
• The probability that any inhabitant, regardless of the sector to which he or she belongs, will have the flu, obtained with
$$P(B) = \sum_{k} P(A_k)\,P(B \mid A_k)$$
and considered a total probability.
• The probability that someone with influenza is a girl, boy, woman or man, obtained with Bayes' theorem. This probability is considered an a posteriori probability, and it allows us to answer questions such as: what is the probability that a new case of influenza will be a boy?
An efficient and orderly way to obtain an a posteriori probability with Bayes' theorem is to first obtain the three preceding probabilities: a priori, conditional and total.
NOTICE:
In the following, list(map(float, <list>)) will be used in the source code to convert the elements of a list into real numbers, whenever doing so does not clutter the code. In Python 3, map() returns an iterator that can only be traversed once, which is why it is wrapped in list().
PROCEDURE
1. A priori probability calculation
Returns: probability that an inhabitant belongs to a specific demographic sector.
Formula:
$$P(A_k) = \frac{h_k}{n}$$
Data required:
$h_k$ = data from the population distribution table
n = always the total of the sample space (50 000)
Results:
$P(A_1) = \frac{11000}{50000} = 0.22$ (probability of being a girl)
$P(A_2) = \frac{9000}{50000} = 0.18$ (probability of being a boy)
$P(A_3) = \frac{16000}{50000} = 0.32$ (probability of being a woman)
$P(A_4) = \frac{14000}{50000} = 0.28$ (probability of being a man)
Python code:
inhabitants = list(map(float, [11000, 9000, 16000, 14000]))
n = sum(inhabitants)
pa = [h / n for h in inhabitants]
2. Conditional probability
Returns: probability of having the flu while belonging to a specific demographic sector.
Certainty: $A_k$ (demographic sector)
Objective: B (the probability of having the flu)
Formula:
$$P(B \mid A_k) = \frac{P(A_k \cap B)}{P(A_k)}$$
Data required:
$P(A_k \cap B) = \frac{h}{n}$
h = intersections (data from the table of distribution of influenza cases)
Results:
$P(B \mid A_1) = \frac{2000/50000}{0.22} = 0.18$ (probability of having the flu as a girl)
$P(B \mid A_2) = \frac{1500/50000}{0.18} = 0.16$ (probability of having the flu as a boy)
$P(B \mid A_3) = \frac{3000/50000}{0.32} = 0.19$ (probability of having the flu as a woman)
$P(B \mid A_4) = \frac{2500/50000}{0.28} = 0.18$ (probability of having the flu as a man)
Python code:
affected = list(map(float, [2000, 1500, 3000, 2500]))
pi = [k / n for k in affected]
pba = [pi[i] / pa[i] for i in range(len(pi))]
3. Total probability
Returns: probability that any of the inhabitants, regardless of the demographic sector to which they belong, may have influenza.
Formula:
$$P(B) = \sum_{k} P(A_k)\,P(B \mid A_k)$$
Data required:
a priori probability
conditional probability
Results:
$$P(B) = P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + P(A_3)\,P(B \mid A_3) + P(A_4)\,P(B \mid A_4)$$
$$P(B) = 0.22 \cdot 0.18 + 0.18 \cdot 0.16 + 0.32 \cdot 0.19 + 0.28 \cdot 0.18$$
$$P(B) = 0.04 + 0.03 + 0.06 + 0.05$$
$$P(B) = 0.18$$
Python code:
products = [pa[i] * pba[i] for i in range(len(pa))]
pb = sum(products)
Remarks:
(a) Note that the values computed in Python may differ by 0.01 from the manual solution. This is due to the rounding performed in the manual solution, and the difference can be eliminated by using three decimal places (instead of two) for the conditional probability values in the manual solution.
(b) The probability of NOT having the flu is given by $1 - P(B)$, such that $1 - 0.18 = 0.82$, but it is not needed for this example with Bayes' theorem.
4. A posteriori probability
Returns: probability of belonging to a specific demographic sector, given that one has the flu.
Certainty: B (having the flu)
Objective: $A_k$ (the probability of belonging to a specific demographic sector)
Formula:
$$P(A_k \mid B) = \frac{P(A_k)\,P(B \mid A_k)}{\sum_{j} P(A_j)\,P(B \mid A_j)}$$
Data required:
$P(A_k)\,P(B \mid A_k)$ = the product obtained in each of the terms of the total probability
$\sum_{j} P(A_j)\,P(B \mid A_j)$ = the total probability
Results:
$P(A_1 \mid B) = \frac{0.04}{0.18} = 0.22$ (probability that a person with the flu is a girl)
$P(A_2 \mid B) = \frac{0.03}{0.18} = 0.16$ (probability that a person with the flu is a boy)
$P(A_3 \mid B) = \frac{0.06}{0.18} = 0.33$ (probability that a person with the flu is a woman)
$P(A_4 \mid B) = \frac{0.05}{0.18} = 0.27$ (probability that a person with the flu is a man)
Python code:
pab = [p / pb for p in products]
FUNCTIONS
# Bayes' theorem
def bayes(e, b):
    n = float(sum(e))
    pa = [h / n for h in e]
    pi = [k / n for k in b]
    pba = [pi[i] / pa[i] for i in range(len(pi))]
    prods = [pa[i] * pba[i] for i in range(len(pa))]
    ptb = sum(prods)
    pab = [p / ptb for p in prods]
    return pab
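A usage sketch of the bayes() function with the case-study data. The function is repeated so that the block runs on its own, with the final division using ptb, the variable that holds the total probability. As a sanity check, each a posteriori probability here reduces to the sector's share of all flu cases (for example, 3000 / 9000 ≈ 0.33 for women). The values differ slightly from the manual results because they are rounded, not truncated.

```python
def bayes(e, b):
    n = float(sum(e))
    pa = [h / n for h in e]                           # a priori
    pi = [k / n for k in b]                           # intersections
    pba = [pi[i] / pa[i] for i in range(len(pi))]     # conditional
    prods = [pa[i] * pba[i] for i in range(len(pa))]
    ptb = sum(prods)                                  # total probability
    return [p / ptb for p in prods]                   # a posteriori

inhabitants = [11000, 9000, 16000, 14000]  # girls, boys, women, men
flu_cases = [2000, 1500, 3000, 2500]

posterior = bayes(inhabitants, flu_cases)
print([round(p, 2) for p in posterior])  # [0.22, 0.17, 0.33, 0.28]
```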
ANNEX I: COMPLEX CALCULATIONS
POPULATION AND SAMPLE STATISTICS: CALCULATION OF VARIANCE AND STANDARD DEVIATION
Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Population variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample variance: $s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
from math import sqrt

samples = [12, 23, 24, 24, 22, 10, 17]  # sample list
n = len(samples)
mean = sum(samples) / float(n)
differences = [xi - mean for xi in samples]
powers = [x ** 2 for x in differences]
summation = sum(powers)
sample_variance = summation / (n - 1)
population_variance = summation / n
sample_deviation = sqrt(sample_variance)
population_deviation = sqrt(population_variance)
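As a cross-check, a sketch comparing the manual calculation with the statistics module from the standard library (Python 3.4+). Its variance() and stdev() functions divide by n - 1 (sample), while pvariance() and pstdev() divide by n (population):

```python
import statistics
from math import sqrt

samples = [12, 23, 24, 24, 22, 10, 17]
n = len(samples)
mean = sum(samples) / float(n)
summation = sum((xi - mean) ** 2 for xi in samples)

sample_variance = summation / (n - 1)  # divides by n - 1
population_variance = summation / n    # divides by n

# Each manual result next to its standard library counterpart
checks = {
    "sample variance": (sample_variance, statistics.variance(samples)),
    "population variance": (population_variance, statistics.pvariance(samples)),
    "sample deviation": (sqrt(sample_variance), statistics.stdev(samples)),
    "population deviation": (sqrt(population_variance), statistics.pstdev(samples)),
}
for name, (manual, library) in checks.items():
    print(name, round(manual, 4), round(library, 4))
```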
SCALAR PRODUCT OF TWO VECTORS

vector1 = [3, 0]
vector2 = [4, 3]
pe = sum([x * y for x, y in zip(vector1, vector2)])  # scalar product
RELATIVE, ABSOLUTE AND CUMULATIVE FREQUENCY CALCULATIONS
# ABSOLUTE FREQUENCY
# Number of times a value appears in a sample
samples = [1, 2, 3, 4, 4, 3, 2, 6, 7, 3, 3, 3, 1, 8, 5, 9]
absolutes = []
frequencies = []
for n in samples:
    if n not in absolutes:
        absolutes.append(n)
        fi = samples.count(n)
        frequencies.append(fi)
N = sum(frequencies)  # == len(samples)

# RELATIVE FREQUENCY
# Quotient between absolute frequency and N
relative = [float(fi) / N for fi in frequencies]
sum_relative = round(sum(relative))  # == 1

# CUMULATIVE FREQUENCY
# Sum of all frequencies less than or equal to the absolute frequency
frequencies.sort()
cumulative = [sum(frequencies[:i + 1]) for i, fi in enumerate(frequencies)]

# CUMULATIVE RELATIVE FREQUENCY
# Ratio between cumulative frequency and total amount of data
relative_cumulative = [float(f) / N for f in cumulative]
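A more compact alternative for the absolute and relative frequencies, swapping the manual loop for collections.Counter from the standard library. This is a sketch; the cumulative frequencies are not reproduced here.

```python
from collections import Counter

samples = [1, 2, 3, 4, 4, 3, 2, 6, 7, 3, 3, 3, 1, 8, 5, 9]

absolute = Counter(samples)       # value -> number of occurrences
N = sum(absolute.values())        # equals len(samples)
relative = {value: fi / float(N) for value, fi in absolute.items()}

print(absolute[3])                # 5
print(absolute.most_common(1))    # [(3, 5)]
print(relative[3])                # 0.3125
```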
ANNEX II: CREATION OF A MENU OF OPTIONS
In scripting, it can be useful to give the user a menu of options and have the script act according to
the option chosen by the user. Here is a trick to solve this in a simple and ingenious way.
1) First, the entire script needs to be organized into functions.
2) Second, every function must have its own documentation (a docstring) stating exactly what the function does:
def read_file():
    """Read CSV file"""
    return "read"

def write_file():
    """Write CSV file"""
    return "write"

def _sum_numbers(numbers):
    """Add the numbers in a list"""
    return "private"
3) Next, define a list with the names of all the functions that the user will be able to access from the menu:
functions = ['read_file', 'write_file']
The trick is to automate both the generation of the menu and the function call.
To automate the generation of the menu, the trick is to use:
▪ The list from step 3
▪ The locals() function
▪ The __doc__ attribute
number = 1  # menu number, later used to access the chosen function
menu = "Choose an option:\n"
for function in functions:
    menu += "\t{}. {}\n".format(number, locals()[function].__doc__)
    number = number + 1  # increments the number in each iteration

echo(menu)
option = int(get("Your option: "))
# echo and get: hacks learned in the introductory course
Finally, to call the function the user chose, use the chosen option as an index into the list to get the function's name (subtracting 1, since the menu starts counting at 1), and then resort to locals() again to invoke the function:
function = functions[option - 1]  # get the name of the chosen function
locals()[function]()  # invoke the function through locals()
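The steps above work at module level, where `locals()` and `globals()` return the same dictionary. Inside a function, however, `locals()` only contains that function's own variables, so the lookup has to go through `globals()`. A self-contained sketch of the same menu technique wrapped in functions, assuming `print()` and `input()` in place of the course's `echo()` and `get()` hacks; the helper names `build_menu`, `run_option` and `main` are hypothetical:

```python
def read_file():
    """Read CSV file"""
    return "read"


def write_file():
    """Write CSV file"""
    return "write"


functions = ['read_file', 'write_file']


def build_menu(names):
    """Number each public function and show its docstring."""
    lines = ["Choose an option:"]
    for number, name in enumerate(names, start=1):
        # globals() instead of locals(): here locals() would only hold
        # names, lines, number and name
        lines.append("\t{}. {}".format(number, globals()[name].__doc__))
    return "\n".join(lines)


def run_option(option, names):
    """Invoke the function whose menu number is `option` (1-based)."""
    if not 1 <= option <= len(names):
        raise ValueError("invalid option: {}".format(option))
    return globals()[names[option - 1]]()


def main():
    # print() and input() stand in for the course's echo() and get()
    print(build_menu(functions))
    option = int(input("Your option: "))
    print(run_option(option, functions))
```

Calling `main()` shows the menu, reads the option and prints the result of the chosen function. Checking the range in `run_option()` avoids a confusing `IndexError`, or a silent call to the wrong function when the user types 0 (since `functions[-1]` is valid).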

CERTIFICATE
Show how much you have learned!
If you have reached the end of the course, you can obtain a triple certification:
- Certificate of attendance (issued by Eugenia Bahit School)
- Certificate of Achievement (issued by CLA Linux)
- Approval certification (issued by LAECI)
Check with your teacher or visit the certification website at http://python.eugeniabahit.org.
If you need to prepare for your exam, you can register for the
Data Science with Python course at
Escuela de Informática Eugenia Bahit
www.eugeniabahit.com

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution
