DataScienceWithPython Ed2018


EUGENIA BAHIT

DATA SCIENCE
WITH PYTHON

STUDY MATERIAL

Information and registration:


Course: http://escuela.eugeniabahit.com | Certifications: http://python.laeci.org
Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit

SUMMARY
VARIABLE MANIPULATION METHODS
STRING MANIPULATION
FORMATTING METHODS
CAPITALIZE THE FIRST LETTER
CONVERT A STRING TO LOWERCASE
CONVERT A STRING TO UPPERCASE
CONVERT UPPERCASE TO LOWERCASE AND VICE VERSA
CONVERT A STRING TO TITLE FORMAT
CENTER A TEXT
ALIGN TEXT TO THE LEFT
ALIGN TEXT TO THE RIGHT
FILL IN A TEXT BY PREFIXING IT WITH ZEROS
RESEARCH METHODS
COUNT NUMBER OF OCCURRENCES OF A SUBSTRING
SEARCH FOR A SUBSTRING WITHIN A STRING
VALIDATION METHODS
TO KNOW IF A STRING BEGINS WITH A GIVEN SUBSTRING
TO KNOW IF A STRING ENDS WITH A GIVEN SUBSTRING
TO KNOW IF A STRING IS ALPHANUMERIC
TO KNOW IF A STRING IS ALPHABETIC
TO KNOW IF A STRING IS NUMERIC
TO KNOW IF A STRING CONTAINS ONLY LOWERCASE LETTERS
TO KNOW IF A STRING CONTAINS ONLY UPPERCASE LETTERS
TO KNOW IF A STRING CONTAINS ONLY BLANKS
TO KNOW IF A STRING HAS A TITLE FORMAT
SUBSTITUTION METHODS
FORMATTING A STRING, DYNAMICALLY SUBSTITUTING TEXT
REPLACE TEXT IN A STRING
REMOVE CHARACTERS TO THE LEFT AND RIGHT OF A STRING
REMOVE CHARACTERS TO THE LEFT OF A STRING
REMOVE CHARACTERS TO THE RIGHT OF A STRING
JOINING AND SPLITTING METHODS
ITERATIVELY JOIN A CHAIN

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution 4.0

SPLITTING A STRING INTO THREE PARTS, USING A SEPARATOR
SPLITTING A STRING INTO SEVERAL PARTS, USING A SEPARATOR
SPLIT A STRING INTO LINES
MANIPULATION OF LISTS AND TUPLES
AGGREGATION METHODS
ADD AN ITEM TO THE END OF THE LIST
ADD SEVERAL ITEMS TO THE END OF THE LIST
ADD AN ELEMENT IN A GIVEN POSITION
ELIMINATION METHODS
DELETE THE LAST ITEM IN THE LIST
DELETE AN ELEMENT BY ITS INDEX
DELETE AN ITEM BY ITS VALUE
ORDER METHODS
SORT A LIST IN REVERSE (REVERSE ORDER)
SORT A LIST IN ASCENDING ORDER
SORT A LIST IN DESCENDING ORDER
RESEARCH METHODS
COUNT NUMBER OF OCCURRENCES ELEMENTS
GET INDEX NUMBER
ANNEX ON LISTS AND TUPLES
TYPE CONVERSION
CONCATENATION OF COLLECTIONS
MAXIMUM AND MINIMUM VALUE
COUNT ITEMS
DICTIONARY MANIPULATION
ELIMINATION METHODS
EMPTY A DICTIONARY
AGGREGATION AND CREATION METHODS
COPY A DICTIONARY
CREATE A NEW DICTIONARY FROM THE KEYS OF A SEQUENCE
CONCATENATE DICTIONARIES
SET A DEFAULT KEY AND VALUE
RETURN METHODS
GET THE VALUE OF A KEY
TO KNOW IF A KEY EXISTS IN THE DICTIONARY
OBTAIN THE KEYS AND VALUES OF A DICTIONARY


OBTAIN THE KEYS TO A DICTIONARY
GET THE VALUES OF A DICTIONARY
OBTAIN THE NUMBER OF ITEMS IN A DICTIONARY
FILE HANDLING AND MANIPULATION
WAYS TO OPEN A FILE
SOME METHODS OF THE FILE OBJECT
CSV FILE HANDLING
SOME EXAMPLES OF CSV FILES
WORKING WITH CSV FILES FROM PYTHON
READING CSV FILES
WRITING CSV FILES
PROBABILITY AND STATISTICS WITH PYTHON
PROBABILITY OF MUTUALLY EXCLUSIVE SIMPLE AND COMPOUND EVENTS IN PYTHON
SAMPLE SPACE
SIMPLE AND COMPOUND EVENTS
PROBABILITY ASSIGNMENT
SIMPLE MUTUALLY EXCLUSIVE EVENTS
EVENTS COMPOSED OF MUTUALLY EXCLUSIVE SIMPLE EVENTS
FUNCTIONS
CONDITIONAL PROBABILITY IN PYTHON
FUNCTIONS
DEPENDENT EVENTS
SET THEORY IN PYTHON
INDEPENDENT EVENTS
BAYES THEOREM IN PYTHON
BAYES' THEOREM AND PROBABILITY OF CAUSES
DATA: CASE STUDY
ANALYSIS
PROCEDURE
FUNCTIONS
COMPLEMENTARY BIBLIOGRAPHY
ANNEX I: COMPLEX CALCULATIONS
POPULATION AND SAMPLING STATISTICS: CALCULATION OF VARIANCE AND STANDARD DEVIATION
SCALAR PRODUCT OF TWO VECTORS
RELATIVE, ABSOLUTE AND CUMULATIVE FREQUENCY CALCULATIONS
ANNEX II: CREATION OF A MENU OF OPTIONS


VARIABLE MANIPULATION METHODS


In Python, every variable refers to an object, and each object supports a set of actions called methods. Methods are functions that belong to an object, so they are called through the variable, using the syntax:

variable.function()

In some cases, these methods (functions of an object) will accept parameters like any other function.

variable.function(parameter)
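
As a minimal illustration of both forms, the sketch below calls one method without parameters and one with parameters on the same string. The variable names greeting, shouted and framed are chosen for this example only.

```python
# A string object and two of its methods (names are illustrative).
greeting = "hello"

# variable.function(): a method called without parameters
shouted = greeting.upper()

# variable.function(parameter): a method called with parameters
framed = greeting.center(11, "*")

print(shouted)  # HELLO
print(framed)   # ***hello***
```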

STRING MANIPULATION
The main methods that can be applied to a text string, organized by category, are described below.

FORMATTING METHODS

CAPITALIZE THE FIRST LETTER


Method: capitalize()
Returns: a copy of the string with the first character capitalized and the rest in lowercase
>>> string = "welcome to my application"
>>> result = string.capitalize()
>>> result
Welcome to my application

CONVERT A STRING TO LOWERCASE


Method: lower()
Returns: a copy of the string in lowercase letters
>>> string = "Hello World"
>>> string.lower()
hello world

CONVERT A STRING TO UPPERCASE


Method: upper()
Returns: a copy of the string in uppercase letters
>>> string = "Hello World"
>>> string.upper()
HELLO WORLD

CONVERT UPPERCASE TO LOWERCASE AND VICE VERSA


Method: swapcase()
Returns: a copy of the string with uppercase letters converted to lowercase and vice versa
>>> string = "Hello World"
>>> string.swapcase()
hELLO wORLD

CONVERT A STRING TO TITLE FORMAT


Method: title()
Returns: a copy of the string with the first letter of each word capitalized
>>> string = "hello world"
>>> string.title()
Hello World

CENTER A TEXT

Method: center(length[, "fill character"])
Returns: a copy of the centered string
>>> string = "welcome to my application".capitalize()
>>> string.center(50, "=")
============Welcome to my application=============

>>> string.center(50, " ")
            Welcome to my application

ALIGN TEXT TO THE LEFT


Method: ljust(length[, "fill character"])
Returns: a copy of the left-aligned string
>>> string = "welcome to my application".capitalize()
>>> string.ljust(50, "=")
Welcome to my application=========================

ALIGN TEXT TO THE RIGHT


Method: rjust(length[, "fill character"])
Returns: a copy of the right-aligned string
>>> string = "welcome to my application".capitalize()
>>> string.rjust(50, "=")
=========================Welcome to my application

>>> string.rjust(50, " ")
                         Welcome to my application

FILL IN A TEXT BY PREFIXING IT WITH ZEROS


Method: zfill(length)
Returns: a copy of the string padded with leading zeros until the specified final length is reached
>>> invoice_number = 1575
>>> str(invoice_number).zfill(12)
000000001575
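
The alignment methods are often combined to lay out fixed-width text. The following sketch builds a small receipt; the width of 30 columns, the item names, the prices and the invoice number are all invented for illustration.

```python
WIDTH = 30  # assumed line width for this example

items = [("coffee", "3.50"), ("sandwich", "7.25")]  # hypothetical data

# Title line: upper-cased and centered between "=" characters.
lines = [" receipt ".upper().center(WIDTH, "=")]
for name, price in items:
    # Item name padded on the right, price padded on the left.
    lines.append(name.capitalize().ljust(20, ".") + price.rjust(10, "."))
# Invoice number padded with leading zeros.
lines.append("Invoice " + str(42).zfill(8))

print("\n".join(lines))
```

Every line except the last is exactly WIDTH characters long, because ljust(20) and rjust(10) together always add up to 30.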

RESEARCH METHODS

COUNT NUMBER OF OCCURRENCES OF A SUBSTRING


Method: count("substring"[, start_position, end_position])
Returns: an integer representing the number of occurrences of the substring within the string
>>> string = "welcome to my application".capitalize()
>>> string.count("a")
2

SEARCH FOR A SUBSTRING WITHIN A STRING


Method: find("substring"[, start_position, end_position])
Returns: an integer representing the position where the substring starts within the string. If not found, returns -1
>>> string = "welcome to my application".capitalize()
>>> string.find("my")
11
>>> string.find("my", 0, 10)
-1
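
Because find() returns -1 rather than raising an error, its result is usually checked before it is used as a position. The sketch below, with an invented sentence, also shows how the optional start position lets a search continue past a previous match.

```python
sentence = "the cat and the hat"  # example text

first = sentence.find("the")              # first match
second = sentence.find("the", first + 1)  # next match after the first
missing = sentence.find("dog")            # not present

print(first, second, missing)  # 0 12 -1
print(sentence.count("the"))   # 2

# Check for -1 before using the result as a position.
if missing == -1:
    print("'dog' does not appear in the sentence")
```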


VALIDATION METHODS

TO KNOW IF A STRING BEGINS WITH A GIVEN SUBSTRING


Method: startswith("substring"[, start_position, end_position])
Returns: True or False
>>> string = "welcome to my application".capitalize()
>>> string.startswith("Welcome")
True
>>> string.startswith("application")
False
>>> string.startswith("application", 14)
True

TO KNOW IF A STRING ENDS WITH A GIVEN SUBSTRING


Method: endswith("substring"[, start_position, end_position])
Returns: True or False
>>> string = "welcome to my application".capitalize()
>>> string.endswith("application")
True
>>> string.endswith("Welcome")
False
>>> string.endswith("Welcome", 0, 7)
True
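
A common use of these two methods is filtering names by prefix or extension. In this sketch the file names are hypothetical; note that both methods also accept a tuple of alternatives.

```python
filenames = ["report.csv", "notes.txt", "data.CSV", "tmp_cache.csv"]  # hypothetical

# Keep CSV files regardless of case, skipping temporary ones.
csv_files = [
    name for name in filenames
    if name.lower().endswith(".csv") and not name.startswith("tmp_")
]
print(csv_files)  # ['report.csv', 'data.CSV']

# A tuple of suffixes matches any of them.
text_like = [name for name in filenames if name.endswith((".txt", ".csv"))]
print(text_like)  # ['report.csv', 'notes.txt', 'tmp_cache.csv']
```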

TO KNOW IF A STRING IS ALPHANUMERIC


Method: isalnum()
Returns: True or False
>>> string = "pepegrillo 75"
>>> string.isalnum()
False
>>> string = "pepegrillo"
>>> string.isalnum()
True
>>> string = "pepegrillo75"
>>> string.isalnum()
True

TO KNOW IF A STRING IS ALPHABETIC


Method: isalpha()
Returns: True or False
>>> string = "pepegrillo 75"
>>> string.isalpha()
False
>>> string = "pepegrillo"
>>> string.isalpha()
True
>>> string = "pepegrillo75"
>>> string.isalpha()
False

TO KNOW IF A STRING IS NUMERIC


Method: isdigit()
Returns: True or False
>>> string = "pepegrillo 75"
>>> string.isdigit()
False
>>> string = "7584"
>>> string.isdigit()
True
>>> string = "75 84"
>>> string.isdigit()
False
>>> string = "75.84"
>>> string.isdigit()
False
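
isdigit() is handy for checking text before converting it to an integer. The helper below is a hypothetical sketch (to_int_or_none is not a standard function). It accepts only unsigned whole numbers, so signs and decimal points are rejected.

```python
def to_int_or_none(text):
    """Return int(text) if text holds only digits, otherwise None."""
    cleaned = text.strip()  # tolerate surrounding blanks
    # Caveat: a few Unicode characters such as "²" also pass isdigit()
    # but cannot be converted by int(); plain ASCII input is assumed here.
    if cleaned.isdigit():
        return int(cleaned)
    return None

print(to_int_or_none("7584"))    # 7584
print(to_int_or_none(" 75 "))    # 75
print(to_int_or_none("75.84"))   # None
print(to_int_or_none("-12"))     # None
```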

TO KNOW IF A STRING CONTAINS ONLY LOWERCASE LETTERS


Method: islower()
Returns: True or False
>>> string = "pepe grillo"
>>> string.islower()
True
>>> string = "Jiminy Cricket"
>>> string.islower()
False
>>> string = "Pepegrillo"
>>> string.islower()
False
>>> string = "pepegrillo75"
>>> string.islower()
True

TO KNOW IF A STRING CONTAINS ONLY UPPERCASE LETTERS


Method: isupper()
Returns: True or False
>>> string = "PEPE GRILLO"
>>> string.isupper()
True
>>> string = "Jiminy Cricket"
>>> string.isupper()
False
>>> string = "Pepegrillo"
>>> string.isupper()
False
>>> string = "PEPEGRILLO"
>>> string.isupper()
True

TO KNOW IF A STRING CONTAINS ONLY BLANKS


Method: isspace()
Returns: True or False
>>> string = "pepe grillo"
>>> string.isspace()
False
>>> string = " "
>>> string.isspace()
True

TO KNOW IF A STRING HAS A TITLE FORMAT


Method: istitle()
Returns: True or False
>>> string = "Jiminy Cricket"
>>> string.istitle()
True
>>> string = "Jiminy cricket"
>>> string.istitle()
False

SUBSTITUTION METHODS

FORMATTING A STRING, DYNAMICALLY SUBSTITUTING TEXT


Method: format(*args, **kwargs)
Returns: the formatted string
>>> string = "welcome to my application {0}"
>>> string.format("in Python")
welcome to my application in Python

>>> string = "Gross amount: ${0} + VAT: ${1} = Net amount: {2}"
>>> string.format(100, 21, 121)
Gross amount: $100 + VAT: $21 = Net amount: 121

>>> string = "Gross amount: ${gross} + VAT: ${vat} = Net amount: {net}"
>>> string.format(gross=100, vat=21, net=121)
Gross amount: $100 + VAT: $21 = Net amount: 121

>>> string.format(gross=100, vat=100 * 21 / 100, net=100 * 21 / 100 + 100)
Gross amount: $100 + VAT: $21.0 = Net amount: 121.0
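
Named placeholders pair naturally with dictionaries, since a dictionary can be unpacked into keyword arguments. A sketch with invented product data (the template text and field names are assumptions of this example):

```python
template = "{name}: {units} units at ${price} each"

products = [  # hypothetical records
    {"name": "Widget", "units": 3, "price": 5},
    {"name": "Gadget", "units": 2, "price": 12},
]

# ** unpacks each dictionary into name=..., units=..., price=...
rendered = [template.format(**product) for product in products]

for line in rendered:
    print(line)
# Widget: 3 units at $5 each
# Gadget: 2 units at $12 each
```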


REPLACE TEXT IN A STRING


Method: replace("substring to search for", "substring to replace with")
Returns: a copy of the string with the substring replaced
>>> search = "first name last name"
>>> replace_by = "John Smith"
>>> "Dear Mr. first name last name:".replace(search, replace_by)
Dear Mr. John Smith:

REMOVE CHARACTERS TO THE LEFT AND RIGHT OF A STRING


Method: strip(["character"])
Returns: a copy of the string with the given characters (blanks by default) removed from both ends
>>> string = "   www.eugeniabahit.com   "
>>> string.strip()
www.eugeniabahit.com
>>> string.strip(' ')
www.eugeniabahit.com

REMOVE CHARACTERS TO THE LEFT OF A STRING


Method: lstrip(["character"])
Returns: a copy of the string with the given characters (blanks by default) removed from the left end
>>> string = "www.eugeniabahit.com"
>>> string.lstrip("w.")
eugeniabahit.com

>>> string = "   www.eugeniabahit.com"
>>> string.lstrip()
www.eugeniabahit.com

REMOVE CHARACTERS TO THE RIGHT OF A STRING


Method: rstrip(["character"])
Returns: a copy of the string with the given characters (blanks by default) removed from the right end
>>> string = "www.eugeniabahit.com   "
>>> string.rstrip()
www.eugeniabahit.com
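
These methods are typically applied when cleaning raw text before analysis. The values below are made up; note that the argument is treated as a set of characters to remove, not as a prefix or suffix.

```python
raw_values = ["  42 ", "\t17\n", "$1,200.00", "***note***"]  # invented samples

# Without arguments, strip() removes blanks, tabs and newlines.
cleaned = [value.strip() for value in raw_values]
print(cleaned)  # ['42', '17', '$1,200.00', '***note***']

print("$1,200.00".lstrip("$"))   # 1,200.00
print("***note***".strip("*"))   # note

# Every character in the argument is stripped, in any order.
print("www.example.com".lstrip("w."))  # example.com
```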

JOINING AND SPLITTING METHODS

ITERATIVELY JOIN A CHAIN


Method: join(iterable)
Returns: a string made of the elements of the iterable joined together, with the original string inserted between consecutive elements as a separator.
>>> invoice_number_format = ("No. 0000-0", "-0000 (ID: ", ")")
>>> number = "275"
>>> invoice_number = number.join(invoice_number_format)
>>> invoice_number
No. 0000-0275-0000 (ID: 275)

SPLITTING A STRING INTO THREE PARTS, USING A SEPARATOR


Method: partition("separator")
Returns: a tuple of three elements where the first is the contents of the string before the separator, the second is
the separator itself and the third is the contents of the string after the separator.
>>> parts = "http://www.eugeniabahit.com".partition("www.")
>>> parts
('http://', 'www.', 'eugeniabahit.com')

>>> protocol, separator, domain = parts
>>> print("Protocol: {0}\nDomain: {1}".format(protocol, domain))
Protocol: http://
Domain: eugeniabahit.com

SPLITTING A STRING INTO SEVERAL PARTS, USING A SEPARATOR


Method: split("separator")
Returns: a list of all elements found by dividing the string by a separator
>>> keywords = "python, guide, course, tutorial".split(", ")
>>> keywords
['python', 'guide', 'course', 'tutorial']

SPLIT A STRING INTO LINES


Method: splitlines()
Returns: a list where each element is one line of the string.
>>> text = """Line 1
Line 2
Line 3
Line 4"""
>>> text.splitlines()
['Line 1', 'Line 2', 'Line 3', 'Line 4']

>>> text = "Line 1\nLine 2\nLine 3"
>>> text.splitlines()
['Line 1', 'Line 2', 'Line 3']
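
Combined, splitlines(), split() and strip() are enough to parse simple delimited text. The sketch below uses an invented three-line block; real CSV data with quoted fields needs a proper parser.

```python
text = """name, age
Ana, 34
Luis, 28"""

rows = []
for line in text.splitlines():
    # Split each line on commas and trim blanks around every field.
    rows.append([field.strip() for field in line.split(",")])

header, records = rows[0], rows[1:]
print(header)   # ['name', 'age']
print(records)  # [['Ana', '34'], ['Luis', '28']]

# partition() splits only at the first occurrence of the separator.
key, _, value = "path=/usr/bin:/bin".partition("=")
print(key, value)  # path /usr/bin:/bin
```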

MANIPULATION OF LISTS AND TUPLES
This chapter covers the methods of the list object. Some of them (count and index) are also available for
tuples.

AGGREGATION METHODS

ADD AN ITEM TO THE END OF THE LIST


Method: append("new element")
>>> male_names = ["Alvaro", "David", "Edgardo", "Jacinto", "Jose", "Ricky"]
>>> male_names.append("Jose")
>>> male_names
['Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose']

ADD SEVERAL ITEMS TO THE END OF THE LIST


Method: extend(other_list)
>>> male_names.extend(["Jose", "Gerardo"])
>>> male_names
['Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose', 'Gerardo']

ADD AN ELEMENT IN A GIVEN POSITION


Method: insert(position, "new element")
>>> male_names.insert(0, "Ricky")
>>> male_names
['Ricky', 'Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose', 'Gerardo']

ELIMINATION METHODS

DELETE THE LAST ITEM IN THE LIST


Method: pop()
Returns: the deleted element
>>> male_names.pop()
Gerardo
>>> male_names
['Ricky', 'Alvaro', 'David', 'Edgardo', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose']

DELETE AN ELEMENT BY ITS INDEX


Method: pop(index)
Returns: the deleted element
>>> male_names.pop(3)
Edgardo

>>> male_names
['Ricky', 'Alvaro', 'David', 'Jacinto', 'Jose', 'Ricky', 'Jose', 'Jose']

DELETE AN ITEM BY ITS VALUE


Method: remove("value")
>>> male_names.remove("Jose")
>>> male_names
['Ricky', 'Alvaro', 'David', 'Jacinto', 'Ricky', 'Jose', 'Jose']

ORDER METHODS

SORT A LIST IN REVERSE (REVERSE ORDER)


Method: reverse()
>>> male_names.reverse()
>>> male_names
['Jose', 'Jose', 'Ricky', 'Jacinto', 'David', 'Alvaro', 'Ricky']

SORT A LIST IN ASCENDING ORDER


Method: sort()
>>> male_names.sort()
>>> male_names
['Alvaro', 'David', 'Jacinto', 'Jose', 'Jose', 'Ricky', 'Ricky']

SORT A LIST IN DESCENDING ORDER


Method: sort(reverse=True)
>>> male_names.sort(reverse=True)
>>> male_names
['Ricky', 'Ricky', 'Jose', 'Jose', 'Jacinto', 'David', 'Alvaro']
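
One point worth keeping in mind is that reverse() and sort() modify the list in place and return None, so their result should not be assigned. A short sketch with example values follows; the built-in sorted() function, which is not covered in this chapter, is shown as the alternative that returns a new list.

```python
scores = [7, 3, 9, 1]  # example values

result = scores.sort()   # sorts in place
print(result)            # None
print(scores)            # [1, 3, 7, 9]

# sorted() leaves the original untouched and returns a new list.
names = ["Ricky", "Alvaro", "Jose"]
ordered = sorted(names, reverse=True)
print(names)    # ['Ricky', 'Alvaro', 'Jose']
print(ordered)  # ['Ricky', 'Jose', 'Alvaro']
```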

RESEARCH METHODS

COUNT NUMBER OF OCCURRENCES ELEMENTS


Method: count(element)
>>> male_names = ["Alvaro", "Miguel", "Edgardo", "David", "Miguel"]
>>> male_names.count("Miguel")
2

>>> male_names = ("Alvaro", "Miguel", "Edgardo", "David", "Miguel")
>>> male_names.count("Miguel")
2

GET INDEX NUMBER


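Method: index(element[, start_index, end_index])
Returns: the index of the first occurrence of the element. Raises ValueError if the element is not found.
>>> male_names.index("Miguel")
1

>>> male_names.index("Miguel", 2, 5)
4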

ANNEX ON LISTS AND TUPLES

TYPE CONVERSION
Python's built-in functions include two that convert between lists and tuples: list(), which converts a tuple to a list, and tuple(), which converts a list to a tuple.

One of the most frequent uses is converting a tuple that needs to be modified into a list. This is often the case with results obtained from a database query.

>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)

>>> list(my_tuple)
[1, 2, 3, 4]

>>> my_list = [1, 2, 3, 4]
>>> my_list
[1, 2, 3, 4]

>>> tuple(my_list)
(1, 2, 3, 4)
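
The database case mentioned above can be sketched as follows. The row is a hard-coded tuple standing in for a query result (no real database is used), and the field order is an assumption of the example.

```python
# Stand-in for a row returned by a database query: (id, name, email)
row = (15, "ana gomez", "ANA@EXAMPLE.COM")

# Tuples are immutable, so convert to a list before editing.
fields = list(row)
fields[1] = fields[1].title()
fields[2] = fields[2].lower()

# Convert back if an immutable record is wanted.
clean_row = tuple(fields)
print(clean_row)  # (15, 'Ana Gomez', 'ana@example.com')
```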


CONCATENATION OF COLLECTIONS
You can concatenate (or join) two or more lists, or two or more tuples, with the addition operator +. You cannot join a list to a tuple: the collections to be joined must be of the same type.

>>> list1 = [1, 2, 3, 4]
>>> list2 = [3, 4, 5, 6, 7, 8]
>>> list3 = list1 + list2
>>> list3
[1, 2, 3, 4, 3, 4, 5, 6, 7, 8]

VARIABLE MANIPULATION METHODS 5


STRING MANIPULATION 5
FORMATTING METHODS 5
CAPITALIZE THE FIRST LETTER 5
CONVERT A STRING TO LOWERCASE 5
CONVERT A STRING TO UPPERCASE 6
CONVERT UPPERCASE TO LOWERCASE AND VICE VERSA 6
CONVERT A STRING TO TITLE FORMAT 6
CENTER A TEXT 6
ALIGN TEXT TO THE LEFT 6
ALIGN TEXT TO THE RIGHT 7
FILL IN A TEXT BY PREFIXING IT WITH ZEROS 7
RESEARCH METHODS 7
COUNT NUMBER OF OCCURRENCES OF A SUBSTRING 7
SEARCH FOR A SUBSTRING WITHIN A STRING 7
VALIDATION METHODS 8
TO KNOW IF A STRING BEGINS WITH A GIVEN SUBSTRING 8
TO KNOW IF A STRING ENDS WITH A GIVEN SUBSTRING 8
TO KNOW IF A STRING IS ALPHANUMERIC 8
TO KNOW IF A STRING IS ALPHABETIC 8
TO KNOW IF A STRING IS NUMERIC 9
TO KNOW IF A STRING CONTAINS ONLY LOWERCASE LETTERS 9
TO KNOW IF A STRING CONTAINS ONLY UPPERCASE LETTERS 9
TO KNOW IF A STRING CONTAINS ONLY BLANKS 10
TO KNOW IF A STRING HAS A TITLE FORMAT 10
SUBSTITUTION METHODS 10
FORMATTING A STRING, DYNAMICALLY SUBSTITUTING TEXT 10
REPLACE TEXT IN A STRING 11
REMOVE CHARACTERS TO THE LEFT AND RIGHT OF A STRING 11

- 17 -

(C) 2011 - 2018 Eugenia Bahit. Creative Commons Attribution


Introduction to Data Science with Python - Escuela de Informática Eugenia Bahit

REMOVE CHARACTERS TO THE LEFT OF A STRING 11


REMOVE CHARACTERS TO THE RIGHT OF A STRING 11
JOINING AND SPLITTING METHODS 11
ITERATIVELY JOIN A CHAIN 11
SPLITTING A STRING INTO THREE PARTS, USING A SEPARATOR 12
SPLITTING A STRING INTO SEVERAL PARTS, USING A SEPARATOR 12
SPLIT A STRING INTO LINES 12
MANIPULATION OF LISTS AND TUPLES 14
AGGREGATION METHODS 14
ADD AN ITEM TO THE END OF THE LIST 14
ADD SEVERAL ITEMS TO THE END OF THE LIST 14
ADD AN ELEMENT IN A GIVEN POSITION 14
ELIMINATION METHODS 14
DELETE THE LAST ITEM IN THE LIST 14
DELETE AN ELEMENT BY ITS INDEX 15
DELETE AN ITEM BY ITS VALUE 15
ORDER METHODS 15
SORT A LIST IN REVERSE (REVERSE ORDER) 15
SORT A LIST IN ASCENDING ORDER 15
SORT A LIST IN DESCENDING ORDER 15
RESEARCH METHODS 15
COUNT NUMBER OF OCCURRENCES ELEMENTS 15
GET INDEX NUMBER 16
ANNEX ON LISTS AND TUPLES 16
TYPE CONVERSION 16
CONCATENATION OF COLLECTIONS 17
MAXIMUM AND MINIMUM VALUE 20
COUNT ITEMS 20
DICTIONARY MANIPULATION 22
ELIMINATION METHODS 22
EMPTY A DICTIONARY 22
AGGREGATION AND CREATION METHODS 22
COPY A DICTIONARY 22
CREATE A NEW DICTIONARY FROM THE KEYS OF AN EXISTING DICTIONARY 23
SEQUENCE 23
CONCATENATE DICTIONARIES 23
SET A DEFAULT KEY AND VALUE 23


RETURN METHODS 24
GET THE VALUE OF A KEY 24
TO KNOW IF A KEY EXISTS IN THE DICTIONARY 24
OBTAIN THE KEYS AND VALUES OF A DICTIONARY 24
OBTAIN THE KEYS TO A DICTIONARY 24
GET THE VALUES OF A DICTIONARY 25
OBTAIN THE NUMBER OF ITEMS IN A DICTIONARY 25
FILE HANDLING AND MANIPULATION 27
WAYS TO OPEN A FILE 27
SOME METHODS OF THE FILE OBJECT 29
CSV FILE HANDLING 30
SOME EXAMPLES OF CSV FILES 30
WORKING WITH CSV FILES FROM PYTHON 32
READING CSV FILES 32
WRITING CSV FILES 37
PROBABILITY AND STATISTICS WITH PYTHON 40
PROBABILITY OF MUTUALLY EXCLUSIVE SIMPLE AND COMPOUND EVENTS IN PYTHON 40
SAMPLE SPACE 40
SIMPLE AND COMPOUND EVENTS 40
PROBABILITY ASSIGNMENT 41
SIMPLE MUTUALLY EXCLUSIVE EVENTS 41
EVENTS COMPOSED OF MUTUALLY EXCLUSIVE SIMPLE EVENTS 42
FUNCTIONS 43
CONDITIONAL PROBABILITY IN PYTHON 43
FUNCTIONS 44
DEPENDENT EVENTS 44
SET THEORY IN PYTHON 46
INDEPENDENT EVENTS 46
BAYES THEOREM IN PYTHON 47
BAYES' THEOREM AND PROBABILITY OF CAUSES 47
DATA: CASE STUDY 47
ANALYSIS 48
PROCEDURE 49
FUNCTIONS 54
COMPLEMENTARY BIBLIOGRAPHY 54
ANNEX I: COMPLEX CALCULATIONS 60
POPULATION AND SAMPLING STATISTICS: CALCULATION OF VARIANCE AND STANDARD DEVIATION 60


SCALAR PRODUCT OF TWO VECTORS 61
RELATIVE, ABSOLUTE AND CUMULATIVE FREQUENCY CALCULATIONS 61
ANNEX II: CREATION OF A MENU OF OPTIONS 63
>>> tuple4 = tuple1 + tuple2 + tuple3
>>> tuple4
(1, 2, 3, 4, 5, 4, 6, 8, 10, 3, 5, 7, 9)

MAXIMUM AND MINIMUM VALUE


The maximum and minimum values of both lists and tuples can be obtained with the built-in functions max() and min():

>>> max(tuple4)
10
>>> max(tuple1)
5
>>> min(tuple1)
1
>>> max(list3)
8
>>> min(list1)
1

COUNT ITEMS
The len() function is used to count elements in a list or tuple, as well as characters in a text string:

>>> len(list3)
10
>>> len(list1)
4

DICTIONARY MANIPULATION
ELIMINATION METHODS

EMPTY A DICTIONARY
Method: clear()
>>> dictionary = {"color": "violet", "size": "XS", "price": 174.25}
>>> dictionary
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> dictionary.clear()
>>> dictionary
{}

AGGREGATION AND CREATION METHODS

COPY A DICTIONARY
Method: copy()
>>> dictionary = {"color": "violet", "size": "XS", "price": 174.25}
>>> tshirt = dictionary.copy()
>>> dictionary
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> tshirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> dictionary.clear()
>>> dictionary
{}

>>> tshirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> tank_top = tshirt
>>> tshirt
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> tank_top
{'color': 'violet', 'size': 'XS', 'price': 174.25}

>>> tshirt.clear()
>>> tshirt
{}

>>> tank_top
{}
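The session above contrasts copy() with plain assignment. The following is a minimal sketch of the same difference; the variable names are illustrative, and the nested "sizes" list is an assumption added here to show that copy() is shallow (nested objects remain shared between the original and the copy).

```python
# Illustrative sketch: copy() creates a new dictionary, assignment creates an alias.
original = {"color": "violet", "sizes": ["XS", "S"]}

independent = original.copy()   # new dictionary object (shallow copy)
alias = original                # second name for the same object

original["color"] = "green"

print(independent["color"])     # unaffected by the change
print(alias["color"])           # follows the change

# copy() is shallow: the nested list is shared by both dictionaries.
original["sizes"].append("M")
print(independent["sizes"])
```

When nested values must also be independent, copy.deepcopy() from the standard library is the usual alternative.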


CREATE A NEW DICTIONARY FROM THE KEYS OF A SEQUENCE
Method: dict.fromkeys(sequence[, default_value])
>>> sequence = ["color", "size", "brand"]
>>> dictionary1 = dict.fromkeys(sequence)
>>> dictionary1
{'color': None, 'size': None, 'brand': None}

>>> dictionary2 = dict.fromkeys(sequence, 'default value')
>>> dictionary2
{'color': 'default value', 'size': 'default value', 'brand': 'default value'}

CONCATENATE DICTIONARIES
Method: update(dictionary)
>>> dictionary1 = {"color": "green", "price": 45}
>>> dictionary2 = {"size": "M", "brand": "Lacoste"}
>>> dictionary1.update(dictionary2)
>>> dictionary1
{'color': 'green', 'price': 45, 'size': 'M', 'brand': 'Lacoste'}
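In the example above the two dictionaries share no keys. The sketch below, with illustrative names and values that are not part of the original material, shows what happens when they do: update() modifies the dictionary in place, overwrites the values of shared keys, and returns None.

```python
# Illustrative sketch: update() merges in place and overwrites shared keys.
stock = {"color": "green", "price": 45}
changes = {"price": 50, "size": "M"}

result = stock.update(changes)  # modifies `stock`; the return value is None

print(stock)
print(result)
```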

SET A DEFAULT KEY AND VALUE


Method: setdefault(key[, default_value])

If the key does not exist, it is created with the default value (None when no default is given). The method always returns the value stored for the key passed as a parameter.

>>> tshirt = {"color": "pink", "brand": "Zara"}
>>> key = tshirt.setdefault("size", "U")
>>> key
'U'

>>> tshirt
{'color': 'pink', 'brand': 'Zara', 'size': 'U'}

>>> tshirt2 = tshirt.copy()
>>> tshirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U'}

>>> key = tshirt2.setdefault("print")
>>> key
>>> tshirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U', 'print': None}

>>> key = tshirt2.setdefault("brand", "Lacoste")
>>> key
'Zara'

>>> tshirt2
{'color': 'pink', 'brand': 'Zara', 'size': 'U', 'print': None}

RETURN METHODS

GET THE VALUE OF A KEY


Method: get(key[, default_value_if_key_does_not_exist])
>>> tshirt.get("color")
'pink'

>>> tshirt.get("stock")
>>> tshirt.get("stock", "out of stock")
'out of stock'

TO KNOW IF A KEY EXISTS IN THE DICTIONARY


Operator: key in dictionary
>>> exists = 'price' in tshirt
>>> exists
False

>>> exists = 'color' in tshirt
>>> exists
True

OBTAIN THE KEYS AND VALUES OF A DICTIONARY


Method: items()
dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}

for key, value in dictionary.items():
    print(key, value)

Output:
color pink
brand Zara
size U

OBTAIN THE KEYS TO A DICTIONARY


Method: keys()
dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}

for key in dictionary.keys():
    print(key)

Output:
color
brand
size

Get keys in a list

>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> keys = list(dictionary.keys())
>>> keys
['color', 'brand', 'size']

GET THE VALUES OF A DICTIONARY


Method: values()
dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}

for value in dictionary.values():
    print(value)

Output:
pink
Zara
U

Get values in a list

>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> values = list(dictionary.values())
>>> values
['pink', 'Zara', 'U']

OBTAIN THE NUMBER OF ITEMS IN A DICTIONARY


To count the elements of a dictionary, as with lists and tuples, the built-in function len() is used.
>>> dictionary = {'color': 'pink', 'brand': 'Zara', 'size': 'U'}
>>> len(dictionary)
3

FILE HANDLING AND MANIPULATION
Python allows you to work with the file and directory system at two different levels.

The first is the os module, which makes it possible to work with the entire file and directory system at the level of the operating system itself.

The second level allows working with individual files, reading and writing them from the application or script itself and treating each file as an object.

WAYS TO OPEN A FILE


The way a file is opened depends on the final objective, that is, on the answer to the question "what is this file being opened for?" The answers can be several: to read, to write, or to read and write.

Each time a file is "opened", a pointer is created in memory. This pointer positions a cursor (or access point) on a specific byte of the file contents. The cursor moves within the file as the file is read or written.

When a file is opened in read mode, the cursor is positioned at byte 0 of the file (that is, at the beginning of the file). Once the whole file has been read, the cursor is at the end of the file (an offset equal to the total number of bytes in the file). The same happens when the file is opened in write mode: the cursor advances as data is written.


When you want to write at the end of a non-empty file, the append mode is used. In this mode, the file is opened with the cursor at the end of the file.

The + symbol as a mode suffix adds the opposite mode to the opening mode. For example, the r (read) mode with the suffix + (r+) opens the file for both reading and writing, with the cursor at byte 0. The cursor is not returned to the beginning automatically after a read; the seek(0) method of the file object does that explicitly.
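The cursor movement described above can be observed with the tell() and seek() methods of the file object. This is a minimal sketch, assuming a scratch file created in a temporary directory; the file name and its contents are illustrative only.

```python
import os
import tempfile

# Illustrative sketch: create a scratch file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as file:
    file.write("hello")

with open(path, "r+") as file:
    start = file.tell()       # cursor position right after opening
    text = file.read()        # reading moves the cursor to the end
    end = file.tell()
    file.write(" world")      # writing continues from the current cursor
    file.seek(0)              # move the cursor back to byte 0 explicitly
    full_text = file.read()

print(start, end, full_text)
```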

The following table shows the different ways of opening a file:


Indicator   Opening mode                                     Pointer location
r           Read only                                        At the beginning of the file
rb          Read only, in binary mode                        At the beginning of the file
r+          Read and write                                   At the beginning of the file
rb+         Read and write, in binary mode                   At the beginning of the file
w           Write only. Overwrites the file if it exists.    At the beginning of the file
            Creates the file if it does not exist.
wb          Write only, in binary mode. Overwrites the       At the beginning of the file
            file if it exists. Creates the file if it
            does not exist.
w+          Write and read. Overwrites the file if it        At the beginning of the file
            exists. Creates the file if it does not exist.
wb+         Write and read, in binary mode. Overwrites       At the beginning of the file
            the file if it exists. Creates the file if it
            does not exist.
a           Append (add content). Creates the file if it     If the file exists, at the end of the file.
            does not exist.                                  If it does not exist, at the beginning.
ab          Append (add content), in binary mode.            If the file exists, at the end of the file.
            Creates the file if it does not exist.           If it does not exist, at the beginning.
a+          Append (add content) and read. Creates the       If the file exists, at the end of the file.
            file if it does not exist.                       If it does not exist, at the beginning.
ab+         Append (add content) and read, in binary         If the file exists, at the end of the file.
            mode. Creates the file if it does not exist.     If it does not exist, at the beginning.

SOME METHODS OF THE FILE OBJECT


The file object has, among others, the following methods:

Method                 Description
read([bytes])          Reads the entire contents of a file. If a length is passed,
                       it reads only the contents up to the specified length.
readlines()            Reads all lines of a file and returns them as a list.
write(string)          Writes string to the file.
writelines(sequence)   Writes every element of sequence (any iterable of strings)
                       to the file. No line separators are added, so each element
                       must end with its own newline to occupy a separate line.
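A short sketch of the methods listed above, assuming a scratch file in a temporary directory (the path and the sample lines are illustrative and not part of the original material). It shows that writelines() writes the elements exactly as given and that read() with a length stops after that many characters.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # scratch file (assumption)

lines = ["first line\n", "second line\n", "third line\n"]

with open(path, "w") as file:
    file.writelines(lines)        # elements are written exactly as given

with open(path, "r") as file:
    first_five = file.read(5)     # at most 5 characters
    remaining = file.readlines()  # the rest of the file, as a list of lines

print(first_five)
print(remaining)
```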

ACCESSING FILES THROUGH THE WITH STRUCTURE

With the with structure and the open() function, you can open a file in any mode and work with it without having to close it or destroy the pointer explicitly, since the with structure takes care of this.

Read a file:

with open("file.txt", "r") as file:
    content = file.read()

Write to a file:

content = """
This will be the content of the new file.
The file will have several lines.
"""

with open("file.txt", "w") as file:
    file.write(content)

CSV FILE HANDLING


The CSV format derives its name from "comma-separated values", as described in RFC 4180. CSV files are plain text files intended for storing large amounts of data. It is one of the simplest formats for data analysis. In fact, many non-free (or free but more complex) file formats are often converted to CSV so that the data can be analysed with a variety of languages.

A CSV file usually starts with a header row that defines the column names, followed by rows holding the data for each column, separated by commas. However, many other symbols can be used as cell separators; the tab and the semicolon are just as frequent as the comma.

SOME EXAMPLES OF CSV FILES


Weather data (separated by ;)

ID;DATA;VV;DV;T;HR;PPT;RS;P
0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;

Scores obtained by players in a tournament (separated by ,)

name,number,year
Maria,858,1930
Jose,665,1930
Rosa,591,1930
Juan Carlos,522,1930
Antonio,509,1930
Maria Esther,495,1930
Maria Luisa,470,1930
Joan,453,1930
John,436,1930

Companies registered with the General Inspectorate of Justice of Argentina (separated by , and with data in quotation marks)

"correlative_number","company_type","company_type_description","company_reason_of","company_name","deregistration_code","deregistration_detail"
"10","10","PARTNERSHIP","A A VALLE Y COMPAÑIA","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"11","10","PARTNERSHIP","A LUCERO Y H CARATOLI","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"12","10","PARTNERSHIP","A PUIG E HIJOS","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"13","10","PARTNERSHIP","A C I C A","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"14","10","PARTNERSHIP","AÑON BEATRIZ S Y CIA","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"15","10","PARTNERSHIP","ABA DIESEL","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"16","10","PARTNERSHIP","ABADAL JOSE Y JORGE JOSE ABADAL","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"17","10","PARTNERSHIP","ABADAL JOSE E HIJO","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"
"18","10","PARTNERSHIP","ABATE Y MACIAS","S","42014","BELONGS TO REGISTRY OF INACTIVE ENTITIES"

It is also possible to find data stored in text files (TXT) with formats very similar to what you would expect to find in a CSV. In such cases, a formatting script can sometimes be developed to convert these files so that they can be handled as CSV.

Meteorological observations in TXT

DATE      TMAX   TMIN  NAME
--------------------------------------------------------------------
07122017  28.0   19.0  AEROPARQUE AERO
07122017  26.8   12.4  AZUL AERO
07122017  29.6    7.8  BAHIA BLANCA AERO
07122017  22.7    6.7  BARILOCHE AERO
07122017   3.0   -8.5  BASE BELGRANO II
07122017   2.4   -0.2  BASE CARLINI (EX JUBANY)
07122017   3.9   -0.6  BASE ESPERANZA
07122017   0.7   -3.6  BASE MARAMBIO
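A formatting script of the kind mentioned above could look like the following sketch. It assumes that the columns are separated by runs of whitespace and that the station name is always the last column (so it may contain spaces); the in-memory sample and the choice of ; as the output delimiter are illustrative.

```python
import csv
import io

# Sample lines in the layout shown above (whitespace-separated columns;
# the station name is the last column and may contain spaces).
raw_text = """DATE      TMAX   TMIN  NAME
--------------------------------------------------------------------
07122017  28.0   19.0  AEROPARQUE AERO
07122017  29.6    7.8  BAHIA BLANCA AERO
"""

rows = []
for line in raw_text.splitlines():
    line = line.strip()
    if not line or set(line) == {"-"}:
        continue                      # skip blank lines and the dashed rule
    rows.append(line.split(None, 3))  # at most 4 fields; the name keeps its spaces

output = io.StringIO()                # stands in for a real output file
csv.writer(output, delimiter=";").writerows(rows)
print(output.getvalue())
```

Splitting with a maximum of three cuts is what keeps multi-word names such as BAHIA BLANCA AERO in a single cell.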

WORKING WITH CSV FILES FROM PYTHON


Python provides its own module called csv, which facilitates the parsing of data from CSV files, both for reading and writing.

This module is used in combination with the with structure and the open() function: open() reads or generates the file, and the csv module takes care of the parsing.

READING CSV FILES

Contents of file.csv

0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;
8;2016-03-01 04:00:00;;;8.7;39;;;

from csv import reader

with open("file.csv", "r") as file:
    document = reader(file, delimiter=';', quotechar='"')
    for row in document:
        print(' '.join(row))

Output:

0 2016-03-01 00:00:00   9.9 73
1 2016-03-01 00:30:00   9.0 67
2 2016-03-01 01:00:00   8.3 64
3 2016-03-01 01:30:00   8.0 61
4 2016-03-01 02:00:00   7.4 62
5 2016-03-01 02:30:00   8.3 47
6 2016-03-01 03:00:00   7.7 50
7 2016-03-01 03:30:00   9.0 39
8 2016-03-01 04:00:00   8.7 39


When the CSV file has a header, it is necessary to skip the header:

Contents of file.csv

ID;DATA;VV;DV;T;HR;PPT;RS;P
0;2016-03-01 00:00:00;;;9.9;73;;;
1;2016-03-01 00:30:00;;;9.0;67;;;
2;2016-03-01 01:00:00;;;8.3;64;;;
3;2016-03-01 01:30:00;;;8.0;61;;;
4;2016-03-01 02:00:00;;;7.4;62;;;
5;2016-03-01 02:30:00;;;8.3;47;;;
6;2016-03-01 03:00:00;;;7.7;50;;;
7;2016-03-01 03:30:00;;;9.0;39;;;
8;2016-03-01 04:00:00;;;8.7;39;;;

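from csv import reader

with open("file.csv", "r") as file:
    document = reader(file, delimiter=';', quotechar='"')
    headers = next(document)  # consume the header row
    for row in document:
        print(' '.join(row))

Output:

0 2016-03-01 00:00:00   9.9 73
1 2016-03-01 00:30:00   9.0 67
2 2016-03-01 01:00:00   8.3 64
3 2016-03-01 01:30:00   8.0 61
4 2016-03-01 02:00:00   7.4 62
5 2016-03-01 02:30:00   8.3 47
6 2016-03-01 03:00:00   7.7 50
7 2016-03-01 03:30:00   9.0 39
8 2016-03-01 04:00:00   8.7 39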


Another way to read CSV files with headers is to use the DictReader object instead of reader, and thus access only the values of the desired columns by name:

from csv import DictReader

with open("file.csv", "r") as file:
    document = DictReader(file, delimiter=';', quotechar='"')
    for row in document:
        print(row['DATA'])

Output:

2016-03-01 00:00:00
2016-03-01 00:30:00
2016-03-01 01:00:00
2016-03-01 01:30:00
2016-03-01 02:00:00
2016-03-01 02:30:00
2016-03-01 03:00:00
2016-03-01 03:30:00
2016-03-01 04:00:00

WRITING CSV FILES

Writing a CSV without header:

from csv import writer

# newline="" avoids blank lines between rows on some platforms
with open("data.csv", "w", newline="") as file:
    document = writer(file, delimiter=';', quotechar='"')
    document.writerows(matrix)

In the example above, matrix could be a list of lists, all with the same number of elements. For example:

matrix = [
    ['Juan', 373, 1970],
    ['Ana', 124, 1983],
    ['Pedro', 901, 1650],
    ['Rosa', 300, 2000],
    ['Juana', 75, 1975],
]

This would generate a file named data.csv with the following content:

eugenia@bella:~$ cat data.csv
Juan;373;1970
Ana;124;1983
Pedro;901;1650
Rosa;300;2000
Juana;75;1975


Writing a CSV with header:

In this case, the matrix to be written must be a list of dictionaries whose keys match the indicated headers.

from csv import DictWriter

matrix = [
    dict(player='Juan', points=373, year=1970),
    dict(player='Ana', points=124, year=1983),
    dict(player='Pedro', points=901, year=1650),
    dict(player='Rosa', points=300, year=2000),
    dict(player='Juana', points=75, year=1975),
]

headers = ['player', 'points', 'year']

# newline="" avoids blank lines between rows on some platforms
with open("data.csv", "w", newline="") as file:
    document = DictWriter(file, delimiter=';', quotechar='"', fieldnames=headers)
    document.writeheader()
    document.writerows(matrix)
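To check what DictWriter produces without touching the disk, the data can be written to an in-memory buffer and read back with DictReader. This is a sketch under that assumption: io.StringIO stands in for data.csv, and only two of the rows are used. It also shows that every value read back from a CSV is a string and must be converted before doing arithmetic.

```python
import io
from csv import DictReader, DictWriter

headers = ["player", "points", "year"]
matrix = [
    dict(player="Juan", points=373, year=1970),
    dict(player="Ana", points=124, year=1983),
]

buffer = io.StringIO()  # in-memory stand-in for data.csv (assumption)
document = DictWriter(buffer, delimiter=";", fieldnames=headers)
document.writeheader()
document.writerows(matrix)

buffer.seek(0)          # rewind before reading the same data back
rows = list(DictReader(buffer, delimiter=";"))

print(rows[0]["player"], rows[0]["points"])
total_points = sum(int(row["points"]) for row in rows)  # values come back as str
print(total_points)
```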

Simple statistical functions

Simple statistical functions such as the following can be applied to lists and tuples, whether or not they were obtained from a CSV:

Count elements            len(collection)
Add elements              sum(collection)
Get the largest number    max(collection)
Get the smallest number   min(collection)
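As an illustration, the four functions can be applied to a column extracted from a CSV. The sketch below assumes a reduced, in-memory version of the weather file (only the ID, DATA, T and HR columns) and adds the arithmetic mean, which follows directly from sum() and len().

```python
import io
from csv import DictReader

# In-memory sample in the semicolon-separated layout used earlier (assumption).
sample = io.StringIO(
    "ID;DATA;T;HR\n"
    "0;2016-03-01 00:00:00;9.9;73\n"
    "1;2016-03-01 00:30:00;9.0;67\n"
    "2;2016-03-01 01:00:00;8.3;64\n"
)

# Values are read as strings, so they are converted to int here.
humidity = [int(row["HR"]) for row in DictReader(sample, delimiter=";")]

count = len(humidity)
total = sum(humidity)
highest = max(humidity)
lowest = min(humidity)
mean = total / count   # arithmetic mean built from sum() and len()

print(count, total, highest, lowest, mean)
```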

PROBABILITY AND STATISTICS WITH PYTHON
PROBABILITY OF MUTUALLY EXCLUSIVE SIMPLE AND COMPOUND EVENTS IN PYTHON
SAMPLE SPACE
A sample space is the set of possible outcomes, such as those that could result from rolling a die:

$E = \{1, 2, 3, 4, 5, 6\}$

sample_space = [1, 2, 3, 4, 5, 6]

Each element in a sample space is referred to as a sample point. The number of sample points is denoted by $n$, such that for the sample space $E = \{1, 2, 3, 4, 5, 6\}$, $n = 6$.

n = len(sample_space)

SIMPLE AND COMPOUND EVENTS


An event is a set of outcomes within a sample space. For example:

• the rolling of a die is an event

• the number 5 coming up in this roll is a simple event $A = \{5\}$ and is exclusive: if 5 comes up, no other number can come up simultaneously.

• an odd number coming up is the compound event $B = \{1, 3, 5\}$, which depends in turn on the exclusive simple events $B_1 = \{1\}$, $B_2 = \{3\}$ and $B_3 = \{5\}$


PROBABILITY ASSIGNMENT
Probability assignment provides the mathematical models used to calculate the chances of specific events occurring or not occurring.

The probability of an event is denoted by $P(\text{event})$.

The events can be:

• simple or compound
• mutually exclusive or independent

SIMPLE MUTUALLY EXCLUSIVE EVENTS


If we consider a sample space $A$, each of its $k$ sample points is denoted by $A_k$, and the probability of each one, designated $P(A_k)$, is determined by:

$P(A_k) = \frac{1}{n}$

probability = 1.0 / n

In Python 2, at least one operand of the division must be a real number (float) for the result to be a real number; in Python 3, the / operator always performs true division, so 1 / n gives the same result.

The probability of each sample point, as mutually exclusive events, is the same for each event.

$P(6) = P(5) = P(4) = P(3) = P(2) = P(1)$


EVENTS COMPOSED OF MUTUALLY EXCLUSIVE SIMPLE EVENTS


When the simple events that make up the compound event $A$ are mutually exclusive, the probability of the compound event is given by the sum of the probabilities of each simple event $P(A_k)$, such that:

$P(A) = P(A_1) + P(A_2) + \dots + P(A_k)$

For example, to estimate the probability that a single roll of a die will produce an even number, we obtain the event $A = \{2, 4, 6\}$, whose probability is given by the sum of the probabilities of each of the simple events $P(2) + P(4) + P(6)$ of the sample space $E = \{1, 2, 3, 4, 5, 6\}$, such that:

$P(A) = P(2) + P(4) + P(6)$
$P(A) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$
$P(A) = \frac{1}{2}$

In the first result, $\frac{3}{6}$ (in the second step, before finding the greatest common divisor [GCD] and reducing the fraction to $\frac{1}{2}$), the numerator is equivalent to the number of simple events within the compound event "even numbers" and is denoted by $h$. The denominator, 6, is $n$, the total of all events in the sample space. Thus, the probability of an event $A$ composed of mutually exclusive events is given by the quotient of $h$ and $n$, such that:

$P(A) = \frac{h}{n}$

pair_numbers = [i for i in sample_space if i % 2 == 0]  # even numbers
h = len(pair_numbers)
probability = float(h) / n
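The reduction from 3/6 to 1/2 described above can be reproduced exactly with the Fraction type from the standard library. This is only a sketch; the use of Fraction and the name even_numbers are choices made for this example and are not part of the original material, which works with floats.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
even_numbers = [i for i in sample_space if i % 2 == 0]

h = len(even_numbers)   # simple events inside the compound event
n = len(sample_space)   # sample points in the sample space

probability = Fraction(h, n)  # Fraction reduces 3/6 to lowest terms
print(probability)
print(float(probability))
```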


A compound event can be denoted by the union of its simple events (symbol $\cup$, read as "or"), such that:

$P(A_1 \cup A_2 \cup \dots \cup A_k) = P(A_1) + P(A_2) + \dots + P(A_k)$

For example, for the case of the event "even numbers", it is obtained that:

$P(2 \cup 4 \cup 6) = P(2) + P(4) + P(6)$
$P(2 \cup 4 \cup 6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6}$
$P(2 \cup 4 \cup 6) = \frac{1}{2}$

Here $2 \cup 4 \cup 6$ is an event, and $P(2)$, $P(4)$ and $P(6)$ are the probabilities of the three events that compose it. In a new context, $2 \cup 4 \cup 6$ can be treated as an event $A$.

FUNCTIONS
# Probability of mutually exclusive simple events
pssme = lambda e: 1.0 / len(e)

# Probability of mutually exclusive compound events
def pscme(e, sc):
    n = len(e)
    return len(sc) / float(n)
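A possible usage of the two functions with the die example follows. The definitions are repeated so that the sketch runs on its own; the variable names p_simple and p_compound are illustrative.

```python
# Definitions repeated from the text so that the example runs on its own.
pssme = lambda e: 1.0 / len(e)

def pscme(e, sc):
    n = len(e)
    return len(sc) / float(n)

sample_space = [1, 2, 3, 4, 5, 6]
even_numbers = [i for i in sample_space if i % 2 == 0]

p_simple = pssme(sample_space)                   # probability of one sample point
p_compound = pscme(sample_space, even_numbers)   # probability of "even number"
print(p_simple, p_compound)
```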

CONDITIONAL PROBABILITY IN PYTHON


For two independent events, such as $A = \{1, 3, 5\}$ (odd number) in one roll and $B = \{2, 4, 6\}$ (even number) in another roll:

c. Probability of B:

$B = \{2, 4, 6\}$
$P(B) = \frac{h}{n} = \frac{3}{6} = \frac{1}{2}$

d. Probability of intersection:

$P(A \cap B) = P(A)P(B)$
$P(A \cap B) = \frac{1}{2} \cdot \frac{1}{2}$
$P(A \cap B) = \frac{1}{4}$

e = sample_space = [1, 2, 3, 4, 5, 6]
n = len(e)  # total of the sample

# probability of A
a = [i for i in e if i % 2 != 0]
pa = len(a) / float(n)

# probability of B
b = [i for i in e if i % 2 == 0]
pb = len(b) / float(n)

# probability of the intersection of events
pi = pa * pb

FUNCTIONS
# Conditional probability: dependent events
def pscd(e, a, b):
    i = list(set(a).intersection(b))
    pi = pscme(e, i)
    pa = pscme(e, a)
    return pi / pa

# Probability of the intersection: independent events
def psci(e, a, b):
    pa = pscme(e, a)
    pb = pscme(e, b)
    return pa * pb

DEPENDENT EVENTS
This refers to the probability of two events occurring simultaneously when the second event depends on the occurrence of the first.

The probability of B occurring if A occurs is denoted by $P(B|A)$ and is read as "the probability of B given A", such that:

$P(B|A) = \frac{P(A \cap B)}{P(A)}$

where $P(A \cap B)$ is the probability of the intersection of the events A and B, defined as $P(A \cap B) = P(A)P(B|A)$, such that the intersection is a new event composed of simple events. In the following example, the intersection is $\{1, 3\}$ (because 1 and 3 are in both A and B).

Example: what is the probability that a roll of a die gives a number less than 4, given that the number is odd?


The roll of the die is an event in itself. We wish to find the probability of $B = \{1, 2, 3\}$ (number less than 4) given that $A = \{1, 3, 5\}$ (odd number) occurred in the sample space $E = \{1, 2, 3, 4, 5, 6\}$.

sample_space = [1, 2, 3, 4, 5, 6]

a = [i for i in sample_space if i % 2 != 0]
b = [i for i in sample_space if i < 4]

To calculate the probability of an intersection, first the intersection is obtained:

$A \cap B = \{1, 3\}$

intersec = [i for i in a if i in b]

And then the probability of the new compound event is calculated:

$P(A \cap B) = P(1) + P(3) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$

or, in other words:

$P(A \cap B) = \frac{h}{n} = \frac{2}{6} = \frac{1}{3}$

It is also necessary to obtain the probability of $A$, taking into account that it is also a compound event:

$P(A) = \frac{h}{n} = \frac{3}{6} = \frac{1}{2}$

Finally, it is obtained that:

$P(B|A) = \frac{P(A \cap B)}{P(A)}$
$P(B|A) = \frac{1/3}{1/2}$
$P(B|A) = \frac{2}{3} = 0.\overline{6}$

e = sample_space = [1, 2, 3, 4, 5, 6]

a = [i for i in e if i % 2 != 0]  # odd numbers
b = [i for i in e if i < 4]  # numbers less than 4
intersec = [i for i in a if i in b]  # intersection of A and B

n = len(e)  # total sample
ha = len(a)  # total number of simple events in A
hintersec = len(intersec)  # total number of simple events in the intersection

# probability of the intersection
probability_intersec = float(hintersec) / n

# probability of A
probability_a = float(ha) / n

# conditional probability
probability_b_given_a = probability_intersec / probability_a

SET THEORY IN PYTHON


When obtaining the intersection of two compound events, a manual method has been used, which says: return 'i' for each 'i' in list 'a' if it is in list 'b'.

However, since each compound event is a set and Python provides a data type called set, it is possible to obtain the intersection by manipulating compound events as Python sets. With set you can convert any iterable to a set and perform set operations such as union and intersection when necessary.

intersec = list(set(a).intersection(b))

Here the set obtained is converted into a list in order to be consistent with the rest of the code and to ensure that the result supports the usual operations and processing of a list. When in doubt as to whether to use lists or sets, the principle of simplicity should be applied and the simplest solution implemented.
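The set operations mentioned above can be sketched as follows, using the two events of the previous example. The operator forms & and | are equivalent to the intersection() and union() methods; sorted() is used here (an illustrative choice) because sets do not guarantee any order.

```python
a = [1, 3, 5]   # odd numbers
b = [1, 2, 3]   # numbers less than 4

intersection = set(a) & set(b)   # same result as set(a).intersection(b)
union = set(a) | set(b)          # same result as set(a).union(b)

# Sets are unordered, so sort when a predictable list is needed.
intersec = sorted(intersection)
print(intersec, sorted(union))
```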

INDEPENDENT EVENTS

Unlike the previous case, here the probability of occurrence of B is not affected by the occurrence of A. For example, the probability of rolling a die and obtaining an even number (event B) is not affected by the fact that an odd number was obtained in a previous throw (event A). The probability of B is independent of A, and the probability that both events occur is given by the product of their probabilities:

$P(A \cap B) = P(A) \, P(B)$

Here the intersection represents the confluence of both events.

To obtain it, the probability of each independent event is calculated and the two are multiplied:

a. Sample space (for both events):

$E = \{1, 2, 3, 4, 5, 6\}$

b. Probability of A:

$A = \{1, 3, 5\}$
$P(A) = \frac{h}{n} = \frac{3}{6} = \frac{1}{2}$
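The remaining steps can be sketched in Python. The block below is an illustration, not the original worked solution: it assumes event B is the set of even faces, as described in the die example, and uses fractions.Fraction (a choice made here for exact results) to multiply both probabilities. As a cross-check, the same value is obtained by counting the favourable pairs among all possible pairs of rolls.

```python
from fractions import Fraction

# Sample space of one roll of a six-sided die
e = [1, 2, 3, 4, 5, 6]

a = [i for i in e if i % 2 == 1]   # event A: odd number on the first roll
b = [i for i in e if i % 2 == 0]   # event B: even number on the second roll

n = len(e)
probability_a = Fraction(len(a), n)
probability_b = Fraction(len(b), n)

# For independent events, the joint probability is the product
probability_a_and_b = probability_a * probability_b

# Cross-check: count favourable outcomes among all ordered pairs of rolls
pairs = [(x, y) for x in e for y in e]
favourable = [(x, y) for x, y in pairs if x in a and y in b]
by_enumeration = Fraction(len(favourable), len(pairs))

print(probability_a, probability_b, probability_a_and_b)   # 1/2 1/2 1/4
print(by_enumeration == probability_a_and_b)               # True
```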

BAYES THEOREM IN PYTHON

BAYES' THEOREM AND PROBABILITY OF CAUSES


Given a series of events $A_k$ which together make up a sample space E, and any event B, Bayes' Theorem gives the probability that each event $A_k$ of E is the cause of B. For this reason, it is also known as the probability of causes.
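For reference, the theorem described above is usually written as shown below. This is the standard textbook statement, given here only as an aid; it assumes that the events $A_k$ are mutually exclusive and that together they cover E. The index i in the denominator runs over all the events.

```latex
P(A_k \mid B) = \frac{P(A_k)\, P(B \mid A_k)}{\sum_{i=1}^{n} P(A_i)\, P(B \mid A_i)}
```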

DATA: CASE STUDY


Given a city of 50,000 inhabitants, with the following distribution:


Girls     Boys     Women     Men
11000     9000     16000     14000

And a report of 9,000 cases of influenza, distributed as follows:

Girls     Boys     Women     Men
2000      1500     3000      2500

The aim is to obtain the probability that the cause of contracting influenza is the fact of belonging to

a certain demographic sector (for example, the demographic sector made up of boys or girls).

ANALYSIS
From what has been stated above, it follows that:

• The city (absolute total of inhabitants) is the sample space E.

• The number of girls, boys, women and men corresponds to each of the events $A_k$ of the sample space E.

• The value of n is taken as the sum of the sample space, $\sum A_k$, such that $n = 50000$.

• The value of h for the events $A_k$ is each of the values given in the population distribution table.

• Having the flu is event B.

• The distribution table of influenza cases corresponds to the intersections of event B with each event $A_k$, i.e. each $A_k \cap B$.

Depending on the probability calculation applied, the following can be obtained:


• The probability of being a girl, boy, woman or man in the city, given by $P(A_k)$. It is considered an a priori probability.

• The probability of having influenza when being a girl, boy, woman or man, which is obtained with $P(B \mid A_k)$ and is considered a conditional probability.

• The probability that any inhabitant, regardless of the sector to which he or she belongs, will have the flu, which is obtained with

$P(B) = \sum_{k=1}^{n} P(A_k) \, P(B \mid A_k)$

and is considered a total probability.

• The probability that someone with influenza is a girl, boy, woman or man, which is obtained with Bayes' Theorem. This probability is considered an a posteriori probability, and it answers questions such as: what is the probability that a new case of influenza will be in a boy?

An efficient and orderly way to obtain an a posteriori probability with Bayes' Theorem is to first obtain the three preceding probabilities: a priori, conditional and total.

NOTICE:
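In the following, map(float, <list>) will be used in the source code to convert the elements of a list into real numbers, as long as doing so does not overload the code. In Python 3, map() returns an iterator that can only be traversed once, so the call should be wrapped in list() whenever the result is used more than once.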
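The behaviour of map() depends on the Python version. The following sketch assumes Python 3, where map() is lazy, and uses the population figures of the case study to show why the converted values need to be stored in a list before they are summed and then traversed again.

```python
# Population figures from the case study
raw = [11000, 9000, 16000, 14000]

# In Python 3, map() is lazy: the first full traversal exhausts it
lazy = map(float, raw)
first_pass = sum(lazy)       # consumes every element
second_pass = list(lazy)     # nothing is left to iterate over

# Wrapping the call in list() stores the values so they can be reused
inhabitants = list(map(float, raw))
n = sum(inhabitants)
pa = [h / n for h in inhabitants]

print(first_pass)    # 50000.0
print(second_pass)   # []
print(pa)            # [0.22, 0.18, 0.32, 0.28]
```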

PROCEDURE
1. A priori probability calculation

Returns: probability that an inhabitant belongs to a specific demographic sector.


Formula:

$P(A_k) = \frac{h_k}{n}$

Data required:

$h_k$ = data from the population distribution table

$n$ = always the total amount of the sample space (50 000)

Results:

$P(A_1) = \frac{11000}{50000} = 0.22$ (probability of being a girl)
$P(A_2) = \frac{9000}{50000} = 0.18$ (probability of being a boy)
$P(A_3) = \frac{16000}{50000} = 0.32$ (probability of being a woman)
$P(A_4) = \frac{14000}{50000} = 0.28$ (probability of being a man)

Python code:

inhabitants = list(map(float, [11000, 9000, 16000, 14000]))
n = sum(inhabitants)
pa = [h / n for h in inhabitants]

2. Conditional probability

Returns: probability of having the flu while belonging to a specific demographic sector.

Certainty: $A_k$ (demographic sector)

Objective: B (the probability of having the flu)


Formula:

$P(B \mid A_k) = \frac{P(A_k \cap B)}{P(A_k)}$

Data required:

$P(A_k \cap B) = \frac{h}{n}$

$h$ = intersections (data from the table of distribution of influenza cases)

Results:

$P(B \mid A_1) = \frac{2000 / 50000}{0.22} = 0.18$ (probability of having the flu as a girl)
$P(B \mid A_2) = \frac{1500 / 50000}{0.18} = 0.16$ (probability of having the flu as a boy)
$P(B \mid A_3) = \frac{3000 / 50000}{0.32} = 0.19$ (probability of having the flu as a woman)
$P(B \mid A_4) = \frac{2500 / 50000}{0.28} = 0.18$ (probability of having the flu as a man)

Python code:

affected = list(map(float, [2000, 1500, 3000, 2500]))
pi = [k / n for k in affected]
pba = [pi[i] / pa[i] for i in range(len(pi))]
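As a check on the conditional probabilities, note that n cancels out in the quotient, so each value should match the number of cases in a sector divided by the inhabitants of that sector. The sketch below is self-contained and uses zip() instead of indexes (a stylistic choice made here, not the form used in the course code).

```python
import math

inhabitants = [11000.0, 9000.0, 16000.0, 14000.0]   # girls, boys, women, men
affected = [2000.0, 1500.0, 3000.0, 2500.0]         # influenza cases per sector

n = sum(inhabitants)
pa = [h / n for h in inhabitants]    # a priori probabilities
pi = [k / n for k in affected]       # probabilities of the intersections

# Conditional probability as defined above: intersection / a priori
pba = [p_i / p_a for p_i, p_a in zip(pi, pa)]

# Since n cancels out, the same values come from cases / inhabitants
direct = [k / h for k, h in zip(affected, inhabitants)]

matches = all(math.isclose(x, y) for x, y in zip(pba, direct))
print(matches)   # True
print(pba)
```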

3. Total probability

Returns: probability that any of the inhabitants, regardless of the demographic sector to which they

belong, may have influenza.


Formula:

$P(B) = \sum_{k=1}^{n} P(A_k) \, P(B \mid A_k)$

Data required:

a priori probability
conditional probability

Results:

$P(B) = P(A_1) P(B \mid A_1) + P(A_2) P(B \mid A_2) + P(A_3) P(B \mid A_3) + P(A_4) P(B \mid A_4)$

$P(B) = 0.22 \cdot 0.18 + 0.18 \cdot 0.16 + 0.32 \cdot 0.19 + 0.28 \cdot 0.18$

$P(B) = 0.04 + 0.03 + 0.06 + 0.05$

$P(B) = 0.18$

Python code:

products = [pa[i] * pba[i] for i in range(len(pa))]
pb = sum(products)

Remarks:

(a) note that in the above output there will be a difference of .01 with respect to the manual

solution. This is due to the rounding performed in the manual solution. This difference can be

eradicated by using 3 decimal places in the conditional probability values (instead of two) in the

manual solution.


(b) the probability of NOT having the flu is given by $1 - P(B)$, such that $1 - 0.18 = 0.82$, but it will not be necessary to use it in this example of Bayes' Theorem.

4. A posteriori probability

Returns: probability of belonging to a specific demographic sector, given that one has the flu.

Certainty: B (having the flu)

Objective: $A_k$ (the probability of belonging to a specific demographic sector).

Formula:

$P(A_k \mid B) = \frac{P(A_k) \, P(B \mid A_k)}{\sum_{i=1}^{n} P(A_i) \, P(B \mid A_i)}$

Data required:

$P(A_k) \, P(B \mid A_k)$ = the product obtained in each of the terms of total probability

$\sum_{i=1}^{n} P(A_i) \, P(B \mid A_i)$ = the total probability

Results:

$P(A_1 \mid B) = \frac{0.04}{0.18} = 0.22$ (probability of being a girl, having the flu)
$P(A_2 \mid B) = \frac{0.03}{0.18} = 0.16$ (probability of being a boy, having the flu)
$P(A_3 \mid B) = \frac{0.06}{0.18} = 0.33$ (probability of being a woman, having the flu)
$P(A_4 \mid B) = \frac{0.05}{0.18} = 0.27$ (probability of being a man, having the flu)

Python code:

pab = [p / pb for p in products]

FUNCTIONS
# Bayes' Theorem
def bayes(e, b):
    n = float(sum(e))
    pa = [h / n for h in e]                            # a priori
    pi = [k / n for k in b]                            # intersections
    pba = [pi[i] / pa[i] for i in range(len(pi))]      # conditional
    prods = [pa[i] * pba[i] for i in range(len(pa))]   # terms of the total
    pb = sum(prods)                                    # total
    pab = [p / pb for p in prods]                      # a posteriori
    return pab
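A possible way of using the function with the tables of the case study is sketched below. The function is repeated, in a zip-based variant, so that the block runs on its own; the sector labels are illustrative names added here. Two properties make a handy sanity check: the a posteriori probabilities add up to 1, and each one matches the share of that sector in the 9,000 reported cases.

```python
import math

def bayes(e, b):
    """Return the a posteriori probability of each sector in e, given counts b of event B."""
    n = float(sum(e))
    pa = [h / n for h in e]                      # a priori
    pi = [k / n for k in b]                      # intersections
    pba = [i / a for i, a in zip(pi, pa)]        # conditional
    prods = [a * c for a, c in zip(pa, pba)]     # terms of the total probability
    pb = sum(prods)                              # total
    return [p / pb for p in prods]               # a posteriori

inhabitants = [11000, 9000, 16000, 14000]   # girls, boys, women, men
affected = [2000, 1500, 3000, 2500]         # influenza cases per sector

pab = bayes(inhabitants, affected)
for sector, p in zip(["girls", "boys", "women", "men"], pab):
    print("{}: {:.3f}".format(sector, p))
# girls: 0.222
# boys: 0.167
# women: 0.333
# men: 0.278

print(math.isclose(sum(pab), 1.0))   # True
```

With three decimal places the values differ slightly from the manual results above, which were worked with two.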

COMPLEMENTARY BIBLIOGRAPHY
[0] Probability and Statistics, Murray Spiegel. McGraw-Hill, Mexico 1988. ISBN: 968-451-102-7

ANNEX I: COMPLEX CALCULATIONS
POPULATION AND SAMPLING STATISTICS: CALCULATION OF
VARIANCE AND STANDARD DEVIATION
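from math import sqrt

samples = [12, 23, 24, 24, 22, 10, 17]  # sample list

# Mean
n = len(samples)
mean = sum(samples) / float(n)

# Sum of the squared differences from the mean
differences = [xi - mean for xi in samples]
powers = [x ** 2 for x in differences]
summation = sum(powers)

# Sample variance and population variance
sample_variance = summation / (n - 1)
population_variance = summation / n

# Sample standard deviation and population standard deviation
sample_deviation = sqrt(sample_variance)
population_deviation = sqrt(population_variance)

Mean: $\bar{x} = \frac{\sum x_i}{n}$
Population variance: $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$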
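The manual calculation can be cross-checked against the statistics module of the standard library. This sketch assumes Python 3.4 or later, where that module is available: variance() and stdev() divide by n - 1 (sample), while pvariance() and pstdev() divide by n (population).

```python
import math
import statistics

samples = [12, 23, 24, 24, 22, 10, 17]

n = len(samples)
mean = sum(samples) / n
summation = sum((xi - mean) ** 2 for xi in samples)

sample_variance = summation / (n - 1)
population_variance = summation / n

# The manual results should agree with the library functions
assert math.isclose(sample_variance, statistics.variance(samples))
assert math.isclose(population_variance, statistics.pvariance(samples))
assert math.isclose(math.sqrt(sample_variance), statistics.stdev(samples))
assert math.isclose(math.sqrt(population_variance), statistics.pstdev(samples))

print(round(mean, 2), round(sample_variance, 2), round(population_variance, 2))
# 18.86 34.81 29.84
```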


SCALAR PRODUCT OF TWO VECTORS


vector1 = [3, 0]
vector2 = [4, 3]
pe = sum([x * y for x, y in zip(vector1, vector2)])  # 3 * 4 + 0 * 3
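The same calculation can be wrapped in a function. The sketch below uses a hypothetical helper name, scalar_product, and adds a length check, because zip() silently stops at the shorter of the two vectors and would otherwise hide a mismatch.

```python
def scalar_product(u, v):
    """Return the sum of the pairwise products of two equal-length vectors."""
    if len(u) != len(v):
        # zip() would silently truncate to the shorter vector
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))

vector1 = [3, 0]
vector2 = [4, 3]
print(scalar_product(vector1, vector2))   # 12
```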

RELATIVE, ABSOLUTE AND CUMULATIVE FREQUENCY CALCULATIONS

# ABSOLUTE FREQUENCY
# Number of times a value appears in a sample
samples = [1, 2, 3, 4, 4, 3, 2, 6, 7, 3, 3, 3, 1, 8, 5, 9]
absolutes = []
frequencies = []

for n in samples:
    if n not in absolutes:
        absolutes.append(n)
        fi = samples.count(n)
        frequencies.append(fi)

N = sum(frequencies)  # == len(samples)

# RELATIVE FREQUENCY
# Quotient between the absolute frequency and N
relative = [float(fi) / N for fi in frequencies]
sumarelative = round(sum(relative))  # == 1

# CUMULATIVE FREQUENCY
# Sum of all frequencies less than or equal to the absolute frequency
frequencies.sort()
cumulative = [sum(frequencies[:i+1]) for i, fi in enumerate(frequencies)]

# CUMULATIVE RELATIVE FREQUENCY
# Quotient between the cumulative frequency and the total amount of data
cumulative_relative = [float(f) / N for f in cumulative]
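A more compact variant can be sketched with collections.Counter and itertools.accumulate from the standard library. It is an alternative offered here as an illustration, assuming Python 3, and it differs from the code above in one respect: the frequencies are accumulated following the ascending order of the values, rather than after sorting the frequencies themselves.

```python
from collections import Counter
from itertools import accumulate

samples = [1, 2, 3, 4, 4, 3, 2, 6, 7, 3, 3, 3, 1, 8, 5, 9]

counts = Counter(samples)                    # value -> absolute frequency
values = sorted(counts)                      # distinct values, ascending
frequencies = [counts[v] for v in values]
N = sum(frequencies)                         # == len(samples)

relative = [fi / N for fi in frequencies]
cumulative = list(accumulate(frequencies))   # running totals by value
cumulative_relative = [f / N for f in cumulative]

# One row per value: value, absolute, relative, cumulative, cumulative relative
for row in zip(values, frequencies, relative, cumulative, cumulative_relative):
    print(row)
```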

ANNEX II: CREATION OF A MENU OF OPTIONS
In scripting, it can be useful to give the user a menu of options and have the script act according to

the option chosen by the user. Here is a trick to solve this in a simple and ingenious way.

1) First, the entire script needs to be organized into functions.

2) Secondly, it is necessary that all functions have their corresponding documentation, defining

what exactly the function does:

def read_file():
    """Read CSV file"""
    return "read"

def write_file():
    """Write CSV file"""
    return "write"

def _sum_numbers(numbers):
    """Add the numbers in a list"""
    return "private"

3) Next, a list is defined with the name of all the functions that will be accessible by the user from

the menu:

functions = ['read_file', 'write_file']

The trick is to automate both the generation of the menu and the function call.

To automate menu generation, the trick is to use:

▪ The list in step 3

▪ The locals() function.

▪ The __doc__ attribute

number = 1  # will then be used to access the function
menu = "Choose an option\n"

for function in functions:
    menu += "\t{}. {}\n".format(number, locals()[function].__doc__)
    number = number + 1  # increments the number in each iteration

echo(menu)
option = int(get("Your option: "))
# echo and get: hacks learned in the introductory course

Finally, to dynamically access the function chosen by the user, the trick is to use the option chosen

by the user, as an index to access the function name from the list, and again resort to locals to invoke

the function:

function = functions[option - 1]  # you get the name of the function
locals()[function]()  # the function is invoked by locals()
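The whole trick can be put together in one runnable sketch. Several details here are assumptions rather than part of the course code: the helper names build_menu and run_option are hypothetical, print() stands in for the echo hack, and globals() replaces locals() because, inside a function, locals() would not contain the functions defined at module level.

```python
def read_file():
    """Read CSV file"""
    return "read"

def write_file():
    """Write CSV file"""
    return "write"

functions = ['read_file', 'write_file']

def build_menu(names):
    """Build the menu text from the docstring of each function."""
    lines = ["Choose an option"]
    for number, name in enumerate(names, start=1):
        lines.append("\t{}. {}".format(number, globals()[name].__doc__))
    return "\n".join(lines)

def run_option(names, option):
    """Invoke the function whose menu number is option (starting at 1)."""
    if not 1 <= option <= len(names):
        raise ValueError("unknown option: {}".format(option))
    return globals()[names[option - 1]]()

print(build_menu(functions))
print(run_option(functions, 2))   # write
```

In an interactive script, the option would come from the user, for example with option = int(input("Your option: ")) in Python 3.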

CERTIFICATE

Show how much you have learned!

If you reached the end of the course you can obtain a triple certification:

- Certificate of attendance (issued by Eugenia Bahit School)
- Certificate of Achievement (issued by CLA Linux)
- Approval certification (issued by LAECI)

Check with your teacher or visit the certification website at http://python.eugeniabahit.org.

If you need to prepare for your exam, you can register in the Data Science with Python course at Escuela de Informática Eugenia Bahit: www.eugeniabahit.com
