
Comp 1012: Computer Programming for Scientists and Engineers


Week 3

Copyright © University of Manitoba, 2019


This Week
▪ Reading Files
▪ Loops
▪ Application to CSV files
▪ Lists



Inputs
▪ Input is not limited to the keyboard
▪ We can write a program to read the contents of a file
▪ What kinds of information are on a computer?
▪ Videos
▪ Images
▪ Text
▪ In this lecture, we will only consider reading text files



Text files
▪ A text file is simply a file that contains textual data made up of either
ASCII or Unicode characters.
▪ ASCII and Unicode
▪ If you are creating a text file, you should use the .txt extension on the
filename. For example, ‘temperature.txt’
▪ It’s often the case that a program needs to read data from a file and do
some processing on the data



Opening a file

infile = open(fileName, 'r')

▪ The open() function is used to open a file so that we can read or write it


▪ The file name is a string; it must be given
▪ The second item in the brackets (also a string) is optional
▪ It is called the mode; the default mode is 'rt' (read, text)
▪ We can open a file for reading or writing
▪ open() returns a stream object for reading the file
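
A minimal sketch of the two modes mentioned above. The folder, the file contents, and the variable names are assumptions made up for illustration; the file is created first only so the example can run on its own.

```python
import os
import tempfile

# Build a path inside a fresh temporary folder (hypothetical location).
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, "temperature.txt")

# 'w' opens the file for writing (and creates it if it does not exist).
outfile = open(file_name, "w")
outfile.write("Monday 21.5\n")
outfile.close()

# 'r' opens the file for reading; open(file_name) alone would do the same,
# because reading text is the default mode.
infile = open(file_name, "r")
first_line = infile.readline()
infile.close()

print(first_line)
```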



Reading data

▪ Several ways to read contents (using different functions)

Example 1:

my_document = open("filename.txt")

# read **everything** into a string
entire_document = my_document.read()

# OR read *one line*
one_line = my_document.readline()

Closing the file
▪ When you are finished with a file, you should close the file.
my_document.close()
▪ Terminates your connection to the external file.
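
The following sketch puts read(), readline(), and close() side by side. The file contents are invented for the example, and the file is written first so the code is self-contained.

```python
import os
import tempfile

# Create a two-line sample file (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "filename.txt")
sample = open(path, "w")
sample.write("first line\nsecond line\n")
sample.close()

# read() returns the whole file as one string.
my_document = open(path)
entire_document = my_document.read()
my_document.close()

# readline() returns one line per call, including its newline character.
my_document = open(path)
one_line = my_document.readline()
next_line = my_document.readline()
my_document.close()

# After close(), the stream can no longer be read.
print(my_document.closed)
```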



Activity 1
▪ We have a file named ‘temperature.txt’
▪ We will first read the entire file
▪ And we will print the contents of the file



Encoding of a file
▪ The text we see is stored as numeric values
▪ The computer translates those numeric values to visible characters
▪ Encoding standards assign a numeric value to each text character
▪ Some encodings:
▪ latin-1
▪ iso-8859-1
▪ utf-8



Encoding in open()
▪ The open() function has an optional parameter named encoding
open(filename, encoding='utf-8')
▪ The default value of encoding is platform dependent
▪ If the correct encoding is not provided when reading the file, Python
may raise a UnicodeDecodeError or show the wrong characters
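
As a rough illustration of the last point, the sketch below writes a file in latin-1 and then reads it back twice: once with the matching encoding and once with utf-8. The file name and its contents are assumptions chosen for the example; try/except is used only to catch the error so the program keeps running.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "menu.txt")  # hypothetical file

# Write the text using the latin-1 encoding.
outfile = open(path, "w", encoding="latin-1")
outfile.write("café\n")
outfile.close()

# Reading with the same encoding gives the original text back.
infile = open(path, encoding="latin-1")
text = infile.read()
infile.close()

# Reading with a different encoding can fail: the byte that latin-1 used
# for "é" is not a valid utf-8 sequence here.
infile = open(path, encoding="utf-8")
try:
    infile.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
infile.close()

print(text, decode_failed)
```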



Activity 2: Reading a book

▪ Go to: https://www.gutenberg.org/
▪ Choose any book from there
▪ Download the chosen book in plain text UTF-8 format
▪ Read the book with your code



Activity 3:

Based on Activity 2, answer a few questions about your book:


1. How many characters are in your book?
2. What’s the nth letter of your book?
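
One possible way to approach both questions, sketched on a short stand-in string instead of a real book. In the activity, the string would come from read(); the variable names and the value of n are assumptions.

```python
# Stand-in for the text returned by read() on a downloaded book.
book = "It was the best of times, it was the worst of times."

# len() counts every character, including spaces and punctuation.
number_of_characters = len(book)

# Indexing starts at 0, so the nth character is at index n - 1.
n = 4
nth_character = book[n - 1]

print(number_of_characters)
print(nth_character)
```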



The readline() function

▪ read() gives us the whole file as one string, which is not very useful
when we want to work line by line


▪ readline() lets us read one line at a time
▪ We can call it repeatedly
▪ Every time we call readline(), we get the next line
▪ Python remembers where we are in the file

Let’s calculate something: what’s the average line length?
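
A first attempt at the average line length, using only repeated readline() calls. The three-line file is made up for the example. Note that len() also counts the newline character at the end of each line.

```python
import os
import tempfile

# Create a small three-line file (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
sample = open(path, "w")
sample.write("one\nthree\nfifteen\n")
sample.close()

my_document = open(path)
line1 = my_document.readline()   # Python remembers the position in the file,
line2 = my_document.readline()   # so each call returns the next line
line3 = my_document.readline()
my_document.close()

# One variable and one readline() call per line: fine for 3 lines only.
average_length = (len(line1) + len(line2) + len(line3)) / 3
print(average_length)
```

This works only because we know the file has exactly three lines, which is the problem the next slides address.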



“Start writing readline() like this cool”
Photo source: https://pixabay.com/photos/monkey-laptop-computer-technology-4042658/



We may end up being like this, but readline() writing continues
Photo source: https://pixabay.com/photos/computer-addiction-help-1106900/



We need something better!

▪ One line at a time is too slow

▪ Reading an entire file this way would take forever and need too much code

Let’s learn how to read all lines in a file at once



Loops

▪ A loop is a repetition of a series of statements or instructions


▪ Loops are a very important concept in computer
programming and are available in most programming languages

▪ A for loop uses the next value from a sequence each time it repeats



Loops

▪ New keywords: for, in
▪ my_document is a file
▪ for implicitly reads the next line, as if it called readline() repeatedly
▪ Each time it reads a line, it executes all the statements in the loop body

Example 2:

my_document = open("file.txt")
lengths = 0
for line in my_document:
    # <-- Note: no readline()!
    print(line)
    lengths = lengths + len(line)
print(lengths)



Loops

▪ The indented part is Example 2:


considered as the loop my_document = open("file.txt")
body lengths = 0
▪ The first line after the loop for line in my_document:
that is not indented, # <-- Note: no readline()!
indicates the end of the print(line)
loop lengths = lengths + len(line)
print(lengths)
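
Building on Example 2, the sketch below uses the same kind of loop to compute the average line length for a file with any number of lines. The file contents and the extra counter variable are assumptions added for illustration.

```python
import os
import tempfile

# Create a small sample file (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
sample = open(path, "w")
sample.write("one\nthree\nfifteen\n")
sample.close()

my_document = open(path)
total_length = 0
line_count = 0
for line in my_document:
    # Loop body: runs once per line in the file.
    total_length = total_length + len(line)   # len() includes the newline
    line_count = line_count + 1
# Not indented, so this runs once, after the loop has finished.
my_document.close()

average_length = total_length / line_count
print(average_length)
```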



CSV
▪ A very common format for storing tabular data is the comma-separated
values (CSV) format
▪ For example, in order to work with spreadsheet data from Microsoft
Excel, you could convert it to CSV format first
▪ Data is recorded in a text format
▪ Recorded in rows and columns (a spreadsheet!)
▪ Columns → variables being observed
▪ Rows → actual observations
▪ Values in rows are separated by commas “,”



An Example of a CSV file
ID,Name,Level,HP
7,Squirtle,16,20
8,Wartortle,23,59
19,Rattata,2,18
74,Geodude,24,78
Reading a csv file
▪ Mostly the same as reading files
▪ Open the .csv file
▪ readline() the first line (the header)
▪ Process the rest of the data in a loop
▪ Let’s try it out with our data
▪ Just change the filename of Activity 1
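
The steps above can be sketched as follows, using the example rows shown earlier. The file name pokemon.csv is an assumption, and rstrip() is used only to avoid printing a blank line after each row.

```python
import os
import tempfile

# Write the example CSV to a temporary file (file name is hypothetical).
path = os.path.join(tempfile.mkdtemp(), "pokemon.csv")
sample = open(path, "w")
sample.write("ID,Name,Level,HP\n")
sample.write("7,Squirtle,16,20\n")
sample.write("8,Wartortle,23,59\n")
sample.write("19,Rattata,2,18\n")
sample.write("74,Geodude,24,78\n")
sample.close()

infile = open(path)
header = infile.readline()        # the first line holds the column names
data_lines = 0
for line in infile:               # the loop continues from the second line
    print(line.rstrip("\n"))
    data_lines = data_lines + 1
infile.close()

print(data_lines)
```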



Using the data
▪ We’ve got the lines
▪ We can print them
▪ We want the data between the commas
76,Golem,45,100



Reading multiple values on a line

▪ Data files may have multiple values on a line
▪ usually separated by commas
▪ called comma-separated values (.csv) files
▪ We have to separate the values on each line
▪ Use the split() function

Example 3:

# infile was opened earlier with open()
count = 0
for line in infile:
    count += 1
    week, mx, mean, mn = line.split(',')
    # names such as max and min would hide Python's built-in functions
    max_temp = float(mx)
    min_temp = float(mn)
    mean_temp = float(mean)
infile.close()

Activity 4: Finding average of maximum and minimum temperatures
Given the temperature.csv file,
▪ Read the .csv file
▪ Find the average of maximum and minimum temperatures
▪ Print the average
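
One way the activity could be approached is sketched below. The real temperature.csv is not shown in these slides, so the column order (week, max, mean, min) is assumed from Example 3, and the numbers are invented purely so the code can run.

```python
import os
import tempfile

# Stand-in for temperature.csv; layout and values are assumptions.
path = os.path.join(tempfile.mkdtemp(), "temperature.csv")
sample = open(path, "w")
sample.write("week,max,mean,min\n")
sample.write("1,10.0,5.0,0.0\n")
sample.write("2,14.0,8.0,2.0\n")
sample.write("3,12.0,8.0,4.0\n")
sample.close()

infile = open(path)
header = infile.readline()            # skip the column names
count = 0
total_max = 0.0
total_min = 0.0
for line in infile:
    week, mx, mean, mn = line.split(",")
    total_max = total_max + float(mx)
    total_min = total_min + float(mn)  # float() ignores the trailing newline
    count = count + 1
infile.close()

average_max = total_max / count
average_min = total_min / count
print(average_max, average_min)
```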



Reading multiple values on a line
▪ What will happen if we don’t know the number of columns?
▪ What will happen if there are thousands of columns in the file?
▪ Should we create thousands of variables?



The split() function
▪ A separator is a symbol that separates the values in a record
▪ The split() function takes a separator as a parameter
▪ The separator in a .csv file is ,
▪ Returns a list(?) of the words between the separators

Example 4:

my_document = open("employee-small.csv")
headers = my_document.readline()

for job in my_document:
    #print(job)
    job_details = job.split(",")

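
To see what split() actually returns, the sketch below applies it to a single line. The contents of employee-small.csv are not shown in these slides, so the line used here is a made-up stand-in.

```python
# A made-up line, standing in for one row of a CSV file.
job = "Engineer,Public Works,Full-Time,85000\n"

# split() returns one list, however many columns the line has.
raw_details = job.split(",")

# The last item still carries the newline; strip() removes it first.
job_details = job.strip().split(",")

print(raw_details)
print(job_details)
print(len(job_details))
```

Because the result is a single list, there is no need to create one variable per column.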
Lists
▪ A list is a sequence of values
▪ A variable referring to a list has 0 or more values
▪ We can refer to an entire list by using the name of the list
▪ We can refer to single items by using the name with an index
▪ We can refer to sub-sequences of the items in a list by using the name
with a slice
▪ Lists can hold any kind of data items
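
A short sketch of the points above, using the values from the line 76,Golem,45,100 seen earlier. The variable names are assumptions chosen for the example.

```python
# The values from one CSV line, as split(",") would return them.
pokemon = ["76", "Golem", "45", "100"]

whole_list = pokemon      # the name alone refers to the entire list
name = pokemon[1]         # an index picks one item; counting starts at 0
stats = pokemon[2:4]      # a slice picks a sub-sequence (items 2 and 3)

empty = []                # a list may hold 0 values
mixed = [76, "Golem", 45.0, True]   # items may be of different kinds

print(name)
print(stats)
print(len(empty), len(mixed))
```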
