
Introduction to Statistical Programming in Python

Lab 5 - Solutions

Task 1. (a) Consider the following piece of code:

class Example:
    class_attribute=3
    def __init__(self,val):
        self.instance_attribute=val
        d=4
    def some_method(self):
        self.c=3

print(Example.class_attribute)
print(Example.instance_attribute)

Explain why Example.class_attribute returns 3 but Example.instance_attribute returns an error.

(b) What is the problem in the following example?

exx=Example(3)
print(exx.d)

(c) Why does the following lead to an error?

exx=Example(3)
print(exx.c)

Answer

(a) This is a key difference between class and instance attributes. class_attribute is an attribute of the
class Example and is shared by all instances of this class. It is defined in the class body, outside of __init__.
On the other hand, instance_attribute belongs to an individual instance of the class.
For this reason, it is only assigned at the time an instance of the class is created (which is when
__init__ is evaluated), so it does not exist on the class Example itself. So in the following example:

x=Example(3)
y=Example(5)

x.class_attribute and y.class_attribute will return the same value while
x.instance_attribute and y.instance_attribute will not.

(b) While d is assigned a value of 4 inside of __init__, it is only a local variable of that method. Because we do not
use self.d=4, it is not an attribute of the instance (remember that self represents the current instance of the class).
For this reason, exx.d raises an AttributeError.

(c) The attribute c is not assigned until the method some_method is evaluated. This would be done using
exx.some_method().
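
The three cases above can be reproduced in a short session. The following is a minimal sketch that repeats the Example class from the task and catches the AttributeError in the two failing cases; the loop over attribute names is only there for illustration.

```python
class Example:
    class_attribute = 3  # shared by the class and all of its instances

    def __init__(self, val):
        self.instance_attribute = val  # stored on the individual instance
        d = 4  # local variable only: discarded when __init__ returns

    def some_method(self):
        self.c = 3  # attribute c only exists after this method has run


x = Example(3)
y = Example(5)
print(x.class_attribute == y.class_attribute)      # same class-level value
print(x.instance_attribute, y.instance_attribute)  # one value per instance

exx = Example(3)
for name in ["d", "c"]:
    try:
        getattr(exx, name)
    except AttributeError as err:
        print("AttributeError:", err)

exx.some_method()
print(exx.c)  # works now that some_method has assigned self.c
```

Note that exx.d still fails after calling some_method, because d was never stored on the instance.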

Task 2. (a) In last week’s lab, we considered a class that could be used to represent the linear model $f(x) = wx + b$. This was
done using the following:

class LinearModel:
    def __init__(self,w,b):
        self.w=w
        self.b=b
    def predict(self,x):
        return self.w*x+self.b

The following code attempts to update this class to allow for the computation of the squared error $(y - f(x))^2$.
However, due to bug(s) in the code the error_squared method does not work. Correct the code so that the
squared error in the model prediction can be calculated.

class LinearModel:
    def __init__(self,w,b):
        self.w=w
        self.b=b
    def predict(self,x):
        return self.w*x+self.b
    def error_squared(self,x,y):
        return (self.y-predict(x))**2

If implemented correctly, the output of the following code should be 1.44:
mdl=LinearModel(0.5,0.3)
mdl.error_squared(1.,2.)
(b) Make sure your solution to (a) is correct before attempting this part of the question.
In this question, we are going to consider the use of inheritance in Python classes.
Suppose we now wish to implement a model of the form:

$f(x) = wx^2 + b$

Instead of copying and pasting code from the original LinearModel class, we can use inheritance to inherit
attributes and methods from the parent class. Implement a class LinearModel2, with parent class LinearModel, that
allows us to predict from the updated model $f(x) = wx^2 + b$ and calculate the squared error. If implemented correctly,
the output of the following should be 139.24:
mdl2=LinearModel2(0.5,0.3)
mdl2.error_squared(5.,1.)

Answer

(a) There are two issues. First, predict is a method of the LinearModel class, so inside another method of the class
it has to be called through self, as self.predict(x). Second, y is an argument of error_squared rather than an
attribute of the instance, so it has to be referred to as y, not self.y:

class LinearModel:
    def __init__(self,w,b):
        self.w=w
        self.b=b
    def predict(self,x):
        return self.w*x+self.b
    def error_squared(self,x,y):
        return (y-self.predict(x))**2

(b)

class LinearModel2(LinearModel):
    def __init__(self,w,b):
        super().__init__(w,b)
    def predict(self,x):
        return self.w*x**2+self.b

mdl2=LinearModel2(0.5,0.3)
mdl2.error_squared(5.,1.)
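
It may help to see why only predict needs to be overridden. The sketch below is one way to check this: error_squared is inherited unchanged, but its call to self.predict is looked up on the actual object, so a LinearModel2 instance uses the quadratic prediction. The variable names linear_error and quadratic_error are introduced here for illustration, and round is used only to hide floating-point noise.

```python
class LinearModel:
    def __init__(self, w, b):
        self.w = w
        self.b = b

    def predict(self, x):
        return self.w * x + self.b

    def error_squared(self, x, y):
        # self.predict is resolved at call time on the actual instance
        return (y - self.predict(x)) ** 2


class LinearModel2(LinearModel):
    # __init__ and error_squared are inherited; only predict changes
    def predict(self, x):
        return self.w * x ** 2 + self.b


linear_error = LinearModel(0.5, 0.3).error_squared(1.0, 2.0)
quadratic_error = LinearModel2(0.5, 0.3).error_squared(5.0, 1.0)
print(round(linear_error, 2), round(quadratic_error, 2))
```

This version also shows that an __init__ which only forwards its arguments to super().__init__ can be left out, since the parent's constructor is inherited automatically.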

Task 3. Assume the Store class defined in the previous lab describes a general type of store which only sells books and comics.
Now assume a more specialised type of store, which on top of books and comics also sells magazines. Define the
subclass SpecialisedStore which inherits from class Store and uses super() in __init__ to make
use of the already defined __init__ of Store.

Answer

class SpecialisedStore(Store):
    def __init__(self, books, comics, magazines):
        super().__init__(books, comics)
        self.magazines = magazines

    def buy_magazines(self, quantity):
        self.magazines = max(0, self.magazines - quantity)
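
The Store class from the previous lab is not reproduced in this sheet, so the sketch below uses a hypothetical minimal stand-in that only stores the two stock counts. It is meant to show how super().__init__ hands the shared set-up to the parent, not to reproduce the original Store.

```python
class Store:
    """Hypothetical stand-in for the Store class from the previous lab."""

    def __init__(self, books, comics):
        self.books = books
        self.comics = comics


class SpecialisedStore(Store):
    def __init__(self, books, comics, magazines):
        super().__init__(books, comics)  # reuse the parent's set-up
        self.magazines = magazines

    def buy_magazines(self, quantity):
        # the stock cannot drop below zero
        self.magazines = max(0, self.magazines - quantity)


shop = SpecialisedStore(books=10, comics=5, magazines=3)
shop.buy_magazines(2)
print(shop.books, shop.comics, shop.magazines)
shop.buy_magazines(5)  # more than is in stock
print(shop.magazines)
```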

Task 4. The Item class defined in the previous lab was a very general definition. Assume that we wish to define a more specific
category of items, the tools. Define the Tool class which inherits all the behaviour of an Item. A tool should have a
level property (1, 2 or 3) which should be specified in the __init__ method. It should also implement a method called
fair_trade which should accept as an argument another Tool object and return True if the trade is fair and False
otherwise. A trade is fair only if both items are of the same level and their difference in value is not greater than one.

Answer

class Tool(Item):
    def __init__(self, value, deterioration_percentage, level):
        super().__init__(value, deterioration_percentage)
        self.level = level

    def fair_trade(self, other_tool):
        return (self.level == other_tool.level) and \
               (abs(other_tool.value - self.value) <= 1)
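
As with Store, the Item class is defined in the previous lab and not shown here. The sketch below therefore assumes a hypothetical minimal Item that simply stores its two constructor arguments; the tools hammer, saw and drill are made-up test cases.

```python
class Item:
    """Hypothetical stand-in for the Item class from the previous lab."""

    def __init__(self, value, deterioration_percentage):
        self.value = value
        self.deterioration_percentage = deterioration_percentage


class Tool(Item):
    def __init__(self, value, deterioration_percentage, level):
        super().__init__(value, deterioration_percentage)
        self.level = level

    def fair_trade(self, other_tool):
        same_level = self.level == other_tool.level
        close_in_value = abs(other_tool.value - self.value) <= 1
        return same_level and close_in_value


hammer = Tool(10, 5, level=2)
saw = Tool(10.5, 5, level=2)
drill = Tool(10.5, 5, level=3)
print(hammer.fair_trade(saw))    # same level, values differ by 0.5
print(hammer.fair_trade(drill))  # levels differ
```

Because both conditions are symmetric in the two tools, saw.fair_trade(hammer) gives the same answer as hammer.fair_trade(saw).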

Task 5. Write a class Pizza which takes a list of toppings and a diameter as arguments. The class should implement a
__repr__ or __str__ method as well as a method called is_vegetarian which should return False if one topping
is "ham", "salami", "sausage" or "meat".
Write a subclass PizzaMargherita which has the ingredients set to "tomato" and "cheese".

Answer

We can use the following Python code.


class Pizza:
    def __init__(self, toppings, diameter):
        self.toppings = toppings
        self.diameter = diameter

    def __str__(self):
        if len(self.toppings)==0:
            toppings = "no toppings"
        else:
            toppings = " and ".join(self.toppings[-2:])
            if len(self.toppings)>2:
                toppings = "{}, {}".format(", ".join(self.toppings[:-2]), toppings)
        return "A pizza (diameter {}cm) with {}.".format(self.diameter, toppings)

    def __repr__(self):
        return "Pizza({}, {})".format(self.toppings.__repr__(), self.diameter)

    def is_vegetarian(self):
        return len(set(self.toppings) & set(["ham", "salami", "sausage", "meat"]))==0

class PizzaMargherita(Pizza):
    def __init__(self, diameter):
        super().__init__(["tomato", "cheese"], diameter)

In __repr__ we could have just used self.toppings instead of self.toppings.__repr__(), as Python
would call the list's __repr__ for us to turn the object into a string.
We can now test the classes.
print(PizzaMargherita(24).is_vegetarian())

Given that the PizzaMargherita class differs from the Pizza class only in how it is set up and
given that all the remaining functionality is the same, we could have considered creating a class method
prepare_margherita instead of a subclass.

class Pizza:
    def __init__(self, toppings, diameter):
        self.toppings = toppings
        self.diameter = diameter

    @classmethod
    def prepare_margherita(cls, diameter):
        return cls(["tomato", "cheese"], diameter)

    def __str__(self):
        if len(self.toppings)==0:
            toppings = "no toppings"
        else:
            toppings = " and ".join(self.toppings[-2:])
            if len(self.toppings)>2:
                toppings = "{}, {}".format(", ".join(self.toppings[:-2]), toppings)
        return "A pizza (diameter {}cm) with {}.".format(self.diameter, toppings)

    def __repr__(self):
        return "Pizza({}, {})".format(self.toppings.__repr__(), self.diameter)

    def is_vegetarian(self):
        return len(set(self.toppings) & set(["ham", "salami", "sausage", "meat"]))==0

print(Pizza.prepare_margherita(24).is_vegetarian())

Task 6. Define a class Triangle that takes the three side lengths, a, b and c as arguments. When creating the triangle
check that the side lengths provided are viable, i.e. they are not negative and the longest one is less than the sum of
the two shorter ones. Raise a ValueError if this is not the case.
The class should also provide a method perimeter, which calculates the perimeter $a + b + c$, and a method area,
which calculates the area using Heron’s formula, $\sqrt{p (p - a) (p - b) (p - c)}$, where $p = \frac{a+b+c}{2}$.
Use your class Triangle to efficiently implement a class EquilateralTriangle, which can be set up using
only a single side length.

Answer

You can use the following Python code.


import math

class Triangle:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
        self.check()

    def check(self):
        [a, b, c] = sorted([ self.a, self.b, self.c ])
        if a<0:
            raise ValueError("Negative side lengths not allowed.")
        # The longest side must be strictly less than the sum of the other two
        if a+b<=c:
            raise ValueError("Longest side too long.")

    def perimeter(self):
        return self.a + self.b + self.c

    def area(self):
        # Heron formula
        p = (self.a + self.b + self.c) / 2
        return math.sqrt(p*(p-self.a)*(p-self.b)*(p-self.c))

class EquilateralTriangle(Triangle):
    def __init__(self, a):
        super().__init__(a, a, a)

We can now test the classes.


t = Triangle(3, 4, 5)
print(t.perimeter())
print(t.area())
t = EquilateralTriangle(3)
print(t.perimeter())

Given that an equilateral triangle behaves just like a normal triangle, we might not need to create a separate
class, but just provide a new class method in the class Triangle, which creates an equilateral triangle.
class Triangle:
    ...
    @classmethod
    def create_equilateral_triangle(cls, a):
        return cls(a, a, a)
    ...

We could then use


t = Triangle.create_equilateral_triangle(3)
print(t.area())

Alternatively we could modify the __init__ method for the class Triangle to accept a variable number of
arguments and create equilateral (and isosceles) triangles that way.

class Triangle:
    def __init__(self, *args):
        if len(args)==0 or len(args)>3:
            raise TypeError(
                "You need to provide between one and three side lengths as arguments."
            )
        if len(args)==3: # Arbitrary triangle
            [self.a, self.b, self.c] = args
        if len(args)==2: # Isosceles triangle (two sides have same length)
            [self.a, self.c] = args
            self.b = self.a
        if len(args)==1: # Equilateral triangle
            self.a = args[0]
            self.b = args[0]
            self.c = args[0]
        self.check()

    def check(self):
        [a, b, c] = sorted([ self.a, self.b, self.c ])
        if a<0:
            raise ValueError("Negative side lengths not allowed.")
        # The longest side must be strictly less than the sum of the other two
        if a+b<=c:
            raise ValueError("Longest side too long.")

    def perimeter(self):
        return self.a + self.b + self.c

    def area(self):
        # Heron formula
        p = (self.a + self.b + self.c) / 2
        return math.sqrt(p*(p-self.a)*(p-self.b)*(p-self.c))

We could then use


t = Triangle(3, 4, 5)
print(t.perimeter())
print(t.area())
t = Triangle(3)
print(t.perimeter())
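
The validation step is easiest to understand by trying side lengths that should be rejected. The following is a small sketch using a compact version of Triangle; it assumes the strict reading of the task (the longest side must be less than the sum of the other two, so degenerate triangles are rejected as well), and the example side lengths are made up.

```python
import math


class Triangle:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c
        self.check()

    def check(self):
        shortest, middle, longest = sorted([self.a, self.b, self.c])
        if shortest < 0:
            raise ValueError("Negative side lengths not allowed.")
        if shortest + middle <= longest:
            raise ValueError("Longest side too long.")

    def area(self):
        p = (self.a + self.b + self.c) / 2  # semi-perimeter
        return math.sqrt(p * (p - self.a) * (p - self.b) * (p - self.c))


print(Triangle(3, 4, 5).area())  # right-angled triangle, so the area is 3 * 4 / 2

for sides in [(1, 2, 5), (-1, 2, 2)]:
    try:
        Triangle(*sides)
    except ValueError as err:
        print(sides, "rejected:", err)
```

Sorting the sides first means the check does not depend on the order in which the arguments are given.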

Task 7. Write a class Quotient which represents quotients of the form $\frac{a}{b}$ by storing the numerator and the denominator of
a quotient.
Implement for a quotient $\frac{a}{b}$

• a method __neg__ which returns the negative of the quotient, $\frac{-a}{b}$,

• a method reciprocal which returns the reciprocal of the quotient, $\frac{b}{a}$,

• a method __add__ which takes another quotient ($\frac{c}{d}$) as argument and which returns their sum $\frac{ad+bc}{bd}$,

• a method __sub__ which takes another quotient and subtracts it by exploiting that $\frac{a}{b} - \frac{c}{d} = \frac{a}{b} + \frac{-c}{d}$,

• a method __mul__ which takes another quotient ($\frac{c}{d}$) as argument and which returns their product $\frac{ac}{bd}$,

• a method __truediv__ which takes another quotient and returns their ratio by exploiting that $\frac{a}{b} / \frac{c}{d} = \frac{a}{b} \times \frac{1}{c/d}$.

Also implement methods __str__ and/or __repr__.

You can then calculate say the sum $\frac{1}{2} + \frac{1}{3}$ using
Quotient(1,2).__add__(Quotient(1,3))
The reason why I chose the funny looking method names we have used above is that Python matches operators to
them and lets us also write
Quotient(1,2) + Quotient(1,3)
Optionally, also implement a method simplify which simplifies the quotient by dividing the numerator and the
denominator by their greatest common divisor, which can be found using the function math.gcd.

Answer

We can use the following Python code.


import math

class Quotient:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator
        self.simplify()

    def simplify(self):
        gcd = math.gcd(self.numerator, self.denominator)
        self.numerator //= gcd
        self.denominator //= gcd
        return self

    def __neg__(self):
        return Quotient(-self.numerator, self.denominator)

    def reciprocal(self):
        return Quotient(self.denominator, self.numerator)

    def __add__(self, other):
        return Quotient(self.numerator * other.denominator +
                        other.numerator * self.denominator,
                        self.denominator * other.denominator).simplify()

    def __sub__(self, other):
        # a/b - c/d = a/b + (-c/d); must be called on self, not on the class
        return self.__add__(other.__neg__())

    def __mul__(self, other):
        return Quotient(self.numerator * other.numerator, self.denominator * other.denominator)

    def __truediv__(self, other):
        return self.__mul__(other.reciprocal())

    def __str__(self):
        return "{} / {}".format(self.numerator, self.denominator)

    def __repr__(self):
        return "Quotient({},{})".format(self.numerator, self.denominator)

We can now test the class.

print(Quotient(1,2) + Quotient(1,3))

The class as it currently is does not work if we ask it to compute

Quotient(1,2) + 1

or

1 + Quotient(1,2)

Can you address this?

*Hint*: Check the type of other and look up what the method __radd__ does.
If you have a working implementation, please post it in the forum.
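
One possible way to follow the hint is sketched below. It is not the only solution: the helper _coerce is a hypothetical name introduced here, only int operands are converted, and the class is cut down to addition to keep the example short. Returning NotImplemented for unsupported types lets Python fall back to the other operand or raise a TypeError.

```python
import math


class Quotient:
    def __init__(self, numerator, denominator):
        gcd = math.gcd(numerator, denominator)
        self.numerator = numerator // gcd
        self.denominator = denominator // gcd

    @staticmethod
    def _coerce(other):
        # Hypothetical helper: turn an int n into the quotient n / 1
        if isinstance(other, Quotient):
            return other
        if isinstance(other, int):
            return Quotient(other, 1)
        return None

    def __add__(self, other):
        other = self._coerce(other)
        if other is None:
            return NotImplemented  # let Python try the other operand
        return Quotient(
            self.numerator * other.denominator + other.numerator * self.denominator,
            self.denominator * other.denominator,
        )

    def __radd__(self, other):
        # Called for `other + self` when `other` cannot handle the addition
        return self.__add__(other)

    def __repr__(self):
        return "Quotient({},{})".format(self.numerator, self.denominator)


print(Quotient(1, 2) + Quotient(1, 3))
print(Quotient(1, 2) + 1)
print(1 + Quotient(1, 2))
```

In the last line Python first tries the int's own addition, which does not know about Quotient, and only then calls __radd__ on the right-hand operand. Since addition is commutative, __radd__ can simply delegate to __add__.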

Task 8. In many mathematical and statistical models, parameters are subject to some sort of constraint. Such a constraint can
be thought of in the forward mode, where we transform from unconstrained to constrained space or in the inverse
mode, where we transform from constrained space to unconstrained space. We could represent such a transformation
in the following function:
def transform(x, inv=False):
    if not inv:
        return forward_mapping(x)
    else:
        return inverse_mapping(x)
A common example is a positivity constraint, which could be implemented using the following:
from math import exp,log

class PowerTransform:
    def __init__(self,power):
        self.power=power
    def forward(self,x):
        if self.power==0:
            return exp(x)
        else:
            return x**self.power
    def inverse(self,x):
        if self.power==0:
            return log(x)
        else:
            return x**(1/self.power)

class ExpTransform(PowerTransform):
    def __init__(self):
        super().__init__(0)
(a) For this first question, write a brief description of the above code for the Exp transformation. In particular, discuss
the use of inheritance in the code.
(b) While our ultimate interest for model prediction is the constrained value of the parameter, it is often advised that
we perform parameter estimation in the unconstrained space. Create a class TransformedParameter that could
be used to represent such a parameter in a model. The constructor of this class should look as follows:

class TransformedParameter:
    def __init__(self,value,name,transform):
        self.name=name
        self.transform=transform
        self.unconstrained_value=self.transform.inverse(value)

Add a property value with appropriate get_value and set_value methods so that the following will output the
constrained value of the parameter:

y=TransformedParameter(1.,'my_parameter',ExpTransform())
print(y.value)

which is 1. In addition, the output of the following:

y=TransformedParameter(1.,'my_parameter',ExpTransform())
y.unconstrained_value=2
print(y.value)

should be 7.3891 (this is the value rounded).

Answer

(a) The class PowerTransform provides a general transform of the form $x^p$ where $p$ is specified by
power. If power is equal to 0, then PowerTransform implements an exponential function and its inverse
(the log). ExpTransform then inherits from this class, where super().__init__(0) runs the constructor of
PowerTransform with power=0.
(b)
class TransformedParameter:
    def __init__(self,value,name,transform):
        self.name=name
        self.transform=transform
        self.unconstrained_value=self.transform.inverse(value)
    def get_value(self):
        return self.transform.forward(self.unconstrained_value)
    def set_value(self,val):
        self.unconstrained_value=self.transform.inverse(val)
    value=property(get_value,set_value)
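
The same property can also be written with the @property decorator, which is the more common spelling in current Python code. The sketch below shows that alternative and checks the two outputs stated in the task; the compact ExpTransform used here is a stand-in written for this example rather than the PowerTransform-based class from the task.

```python
from math import exp, log


class ExpTransform:
    """Compact stand-in for the ExpTransform class defined in the task."""

    def forward(self, x):
        return exp(x)

    def inverse(self, x):
        return log(x)


class TransformedParameter:
    def __init__(self, value, name, transform):
        self.name = name
        self.transform = transform
        self.unconstrained_value = self.transform.inverse(value)

    @property
    def value(self):
        # constrained value, computed on demand from the stored one
        return self.transform.forward(self.unconstrained_value)

    @value.setter
    def value(self, val):
        self.unconstrained_value = self.transform.inverse(val)


y = TransformedParameter(1.0, "my_parameter", ExpTransform())
print(y.value)
y.unconstrained_value = 2
print(round(y.value, 4))
y.value = 1.0  # goes through the setter
print(y.unconstrained_value)
```

Only the unconstrained value is stored, so the constrained and unconstrained views can never get out of sync.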

Task 9. Decision trees are popular classification and prediction methods, mostly due to their easy interpretability. Their
predictive performance is often not particularly good. In this task we will not fully implement an algorithm that
constructs a decision tree, we will just implement a class predicting from a decision tree. Consider the tree shown
below.

                     Is y < 0?
                 yes /        \ no
            Is x < 1?          Is x < 2?
         yes /    \ no      yes /    \ no
     Predict 3  Predict 4  Predict 5  Predict 6

For a new observation with variables x = 1.5 and y = −1 we would predict a response value of 4.
Trees lend themselves very well to object-oriented programming.
We can represent each node by an object. When we want to compute a prediction we provide the values of the
covariates for a single observation to the root node first. The root node will have to do nothing other than decide
which of its two children will have to deal with this data point. As the value y = −1 provided is less than the cutoff
of 0, its left child will have to deal with this observation and compute the prediction for it. From a programming point
of view this means we can simply ask the left child to compute the prediction for this observation. This node then
checks whether the value x = 1.5 is less than the cutoff of 1, which it is not, so it delegates computing the prediction to
its own right child, which in turn produces a prediction of 4.
Implement a class TerminalNode that stores a single value as a prediction and provides a method predict that
takes an argument data which it ignores and returns the value stored as a prediction.
Implement a second class InternalNode that stores the name of the variable to be used, the cutoff value, the left
child and the right child, so that the tree from above can be constructed using

branch1 = InternalNode("x", 1, TerminalNode(3), TerminalNode(4))
branch2 = InternalNode("x", 2, TerminalNode(5), TerminalNode(6))
rootnode = InternalNode("y", 0, branch1, branch2)

The class InternalNode should also provide a method predict that takes a dictionary data as argument,
containing the values of the covariates for one observation. If the covariate to be used is less than the threshold provided
when creating the class, the method should call the predict method of the left child, and return its result. Otherwise
it should call the predict method of the right child and return that result.
You should then be able to compute the prediction for the above case using

rootnode.predict({"x": 1.5, "y": -1})

Answer

We can use the following Python implementation.


class InternalNode:
    def __init__(self, variable, cutoff, left_child, right_child):
        self.variable = variable
        self.cutoff = cutoff
        self.left_child = left_child
        self.right_child = right_child

    def predict(self, data):
        if data[self.variable]<self.cutoff:
            return self.left_child.predict(data)
        else:
            return self.right_child.predict(data)

class TerminalNode:
    def __init__(self, prediction):
        self.prediction = prediction

    def predict(self, data):
        return self.prediction

branch1 = InternalNode("x", 1, TerminalNode(3), TerminalNode(4))
branch2 = InternalNode("x", 2, TerminalNode(5), TerminalNode(6))
rootnode = InternalNode("y", 0, branch1, branch2)

rootnode.predict({"x": 1.5, "y": -1})
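
To check that every branch of the tree is wired up as intended, it can be useful to send one observation down to each leaf. The sketch below does this with a compact version of the two node classes; the list observations is made up for illustration, with one data point chosen per leaf from left to right.

```python
class TerminalNode:
    def __init__(self, prediction):
        self.prediction = prediction

    def predict(self, data):
        return self.prediction  # a leaf ignores the data


class InternalNode:
    def __init__(self, variable, cutoff, left_child, right_child):
        self.variable = variable
        self.cutoff = cutoff
        self.left_child = left_child
        self.right_child = right_child

    def predict(self, data):
        # delegate to the left child below the cutoff, otherwise to the right
        if data[self.variable] < self.cutoff:
            return self.left_child.predict(data)
        return self.right_child.predict(data)


branch1 = InternalNode("x", 1, TerminalNode(3), TerminalNode(4))
branch2 = InternalNode("x", 2, TerminalNode(5), TerminalNode(6))
rootnode = InternalNode("y", 0, branch1, branch2)

observations = [
    {"x": 0.5, "y": -1},
    {"x": 1.5, "y": -1},
    {"x": 1.5, "y": 1},
    {"x": 2.5, "y": 1},
]
predictions = [rootnode.predict(obs) for obs in observations]
print(predictions)
```

Both node classes offer the same predict interface, so an internal node never needs to know whether its children are leaves or further subtrees.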

Task 10. The code below (also available for download at
https://raw.githubusercontent.com/UofGAnalyticsData/DPIP/main/LikelihoodModel.py) implements a simple golden section
search to find the maximum-likelihood estimate in one-parameter models. Golden section search is not a particularly
fast method, but it is reasonably robust. If the log-likelihood is unimodal it is guaranteed to converge to the maximum
likelihood estimate (m.l.e.) provided that the m.l.e. lies in the interval provided. Most importantly, and this is our
reason for using this method over say Newton’s algorithm, it does not require any derivatives.
import math

class LikelihoodModel:
    """ Mixin class for maximum-likelihood estimation in one-parameter models """

    def __init__(self, x):
        """ Set up LikelihoodModel with data in iterable x """
        self.x=x

    def loglik(self, theta, xi):
        """ Function to be implemented by subclasses implementing a specific model
            Should return the contribution of observation xi to the loglikelihood
            for parameter theta """
        raise NotImplementedError

    def initialise_optimisation(self):
        """ Optional method which will be called before the loglikelihood is
            optimised. To be used to set theta_min and theta_max in a
            data-adaptive way """
        pass

    def full_loglik(self, theta):
        """ Compute full loglikelihood using method loglik """
        return sum([self.loglik(theta, xi) for xi in self.x])

    def mle(self):
        """ Compute the maximum likelihood estimate
            Requires that two attributes, self.theta_min and self.theta_max
            (smallest and largest possible value of the parameters) have been
            set """
        phi = (1+math.sqrt(5)) / (3+math.sqrt(5))
        self.initialise_optimisation()
        # The algorithm is based on maintaining a list of three values, so that
        # the value in the middle has the largest log-likelihood
        theta = [ self.theta_min,
                  (1-phi) * self.theta_min + phi * self.theta_max,
                  self.theta_max ]
        # Evaluate loglikelihood for these three values
        loglik = list(map(lambda theta: self.full_loglik(theta), theta))
        # Add new values as long as the three values are too far apart ...
        while theta[2] - theta[0] > 1e-12:
            # We currently have three points [theta[0], theta[1], theta[2]]
            # Where we add the new point depends on the distances between
            # the thetas
            if theta[1]-theta[0] > theta[2] - theta[1]:
                # theta[0]           theta[1]    theta[2]
                #           ^^^^^^ new point will go here
                theta_new = (1-phi) * theta[1] + phi * theta[0]
                loglik_new = self.full_loglik(theta_new)
                # We now have four values of theta, we want to keep only three
                # We keep the largest one and the closest one to the left and
                # right
                if loglik_new > loglik[1]:
                    theta = [theta[0], theta_new, theta[1]]
                    loglik = [loglik[0], loglik_new, loglik[1]]
                else:
                    theta = [theta_new, theta[1], theta[2]]
                    loglik = [loglik_new, loglik[1], loglik[2]]
            else:
                # theta[0]    theta[1]           theta[2]
                #                      ^^^^^^ new point will go here
                theta_new = (1-phi) * theta[1] + phi * theta[2]
                loglik_new = self.full_loglik(theta_new)
                # We now have four values of theta, we want to keep only three
                # We keep the largest one and the closest one to the left and
                # right
                if loglik_new > loglik[1]:
                    theta = [theta[1], theta_new, theta[2]]
                    loglik = [loglik[1], loglik_new, loglik[2]]
                else:
                    theta = [theta[0], theta[1], theta_new]
                    loglik = [loglik[0], loglik[1], loglik_new]
        return theta[1]

Models for specific distributions should be written as subclasses that must provide an implementation of the loglik
method. The subclass must also set the attributes theta_min and theta_max, typically either in the class definition
(if not data-dependent) or in the method initialise_optimisation.
Write subclasses BernoulliModel and ExponentialModel that implement maximum-likelihood estimation for
observations from the Bernoulli and the exponential distribution. The corresponding contributions to the loglikelihood
from an observation $x_i$ are

$x_i \log(\theta) + (1 - x_i) \log(1 - \theta)$

for the Bernoulli distribution ($X_i \sim \mathrm{Bi}(1, \theta)$, $\theta \in (0, 1)$) and

$\log(\theta) - \theta x_i$

for the exponential distribution ($X_i \sim \mathrm{Expo}(\theta)$, $\theta \in \mathbb{R}^+$).


For a Bernoulli sample x = (0, 0, 1, 1, 1) the maximum likelihood estimate should be computed when running

BernoulliModel([0, 0, 1, 1, 1]).mle()

and for an exponential sample x = (1, 3, 2) the maximum likelihood estimate should be computed when running

ExponentialModel([1, 3 , 2]).mle()

For the Bernoulli model you can assume that the m.l.e. lies between say $10^{-16}$ and $1 - 10^{-16}$. For the exponential
sample you can assume that the m.l.e. lies between $\frac{1}{\max\{x_1, x_2, \ldots, x_n\}}$ and $\frac{1}{\min\{x_1, x_2, \ldots, x_n\}}$.

Answer

For the Bernoulli distribution we set the attributes theta_min and theta_max class-wide in the class
definition.

class BernoulliModel(LikelihoodModel):
    theta_min = 1e-8
    theta_max = 1-1e-8
    def loglik(self, theta, xi):
        return xi * math.log(theta) + (1-xi) * math.log(1-theta)

print(BernoulliModel([0, 0, 1, 1, 1]).mle())

For the exponential distribution we set the attributes theta_min and theta_max inside the
initialise_optimisation method, as their values depend on the data.

class ExponentialModel(LikelihoodModel):
    def initialise_optimisation(self):
        self.theta_min = 1 / max(self.x)
        self.theta_max = 1 / min(self.x)
    def loglik(self, theta, xi):
        return math.log(theta) - theta*xi

print(ExponentialModel([1, 3 , 2]).mle())
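
Both models also have closed-form maximum likelihood estimates, which give a convenient check on the numerical results. The derivation below is a standard one and is not part of the original solution; it sums the per-observation contributions given in the task over a sample of size $n$ with mean $\bar{x}$ and sets the derivative to zero.

```latex
\begin{align*}
\text{Bernoulli:}\quad
  \ell(\theta) &= \sum_{i=1}^n \bigl[x_i \log(\theta) + (1 - x_i)\log(1 - \theta)\bigr], \\
  \ell'(\theta) &= \frac{\sum_{i=1}^n x_i}{\theta} - \frac{n - \sum_{i=1}^n x_i}{1 - \theta} = 0
  \quad\Longrightarrow\quad \hat\theta = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}, \\[1ex]
\text{Exponential:}\quad
  \ell(\theta) &= n \log(\theta) - \theta \sum_{i=1}^n x_i, \\
  \ell'(\theta) &= \frac{n}{\theta} - \sum_{i=1}^n x_i = 0
  \quad\Longrightarrow\quad \hat\theta = \frac{n}{\sum_{i=1}^n x_i} = \frac{1}{\bar{x}}.
\end{align*}
```

For the two samples used here this gives $\hat\theta = 3/5 = 0.6$ for x = (0, 0, 1, 1, 1) and $\hat\theta = 1/2 = 0.5$ for x = (1, 3, 2), so the two print statements should return values very close to 0.6 and 0.5. The exponential estimate also lies inside the search interval $[1/3, 1]$ set in initialise_optimisation.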
