Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

This document is for Coventry University students for their own use in completing their

assessed work for this module and should not be passed to third parties or posted on any
website. Any infringements of this rule should be reported to
facultyregistry.eec@coventry.ac.uk.

Faculty of Engineering, Environment and Computing


7144CEM Principles of Data Science

Assignment Brief

Module Title: Individual Cohort Module Code:


Principles of Data Science /Group: (Sept/Jan/May): 7144CEM
Individual Jan 2023
Coursework Title: Hand out date:
Coursework 06/03/2023
Lecturer: Due date and time:
Mark Johnston / Omid Chatrabgoun 24/03/2023 at 6pm
Estimated Time (hrs): 50 hours Coursework type: Credit value assessed:
Word Limit*: maximum of 3000 words Assignment 10 credits

Submission arrangement online via: Aula


File types and method of recording: docx, pdf

Mark and Feedback date (DD/MM/YY): 7/04/2023


Mark and Feedback method (e.g. in lecture, electronic via Aula): Aula

Module Learning Outcomes Assessed:

ILO1. Demonstrate systematic knowledge and critical understanding in topics in linear algebra,
probability and statistical models, relevant to data science.

Task and Mark distribution:

This Individual Coursework involves investigating the different properties and operations involving
vectors, matrices and probability, including Markov chains and Eigenfaces. You are encouraged to
explore the topic, use your initiative, and show some originality, within the time available. Make sure
you read each task through carefully, and answer all parts of each task.

Please ensure that your Coursework is all your own work and you clearly reference any sources you
have used. No collaboration with other students is permitted.

Please submit one report (e.g. as a single Microsoft Word document) covering all of the tasks below,
clearly organised by subtask. Start each task on a new page. Make sure you include your Python code
and selected output directly in the report. You must not submit a zip file. You must not submit a
Jupyter notebook (but you can print a Jupyter notebook to a PDF file and submit the PDF file). The
word limit is a maximum rather than a target. Concentrate on producing a clear and concise answer to
each subtask.
This document is for Coventry University students for their own use in completing their
assessed work for this module and should not be passed to third parties or posted on any
website. Any infringements of this rule should be reported to
facultyregistry.eec@coventry.ac.uk.

Task 1. Markov Chains

Example: Flowers. Consider a particular variety of flower which can take one of three colours: red,
white and pink. Suppose after each year, 40% of red flowers remain red but 60% of red flowers mutate
to pink. For pink flowers, 30% mutate to red, 50% remain as pink, and 20% mutate to white. For white
flowers, 90% remain as white and 10% mutate to pink. We can capture these probabilities in a
transition matrix 𝑃 where the columns correspond to red, pink and white flowers in the current year
and the rows correspond to red, pink and white flowers in the next year.
0.4 0.3 0
𝑃 = [0.6 0.5 0.1]
0 0.2 0.9
If a certain garden initially has 80 of this variety of red flowers and 20 of this variety of white flowers
then after one year:
80 32
𝑃 [ 0 ] = [50]
20 18
so we would expect to see 32 red, 50 pink and 18 white flowers in the garden. If we look at the garden
after 5 years:
80 80 20.88
𝑃𝑃𝑃𝑃𝑃 [ 0 ] = 𝑃5 [ 0 ] = [37.08]
20 20 42.04
so we would expect to see approximately 21 red, 37 pink and 42 white flowers in the garden. This is an
example of a system called a “Markov chain”. The next state of the system depends only on its present
state and does not depend on the earlier history of the system.

Assessment: Mice in a maze. Consider the closed maze shown below containing seven rooms (labelled
0 to 6) with doorways connecting the rooms. A number of mice are placed in the maze and the mice
then move randomly from room to room. Each time step, a mouse either stays in the same room (with
probability 0.6) or leaves the room it is in by choosing at random (equally likely) one of the doors out of
the room.

(1) Construct a 7 × 7 transition matrix 𝑃 (as a numpy array) to model the mice moving around the
maze as a Markov chain.
[10 marks]
This document is for Coventry University students for their own use in completing their
assessed work for this module and should not be passed to third parties or posted on any
website. Any infringements of this rule should be reported to
facultyregistry.eec@coventry.ac.uk.

(2) Suppose that 50 mice are placed in Room 0 and 90 mice are placed in Room 1. All other rooms
are initially empty. Use numpy to calculate how many mice we would expect to see in each
room at the end of each time step for each of the next 30 time steps. Use Python to plot your
results on a line graph showing one line for each room, together with a legend. Briefly
comment on what you observe. You might find numpy.linalg.matrix_power() helpful.

[15 marks]

(3) Use numpy to find the eigenvectors of the transition matrix 𝑃 from part (1), and explain how
one of these eigenvectors is related to the number of mice we would expect to see in each
room in the “long-run steady state”. Also, for each room 𝑗, find the expected number of time
steps for a particular mouse initially located in room 𝑗 to first return to room 𝑗. Hint: if 𝑝𝑗 is the
probability that the particular mouse is in room 𝑗 in the “long-run steady state” then the
expected first return time is given by 1/𝑝𝑗 .

[15 marks]

(4) Suppose the mouse population is in the “long run steady state” from part (3) when a large
piece of poisoned cheese is placed in Room 6 so that any mouse that enters that room instantly
dies.
(a) Modify the transition matrix 𝑃 from part (1) and estimate the number of time steps until
80% of the mice have died.
(b) The expected number of time steps needed for a mouse starting from room 𝑖 to reach
room 𝑘 for the first time can be calculated using matrix operations and is denoted 𝜇𝑖𝑘 .
For a particular destination room 𝑘, let 𝑁 be the transition matrix 𝑃 from part (4)(a) but
with row 𝑘 and column 𝑘 both deleted, and let 𝐼 be the 6 × 6 identity matrix. The sum of
each column of the matrix (𝐼 − 𝑁)−1 gives the values 𝜇𝑖𝑘 . Use this method to find the
expected number of time steps needed for a mouse to die, for each of the possible rooms
in the maze that the mouse could start from. Briefly comment on what you observe.

[20 marks]

Task 2. Eigenfaces

The Olivetti Faces dataset consists of 400 images of human faces. Five examples are given below.

Each image is 64 pixels wide by 64 pixels high. However, the Python code below assembles the 400
images as columns of the matrix Xall. Each image is greyscale (not colour) and all pixel values are
between 0 and 1. The matrix Xsub is the last 300 columns of the matrix Xall.

The only Python libraries you may use in this task are numpy and matplotlib, except that you may use
sklearn only to load the dataset as in the Python code below.
This document is for Coventry University students for their own use in completing their
assessed work for this module and should not be passed to third parties or posted on any
website. Any infringements of this rule should be reported to
facultyregistry.eec@coventry.ac.uk.

import sklearn.datasets
faces = sklearn.datasets.fetch_olivetti_faces()
Xall = faces.data.T
print(Xall.shape)
print(Xall.min(), Xall.max())
Xsub = Xall[:,100:]

(1) Use numpy to find the mean of each row of Xsub and reshape as a 4096 × 1 vector (call this
xbar). Adapt the Python code below to visualise this “mean face” and compare it to the image
stored in the column of Xall corresponding to the last two digits of your Student ID.

plt.imshow(**something**, cmap=plt.cm.gray, vmin=0, vmax=1)


[10 marks]

(2) Use numpy.cov() to calculate the covariance matrix of Xsub, giving a 4096 × 4096 matrix
(call this matrix 𝐶). Then use numpy.linalg.eigh() to find the eigenvalues (𝑉) and
eigenvectors (columns of the matrix 𝑃) of 𝐶. Use the Python code below to reverse the entries
in 𝑉 and the columns of 𝑃. Let 𝐷 be the diagonal matrix consisting of the elements of 𝑉 along
its diagonal. Use numpy to confirm that 𝑃 is an orthogonal matrix and that 𝐶 = 𝑃𝐷𝑃𝑇 .

Make sure you use numpy.linalg.eigh() not numpy.linalg.eig() for this part.

V = V[::-1]
P = P[:,::-1]
[10 marks]

(3) In the following Python code, we apply 𝑃𝑇 (as a matrix transformation) to the faces in Xsub,
where xbar is the “mean face” (a 4096 × 1 vector) from part (1).

Ysub = P.T@(Xsub-xbar)

(a) The eigenvectors of 𝐶 (i.e. the columns of 𝑃) are called “eigenfaces” (Turk and Pentland,
1991). Visualise the eigenfaces corresponding to the largest 10 eigenvalues. Comment on
what you discover. You will need to remove vmin and vmax from the plt.imshow()
example given above.

(b) Show (by example) that if y is any one column of Ysub then P@y+xbar perfectly
recreates the corresponding column of Xsub. This shows that each “face” in Xsub is a
“linear combination” of eigenfaces plus the “mean face”.

(c) Although 𝑃 is a 4096 × 4096 matrix, generally only those eigenfaces corresponding to the
largest eigenvalues of 𝐶 are “useful”. If there are 𝑘 such eigenvalues (with 𝑘 ≥ 2), let
Xpartial be the matrix formed from matrix multiplication of the first 𝑘 columns of 𝑃 and
the first 𝑘 rows of Ysub, and adding xbar. Investigate how well Xpartial
approximates Xsub for different values of 𝑘 by using the numpy.linalg.norm()
function applied to Xpartial minus Xsub (this provides a measure for “goodness of
fit”). Build an appropriate line graph as part of your investigation.
[20 marks]
This document is for Coventry University students for their own use in completing their
assessed work for this module and should not be passed to third parties or posted on any
website. Any infringements of this rule should be reported to
facultyregistry.eec@coventry.ac.uk.

Reference
Turk, M.A. and Pentland, A.P. Face recognition using Eigenfaces. Proceedings of the IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, June 1991, pp 586-591.
https://ieeexplore.ieee.org/document/139758

Notes:
1. You are expected to use Coventry University APA style for referencing. For support and advice
on this students can contact Centre for Academic Writing (CAW).
2. Please notify your registry course support team and module leader for disability support.
3. Any student requiring an extension or deferral should follow the university process as outlined
here.
4. The University cannot take responsibility for any coursework lost or corrupted on disks, laptops
or personal computer. Students should therefore regularly back-up any work and are advised to
save it on the University system.
5. If there are technical or performance issues that prevent submitting coursework through the
online coursework submission system on the day of a coursework deadline, an appropriate
extension to the coursework submission deadline will be agreed. This extension will normally be
24 hours or the next working day if the deadline falls on a Friday or over the weekend period.
This will be communicated via your Module Leader.
6. You are encouraged to check the originality of your work by using the draft Turnitin links on Aula.
7. Collusion between students (where sections of your work are similar to the work submitted by
other students in this or previous module cohorts) is taken extremely seriously and will be
reported to the academic conduct panel. This applies to both coursework and exam answers.
8. A marked difference between your writing style, knowledge and skill level demonstrated in class
discussion, any test conditions and that demonstrated in a coursework assignment may result in
you having to undertake a Viva Voce in order to prove the coursework assignment is entirely your
own work.
9. If you make use of the services of a proofreader in your work you must keep your original
version and make it available as a demonstration of your written efforts. Also, please read the
university Proof Reading Policy.
10. You must not submit work for assessment that you have already submitted (partially or in full),
either for your current course or for another qualification of this university, with the exception of
resits, where for the coursework, you may be asked to rework and improve a previous attempt.
This requirement will be specifically detailed in your assignment brief or specific course or module
information. Where earlier work by you is citable, i.e., it has already been published/submitted,
you must reference it clearly. Identical pieces of work submitted concurrently may also be
considered to be self-plagiarism.
This document is for Coventry University students for their own use in completing their assessed work for this module and should not be passed to third
parties or posted on any website. Any infringements of this rule should be reported to facultyregistry.eec@coventry.ac.uk.

Marking Rubric (PG)

Mark Outcome Guidelines


band
Distinction - Exceptional work with very high degree of rigour, creativity and critical/analytic skills. Mastery of
knowledge and subject-specific theories with originality and autonomy. Demonstrates exceptional ability to analyse and
apply concepts within the complexities and uncertainties of the subject/discipline.
90-100%
Innovative research with exceptional ability in the utilisation of research methodologies. Demonstrates, creativity,
originality and outstanding problem-solving skills. Work completed with very high degree of accuracy, proficiency and
Distinction
autonomy. Exceptional communication and expression demonstrated throughout. Student evidences the full range of
technical and/or artistic skills. Work pushes the boundaries of the discipline and may be strongly considered for external
publication/dissemination/presentation.
Distinction - Outstanding work with high degree of rigour, creativity and critical/analytic skills. Near mastery of
knowledge and subject-specific theories with originality and autonomy. Demonstrates outstanding ability to analyse and
apply concepts within the complexities and uncertainties of the subject/discipline.
80-89% Meets learning
Innovative research with outstanding ability in the utilisation of research methodologies. Work consistently
outcomes
demonstrates creativity, originality and outstanding problem-solving skills. Work completed with high degree of
Distinction
accuracy, proficiency and autonomy. Outstanding communication and expression demonstrated throughout. Student
demonstrates a very wide range of technical and/or artistic skills. With some amendments, the work may be considered
for external publication/dissemination/presentation
Distinction - Excellent work undertaken with rigour, creativity and critical/analytic skills. Excellent degree of knowledge
and subject-specific theories with originality and autonomy demonstrated. The work exhibits excellent ability to analyse
70-79% and apply concepts within the complexities and uncertainties of the subject/discipline.
Innovative research with excellent ability in the utilisation of research methodologies. Work demonstrates creativity,
Distinction originality and excellent problem-solving skills. Work completed with very consistent levels of accuracy, proficiency and
autonomy. Excellent communication and expression demonstrated throughout. Student demonstrates a very wide
range of technical and/or artistic skills.
This document is for Coventry University students for their own use in completing their assessed work for this module and should not be passed to third
parties or posted on any website. Any infringements of this rule should be reported to facultyregistry.eec@coventry.ac.uk.

Merit - Very good work often undertaken with rigour, creativity and critical/analytic skills. Very good degree of
knowledge and subject-specific theories with some originality and autonomy demonstrated. The work often exhibits the
60-69% ability to fully analyse and apply concepts within the complexities and uncertainties of the subject/discipline.
Very good research evidence and shows very good ability in the utilisation of research methodologies. Work
Merit demonstrates creativity, originality and problem-solving skills. Work completed with very consistent levels of accuracy,
proficiency and autonomy. Very good communication and expression demonstrated throughout. Student demonstrates
a wide range of technical and/or artistic skills.
Pass - Good work undertaken with some creativity and critical/analytic skills. Demonstrates knowledge and subject-
50-59% specific theories with some originality and autonomy demonstrated. The work exhibits the ability to analyse and apply
concepts within the complexities and uncertainties of the subject/discipline.
Pass Good research and shows some ability in the utilisation of research methodologies. Work demonstrates problem-solving
skills and is completed with some level of accuracy, proficiency and autonomy. Satisfactory communication and
expression demonstrated throughout. Student demonstrates some of the technical and/or artistic skills.
Pass - Assessment demonstrates some advanced knowledge and understanding of the subject informed by current
practice, scholarship and research. Work may be incomplete with some irrelevant material present. Sometimes
demonstrates the ability to analyse and apply concepts within the complexities and uncertainties of the
40-49%
subject/discipline.
Acceptable research with evidence of basic ability in the utilisation of research methodologies. Demonstrates some
Pass
originality, creativity and problem-solving skills but often with inconsistencies. Expression and presentation sufficient for
accuracy and proficiency. Sufficient communication and expression with professional skill set. Student demonstrates
some technical and/or artistic skills.
Fail - Very limited understanding of relevant theories, concepts and issues with deficiencies in rigour and analysis. Some
relevant material may be present but be informed from very limited sources. Fundamental errors and some
misunderstanding likely to be present. Demonstrates limited ability to analyse and apply concepts within the
30-39% Fails to achieve
complexities and uncertainties of the subject/discipline.
learning
Limited research scope and ability in the utilisation of research methodologies. Limited originality, creativity, and
Fail outcomes
struggles with problem-solving skills. Expression and presentation insufficient for accuracy and proficiency. Insufficient
communication and expression and with deficiencies in professional skill set. Student demonstrates deficiencies in the
range of technical and/or artistic skills.
This document is for Coventry University students for their own use in completing their assessed work for this module and should not be passed to third
parties or posted on any website. Any infringements of this rule should be reported to facultyregistry.eec@coventry.ac.uk.

Fail - Clear failure demonstrating little understanding of relevant theories, concepts, issues and only a vague knowledge
of the area. Little relevant material may be present and informed from very limited sources. Serious and fundamental
errors and virtually no evidence of relevant research. Fundamental errors and misunderstandings likely to be present.
20-29%
Little or no research with no evidence of utilisation of research methodologies. No originality, creativity, and struggles
Fail - with problem-solving skills. Expression and presentation insufficient for accuracy and proficiency. Insufficient
communication and expression and with serious deficiencies in professional skill set. Student has clear deficiencies in
range of technical and/or artistic skills.
Fail - Clear failure demonstrating no understanding of relevant theories, concepts, issues and no understanding of area.
0-19% Little or no relevant material may be present and informed from minimal sources. No evidence of ability in the
utilisation of research methodologies. No evidence of originality, creativity, and problem-solving skills. Expression and
Fail presentation deficient for accuracy and proficiency. Insufficient communication and expression and with deficiencies in
professional skill set. Student has clear deficiencies in range of technical and/or artistic skills.

You might also like