Assignment 2
Implementation and Evaluation of Information Retrieval (IR) System
Your task is to implement a vector space retrieval system. You will use the Cranfield
collection to develop and test your system.
To read a document in MATLAB and prepare it for the Bag-of-Words (BoW) model, the
following web pages will be helpful:
https://uk.mathworks.com/help/textanalytics/ug/extract-text-data-from-files.html
https://uk.mathworks.com/help/textanalytics/ug/prepare-text-data-for-analysis.html
Create the BoW model for the Cranfield collection. You can get help from the following
page to implement the BoW model.
https://uk.mathworks.com/help/textanalytics/ref/bagofwords.html
Clean and preprocess each query. Then, use the built-in function encode to encode the
processed query as a matrix of word counts according to a bag-of-words model.
https://uk.mathworks.com/help/textanalytics/ref/bagofwords.encode.html
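Since Python is also permitted for this assignment, the two steps above (building the BoW model and encoding a query against it) can be sketched in plain Python. This is only an illustrative sketch, not the MATLAB `bagOfWords` implementation: the helper names `tokenize`, `build_bow` and `encode_query` are hypothetical, and the preprocessing is assumed to be simple lowercasing and splitting on alphabetic characters, with no stop-word removal or stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def build_bow(documents):
    """Return (vocabulary, count_matrix) for a list of document strings.

    count_matrix[i][j] is the number of times vocabulary[j] occurs in documents[i].
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {word: j for j, word in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(toks).items():
            row[index[word]] = count
        matrix.append(row)
    return vocabulary, matrix


def encode_query(vocabulary, query):
    """Encode a query as word counts over an existing vocabulary.

    Words that do not occur in the vocabulary are ignored.
    """
    index = {word: j for j, word in enumerate(vocabulary)}
    row = [0] * len(vocabulary)
    for word, count in Counter(tokenize(query)).items():
        if word in index:
            row[index[word]] = count
    return row
```

Encoding the query over the document vocabulary keeps the query vector and the document vectors in the same space, which is what makes a similarity comparison between them meaningful.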
For each processed query, determine a ranked list of documents in descending order of
their cosine similarity with the query.
You can use the following formula to compute the cosine similarity of a query Q with
a document D.
$$\mathrm{CosSim}(D, Q) = \frac{\vec{D} \cdot \vec{Q}}{|\vec{D}| \times |\vec{Q}|}$$
where ‘.’ and ‘×’ represent the dot operator and multiplication, respectively.
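The cosine similarity and the resulting ranking can be sketched in Python as follows. This is a minimal illustration under stated assumptions: document and query vectors are plain lists of word counts of equal length, the function names `cos_sim` and `rank_documents` are hypothetical, a zero-length vector is given a similarity of 0, and ties are broken by document index.

```python
import math


def cos_sim(d, q):
    """Cosine similarity of two equal-length count vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def rank_documents(doc_vectors, query_vector):
    """Return document indices sorted by descending cosine similarity with the query."""
    scores = [(cos_sim(d, query_vector), i) for i, d in enumerate(doc_vectors)]
    # Sort by score (highest first), then by index so that ties are deterministic.
    return [i for _, i in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]
```

For example, `rank_documents([[0, 1], [1, 1], [1, 0]], [1, 0])` places document 2 first, because its vector points in the same direction as the query, and document 0 last, because it shares no terms with the query.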
(c) Evaluation of IR System [40%]
Determine the average precision and recall over the five selected queries when you use the:
top 10 documents in the ranking
top 20 documents in the ranking
top 30 documents in the ranking
Note: A list of relevant documents for each query is provided to you in the file ‘Query-Doc.xlsx’ so
that you can determine precision and recall. The file is available on Moodle with the assignment.
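The evaluation at a cut-off of k documents can be sketched in Python as follows. This is a hedged illustration, not a prescribed method: it assumes that "average precision and recall" means the mean of precision at k and recall at k over the selected queries, that rankings and relevance judgements are held in dictionaries keyed by query number, and that the function names are hypothetical. How the relevance judgements are loaded from ‘Query-Doc.xlsx’ is not shown, because the layout of that file is not described here.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall for the top-k documents of one ranked list.

    ranked:   list of document ids in descending order of similarity
    relevant: set of document ids judged relevant for the query
    """
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(rankings, relevance, k):
    """Mean precision@k and recall@k over all queries in `rankings`."""
    pairs = [precision_recall_at_k(rankings[q], relevance[q], k) for q in rankings]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n
```

Calling `average_precision_recall(rankings, relevance, k)` with k set to 10, 20 and 30 in turn gives the three pairs of figures requested above.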
Report
The report should give a brief description of all given tasks (a-c).
There should not be any manual work in the implementation of the IR system. All tasks should be
implemented in MATLAB.
The complete code of the working system should be included in the report.
You can add screenshots to justify your work where necessary.
The report should not be handwritten.
Note: You are also allowed to do the assignment in the Python programming language.
SUBMISSION
PLAGIARISM
You should work individually on this assignment. Anything you submit is assumed to be entirely your
work. The usual Essex policy on plagiarism applies:
http://www1.essex.ac.uk/plagiarism/
Best of Luck!
Appendix A
No. Query
1 experimental investigation of the aerodynamics of a wing in a slipstream .
2 simple shear flow past a flat plate in an incompressible fluid of small viscosity .
3 the boundary layer in simple shear flow past a flat plate .
4 approximate solutions of the incompressible laminar boundary layer equations for a plate in shear flow .
5 one-dimensional transient heat conduction into a double-layer slab subjected to a linear heat input for a
small time internal .
6 one-dimensional transient heat flow in a multilayer slab .
7 the effect of controlled three-dimensional roughness on boundary layer transition at supersonic speeds .
21 what investigations have been made of the wave system created by a static pressure distribution over a
liquid surface .
22 has anyone investigated the effect of shock generated vorticity on heat transfer to a blunt body .
23 what is the heat transfer to a blunt body in the absence of vorticity .
24 what are the general effects on flow fields when the reynolds number is small .
25 find a calculation procedure applicable to all incompressible laminar boundary layer flow problems
having good accuracy and reasonable computation time .
26 papers applicable to this problem (calculation procedures for laminar incompressible flow with arbitrary
pressure gradient) .
27 has anyone investigated the shear buckling of stiffened plates .
28 papers on shear buckling of unstiffened rectangular plates under shear .
29 in practice, how close to reality are the assumptions that the flow in a hypersonic shock tube using
nitrogen is non-viscous and in thermodynamic equilibrium .
30 what design factors can be used to control lift-drag ratios at mach numbers above 5 .