
CE205 Databases and Information Retrieval

Assignment 2
Implementation and Evaluation of Information Retrieval (IR) System

Your task is to implement a vector space retrieval system. You will use the Cranfield
collection to develop and test your system.

Tasks and Guidelines

(a) Construct a Bag-of-Words Model for the Cranfield Collection [30%]


The Cranfield collection consists of 1400 documents. Each document is in text format and contains an
abstract of a scientific research paper. This collection is provided with the assignment on Moodle.

• To read a document in MATLAB and prepare it for the Bag-of-Words (BoW) model, the
following web pages will be helpful:
https://uk.mathworks.com/help/textanalytics/ug/extract-text-data-from-files.html
https://uk.mathworks.com/help/textanalytics/ug/prepare-text-data-for-analysis.html

• Create the BoW model for the Cranfield collection. You can get help from the following
page to implement the BoW model.
https://uk.mathworks.com/help/textanalytics/ref/bagofwords.html
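Since the assignment also permits Python, the bag-of-words step can be sketched with the standard library alone. This is a minimal illustration, not the MATLAB `bagOfWords` implementation: the function names `tokenize` and `build_bow`, the tiny stop-word list, and the two sample strings are all assumptions made for the example. The real system would read the 1400 Cranfield files instead.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller list).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lower-case the text, keep alphabetic runs only, and drop stop words.

    Note: the regex splits hyphenated terms ("one-dimensional" becomes
    "one" and "dimensional") and discards digits.
    """
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_bow(documents):
    """Return (vocabulary, counts), where counts[i][j] is the number of
    times vocabulary[j] occurs in documents[i]."""
    doc_counters = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*doc_counters))
    counts = [[c[term] for term in vocabulary] for c in doc_counters]
    return vocabulary, counts


# Two made-up sample strings standing in for Cranfield abstracts.
docs = [
    "Boundary layer flow past a flat plate",
    "Heat conduction in a slab and heat flow",
]
vocabulary, counts = build_bow(docs)
```

Each row of `counts` is one document vector over a shared, sorted vocabulary, which is the shape needed later for cosine similarity.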

(b) Scoring and Ranking of Retrieved Documents [30%]


• Select any five queries from Appendix A (at the end of this document).

• Clean and preprocess each query. Then, use the built-in function encode to encode the
processed query as a matrix of word counts according to a bag-of-words model.
https://uk.mathworks.com/help/textanalytics/ref/bagofwords.encode.html
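For a Python submission, encoding a query means counting its words against the vocabulary already fixed by the document collection. The sketch below is an analogue of that idea, not MATLAB's `encode`; the function name `encode_query` and the six-term vocabulary are assumptions for illustration. Query words absent from the vocabulary are simply ignored here.

```python
import re
from collections import Counter


def encode_query(query, vocabulary):
    """Return the query as a vector of word counts aligned with `vocabulary`.

    Words that are not in the vocabulary contribute nothing to the vector.
    """
    tokens = re.findall(r"[a-z]+", query.lower())
    token_counts = Counter(tokens)
    return [token_counts[term] for term in vocabulary]


# Hypothetical vocabulary; in the full system it comes from the BoW model.
vocabulary = ["boundary", "flat", "flow", "heat", "layer", "plate"]
query_vector = encode_query(
    "the boundary layer in simple shear flow past a flat plate .", vocabulary
)
```

Because the query vector uses the same term order as the document vectors, the two can be compared position by position.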

• For each processed query, determine a ranked list of documents in descending order of
their cosine similarity with the query.
You can use the following formula to compute the cosine similarity of a query Q with
a document D.

$$\mathrm{CosSim}(D, Q) = \frac{\vec{D} \cdot \vec{Q}}{|\vec{D}| \times |\vec{Q}|}$$

where ‘·’ and ‘×’ represent the dot operator and multiplication, respectively.
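The cosine similarity and the ranking step can be sketched in plain Python as follows. The names `cosine_similarity` and `rank_documents`, the 1-based document IDs, and the three toy vectors are assumptions for illustration. Returning 0.0 for an all-zero vector is a design choice made here to avoid division by zero, not something the assignment prescribes.

```python
import math


def cosine_similarity(d, q):
    """Dot product of d and q divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumption: an empty vector is treated as not similar
    return dot / (norm_d * norm_q)


def rank_documents(doc_vectors, query_vector):
    """Return (doc_id, score) pairs sorted by descending cosine similarity.

    Document IDs are assumed to be 1-based positions in doc_vectors.
    """
    scores = [
        (i + 1, cosine_similarity(d, query_vector))
        for i, d in enumerate(doc_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy count vectors over a three-term vocabulary (made-up data).
doc_vectors = [[1, 1, 0], [0, 1, 1], [0, 0, 2]]
query_vector = [1, 1, 0]
ranking = rank_documents(doc_vectors, query_vector)
```

In this toy case document 1 matches the query exactly, document 2 shares one term, and document 3 shares none, so the ranking lists them in that order.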

(c) Evaluation of IR System [40%]
• Determine the average precision and recall for the selected five queries, when you use:
  ▪ top 10 documents in the ranking
  ▪ top 20 documents in the ranking
  ▪ top 30 documents in the ranking

Note: A list of relevant documents for each query is provided to you in the file ‘Query-Doc.xlsx’ so
that you can determine precision and recall. The file is available on Moodle with the assignment.
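One possible reading of this task is to compute precision and recall at a cutoff k for each query and then average both over the queries. The sketch below follows that reading, which is an assumption: with `hits` as the number of relevant documents among the top k, precision is `hits / k` and recall is `hits / (number of relevant documents)`. The function names and the toy rankings and relevance sets are made up; real relevance judgements would be loaded from ‘Query-Doc.xlsx’.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (precision, recall) for the top-k documents of one query."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def average_over_queries(rankings, relevance, k):
    """Average precision@k and recall@k over all queries in `rankings`."""
    results = [
        precision_recall_at_k(rankings[q], relevance[q], k) for q in rankings
    ]
    n = len(results)
    avg_precision = sum(p for p, _ in results) / n
    avg_recall = sum(r for _, r in results) / n
    return avg_precision, avg_recall


# Toy data: query number -> ranked document IDs, and query number -> relevant IDs.
rankings = {1: [4, 7, 2, 9], 2: [3, 1, 8, 5]}
relevance = {1: {4, 2, 6}, 2: {1}}
avg_p, avg_r = average_over_queries(rankings, relevance, k=2)
```

For the assignment itself the same call would be repeated with k set to 10, 20, and 30.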

Report
• The report should cover a brief description of all given tasks (a-c).

• There should not be any manual work in the implementation of the IR system. All tasks should be
implemented in MATLAB.

• The complete code of the working system should be included in the report.
• You can add screenshots to justify your work where necessary.
• The report should not be handwritten.

Note: You are also allowed to do the assignment in Python programming language.

SUBMISSION

• Submit one PDF file to FASER called: YourRegNo_CE205_Assignment2.pdf


• The submission deadline is 15 January 2024, 13:59. Always keep an eye on FASER for the
submission deadline.

PLAGIARISM
You should work individually on this assignment. Anything you submit is assumed to be entirely your own
work. The usual Essex policy on plagiarism applies:

http://www1.essex.ac.uk/plagiarism/

Best of Luck!

Appendix A
No. Query
1 experimental investigation of the aerodynamics of a wing in a slipstream .
2 simple shear flow past a flat plate in an incompressible fluid of small viscosity .
3 the boundary layer in simple shear flow past a flat plate .
4 approximate solutions of the incompressible laminar boundary layer equations for a plate in shear flow .

5 one-dimensional transient heat conduction into a double-layer slab subjected to a linear heat input for a
small time internal .
6 one-dimensional transient heat flow in a multilayer slab .
7 the effect of controlled three-dimensional roughness on boundary layer transition at supersonic speeds .

8 measurements of the effect of two-dimensional and three-dimensional roughness elements on boundary
layer transition .
9 transition studies and skin friction measurements on an insulated flat plate at a mach number of 5.8 .
10 the theory of the impact tube at low pressure .
11 similar solutions in compressible laminar free mixing problems .
12 some structural and aerelastic considerations of high speed flight .
13 similarity laws for stressing heated wings .
14 piston theory - a new aerodynamic tool for the aeroelastician .
15 on two-dimensional panel flutter .
16 transformation of the compressible turbulent boundary layer .
17 remarks on the eddy viscosity in compressible mixing flows .
18 the flow field in the diffuser of a radial compressor .
19 an investigation of the pressure distribution on conical bodies in hypersonic flows .
20 generalised-newtonian theory .

21 what investigations have been made of the wave system created by a static pressure distribution over a
liquid surface .
22 has anyone investigated the effect of shock generated vorticity on heat transfer to a blunt body .
23 what is the heat transfer to a blunt body in the absence of vorticity .
24 what are the general effects on flow fields when the reynolds number is small .

25 find a calculation procedure applicable to all incompressible laminar boundary layer flow problems
having good accuracy and reasonable computation time .

26 papers applicable to this problem (calculation procedures for laminar incompressible flow with arbitrary
pressure gradient) .
27 has anyone investigated the shear buckling of stiffened plates .
28 papers on shear buckling of unstiffened rectangular plates under shear .

29 in practice, how close to reality are the assumptions that the flow in a hypersonic shock tube using
nitrogen is non-viscous and in thermodynamic equilibrium .
30 what design factors can be used to control lift-drag ratios at mach numbers above 5 .
