Information Storage, Retrieval, Indexing
DATABASE MANAGEMENT I
CSC 226
LECTURE 3,4,5,6,7
(INFORMATION STORAGE, INDEXING AND RETRIEVAL)
CONVENER: Dr. Akputu Oryina Kingsley
Email: Oryina.akputu@trinityuniversity.edu.ng
Office Consultation: Wednesdays, 9am-12 noon; 4-5pm
Fridays: 1-4pm.
Lecture Overview
Confusion Matrix?
▪ A confusion matrix is a table that is often used to describe the performance of the ISRS or classification
model (or "classifier") on a set of test data for which the true values are known.
▪ The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.
Let's start with an example confusion matrix for a binary ISRS model.
A binary ISRS classifier model works on a two-class problem.
ISRS Performance Measure (cont’d)
Example: confusion matrix results of a binary ISRS model for a disease prediction problem
There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for
example, "yes" would mean they have the disease, and "no" would mean they don't have the disease.
The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that
disease).
Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
In reality, 105 patients in the sample have the disease, and 60 patients do not.
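The four cells of the confusion matrix can be tallied directly from the actual and predicted labels. The sketch below is illustrative only: the function name `confusion_counts` and the synthetic label lists are assumptions, and the split between the cells takes TP = 100 from the rate calculations in this lecture (the totals 110/55 and 105/60 alone do not fix it).

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="yes"):
    """Tally the four cells of a binary confusion matrix."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted yes: correct (TP) or wrong (FP)
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted no: wrong (FN) or correct (TN)
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Synthetic labels (assumed) that reproduce the totals of the example:
# 105 actual yes, 60 actual no, 110 predicted yes, 55 predicted no.
actual = ["yes"] * 105 + ["no"] * 60
predicted = ["yes"] * 100 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 50

cells = confusion_counts(actual, predicted)
print("TP:", cells["TP"], "FN:", cells["FN"], "FP:", cells["FP"], "TN:", cells["TN"])
```

Predicted "yes" is TP + FP and predicted "no" is TN + FN, so the row and column totals of the matrix can be checked against the figures quoted above.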
True Positive Rate: When it's actually yes, how often does it predict yes?
▪ TP/actual yes = 100/105 = 0.95
▪ also known as "Sensitivity" or "Recall"
False Positive Rate: When it's actually no, how often does it predict yes?
▪ FP/actual no = 10/60 = 0.17
True Negative Rate: When it's actually no, how often does it predict no?
▪ TN/actual no = 50/60 = 0.83
▪ equivalent to 1 minus False Positive Rate
▪ also known as "Specificity"
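The three rates above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `rates` is an assumption, and FN = 5 is derived from the figures given (105 actual yes minus 100 true positives).

```python
def rates(tp, fp, tn, fn):
    """Compute the basic rates of a binary confusion matrix."""
    actual_yes = tp + fn   # all cases that are really "yes"
    actual_no = fp + tn    # all cases that are really "no"
    return {
        "true_positive_rate": tp / actual_yes,   # sensitivity / recall
        "false_positive_rate": fp / actual_no,
        "true_negative_rate": tn / actual_no,    # specificity
    }

r = rates(tp=100, fp=10, tn=50, fn=5)
for name, value in r.items():
    print(f"{name}: {value:.2f}")
```

Note that the true negative rate and the false positive rate share the same denominator (actual no), which is why they always sum to 1.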
Prevalence: How often does the yes condition actually occur in our sample?
▪ actual yes/total = 105/165 = 0.64
ISRS- Design Components
An ISRS has 3 basic components:
1. User Interface
2. Knowledge Base
3. Search Agent
1. User Interface
• The user interface is the front page, front end, or user's operational area of the system; it lets
the user enter a query and displays the results.
• It is of two types:
a. Query Interface
b. Result Interface
a. Query Interface: the end from which the user enters search terms and initiates communication with
the system. The Query Interface generally needs to have the following features:
✓ The system front-end should be able to translate a search statement and formulate a search strategy.
iv. Modification of search strategy: If the user does not get the desired output from the database, the
ISRS should provide a procedure for further modifying the search strategy.
b. Result Interface
✓ Search results should also display relevance ratings in light of the search terms. Statistical
techniques can be used for this purpose.
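One simple statistical technique for rating results is to count how often the search terms occur in each document and sort by that count. The sketch below assumes plain term frequency as the rating; the names `score` and `rank` and the sample documents are hypothetical, and a real ISRS may use a different measure.

```python
def score(document, query_terms):
    """Rate a document by how many times the query terms occur in it."""
    words = document.lower().split()
    return sum(words.count(term.lower()) for term in query_terms)

def rank(documents, query_terms):
    """Return (rating, title) pairs, best rating first, skipping non-matches."""
    scored = [(score(text, query_terms), title) for title, text in documents.items()]
    return sorted(((s, t) for s, t in scored if s > 0), reverse=True)

# Hypothetical mini collection
documents = {
    "doc_a": "database indexing speeds up retrieval from a database",
    "doc_b": "retrieval of documents",
    "doc_c": "semantic networks and frames",
}

results = rank(documents, ["database", "retrieval"])
for rating, title in results:
    print(title, "rating:", rating)
```

Here doc_a is listed first because the search terms occur three times in it, while doc_c is not shown at all because it matches no term.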
2. Knowledge Base
• The storehouse of any ISRS is its Knowledge Base. It contains a list of facts or related facts
(information). Every query is answered from the facts stored in the Knowledge Base. A
Knowledge Base could be a Database Management System (DBMS).
• Retrieval of information from storage depends on two important aspects of the KB:
i. Knowledge Representation
ii. Indexing and Clustering
i. Knowledge Representation
• The first and foremost objective in constructing an ISRS is representation of facts within the
Knowledge Base.
• A semantic network contains points called nodes connected by links called arcs.
• The nodes represent objects, concepts or events - in other words documents or information. The arcs
represent the relations between the nodes and build a kind of hierarchy in the Knowledge Base.
Arcs usually represent relations like is_a or has_part.
✓ Semantics is concerned with the relationship between signifiers (such as words, phrases, signs, and
symbols) and what they stand for in reality, their denotation.
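A semantic network of nodes and labelled arcs can be sketched as a small graph structure. This is an illustration only: the class `SemanticNetwork`, its method names, and the sample nodes (textbook, book, document, chapter) are assumptions, not part of the lecture.

```python
from collections import defaultdict

class SemanticNetwork:
    """Nodes connected by labelled arcs such as is_a or has_part."""

    def __init__(self):
        # node -> relation -> set of target nodes
        self.arcs = defaultdict(lambda: defaultdict(set))

    def add_arc(self, source, relation, target):
        self.arcs[source][relation].add(target)

    def related(self, node, relation):
        """Nodes reachable from `node` over one arc with the given label."""
        return set(self.arcs[node][relation])

    def ancestors(self, node):
        """Follow is_a arcs upward to collect the hierarchy above a node."""
        found, frontier = set(), [node]
        while frontier:
            current = frontier.pop()
            for parent in self.arcs[current]["is_a"]:
                if parent not in found:
                    found.add(parent)
                    frontier.append(parent)
        return found

net = SemanticNetwork()
net.add_arc("textbook", "is_a", "book")
net.add_arc("book", "is_a", "document")
net.add_arc("book", "has_part", "chapter")

print(net.ancestors("textbook"))
print(net.related("book", "has_part"))
```

Following the is_a arcs from "textbook" reaches both "book" and "document", which is the hierarchy the arcs build in the Knowledge Base.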
• The original idea of frames was developed by Minsky (1975), who defined them as “data structures
for representing stereotyped situations”, such as going into a classroom.
• The pointers where properties are stored are known as slots. If a frame represents an object,
its slots represent the properties or attributes of that object, and each slot contains the value of that
attribute.
For example, a book in a library is an object, so it can be represented as a frame. The properties of
the book (Title, Author, Place, Publisher and so on) are stored as slots, and each slot has a
corresponding value.
Is a: Chair
Location: Classroom
Height: 20-40cm
Legs: 4
Comfortable: Yes
Use: Sitting
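The frame above can be sketched as an object whose slots are stored in a dictionary. The class name `Frame`, the slot names, and the decision to let a frame fall back to the frame it "is a" for missing slots are all assumptions made for illustration; which slots belong to the generic chair and which to the classroom chair is likewise an assumed split.

```python
class Frame:
    """A frame with named slots and an optional is_a parent frame."""

    def __init__(self, name, is_a=None, **slots):
        self.name = name
        self.is_a = is_a          # parent frame, or None
        self.slots = dict(slots)  # slot name -> value

    def get(self, slot):
        """Return the slot value, falling back to the parent frame."""
        if slot in self.slots:
            return self.slots[slot]
        if self.is_a is not None:
            return self.is_a.get(slot)
        raise KeyError(slot)

# Generic frame for the stereotype "Chair" (assumed split of slots)
chair = Frame("Chair", legs=4, use="Sitting")

# A particular chair that "is a" Chair and adds its own slots
classroom_chair = Frame(
    "classroom_chair",
    is_a=chair,
    location="Classroom",
    height_cm=(20, 40),
    comfortable=True,
)

print(classroom_chair.get("location"))  # own slot
print(classroom_chair.get("legs"))      # found on the Chair frame
```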
• In a rule-based knowledge representation, the domain knowledge is represented as a set of rules that are
checked against a collection of facts or knowledge about the current situation.
• Each rule has the form: IF <premise> THEN <action>
• Example: IF the search is in the collection of BOOKS THEN display Title, Author, Place, Publisher, Year,
Physical Description, ISBN
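The IF <premise> THEN <action> form can be sketched as pairs of functions that are checked against a dictionary of facts. This is a minimal sketch, not a full rule engine: the names `make_rule` and `fire` and the fact key "collection" are hypothetical.

```python
def make_rule(premise, action):
    """Bundle a premise test and an action into one rule."""
    return {"premise": premise, "action": action}

BOOK_FIELDS = ["Title", "Author", "Place", "Publisher", "Year",
               "Physical Description", "ISBN"]

rules = [
    # IF search is in collection of BOOKS THEN display the book fields
    make_rule(lambda facts: facts.get("collection") == "BOOKS",
              lambda facts: BOOK_FIELDS),
]

def fire(rules, facts):
    """Return the result of the first rule whose premise holds, else None."""
    for rule in rules:
        if rule["premise"](facts):
            return rule["action"](facts)
    return None

print(fire(rules, {"collection": "BOOKS"}))
print(fire(rules, {"collection": "MAPS"}))
```

The second call prints None because no rule in this small rule set has a premise that matches a search in another collection.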
▪ The first column is the Search key that contains a copy of the primary key or candidate key of the table.
These values are stored in sorted order so that the corresponding data can be accessed quickly (Note that
the data may or may not be stored in sorted order).
▪ The second column is the Data Reference which contains a set of pointers holding the address of the disk
block where that particular key value can be found.
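The two-column index described above can be sketched as a sorted list of (search key, data reference) pairs that is searched with binary search. The class name `Index` and the block addresses are hypothetical; a real DBMS stores actual disk block addresses and often uses a tree structure instead of a flat list.

```python
from bisect import bisect_left

class Index:
    """Sorted search keys, each paired with a reference to a disk block."""

    def __init__(self, entries):
        # entries: iterable of (search_key, block_address) pairs
        self.entries = sorted(entries)             # keys kept in sorted order
        self.keys = [key for key, _ in self.entries]

    def lookup(self, key):
        """Binary-search the sorted keys; return the block address or None."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.entries[i][1]
        return None

# The data itself need not be stored in key order (block names are made up)
index = Index([(30, "block_2"), (10, "block_7"), (20, "block_1")])

print(index.lookup(20))
print(index.lookup(25))
```

Because the keys are sorted, a lookup inspects only about log2(n) entries instead of scanning every record.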
Boolean Operators
▪ Boolean Operators are simple words (AND, OR, NOT or AND NOT) used as conjunctions to combine
or exclude keywords in a search, resulting in more focused and productive results.
▪ The AND and NOT operators increase precision, whereas OR increases the recall of search results. The shaded
area in the diagram represents the retrieved records in the following example.
• Using these operators can greatly reduce or expand the number of records returned.
• Note that each search engine or database collection uses Boolean operators in a slightly different way
or may require the operator to be typed in capitals or with special punctuation.
• The specific phrasing will be found in either the guide to the specific database found in Research
Resources or the search engine's help screens.
• AND—requires both terms to be in each item returned. If one term is contained in the document and
the other is not, the item is not included in the resulting list. (Narrows the search)
Example: A Google search for stock market AND trading returns results such as: how does stock
work, stock market trading strategies, stock market today, etc.
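The effect of the Boolean operators can be sketched with Python sets over a small inverted index (word to document ids). The documents and the helper names `build_inverted_index` and `docs_with` are assumptions for illustration; real search engines apply their own syntax and ranking on top of this idea.

```python
def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index

def docs_with(index, term):
    """Ids of documents containing the term (empty set if none)."""
    return index.get(term.lower(), set())

# Hypothetical mini collection
documents = {
    1: "stock market trading strategies",
    2: "stock market today",
    3: "trading cards for collectors",
}
index = build_inverted_index(documents)

stock = docs_with(index, "stock")
trading = docs_with(index, "trading")

both = stock & trading               # stock AND trading: narrows the search
either = stock | trading             # stock OR trading: broadens the search
stock_not_trading = stock - trading  # stock NOT trading: excludes a term

print("AND:", both)
print("OR:", either)
print("NOT:", stock_not_trading)
```

AND returns only document 1, OR returns all three documents, and NOT returns only document 2, which matches the narrowing and broadening behaviour described above.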