
www.trinityuniversity.edu.ng

DATABASE MANAGEMENT I
CSC 226
LECTURES 3, 4, 5, 6, 7
(INFORMATION STORAGE, INDEXING AND RETRIEVAL)
CONVENER: Dr. Akputu Oryina Kingsley
Email: Oryina.akputu@trinityuniversity.edu.ng
Office Consultation: Wednesdays, 9am-12 noon; 4-5pm
Fridays: 1-4pm.
Lecture Overview

▪ ISRS Performance Measure (cont’d)
▪ Components of ISRS Design

ISRS Performance Measure (cont’d)

Confusion Matrix?

▪ A confusion matrix is a table often used to describe the performance of an ISRS or classification model (or "classifier") on a set of test data for which the true values are known.

▪ The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.

Let's start with an example confusion matrix for a binary ISRS model (though it can easily be extended to the case of more than two classes).

A binary ISRS classifier model works on a two-class problem, where the retrieval prediction assumes only two outcomes (e.g. positive or negative, yes or no).

Example confusion matrix for a binary ISRS model: results for a disease prediction problem

n = 165        Predicted: NO    Predicted: YES    Total
Actual: NO     TN = 50          FP = 10           60
Actual: YES    FN = 5           TP = 100          105
Total          55               110               165

(TP = true positives, TN = true negatives, FP = false positives, FN = false negatives)

What can we learn from this matrix?

▪ There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean the patient has the disease, and "no" would mean the patient does not.
▪ The classifier made a total of 165 predictions (e.g., 165 patients were tested for the presence of that disease).
▪ Out of those 165 cases, the classifier predicted "yes" 110 times and "no" 55 times.
▪ In reality, 105 patients in the sample have the disease, and 60 patients do not.


Accuracy: Overall, how often is the classifier correct?
▪ (TP+TN)/total = (100+50)/165 = 0.91

Misclassification Rate: Overall, how often is it wrong?
▪ (FP+FN)/total = (10+5)/165 = 0.09
▪ equivalent to 1 minus Accuracy; also known as "Error Rate"

True Positive Rate: When it's actually yes, how often does it predict yes?
▪ TP/actual yes = 100/105 = 0.95
▪ also known as "Sensitivity" or "Recall"

False Positive Rate: When it's actually no, how often does it predict yes?
▪ FP/actual no = 10/60 = 0.17

True Negative Rate: When it's actually no, how often does it predict no?
▪ TN/actual no = 50/60 = 0.83
▪ equivalent to 1 minus False Positive Rate; also known as "Specificity"


Precision: When it predicts yes, how often is it correct?

▪ TP/predicted yes = 100/110 = 0.91

Prevalence: How often does the yes condition actually occur in our sample?

▪ actual yes/total = 105/165 = 0.64
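
The measures above can be checked with a short script. This is a minimal sketch that uses the cell counts from the lecture's example (TP = 100, TN = 50, FP = 10, FN = 5); the function name `confusion_metrics` and the dictionary keys are illustrative choices, not part of any library.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute common measures from the four cells of a binary confusion matrix."""
    total = tp + tn + fp + fn
    actual_yes = tp + fn        # cases that really are "yes"
    actual_no = tn + fp         # cases that really are "no"
    predicted_yes = tp + fp     # cases the classifier labelled "yes"
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": tp / actual_yes,    # sensitivity / recall
        "false_positive_rate": fp / actual_no,
        "true_negative_rate": tn / actual_no,     # specificity
        "precision": tp / predicted_yes,
        "prevalence": actual_yes / total,
    }


# Cell counts taken from the disease-prediction example in the lecture.
metrics = confusion_metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the printed values match the figures on the slides, and accuracy plus misclassification rate sum to 1.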

ISRS- Design Components
An ISRS has 3 basic components:

1. User Interface
2. Knowledge Base
3. Search Agent

1. User Interface

• The user interface is the front end, i.e. the user's operational area of the system, which enables the user to enter a query and displays the results.

• It is of two types:

a. Query Interface
b. Result Interface

a. Query Interface: the end from which the user enters search terms and initiates communication with the system. The Query Interface generally needs to have the following features:

i. Understanding the user input statement
✓ The front-end interface needs to understand the keywords given by the user and capture them to pass on to the search program.
✓ The front end should have an understandable look and feel, distinguishable colour combinations, and search instructions.

ii. Refining the problem statement
✓ The interface should be flexible enough to refine any query or statement further: narrowing from a broad to a specific search, or modifying the search within the displayed results, with some arrangement among topical terms that facilitates browsing through the system.

iii. Search statement to search strategy translation
✓ The system front end should be able to translate a search statement and formulate a search strategy in the programming language understood by the Search Agent.

iv. Modification of search strategy
✓ If the user does not get the desired output from the database, the ISRS should provide a procedure for further modifying the search strategy.

b. Result Interface

✓ In the Result Interface, the display of search results should be user friendly.
✓ The results should cater to the needs of individual users, and the display should also be customizable (like e-resource publishers' interfaces).
✓ Search results should also display ratings in light of the search terms. Statistical techniques can be used for this purpose.

2. Knowledge Base

• The storehouse of any ISRS is its Knowledge Base. It contains a list of facts or related facts (information). Any query is answered based on the facts stored in the Knowledge Base. A Knowledge Base could be a Database Management System (DBMS).

• A knowledge base (KB) is a technology used to store complex structured and unstructured information used by a computer or by information search and retrieval systems.


• A KB system consists of a knowledge-base that represents facts about the world and an inference
engine that can reason about those facts and use rules and other forms of logic to deduce new
facts or highlight inconsistencies.

• Retrieval of information from storage depends on two important aspects of the KB:
i. Knowledge Representation
ii. Indexing and Clustering

i. Knowledge Representation

• The first and foremost objective in constructing an ISRS is the representation of facts within the Knowledge Base.

• There are different approaches to knowledge representation:

a. Semantic Network Knowledge Representation
b. Frame-Based Knowledge Representation
c. Rule-Based Knowledge Representation


a. Semantic Network Knowledge Representation

• A semantic network is a method of knowledge representation based on a network structure.
• A semantic network contains points called nodes, connected by links called arcs.
• The nodes represent objects, concepts or events; in other words, documents or information. The arcs represent the relations between the nodes and build a kind of hierarchy in the Knowledge Base. Arcs usually represent relations such as is_a or has_part.
• Semantic networks are useful for representing natural-language sentences.

Semantics is the linguistic and philosophical study of meaning in language, programming languages, formal logics, and semiotics.

✓ It is concerned with the relationship between signifiers (such as words, phrases, signs, and symbols) and what they stand for in reality, their denotation.
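
The node-and-arc structure of a semantic network can be sketched in a few lines of code. The example below is only an illustration: the class `SemanticNetwork`, its methods, and the sample nodes ("textbook", "book", "document", "chapter") are assumptions made for this sketch, while the relation names is_a and has_part come from the lecture.

```python
from collections import defaultdict


class SemanticNetwork:
    """Nodes are plain strings; arcs are labelled, directed links between nodes."""

    def __init__(self):
        # arcs[relation][source] -> set of target nodes
        self.arcs = defaultdict(lambda: defaultdict(set))

    def add_arc(self, source, relation, target):
        self.arcs[relation][source].add(target)

    def related(self, node, relation):
        """Return the nodes reached from `node` by one arc labelled `relation`."""
        return set(self.arcs[relation].get(node, ()))

    def is_a(self, node, category):
        """Follow is_a arcs upward to test whether `node` falls under `category`."""
        seen, frontier = set(), [node]
        while frontier:
            current = frontier.pop()
            if current == category:
                return True
            if current in seen:
                continue
            seen.add(current)
            frontier.extend(self.arcs["is_a"].get(current, ()))
        return False


# Hypothetical sample network.
net = SemanticNetwork()
net.add_arc("textbook", "is_a", "book")
net.add_arc("book", "is_a", "document")
net.add_arc("book", "has_part", "chapter")

print(net.is_a("textbook", "document"))   # True: textbook -> book -> document
print(net.related("book", "has_part"))    # {'chapter'}
```

Chaining is_a arcs is what gives the network its hierarchy: a fact stated about "document" can be reached from "textbook" without being stored twice.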

[Figure: example of semantic representation]

b. Frame-Based Knowledge Representation

• The original idea of frames was developed by Minsky (1975), who defined them as “data structures for representing stereotyped situations”, such as going into a classroom.

• It is an object-oriented approach. A frame represents an object (a document or piece of information), a class of objects (a collection of documents or information), or several facts. When frames represent a class of objects, they generalize over the group by identifying the overall properties its members share.

• The pointers where properties are stored are known as slots. If a frame represents an object, its slots represent the properties or attributes of that object, and each slot contains the value for that particular attribute.

For example, a book in a library is an object, so it can be represented as a frame. The properties of the book (Title, Author, Place, Publisher and so on) are stored as slots, and each slot has a corresponding value.

Frame    Slot         Value
Book     Title        Information Storage & Retrieval
         Author       G. G. Chaudhury
         Publisher    Ess Publication
         Place        New Delhi
         Size         18 x 14 cm


• The simplest type of frame is just a data structure with properties and possibilities for knowledge representation similar to those of a semantic network, with the same ideas of inheritance and default values.

Basic Idea: A frame consists of a selection of slots which can be filled by values, procedures for calculating values, or pointers to other frames. For example:

is-a: Room
Location: Department
Contains: {Desk, Bench, Black Board, Table, Chairs, ...}

is-a: Chair
Location: Class Room
Height: 20-40 cm
Legs: 4
Comfortable: Yes
Use: Sitting
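
Slots, inheritance and default values can be illustrated with a small class. This is a sketch under assumptions: the `Frame` class, its `get` method, the frame names and the `label` slot are hypothetical, and only the slot values (Legs, Use, Location, Height) are taken from the chair example above.

```python
class Frame:
    """A frame is a named set of slots with an optional parent (is-a) frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent        # pointer to another frame
        self.slots = dict(slots)    # slot name -> value or procedure

    def get(self, slot):
        """Look the slot up locally, then inherit it from ancestor frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                value = frame.slots[slot]
                # A slot may hold a procedure that calculates the value.
                return value(self) if callable(value) else value
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot!r}")


# Generic frame holding default values shared by all chairs.
chair = Frame(
    "Chair",
    legs=4,
    use="Sitting",
    label=lambda f: f"{f.name} ({f.get('legs')} legs)",   # procedural slot
)

# Specific frame: adds its own slots and inherits the rest.
classroom_chair = Frame(
    "Class Room Chair",
    parent=chair,
    location="Class Room",
    height_cm=(20, 40),
)

print(classroom_chair.get("location"))   # own slot
print(classroom_chair.get("legs"))       # inherited default
print(classroom_chair.get("label"))      # computed by the procedure
```

A more specific frame can override a default simply by defining the same slot itself, since the lookup stops at the first frame that has it.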

c. Rule-Based Knowledge Representation


• Rule-based representation is a popular approach. Rules state the way in which inference is to be carried out.
• Rules provide a formal way of representing recommendations, directives, or strategies. Rules are
appropriate when the domain knowledge results from empirical associations developed through years
of experience in solving problems in a given area.


• Rules are expressed in the form of IF-THEN statements. For example:

IF search is in collection of BOOKS THEN display Title, Author, Place, Publisher, Year, Physical Description, ISBN

IF search is in collection of ARTICLES THEN display Title, Author, Name of Journal, Volume, Issue, Year, ISSN

Formalisms

• A rule relates an antecedent clause (condition) to a consequent clause (action) by implication: IF (A AND B) THEN S1

IF <premise> THEN <action>

<premise> is Boolean. The AND connective and, to a lesser degree, the OR and NOT logical connectives are possible.
<action> is a series of statements.

• In a rule-based knowledge representation, the domain knowledge is represented as a set of rules that are checked against a collection of facts or knowledge about the current situation.
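
The two display rules above can be written as data and checked against a set of facts. The sketch below is illustrative only: the `RULES` layout, the `fire_rules` function and the `collection` fact name are assumptions, while the display fields are copied from the lecture's examples.

```python
# Each rule pairs a premise (conditions joined by AND) with an action
# (here, the list of fields to display).
RULES = [
    {
        "if": {"collection": "BOOKS"},
        "then": ["Title", "Author", "Place", "Publisher", "Year",
                 "Physical Description", "ISBN"],
    },
    {
        "if": {"collection": "ARTICLES"},
        "then": ["Title", "Author", "Name of Journal", "Volume", "Issue",
                 "Year", "ISSN"],
    },
]


def fire_rules(facts, rules=RULES):
    """Return the action of the first rule whose premise holds for `facts`."""
    for rule in rules:
        # The premise is true only if every condition matches (logical AND).
        if all(facts.get(key) == value for key, value in rule["if"].items()):
            return rule["then"]
    return []   # no rule applies


print(fire_rules({"collection": "ARTICLES"}))
print(fire_rules({"collection": "MAPS"}))   # [] because no rule matches
```

Keeping the rules as data rather than hard-coded branches means new rules can be added without changing the code that checks them.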

ii. Indexing and Clustering

▪ An index (or database index) is a data structure used to quickly locate and access the data in a database table.
▪ Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.

Indexes are created using some database columns. An index entry has two columns:

▪ The first column is the Search Key, which contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly (note that the data itself may or may not be stored in sorted order).

▪ The second column is the Data Reference, which contains a set of pointers holding the address of the disk block where that particular key value can be found.
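
The two-column index described above can be sketched with a sorted key list and a parallel list of block numbers. Everything in this example is a simplifying assumption: the "disk" is an in-memory dictionary named `BLOCKS`, the rows are made up, and `build_index` and `lookup` are hypothetical helper names. Python's standard `bisect` module performs the binary search over the sorted keys.

```python
import bisect

# Hypothetical table: block number -> rows stored in that block as (key, data).
# The rows are deliberately not in key order.
BLOCKS = {
    0: [(42, "row for key 42"), (7, "row for key 7")],
    1: [(19, "row for key 19"), (3, "row for key 3")],
}


def build_index(blocks):
    """Return (search_keys, data_refs): sorted keys and the block holding each."""
    entries = sorted(
        (key, block_no) for block_no, rows in blocks.items() for key, _ in rows
    )
    search_keys = [key for key, _ in entries]
    data_refs = [block_no for _, block_no in entries]
    return search_keys, data_refs


def lookup(key, search_keys, data_refs, blocks):
    """Binary-search the index, then read only the one block it points to."""
    pos = bisect.bisect_left(search_keys, key)
    if pos == len(search_keys) or search_keys[pos] != key:
        return None                       # key is not in the table
    for row_key, data in blocks[data_refs[pos]]:
        if row_key == key:
            return data
    return None


search_keys, data_refs = build_index(BLOCKS)
print(search_keys)                                    # keys in sorted order
print(lookup(19, search_keys, data_refs, BLOCKS))
```

Without the index, finding key 19 would mean scanning every block; with it, the search touches the index and then a single block.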


3. Search Agent

▪ Search Agents are vital components of any ISRS.
▪ These are basically programs which take input from the Search Interface and search the Knowledge Base using the existing index.
▪ A good search agent must be equipped with the following features:

1. Facility for using Boolean operators
2. Context setting for search terms
3. Clustering algorithms
4. Phonetic algorithms (Soundex and Metaphone algorithms)
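
As an illustration of the phonetic algorithms mentioned in item 4, here is a minimal sketch of the common American Soundex scheme, which maps similar-sounding names to the same four-character code. The lecture does not specify a variant, so the choice of this one is an assumption, and the function name `soundex` is illustrative.

```python
# Consonant groups that share a Soundex digit.
SOUNDEX_CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return the four-character American Soundex code for `name`."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]                          # the first letter is kept as-is
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                           # H and W do not separate equal codes
        digit = SOUNDEX_CODES.get(ch, "")      # vowels and Y have no digit
        if digit and digit != previous:
            code += digit
        previous = digit                       # a vowel resets the "previous" code
    return (code + "000")[:4]                  # pad with zeros, cut to 4 characters


print(soundex("Robert"), soundex("Rupert"))    # both names get the same code
```

A search agent could store such codes alongside names so that a query for one spelling also retrieves records with a similar-sounding spelling.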

Boolean Operators

▪ Boolean operators are simple words (AND, OR, NOT or AND NOT) used as conjunctions to combine or exclude keywords in a search, resulting in more focused and productive results.

▪ The AND and NOT operators increase precision, whereas OR increases the recall of search results. The shaded area in the diagram represents the retrieved records in the following example.

[Figure: diagram of Boolean operators; the shaded areas mark the retrieved records]

▪ Try the above keywords in class with a Google search:

• Using these operators can greatly reduce or expand the number of records returned.
• Note that each search engine or database collection uses Boolean operators in a slightly different way, or may require the operator to be typed in capitals or with special punctuation.
• The specific phrasing can be found either in the guide to the specific database in Research Resources or in the search engine's help screens.
• AND: requires both terms to be in each item returned. If one term is contained in the document and the other is not, the item is not included in the resulting list. (Narrows the search)
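
The effect of the Boolean operators can be sketched with set operations over a small inverted index. The four sample documents and the helper names `build_inverted_index` and `postings` are assumptions made for this illustration; real search engines apply their own query syntax and ranking on top of this idea.

```python
# Hypothetical mini-collection: document id -> text.
DOCUMENTS = {
    1: "stock market trading strategies",
    2: "stock market today",
    3: "livestock trading rules",
    4: "cooking stock recipes",
}


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_inverted_index(DOCUMENTS)


def postings(term):
    """Return the ids of documents containing `term` (empty set if none)."""
    return INDEX.get(term.lower(), set())


# AND keeps only documents containing both terms (narrows the search).
stock_and_trading = postings("stock") & postings("trading")
# OR keeps documents containing either term (broadens the search).
stock_or_trading = postings("stock") | postings("trading")
# NOT removes documents containing the excluded term.
stock_not_market = postings("stock") - postings("market")

print(sorted(stock_and_trading))   # [1]
print(sorted(stock_or_trading))    # [1, 2, 3, 4]
print(sorted(stock_not_market))    # [4]
```

The AND and NOT results are smaller than the result for either single term, which is why they tend to raise precision, while the larger OR result tends to raise recall.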


Example: A Google search for "stock market AND trading" returns results such as "How does stock work", "stock market trading strategies", "stock market today", etc.
