Unit 5adtnotes
AND <antecedent n> THEN <consequent> THEN the advice is ‘take an umbrella’
Relation ● Heuristic
● There are five members of the expert system development team: the domain expert, the knowledge engineer, the programmer, the project manager and the end-user.
The success of their expert system entirely depends on how well the members work together.
The main players in the development team
● The domain expert is a knowledgeable and skilled person capable of solving problems in a specific area or domain. This person has the greatest expertise in a given domain. This expertise is to be captured in the expert system. Therefore, the expert must be able to communicate his or her knowledge, be willing to participate in the expert system development and commit a substantial amount of time to the project. The domain expert is the most important player in the expert system development team.
● The knowledge engineer is someone who is capable of designing, building and testing an expert system. He or she interviews the domain expert to find out how a particular problem is solved. The knowledge engineer establishes what reasoning methods the expert uses to handle facts and rules and decides how to represent them in the expert system. The knowledge engineer then chooses some development software or an expert system shell, or looks at programming languages for encoding the knowledge. And finally, the knowledge engineer is responsible for testing, revising and integrating the expert system into the workplace.
● The programmer is the person responsible for the actual programming, describing the domain knowledge in terms that a computer can understand. The programmer needs to have skills in symbolic programming in such AI languages as LISP, Prolog and OPS5 and also some experience in the application of different types of expert system shells. In addition, the programmer should know conventional programming languages like C, Pascal, FORTRAN and Basic.
● The project manager is the leader of the expert system development team, responsible for keeping the project on track. He or she makes sure that all deliverables and milestones are met, interacts with the expert, knowledge engineer, programmer and end-user.
● The end-user, often called just the user, is a person who uses the expert system when it is developed. The user must not only be confident in the expert system performance but also feel comfortable using it. Therefore, the design of the user interface of the expert system is also vital for the project’s success; the end-user’s contribution here can be crucial.
Structure of a rule-based expert system
LOYOLA-ICAM
COLLEGE OF ENGINEERING AND TECHNOLOGY
LOYOLA COLLEGE CAMPUS, NUNGUMBAKKAM, CH – 34
CS2029 ADVANCED DATABASE TECHNOLOGY
UNIT V
CURRENT ISSUES
● In the early seventies, Newell and Simon from Carnegie-Mellon University proposed a production system model, the foundation of the modern rule-based expert systems.
● The production model is based on the idea that humans solve problems by applying their knowledge (expressed as production rules) to a given problem represented by problem-specific information.
● The production rules are stored in the long-term memory and the problem-specific information or facts in the short-term memory.
Production system model
● The knowledge base contains the domain knowledge useful for problem solving. In a rule-based expert system, the knowledge is represented as a set of rules. Each rule specifies a relation, recommendation, directive, strategy or heuristic and has the IF (condition) THEN (action) structure. When the condition part of a rule is satisfied, the rule is said to fire and the action part is executed.
● The database includes a set of facts used to match against the IF (condition) parts of rules stored in the knowledge base.
● The inference engine carries out the reasoning whereby the expert system reaches a solution. It links the rules given in the knowledge base with the facts provided in the database.
● The explanation facilities enable the user to ask the expert system how a particular conclusion is reached and why a specific fact is needed. An expert system must be able to explain its reasoning and justify its advice, analysis or conclusion.
Characteristics of an expert system
● An expert system is built to perform at a human expert level in a narrow, specialised domain. Thus, the most important characteristic of an expert system is its high-quality performance. No matter how fast the system can solve a problem, the user will not be satisfied if the result is wrong.
● On the other hand, the speed of reaching a solution is very important. Even the most accurate decision or diagnosis may not be useful if it is too late to apply, for instance, in an emergency, when a patient dies or a nuclear power plant explodes.
● Expert systems apply heuristics to guide the reasoning and thus reduce the search area for a solution.
systems, but mistakes are possible and we should be aware of this.
● In expert systems, knowledge is separated from its processing (the knowledge base and the inference engine are split up). A conventional program is a mixture of knowledge and the control structure to process this knowledge. This mixing leads to difficulties in understanding and reviewing the program code, as any change to the code affects both the knowledge and its processing.
● When an expert system shell is used, a knowledge engineer or an expert simply enters rules in the knowledge base. Each new rule adds some new knowledge and makes the expert system smarter.
Comparison of expert systems with conventional systems and human experts (Continued)
Human experts | Expert systems | Conventional programs
Use inexact reasoning and can deal with incomplete, uncertain and fuzzy information. | Permit inexact reasoning and can deal with incomplete, uncertain and fuzzy data. | Work only on problems where data is complete and exact.
Can make mistakes when information is incomplete or fuzzy. | Can make mistakes when data is incomplete or fuzzy. | Provide no solution at all, or a wrong one, when data is incomplete or fuzzy.
Inference engine cycles via a match-fire procedure
● Forward chaining is the data-driven reasoning. The reasoning
starts from the known data and proceeds forward with that data.
Each time only the topmost rule is executed. When fired, the rule
adds a new fact in the database. Any rule can be executed only
once. The match-fire cycle stops when no further rules can be
fired.
● Forward chaining is a technique for gathering information and then
inferring from it whatever can be inferred.
● However, in forward chaining, many rules may be executed that
have nothing to do with the established goal.
● Therefore, if our goal is to infer only one particular fact, the
forward chaining inference technique would not be efficient.
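The match-fire cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any particular shell: the rule names, the letter-valued facts and the function forward_chain are all hypothetical. It assumes each rule is a triple (name, antecedents, consequent) and that the order of the list is the order of the rules in the knowledge base.

```python
# Hypothetical knowledge base: (rule name, IF facts, THEN fact).
# The position in the list stands for the position in the knowledge base.
RULES = [
    ("Rule 1", {"Y", "D"}, "Z"),
    ("Rule 2", {"X", "B", "E"}, "Y"),
    ("Rule 3", {"A"}, "X"),
    ("Rule 4", {"C"}, "L"),
    ("Rule 5", {"L", "M"}, "N"),
]


def forward_chain(rules, initial_facts):
    """Run the match-fire cycle until no further rule can be fired."""
    facts = set(initial_facts)      # the database of known facts
    fired = []                      # names of fired rules, in firing order
    while True:
        for name, antecedents, consequent in rules:
            # Match: every IF fact is in the database, rule not yet used.
            if name not in fired and antecedents <= facts:
                facts.add(consequent)   # Fire: add the new fact
                fired.append(name)      # a rule is executed only once
                break                   # next cycle restarts at the top
        else:
            return facts, fired         # no rule matched: stop


facts, fired = forward_chain(RULES, {"A", "B", "C", "D", "E"})
print(fired)
print(sorted(facts))
```

In this run Rule 4 fires and adds L even though L plays no part in deriving Z, which is the inefficiency noted above when only one particular fact is of interest.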
Backward chaining
● Rule 1:
o IF the ‘traffic light’ is green
o THEN the action is go
● Rule 2:
o IF the ‘traffic light’ is red
o THEN the action is stop
● Rule 3:
o IF the ‘traffic light’ is red
o THEN the action is go
● We have two rules, Rule 2 and Rule 3, with the same IF part. Thus
both of them can be set to fire when the condition part is satisfied.
These rules represent a conflict set. The inference engine must
determine which rule to fire from such a set. A method for
choosing a rule to fire when more than one rule can be fired in a given cycle is called conflict resolution.
● In forward chaining, BOTH rules would be fired. Rule 2 is fired first as the topmost one, and as a result, its THEN part is executed and linguistic object action obtains value stop. However, Rule 3 is also fired because the condition part of this rule matches the fact ‘traffic light’ is red, which is still in the database. As a consequence, object action takes new value go.
Methods used for conflict resolution
● Fire the rule with the highest priority. In simple applications, the priority can be established by placing the rules in an appropriate order in the knowledge base. Usually this strategy works well for expert systems with around 100 rules.
● Fire the most specific rule. This method is also known as the longest matching strategy. It is based on the assumption that a specific rule processes more information than a general one.
● Fire the rule that uses the data most recently entered in the database. This method relies on time tags attached to each fact in the database. In the conflict set, the expert system first fires the rule whose antecedent uses the data most recently added to the database.
Metaknowledge
● Metaknowledge can be simply defined as knowledge about knowledge. Metaknowledge is knowledge about the use and control of domain knowledge in an expert system.
● In rule-based expert systems, metaknowledge is represented by metarules. A metarule determines a strategy for the use of task-specific rules in the expert system.
Metarules
● Metarule 1:
▪ Rules supplied by experts have higher priorities than rules supplied by novices.
● Metarule 2:
▪ Rules governing the rescue of human lives have higher priorities than rules concerned with clearing overloads on power system equipment.
Advantages of rule-based expert systems
● Natural knowledge representation. An expert usually explains the problem-solving procedure with such expressions as this: “In such-and-such situation, I do so-and-so”. These expressions can be represented quite naturally as IF-THEN production rules.
● Uniform structure. Production rules have the uniform IF-THEN structure. Each rule is an independent piece of knowledge. The very syntax of production rules enables them to be self-documented.
● Separation of knowledge from its processing. The structure of a rule-based expert system provides an effective separation of the knowledge base from the inference engine. This makes it possible to develop different applications using the same expert system shell.
● Dealing with incomplete and uncertain knowledge. Most rule-based expert systems are capable of representing and reasoning with incomplete and uncertain knowledge.
Disadvantages of rule-based expert systems
TOPIC 2: KNOWLEDGE BASED
KNOWLEDGE BASES:
● Knowledge-based systems, expert systems
o structure, characteristics
o main components
o advantages, disadvantages
● Base techniques of knowledge-based systems
o rule-based techniques
o inductive techniques
o hybrid techniques
o symbol-manipulation techniques
o case-based techniques
o (qualitative techniques, model-based techniques, temporal reasoning techniques, neural networks)
“A computer program that uses knowledge of the application domain to solve problems in that domain, obtaining essentially the same solutions that a person with experience in the same domain would obtain.”
Three fundamental differences between KNOWLEDGE BASED SYSTEM and other types of software systems are:
1. Separation between knowledge and the use of knowledge
2. Utilization of knowledge specific to a domain
3. Heuristic nature (as opposed to algorithmic)
• The fundamental characteristic of KNOWLEDGE BASED SYSTEM:
– Separating knowledge from the application of knowledge
• The KNOWLEDGE BASED SYSTEM embodies a program whose goal is to apply (interpret) the given knowledge.
In Knowledge Based System
Program + knowledge = Knowledge based system
A KNOWLEDGE BASED SYSTEM is something more than a program which just copies the algorithm/formula used by the expert. It must be able to use the information in an "intelligent" way.
Structure and characteristics
● KNOWLEDGE BASED SYSTEMs are computer systems
o contain stored knowledge
o solve problems like humans would
● KNOWLEDGE BASED SYSTEMs are AI programs with program structure of new type. It has
o knowledge-base (rules, facts, meta-knowledge)
o inference engine (reasoning and search strategy for solution, other services)
o and problem data
● characteristics of KNOWLEDGE BASED SYSTEMs:
o intelligent information processing systems
o representation of domain of interest 🡪 symbolic representation
o problem solving 🡪 by symbol-manipulation
o symbolic programs
Main components
● knowledge-base (KB) consists of
o knowledge about the field of interest (in natural language-like formalism)
o symbolically described system-specification
o KNOWLEDGE-REPRESENTATION METHOD!
● inference engine
o is an “engine” of problem solving (general problem solving knowledge)
o is used for supporting the operation of the other components
o has PROBLEM SOLVING METHOD!
● case-specific database
o auxiliary component
o specific information (information from outside, initial data of the concrete problem)
o information obtained during reasoning
● explanation subsystem
o explanation of the system's actions at the user's request
typical explanation facilities:
o explanation during problem solving:
▪ WHY ... (explanative reasoning, intelligent help, tracing information about the actual reasoning steps)
▪ WHAT IF ... (hypothetical reasoning, conditional assignment and its consequences, can be withdrawn)
▪ WHAT IS ... (gleaning in knowledge-base and case specific database)
o explanation after problem solving:
▪ HOW ... (explanative reasoning, information about the way the result has been found)
▪ WHY NOT ... (explanative reasoning, finding counterexamples)
▪ WHAT IS ... (gleaning in knowledge-base and case specific database)
● knowledge acquisition subsystem
● main tasks:
o checking the syntax of knowledge elements
o checking the consistency of KB (verification, validation)
o knowledge extraction, building KB
o automatic logging and book-keeping of the changes of KB
o tracing facilities (handling breakpoints, automatic monitoring and reporting the values of knowledge elements)
● user interface (🡪 user)
o dialogue in natural language (consultation/suggestion)
● special interfaces
o database and other connections
● developer interface (🡪 knowledge engineer, human expert)
● the main tasks of the knowledge engineer:
o knowledge acquisition and design of KNOWLEDGE BASED SYSTEM: determination, classification, refinement and formalization of methods, rules of thumb and procedures
o selection of knowledge representation method and reasoning strategy
o implementation of knowledge-based system
o verification and validation of KB
o KB maintenance
● Knowledge databases are the databases for knowledge management.
● Knowledge management is the way to gather, manage and use the knowledge of an organisation.
● The basic objectives of knowledge management are to achieve improved performance, competitive advantage and higher levels of innovation in various tasks of an organisation.
● Knowledge is the key to such systems. Knowledge has several aspects:
o Knowledge can be implicit (called tacit knowledge), which is internalised, or it can be explicit knowledge.
o Knowledge can be captured before, during, or even after knowledge activity is conducted.
o Knowledge can be represented in logical form, semantic network form or database form.
o Knowledge once properly represented can be used to generate more knowledge using automated deductive reasoning.
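As a small illustration of the last two aspects, the sketch below stores knowledge in a semantic network form and derives knowledge that was never entered explicitly. It is a minimal sketch under stated assumptions: the concepts, the is-a links, the property names and the function inferred_properties are all hypothetical, and the only deduction used is inheritance along is-a links.

```python
# Hypothetical semantic network: each concept points to its parent concept.
IS_A = {"canary": "bird", "bird": "animal"}

# Properties stated explicitly for each concept.
PROPERTIES = {
    "canary": {"is_yellow"},
    "bird": {"can_fly"},
    "animal": {"breathes"},
}


def inferred_properties(concept):
    """Collect a concept's own properties plus those inherited via is-a links."""
    props = set()
    while concept is not None:
        props |= PROPERTIES.get(concept, set())
        concept = IS_A.get(concept)     # move one level up, None at the root
    return props


print(sorted(inferred_properties("canary")))
```

Nothing in the stored data says that a canary breathes; the fact is deduced by following the links from canary to bird to animal.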
● The optional WHEN clause is used to specify any conditions that need to be checked after the rule is triggered but before the action is executed.
● Finally, the action(s) to be taken are specified as a PL/SQL block, which typically contains one or more SQL statements or calls to execute external procedures.
The four triggers (active rules) R1, R2, R3, and R4 illustrate a number of features of active rules.
1. First, the basic events that can be specified for triggering the rules are the standard SQL update commands: INSERT, DELETE, and UPDATE. These are specified by the keywords INSERT, DELETE, and UPDATE in Oracle notation. In the case of UPDATE one may specify the attributes to be updated, for example by writing UPDATE OF SALARY, DNO.
2. Second, the rule designer needs to have a way to refer to the tuples that have been inserted, deleted, or modified by the triggering event. The keywords NEW and OLD are used in Oracle notation; NEW is used to refer to a newly inserted or newly updated tuple, whereas OLD is used to refer to a deleted tuple or to a tuple before it was updated.
● Thus rule R1 is triggered after an INSERT operation is applied to the EMPLOYEE relation.
● In R1, the condition (NEW.DNO IS NOT NULL) is checked, and if it evaluates to true, meaning that the newly inserted employee tuple is related to a department, then the action is executed.
● The action updates the DEPARTMENT tuple(s) related to the newly inserted employee by adding their salary (NEW.SALARY) to the TOTAL_SAL attribute of their related department.
● Rule R2 is similar to R1, but it is triggered by an UPDATE operation that updates the SALARY of an employee rather than by an INSERT.
● Rule R3 is triggered by an update to the DNO attribute of EMPLOYEE, which signifies changing an employee's assignment from one department to another.
● There is no condition to check in R3, so the action is executed whenever the triggering event occurs.
● The action updates both the old department and new department of the reassigned employees by adding their salary to TOTAL_SAL of their new department and subtracting their salary from TOTAL_SAL of their old department.
● This should work even if the value of DNO was null, because in this case no department will be selected for the rule action.
● The optional FOR EACH ROW clause signifies that the rule is triggered separately for each tuple. This is known as a row-level trigger.
● If the FOR EACH ROW clause was left out, the trigger would be known as a statement-level trigger and would be triggered once for each triggering statement.
● To update multiple records, a rule using row-level semantics would be triggered once for each row, whereas a rule using statement-level semantics is triggered only once.
● The keywords NEW and OLD can only be used with row-level triggers.
Design and Implementation Issues for Active Databases
(1) The first issue concerns activation, deactivation, and grouping of rules.
● In addition to creating rules, an active database system should allow users to activate, deactivate, and drop rules by referring to their rule names.
Deactivated rule:
● A deactivated rule will not be triggered by the triggering event.
● This feature allows users to selectively deactivate rules for certain periods of time when they are not needed.
Activate command:
● The activate command will make the rule active again.
Drop Command:
● The drop command deletes the rule from the system.
● Another option is to group rules into named rule sets, so the whole set of rules could be activated, deactivated, or dropped.
● It is also useful to have a command that can trigger a rule or rule set via an explicit PROCESS RULES command issued by the user.
(2) The second issue concerns whether the triggered action should be executed before, after, or concurrently with the triggering event.
(3) A related issue is whether the action being executed should be considered as a separate transaction or whether it should be part of the same transaction that triggered the rule.
● The triggering event occurs as part of a transaction execution.
● We should first consider the various options for how the triggering event is related to the evaluation of the rule's condition.
● The rule condition evaluation is also known as rule consideration, since the action is to be executed only after considering whether the condition evaluates to true or false.
● There are three main possibilities for rule consideration:
1. Immediate consideration:
The condition is evaluated as part of the same transaction as the triggering event, and is evaluated immediately. This case can be further categorized into three options:
• Evaluate the condition before executing the triggering event.
• Evaluate the condition after executing the triggering event.
• Evaluate the condition instead of executing the triggering event.
2. Deferred consideration:
The condition is evaluated at the end of the transaction that included the triggering event. In this case, there could be many triggered rules waiting to have their conditions evaluated.
3. Detached consideration:
The condition is evaluated as a separate transaction, spawned from the triggering transaction.
● The next set of options concerns the relationship between evaluating the rule condition and executing the rule action.
● Three options are possible here: immediate, deferred, and detached execution.
● Most active systems use the immediate option.
● That is, as soon as the condition is evaluated, if it returns true, the action is immediately executed.
(4) Another issue concerning active database rules is the distinction between row-level rules and statement-level rules.
● Because SQL update statements (which act as triggering events) can specify a set of tuples, one has to decide whether the rule should be considered once for the whole statement or separately for each row (that is, tuple) affected by the statement.
● The SQL-99 standard and the Oracle system allow the user to choose which of the two options is to be used for each rule, whereas STARBURST uses statement-level semantics only.
● One of the difficulties that may have limited the widespread use of active rules, in spite of their potential to simplify database and software development, is that there are no easy-to-use techniques for designing, writing, and verifying rules. For example, it is quite difficult to verify that a set of rules is consistent, meaning that two or more rules in the set do not contradict one another.
● It is also difficult to guarantee termination of a set of rules under all circumstances.
● If dozens of rules are written, it is very difficult to determine whether termination is guaranteed or not.
● If active rules are to reach their potential, it is necessary to develop tools for the design, debugging, and monitoring of active rules that can help users in designing and debugging their rules.
● A related application is to maintain replicated tables consistent by specifying rules that modify the replicas whenever the master table is modified.
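The row-level behaviour described earlier for rules R1 and R3 can be tried out with the sqlite3 module from the Python standard library. This is a sketch in SQLite's trigger syntax, not the Oracle PL/SQL notation the notes refer to, and SQLite offers only row-level (FOR EACH ROW) triggers. The trigger names r1_insert and r3_move, the NAME column and the sample rows are assumptions made for illustration; EMPLOYEE, DEPARTMENT, SALARY, DNO and TOTAL_SAL follow the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dno INTEGER PRIMARY KEY,
                         total_sal REAL NOT NULL DEFAULT 0);
CREATE TABLE employee (name TEXT, salary REAL, dno INTEGER);

-- Analogue of R1: event = INSERT, condition = WHEN clause, action = UPDATE.
CREATE TRIGGER r1_insert
AFTER INSERT ON employee
FOR EACH ROW
WHEN NEW.dno IS NOT NULL
BEGIN
    UPDATE department SET total_sal = total_sal + NEW.salary
    WHERE dno = NEW.dno;
END;

-- Analogue of R3: event = UPDATE OF dno, no condition.
-- A NULL dno matches no department row, so nothing is changed for it.
CREATE TRIGGER r3_move
AFTER UPDATE OF dno ON employee
FOR EACH ROW
BEGIN
    UPDATE department SET total_sal = total_sal + NEW.salary
    WHERE dno = NEW.dno;
    UPDATE department SET total_sal = total_sal - OLD.salary
    WHERE dno = OLD.dno;
END;

INSERT INTO department (dno) VALUES (1), (2);
INSERT INTO employee VALUES ('ann', 1000, 1), ('bob', 2000, 1),
                            ('cy', 500, NULL);
UPDATE employee SET dno = 2 WHERE name = 'bob';
""")

totals = dict(conn.execute("SELECT dno, total_sal FROM department"))
print(totals)
```

The single three-row INSERT fires r1_insert once per row, and the row with a NULL department is filtered out by the WHEN condition. Moving 'bob' fires r3_move, which adds his salary to department 2 and subtracts it from department 1.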
● An inference engine (or deduction mechanism) within the system can deduce new facts from the database by interpreting these rules.
● The model used for deductive databases is closely related to the relational data model, and particularly to the domain relational calculus formalism.
● It is also related to the field of logic programming and the Prolog language.
● The deductive database work based on logic has used Prolog as a starting point.
● A variation of Prolog called Datalog is used to define rules declaratively in conjunction with an existing set of relations, which are themselves treated as literals in the language.
● Although the language structure of Datalog resembles that of Prolog, its operational semantics (that is, how a Datalog program is to be executed) is still different.
● A deductive database uses two main types of specifications: facts and rules.
● Facts:
Facts are specified in a manner similar to the way relations are specified, except that it is not necessary to include the attribute names.
In a deductive database, the meaning of an attribute value in a tuple is determined solely by its position within the tuple.
● Rules:
Rules are somewhat similar to relational views.
They specify virtual relations that are not actually stored but that can be formed from the facts by applying inference mechanisms based on the rule specifications.
● Difference between views and rules
The main difference between rules and views is that rules may involve recursion and hence may yield virtual relations that cannot be defined in terms of basic relational views.
● The evaluation of Prolog programs is based on a technique called backward chaining, which involves a top-down evaluation of goals.
● In the deductive databases that use Datalog, attention has been devoted to handling large volumes of data stored in a relational database.
● Hence, evaluation techniques have been devised that resemble those for a bottom-up evaluation.
● Prolog suffers from the limitation that the order of specification of facts and rules is significant in evaluation; moreover, the order of literals within a rule is significant.
● The execution techniques for Datalog programs attempt to circumvent these problems.
Prolog/Datalog Notation
● The notation used in Prolog/Datalog is based on providing predicates with unique names.
● A predicate has an implicit meaning, which is suggested by the predicate name, and a fixed number of arguments.
● If the arguments are all constant values, the predicate simply states that a certain fact is true.
● If, on the other hand, the predicate has variables as arguments, it is either considered as a query or as part of a rule or constraint.
● In Prolog convention all constant values in a predicate are either numeric or character strings; they are represented as identifiers (or names) starting with lowercase letters only, whereas variable names always start with an uppercase letter.
● The commas between the RHS predicates may be read as meaning "and."
● Consider the definition of the predicate superior in the figure, whose first argument is an employee name and whose second argument is an employee who is either a direct or an indirect subordinate of the first employee.
● By indirect subordinate, we mean the subordinate of some subordinate down to any number of levels.
● Thus superior(X, Y) stands for the fact that "X is a superior of Y" through direct or indirect supervision.
● We can write two rules that together specify the meaning of the new predicate.
● The first rule under Rules in the figure states that, for every value of X and Y, if supervise(X, Y) (the rule body) is true, then superior(X, Y) (the rule head) is also true, since Y would be a direct subordinate of X (at one level down).
● This rule can be used to generate all direct superior/subordinate relationships from the facts that define the supervise predicate.
● The second recursive rule states that, if supervise(X, Z) and superior(Z, Y) are both true, then superior(X, Y) is also true.
● This is an example of a recursive rule, where one of the rule body predicates in the RHS is the same as the rule head predicate in the LHS.
● In general, the rule body defines a number of premises such that, if they are all true, we can deduce that the conclusion in the rule head is also true.
● If we have two (or more) rules with the same head (LHS predicate), it is equivalent to saying that the predicate is true (that is, that it can be instantiated) if either one of the bodies is true; hence, it is equivalent to a logical or operation.
● For example, if we have two rules X :- Y and X :- Z, they are equivalent to a rule X :- Y or Z.
● The latter form is not used in deductive systems, however, because it is not in the standard form of rule, called a Horn clause.
● A Prolog system contains a number of built-in predicates that the system can interpret directly.
● These typically include the equality comparison operator =(X, Y), which returns true if X and Y are identical and can also be written as X=Y by using the standard infix notation.
● Other comparison operators for numbers, such as <, <=, >, and >=, can be treated as binary predicates.
● Arithmetic functions such as +, -, *, and / can be used as arguments in predicates in Prolog.
● In contrast, Datalog (in its basic form) does not allow functions such as arithmetic operations as arguments; indeed, this is one of the main differences between Prolog and Datalog.
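The two rules for superior can be evaluated bottom-up, in the spirit of the Datalog evaluation techniques mentioned earlier, by applying them repeatedly until no new tuples appear. The Python sketch below is a naive fixpoint computation, not the algorithm of any particular deductive database system. The supervise facts are hypothetical sample data (the figure with the original facts is not reproduced in these notes), and the function name superior_pairs is an assumption.

```python
# Hypothetical facts: supervise(X, Y) means X directly supervises Y.
SUPERVISE = {
    ("james", "franklin"),
    ("james", "jennifer"),
    ("franklin", "joyce"),
    ("jennifer", "alicia"),
}


def superior_pairs(supervise):
    """Compute every (X, Y) such that superior(X, Y) can be deduced."""
    # Rule 1: superior(X, Y) :- supervise(X, Y).
    result = set(supervise)
    while True:
        # Rule 2: superior(X, Y) :- supervise(X, Z), superior(Z, Y).
        derived = {
            (x, y)
            for (x, z) in supervise
            for (z2, y) in result
            if z == z2
        }
        if derived <= result:       # nothing new: a fixpoint is reached
            return result
        result |= derived


superior = superior_pairs(SUPERVISE)

# Query with a variable: superior(james, Y)?
print(sorted(y for (x, y) in superior if x == "james"))

# Query with constants only: superior(james, joyce)?
print(("james", "joyce") in superior)
```

Because the whole relation is computed as a set, the order in which the facts and the two rules are written does not affect the answer, unlike the top-down, order-sensitive evaluation used by Prolog.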
● Later extensions to Datalog have been proposed to include functions.
● A query typically involves a predicate symbol with some variable arguments, and its meaning (or "answer") is to deduce all the different constant combinations that, when bound (assigned) to the variables, can make the predicate true.
● For example, the first query in the figure requests the names of all subordinates of "james" at any level.
● A different type of query, which has only constant symbols as arguments, returns either a true or a false result, depending on whether the arguments provided can be deduced from the facts and rules.
● For example, the second query in the figure returns true, since superior(james, joyce) can be deduced.
Datalog Notation
● In Datalog, as in other logic-based languages, a program is built from basic objects called atomic formulas.
● It is customary to define the syntax of logic-based languages by describing the syntax of atomic formulas and identifying how they can be combined to form a program.
● In Datalog, atomic formulas are literals of the form p(a1, a2, ..., an), where p is the predicate name and n is the number of arguments for predicate p.
● Different predicate symbols have different numbers of arguments, and the number of arguments n of predicate p is sometimes called the arity or degree of p.
● The arguments can be either constant values or variable names.
● We use the convention that constant values either are numeric or start with a lowercase character, whereas variable names always start with an uppercase character.
Built in Predicates
● A number of built-in predicates are included in Datalog, which can also be used to construct atomic formulas.
● The built-in predicates are of two main types:
(i) the binary comparison predicates over ordered domains, like < (less), <= (less_or_equal), > (greater), and >= (greater_or_equal)
and
(ii) the comparison predicates over ordered or unordered domains, like = (equal) and /= (not_equal)
● These can be used as binary predicates with the same functional syntax as other predicates, for example by writing less(X, 3), or they can be specified by using the customary infix notation X<3.
● Because the domains of these predicates are potentially infinite, they should be used with care in rule definitions.
Example:
For example, the predicate greater(X, 3), if used alone, generates an infinite set of values for X that satisfy the predicate (all integer numbers greater than 3).
● A literal is either an atomic formula called a positive literal
or
● an atomic formula preceded by not. This is a negated atomic formula, called a negative literal.
● Datalog programs can be considered to be a subset of the predicate calculus formulas.
● In Datalog, the predicate calculus formulas are first converted into clausal form before they are expressed in Datalog; and
● only formulas given in a restricted clausal form, called Horn clauses, can be used in Datalog.
Clausal Form and Horn Clauses
● A formula in the relational calculus is a condition that includes predicates called atoms (based on relation names).
● A formula can have quantifiers, namely the universal quantifier (for all) and the existential quantifier (there exists).
● In clausal form, a formula must be transformed into another formula with the following characteristics:
• All variables in the formula are universally quantified. Hence, it is not necessary to include the universal quantifiers (for all) explicitly; the quantifiers are removed, and all variables in the formula are implicitly quantified by the universal quantifier.
• In clausal form, the formula is made up of a number of clauses, where each clause is composed of a number of literals connected by OR logical connectives only. Hence, each clause is a disjunction of literals.
• The clauses themselves are connected by AND logical connectives only, to form a formula. Hence, the clausal form of a formula is a conjunction of clauses.
Example
Any formula can be converted into clausal form.
Literals can be positive literals or negative literals.
Consider a clause of the form:
NOT(P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q1 OR Q2 OR ... OR Qm    (1)
● This clause has n negative literals and m positive literals. Such a clause can be transformed into the following equivalent logical formula:
P1 AND P2 AND ... AND Pn => Q1 OR Q2 OR ... OR Qm    (2)
● where => is the implies symbol.
● The formulas (1) and (2) are equivalent, meaning that their truth values are always the same.
● This is the case because, if all the Pi literals (i = 1, 2, ..., n) are true, the formula (2) is true only if at least one of the Qi's is true, which is the meaning of the => (implies) symbol.
● For formula (1), if all the Pi literals (i = 1, 2, ..., n) are true, their negations are all false; so in this case formula (1) is true only if at least one of the Qi's is true.
● In Datalog, rules are expressed as a restricted form of clauses called Horn clauses, in which a clause can contain at most one positive literal.
● Hence, a Horn clause is either of the form
NOT(P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q    (3)
or of the form
NOT(P1) OR NOT(P2) OR ... OR NOT(Pn)    (4)
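The equivalence between the clause form (1) and the implication form (2) can be checked by brute force over all truth assignments. The following is a minimal Python sketch; the function names (implies, formula_1, formula_2, equivalent) are chosen here for illustration and do not come from the notes.

```python
from itertools import product

def implies(a, b):
    # Truth table of =>: false only when a is true and b is false.
    return not (a and not b)

def formula_1(ps, qs):
    # NOT(P1) OR ... OR NOT(Pn) OR Q1 OR ... OR Qm
    return any(not p for p in ps) or any(qs)

def formula_2(ps, qs):
    # P1 AND ... AND Pn => Q1 OR ... OR Qm
    return implies(all(ps), any(qs))

def equivalent(n, m):
    """Compare both formulas on every truth assignment of n P's and m Q's."""
    for values in product([False, True], repeat=n + m):
        ps, qs = values[:n], values[n:]
        if formula_1(ps, qs) != formula_2(ps, qs):
            return False
    return True

print(equivalent(3, 2))  # general clause with n = 3, m = 2
print(equivalent(3, 1))  # Horn clause with exactly one positive literal
```

Calling equivalent(n, 0) covers the case of a clause with no positive literal, where the empty disjunction on the right-hand side is treated as false.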
interpretation, the predicate in the head of the rule must also be true.
● The interpretation shown in Figure is a model for the two rules shown, since it can never cause the rules to be violated.
● For example, if supervise(a, b) and superior(b, c) are both true under some interpretation, but superior(a, c) is not true, the interpretation cannot be a model for the recursive rule:
superior(X,Y) :- supervise(X,Z), superior(Z,Y)
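The violation test for the recursive rule can be sketched in Python. The encoding of an interpretation as two sets of pairs and the function name violations are assumptions made for illustration; the constants a, b, c are the ones from the example.

```python
def violations(supervise, superior):
    """Return the head facts required by the rule
    superior(X,Y) :- supervise(X,Z), superior(Z,Y)
    that are missing from the interpretation."""
    missing = set()
    for (x, z) in supervise:
        for (z2, y) in superior:
            # Body is true for this binding; the head must be true as well.
            if z == z2 and (x, y) not in superior:
                missing.add((x, y))
    return missing

# supervise(a, b) and superior(b, c) hold, but superior(a, c) does not:
print(violations({("a", "b")}, {("b", "c")}))              # {('a', 'c')}
# Once superior(a, c) is added, the rule is no longer violated:
print(violations({("a", "b")}, {("b", "c"), ("a", "c")}))  # set()
```

An interpretation is a model for this rule exactly when the returned set is empty.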
● In the model-theoretic approach, the meaning of the rules is
established by providing a model for these rules.
Minimal Model:
● A model is called a minimal model for a set of rules if we cannot
change any fact from true to false and still get a model for these
rules.
● The model shown in Figure is the minimal model for the set of
facts that are defined by the supervise predicate.
● In general, the minimal model that corresponds to a given set of
facts in the model theoretic interpretation should be the same as the
facts generated by the proof-theoretic interpretation for the same
original set of ground and deductive axioms.
● However, this is generally true only for rules with a simple
structure.
● Once we allow negation in the specification of rules, the
correspondence between interpretations does not hold.
● In fact, with negation, numerous minimal models are possible for a
given set of facts.
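For rules without negation, the minimal model can be computed by applying the rules repeatedly until no new facts appear. The sketch below assumes two rules for superior: the recursive rule shown earlier plus a base rule superior(X,Y) :- supervise(X,Y), which is an assumption here. The supervise facts are hypothetical, since the figure's facts are not reproduced in these notes.

```python
# Hypothetical fact-defined predicate (not the facts from the figure).
SUPERVISE = {("james", "franklin"), ("james", "jennifer"), ("franklin", "john")}

def derive_once(supervise, superior):
    """Apply both rules once and return every head fact they produce."""
    derived = set(supervise)           # superior(X,Y) :- supervise(X,Y)   (assumed base rule)
    for (x, z) in supervise:           # superior(X,Y) :- supervise(X,Z), superior(Z,Y)
        for (z2, y) in superior:
            if z == z2:
                derived.add((x, y))
    return derived

def is_model(supervise, superior):
    # A model never violates a rule: every derivable head fact is already true.
    return derive_once(supervise, superior) <= superior

def minimal_model(supervise):
    """Start with no superior facts and add derived facts until a fixpoint."""
    superior = set()
    while True:
        new = superior | derive_once(supervise, superior)
        if new == superior:
            return superior
        superior = new

smallest = minimal_model(SUPERVISE)
bigger = smallest | {("james", "bob")}   # still a model, but not minimal
print(is_model(SUPERVISE, smallest), is_model(SUPERVISE, bigger))  # True True
```

Removing any fact from the fixpoint result breaks a rule, which is what makes it minimal; the larger interpretation stays a model only because no rule forces the extra fact to be false.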
● A third approach to interpreting the meaning of rules involves
defining an inference mechanism that is used by the system to
deduce facts from the rules.
● This inference mechanism would define a computational
interpretation to the meaning of the rules.
● The Prolog logic programming language uses its inference
mechanism to define the meaning of the rules and facts in a Prolog
program.
● Not all Prolog programs correspond to the proof theoretic or
model-theoretic interpretations; it depends on the type of rules in
the program.
● However, for many simple Prolog programs, the Prolog inference mechanism infers the facts that correspond either to the proof-theoretic interpretation or to a minimal model under the model-theoretic interpretation.
● As an example of a model that is not minimal, consider the interpretation in Figure, and assume that the supervise predicate is defined by a set of known facts, whereas the superior predicate is defined as an interpretation (model) for the rules.
● Suppose that we add the predicate superior(james, bob) to the true predicates. This remains a model for the rules shown, but it is not a minimal model, since changing the truth value of superior(james, bob) from true to false still provides us with a model for the rules.
Datalog Programs and Their Safety
● There are two main methods of defining the truth values of predicates in actual Datalog programs.
● Fact-defined predicates (or relations) are defined by listing all the combinations of values (the tuples) that make the predicate true.
● A program or a rule is said to be safe if it generates a finite set of facts.
● The general theoretical problem of determining whether a set of rules is safe is undecidable.
● We can determine the safety of restricted forms of rules.
● One situation where we get unsafe rules that can generate an infinite number of facts arises when one of the variables in the rule can range over an infinite domain of values, and that variable is not limited to ranging over a finite relation.
For example, consider the rule
big_salary(Y) :- Y>60000
● Here, we can get an infinite result if Y ranges over all possible integers.
● But suppose that we change the rule as follows:
big_salary(Y) :- employee(X), salary(X,Y), Y>60000
● In the second rule, the result is not infinite, since the values that Y can be bound to are now restricted to values that are the salary of some employee in the database, presumably a finite set of values.
We can also rewrite the rule as follows:
big_salary(Y) :- Y>60000, employee(X), salary(X,Y)
● In this case, the rule is still theoretically safe. However, in Prolog or any other system that uses a top-down, depth-first inference mechanism, the rule creates an infinite loop, since we first search for a value for Y and then check whether it is a salary of an employee.
● The result is generation of an infinite number of Y values, even though these, after a certain point, cannot lead to a set of true RHS predicates.
● One definition of Datalog considers both rules to be safe, since it does not depend on a particular inference mechanism.
● It is generally advisable to write such a rule in the safest form, with the predicates that restrict possible bindings of variables placed first.
● As another example of an unsafe rule, consider the following rule:
has_something(X,Y) :- employee(X)
● Here, an infinite number of Y values can again be generated, since the variable Y appears only in the head of the rule and hence is not limited to a finite set of values.
● To define safe rules more formally, we use the concept of a limited variable. A variable X is limited in a rule if
(1) it appears in a regular (not built-in) predicate in the body of the rule;
(2) it appears in a predicate of the form X=c or c=X or (c1<=X and X<=c2) in the rule body, where c, c1, and c2 are constant values; or
(3) it appears in a predicate of the form X=Y or Y=X in the rule body, where Y is a limited variable.
● A rule is said to be safe if all its variables are limited.
Use of Relational Operations
● It is straightforward to specify many operations of the relational algebra in the form of Datalog rules that define the result of applying these operations on the database relations (fact predicates).
● This means that relational queries and views can easily be specified in Datalog.
● The additional power that Datalog provides is in the specification of recursive queries, and views based on recursive queries.
● In Datalog, we do not need to specify the attribute names; rather, the arity (degree) of each predicate is the important aspect.
● In a practical system, the domain (data type) of each attribute is also important for operations such as UNION, INTERSECTION, and JOIN, and we assume that the attribute types are compatible for the various operations.
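Returning to the definition of safe rules given earlier, the limited-variable test can be sketched in Python. The rule encoding (a head tuple plus a list of body literals, each written as a tuple whose first element is the predicate name) and the function names are assumptions for illustration. Range conditions of the form (c1<=X and X<=c2) are left out to keep the sketch short.

```python
BUILT_INS = {"=", "/=", "<", "<=", ">", ">="}

def is_variable(term):
    # Convention from the notes: variable names start with an uppercase character.
    return isinstance(term, str) and term[:1].isupper()

def limited_variables(body):
    """Return the set of limited variables of a rule body."""
    limited = set()
    # Condition (1): variables of regular (not built-in) body predicates.
    for pred, *args in body:
        if pred not in BUILT_INS:
            limited.update(a for a in args if is_variable(a))
    # Conditions (2) and (3): X=c, c=X, X=Y with Y limited; repeat until stable.
    changed = True
    while changed:
        changed = False
        for pred, *args in body:
            if pred != "=":
                continue
            left, right = args
            for var, other in ((left, right), (right, left)):
                if is_variable(var) and var not in limited:
                    if not is_variable(other) or other in limited:
                        limited.add(var)
                        changed = True
    return limited

def is_safe(head, body):
    """A rule is safe if every variable in it is limited."""
    variables = {t for lit in [head] + body for t in lit[1:] if is_variable(t)}
    return variables <= limited_variables(body)

# big_salary(Y) :- Y>60000                          -> unsafe, Y is never limited
print(is_safe(("big_salary", "Y"), [(">", "Y", 60000)]))
# big_salary(Y) :- employee(X), salary(X,Y), Y>60000 -> safe
print(is_safe(("big_salary", "Y"),
              [("employee", "X"), ("salary", "X", "Y"), (">", "Y", 60000)]))
```

The check ignores the order of the body literals, which matches the definition that does not depend on a particular inference mechanism; it says nothing about whether a top-down evaluator would loop.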
● If the Datalog model is based on the relational model and hence
assumes that predicates (fact relations and query results) specify
sets of tuples, duplicate tuples in the same predicate are
automatically eliminated.
● This may or may not be true, depending on the Datalog inference
engine.
● It is definitely not the case in Prolog, so any of the rules that involve duplicate elimination are not correct for Prolog.
● The rules should work for Datalog, if duplicates are automatically eliminated.
Evaluation of Nonrecursive Datalog Queries
● In order to use Datalog as a deductive database system, it is appropriate to define an inference mechanism based on relational database query processing concepts.
● The inherent strategy involves a bottom-up evaluation, starting with base relations; the order of operations is kept flexible and subject to query optimization.
● If a query involves only fact-defined predicates, the inference becomes one of searching among the facts for the query result.
● The query involves relational SELECT and PROJECT operations on a base relation, and it can be handled by the database query processing and optimization techniques.
● When a query involves rule-defined predicates, the inference mechanism must compute the result based on the rule definitions.
● If a query is nonrecursive and involves a predicate P that appears as the head of a rule P :- P1, P2, ..., Pn, the strategy is first to compute the relations corresponding to P1, P2, ..., Pn, and then to compute the relation corresponding to P.
● It is useful to keep track of the dependency among the predicates of a deductive database in a predicate dependency graph.
● Figure shows the graph for the fact and rule predicates.
● The dependency graph contains a node for each predicate.
● Whenever a predicate A is specified in the body (RHS) of a rule, and the head (LHS) of that rule is the predicate B, we say that B depends on A, and we draw a directed edge from A to B.
● This indicates that, in order to compute the facts for the predicate B (the rule head), we must first compute the facts for all the predicates A in the rule body.
● If the dependency graph has no cycles, we call the rule set nonrecursive.
● If there is at least one cycle, the rule set is called recursive.
● In Figure there is one recursively defined predicate, namely superior, which has a recursive edge pointing back to itself.
● In addition, because the predicate subordinate depends on superior, it also requires recursion in computing its result.
● A query that includes only nonrecursive predicates is called a nonrecursive query.
● In the predicate dependency graph, the nodes corresponding to fact-defined predicates do not have any incoming edges, since all fact-defined predicates have their facts stored in a database relation.
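The predicate dependency graph and the recursive/nonrecursive test can be sketched as follows. Rules are reduced to (head predicate, list of body predicates); this encoding, the function names, and the two sample rule sets are assumptions for illustration. The ordering uses a standard topological sort (Kahn's algorithm), which is one possible way to obtain the evaluation order described above.

```python
def dependency_graph(rules):
    """Build {A: set of B} with an edge A -> B whenever B depends on A."""
    edges = {}
    for head, body in rules:
        edges.setdefault(head, set())
        for pred in body:
            edges.setdefault(pred, set()).add(head)
    return edges

def evaluation_order(edges):
    """Return the predicates so that each one comes after everything it
    depends on, or None if the graph has a cycle (recursive rule set)."""
    indegree = {node: 0 for node in edges}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    # Fact-defined predicates have no incoming edges, so they come first.
    ready = sorted(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for t in sorted(edges[node]):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return order if len(order) == len(edges) else None

nonrecursive = [("big_salary", ["employee", "salary"])]
recursive = [
    ("superior", ["supervise"]),
    ("superior", ["supervise", "superior"]),   # edge from superior back to itself
    ("subordinate", ["superior"]),
]
print(evaluation_order(dependency_graph(nonrecursive)))  # ['employee', 'salary', 'big_salary']
print(evaluation_order(dependency_graph(recursive)))     # None
```

For a nonrecursive rule set the returned list is a valid order in which to compute the relations bottom-up; a result of None signals that at least one predicate needs recursive evaluation.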
● The contents of a fact-defined predicate can be computed by directly retrieving the tuples in the corresponding database relation.
● The main function of an inference mechanism is to compute the facts that correspond to query predicates.
● This can be accomplished by generating a relational expression involving relational operators such as SELECT, PROJECT, JOIN, UNION, and SET DIFFERENCE (with appropriate provision for dealing with safety issues) that, when executed, provides the query result.
● The query can then be executed by utilizing the internal query processing and optimization operations of a relational database management system.
● Whenever the inference mechanism needs to compute the fact set corresponding to a nonrecursive rule-defined predicate p, it first locates all the rules that have p as their head.
● The idea is to compute the fact set for each such rule and then to apply the UNION operation to the results, since UNION corresponds to a logical OR operation.
● The dependency graph indicates all predicates q on which each p depends, and since we assume that the predicate is nonrecursive, we can always determine a partial order among such predicates q. Before computing the fact set for p, we first compute the fact sets for all predicates q on which p depends, based on their partial order.
● Multimedia databases provide features that allow users to store and query different types of multimedia information, which includes images (such as pictures or drawings), video clips (such as movies, news reels, or home videos), audio clips (such as songs, phone messages, or speeches), and documents (such as books or articles).
● Databases provide functionality for the easy manipulation, query, and retrieval of highly relevant information from huge collections of stored data.

Need for Multimedia Databases
● Early applications of MMDBMS tended to use multimedia for presentational requirements only. For example, an employee database might include an image of each employee.
● These systems could be implemented relatively simply by storing the image files externally to the database and storing a file reference in the database.
● However, this external data could not be manipulated by the DBMS. Multimedia applications are evolving because people want to exploit multimedia data in a "natural" way: interrogating, retrieving and manipulating the data.
● Complex applications are developing, such as entertainment services (e.g. video on demand), multimedia sales for houses, goods and services, groupware, telepresence, surveillance and telemedicine.
● The main types of database queries that are needed involve locating multimedia sources that contain certain objects of interest.
Example
● For example, one may want to locate all video clips in a video database that include a certain person in them, say Bill Clinton.
● One may also want to retrieve video clips based on certain activities included in them, such as video clips where a goal is scored in a soccer game by a certain player or team.
● The above types of queries are referred to as content-based retrieval, because the multimedia source is being retrieved based on its containing certain objects or activities.
● A multimedia database must use some model to organize and index the multimedia sources based on their contents.
● Identifying the contents of multimedia sources is a difficult and time-consuming task.
There are two main approaches.
(1) The first is based on automatic analysis of the multimedia sources to identify certain mathematical characteristics of their contents.
● This approach uses different techniques depending on the type of multimedia source (image, text, video, or audio).
(2) The second approach depends on manual identification of the objects and activities of interest in each multimedia source and on using this information to index the sources.
● This approach can be applied to all the different multimedia sources, but it requires a manual preprocessing phase where a person has to scan each multimedia source to identify and catalog the objects and activities it contains so that they can be used to index these sources.
Image:
● An image is typically stored either in raw form as a set of pixel or cell values, or in compressed form to save space.
● The image shape descriptor describes the geometric shape of the raw image, which is typically a rectangle of cells of a certain width and height.
● Each image can be represented by an m by n grid of cells.
● Each cell contains a pixel value that describes the cell content.
● In black/white images, pixels can be one bit.
● In gray scale or color images, a pixel is multiple bits.
● Because images may require large amounts of space, they are often stored in compressed form.
● Compression standards, such as GIF or JPEG, use various mathematical transformations to reduce the number of cells stored but still maintain the main image characteristics.
● The mathematical transforms that can be used include Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), and wavelet transforms.
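The idea behind transform-based compression can be illustrated on a single row of pixel values. The sketch below applies an unnormalized Discrete Cosine Transform, keeps only the lowest-frequency coefficients, and reconstructs an approximation. It is a toy illustration under stated assumptions, not the procedure of JPEG or any other standard; the function names and the sample pixel row are hypothetical.

```python
import math

def dct(values):
    """Unnormalized DCT-II of a sequence of pixel values."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(values))
        for k in range(n)
    ]

def inverse_dct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        coeffs[0] / n
        + (2 / n) * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k)
                        for k in range(1, n))
        for i in range(n)
    ]

def compress_row(values, keep):
    """Keep the first `keep` (lowest-frequency) coefficients, zero the rest."""
    coeffs = dct(values)
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)

row = [52, 55, 61, 66, 70, 61, 64, 73]      # hypothetical gray-scale pixel values
approx = inverse_dct(compress_row(row, keep=4))
print([round(v, 1) for v in approx])         # approximation from 4 of 8 coefficients
```

Only the kept coefficients would need to be stored, which is how the number of stored values is reduced while the overall shape of the row is preserved. Keeping all coefficients reproduces the row exactly, up to floating-point rounding.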
● To identify objects of interest in an image, the image is typically divided into homogeneous segments using a homogeneity predicate.
● For example, in a color image, cells that are adjacent to one another and whose pixel values are close are grouped into a segment.
● The homogeneity predicate defines the conditions for how to automatically group those cells.
● Segmentation and compression can hence identify the main characteristics of an image.
● A typical image database query would be to find images in the database that are similar to a given image.
● The given image could be an isolated segment that contains, say, a pattern of interest, and the query is to locate other images that contain that same pattern.
● There are two main techniques for this type of search.
● (i) The first approach uses a distance function to compare the given image with the stored images and their segments.
● If the distance value returned is small, the probability of a match is high.
● Indexes can be created to group together stored images that are close in the distance metric so as to limit the search space.
● (ii) The second approach, called the transformation approach, measures image similarity by having a small number of transformations that can transform one image's cells to match the other image.
● Transformations include rotations, translations, and scaling.
● The second approach is more general, but it is also more time consuming and difficult.
Video
● A video source is typically represented as a sequence of frames, where each frame is a still image.
● Rather than identifying the objects and activities in every individual frame, the video is divided into video segments, where each segment is made up of a sequence of contiguous frames that includes the same objects/activities.
● Each segment is identified by its starting and ending frames.
● The objects and activities identified in each video segment can be used to index the segments.
● An indexing technique called frame segment trees has been proposed for video indexing.
● The index includes both objects, such as persons, houses, cars, and activities, such as a person delivering a speech or two people talking.
● Videos are also often compressed using standards such as MPEG.
Text
● A text/document source is basically the full text of some article, book, or magazine.
● These sources are typically indexed by identifying the keywords that appear in the text and their relative frequencies.
● The filler words are eliminated from that process.
● Because there could be too many keywords when attempting to index a collection of documents, techniques have been developed to reduce the number of keywords to those that are most relevant to the collection.
● A technique called singular value decomposition (SVD), which is based on matrix transformations, can be used for this purpose.
● An indexing technique called telescoping vector trees, or TV-trees, can then be used to group similar documents together.
Audio:
● Audio sources include stored recorded messages, such as speeches, class presentations, or even surveillance recording of phone messages or conversations by law enforcement.
● Here, discrete transforms can be used to identify the main characteristics of a certain person's voice in order to have similarity-based indexing and retrieval.
● Audio characteristic features include loudness, intensity, pitch, and clarity.

Document types
■ Mono medium
■ text, video, image, music, speech, graph, ...
■ multimedia
■ interlinked text document (e.g. XML, HTML)
■ hypermedia
■ interlinked multimedia documents

Nature of Multimedia data
■ Consists of alphanumeric, graphics, image, animation, audio, and visual information
■ From the presentation viewpoint, multimedia data is huge and involves time-dependent characteristics that must be adhered to
■ It does not matter whether these objects already exist or are created on the fly
■ The presentation and the subsequent interactions needed demand much more from a DBMS
■ Complex structure demands deriving semantics from contents
■ Real-world objects present in images, video, animation or discussed in audio are often the subject of a query
■ Techniques such as image interpretation, speech recognition, etc. can be used to derive information corresponding to multimedia objects
Multimedia data modeling is concerned with the logical and physical representation of multimedia objects, the relations among them, and the extraction and use of features in obtaining semantics
o Data must be stored and managed according to the specific characteristics of the storage media
✔ Descriptive Search Methods
o Query must be descriptive and content oriented
✔ Device-Independent Interface
✔ Format-Independent Interface
✔ View Specific & Simultaneous Data Access
o Same data can be accessed through different queries by different applications
✔ Management of Large Amounts of data
✔ Relational Consistency of Data Management
o Relations among data of one or different media should stay consistent
o These relations can be used for queries and data output
o Provides different descriptions / presentations of the same object
✔ Component Relation
o Consistency among all parts belonging to the same object
✔ Substitution Relation
o Concerned with different kinds of presentations of the same object
✔ Data transfer of real-time activity gets higher priority than other database activities
✔ Large Transactions
✔ Large transactions must be done in a reliable fashion, since they take a long time.
Like the traditional DBMS, MM-DBMS should address requirements:
▪ Integration
Data items do not need to be duplicated for different programs.
▪ Data independence
Separate the database and the management from the application programs.
▪ Concurrency control
Allows concurrent transactions.
▪ Integrity control
⮚ A hybrid of the first two. Certain media types use their own
indexes, while others use the "unified" index
⮚ An attempt to capture the advantages of the first two
⮚ Joins across multiple data sources using their native indexes.
– Media portals
– Standards become available: coding, delivery, and
description.
Network/device transparent
Quality of service (graceful degradation)
Intelligent tools and interfaces
Automated protection and transaction
Multimedia data types
⮚ Text
⮚ Image
⮚ Video
⮚ Audio
⮚ mixed multimedia data

Designing MMDBs
Characteristics of multimedia data that have an impact on the design of MMDBs include: the huge size of MMDBs, temporal nature, richness of content, complexity of representation and subjective interpretation. The major challenges in designing multimedia databases arise from several requirements they need to satisfy, such as the following:
• Manage different types of input, output, and storage devices. Data input can be from a variety of devices such as scanners and digital cameras for images, microphones and MIDI devices for audio, and video cameras. Typical output devices are high-resolution monitors for images and video, and speakers for audio.
• Handle a variety of data compression and storage formats. The data encoding has a variety of formats even within a single application. For instance, in medical applications, MRI images of the brain require lossless coding or lossy coding of very stringent quality, while the requirements for X-ray images of bones can be less stringent. Also, the radiological image data, the ECG data, other patient data, etc. have widely varying formats.
• Support different computing platforms and operating systems. Different users operate computers and devices suited to their needs and tastes, but they need the same kind of user-level view of the database.
• Integrate different data models. Some data such as numeric and textual data are best handled using a relational database model, while some others such as video documents are better handled using an object-oriented database model. So these two models should coexist in MMDBs.
• Offer a variety of user-friendly query systems suited to different kinds of media. From a user point of view, easy-to-use queries and fast and accurate retrieval of information are highly desirable. The query for the same item can be in different forms. For example, a portion of interest in a video can be queried by using either
1) a few sample video frames as an example,
2) a clip of the corresponding audio track or
3) a textual description using keywords
• Handle different kinds of indices. The inexact and subjective nature of multimedia data has rendered keyword-based indices and exact and range searches used in traditional databases ineffective. For example, the retrieval of records of persons based on social security number is precisely defined, but the retrieval of records of persons having certain facial features from a database of facial images requires content-based queries and
similarity-based retrievals. This requires indices that are content dependent, in addition to keyword indices.
• Develop measures of data similarity that correspond well with perceptual similarity. Measures of similarity for different media types need to be quantified to correspond well with the perceptual similarity of objects of those data types. These need to be incorporated into the search process.
• Provide transparent view of geographically distributed data. MMDBs are likely to be of a distributed nature. The media data resides in many different storage units possibly spread out geographically. This is partly due to the changing nature of computation and computing resources from centralized to networked and distributed.
• Distributed multimedia systems (NSF, AFRL, IBM, Intel, Siemens)
• High-performance multimedia database architecture for storage management (NSF, AT&T)
Multimedia Databases
● Data size is increasing, not just in numbers and small strings but in multimedia data as well – structured (text, images, video, audio, VR, etc.)
● Databases promise:
– well structured data organisation
– efficient storage of large amounts of data
– querying
– transactional support for concurrent users
– multimedia is large and may swamp other data
– multimedia data structures are completely different from standard database structures
– multimedia data structures do not easily lend themselves to content-based searching
Data integration
● Databases already integrate various kinds of data: numbers, dates, small text strings.
● They do this by the use of domains
– i.e. each atomic value in the database belongs to one of a small number of types
● each type has two aspects:
– a range of values which are acceptable
– some operations which are available
● the schema indicates a domain for each part of the database and the DBMS enforces the domain constraint
– e.g. in a Relational Database, each column is assigned a domain
● Therefore a DBMS must provide domain types for any kind of data that it is to house, and the overall structure will deal with the integration
● DBMS typically provide three different kinds of domain for multimedia data:
1. large object domains, sequences of data often of two kinds
● Binary Large Objects – BLOBs – which are an unstructured sequence of bytes
● Character Large Objects – CLOBs – which are an unstructured sequence of characters
2. file references – instead of holding the data, a file reference contains a link to the data (OLE in Access)
3. genuine multimedia data types – (Oracle and Jasmine)
There is an important difference between the last of these and the first two:
Multimedia DB Storage
Data Structure
Querying MM data
● With multimedia data this is more difficult and requires some method of identifying contents, of which there are two kinds:
● automatic identification
● an algorithm takes the data and returns a measure which can be compared – e.g. of blackness
● manual identification
● a person examines the data and catalogues it – e.g. in a table of pictures, there is a column for the picture and another for the painter
● There are three kinds of DBMS that might be used for housing multimedia data.
● Relational DBMS store everything as First Normal Form tables
– all data items are atomic and are held in rectangular tables
– data can only be related if they are in one or in two records connected by a common value (foreign key)
– records are identified only by content
– it is difficult (if not impossible) to extend the set of domains
● Object-oriented DBMS store everything as classes of objects
– data is related by object reference (i.e. one class variable has a type which is another class and the values of that variable are instances of that class)
– the set of classes is extensible and so we can freely create domains
● Object-relational DBMS are fundamentally relational but are not in First Normal Form
– the values in cells can be object references as well as atomic values
How can we use these different types?
● In a relational database, we can have:
– domain types for large objects
– using a string type for file names
– extra file types as in OLE in Access
● In an object-oriented database, we can have:
– specially designed classes for multimedia
● In an object-relational database, we can have:
– specially designed types for multimedia
R type database e.g. Access and OLE
● Object Linking and Embedding (OLE) was Microsoft’s first architecture for integrating files of different types:
● Each file type in Windows is associated with an application. It is possible to place a file of one type inside another:
– either by wholly embedding the data, in which case it is rendered by a plug-in associated with the program
– or by placing a link to the data, in which case it is rendered by calling the original program
● Access works with this system by providing a domain type for OLE
• There’s not much we can do with OLE fields since the data is in a format that Access does not understand
• We can plug the foreign data into a report or a form and little else
R databases e.g. BFILEs in Oracle
● The BFILE datatype provides access to BLOB files of up to 4 gigabytes that are stored in file systems outside an Oracle database. The BFILE datatype allows read-only support of large binary files; we cannot modify a file through Oracle. Oracle provides APIs to access file data.
● Oracle and SQL3 support three large object types:
– BLOB – The BLOB domain type stores unstructured binary data in the database. BLOBs can store up to four gigabytes of binary data.
– CLOB – The CLOB domain type stores up to four gigabytes of single-byte character set data
– NCLOB – The NCLOB domain type stores up to four gigabytes of fixed-width and varying-width multi-byte national character set data
* SQL3 is a significant extension to standard SQL which turns it into a full object-based language
● These types support
– Concatenation – making up one LOB by putting two of them together
– Substring – extract a section of a LOB
– Overlay – replace a substring of one LOB with another
– Trim – removing particular characters (e.g. whitespace) from the beginning or end
– Length – returns the length of the LOB
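The five LOB operations listed above can be illustrated with a minimal sketch. It uses Python bytes as a stand-in for a BLOB value; the function names and the 1-based positions (chosen to mirror SQL string conventions) are assumptions made for illustration and are not an Oracle or SQL3 API.

```python
# Illustrative only: a Python bytes object stands in for a BLOB value.
# Function names are hypothetical; positions are 1-based by assumption.

def lob_concat(a: bytes, b: bytes) -> bytes:
    """Concatenation: make one LOB by putting two together."""
    return a + b

def lob_substring(lob: bytes, start: int, length: int) -> bytes:
    """Substring: extract `length` bytes beginning at 1-based `start`."""
    return lob[start - 1:start - 1 + length]

def lob_overlay(lob: bytes, replacement: bytes, start: int, length: int) -> bytes:
    """Overlay: replace `length` bytes at 1-based `start` with `replacement`."""
    return lob[:start - 1] + replacement + lob[start - 1 + length:]

def lob_trim(lob: bytes, chars: bytes = b" ") -> bytes:
    """Trim: remove the given characters from the beginning and end."""
    return lob.strip(chars)

def lob_length(lob: bytes) -> int:
    """Length: number of bytes in the LOB."""
    return len(lob)

doc = lob_concat(b"  multimedia", b" database  ")
trimmed = lob_trim(doc)
```

In a real DBMS these operations run inside the server against data that may be gigabytes long, so they are usually exposed as piecewise read and write calls rather than whole-value functions like these.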
● LOBs can only appear in a where clause using “=”, “<>” or “like”, and not in group by or order by at all
Large Object Types in MySQL
MySQL has four BLOB and four CLOB (called TEXT in MySQL) domain types:
● TINYBLOB and TINYTEXT – store up to 255 bytes
● BLOB and TEXT – store up to 64K bytes
● MEDIUMBLOB and MEDIUMTEXT – store up to 16M bytes
● LONGBLOB and LONGTEXT – store up to 4G bytes
Oracle interMedia Audio, Image, and Video
● Oracle interMedia supports multimedia storage, retrieval, and management of:
– BLOBs stored locally in Oracle8i onwards and containing audio, image, or video data
– BFILEs, stored locally in operating system-specific file systems and containing audio, image or video data
– URLs containing audio, image, or video data stored on any HTTP server such as Oracle Application Server, Netscape Application Server, Microsoft Internet Information Server, Apache HTTPD server, and Spyglass servers
– Streaming audio or video data stored on specialized media servers such as the Oracle Video Server
The Object Relational Multimedia Domain Types in interMedia
● interMedia provides the ORDAudio, ORDImage, and ORDVideo object types and methods for:
– updateTime ORDSource attribute manipulation
– manipulating multimedia data source attribute information
– extracting attributes from multimedia data
– getting and managing multimedia data from Oracle interMedia, Web servers, and other servers
– performing a minimal set of manipulation operations on multimedia data (images only)
● The properties available are:
– ORDImage – the height, width, data size of the on-disk image, file type, image type, compression type, and MIME type
– ORDAudio – the format, encoding, number of channels, sampling rate, sample size, compression type, and audio duration
– ORDVideo – the format, frame size, frame resolution, frame rate, video duration, number of frames, compression type, number of colours, and bit rate
● Oracle also stores metadata including:
– source type, location, and source name
– Characteristics such as height and width of an image, number of audio channels, video frame rate, play time, etc.
OO databases – e.g. Jasmine
● Jasmine is an object-oriented database; an application known as Studio is its development environment
● It comes with a number of built-in classes, including four multimedia classes:
– Picture
– Image
– Video
– Audio
● These come with manipulation and compression facilities. They have also been made to fit well with the Java Media Framework
● At present we cannot do much with MM data; there are two reasons for this:
– indexing on multimedia data is not reasonable, nor is storing a default value
– other retrieval may be slowed down
2. The properties are not well understood or implementable in reasonable time
– what does it mean to say that one image is before another in order? Therefore there are few operators in the where clause that work
– At the moment, there is no reason for putting multimedia data into a relational database
– it just slows everything down
– and we can’t do very much
Disadvantages of OODBMS AND ORDBMS
– We could use an object relational or object oriented database
– now we can do more
– but the products are immature
There are three main reasons for integrating multimedia data with a database:
– a column for file names is good enough
– 2. Decorating Reports
– The OLE approach works well here. Otherwise a file name column and a simple application for generating the reports would do
– 3. Web Applications
– Again a file name column is good enough
§ A multimedia database management system (MM-DBMS) is a framework that manages different types of data potentially represented in a wide diversity of formats on a wide array of media sources.
§ Like the traditional DBMS, an MM-DBMS should address the following requirements:
o Integration
o Data independence
o Concurrency control
o Persistence
o Privacy
● playback
● rewind
● fast forward
● pause
Major Issues:
Presentation & Delivery Support
● a need to interact with other remote servers to assemble the presentation (or parts of it)
● a bound on the buffer, bandwidth, load, and other resources available on the system
● a mismatch between the host server's capabilities and the customer's machine capabilities
§ Presentations should optimize Quality of Service (QoS).
Operations On Data
■ Give the description of the real world object o, corresponding to s. User may click on s to select it.
■ Retrieve all the video shots of my friend, giving friend’s photo
■ These kinds of queries are not present in a conventional database management system
■ It is extremely rare for two images to match exactly
■ Match is a real value, 0 … 1
■ Indexing is complicated
■ Extract n numerical-valued features from the media object; the indexing methods can then be based on nearest-neighbor methods
■ The n features can be distinct / composite
A Sample Multimedia Scenario
§ Consider a police investigation of a large-scale drug operation. This investigation may generate the following types of data:
● Video data captured by surveillance cameras that record the activities taking place at various locations.
§ Police officer Rocky has a photograph in front of him.
§ He wants to find the identity of the person in the picture.
§ Query: “Retrieve all images from the image library in which the person appearing in the (currently displayed) photograph appears”
Image Query (by keywords):
§ Police officer Rocky wants to examine pictures of “Big Spender”.
§ Query: "Retrieve all images from the image library in which “Big Spender” appears."
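Queries of this kind rest on similarity rather than exact equality: each media object is reduced to n numerical features, and a match is a real value between 0 and 1. The following is a minimal sketch of that idea, assuming the feature vectors have already been extracted. The vectors, file names and the distance-to-score mapping are hypothetical, and the linear scan stands in for a real nearest-neighbor index.

```python
import math

def similarity(a, b):
    """Map the Euclidean distance between two feature vectors to a score in (0, 1].

    Identical vectors score 1.0; the score falls towards 0 as distance grows.
    """
    return 1.0 / (1.0 + math.dist(a, b))

def nearest_neighbours(query, library, k=2):
    """Return the k library entries most similar to the query, best match first."""
    scored = [(similarity(query, vec), name) for name, vec in library.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]

# Hypothetical 3-feature vectors for images in the library.
library = {
    "im1.gif": (0.9, 0.1, 0.3),
    "im2.gif": (0.2, 0.8, 0.5),
    "im3.gif": (0.85, 0.15, 0.35),
}

# Features extracted from the photograph currently in front of the officer.
query = (0.9, 0.1, 0.3)
matches = nearest_neighbours(query, library, k=2)
```

A production system would replace the scan with a multidimensional index so that only a small part of the library is compared with the query.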
§ Find all individuals who have been photographed with “Big Spender” and who have been convicted of attempted murder in South China and who have recently had electronic fund transfers made into their bank accounts from ABC Corp.
MM Database Architectures
§ Each media type is organized in a media-specific manner suitable for that media type
Based on Principle of Uniformity
§ A single abstract structure to index all media types
§ Abstract out the common part of different media types (difficult!) - metadata
§ Annotations for different media types
§ A hybrid of the first two. Certain media types use their own indexes, while others use the "unified" index
Organizing Multimedia Data Based on the Principle of Uniformity
§ Consider the following statements about media data; they may be made by a human or may be produced by the output of an image/video/text content retrieval engine.
● The image photo1.gif shows Jane Shady, “Big Spender” and an unidentified third person, in Sheung Shui. The picture was taken on January 5, 1997.
● The video-clip video1.mpg shows Jane Shady giving “Big Spender” a briefcase (in frames 50-100). The video was obtained from surveillance set up at Big Spender’s house in Kowloon Tong, in October, 1996.
● The document bigspender.txt contains background information on Big Spender, a police file.
Querying SMDSs (Uniform Representation)
Querying an SMDS is built on top of SQL.
Basic functions include:
§ FindType(Obj): This function takes a media object Obj as input, and returns the output type of the object. For example,
FindType(im1.gif) = gif.
FindType(movie1.mpg) = mpg.
§ FindObjWithFeature(f): This function takes a feature f as input and returns as output the set of all media objects that contain that feature. For example,
FindObjWithFeature(john)=
{im1.gif,im2.gif,im3.gif,video1.mpg:[1,5]}.
FindObjWithFeature(mary)=
{video1.mpg:[1,5],video1.mpg:[15,50]}.
§ The WHERE statement allows (in addition to standard SQL constructs), expressions of the form
to the existence of an unknown person whose identity is to be determined.
HM-SQL is exactly like SQL except that the SELECT, FROM, WHERE clauses are extended as follows:
§ the SELECT and FROM clauses are treated in exactly the same way as in SMDS-SQL.
§ The WHERE statement allows (in addition to standard SQL constructs) expressions of the form
term IN MS:func_call
where
1. term is either a variable (in which case it ranges over the output type of func_call) or an object having the same output type as func_call as defined in the media source MS, and
2. either MS = SMDS and func_call is one of the five SMDS functions, or
3. MS is not an SMDS media source, and func_call is a query in QL(MS).
§ Thus, there are 2 differences between HM-SQL and SMDS-SQL:
1. func_calls occurring in the WHERE clause must be explicitly annotated with the media-source involved, and
2. queries from the query languages of the individual (non-SMDS) media-source implementations may be embedded within an HM-SQL query. This latter feature makes HM-SQL very powerful indeed as it is, in principle, able to express queries in other, third-party, or legacy media implementations.
§ Find all video clips containing Big Spender, from both the video sources, video1 and video2, where the former is implemented via an SMDS and the latter is implemented via a legacy video database:
SELECT M FROM smds video1, videodb video2
WHERE M IN smds:FindObjWithFeature(Big Spender)
OR M IN videodb:FindVideoWithObject(Big Spender)
§ Find all people seen with Big Spender in either video1, video2, or idb.
(SELECT P1 FROM smds video1 V1
WHERE V1 IN smds:FindObjWithFeature(Big Spender)
AND P1 IN smds:FindFeaturesinObj(V1) AND P1 ≠ Big Spender)
UNION
(SELECT P2 FROM videodb video2 V2
WHERE V2 IN videodb:FindVideoWithObject(Big Spender)
AND P2 IN videodb:FindObjectsinVideo(V2) AND P2 ≠ Big Spender)
UNION
(SELECT P3 FROM imagedb idb I3
WHERE I3 IN imagedb:getpic(Big Spender)
AND P3 IN imagedb:getfeatures(I3) AND P3 ≠ Big Spender)
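To make the SMDS functions concrete, here is a minimal in-memory sketch of FindType, FindObjWithFeature and FindFeaturesinObj, followed by a "people seen with" query over a single source. The Python names, the dictionary layout and the sample features are assumptions for illustration; the john and mary entries are chosen to agree with the example results given in these notes, while the Big Spender entries are invented.

```python
# Hypothetical media source: media object -> set of features it contains.
# A video segment is keyed as (file name, (first frame, last frame)).
MEDIA = {
    "im1.gif": {"john", "big spender"},
    "im2.gif": {"john"},
    "im3.gif": {"john"},
    ("video1.mpg", (1, 5)): {"john", "mary"},
    ("video1.mpg", (15, 50)): {"mary", "big spender"},
}

def find_type(obj):
    """FindType: return the media type, taken here from the file extension."""
    name = obj[0] if isinstance(obj, tuple) else obj
    return name.rsplit(".", 1)[-1]

def find_obj_with_feature(feature):
    """FindObjWithFeature: all media objects that contain the feature."""
    return {obj for obj, features in MEDIA.items() if feature in features}

def find_features_in_obj(obj):
    """FindFeaturesinObj: all features recorded for one media object."""
    return set(MEDIA.get(obj, set()))

def seen_with(person):
    """Everyone who shares at least one media object with `person`."""
    others = set()
    for obj in find_obj_with_feature(person):
        others |= find_features_in_obj(obj)
    others.discard(person)  # corresponds to the P ≠ Big Spender condition
    return others

companions = seen_with("big spender")
```

In HM-SQL the same logic would be repeated once per media source and the partial results combined with UNION, with each function call annotated by its source.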
Connective Summary
When faced with the problem of creating a multimedia database, we must consider:
§ What kinds of media data the MM database should provide access to
§ Whether legacy algorithms already exist (and are available) to index this data reliably and accurately using content-based indexing methods
§ Whether to use a uniform representation or a hybrid representation
TOPIC: SPATIAL DATABASE
Spatial databases provide concepts for databases that keep track of objects in a multidimensional space.
Example
● For example, cartographic databases that store maps include two-dimensional spatial descriptions of their objects, from countries and states to rivers, cities, roads, seas, and so on.
● These applications are also known as Geographical Information Systems (GIS), and are used in areas such as environmental, emergency, and battle management. Other databases, such as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points.
● A spatial database stores objects that have spatial characteristics that describe them.
● The spatial relationships among the objects are important, and they are often needed when querying the database.
● A spatial database can in general refer to an n-dimensional space for any n.
● The main extensions that are needed for spatial databases are models that can interpret spatial characteristics.
● Special indexing and storage structures are often needed to improve performance.
Model extensions for two-dimensional spatial databases:
● The basic extensions needed are to include two-dimensional geometric concepts, such as points, lines and line segments, circles, polygons, and arcs, in order to specify the spatial characteristics of objects.
● In addition, spatial operations are needed to operate on the objects' spatial characteristics (for example, to compute the distance between two objects) as well as spatial Boolean conditions (for example, to check whether two objects spatially overlap).
Example
● To illustrate, consider a database that is used for emergency management applications.
● A description of the spatial positions of many types of objects would be needed.
● Some of these objects generally have static spatial characteristics, such as streets and highways, water pumps (for fire control), police stations, fire stations, and hospitals.
● Other objects have dynamic spatial characteristics that change over time, such as police vehicles, ambulances, or fire trucks.
● The following categories illustrate three typical types of spatial queries:
• Range query: Finds the objects of a particular type that are within a given spatial area or within a particular distance from a given location. (For example, finds all hospitals within the Dallas city area, or finds all ambulances within five miles of an accident location.)
• Nearest neighbor query: Finds an object of a particular type that is closest to a given location. (For example, finds the police car that is closest to a particular location.)
• Spatial joins or overlays: Typically joins the objects of two types based on some spatial condition, such as the objects intersecting or overlapping spatially or being within a certain distance of one another. (For example, finds all cities that fall on a major highway or finds all homes that are within two miles of a lake.)
R+ trees:
● For these and other types of spatial queries to be answered efficiently, special techniques for spatial indexing are needed.
● One of the best known techniques is the use of R+ trees and their variations.
● R+ trees group together objects that are in close spatial physical proximity on the same leaf nodes of a tree-structured index.
● Since a leaf node can point to only a certain number of objects, algorithms for dividing the space into rectangular subspaces that include the objects are needed.
● Typical criteria for dividing the space include minimizing the rectangle areas, since this would lead to a quicker narrowing of the search space.
● Problems such as having objects with overlapping spatial areas are handled in different ways by the many different variations of R+ trees.
● The internal nodes of R+ trees are associated with rectangles whose area covers all the rectangles in their subtrees.
● Hence, R+ trees can easily answer queries such as "find all objects in a given area" by limiting the tree search to those subtrees whose rectangles intersect with the area given in the query.
Quad trees
● Other spatial storage structures include quadtrees and their variations.
● Quadtrees generally divide each space or subspace into equally sized areas, and proceed with the subdivisions of each subspace to identify the positions of various objects.
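The idea shared by these spatial indexes, dividing space into rectangles and searching only the subtrees whose rectangle intersects the query area, can be sketched with a small point quadtree that answers range queries. This is a simplified illustration and not an R+ tree; the class name, the leaf capacity of 2, the depth limit and the sample coordinates are all assumptions.

```python
class QuadTree:
    """Point quadtree over the square [x, x + size) by [y, y + size)."""

    def __init__(self, x, y, size, capacity=2, depth=0, max_depth=16):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity      # points a leaf holds before it splits
        self.depth = depth
        self.max_depth = max_depth    # stops endless splitting on duplicate points
        self.points = []              # (px, py, label) tuples stored in a leaf
        self.children = None          # four equally sized sub-squares after a split

    def insert(self, px, py, label):
        """Insert a labelled point; return False if it lies outside the region."""
        inside = (self.x <= px < self.x + self.size
                  and self.y <= py < self.y + self.size)
        if inside:
            self._insert((px, py, label))
        return inside

    def _insert(self, point):
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(point)
                return
            self._split()
        self._child_for(point[0], point[1])._insert(point)

    def _child_for(self, px, py):
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[2 * row + col]

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half,
                     self.capacity, self.depth + 1, self.max_depth)
            for dy in (0, 1) for dx in (0, 1)
        ]
        pending, self.points = self.points, []
        for point in pending:
            self._child_for(point[0], point[1])._insert(point)

    def range_query(self, x0, y0, x1, y1):
        """Labels of all points with x0 <= px <= x1 and y0 <= py <= y1."""
        # Skip this whole subtree when its square cannot meet the query rectangle.
        if (x1 < self.x or x0 >= self.x + self.size
                or y1 < self.y or y0 >= self.y + self.size):
            return []
        found = [label for px, py, label in self.points
                 if x0 <= px <= x1 and y0 <= py <= y1]
        for child in self.children or []:
            found.extend(child.range_query(x0, y0, x1, y1))
        return found


# Hypothetical emergency-management data on a 100 x 100 map.
city = QuadTree(0, 0, 100)
for px, py, label in [(10, 10, "hospital A"), (60, 20, "hospital B"),
                      (55, 70, "hospital C"), (90, 90, "fire station"),
                      (12, 14, "hospital D")]:
    city.insert(px, py, label)

nearby = sorted(city.range_query(0, 0, 50, 50))
```

The early return in range_query is where the saving comes from: a subtree whose square lies entirely outside the query area is never visited.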