
LOYOLA-ICAM

COLLEGE OF ENGINEERING AND TECHNOLOGY


LOYOLA COLLEGE CAMPUS, NUNGUMBAKKAM, CH – 34
CS2029 ADVANCED DATABASE TECHNOLOGY
UNIT V
CURRENT ISSUES

UNIT V CURRENT ISSUES

Rules - Knowledge Bases - Active and Deductive Databases - Multimedia Databases - Multimedia Data Structures - Multimedia Query Languages - Spatial Databases.

TOPIC 1: Rules

Rules:

Rule-based expert systems

● Knowledge is a theoretical or practical understanding of a subject or a domain.
● Knowledge is also the sum of what is currently known, and apparently knowledge is power.
● Those who possess knowledge are called experts.
● Anyone can be considered a domain expert if he or she has deep knowledge of both facts and rules and strong practical experience in a particular domain.
● The area of the domain may be limited.
● In general, an expert is a skilful person who can do things other people cannot.
● The human mental process is internal, and it is too complex to be represented as an algorithm.
● Most experts are capable of expressing their knowledge in the form of rules for problem solving.

IF the ‘traffic light’ is green
THEN the action is go

IF the ‘traffic light’ is red
THEN the action is stop

Rules as a knowledge representation technique

● The term rule in AI, which is the most commonly used type of knowledge representation, can be defined as an IF-THEN structure that relates given information or facts in the IF part to some action in the THEN part.
● A rule provides some description of how to solve a problem.
● Rules are relatively easy to create and understand.
● Any rule consists of two parts: the IF part, called the antecedent (premise or condition), and the THEN part, called the consequent (conclusion or action).

IF <antecedent>
THEN <consequent>

● A rule can have multiple antecedents joined by the keywords AND (conjunction), OR (disjunction) or a combination of both.

IF <antecedent 1> AND <antecedent 2>
.
.
.
AND <antecedent n>
THEN <consequent>

IF <antecedent 1> OR <antecedent 2>
.
.
.
OR <antecedent n>
THEN <consequent>

● The antecedent of a rule incorporates two parts: an object (linguistic object) and its value. The object and its value are linked by an operator.
● The operator identifies the object and assigns the value. Operators such as is, are, is not, are not are used to assign a symbolic value to a linguistic object.
● Expert systems can also use mathematical operators to define an object as numerical and assign it a numerical value.

IF ‘age of the customer’ < 18
AND ‘cash withdrawal’ > 1000
THEN ‘signature of the parent’ is required

Rules can represent relations, recommendations, directives, strategies and heuristics:

● Relation

IF the ‘fuel tank’ is empty
THEN the car is dead

● Recommendation

IF the season is autumn
AND the sky is cloudy
AND the forecast is drizzle
THEN the advice is ‘take an umbrella’

● Directive

IF the car is dead
AND the ‘fuel tank’ is empty
THEN the action is ‘refuel the car’

● Strategy

IF the car is dead
THEN the action is ‘check the fuel tank’;
step1 is complete

IF step1 is complete
AND the ‘fuel tank’ is full
THEN the action is ‘check the battery’;
step2 is complete

● Heuristic

IF the spill is liquid
AND the ‘spill pH’ < 6
AND the ‘spill smell’ is vinegar
THEN the ‘spill material’ is ‘acetic acid’
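The object-operator-value structure of an antecedent described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the names Antecedent, Rule, OPERATORS and is_satisfied are assumptions made for this example and do not come from any expert system shell.

```python
import operator
from dataclasses import dataclass

# Operators that link a linguistic object to its value (illustrative names).
OPERATORS = {
    "is": operator.eq,
    "is not": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


@dataclass(frozen=True)
class Antecedent:
    obj: str       # linguistic object, e.g. "age of the customer"
    op: str        # key into OPERATORS
    value: object  # symbolic or numerical value

    def holds(self, facts: dict) -> bool:
        # An antecedent about an unknown object is treated as not satisfied.
        if self.obj not in facts:
            return False
        return OPERATORS[self.op](facts[self.obj], self.value)


@dataclass(frozen=True)
class Rule:
    antecedents: tuple       # tuple of Antecedent
    consequent: tuple        # (object, value) asserted when the rule fires
    connective: str = "AND"  # "AND" (conjunction) or "OR" (disjunction)

    def is_satisfied(self, facts: dict) -> bool:
        results = [a.holds(facts) for a in self.antecedents]
        return all(results) if self.connective == "AND" else any(results)


# The cash withdrawal rule from the notes.
withdrawal_rule = Rule(
    antecedents=(
        Antecedent("age of the customer", "<", 18),
        Antecedent("cash withdrawal", ">", 1000),
    ),
    consequent=("signature of the parent", "required"),
)

minor = {"age of the customer": 16, "cash withdrawal": 1500}
adult = {"age of the customer": 30, "cash withdrawal": 1500}
print(withdrawal_rule.is_satisfied(minor))  # True
print(withdrawal_rule.is_satisfied(adult))  # False
```

Keeping the rule as plain data, separate from the code that evaluates it, mirrors the separation between knowledge and its processing that the later sections emphasise.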



The main players in the development team

● There are five members of the expert system development team: the domain expert, the knowledge engineer, the programmer, the project manager and the end-user.

The success of their expert system entirely depends on how well the members work together.

Figure: The main players in the development team

● The domain expert is a knowledgeable and skilled person capable of solving problems in a specific area or domain. This person has the greatest expertise in a given domain. This expertise is to be captured in the expert system. Therefore, the expert must be able to communicate his or her knowledge, be willing to participate in the expert system development and commit a substantial amount of time to the project. The domain expert is the most important player in the expert system development team.
● The knowledge engineer is someone who is capable of designing, building and testing an expert system. He or she interviews the domain expert to find out how a particular problem is solved. The knowledge engineer establishes what reasoning methods the expert uses to handle facts and rules and decides how to represent them in the expert system. The knowledge engineer then chooses some development software or an expert system shell, or looks at programming languages for encoding the knowledge. And finally, the knowledge engineer is responsible for testing, revising and integrating the expert system into the workplace.
● The programmer is the person responsible for the actual programming, describing the domain knowledge in terms that a computer can understand. The programmer needs to have skills in symbolic programming in such AI languages as LISP, Prolog and OPS5 and also some experience in the application of different types of expert system shells. In addition, the programmer should know conventional programming languages like C, Pascal, FORTRAN and Basic.
● The project manager is the leader of the expert system development team, responsible for keeping the project on track. He or she makes sure that all deliverables and milestones are met, and interacts with the expert, knowledge engineer, programmer and end-user.
● The end-user, often called just the user, is a person who uses the expert system when it is developed. The user must not only be confident in the expert system performance but also feel comfortable using it. Therefore, the design of the user interface of the expert system is also vital for the project’s success; the end-user’s contribution here can be crucial.

Structure of a rule-based expert system

● In the early seventies, Newell and Simon from Carnegie-Mellon University proposed a production system model, the foundation of the modern rule-based expert systems.
● The production model is based on the idea that humans solve problems by applying their knowledge (expressed as production rules) to a given problem represented by problem-specific information.
● The production rules are stored in the long-term memory and the problem-specific information or facts in the short-term memory.

Figure: Production system model

Figure: Basic structure of a rule-based expert system

● The knowledge base contains the domain knowledge useful for problem solving. In a rule-based expert system, the knowledge is represented as a set of rules. Each rule specifies a relation, recommendation, directive, strategy or heuristic and has the IF (condition) THEN (action) structure. When the condition part of a rule is satisfied, the rule is said to fire and the action part is executed.
● The database includes a set of facts used to match against the IF (condition) parts of rules stored in the knowledge base.
● The inference engine carries out the reasoning whereby the expert system reaches a solution. It links the rules given in the knowledge base with the facts provided in the database.
● The explanation facilities enable the user to ask the expert system how a particular conclusion is reached and why a specific fact is needed. An expert system must be able to explain its reasoning and justify its advice, analysis or conclusion.
● The user interface is the means of communication between a user seeking a solution to the problem and an expert system.

Figure: Complete structure of a rule-based expert system

Characteristics of an expert system

● An expert system is built to perform at a human expert level in a narrow, specialised domain. Thus, the most important characteristic of an expert system is its high-quality performance. No matter how fast the system can solve a problem, the user will not be satisfied if the result is wrong.
● On the other hand, the speed of reaching a solution is very important. Even the most accurate decision or diagnosis may not be useful if it is too late to apply, for instance, in an emergency, when a patient dies or a nuclear power plant explodes.
● Expert systems apply heuristics to guide the reasoning and thus reduce the search area for a solution.
● A unique feature of an expert system is its explanation capability. It enables the expert system to review its own reasoning and explain its decisions.
● Expert systems employ symbolic reasoning when solving a problem. Symbols are used to represent different types of knowledge such as facts, concepts and rules.

Can expert systems make mistakes?

● Even a brilliant expert is only human and thus can make mistakes. This suggests that an expert system built to perform at a human expert level should also be allowed to make mistakes. But we still trust experts, even though we recognise that their judgements are sometimes wrong. Likewise, at least in most cases, we can rely on solutions provided by expert systems, but mistakes are possible and we should be aware of this.
● In expert systems, knowledge is separated from its processing (the knowledge base and the inference engine are split up). A conventional program is a mixture of knowledge and the control structure to process this knowledge. This mixing leads to difficulties in understanding and reviewing the program code, as any change to the code affects both the knowledge and its processing.
● When an expert system shell is used, a knowledge engineer or an expert simply enters rules in the knowledge base. Each new rule adds some new knowledge and makes the expert system smarter.

Comparison of expert systems with conventional systems and human experts

Comparison of expert systems with conventional systems and human experts (Continued)

| Human Experts | Expert Systems | Conventional Programs |
| Use inexact reasoning and can deal with incomplete, uncertain and fuzzy information. | Permit inexact reasoning and can deal with incomplete, uncertain and fuzzy data. | Work only on problems where data is complete and exact. |
| Can make mistakes when information is incomplete or fuzzy. | Can make mistakes when data is incomplete or fuzzy. | Provide no solution at all, or a wrong one, when data is incomplete or fuzzy. |
| Enhance the quality of problem solving via years of learning and practical training. This process is slow, inefficient and expensive. | Enhance the quality of problem solving by adding new rules or adjusting old ones in the knowledge base. When new knowledge is acquired, changes are easy to accomplish. | Enhance the quality of problem solving by changing the program code, which affects both the knowledge and its processing, making changes difficult. |

Forward chaining and backward chaining

● In a rule-based expert system, the domain knowledge is represented by a set of IF-THEN production rules and data is represented by a set of facts about the current situation. The inference engine compares each rule stored in the knowledge base with facts contained in the database. When the IF (condition) part of the rule matches a fact, the rule is fired and its THEN (action) part is executed.
● The matching of the rule IF parts to the facts produces inference chains. The inference chain indicates how an expert system applies the rules to reach a conclusion.

Figure: Inference engine cycles via a match-fire procedure

Figure: An example of an inference chain

Forward chaining

● Forward chaining is data-driven reasoning. The reasoning starts from the known data and proceeds forward with that data. Each time only the topmost rule is executed. When fired, the rule adds a new fact to the database. Any rule can be executed only once. The match-fire cycle stops when no further rules can be fired.
● Forward chaining is a technique for gathering information and then inferring from it whatever can be inferred.
● However, in forward chaining, many rules may be executed that have nothing to do with the established goal.
● Therefore, if our goal is to infer only one particular fact, the forward chaining inference technique would not be efficient.
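The match-fire cycle of forward chaining can be sketched as follows. This is a minimal illustration, not the implementation of any particular shell: the function name forward_chain and the five-rule knowledge base (letters standing for facts) are hypothetical and chosen only for the example.

```python
def forward_chain(rules, facts):
    """Data-driven reasoning over rules given in priority (topmost-first) order.

    rules: list of (name, antecedent_facts, new_fact)
    facts: iterable of initially known facts
    Returns the final database and the names of the fired rules, in firing order.
    """
    database = set(facts)
    fired = []
    while True:
        for name, antecedents, consequent in rules:
            # Match: all antecedents are in the database and the rule has not fired yet.
            if name not in fired and set(antecedents) <= database:
                database.add(consequent)  # Fire: add the new fact.
                fired.append(name)
                break                     # Only the topmost matching rule fires per cycle.
        else:
            # No rule could be fired in this cycle, so the match-fire cycle stops.
            return database, fired


# Hypothetical knowledge base: each letter stands for a fact.
RULES = [
    ("Rule 1", ["Y", "D"], "Z"),
    ("Rule 2", ["X", "B", "E"], "Y"),
    ("Rule 3", ["A"], "X"),
    ("Rule 4", ["C"], "L"),
    ("Rule 5", ["L", "M"], "N"),
]

database, fired = forward_chain(RULES, {"A", "B", "C", "D", "E"})
print(fired)             # ['Rule 3', 'Rule 2', 'Rule 1', 'Rule 4']
print(sorted(database))  # ['A', 'B', 'C', 'D', 'E', 'L', 'X', 'Y', 'Z']
```

If the goal were only to establish Z, the firing of Rule 4 would be wasted work, which is the inefficiency the last two bullets describe.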

Backward chaining

● Backward chaining is goal-driven reasoning. In backward chaining, an expert system has the goal (a hypothetical solution) and the inference engine attempts to find the evidence to prove it. First, the knowledge base is searched to find rules that might have the desired solution. Such rules must have the goal in their THEN (action) parts. If such a rule is found and its IF (condition) part matches data in the database, then the rule is fired and the goal is proved. However, this is rarely the case.
● Thus the inference engine puts aside the rule it is working with (the rule is said to stack) and sets up a new goal, a subgoal, to prove the IF part of this rule. Then the knowledge base is searched again for rules that can prove the subgoal. The inference engine repeats the process of stacking the rules until no rules are found in the knowledge base to prove the current subgoal.

How do we choose between forward and backward chaining?

● If an expert first needs to gather some information and then tries to infer from it whatever can be inferred, choose the forward chaining inference engine.
● However, if your expert begins with a hypothetical solution and then attempts to find facts to prove it, choose the backward chaining inference engine.

Conflict resolution

Earlier we considered two simple rules for crossing a road. Let us now add a third rule:

● Rule 1:
  o IF the ‘traffic light’ is green
  o THEN the action is go
● Rule 2:
  o IF the ‘traffic light’ is red
  o THEN the action is stop
● Rule 3:
  o IF the ‘traffic light’ is red
  o THEN the action is go
● We have two rules, Rule 2 and Rule 3, with the same IF part. Thus both of them can be set to fire when the condition part is satisfied. These rules represent a conflict set. The inference engine must determine which rule to fire from such a set. A method for choosing a rule to fire when more than one rule can be fired in a given cycle is called conflict resolution.
● In forward chaining, BOTH rules would be fired. Rule 2 is fired first as the topmost one, and as a result, its THEN part is executed and linguistic object action obtains value stop. However, Rule 3 is also fired because the condition part of this rule matches the fact ‘traffic light’ is red, which is still in the database. As a consequence, object action takes new value go.

Methods used for conflict resolution

● Fire the rule with the highest priority. In simple applications, the priority can be established by placing the rules in an appropriate order in the knowledge base. Usually this strategy works well for expert systems with around 100 rules.
● Fire the most specific rule. This method is also known as the longest matching strategy. It is based on the assumption that a specific rule processes more information than a general one.
● Fire the rule that uses the data most recently entered in the database. This method relies on time tags attached to each fact in the database. In the conflict set, the expert system first fires the rule whose antecedent uses the data most recently added to the database.

Metaknowledge

● Metaknowledge can be simply defined as knowledge about knowledge. Metaknowledge is knowledge about the use and control of domain knowledge in an expert system.
● In rule-based expert systems, metaknowledge is represented by metarules. A metarule determines a strategy for the use of task-specific rules in the expert system.

Metarules

● Metarule 1:
  ▪ Rules supplied by experts have higher priorities than rules supplied by novices.
● Metarule 2:
  ▪ Rules governing the rescue of human lives have higher priorities than rules concerned with clearing overloads on power system equipment.

Advantages of rule-based expert systems

● Natural knowledge representation. An expert usually explains the problem-solving procedure with such expressions as this: “In such-and-such situation, I do so-and-so”. These expressions can be represented quite naturally as IF-THEN production rules.
● Uniform structure. Production rules have the uniform IF-THEN structure. Each rule is an independent piece of knowledge. The very syntax of production rules enables them to be self-documented.
● Separation of knowledge from its processing. The structure of a rule-based expert system provides an effective separation of the knowledge base from the inference engine. This makes it possible to develop different applications using the same expert system shell.
● Dealing with incomplete and uncertain knowledge. Most rule-based expert systems are capable of representing and reasoning with incomplete and uncertain knowledge.

Disadvantages of rule-based expert systems

● Opaque relations between rules. Although the individual
production rules are relatively simple and self-documented, their
logical interactions within the large set of rules may be opaque.
Rule-based systems make it difficult to observe how individual
rules serve the overall strategy.
● Ineffective search strategy. The inference engine applies an
exhaustive search through all the production rules during each
cycle. Expert systems with a large set of rules (over 100 rules) can
be slow, and thus large rule-based systems can be unsuitable for
real-time applications.
● Inability to learn. In general, rule-based expert systems do not
have the ability to learn from experience. Unlike a human
expert, who knows when to “break the rules”, an expert system
cannot automatically modify its knowledge base, or adjust existing
rules or add new ones. The knowledge engineer is still responsible
for revising and maintaining the system.
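Returning to the backward chaining procedure described earlier in this topic, the stacking of rules and the setting up of subgoals can be sketched with a short recursive function. This is a simplified illustration under stated assumptions: the function name backward_chain and the five-rule knowledge base are hypothetical, and the Python call stack plays the role of the rule stack.

```python
def backward_chain(goal, rules, facts, stack=()):
    """Goal-driven reasoning: try to prove `goal` from `facts` using `rules`.

    rules: list of (name, antecedent_facts, consequent)
    stack: names of the rules currently set aside (stacked); used to avoid loops
    """
    if goal in facts:
        return True  # The goal is already in the database.
    for name, antecedents, consequent in rules:
        # Only rules with the goal in their THEN part can prove it.
        if consequent != goal or name in stack:
            continue
        # Stack this rule and try to prove each IF part as a subgoal.
        if all(backward_chain(sub, rules, facts, stack + (name,))
               for sub in antecedents):
            return True
    return False  # No rule in the knowledge base proves the current (sub)goal.


# Hypothetical knowledge base: each letter stands for a fact.
RULES = [
    ("Rule 1", ["Y", "D"], "Z"),
    ("Rule 2", ["X", "B", "E"], "Y"),
    ("Rule 3", ["A"], "X"),
    ("Rule 4", ["C"], "L"),
    ("Rule 5", ["L", "M"], "N"),
]
FACTS = {"A", "B", "C", "D", "E"}

print(backward_chain("Z", RULES, FACTS))  # True: Z needs Y (Rule 2), which needs X (Rule 3)
print(backward_chain("N", RULES, FACTS))  # False: M is neither a fact nor provable
```

Unlike forward chaining, proving Z here never touches Rule 4 or Rule 5, because only rules relevant to the goal are examined.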

TOPIC 2: KNOWLEDGE BASES

KNOWLEDGE BASES:
● Knowledge-based systems, expert systems
  o structure, characteristics
  o main components
  o advantages, disadvantages
● Base techniques of knowledge-based systems
  o rule-based techniques
  o inductive techniques
  o hybrid techniques
  o symbol-manipulation techniques
  o case-based techniques
  o (qualitative techniques, model-based techniques, temporal reasoning techniques, neural networks)

A KNOWLEDGE BASED SYSTEM can be defined as:
“A computer program that uses knowledge of the application domain to solve problems in that domain, obtaining essentially the same solutions that a person with experience in the same domain would obtain.”

Three fundamental differences between a KNOWLEDGE BASED SYSTEM and other types of software systems are:
1. Separation between knowledge and the use of knowledge
2. Utilization of knowledge specific to a domain
3. Heuristic nature (as opposed to algorithmic)

• The fundamental characteristic of a KNOWLEDGE BASED SYSTEM:
  – Separating knowledge from the application of knowledge
• The KNOWLEDGE BASED SYSTEM embodies a program whose goal is to apply (interpret) the given knowledge.

In a Knowledge Based System:
Program + knowledge = Knowledge based system

A KNOWLEDGE BASED SYSTEM is something more than a program which just copies the algorithm/formula used by the expert. It must be able to use the information in an "intelligent" way.

Structure and characteristics
● KNOWLEDGE BASED SYSTEMs are computer systems that
  o contain stored knowledge
  o solve problems like humans would
● KNOWLEDGE BASED SYSTEMs are AI programs with a new type of program structure. They have
  o a knowledge-base (rules, facts, meta-knowledge)
  o an inference engine (reasoning and search strategy for solution, other services)
  o and problem data
● characteristics of KNOWLEDGE BASED SYSTEMs:
  o intelligent information processing systems
  o representation of domain of interest → symbolic representation
  o problem solving → by symbol-manipulation
  o → symbolic programs

Main components
● knowledge-base (KB) consists of
  o knowledge about the field of interest (in natural language-like formalism)
  o symbolically described system-specification
  o KNOWLEDGE-REPRESENTATION METHOD!
● inference engine
  o is the “engine” of problem solving (general problem solving knowledge)
  o is used for supporting the operation of the other components
  o has PROBLEM SOLVING METHOD!
● case-specific database
  o auxiliary component
  o specific information (information from outside, initial data of the concrete problem)
  o information obtained during reasoning
● explanation subsystem
  o explanation of the system’s actions at the user’s request
  o typical explanation facilities:
  o explanation during problem solving:
    ▪ WHY ... (explanative reasoning, intelligent help, tracing information about the actual reasoning steps)
    ▪ WHAT IF ... (hypothetical reasoning, conditional assignment and its consequences, can be withdrawn)
    ▪ WHAT IS ... (gleaning in knowledge-base and case-specific database)
  o explanation after problem solving:
    ▪ HOW ... (explanative reasoning, information about the way the result has been found)
    ▪ WHY NOT ... (explanative reasoning, finding counterexamples)

    ▪ WHAT IS ... (gleaning in knowledge-base and case-specific database)
● knowledge acquisition subsystem
  o main tasks:
    ▪ checking the syntax of knowledge elements
    ▪ checking the consistency of KB (verification, validation)
    ▪ knowledge extraction, building KB
    ▪ automatic logging and book-keeping of the changes of KB
    ▪ tracing facilities (handling breakpoints, automatic monitoring and reporting the values of knowledge elements)
● user interface (→ user)
  o dialogue in natural language (consultation/suggestion)
● special interfaces
  o database and other connections
● developer interface (→ knowledge engineer, human expert)
● the main tasks of the knowledge engineer:
  o knowledge acquisition and design of KNOWLEDGE BASED SYSTEM: determination, classification, refinement and formalization of methods, rules of thumb and procedures
  o selection of knowledge representation method and reasoning strategy
  o implementation of knowledge-based system
  o verification and validation of KB
  o KB maintenance

● Knowledge databases are the databases used for knowledge management.
● Knowledge management is the way to gather, manage and use the knowledge of an organisation.
● The basic objectives of knowledge management are to achieve improved performance, competitive advantage and higher levels of innovation in various tasks of an organisation.
● Knowledge is the key to such systems. Knowledge has several aspects:
  o Knowledge can be implicit (called tacit knowledge), which is internalised, or it can be explicit.
  o Knowledge can be captured before, during, or even after a knowledge activity is conducted.
  o Knowledge can be represented in logical form, semantic network form or database form.
  o Knowledge, once properly represented, can be used to generate more knowledge using automated deductive reasoning.

  o Knowledge may sometimes be incomplete. In fact, one of the most important aspects of the knowledge base is that it should contain up-to-date, high-quality information.
● Simple knowledge databases may consist of the explicit knowledge of an organisation, including articles, user manuals, white papers, troubleshooting information etc.
● Such a knowledge base would provide basic solutions to some of the problems of the less experienced employees.
● A good knowledge base should have:
  o good quality articles having up-to-date information,
  o a good classification structure,
  o a good content format, and
  o an excellent search engine for information retrieval.

Application of Knowledge Database
● One of the knowledge base technologies is based on deductive database technology.

Deductive Databases
A deductive database is a database system that can be used to make deductions from the available rules and facts that are stored in such databases. The following are the key characteristics of deductive databases:
  o the information in such systems is specified using a declarative language in the form of rules and facts,
  o an inference engine that is contained within the system is used to deduce new facts from the database of rules and facts,
  o these databases use concepts from the relational database domain (relational calculus) and the logic programming domain (Prolog language),
  o the variant of Prolog known as Datalog is used in deductive databases. Datalog has a different way of executing programs than Prolog, and
  o the data in such databases is specified with the help of facts and rules. For example, the fact that Rakesh is the manager of Mohan will be represented as:

    Manager(Rakesh, Mohan)

    having the schema:

    Manager(Mgrname, beingmanaged)

    Similarly, the following represents a rule:

    Manager(Rakesh, Mohan) :- Managedby(Mohan, Rakesh)

  o In the representation of a fact, the data is represented using the attribute value only and not the attribute name. The attribute name is determined by the position of the data. For instance, in the example above Rakesh is the Mgrname.
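The way a deductive database derives new facts from stored facts and rules can be imitated in a few lines of Python. This is only a sketch: real Datalog engines are far more general, the bottom-up (repeat until nothing new is derived) evaluation shown here is one common way Datalog programs are executed, and both the Superior predicate and the employee Sita are hypothetical additions made for illustration. The rule from the notes is written with variables, as Manager(X, Y) :- Managedby(Y, X).

```python
# Facts are stored positionally as (predicate, arg1, arg2); attribute names are
# implied by position, as in Manager(Mgrname, beingmanaged).
FACTS = {
    ("Managedby", "Mohan", "Rakesh"),
    ("Managedby", "Sita", "Mohan"),   # hypothetical extra fact
}


def evaluate(facts):
    """Naive bottom-up evaluation: apply all rules until no new fact appears.

    Rules (the last two are assumptions added for this example):
      Manager(X, Y)  :- Managedby(Y, X)
      Superior(X, Y) :- Manager(X, Y)
      Superior(X, Z) :- Manager(X, Y), Superior(Y, Z)
    """
    db = set(facts)
    while True:
        new = set()
        for pred, a, b in db:
            if pred == "Managedby":
                new.add(("Manager", b, a))
            if pred == "Manager":
                new.add(("Superior", a, b))
                for pred2, c, d in db:
                    if pred2 == "Superior" and c == b:
                        new.add(("Superior", a, d))
        if new <= db:        # Fixpoint reached: nothing new was deduced.
            return db
        db |= new


db = evaluate(FACTS)
print(("Manager", "Rakesh", "Mohan") in db)   # True
print(("Superior", "Rakesh", "Sita") in db)   # True
```

Note that the rules themselves hold no data; they only say how stored facts combine into new ones.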

The rules in Datalog do not contain the data. They are evaluated on the basis of the stored data in order to deduce more information.

● Deductive databases normally operate in very narrow problem domains. These databases are quite close to expert systems, except that deductive databases use the database to store facts and rules, whereas expert systems store facts and rules in the main memory.
● Expert systems also obtain their knowledge from experts, whereas deductive databases have their knowledge in the data.
● Deductive databases are applied to knowledge discovery and hypothesis testing.

Semantic Databases
● Information in most database management systems is represented using a simple table with records and fields.
● However, simple database models fall short for applications that require complex relationships and rich constructs to be represented using the database.
● Semantic modeling provides a far richer set of data structuring capabilities for database applications.
● A semantic model contains many more constructs, which can represent structurally complex inter-relations among data in a somewhat more natural way.
● Such complex inter-relationships typically occur in commercial applications.
● Semantic modeling is one of the tools for representing knowledge, especially in Artificial Intelligence and object-oriented applications.
● Thus, it may be a good idea to model some of the knowledge databases using a semantic database system.
● Some of the features of semantic modeling and semantic databases are:
  o these models represent information using high-level modeling abstractions,
  o these models reduce the semantic overloading of data type constructors,
  o semantic models represent objects explicitly along with their attributes,
  o semantic models are very strong in representing relationships among objects, and
  o they can also be modeled to represent IS A relationships, derived schema and also complex objects.
● In addition to knowledge databases, such database systems may support applications such as bio-informatics that require support for complex relationships, rich constraints, and large-scale data handling.

Advantages of KBSs
● make up for the shortage of experts, spread experts’ knowledge at an affordable price
● changes in the field of interest are well-tracked (R1)
● increase experts’ ability and efficiency
● preserve know-how

● systems can be developed that are unrealizable with traditional technology (Buck Rogers)
● self-consistent in advising, even in performance
● are available permanently
● able to work even with partial, incomplete data
● able to give explanations

Disadvantages of KBSs
● their knowledge is from a narrow field, and they do not know their limits
● the answers are not always correct (advice has to be analysed!)
● do not have common sense (greatest restriction) → all of the self-evident checks have to be defined (many exceptions → increase the size of the KB and the running time)

TOPIC 3: ACTIVE DATABASE

● Active databases provide additional functionality for specifying active rules.
● These rules can be automatically triggered by events that occur, such as a database update or a certain time being reached, and can initiate certain actions that have been specified in the rule declaration if certain conditions are met.
● Many commercial packages have some of the functionality provided by active databases in the form of triggers.
● Triggers are now part of the SQL-99 standard.

ACTIVE DATABASE CONCEPTS AND TRIGGERS
● Rules that specify actions that are automatically triggered by certain events have been considered as important enhancements to a database system.
● The concept of triggers (a technique for specifying certain types of active rules) has existed in early versions of the SQL specification for relational databases, and triggers are now part of the SQL-99 standard.
Application:
● Commercial relational DBMSs, such as Oracle, DB2, and
SYBASE, have had various versions of triggers available.
Generalized Model for Active Databases and Oracle Triggers
The model that has been used for specifying active database rules is referred
to as the Event-Condition-Action, or ECA, model.
A rule in the ECA model has three components:
1. The event (or events) that triggers the rule:
These events are usually database update operations that are
explicitly applied to the database.
In the general model, they could also be temporal events or other
kinds of external events.

2. The condition that determines whether the rule action should be executed:
Once the triggering event has occurred, an optional condition may be evaluated.
If no condition is specified, the action will be executed once the event occurs.
If a condition is specified, it is first evaluated, and only if it evaluates to true will the rule action be executed.
3. The action to be taken:
The action is usually a sequence of SQL statements, but it could also be a database transaction or an external program that will be automatically executed.

Example: COMPANY database application:
● Each employee has a name (NAME), social security number (SSN), salary (SALARY), department to which they are currently assigned (DNO, a foreign key to DEPARTMENT), and a direct supervisor (SUPERVISOR_SSN, a (recursive) foreign key to EMPLOYEE).
● Null is allowed for DNO, indicating that an employee may be temporarily unassigned to any department.
● Each department has a name (DNAME), number (DNO), the total salary of all employees assigned to the department (TOTAL_SAL), and a manager (MANAGER_SSN, a foreign key to EMPLOYEE).
● The TOTAL_SAL attribute is really a derived attribute, whose value should be the sum of the salaries of all employees who are assigned to the particular department.
● Maintaining the correct value of such a derived attribute can be done via an active rule.
● We first have to determine the events that may cause a change in the value of TOTAL_SAL, which are as follows:
1. Inserting (one or more) new employee tuples.
2. Changing the salary of (one or more) existing employees.
3. Changing the assignment of existing employees from one department to another.
4. Deleting (one or more) employee tuples.
● In the case of event 1, we only need to recompute TOTAL_SAL if the new employee is immediately assigned to a department, that is, if the value of the DNO attribute for the new employee tuple is not null (assuming null is allowed for DNO).
● Hence, this would be the condition to be checked.


● A similar condition could be checked for event 2 (and 4) to
determine whether the employee whose salary is changed (or who
is being deleted) is currently assigned to a department.
● For event 3, we will always execute an action to maintain the value
of TOTAL_SAL correctly, so no condition is needed (the action is
always executed).
● The action for events 1, 2, and 4 is to automatically update the
value of TOTAL_SAL for the employee's department to reflect the
newly inserted, updated, or deleted employee's salary.
● In the case of event 3, a twofold action is needed; one to update the
TOTAL_SAL of the employee's old department and the other to
update the TOTAL_SAL of the employee's new department.
● The four active rules (or triggers) R1, R2, R3, and R4,
corresponding to the above situation, can be specified in the
notation of the Oracle DBMS as shown in the figure.
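
The four rules can also be tried out in a small, runnable form. The sketch below is not the Oracle notation of the figure: it assumes SQLite (through Python's sqlite3 module), whose trigger syntax is similar but not identical to Oracle's, and the trigger names total_sal1 to total_sal4, the lowercase table and column names, and the sample rows are all chosen here for illustration.

```python
import sqlite3

# Schema and the four active rules, written as SQLite triggers (an assumption;
# the notes use Oracle notation, which differs in detail).
SCHEMA_AND_TRIGGERS = """
CREATE TABLE department (
    dno       INTEGER PRIMARY KEY,
    dname     TEXT,
    total_sal INTEGER NOT NULL DEFAULT 0      -- derived attribute
);
CREATE TABLE employee (
    ssn    TEXT PRIMARY KEY,
    name   TEXT,
    salary INTEGER NOT NULL,
    dno    INTEGER REFERENCES department(dno) -- NULL means unassigned
);

-- R1: event = INSERT, condition = the new employee has a department
CREATE TRIGGER total_sal1 AFTER INSERT ON employee
FOR EACH ROW WHEN NEW.dno IS NOT NULL
BEGIN
    UPDATE department SET total_sal = total_sal + NEW.salary
    WHERE dno = NEW.dno;
END;

-- R2: event = UPDATE OF salary, same condition
CREATE TRIGGER total_sal2 AFTER UPDATE OF salary ON employee
FOR EACH ROW WHEN NEW.dno IS NOT NULL
BEGIN
    UPDATE department SET total_sal = total_sal + NEW.salary - OLD.salary
    WHERE dno = NEW.dno;
END;

-- R3: event = UPDATE OF dno, no condition, twofold action
CREATE TRIGGER total_sal3 AFTER UPDATE OF dno ON employee
FOR EACH ROW
BEGIN
    UPDATE department SET total_sal = total_sal + NEW.salary
    WHERE dno = NEW.dno;
    UPDATE department SET total_sal = total_sal - OLD.salary
    WHERE dno = OLD.dno;
END;

-- R4: event = DELETE, condition = the deleted employee had a department
CREATE TRIGGER total_sal4 AFTER DELETE ON employee
FOR EACH ROW WHEN OLD.dno IS NOT NULL
BEGIN
    UPDATE department SET total_sal = total_sal - OLD.salary
    WHERE dno = OLD.dno;
END;
"""


def build_company_db():
    """Create an in-memory database with the schema and triggers above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_TRIGGERS)
    return conn


def department_totals(conn):
    """Return {dno: total_sal} for every department."""
    return dict(conn.execute("SELECT dno, total_sal FROM department"))


conn = build_company_db()
conn.execute("INSERT INTO department (dno, dname) VALUES (1, 'research'), (2, 'admin')")
conn.execute("INSERT INTO employee VALUES ('111', 'john', 30000, 1)")      # fires R1
conn.execute("INSERT INTO employee VALUES ('222', 'joyce', 25000, NULL)")  # R1 condition is false
conn.execute("UPDATE employee SET salary = 35000 WHERE ssn = '111'")       # fires R2
conn.execute("UPDATE employee SET dno = 2 WHERE ssn = '111'")              # fires R3
totals_after_move = department_totals(conn)
conn.execute("DELETE FROM employee WHERE ssn = '111'")                     # fires R4
totals_after_delete = department_totals(conn)
print(totals_after_move, totals_after_delete)
```

In R3 a null OLD.dno or NEW.dno simply matches no department row, so no extra condition is needed in this sketch.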

● Let us consider rule R1 to illustrate the syntax of creating triggers
in Oracle. The CREATE TRIGGER statement specifies a trigger (or
active rule) name: TOTALSAL1 for R1.
● The AFTER clause specifies that the rule will be triggered after
the events that trigger the rule occur. The triggering events (an
insert of a new employee in this example) are specified following
the AFTER keyword. The ON clause specifies the relation on
which the rule is specified: EMPLOYEE for R1.
● The optional keywords FOR EACH ROW specify that the rule
will be triggered once for each row that is affected by the triggering
event.
● The optional WHEN clause is used to specify any conditions that
need to be checked after the rule is triggered but before the action
is executed.
● Finally, the action(s) to be taken are specified as a PL/SQL block,
which typically contains one or more SQL statements or calls to
execute external procedures.
The four triggers (active rules) R1, R2, R3, and R4 illustrate a number of
features of active rules.
1. First, the basic events that can be specified for triggering the rules
are the standard SQL update commands: INSERT, DELETE, and
UPDATE. These are specified by the keywords INSERT,
DELETE, and UPDATE in Oracle notation.
In the case of UPDATE one may specify the attributes to be
updated, for example, by writing UPDATE OF SALARY, DNO.
2. Second, the rule designer needs to have a way to refer to the tuples
that have been inserted, deleted, or modified by the triggering
event.
The keywords NEW and OLD are used in Oracle notation; NEW is
used to refer to a newly inserted or newly updated tuple, whereas
OLD is used to refer to a deleted tuple or to a tuple before it was
updated.
● Thus rule R1 is triggered after an INSERT operation is applied to
the EMPLOYEE relation.
● In R1, the condition (NEW.DNO IS NOT NULL) is checked, and
if it evaluates to true, meaning that the newly inserted employee
tuple is related to a department, then the action is executed.
● The action updates the DEPARTMENT tuple(s) related to the
newly inserted employee by adding their salary (NEW.SALARY)
to the TOTAL_SAL attribute of their related department.
● Rule R2 is similar to R1, but it is triggered by an UPDATE
operation that updates the SALARY of an employee rather than by
an INSERT.
● Rule R3 is triggered by an update to the DNO attribute of
EMPLOYEE, which signifies changing an employee's assignment
from one department to another.
● There is no condition to check in R3, so the action is executed
whenever the triggering event occurs.
● The action updates both the old department and new department of
the reassigned employees by adding their salary to TOTAL_SAL
of their new department and subtracting their salary from
TOTAL_SAL of their old department.
● This should work even if the value of DNO was null, because in
this case no department will be selected for the rule action.
● The optional FOR EACH ROW clause signifies that
the rule is triggered separately for each tuple. This is known as a
row-level trigger.
● If the FOR EACH ROW clause was left out, the trigger would be
known as a statement-level trigger and would be triggered once
for each triggering statement.
● When a statement updates multiple records, a rule using row-level
semantics would be triggered once for each row, whereas a rule using
statement-level semantics is triggered only once.
● The keywords NEW and OLD can only be used with row-level
triggers.
Design and Implementation Issues for Active Databases
(1) The first issue concerns activation, deactivation, and grouping of rules.
● In addition to creating rules, an active database system should
allow users to activate, deactivate, and drop rules by referring to
their rule names.
Deactivated rule:
● A deactivated rule will not be triggered by the triggering event.
● This feature allows users to selectively deactivate rules for certain
periods of time when they are not needed.
Activate command:
● The activate command will make the rule active again.
Drop command:
● The drop command deletes the rule from the system.
● Another option is to group rules into named rule sets, so the whole
set of rules could be activated, deactivated, or dropped.
● It is also useful to have a command that can trigger a rule or rule
set via an explicit PROCESS RULES command issued by the user.
(2) The second issue concerns whether the triggered action should be
executed before, after, or concurrently with the triggering event.
(3) A related issue is whether the action being executed should be
considered as a separate transaction or whether it should be part of the
same transaction that triggered the rule.
● The triggering event occurs as part of a transaction execution.
● We should first consider the various options for how the triggering
event is related to the evaluation of the rule's condition.
● The rule condition evaluation is also known as rule consideration,
since the action is to be executed only after considering whether
the condition evaluates to true or false.
● There are three main possibilities for rule consideration:
1. Immediate consideration:
The condition is evaluated as part of the same transaction as the
triggering event, and is evaluated immediately. This case can be further
categorized into three options:
• Evaluate the condition before executing the triggering event.
• Evaluate the condition after executing the triggering event.
• Evaluate the condition instead of executing the triggering event.
2. Deferred consideration:
The condition is evaluated at the end of the transaction that included
the triggering event.
In this case, there could be many triggered rules waiting to have their
conditions evaluated.
3. Detached consideration:
The condition is evaluated as a separate transaction, spawned from the
triggering transaction.
● The next set of options concerns the relationship between
evaluating the rule condition and executing the rule action.
● Three options are possible here: immediate, deferred, and
detached execution.
● Most active systems use the immediate option.
● That is, as soon as the condition is evaluated, if it returns true, the
action is immediately executed.
(4) Another issue concerning active database rules is the distinction between
row-level rules and statement-level rules.
● Because SQL update statements (which act as triggering events)
can specify a set of tuples, one has to distinguish between whether
the rule should be considered once for the whole statement or
whether it should be considered separately for each row (that is,
tuple) affected by the statement.
● The SQL-99 standard and the Oracle system allow the user to
choose which of the two options is to be used for each rule,
whereas STARBURST uses statement-level semantics only.
● One of the difficulties that may have limited the widespread use of
active rules, in spite of their potential to simplify database and
software development, is that there are no easy-to-use techniques
for designing, writing, and verifying rules. For example, it is quite
difficult to verify that a set of rules is consistent, meaning that two
or more rules in the set do not contradict one another.
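
The row-level semantics described under issue (4) can be observed with a small experiment. The sketch below assumes SQLite through Python's sqlite3 module; SQLite implements only FOR EACH ROW triggers, so it shows the row-level case only. The table firing_log and the trigger log_raise are hypothetical names introduced for the illustration.

```python
import sqlite3


def count_row_level_firings(n_rows):
    """Run ONE update statement over n_rows tuples and count trigger firings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE firing_log (ssn INTEGER);

    -- Row-level rule: its action runs once per affected tuple.
    CREATE TRIGGER log_raise AFTER UPDATE OF salary ON employee
    FOR EACH ROW
    BEGIN
        INSERT INTO firing_log VALUES (NEW.ssn);
    END;
    """)
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [(i, 1000) for i in range(n_rows)],
    )
    # A single triggering statement that affects every tuple.
    conn.execute("UPDATE employee SET salary = salary + 100")
    (firings,) = conn.execute("SELECT COUNT(*) FROM firing_log").fetchone()
    return firings


print(count_row_level_firings(5))
```

With row-level semantics the count equals the number of affected tuples; under statement-level semantics the same single UPDATE would have triggered the rule only once.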
● It is also difficult to guarantee termination of a set of rules under
all circumstances.
● If dozens of rules are written, it is very difficult to determine
whether termination is guaranteed or not.
● If active rules are to reach their potential, it is necessary to develop
tools for the design, debugging, and monitoring of active rules that
can help users in designing and debugging their rules.

Potential Applications for Active Databases

● One important application is to allow notification of certain
conditions that occur.
● For example, an active database may be used to monitor, say, the
temperature of an industrial furnace.
● Active rules can also be used to enforce integrity constraints by
specifying the types of events that may cause the constraints to be
violated and then evaluating appropriate conditions that check
whether the constraints are actually violated by the event or not.
● Complex application constraints, often known as business rules,
may be enforced that way.
● For example, in the UNIVERSITY database application, one rule
may monitor the grade point average of students whenever a new
grade is entered, and it may alert the advisor if the GPA of a
student falls below a certain threshold; another rule may check that
course prerequisites are satisfied before allowing a student to
enroll in a course; and so on.
● Other applications include the automatic maintenance of derived
data.
● A similar application is to use active rules to maintain the
consistency of materialized views whenever the base relations are
modified.
● This application is also relevant to the new data warehousing
technologies.
● A related application is to keep replicated tables consistent by
specifying rules that modify the replicas whenever the master table
is modified.
TOPIC 4: DEDUCTIVE DATABASE
● In a deductive database system, we typically specify rules
through a declarative language, that is, a language in which we specify
what to achieve rather than how to achieve it.
● An inference engine (or deduction mechanism) within the system
can deduce new facts from the database by interpreting these
rules.
● The model used for deductive databases is closely related to the
relational data model, and particularly to the domain relational
calculus formalism.
● It is also related to the field of logic programming and the Prolog
language.
● The deductive database work based on logic has used Prolog as a
starting point.
● A variation of Prolog called Datalog is used to define rules
declaratively in conjunction with an existing set of relations,
which are themselves treated as literals in the language.
● Although the language structure of Datalog resembles that of
Prolog, its operational semantics (that is, how a Datalog program is
to be executed) is still different.
● A deductive database uses two main types of specifications:
facts and rules.
● Facts:
Facts are specified in a manner similar to the way relations are
specified, except that it is not necessary to include the attribute
names.
In a deductive database, the meaning of an attribute value in a
tuple is determined solely by its position within the tuple.
● Rules:
Rules are somewhat similar to relational views.
They specify virtual relations that are not actually stored but that
can be formed from the facts by applying inference mechanisms
based on the rule specifications.
● Difference between views and rules
The main difference between rules and views is that rules may
involve recursion and hence may yield virtual relations that cannot
be defined in terms of basic relational views.
● The evaluation of Prolog programs is based on a technique called
backward chaining, which involves a top-down evaluation of
goals.
● In the deductive databases that use Datalog, attention has been
devoted to handling large volumes of data stored in a relational
database.
● Hence, evaluation techniques have been devised that resemble
those for a bottom-up evaluation.
● Prolog suffers from the limitation that the order of specification of
facts and rules is significant in evaluation; moreover, the order of
literals within a rule is significant.
● The execution techniques for Datalog programs attempt to
circumvent these problems.

Prolog/Datalog Notation
● The notation used in Prolog/Datalog is based on providing
predicates with unique names.
● A predicate has an implicit meaning, which is suggested by the
predicate name, and a fixed number of arguments.
● If the arguments are all constant values, the predicate simply
states that a certain fact is true.
● If, on the other hand, the predicate has variables as arguments, it
is either considered as a query or as part of a rule or constraint.
● In Prolog convention all constant values in a predicate are either
numeric or character strings; they are represented as identifiers
(or names) starting with lowercase letters only, whereas variable
names always start with an uppercase letter.
Example
● Consider the example shown in the figure, which is based on the
relational database.
● There are three predicate names: supervise, superior, and
subordinate.
● The supervise predicate is defined via a set of facts, each of which
has two arguments: a supervisor name, followed by the name of a
direct supervisee (subordinate) of that supervisor.
● These facts correspond to the actual data that is stored in the
database, and they can be considered as constituting a set of tuples
in a relation SUPERVISE with two attributes whose schema is
SUPERVISE (Supervisor, Supervisee)
● Thus, supervise(X, Y) states the fact that "X supervises Y."
● Attribute names are omitted in the Prolog notation.
● Attribute names are only represented by virtue of the position of
each argument in a predicate: the first argument represents the
supervisor, and the second argument represents a direct
subordinate.
● The other two predicate names are defined by rules.
● The main contribution of deductive databases is the ability to
specify recursive rules, and to provide a framework for inferring
new information based on the specified rules.
● A rule is of the form head :- body, where :- is read as "if and only
if."
● A rule usually has a single predicate to the left of the :- symbol,
called the head or left-hand side (LHS) or conclusion of the
rule, and one or more predicates to the right of the :- symbol, called
the body or right-hand side (RHS) or premise(s) of the rule.
● A predicate with constants as arguments is said to be ground; we
also refer to it as an instantiated predicate.
● The arguments of the predicates that appear in a rule typically
include a number of variable symbols, although predicates can also
contain constants as arguments.
● A rule specifies that, if a particular assignment or binding of
constant values to the variables in the body (RHS predicates)
makes all the RHS predicates true, it also makes the head (LHS
predicate) true by using the same assignment of constant values to
variables.
● Hence, a rule provides us with a way of generating new facts that
are instantiations of the head of the rule.
● These new facts are based on facts that already exist,
corresponding to the instantiations (or bindings) of predicates in
the body of the rule.
● By listing multiple predicates in the body of a rule we implicitly
apply the logical and operator to these predicates.
● The commas between the RHS predicates may be read as meaning
"and."
● Consider the definition of the predicate superior in the figure, whose
first argument is an employee name and whose second argument is
an employee who is either a direct or an indirect subordinate of the
first employee.
● By indirect subordinate, we mean the subordinate of some
subordinate down to any number of levels.
● Thus superior(X, Y) stands for the fact that "X is a superior of Y"
through direct or indirect supervision.
● We can write two rules that together specify the meaning of the
new predicate.
● The first rule under Rules in the figure states that, for every value
of X and Y, if supervise(X, Y) (the rule body) is true, then
superior(X, Y) (the rule head) is also true, since Y would be a direct
subordinate of X (at one level down).
● This rule can be used to generate all direct superior/subordinate
relationships from the facts that define the supervise predicate.
● The second, recursive rule states that, if supervise(X, Z) and
superior(Z, Y) are both true, then superior(X, Y) is also true.
● This is an example of a recursive rule, where one of the rule body
predicates in the RHS is the same as the rule head predicate in the
LHS.
● In general, the rule body defines a number of premises such that, if
they are all true, we can deduce that the conclusion in the rule head
is also true.
● If we have two (or more) rules with the same head (LHS
predicate), it is equivalent to saying that the predicate is true (that is,
that it can be instantiated) if either one of the bodies is true; hence,
it is equivalent to a logical or operation.
● For example, if we have two rules X :- Y and X :- Z, they are
equivalent to a rule X :- Y or Z.
● The latter form is not used in deductive systems, however, because
it is not in the standard form of rule, called a Horn clause.
● A Prolog system contains a number of built-in predicates that the
system can interpret directly.
● These typically include the equality comparison operator =(X, Y),
which returns true if X and Y are identical and can also be written
as X=Y by using the standard infix notation.
● Other comparison operators for numbers, such as <, <=, >, and >=,
can be treated as binary predicates.
● Arithmetic functions such as +, -, *, and / can be used as arguments
in predicates in Prolog.
● In contrast, Datalog (in its basic form) does not allow functions
such as arithmetic operations as arguments; indeed, this is one of
the main differences between Prolog and Datalog.
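
The two rules for superior can be evaluated bottom-up in a few lines. The sketch below is a naive fixpoint computation, not the algorithm of any particular Datalog system. The supervise facts are assumed for illustration, since the figure with the actual facts is not reproduced in these notes, and the function names are hypothetical.

```python
# Assumed supervise(Supervisor, Supervisee) facts; the meaning of each value
# is given only by its position in the tuple.
SUPERVISE = {
    ("franklin", "john"), ("franklin", "ramesh"), ("franklin", "joyce"),
    ("jennifer", "alicia"), ("jennifer", "ahmad"),
    ("james", "franklin"), ("james", "jennifer"),
}


def superior_closure(supervise):
    """Naive bottom-up evaluation of
         superior(X, Y) :- supervise(X, Y).
         superior(X, Y) :- supervise(X, Z), superior(Z, Y).
    """
    superior = set(supervise)  # rule 1: every supervise fact is a superior fact
    while True:
        # Rule 2: join supervise(X, Z) with the superior(Z, Y) facts known so far.
        derived = {
            (x, y)
            for (x, z) in supervise
            for (z2, y) in superior
            if z == z2
        }
        if derived <= superior:  # nothing new: a fixpoint has been reached
            return superior
        superior |= derived


def subordinates_of(name, superior):
    """Answer the query superior(name, Y)? as a set of bindings for Y."""
    return {y for (x, y) in superior if x == name}


superior = superior_closure(SUPERVISE)
print(sorted(subordinates_of("james", superior)))
print(("james", "joyce") in superior)
```

The whole set of answers is produced at once, which matches the set-oriented style of evaluation that Datalog systems aim for.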
● Later extensions to Datalog have been proposed to include
functions.
● A query typically involves a predicate symbol with some variable
arguments, and its meaning (or "answer") is to deduce all the
different constant combinations that, when bound (assigned) to the
variables, can make the predicate true.
● For example, the first query in the figure requests the names of all
subordinates of "james" at any level.
● A different type of query, which has only constant symbols as
arguments, returns either a true or a false result, depending on
whether the arguments provided can be deduced from the facts and
rules.
● For example, the second query in the figure returns true, since
superior(james, joyce) can be deduced.

Datalog Notation
● In Datalog, as in other logic-based languages, a program is built
from basic objects called atomic formulas.
● It is customary to define the syntax of logic-based languages by
describing the syntax of atomic formulas and identifying how they
can be combined to form a program.
● In Datalog, atomic formulas are literals of the form p(a1, a2, ...,
an), where p is the predicate name and n is the number of
arguments for predicate p.
● Different predicate symbols can have different numbers of arguments,
and the number of arguments n of predicate p is sometimes called
the arity or degree of p.
● The arguments can be either constant values or variable names.
● We use the convention that constant values either are numeric or
start with a lowercase character, whereas variable names always
start with an uppercase character.

Built-in Predicates
● A number of built-in predicates are included in Datalog, which can
also be used to construct atomic formulas.
● The built-in predicates are of two main types:
(i) the binary comparison predicates over ordered domains,
like < (less), <= (less_or_equal), > (greater), and >=
(greater_or_equal)
and
(ii) the comparison predicates over ordered or unordered domains,
like = (equal) and /= (not_equal)
● These can be used as binary predicates with the same functional
syntax as other predicates, for example by writing less(X, 3),
or
they can be specified by using the customary infix notation X<3.
● Because the domains of these predicates are potentially infinite,
they should be used with care in rule definitions.
Example:
For example, the predicate greater(X, 3), if used alone, generates
an infinite set of values for X that satisfy the predicate (all integer
numbers greater than 3).
● A literal is either an atomic formula called a positive literal
or
● an atomic formula preceded by not. This is a negated atomic
formula, called a negative literal.
● Datalog programs can be considered to be a subset of the
predicate calculus formulas.
● In Datalog, the predicate calculus formulas are first converted into
clausal form before they are expressed in Datalog, and
only formulas given in a restricted clausal form, called Horn
clauses, can be used in Datalog.
Clausal Form and Horn Clauses
● A formula in the relational calculus is a condition that includes
predicates called atoms (based on relation names).
● A formula can have quantifiers, namely the universal quantifier
(for all) and the existential quantifier (there exists).
● In clausal form, a formula must be transformed into another
formula with the following characteristics:
• All variables in the formula are universally quantified.
Hence, it is not necessary to include the universal
quantifiers (for all) explicitly; the quantifiers are removed,
and all variables in the formula are implicitly quantified
by the universal quantifier.
• In clausal form, the formula is made up of a number of
clauses, where each clause is composed of a number of
literals connected by OR logical connectives only. Hence,
each clause is a disjunction of literals.
• The clauses themselves are connected by AND logical
connectives only, to form a formula. Hence, the clausal
form of a formula is a conjunction of clauses.
Example
Any formula can be converted into clausal form.
Literals can be positive literals or negative literals.
Consider a clause of the form:
NOT(P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q1 OR Q2 OR ... OR Qm      (1)
● This clause has n negative literals and m positive literals. Such a
clause can be transformed into the following equivalent logical
formula:
P1 AND P2 AND ... AND Pn => Q1 OR Q2 OR ... OR Qm      (2)
● where => is the implies symbol.
● The formulas (1) and (2) are equivalent, meaning that their truth
values are always the same.
● This is the case because, if all the Pi literals (i = 1, 2, ..., n) are true,
the formula (2) is true only if at least one of the Qj's (j = 1, 2, ..., m)
is true, which is the meaning of the => (implies) symbol.
● For formula (1), if all the Pi literals (i = 1, 2, ..., n) are true, their
negations are all false; so in this case formula (1) is true only if at
least one of the Qj's is true.
● In Datalog, rules are expressed as a restricted form of clauses
called Horn clauses, in which a clause can contain at most one
positive literal.
● Hence, a Horn clause is either of the form
NOT(P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q      (3)
or of the form
NOT(P1) OR NOT(P2) OR ... OR NOT(Pn)      (4)
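
The equivalence between the clause form (1) and the implication form (2) can be checked by brute force over all truth assignments. The sketch below assumes that (1) is the disjunction NOT(P1) OR ... OR NOT(Pn) OR Q1 OR ... OR Qm and that (2) is the implication P1 AND ... AND Pn => Q1 OR ... OR Qm; the function names are hypothetical.

```python
from itertools import product


def clause_form(ps, qs):
    """Form (1): NOT(P1) OR ... OR NOT(Pn) OR Q1 OR ... OR Qm."""
    return any(not p for p in ps) or any(qs)


def implication_form(ps, qs):
    """Form (2): P1 AND ... AND Pn => Q1 OR ... OR Qm.

    An implication A => B is false only when A is true and B is false.
    """
    return (not all(ps)) or any(qs)


def equivalent(n, m):
    """Compare both forms on every assignment of n P-literals and m Q-literals."""
    return all(
        clause_form(values[:n], values[n:]) == implication_form(values[:n], values[n:])
        for values in product([False, True], repeat=n + m)
    )


print(equivalent(3, 2))
```

Taking m = 1 gives a Horn clause with a single positive literal, and m = 0 gives the form with no positive literal.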
● A Horn clause with one positive literal Q can be written as the
implication
P1 AND P2 AND ... AND Pn => Q      (5)
which is written in Datalog as the rule
Q :- P1, P2, ..., Pn.      (6)
● A Horn clause with no positive literal can be written as
P1 AND P2 AND ... AND Pn =>      (7)
which is written in Datalog as
P1, P2, ..., Pn.      (8)
● A Datalog rule, as in (6), is hence a Horn clause, and its meaning,
based on formula (5), is that if the predicates P1 AND P2 AND ... AND Pn
are all true for a particular binding to their variable arguments,
then Q is also true and can hence be inferred.
● The Datalog expression (8) can be considered as an integrity
constraint, where all the predicates must be true to satisfy the
query.
● A Prolog or Datalog system has an internal inference engine that
can be used to process and compute the results of such queries.
● Prolog inference engines typically return one result to the query
(that is, one set of values for the variables in the query) at a time
and must be prompted to return additional results.
● In contrast, Datalog returns results set-at-a-time.
Interpretations of Rules
● There are two main alternatives for interpreting the theoretical
meaning of rules: proof-theoretic and model-theoretic.
● In practical systems, the inference mechanism within a system
defines the exact interpretation, which may not coincide with
either of the two theoretical interpretations.
● The inference mechanism is a computational procedure and
hence provides a computational interpretation of the meaning of
rules.
Proof theoretic:
● In the proof-theoretic interpretation of rules, we consider the
facts and rules to be true statements, or axioms.
● Ground axioms contain no variables.
● The facts are ground axioms that are given to be true.
● Rules are called deductive axioms, since they can be used to
deduce new facts.
● The deductive axioms can be used to construct proofs that derive
new facts from existing facts.
● For example, the figure shows how to prove the fact
superior(james, ahmad) from the rules and facts.
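
A proof of superior(james, ahmad) in the proof-theoretic sense can be sketched as a small backward-chaining procedure that records which axiom justifies each step. This is only an illustration, not the inference engine of Prolog or of any Datalog system. The supervise facts are assumed, since the figure is not reproduced in these notes, and the search terminates only because the assumed hierarchy has no cycles.

```python
# Assumed ground axioms (facts); the actual figure is not reproduced here.
SUPERVISE = {
    ("franklin", "john"), ("franklin", "ramesh"), ("franklin", "joyce"),
    ("jennifer", "alicia"), ("jennifer", "ahmad"),
    ("james", "franklin"), ("james", "jennifer"),
}


def prove_superior(x, y, supervise):
    """Return a list of proof steps for superior(x, y), or None if no proof exists.

    Deductive axioms used:
      rule 1: superior(X, Y) :- supervise(X, Y).
      rule 2: superior(X, Y) :- supervise(X, Z), superior(Z, Y).
    Assumes the supervise relation is acyclic; a cycle would recurse forever.
    """
    if (x, y) in supervise:
        return [f"superior({x}, {y}) follows by rule 1 from the fact supervise({x}, {y})"]
    for (sx, z) in sorted(supervise):  # sorted only to make the search order repeatable
        if sx == x:
            sub_proof = prove_superior(z, y, supervise)
            if sub_proof is not None:
                return sub_proof + [
                    f"superior({x}, {y}) follows by rule 2 from "
                    f"supervise({x}, {z}) and superior({z}, {y})"
                ]
    return None


proof = prove_superior("james", "ahmad", SUPERVISE)
for step in proof:
    print(step)
```

Each printed step derives a new fact from facts already established, which is the sense in which the rules act as deductive axioms.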
● The proof-theoretic interpretation gives us a procedural or
computational approach for computing an answer to the Datalog
query.
● The process of proving whether a certain fact (theorem) holds is
known as theorem proving.

Model theoretic interpretation:

● The second type of interpretation is called the model-theoretic
interpretation.
● Whenever a particular substitution to the variables in the rules is
applied, if all the predicates in the rule body are true under the
interpretation, the predicate at the head of the rule must also be true.
● Given a finite or an infinite domain of constant values, we assign
to a predicate every possible combination of values as arguments.
● We must then determine whether the predicate is true or false.
● It is sufficient to specify the combinations of arguments that make
the predicate true, and to state that all other combinations make the
predicate false.
● If this is done for every predicate, it is called an interpretation of
the set of predicates.
● For example, consider the interpretation shown in the figure for the
predicates supervise and superior.
● This interpretation assigns a truth value (true or false) to every
possible combination of argument values (from a finite domain) for
the two predicates.
● An interpretation is called a model for a specific set of rules if
those rules are always true under that interpretation; that is, for any
values assigned to the variables in the rules, the head of the rules is
true when we substitute the truth values assigned to the predicates
in the body of the rule by that interpretation. Hence, whenever a
particular substitution (binding) to the variables in the rules is
applied, if all the predicates in the body of a rule are true under the
interpretation, the predicate in the head of the rule must also be
true.
● The interpretation shown in the figure is a model for the two rules
shown, since it can never cause the rules to be violated.
● In the model-theoretic approach, the meaning of the rules is
established by providing a model for these rules.
Minimal Model:
● A model is called a minimal model for a set of rules if we cannot
change any fact from true to false and still get a model for these
rules.
● A rule is violated if a particular binding of constants to the
variables makes all the predicates in the rule body true but makes
the predicate in the rule head false.
● For example, if supervise(a, b) and superior(b, c) are both true
under some interpretation, but superior(a, c) is not true, the
interpretation cannot be a model for the recursive rule:
superior(X,Y) :- supervise(X,Z), superior(Z,Y)
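
Whether an interpretation is a model can be checked mechanically by searching for a binding that violates a rule. The sketch below assumes the two superior rules (the direct rule and the recursive one) and represents an interpretation by the sets of tuples it makes true, with every other tuple false. The function names violations and is_model are hypothetical.

```python
def violations(supervise, superior):
    """List the bindings under which a rule body is true but its head is false."""
    found = []
    # Rule 1: superior(X, Y) :- supervise(X, Y).
    for (x, y) in sorted(supervise):
        if (x, y) not in superior:
            found.append(("rule 1", x, y))
    # Rule 2: superior(X, Y) :- supervise(X, Z), superior(Z, Y).
    for (x, z) in sorted(supervise):
        for (z2, y) in sorted(superior):
            if z == z2 and (x, y) not in superior:
                found.append(("rule 2", x, y))
    return found


def is_model(supervise, superior):
    """An interpretation is a model if no binding violates any rule."""
    return not violations(supervise, superior)


supervise = {("a", "b")}
# supervise(a, b) and superior(b, c) are true, but superior(a, c) is not.
print(violations(supervise, {("a", "b"), ("b", "c")}))
# Making superior(a, c) true as well removes the violation.
print(is_model(supervise, {("a", "b"), ("b", "c"), ("a", "c")}))
```

The second interpretation is a model but not a minimal one: with supervise(a, b) as the only given fact, superior(b, c) and superior(a, c) can both be changed to false and the result is still a model.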
● The model shown in Figure is the minimal model for the set of
facts that are defined by the supervise predicate.
● In general, the minimal model that corresponds to a given set of
facts in the model theoretic interpretation should be the same as the
facts generated by the proof-theoretic interpretation for the same
original set of ground and deductive axioms.
● However, this is generally true only for rules with a simple
structure.
● Once we allow negation in the specification of rules, the
correspondence between interpretations does not hold.
● In fact, with negation, numerous minimal models are possible for a
given set of facts.
● A third approach to interpreting the meaning of rules involves
defining an inference mechanism that is used by the system to
deduce facts from the rules.
● This inference mechanism would define a computational
interpretation to the meaning of the rules.
● The Prolog logic programming language uses its inference
mechanism to define the meaning of the rules and facts in a Prolog
program.
● Not all Prolog programs correspond to the proof-theoretic or
model-theoretic interpretations; it depends on the type of rules in
the program.
● However, for many simple Prolog programs, the Prolog inference
mechanism infers the facts that correspond either to the
proof-theoretic interpretation or to a minimal model under the
model-theoretic interpretation.
● As an example of a model that is not minimal, consider the
interpretation in Figure, and assume that the supervise predicate is
defined by a set of known facts, whereas the superior predicate is
defined as an interpretation (model) for the rules.
● Suppose that we add the predicate superior(james, bob) to the true
predicates. This remains a model for the rules shown, but it is not a
minimal model, since changing the truth value of superior(james,
bob) from true to false still provides us with a model for the rules.
Datalog Programs and Their Safety
● There are two main methods of defining the truth values of
predicates in actual Datalog programs.
● Fact-defined predicates (or relations) are defined by listing all the
combinations of values (the tuples) that make the predicate true.
● These correspond to base relations whose contents are stored in a
database system.
● Rule-defined predicates (or views) are defined by being the head
(LHS) of one or more Datalog rules; they correspond to virtual
relations whose contents can be inferred by the inference engine.
● Figure shows a number of rule-defined predicates.
● Figure shows the fact-defined predicates employee, male, female,
department, supervise, project, and workson, which correspond to
part of the relational database shown in Figure.
● A program or a rule is said to be safe if it generates a finite set of
facts.
● The general theoretical problem of determining whether a set of
rules is safe is undecidable.
● We can determine the safety of restricted forms of rules.
● One situation where we get unsafe rules that can generate an
infinite number of facts arises when one of the variables in the rule
can range over an infinite domain of values, and that variable is not
limited to ranging over a finite relation.
For example, consider the rule
big_salary(Y) :- Y>60000
● Here, we can get an infinite result if Y ranges over all possible
integers.
● But suppose that we change the rule as follows:
big_salary(Y) :- employee(X), salary(X,Y), Y>60000
● In the second rule, the result is not infinite, since the values that Y
can be bound to are now restricted to values that are the salary of
some employee in the database, presumably a finite set of values.
We can also rewrite the rule as follows:
big_salary(Y) :- Y>60000, employee(X), salary(X,Y)
● In this case, the rule is still theoretically safe. However, in Prolog
or any other system that uses a top-down, depth-first inference
mechanism, the rule creates an infinite loop, since we first search
for a value for Y and then check whether it is a salary of an
employee.
● The result is generation of an infinite number of Y values, even
though these, after a certain point, cannot lead to a set of true RHS
predicates.
● One definition of Datalog considers both rules to be safe, since it
does not depend on a particular inference mechanism.
● It is generally advisable to write such a rule in the safest form, with
the predicates that restrict possible bindings of variables placed
first.
● As another example of an unsafe rule, consider the following rule:
has_something(X,Y) :- employee(X)
● Here, an infinite number of Y values can again be generated, since
the variable Y appears only in the head of the rule and hence is not
limited to a finite set of values.
● To define safe rules more formally, we use the concept of a limited
variable. A variable X is limited in a rule if
(1) it appears in a regular (not built-in) predicate in the
body of the rule;
(2) it appears in a predicate of the form X=c or c=X or
(c1<=X and X<=c2) in the rule body, where c, c1, and c2 are
constant values; or
(3) it appears in a predicate of the form X=Y or Y=X in
the rule body, where Y is a limited variable.
● A rule is said to be safe if all its variables are limited.
Use of Relational Operations
● It is straightforward to specify many operations of the relational
algebra in the form of Datalog rules that define the result of
applying these operations on the database relations (fact
predicates).
● This means that relational queries and views can easily be
specified in Datalog.
● The additional power that Datalog provides is in the specification
of recursive queries, and views based on recursive queries.
● In Datalog, we do not need to specify the attribute names; rather,
the arity (degree) of each predicate is the important aspect.
● In a practical system, the domain (data type) of each attribute is
also important for operations such as UNION, INTERSECTION,
and JOIN, and we assume that the attribute types are compatible
for the various operations.
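Returning briefly to rule safety: the limited-variable definition given earlier (conditions (1) to (3)) can be checked mechanically. The following is a minimal sketch under an assumed rule encoding that is not part of the original notes: a rule is a list of head arguments, a list of regular atoms `(name, args)`, and a list of built-in comparisons `(op, left, right)`. Variables are strings starting with an uppercase letter; anything else is treated as a constant.

```python
def is_var(term):
    """Datalog convention: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def limited_variables(regular_atoms, builtins):
    # Condition (1): variables of regular (not built-in) body predicates.
    limited = {t for _, args in regular_atoms for t in args if is_var(t)}
    # Condition (2), range form: c1 <= X and X <= c2 with constants c1, c2.
    lower = {r for op, l, r in builtins if op == "<=" and not is_var(l) and is_var(r)}
    upper = {l for op, l, r in builtins if op == "<=" and is_var(l) and not is_var(r)}
    limited |= lower & upper
    # Conditions (2) X = c and (3) X = Y with Y limited, repeated to a fixpoint.
    changed = True
    while changed:
        changed = False
        for op, left, right in builtins:
            if op != "=":
                continue
            for a, b in ((left, right), (right, left)):
                if is_var(a) and a not in limited and (not is_var(b) or b in limited):
                    limited.add(a)
                    changed = True
    return limited

def is_safe(head_args, regular_atoms, builtins=()):
    """A rule is safe if every variable occurring in it is limited."""
    all_vars = {t for t in head_args if is_var(t)}
    all_vars |= {t for _, args in regular_atoms for t in args if is_var(t)}
    all_vars |= {t for _, l, r in builtins for t in (l, r) if is_var(t)}
    return all_vars <= limited_variables(regular_atoms, builtins)

# big_salary(Y) :- Y>60000
unsafe_1 = is_safe(["Y"], [], [(">", "Y", 60000)])
# big_salary(Y) :- employee(X), salary(X,Y), Y>60000
safe_1 = is_safe(["Y"], [("employee", ["X"]), ("salary", ["X", "Y"])], [(">", "Y", 60000)])
# has_something(X,Y) :- employee(X)
unsafe_2 = is_safe(["X", "Y"], [("employee", ["X"])])
```

The check ignores the order of body predicates, so it matches the definition of safety that does not depend on a particular inference mechanism; it says nothing about whether a top-down, depth-first evaluator would loop on a given ordering.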
● If the Datalog model is based on the relational model and hence
assumes that predicates (fact relations and query results) specify
sets of tuples, duplicate tuples in the same predicate are
automatically eliminated.
● This may or may not be true, depending on the Datalog inference
engine.
● It is definitely not the case in Prolog, so any of the rules that
involve duplicate elimination are not correct for Prolog.
● The rules should work for Datalog, if duplicates are automatically
eliminated.
Evaluation of Nonrecursive Datalog Queries
● In order to use Datalog as a deductive database system, it is
appropriate to define an inference mechanism based on relational
database query processing concepts.
● The inherent strategy involves a bottom-up evaluation, starting
with base relations; the order of operations is kept flexible and
subject to query optimization.
● If a query involves only fact-defined predicates, the inference
becomes one of searching among the facts for the query result.
● The query involves relational SELECT and PROJECT operations
on a base relation, and it can be handled by the database query
processing and optimization techniques.
● When a query involves rule-defined predicates, the inference
mechanism must compute the result based on the rule definitions.
● If a query is nonrecursive and involves a predicate P that appears
as the head of a rule P :- P1, P2, ..., Pn, the strategy is first to
compute the relations corresponding to P1, P2, ..., Pn, and then to
compute the relation corresponding to P.
● It is useful to keep track of the dependency among the predicates
of a deductive database in a predicate dependency graph.
● Figure shows the graph for the fact and rule predicates.
● The dependency graph contains a node for each predicate.
● Whenever a predicate A is specified in the body (RHS) of a rule,
and the head (LHS) of that rule is the predicate B, we say that B
depends on A, and we draw a directed edge from A to B.
● This indicates that, in order to compute the facts for the predicate
B (the rule head), we must first compute the facts for all the
predicates A in the rule body.
● If the dependency graph has no cycles, we call the rule set
nonrecursive.
● If there is at least one cycle, the rule set is called recursive.
● In Figure there is one recursively defined predicate, namely
superior, which has a recursive edge pointing back to itself.
● In addition, because the predicate subordinate depends on superior,
it also requires recursion in computing its result.
● A query that includes only nonrecursive predicates is called a
nonrecursive query.
● In the predicate dependency graph, the nodes corresponding to
fact-defined predicates do not have any incoming edges, since all
fact-defined predicates have their facts stored in a database
relation.
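The predicate dependency graph and the recursive/nonrecursive distinction can be sketched as follows. The rule set is an assumption made for illustration: superior and subordinate follow the text, while `manager` is a hypothetical nonrecursive predicate. Each rule is reduced to its head name and the names of its body predicates.

```python
from collections import defaultdict

RULES = [
    ("superior", ["supervise"]),
    ("superior", ["supervise", "superior"]),
    ("subordinate", ["superior"]),
    ("manager", ["employee", "supervise"]),      # hypothetical rule
]

def dependency_graph(rules):
    """Edge A -> B whenever A appears in the body of a rule with head B."""
    edges = defaultdict(set)
    for head, body in rules:
        for pred in body:
            edges[pred].add(head)
    return edges

def reachable(edges, start):
    """Nodes reachable from start by following at least one edge."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def recursive_predicates(rules):
    """Predicates that lie on a cycle of the dependency graph."""
    edges = dependency_graph(rules)
    nodes = {h for h, _ in rules} | {p for _, body in rules for p in body}
    return {p for p in nodes if p in reachable(edges, p)}

def needs_recursion(rules, pred):
    """True if pred is recursive or depends on a recursive predicate."""
    edges = dependency_graph(rules)
    return any(pred == r or pred in reachable(edges, r)
               for r in recursive_predicates(rules))

def evaluation_order(rules, pred):
    """For a nonrecursive pred: body predicates first, then the head."""
    body_of = defaultdict(set)
    for head, body in rules:
        body_of[head] |= set(body)
    order, done = [], set()
    def visit(p):
        if p in done:
            return
        done.add(p)
        for q in sorted(body_of.get(p, ())):
            visit(q)
        order.append(p)
    visit(pred)
    return order
```

With these rules, `recursive_predicates(RULES)` contains only superior, `needs_recursion(RULES, "subordinate")` is true, and `evaluation_order(RULES, "manager")` lists the fact-defined predicates before `manager`, which is the order a bottom-up evaluator would follow.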
● The contents of a fact-defined predicate can be computed by
directly retrieving the tuples in the corresponding database
relation.
● The main function of an inference mechanism is to compute the
facts that correspond to query predicates.
● This can be accomplished by generating a relational expression
involving relational operators such as SELECT, PROJECT, JOIN,
UNION, and SET DIFFERENCE (with appropriate provision for
dealing with safety issues) that, when executed, provides the query
result.
● The query can then be executed by utilizing the internal query
processing and optimization operations of a relational database
management system.
● Whenever the inference mechanism needs to compute the fact set
corresponding to a nonrecursive rule-defined predicate p, it first
locates all the rules that have p as their head.
● The idea is to compute the fact set for each such rule and then to
apply the UNION operation to the results, since UNION
corresponds to a logical OR operation.
● The dependency graph indicates all predicates q on which each p
depends, and since we assume that the predicate is nonrecursive,
we can always determine a partial order among such predicates q.
Before computing the fact set for p, we first compute the fact sets
for all predicates q on which p depends, based on their partial
order.

TOPIC: MULTIMEDIA DATABASE

● Multimedia databases provide features that allow users to store and
query different types of multimedia information, which includes
images (such as pictures or drawings), video clips (such as movies,
news reels, or home videos), audio clips (such as songs, phone
messages, or speeches), and documents (such as books or articles).
● Databases provide functionality for easy manipulation, querying,
and retrieval of highly relevant information from huge
collections of stored data.

Need for Multimedia Databases

● Early applications of MMDBMS tended to use multimedia for
presentational requirements only. For example, an employee
database might include an image of each employee.
● These systems could be implemented relatively simply by storing
the image files externally to the database and storing a file
reference in the database.
● However, this external data could not be manipulated by the DBMS.
Multimedia applications are evolving because people want to
exploit multimedia data in a "natural" way: interrogating,
retrieving and manipulating the data.
● Complex applications are developing such as entertainment
services (e.g. video on demand), multimedia sales for houses,
goods and services, groupware, telepresence, surveillance and
telemedicine.
● The main types of database queries that are needed involve
locating multimedia sources that contain certain objects of interest.
● The above types of queries are referred to as content-based
retrieval, because the multimedia source is being retrieved based
on its containing certain objects or activities. Hence, a multimedia
database must use some model to organize and index the
multimedia sources based on their contents.
Example
● For example, one may want to locate all video clips in a video
database that include a certain person in them, say Bill Clinton.
● One may also want to retrieve video clips based on certain
activities included in them, such as video clips where a goal is
scored in a soccer game by a certain player or team.
● Identifying the contents of multimedia sources is a difficult and
time-consuming task.
There are two main approaches.
(1) The first is based on automatic analysis of the multimedia
sources to identify certain mathematical characteristics of their
contents.
● This approach uses different techniques depending on the type of
multimedia source (image, text, video, or audio).
(2) The second approach depends on manual identification of the
objects and activities of interest in each multimedia source and on
using this information to index the sources.
● This approach can be applied to all the different multimedia
sources, but it requires a manual preprocessing phase where a
person has to scan each multimedia source to identify and catalog
the objects and activities it contains so that they can be used to
index these sources.
Image:
● An image is typically stored either in raw form as a set of pixel or
cell values, or in compressed form to save space.
● The image shape descriptor describes the geometric shape of the
raw image, which is typically a rectangle of cells of a certain width
and height.
● Each image can be represented by an m by n grid of cells.
● Each cell contains a pixel value that describes the cell content.
● In black/white images, pixels can be one bit.
● In gray scale or color images, a pixel is multiple bits.
● Because images may require large amounts of space, they are often
stored in compressed form.
● Compression standards, such as GIF or JPEG, use various
mathematical transformations to reduce the number of cells stored
but still maintain the main image characteristics.
● The mathematical transforms that can be used include Discrete
Fourier Transform (DFT), Discrete Cosine Transform (DCT), and
wavelet transforms.
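To illustrate how a transform can reduce the number of stored values while keeping the main characteristics, here is a simplified one-dimensional sketch of the Discrete Cosine Transform applied to a single row of cells. It is not how JPEG or any specific standard is implemented; the pixel values are hypothetical, and truncating the coefficient list stands in for the quantization steps real codecs use.

```python
import math

def dct(row):
    """Unnormalised DCT-II of a 1-D sequence of pixel values."""
    n = len(row)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(row))
            for k in range(n)]

def idct(coeffs, n):
    """Inverse of dct(); coefficients not supplied are treated as zero."""
    return [coeffs[0] / n
            + (2 / n) * sum(c * math.cos(math.pi / n * (i + 0.5) * k)
                            for k, c in enumerate(coeffs) if k > 0)
            for i in range(n)]

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

row = [52, 55, 61, 66, 70, 61, 64, 73]     # hypothetical gray-scale cells
coeffs = dct(row)
exact = idct(coeffs, len(row))             # all 8 coefficients kept
approx4 = idct(coeffs[:4], len(row))       # only 4 of 8 values stored
approx2 = idct(coeffs[:2], len(row))       # only 2 of 8 values stored
```

Keeping all coefficients reproduces the row up to rounding error, and because the cosine basis is orthogonal, keeping more coefficients never increases the squared reconstruction error.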
● To identify objects of interest in an image, the image is typically
divided into homogeneous segments using a homogeneity
predicate.
● For example, in a color image, cells that are adjacent to one
another and whose pixel values are close are grouped into a
segment.
● The homogeneity predicate defines the conditions for how to
automatically group those cells.
● Segmentation and compression can hence identify the main
characteristics of an image.
● A typical image database query would be to find images in the
database that are similar to a given image.
● The given image could be an isolated segment that contains, say, a
pattern of interest, and the query is to locate other images that
contain that same pattern.
● There are two main techniques for this type of search.
● (i) The first approach uses a distance function to compare the given
image with the stored images and their segments.
● If the distance value returned is small, the probability of a match is
high.
● Indexes can be created to group together stored images that are
close in the distance metric so as to limit the search space.
● (ii) The second approach, called the transformation approach,
measures image similarity by having a small number of
transformations that can transform one image's cells to match the
other image.
● Transformations include rotations, translations, and scaling.
● Although the second approach is more general, it is also more
time-consuming and difficult.
Video
● A video source is typically represented as a sequence of frames,
where each frame is a still image.
● Rather than identifying the objects and activities in every
individual frame, the video is divided into video segments, where
each segment is made up of a sequence of contiguous frames that
includes the same objects/activities.
● Each segment is identified by its starting and ending frames.
● The objects and activities identified in each video segment can be
used to index the segments.
● An indexing technique called frame segment trees has been
proposed for video indexing.
● The index includes both objects, such as persons, houses, cars, and
activities, such as a person delivering a speech or two people
talking.
● Videos are also often compressed using standards such as MPEG.
Text
● A text/document source is basically the full text of some article,
book, or magazine.
● These sources are typically indexed by identifying the keywords
that appear in the text and their relative frequencies.
● The filler words are eliminated from that process.
● Because there could be too many keywords when attempting to
index a collection of documents, techniques have been developed
to reduce the number of keywords to those that are most relevant to
the collection.
● A technique called singular value decomposition (SVD), which is
based on matrix transformations, can be used for this purpose.
● An indexing technique called telescoping vector trees, or TV-trees,
can then be used to group similar documents together.
Audio:
● Audio sources include stored recorded messages, such as speeches,
class presentations, or even surveillance recording of phone
messages or conversations by law enforcement.
● Here, discrete transforms can be used to identify the main
characteristics of a certain person's voice in order to have
similarity-based indexing and retrieval.
● Audio characteristic features include loudness, intensity, pitch, and
clarity.

Document types

■ Mono medium
  ■ text, video, image, music, speech, graph, ...
■ Multimedia
  ■ combination of different media
■ Hypertext
  ■ interlinked text documents (e.g. XML, HTML)
■ Hypermedia
  ■ interlinked multimedia documents

Nature of Multimedia data

■ Consists of alphanumeric, graphics, image, animation, audio, and
visual information
■ From the presentation viewpoint, multimedia data is huge and
involves time-dependent characteristics that must be adhered to
■ This holds whether these objects already exist or are created on
the fly
■ Presentation and the subsequent interactions demand much more
from a DBMS
■ Complex structure demands deriving semantics from contents
■ Real world objects present in images, video, animation or
discussed in audio are often the subject of query
■ Techniques such as image interpretation, speech recognition, etc.
can be used to derive information corresponding to multimedia
objects

Multimedia data modeling is concerned with the logical and physical
representation of multimedia objects, the relations among them, and
the extraction and use of features in obtaining semantics.

Characteristics of an MDBMS

✔ Corresponding Storage Media
o Data must be stored and managed according to the specific
characteristics of the storage media
✔ Descriptive Search Methods
o Query must be descriptive and content oriented
✔ Device-Independent Interface
✔ Format-Independent Interface
✔ View Specific & Simultaneous Data Access
o Same data can be accessed through different queries by
different applications
✔ Management of Large Amounts of data
✔ Relational Consistency of Data Management
o Relations among data of one or different media should stay
consistent
o These relations can be used for queries and data output
o Navigation through a document is supported by managing
relations among individual parts of the document
✔ Attribute Relation
o Provides different descriptions / presentations of the same object
✔ Component Relation
o Consistency among all parts belonging to the same object
✔ Substitution Relation
o Concerns different kinds of presentations of the same object
✔ Synchronization Relation
o Temporal relations among different data units
✔ Real time Data Transfer
o Read / write operations must be done in real time
o Data transfer of real time activity gets higher priority than
other database activities
✔ Large Transactions
o Large transactions must be done in a reliable fashion, since
they take a long time

What is a Multimedia DBMS?

A multimedia database management system (MM-DBMS) is a framework
that manages different types of data potentially represented in a wide
diversity of formats on a wide array of media sources.

Like the traditional DBMS, MM-DBMS should address requirements:

▪ Integration
Data items do not need to be duplicated for different
programs.
▪ Data independence
Separate the database and the management from the
application programs.
▪ Concurrency control
Allows concurrent transactions.
▪ Persistence
Data objects can be saved and re-used by different
transactions and program invocations
▪ Privacy
Access and authorization control
▪ Integrity control
Ensures database consistency between transactions
▪ Recovery
Failures of transactions should not affect the persistent data
storage

Multimedia Database Architectures:

Based on Principle of Autonomy

⮚ Each media type is organized in a media-specific
manner suitable for that media type.
⮚ Need to compute joins across different data structures
⮚ Relatively fast query processing due to specialized structures.
⮚ The only choice for legacy data banks

Based on Principle of Uniformity

⮚ A single abstract structure to index all media types.
⮚ Abstract out the common part of different media types
(difficult!) - metadata
⮚ One structure - easy implementation.
⮚ Annotations for different media types.

Based on Principle of Hybrid Organization

⮚ A hybrid of the first two. Certain media types use their own
indexes, while others use the "unified" index
⮚ An attempt to capture the advantages of the first two
⮚ Joins across multiple data sources using their native indexes.
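The difference between the autonomy and uniformity principles can be sketched with toy data. Every name and structure below is hypothetical and only meant to show the shape of the two organizations: under autonomy each media type has its own, differently shaped index and a query must combine results across them; under uniformity a single metadata structure covers all media types.

```python
# Autonomy: media-specific structures (toy data).
image_index = {"img1": {"car", "house"}, "img2": {"person"}}          # image -> objects
video_index = {"vid7": [(0, 120, {"person"}), (121, 300, {"car"})]}   # video -> (start, end, objects)

def find_autonomous(obj):
    """Query each media-specific structure, then combine the results."""
    hits = {("image", name) for name, objs in image_index.items() if obj in objs}
    hits |= {("video", name) for name, segs in video_index.items()
             if any(obj in objs for _, _, objs in segs)}
    return hits

def build_unified_index():
    """Uniformity: abstract the common part (object annotations) into one index."""
    unified = {}
    for name, objs in image_index.items():
        for o in objs:
            unified.setdefault(o, set()).add(("image", name))
    for name, segs in video_index.items():
        for _, _, objs in segs:
            for o in objs:
                unified.setdefault(o, set()).add(("video", name))
    return unified

unified_index = build_unified_index()

def find_unified(obj):
    """A single lookup answers the query for all media types."""
    return unified_index.get(obj, set())
```

Both functions return the same sources for a given object; the unified version is simpler to query but discards media-specific detail such as the frame ranges, which is the kind of trade-off a hybrid organization tries to balance.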
A computer hardware/software system used for

– Acquiring and Storing
– Managing
– Indexing and Filtering
– Manipulating (quality, editing)
– Transmitting (multiple platforms)
– Accessing large amounts of visual information such as
images, video, graphics, audio and associated multimedia

Examples: image and video databases, web media search engines, mobile
media navigator, etc.

– Share Digital Information
– New Content Creation Tools
– Deployment of High-Speed Networks
– New Content Services
– Mobile Internet
– 3D graphics, network games
– Media portals
– Standards become available: coding, delivery, and
description.

Access multimedia information: anytime, anywhere, on any device,
from any source, anything
Network/device transparent

Quality of service (graceful degradation)

Intelligent tools and interfaces

Automated protection and transaction

Multimedia data types

⮚ Text
⮚ Image
⮚ Video
⮚ Audio
⮚ mixed multimedia data

Designing MMDBs

Characteristics of multimedia data that have an impact on the design of
MMDBs include the huge size of MMDBs, temporal nature, richness of
content, complexity of representation and subjective interpretation. The
major challenges in designing multimedia databases arise from several
requirements they need to satisfy, such as the following:

• Manage different types of input, output, and storage devices. Data
input can be from a variety of devices such as scanners and digital cameras
for images, microphones and MIDI devices for audio, and video cameras.
Typical output devices are high-resolution monitors for images and video,
and speakers for audio.

• Handle a variety of data compression and storage formats. The data
encoding has a variety of formats even within a single application. For
instance, in medical applications, MRI images of the brain require lossless
coding or lossy coding of very stringent quality, while the requirements for
X-ray images of bones can be less stringent. Also, the radiological image
data, the ECG data, other patient data, etc. have widely varying formats.

• Support different computing platforms and operating systems. Different
users operate computers and devices suited to their needs and tastes. But
they need the same kind of user-level view of the database.

• Integrate different data models. Some data such as numeric and textual
data are best handled using a relational database model, while some others
such as video documents are better handled using an object-oriented
database model. So these two models should coexist in MMDBs.

• Offer a variety of user-friendly query systems suited to different kinds
of media. From a user point of view, easy-to-use queries and fast and
accurate retrieval of information are highly desirable. The query for the same
item can be in different forms. For example, a portion of interest in a video
can be queried by using either

1) a few sample video frames as an example,

2) a clip of the corresponding audio track or

3) a textual description using keywords

• Handle different kinds of indices. The inexact and subjective nature of
multimedia data has rendered keyword-based indices and exact and range
searches used in traditional databases ineffective. For example, the retrieval
of records of persons based on social security number is precisely defined,
but the retrieval of records of persons having certain facial features from a
database of facial images requires content-based queries and
similarity-based retrievals. This requires indices that are content dependent,
in addition to key-word indices.

• Develop measures of data similarity that correspond well with
perceptual similarity. Measures of similarity for different media types
need to be quantified to correspond well with the perceptual similarity of
objects of those data types. These need to be incorporated into the search
process.

• Provide transparent view of geographically distributed data. MMDBs
are likely to be of a distributed nature. The media data resides in many
different storage units possibly spread out geographically. This is partly due
to the changing nature of computation and computing resources from
centralized to networked and distributed.

• Adhere to real-time constraints for the transmission of media data.
Video and audio are inherently temporal in nature. For example, the frames
of a video need to be presented at the rate of at least 30 frames/sec. for the
eye to perceive continuity in the video.

• Synchronize different media types while presenting to user. It is likely
that different media types corresponding to a single multimedia object are
stored in different formats, on different devices, and have different rates of
transfer. Thus they need to be periodically synchronized for presentation.

Multimedia Databases

• Multimedia database management
(NSF, Fuji Electric, AT&T)
– Video modeling and management
– Multimedia document management

• Distributed multimedia systems
(NSF, AFRL, IBM, Intel, Siemens)

• High-performance multimedia database architecture for
storage management
(NSF, AT&T)

TOPIC: MULTIMEDIA DATA STRUCTURES
● Data size is increasing, and the growth is not just in numbers and
small strings but also in structured multimedia data (text, images,
video, audio, VR, etc.)

● Databases promise:

– well structured data organisation
– efficient storage of large amounts of data
– querying
– transactional support for concurrent users

● If we include multimedia data

– multimedia is large and may swamp other data
– multimedia data structures are completely different from
standard database structures
– multimedia data structures do not easily lend themselves
to content-based searching

Data integration

● Databases already integrate various kinds of data: numbers, dates,
small text strings.

● They do this by the use of domains

– i.e. each atomic value in the database belongs to one of a
small number of types

● each type has two aspects:

– a range of values which are acceptable
– some operations which are available

● the schema indicates a domain for each part of the database and the
DBMS enforces the domain constraint

– e.g. in a Relational Database, each column is assigned a
domain

● Therefore a DBMS must provide domain types for any kind of data
that it is to house, and the overall structure will deal with the
integration

Domain types of MM data

● DBMS typically provide three different kinds of domain for
multimedia data:

1. large object domains, sequences of data often of two kinds

● Binary Large Objects – BLOBs – which are an
unstructured sequence of bytes
● Character Large Objects – CLOBs – which are
an unstructured sequence of characters

2. file references – instead of holding the data, a file reference
contains a link to the data (OLE in Access)

3. genuine multimedia data types – (Oracle and Jasmine)

● There is an important difference between the last of these
and the first two:
– multimedia data types present the possibility of exploiting
the structure of the data for querying and manipulation

– large objects at best allow us to extract sections or to
concatenate them

– file references mean that the DBMS has no access to the
data at all

Multimedia DB Storage

Data Structure

Querying MM data

● A DBMS permits a user to search the database by content, e.g. give
the name of the student with matriculation number 0123456.

● We would like to do the same with multimedia data, e.g. give the
pictures painted by Picasso or sound files with a female singer
hitting top C.

● With standard data, querying is easy: numeric and string operators
are well understood.

● With multimedia data this is more difficult and requires some
method of identifying contents, of which there are two kinds:

● automatic identification
● an algorithm takes the data and returns a measure
which can be compared – e.g. of blackness

● manual identification

● a person examines the data and catalogues it –
e.g. in a table of pictures, there is a column for
the picture and another for the painter

Types of Database

● There are three kinds of DBMS that might be used for housing
multimedia data.

● Relational DBMS store everything as First Normal Form tables

– all data items are atomic and are held in rectangular
tables
– data can only be related if they are in one or in two
records connected by a common value (foreign key)
– records are identified only by content
– it is difficult (if not impossible) to extend the set of
domains

● Object-oriented DBMS store everything as classes of objects

– all data is held as components of objects (like Java
variables)
– data is related by object reference (i.e. one class variable
has a type which is another class and the values of that
variable are instances of that class)
– the set of classes is extensible and so we can freely create
domains

● Object-relational DBMS are fundamentally relations but are not
First Normal Form

– the values in cells can be object references as well as
atomic values
– new types can be defined

How can we use these different types?

● In a relational database, we can have:

– domain types for large objects
– using a string type for file names
– extra file types as in OLE in Access

● In an object-oriented database, we can have:

– specially designed classes for multimedia

● In an object-relational database, we can have:

– specially designed types for multimedia

R type database e.g. Access and OLE

● Object Linking and Embedding (OLE) was Microsoft’s first architecture
for integrating files of different types:

● Each file type in Windows is associated with an application. It is
possible to place a file of one type inside another:

– either by wholly embedding the data, in which case it is rendered by a
plug-in associated with the program

– or by placing a link to the data, in which case it is rendered by
calling the original program

● Access works with this system by providing a domain type for OLE

• There’s not much we can do with OLE fields since the data is in a
format that Access does not understand

• We can plug the foreign data into a report or a form and little else

R databases e.g. BFILEs in Oracle

● The BFILE datatype provides access to BLOB files of up to 4 gigabytes
that are stored in file systems outside an Oracle database.

The BFILE datatype allows read-only support of large binary files; we
cannot modify a file through Oracle. Oracle provides APIs to access file
data

Large Object Types in Oracle and SQL3

● Oracle and SQL3 support three large object types:

– BLOB – The BLOB domain type stores unstructured binary data in the
database. BLOBs can store up to four gigabytes of binary data.

– CLOB – The CLOB domain type stores up to four gigabytes of
single-byte character set data

– NCLOB – The NCLOB domain type stores up to four gigabytes of
fixed-width and varying-width multi-byte national character set data

* SQL3 is a significant extension to standard SQL which turns it into a
full object-based language

● These types support

– Concatenation – making up one LOB by putting two of them together

– Substring – extracts a section of a LOB

– Overlay – replaces a substring of one LOB with another

– Trim – removes particular characters (e.g. whitespace) from the
beginning or end

– Length – returns the length of the LOB

– Position – returns the position of a substring in a LOB

– Upper and Lower – turns a CLOB or NCLOB into upper or lower case
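
The LOB operations listed above can be illustrated with a minimal Python
sketch that treats a BLOB as a bytes value and a CLOB as a string. The
function names (lob_concat, lob_substring and so on) are hypothetical
stand-ins chosen for this example; they are not Oracle or SQL3 calls, and
positions are assumed to be 1-based as is usual in SQL.

```python
# Illustrative stand-ins for the LOB operations; these names are
# hypothetical and are not Oracle or SQL3 APIs.

def lob_concat(a: bytes, b: bytes) -> bytes:
    """Concatenation: make one LOB by putting two together."""
    return a + b


def lob_substring(lob: bytes, start: int, length: int) -> bytes:
    """Substring: extract `length` bytes from 1-based position `start`."""
    return lob[start - 1:start - 1 + length]


def lob_overlay(lob: bytes, replacement: bytes, start: int) -> bytes:
    """Overlay: replace the section at 1-based `start` with `replacement`."""
    i = start - 1
    return lob[:i] + replacement + lob[i + len(replacement):]


def lob_trim(lob: bytes, chars: bytes = b" \t\r\n") -> bytes:
    """Trim: remove the given characters from both ends."""
    return lob.strip(chars)


def lob_length(lob: bytes) -> int:
    """Length: number of bytes in the LOB."""
    return len(lob)


def lob_position(lob: bytes, pattern: bytes) -> int:
    """Position: 1-based position of `pattern`, or 0 if it is absent."""
    return lob.find(pattern) + 1


def clob_upper(clob: str) -> str:
    """Upper: only meaningful for character LOBs (CLOB / NCLOB)."""
    return clob.upper()


def clob_lower(clob: str) -> str:
    """Lower: only meaningful for character LOBs (CLOB / NCLOB)."""
    return clob.lower()


doc = lob_trim(lob_concat(b"  multimedia ", b"database  "))
print(doc, lob_length(doc), lob_position(doc, b"data"))
```

Note that every operation treats the LOB as an opaque sequence of bytes or
characters; none of them looks at what the data depicts, which is why large
objects offer so little for content-based querying.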

– LOBs can only appear in a where clause using “=”, “<>” or “like” and
not in group by or order by at all

Large Object Types in MySQL

MySQL has four BLOB and four CLOB (called TEXT in MySQL) domain types:

● TINYBLOB and TINYTEXT – store up to 255 bytes

● BLOB and TEXT – store up to 64K bytes

● MEDIUMBLOB and MEDIUMTEXT – store up to 16M bytes

● LONGBLOB and LONGTEXT – store up to 4G bytes

Oracle interMedia Audio, Image, and Video

● Oracle interMedia supports multimedia storage, retrieval, and
management of:

– BLOBs stored locally in Oracle8i onwards and containing audio, image,
or video data

– BFILEs, stored locally in operating system-specific file systems and
containing audio, image or video data

– URLs containing audio, image, or video data stored on any HTTP server
such as Oracle Application Server, Netscape Application Server,
Microsoft Internet Information Server, Apache HTTPD server, and
Spyglass servers

– Streaming audio or video data stored on specialized media servers such
as the Oracle Video Server

The Object Relational Multimedia Domain Types in interMedia

● interMedia provides the ORDAudio, ORDImage, and ORDVideo object types
and methods for:

– updateTime ORDSource attribute manipulation

– manipulating multimedia data source attribute information

– extracting attributes from multimedia data

– getting and managing multimedia data from Oracle interMedia, Web
servers, and other servers

– performing a minimal set of manipulation operations on multimedia data
(images only)

● The properties available are:

– ORDImage – the height, width, data size of the on-disk image, file
type, image type, compression type, and MIME type

– ORDAudio – the format, encoding, number of channels, sampling rate,
sample size, compression type, and audio duration

– ORDVideo – the format, frame size, frame resolution, frame rate, video
duration, number of frames, compression type, number of colours, and
bit rate

● Oracle also stores metadata including:

– source type, location, and source name

– MIME type and formatting information

– Characteristics such as height and width of an image, number of audio
channels, video frame rate, play time, etc.

OO databases – e.g. Jasmine

● Jasmine is an object-oriented database; its development environment
is an application known as Studio

● It comes with a number of built-in classes, including four multimedia
classes:

– Picture

– Image

– Video

– Audio

● These come with manipulation and compression facilities. They have
also been made to fit well with the Java Media Framework

Conclusions

● At present we cannot do much with MM data; there are two reasons for
this:

1. It is very large

– indexing on multimedia data is not reasonable nor is storing a default
value

– other retrieval may be slowed down

– transactions may be compromised

2. The properties are not well understood or implementable in
reasonable time

– what does it mean to say that one image is before another in order?
Therefore there are few operators in the where clause that work

– At the moment, there is no reason for putting multimedia data into a
relational database

– it just slows everything down

– and we can’t do very much

Disadvantages of OODBMS AND ORDBMS

– We could use an object relational or object oriented database

– now we can do more

– but the products are immature

– and everything will be slow

There are three main reasons for integrating multimedia data with a
database:

– 1. Cataloguing the data

– a column for file names is good enough

– 2. Decorating Reports

– The OLE approach works well here. Otherwise a file name column and a
simple application for generating the reports would do

– 3. Web Applications

– Again a file name column is good enough

TOPIC: MULTIMEDIA QUERY LANGUAGES

What is a Multimedia DBMS?

§ A multimedia database management system (MM-DBMS) is a framework that
manages different types of data potentially represented in a wide
diversity of formats on a wide array of media sources.

Requirements of Multimedia DBMS

§ Like a traditional DBMS, an MM-DBMS should address the following
requirements:

o Integration

▪ Data items do not need to be duplicated for different programs

o Data independence

▪ Separate the database and the management from the application
programs

o Concurrency control

▪ Allows concurrent transactions

o Persistence

▪ Data objects can be saved and re-used by different transactions and
program invocations

o Privacy

▪ Access and authorization control

o Integrity control



▪ Ensures database consistency between transactions

o Recovery

▪ Failures of transactions should not affect the persistent data
storage

o Query support

▪ Allows easy querying of multimedia data

§ In addition, an MM-DBMS should:

o have the ability to uniformly query data (media data, textual data)
represented in different formats.

o have the ability to simultaneously query different media sources and
conduct classical database operations across them.

● ⇒ query support

o have the ability to retrieve media objects from a local storage
device in a smooth, jitter-free (i.e. continuous) manner.

● ⇒ storage support

o have the ability to take the answer generated by a query and develop
a presentation of that answer in terms of audio-visual media.

o have the ability to deliver this presentation in a way that satisfies
various Quality of Service requirements.

● ⇒ presentation and delivery support

Major Issues: Query Support

§ Should allow easy query of multimedia data

● Query by content
● Query should be specified as a combination of media (examples) and
text description
● Handle different MM objects
● Should decide what query language is to be used

§ Should allow efficient query of multimedia data

● Algorithms should be used to efficiently retrieve media data on the
basis of similarity
● Indexing the contents of different MM objects

§ Should provide traditional DBMS support

Major Issues: Storage Support

§ Working of the following standard storage devices:

● disk systems
● CD-ROM systems
● tape systems and tape libraries

§ Layout of data on standard storage devices

§ Design disk/CD-ROM/tape servers so as to optimally satisfy different
clients concurrently when these clients execute the following operations

● playback

● rewind

● fast forward

● pause

Major Issues: Presentation & Delivery Support

§ Content of multimedia presentations should be specified

§ Specification of the form (temporal/spatial layout) for the content

§ Creation of a presentation schedule that satisfies these
temporal/spatial presentation requirements

§ Delivery of a multimedia presentation to users when there is

● a need to interact with other remote servers to assemble the
presentation (or parts of it)

● a bound on the buffer, bandwidth, load, and other resources available
on the system

● a mismatch between the host server's capabilities and the customer's
machine capabilities

§ Presentations should optimize Quality of Service (QoS).

Operations On Data

Query Processing

■ Queries may contain multimedia objects given by the user

■ Results of these queries are based on degree of similarity

■ Ex:

■ Give the description of the real world object o corresponding to s.
The user may click on s to select it.

■ Retrieve all the video shots of my friend, given the friend’s photo

■ These kinds of queries are not present in a conventional database
management system

■ It is extremely rare for two images to match exactly

■ A match is a real value between 0 and 1

■ Indexing is complicated
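
The idea that a query carries a media object and that a match is a real
value between 0 and 1 can be sketched as follows. This is only an
illustration under stated assumptions: images are reduced to flat lists of
8-bit grey levels, the feature is a normalised grey-level histogram, and
the score is histogram intersection. The names histogram, similarity and
query_by_example are hypothetical and do not come from any particular
MM-DBMS.

```python
def histogram(pixels, bins=4, max_value=256):
    """Feature extraction: fraction of pixels falling in each grey-level bin."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return [c / len(pixels) for c in counts]


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(query_pixels, library, threshold=0.5):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    q = histogram(query_pixels)
    scored = [(name, similarity(q, histogram(px))) for name, px in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: -pair[1])


# Invented toy "images": four grey levels each.
library = {
    "dark.gif": [0, 10, 20, 30],
    "bright.gif": [250, 240, 230, 220],
    "mixed.gif": [0, 10, 250, 240],
}
print(query_by_example([5, 15, 25, 35], library))
```

The query image is not pixel-identical to any stored image, yet dark.gif
scores 1.0 and mixed.gif scores 0.5, so the answer is a ranked list rather
than an exact-match set.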

■ Extract n numerical-valued features from the media object; the
indexing methods can then be based on nearest-neighbor methods

■ The n features can be distinct / composite

A Sample Multimedia Scenario

§ Consider a police investigation of a large-scale drug operation. This
investigation may generate the following types of data

● Video data captured by surveillance cameras that record the
activities taking place at various locations.

● Audio data captured by legally authorized telephone wiretaps.

● Image data consisting of still photographs taken by investigators.

● Document data seized by the police when raiding one or more places.

● Structured relational data containing background information, back
records, etc., of the suspects involved.

● Geographic information system data containing geographic data
relevant to the drug investigation being conducted.

Possible Queries

Image Query (by example):

§ Police officer Rocky has a photograph in front of him.

§ He wants to find the identity of the person in the picture.

§ Query: “Retrieve all images from the image library in which the
person appearing in the (currently displayed) photograph appears”

Image Query (by keywords):

§ Police officer Rocky wants to examine pictures of “Big Spender”.

§ Query: “Retrieve all images from the image library in which ‘Big
Spender’ appears.”

Video Query:

§ Police officer Rocky is examining a surveillance video of a
particular person being fatally assaulted by an assailant. However, the
assailant's face is occluded and image processing algorithms return
very poor matches. Rocky thinks the assault was by someone known to the
victim.

§ Query: “Find all video segments in which the victim of the assault
appears.”

§ By examining the answer to the above query, Rocky hopes to find other
people who have previously interacted with the victim.

Heterogeneous Multimedia Query:



§ Find all individuals who have been photographed with “Big Spender”
and who have been convicted of attempted murder in South China and who
have recently had electronic fund transfers made into their bank
accounts from ABC Corp.

MM Database Architectures

Based on Principle of Autonomy

§ Each media type is organized in a media-specific manner suitable for
that media type

§ Need to compute joins across different data structures

§ Relatively fast query processing due to specialized structures

§ The only choice for legacy data banks

Based on Principle of Uniformity

§ A single abstract structure to index all media types

§ Abstract out the common part of different media types (difficult!) -
metadata

§ One structure - easy implementation

§ Annotations for different media types

Based on Principle of Hybrid Organization

§ A hybrid of the first two. Certain media types use their own indexes,
while others use the "unified" index

§ An attempt to capture the advantages of the first two

§ Joins across multiple data sources using their native indexes



Example:

Organizing Multimedia Data Based on the Principle of Uniformity

§ Consider the following statements about media data. They may be made
by a human or may be produced by an image/video/text content retrieval
engine.

● The image photo1.gif shows Jane Shady, “Big Spender” and an
unidentified third person, in Sheung Shui. The picture was taken on
January 5, 1997.

● The video-clip video1.mpg shows Jane Shady giving “Big Spender” a
briefcase (in frames 50-100). The video was obtained from surveillance
set up at Big Spender’s house in Kowloon Tong, in October, 1996.

● The document bigspender.txt, a police file, contains background
information on Big Spender.

Metadata and Media Abstraction

§ All these statements are meta-data statements.

● Associate, with each media object oi, some meta-data, md(oi)
● If our archive contains objects o1,..., on, then index the meta-data
md(o1),..., md(on) in a way that provides efficient ways of
implementing the expected accesses that users will make.

§ We expect to make use of a single data structure to represent
metadata. This can be achieved via media abstractions

§ Media abstractions are mathematical structures representing such
media content.

Querying SMDSs (Uniform Representation)

Querying an SMDS is built on top of SQL.

Basic functions include:

§ FindType(Obj): This function takes a media object Obj as input, and
returns the output type of the object. For example,

FindType(im1.gif) = gif.

FindType(movie1.mpg) = mpg.

§ FindObjWithFeature(f): This function takes a feature f as input and
returns as output the set of all media objects that contain that
feature. For example,

FindObjWithFeature(john)=
{im1.gif,im2.gif,im3.gif,video1.mpg:[1,5]}.

FindObjWithFeature(mary)=
{video1.mpg:[1,5],video1.mpg:[15,50]}.

§ FindObjWithFeatureandAttr(f,a,v): This function takes as input a
feature f, an attribute name a associated with that feature, and a
value v. It returns as output all objects obj that contain the feature
and such that the value of the attribute a in object obj is v. E.g.

FindObjWithFeatureandAttr(Big Spender,suit,blue): This query asks to
find all media objects in which Big Spender appears in a blue suit.

§ FindFeaturesinObj(Obj): This query asks to find all features that
occur within a given media object. It returns as output the set of all
such features. For example,

FindFeaturesinObj(im1.gif): This asks for all features within the image
file im1.gif. It may return as output the objects John and Lisa.

FindFeaturesinObj(video1.mpg:[1,15]): This asks for all features within
the first 15 frames of the video file video1.mpg. The answer may include
objects such as Mary and John.

§ FindFeaturesandAttrinObj(Obj): This query is exactly like the
previous query except that it returns as output a relation having the
scheme

(Feature,Attribute,Value)

where the triple (f,a,v) occurs in the output relation iff feature f
occurs in the result of FindFeaturesinObj(Obj) and feature f's
attribute a is defined and has value v. For example,

FindFeaturesandAttrinObj(im1.gif) may return as answer a table of such
(Feature,Attribute,Value) triples.

Querying SMDS by SMDS-SQL

§ All ordinary SQL statements are SMDS-SQL statements. In addition:

§ The SELECT statement may contain media-entities. A media-entity is
defined as follows:

● If m is a continuous media object, and i, j are integers, then
m:[i, j] is a media-entity denoting the set of all frames of media
object m that lie between (and inclusive of) segments i, j.

● If m is not a continuous media object, then m is a media-entity.

● If m is a media-entity, and a is an attribute of m, then m.a is a
media-entity.

§ The FROM statement may contain entries of the form

<media> <source> <M>

which says that only media-objects associated with the named media type
and named data source are to be considered when processing the query,
and that M is a variable ranging over such media objects.

§ The WHERE statement allows (in addition to standard SQL constructs)
expressions of the form

term IN func_call

where

● term is either a variable (in which case it ranges over the output
type of func_call) or an object having the same output type as
func_call and
● func_call is any of the five SMDS function calls.

Sample SMDS-SQL Statements

§ Find all image/video objects containing both Jane Shady and Big
Spender. This can be expressed as the SMDS-SQL query:

SELECT M FROM smds source1 M
WHERE (FindType(M)=Video OR FindType(M)=Image)
AND M IN FindObjWithFeature(Big Spender)
AND M IN FindObjWithFeature(Jane Shady)

§ Find all image/video objects containing Big Spender wearing a purple
suit. This can be expressed as the SMDS-SQL query:

SELECT M FROM smds source1 M
WHERE (FindType(M)=Video OR FindType(M)=Image)
AND M IN FindObjWithFeatureandAttr(Big Spender, suit, purple)

§ Find all images containing Jane Shady and a person who appears in a
video with Big Spender. Unlike the preceding queries, this query
involves computing a "join"-like operation across different data
domains. In order to do this, we use existential variables such as the
variable "Person" in the query below, which is used to refer to the
existence of an unknown person whose identity is to be determined.

SELECT M, Person FROM smds source1 M, M1
WHERE (FindType(M)=Image) AND (FindType(M1)=Video)
AND M IN FindObjWithFeature(Jane Shady)
AND M1 IN FindObjWithFeature(Big Spender)
AND Person IN FindFeaturesinObj(M)
AND Person IN FindFeaturesinObj(M1)
AND Person ≠ Jane Shady AND Person ≠ Big Spender

Querying SMDSs (Hybrid Representation)

§ SMDS-SQL may be used to query multimedia objects which are stored in
the uniform representation.

§ In the uniform representation, all the data sources being queried are
SMDSs, while in the hybrid representation, different (non-SMDS)
representations may be used.

§ A hybrid media representation basically consists of two parts - a set
of media objects that use the uniform representation and a set of
media-types that use their own specialized access structures and query
language.

§ To extend SMDS-SQL to Hybrid-Multimedia SQL (HM-SQL for short), we
need to do two things:

● First, HM-SQL must have the ability to express queries in each of the
specialized languages used by these non-SMDS sources

● Second, HM-SQL must have the ability to express “joins” and other
similar binary algebraic operations between SMDS sources and non-SMDS
sources
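
To make the five SMDS functions and the sample SMDS-SQL "join" query more
concrete, here is a minimal Python sketch over an in-memory metadata
store. Everything in it is an assumption made for illustration: the
METADATA layout, the object names and their contents are invented, and
the Python function names merely mirror FindType, FindObjWithFeature,
FindObjWithFeatureandAttr, FindFeaturesinObj and FindFeaturesandAttrinObj.

```python
# Hypothetical metadata store: media entity -> (type, {feature: {attribute: value}}).
# Object names and contents are invented for illustration.
METADATA = {
    "photo1.gif": ("Image", {"Jane Shady": {}, "Big Spender": {"suit": "blue"}}),
    "photo2.gif": ("Image", {"Jane Shady": {}, "Mr X": {}}),
    "video1.mpg:[50,100]": ("Video", {"Big Spender": {"suit": "purple"}, "Mr X": {}}),
}


def find_type(obj):
    return METADATA[obj][0]


def find_obj_with_feature(feature):
    return {obj for obj, (_, feats) in METADATA.items() if feature in feats}


def find_obj_with_feature_and_attr(feature, attr, value):
    return {
        obj
        for obj, (_, feats) in METADATA.items()
        if feature in feats and feats[feature].get(attr) == value
    }


def find_features_in_obj(obj):
    return set(METADATA[obj][1])


def find_features_and_attr_in_obj(obj):
    # Relation with scheme (Feature, Attribute, Value).
    return {
        (feature, attr, value)
        for feature, attrs in METADATA[obj][1].items()
        for attr, value in attrs.items()
    }


def images_of_jane_with_big_spender_contact():
    """Images M showing Jane Shady plus a Person also seen in a video with Big Spender."""
    results = set()
    for m in find_obj_with_feature("Jane Shady"):
        if find_type(m) != "Image":
            continue
        for m1 in find_obj_with_feature("Big Spender"):
            if find_type(m1) != "Video":
                continue
            # "Person" is the existential variable: it must occur in both M and M1.
            shared = find_features_in_obj(m) & find_features_in_obj(m1)
            for person in shared - {"Jane Shady", "Big Spender"}:
                results.add((m, person))
    return results


print(images_of_jane_with_big_spender_contact())
```

The nested loops play the role of the two range variables M and M1, and
the set intersection is the join condition on Person.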

HM-SQL

HM-SQL is exactly like SQL except that the SELECT, FROM, WHERE clauses
are extended as follows:

§ The SELECT and FROM clauses are treated in exactly the same way as in
SMDS-SQL.

§ The WHERE statement allows (in addition to standard SQL constructs)
expressions of the form

term IN MS:func_call

where

1. term is either a variable (in which case it ranges over the output
type of func_call) or an object having the same output type as
func_call as defined in the media source MS, and

2. either MS=SMDS and func_call is one of the five SMDS functions, or

3. MS is not an SMDS media source, and func_call is a query in QL(MS).

§ Thus, there are 2 differences between HM-SQL and SMDS-SQL:

1. func_calls occurring in the WHERE clause must be explicitly
annotated with the media-source involved, and

2. queries from the query languages of the individual (non-SMDS)
media-source implementations may be embedded within an HM-SQL query.
This latter feature makes HM-SQL very powerful indeed as it is, in
principle, able to express queries in other, third-party, or legacy
media implementations.

Sample HM-SQL Statements

§ Find all video clips containing Big Spender, from both the video
sources, video1 and video2, where the former is implemented via an SMDS
and the latter is implemented via a legacy video database:

SELECT M FROM smds video1, videodb video2
WHERE M IN smds:FindObjWithFeature(Big Spender)
OR M IN videodb:FindVideoWithObject(Big Spender)

§ Find all people seen with Big Spender in either video1, video2, or
idb.

(SELECT P1 FROM smds video1 V1
WHERE V1 IN smds:FindObjWithFeature(Big Spender)
AND P1 IN smds:FindFeaturesinObj(V1) AND P1 ≠ Big Spender)

UNION

(SELECT P2 FROM videodb video2 V2
WHERE V2 IN videodb:FindVideoWithObject(Big Spender)
AND P2 IN videodb:FindObjectsinVideo(V2) AND P2 ≠ Big Spender)

UNION

(SELECT P3 FROM imagedb idb I3
WHERE I3 IN imagedb:getpic(Big Spender)
AND P3 IN imagedb:getfeatures(I3) AND P3 ≠ Big Spender)
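
The source-qualified calls (MS:func_call) and the UNION query above can be
sketched in Python as a registry of media sources, each exposing its own
function names. This is an illustrative sketch only: the registry, the
make_source and call helpers and all the data are assumptions, and the
function names for the non-SMDS sources are simply those that appear in
the sample query (FindVideoWithObject, FindObjectsinVideo, getpic,
getfeatures).

```python
def make_source(contents, find_objs_name, find_feats_name):
    """Build a media source exposing two functions under its own native names."""
    return {
        find_objs_name: lambda feature: {o for o, fs in contents.items() if feature in fs},
        find_feats_name: lambda obj: set(contents[obj]),
    }


# Hypothetical sources and data; only the function names come from the sample query.
SOURCES = {
    "smds": make_source(
        {"clip_a": {"Big Spender", "Jane Shady"}, "clip_b": {"Mr X"}},
        "FindObjWithFeature", "FindFeaturesinObj",
    ),
    "videodb": make_source(
        {"tape_7": {"Big Spender", "Mr X"}},
        "FindVideoWithObject", "FindObjectsinVideo",
    ),
    "imagedb": make_source(
        {"pic_3": {"Big Spender", "Jane Shady", "Ms Y"}},
        "getpic", "getfeatures",
    ),
}


def call(media_source, func_name, *args):
    """Evaluate MS:func_call by dispatching to the named media source."""
    return SOURCES[media_source][func_name](*args)


def people_seen_with(target):
    """Union of the three sub-queries, one per media source."""
    plan = [
        ("smds", "FindObjWithFeature", "FindFeaturesinObj"),
        ("videodb", "FindVideoWithObject", "FindObjectsinVideo"),
        ("imagedb", "getpic", "getfeatures"),
    ]
    people = set()
    for media_source, find_objs, find_feats in plan:
        for obj in call(media_source, find_objs, target):
            people |= call(media_source, find_feats, obj) - {target}
    return people


print(sorted(people_seen_with("Big Spender")))
```

Each sub-query only ever talks to one source in that source's own
vocabulary; combining the answers is ordinary set union, which is what
lets a legacy source take part without being converted to an SMDS.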

Connective Summary

When faced with the problem of creating a multimedia database, we must
consider:

§ The kinds of media data the MM database should provide access to

§ Whether legacy algorithms already exist (and are available) to index
this data reliably and accurately using content-based indexing methods

§ Whether to use the uniform representation or the hybrid
representation

TOPIC: SPATIAL DATABASE

Spatial databases provide concepts for databases that keep track of
objects in a multidimensional space.

Example

● For example, cartographic databases that store maps include
two-dimensional spatial descriptions of their objects, from countries
and states to rivers, cities, roads, seas, and so on.
● These applications are also known as Geographical Information Systems
(GIS), and are used in areas such as environmental, emergency, and
battle management. Other databases, such as meteorological databases
for weather information, are three-dimensional, since temperatures and
other meteorological information are related to three-dimensional
spatial points.
● A spatial database stores objects that have spatial characteristics
that describe them.
● The spatial relationships among the objects are important, and they
are often needed when querying the database.
● A spatial database can in general refer to an n-dimensional space for
any n.
● The main extensions that are needed for spatial databases are models
that can interpret spatial characteristics.
● Special indexing and storage structures are often needed to improve
performance.

Model extensions for two-dimensional spatial databases:

● The basic extensions needed are to include two-dimensional geometric
concepts, such as points, lines and line segments, circles, polygons,
and arcs, in order to specify the spatial characteristics of objects.
● In addition, spatial operations are needed to operate on the objects'
spatial characteristics (for example, to compute the distance between
two objects), as well as spatial Boolean conditions (for example, to
check whether two objects spatially overlap).

Example

● To illustrate, consider a database that is used for emergency
management applications.

● A description of the spatial positions of many types of objects would
be needed.
● Some of these objects generally have static spatial characteristics,
such as streets and highways, water pumps (for fire control), police
stations, fire stations, and hospitals.
● Other objects have dynamic spatial characteristics that change over
time, such as police vehicles, ambulances, or fire trucks.
● The following categories illustrate three typical types of spatial
queries:
• Range query: Finds the objects of a particular type that are within a
given spatial area or within a particular distance from a given
location. (For example, finds all hospitals within the Dallas city
area, or finds all ambulances within five miles of an accident
location.)
• Nearest neighbor query: Finds an object of a particular type that is
closest to a given location. (For example, finds the police car that is
closest to a particular location.)
• Spatial joins or overlays: Typically joins the objects of two types
based on some spatial condition, such as the objects intersecting or
overlapping spatially or being within a certain distance of one
another. (For example, finds all cities that fall on a major highway or
finds all homes that are within two miles of a lake.)

R+ trees:

● For these and other types of spatial queries to be answered
efficiently, special techniques for spatial indexing are needed.
● One of the best known techniques is the use of R+ trees and their
variations.
● R+ trees group together objects that are in close spatial physical
proximity on the same leaf nodes of a tree-structured index.
● Since a leaf node can point to only a certain number of objects,
algorithms for dividing the space into rectangular subspaces that
include the objects are needed.
● Typical criteria for dividing the space include minimizing the
rectangle areas, since this would lead to a quicker narrowing of the
search space.
● Problems such as having objects with overlapping spatial areas are
handled in different ways by the many different variations of R+ trees.
● The internal nodes of R+ trees are associated with rectangles whose
area covers all the rectangles in their subtree.
● Hence, R+ trees can easily answer queries such as "find all objects
in a given area" by limiting the tree search to those subtrees whose
rectangles intersect with the area given in the query.

Quad trees

● Other spatial storage structures include quadtrees and their
variations.
● Quadtrees generally divide each space or subspace into equally sized
areas, and proceed with the subdivisions of each subspace to identify
the positions of various objects.
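
The quadtree idea, together with the range query described earlier, can be
illustrated with a minimal point quadtree in Python. This is a sketch
under stated assumptions rather than a production index: the class name
Quadtree, the capacity and max_depth parameters and the sample hospital
coordinates are all invented for the example, node regions are half-open
rectangles, and only point objects are handled. It is a quadtree, not an
R+ tree, although the pruning step (skip any subtree whose rectangle does
not intersect the query area) is the same idea.

```python
class Quadtree:
    """Point quadtree over the half-open region [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=2, depth=0, max_depth=8):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity      # points a leaf holds before it splits
        self.depth = depth
        self.max_depth = max_depth    # stops endless splitting on duplicates
        self.points = []
        self.children = None

    def _contains(self, x, y):
        x0, y0, x1, y1 = self.bounds
        return x0 <= x < x1 and y0 <= y < y1

    def insert(self, x, y, label):
        if not self._contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y, label))
                return True
            self._split()
        # Exactly one child contains the point; any() stops at that child.
        return any(child.insert(x, y, label) for child in self.children)

    def _split(self):
        """Divide this region into four equally sized quadrants."""
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            Quadtree(x0, y0, mx, my, *args), Quadtree(mx, y0, x1, my, *args),
            Quadtree(x0, my, mx, y1, *args), Quadtree(mx, my, x1, y1, *args),
        ]
        for point in self.points:
            any(child.insert(*point) for child in self.children)
        self.points = []

    def range_query(self, qx0, qy0, qx1, qy1):
        """Labels of all points inside the closed query rectangle."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # query area misses this region: prune the subtree
        found = [lbl for (x, y, lbl) in self.points
                 if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        for child in self.children or []:
            found.extend(child.range_query(qx0, qy0, qx1, qy1))
        return found


# Invented hospital locations on a 100 x 100 map.
tree = Quadtree(0, 0, 100, 100)
for x, y, name in [(10, 10, "H1"), (20, 15, "H2"), (80, 80, "H3"),
                   (55, 45, "H4"), (12, 11, "H5")]:
    tree.insert(x, y, name)
print(sorted(tree.range_query(0, 0, 30, 30)))
```

A nearest neighbor query could reuse the same structure by repeatedly
widening a range query around the given location, at the cost of some
repeated work.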
