
Atlas: A Nested Relational Database System for Text Applications

Solutions to problems such as indexing, query optimization, transaction processing, and physical representation are found

Major groups of text products


Text management: (i) word processors, whose documents contain embedded markup that describes the structure of the text and how it should be presented; (ii) hypertext systems, which provide an interface for exploring specially structured documents, created manually from collections of text and images.

Relational systems are generally poor at supporting complex modeling, complex processing, or, in particular, text retrieval. To support this range of query mechanisms, the underlying database system needs indexing and storage structures capable of supporting complex queries on large databases.

Atlas
Atlas is a nested relational database system designed to combine text retrieval with table management. It supports hierarchically structured objects such as documents, in part by including references (a kind of foreign key). It supports browsing and provides a number of text operators and powerful indexing techniques.

THE ATLAS DATA MODEL


The Atlas data model is an extension of the nested relational model. Data is stored in tables of records. The attributes of records can be atomic values, tuples (structured values), nested tables, or references (pointers). The schema defines two tables, Document and Hypertext.

The Document table contains entries


Document [doc-id INTEGER,
          title TEXT,
          Authors [name (surname TEXT, firstname TEXT)],
          Nodes [node REF Hypertext]
] KEY = (doc-id)

A hypertext node is represented by a record in the Hypertext table and consists of a node identifier, a reference to the associated document, the content of the node, and a nested table of links to related nodes.

Hypertext [id INTEGER,
           doc REF Document,
           content TEXT,
           Links [node REF Hypertext, linktype TEXT] KEY = (node)
] KEY = (id)

Fig. 1. TQL schema for a hypertext system

THE TQL QUERY LANGUAGE


TQL includes nested expressions, which have no SQL counterpart. In SQL, joins must be specified explicitly; TQL reference attributes allow implicit joins:

    SELECT title, Authors, (SELECT node.content FROM Nodes) FROM Document;

The implicit join is denoted with the dot notation and simplifies query formulation in many instances. Written with an explicit join instead, the query begins:

    SELECT title, Authors, (SELECT content FROM Nodes, Hypertext WHERE Nodes.node = Hyp
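The implicit join through a reference attribute can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the dictionaries, the sample rows, and the helper names `deref` and `select_titles_with_contents` are hypothetical, and nothing here reflects how Atlas evaluates TQL internally.

```python
# Hypothetical in-memory stand-ins for the Hypertext and Document tables.
# All rows are made-up sample data for illustration only.
hypertext = {
    10: {"id": 10, "doc": 1, "content": "Introduction text", "Links": []},
    11: {"id": 11, "doc": 1, "content": "Data model text", "Links": []},
}
document = [
    {
        "doc_id": 1,
        "title": "Sample document",
        "Authors": [{"name": {"surname": "S", "firstname": "F"}}],
        # Nodes is a nested table; each row holds a REF Hypertext (stored as an id).
        "Nodes": [{"node": 10}, {"node": 11}],
    }
]


def deref(node_ref):
    """Follow a REF Hypertext value to the record it points at."""
    return hypertext[node_ref]


def select_titles_with_contents(documents):
    """Mimic: SELECT title, Authors, (SELECT node.content FROM Nodes) FROM Document."""
    return [
        (d["title"], d["Authors"], [deref(n["node"])["content"] for n in d["Nodes"]])
        for d in documents
    ]


result = select_titles_with_contents(document)
```

The dot in `node.content` corresponds to the `deref` call: the reference is followed directly, so no join condition between Nodes and Hypertext has to be written out.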

SYSTEM ARCHITECTURE
The four boxes with dashed borders show how the subsystems (the application layer, the Atlas kernel, and the central schema manager) are grouped into UNIX processes. Each application program creates a new Atlas kernel process. Kernel processes use the central schema manager to hold shared data structures such as the schema and indexing details. These data structures are kept in shared memory for fast access.

DATA COMPRESSION
Records can be compressed as they are written to disk, using a semi-static Huffman model. The data to be stored must first be analysed to determine word frequencies and other parameters, which are then used by the compression algorithm. Records are compressed immediately before being written to disk and are decompressed immediately after retrieval. This approach is successful because the use of compression does not interfere with any other part of the system. For tables consisting largely of text, data can generally be stored in about 20% to 30% of its original size.
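The two-pass, semi-static idea can be sketched in Python as follows. This is a minimal illustration, not the Atlas implementation: it assumes words are whitespace-separated tokens, it emits strings of "0" and "1" rather than packed bytes, and the function names `build_codes`, `compress`, and `decompress` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(records):
    """First pass: count word frequencies over all records, then build Huffman codes."""
    freq = Counter(word for rec in records for word in rec.split())
    if not freq:
        return {}
    tiebreak = count()  # unique sequence numbers keep heap entries comparable
    heap = [(f, next(tiebreak), {w: ""}) for w, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct word still needs a one-bit code
        return {w: "0" for w in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix the codes of the two least frequent subtrees with 0 and 1.
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def compress(record, codes):
    """Second pass: encode one record with the fixed (semi-static) code table."""
    return "".join(codes[w] for w in record.split())


def decompress(bits, codes):
    """Decode a bit string; Huffman codes are prefix-free, so greedy matching works."""
    decode = {c: w for w, c in codes.items()}
    words, current = [], ""
    for b in bits:
        current += b
        if current in decode:
            words.append(decode[current])
            current = ""
    return " ".join(words)
```

Because the code table is fixed after the first pass, each record can be compressed and decompressed on its own, which is what allows compression to sit at the disk boundary without affecting the rest of the system.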

INDEX STRUCTURES
Direct indexes are used for tables with a key attribute whose values are distinct, small integers. Linear hashing indexes are based on standard hashing techniques used for dynamic files. The kind of index most commonly used with Atlas is superimposed coding. In an index definition, the prefix on WORDS indicates that the words in that field are indexed by their stems, and @ indicates that the words in that field are indexed by Soundex code.

CREATE sic INDEX Document [-WORDS (title),
    @Authors.name.surname,
    @Authors.name.firstname,
    @Authors.name.surname + @Authors.name.firstname,
    Nodes.node];
CREATE sic INDEX Hypertext [doc, -WORDS (content)];

Fig. Creation of superimposed coding index
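As a rough illustration of superimposed coding (signature files), the Python sketch below hashes each index term to a few bit positions and ORs them into one record signature. The signature width, the number of bits per term, the use of SHA-256, and the function names are all assumptions made for this example; the source does not give Atlas's actual parameters.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width
BITS_PER_TERM = 3    # assumed number of bit positions set per term


def term_bits(term):
    """Map a term to a few bit positions using independently salted hashes."""
    positions = set()
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % SIGNATURE_BITS)
    return positions


def signature(terms):
    """Superimpose (bitwise OR) the bit patterns of all terms into one signature."""
    sig = 0
    for t in terms:
        for p in term_bits(t.lower()):
            sig |= 1 << p
    return sig


def candidates(query_terms, record_signatures):
    """Return ids of records whose signature covers every bit of the query signature."""
    q = signature(query_terms)
    return [rid for rid, sig in record_signatures.items() if sig & q == q]
```

A record that contains all query terms always passes the bit test, but a record that passes may still be a false match, so candidates have to be checked against the stored text. This is one of the trade-offs of signature file indexes.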

QUERY OPTIMIZATION
Query optimization is the process of identifying the most efficient way to evaluate a query when multiple strategies exist. Optimizer rules consist of two parts: a pattern part and a restructuring part. The restructuring takes place if the input expression matches the pattern. Both parts are defined using a special rule syntax and can call external C functions when the rules are not powerful enough to perform all necessary checks.
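The pattern-plus-restructuring shape of such rules can be sketched with a tiny rewrite engine in Python. Everything in it is hypothetical: expressions are modelled as nested tuples, names beginning with `?` are pattern variables, and the single example rule (merging two nested selections) is a textbook rewrite rather than a rule taken from Atlas. Calls to external check functions are left out.

```python
def match(pattern, expr, bindings=None):
    """Match expr against pattern; strings starting with '?' are variables."""
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None


def substitute(template, bindings):
    """Build the restructured expression by filling variables in the template."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return template


def rewrite(expr, rules):
    """Rewrite sub-expressions first, then apply the first matching rule at this node."""
    if isinstance(expr, tuple):
        expr = tuple(rewrite(e, rules) for e in expr)
    for pattern, template in rules:
        found = match(pattern, expr)
        if found is not None:
            return rewrite(substitute(template, found), rules)
    return expr


# Hypothetical rule: (pattern part, restructuring part).
RULES = [
    (("select", "?p1", ("select", "?p2", "?t")),
     ("select", ("and", "?p1", "?p2"), "?t")),
]
```

For example, `rewrite(("select", "a>1", ("select", "b<2", "Document")), RULES)` collapses the two selections into one with a combined predicate.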

TEXT PARSING
A word parser is used to extract index terms from text. Database designers can specify how to extract words from text via a rule-based parser. The parser is implemented as an interpreter of finite state machines in which an arc is traversed for each character in the input string. This is similar to the approach used by some regular expression pattern matching algorithms, in which patterns are compiled into finite state machines.
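A table-driven interpreter of this kind might look like the following Python sketch, in which one arc is followed per input character. The states, character classes, actions, and the `TRANSITIONS` table are assumptions chosen for illustration; the rule syntax that Atlas designers actually write is not shown in the source.

```python
def char_class(ch):
    """Classify a character; the three classes are an assumption of this sketch."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# (state, character class) -> (next state, action); hypothetical rule table.
# Words must start with a letter and may continue with letters or digits.
TRANSITIONS = {
    ("between", "letter"): ("in_word", "start"),
    ("between", "digit"): ("between", "skip"),
    ("between", "other"): ("between", "skip"),
    ("in_word", "letter"): ("in_word", "append"),
    ("in_word", "digit"): ("in_word", "append"),
    ("in_word", "other"): ("between", "emit"),
}


def extract_words(text, transitions=TRANSITIONS):
    """Interpret the state machine, traversing exactly one arc per character."""
    state, current, words = "between", [], []
    for ch in text:
        state, action = transitions[(state, char_class(ch))]
        if action == "start":
            current = [ch.lower()]
        elif action == "append":
            current.append(ch.lower())
        elif action == "emit":
            words.append("".join(current))
            current = []
    if state == "in_word":  # flush a word that runs to the end of the input
        words.append("".join(current))
    return words
```

Changing what counts as a word then only requires a different transition table, not a different interpreter.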

TWO ATLAS APPLICATIONS


An SGML-Based Hypertext System
SGML is used to specify abstract grammars consisting of tags that are used throughout the text of a document.
The tags identify the structure of a document that is to be inserted into a database.

A Genealogy Database
An application that makes use of Atlas's multimedia, text, and attribute support is a genealogy database. This application stores family trees together with descriptive text, photographs, and maps, and will eventually include audio.

Summary
Atlas is a successful effort to develop a database system with complex object support and the special features required by applications involving large amounts of text. It incorporates a powerful query language with well-defined semantics for accessing nested structures. The incorporation of signature file indexes into Atlas has demonstrated their limitations as well as their advantages. One of the successes of Atlas has been as a platform for testing new ideas, both applications and new database system implementation techniques. Ideas such as programmable word parsers and data compression have been incorporated. Atlas has proven to be a good platform for the development of specialized applications such as hypertext systems and image databases.
