ADBMS-Vipul Dalal - (SEM - VII) Tanishka Aca PDF

Prepared by Prof. Vipul Dalal (9820833071) Subjects: CP-I, CP-II, DBMS, CG, ADBMS, DWM, IP
Downloaded from FaaDoEngineers.com

Contents
1. EER Model
1.1 Features of EER Model
1.1.1 Entity type
1.1.2 Superclass, Subclass and Inheritance
1.1.3 Specialization
1.1.3.1 Characteristics of Specialization
1.1.4 Specialization hierarchy and lattice
1.1.5 Union type or Category
1.2 Exercise
2. Object Oriented Databases
2.1 Features of OODB
2.2 OODBMS with respect to its features
2.2.1 Object Identity
2.2.2 Object Structure and Type Constructors
2.2.3 Encapsulation and Class definition
2.2.4 Persistency
2.3 Complex Objects
2.4 OODB schema design
2.5 Object Query Language
2.5.1 Basic features
2.5.2 Additional features
2.6 Persistent Programming Languages
2.6.1 Persistent C++ systems
2.7 OODBMS architecture
2.8 Concurrency in OODBMS
2.9 Exercise
3. Object Relational Databases
3.1 Nested Relational Data model
3.2 Overview of SQL 3
3.2.1 Type Constructors
3.2.2 Object identifiers using reference
3.2.3 Encapsulation
3.2.4 Inheritance
3.3 Comparison of RDB, OODB and ORDB
3.4 Object Relational support in Oracle 9i
3.4.1 ADT
3.4.2 Varray
3.4.3 Nested Tables
3.5 ORDBMS Implementation Issues
3.5.1 Storage and access methods
3.5.2 Query processing and optimization
3.5.2.1 Method caching
3.5.2.2 Pointer swizzling
3.6 Exercise
4. Distributed and Parallel Databases
4.1 Distributed Databases
4.1.1 Distributed computing
4.1.2 Advantages of DDBMS
4.1.3 Data Fragmentation
4.1.3.1 Horizontal Fragmentation
4.1.3.2 Vertical Fragmentation
4.1.3.3 Mixed Fragmentation
4.1.4 Fragmentation Schema
4.1.5 Allocation Schema
4.1.6 Data Replication and allocation
4.1.7 Query processing in DDBMS
4.1.8 Concurrency control in DDBMS
4.1.8.1 Problems that arise in DDBMS
4.1.8.2 Concurrency control based on distinguished copy
4.1.8.3 Concurrency control based on voting
4.1.9 Recovery in DDBMS
4.1.10 Client/server architecture
4.2 Parallel Database System
4.2.1 Architecture
4.2.2 Parallel Query evaluation
4.2.3 Data Partitioning methods
4.2.4 Parallelizing individual operations
4.3 Exercise
5. Databases on the Web and XML
5.1 Features of XML
5.2 XML schema
5.2.1 DTD
5.2.2 XSD
5.3 XML Querying
5.4 XML Storage
5.5 XSLT
5.6 Exercise
6. Introduction to Data Warehousing and Mining
6.1 Data Warehousing
6.1.1 Need for DW
6.1.2 Operational System Vs Information System
6.1.3 Definition and Architecture
6.1.4 Multi-dimensional data model
6.1.5 OLAP
6.1.6 DW Schema
6.2 Data Mining System
6.2.1 Data Mining operations
6.3 Exercise
7. Advanced Data Models
7.1 Active Databases
7.2 Temporal Databases
7.3 Deductive Databases
7.4 Spatial Databases
7.5 Geographic Information System
7.6 Mobile Databases

1. EER MODEL

Introduction
Like the ER model, the EER model is used to represent a logical database design at the conceptual level. The ER model can easily represent the logical design of a relational database, but the features supported by newer data models such as OODB and ORDB cannot be expressed easily with it. To overcome these limitations, the EER (Extended ER or Enhanced ER) model was developed. It supports all the features of the ER model and adds several new ones. Because the OODB and ORDB models essentially bring the concepts of object oriented programming into databases, the EER model likewise uses the terminology of OOPM.

1.1 Features of EER model

1.1.1 Entity type
In the ER model, a collection of entities of the same type is called an entity set; in the EER model it is called an entity type. An entity type is analogous to a class in OOPM, which is an abstract data type representing a collection of objects.
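A minimal Python sketch of this analogy, assuming a hypothetical EMPLOYEE entity type with just Ssn and Name attributes: the class stands for the entity type, and each instance stands for one entity. The names and values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    """Hypothetical EMPLOYEE entity type: the class plays the role of the entity type."""
    ssn: str
    name: str


# The entities of type EMPLOYEE currently stored in the database
# (what the ER model would call the entity set).
employees = {Employee("1234", "A1"), Employee("7235", "A2")}
```

`frozen=True` makes the instances hashable, so they can be collected in a Python set.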
1.1.2 Superclass, subclasses and inheritance
Often an entity type contains subgroups of entities that are important enough from a design point of view to be represented as separate entity types. Entity types that represent a subset of the entities of an original entity type are called subclasses, and the original entity type is called the superclass (again OOPM terminology). The relationship between a superclass and each of its subclasses is called a superclass/subclass relationship (often called an IS-A relationship).
A subclass inherits all the properties (attributes and relationships) of its superclass (again OOPM terminology).

1.1.3 Specialization
Specialization is the process of defining subclasses of an entity type. Unlike the ER model, it is [...]. For example, the entity type EMPLOYEE can have subclasses such as ENGINEER, SECRETARY and TYPIST, based on the type of job the employees do. From the same entity type EMPLOYEE we can define another specialization, with subclasses such as SALARIED_EMPLOYEE and HOURLY_EMPLOYEE, based on the mode of payment, as shown in the figure.
The attributes specified only for a particular subclass are called local attributes or specific attributes. There are two reasons to include a specialization: (1) a subgroup of entities has some additional attributes of its own, or (2) a subgroup of entities participates in a relationship that does not apply to the whole superclass.

1.1.3.1 Characteristics of specialization
(1) Number of subclasses
If a specialization includes only one subclass, no circle notation is needed between the superclass and the subclass.
Figure: EMPLOYEE (Name, Ssn) with the single subclass MANAGER
(2) Predicate-defined subclasses
If membership of an entity in a particular subclass can be determined from the value of one of the superclass's attributes, the subclasses are called predicate-defined subclasses.
(3) Attribute-defined specialization
If the membership conditions of all the subclasses are based on the same attribute of the superclass, the specialization is called an attribute-defined specialization, and that attribute is called the defining attribute.
(4) Disjointness and overlapping constraints
The disjointness constraint states that an entity cannot be a member of more than one subclass. It is shown by a 'd' in the circle. The overlapping constraint states that an entity may be a member of more than one subclass. It is shown by an 'o' in the circle.
(5) Completeness constraint
The completeness constraint may be total or partial. A total completeness constraint requires every entity of the superclass to be a member of at least one of its subclasses; it is shown by a double line connecting the superclass to the circle. A partial completeness constraint allows an entity of the superclass not to belong to any of its subclasses; it is shown by a single line connecting the superclass to the circle.

1.1.4 Specialization hierarchy and lattice
A subclass can itself have subclasses, giving rise to a specialization hierarchy or a specialization lattice. In a specialization hierarchy, each subclass participates as a subclass in only one superclass/subclass relationship; that is, each subclass has only one superclass. In a specialization lattice, a subclass can participate as a subclass in more than one superclass/subclass relationship; that is, a subclass can have more than one superclass. This concept is called multiple inheritance.
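The disjointness and completeness constraints in (4) and (5) can be checked mechanically once subclass memberships are known. The following Python sketch is an illustration under stated assumptions, not part of any EER tool: entities are plain ids, each subclass is given as a set of ids, and `check_specialization` is a hypothetical helper that reports whether a specialization is disjoint and whether it is total.

```python
from typing import Dict, Hashable, Set


def check_specialization(superclass: Set[Hashable],
                         subclasses: Dict[str, Set[Hashable]]) -> Dict[str, bool]:
    """Report whether a specialization is disjoint and whether it is total."""
    for name, members in subclasses.items():
        # Every subclass entity must also be an entity of the superclass.
        if not members <= superclass:
            raise ValueError(f"subclass {name} has entities outside the superclass")
    seen: Set[Hashable] = set()
    disjoint = True
    for members in subclasses.values():
        if seen & members:        # some entity is already in another subclass
            disjoint = False
        seen |= members
    # Total: every superclass entity belongs to at least one subclass.
    total = seen == superclass
    return {"disjoint": disjoint, "total": total}


employees = {"e1", "e2", "e3", "e4"}   # hypothetical entity ids
by_job = {"SECRETARY": {"e1"}, "TECHNICIAN": {"e2"}, "ENGINEER": {"e3"}}
by_pay = {"SALARIED_EMPLOYEE": {"e1", "e3"}, "HOURLY_EMPLOYEE": {"e2", "e4"}}
print(check_specialization(employees, by_job))   # {'disjoint': True, 'total': False}
print(check_specialization(employees, by_pay))   # {'disjoint': True, 'total': True}
```

In this sample data, the job-type specialization is disjoint but partial, because e4 belongs to none of its subclasses, while the pay-mode specialization is both disjoint and total.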
Such a subclass, which has more than one superclass, is called a shared subclass.
Figure: specialization lattice of EMPLOYEE (subclasses SECRETARY, TECHNICIAN, ENGINEER, MANAGER, HOURLY_EMPLOYEE, SALARIED_EMPLOYEE) with the shared subclass ENGINEERING_MANAGER

1.1.5 Union type or Category
Sometimes it is necessary to model a single superclass/subclass relationship with more than one superclass. In this case, the subclass represents a subset of the union of the entities of its superclasses. Any individual entity of the subclass inherits properties from only one of the superclasses. Such a subclass is called a union type or category. It is represented by the symbol 'U' in the circle.
Figure: category example with superclasses PERSON, BANK and COMPANY, the entity type REGISTERED_VEHICLE (License_plate_no) and the attribute Lien_or_regular

1.2 Exercise
For each of the following problem statements, construct an EER diagram, specify all constraints and justify your design.
1.2.1 Consider an art museum database. The museum has a collection of art objects, each with a unique object identifier, a name, a description and an artist, if known. Art objects are classified by type; there are two main types: painting and statue. A painting has a paint type, the material on which it is drawn and a style. A statue has the material from which it is made, a height and a weight. Art objects are also categorized as permanent or borrowed. A permanent art object has a date acquired and a cost. A borrowed object has a date of borrowing and a date of return. The database must maintain information about artists, such as name (assumed to be unique), date of birth, country of origin and main style. Various exhibitions take place, each with a unique name, start date and end date. An exhibition is related to the art objects displayed in it.
1.2.2 Consider a bank database.
The bank has multiple branches. Each branch maintains two types of accounts: savings accounts and current accounts. A branch may offer various types of loans; there are two main types: car loans and home loans. The database needs to maintain information about all transactions for each account and all installments for each loan.
1.2.3 Consider a university database. A professor has an SSN, name, age and specialization. A student has an SSN, name, age and degree program. A student may work on multiple projects. Each project has a projectID, name, start date, end date and budget. A project may be worked on by many students and is managed by a professor, who can advise many students. Each student has a major department in which he is working on his degree. Each department has a deptID, name and location. A professor may work for many departments, and each department is run by a professor.
1.2.4 Consider a university database that keeps track of students and their majors, transcripts and registration, and of university courses. Several sections of each course are offered, and each section is related to the instructor who teaches it. The database also keeps track of the sponsored research projects of faculty and graduate students in the academic departments of a particular college, and of the research grants and contracts awarded to the university. A grant is related to one principal investigator and to all the researchers it supports.
1.2.5 Design and implement a database that manages information about a hospital. The information includes the following. Permanent doctors get a fixed salary; personal information such as name, address and date of birth is required. Consulting doctors visit at a fixed time every day; information such as name, contact number, specialization and charges is required. Patients are admitted to the hospital.
Personal information such as name, address, relative's name and address, blood group and reason for admission is required. Patients are admitted to rooms of different types, and the per-day charges depend on the room type. The hospital has various labs in which several tests are conducted on patients; each test has fixed charges.
1.2.6 A database is to be designed for a college to monitor students' progress throughout their course of study. The students are studying for a degree (such as BA, BCom, MSc) within the framework of a modular system. The college provides a number of modules, each characterized by its code, title, credit value and module leader, who shares teaching duties with one or more lecturers. A lecturer may teach (and be the module leader for) more than one module. Students are free to select any modules they wish, but the following rules must be observed: some modules require prerequisite modules, and some degree programs have compulsory modules. The database also contains information about students, including their number, name, address, the degree they study for and their past performance.
1.2.7 You have to design and implement a database that manages information about publishers, authors and books. The information includes the following. A publisher has a name and a headquarters address. Each publisher also has a set of branches, each with an address and two phone numbers. An author has a name and an address. A book is published by a publisher and has a list of authors associated with it. An author can write many books, and a book is published by only one publisher.
1.2.8 Consider a small airport database that is used to keep track of aeroplanes, their owners and airport employees. The information includes the following. Each aeroplane has a registration number, is of a particular type and is stored in a particular hangar. Each plane type has a model number, capacity and weight. Each hangar has a number, capacity and location. Each plane has an owner and employees who maintain it. An owner is either a corporation or a person. Each plane is serviced many times; the database keeps track of service records, which include the date of maintenance and the number of hours spent.
1.2.9 ABC Engg college is a grade A college. It has five departments, each headed by the most senior qualified faculty member. The placement of final year students from all branches is managed by the placement center, which is headed by a faculty member from one of the departments; that faculty member's teaching load is zero. The placement center head is assisted by placement secretaries (whose teaching load is 13) from each department, and by placement assistants selected by the placement center from the students of all five departments. The placement center is responsible for on-campus and off-campus recruitment of students. The placement process requires the student's resume and relevant documents, along with approval from the placement center. Companies invited on campus conduct a test followed by interviews. The selection criteria depend on academic performance and the interview.
2. OODB MODEL

Introduction
One of the major drawbacks of the relational data model is that it cannot represent complex objects. To overcome this limitation, new data models have been developed that incorporate object oriented features into databases. There are two options: (1) extending an existing OOPL with database features such as transactions, recovery, concurrency and atomicity gives the OODB model, and (2) extending an existing RDBMS with OOPL features such as objects, types, inheritance and encapsulation gives the ORDB model.

2.1 Important features of OODBMS
Persistency
An OODBMS extends the capability of an OOPL so that objects can be made persistent as required. Introducing persistency automatically introduces the concepts of recovery and concurrency.
Identity
In an RDB, each entity is identified by the value of its primary key, which is a user-specified value. An OODBMS replaces this value-dependent identification with object identifiers, which are system-generated unique values.
Complexity
An object in an OODBMS can have a very complex internal structure with multiple levels of complexity; the state of one object may consist of other objects.
Encapsulation
The attributes of an object are coupled with operations, or methods, that operate on those attributes. In an RDB, by contrast, only attributes are specified when creating tables, and all operations are generic and can be applied to any table.
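To make the Identity and Encapsulation points concrete, here is a small Python sketch. It assumes a hypothetical base class `DBObject` whose counter stands in for the OODBMS's internal OID generator; a real system generates and stores OIDs itself. Two employees with identical attribute values still receive different OIDs, and the state is meant to be changed only through the class's own methods.

```python
import itertools


class DBObject:
    """Base class: every new object gets a system-generated, immutable OID."""
    # Hypothetical OID generator; a real OODBMS generates and stores OIDs itself.
    _next_oid = itertools.count(1)

    def __init__(self) -> None:
        self._oid = next(DBObject._next_oid)

    @property
    def oid(self) -> int:
        # Read-only property (no setter), so user code cannot modify the OID.
        return self._oid


class Employee(DBObject):
    def __init__(self, ssn: str, name: str) -> None:
        super().__init__()
        # The leading underscore marks these attributes as internal (a Python convention).
        self._ssn = ssn
        self._name = name

    # Encapsulation: the state is read and changed through these methods.
    def name(self) -> str:
        return self._name

    def rename(self, new_name: str) -> None:
        self._name = new_name


a = Employee("1234", "A1")
b = Employee("1234", "A1")   # same attribute values, but a different object
print(a.oid != b.oid)        # True
```

Renaming an employee leaves its OID unchanged, which is what distinguishes an OID from a user-specified primary key value.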
Relationship
An OODBMS maintains a relationship between two objects using inverse references: each object stores the OID of the other.
Inheritance
A new type or class can be created from an existing one so that the new type inherits all the properties of the existing type.

2.2 OODBMS w.r.t. Object identity, Object structure, Type constructors, Encapsulation and Persistency

2.2.1 Object identity
Whenever an object is created, the system generates a unique object identifier (OID). It is used for two purposes: (1) to identify the object uniquely, and (2) to maintain relationships between objects using inverse references.
Properties of OID:
(1) It is unique.
(2) It is system generated.
(3) It is invisible to users; in particular, users cannot modify it.
(4) It is immutable; once generated, it never changes.
(5) It is typically a long integer value.

2.2.2 Object structure and Type constructors
An object can have a very complex internal structure, which can be specified using type constructors. Ideally, each individual value is represented by a separate object in an OODBMS. There are six type constructors:
Atom: the internal structure of the object consists of an atomic value such as a character, an integer or a string.
Set: the internal structure of the object consists of a collection of OIDs of other objects that are constructed using the same type constructor. It is denoted as {i1, i2, ..., in}, where each i is the OID of some other object.
Tuple: the internal structure of the object consists of a collection of OIDs of other objects that may be constructed using different type constructors.
It is denoted as <a1:i1, a2:i2, ..., an:in>, where a1, a2, ..., an are the attributes of the object being created and i1, i2, ..., in are OIDs of other objects.
List: similar to a set, but an ordered collection. It is denoted as [i1, i2, ..., in].
Array: similar to a list, but of fixed size, with an upper limit on the number of elements.
Multiset or Bag: similar to a set, but it may contain duplicates.

To understand these type constructors, consider a small portion of a relational database:

department table
Dept No | Dept Name | MGR SSN
5       | Research  | 1234

department_location table
Dept No | Location
5       | L1
5       | L2

employee table
SSN  | Name | Dept No
1234 | A1   | 5
7235 | A2   | 5

Each individual value must be represented by an object. Each object can be formally written as a triple O = (i, c, v), where i is the object identifier, c is the type constructor and v is the state value.
Using this notation, the information in our example relational database can be written as follows:
O1 = (i1, atom, 5)
O2 = (i2, atom, "L1")
O3 = (i3, atom, "L2")
O4 = (i4, atom, "Research")
O5 = (i5, set, {i2, i3})
O6 = (i6, tuple, <...>)
...
O10 = (i10, set, {i11, i12})
O11 = (i11, tuple, <...>)
O12 = (i12, tuple, <...>)

2.2.3 Encapsulation and Class definitions
Specifying the internal structure of objects using type constructors requires some form of ODL (object definition language). If these type definitions also include operations (encapsulation), they are called class definitions. In actual implementations, atomic values are specified using basic data types rather than individual objects, to avoid a large number of objects in the database.
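The triple notation O = (i, c, v) can be mirrored directly in a Python sketch. Here a dictionary maps each OID to its (type constructor, state) pair. Objects i1 to i5 follow the listing of 2.2.2; the tuple object i6 and its attribute names are assumptions, since its contents are not given in these notes. The hypothetical helper `materialize` follows OIDs recursively to rebuild the plain value an object represents.

```python
from typing import Any, Dict, Tuple

# Object store: OID -> (type constructor, state value), i.e. O = (i, c, v)
Store = Dict[str, Tuple[str, Any]]

store: Store = {
    "i1": ("atom", 5),
    "i2": ("atom", "L1"),
    "i3": ("atom", "L2"),
    "i4": ("atom", "Research"),
    "i5": ("set", frozenset({"i2", "i3"})),
    # Hypothetical tuple object for the department; attribute names are assumed.
    "i6": ("tuple", {"dno": "i1", "dname": "i4", "locations": "i5"}),
}


def materialize(oid: str, store: Store) -> Any:
    """Follow OIDs recursively and rebuild the plain value an object stands for."""
    constructor, value = store[oid]
    if constructor == "atom":
        return value                      # atomic value, no further OIDs
    if constructor in ("set", "bag", "list", "array"):
        # Collections hold OIDs; sets and bags come back as lists (order not meaningful).
        return [materialize(i, store) for i in value]
    if constructor == "tuple":
        # Tuple: attribute name -> OID of the component object.
        return {attr: materialize(i, store) for attr, i in value.items()}
    raise ValueError(f"unknown type constructor: {constructor}")


print(materialize("i6", store)["dname"])   # Research
```

Sets and bags are rebuilt as Python lists because materialized members may themselves be dictionaries, which cannot be placed in a Python set.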
For our example database:

define class Department
    type tuple ( Dno: integer;
                 Dname: string;
                 Locations: set(string);
                 Manager: Employee;
                 Employees: set(Employee); )
    operations create_dept: Department;
               destroy_dept: boolean;
               assign_emp(e: Employee): boolean;
               remove_emp(e: Employee): boolean;
end Department.

define class Employee
    type tuple ( SSN: string;
                 Ename: string;
                 Dept: Department; )
    operations create_emp: Employee;
               destroy_emp: boolean;
end Employee.

2.2.4 Persistency
As in an OOPL, objects created in an OODBMS are transient by default, but the user can make them persistent by one of two methods:
(1) Naming: using a syntax provided by the OODBMS, each object the user wants to make persistent is given a unique persistent name. The disadvantage of this method is that if the database contains a large number of objects, giving a persistent name to every object is impractical.
(2) Reachability: to make objects of class C persistent, an object N is created whose internal structure uses the type constructor set(C), and N itself is made persistent using the naming method. Any object of class C can then be made persistent by inserting it into N, which makes it reachable from a persistent object.
Objects like N, which exist in the database only to make the objects of some class persistent and are not part of the database schema, are called the extent of that class.

2.3 Complex objects
There are two types of complex objects: unstructured complex objects and structured complex objects.
Unstructured complex objects
These are normally used to represent objects that require a large amount of storage space, such as images or large text documents. An unstructured complex object representing an image is also called a BLOB (binary large object), and one representing a text document is called a CLOB (character large object). They are called unstructured because their internal structure is not known to the database system, so the system cannot perform any specific operations on them. When queries are issued on such objects, the database system simply retrieves them and passes them as they are to the application programs.
Structured complex objects
These are called structured because they are created by repeated application of type constructors that the database system supports directly; the system knows their internal structure and can perform specific operations on them. A department object is an example. Such objects have several levels of complexity in their internal structure: in the department object, only the Dno and Dname attributes have direct values, while other components such as Manager and Employees are objects themselves, forming the second level of complexity.
Two types of semantics can exist between a structured object and its components. Some components, such as Dno and Dname, are said to be owned by the complex object. This is called ownership semantics, and the relationship between the complex object and such components is called an is-part-of (or is-component-of) relationship. Such components do not have their own identity. Other components, such as Employees, have reference semantics with the complex object. The relationship between the complex object and such components is called an is-associated-with relationship. Such components are objects in their own right, with their own identity.
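The difference between ownership and reference semantics, together with persistence by reachability from 2.2.4, can be sketched in Python. This is an illustration only: Python object identity stands in for OIDs (hence `eq=False`), owned components are plain attribute values, referenced components are separate objects, and the hypothetical function `reachable` collects every object that can be reached from the named persistent roots.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(eq=False)   # eq=False: compare and hash by identity, like objects with OIDs
class Employee:
    ssn: str
    name: str


@dataclass(eq=False)
class Department:
    # Owned components: plain values stored inside the department object.
    dno: int
    dname: str
    # Referenced components: independent objects with their own identity.
    manager: Employee
    employees: List[Employee] = field(default_factory=list)


def reachable(roots: Dict[str, object]) -> Set[object]:
    """Every Department/Employee object reachable from the named persistent roots."""
    seen: Set[object] = set()
    stack: List[object] = list(roots.values())
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        if isinstance(obj, Department):
            # Only references are followed; owned values such as dno are not separate objects.
            stack.append(obj.manager)
            stack.extend(obj.employees)
    return seen


a1 = Employee("1234", "A1")
a2 = Employee("7235", "A2")
temp = Employee("9999", "T")    # never referenced by any department
research = Department(5, "Research", manager=a1, employees=[a1, a2])
persistent = reachable({"research_dept": research})
```

The employee `temp`, which no department references, is not reachable and so would not be made persistent under the reachability method, while `a1` is reached both as manager and as employee but is collected only once.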
Components with reference semantics may be referenced by many other complex objects as well.

2.4 OODB schema design
OODB schema design differs from RDB schema design in three main ways: (1) the way relationships are represented, (2) the way inheritance is supported, and (3) in OODB schema design, operations are decided early, at the design stage.
Steps to convert an EER diagram into an OODB schema:
(1) For strong entity types
a. Create an ODL class for each strong entity type in the EER diagram. The ODL class should include all the attributes of the strong entity type.
b. Represent multivalued attributes using the set type constructor.
c. Represent composite attributes using the tuple (struct) type constructor.
d. Declare an extent for each ODL class.
(2) For relationships
a. Add relationship properties to the ODL classes.
b. Depending on the cardinality ratio, relationship properties may be single-valued or of collection types.
c. If the relationship has attributes, a tuple type constructor can be used to include a structure containing all of them.
(3) Include appropriate operations in the ODL classes.
(4) For subclasses
a. An ODL class representing a subclass in the EER diagram is defined using the "extends" keyword and inherits all the properties of its superclass. Only its specific attributes, operations and relationship properties need to be specified.
(5) For weak entity types
a. If a weak entity type participates in some relationship other than the identifying one, it can be mapped to an ODL class just like any strong entity type.
b. If a weak entity type does not participate in any relationship other than the identifying one, it can be treated as a composite multivalued attribute of the strong entity type on which it depends, and included in the class for that strong entity type using the set<struct<...>> type constructor.
(6) For union type or category
a. To map a union type or category, define a new key, called a surrogate key, and include it in the ODL class for the category.
b. Include a 1:1 relationship between the ODL class representing the category and each of the ODL classes representing its superclasses.

2.5 OQL
Consider the following University OODB schema.

class Person (extent person, key ssn)
{
    attribute struct pname {string fname, string mname, string lname} name;
    attribute string ssn;
    attribute date dob;
    attribute enum gender {M, F} sex;
    attribute struct Addr {string street, string city, string state, int pin} addr;
    short age();
};

class Faculty extends Person (extent faculty)
{
    attribute string rank;
    attribute float salary;
    attribute string office;
    attribute int phone;
    relationship Department works_in inverse Department::has_faculty;
    relationship set<Gradstudent> advises inverse Gradstudent::advisor;
    void give_raise(in float amt);
    void promote(in string new_rank);
};

class Grade (extent grades)
{
    attribute enum Grade {A, B, C, D, E, F} grade;
    relationship Section section inverse Section::students;
    relationship Student student inverse Student::completed_sections;
};

class Student extends Person (extent students)
{
    attribute string class;
    attribute Department minors_in;
    relationship Department majors_in inverse Department::has_majors;
    relationship set<Grade> completed_sections inverse Grade::student;
    void change_major(in string dname) raises (dname_not_valid);
    void assign_grade(in short secno, in Grade g);
};

class Degree
{
    attribute string degree;
    attribute string college;
    attribute int year;
};

class Department (extent departments, key dname)
{
    attribute string dname;
    attribute int dphone;
    attribute string doffice;
    attribute string college;
    attribute Faculty chair;
    relationship set<Faculty> has_faculty inverse Faculty::works_in;
    relationship set<Student> has_majors inverse Student::majors_in;
    relationship set<Course> offers inverse Course::offered_by;
};

class Course (extent course, key cno)
{
    attribute string cno;
    attribute string cname;
    attribute string description;
    relationship set<Section> has_sections inverse Section::of_course;
    relationship Department offered_by inverse Department::offers;
};

class Section (extent section)
{
    attribute short secno;
    attribute int year;
    attribute enum Half {first, second} half;
    relationship set<Grade> students inverse Grade::section;
    relationship Course of_course inverse Course::has_sections;
};

2.5.1 Basic features of OQL
As in SQL, the general form of an OQL query is select ... from ... where ....

Q-1 Find the names of the departments in the college of engineering.
select d.dname
from d in departments
where d.college = "engineering";

In the above query, "departments" is the extent of the class Department and serves as the database entry point. "d" is called an iterator variable; it is required when we work with a collection of objects or elements and need to go through each one individually.
By default, the result of a select ... from ... where ... query has type bag<...>. If a set is wanted, select distinct ... is used. So the type of the result of the above query is bag<string>.
In OQL, a query need not always have the select ... from ... where ... form. Any persistent name by itself is a query. For example,
departments;
is a query, since it is the name of an extent and is persistent by name. This query returns a reference to the set of Department objects; that is, its type is set<Department>.
Similarly, if an object representing the computer science department has been made persistent under the name csdepartment, then
csdepartment;
is a query. It returns a reference to a single Department object representing the computer science department; that is, its type is Department.
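The difference between the default bag result and select distinct can be illustrated in Python, using `collections.Counter` as a stand-in for a bag. The `departments` list plays the role of the extent; its contents and the helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Department:
    dname: str
    college: str


# Hypothetical extent "departments": all Department objects in the database
departments = [
    Department("computer science", "engineering"),
    Department("electrical", "engineering"),
    Department("history", "arts"),
]


def q1(extent):
    # select d.dname from d in departments where d.college = "engineering"
    return Counter(d.dname for d in extent if d.college == "engineering")


def colleges_bag(extent):
    # select d.college from d in departments  -> bag: duplicates kept
    return Counter(d.college for d in extent)


def colleges_set(extent):
    # select distinct d.college from d in departments  -> set
    return {d.college for d in extent}


print(colleges_bag(departments))   # Counter({'engineering': 2, 'arts': 1})
print(colleges_set(departments))
```

The bag keeps "engineering" twice, once per department, while the distinct version keeps each college once.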
From a database entry point, we can form a path expression. For example,
csdepartment.chair;
is a path expression in which csdepartment is the entry point. This query returns a reference to the Faculty object representing the chairperson of the computer science department; that is, its type is Faculty. Similarly,
csdepartment.chair.rank;
returns the rank of the chairperson of the computer science department as a string, and
csdepartment.has_faculty;
returns the set of Faculty objects representing the faculty members working in the computer science department; that is, its type is set<Faculty>. However, the query
csdepartment.has_faculty.rank;
is invalid, because the type of its result is ambiguous (set<string> or bag<string>). It can be written instead as
select f.rank
from f in csdepartment.has_faculty;
Now the type of the result is clear: bag<string>. If a set is desired, distinct can be used.
In OQL, the result of a query can be structured using the keyword struct.

Q-2
Find the first name, last name and list of degrees of all graduate students advised by the chairperson of the computer science department.
select struct (
    Name : struct (Fname : s.name.fname, Lname : s.name.lname),
    Degrees : select struct (Degree : d.degree, Year : d.year, College : d.college)
              from d in s.degrees)
from s in csdepartment.chair.advises;

Q-3 Retrieve the first name, last name and gpa of all senior students majoring in computer science, ordered by gpa and, within that, by last name and then by first name.
select struct (Fname : s.name.fname, Lname : s.name.lname, GPA : s.gpa)
from s in csdepartment.has_majors
where s.class = "senior"
order by GPA asc, Lname desc, Fname desc;

The same query can be written in a different way:
select struct (Fname : s.name.fname, Lname : s.name.lname, GPA : s.gpa)
from s in students
where s.class = "senior" and s.majors_in.dname = "computer science"
order by GPA asc, Lname desc, Fname desc;

2.5.2 Additional features of OQL
Named queries or views: A query can be given a unique name and stored permanently in the database as part of it. Such queries are called named queries; they are similar to views in SQL.
To create a named query, the keyword define is used. Unlike ad hoc queries, named queries remain in the database until they are deleted or replaced by another named query. They can also accept parameters, and they can be referenced from other queries whenever needed.

Q-4 Create a named query (view) to retrieve all the students minoring in a given department.
define has_minors(dept_name) as
select s
from s in students
where s.minors_in.dname = dept_name;

Such queries are useful for representing inverse relationships that are not specified directly in the OODB schema.
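The order by clause of Q-3 amounts to a lexicographic comparison on three keys with mixed directions. A minimal C++ sketch, assuming a hypothetical Row struct for the struct (Fname, Lname, GPA) result and using std::sort with a custom comparator:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row type for struct (Fname, Lname, GPA) produced by Q-3.
struct Row {
    std::string Fname;
    std::string Lname;
    double GPA;
};

// order by GPA asc, Lname desc, Fname desc
bool q3_order(const Row& a, const Row& b) {
    if (a.GPA != b.GPA) return a.GPA < b.GPA;          // ascending
    if (a.Lname != b.Lname) return a.Lname > b.Lname;  // descending
    return a.Fname > b.Fname;                          // descending
}

// Sorting turns the unordered result into an ordered sequence (a list in OQL terms).
std::vector<Row> order_q3(std::vector<Row> rows) {
    std::sort(rows.begin(), rows.end(), q3_order);
    return rows;
}
```

Each key is consulted only when all previous keys tie, which is exactly how a multi-column order by behaves; in OQL, a query with order by yields a list rather than a bag.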
Extracting an element from a singleton collection: the operator element is used to extract a reference to the single object in a result collection that contains exactly one element. For example,
element (select d
         from d in departments
         where d.dname = "computer science");
returns a reference to a single Department object. Without the element operator, the query returns a reference to a collection of Department objects that contains a single object; that is, the type of output would be bag<Department>. If the collection has zero elements or more than one element, the element operator raises an exception.

Collection operators
These include aggregate operators, and membership and quantification operators. Aggregate operators (min, max, count, sum, avg) are similar to the aggregate functions of SQL.

Q-5 Find the number of students minoring in the computer science department.
count (select s
       from s in students
       where s.minors_in.dname = "computer science");

Q-6 Find the average gpa of all senior students majoring in computer science.
avg (select s.gpa
     from s in students
     where s.class = "senior" and s.majors_in.dname = "computer science");

Q-7 Find the names of all departments with more than 100 majors.
select d.dname
from d in departments
where count (d.has_majors) > 100;

Membership and quantification operators are defined as follows. Let v be a variable, c a collection, b a Boolean expression and e an element of the type of the elements in c.
(1) e in c returns true if element e is a member of collection c, otherwise false.
(2) for all v in c : b returns true if every element in c satisfies condition b, otherwise false.
(3) exists v in c : b returns true if there is at least one element in c for which condition b is true, otherwise false.
Expression (1) is the membership operator; expressions (2) and (3) are quantification operators.
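The semantics of element, in, for all and exists map naturally onto standard C++ algorithms. The sketch below is illustrative only: the function names are hypothetical, a std::vector stands in for an OQL collection, and std::runtime_error stands in for whatever exception an OQL engine would actually raise.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// element(c): the only member of c; an exception if c does not hold exactly one element.
template <typename T>
T element(const std::vector<T>& c) {
    if (c.size() != 1)
        throw std::runtime_error("element(): collection is not a singleton");
    return c.front();
}

// e in c  (membership operator)
template <typename T>
bool member(const T& e, const std::vector<T>& c) {
    return std::find(c.begin(), c.end(), e) != c.end();
}

// for all v in c : b  (universal quantifier)
template <typename T, typename Pred>
bool for_all(const std::vector<T>& c, Pred b) {
    return std::all_of(c.begin(), c.end(), b);
}

// exists v in c : b  (existential quantifier)
template <typename T, typename Pred>
bool exists(const std::vector<T>& c, Pred b) {
    return std::any_of(c.begin(), c.end(), b);
}
```

Note that std::all_of returns true for an empty collection and std::any_of returns false, matching the usual reading of "for all" and "exists" over an empty set.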
Q-8 Find the first name and last name of all students who have completed the course "DBMS".
select s.name.lname, s.name.fname
from s in students
where "DBMS" in (select c.name
                 from c in s.completed_sections.section.of_course);

Q-9 Are all computer science graduate students advised by faculty from the computer science department?
for all v in (select s
              from s in gradstudents
              where s.majors_in.dname = "computer science") :
    v.advised_by.works_in.dname = "computer science";

Q-10 Is there any computer science graduate student with a 4.0 gpa?
exists v in (select s
             from s in gradstudents
             where s.majors_in.dname = "computer science") :
    v.gpa = 4.0;

Ordered collection expressions
Queries that involve lists or arrays can use these expressions.

Q-11 Find the name of the faculty member earning the highest salary.
first (select struct (Fname : f.name.fname, Salary : f.salary)
       from f in faculty
       order by Salary desc);

Q-12 Retrieve the top 3 majors from the computer science department. Indexing with [0:2] requires an ordered collection (a list), so the result is first ordered by gpa.
(select struct (Fname : s.name.fname, Lname : s.name.lname, GPA : s.gpa)
 from s in students
 where s.majors_in.dname = "computer science"
 order by GPA desc) [0:2];

2.6 Persistent programming languages
A programming language that is extended with constructs to handle persistent objects is called a persistent programming language. At first glance, a persistent programming language looks very similar to a language with embedded SQL, but there are important differences between them.
(1) In a language with embedded SQL, the type system of the host language differs completely from the type system of SQL, and the programmer is responsible for all type conversions.
In a persistent programming language, by contrast, the query language is fully integrated with the host language and both share the same type system; any conversion required is done transparently.
(2) In a language with embedded SQL, the programmer is responsible for fetching data into the host language's memory space (via cursors and fetch) and for writing any updates back to the database. In a persistent programming language this is done transparently.
A persistent programming language therefore provides tighter integration between the host language and the query language than embedded SQL does.

2.6.1 Persistent C++ systems
Persistent C++ systems that add features to the C++ language have been built, as have systems that avoid changing the language: C++ makes it possible to add support for persistence without changing the language itself.

The Object Database Management Group (ODMG) is an industry consortium aimed at standardizing object-oriented databases. The ODMG C++ standard avoids changes to the C++ language; it provides its functionality through template classes and class libraries.

ODMG types
• Template class d_Ref<T> is used to specify references (persistent pointers).
• Template class d_Set<T> is used to define sets of objects. Its methods include insert_element(e) and delete_element(e).
• Other collection classes such as d_Bag (a set with duplicates allowed), d_List and d_Varray (variable-length array) are also provided.
• Database versions of many standard types are provided, e.g. d_Long and d_String.
  - The interpretation of these types is platform independent.
  - Dynamically allocated data (e.g. for d_String) is allocated in the database, not in main memory.

ODMG C++ ODL: Example
class Branch : public d_Object {
public:
    d_String Bname;
    d_String Baddr;
    d_String Bcode;
};
class Person : public d_Object {
public:
    d_String name;    // should not use String!
    d_String address;
};
class Customer;    // forward declaration: Account refers to Customer before it is defined
class Account : public d_Object {
private:
    d_Long balance;
public:
    d_Long number;
    d_Set<d_Ref<Customer> > owners;
    int find_balance();
    int update_balance(int delta);
};
class Customer : public Person {
public:
    d_Date member_from;
    d_Long customer_id;
    d_Ref<Branch> home_branch;
    d_Set<d_Ref<Account> > accounts;
};

Implementing relationships
• Relationships between classes are implemented by references.
• Special reference types enforce integrity by adding and removing inverse links.
  - Type d_Rel_Ref<Class, InvRef> is a reference to Class, where attribute InvRef of Class is the inverse reference. Similarly, d_Rel_Set<Class, InvRef> is used for a set of references.
  - The assignment method (=) of class d_Rel_Ref is overloaded: it uses the type definition to find and update the inverse link automatically.
  - This frees the programmer from the task of updating inverse links and eliminates the possibility of inconsistent links.
  - Similarly, the insert_element() and delete_element() methods of d_Rel_Set use the type definition to find and update the inverse link automatically.
E.g.
extern const char _owners[], _accounts[];
class Account : public d_Object {
    d_Rel_Set<Customer, _accounts> owners;
};
// .. since string literals can't be used as template arguments ...
const char _owners[] = "owners";
const char _accounts[] = "accounts";

ODMG C++ Object Manipulation Language
• Uses persistent versions of C++ operators such as new(db):
  d_Ref<Account> account = new (bank_db, "Account") Account;
  - new allocates the object in the specified database, rather than in memory.
  - The second argument ("Account") gives the type name used in the database.
• The dereference operator ->, when applied to a d_Ref reference, loads the referenced object into memory (if it is not already present) before continuing with the usual C++ dereference.
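The effect of d_Rel_Set's insert_element() and delete_element() described under Implementing relationships (one call keeps both directions of the relationship in step) can be imitated in plain C++. This is a hypothetical sketch, not the ODMG API: the real d_Rel_Set locates the inverse attribute from its template arguments, whereas here the helper functions insert_owner and delete_owner name both sides explicitly, and a std::set of raw pointers stands in for d_Set<d_Ref<...> >.

```cpp
#include <cassert>
#include <set>
#include <string>

struct Customer;  // forward declaration, as in the ODL example

struct Account {
    long number;
    std::set<Customer*> owners;    // inverse of Customer::accounts
};

struct Customer {
    std::string name;
    std::set<Account*> accounts;   // inverse of Account::owners
};

// Plays the role of d_Rel_Set::insert_element(): one call updates both sides.
void insert_owner(Account& a, Customer& c) {
    a.owners.insert(&c);
    c.accounts.insert(&a);
}

// Plays the role of d_Rel_Set::delete_element(): removes both directions.
void delete_owner(Account& a, Customer& c) {
    a.owners.erase(&c);
    c.accounts.erase(&a);
}

// The invariant the special reference types are meant to guarantee:
// every owner of account a lists a among its accounts.
bool links_consistent(const Account& a) {
    for (const Customer* c : a.owners) {
        bool found = false;
        for (const Account* x : c->accounts)
            if (x == &a) found = true;
        if (!found) return false;
    }
    return true;
}
```

Because application code only ever calls the paired helpers, there is no code path that updates one side without the other, which is the guarantee the overloaded d_Rel_Ref and d_Rel_Set operations provide.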
• Constructor for a class: a special method to initialize objects when they are created; it is called automatically when new is called.
• Class extents are maintained automatically on object creation and deletion.
  - Only for classes for which this feature has been specified.
  - Automatic maintenance of class extents is not supported in earlier versions of ODMG.

ODMG C++ OML: Database and object functions
• Class d_Database provides methods to
  - open a database: open(databasename)
  - give names to objects: set_object_name(object, name)
  - look up objects by name: lookup_object(name)
  - rename objects: rename_object(oldname, newname)
  - close a database: close(databasename)
• Class d_Object is inherited by all persistent classes.
  - It provides methods to allocate and delete objects.
  - The method mark_modified() must be called before an object is updated.

ODMG C++ OML: Example
int create_account_owner(d_String name, d_String address) {
    d_Database bank_db_obj;
    d_Database *bank_db = &bank_db_obj;
    bank_db->open("Bank-DB");
    d_Transaction Trans;
    Trans.begin();
    d_Ref<Account> account = new (bank_db) Account;
    d_Ref<Customer> cust = new (bank_db) Customer;
    cust->name = name;
    cust->address = address;
    cust->accounts.insert_element(account);
    // ... code to initialize other fields
    Trans.commit();
    return 0;
}

• Class extents are maintained automatically in the database.
• To access a class extent:
  d_Extent<Customer> customerExtent(bank_db);
• Class d_Extent<T> provides the method d_Iterator<d_Ref<T> > create_iterator() to create an iterator on the class extent.
• It also provides a select(pred) method that returns an iterator on the objects that satisfy the selection predicate pred.
• Iterators help step through the objects in a collection or class extent.
• Collections (sets, lists, etc.) also provide a create_iterator() method.
E.g.
int print_customers() {
    d_Database bank_db_obj;
    d_Database *bank_db = &bank_db_obj;
    bank_db->open("Bank-DB");
    d_Transaction Trans;
    Trans.begin();
    d_Extent<Customer> all_customers(bank_db);
    d_Iterator<d_Ref<Customer> > iter;
    iter = all_customers.create_iterator();
    d_Ref<Customer> p;
    while (iter.next(p))
        print_cust(p);    // function assumed to be defined elsewhere
    Trans.commit();
    return 0;
}

ODMG C++ binding: other features
• Declarative query language OQL, which looks like SQL.
  - Form the query as a string and execute it to get a set of results (actually a bag, since duplicates may be present):
    d_Set<d_Ref<Account> > result;
    d_OQL_Query q1("select a from Customer c, c.accounts a where c.name = 'Jones' and a.find_balance() > 100");
    d_oql_execute(q1, result);
• Provides an error-handling mechanism based on C++ exceptions, through class d_Error.
• Provides an API for accessing the schema of a database.

2.7 OODBMS architecture
Stand-alone architecture
If you are using C++ or Java in a stand-alone application and need a database that provides high performance on complex data, it is difficult to beat an ODBMS. The reason is twofold:
1. With an ODBMS, you have only one model to manage: the model that your object programming language uses. The diagram shows the same model being used in the database and in the application. There is also no need to program any mapping between the data in the database and the data in the application.
[Figure: stand-alone architecture (Objects, Server, Object Database)]
2. An ODBMS gives you excellent performance on object models. This means that either you can get extreme performance on complex data, or you can use less expensive hardware than you might need with, for example, a relational DBMS.
Architecture with existing data sources
Object databases can be a way of staging data for your C++ or Java applications. This example shows two existing data sources that hold data in non-object formats (flat file and relational, for example). The non-object data is mapped into object models and stored in the object database. This object database now holds some part of the existing data and perhaps some of its own data that did not exist previously. At some later time, the object application can obtain this data and tap the high performance that an object database provides. This performance is a result of having the same model in the object database as is used by the object application.
[Figure: existing data sources mapped into an object database used by the object application (Server, Object Database, Existing Data)]

2.8 Concurrency control in OODBMS
A transaction is viewed as a unit of concurrency and a unit of recovery. As a unit of concurrency, the steps of several transactions can be interleaved in such a way that they do not interfere with each other. As a unit of recovery, a transaction either succeeds completely or has no effect on the database.
In traditional transaction management, a transaction is a sequence of reads and writes against a database. A transaction in traditional applications has two properties:
ATOMICITY: the sequence of reads and writes in a transaction must be treated as a single atomic action.
SERIALIZABILITY: the effect of the concurrent execution of more than one transaction is the same as that of executing the same set of transactions serially.
Transactions in an OODBMS, however, are generally of long duration, for example the development of a GUI. In this case the atomicity and serializability properties are highly undesirable. The atomicity property means that if a long-duration transaction fails, all the work done by that transaction must be undone.
The serializability property means that if any data item is locked by a long-duration transaction, then another long-duration transaction must be blocked until the lock on that data item is released.
The concurrency approaches can be divided into two groups:
(1) Transaction approach: the concurrency property is defined according to the semantics of the transaction and the data it can manipulate.
(2) Data approach: concurrency is defined according to the semantics of the data and the types of operation that can be performed on it.
There are three types of transaction approaches:
1) Compatibility approach
2) Constraint-based approach
3) Semantic pattern approach
There are two types of data approaches:
1) Schwarz's model
2) Herlihy and Weihl's model

2.9 Exercise
2.9.1 Design an object-oriented database schema for exercise 1.8.4 using EER-to-OO mapping, and write OQL for the following:
• Retrieve the names of all students who completed the course called "ADBMS".
• Retrieve the names of all departments in the college of engineering.
• Display the name of the faculty member who teaches any section of a course offered by the IT department, provided that the section is taken by at least one graduate student who is a research associate.
• Name the research group that has secured the highest grant.
• List the research projects along with their principal investigator and other researchers.
2.9.2 Design an object-oriented schema for exercise 1.8.5, take two typical queries and solve them in OQL.
2.9.3 Design an object-oriented schema for exercise 1.8.7, take two typical queries and solve them in OQL.
2.9.4 Design an object-oriented schema by identifying the different classes and methods for exercise 1.8.8.
3. OBJECT RELATIONAL DATABASES

INTRODUCTION
The object-relational model extends the relational model by providing a richer type system, including complex data types and object orientation. Along with the data model, the SQL supported by the relational data model also needs to be extended to deal with this richer, more complex type system. ORDBMSs thus provide a convenient migration path for users of relational databases who wish to use object-oriented features.

3.1 NESTED RELATIONAL DATA MODEL
An RDBMS supports the "flat relational data model", in which, because of the 1NF condition, the domain of every attribute must be atomic. Because of this condition, an RDBMS cannot represent complex objects properly: the information about one real-world object is split into multiple tuples and often multiple tables, so a one-to-one correspondence between the user's notion of an object and the database system's notion of a tuple is not achieved.
The "nested relational model" is an extension of the relational model in which domains may be non-atomic. That is, the value of a tuple on an attribute may be a relation, and relations may be contained within a relation. A complex object can then be represented by a single tuple of a nested relation, which preserves the one-to-one correspondence between the user's view of an object and a data item.
For example, employee information has the structure:
employee (empid, ename (First, Second, Last), phone, dependent (Name, Age, Relation))
Using the nested relational data model in an ORDBMS, the same information can be stored in a single table.
[Table: nested relation employee with columns empid, ename (First, Second), phone, dependent (Name, Age, Relation); one sample row for E100, other values not recoverable]

3.2 AN OVERVIEW OF SQL3
SQL3 is an extension of SQL-92 with object-oriented and other features.
A subset of SQL3, called SQL:99, is an approved standard.
Main features of SQL:99:
1. type constructors to specify complex objects
2. a mechanism to specify reference types
3. encapsulation of operations in user-defined types
4. an inheritance mechanism

3.2.1 Type constructors
Type constructors in an ORDBMS are used to create the user-defined data types that are needed when an object is complex. Two type constructors are supported: the row type constructor and the array type constructor. The row type constructor is needed whenever a composite attribute is present, and the array type constructor is needed whenever a multivalued attribute is present as part of the object.
A row type may be defined as
create type row_type_name as (list of attributes with individual types)
For example, address, the most common composite attribute, can be represented by a row type as
create type addr_type as (
    street varchar(20),
    city varchar(20),
    pincode char(6));
Similarly, an employee can be represented as
create type emp_type as (
    name varchar(35),
    addr addr_type,
    age integer);
An array type may be specified for an attribute whose value is a collection. For example, a company may have multiple locations:
create type comp_type as (
    comp_name varchar(20),
    locations varchar(20) array[10]);

3.2.2 Object identifiers using references
A row type can be used either as the data type of an attribute or to create tables. For example, a table of employees may be created from the row type emp_type as
create table employee of emp_type
    REF is emp_id system generated;
Similarly, a table of companies:
create table company of comp_type (
    REF is comp_id system generated,
    primary key (comp_name));
When the REF keyword is included in the create table statement, the ORDBMS generates a unique identifier, also called a reference, for each tuple inserted into the table. REF serves the same purpose as the OID in an OODBMS: it identifies every tuple and can be used to create relationships between tables. For example,
create table employment (
    emp REF(emp_type) SCOPE (employee),
    comp REF(comp_type) SCOPE (company));
Each tuple of the employment table holds the reference values of one tuple of the employee table and one tuple of the company table; that is how two independent tuples of the employee and company tables are related. The SCOPE keyword specifies the name of the table whose tuples can be referenced.
For attributes whose type is REF, the dereferencing symbol -> is used. For example, to retrieve the employees working for the company "ABC":
select e.emp->name
from employment e
where e.comp->comp_name = 'ABC';

3.2.3 Encapsulation of operations
A user can create a named user-defined type with its own attributes and operations. The general form is
create type <type name> as (
    list of attributes with individual types)
    declarations of functions;
For example, suppose we want to extract the apartment number from the string that forms the street attribute:
create type addr_type as (
    street varchar(20),
    city varchar(20),
    zip char(6))
    method apt_no() returns char(8);
The code for the method can be specified in two different ways: using the programming statements of SQL itself, or using some other host language such as C.
The first way:
create function apt_no() returns char(8) for addr_type as
    <SQL block>;
And the second way:
create function apt_no() returns char(8) for addr_type as
    external name 'path' language 'C';

3.2.4 Inheritance
Type hierarchies can be created, in which a subtype inherits all the attributes of its supertype. Similarly, a table can be created from an existing table. For example, a manager type can be created from the existing employee type as
create type mgr_type under emp_type as (
    dept_no varchar(5));
Here the newly created type mgr_type inherits all the components specified in emp_type and has one additional component, dept_no. The manager table can now be created as
create table manager of mgr_type under employee;
In this case emp_type is the supertype and mgr_type is the subtype; similarly, the employee table is the supertable and the manager table is the subtable. This is called type and table inheritance. The net effect of table inheritance is that any insertion, deletion or update carried out on the subtable is automatically applied to the supertable; that is, for each tuple in the subtable there exists a corresponding tuple in the supertable.
Table inheritance alone can be applied, without creating a corresponding type:
create table manager under employee (
    dept_no varchar(5));
In this case the employee table is the supertable and the manager table is the subtable, but emp_type is not a supertype.
Type inheritance alone can be applied as
create type mgr_type under emp_type as (
    dept_no varchar(5));
create table manager of mgr_type;
In this case emp_type is the supertype and mgr_type is the subtype, but the employee table is not a supertable; the employee table and the manager table are completely independent of each other.
(So, when should each kind of inheritance be used? Think it over.)
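The rule that every change to the subtable is reflected in the supertable can be modelled with two in-memory maps. The sketch is a hypothetical illustration of type and table inheritance, not how an ORDBMS actually stores subtables: EmpRow and MgrRow model only some columns of emp_type and mgr_type, and integer row ids stand in for tuple identity.

```cpp
#include <cassert>
#include <map>
#include <string>

struct EmpRow { std::string name; int age; };                        // subset of emp_type
struct MgrRow { std::string name; int age; std::string dept_no; };   // mgr_type adds dept_no

// Hypothetical model of "create table manager of mgr_type under employee".
class Tables {
public:
    std::map<int, EmpRow> employee;   // supertable, keyed by row id
    std::map<int, MgrRow> manager;    // subtable

    void insert_employee(int id, const EmpRow& r) { employee[id] = r; }

    // An insertion into the subtable is also applied to the supertable
    // (only the inherited columns appear there).
    void insert_manager(int id, const MgrRow& r) {
        manager[id] = r;
        employee[id] = EmpRow{r.name, r.age};
    }

    // An update of an inherited column is applied to both tables.
    void update_manager_age(int id, int age) {
        manager.at(id).age = age;
        employee.at(id).age = age;
    }

    // A deletion from the subtable is also applied to the supertable.
    void delete_manager(int id) {
        manager.erase(id);
        employee.erase(id);
    }

    // Every subtable row has a corresponding supertable row.
    bool invariant_holds() const {
        for (const auto& kv : manager)
            if (employee.count(kv.first) == 0) return false;
        return true;
    }
};
```

With type inheritance alone, the manager map would simply be a separate table and none of these methods would touch employee.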
3.3 COMPARISON OF RDBMS, OODBMS AND ORDBMS
The three data models can be compared with respect to properties such as query language, efficiency, data protection and complexity of objects.
- Both OODBMS and ORDBMS provide user-defined types, object identity, reference types and inheritance.
- Both support some form of concurrency control and recovery.
- An OODBMS aims to achieve seamless integration with a programming language such as C++ or Java. Such integration is not a goal of embedded SQL.
- An OODBMS is aimed at applications where an object-centric viewpoint is appropriate.
- Transactions in an OODBMS are of long duration.
- An ORDBMS is aimed at RDBMS users, giving them extensions to model complex real-world objects. Transactions are assumed to be short, and the ordinary mechanisms of an RDBMS are used to manage them.

3.4 OBJECT RELATIONAL SUPPORT IN ORACLE 9i (ADT, VARRAY AND NESTED TABLE)
The concepts of row type, array type and the nested relational data model are supported by Oracle 9i in the form of ADTs, varrays and nested tables.

3.4.1 Abstract data type (ADT)
create type addr_type as object (
    street varchar2(30),
    city varchar2(30),
    state varchar2(30),
    pincode number(6));
create table customer (
    cust_ID number,
    cust_name varchar2(30),
    cust_addr addr_type);
A tuple can now be inserted into the customer table as
insert into customer
values (1, 'C1', addr_type('s1', 'ct1', 's1', 400001));
An ADT is required when a composite attribute is present in the object.

3.4.2 Varray
A varray is useful when a multivalued attribute, such as the locations of a company, is present in the object.
create type location_va as varray(5) of varchar2(30);
create table company (
    name varchar2(30),
    locations location_va);
A tuple can now be inserted into the company table as
insert into company
values ('ABC', location_va('L1', 'L2', 'L3', NULL, NULL));

3.4.3 Nested tables
If an object has a composite multivalued attribute, such as the transactions of an account, a combination of an ADT and a varray may be used:
create type trans_type as object (
    tno number,
    tdate date,
    tamt number(10,2));
create type trans_va as varray(50) of trans_type;
The transactions attribute of the account table could then be declared with data type trans_va, but the limitation is the size associated with the varray: at most 50 transactions can be stored for any account.
The solution to this problem is a nested table. Instead of declaring a varray of transactions, we can declare a table of transactions that is nested inside the account table. A nested table has no size restriction.
create type trans_table as table of trans_type;
create table account (
    accno number,
    balance number(10,2),
    transactions trans_table)
    nested table transactions store as Trans_NT_TAB;

3.5 ORDBMS IMPLEMENTATION ISSUES
3.5.1 Storage and access methods
User-defined ADTs may be very complex and large, possibly larger than a single disk block. Objects of type BLOB and CLOB require special storage, normally at a different location on disk from the tuples that contain them; disk-based pointers are stored in such tuples. ADT objects also often vary in size during the lifetime of a database.
For objects of array type, the elements are stored sequentially on disk, row by row. But queries may request elements that are not stored contiguously, which results in very high I/O cost.
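The I/O argument about row-by-row array storage can be made concrete by counting disk blocks. This sketch assumes a hypothetical layout in which a rows x cols array is stored in row-major order and each disk block holds block_elems consecutive elements; it is a back-of-the-envelope model, not any particular ORDBMS's storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical layout: a rows x cols array stored row by row (row-major),
// packed into disk blocks that each hold block_elems consecutive elements.
struct ArrayLayout {
    std::size_t rows, cols, block_elems;

    // Linear position of element (i, j) in row-major order.
    std::size_t offset(std::size_t i, std::size_t j) const { return i * cols + j; }

    // Disk block that holds element (i, j).
    std::size_t block_of(std::size_t i, std::size_t j) const { return offset(i, j) / block_elems; }

    // Distinct blocks read to fetch the whole row i (contiguous on disk).
    std::size_t blocks_for_row(std::size_t i) const {
        std::set<std::size_t> blocks;
        for (std::size_t j = 0; j < cols; ++j) blocks.insert(block_of(i, j));
        return blocks.size();
    }

    // Distinct blocks read to fetch the whole column j (scattered on disk).
    std::size_t blocks_for_column(std::size_t j) const {
        std::set<std::size_t> blocks;
        for (std::size_t i = 0; i < rows; ++i) blocks.insert(block_of(i, j));
        return blocks.size();
    }
};
```

With a 100 x 100 array and 100 elements per block, reading one row touches 1 block, while reading one column touches 100 blocks.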
Another important issue for an ORDBMS is to provide efficient indexing for ADT methods and operators, since different ORDBMS-based applications require different index structures. The Generalized Search Tree (GiST) is a template index structure implemented within the ORDBMS that can accommodate most of the tree index structures invented so far. It is based on B+ trees.

3.5.2 Query processing and optimization
3.5.2.1 Method caching
User-defined ADT methods can be very expensive to execute. During query processing, it may therefore make sense to cache the results of these methods.
[Figure: query processing (query, parser and translator, relational algebra expression, execution plan, statistics about data)]

3.5.2.2 Pointer swizzling
- In some applications, objects are retrieved into memory and accessed frequently through their OIDs, so this dereferencing must be implemented efficiently.
- The system maintains a table of the OIDs of objects that are currently in memory.
- When an object O is brought into memory, the system checks each OID contained in O and replaces the OIDs of in-memory objects by in-memory pointers to those objects.
- This technique is called pointer swizzling, and it makes references to in-memory objects very fast.

3.6 Exercise
3.6.1 Design an ORDB schema for exercise 1.2.2.

4. DISTRIBUTED AND PARALLEL DATABASES

Introduction
ORDBMS and OODBMS improve over RDBMS in terms of the data model, that is, the way data is organized and managed. Distributed and parallel databases improve over the traditional database system, called a centralized system, in terms of hardware implementation and query processing.
We will first look at distributed database systems and then give an overview of parallel database systems.

4.1 DISTRIBUTED DATABASES
The main goal of a DDBMS is to bring the advantages of distributed computing to database applications.

4.1.1 Distributed computing system
A distributed computing system consists of a number of processing elements, possibly heterogeneous, that are interconnected by a computer network. The main features of a distributed system are:
- A complex problem can be partitioned into smaller pieces and solved efficiently.
- More computing power is obtained at low cost.
- Individual processing elements are autonomous and can be managed independently.
- The reliability of the entire system improves.
- The scalability of the system improves.
We can now define a distributed database and a DDBMS.
Distributed database
- It is a collection of databases that are logically interrelated and distributed over a computer network.
Distributed database system
- It is a software system that manages a distributed database and makes the distribution transparent to the user; that is, the user gets the feel of a centralized system with the advantages of a distributed system.
[Figure: data distribution and replication among sites connected by a communications network. Chicago (headquarters): EMPLOYEES - All, PROJECTS - All, WORKS_ON - All; New York: EMPLOYEES - New York, PROJECTS - All, WORKS_ON - New York employees; San Francisco: EMPLOYEES - San Francisco and LA, PROJECTS - San Francisco, WORKS_ON - San Francisco employees; Los Angeles: EMPLOYEES - LA, PROJECTS - LA and San Francisco, WORKS_ON - LA employees; Atlanta: EMPLOYEES - Atlanta, PROJECTS - Atlanta, WORKS_ON - Atlanta employees]

4.1.2 Advantages of distributed database systems
1) Management of distributed data with different levels of transparency:
   - Distribution (network) transparency: the user is freed from the details of network operations.
   - Fragmentation transparency: the user is unaware of the existence of fragments.
   - Replication transparency: copies of data may be stored at multiple sites for better availability and reliability; the user is unaware of the existence of multiple copies.
2) Increased reliability and availability: if one or more sites fail, the other sites continue to operate.
3) Improved performance: a DDBMS fragments the database, keeping data closer to where it is needed most. When a large database is distributed over multiple sites:
   i) smaller databases exist at each site, so local queries and transactions perform better;
   ii) each site has a smaller number of transactions to process;
   iii) inter-query and intra-query parallelism can be achieved.
4) Easier expansion: the system is scalable.

4.1.3 DATA FRAGMENTATION
The process of decomposing a database into multiple units, called fragments, which may be stored at various sites, is called fragmentation.
Completeness constraint
The most important condition of the fragmentation process is that it must be complete. That is, once a database is fragmented, it must always be possible to reconstruct the original database from its fragments.
4.1.3.1 Horizontal fragmentation
* Horizontal fragmentation divides a relation horizontally by grouping rows into subsets of tuples, where each subset has some logical meaning.
* A horizontal fragment of a relation is therefore a subset of the tuples in that relation.
* Horizontal fragments are specified by the select operation of the relational algebra on one or more attributes.
* Primary horizontal fragmentation is the fragmentation of a primary relation, that is, a relation on which other relations depend through foreign keys.
* Horizontal fragmentation of a primary relation induces derived horizontal fragmentation of the secondary relations that depend on it.
* Complete horizontal fragmentation: it generates a set of horizontal fragments that together include each and every tuple of the original relation.
* Disjoint horizontal fragmentation: it generates a set of horizontal fragments in which no two fragments have a tuple in common.
To reconstruct the original relation we perform the union operation on the fragments. The original relation can be reconstructed if and only if the completeness constraint is satisfied.

4.1.3.2 Vertical fragmentation
* When a site does not need all the attributes of a relation, vertical fragmentation is used to fragment the relation vertically, by columns.
* The primary key or some common candidate key must be included in every fragment so that the original relation can be reconstructed.
* Vertical fragmentation can be specified by the project operation.
* Complete vertical fragmentation: it generates a set of vertical fragments $\pi_{L_1}(R), \ldots, \pi_{L_n}(R)$ that together include all the attributes of the original relation and share only its primary key:
  $L_1 \cup L_2 \cup \cdots \cup L_n = \mathrm{ATTRS}(R)$ and $L_i \cap L_j = \mathrm{PK}(R)$ for $i \neq j$.
To reconstruct the original relation, we can apply the full outer join operation to the fragments.

4.1.3.3 Mixed (hybrid) fragmentation
* We can combine horizontal and vertical fragmentation.
* Mixed fragments are obtained with a select-project operation.
We can reconstruct the original relation by applying union and outer join operations in the appropriate order.

4.1.4 Fragmentation schema
It is a definition of a set of fragments that covers all attributes and tuples in the DB and satisfies the condition that the whole DB can be reconstructed from the fragments by applying some sequence of outer join and union operations.

4.1.5 Allocation schema
Once fragments are generated, they are assigned to the various sites. The allocation schema describes the allocation of fragments to the sites of the DDBMS.

4.1.6 DATA REPLICATION AND ALLOCATION
To improve availability, data may be replicated at multiple sites. The replication schema describes how fragments are replicated. There are three basic approaches: no replication, full replication and partial replication.
* A fully replicated DDB is the extreme case of replication, in which each site holds the entire database. It is good for high availability and for retrieval queries. But it slows down update operations and makes concurrency control and recovery more difficult.
* No replication means that each fragment is stored at exactly one site. It is the other extreme case.
* Partial replication means that some fragments are replicated and others are not. The number of replicas created for a fragment depends on the importance of the data in that fragment.
Finding an optimal, or even a good, solution to data allocation is a complex optimization problem.
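The fragmentation and reconstruction rules of Section 4.1.3 can be illustrated with a minimal in-memory sketch. Relations are modelled as lists of Python dicts. The EMPLOYEE rows, the attribute names and the dno = 5 predicate are illustrative assumptions and do not come from the notes. The sketch checks two things. First, a complete and disjoint horizontal fragmentation is rebuilt by union. Second, a complete vertical fragmentation is rebuilt by joining on the primary key. Every vertical fragment contains every key value, so this join gives the same result as the full outer join mentioned above.

```python
# Minimal sketch of horizontal/vertical fragmentation and reconstruction.
# Relations are lists of dicts; the data and attribute names are hypothetical.

def select(relation, predicate):
    """Horizontal fragment: sigma_predicate(relation)."""
    return [dict(row) for row in relation if predicate(row)]

def project(relation, attrs):
    """Vertical fragment: pi_attrs(relation), with duplicate rows removed."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

def union(*fragments):
    """Rebuild a relation from horizontal fragments (set semantics)."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result

def join_on_key(left, right, key):
    """Rebuild a relation from two vertical fragments sharing primary key `key`."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]

def as_set(relation):
    """Order-independent view of a relation, used for comparisons."""
    return {tuple(sorted(row.items())) for row in relation}

employee = [
    {"ssn": "111", "name": "Ann", "dno": 5, "salary": 30000},
    {"ssn": "222", "name": "Bob", "dno": 4, "salary": 25000},
    {"ssn": "333", "name": "Cid", "dno": 5, "salary": 40000},
]

# Complete and disjoint horizontal fragmentation on dno.
h1 = select(employee, lambda r: r["dno"] == 5)
h2 = select(employee, lambda r: r["dno"] != 5)

# Complete vertical fragmentation: the fragments share only the primary key ssn.
v1 = project(employee, ["ssn", "name"])
v2 = project(employee, ["ssn", "dno", "salary"])

print(as_set(union(h1, h2)) == as_set(employee))               # True
print(as_set(join_on_key(v1, v2, "ssn")) == as_set(employee))  # True
```

If the horizontal predicates did not cover every tuple, the union would lose rows. This is exactly the completeness constraint discussed above.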
4.1.7 QUERY PROCESSING IN DDBMS
In a DDBMS, the main cost factor in query evaluation is the cost of transferring data over the network. This includes intermediate results that are transferred to other sites for further processing. It also includes the final result, which may have to be transferred to the result site. The main goal of the query optimization algorithm is therefore to reduce the amount of data transferred. It performs the following steps:
- Find all possible ways (plans) to evaluate the given query.
- Estimate the cost of each plan, based on the number of bytes that must be transferred over the network.
- Select the best plan.
- Try to reduce further the number of bytes that must be transferred over the network.

Let us take an example. Site 1 holds the EMPLOYEE relation and site 2 holds the DEPARTMENT relation.

EMPLOYEE relation at site 1:
Name | SSN | Bdate | Address | Salary | Dno | SuperSSN
Total 10,000 records; length of each record = 100 bytes, where length of SSN = 9 bytes, length of Dno = 4 bytes and length of Name = 30 bytes.
So the size of the relation = 10,000 × 100 = 1,000,000 bytes.

DEPARTMENT relation at site 2:
Dname | Dno | MgrSSN | Mdate
Total 100 records; length of each record = 35 bytes, where length of Dname = 10 bytes and length of Dno = 4 bytes.
So the size of the relation = 100 × 35 = 3,500 bytes.

Suppose the following query is submitted at site 2: "For each employee, retrieve the employee name and the department name." It can be expressed as
$\pi_{Name,\,Dname}(EMPLOYEE \bowtie_{Dno=Dno} DEPARTMENT)$
The result will have 10,000 records of 40 bytes each (Name 30 + Dname 10), that is, 400,000 bytes. There are two straightforward strategies:
* Transfer the EMPLOYEE relation to site 2 and perform the join at site 2. This requires 1,000,000 bytes to be transferred.
* Transfer the DEPARTMENT relation to site 1, perform the join at site 1 and send the result back to site 2. This requires 400,000 + 3,500 = 403,500 bytes to be transferred.
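The arithmetic of this example can be checked with a few lines of code. The sizes are those given in the text. The third entry anticipates the semijoin strategy described next. It assumes, as the text does, that every employee matches a department, so all 10,000 (Dno, Name) pairs of 34 bytes are shipped back. This is only a sketch of the "estimate and select" steps, not a distributed query optimizer.

```python
# Transfer cost (in bytes) of each strategy for the query
# pi_{Name,Dname}(EMPLOYEE join_{Dno=Dno} DEPARTMENT), result needed at site 2.

EMP_RECORDS, EMP_RECORD_SIZE = 10_000, 100    # EMPLOYEE at site 1
DEPT_RECORDS, DEPT_RECORD_SIZE = 100, 35      # DEPARTMENT at site 2
NAME_SIZE, DNAME_SIZE, DNO_SIZE = 30, 10, 4

emp_size = EMP_RECORDS * EMP_RECORD_SIZE               # 1,000,000 bytes
dept_size = DEPT_RECORDS * DEPT_RECORD_SIZE            # 3,500 bytes
result_size = EMP_RECORDS * (NAME_SIZE + DNAME_SIZE)   # 400,000 bytes

strategies = {
    # Ship EMPLOYEE to site 2 and join there.
    "ship EMPLOYEE to site 2": emp_size,
    # Ship DEPARTMENT to site 1, join there, ship the result back to site 2.
    "ship DEPARTMENT to site 1, result back": dept_size + result_size,
    # Semijoin: ship pi_Dno(DEPARTMENT) to site 1, then ship pi_{Dno,Name}
    # of the matching EMPLOYEE tuples (assumed: all of them) back to site 2.
    "semijoin": DEPT_RECORDS * DNO_SIZE + EMP_RECORDS * (DNO_SIZE + NAME_SIZE),
}

for name, cost in strategies.items():
    print(f"{name:40s} {cost:>9,} bytes")

best = min(strategies, key=strategies.get)
print("cheapest:", best)   # cheapest: semijoin
```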
Using semijoin
* The main idea is to reduce the number of tuples in a relation before transferring it to another site.
* Send only the joining column of relation R to the site where the other relation S is located.
* Join this column with S.
* Project the join attribute together with the attributes required in the result, send them back to the original site, and join them with R.

Let us see how the semijoin can be applied to our example query.
- Project the join attribute at site 2 and transfer it to site 1:
  $F = \pi_{Dno}(DEPARTMENT)$, size = 4 × 100 = 400 bytes.
- Perform the join at site 1 and transfer the required attributes to site 2:
  $R = \pi_{Dno,\,Name}(F \bowtie_{Dno=Dno} EMPLOYEE)$, size = 34 × 10,000 = 340,000 bytes.
  Together with the 400 bytes of the first step, this is 340,400 bytes, which is less than the 403,500 bytes required by the second strategy above. R is then joined with DEPARTMENT at site 2 to produce the final result.

A semijoin operation $R \ltimes_{A=B} S$, where A and B are domain-compatible attributes of R and S respectively, produces the same result as $\pi_{R}(R \bowtie_{A=B} S)$, where $\pi_R$ projects onto the attributes of R. It is implemented by first transferring $F = \pi_{B}(S)$ to the site where R resides and then joining F with R. The semijoin operation is not commutative: $R \ltimes S \neq S \ltimes R$.

4.1.8 CONCURRENCY CONTROL IN DDBMS
4.1.8.1 Problems that arise in DDBMS
1. Dealing with multiple copies of data items: the concurrency control method is responsible for maintaining consistency among the multiple copies. The recovery method is responsible for making a copy consistent with the other copies if the site on which it is stored fails.
2. Failure of individual sites: when a site recovers, its local DB must be brought up to date with the rest of the sites.
3. Failure of communication links: network partitioning may occur.
4. Distributed commit: some sites may fail during the commit process; the two-phase commit protocol can be used to handle this.
5. Distributed deadlock: deadlocks may occur among several sites, so the techniques for handling deadlocks must be extended to cover this case.

4.1.8.2 Concurrency control based on a distinguished copy of a data item
The idea is to designate one particular copy of each data item as its distinguished copy. The locks for the data item are associated with this copy, and all locking and unlocking requests are sent to the site that holds it.

Primary site method
* A single "primary site" is selected, which holds the distinguished copies of all database items.
* All locks are kept at this site and all lock requests are sent to it.
* There are two major disadvantages:
  - The primary site may become overloaded.
  - Failure of the primary site brings down the entire system.
* There are two advantages:
  - Simple implementation.
  - No distributed deadlocks.
Once locks have been obtained from the primary site, the data items can be accessed at any site that holds a copy.

Primary site with backup site
* Apart from the primary site, a second site is chosen as a backup site.
* All locking information is maintained at both sites.
* If the primary site fails, the backup site takes over as the primary site. Operation then resumes after another site has been chosen as the new backup site and all locking information has been copied to it.
* It slows down the acquisition of locks, because every lock request and grant must be recorded at both sites.

Primary copy method
* The locking information is distributed among the various sites that hold distinguished copies.
* Failure of one site affects only the transactions that request locks on items whose primary (distinguished) copies reside at that site.
* This enhances reliability.
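The routing rule behind the distinguished-copy methods can be sketched as follows. This is a hypothetical, simplified model. Each site keeps a lock table with exclusive locks only, with no shared locks, wait queues or deadlock handling, and the item-to-site mapping is made up. Mapping every item to the same site gives the primary site method. Spreading the distinguished copies over several sites gives the primary copy method.

```python
# Simplified sketch of distinguished-copy locking (exclusive locks only).

class SiteLockTable:
    """Lock table kept at one site for the distinguished copies it holds."""

    def __init__(self, name):
        self.name = name
        self.holder = {}                # item -> transaction holding the lock

    def lock(self, item, txn):
        current = self.holder.get(item)
        if current is None or current == txn:
            self.holder[item] = txn
            return True                 # lock granted
        return False                    # conflict: requester must wait or abort

    def unlock(self, item, txn):
        if self.holder.get(item) == txn:
            del self.holder[item]


class DistinguishedCopyLockManager:
    """Routes every lock request to the site holding the item's distinguished copy."""

    def __init__(self, distinguished_site):
        self.distinguished_site = distinguished_site   # item -> site name
        self.sites = {s: SiteLockTable(s) for s in set(distinguished_site.values())}

    def lock(self, item, txn):
        table = self.sites[self.distinguished_site[item]]
        return table.name, table.lock(item, txn)

    def unlock(self, item, txn):
        self.sites[self.distinguished_site[item]].unlock(item, txn)


# Primary copy method: distinguished copies spread over two sites.
manager = DistinguishedCopyLockManager({"X": "site1", "Y": "site2"})
print(manager.lock("X", "T1"))   # ('site1', True)
print(manager.lock("X", "T2"))   # ('site1', False): X is held by T1
print(manager.lock("Y", "T2"))   # ('site2', True)
manager.unlock("X", "T1")
print(manager.lock("X", "T2"))   # ('site1', True)
```

With this model, a failure of site2 would block only the requests for Y. A single-site mapping would make that one site a bottleneck and a single point of failure.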
With the primary copy method, however, distributed deadlocks may occur.

4.1.8.3 Concurrency control based on voting
* There is no distinguished copy. Instead, a lock request is sent to all sites that hold a copy of the data item.
* Each site maintains its own locking information for its copy of the data item and can grant or deny the request.
* If a request from a transaction is granted by a majority of the sites, the transaction holds the lock and informs all sites that it has been granted the lock.
* It generates higher network traffic among sites.
* It is more complicated to implement.

4.1.9 RECOVERY IN DDBMS
* A log is maintained during normal execution to provide the information needed to recover from failures.
* In addition to the kinds of information maintained in a centralized DBMS, the actions taken as part of the commit protocol are also logged.
* The two-phase commit (2PC) protocol can be used.

Normal execution and commit protocol
* Apart from normal logging activity, a commit protocol is followed to ensure that all subtransactions of a given transaction either commit or abort uniformly.
* The transaction manager at the site where the transaction originated is called the coordinator. The transaction managers at the sites where its subtransactions execute are called subordinates.
* When the user decides to commit the transaction, the commit command is sent to the coordinator, and this initiates 2PC.
  - The coordinator sends a prepare message to all subordinates.
  - On receiving the prepare message, each subordinate decides whether to abort or commit its subtransaction. It writes an abort or a prepare log record and then sends a no or a yes message, respectively, to the coordinator.
  - If the coordinator receives yes messages from all subordinates, it writes a commit log record and sends a commit message to all subordinates.
  - If it receives even one no message, or no response from some subordinate within a specified timeout interval, it writes an abort log record and sends an abort message to all subordinates.
  - Depending on the message received (commit or abort), each subordinate writes a commit or abort log record, commits or aborts its subtransaction, and sends an ack message to the coordinator.
  - After receiving ack messages from all the subordinates, the coordinator writes an end log record for the transaction.
* A transaction is officially committed at the moment the coordinator's commit log record is written.

Restart after a failure
* When a site comes up after a crash, a recovery process is invoked. It reads the log and processes all transactions that were executing the commit protocol at the time of the crash.
* If there is a commit or abort log record for transaction T, its status is clear and we redo or undo T, respectively. If this site is the coordinator, it must keep resending the commit or abort message to the subordinates until it receives ack messages from them.
* If there is a prepare log record for T but no commit or abort record, this site is a subordinate. It must contact the coordinator to find out the status of T and then take the necessary action.
* If there is no prepare, commit or abort log record for T, then T cannot have voted to commit before the crash, so it can simply be aborted and undone. In this case we have no way to determine whether the current site is the coordinator or a subordinate for T.
  - If it is the coordinator, it might have sent prepare messages to one or more subordinates, and these may contact the recovery process at this site. When that happens we know that this is the coordinator site, and since there is no commit log record, it replies with an abort message.

4.1.10 CLIENT/SERVER ARCHITECTURE
In practice, a distributed system is approximated using a client/server architecture.
The concept of client/server architecture assumes an underlying framework that consists of a large number of PCs, workstations, file servers, printers, database servers, web servers and other components connected via a computer network. A client in this framework is a user machine that provides user interface capabilities and local processing. A server, on the other hand, is a machine that provides services to the clients, such as file access, printing, database access and web access. When a client needs additional functionality that does not exist on its own machine, such as database access, it connects to the server that provides it. In general, some machines install only client software, others only server software, and still others include both client and server software.

Two main types of basic DBMS architecture have been developed on this underlying client/server framework.

Two-tier client/server architecture
[Figure: Sites 1 to n connected by a communication network. Each site runs server software, client software (a diskless client or a client with disk), or both server and client software.]

In this type of system, only two types of software components exist: client and server. The user interface and the application programs (business logic) run on the client side. When database access is needed, the program establishes a connection to the DBMS that resides on the server side. A standard called Open Database Connectivity (ODBC) provides an API that allows client-side programs to call the DBMS. A related standard for the Java language, called JDBC, allows Java client programs to access the DBMS.
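The sketch below gives a rough illustration of the two-tier idea of a client program calling the DBMS through a standard API. It uses Python's DB-API 2.0 (PEP 249), which plays a role similar to ODBC/JDBC. The standard-library sqlite3 module is used only so that the example runs anywhere. SQLite is an embedded engine, so unlike a real two-tier setup there is no network connection to a server. The department table and its rows are hypothetical.

```python
# Client-side application logic calling a DBMS through a standard API
# (Python DB-API 2.0). sqlite3 stands in for a remote database server.
import sqlite3

def department_name(conn, dno):
    """Return the name of department `dno`, or None if it does not exist."""
    cur = conn.cursor()
    # Parameter placeholders keep the data separate from the SQL text.
    cur.execute("SELECT dname FROM department WHERE dno = ?", (dno,))
    row = cur.fetchone()
    return row[0] if row else None

# With a real two-tier system this would open a network connection to the server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT)")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(4, "Administration"), (5, "Research")])
conn.commit()

print(department_name(conn, 5))   # Research
print(department_name(conn, 9))   # None
```

The application logic (department_name) lives entirely on the client. Only the SQL statement and its result cross the client/server boundary.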
The main advantages of the two-tier architecture are simplicity and compatibility.

Three-tier client/server architecture
The WWW led to the three-tier architecture, which adds an intermediate layer between the client and the server. The middle tier is called the application server or web server. All client requests are directed to the application server, and then the application server on behalf of …
[Figure: three-tier architecture: Client (GUI, Web interface), Application server (application programs), …]

… then the system is said to demonstrate sub-linear scale-up.
The size of the task can grow in two different ways:
* the size of the database increases;
* the rate at which transactions are submitted increases.

Architectures
There are many possible ways to organize the multiple resources of a parallel database system.

Shared memory
* All the processors have access to a common global shared memory via a bus or an interconnection network.
* Communication between processors is very efficient.
* The number of processors cannot be increased beyond a certain point, because the bus or interconnection network becomes a bottleneck.
[Figure (a): shared memory]

Shared disk
* All processors can access all the disks directly via the interconnection network, but each processor has its own local memory.
* A degree of fault tolerance is achieved, since the database resides on disks that are accessible by all processors.
* Again, the problem is that the system is not scalable beyond a certain point.
[Figure (b): shared disk]

Shared nothing
* Each processor has its own local main memory and local disk.
* The processors communicate with each other via a high-speed interconnection network.
* A node functions as a server for the data stored on its local disk.
* Only queries that access non-local disks pass data through the network.
* The system is more scalable; its main disadvantage is the cost of communication.

4.2.2 Parallel query evaluation
It can be achieved in two ways:
1. interquery parallelism;
2. intraquery parallelism.
The response time of a large transaction can be improved by intraquery parallelism. Throughput can be improved by interquery parallelism. Optimizing for interquery parallelism is hard, since it is difficult to identify in advance which queries will run concurrently.
Intraquery parallelism can be achieved by:
1. executing different operations of the query evaluation plan in parallel (interoperation parallelism);
2. parallelizing the execution of each individual operation, such as sort, select, project and join (intraoperation parallelism).
In interoperation parallelism, if one operation consumes the output of another, it is called pipelined parallelism. Otherwise the two operations can proceed independently, and this is called independent parallelism. An operation is said to be blocking if it produces no output until it has consumed all of its input (for example, sorting). Pipelined parallelism is limited by the presence of such blocking operations.
The key to achieving intraoperation parallelism is to partition the input data.

4.2.3 Data partitioning
To reduce the time required to retrieve relations from disk, relations are partitioned and stored on multiple disks. The most common scheme is horizontal partitioning, in which the tuples of a relation are distributed over many disks.
1) Round-robin: tuples are assigned to disks in round-robin fashion. If there are N disks, tuple i is assigned to disk (i mod N). The distribution is even.
2) Hash partitioning: a hash function is applied to selected fields of a tuple to determine its disk.
3) Range partitioning: tuples are (conceptually) sorted, and N ranges of sort-key values are chosen so that each range contains almost the same number of tuples. Tuples in range i are assigned to disk i.
Round-robin partitioning is suitable for queries that access the entire relation. Hash partitioning is suitable for queries that select tuples by an equality condition on the partitioning attribute (point queries). Range partitioning is suitable for range queries on the partitioning attribute. Range partitioning can lead to data skew, that is, partitions with widely varying numbers of tuples. The disk holding a larger partition then becomes a bottleneck.

4.2.4 Parallelizing individual operations

Sorting
* Suppose we want to sort a relation that resides on N disks d1, d2, ..., dN.
* If the relation has been range-partitioned on the attributes on which it is to be sorted, we can sort each partition separately in parallel and concatenate the results to get the fully sorted relation.
* If the relation is partitioned in any other way, sorting can be carried out in two steps:
  1. Redistribute the tuples using range partitioning on the sort attributes.
  2. Each processor sorts its partition locally, and the sorted partitions are concatenated.

Joins
* Suppose two relations A and B are initially partitioned across several disks on some non-join attribute. The basic idea for joining them in parallel is to decompose the join into a collection of k smaller joins.
* We can decompose the join by partitioning both A and B into k partitions on the join attribute.
* By using the same partitioning function for both A and B, we ensure that the union of the k smaller joins computes the join of A and B.
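The partitioning schemes and the partition-based sort and join described above can be sketched in a few functions. Disks (or processors) are simulated by Python lists, and the per-partition work runs sequentially here. In a parallel DBMS each partition would be processed by a different processor. The split points and the sample data are illustrative assumptions.

```python
# Sketch of horizontal partitioning and partition-based parallel sort/join.
import bisect

def round_robin(tuples, n):
    """Tuple i goes to disk i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts

def hash_partition(tuples, n, key):
    """A hash function on the partitioning attribute chooses the disk."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts

def range_partition(tuples, split_points, key):
    """split_points must be sorted; partition i holds keys in
    (split_points[i-1], split_points[i]], the last one keys > split_points[-1]."""
    parts = [[] for _ in range(len(split_points) + 1)]
    for t in tuples:
        parts[bisect.bisect_left(split_points, key(t))].append(t)
    return parts

def parallel_sort(tuples, split_points, key):
    """Range-partition on the sort key, sort each partition locally, concatenate."""
    parts = range_partition(tuples, split_points, key)
    return [t for part in parts for t in sorted(part, key=key)]

def parallel_join(a, b, k, key_a, key_b):
    """Partition A and B on the join attribute with the same hash function,
    then compute the k smaller joins independently (a hash join per partition)."""
    parts_a = hash_partition(a, k, key_a)
    parts_b = hash_partition(b, k, key_b)
    result = []
    for pa, pb in zip(parts_a, parts_b):   # partition i of A only meets partition i of B
        index = {}
        for s in pb:
            index.setdefault(key_b(s), []).append(s)
        for t in pa:
            for s in index.get(key_a(t), []):
                result.append(t + s)
    return result

employees = [("Ann", 5), ("Bob", 4), ("Cid", 5), ("Dee", 1)]
departments = [(4, "Administration"), (5, "Research")]

print([len(p) for p in round_robin(list(range(7)), 3)])            # [3, 2, 2]
print(parallel_sort([9, 2, 7, 4, 1, 8], [3, 6], key=lambda x: x))  # [1, 2, 4, 7, 8, 9]
print(sorted(parallel_join(employees, departments, 2,
                           lambda e: e[1], lambda d: d[0])))
# [('Ann', 5, 5, 'Research'), ('Bob', 4, 4, 'Administration'), ('Cid', 5, 5, 'Research')]
```

The join is correct only because both relations use the same hash function on the join attribute. Matching tuples therefore always land in partitions with the same index.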
4.3 Exercise
4.3.1 Consider the following relations:
BOOKS (book#, primary_author, topic, total_stock, $price)
BOOKSTORE (store#, city, state, zip, inventory_value)
STOCK (store#, book#, qty)
3. Give an example of two simple predicates that would be meaningful for the BOOKSTORE relation for horizontal partitioning.
4. How would a derived horizontal partitioning of STOCK be defined based on the partitioning of BOOKSTORE?
5. Show predicates by which BOOKS may be horizontally partitioned by topic.

4.3.2 Consider the global schema:
PATIENT (number, name, SSN, amount_due, dept, doctor, med_treatment)
DEPARTMENT (dept, location, director)
STAFF (staffnum, director, task)
1. Show two examples of horizontal fragmentation.
2. Show two examples of derived fragmentation.

5: DATABASES ON THE WEB & XML

Introduction
XML stands for Extensible Markup Language. The main drawback of HTML is that it can specify only the format of the data being sent; it cannot convey the meaning of the data. So if the incoming data must be processed by some application at the receiving end, we have to send the meaning of the data along with the data. This can be done using XML, which can specify the meaning of the data along with the data itself.

5.1 Main features of XML
- It does not have a predefined set of tags.
- A document prepared in XML is said to be a semi-structured document.
- It can convey the meaning of the data.

Suppose we have the database of a banking application with three tables: account (acno, balance), customer (custname, address) and depositor (acno, custname). An XML document can be prepared from this database as follows:

    <bank>
      <account>
        <acno>a101</acno>
        <balance>5000</balance>
      </account>
      <account>
        <acno>a102</acno>
        <balance>3000</balance>
      </account>
      <customer>
        <custname>c1</custname>
        <address>....</address>
      </customer>
      <customer>
        <custname>c2</custname>
        <address>....</address>
      </customer>
      <depositor>
        <acno>a101</acno>
        <custname>c1</custname>
      </depositor>
      <depositor>
        <acno>a102</acno>
        <custname>c2</custname>
      </depositor>
    </bank>

The same document can be prepared using a different structure:

    <bank>
      <account>
        <acno>a101</acno>
        <balance>5000</balance>
        <customer>
          <custname>c1</custname>
          <address>....</address>
        </customer>
      </account>
      <account>
        <acno>a102</acno>
        <balance>3000</balance>
        <customer>
          <custname>c2</custname>
          <address>....</address>
        </customer>
      </account>
    </bank>
This example shows that the same data, along with its meaning, can be represented using different XML document structures.

Attributes
Just as in HTML, we can specify attributes in the tag definitions. For example, the account type could be specified as an attribute of the account tag:

    <account actype="savings">
      <acno>a101</acno>
      <balance>5000</balance>
    </account>

Or we could include actype as a subelement of the account tag:

    <account>
      <acno>a101</acno>
      <balance>5000</balance>
      <actype>savings</actype>
    </account>

The main aim is to convey the meaning of the data. Whether it is specified as a subelement or as an attribute is not important.

Name space
XML namespaces give element and attribute names a globally unique identity, so that documents from different sources can use the same tag names without clashes. A namespace is identified by a unique URI, and element names such as account, acno and balance can be qualified with a prefix bound to that URI.

5.2 XML schema
A schema is a set of rules or constraints to be followed by the data stored in a repository. For example, data in an RDBMS satisfies a relational schema. The freedom to structure the data in any way is actually undesirable when the data is to be processed automatically by some receiving application. So we require the sender to follow a set of rules, called an XML schema, while preparing an XML document.

5.2.1 Document Type Definition (DTD)
An XML schema can be specified using a DTD as follows:
…
This schema is stored in a file, say bank.dtd. The document is stored in a file, say bank.xml, which must refer to the DTD:

    <!DOCTYPE bank SYSTEM "bank.dtd">

An attribute in the document can be of type ID, IDREF, IDREFS or CDATA. If the value of the attribute is an ordinary string, it is of type CDATA. If it is of type ID, no two elements in the same document can have the same value for it. If it is of type IDREF, it can store the value of an ID attribute of some other element.
If it is of type IDREFS, it can store a set of ID values of other elements.

Each attribute declaration is also given #REQUIRED, #IMPLIED or a default value. If it is #REQUIRED, every element must include that attribute. If it is #IMPLIED, an element may omit that attribute. Alternatively, a default value can be supplied. The following example makes these points clear.
…
A document satisfying the given schema can be generated as given below.
…
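The checks a validating parser performs for ID and IDREFS attributes can be sketched with Python's standard xml.etree.ElementTree module. That module parses XML but does not validate it against a DTD. The document is an assumption made for illustration. So is the choice of custid and acno as ID attributes and owners as an IDREFS attribute. A real validator reads this information from the DTD's attribute declarations. The last lines also show how an IDREFS value is resolved to the referenced elements, which is what the XPath id() function described in 5.3 does.

```python
# Sketch of the ID / IDREFS rules of a DTD, checked by hand on a parsed document.
import xml.etree.ElementTree as ET

DOC = """
<bank>
  <account acno="a101" owners="c1 c2"><balance>5000</balance></account>
  <customer custid="c1"><custname>c1</custname></customer>
  <customer custid="c2"><custname>c2</custname></customer>
</bank>
"""

# Hypothetical attribute types, as they might be declared in an ATTLIST.
ID_ATTRS = {"account": "acno", "customer": "custid"}   # element -> ID attribute
IDREFS_ATTRS = {"account": "owners"}                   # element -> IDREFS attribute

def check_ids(root):
    """Return (map from ID value to element, list of violations)."""
    ids, errors = {}, []
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr in elem.attrib:
            value = elem.attrib[attr]
            if value in ids:
                errors.append(f"duplicate ID {value!r}")   # ID values must be unique
            ids[value] = elem
    for elem in root.iter():
        attr = IDREFS_ATTRS.get(elem.tag)
        if attr in elem.attrib:
            for ref in elem.attrib[attr].split():           # IDREFS: space-separated IDs
                if ref not in ids:
                    errors.append(f"dangling IDREF {ref!r}")
    return ids, errors

root = ET.fromstring(DOC)
ids, errors = check_ids(root)
print(errors)   # []

# Resolve the owners of account a101, like id(@owners) in XPath.
account = ids["a101"]
owners = [ids[ref].find("custname").text for ref in account.get("owners").split()]
print(owners)   # ['c1', 'c2']

bad = ET.fromstring('<bank><account acno="a1" owners="c9"/><customer custid="a1"/></bank>')
print(check_ids(bad)[1])   # ["duplicate ID 'a1'", "dangling IDREF 'c9'"]
```

The sketch does not check which element type an IDREF points to. A DTD cannot express that either, which is one of the limitations listed next.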
Limitations of DTD
There are two basic limitations of DTD:
1. Individual elements and attributes cannot be typed further; for example, balance cannot be declared to be a decimal number.
2. There is no way to specify which ID attributes an IDREF or IDREFS attribute may refer to.

5.2.2 XML Schema Document (XSD)
To overcome the limitations of DTD, a more powerful schema language called XML Schema (XSD) was developed. Let's look at a sample XSD for the bank document:

    …
    <xsd:element name="acno" type="xsd:string"/>
    <xsd:element name="balance" type="xsd:decimal"/>
    </xsd:sequence>
    …

Main features of XSD:
1. It allows individual elements to be specified with exact data types.
2. It allows user-defined data types to be created.
3. It allows the minimum and maximum number of occurrences to be specified for individual subelements.
4. It supports a form of inheritance.
5. It uses the same syntax as XML documents.

5.3 XML Querying
XPath: it is a language of path expressions. A path expression is a sequence of location steps separated by forward slashes (/).
E.g. the query

    /bank/customer/custname

returns all the custname subelements of each customer element under the root element bank:

    <custname>c1</custname>
    <custname>c2</custname>

If we want only the value of the element, without the tags, we can use the text() function. E.g. the query

    /bank/customer/custname/text()

returns only the customer names, without tags.
We can access attributes too. E.g. the query

    /bank/account/@acno

returns the acno attribute values of all account elements.
We may also specify conditions. E.g. the query

    /bank/account[balance > 4000]

returns all account elements that have a balance subelement with a value greater than 4000. Similarly, the query

    /bank/account[balance > 4000]/@acno

returns the account numbers of all accounts whose balance is greater than 4000.
The function id("value") returns the element whose ID attribute has the given value. If the argument contains several space-separated values, as an IDREFS attribute does, it returns all the corresponding elements. E.g. the query

    /bank/account/id(@owners)

returns all the customer elements referred to from the account elements.

XQuery: it has a structure similar to SQL: for ... let ... where ... return ...
E.g. to retrieve the account numbers of all accounts whose balance is greater than 4000:

    for $x in /bank/account
    let $y := $x/@acno
    where $x/balance > 4000
    return $y

We can also perform join operations in XQuery. E.g. to merge the information of each account with its customer in a single element (the element name cust_acct is illustrative):

    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $a/acno = $d/acno and $c/custname = $d/custname
    return <cust_acct> { $c, $a } </cust_acct>

5.4 XML Storage
When an XML document is received, it must be stored at the receiving end. There are different alternatives for storing XML documents.

Relational database
1. Store as strings: we create a single table with a single column and store the documents in it as strings. One improvement over this method is to create different tables for different elements and attributes to store their values. The major disadvantage of this method is that, once stored, the document cannot be queried easily.
2. Tree representation: we create two tables, a node table and a child table, as given below.

Node table:
ID | TYPE      | NAME    | VALUE
1  | Element   | Account | -
2  | Attribute | Acno    | A101
3  | Element   | Balance | 5000

Child table:
CHILD-ID | PARENT-ID
2        | 1
3        | 1

These tables reflect the hierarchical structure of the XML document.
3. Map to relations: if the schema is available, we can map the document to relations. Consider a small part of a schema:
…
This schema can be mapped to the following tables:
e1(a1, b1, c1, c3)
e2(e4, e5, a1)
The main advantage is that the data can be queried easily using SQL.

XML databases
The documents can be stored in databases specifically designed to handle XML documents. Querying is then more efficient.

Flat files
The documents can be stored in flat files.

5.5 XSLT (XSL Transformations)
A style sheet stores formatting options for a document, usually separately from the document. E.g. an HTML style sheet may specify font colors and sizes for headings.
The XML Stylesheet Language (XSL) was originally designed for generating HTML from XML. XSLT is a general-purpose transformation language: it can translate XML to XML and XML to HTML.
XSLT transformations are expressed using rules called templates. Templates combine selection using XPath with the construction of results.

Example of an XSLT template with "match" and "select" parts:
…
The "match" attribute of xsl:template specifies a pattern in XPath. Elements in the XML document that match the pattern are processed by the actions within the xsl:template element. The xsl:value-of element selects (outputs) the specified values (here, customer_name).
For elements that do not match any template, attributes and text contents are output as is, and the templates are applied recursively to their subelements.
A template whose match pattern is * matches all elements that do not match any other template.
If an element matches several templates, only one of them is used, chosen by a complex priority scheme or by user-defined priorities.

Creating XML output
Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is. This can be used, for example, to wrap results in new XML elements: the example output then contains the customer names Joe and Mary, each wrapped in a new element.
We cannot directly insert an xsl:value-of tag inside another tag. E.g. we cannot create an attribute for the customer element in the previous example by using xsl:value-of directly. XSLT provides the construct xsl:attribute to handle this situation; xsl:attribute adds an attribute to the preceding element.
xsl:element is used to create output elements with computed names.
A template action can apply templates recursively to the contents of a matched element (using xsl:apply-templates).

Joins in XSLT
XSLT keys allow elements to be looked up (indexed) by the values of subelements or attributes. Keys must be declared with a name, and the key() function can then be used for lookup.

Sorting in XSLT
Using an xsl:sort directive inside a template causes all elements matching the template to be sorted. Sorting is done before other templates are applied.

5.6 Exercise
5.6.1 The University of Mumbai wants to put its database of exam results for the past 20 years on the web. Discuss the design of such a web database. Describe an XML schema for the database.
5.6.2 Give the DTD for an XML representation of the following nested relational schema.
emp (ename, children setof(child), skills setof(skill))
child (name, birthday)
birthday (day, month, year)
skill (type, exams setof(exam))
exam (year, city)
6:AN INTRODUCTION TO DATA WAREHOUSING & MINING
Introduction
RDBMSs and other traditional database systems perform routine operations on data such as insertion, deletion, update and retrieval. If complex operations such as prediction, classification, OLAP, clustering, outlier analysis or association are desired, we need systems such as a data warehouse and data mining. Let us first look at the data warehouse system and then at the data mining system.
6.1 Data warehousing
6.1.1 Need for Data warehouse
Normally an organization needs three types of systems to run all its business operations smoothly: operational system(s), a feedback system, and a control and management system.
The most important is the feedback system, whose responsibility is to provide a unified view of all the operations going on in the organization to top management, which in turn controls business activities and plans new strategy.
The operational system works with day-to-day business data, also called operational data, and can be implemented using traditional DBMSs.
Due to its specific requirements, the feedback system cannot be implemented using traditional DBMSs; it needs what is called an information system. Let us see the major differences between an operational system and an information system.
6.1.2 Operational System Vs Information System (OLTP Vs OLAP)
It can also be called DBMS Vs DW.
No. | Operational System | Information System
1 | It works on operational data and provides operational information. | It works on data collected from operational systems and provides strategic information.
2 | Operational information provided by this system is used to run day-to-day business operations. | Strategic information provided by this system is used to analyze how well the business is running and what can be done to improve it.
3 | It can be implemented using a traditional DBMS. | A DW is needed to implement an information system.
4 | It is also called an OLTP system. | It is also called an OLAP system.
5 | It performs routine operations on operational data such as insertion, deletion, update and retrieval. | It performs complex analysis on the collected data.
6 | The database is comparatively small (a few megabytes). | It holds a huge collection of data (gigabytes or terabytes).
7 | It contains current data. | It contains current as well as historical data for proper analysis.
8 | It has a large number of users. | Its users are a very small number of top executives.
9 | Access frequency of data is high. | Access frequency of data is low.
10 | The type of usage is predictable and repetitive. | The type of usage is ad hoc.
11 | It may update or delete data. | It rarely updates and never deletes data.
12 | Response time is in sub-seconds. | Response time is in seconds to minutes.
13 | ER model | Multidimensional model
Because of these distinguishing features of an information system, a DBMS cannot be used as a DW even if it holds a large collection of data.
6.1.3 Definition and Architecture
A data warehouse may be defined as a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management's decision making.
A data warehouse system has a three-level architecture.
[Figure: three-level data warehouse architecture, showing the operational systems and the metadata]
The data from the operational systems is collected into the data warehouse using the ETL (Extraction, Transformation and Loading) process, which makes the data ready for analysis. Since the data collected at the second level provides a unified view of the whole enterprise, it is also called the EDW (Enterprise-wide Data Warehouse).
The EDW is too large for an individual user or a group of users, so various departmental Data Marts are created from the EDW.
6.1.4 Multi-dimensional data model, OLAP and DW schema
A data model that organizes data in terms of multiple business dimensions is called a multi-dimensional data model. For example, a sales manager wants to analyze sales units with respect to product, time and region. We can organize this data as a cube where product, time and region are the three business dimensions and each cell of the cube holds the same measure, sales units, at a particular granularity level for each dimension. For more than three business dimensions we need hypercubes.
[Figure: sales cube with product, region and time dimensions; a highlighted cell contains the profit and units sold for one product, in one region, in the month of January]
6.1.5 OLAP
This organization simplifies various OLAP operations.
Drill down
Obtaining a more detailed level of data from the current level is called the drill down operation. For example, obtaining weekly sales units from monthly sales units.
Roll up
Obtaining a more generalized (summarized) level of data from the current level is called the roll up operation. For example, obtaining yearly sales units from monthly sales units.
Slice-&-Dice
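The roll-up and drill-down operations described above can be illustrated with a small Python sketch over a handful of hypothetical fact rows (product, region, year, month, week, units sold); the data, the DIMS mapping and the units_by helper are assumptions for illustration. The same grouping helper also shows a slice, taken here in its common sense of fixing one dimension (region) to a single value. Totals are recomputed from the base facts for simplicity; an OLAP server would typically roll up from stored aggregates instead.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, year, month, week, units_sold).
FACTS = [
    ("P1", "East", 2023, "Jan", 1, 10),
    ("P1", "East", 2023, "Jan", 2, 15),
    ("P1", "West", 2023, "Jan", 1, 5),
    ("P1", "East", 2023, "Feb", 6, 20),
    ("P2", "East", 2023, "Feb", 5, 7),
]
DIMS = {"product": 0, "region": 1, "year": 2, "month": 3, "week": 4}
UNITS = 5  # column holding the measure (sales units)

def units_by(rows, *dims):
    """Sum sales units grouped by the given dimensions (the chosen granularity)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in dims)
        totals[key] += row[UNITS]
    return dict(totals)

# Current level: monthly sales units per product.
monthly = units_by(FACTS, "product", "year", "month")
# Roll up along time (month -> year): more summarized.
yearly = units_by(FACTS, "product", "year")
# Drill down along time (month -> week): more detailed.
weekly = units_by(FACTS, "product", "year", "month", "week")
# Slice: fix region = "East", then view product x month.
east_monthly = units_by([r for r in FACTS if r[DIMS["region"]] == "East"],
                        "product", "month")

print(yearly)  # {('P1', 2023): 50, ('P2', 2023): 7}
```

Whatever the granularity, the grand total of units stays the same; only the level of detail changes.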
Snowflake Schema
The dimension tables in a star schema are not normalized. If we normalize the dimension tables, each is divided into multiple tables, and the resulting schema is called a snowflake schema. That is, a snowflake schema is a star schema with normalized dimension tables.
6.2 Data Mining System
Data mining refers to the extraction of hidden patterns, also called knowledge, from large collections of data. What, then, do we mean by a hidden pattern or knowledge? Information that is not explicitly present in the data and cannot be extracted using ordinary tools such as SQL is called a hidden pattern or knowledge.
Basically, data mining is one step of a larger process called KDD (Knowledge Discovery in Databases). The KDD process consists of the following steps:
- Data selection: select the part of the database to be mined.
- Data cleaning: remove noisy data.
- Data transformation: prepare the data for mining.
- Data mining: apply intelligent algorithms to extract hidden patterns.
- Pattern evaluation: select only the useful patterns.
- Visualization: present the selected patterns to the user in the specified format.
6.2.1 Data Mining Operations
They are also called data mining functions, data mining tasks, data mining algorithms, etc. Here we look at the most important ones.
Association Rule Mining
It is the process of finding associations between attribute values that frequently occur together in a given set of data. A rule has the form
$X \Rightarrow Y$
where X and Y are collections of attribute-value conditions. The rule means that database tuples that satisfy the attribute-value condition X are likely to satisfy the attribute-value condition Y as well. For example, the following rule generated from a customer table,
$\text{income}(20K \ldots 30K) \wedge \text{age}(30 \ldots 35) \Rightarrow \text{buys}(\text{CD player}) \quad [\text{support} = 30\%,\ \text{confidence} = 70\%]$
says that a customer tuple whose income lies in the range 20K to 30K and whose age lies in the range 30 to 35 is likely to have the value CD player for the buys attribute. That is, there is some form of association between the attribute values on the left-hand side of the rule and those on the right-hand side. The values in square brackets following the rule specify the support and confidence of the rule.
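The support and confidence quoted in the brackets (defined formally next) are simply ratios of tuple counts. The Python sketch below evaluates the example rule over ten made-up customer tuples; the data and the function names are hypothetical, so the resulting figures (40% support, 67% confidence) differ from the 30% and 70% quoted in the example.

```python
# Hypothetical customer tuples: (income in thousands, age, item bought).
CUSTOMERS = [
    (25, 32, "CD player"),
    (28, 34, "CD player"),
    (22, 31, "TV"),
    (45, 40, "CD player"),
    (27, 33, "CD player"),
    (60, 50, "TV"),
    (21, 30, "radio"),
    (35, 29, "TV"),
    (24, 35, "CD player"),
    (50, 45, "radio"),
]

def x_holds(t):
    """X: income(20K...30K) AND age(30...35)."""
    income, age, _ = t
    return 20 <= income <= 30 and 30 <= age <= 35

def y_holds(t):
    """Y: buys(CD player)."""
    return t[2] == "CD player"

def support(rows, x, y):
    """Fraction of all tuples that satisfy both X and Y."""
    return sum(1 for t in rows if x(t) and y(t)) / len(rows)

def confidence(rows, x, y):
    """Fraction of the tuples satisfying X that also satisfy Y."""
    covered = [t for t in rows if x(t)]
    if not covered:
        return 0.0
    return sum(1 for t in covered if y(t)) / len(covered)

print(f"support={support(CUSTOMERS, x_holds, y_holds):.0%}, "
      f"confidence={confidence(CUSTOMERS, x_holds, y_holds):.0%}")
# support=40%, confidence=67%
```

Six tuples satisfy X and four of those also satisfy Y, giving 4/10 support and 4/6 confidence.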
Support gives the percentage of tuples in the database that satisfy the whole rule, i.e. both X and Y:
$$\text{support}(X \Rightarrow Y) = \frac{\text{number of tuples that satisfy both } X \text{ and } Y}{\text{total number of tuples}} = P(X \cup Y)$$
Confidence gives the percentage of tuples satisfying X that also satisfy Y:
$$\text{confidence}(X \Rightarrow Y) = \frac{\text{number of tuples that satisfy both } X \text{ and } Y}{\text{number of tuples that satisfy } X} = P(Y \mid X)$$
One of the most popular association rule algorithms is the Apriori algorithm for Boolean association rules.
Classification
It is the process of assigning a label or class to an unknown sample or object. It is basically a three-step process.
In the first step, called the training step, a classification model is built from a given set of samples called training samples (whose classes are known) and a set of predefined classes.
[Figure: training samples → classification model]
In the second step, the accuracy of the classification model is measured using test samples.
[Figure: test samples → classification model → accuracy]
In the third step, a class label is predicted for an unknown sample (whose class is not known) using the classification model.
[Figure: unknown samples → classification model → class]
It is also called supervised learning, since in the first step the algorithm is trained under the user's supervision. Some important classification algorithms are decision-tree based classification (ID3), the statistical approach (Bayesian) and distance-based classification (k-nearest neighbor).
Clustering
It is the process of grouping together objects that are similar in terms of certain attribute values. Clustering is similar to classification in the sense that here too objects are assigned to groups. There are two major differences between clustering and classification. First, there are no predefined groups or classes. Second, no training is provided to clustering algorithms.
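Both differences show up in k-means, one of the clustering algorithms named below: it receives no class labels and no training samples, only the objects and a distance function. The following Python sketch uses Euclidean distance (math.dist, Python 3.8+), naive initialisation with the first k points and made-up 2-D points; these choices are assumptions for illustration, and practical implementations choose initial centroids more carefully.

```python
import math

def k_means(points, k, max_iter=20):
    """Group points into k clusters, using Euclidean distance as dissimilarity."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 1.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```

The two resulting groups have small distances inside each cluster and a large distance between them, which is the goal stated for clustering.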
All clustering algorithms use a distance function that takes two objects at a time and returns a value of dissimilarity between them. Based on this value, also called distance, clustering algorithms create groups or clusters. The main goal of the clustering process is to maximize the intercluster distance and minimize the intracluster distance. That is, two objects from the same cluster must be very similar, and two objects from different clusters must be very dissimilar. Some important clustering algorithms are nearest neighbor, minimum spanning tree, k-means clustering, etc.
6.3 Exercise
6.3.1 For exercise 4.3.1:
- Discuss how OLAP could be useful for such an organization.
- Indicate a star schema that could be used for building a data warehouse.
- Indicate five typical queries that the warehouse may have to answer.
7:ADVANCED DATA MODELS
7.1 Active Databases
- They provide functionality for specifying active rules.
- These rules specify actions that are automatically triggered by certain events, such as database updates.
- Many commercial DBMSs provide some of the functionality of active databases in the form of triggers.
Generalised model for active databases:
The model used to specify active database rules is referred to as the Event-Condition-Action (ECA) model. It has three components:
1) The event that triggers the rule, e.g. a database update or a temporal event.
2) The condition that determines whether the rule action should be executed. It is optional.
3) The action to be taken. It is usually a set of SQL statements.
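The three ECA components map naturally onto a small data structure. The Python sketch below is a toy in-memory rule engine, not how any particular DBMS implements triggers; the Rule and RuleEngine names, the event string and the low-balance rule are hypothetical. The condition is optional, as stated above, and each rule carries an active flag so that it can be deactivated.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    """An active rule in Event-Condition-Action form."""
    name: str
    event: str                                          # event that triggers the rule
    action: Callable[[dict], None]                      # action to execute
    condition: Optional[Callable[[dict], bool]] = None  # optional condition
    active: bool = True                                 # rules can be deactivated

class RuleEngine:
    def __init__(self):
        self.rules = []

    def add(self, rule):
        self.rules.append(rule)

    def signal(self, event, row):
        """Fire every active rule for this event whose condition (if any) holds."""
        for rule in self.rules:
            if rule.active and rule.event == event:
                if rule.condition is None or rule.condition(row):
                    rule.action(row)

notifications = []
engine = RuleEngine()
engine.add(Rule(
    name="low_balance",
    event="after update on account",
    condition=lambda row: row["balance"] < 500,
    action=lambda row: notifications.append(f"notify: {row['acc_no']} is low"),
))

engine.signal("after update on account", {"acc_no": "A-1", "balance": 300})
engine.signal("after update on account", {"acc_no": "A-2", "balance": 900})
print(notifications)  # ['notify: A-1 is low']
```

Setting engine.rules[0].active = False deactivates the rule, so later events no longer fire it.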
create trigger triggername
after/before insert/update/delete on relationname
[for each row]
[when (condition)]
begin
  SQL statements
end;
1) An active database system allows users to activate, deactivate and drop rules.
2) It also allows users to group rules into named “rule sets”.
Limitations:
1) There are no easy-to-use techniques for designing, writing and verifying rules.
2) It is difficult to verify that a set of rules is consistent, i.e. that no two or more rules in the set contradict one another.
3) It is difficult to guarantee termination of a set of rules. E.g. the following two triggers can keep activating each other indefinitely:
create trigger t1 after insert on table1
for each row
update table2 set ...;
create trigger t2 after update on table2
for each row
insert into table1 values (...);
Applications
1) Notification of important events
2) Enforcing integrity constraints
3) Automatic maintenance of derived data
4) Maintaining the consistency of materialized views
7.2 Temporal Databases
- They encompass all database applications that require time information to organize data, e.g. insurance databases and scientific databases.
- Time Representation
Time is considered as an ordered sequence of points in some granularity. The main idea behind choosing a granularity is that events occurring within the same granularity interval are considered simultaneous events.
Various temporal types in SQL2
- DATE (specifying year, month and day)
- TIME (specifying hours, minutes and seconds)
- TIMESTAMP (specifying a date/time combination)
- INTERVAL (a relative time duration)
- PERIOD (a time duration with fixed starting and ending points)
Event and Duration information
- Point events are associated with a single time point in some granularity, e.g. a bank deposit.
- Duration events are associated with a specific time period, e.g. an employee has worked from some date to some date.
Valid time and Transaction time
Given an event associated with a time point or time period, the association can be interpreted in two ways:
1) The associated time is the time at which the event occurred, or the period during which the fact was considered valid in the real world. This time is called VALID TIME. If this interpretation is used in a temporal database, it is called a VALID TIME DATABASE.
2) The associated time is the time at which the information was actually stored in the database. This time is called TRANSACTION TIME. If this interpretation is used in a temporal database, it is called a TRANSACTION TIME DATABASE.
If both interpretations are used, it is called a BITEMPORAL DATABASE.
Incorporating time in Relational Databases
1) Valid time relations
E.g. the relation Emp(name, ssn, salary, dno, superssn) can be converted into a valid time relation by adding the attributes VST (Valid Start Time) and VET (Valid End Time), whose data type is DATE. So the Emp-VT relation is Emp-VT(name, ssn, salary, dno, superssn, VST, VET).
- Each tuple v in the Emp-VT relation represents a version of an employee's information that is valid only during [v.VST, v.VET], whereas in the Emp relation each tuple represents the current state of an employee.
- In Emp-VT, the current version of each employee has a special value, “now”, as its valid end time.
- Whenever one or more attributes of an employee are updated, instead of overwriting the old values, a new version is created and the current version is closed by changing its VET to the end time.
- If an update is applied before it becomes effective in the real world, it is called a proactive update.
- If an update is applied after it becomes effective in the real world, it is called a retroactive update.
- If an update is applied at the same time it becomes effective, it is called a simultaneous update.
- An insert operation creates the first tuple version of the entity and makes it current by setting its VET to “now”.
- A delete operation closes the current version of the entity, so the tuple is logically deleted.
2) Transaction time relations
- The Emp relation can be converted into a transaction time relation by adding two attributes, TST (Transaction Start Time) and TET (Transaction End Time), which are of type TIMESTAMP.
- So the Emp-TT relation is Emp-TT(name, ssn, salary, dno, superssn, TST, TET).
- The current version of each employee has a special value uc (until changed) for its TET attribute, indicating that the tuple represents the correct values until it is changed by some transaction.
3) Bitemporal relations
- The Emp relation can be converted into a bitemporal relation by adding the VST, VET, TST and TET attributes.
- So the Emp-BT relation is Emp-BT(name, ssn, salary, dno, superssn, VST, VET, TST, TET).
7.3 Deductive Databases
- A database system that provides capabilities for defining deduction rules for inferring new information from the stored database facts is called a deductive database system.
- These rules are specified using a declarative language, and an inference engine (deduction mechanism) within the system deduces new facts from the database by interpreting these rules.
- The model used for deductive databases is related to logic programming and the Prolog language.
- A variation of Prolog called Datalog is used to define rules declaratively in conjunction with an existing set of relations.
- A deductive database uses two types of specifications:
Facts:
They are specified in a manner similar to the way relations are specified, except that attribute names are not included. A tuple in a relation describes some real-world fact whose meaning is partly determined by the attribute names. In a deductive database, the meaning of an attribute value is determined by its position within the tuple.
Rules:
Rules are somewhat similar to relational views. They specify virtual relations that are not actually stored but can be formed from the facts by applying the inference mechanism.
Prolog/Datalog notation:
The notation is based on providing predicates with unique names. A predicate has an implicit meaning, given by the predicate name, and a fixed number of arguments. If the arguments are all constants, the predicate simply states that a certain fact is true. If the predicate has variables, it is considered either a query or part of a rule.
E.g.
Facts:
Supervise(B, D)
Supervise(B, E)
Supervise(B, F)
Supervise(C, G)
Supervise(C, H)
Supervise(A, B)
Supervise(A, C)
Rules:
Superior(x, y) :- Supervise(x, y)
Superior(x, y) :- Supervise(x, z), Superior(z, y)
Subordinate(x, y) :- Superior(y, x)
Queries:
Superior(A, y)?
Superior(A, F)?
- There are 3 predicates.
- The Supervise predicate is defined via a set of facts, each with two arguments: a supervisor name and the name of a direct subordinate.
- These facts correspond to the data stored in the database.
- The other two predicates are defined by rules.
- An important feature of deductive databases is the ability to specify recursive rules.
- A rule is of the form head :- body, where the head contains a single predicate and the body contains one or more predicates; :- is read as “if”.
- Rules provide a way of generating new facts from already existing facts. Implicitly, a logical AND is applied between the predicates in the body of a rule.
- The second rule is a recursive rule.
- If two or more rules have the same head, the predicate is true if any one of the bodies is true; hence this is equivalent to a logical OR operation. That is, x :- y and x :- z together are the same as x :- y OR z.
- A Prolog system contains many built-in predicates that the system can interpret directly.
- A query involves a predicate symbol with some variable arguments, and its answer is to deduce all the different constant combinations that, when bound to the variables, make the predicate true.
The first query requests the names of all subordinates of A at any level. The second query, which has only constants, returns either a true or a false result.
7.4 Spatial Databases
- The limited set of data types and operations of traditional DBMSs makes the modeling of real-world spatial applications extremely difficult.
- A spatial database management system aims at making spatial data management easier and more natural for users and applications.
- Spatial databases provide concepts for databases that keep track of objects in a multidimensional space.
- A common example of spatial data can be seen in a road map.
A road map is a two-dimensional object that contains points, lines and polygons, which can represent cities, roads and political boundaries.
Types of Spatial data and Queries
From the DBMS point of view, spatial data can be classified into two broad categories.
- Point Data:
A point has a spatial extent characterized completely by its location. Point data consists of a collection of points in a multidimensional space. Point data stored in a database can be based on direct measurement or generated by transforming data obtained through measurement.
- Region Data:
Region data has a spatial extent characterized by a location and a boundary. The location can be thought of as the position of a fixed point for the region. In 2D space the boundary is a line, and in 3D space it is a surface. Region data consists of a collection of regions. Region data stored in a database is a simple geometric approximation of an actual data object.
Queries are of three types:
1) Spatial range queries
- Queries like “Find all cities within 500 km of Mumbai” or “Find all lakes in Mumbai”.
- A spatial range query has an associated region.
- In the presence of region data, spatial range queries return all regions that overlap the specified region or are contained within it.
2) Nearest neighbor queries
- Queries like “Find the 10 cities nearest to Mumbai”.
- We want the answer to be ordered by distance to Mumbai.
- Such queries are especially important in the context of multimedia databases.
3) Spatial join queries
- Queries like “Find pairs of cities within 200 km of each other”.
- If we consider a relation in which each tuple is a point representing a city, the query can be answered by a join of this relation with itself.
- If cities are represented in more detail and have a spatial extent, the query becomes more complex.
Applications involving spatial data
1) Geographic Information Systems (GIS):
- GIS deal extensively with spatial data, including points, lines and 2- or 3-dimensional regions.
- All the types of spatial queries arise naturally.
- Both point and region data must be handled properly.
- ArcInfo is a widely used GIS.
2) Computer-aided design and manufacturing (CAD/CAM):
- They store spatial objects such as the surfaces of design objects.
- Both point and region data must be stored.
- Range and spatial join queries are common.
- Spatial integrity constraints are used.
3) Multimedia databases:
- They contain objects such as images and various types of time-series data (audio) that require spatial data management.
- Finding objects similar to a given one is a common query.
- Multimedia databases store 2- and 3-dimensional images, photographs, fingerprints, video clips, audio clips, signals/time series and text documents.
- For processing purposes, multimedia data is mapped to a collection of points, where the distance between two points is very important.
7.5 Geographic Information System
- A GIS contains spatial information about cities, states, countries, streets, highways, rivers, lakes and other geographical regions, and supports applications that combine such spatial information with non-spatial data.
- Sometimes a temporal dimension is also added.
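The three spatial query types described earlier (range, nearest neighbor and spatial join), all of which arise in GIS, can be illustrated on point data. The Python sketch below uses hypothetical city coordinates on a flat plane with Euclidean distance, and answers each query by scanning all points (the join is a nested-loop self-join). It only shows what each query returns; a spatial DBMS would normally use a spatial index such as an R-tree rather than full scans.

```python
import math
from itertools import combinations

# Hypothetical city locations as (x, y) coordinates in km on a flat plane.
CITIES = {"A": (0, 0), "B": (120, 50), "C": (300, 400), "D": (90, -60), "E": (600, 0)}

def range_query(cities, center, radius):
    """Spatial range query: cities within radius km of center."""
    return sorted(n for n, p in cities.items() if math.dist(p, center) <= radius)

def nearest_neighbors(cities, name, k):
    """Nearest neighbor query: the k cities closest to the named city, closest first."""
    others = [n for n in cities if n != name]
    return sorted(others, key=lambda n: math.dist(cities[n], cities[name]))[:k]

def spatial_join(cities, max_dist):
    """Spatial join (self-join): pairs of cities within max_dist km of each other."""
    return [(a, b) for a, b in combinations(sorted(cities), 2)
            if math.dist(cities[a], cities[b]) <= max_dist]

print(range_query(CITIES, CITIES["A"], 200))  # ['A', 'B', 'D']
print(nearest_neighbors(CITIES, "A", 2))      # ['D', 'B']
print(spatial_join(CITIES, 200))              # [('A', 'B'), ('A', 'D'), ('B', 'D')]
```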
[Figure: GIS application areas, including flood control, water resource management and geographic market analysis]
Data management requirements of GIS:
1) Data Modelling and Representation:
GIS data can be represented in two formats:
- Vector: to represent objects such as points, lines and polygons.
- Raster: an array of points, where each point represents the value of an attribute for a real-world location.
2) Data Analysis:
GIS data undergoes various analyses and operations, such as:
- Measurement of slope, gradient, aspect, profile convexity and plane convexity.
- Aggregation and drill-down operations when used with a data warehouse.
- Area, volume and distance measurements.
- Overlaps, intersections and shortest paths.
- Temporal operations.
3) Data Integration:
- A GIS must integrate both vector and raster data from various sources.
- Sometimes edges and regions are inferred from a raster image to form a vector model.
4) Data Capture:
- The first step in generating a spatial database for various applications is to capture the 2D and 3D geographical information in a digital system.
- Spatial data can also be captured from remote sensors.
7.6 Mobile Databases
[Figure: mobile computing architecture: fixed hosts and base stations on a wired network, with mobile units in wireless cells]
It is a distributed architecture in which a number of computers, called fixed hosts and base stations, are interconnected via a high-speed wired network.
- Fixed hosts are general-purpose computers.
- Base stations work like gateways for mobile units.
- Mobile units are clients of base stations.
- Mobile units can move freely within a cell and between cells.
- To compensate for high latencies and unreliable connectivity, clients cache replicas of important and frequently accessed data.
- Proxies are used when the client or the server cannot reach the other.
- The proxy caches the updates and, when the destination becomes reachable, automatically transfers them.
- Client mobility introduces many data management challenges:
1) The server must track client locations in order to route messages.
2) When a client moves from one cell to another, the server must be able to gracefully divert the data to the new base station without the client noticing.
Data management issues
- It is a variation of distributed computing.
- Mobile databases can be distributed in two ways:
1) The entire database is distributed among the wired components, with partial replication. A base station or fixed host manages its own database.
2) The database is distributed among wired and wireless components. Database responsibilities are shared among base stations and mobile units.
- So, distributed data management issues apply to mobile databases with the following variations:
1) Data Distribution: data is unevenly distributed among base stations and mobile units.
2) Transaction Model: issues of fault tolerance and transaction correctness are aggravated in the mobile environment.
3) Query processing
4) Recovery
5) Location-based services: as clients move, location-dependent cache information must be refreshed.
6) Security: mobile data is less secure.
Con. 5203-07 (REVISED COURSE) D-5550
(3 Hours) [Total Marks: 100]
N.B.: (1) Question No. 1 is compulsory.
(2) Attempt any four out of the remaining.
(3) Figures to the right indicate full marks.
(4) Make reasonable assumptions.
Q.1 In a company database, we need to store information about employees, departments, and children of employees. For each employee, identified by an emp.no, we must record the number of years worked, phone number and a photograph for identification. There are two classes of employees, regular and contract. The salaries for the two are calculated differently. For the regular employees, we must record the names of the children and their ages. For each photo, depending on its type, a display method is defined. For each department we must record dept.no., deptname, budget, and the employees who work in that department.
(a) Draw an extended ER diagram for the above system.
(b) Design an OR or OO system.
(c) For two typical queries write OQL or SQL3 representations.
Q.2 Consider a database keeping track of sales of different products in a company. Information about the different zones where the sales are made is also stored.
(a) Discuss the design of a data warehouse using a star schema for this application. (10)
(b) Explain the OLAP operations of roll-up, drill down and pivot for the above application. (10)
Q.3 (a) Using an example system, describe the DTD for the XML schema of the system. (10)
(b) Explain querying and transformation of XML data. (10)
(1) Display books, magazines and journals sorted by year.
(2) Find all authors who have written a book and also a journal article in the same year.
Q.4 (a) Consider any example system of your choice and explain the design of a distributed database. Show 2 examples each for horizontal, vertical and derived fragmentation. (10)
(b) Using the same example, explain the different types of parallel database architectures. (10)
Q.5 Explain the following concepts with examples:
(a) Object Identity (OID)
(b) Type constructor
(c) Persistent data types
(d) Accessor functions (GET and SET) (5 × 4 = 20)
Q.6 (a) Define data mining.
Describe classification and clustering, and give one example system where each may be useful. (10)
(b) Define a temporal database, using a flight reservation system as an example. (10)
Q.7 (a) Explain the concepts of spatial data and Geographical Information Systems (GIS) using an appropriate application. (10)
(b) Describe clearly the issues involved in mobile databases and the solutions for them. (10)

Con. 5318-07 (REVISED COURSE) CD-7287
(3 Hours) [Total Marks: 100]
N.B.: (1) Question No. 1 is compulsory.
(2) Attempt any four questions out of the remaining six questions.
Q.1 A blood bank is a critical entity in providing the required type of blood to patients at critical times. Its database keeps track of the blood inventory, together with relevant information such as blood group, date received, location, date of expiry, donor, etc. The database keeps information such as the name, address, and telephone number of other blood banks in its area. The reason for doing so is to get blood of a particular type from other banks in case of emergency. Information about donors is recorded as well. Donors are classified into occasional and regular donors. For the regular donors, the database keeps information such as identification number, blood type and a history of their donations. A list of health care providers in the area, along with information such as address, telephone number, etc., is kept. The health care providers are the customers of the blood bank. They keep track of the blood transactions performed. These transactions are classified into normal transactions and unexpected transactions (for example, motor accidents during the holiday season). The reason for keeping track of the unexpected transactions is to use this information in estimating the extra amount of blood to keep in the inventory for each age group during the coming holiday season. (20)
(a) Draw an extended E-R diagram for the system.
(b) Draw an object-oriented schema.
(c) Take two typical queries and write them in OQL.
- Explain the concept of nested relations in ORDBMS with an example.
- Compare and contrast RDBMS, OODBMS and ORDBMS.
- Describe the different architectures for parallel databases.
- Explain the various types of transparency in distributed databases, and list the advantages and disadvantages of distributed databases.
- Consider the global schema:
PATIENT (number, name, ssn, amount_due, dept, doctor, med_treatment)
DEPARTMENT (dept, location, director)
STAFF (staff_num, director, task)
(i) Show 2 examples of horizontal fragmentation.
(ii) Show 2 examples of vertical fragmentation.
- Write an XML Schema for Mumbai University exam results. Assume a few branches and a few colleges of engineering.
- Explain in brief deductive database systems.
- Write a detailed note on Geographical Information Systems (GIS).
- Explain the following with examples:
(i) Object identity
(ii) Transient and persistent objects
(iii) Specialization and generalization
(iv) Subclass and superclass
- Write notes on (any four):
(i) XPath
(ii) XQuery
(iii) Persistent programming languages
(iv) SQL3 standard
(v) Mobile databases

Fundamentals of Database Systems, pp. 532-533 (Chapter 13, Enhanced Data Models for Advanced Applications)

class Person
(   extent persons
    key ssn )
{
    attribute struct Pname {string fname, string mname, string lname} name;
    attribute string ssn;
    attribute date birth_date;
    attribute enum Gender{M, F} sex;
    attribute struct Address
        {short no, string street, short apt_no, string city, string state, short zip} address;
    short age();
};

class Faculty extends Person
(   extent faculty )
{
    attribute string rank;
    attribute float salary;
    attribute string office;
    attribute string phone;
    relationship Department works_in inverse Department::has_faculty;
    relationship set<GradStudent> advises inverse GradStudent::advisor;
    relationship set<GradStudent> on_committee_of inverse GradStudent::committee;
    void give_raise(in float raise);
    void promote(in string new_rank);
};

class Grade
(   extent grades )
{
    attribute enum GradeValues{A, B, C, D, F, I, P} grade;
    relationship Section section inverse Section::students;
    relationship Student student inverse Student::completed_sections;
};

class Student extends Person
(   extent students )
{
    attribute string class;
    attribute Department minors_in;
    relationship Department majors_in inverse Department::has_majors;
    relationship set<Grade> completed_sections inverse Grade::student;
    relationship set<CurrSection> registered_in inverse CurrSection::registered_students;
    void change_major(in string dname) raises(dname_not_valid);
    float gpa();
    void register(in short secno) raises(section_not_valid);
    void assign_grade(in short secno; in GradeValues grade)
        raises(section_not_valid, grade_not_valid);
};

class Degree
{
    attribute string college;
    attribute string degree;
    attribute string year;
};

class GradStudent extends Student
(   extent grad_students )
{
    attribute set<Degree> degrees;
    relationship Faculty advisor inverse Faculty::advises;
    relationship set<Faculty> committee inverse Faculty::on_committee_of;
    void assign_advisor(in string lname; in string fname)
        raises(faculty_not_valid);
    void assign_committee_member(in string lname; in string fname)
        raises(faculty_not_valid);
};

class Department
(   extent departments
    key dname )
{
    attribute string dname;
    attribute string dphone;
    attribute string doffice;
    attribute string college;
    attribute Faculty chair;
    relationship set<Faculty> has_faculty inverse Faculty::works_in;
    relationship set<Student> has_majors inverse Student::majors_in;
    relationship set<Course> offers inverse Course::offered_by;
};

class Course
(   extent courses
    key cno )
{
    attribute string cname;
    attribute string cno;
    attribute string description;
    relationship set<Section> has_sections inverse Section::of_course;
    relationship Department offered_by inverse Department::offers;
};

class Section
(   extent sections )
{
    attribute short sec_no;
    attribute string year;
    attribute enum Quarter{Fall, Winter, Spring, Summer} qtr;
    relationship set<Grade> students inverse Grade::section;
    relationship Course of_course inverse Course::has_sections;
};

class CurrSection extends Section
(   extent current_sections )
{
    relationship set<Student> registered_students
        inverse Student::registered_in;
    void register_student(in string ssn)
        raises(student_not_valid, section_full);
};

FIGURE 13.10 Possible ODL schema for the UNIVERSITY database of Figure 13.9(b).
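In ODL, a relationship declared with an inverse clause (for example, Faculty::works_in and Department::has_faculty in Figure 13.10) asks the ODBMS to keep both directions of the relationship consistent. The Python sketch below imitates that behaviour with plain classes. It is only an illustration under stated assumptions: it is not an ODMG language binding, it models just the two relationship ends, and Python's identity-based hashing stands in for OIDs.

```python
from typing import Optional


class Department:
    """Minimal stand-in for the ODL class Department (only dname and has_faculty)."""

    def __init__(self, dname: str) -> None:
        self.dname = dname
        # relationship set<Faculty> has_faculty inverse Faculty::works_in
        self.has_faculty = set()


class Faculty:
    """Minimal stand-in for the ODL class Faculty (only name and works_in)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # relationship Department works_in inverse Department::has_faculty
        self._works_in: Optional[Department] = None

    @property
    def works_in(self) -> Optional[Department]:
        return self._works_in

    @works_in.setter
    def works_in(self, dept: Optional[Department]) -> None:
        # Detach from the old department so its inverse set stays correct.
        if self._works_in is not None:
            self._works_in.has_faculty.discard(self)
        self._works_in = dept
        # Attach to the new department: both ends of the relationship now agree.
        if dept is not None:
            dept.has_faculty.add(self)
```

Reassigning smith.works_in from one Department object to another moves smith between the two has_faculty sets automatically, which is the guarantee the inverse clause is meant to provide.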
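The mobile-database notes at the start of this section say that a proxy caches updates while the client or the server cannot reach the other, and transfers them automatically once the destination becomes reachable. A minimal Python sketch of that store-and-forward idea follows, assuming a hypothetical deliver callback that represents the network link and returns True on success; no real mobile middleware API is implied.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForwardProxy:
    """Hypothetical proxy that queues updates while the destination is unreachable.

    `deliver` stands in for the real link: it sends one update and returns
    True on success, False if the destination could not be reached.
    """

    def __init__(self, deliver: Callable[[str], bool]) -> None:
        self.deliver = deliver
        self.pending: Deque[str] = deque()  # cached updates, oldest first
        self.reachable = False

    def submit(self, update: str) -> None:
        """Accept an update from the client; forward it now if possible."""
        self.pending.append(update)
        if self.reachable:
            self.flush()

    def set_reachable(self, reachable: bool) -> None:
        """Called when connectivity to the destination changes."""
        self.reachable = reachable
        if reachable:
            self.flush()  # destination is back: transfer cached updates automatically

    def flush(self) -> None:
        """Forward queued updates in order; stop at the first failed delivery."""
        while self.pending:
            if not self.deliver(self.pending[0]):
                self.reachable = False  # link dropped again; keep the rest queued
                return
            self.pending.popleft()
```

Updates are forwarded in FIFO order so that the destination sees them in the sequence the client issued them. A real system would also need a persistent queue and conflict handling, which this sketch leaves out.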
