Object-Oriented Database Development Using Db4o
DISSERTATION THESIS
2009
BABEȘ-BOLYAI UNIVERSITY CLUJ-NAPOCA
FACULTY OF MATHEMATICS AND COMPUTER SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
To my family
ABSTRACT
Ever since computers were first used to manage complex (corporate) data, database theories and implemented solutions have emerged to suit different customer needs. This variety in points of view regarding database principles almost ceased to exist with the advent of the SQL relational databases in the 1990s, which eventually became thoroughly supported by the vast majority of both major and minor database vendors.
Object-oriented databases seek to challenge the domination of SQL databases, in areas where relational theory does not map appropriately to the application model or when object-orientation at the level of the database provides clear benefits in comparison with a relational solution.
Such an ambitious project is db4o 7.4, developed by Versant and having API bindings to both the .NET platform and the Java language.
This work is organized into three chapters, as follows:
Chapter 1 - Introduction to Database Management Systems explains the basics of databases (the terminology and key concepts) and takes a brief look at SQL technology, in order to expose some of its limitations when persisting an object-oriented model. This allows for a smooth transition to the main topic of the paper and provides the reader with an understanding of the reasons that made relational DBMSs insufficient for handling the persistence of object graphs in modern OOP languages.
Chapter 2 - Object-Oriented Databases in db4o presents the principles and foundations of db4o version 7.4, trying to achieve a delicate balance between OODBMS concepts and their actual implementation in db4o. Each concept's explanation is accompanied by a proof-of-concept example, in order to demonstrate its usefulness and power.
Chapter 3 - Implementing the BookStore.NET Application covers the steps needed for developing a powerful digital book management system that relies on db4o for the management of object persistence. It also provides an elegant solution to the intricate task of storing large binary data in a database.
TABLE OF CONTENTS
Chapter 1 - Introduction to Database Management Systems
1.1 Preliminaries
1.2 Database expectations
1.3 A simplified database environment
1.4 Types of databases
1.5 SQL taxonomy
1.6 SQL modeling of object-based programming data example
1.7 SQL modeling of object-oriented programming data example
1.8 Impedance mismatch
Chapter 2 - Object-Oriented Databases in db4o
2.1 Object-oriented databases in general
2.2 Some data on db4o
2.3 Storing objects in db4o
2.4 Querying objects in db4o
2.5 Client-server mode in db4o
2.6 Advanced options in db4o
Chapter 3 - Implementing the BookStore.NET Application
3.1 General information
3.2 Use cases
3.3 Sequence diagrams
3.4 Component diagram
3.5 Database model
3.6 Class diagrams
Conclusions
References
1.1 Preliminaries
A database consists of some collection of persistent data that is used by the application systems of some given enterprise, and that is managed by a database management system. [Dat97] Persistent data refers to information that outlives the application that managed it, thus becoming persistent. Although not a necessary condition, data could also be shared and persisted among more than one application. A database server is a collection of programs that enables users to create and maintain a database. [Elm03] In order to issue instructions to the database server, one needs a database language. There can be a single language for both querying and updating data, or specific languages for each of the previously-stated tasks.
transactional semantics: support for bundling multiple statements (updates and queries) scheduled to be executed against the database together as a single logical unit of work [Wil05] that is, as such, treated in a coherent and reliable way, independent of other transactions. The properties that the transactional semantics must fulfill are the ones commonly known by the acronym ACID:
    atomicity: guarantees that either all of the tasks in a transaction are performed or none of them are, meaning a transaction is an all-or-nothing matter and, should an operation fail, the whole transaction is considered failed
    consistency: ensures that the database is in a consistent state both before and after the transaction is executed and that, should the transaction fail, the database is rolled back to the consistent state that preceded the execution
    isolation: relates to the constraint that other operations cannot access or see the data in an intermediate state during a transaction, which is important for consistency and performance reasons
    durability: makes certain that, once the user has been signaled the success of the transaction, that transaction will persist and not be undone, even in the case of system failure
backup and replication: allow periodic copies of the data to be made, to avoid loss of information due to potential hardware or software failure and to maintain chronologically successive versions of the data repositories
enforcement of rules: provides opportunities to specify consistency regulations on the data stored inside the database and to disallow adding or modifying data in a way that would violate the aforementioned constraints
security: exposes means to authenticate users, as well as to permit only actions that conform to the user's group, role and individual privileges
computation abstraction: exempts the user of the application from knowing the database management system's internal structure and offers support for computations (such as summing, averaging, standard deviation, etc.) in an abstract, implementation-independent way
logging: makes user actions and data modifications trackable and checkable at a later time
optimization: enables users to specify performance tuning parameters and, in some cases, performs the optimizations implicitly
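The all-or-nothing behaviour that atomicity demands can be illustrated with a minimal sketch. It is written in plain Java (one of the two platforms db4o targets) and is not tied to any real DBMS: the class AtomicStoreSketch, its methods and the account names are hypothetical, and the copy-then-swap strategy is only one assumed way of obtaining rollback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory store illustrating all-or-nothing transactions.
class AtomicStoreSketch {
    private Map<String, Integer> committed = new HashMap<>();

    // Runs the unit of work against a private copy; publishes it only on success.
    boolean runTransaction(Consumer<Map<String, Integer>> work) {
        Map<String, Integer> workingCopy = new HashMap<>(committed);
        try {
            work.accept(workingCopy);
            committed = workingCopy;   // commit: all changes become visible together
            return true;
        } catch (RuntimeException e) {
            return false;              // rollback: the working copy is simply discarded
        }
    }

    Integer read(String key) {
        return committed.get(key);
    }

    public static void main(String[] args) {
        AtomicStoreSketch store = new AtomicStoreSketch();
        store.runTransaction(data -> { data.put("alice", 100); data.put("bob", 50); });

        // A transfer that fails halfway: the debit must not survive.
        boolean ok = store.runTransaction(data -> {
            data.put("alice", data.get("alice") - 30);
            throw new IllegalStateException("simulated failure before the credit");
        });

        // prints: false 100 50
        System.out.println(ok + " " + store.read("alice") + " " + store.read("bob"));
    }
}
```

Because the failed transaction only ever touched its working copy, the committed state is exactly what it was before the transaction started.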
[Figure: a simplified database environment, showing users, applications and the DBMS software]
abstract data types. Still, the internal structure of such databases is a relational one, objects just taking the place of a primitive data type inside a record. Nowadays, most relational databases have some degree of object-relational capability.
object-oriented databases: expose means through which objects can be queried and stored using the same model that is employed by the application's programming language. Another way of putting it would be to say that an ODBMS extends the programming language with transparently persistent data, concurrency control, data recovery, associative queries, and some other capabilities. Today, object-oriented databases have bindings to most modern programming languages and platforms, including C++, Java, .NET, Perl, Python, Objective-C and Visual Basic.
relation universe (relation universe U over a header H): a non-empty set of relations with header H
relation schema (relation schema (H,C)): consists of a header H and a predicate C(R) that is defined for all relations R with header H
a relation satisfies a relation schema (H,C) if it has header H and satisfies C
The model for this concept does not need to be a fully object-oriented one, just an object-based model. Below is the UML class diagram illustrating the classes required to model the book concept.
Now let us see how this object-based model can be mapped to a relational one, based upon the SQL standard. First, let us express the content of the tables required for the SQL model.
We can observe some key differences between the two models of the same concept (books):
the approximate map (function) between object-based model entities and relational ones is the following:
    class: SQL table
    object: row in the SQL table
    references between objects: foreign-key constraints
the relational (SQL) model contains two extra entities (the BooksAuthors and BooksGenres tables), therefore complicating the object-based model and increasing its entity count from 3 to 5 (almost double).
the relational model seems a rather awkward approximation of the object-based model, but still is an arguably reasonable and manageable compromise in this case
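The origin of the two extra tables can be sketched in plain Java. The classes Book, Author and Genre, their attributes and the LinkRow type are assumptions made for illustration (only the BooksAuthors and BooksGenres table names come from the text): in the object model a book simply holds references, whereas the relational model must flatten each many-to-many reference list into rows of foreign-key pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical object-based model: three entities, linked by plain references.
class BookMappingSketch {
    static class Author {
        final int id; final String name;
        Author(int id, String name) { this.id = id; this.name = name; }
    }

    static class Genre {
        final int id; final String label;
        Genre(int id, String label) { this.id = id; this.label = label; }
    }

    static class Book {
        final int id; final String title;
        final List<Author> authors = new ArrayList<>();
        final List<Genre> genres = new ArrayList<>();
        Book(int id, String title) { this.id = id; this.title = title; }
    }

    // One row of a junction table: a pair of foreign keys.
    static class LinkRow {
        final int bookId; final int otherId;
        LinkRow(int bookId, int otherId) { this.bookId = bookId; this.otherId = otherId; }
    }

    // Flattens the book-to-author references into BooksAuthors-style rows.
    static List<LinkRow> booksAuthorsRows(List<Book> books) {
        List<LinkRow> rows = new ArrayList<>();
        for (Book book : books)
            for (Author author : book.authors)
                rows.add(new LinkRow(book.id, author.id));
        return rows;
    }

    // Flattens the book-to-genre references into BooksGenres-style rows.
    static List<LinkRow> booksGenresRows(List<Book> books) {
        List<LinkRow> rows = new ArrayList<>();
        for (Book book : books)
            for (Genre genre : book.genres)
                rows.add(new LinkRow(book.id, genre.id));
        return rows;
    }

    public static void main(String[] args) {
        Book guide = new Book(1, "A Guide to the SQL Standard");
        guide.authors.add(new Author(1, "C. Date"));
        guide.authors.add(new Author(2, "H. Darwen"));
        guide.genres.add(new Genre(1, "Databases"));

        List<Book> books = new ArrayList<>();
        books.add(guide);

        // prints: 2 1
        System.out.println(booksAuthorsRows(books).size() + " " + booksGenresRows(books).size());
    }
}
```

The junction rows carry no information that the object graph does not already express through its references, which is why they count as added complexity rather than added meaning.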
There are four object-oriented entities pertaining to our problem:
IDiscEntry: interface for storable disc data (either files or folders)
File: class representing a file on the disc
Folder: class expressing a folder on the disc, which may contain both folders and files
Buffer: class storing up to X bytes of data in a contiguous data structure in memory
The next UML class diagram specifies the attributes of the types mentioned, as well as the relations between them.
The diagram reveals the usage of the composite design pattern. An IDiscEntry can be either a File or a Folder. A Folder can contain many IDiscEntry instances, those, of course, being files or folders. A File contains one or many Buffers, which are used to read / write data from / to disc using contiguous memory blocks. If a file is small, it may require only one Buffer object, but, if it is large, it could use many such objects.
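Since the class diagram itself is not reproduced here, the composite structure just described can be approximated by the following plain Java sketch. Only the four type names come from the text; the method names, the constructor signatures and the tiny buffer capacity of 4 bytes (standing in for the unspecified X) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DiscCompositeSketch {
    // Common interface of the composite: both files and folders are disc entries.
    interface IDiscEntry {
        String getName();
        long getSize();
    }

    static class Buffer {
        static final int CAPACITY = 4; // tiny stand-in for the "up to X bytes" limit
        final byte[] data;
        Buffer(byte[] data) {
            if (data.length > CAPACITY) throw new IllegalArgumentException("buffer overflow");
            this.data = data;
        }
    }

    // Leaf of the composite: its content is split into one or many Buffers.
    static class File implements IDiscEntry {
        private final String name;
        private final List<Buffer> buffers = new ArrayList<>();

        File(String name, byte[] content) {
            this.name = name;
            for (int start = 0; start < content.length; start += Buffer.CAPACITY) {
                int end = Math.min(start + Buffer.CAPACITY, content.length);
                buffers.add(new Buffer(Arrays.copyOfRange(content, start, end)));
            }
            if (buffers.isEmpty()) buffers.add(new Buffer(new byte[0])); // always at least one Buffer
        }

        public String getName() { return name; }

        public long getSize() {
            long total = 0;
            for (Buffer buffer : buffers) total += buffer.data.length;
            return total;
        }

        int bufferCount() { return buffers.size(); }
    }

    // Composite node: holds any mix of files and folders.
    static class Folder implements IDiscEntry {
        private final String name;
        private final List<IDiscEntry> entries = new ArrayList<>();

        Folder(String name) { this.name = name; }

        void add(IDiscEntry entry) { entries.add(entry); }

        public String getName() { return name; }

        public long getSize() {
            long total = 0;
            for (IDiscEntry entry : entries) total += entry.getSize(); // polymorphic call
            return total;
        }
    }

    public static void main(String[] args) {
        Folder root = new Folder("root");
        Folder docs = new Folder("docs");
        File big = new File("big.bin", new byte[10]);
        root.add(big);
        root.add(docs);
        docs.add(new File("small.txt", new byte[3]));

        // prints: 3 13
        System.out.println(big.bufferCount() + " " + root.getSize());
    }
}
```

The polymorphic getSize() call inside Folder is precisely the kind of behaviour that has no direct relational counterpart.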
The main addition in this example, when compared to the previous one, is the use of full-fledged object orientation, including inheritance and polymorphism, not just object-based features.
As the relational database world does not have even an approximate correspondent to the inheritance relation of the object-oriented world, modeling this concept in a SQL-compliant database is a painful experience, full of compromises, as will be shown below.
We can notice some essential differences between the two models of the same concept (folder and file storage):
unlike in the case of the object-based example, there is no supported way in SQL databases to model inheritance and polymorphism from the object-oriented world. Therefore, the SQL model uses hacks (non-standard, non-elegant solutions) to express these relations between classes in OOP.
the relational (SQL) model contains two extra entities (the DiscEntriesMappings and FilesBuffers tables), therefore adding extra complexity to the object-oriented model and raising its entity count from 4 to 6.
the relational model, in this case, bears almost no resemblance to the object-oriented model. Therefore, the transition of data from one format to the other, as well as the navigation of the resulting SQL model, is mired in complications, hacks and quirks. These hindrances can be collectively understood as the impedance mismatch between the object-oriented world and the relational world.
The situations that could reflect an impedance mismatch are expressed below:
relational databases disregard the notion of information hiding, that is, the visibility specifiers (private, protected, public, etc.) from OOP
the difference in terms of security policies between relational databases (based on privileges, roles, authentication) and the OOP languages (based on information hiding)
lack of relational support for OOP's inheritance and polymorphism
data type differences: OOP programming languages often have somewhat different primitive types than SQL databases, while the ubiquitous String data type from OOP can correspond to many primitive types in SQL. OOP allows users to create new types, while classic relational databases do not. Of course, a lot of modern relational databases are, in fact, object-relational databases, and, as such, this issue is somewhat mitigated.
the syntax and semantics of the object-oriented programming language are completely foreign to and radically different from those of the Structured Query Language (SQL) employed by the relational databases.
There are still some similarities among OODBMS products, most of them related to the core object-oriented programming principles and foundations:
navigational interfaces, meaning objects are reached using references (pointers), rather than through joins. Some newer OODBMSs abstract the navigational interface behind a declarative one.
non-formal approach, as there is no mathematical theory behind the principles of object orientation
either use reflection to determine the object data (in modern languages) or require intrusive additions to the object prototypes (classes), meaning they should contain extra load (methods) to aid the database management system in manipulating them
provide most of the features that database users have been accustomed to and that are, more or less, independent of the database technology being used:
    transactional support
    connectivity over the network or on the same machine
    backup and replication
    abstraction from the intricacies of internal storage mechanisms
    security features
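The reflection-based approach mentioned in the list above can be sketched in plain Java. This is only an illustration of the general technique, not the way any particular OODBMS is implemented: the snapshot method, the class names and the Person attributes are hypothetical, and superclass fields are deliberately ignored to keep the sketch short.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class ReflectionSnapshotSketch {
    // An ordinary class with private state and no persistence-related methods.
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    // Reads the instance fields declared by the object's own class, private ones included.
    static Map<String, Object> snapshot(Object obj) {
        Map<String, Object> values = new LinkedHashMap<>();
        try {
            for (Field field : obj.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
                field.setAccessible(true); // bypass the private visibility specifier
                values.put(field.getName(), field.get(obj));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Object> state = snapshot(new Person("Ana", 30));
        // prints: Ana 30
        System.out.println(state.get("name") + " " + state.get("age"));
    }
}
```

The Person class needs no extra methods or base class, which is the non-intrusive alternative to requiring additions to the object prototypes.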
db4o is distributed as open source, which has the advantage that flaws in the code can be detected and corrected quickly. The current db4o licensing system is dual, meaning one can choose the license that best suits one's (or one's company's) goals:
commercial licensing, for using db4o in a commercial product
GNU General Public License (GPL), for non-commercial products, which must themselves be released under the GNU GPL
This thesis is based upon the stable release, version 7.4.
class Person
{
    private string name;
    private Address address;
    private string phoneNumber;

    public Person(string name, Address address, string phoneNumber)
    {
        this.name = name;
        this.address = address;
        this.phoneNumber = phoneNumber;
    }

    // Parameterless constructor, producing a Person with no attribute set.
    public Person() { }
}

class Address
{
    private string street;
    private string city;

    public Address(string street, string city)
    {
        this.street = street;
        this.city = city;
    }
}
using System;
using Db4objects.Db4o;

Address address = new Address("Teodor Mihali St.", "Cluj-Napoca");
Person person = new Person("Mihnea Rădulescu", address, "111111");

// Opens the database file and creates a new instance of the IObjectContainer interface.
// The container is declared outside the try block so that catch and finally can use it.
IObjectContainer db = Db4oFactory.OpenFile("Persons.yap");
try
{
    // Stores the new person.
    db.Store(person);
    // Commits the transaction.
    db.Commit();
}
catch (Exception)
{
    // Rolls the transaction back, in case of failures.
    db.Rollback();
}
finally
{
    // Closes the database.
    db.Close();
}
Characteristics: Fast and simple to comprehend and implement. Optimal solution for simple queries that do not use logical operators.

Query graphs: Build a query graph by navigating references in classes and imposing constraints.
Characteristics: Fast, but the concept can be seen as counterintuitive. It is considered obsolete since the advent of Native Queries (NQ), except in applications where even very small performance improvements are paramount.

.NET method: Express the query in a .NET-compliant language by writing a method that returns a Boolean. db4o applies your method to all objects stored and the list of matching object instances is returned.
Characteristics: The speed of execution depends on the optimization level you have chosen, but is generally only marginally worse than that of SODA queries.
The attributes that are not assigned values, and are implicitly null or zero, depending on the attribute type (primitive or non-primitive), are always matched. The consequence of this is that, if one does not assign any attribute values, then all objects of the specified class will match, so the query will return all objects in the extent of that class. One assigns attributes as required in order to constrain attributes, making the query more specific.
The following example illustrates the Query by Example (QBE) query type of db4o. We use the same Person class definition as in the object storage examples. The query returns all the persons stored in the database.
using Db4objects.Db4o;

// An empty template: no attribute is assigned, so nothing is constrained.
Person person = new Person();

// Opens the database file and creates a new instance of the IObjectContainer interface.
// The container is declared outside the try block so that finally can close it.
IObjectContainer db = Db4oFactory.OpenFile("Persons.yap");
try
{
    // Retrieves all the persons in the database.
    IObjectSet personsSet = db.QueryByExample(person);
    // Prints the results on the screen.
    PrintResults(personsSet);
}
finally
{
    // Closes the database.
    db.Close();
}
One cannot use the values that are defined to mean "empty" as constraints. Fields specified as null, 0, or "" (empty string) are always treated as unconstrained. This leads to issues in the following situations:
when desiring to find, say, all the Customer class instances that do not qualify for a discount, that is, whose discount attribute is zero. This cannot be achieved using QBE, which will return all the customers, irrespective of the discount they receive.
when needing to specify range conditions in a query, such as retrieving all the Customer objects with the age attribute between 35 and 50
when needing to specify a template object, whose constrained attributes cannot be set using a combination of existing constructor calls and setters or other methods. To still be able to use QBE in this situation, one must perform an intrusive change in the class model, that is add the appropriate constructor / methods / setters to be able to define such a template.
None of these issues arise with the other db4o query methods, which avoid them at the expense of increased complexity.
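The matching rule behind these limitations can be made concrete with a small plain Java sketch. It is a deliberately simplified model of query-by-example semantics and not db4o's actual implementation: the names QbeMatchingSketch, isUnset, matches and queryByExample are hypothetical, only fields declared directly on the class are inspected, and referenced objects are compared with equals rather than being descended into.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class QbeMatchingSketch {
    static class Customer {
        private String name;
        private int age;
        private int discount;
        Customer() { }
        Customer(String name, int age, int discount) {
            this.name = name; this.age = age; this.discount = discount;
        }
    }

    // A template value counts as "unset" (unconstrained) when it is null, zero or "".
    static boolean isUnset(Object value) {
        if (value == null) return true;
        if (value instanceof Number) return ((Number) value).doubleValue() == 0.0;
        if (value instanceof String) return ((String) value).isEmpty();
        return false;
    }

    // A candidate matches when it agrees with the template on every constrained field.
    static boolean matches(Object template, Object candidate) {
        if (!template.getClass().isInstance(candidate)) return false;
        try {
            for (Field field : template.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
                field.setAccessible(true);
                Object wanted = field.get(template);
                if (isUnset(wanted)) continue; // empty values never constrain the query
                if (!wanted.equals(field.get(candidate))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    static <T> List<T> queryByExample(T template, List<T> stored) {
        List<T> result = new ArrayList<>();
        for (T candidate : stored)
            if (matches(template, candidate)) result.add(candidate);
        return result;
    }

    public static void main(String[] args) {
        List<Customer> stored = new ArrayList<>();
        stored.add(new Customer("Ana", 40, 0));
        stored.add(new Customer("Dan", 30, 10));
        stored.add(new Customer("Ion", 40, 5));

        int all = queryByExample(new Customer(), stored).size();
        // Intended as "age 40 and no discount", but the zero discount is ignored.
        int ageForty = queryByExample(new Customer(null, 40, 0), stored).size();
        int dan = queryByExample(new Customer("Dan", 0, 0), stored).size();

        // prints: 3 2 1
        System.out.println(all + " " + ageForty + " " + dan);
    }
}
```

The second query shows the first limitation: a template with a discount of zero cannot single out the customers who receive no discount, so both 40-year-old customers are returned.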
Although Native Queries are internally based on SODA Queries, the performance penalty for using NQ vs. SODA is negligible in most applications, except for the most demanding.
Native Queries allow type-safe, compile-time checked and refactorable querying, following object-oriented principles. Native Queries expressions are written as if one or more lines of code would be run against all instances of a class. A Native Query expression should return true to mark specific instances as part of the result set. db4o will attempt to optimize native query expressions and execute them against indexes and without instantiating actual objects, where this is possible. [db4o]
The greatest advantage of Native Queries over other querying languages is that they are written as predicates in the same language (in this case .NET CLI) as the object templates (classes). Therefore, there is absolutely no impedance mismatch when using Native Queries in db4o.
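class Customer
{
    private string name;
    private int age;
    private int discount;

    public Customer(string name, int age, int discount)
    {
        this.name = name;
        this.age = age;
        this.discount = discount;
    }

    // Read-only access to the age attribute, as used by the Native Query example.
    public int CurrentAge
    {
        get { return age; }
    }
}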
using System.Collections.Generic;
using Db4objects.Db4o;

// Opens the database file and creates a new instance of the IObjectContainer interface.
// The container is declared outside the try block so that finally can close it.
IObjectContainer db = Db4oFactory.OpenFile("Customers.yap");
try
{
    // Retrieves all the customers in the database whose age is between 35 and 50.
    IList<Customer> customersList = db.Query<Customer>(delegate(Customer customer)
    {
        return customer.CurrentAge >= 35 && customer.CurrentAge <= 50;
    });
    // Prints the results on the screen.
    PrintResults(customersList);
}
finally
{
    // Closes the database.
    db.Close();
}
However, SODA has its share of disadvantages. A query is expressed as a set of method calls that explicitly define the graph. It's not too hard to get used to, but it's not similar in any way to traditional querying techniques. This is why Native Queries are preferred, since they utilize the standard programming language constructs. A further important disadvantage is the fact that attribute names are strings; therefore, SODA queries are not type-safe. [Pat06]
A SODA Query example is displayed below. It retrieves all Customer objects, with the age attribute between 35 and 50.
using Db4objects.Db4o;
using Db4objects.Db4o.Query;

// Opens the database file and creates a new instance of the IObjectContainer interface.
// The container is declared outside the try block so that finally can close it.
IObjectContainer db = Db4oFactory.OpenFile("Customers.yap");
try
{
    // Creates a new IQuery object.
    IQuery query = db.Query();
    // Constrains the returned objects to instances of the Customer class.
    query.Constrain(typeof(Customer));
    // Reaches the age attribute of a Customer (the name is a plain string, hence not type-safe).
    IQuery ageQuery = query.Descend("age");
    // Constrains the age attribute to be between 35 and 50 (greater than 34 and smaller than 51).
    ageQuery.Constrain(34).Greater();
    ageQuery.Constrain(51).Smaller();
    // Executes the query.
    IObjectSet results = query.Execute();
    // Prints the results on the screen.
    PrintResults(results);
}
finally
{
    // Closes the database.
    db.Close();
}
[Figure: a db4o server and its db4o clients, up to client no. n]
db4o supports three flavors of client/server interactions, as revealed below:
The first mode is the networking mode, which is the traditional way of operating in most database solutions. Here, remote clients open a TCP/IP connection to transfer query, insert, modify, and delete instructions to, and data from, the db4o server. This mode works in db4o in exactly the same manner as one would expect from any database management system.
db4o also supports an embedded mode, which doesn't involve a distributed system, although the client and the server are quite distinct objects. Instead, both the client and the server run on the same virtual machine. The communication between the server and the client is the same as in networking mode, but in this mode you work entirely within one process, which is extremely useful in some applications. One example application for this mode could be a desktop application that uses db4o as storage. If the application is later redesigned to work in a distributed mode, then it is very simple to convert it to network mode: all there is to be done is to specify an IP address or hostname and a TCP port number for the server.
The last mode is used for out-of-band communications with the server. In this mode, the information sent does not belong to the db4o protocol and does not consist of data objects, but instead is completely user-defined. This mode uses a message passing communication interface. One can send objects to the server and the server can use them to do whatever is needed. This is extremely useful for sending the server messages such as: do a defragment, stop yourself, perform a save copy and so on.
using System;
using System.Threading;
using Db4objects.Db4o;

// Server class definition
class Server
{
    // Set to true (by code not shown here) when the server should shut down.
    private bool stop = false;

    public void Run()
    {
        // Monitor.Wait requires the lock on the object to be held.
        lock (this)
        {
            Console.WriteLine("Starting server...");
            // Assigning a database file repository to be handled by the server
            IObjectServer server = Db4oFactory.OpenServer("C:\\Database.yap", 8732);
            // Granting access rights to two clients
            server.GrantAccess("user1", "password");
            server.GrantAccess("user2", "password");
            try
            {
                while (!stop)
                    Monitor.Wait(this, 60000);
            }
            catch (ThreadInterruptedException)
            {
            }
            finally
            {
                // Closes the server
                server.Close();
            }
        }
        Console.WriteLine("Server closed.");
    }
}

// Client class definition
class Client
{
    private IObjectContainer client;

    public void Run()
    {
        Console.WriteLine("Starting client...");
        // Gets a client connection to the db4o server
        client = Db4oFactory.OpenClient("localhost", 8732, "user1", "password");
        // Reads new customer data (int.Parse throws if the input is not a number)
        Console.WriteLine("Please enter the customer's name: ");
        string name = Console.ReadLine();
        Console.WriteLine("Please enter the customer's age: ");
        int age = int.Parse(Console.ReadLine());
        Console.WriteLine("Please enter the customer's discount: ");
        int discount = int.Parse(Console.ReadLine());
        try
        {
            // Stores the customer inside the database
            client.Store(new Customer(name, age, discount));
            // Commits the transaction
            client.Commit();
        }
        catch (Exception)
        {
            // Rolls the transaction back, in the case of an exception
            client.Rollback();
        }
        finally
        {
            // Closes the connection to the server
            client.Close();
        }
    }
}
        Console.WriteLine("Please enter the customer's age: ");
        int age = int.Parse(Console.ReadLine());
        Console.WriteLine("Please enter the customer's discount: ");
        int discount = int.Parse(Console.ReadLine());
        try
        {
            // Stores the customer inside the database
            client.Store(new Customer(name, age, discount));
            // Commits the transaction
            client.Commit();
        }
        catch (Exception)
        {
            // Rolls the transaction back, in the case of an exception
            client.Rollback();
        }
        finally
        {
            // Closes the connection to the embedded server
            client.Close();
            // Closes the embedded server
            server.Close();
        }
    }
}
update depth: when one uses the Store() method for an object, one may want to update the object graph starting from the given object only down to a certain depth. db4o allows the update depth to be set for all the entities, as well as for individual types and type templates. A depth of 1 is the default, which stores the object's primitive data but does not propagate the update to the objects it references. A depth of n follows the references inside the initial object, decreases the update counter by 1 when moving to the next entity in the graph and then applies the same process recursively. Example (setting the update depth for all objects to 5 and then the depth for Person objects to 2, meaning that the Address object of a Person instance will also be updated):
Db4oFactory.Configure().UpdateDepth(5);
Db4oFactory.Configure().ObjectClass(typeof(Person)).UpdateDepth(2);
activation depth: when one uses any of the querying methods to retrieve a set (or list) of objects, it might be useful not to load the whole object graph into memory at once, but to activate and then deactivate parts of the object graph as they are needed by the application. A depth of 5 is the default, while an activation depth of n behaves in the same way as an update depth of n. Example (setting the activation depth for all objects to 1 and then the depth for a previously-queried Person object to 2, meaning that the Address object of a Person instance will also be activated):
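Db4oFactory.Configure().ActivationDepth(1);
IObjectContainer db = Db4oFactory.OpenFile("C:\\Persons.yap");
// Next() returns an object, so the result is cast to Person.
Person currentPerson = (Person)db.QueryByExample(new Person()).Next();
// Activates the person down to depth 2, which includes its Address.
db.Activate(currentPerson, 2);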
encryption support: by default, db4o stores any object data in plaintext inside the database repository (a file on the disc). In situations when this approach is unsuitable (especially in the case of strings), db4o offers internal encryption capabilities (of limited security) and an interface through which 3rd-party vendors can develop security plug-ins.
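The recursive rule shared by the update depth and the activation depth (follow the references, decreasing the counter by 1 at each step) can be sketched in plain Java. This is a generic depth-limited traversal under assumed names (Node, reach, visit), not db4o's internal algorithm; it records which objects of a Person, Address, Country chain would be touched for a given depth.

```java
import java.util.ArrayList;
import java.util.List;

class DepthTraversalSketch {
    // A hypothetical object in the graph, holding references to other objects.
    static class Node {
        final String name;
        final List<Node> references = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Returns the names of the objects reached from 'start' with the given depth.
    static List<String> reach(Node start, int depth) {
        List<String> visited = new ArrayList<>();
        visit(start, depth, visited);
        return visited;
    }

    // Processes 'current', then follows its references with the counter decreased by 1.
    private static void visit(Node current, int depth, List<String> visited) {
        if (current == null || depth <= 0) return; // counter exhausted: stop here
        visited.add(current.name);
        for (Node next : current.references) visit(next, depth - 1, visited);
    }

    public static void main(String[] args) {
        Node person = new Node("person");
        Node address = new Node("address");
        Node country = new Node("country");
        person.references.add(address);
        address.references.add(country);

        // prints: [person] [person, address]
        System.out.println(reach(person, 1) + " " + reach(person, 2));
    }
}
```

With a depth of 2 the traversal stops at the address, mirroring the Person examples above; the depth bound also guarantees termination on cyclic graphs, although an object reachable by several paths may then be visited more than once in this simplified version.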
3.1 General information
BookStore.NET is a digital book management system, written in C# and targeting the .NET Framework 2.0 on the Windows operating system.
3.1.1 Features
The application allows the user to store all of their digital books (pdf, chm, html, doc, rtf, etc.) in a central single-file database repository, without the need to keep the books as individual files and folders on the disc, thus reducing disc fragmentation and easing content cohesion and portability (by copying the books database to a memory stick, for example). The books to be imported can be either file-based books (such as a single pdf document) or folder-based books (such as an offline copy of a website). Books can be loaded one at a time or many at once (bulk load). Once imported, each book can be viewed or deleted.
Each book stored in the system is characterized by 5 attributes: title, set of authors, set of tags, publishing house and year of publishing. Each of these attributes can be set for every book. The user can then query a particular set of books, based upon a combination of the aforementioned attributes. As such, the querying system is very powerful, while retaining its ease of use.
The application supports multi-user access on the same PC, as each user can have their own database, without any interference with the data of other users. Also, since the user is able to select the application's working folders, the solution can be run by any user, irrespective of their privileges on the system.
This application uses the object-oriented database management system db4o 7.4, under its GPL license.
3.1.2 Screenshots
Package BusinessLayer
Package DataAccessLayer
Package DiscAccess
CONCLUSIONS
Object-oriented database management systems in general, and today's implementation of db4o in particular (db4o 7.4), can be very efficient and easy-to-use alternatives to relational DBMSs (including those paired with object-relational mapping tools such as (N)Hibernate).
In the case of object-oriented features such as inheritance and polymorphism or a complex object graph, a relational mapping to the object model becomes very hard to define, due to the pronounced impedance mismatch between the OOP and SQL worlds. Object-relational DBMS do not alleviate this issue to a significant extent either.
There are also performance implications, which favor OODBMS and db4o. Storing an object graph through an object-relational mapping tool (such as (N)Hibernate) can be several orders of magnitude slower than using SQL stored procedures, which, in turn, can be slower than directly persisting a complex object graph.
In the case of an evolving object model (the attributes and methods of types or type templates might change), a relational database needs a schema alteration, in order to accommodate the changes. This is not the case with db4o, whose database structure mirrors that of the application and modifications are performed seamlessly, on-the-fly.
The application developed (BookStore.NET) is a well-chosen example of using the strengths of OODBMS in real-world scenarios. It manages the storage, visualization and tag-based querying of the digital books of the user and allows viewing the content of books already stored, deleting existing books, searching for books based upon a criterion or combination of criteria and adding new books to the application's database. It benefits from object-oriented persistence features in terms of natural, straightforward development, as well as performance opportunities.
REFERENCES
[Dat97] C. Date, H. Darwen, A Guide to the SQL Standard, Fourth Edition, Addison-Wesley, 1997
[Elm03] R. Elmasri, S. Navathe, Fundamentals of Database Systems, Fourth Edition, Addison-Wesley, 2003
[Lan06] R. van der Lans, Introduction to SQL: Mastering the Relational Database Language, Fourth Edition, Addison-Wesley, 2006
[Pat06] J. Paterson, S. Edlich, H. Hoerning, R. Hoerning, The Definitive Guide to db4o, Apress, 2006
[Wil05] P. Wilton, J. Colby, Beginning SQL, Wiley Publishing Inc., 2005
[db4o] Db4o Developer Community, http://developer.db4o.com/
[MSDN] Microsoft Developer Network, http://msdn.microsoft.com/en-us/default.aspx