Database Fundamentals: Topic 2

Learning Objectives
The learning objectives for this topic are:
Know the importance of data compared with programs.
Understand file-based systems and their problems.
Know elementary database concepts.
Understand the capabilities of second generation DBMS and the direction of future DBMS developments.
Understand the database development model.
Understand the components of the ER data model.
To be able to construct an ER model from textual data requirements.
To gain experience in developing ER models.
Contents
2.1 Evolving Database Technology
2.1.1 The Importance of Data
2.1.2 File-Based Systems
2.1.2.1 The Organisation of Data
2.1.2.2 An Example File-based Information System Architecture
2.1.2.3 Problems with File-based Systems
2.1.3 Database Concepts
2.1.3.1 Database Management System (DBMS)
2.1.3.2 Data Models
2.1.3.3 Schemata and Instances
2.1.3.4 Database Views
2.1.3.5 Data Independence
2.1.3.6 Database Languages and Interfaces
2.1.4 Second Generation Databases
2.1.5 Future Database Development
2.2 The Entity Relationship Model
2.2.1 Approaches to Modelling
2.2.2 Establishing Data Requirements
2.2.2.1 Description of Data Requirements for University
2.2.3 Components of the Entity Relationship Model
2.2.3.1 Entities, Entity Types and Attributes
2.2.3.2 Example Entity Types
2.2.3.3 Relationships
2.2.3.4 The Degree of a Relationship
2.2.3.5 Participation Conditions
2.2.3.6 Constraints and Assumptions
2.2.3.7 Modelling Multi-Line Data
2.2.4 Constructing an ER Model from Data Requirements Description
2.3 Entity Relationship Modelling Tutorial
2.4 Answers to questions and activities
2.4.1 Preliminaries
2.4.2 CD Shop ER Model
2.4.3 University ER Model
2.4.4 ER Model of Geographical Features
2.4.5 An ER Model of Timber Handling in the Sir Edward Kelly Case Study
This topic provides an introduction to the fundamental concepts and terminology concerning databases. The relevant sections of the recommended course texts and other books are as follows: Elmasri and Navathe, Chapters 1 and 2. You may also like to consult C J Date, Chapters 1 and 2.
If you are unable to locate the above text books there are other sources of information that may prove useful. Remember to check the recommended reading and sources of information that were detailed in Topic 1 of this module.
There are two main subtopics that cover the core material in this topic:
Evolving Database Technology: a brief history of database design, which begins with a discussion of why we place so much emphasis on data. This is examined through the history of database development, from before the 1960s through to the present day and into the future.
Entity Relationship (ER) Modelling: ER modelling is a well-established technique to support the design of database systems. The technique focuses on the data of interest (represented as a number of entities) and the relationships between the data.
A third subtopic includes a substantial tutorial to be completed at the end of this topic. The tutorial provides an opportunity for you to develop and test the skills and knowledge gained from this topic. The tutorial includes two separate activities:
ER Model of Geographical Features: provides practice in developing a general ER model of geographical features, considering rivers and the cities they flow through.
ER Model of Sir Edward Kelly Case Study: provides an opportunity to examine the Sir Edward Kelly case study introduced in Topic 1 of this module.
2.1 Evolving Database Technology

This subtopic provides an introduction to the design of database systems. It requires us to consider the importance of the data that systems need to store, and to review the developments that have led to the introduction of database systems. The following topics are covered: The Importance of Data; File-Based Systems; Database Concepts; Second Generation Databases; Future Database Development.
2.1.1 The Importance of Data

Learning Objective
Know the importance of data compared with programs.
Data has greater importance than the programs that operate on it. Consider the following reasons:
Data persists much longer than the programs that operate on it. Data may have a lifetime of centuries, e.g. records of sewers in a city, such as those in Paris which were built in Napoleonic times. In contrast, the average computer application may be expected to last approximately five years before it is rewritten or completely replaced.
Data, or more importantly the information that the data represents, has a value: such information represents power. Power may take many forms, for example economic (e.g. enabling more effective marketing), political (e.g. enabling more effective campaigning) or strategic.
Vast amounts of data exist about each of us. This data is stored in databases and can be analysed, e.g. supermarket loyalty cards can be used to monitor shopping trends or promote sales in certain areas. More sensitive data includes:
Secure Data: for example financial data and examination papers. This data is subject to authorisation restrictions, as it must only be manipulated by certain people and programs.
Private Data: examples of private data include health, employment, credit and criminal records.
The Internet is a good example of a global data resource - the data that it contains is more significant than the variety of browsers and applications that provide access to it.
2.1.2 File-Based Systems

Learning Objective
Understand file-based systems and their problems.
In its most basic form a database is just a collection of data. As such, databases have existed for many years - since records began and people started to collate data. In this module our interest concerns computerised databases. Their development can be traced from before the 1960s through to the present day and into the future. Before the 1960s computer systems stored data in files. The key characteristics of systems of this period are: Mainframe-based computing technology; File-based storage; Piecemeal information system development; Persistent storage based on tapes/drums.
Between executions of a program, data can be stored in files. Such files need to store a number of different types of data, for example characters, integers or more significant structures such as employee records. The COBOL programming language was enormously successful in the 1970s and 80s, as its design aimed for easy handling of files.
2.1.2.1 The Organisation of Data

This section provides an introduction to the fundamental file structures. The relevant sections of the recommended course texts are Elmasri and Navathe, Chapters 5 and 6. The data within a file can be organised in a number of different ways. There are two key file structures that we consider here - sequential and indexed sequential:
Sequential: A sequential file structure is the simplest of file structures, with items of data stored one after another. A sequential file can be unsorted or sorted. An unsorted sequential file is often referred to as a heap, whilst a sorted sequential file may be referred to as an ordered file. An example of an unsorted sequential file is shown in Figure 2.1. This shows the data in the order that it was added to the file. Storing a new data item in an unsorted sequential file is very simple, since each new data item is added to the end of the file. For example, a new data item '5' would be added to the file in Figure 2.1 after data item '116'.
Figure 2.1: Unsorted Sequential File Format
An example of a sorted sequential file is shown in Figure 2.2. Note that the order in which data is sorted can be either ascending or descending - the example in Figure 2.2 shows data sorted in ascending order, i.e. smallest data items first.
Figure 2.2: Sorted Sequential File Format
The main disadvantage of a sequential file structure is that a program must read through the records from the start of the file until it finds the record it wants. This is rather like reading a dictionary from the first page until you find the word you are looking for, and then reading on until you find a second word. Sequential structures may be useful, however, if all records are to be read in each pass, for example in processing payroll data.
Q1: Consider the sorted and unsorted sequential file structures described above. Which is most efficient (in this case we assume efficient to mean fastest) in terms of finding data? Why?
Q2: Consider the sorted and unsorted sequential file structures described above. Which is most efficient (in this case we assume efficient to mean fastest) in terms of inserting data? Why?
Indexed Sequential: Indexed sequential files are held in a similar way to sequential files, but in addition to the contents of the file, an index is created that records the position of a number of records in the file to assist in searches. To find a record in the file, the index is consulted to locate the indexed record nearest to the one required, and the file is then read from that point. Consider the example shown in Figure 2.3. This shows a simple index containing the values '50' and '100'. The index contains references that point to locations in the actual data file. So, if looking for a data item '25', the index indicates that the search should start from the very beginning of the file, since the data item sought is less than index value '50'. Similarly, if looking for data item '52', the index indicates that the value sought is larger than index item '50' but less than index item '100', so the search is set to start after data item '47' in the file. This format is helpful for systems that access records in a more random order, such as a stock system.
Figure 2.3: Indexed Sequential File Format
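The lookup procedure described for Figure 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contents of `data_file`, the index positions and the function names are hypothetical and are not taken from the figure. The sketch counts how many records are read, so that the indexed search can be compared with a plain sequential scan.

```python
from bisect import bisect_right

# Hypothetical sorted data file (illustrative values, not those of Figure 2.3).
data_file = [3, 12, 25, 47, 50, 52, 78, 99, 100, 116]

# Sparse index: (key value, position of that key in data_file).
index = [(50, 4), (100, 8)]


def indexed_search(key):
    """Return (position, records_read); position is None if key is absent."""
    index_keys = [k for k, _ in index]
    # Find the last index entry whose key is <= the key sought.
    slot = bisect_right(index_keys, key)
    start = 0 if slot == 0 else index[slot - 1][1]
    reads = 0
    for pos in range(start, len(data_file)):
        reads += 1
        if data_file[pos] == key:
            return pos, reads
        if data_file[pos] > key:  # file is sorted, so the key cannot appear later
            break
    return None, reads


def sequential_search(key):
    """Scan from the start of the file, as a plain sequential file requires."""
    reads = 0
    for pos, value in enumerate(data_file):
        reads += 1
        if value == key:
            return pos, reads
    return None, reads
```

With this sample data, looking for '52' through the index starts reading at the record for '50', whereas the sequential scan reads every record from the start of the file.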

A further example of an indexed sequential file is shown in Figure 2.4. This example shows a typical application of an indexed sequential file system, with the index providing alphabetical access to the data file items. In this situation, we assume that the data sought is alphabetical in nature, for example in a telephone directory where data is arranged by company or personal name. The index is used to locate the area of the file to begin searching. This makes it simple to enter the file at the correct point for names beginning with the letter 'R' without having to scan through the previous items as would be the case in a simple sequential file.
Figure 2.4: Indexed Sequential File Format
The sequential and indexed sequential file structures are two of the common file structures that you should be aware of. There are, however, many more variants on these simple structures. It is recommended at this point that you consult the course text references given at the start of this subtopic and find out about other file structures. In particular you are directed to find out about hash files. Hashing is a very common technique that is used to support fast access to data. The technique relies on a hash key and a hash function that computes the location (address) of data items in the file. It is important that you are aware of hashing, as this access method applies to databases as well as non-database file structures.
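As a starting point for that reading, the idea of a hash key and hash function can be sketched as follows. This is a minimal illustration, not the scheme used by any particular DBMS: the number of buckets, the modulo hash function and the names `insert` and `lookup` are all assumptions made for the example, and in-memory lists stand in for blocks of a file.

```python
NUM_BUCKETS = 7  # assumed number of storage locations (buckets)

# Each bucket stands in for one block of the file.
buckets = [[] for _ in range(NUM_BUCKETS)]


def hash_function(key):
    """Compute the bucket (address) for an integer hash key."""
    return key % NUM_BUCKETS


def insert(key, record):
    """Store the record in the bucket chosen by the hash function."""
    buckets[hash_function(key)].append((key, record))


def lookup(key):
    """Go straight to the computed bucket and search only that bucket."""
    for stored_key, record in buckets[hash_function(key)]:
        if stored_key == key:
            return record
    return None


insert(116, "record for 116")
insert(5, "record for 5")
insert(12, "record for 12")  # 5 and 12 hash to the same bucket (a collision)
```

Two keys that hash to the same address, such as 5 and 12 here, are simply kept in the same bucket; the course texts describe other ways of handling such collisions.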
2.1.2.2 An Example File-based Information System Architecture

A typical file-based information system architecture is shown in Figure 2.5. This figure shows a number of application programs and the data that they access via files.
Figure 2.5: File Based Architecture of Library Information System
This system shows a number of functions supported by a library service - Make Reservation, Make Loan and Overdue List. These functions are shown to access a number of different database files in carrying out the required task. The database files shown hold different sets of data - there are different database files to hold data concerning reservations, loans, books and borrowers.
2.1.2.3 Problems with File-based Systems

File-based systems work well for small or single-user collections of data and are still widely used. However, there are a number of problems that occur in such systems and these are summarised here:
Changes in the format of the data force changes in the programs that act on the data - this is termed data dependence. For example, adding a postcode to borrowers' records means that the programs using it (Overdue List and Make Loan in Figure 2.5) must also be changed.
It is hard to find a single file organisation that will suit all existing (and future) users of the data. For example, should the loans information be arranged by borrower or book? This often leads to a duplication of data to satisfy different users and different applications - once duplicate data is present this leads to the possibility of data inconsistency if one instance of the data is updated and others are not. For example, it is common that companies have separate Personnel and Payroll databases. A person's entry in the personnel database may indicate a promotion, resulting in a salary increase. This will require a separate, explicit update in the payroll database, otherwise the member of staff will not receive the correct pay.
The data is inaccessible to non-programmers, unless they use a program that is written for them.
There is no global description of the data available outside of the programs that use it.
There is duplication of functionality across the programs that manipulate the files, as each of them needs to reimplement:
Concurrency Control: otherwise simultaneous reading and writing of files can corrupt data. For example, in the library example shown in Figure 2.5, checks on the availability of a particular book may be incorrect if two users are trying to take the same book out at the same time.
Recovery: otherwise computer crashes during updates are likely to corrupt data, for example if an application responsible for transferring money between bank accounts crashes after removing money from one account, but before it has had a chance to credit the other account.
Integrity: otherwise data may become inconsistent, for example ensuring that a borrower can only reserve or borrow three books at a time.
Security: otherwise unauthorised people may change the data. For example, the librarian's identity should be verified by the system to avoid allowing any user to enter the system and change data.
2.1.3 Database Concepts

Learning Objective Know elementary database concepts.
This introduces some key concepts and terminology relating to databases, and covers: Definitions; Database Information System Architecture; Data Models; Schemata and Instances; Database Views; Data Independence; Database Languages and Interfaces.
2.1.3.1 Database Management System (DBMS)

Database Management System (DBMS): A Database Management System is a general purpose software management system that controls shared access to a database, and provides mechanisms that help to ensure the security and integrity of the stored data.
The basic idea behind a DBMS is to solve the data management problems outlined in section 2.1.2.3 and re-use the solution for many collections of data.
Figure 2.6: Database Information System Architecture
In a database information system a DBMS is interposed between the programs and the data they access.
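To make the idea of re-using the DBMS's solution concrete, the sketch below revisits the bank transfer example from section 2.1.2.3 using Python's built-in `sqlite3` module. It is an illustration only: the `account` table, its columns, the balances and the `crash` flag that simulates a failure are all assumptions made for the example. The application does not implement recovery or integrity checking itself; it relies on the transaction and constraint mechanisms that the DBMS provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint is an integrity rule enforced by the DBMS.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount, crash=False):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
            if crash:  # simulate a failure after the debit but before the credit
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
        return True
    except (RuntimeError, sqlite3.IntegrityError):
        return False


def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

In this sketch a transfer that fails part-way through, or that would leave an account with a negative balance, leaves both balances exactly as they were before the transfer started.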
2.1.3.2 Data Models

Data models are key to the design of database systems. There are a number of different levels of data model that we need to consider:
1. Conceptual Data Model: a high-level model of the data, describing what is stored. The conceptual data model is often presented in the form of a written document, which describes the data using high-level concepts such as objects and relationships. Examples of conceptual models are class models, using standard notation such as the Unified Modelling Language (UML), and Entity-Relationship models.
2. Implementation Model: an intermediate level describing how the data is logically organised. This is often presented as a description of the interface to the DBMS, and describes the data using medium-level concepts such as relations or sets. Examples of implementation (logical) models are relational models and network models.
3. Physical Data Model: very low-level compared to the conceptual and implementation models, and describes how the data is physically stored. This is often expressed in the form of commands in a control language, using concepts like record formats or index files.
The implementation model can be defined in the statements that create the database, for example SQL statements such as CREATE TABLE or CREATE INDEX.
2.1.3.3 Schemata and Instances

Schemata and instances are key concepts in databases. A schema is a description of a database in some data model. If we compare the database to a programming language, then the schema can be compared to the datatypes represented by the programming language. Consider the example schema in Figure 2.7. This example shows the schema for data related to a student.
Figure 2.7: Example Schema for Student
An instance is a collection of data that corresponds to a schema. Again, using the comparison with the programming language, the instance corresponds to the variables in the programming language. The relationship between a schema and its instances is analogous to the relationship between a datatype and the values of variables of that type in a program, e.g. 3 is of type int. An instance of the schema in Figure 2.7 is provided in Figure 2.8.
Figure 2.8: Example Instance Supported by the Student Schema
Q3: Think about the description of the database instance and schema described above. How often do you think that the instance will change compared to the schema?
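The distinction between schema and instance can also be seen in a running database. The sketch below uses Python's built-in `sqlite3` module; the `student` table, its column names and the sample rows are hypothetical and are not the contents of Figures 2.7 and 2.8. The helper `schema` reads the table description held by the DBMS, while `instance` reads the data currently stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: a description of the data (hypothetical columns for a student).
conn.execute(
    "CREATE TABLE student ("
    " matric_no TEXT PRIMARY KEY,"
    " name TEXT,"
    " year_registered INTEGER)"
)


def schema(conn):
    """Return (column name, declared type) pairs describing the table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance(conn):
    """Return the rows currently stored, i.e. the current instance."""
    return conn.execute("SELECT * FROM student ORDER BY matric_no").fetchall()


schema_at_creation = schema(conn)

# Instance: the data itself (made-up rows), which must conform to the schema.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("s01", "Student One", 2002), ("s02", "Student Two", 2003)],
)
```

Calling `schema(conn)` and `instance(conn)` before and after further inserts is one way to explore Q3 for yourself.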
2.1.3.4 Database Views

Not all users should have, or need to have, access to all data held within the database. This can be controlled at the schema level. A database view is a type of schema that sits a level above the actual schema and controls what the user can see. Such views are often referred to as external schemas. There are two main reasons for using views: restricting access and generating data.
Restricting access: this may be important, since you do not want all users to be able to access all data. For example, consider a simple database which a bank may be using to hold details of customer accounts. The bank database will have a number of different levels of user. The following may be an appropriate set of restrictions to apply to a teller user of the bank's system:
Operations: it may be appropriate to prevent a teller user allocating or updating a customer's overdraft limit.
Fields: it may be appropriate to prevent the teller user from accessing the customer's overdraft status.
Individual Records: it may be appropriate that only the bank's credit controller is able to see records for customers who have a negative balance on their account.
Groups of Records: it is reasonable to stop the teller user from accessing employee records.
Generating data: this is a common technique in database systems. It is not always necessary to store items of data that can be generated from other items of data in the database. For example, the age of a person can be calculated from their date of birth - it is not therefore important to store the age as a separate field. A database view can be used to control access to such generated data items, so that as far as the user is concerned the item of data is actually stored in the database.
Three-level Database Architecture: Databases are considered to have three levels: physical, conceptual and external. Figure 2.9 illustrates this three-level architecture.
The internal level (or physical level) describes the physical storage structure of the database. The internal level therefore has an internal schema which defines the storage of data. The internal schema uses a physical data model which shows how data is organised on the machine.
The conceptual level has a conceptual schema, which describes the structure of the entire database for the community of users. This schema hides the details of the physical data model and concentrates on the description of entities, data types, relations, user operations and constraints.
The external level (or view level) includes a number of external schemas or user views. Each of these views or external schemas describes a part of the database of interest to a particular group of users. This allows the users to see only those parts of the database that are relevant to them.
Figure 2.9: Three-Level Architecture
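The two uses of views described in this section, restricting access and generating data, can be sketched with Python's built-in `sqlite3` module. Everything named here is an assumption made for the example: the `customer` table, its columns, the sample rows and the view name `teller_customer`. For simplicity the derived age is approximated as the current year minus the year of birth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table, part of the conceptual schema.
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER,"
    " balance INTEGER, overdraft_limit INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann", 1980, 250, 500), (2, "Bob", 1995, -40, 100)],
)

# External schema for a teller:
#  - the overdraft_limit field is not exposed (restricting fields),
#  - customers with a negative balance are left out (restricting records),
#  - age_this_year is generated from birth_year rather than stored.
conn.execute(
    "CREATE VIEW teller_customer AS"
    " SELECT id, name, balance,"
    "        CAST(strftime('%Y', 'now') AS INTEGER) - birth_year AS age_this_year"
    " FROM customer"
    " WHERE balance >= 0"
)

cursor = conn.execute("SELECT * FROM teller_customer ORDER BY id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

A user who queries `teller_customer` sees `age_this_year` as if it were a stored column, and has no way of telling from the view alone that the overdraft limit or the overdrawn customer exist.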
2.1.3.5 Data Independence

Data independence is an important concept in database design. There are two kinds of data independence that we need to consider:
1. Logical Data Independence: the conceptual schema can be altered without having to modify the views or applications that access the database. For example, an existing application may access customer records in a database. If an additional attribute is added to the customer schema, for example a reference indicating passport number, then only applications or views that need to access the new data item need to be modified.
2. Physical Data Independence: the physical schema can be changed without having to change the conceptual schema. An example of such a change could be the addition of a new index for accessing customer addresses. This does not affect the conceptual schema.
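Both kinds of data independence can be demonstrated on a small scale with Python's built-in `sqlite3` module. The sketch below is illustrative only: the `customer` table, its columns, the index name and the function `existing_application` are hypothetical. The function plays the role of an existing application that names only the attributes it needs; it is run before and after the schema changes described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ann', '1 High Street')")


def existing_application(conn):
    """An existing application that reads only the attributes it needs."""
    return conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()


result_before = existing_application(conn)

# Logical change: a new attribute is added to the customer schema.
conn.execute("ALTER TABLE customer ADD COLUMN passport_no TEXT")

# Physical change: a new index is added for access by address.
conn.execute("CREATE INDEX idx_customer_address ON customer (address)")

result_after = existing_application(conn)

# Column names now present in the table, for inspection.
columns_after = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

The application code is untouched by either change and returns the same result; only a program that needs `passport_no` would have to be modified.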
2.1.3.6 Database Languages and Interfaces

There are three main types of language that we need to be aware of. Each language serves a particular purpose. The languages are:
Query Language: The query language is used to extract data from the database. The following is an example of a query statement which selects the attributes CName and Year from some COURSE table. You will see XML query languages later in the course.
SELECT CName, Year FROM Course

Data Manipulation Language (DML): The DML is used to modify the data, by inserting, deleting or updating items. The following is an example DML statement which deletes entries from the COURSE table where the CourseNo attribute matches the value 'C3'.
DELETE FROM Course WHERE CourseNo = 'C3'
Data Definition Language (DDL): The DDL is used to define the structure of the data to be stored. It is responsible for the creation of tables, indexes, views etc. The following is an example DDL statement which creates a table called COURSE with four attributes which represent a course reference number, course name, department name and year the course is offered. You will see XML document definition language(s) later in the course.
CREATE TABLE Course (CourseNo CHAR(5), CName CHAR(20), DName CHAR(22), Year NUMBER);
In order to use the above three languages, the user needs some sort of interface in order to access the database. The interface will be dependent on the DBMS product. The majority of databases these days provide some sort of GUI to try and help the user; for example the Oracle database vendor provides a Navigator tool.
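Besides a GUI, a program can act as the interface to the DBMS. The sketch below uses Python's built-in `sqlite3` module to issue the three kinds of statement shown in this section against an in-memory SQLite database. The two course rows are made up for the example, and SQLite is used only because it needs no installation; other DBMS products provide their own programming interfaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the COURSE table.
conn.execute(
    "CREATE TABLE Course ("
    " CourseNo CHAR(5), CName CHAR(20), DName CHAR(22), Year NUMBER)"
)

# DML: insert two hypothetical courses.
conn.executemany(
    "INSERT INTO Course VALUES (?, ?, ?, ?)",
    [("C1", "Databases", "Computing", 2), ("C3", "Networks", "Computing", 3)],
)

# DML: delete the course whose CourseNo is 'C3'.
conn.execute("DELETE FROM Course WHERE CourseNo = 'C3'")

# Query: select the CName and Year attributes of the remaining courses.
remaining = conn.execute("SELECT CName, Year FROM Course").fetchall()
```

After the DELETE statement only the 'C1' course is left, so the query returns a single row.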
2.1.4 Second Generation Databases

Learning Objective
Understand the capabilities of second generation DBMS and the direction of future DBMS developments.
The second generation DBMS were introduced in the 1970s and 1980s. This was a major step on from the first generation DBMS, notably by introducing an approach echoed by a number of vendors, which began to introduce an element of consistency between different providers of database technology. The second generation of DBMS is characterised by the introduction of the relational model. This model addressed the majority of the problems identified in preceding generations:
Data Independence: Support for data independence is provided.
Interfaces: High-level interfaces started to appear, making the data more readily accessible to non-programmers. Such interfaces were implemented by interactive SQL, data-entry forms and support for querying by form.
The relational model was proposed by Ted Codd in 1970. Efficient implementations became available from the mid 1980s. Subsequently, the Relational DBMS (RDBMS) has become the dominant type of DBMS. You may be interested to check out the following seminal publication: Codd, E.F., A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Vol 13, No 6, 1970.
A range of RDBMS are available, and some important ones are as follows:
Oracle: produced by The Oracle Corporation.
DB2: produced by IBM.
MySQL: open source.
Access: produced by Microsoft.
SQL Server: produced by Microsoft.
Why not try a search on the Internet to find out more details of these companies and their RDBMS products? All of the above RDBMS make use of a common declarative language called SQL. You will learn more about SQL in Topics 3 and 4 of this module.
2.1.5 Future Database Development

Database technology is still evolving. This topic considers what will direct the future development of DBMS. Important new data models are being considered in addition to the relational one, e.g. object-oriented, object-relational and knowledge-based. DBMS make use of a number of related areas of technology, which are themselves developing. As a result the DBMS can then take advantage of these developments:
Hardware platforms: for example, using multiprocessors.
Communications networks: for example, using Java bindings for access via the Internet.
User Interface: The user interface technology is developing rapidly; for example, allowing the use of multi-media, speech and hands-free access.
The main drive for DBMS development comes from the demands of new database applications, for example data warehousing and data mining applications which have particular requirements. Examples of future database developments are considered in Topic 7 in this module.
2.2 The Entity Relationship Model

The entity relationship model is a well-established approach to the design of database systems. The focus of the technique is on the data represented (as a number of entities in which we have an interest), and the relationships between the data. The relevant sections of the recommended course texts are as follows: Elmasri and Navathe, Chapter 3; C J Date, Chapter 22.
The following subtopics are covered here: Approaches to Modelling; Establishing Data Requirements; Components of the Entity Relationship Model; Constructing an ER Model from Data Requirements Description.
2.2.1 Approaches to Modelling

Learning Objective To understand the Database development model. Topic 1 introduced the system life cycle. In this subtopic we consider the similarities with approaches to data modelling. Figure 2.10 provides a graphical representation of a life cycle concerning software development. This is similar to that introduced in Topic 1 (refer back to it now if you need to remind yourself).
Figure 2.10: Illustration of Software Development Model
Database development follows a similar life cycle, as depicted in Figure 2.11.
Figure 2.11: Illustration of Database Development Model
You are directed to read more about this yourself in the recommended course text, Elmasri and Navathe. If you do not have access to this text, then consult another database text on this subject.
2.2.2 Establishing Data Requirements

The starting point is to establish data requirements for the database. This process is similar to establishing the requirements of any other Information System and focuses on the data that the database will be required to store. This involves establishing the requirement for holding that data, i.e. asking not just what data is needed but why, to help establish non-obvious data. Let us consider an example information system which looks at the data requirements of a university. The university has an overall requirement to store data which is central to its key business functions. This concerns holding data on staff, students and courses. This is insufficient information on which to build a system, so it is important to consider an approach that we can use to express the detail of the requirements. The following introduces a case study based on a University system. This example will be used throughout this and following database topics. The case study is introduced by providing an initial text-based description of the data requirements of the university. This will be used later in this topic as we consider a more rigorous notation than text alone, namely ER modelling.
2.2.2.1 Description of Data Requirements for University

Consider the following text description, which represents an example of the data requirements for a university.

A university has a requirement to maintain details of staff, students, the courses available and the performance of students on courses.

Information about each student is initially recorded at registration, and includes the student's matriculation number, name and year of registration. A student may or may not enrol on courses at registration.

Information recorded for each staff member includes the staff number and name. Each staff member may or may not act as a counsellor to one or more students, and may or may not act as a tutor on one or more courses.

A student has one counsellor, and has a tutor for each course on which the student is enrolled. A student is allocated a counsellor at registration and must always have a counsellor. A student may or may not have a tutor for a course on which they are enrolled.

Each course has an identifying code, a title and a credit value. There may be a limit to the number of students who can be registered on a course - this is referred to as the course quota. A course may have no students. Students may not enrol for more than 100 credit points worth of courses at a time.

Courses have assignments, and the grade of a student for an assignment is recorded as a percentage.
Using plain text to capture requirements has advantages as well as disadvantages. Consider what some of these may be from your reading of the above text:
Q4: What might some of the advantages be?
Q5: What might some of the disadvantages be?
2.2.3 Components of the Entity Relationship Model

Learning Objective: To understand the components of the ER data model.

The entity relationship model is a conceptual data model. This conceptual model is represented through three key components: entity types, attributes and relationships. These components are introduced and explained in the following subsections:

Entities, Entity Types and Attributes
Example Entity Types
Relationships
The Degree of a Relationship
Participation Conditions
Constraints and Assumptions
Modelling Multi-Line Data
2.2.3.1 Entities, Entity Types and Attributes

An entity represents a thing about which data is recorded. An entity may represent a tangible object, such as a student (John Perkins) or a vehicle (Y354 LMN). It can also represent an intangible object, such as an enrolment or an order.

An entity type defines the properties of a collection of entities. For example, a Student with Name, Matriculation Number and Date of Registration.

An attribute is a component of an entity type that represents a single property of entities of that type. The attributes of the Student entity type are Name, Matriculation Number and Date of Registration.

One or more attributes may be chosen to be the identifier of an entity. The identifier is the attribute (or combination of attributes) that distinguishes one entity from another entity of the same type. For example, the matriculation number would be a suitable identifier for a student entity.

It is advantageous to introduce a notation that allows us to represent the developing ER model graphically - such a notation allows us to represent an ER model as an ER diagram. This will be discussed later.
2.2.3.2 Example Entity Types

This introduction of example entity types uses the data requirements of the university system mentioned previously. The ER model allows the entity types to be represented graphically. Examples of both Staff and Student entities are illustrated in Figures 2.12 and 2.13.
Figure 2.12: Graphical Illustration of Student Entity Type
Figure 2.13: Graphical Illustration of Staff Entity Type
The entity type is represented as a simple rectangle, with the name of the entity type placed inside the rectangle. For clarity, it is a helpful convention to keep the name of the entity type the same as the name used in the database schema - including spelling and capitalisation. The entity types can also be written textually with all of the attributes shown. An illustration of the Student and Staff entity types is given in Figure 2.14.

Student(MatricNo, Name, Registered)
Staff(StaffNo, Name)

Figure 2.14: Student and Staff Entity Types

The identifier of each entity type is shown underlined (MatricNo for Student and StaffNo for Staff) - the identifier is the attribute that will be used to distinguish one entity instance from another. In our example, MatricNo is the attribute that distinguishes one student from another. By convention, the identifier is placed at the start of the attribute list in the schema. However, later examples will show that the identifier can comprise more than one attribute. The underlining is therefore used for clarity.
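As a rough illustration of the textual notation, the two entity types could be sketched in Python as follows. The class and attribute names mirror Figure 2.14; the use of frozen dataclasses, the `IDENTIFIER` marker and the `identifier_of` helper are assumptions made for this sketch and are not part of the ER notation. Storing `Registered` as a year, and the sample values other than the name John Perkins, are also made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    # Attributes follow Student(MatricNo, Name, Registered).
    MatricNo: str
    Name: str
    Registered: int  # assumed to hold the year of registration
    # Un-annotated, so this is a class-level marker rather than an attribute.
    IDENTIFIER = ("MatricNo",)


@dataclass(frozen=True)
class Staff:
    # Attributes follow Staff(StaffNo, Name).
    StaffNo: str
    Name: str
    IDENTIFIER = ("StaffNo",)


def identifier_of(entity):
    """Return the identifier value(s) of an entity as a tuple."""
    return tuple(getattr(entity, name) for name in entity.IDENTIFIER)


# Two entities of the same type with the same name are still distinct,
# because their identifiers differ.
john = Student("S01", "John Perkins", 2001)
namesake = Student("S02", "John Perkins", 2001)
```

Returning the identifier as a tuple means the same helper would still work for an entity type whose identifier comprises more than one attribute.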
2.2.3.3 Relationships
A relationship is an association between entities that needs to be recorded. The relationship is key to the way the data is to be interpreted and used by the database. An example of a relationship is shown in Figure 2.15, which shows one relationship between the Student and Staff entities. The particular relationship concerns the counselling relationship between these entity types. The relationship is initially illustrated by drawing a connecting line between the two entity types affected by the relationship. The relationship is itself then named for clarity.
Figure 2.15: counsels Relationship between Staff and Student Entity

A relationship may exist between different entities of the same type. For example, Figure 2.16 shows a relationship between instances of the Person entity. The relationship is married to, which captures the fact that one instance of the Person entity may be married to another instance of the Person entity.
Figure 2.16: married to Relationship between instances of Person Entity

A relationship occurrence is a specific set of associations that exist at a given time. These can be captured in an occurrence diagram. An example occurrence diagram is shown in Figure 2.17.
[Occurrence diagram: Student identifiers S01, S02, S05, S07, S09 and S10 linked by lines to Staff identifiers 7774, 6635, 3158 and 5324]
Figure 2.17: Occurrence Diagram

Figure 2.17 depicts the counsels relationship between Student and Staff. The occurrence diagram shows the associations between the entity identifiers. So, for example, the diagram shows that Staff number 7774 has a counsels relationship with Students S01, S02 and S05. This diagram further illustrates that certain Staff need not be associated with Students via this relationship; for example, Staff number 6635 is not associated with any Student occurrences.
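One possible way to hold relationship occurrences in a program is as a set of identifier pairs. The sketch below records only the associations that the text states for Figure 2.17; the remaining lines of the diagram are not reproduced here. The names `counsels`, `staff` and `students_counselled_by` are chosen for this illustration.

```python
# Each pair is (StaffNo, MatricNo): one line in the occurrence diagram.
counsels = {
    ("7774", "S01"),
    ("7774", "S02"),
    ("7774", "S05"),
}

# Staff entities exist independently of the relationship occurrences.
staff = {"7774", "6635"}


def students_counselled_by(staff_no):
    """Return the sorted matriculation numbers linked to one staff member."""
    return sorted(matric for s, matric in counsels if s == staff_no)


for staff_no in sorted(staff):
    print(staff_no, students_counselled_by(staff_no))
```

Keeping the entities and the occurrences in separate collections makes it possible to represent a staff member, such as 6635, who takes no part in the relationship.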
2.2.3.4 The Degree of a Relationship

The degree of a relationship governs the maximum number of entities that participate in the relationship. There are three alternatives to consider:
1:1: This is referred to as a one to one relationship and states that an entity may be associated with at most one other entity in the given relationship. An example in Figure 2.18 shows that a member of Staff may be head of one Department. Likewise, each Department has just one head.

Figure 2.18: 1:1 Degree of Relationship

Figure 2.16 also represents a 1:1 relationship. The married to relationship between instances of the Person entity type represents a monogamous marriage, where we restrict a person to being married to at most one other person.

1:N: This is referred to as a one to many relationship and states that one entity may relate to more than one other entity. The example in Figure 2.19 shows that a Person may own one or more Car entities. The many aspect of the relationship applies to the Car entity. The one aspect of the relationship belongs to the Person entity. So, a Person may own one or more cars. A Car, however, can be owned by at most one Person.

Figure 2.19: 1:N Degree of Relationship (Person owns Car)

M:N: This is referred to as a many to many relationship. An example is shown in Figure 2.20: given a Course entity and a Location entity, a Course may be offered at one or more locations. Similarly, a Location can offer one or more courses.
Figure 2.20: M:N Degree of Relationship
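The three alternatives can be illustrated by inspecting a set of relationship occurrences. The sketch below is only a consistency check under stated assumptions: the degree is a rule about what the data is allowed to contain, so a sample of occurrences can show the smallest degree that would permit them, but it cannot prove the rule. The function name `observed_degree` and all sample data are hypothetical.

```python
from collections import Counter


def observed_degree(pairs):
    """Classify (left, right) occurrence pairs as 1:1, 1:N, N:1 or M:N."""
    pairs = set(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    # Does some left entity link to several right entities, and vice versa?
    left_has_many = any(count > 1 for count in left_counts.values())
    right_has_many = any(count > 1 for count in right_counts.values())
    if left_has_many and right_has_many:
        return "M:N"
    if left_has_many:
        return "1:N"
    if right_has_many:
        return "N:1"
    return "1:1"


# Hypothetical occurrences for the three example relationships.
head_of = {("staff1", "dept1"), ("staff2", "dept2")}
owns = {("person1", "car1"), ("person1", "car2"), ("person2", "car3")}
offered_at = {("course1", "loc1"), ("course1", "loc2"), ("course2", "loc1")}

print(observed_degree(head_of))     # 1:1
print(observed_degree(owns))        # 1:N
print(observed_degree(offered_at))  # M:N
```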
2.2.3.5 Participation Conditions

Participation conditions govern the minimum number of entities that participate in a relationship. In this simple ER model there are just two alternatives:

mandatory: every entity must participate in the relationship. For example, in recording details of car ownership we may want to stipulate that each car must have an owner. This is represented in the notation by adding a filled-in dot on the relationship line. The dot is placed adjacent to the entity which is mandatory. An example is shown in Figure 2.21.
Figure 2.21: Mandatory Participation in Relationship
optional: an entity may or may not participate in the relationship. This is represented in the notation by an unfilled dot on the relationship line. The dot is placed adjacent to the entity which is optional in the relationship. For example, returning to the relationship between Person and Car, we may further stipulate that a Person entity may exist without having an associated Car. This is shown in Figure 2.22.
Figure 2.22: Optional Participation in Relationship
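A participation condition can be checked against stored data by looking for entities that take no part in the relationship. The following is a minimal sketch, assuming the occurrences are held as (Person, Car) pairs; the function name `missing_participants` and the sample data are hypothetical.

```python
def missing_participants(entities, pairs, position):
    """Return the entities that do not appear at `position` in any pair."""
    linked = {pair[position] for pair in pairs}
    return set(entities) - linked


# Hypothetical data for the owns relationship between Person and Car.
people = {"person1", "person2", "person3"}
cars = {"car1", "car2"}
owns = {("person1", "car1"), ("person1", "car2")}

# Car is mandatory in owns: every car must have an owner, so this must be empty.
print(missing_participants(cars, owns, 1))

# Person is optional in owns: people without a car are allowed to exist.
print(missing_participants(people, owns, 0))
```

For a mandatory entity a non-empty result signals a violation; for an optional entity a non-empty result is perfectly acceptable.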
2.2.3.6 Constraints and Assumptions

Typically, not all the data requirements can be captured using entities, attributes and relationships. Such items are simply recorded as constraints, i.e. assertions about the data. For example, referring to the university example, a student may not enrol for more than 100 credit points' worth of courses at a time.

Typically, in order to construct an ER model, we must also make assumptions about the data that are not stated in the data requirements. For example, it is reasonable to assume that we allow a Student to enrol on a Course at most once - this is not something to be repeated at each registration.
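Because such a constraint is not visible in the ER diagram, it has to be enforced elsewhere, for example by a check made before an enrolment is recorded. The sketch below shows one way the 100 credit point limit might be tested; the function name `can_enrol` and the credit values are hypothetical.

```python
CREDIT_LIMIT = 100  # limit stated in the university data requirements


def can_enrol(enrolled_credits, new_course_credit, limit=CREDIT_LIMIT):
    """Return True if adding the new course keeps the total within the limit."""
    return sum(enrolled_credits) + new_course_credit <= limit


# A student currently enrolled on three 30-credit courses (made-up values).
current = [30, 30, 30]
print(can_enrol(current, 10))  # True: total would be exactly 100
print(can_enrol(current, 20))  # False: total would be 110
```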
2.2.3.7 Modelling Multi-Line Data

Often we may find that we have identified a header entity. Such an entity has multiple lines of associated data that all relate to the single header entity.

Consider the example of a company that specialises in mail order of music CDs. Customers place orders for one or more CDs. The order date, delivery date and delivery address are recorded for each order. A single order can be for multiple CDs, and for each CD we may want to record the CD number, title, price and quantity ordered.

The following questions present a pen and paper exercise based on the above description of the CD shop.

Q6: List the entity types for the CD shop. To do so you must identify the attributes of each entity, and an identifier for each entity type.
Q7: Draw an initial graphical ER model that represents the CD shop based on the above information. Remember to state clearly any assumptions that you make.
2.2.4 Constructing an ER Model from Data Requirements Description

Learning Objective: To be able to construct an ER model from textual data requirements.

There is no guaranteed approach to producing an ER model from a text-based description of the data requirements. There are, however, a number of general guidelines which can be treated as a recipe. This recipe is as follows:

1. Identify Potential Entities: This involves scanning the text and picking out all those items that may suggest themselves as potential entities. HINT: often tangible things, such as students, are entities.
2. Identify Attributes: Given the list of possible entities identified in step 1 above, scan the text and pick out all the possible attributes that belong to the candidate entities.
3. Choose Identifiers: Having identified a list of candidate attributes, examine those attributes and choose one (or a combination) that seems a suitable identifier for each entity type.
4. Draw Initial ER Diagram: Given the information from the above steps, draw out an initial ER diagram - representing the entity types as rectangles.
5. Add Relationship Information: Referring to the text description, draw relationships between the entities on the draft ER diagram from step 4 above.
6. Add Degree Information: From the text information, identify any additional information concerning the degree of relationships between entities. Add this information to the draft ER diagram.
7. Add Participation Information: From the text information, identify any additional information concerning the participation of entities in the relationships identified. Add this information to the developing ER diagram.
8. Redraw the ER Diagram: At this stage, take the time to carefully redraw the ER diagram neatly, reviewing each entity, attribute and relationship as you do so.

It is important to remember that developing an ER model is an iterative process. It is unlikely that you will obtain an ideal ER model in a single pass through the process described in steps 1 through 8 above. As the model develops you may find it appropriate to make assumptions, which would subsequently need to be clarified with the user or another expert in the relevant business domain.

The following questions guide you through the above recipe steps in order to develop an ER model of the university. The university data requirements were introduced earlier in this topic - refer back to the Description of Data Requirements for University in Section 2.2.2.1 for further details.

Q9: Identify potential entities for the University: You already have the Staff and Student entities to start with - these were introduced earlier. What others can you identify from the university data requirements?
Q10: Identify attributes: Sample attributes have already been provided for the Staff and Student entities. What about attributes for the additional entities you have identified?
Q11: Choose identifiers: Choose identifiers for the entities from the attribute list provided in the previous step.
Q12: Draw an initial ER Diagram: Draw an initial ER diagram to represent the entities as rectangles.
Q13: Add relationship information: Now add relationship information to your ER diagram developed in the previous step.
Q14: Add degree information: Now think about the degree information that is relevant to the model you are developing. Add the degree annotations to your ER diagram.
Q15: Add participation information: Think about which entities have an optional or mandatory role in a relationship. Add the required participation notations to your ER diagram.
Finally, you may like to redraw your ER diagram at this point to provide a neat model.
2.3 Entity Relationship Modelling Tutorial

Learning Objective: To gain experience in developing ER models.

This section provides a substantial tutorial to be completed at the end of this topic. The tutorial provides an opportunity for you to develop and test the skills and knowledge from this topic. The tutorial includes two separate activities.

ER Model of Geographical Features: provides practice in developing a general ER model of geographical features, considering rivers and the cities they flow through.
ER Model of Sir Edward Kelly Case Study: provides you with an opportunity to examine the Sir Edward Kelly case study introduced in Topic 1 of this module.
ER Model of Geographical Features

Read through the details of the data requirements presented in this activity. This activity is a paper exercise during which you will develop an ER model based on the data requirements. A database is required to record data about geographical features in a number of countries. The data relates to rivers, lakes, mountains and cities. It is assumed that each such feature can be uniquely identified by a name, except for cities, where a name is only unique within the country to which it belongs.
For each country, identified by its name, data is required for its area, population and capital city.
For each mountain, record the height, and the country in which it lies.
For each river, record its total length, and its length in each country through which it flows.
For each lake, record its area, maximum depth and the proportion of the lake owned by each country on its shoreline.
For each city, record its name, population, the country to which it belongs, and the river (just one) which may flow through it.

The following questions guide you through developing an ER model of the geographical features database. The questions allow you to build up the model incrementally if you so choose, by first checking if you have identified all the entities and their attributes, then establishing all relationships between entities. Finally, the third question in the following question set leads you to develop the complete ER model, including the participation information, along with a summary of the constraints and assumptions.
Q16: As an initial step, list the entities to be represented and their attributes (we are not concerned with their datatype at this stage, so only a representative name is required). Remember to clearly mark the attributes that form the primary key for each entity.
Q17: Taking the entities identified in Q16, develop an ER model for the data requirements described above. Your model should include entity identifiers and the degree of relationships between the entities.
Q18: Finally, taking the ER model developed in Q17, now add participation constraints on each of the relationships. Also include all constraints and assumptions that you make.
An ER Model of Timber Handling in the Sir Edward Kelly Case Study

Read through the details of the data requirements presented in this activity. This activity is a paper exercise during which you will develop an ER model based on the data requirements found in the Sir Edward Kelly case study described in Topic 1 of this module.

The following questions guide you through developing an initial ER model of the timber handling part of the Sir Edward Kelly Case Study. It is important that you complete this activity before working on the assessed coursework for the database part of this module. The questions allow you to build up the model incrementally if you so choose, by first checking if you have identified all the entities and their attributes, then establishing all relationships between entities. Finally, the third question in the following question set leads you to develop the complete ER model, including the participation information, along with a summary of the constraints and assumptions.
Q19: As an initial step, write down a list of the entities and their associated attributes - remember to clearly mark the attributes that form the primary key for each entity.
Hint 1: You should focus on the example documents in the case study, and produce entity types to model them.
Hint 2: Much of the data is multi-line, so the techniques outlined in Section 2.2.3.7 should be applied.
Q20: Taking the entities identified above, develop an ER model for the data requirements described above. Your model should include entity identifiers and the degree of relationships between the entities.
Q21: Finally, taking the ER model developed in Q20, now add participation constraints on each of the relationships. Also include all constraints and assumptions that you make.
Glossary
database
A database is a coherent collection of related data.

Database Management System (DBMS)
A Database Management System is a general purpose software management system that controls shared access to a database, and provides mechanisms that help to ensure the security and integrity of the stored data.
2.4 Answers to questions and activities

2.4.1 Preliminaries

Q1: A sorted sequential file will tend to be more efficient (i.e. faster) than an unsorted sequential file for data retrieval because the data items are stored in a specific order. The order of the data file provides some assurance as to where we will find data. Consider the example of searching for a data item '46'. If we consider the sorted sequential file shown in Figure 2.2, we know that we can stop the search when we reach data item '47', as at that point we know that '46' cannot exist in the file. The same cannot be said of an unsorted file, as there is no guarantee of the order in which we will find data. Given the above example of searching for data item '46' in the file shown in Figure 2.1, we would have to search until the end of the file to establish whether data item '46' existed or not.

Q2: The unsorted sequential file structure is the most efficient (i.e. fastest) for inserting data because each new data item is simply added to the end of the file. If we assume we want to add a new data item '55' to a file, then it is simply added to the end of an unsorted sequential file structure - after data item '116' in Figure 2.1. The sorted sequential file structure is a little more complicated, as the data file must be searched in order to find the correct insertion point to maintain the order of the data. With regard to the sorted sequential file in Figure 2.2, this would mean searching past data items '16', '25', '47', '52' and '116' - then backtracking to insert the new data item before data item '116'.

Q3: The instance will change as the user requests that data be inserted, updated or deleted. Compared to the schema, this is a frequent change. The schema only changes when the user requires a change in the type of data recorded in the database. This is most commonly in situations where an additional data field is required - the data field would be added to the schema, then instances of that data item could be recorded in the database.
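Returning to the file searches described in the answers to Q1 and Q2, the difference can be sketched by counting comparisons. The sorted contents below are the data items named in the answer to Q2; the order of the unsorted file is an assumption, since the text only states that it ends with data item 116. The function names are chosen for this illustration.

```python
def search_unsorted(items, target):
    """Scan every item; return (found, number of comparisons made)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons


def search_sorted(items, target):
    """Scan in order, stopping early once an item exceeds the target."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
        if item > target:
            # Everything after this point is larger still, so give up.
            return False, comparisons
    return False, comparisons


sorted_file = [16, 25, 47, 52, 116]
unsorted_file = [47, 16, 52, 25, 116]  # assumed order, ending in 116

print(search_sorted(sorted_file, 46))      # (False, 3): stops at 47
print(search_unsorted(unsorted_file, 46))  # (False, 5): reads the whole file
```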
Q4: The key advantage of the text-based description is that it is easy for people to read. It requires no specialised knowledge to interpret the data requirements.

Q5: The disadvantage is that the relationships between data items and constraints on data are not entirely clear in the text description - it can be difficult to pick out key characteristics.
2.4.2 CD Shop ER Model

Q6: The entity types for the CD shop are as follows. Note the underlined identifiers.

Customer(CustNo, Title, Initial, Surname)
CD(CDNo, Title, Price)
Order(OrderNo, CustNo, OrderDate, DeliveryDate, DeliveryAddress)
OrderItem(OrderNo, CDNo, Quantity)
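To see how the header and line pattern might eventually be stored, the entity types above can be sketched as tables in an in-memory SQLite database. This is an illustration only, not part of the ER modelling exercise. The column types are assumptions, and because the underlining is not visible in this text, the choice of identifiers (CustNo, CDNo, OrderNo, and the combination of OrderNo and CDNo for OrderItem) is also an assumption. The function name `build_cd_shop` is hypothetical.

```python
import sqlite3

# "Order" is quoted because ORDER is a reserved word in SQL.
SCHEMA = """
CREATE TABLE Customer (
    CustNo INTEGER PRIMARY KEY, Title TEXT, Initial TEXT, Surname TEXT
);
CREATE TABLE CD (
    CDNo INTEGER PRIMARY KEY, Title TEXT, Price REAL
);
CREATE TABLE "Order" (
    OrderNo INTEGER PRIMARY KEY,
    CustNo INTEGER NOT NULL REFERENCES Customer(CustNo),
    OrderDate TEXT, DeliveryDate TEXT, DeliveryAddress TEXT
);
CREATE TABLE OrderItem (
    OrderNo INTEGER NOT NULL REFERENCES "Order"(OrderNo),
    CDNo INTEGER NOT NULL REFERENCES CD(CDNo),
    Quantity INTEGER,
    PRIMARY KEY (OrderNo, CDNo)  -- one line per CD within an order
);
"""


def build_cd_shop():
    """Create an empty in-memory database with the four CD shop tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The header entity (Order) holds the data recorded once per order, while each OrderItem row holds one line of the order. The two-attribute key means that the same CD cannot appear on two separate lines of one order.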
Q7: The following figure represents a possible ER model of the CD shop. The Customer entity is able to place one or more orders. An Order must be placed by one Customer, but a Customer is optional in the places relationship - this assumes we can hold details of customers who have not yet placed an Order. The Order is made up of one or more OrderItems. The Order and OrderItem are both mandatory in the contains relationship. Each OrderItem concerns a CD that the shop sells. Note that the CD is shown as optional in the ordered relationship with the OrderItem - you might like to consider what that means.

Figure 2.23: Example CD Shop ER Model

Q8: There is no Question 8
2.4.3 University ER Model

Q9: Tangible things that you might identify as entities are Course and Assignment. There are also intangible things that can be important entities. As far as the university ER model is concerned, this leads us to consider Enrolment as our final entity. The full list of entities is therefore as follows:

Student
Staff
Course
Assignment
Enrolment
Q10: The attributes for the suggested entities are listed below:
Student(MatricNo, Name, Registered)
Staff(StaffNo, Name)
Course(CourseCode, Title, Credit)
Enrolment(MatricNo, CourseCode)
Assignment(MatricNo, CourseCode, AssignmentNo, Grade)
Q11: Candidate identifiers are shown by underlining the attribute - remember that multiple attributes may be involved in forming the identifier.

Student(MatricNo, Name, Registered)
Staff(StaffNo, Name)
Course(CourseCode, Title, Credit)
Enrolment(MatricNo, CourseCode)
Assignment(MatricNo, CourseCode, AssignmentNo, Grade)
Q12: The following figure represents the entities for the University system - think carefully about the Enrolment entity. Did you identify it in your own list? You may have identified it as a relationship - but the details of the enrolment need to be stored and this is best achieved by having a separate entity that represents each enrolment.
Q13: Adding initial relationship information results in drawing lines between the entities and adding some text that describes the nature of the relationship.
Q14: Adding the degree information to the ER diagram results in the following clarification of relationships.
Q15: Adding the participation information to the ER diagram gives the final ER diagram.
2.4.4 ER Model of Geographical Features

Q16: The following is a list of entities and associated attributes. The names of entities and attributes are chosen from the problem domain, so it is simple in reading the following descriptions to relate the information back to the data requirements description. It is feasible that you will come up with different names from those suggested below - that is fine, as long as the names you have chosen are also readily identifiable with the problem description.
Country(CountryName, Area, Population)
City(CountryName, CityName, Population)
Mountain(MountainName, Height)
Lake(LakeName, Area, Depth)
River(RiverName, Length)
Stretch(CountryName, RiverName, Length)
Portion(CountryName, LakeName, Proportion)
Did you remember to underline the primary key attributes in your own entity list?
Q17: The ER diagram in Figure 2.25 shows the ER model capturing the entities, the relationships and the degree of each relationship.

Figure 2.25:

Q18: The figure below shows the ER model capturing additional information concerning participation in each relationship. This represents the complete ER model for this tutorial.
The constraints and assumptions concerning the model are summarised below.

Constraints:
The proportions of a lake owned by its countries, expressed as fractions, must add up to 1.
The capital of a country must be contained within it.
A country need not have any mountain, lake or river.
A country may have more than one mountain, lake or river.

Assumptions:
If a river separates two countries, there is a separate stretch of river for each country. Thus the total length of all the stretches is not necessarily the same as the total length of the river (i.e. the river length is not derived data).
Any lake is completely included within the region being modelled, so no part of a lake is owned by a country outside the region.
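The first constraint cannot be drawn on the ER diagram, but it can be checked against the Portion data. The sketch below assumes each Portion is held as a (CountryName, LakeName, Proportion) tuple with the proportion stored as a fraction; the function name `lakes_with_bad_proportions`, the tolerance and the sample data are all hypothetical.

```python
import math
from collections import defaultdict


def lakes_with_bad_proportions(portions, tolerance=1e-9):
    """Return the sorted names of lakes whose proportions do not sum to 1."""
    totals = defaultdict(float)
    for _country, lake, proportion in portions:
        totals[lake] += proportion
    # Compare with a tolerance, since fractions are held as floats.
    return sorted(
        lake
        for lake, total in totals.items()
        if not math.isclose(total, 1.0, abs_tol=tolerance)
    )


# Made-up Portion entities: lake1 is fully accounted for, lake2 is not.
portions = [
    ("country1", "lake1", 0.6),
    ("country2", "lake1", 0.4),
    ("country1", "lake2", 0.7),
]
print(lakes_with_bad_proportions(portions))  # ['lake2']
```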
2.4.5 An ER Model of Timber Handling in the Sir Edward Kelly Case Study
Q19: The following is a list of entities and associated attributes. The names of entities and attributes are chosen from the problem domain, so it is simple in reading the following descriptions to relate the information back to the data requirements description. It is feasible that you will come up with different names from those suggested below - that is fine, as long as the names you have chosen are also readily identifiable with the problem description. Did you remember to underline the primary key attributes in your own entity list?
PurchaseContract(ContractNo, ContractDate, ShippingDate, Description, Lengths, UnitPrice, CurrencyRestrictions)
ContractItem(ContractNo, BillOfLadingNo, ShipmentDate, Shipment, Comments)
ShippingSheet(ShippingSheetNo, ContractNo, Vessel, ShippingAgent, CustomsAgent, TotalVolume, DateShipped, Dock, ExQuayPeriod, PortOfShipment, FreightCharge, Insurance, ExchangeRate, DateArrived, Berth, ExQuayRate)
ShippingItem(ShippingSheetNo, BillOfLadingNo, Size, Quality, Type, NumPacks, NumPieces, Volume, Destination, Haulier, OrderNo, DateInStock/Invoiced)
BillOfLading(BillOfLadingNo, ContractNo, LoadingDate, Description, Size, Quality, L51, L47, L45, L42, L39, L36, L33, L30, L27, L24, L21, L18, NumPieces, TotalLength, Volume)
OutturnReport(BillOfLadingNo, NumPacks, NumPieces, Condition)
StockSheet(BillOfLadingNo, ContractNo, ShippingSheetNo, Stowage, Cost, Condition, Size, Quality, Type, Vessel)
StockItem(BillOfLadingNo, ReferenceNo, Date, L51, L47, L45, L42, L39, L36, L33, L30, L27, L24, L21, L18, NumPieces, NumPacks, Volume, Balance)
Q20: The figure below shows the ER model capturing the entities, the relationships and the degree of each relationship.
Q21: Figure 2.28 shows the ER model capturing additional information concerning participation in each relationship. This represents the complete ER model of the Sir Edward Kelly timber handling side of the business for this tutorial. The key parts of the modelling process are to identify the entities correctly, together with their attributes, and the relationships between them. The degree and participation conditions of the relationships are also important. The layout of the diagram, and the names that you choose for the entity types, attributes and relationships, are not so important. The aim in deciding on names is to be as unambiguous as possible.

Figure 2.28:

The assumptions concerning the model are summarised below.

Assumptions:
Assume that the bill of lading number is unique (i.e. no two contracts or shipping items have the same bill of lading number).