Full Chapter Database Engineering Engineering Handbook P K Ghosh PDF

Database Engineering Engineering

Handbook P.K. Ghosh


2019 First Edition

Contents
CHAPTER . 1 . INTRODUCTION TO DATABASE SYSTEM
1.1 INTRODUCTION 1
1.2 BASIC CONCEPTS AND DEFINITIONS 1
1.2.1 Data 1
1.2.2 Information 2
1.2.3 Data vs Information 2
1.2.4 Database 2
1.2.5 Data Dictionary 3
1.3 DATABASE ENGINEERING 3
1.4 DATABASE 4
1.4.1 Features of a database 4
1.4.2 Components of a database 4
1.5 DATABASE MANAGEMENT SYSTEMS 6
1.5.1 Components of DBMS 6
1.5.1.1 RDBMS components 6
1.5.1.2 ODBMS components 7
1.5.2 Features of a DBMS 7
1.5.3 Advantages of DBMS 8
1.5.4 Disadvantages of DBMS 8
1.6 DATABASE SYSTEM 9
1.6.1 Operations Performed on Database Systems 10
1.7 TRADITIONAL FILE PROCESSING SYSTEM 11
1.7.1 Characteristics of File Processing System 11
1.7.2 Limitations of the File Processing System 11
1.8 DATABASE MANAGEMENT SYSTEMS AND
FILE MANAGEMENT SYSTEMS : A COMPARISON 13
1.8.1 File Management Systems 13
1.9 DATABASE ADMINISTRATOR 14
1.10 DATABASE APPROACH 15
1.10.1 Advantages of the Database Approach 15
1.11 COMPONENTS OF THE DATABASE ENVIRONMENT 16
1.12 DATABASE LANGUAGES 17
1.12.1 Data Definition Language (DDL) 17
1.12.2 Data Storage Definition Language (DSDL) 19
1.12.3 View Definition Language (VDL) 19
1.12.4 Data Manipulation Language (DML) 19
1.12.5 Fourth-generation Language (4GL) 21
SUMMARY 22
EXERCISES WITH SOLUTIONS 23
CHAPTER . 2 . DATABASE ARCHITECTURE & DATA MODELING
2.1 INTRODUCTION 26
2.2 SCHEMAS, SUBSCHEMA AND INSTANCES 27
2.2.1 Schema 27
2.2.2 Subschema 29
2.2.3 Instances 30
2.3 THREE LEVEL ARCHITECTURE FOR A DBMS 31
2.4 MAPPINGS 33
2.4.1 Conceptual/Internal Mapping 34
2.4.2 External/Conceptual Mapping 34
2.5 LEVELS OF ABSTRACTION 35
2.5.1 External Schema 36
2.5.2 Conceptual schema 36
2.5.3 Physical Schema 36
2.6 ADVANTAGES OF THREE-TIER ARCHITECTURE 36
2.7 DATA INDEPENDENCE 37
2.7.1 Physical Data Independence 38
2.7.2 Logical Data Independence 38
2.8 STRUCTURE AND COMPONENTS OF A DBMS 38
2.8.1 Components of a DBMS 39
2.8.2 Execution Process of a DBMS 40
2.9 FUNCTIONS AND SERVICES OF DBMS 41
2.10 TYPES OF DATABASE SYSTEMS 42
2.10.1 Centralized Database System 42
2.10.2 Parallel Database System 43
2.10.3 Distributed Database System 43
2.10.4 Client-Server DBMS 44
2.10.5 Multi-Tier client server computing models 45
2.11 DATA MODELS 47
2.11.1 Record-based Data Models 48
2.11.2 Object-based Data Models 48
2.11.3 Physical Data Models 49
2.12 ENTITY-RELATIONSHIP MODEL (ER-MODEL) 57
2.12.1 Basic Constructs of E-R Modeling 58
2.12.2 E-R Notation 61
2.12.3 Developing an ERD 63
2.12.4 Strong and Weak Entity Sets 65
2.12.5 Generalization 65
2.12.6 Specialisation 66
2.12.7 Aggregation 67
2.12.8 Mapping Conceptual Schema to a Relational Schema 68
2.12.9 Database Design using the Entity-Relationship Model 69
2.12.10 Examples of E-R Diagram 69
SUMMARY 72
EXERCISES WITH SOLUTIONS 73

CHAPTER . 3 . STORAGE STRATEGIES


3.1 INTRODUCTION 85
3.2 ARCHITECTURE 85
3.2.1 Three Tier Reference Architecture 85
3.2.2 Layered DBMS Architecture 86
3.2.3 Detailed Storage Architecture 87
3.3 STORING DATA 87
3.3.1 Magnetic Disks 88
3.3.2 RAID 92
3.3.3 Other Disks 95
3.3.4 Magnetic Tape 96
3.3.5 Storage Access 97
3.3.6 File & Record Organization 98
3.4 FILE ORGANIZATIONS & INDEXES 103
3.4.1 Ordered Indices 104
3.4.2 B+ Tree Index Files 106
3.4.3 Hashing 110
SUMMARY 122
EXERCISES WITH SOLUTIONS 122

CHAPTER . 4 . RELATIONAL DATA MODEL AND LANGUAGES


4.1 INTRODUCTION 130
4.2 PROPERTIES OF RELATIONAL TABLES 130
4.3. DIFFERENCE BETWEEN DBMS AND RDBMS 130
4.4 CODD’S RULES 131
4.5 RELATIONS 133
4.6 KEYS 133
4.7 INTEGRITY CONSTRAINTS 134
4.7.1 Entity Integrity Constraints 135
4.7.2 Referential Integrity Constraints 135
4.8 DOMAIN CONSTRAINTS 135
4.9 QUERY LANGUAGES 136
4.9.1 Relational Algebra 136
4.9.1.1 Relational Algebraic Operations 136
4.9.2 Relational Calculus 141
4.9.2.1 Tuple Relational Calculus 141
4.9.2.2 Domain Relational Calculus 144
4.10 SOME QUERIES IN RELATIONAL ALGEBRA 145
4.11 STRUCTURED QUERY LANGUAGES(SQL) 146
4.11.1 History of SQL 146
4.11.2 Characteristics of SQL 147
4.11.3 Advantages of SQL 147
4.11.4 SQL in Action 147
4.11.5 Types of SQL 149
4.11.6 SQL Data types and Literals 149
4.11.7 Types of SQL Commands 150
4.11.8 SQL Operators and their Precedence 151
4.11.9 Tables, Views And Indexes 152
4.11.10 Create Table Command 152
4.11.11 Alter Table Command 157
4.11.12 Insert Operation 158
4.11.13 Queries 159
4.11.14 Aggregate Functions 164
4.11.15 ORDER BY 165
4.11.16 GROUP BY 166
4.11.17 HAVING 167
4.11.18 UPDATE OPERATION 168
4.11.19 DELETE OPERATION 168
4.11.20 JOINS 169
4.11.21 Sub Queries (Nested Queries) 171
4.11.22 UNIONS, INTERSECTION and MINUS 172
4.11.23 SEQUENCES 174
4.11.24 DROP Command 175
4.11.25 NULL Values in SQL 175
4.11.26 Examples on SQL Programs 176
4.11.27 Embedded SQL 179
4.11.28 Cursors in SQL 181
4.11.28.1 Handling Cursors 181
4.11.29 Sample Embedded SQL Programs 182
4.11.30 GRANT command 185
4.11.31 REVOKE Command 186
4.11.32 View Definition 186
4.11.32.1 Uses of View 187
4.12 QUERY BY EXAMPLE (QBE) 189
4.12.1 QBE Dictionary 189
4.12.2 Features of QBE 193
4.12.3 Commercial Database Management Systems Providing QBE Feature 194
4.13 COMPARISON OF SQL AND QBE 194
SUMMARY 194
EXERCISES WITH SOLUTIONS 195
CHAPTER . 5 . DATABASE DESIGN
5.1 INTRODUCTION 207
5.2 SOFTWARE DEVELOPMENT LIFE CYCLE (SDLC) 207
5.2.1 Software Development Cost 208
5.2.2 Structured System Analysis and Design (SSAD) 210
5.2.2.1 Structured System Analysis 210
5.2.2.2 Structured Design 211
5.3 DATABASE DEVELOPMENT LIFE CYCLE 212
5.4 DATABASE DESIGN 215
5.5 AUTOMATED DESIGN TOOLS 221
5.5.1 Limitations of Manual Database Design 222
5.5.2 Computer-aided Software Engineering (CASE) Tools 222
5.5.2.1 Facilities provided by CASE Tools: 222
5.5.2.2 Characteristics of CASE Tools 223
5.5.2.3 Benefits of CASE Tools 224
5.6 NORMALIZATION CONCEPT 225
5.7 FUNCTIONAL DEPENDENCY 226
5.8 FULL FUNCTIONAL DEPENDENCY 227
5.9 PARTIAL DEPENDENCY 227
5.10 TRANSITIVE DEPENDENCY 227
5.11 ARMSTRONG'S AXIOMS OR 228
5.12 CLOSURE 229
5.13 NORMAL FORM 230
5.13.1 First Normal Form (1 NF) 231
5.13.2 Second Normal Form (2 NF) 232
5.13.3 Third Normal Form (3 NF) 234
5.13.4 Boyce-Codd Normal Form 235
5.13.5 Difference between BCNF & Third normal form 237
5.13.6 Dependency Preservation 237
5.13.7 Lossless-Join Decomposition or Loss less Design 238
5.13.8 Multivalued Dependency 239
5.13.9 Trivial Multivalued Dependency 240
5.13.10 Fourth Normal Form 240
5.13.11 Fifth Normal Form or Project Join Normal Form 241
5.13.12 Fourth and Fifth Normal Forms : A comparative view 242
5.13.13 Sixth normal form or Domain key Normal Form 242
5.13.14 Understanding Denormalization 243
5.14 A PRACTICAL APPROACH TO DATABASE DESIGN 244
5.15 BREAKING THE RULES: WHEN TO RENORMALIZE 246
SUMMARY 247
EXERCISES WITH SOLUTIONS 248
CHAPTER . 6 . QUERY PROCESSING AND OPTIMIZATION
6.1 INTRODUCTION 257
6.2 QUERY PROCESSING 257
6.2.1 Query Processing Stages 259
6.2.1.1 Query Translation 259
6.2.1.2 Query Transformation 260
6.2.1.3 Simplification 261
6.2.1.4 Preparing alternate Access plan 261
6.2.2 Query Execution Plans in improving Application Performance 261
6.3 QUERY OPTIMIZATION 263
6.3.1 Heuristic Query Optimization 264
6.3.2 Transformation Rules 268
6.3.3 Heuristic Optimization Algorithm 274
6.4 COST ESTIMATION IN QUERY OPTIMIZATION 275
6.4.1 Cost Components of Query Execution 275
6.4.2 Cost Function for SELECT operation 277
6.4.3 Cost Function for JOIN operation 279
6.5 PIPELINING AND MATERIALIZATION 281
6.6 STRUCTURE OF QUERY EVALUATION PLANS 281
6.6.1 Query Execution Plan 282
SUMMARY 284
EXERCISES WITH SOLUTIONS 284

CHAPTER . 7 . TRANSACTION AND CONCURRENCY


7.1 INTRODUCTION 296
7.2. TRANSACTION SYSTEM 297
7.3 PROPERTIES OF TRANSACTION 297
7.3.1 Atomicity 297
7.3.2 Consistency 297
7.3.3 Isolation 297
7.3.4 Durability 297
7.4 TRANSACTION STATE 297
7.5 TRANSACTION PROCESSING SYSTEM 298
7.6 RECOVERY FROM TRANSACTION FAILURES 298
7.6.1 Cascading Rollback 298
7.6.2 Recoverable Schedules 299
7.6.3 Log Based Recovery 299
7.6.4 Checkpoints 301
7.6.5 Backup Mechanism 301
7.6.6 Shadow Paging 301
7.7 SERIALIZABILITY OF SCHEDULES 302
7.8 TYPES OF SCHEDULES 302
7.9 TESTING OF SERIALIZABILITY 303
7.9.1 Conflict Serializability 307
7.9.2 View Serializability 308
7.10 DEADLOCK HANDLING 308
7.11 DEADLOCK DETECTION 309
7.12 RECOVERY FROM DEADLOCK 309
7.13 CONCURRENCY CONTROL 309
7.14 LOCKING TECHNIQUES FOR CONCURRENCY CONTROL 310
7.15 MODE OF LOCKING 310
7.15.1 Shared Lock 310
7.15.2 Exclusive Lock 310
7.16 THE TWO-PHASE LOCKING PROTOCOL 310
7.16.1 Static (or Conservative) Two-Phase Locking 311
7.16.2 Dynamic Two-Phase Locking 312
7.16.3 Strict Two-Phase Locking 312
7.17 GRAPH-BASED PROTOCOL 312
7.18 TIME STAMPING PROTOCOLS FOR CONCURRENCY CONTROL 313
7.19 THOMAS WRITE RULE 313
7.20 VALIDATION-BASED PROTOCOLS 314
7.20.1 Read Phase 314
7.20.2 Validation Phase 314
7.20.3 Write Phase 314
7.21 MULTIPLE GRANULARITY 315
7.22 INTENTION MODES 316
7.22.1 Intention-Shared (IS) Mode 316
7.22.2 Intention-Exclusive (IX) Mode 316
7.22.3 Shared and Intention-Exclusive (SIX) Mode 316
7.23. THE MULTIPLE-GRANULARITY LOCKING PROTOCOL 316
7.24. MULTIVERSION SCHEMES 317
7.25. MULTIVERSION TIMESTAMP PROTOCOL 317
7.26 RECOVERY WITH CONCURRENT TRANSACTION 317
SUMMARY 318
EXERCISES WITH SOLUTIONS 318

CHAPTER . 8 . DATABASE RECOVERY SYSTEM


8.1 INTRODUCTION 328
8.2 EVENTS CAUSING DATA LOSS 328
8.3 DATABASE RECOVERY 329
8.3.1 Recovery Levels 330
8.3.1.1 OS Level Recovery 330
8.3.1.2 File Level Recovery 330
8.3.1.3 Disk Level Recovery 331
8.3.1.4 Transaction Level Recovery 331
8.4 STORAGE DEVICES 332
8.4.1 Magnetic tape 333
8.4.2 Hard disk 333
8.4.3 Optical disc 333
8.4.4 Floppy disk 333
8.4.5 Solid state storage 333
8.5 STORAGE STRUCTURE 334
8.5.1 Stable Storage Implementation 334
8.6 LOG-BASED RECOVERY 335
8.6.1 Deferred Database Modification 335
8.6.2 Immediate Database Modification 335
8.7 CHECKPOINTS 335
8.8 SHADOW PAGING 335
8.9 RECOVERY WITH CONCURRENT TRANSACTIONS 336
8.9.1 Buffer Management 336
8.9.1.1 Operating System Role in Buffer Management 337
8.9.1.2 Failure with Loss of Nonvolatile Storage 337
8.10 BACKUP SYSTEM 337
8.10.1 Data Repository and Back up Strategies 338
8.11 DATA OPTIMIZATION 338
8.12 MANAGING THE BACKUP PROCESS 339
8.12.1 Measuring the process 341
SUMMARY 341
EXERCISES WITH SOLUTIONS 342

CHAPTER . 9 . OBJECT-ORIENTED DATABASES


9.1 INTRODUCTION 345
9.2 OBJECT-ORIENTED DATA MODEL (OODM) 345
9.2.1 Characteristics of Object-oriented Databases (OODBs) 347
9.2.2 Comparison of an OODM and E-R Model 348
9.3 BASIC CONCEPTS OF OBJECT ORIENTED DATABASES 348
9.4 BENEFITS OF OBJECT ORIENTATION 349
9.5 OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEMS 350
9.6 MERITS OF OBJECT ORIENTED DATABASE 350
9.7 TRADITIONAL VERSUS OBJECT-ORIENTED DBMSES 351
9.8 CHARACTERISTICS OF OBJECT-ORIENTED DATABASE 351
9.8.1 When Object Databases are used? 353
9.8.2 How Data is Stored? 353
9.9 OODB VS RDBMS 353
9.10 OBJECT RELATIONAL DBMS 354
9.10.1 Benefits of ORDBMS 354
9.10.2 Object Oriented Vs. Object Relational Management System 355
SUMMARY 357
EXERCISES WITH SOLUTIONS 357

CHAPTER . 10 . PARALLEL AND DISTRIBUTED DATABASE


10.1 INTRODUCTION 363
10.2 PARALLEL DATABASES 363
10.2.1 Architecture of Parallel Databases 364
10.2.1.1 Shared-memory Multiple CPU Parallel Database Architecture 364
10.2.1.2 Shared-disk Multiple CPU Parallel Database Architecture 365
10.2.1.3 Shared-nothing Multiple CPU Parallel Database Architecture 367
10.2.2 Key Elements of Parallel Database Processing 368
10.2.3 Query Parallelism 371
10.2.3.1 I/O Parallelism (Data Partitioning) 372
10.2.3.2 Intra-query Parallelism 375
10.2.3.3 Inter-query Parallelism 375
10.2.3.4 Intra-operation Parallelism 376
10.2.3.5 Inter-operation Parallelism 377
10.3 DISTRIBUTED DATABASE 377
10.3.1 Homogeneous Distributed Database 378
10.3.2 Heterogeneous Distributed Database 379
SUMMARY 380
EXERCISES WITH SOLUTIONS 380

CHAPTER . 11 . DATA WAREHOUSE AND DATA MINING


11.1 INTRODUCTION 388
11.2 DATA WAREHOUSE TERMINOLOGIES 389
11.3 PROPERTIES OF DATA WAREHOUSE: 391
11.4 DATA WAREHOUSES: A COMPARATIVE VIEW 393
11.5 DATA WAREHOUSING AND OLAP 394
11.5.1 Comparison of Data Warehouse and Operational Data 395
11.6 ARCHITECTURE OF DATA WAREHOUSE 396
11.6.1 Extraction, Transformation and Loading (ETL) 397
11.6.2 Star schema architecture 397
11.7 DATA MINING 400
11.8 THE SCOPE OF DATA MINING 401
11.9 DATA MINING PROCESS 403
11.10 ARCHITECTURE FOR DATA MINING 403
11.11 PROFITABLE APPLICATIONS 405
11.12 DATA MINING AND DATA WAREHOUSE 406
SUMMARY 408
EXERCISES WITH SOLUTIONS 409
PREVIOUS YEAR QUESTIONS WITH ANSWER-2011 412
CHAPTER 1
INTRODUCTION TO DATABASE SYSTEM
1.1 INTRODUCTION
Nowadays companies, institutions, large offices, organisations and malls all need to
store large amounts of data securely and accurately for various purposes. Computers play
a vital role in supporting and processing these day-to-day activities, and almost every
organisation feels the need for powerful computers with very high disk storage capacity.
However, not every person in a company can be expected to be proficient enough in
programming to carry out data storage and retrieval. It is simpler if the enterprise-wide
data is stored in one system from which all staff can retrieve the information they need
without much effort. Today decision makers prefer to use computer technology for quick and
efficient decision making. Living in an information age, we must realize that information is
the most powerful tool for efficient decision-making. In this respect, databases have become
a dominant tool in business computing and are the driver of information systems.
1.2 BASIC CONCEPTS AND DEFINITIONS
With the fast-growing application of computers, organisations are switching from manual
systems to computerised information systems, for which the data within the organisation
is a basic resource. Therefore, proper organisation and management of data is essential to
run the organisation more efficiently. The efficient use of data for planning, production
control, marketing, invoicing, payroll, accounting and other functions in an organisation
has a great impact on its competitive edge.
1.2.1 Data
The term data means groups of information that represent the qualitative or quantitative attributes
of a variable or set of variables. Data (plural of “datum”, which is seldom used) are typically
the results of measurements and can be the basis of graphs, images, or observations of a set
of variables. Data are usually viewed as the lowest level of abstraction from which
information and knowledge are derived.
Data can exist in different forms: as numbers, text or figures, as bits and bytes stored in
electronic memory, or as facts stored in a person’s mind.
1.2.2 Information
In general, information is raw data which:
• has been verified to be accurate and timely.
• is specific and organized for a specific task.
• is presented within a context that gives it meaning and relevance.
• leads to an increase in understanding and a decrease in uncertainty.
The value of information lies in its ability to affect a behaviour, a decision, or an
outcome.

Fig. 1.1 : Information Cycle


Figure 1.1 shows that when data is fed into the system, it undergoes a valid process and
procedure and yields valid information according to the needs of the users (end users and
decision makers). Hence information can be defined as ‘processed data’.
1.2.3 Data vs Information
Data are plain facts. The word “data” is the plural of “datum”. When data are processed,
organized, structured or presented in a given context so as to make them useful, they are
called “information”.
Data by themselves are fairly useless. But when data are interpreted and processed
to determine their true meaning, they become useful and can be called information.
“Data is the raw material of information: data that undergoes processing activities through
a system results in information, and information is, as a whole, a refinement of data.”
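The distinction can be illustrated with a minimal Python sketch. The sales figures and the function name `summarize` are hypothetical and chosen only for illustration; the point is that the raw list carries little meaning until it is processed into a summary that a decision maker can use.

```python
from statistics import mean

# Raw data: hypothetical daily sales figures (plain facts without context).
raw_sales = [("Mon", 120), ("Tue", 95), ("Wed", 143), ("Thu", 101), ("Fri", 171)]


def summarize(sales):
    """Process raw (day, amount) pairs into information useful for a decision."""
    amounts = [amount for _, amount in sales]
    best_day, best_amount = max(sales, key=lambda pair: pair[1])
    return {
        "total": sum(amounts),
        "average": mean(amounts),
        "best_day": best_day,
        "best_amount": best_amount,
    }


info = summarize(raw_sales)
print(info)
```

Here `raw_sales` plays the role of data and the dictionary returned by `summarize` plays the role of information, i.e. processed data.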
1.2.4 Database
A database is an integrated collection of related data/information, which can be referred
to by a user or a group of users for individual/organizational activities such as creation,
addition, deletion, updating and sharing of information. Examples of databases include
the stock records of a small shop, the employee details of a big hospital, and a railway/air
ticket reservation system.
1.2.5 Data Dictionary
A data dictionary is a metadata (data about data) repository, defined as a "centralized
repository of information about data such as meaning, relationships to other data,
origin, usage, and format."
It is generally associated with databases and database management systems (DBMS)
and can also be understood as:
• A document describing a database or collection of databases
• An integral component of a DBMS that is required to determine its structure
• An analogue of the system catalog, which describes the metadata.
Database users and application developers can benefit from an authoritative data
dictionary document that catalogs the organization, contents, and conventions of one or
more databases. This typically includes the names and descriptions of various tables and
fields in each database, plus additional details, like the type and length of each data element.
There is no universal standard for creating a data dictionary.
A data dictionary is necessary in a database for the following reasons:
* It improves the control of the Database Administrator (DBA) over the information
system and the users' understanding of how to use the system.
* It helps in documenting the database design process by storing the results of every
design phase and the design decisions.
* It helps in searching for the views defined on the database and the definitions of
those views.
* It provides great assistance in producing a report of which data elements (data values)
are used in all the programs.
* It promotes data independence, i.e. application programs are not affected by the
addition or modification of structures in the database.
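As a small illustration of a system catalog, the following Python sketch uses the standard `sqlite3` module. SQLite is used here only as a convenient example; the table `employee` and the helper `describe` are hypothetical names. SQLite keeps its metadata in the built-in `sqlite_master` table and exposes column details through `PRAGMA table_info`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL)"
)


def describe(connection, table):
    """Return (column name, declared type, not-null flag) for each column."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row holds: cid, name, type, notnull, dflt_value, pk
    return [(row[1], row[2], bool(row[3])) for row in rows]


# The catalog lists the tables that exist; describe() lists their fields.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = describe(conn, "employee")
print(tables)
print(columns)
```

The names and types reported here are the kind of details (table names, field names, type of each data element) that a data dictionary document would record.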
1.3 DATABASE ENGINEERING
Database engineering methodology is an architectural approach to planning, analyzing,
designing, and implementing applications within an organisation. It aims to improve the
management of resources, including capital, people and database systems, to support the
achievement of business goals. It is defined as: “An integrated and evolutionary set of
tasks and techniques that enhance a business process through an automation
throughout an enterprise enabling it to develop people, procedures and systems to
achieve its goal”.
Database engineering has many purposes, including organizational planning, business
process re-engineering, application development, information systems planning and systems
re-engineering.
Database engineering is a discipline involving:
• conception
• modelling
• creation of a database
• data analysis
• database administration
• database documentation in an organisation.
1.4 DATABASE
Related information, when placed in an organized form, results in a database. The
organization of data/information is necessary because unorganized information has no
meaning. There are many examples of organized information; the most common are the
dictionary, the telephone directory, the student record register, and many more. In each
of these, the data is stored in a specific order, i.e. in an organized form.
Now the question arises: how do we deal with the database? There are many operations,
such as:
* To add new information (e.g. to add the contact number of a new friend to your
telephone directory)
* To view or retrieve the stored information
* To modify or edit the existing information
* To remove or delete unwanted information (e.g. your friend has changed his/her
mobile number, so the old number has to be removed from the list)
* To arrange the information in a desired order, etc.
A database is a well-organised collection of data that are related in a meaningful way and
that can be accessed in different logical orders as per our requirements.
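The operations listed above can be sketched with Python's standard `sqlite3` module. This is only an illustrative sketch: the table name `directory`, the names and the phone numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (name TEXT PRIMARY KEY, phone TEXT NOT NULL)")

# Add new information.
conn.executemany(
    "INSERT INTO directory (name, phone) VALUES (?, ?)",
    [("Ravi", "555-0101"), ("Anita", "555-0102"), ("Meera", "555-0103")],
)

# Modify or edit existing information.
conn.execute("UPDATE directory SET phone = ? WHERE name = ?", ("555-0199", "Anita"))

# Remove or delete unwanted information.
conn.execute("DELETE FROM directory WHERE name = ?", ("Ravi",))

# Retrieve the stored information, arranged in a desired order.
entries = conn.execute("SELECT name, phone FROM directory ORDER BY name").fetchall()
print(entries)
```

Each statement corresponds to one of the operations in the list: adding, modifying, deleting, and retrieving in a chosen order.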
1.4.1 Features of a database
The main features of data stored in a database are:
* It is well organized.
* It is related.
* It is accessible in different orders without any difficulty.
* It is stored only once.
1.4.2 Components of a database
A database consists of the following four components, as shown in Fig. 1.2:
• Data item
• Relationships
• Constraints
• Schema

Fig. 1.2 : Components of database

As we know, data (or data item) is a distinct piece of information. Relationships
represent a correspondence (or communication) between various data elements. Constraints
are predicates that define correct database states. Schema describes the organisation of
data and relationships within the database. It defines various views of the database for the
use of the various system components of the database management system and for
application security. A schema separates the physical aspect of data storage from the
logical aspects of data representation.
The organisation of a database is shown in Fig. 1.3. It consists of the following three
independent levels:
a. Physical storage organisation or internal schema layer
b. Overall logical organisation or global conceptual schema layer
c. Programmers’ logical organisation or external schema layer.

Fig. 1.3 : Database organisation


The internal schema defines how and where the data are organised in physical data
storage. The conceptual schema defines the stored data structure in terms of the database
model used. The external schema defines a view of the database for specific users.
A database management system provides access to the database while maintaining
the required correctness and consistency of the stored data.
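The three levels can be loosely illustrated with SQLite through Python's `sqlite3` module. This is a sketch under stated assumptions, not a full three-schema implementation: the table `employee`, the view `staff_directory` and the index `idx_employee_dept` are hypothetical names. The table stands for the conceptual schema, the view for an external schema, and the index for a choice made at the internal (physical) level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical structure of the stored data.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 52000.0), (2, "Bikram", "HR", 48000.0)],
)

# External level: a view for users who should not see salaries.
conn.execute(
    "CREATE VIEW staff_directory AS SELECT emp_id, name, dept FROM employee"
)

# Internal level: an index changes how data is reached physically,
# without changing the table or the view definitions.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY emp_id")
view_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(view_columns)
print(rows)
```

Users of `staff_directory` are unaffected when the index is added or dropped, which is the separation between physical storage and logical representation described above.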
1.5 DATABASE MANAGEMENT SYSTEMS
A database management system (DBMS) consists of software that organizes the storage
of data in a database. A DBMS controls the creation, maintenance, and use of the
database storage structures of organizations and of their users.
It allows organizations to place control of organization-wide database development in
the hands of Database Administrators (DBAs) and other specialists. In large systems,
a DBMS allows users and other software to store and retrieve data in a structured manner.
Database management systems are usually categorized according to the database model
that they support, such as the network, relational or object model.
The model tends to determine the query languages that are available to access the
database. One commonly used query language for relational databases is SQL, although
SQL syntax and function can vary from one DBMS to another. A great deal of the internal
engineering of a DBMS is independent of the data model, and is concerned with managing
factors such as performance, concurrency, integrity, and recovery from hardware failures.
A relational database management system (RDBMS) implements the features of the
relational model. In this context, the entire information content of the database is represented
in one and only one way, namely as explicit values in column positions (attributes) and
rows (tuples) in relations. Therefore, there are no explicit pointers between related tables,
whereas an object database management system (ODBMS) stores explicit
pointers between related types.
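The difference between value-based and pointer-based relationships can be sketched in Python. In the relational part (using the standard `sqlite3` module), two hypothetical tables are related only because a column value matches. In the object part, plain Python dataclasses stand in for an object database, and one object holds a direct reference to another. All names are illustrative assumptions.

```python
import sqlite3
from dataclasses import dataclass

# Relational style: rows are related only through matching column values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)", [(10, "Accounts"), (20, "Stores")]
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)", [(1, "Asha", 20), (2, "Bikram", 10)]
)
joined = conn.execute(
    "SELECT e.name, d.dept_name FROM employee AS e "
    "JOIN department AS d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()


# Object style: an object holds an explicit reference to a related object.
@dataclass
class Department:
    name: str


@dataclass
class Employee:
    name: str
    department: Department


stores = Department("Stores")
asha = Employee("Asha", stores)

print(joined)
print(asha.department.name)
```

In the first case the relationship is recomputed by the join each time; in the second it is followed directly through the stored reference.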
1.5.1 Components of DBMS
We know that most modern DBMS are relational. Other, less-used DBMS, such as
object DBMS, are generally used in areas of application-specific data management
where performance and scalability take higher priority than the flexibility of ad hoc
query capabilities provided through the relational algebra execution algorithms of a
relational DBMS.
1.5.1.1 RDBMS components
• Interface drivers
A user or application program initiates either schema modification or content
modification. These drivers are built on top of SQL. They provide methods to prepare
statements, execute statements, fetch results, etc. Examples include ODBC and
JDBC, through which DDL, DCL and DML statements are issued. Some vendors
provide language-specific proprietary interfaces. For example, MySQL provides
drivers for PHP, Python, etc.
• SQL engine
This component interprets and executes the SQL query. It comprises three major
components (compiler, optimizer and execution engine).
• Transaction engine
Transactions are sequences of operations that read or write database elements
and are grouped together.
• Relational engine
Relational objects such as tables, indexes, and referential integrity constraints are
implemented in this component.
• Storage engine
This component stores and retrieves data records. It also provides a mechanism to
store metadata and control information such as undo logs, redo logs, and lock tables.
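How an application works with a driver and the transaction engine can be sketched with Python's `sqlite3` module, used here as one example of a driver that offers execute and fetch methods. The table `account`, the helper `transfer` and the amounts are hypothetical. The two updates are grouped into one transaction, so either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move an amount between accounts as one transaction; True on success."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 200)       # enough funds: both updates are kept
failed = transfer(conn, 1, 2, 1000)  # overdraft: the credit is undone as well
balances = conn.execute(
    "SELECT acc_no, balance FROM account ORDER BY acc_no"
).fetchall()
print(ok, failed, balances)
```

In the second call the credit to account 2 has already been executed when the debit violates the `CHECK` constraint; the rollback removes it, so the balances stay as they were after the first transfer.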
1.5.1.2 ODBMS components
• Language drivers - A user or application program initiates either schema modification
or content modification by using a programming language. The drivers then provide
the mechanism to manage the object lifecycle, coupling the application memory space
with the underlying persistent storage. Examples include C++, Java, .NET, and Ruby.
• Query engine - This component is responsible for interpreting and executing
language-specific query commands in the form of OQL, LINQ, JDOQL, JPAQL, and
others. The query engine returns language-specific collections of objects which satisfy
a query predicate expressed with operators, e.g. >, <, >=, <=, AND, OR, NOT,
GroupBY, etc.
• Transaction engine - The transaction engine is concerned with data isolation and
consistency in the driver cache and data volumes by coordinating with the storage
engine.
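The idea of a query engine returning a collection of objects that satisfy a predicate can be imitated in plain Python. This is only an in-memory sketch and not the API of any actual ODBMS; the class `Book` and the function `query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    year: int
    price: float


def query(objects, predicate):
    """Return the objects that satisfy the predicate, as a list."""
    return [obj for obj in objects if predicate(obj)]


library = [
    Book("Data Models", 2015, 40.0),
    Book("Query Tuning", 2019, 55.0),
    Book("Storage Basics", 2021, 30.0),
]

# Predicate built from comparison and logical operators (>=, >, AND, NOT).
recent_cheap = query(library, lambda b: b.year >= 2019 and not b.price > 50)
print(recent_cheap)
```

The result is a list of `Book` objects in the application's own language, rather than rows of column values.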
1.5.2 Features of a DBMS
The features of a Database Management System are as follows:
* Data storage, retrieval, and update (hiding the internal physical implementation
details)
* A user-accessible catalog
* Transaction support
* Concurrency control services (multi-user update functionality)
* Recovery services (revival of a damaged database)
* Authorization services (security)
* Support for data communication
* Integrity services (i.e. constraints)
* Services to promote data independence
* Utility services (i.e. importing, monitoring, performance, record deletion, etc.)
The components that facilitate the goals of a DBMS may include the following:
* Query processor
* Data Manipulation Language (DML) preprocessor
* Database manager (software components to include authorization control, command
processor, integrity checker, query optimizer, transaction manager, scheduler, recovery
manager, and buffer manager)
* Data Definition Language (DDL) compiler
* File manager
* Catalog manager
1.5.3 Advantages of DBMS
• Data independence: A DBMS provides an abstract view of the data that hides the
details of data representation and storage.
• Efficient data access: A DBMS uses a variety of techniques to store and retrieve
data efficiently.
• Data integrity and security: If data is always accessed through the DBMS, the
DBMS can enforce integrity constraints on it.
• Data administration: “Data” administration deals with the modeling of the data and
treats data as an organizational resource, while “database” administration deals with
the implementation of the types of databases that are in use.
• Concurrent access and crash recovery: A DBMS schedules concurrent accesses to
the data in such a way that users can think of the data as being accessed by only one
user at a time, and it also protects the data from the effects of system crashes.
• Reduced application development time: A DBMS supports the important functions
that are common to many applications.
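The integrity advantage can be illustrated with a short sketch using Python's `sqlite3` module. The tables and the helper `try_insert` are hypothetical; note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` has been issued, which is an SQLite-specific detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES department (dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Accounts')")


def try_insert(connection, row):
    """Attempt to insert an employee row; report whether the DBMS accepted it."""
    try:
        connection.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(conn, (1, "Asha", 1)),     # valid row
    try_insert(conn, (2, None, 1)),       # violates NOT NULL on name
    try_insert(conn, (3, "Bikram", 99)),  # refers to a department that does not exist
]
count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(results, count)
```

Because every insert goes through the DBMS, the invalid rows are rejected in one place, instead of each application program having to repeat the same checks.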
1.5.4 Disadvantages of DBMS
The disadvantages of the database approach are as follows:
• Complexity : The provision of the functionality expected of a good DBMS
makes the DBMS an extremely complex piece of software. Database designers,
developers, database administrators and end-users must understand this functionality
to take full advantage of it; failure to understand the system can lead to bad design
decisions, which can have serious consequences for an organization.
• Size : The complexity and breadth of functionality make the DBMS an extremely
large piece of software, occupying many megabytes of disk space and requiring
substantial amounts of memory to run efficiently.
• Performance : Typically, a file-based system is written for a specific application,
such as invoicing. As a result, performance is generally very good. The DBMS,
however, is written to be more general, to cater for many applications rather than
just one. The effect is that some applications may not run as fast as they used to.
• Higher impact of a failure : The centralization of resources increases the vulnerability
of the system. Since all users and applications rely on the availability of the DBMS,
the failure of any component can bring operations to a halt.
• Cost of DBMS : The cost of a DBMS varies significantly, depending on the environment
and functionality provided. There is also the recurrent annual maintenance cost.
• Additional hardware costs : The disk storage requirements for the DBMS and the
database may require the purchase of additional storage space. Also, to achieve the
required performance, it may be necessary to purchase a larger machine, perhaps even
a machine dedicated to running the DBMS. The procurement of additional hardware
results in further expenditure.
• Cost of conversion : In some situations, the cost of the DBMS and extra hardware
may be insignificant compared with the cost of converting existing applications to run
on the new DBMS and hardware.
This cost also includes the cost of training staff to use these new systems and the
employment of specialist staff to help with conversion and running of the system. This
cost is one of the main reasons why some organizations prefer to continue with their
current systems and not to switch to modern database technology.
1.6 DATABASE SYSTEM
A database system, also called a database management system (DBMS), is a generalized
software system for manipulating databases. It is basically a computerized record-keeping
system, which stores information and allows users to add, delete, modify, retrieve and
update the information on demand. It provides for the simultaneous use of a database by
multiple users and offers tools for accessing and manipulating the data in the database.

Fig. 1.4 : DBMS Components

A DBMS is also a collection of programs that enables users to create and maintain a
database. It is a general-purpose software system that facilitates the processes of defining
(specifying the data types, structures and constraints), constructing (storing the
data on storage media) and manipulating (querying to retrieve specific data, updating to
reflect changes and generating reports from the data) databases for various applications.
Typically, a DBMS has three basic components, as shown in Fig. 1.4, and provides the
following facilities:
i) Data Description Language (DDL): It allows users to define the database, that is, to
specify the data types, the data structures and the constraints on the data to be stored in the
database. The DDL is also known as the data definition language. The DDL compiler translates
the schema written in a source language into the object schema, thereby creating a logical and
physical layout of the database.
ii) Data Manipulation Language (DML) and query facility: It allows users to insert,
update, delete and retrieve data from the database. A general query facility is usually
provided through a query language such as Structured Query Language (SQL).
iii) Software for controlled access of database: It provides controlled access to the
database, for example, preventing unauthorized users from accessing the database,
providing a concurrency control system to allow shared access to the database, activating
a recovery control system to restore the database to a previous consistent state following
a hardware or software failure and so on.
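The DDL and DML facilities described above can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the table name staff, its columns and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# DDL: define the structure, data types and constraints (hypothetical table).
conn.execute(
    """
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        salary   REAL NOT NULL CHECK (salary >= 0)
    )
    """
)

# DML: insert, update, delete and retrieve data.
conn.execute("INSERT INTO staff (staff_id, name, salary) VALUES (1, 'Asha', 30000)")
conn.execute("INSERT INTO staff (staff_id, name, salary) VALUES (2, 'Ravi', 25000)")
conn.execute("UPDATE staff SET salary = 32000 WHERE staff_id = 1")
conn.execute("DELETE FROM staff WHERE staff_id = 2")
rows = conn.execute("SELECT staff_id, name, salary FROM staff").fetchall()

# The constraint declared in the DDL is enforced by the DBMS,
# not by code in the application.
try:
    conn.execute("INSERT INTO staff (staff_id, name, salary) VALUES (3, 'Meera', -1)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
```

Because the constraint lives in the schema rather than in each program, every application that uses the table is subject to the same rule.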
“The database and DBMS software together are called a database system”. A
database system overcomes the limitations of the traditional file-oriented system, such as large
amounts of data redundancy, poor data control, inadequate data manipulation capabilities
and excessive programming effort, by supporting an integrated and centralized data
structure.

Database system = Database + DBMS (Software)


1.6.1 Operations Performed on Database Systems
A database system can be regarded as a repository or container for a collection of
computerized data files. Users can perform a variety of operations on database systems.
Some of the important operations performed on such files are as follows:
* Inserting new data into existing data files
* Adding new files to the database
* Retrieving data from existing files
* Changing data in existing files
* Deleting data from existing files
* Removing existing files from the database etc.
1.7 TRADITIONAL FILE PROCESSING SYSTEM
File processing systems were an early attempt to computerize the manual file system. A
file system is a method for storing and organizing computer files and the data they
contain to make it easy to find and access them. File systems may use a storage device
such as a hard disk or CD-ROM and involve maintaining the physical location of the files.
In our own home, we may have a file system which contains receipts, guarantees,
bank statements, and the like. When we need to look something up, we go to the file
system and search through it, starting from the first entry, until we find what we
want. Alternatively, we may have an indexing system that helps to locate what we want
more quickly. For example, we may have divisions in the file system or separate folders for
different types of item that are in some way logically related.
The manual file system works well when the number of items to be stored is small. It
even works quite adequately when there are large numbers of items and we have only to
store and retrieve them. But, the manual file system fails when we have to cross-reference
or process the information in the files.
For example, a typical real estate agent’s office might have a separate file for each
property for sale or rent, each potential buyer and renter, and each member of staff.
Answering a question that cuts across these files, for instance matching properties to a
potential renter, means cross-referencing them by hand.
This shows that the manual system is inadequate for this type of work. The file-based
system was developed in response to the needs of industry for more efficient data access.
In early processing systems, information was stored as groups of records in separate files.
1.7.1 Characteristics of File Processing System
The important characteristics of a file processing system are :
* It is a group of files storing data of an organization.
* Each file is independent of the others.
* Each file is called a flat file.
* Each file contains and processes information for a specific function, such as accounting,
payroll etc.
* Files are designed and accessed by programs written in languages such as COBOL,
C, C++ .
* The physical implementation and access procedures are written into the
application programs; therefore, physical changes result in intensive rework on the part of
the programmer.
* File processing systems offer less flexibility, present many limitations, and are
difficult to maintain when a system becomes complex.
1.7.2 Limitations of the File Processing System
The following problems are associated with the File Processing System.
(i) Separated and Isolated Data :
In order to make a decision, a user may need data from two separate files. First, the
files are evaluated by analysts and programmers to determine the specific data required
from each file and the relationships between the data. Then applications can be written
in a programming language to process and extract the required data. Imagine the work involved
if data from several files are needed.
(ii) Data redundancy and inconsistency :
Most of the time, the same information is stored in more than one file. Uncontrolled
duplication of data is undesirable for several reasons, such as:
* Duplication is wasteful. It costs time and money to input data more than once.
* It requires additional storage space, again with associated costs.
* Duplication can lead to loss of data integrity, i.e. the data is no longer consistent.
For example, consider the duplication of data between the Payroll and Personnel
departments. If a member of staff moves to a new house and the change of address is
communicated only to Personnel and not to Payroll, the person’s pay slip will be sent to the
wrong address. A more serious problem occurs if an employee is promoted with an
associated increase in salary. Again, the change is notified to Personnel but does not
filter through to Payroll. Now, the employee is receiving the wrong salary. When
this error is detected, it will take time and effort to resolve. Both cases illustrate
inconsistencies that may result from the duplication of data.
As there is no automatic way for Personnel to update data in the Payroll files, it is
easy to foresee such inconsistencies arising. Even if Payroll is notified of the changes,
it is possible that the data will be input incorrectly.
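The Payroll and Personnel scenario can be mimicked in a few lines. The sketch below is illustrative only: the two dictionaries stand in for the departments' separate files, and every name and value in it is hypothetical.

```python
# Two departments each keep their own copy of the same employee record,
# as in a file processing system (all names and values are hypothetical).
personnel_file = {"E1": {"name": "Asha", "address": "12 Old Road"}}
payroll_file = {"E1": {"name": "Asha", "address": "12 Old Road", "salary": 30000}}


def change_address(file, emp_id, new_address):
    """Update the address in ONE file only; nothing propagates the change."""
    file[emp_id]["address"] = new_address


def inconsistent_ids(file_a, file_b, field):
    """Return the ids whose value for `field` differs between the two files."""
    return sorted(
        emp_id
        for emp_id in file_a.keys() & file_b.keys()
        if file_a[emp_id][field] != file_b[emp_id][field]
    )


# The move is reported to Personnel but not to Payroll.
change_address(personnel_file, "E1", "7 New Street")
mismatches = inconsistent_ids(personnel_file, payroll_file, "address")
```

After the update, mismatches contains "E1": the two copies of the address now disagree, and nothing in either file indicates which one is correct.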
(iii) Data Dependence :
In file processing systems, files and records are described by specific physical formats
that are coded into the application program by programmers. If the format of a certain
record is changed, the code in each program containing that format needs to be updated.
Furthermore, instructions for data storage and access are written into the application’s
code. Therefore, changes in storage structure or access methods can drastically affect
the processing or results of an application.
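A small sketch of this dependence, assuming Python's standard struct module. The record layouts (a 4-byte id followed by a fixed-width name) are hypothetical; the point is only that the layout is hard-coded in every program that touches the file.

```python
import struct

# Record layout hard-coded into the "application": 4-byte id, 10-byte name.
OLD_FORMAT = "<i10s"
# A later change widens the name field to 20 bytes.
NEW_FORMAT = "<i20s"


def write_record(fmt, emp_id, name):
    """Pack one record using the given physical layout."""
    return struct.pack(fmt, emp_id, name.encode("ascii"))


def read_record(fmt, data):
    """Unpack one record; the layout must match the one used for writing."""
    emp_id, raw_name = struct.unpack(fmt, data)
    return emp_id, raw_name.rstrip(b"\x00").decode("ascii")


new_data = write_record(NEW_FORMAT, 7, "Asha")

# A program updated to the new layout reads the record correctly.
ok = read_record(NEW_FORMAT, new_data)

# A program still written against the old layout cannot read it.
try:
    read_record(OLD_FORMAT, new_data)
    old_reader_failed = False
except struct.error:
    old_reader_failed = True
```

Here the old reader fails outright. With other layout changes it could instead return wrong values without any error, which is harder to detect.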
(iv) Atomicity of updates :
Failures may leave the database in an inconsistent state with partial updates carried out.
For example, a transfer of funds from one account to another should either complete or not
happen at all.
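For contrast, a DBMS can make such a transfer all-or-nothing. The following is a minimal sketch using Python's built-in sqlite3 module; the account table, the account names and the rule that a balance may not go negative are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # The credit is applied first on purpose, so that a failed debit
            # leaves a partial update for the rollback to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

A call such as transfer(conn, "A", "B", 500) violates the balance rule on the debit, so the credit already applied to B is rolled back and both balances stay as they were.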
(v) Concurrent access by multiple users :
For performance reasons, different users need to be able to access the files at the
same time. In a file processing system such access is not controlled, which can lead to
inconsistencies. For example, two people may read a balance and update it at the same time.
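The balance example can be traced step by step. The sketch below replays one possible interleaving in a fixed order rather than using real threads, so the outcome is reproducible; the amounts are hypothetical.

```python
# Deterministic interleaving of two users who both read a balance and then
# write back their own result, with no concurrency control.
shared_file = {"balance": 100}

# Step 1: both users read the same starting balance.
user1_read = shared_file["balance"]
user2_read = shared_file["balance"]

# Step 2: user 1 deposits 50 and writes the result back.
shared_file["balance"] = user1_read + 50

# Step 3: user 2 withdraws 30, still working from the stale value read in step 1.
shared_file["balance"] = user2_read - 30

uncontrolled_result = shared_file["balance"]  # user 1's deposit has been lost
serial_result = 100 + 50 - 30                 # result if the users had taken turns
```

The final balance is 70 instead of 120. This is the kind of lost update that a concurrency control system is meant to prevent.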
(vi) Data Inflexibility :
Program-data interdependency and data isolation limit the flexibility of file processing
systems in answering users’ ad hoc information requests.
(vii) Incompatible file formats :
As the structure of files is embedded in the application programs, the structures are
dependent on the application programming language.
For example, the structure of a file generated by a COBOL program may be different
from the structure of a file generated by a ‘C’ program. The direct incompatibility of such
files makes them difficult to process jointly.
1.8 DATABASE MANAGEMENT SYSTEMS AND FILE MANAGEMENT
SYSTEMS : A COMPARISON
A Database Management System (DBMS) is a combination of computer software,
hardware, and information, designed to manipulate data electronically.
Two types of data management system are DBMSs and FMSs. In simple
terms, a File Management System (FMS) is a data management system that
allows access to a single file or table at a time. FMSs accommodate flat files that have no
relation to other files. The FMS is the predecessor of the Database Management System
(DBMS), which allows access to multiple files or tables at a time (see Figure 1.5).

(FMS) (DBMS)
Fig. 1.5 : FMS versus DBMS Comparison Diagram
1.8.1. File Management Systems
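Advantages                                   | Disadvantages
---------------------------------------------+----------------------------------------------
Simpler to use                               | Typically does not support multi-user access
---------------------------------------------+----------------------------------------------
Less expensive                               | Limited to smaller databases
---------------------------------------------+----------------------------------------------
Fits the needs of many small businesses      | Limited functionality (e.g. no support for
and home users                               | complicated transactions, recovery, etc.)
---------------------------------------------+----------------------------------------------
Popular FMSs are packaged along with the     | Decentralization of data
operating systems of personal computers      |
(e.g. Microsoft Cardfile and Microsoft       |
Works)                                       |
---------------------------------------------+----------------------------------------------
Good for database solutions for hand held    | Redundancy and integrity issues
devices such as Palm Pilot                   |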

The features of a File Management System are as follows :


a) Data Management : An FMS should provide data management services to the
application.
b) Generality with respect to storage devices : The FMS data abstractions and access
methods should remain unchanged irrespective of the devices involved in data storage.
c) Validity : An FMS should guarantee that at any given moment the stored data reflect
the operations performed on them.
d) Protection : Illegal or potentially dangerous operations on the data should be controlled
by the FMS.
e) Concurrency : In multiprogramming systems, concurrent access to the data should be
allowed with minimal differences.
f) Performance : An FMS should balance data access speed and data transfer rate against
functionality.
From the point of view of an end user (or application), an FMS typically provides the
following functionalities :
a) File creation, modification and deletion.
b) Ownership of files and access control on the basis of ownership permissions.
c) Facilities to structure data within files (predefined record formats, etc.).
d) Facilities for maintaining redundant copies of data to guard against technical failure
(back-ups, disk mirroring, etc.).
e) Logical identification and structuring of the data, via file names and hierarchical
directory structures.
1.9 DATABASE ADMINISTRATOR
The database administrator (DBA) is the person (or group of people) responsible for
coordinating all the activities of the database system and for its overall control.
The DBA’s responsibilities are :
* Schema definition : deciding the information content of the database, i.e. identifying
the entities of interest to the organisation and the information to be recorded about
those entities. This is defined by writing the conceptual schema using the DDL.
* Storage structure and access method definition : deciding the storage structure
and access strategy, i.e. how the data is to be represented, by writing the storage structure
definition. The associated internal/conceptual mapping must also be specified using the
DDL.
* Acting as liaison with users, i.e. ensuring that the data they require is available and
writing the necessary external schemas and conceptual/external mappings (again using
the DDL).
* Defining user authorisation and validation procedures. Authorisation and validation
procedures are extensions to the conceptual schema and can be specified using the
DDL.
* Defining a strategy for backup and recovery, for example periodic dumping of the
database to a backup tape and procedures for reloading the database from backup, and the use
of a log file where each log record contains the values of database items before and
after a change and can be used for recovery purposes.
* Monitoring performance and responding to changes in requirements, i.e. changing details
of storage and access, thereby organising the system so as to get the performance that is
best for the organisation.
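The log file mentioned under backup and recovery, in which each record holds the value of an item before and after a change, can be sketched in a few lines. This is a simplified illustration under assumed names (database, log, update, undo_all), not a description of how any particular DBMS implements recovery.

```python
# A minimal undo log: each record keeps the value of an item
# before and after a change (item names and values are hypothetical).
database = {"x": 10, "y": 20}
log = []


def update(item, new_value):
    """Apply a change and append a log record with before and after values."""
    log.append({"item": item, "before": database[item], "after": new_value})
    database[item] = new_value


def undo_all():
    """Restore the earlier state by applying 'before' values in reverse order."""
    while log:
        record = log.pop()
        database[record["item"]] = record["before"]


update("x", 11)
update("y", 25)
update("x", 12)
state_after_updates = dict(database)

undo_all()
state_after_undo = dict(database)
```

Undoing in reverse order matters: x was changed twice, and only by replaying the records backwards does it return to its original value of 10. The "after" values are kept because they would allow changes to be reapplied to a restored backup copy.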
1.10 DATABASE APPROACH
The database approach emphasizes the integration and sharing of data across the organization.
Data-driven vs Process-driven Design
In file processing systems, a process-driven approach was traditionally used to design
information systems.
With the process-driven approach, organizational processes are first identified and
analyzed. Processes and data flows between processes are described using tools such as
data flow diagrams (DFDs). Designers then work backward from the required outputs to the
processing needed to convert inputs into outputs. The design of data files is a by-product
of process design.
With the database approach, information systems professionals discovered that a data-driven
approach is often preferable. In the data-driven approach, the focus is on the entities that the
organization must manage. Attributes and relationships of those entities are identified. After creating
suitable models of the data structures and related business rules, designers develop the
applications required to manage the data.

Fig. 1.6 : Data driven v/s Procedure driven design : A comparative view
1.10.1 Advantages of the Database Approach
* Minimal Data Redundancy
With the database approach, data files are integrated into a single, logical structure. Any
redundancy that remains is controlled: it is designed into the system to improve performance
(or provide some other benefit), and the system is (or should be) aware of it.
* Consistency of Data
Controlling data redundancy minimizes inconsistency. For example,
if each address is stored only once, we cannot have disagreement on the stored values.
When controlled redundancy is permitted in the database, the database system itself should
enforce consistency by updating each occurrence of a data item when a change occurs.
* Integration of Data
In a database, data are organized into a single, logical structure, with logical relationships
defined between associated data entities. This makes it easy for users to relate one item of
data to another.
* Sharing of Data
Most database systems permit multiple users to share a database concurrently, although
certain restrictions are imposed so that each user may view only a subset of the
conceptual database model.
* Ease of Application Development
A major advantage of the database is that it greatly reduces the cost and time for developing
new business applications, because the programmer is relieved of the burden of designing, building,
and maintaining master files. In a database system, data are independent of the application
programs that use them. Within limits, either the data or the application programs that use the
data can be changed without necessitating a change in the other.
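Several of these advantages can be seen together in a short sketch using Python's built-in sqlite3 module. The staff table and the two views are hypothetical: the address is stored once, and each department is given a view of only the subset of the data it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        address  TEXT NOT NULL,
        salary   INTEGER NOT NULL
    );
    -- Each department sees only the subset of the data it needs.
    CREATE VIEW personnel_view AS SELECT staff_id, name, address FROM staff;
    CREATE VIEW payroll_view   AS SELECT staff_id, address, salary FROM staff;
    INSERT INTO staff VALUES (1, 'Asha', '12 Old Road', 30000);
    """
)

# The address is stored once, so a single update is seen by every user.
conn.execute("UPDATE staff SET address = '7 New Street' WHERE staff_id = 1")

personnel_address = conn.execute(
    "SELECT address FROM personnel_view WHERE staff_id = 1"
).fetchone()[0]
payroll_address = conn.execute(
    "SELECT address FROM payroll_view WHERE staff_id = 1"
).fetchone()[0]
```

Both views return the new address because neither holds a copy of its own, so the disagreement between Payroll and Personnel described for file processing systems cannot arise here.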
1.11 COMPONENTS OF THE DATABASE ENVIRONMENT

Fig. 1.7 : Component of DBMS Environment


* Repository
A centralized knowledge base containing all data definitions, screen and report
formats and definitions of other organizational and system components.
* Database management system (DBMS)
Commercial software systems that are used to create, maintain and provide controlled access
to the database and repository.
* Database
A shared collection of logically related data, designed to meet the information needs of
multiple users in an organization.
* Application programs