•In the traditional approach, information is stored in flat files, which are maintained
by the file system of the operating system.

•Application programs go through the file system to access these flat files.

•Data is stored in the form of records in the files.

•Records consist of various fields, which are delimited by a space, comma, tab, etc.

•Special characters are used to mark the end of records and the end of files.
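
The record layout described above can be sketched in a few lines of Python. This is only an illustration: the file contents, the comma as field delimiter and the newline as record delimiter are assumptions chosen for the example, not details taken from the slides.

```python
# Hypothetical flat-file contents: comma-delimited fields,
# newline-terminated records (both delimiters are assumptions).
FLAT_FILE = "101,Langer,Justin,5000.00\n102,Smith,Anna,12500.50\n"

def parse_records(text, field_delim=",", record_delim="\n"):
    """Split raw file text into records, then each record into fields."""
    records = []
    for line in text.split(record_delim):
        if line:  # skip the empty string left after the final record delimiter
            records.append(line.split(field_delim))
    return records

records = parse_records(FLAT_FILE)
print(records)
```

Every application program that reads such a file has to repeat this kind of parsing logic and agree on the same delimiters.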

Disadvantages of the traditional approach
Data Security: The data maintained in flat files is easily accessible and therefore not secure.
Example: Consider the banking system. The Customer_Transaction file has details about the total available balance of all customers. A customer wants information about his account balance. In a file system it is difficult to give the customer access to only his data. Thus enforcing security constraints[1] for the entire file or for certain data items is difficult.
Data Redundancy: Often the same information is duplicated in two or more files. This duplication of data (redundancy) leads to higher storage and access costs. In addition, it may lead to data inconsistency[2]: if a change is made to the data in one file, the same change must be made in every other file that holds a copy. If this is not done, programs will read different values for the same item depending on which file they access.
Example: Assume a customer's details such as Cust_Last_Name, Cust_Mid_Name, Cust_First_Name and Cust_Email are stored in both the Customer_Details file and the Customer_Fixed_Deposit file. If the email ID of one customer, for example Langer S. Justin, changes from Langer_Justin@yahoo.com to Langer_Justin@rediffmail.com, Cust_Email has to be updated in both files; otherwise the data becomes inconsistent.
However, one can design file systems with minimal redundancy, and data redundancy is sometimes preferred. Example: Assume the customer's details such as Cust_Last_Name, Cust_Mid_Name, Cust_First_Name and Cust_Email are not stored in the Customer_Fixed_Deposit file. Getting this information about the customer along with his fixed deposit details would then require reading two files, which increases overhead. It may thus be preferable to store the information in the Customer_Fixed_Deposit file itself.
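
The inconsistency in the email example can be reproduced with a small sketch. The two "files" are modelled here as in-memory dictionaries keyed by a hypothetical Cust_ID value; the helper name `inconsistent_customers` is also an assumption made for illustration.

```python
# Two hypothetical flat files, held in memory as dicts keyed by Cust_ID.
customer_details = {
    "C1": {"Cust_Email": "Langer_Justin@yahoo.com"},
}
customer_fixed_deposit = {
    "C1": {"Cust_Email": "Langer_Justin@yahoo.com", "Amount_in_Dollars": 5000},
}

def inconsistent_customers(details, deposits):
    """Return the Cust_IDs whose email differs between the two files."""
    return [
        cust_id
        for cust_id, record in details.items()
        if cust_id in deposits
        and deposits[cust_id]["Cust_Email"] != record["Cust_Email"]
    ]

# The email is changed in one file only; the other copy goes stale.
customer_details["C1"]["Cust_Email"] = "Langer_Justin@rediffmail.com"
stale = inconsistent_customers(customer_details, customer_fixed_deposit)
print(stale)
```

Nothing in the file system itself detects the stale copy; a check like this has to be written and run by the application.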
Data Isolation: Data isolation means that related data is not all available in one file. Generally, the data is scattered across various files, and the files may be in different formats, so writing new application programs to retrieve the appropriate data is difficult.
Program/Data Dependence: Under the traditional file approach, application programs are dependent on the master and transaction file(s) and vice versa. A change in the physical format of the master file(s), such as the addition of a data field, must be made in all the application programs that access the master file. Consequently, for each application program that a programmer writes or maintains, the programmer must be concerned with data management. There is no centralized[3] execution of the data management functions; data management is scattered among all the application programs.
Example: Consider the banking system. A master file, Customer_Fixed_Deposit, holds details about the customers' fixed deposit accounts. A customer's fixed deposit record is described as follows:
Cust_ID
Cust_Last_Name
Cust_Mid_Name
Cust_First_Name
Cust_Email
Fixed_Deposit_No
Amount_in_Dollars
Rate_of_Interest_in_Percent
An application program is available to display all the details about the fixed deposit accounts of all the customers. Assume a new data field, Fixed_Deposit_Maturity_Date, is added to the master file. Because the application program depends on the master file, it also needs to be altered.
If the physical format of the master/transaction file (for example the field delimiter or the record delimiter) is changed, the application programs that depend on it must also be altered.
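
A minimal sketch of this dependence, assuming a comma-delimited record and assuming (the slides do not say where) that the new maturity date field is placed before the amount. The function `read_amount` stands in for any application program that hard-codes field positions.

```python
# Record in the original eight-field layout listed above.
OLD_RECORD = "C1,Langer,S.,Justin,Langer_Justin@yahoo.com,FD9,5000,6.5"
# Assumed new layout: Fixed_Deposit_Maturity_Date inserted before the amount.
NEW_RECORD = "C1,Langer,S.,Justin,Langer_Justin@yahoo.com,FD9,2030-01-01,5000,6.5"

def read_amount(record):
    """Application code that hard-codes the position of Amount_in_Dollars."""
    return float(record.split(",")[6])

old_amount = read_amount(OLD_RECORD)

# The same program fails on the new layout: field 6 is now a date.
try:
    read_amount(NEW_RECORD)
    broke = False
except ValueError:
    broke = True

print(old_amount, broke)
```

The program fails even though the amount itself never changed, which is exactly why every program reading the master file has to be revisited after a format change.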
Lack of Flexibility: Traditional systems can retrieve information only for predetermined requests. If the management needs unanticipated data, the information can perhaps be provided if it is in the files of the system. However, extensive programming is required, which may delay the information. By the time it is made available, it may no longer be required or useful.
Example: Consider the banking system. An application program is available to generate a list of customer names in a particular area of the city. The bank manager requires a list of customer names having an account balance greater than $10,000.00 and residing in a particular area of the city. An application program for this purpose does not exist. The bank manager has two choices:
1. Print the list of customer names in a particular area of the city and then manually pick out those with an account balance greater than $10,000.00.
2. Hire an application programmer to write a program for this purpose.
Both solutions are cumbersome.
Concurrent Access Anomalies: Many traditional systems allow multiple users to access and update the same piece of data simultaneously, but the interaction of concurrent updates may result in inconsistent data. To guard against this possibility, the system must maintain some form of supervision. Supervision is difficult, however, because data may be accessed by many different application programs that were not designed to coordinate with one another.
Example: Consider the banking system. Assume the bank manager is analyzing all the transactions made by the customers. At the same time, a customer accesses his account to make a withdrawal. The account is both read by the bank manager and updated by the customer at the same time. This is called concurrent access. Because the customer's account is being updated at the same time, the bank manager may read an incorrect balance.
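
One well-known form of such an anomaly is the lost update. The sketch below replays a bad interleaving step by step in a single thread so that the result is deterministic; the two "programs" and the amounts are hypothetical.

```python
account = {"balance": 1000}

# Step 1: two programs each read the balance before either one writes.
seen_by_a = account["balance"]
seen_by_b = account["balance"]

# Step 2: each computes a new balance from its own, now stale, copy.
account["balance"] = seen_by_a - 100  # program A withdraws 100
account["balance"] = seen_by_b - 200  # program B withdraws 200, overwriting A

# 300 was withdrawn in total, but only B's withdrawal is reflected.
print(account["balance"])
```

A correct result would be 700. Without supervision by the system, program A's withdrawal is silently lost.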
These difficulties prompted the development of database systems.
[1] Constraints: restriction, limitation
[2] Inconsistency: lacking uniformity or agreement
[3] Centralized: Systems where decision making, flow of data, or the beginning of activities are initiated at the same central point and disseminated to remote points in the chain or organization.

Services provided by a DBMS

•Data management
•Data definition
•Transaction support
•Concurrency control
•Recovery
•Security and integrity
•Utilities: facilities such as data import and export, user management, backup,
performance analysis, logging and audit, and physical storage control

• Now, the DBMS acts as a layer of abstraction on top of the file system.

• Previously, programs interacted with the file system through high-level language
functions, for example the C file-handling functions. To interact with the DBMS,
we use a query language called SQL.
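
As an illustration of what querying through SQL looks like, the bank manager's earlier request (customers in one area with a balance above $10,000) can be sketched with Python's built-in sqlite3 module. The table name, column names and sample rows are assumptions made for this example.

```python
import sqlite3

# In-memory database with a hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, area TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("Justin", "Downtown", 12500.0),
        ("Anna", "Downtown", 800.0),
        ("Raj", "Uptown", 20000.0),
    ],
)

# The ad hoc request is a single query; no new application program is needed.
rows = conn.execute(
    "SELECT name FROM customer WHERE area = ? AND balance > ? ORDER BY name",
    ("Downtown", 10000.0),
).fetchall()
print(rows)
```

The query states what data is wanted; how the rows are located in storage is left to the DBMS.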

TYPES OF DATABASES
Centralized
• All data is stored at a single site.
• Allows for greater control over accessing and updating data.
• Vulnerable to failure as they depend on the availability of resources at the central site.
Example: The account information of customers is stored in a particular branch office of a
bank. This information must be shared across all Automated Teller Machines (ATMs), so that
customers can withdraw money from their accounts. Instead of storing the customer
information in every ATM, it can be stored at a common place (the branch office of
the bank) and shared over a network.

Distributed
• The database is stored on several computers – ranging from personal computers to
mainframe systems.
• Computers in a distributed system communicate with one another through various
communication media, such as high speed networks or telephone lines.
• The sites of a distributed database are geographically separated.
• The sites are separately administered.
• The interconnection between sites is slower than the connections within a single site.
Example: Consider the banking system. The bank's head office is located in Chicago and the
branch offices are in Melbourne and Tokyo. The bank database is distributed across the
branch offices. The branch offices are connected through a network.

In the above figure, the three levels of the DBMS architecture are depicted. The External
view is how the customer, Jack, views the data.
The Conceptual view is how the DBA views it.
The Internal view is how the data is actually stored.
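
The three levels can be loosely illustrated with sqlite3, treating a base table as the conceptual level and a SQL view as one external view. This is a sketch under assumed table, column and view names; the internal level is whatever storage format SQLite uses and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical table, as the DBA would see it.
conn.execute("CREATE TABLE account (cust_name TEXT, account_no TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("Jack", "A-1", 500.0), ("Jill", "A-2", 900.0)],
)

# External level: a view that exposes only Jack's own data.
conn.execute(
    "CREATE VIEW jack_account AS "
    "SELECT account_no, balance FROM account WHERE cust_name = 'Jack'"
)

jack_rows = conn.execute("SELECT * FROM jack_account").fetchall()
print(jack_rows)
```

Jack queries the view without knowing that other customers' rows exist or how the table is laid out on disk.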

•The DBA is a key person and takes care of most administrative tasks, as mentioned in the slide.

•Database designers design the database elements.

•Application programmers make use of the various database elements and write programs to retrieve data from
them.

•End users use the DBMS.

1. Users and application programs need not know exactly where or how the data is
stored in order to access it.

2. Proper database design can reduce or eliminate data redundancy and confusion.

3. Unforeseen (ad hoc) information requests are better supported, which gives greater
flexibility.

4. Data can be more effectively shared between users and/or application programs.

5. Data can be stored for long-term analysis (data warehousing).

Data: all the single or atomic items saved in the database.

Data relationship: a situation that exists between two relational database tables
when one table has a foreign key that references the primary key of the other
table. Relationships allow relational databases to split and store data in
different tables while still linking disparate data items.

Data semantics: a way of structuring data so that it is represented in a specific
logical way.

Consistency constraints: a consistency constraint specifies a relationship among two
or more data items of the database.
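
A consistency constraint between two data items can be expressed, for example, as a SQL CHECK constraint. The sketch below uses sqlite3; the table and column names are assumptions, and the dates are stored as ISO-format text so that plain comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fixed_deposit (
        fixed_deposit_no TEXT PRIMARY KEY,
        opening_date     TEXT NOT NULL,
        maturity_date    TEXT NOT NULL,
        -- Constraint relating two data items of the same row.
        CHECK (maturity_date > opening_date)
    )
    """
)

# A row that satisfies the constraint is accepted.
conn.execute("INSERT INTO fixed_deposit VALUES ('FD1', '2024-01-01', '2026-01-01')")

# A deposit that matures before it opens is rejected by the DBMS.
try:
    conn.execute("INSERT INTO fixed_deposit VALUES ('FD2', '2024-01-01', '2023-01-01')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The rule is declared once in the database rather than being re-implemented in every application program.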

Commercial Packages

•Hierarchical Model – an example is IMS (Information Management System)

•Network Model – an example is IDMS (Integrated Data Management System)

•Relational Model – a few examples are Oracle and DB2 (Database 2)

Record based data model – Hierarchical data model

Organizes the data in a Tree Structure

There is a hierarchy of parent and child segments

Data is represented by a collection of record types

A child segment cannot have more than one parent

E.g.: Information Management System (IMS) from IBM
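
The single-parent restriction can be sketched as a small tree class. This is not how IMS is implemented; the class name `Segment` and the bank/branch/customer segments are hypothetical, chosen only to show the rule.

```python
class Segment:
    """A node in a hierarchical (tree-structured) data model."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # The hierarchical model allows at most one parent per segment.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

bank = Segment("Bank")
chicago = bank.add_child(Segment("Chicago branch"))
tokyo = bank.add_child(Segment("Tokyo branch"))
customer = chicago.add_child(Segment("Customer C1"))

# Attaching the same customer under a second branch is not allowed.
try:
    tokyo.add_child(customer)
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False

print(second_parent_allowed)
```

A customer who banks at two branches would have to be stored twice in such a tree, which is one reason the network model allows records to be linked more freely.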

Record based data model – Network data model

Data in the network model is represented by a collection of records

Relationships among data are represented by links (Pointers)

The records in the database are organized as a collection of graphs

E.g.: Integrated Data Management System (IDMS), originally marketed by Cullinane (later Cullinet)

•Though data is viewed logically as two-dimensional tables, it is
physically stored in files managed by the file system.

•The RDBMS provides an abstraction on top of the file system and gives the illusion
that data resides in the form of tables.

Candidate Key
A minimal attribute, or group of attributes, that is sufficient to distinguish every tuple in the relation from every other one.

A candidate key is a set of attributes that uniquely identifies a row, such that no
proper subset of it identifies a row uniquely.
For example, in a shipment table, “S#, P#” is a candidate key, because S# alone or P# alone would
not uniquely identify a row of the shipment table.
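
The uniqueness and minimality conditions can be checked mechanically against sample rows, as sketched below. Keys are really a property of the table's design, so sample data can only rule candidates out, never prove them. The shipment rows and the Qty attribute are made up for illustration.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that are unique in these rows."""
    keys = []
    # Smaller sets are tried first, so any superset of a key found earlier
    # can be skipped as non-minimal.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_superkey(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

# Hypothetical shipment rows: supplier number, part number, quantity.
shipment = [
    {"S#": "S1", "P#": "P1", "Qty": 300},
    {"S#": "S1", "P#": "P2", "Qty": 300},
    {"S#": "S2", "P#": "P1", "Qty": 300},
    {"S#": "S2", "P#": "P2", "Qty": 200},
]

keys = candidate_keys(shipment, ["S#", "P#", "Qty"])
print(keys)
```

In this sample, the full attribute set (S#, P#, Qty) also passes `is_superkey`, but it is not returned because it contains the smaller key (S#, P#).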

Primary key

The candidate key that is chosen to perform the identification task is called the
primary key.
Every tuple must have, by definition, a unique value for its primary key. A primary
key that is a combination of more than one attribute is called a composite primary
key.

Superkey
Any superset of a candidate key is a superkey.
OR
An attribute, or group of attributes, that is sufficient to distinguish every tuple in the relation from every other one. Unlike a candidate key, a superkey need not be minimal.

For example, in a shipment table, “SupplierNo, PartNo” is a superkey. If SupplierNo alone or
PartNo alone uniquely identified a row of the shipment table, the pair would still be a
superkey but would not be a candidate key, because it would not be minimal.

Each candidate key is a super key.


Overlapping candidate keys: Two candidate keys overlap if they have any attribute in common. For example, in the above Customer table,
(Cust_Id, Account_No) and (Emailid, Account_No) are two overlapping candidate keys, because they have Account_No in common.


Foreign key
•Usually a foreign key is a “copy” of a primary key that has been exported from one relation into another to represent the
existence of a relationship between them.
•Foreign key values do not (usually) have to be unique.
•Foreign keys can also be null.
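
These three properties can be observed with sqlite3, as in the sketch below. The customer and account tables are assumptions made for the example. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other database systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customer (cust_id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    " cust_id TEXT REFERENCES customer(cust_id))"
)

conn.execute("INSERT INTO customer VALUES ('C1')")
conn.execute("INSERT INTO account VALUES ('A1', 'C1')")
conn.execute("INSERT INTO account VALUES ('A2', 'C1')")  # same foreign key value again
conn.execute("INSERT INTO account VALUES ('A3', NULL)")  # null foreign key is accepted

# A value with no matching primary key in customer is rejected.
try:
    conn.execute("INSERT INTO account VALUES ('A4', 'C9')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

account_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(dangling_rejected, account_count)
```

Two accounts share the value 'C1' and one has no customer at all, yet an account pointing at a non-existent customer cannot be stored.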
