
DBMS UNIT -1

INTRODUCTION:

A database is an organized collection of data, so that it can be easily accessed and managed.

You can organize data into tables, rows, columns, and index it to make it easier to find relevant
information.

A database is set up so that a single set of software programs provides data access to all
users.

The main purpose of a database is to handle a large amount of information by storing, retrieving,
and managing data.

Many dynamic websites on the World Wide Web are backed by databases. For example, a website that
checks the availability of rooms in a hotel is a dynamic website that uses a database.

There are many databases available like MySQL, Sybase, Oracle, MongoDB, Informix, PostgreSQL,
SQL Server, etc.

Modern databases are managed by the database management system (DBMS).

SQL, or Structured Query Language, is used to operate on the data stored in a database. SQL is based
on relational algebra and tuple relational calculus.

In diagrams, a database is conventionally drawn as a cylinder.

Evolution of DATABASE:
Databases have evolved for more than 50 years, from flat-file systems to relational and object-relational
systems, passing through several generations.

The Evolution

File-Based

File-based systems were introduced in 1968. In a file-based system, data is maintained in flat files.
Files have some advantages, but also several limitations.
One major advantage is that the file system offers various access methods, e.g., sequential, indexed, and
random.

A major limitation is that it requires extensive programming in a third-generation language such as COBOL or BASIC.

Hierarchical Data Model

1968-1980 was the era of the hierarchical database. The prominent hierarchical database model was IBM's first
DBMS, called IMS (Information Management System).

In this model, files are related in a parent/child manner.

[Figure: Hierarchical data model. Small circles represent objects.]

Like the file system, this model had limitations: complex implementation, lack of structural
independence, difficulty handling many-to-many relationships, etc.

Network data model

Charles Bachman developed the first DBMS, called Integrated Data Store (IDS), at General Electric (whose
computer business Honeywell later acquired). It was developed in the early 1960s, and the network model was
standardized in 1971 by the CODASYL group (Conference on Data Systems Languages).

In this model, files are related as owners and members, similar to a common network model.

Network data model identified the following components:


o Network schema (Database organization)
o Sub-schema (views of database per user)
o Data management language (procedural)

This model also had limitations: the system was complex and difficult to design and maintain.

Relational Database

1970 - Present: It is the era of Relational Database and Database Management. In 1970, the relational model
was proposed by E.F. Codd.

The relational database model has two main terms: instance and schema.

An instance is a table with rows and columns.

A schema specifies the structure, such as the name of the relation and the name and type of each column.

This model uses mathematical concepts such as set theory and predicate logic.

The first internet database application was created in 1995.

During the era of the relational database, many more models were introduced, such as the object-oriented model,
the object-relational model, etc.

Cloud database

A cloud database lets you store, manage, and retrieve structured and unstructured data via a cloud
platform. This data is accessible over the Internet. Cloud databases are also called database as a service
(DBaaS) because they are offered as a managed service.

Some popular cloud options are:

o AWS (Amazon Web Services)
o Snowflake Computing
o Oracle Database Cloud Services
o Microsoft SQL Server
o Google Cloud Spanner

Advantages of cloud database

Lower costs

Generally, a company does not have to invest in its own database infrastructure. The cloud provider maintains
and supports one or more data centers.

Automated

Cloud databases are enriched with a variety of automated processes such as recovery, failover, and
auto-scaling.

Increased accessibility

You can access your cloud-based database from any location, anytime. All you need is just an internet
connection.

Purpose of DB:
In the early days, database applications were built on top of file systems.

Drawbacks of using file systems to store data:
● Data redundancy and inconsistency
   o Multiple file formats, duplication of information in different files
● Difficulty in accessing data
   o Need to write a new program to carry out each new task
● Data isolation: multiple files and formats
● Integrity problems
   o Integrity constraints (e.g. account balance > 0) become part of program code
   o Hard to add new constraints or change existing ones
● Atomicity of updates
   o Failures may leave the database in an inconsistent state with partial updates carried out
   o E.g. a transfer of funds from one account to another should either complete or not happen at all
● Concurrent access by multiple users
   o Concurrent access is needed for performance
   o Uncontrolled concurrent accesses can lead to inconsistencies, e.g. two people reading a balance
     and updating it at the same time
● Security problems

Database systems offer solutions to all the above problems.

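As an illustration of the integrity point, the sketch below declares the rule "account balance > 0" once in the schema, so the DBMS enforces it instead of every application program. It is only a minimal sketch using Python's built-in sqlite3 module; the table name account and the helper names open_bank and try_withdraw are hypothetical and chosen for this example.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical 'account' table."""
    conn = sqlite3.connect(":memory:")
    # The integrity constraint lives in the schema, not in application code.
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance > 0))"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def try_withdraw(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Return True if the withdrawal is accepted, False if the DBMS rejects it."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint was violated; undo the attempted change.
        conn.rollback()
        return False
```

Changing the rule later means changing one line of the table definition rather than hunting through every program that updates balances.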
DATA MODELS:

A data model describes the data, its semantics, and its consistency constraints. It provides the
conceptual tools for describing the design of a database at each level of data abstraction. The
following four data models are used for understanding the structure of the database:

1) Relational Data Model: This model organizes data in rows and columns within tables. Thus, a
relational model uses tables to represent both the data and the relationships among the data. Tables
are also called relations. This model was first described by Edgar F. Codd in 1969. The relational
data model is the most widely used model, primarily in commercial data processing applications.

2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects
and relationships among them. These objects are known as entities, and a relationship is an
association among these entities. This model was designed by Peter Chen and published in a 1976
paper. It is widely used in database design. A set of attributes describes each entity. For
example, student_name and student_id describe the 'student' entity. A set of entities of the same
type is known as an 'entity set', and a set of relationships of the same type is known as a
'relationship set'.

3) Object-based Data Model: An extension of the ER model with notions of functions,
encapsulation, and object identity. This model supports a rich type system that includes
structured and collection types. In the 1980s, various database systems following the
object-oriented approach were developed. Here, an object is the data together with its
properties.

4) Semistructured Data Model: This type of data model differs from the other three data
models (explained above). The semistructured data model permits data specifications in which
individual data items of the same type may have different sets of attributes. The Extensible
Markup Language, also known as XML, is widely used for representing semistructured data.
Although XML was initially designed for adding markup information to text documents, it
gained importance because of its application in the exchange of data.
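
To make "items of the same type with different attribute sets" concrete, here is a small sketch that parses an XML fragment with Python's standard xml.etree.ElementTree module. The document content (two student elements, one with an extra phone element) and the function name attribute_sets are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: both items are <student>, but only the second has <phone>.
DOC = """
<students>
  <student><id>1</id><name>Asha</name></student>
  <student><id>2</id><name>Ravi</name><phone>555-0100</phone></student>
</students>
"""


def attribute_sets(xml_text: str) -> list[list[str]]:
    """Return the sorted child-element names of each <student> item."""
    root = ET.fromstring(xml_text.strip())
    return [
        sorted(child.tag for child in student)
        for student in root.findall("student")
    ]
```

A relational table would need a phone column (possibly NULL) for every row, whereas here each item simply carries the attributes it has.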
Database Language:
o A DBMS has appropriate languages and interfaces to express database queries and updates.
o Database languages can be used to read, store and update the data in the database.

Types of Database Language

1. Data Definition Language

o DDL stands for Data Definition Language. It is used to define database structure or pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of tables and
schemas, their names, indexes, columns in each table, constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to add comments to the data dictionary.

These commands are used to update the database schema, which is why they come under Data
definition language.

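The DDL tasks above can be tried out with a short sketch. It uses Python's built-in sqlite3 module, so it covers only the statements SQLite accepts (CREATE, ALTER, RENAME, DROP); SQLite has no TRUNCATE or COMMENT statement, so those are left out. The table names student and learner and the helper names are hypothetical.

```python
import sqlite3


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    """List a table's columns; PRAGMA table_info puts the name at index 1."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def table_names(conn: sqlite3.Connection) -> list[str]:
    """List user tables recorded in SQLite's catalog (its data dictionary)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def run_ddl_demo() -> tuple[list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    # Create: define a new table.
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Alter: change the structure by adding a column.
    conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
    # Rename: give the table a new name.
    conn.execute("ALTER TABLE student RENAME TO learner")
    columns = column_names(conn, "learner")
    # Drop: remove the table and its definition.
    conn.execute("DROP TABLE learner")
    remaining = table_names(conn)
    conn.close()
    return columns, remaining
```

Note that every statement changes the schema (the metadata), not the rows stored in the table.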
2. Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a
database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table (all of them if no condition is given).
o Merge: It performs an UPSERT operation, i.e., an insert or update operation.
o Call: It is used to call a PL/SQL or Java subprogram.
o Explain Plan: It describes the access path (execution plan) chosen for a statement.
o Lock Table: It controls concurrency.
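
The four most common DML statements can be sketched as follows with Python's built-in sqlite3 module. The student table, its rows, and the function name run_dml_demo are assumptions made for this example. Note that DELETE with a WHERE clause removes only the matching rows.

```python
import sqlite3


def run_dml_demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
    )
    # Insert: add rows to the table.
    conn.executemany(
        "INSERT INTO student (id, name, marks) VALUES (?, ?, ?)",
        [(1, "Asha", 72), (2, "Ravi", 65), (3, "Meena", 88)],
    )
    # Update: change existing data in matching rows.
    conn.execute("UPDATE student SET marks = 70 WHERE id = 2")
    # Delete: remove only the rows that satisfy the condition.
    conn.execute("DELETE FROM student WHERE id = 3")
    # Select: retrieve the current contents.
    rows = conn.execute(
        "SELECT id, name, marks FROM student ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```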

3. Data Control Language

o DCL stands for Data Control Language. It is used to control access to the data stored in the database.
o The DCL execution is transactional. It also has rollback parameters.

(But in Oracle database, the execution of data control language does not have the feature of
rolling back.)

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.
o Revoke: It is used to take back permissions from the user.

The following privileges can be granted and revoked:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.

4. Transaction Control Language


TCL is used to manage the changes made by DML statements. It allows statements to be grouped into
logical transactions.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.
o Rollback: It is used to restore the database to its state as of the last Commit.

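Commit and Rollback can be seen in action with a funds transfer, which must either complete fully or not happen at all. The sketch below uses Python's built-in sqlite3 module, whose connection objects expose commit() and rollback(); the account table and the helper names make_accounts, balances, and transfer are hypothetical.

```python
import sqlite3


def make_accounts() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def balances(conn: sqlite3.Connection) -> dict[int, int]:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move amount from src to dst; both updates persist or neither does."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (remaining,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if remaining < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.commit()      # Commit: make both updates permanent.
        return True
    except ValueError:
        conn.rollback()    # Rollback: return to the state of the last commit.
        return False
```

After a failed transfer the debit that was already applied is undone, so the balances are exactly what they were at the last commit.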
TRANSACTION MANAGEMENT:
Transactions in DBMS :
A transaction is a set of operations that performs a logical unit of work. A transaction
usually means that the data in the database has changed. One of the major uses of a DBMS is to
protect the user’s data from system failures. It does so by ensuring that all the data is restored
to a consistent state when the computer is restarted after a crash. A transaction is any one
execution of a user program in a DBMS. Executing the same program multiple times
generates multiple transactions.
Example –
A transaction performed to withdraw cash from an ATM vestibule.

Set of Operations :
Consider the following example of transaction operations.

Example - ATM transaction steps.


• Transaction Start.
• Insert your ATM card.
• Select language for your transaction.
• Select Savings Account option.
• Enter the amount you want to withdraw.
• Enter your secret pin.
• Wait for some time for processing.
• Collect your Cash.
• Transaction Completed.

Three operations can be performed in a transaction as follows.
1. Read/Access data (R).
2. Write/Change data (W).
3. Commit.

Transaction States :
Transactions can be implemented using SQL queries on a database server. A transaction passes
through the following states.

Active state

o The active state is the first state of every transaction. In this state, the transaction is being executed.
o For example: insertion, deletion, or updating of a record is done here, but the records are not yet
saved to the database.

Partially committed

o In the partially committed state, a transaction executes its final operation, but the data is still not saved
to the database.
o For example, in a transaction that calculates a student's total marks, the final step that displays the
total is executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all
the effects are now permanently saved on the database system.

Failed state

o If any of the checks made by the database recovery system fails, then the transaction is said to be in
the failed state.
o For example, in a total marks calculation, if the database is not able to run the query that fetches the marks,
then the transaction fails to execute.

Aborted

o If any of the checks fail and the transaction has reached a failed state, the database recovery
system makes sure that the database is in its previous consistent state. If it is not, the system aborts or
rolls back the transaction to bring the database into a consistent state.
o If the transaction fails partway through, all the operations it has already executed are rolled
back so that the database returns to its consistent state.
o After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction
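
The five states and the moves between them can be summarized as a small state machine. This is only an illustrative sketch in Python: the names State, ALLOWED, and advance are hypothetical, and the transition table follows the descriptions above (active to partially committed or failed, partially committed to committed or failed, failed to aborted).

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"


# Which states may follow each state. COMMITTED and ABORTED are terminal:
# a restarted transaction begins again as a new ACTIVE transaction.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),
    State.ABORTED: set(),
}


def advance(current: State, target: State) -> State:
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

For example, a successful transaction follows ACTIVE, PARTIALLY_COMMITTED, COMMITTED, while one that fails during execution follows ACTIVE, FAILED, ABORTED.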

Uses of Transaction Management :


• The DBMS is used to schedule concurrent access to data. It means that users
can access data in the database at the same time without interfering with each other.
Transactions are used to manage concurrency.
• It is also used to satisfy ACID properties.
• It is used to solve Read/Write Conflict.
• It is used to implement Recoverability, Serializability, and Cascading.
• Transaction Management is also used for Concurrency Control Protocols and
Locking of data.

Disadvantage of using a Transaction :


• It may be difficult for end users to change the information within the transaction database.
• We always need to roll back and start from the beginning rather than continue from
the previous state.

STORAGE MANAGEMENT:
Storage System in DBMS
A database system provides an abstract view of the stored data. However, the data, in the form of bits and bytes, is stored on
different storage devices.

In this section, we will take an overview of various types of storage devices that are used for accessing and storing data.

Types of Data Storage

For storing the data, there are different types of storage options available. These storage types differ from one another
as per the speed and accessibility. There are the following types of storage devices used for storing the data:

o Primary Storage

o Secondary Storage

o Tertiary Storage

Primary Storage

It is the primary area that offers quick access to the stored data. Primary storage is also known as volatile storage,
because this type of memory does not store data permanently. As soon as the system suffers a power cut or a
crash, the data is lost. Main memory and cache are the types of primary storage.

o Main Memory: It is the storage medium used for data that is available to be operated on.
The main memory handles each instruction of a computer machine. This type of memory can store gigabytes of
data on a system but is generally too small to hold the entire database. Finally, the main memory loses its whole
content if the system shuts down because of power failure or other reasons.

o Cache: It is one of the costliest storage media, but it is also the fastest. A cache is a tiny storage
medium that is usually maintained by the computer hardware. While designing algorithms and query
processors for data structures, designers take cache effects into account.

Secondary Storage

Secondary storage is also called online storage. It is the storage area that allows the user to save and store data
permanently. This type of memory does not lose the data due to any power failure or system crash. That's why we also
call it non-volatile storage.

There are some commonly described secondary storage media which are available in almost every type of computer
system:

o Flash Memory: Flash memory stores data in devices such as USB (Universal Serial Bus) keys, which are plugged into
the USB slots of a computer system. These USB keys help transfer data to a computer system, and they vary in
capacity. Unlike main memory, flash memory retains the stored data through a power cut or other
failures. This type of storage is also commonly used in server systems for caching
frequently used data. This improves system performance, and flash memory can store larger amounts
of data than the main memory.

o Magnetic Disk Storage: This type of storage media is also known as online storage media. A magnetic disk is
used for storing data for a long time. It is capable of storing an entire database. The
computer system must move data from disk into main memory before it can be accessed. Also, if
the system performs any operation on the data, the modified data must be written back to the disk. A
major strength of a magnetic disk is that its data survives a system crash or power failure, but
a disk failure can destroy the stored data.

Tertiary Storage

It is the storage type that is external to the computer system. It has the slowest speed, but it is capable of storing a
large amount of data. It is also known as offline storage. Tertiary storage is generally used for data backup. The
following tertiary storage devices are available:

o Optical Storage: An optical storage can store megabytes or gigabytes of data. A Compact Disk (CD) can store
700 megabytes of data with a playtime of around 80 minutes. On the other hand, a Digital Video Disk or a DVD
can store 4.7 or 8.5 gigabytes of data on each side of the disk.

o Tape Storage: Tape is a cheaper storage medium than disk. Generally, tapes are used for archiving or backing
up data. Tape provides slow access to data because it must be read sequentially from the start. Thus, tape storage is
also known as sequential-access storage. Disk storage is known as direct-access storage because data can be accessed
directly from any location on disk.

Storage Hierarchy

Besides the above, various other storage devices reside in the computer system. Storage media are organized on
the basis of data access speed, cost per unit of data, and reliability. Thus, we can
arrange the storage media in a hierarchy on the basis of cost and speed.

Arranged from fastest and most expensive per unit of data to slowest and cheapest, the storage media described
above are: cache, main memory, flash memory, magnetic disk, optical disk, and magnetic tape.

DATA BASE ARCHITECTURE:


DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture is used to deal
with a large number of PCs, web servers, database servers and other components that are connected
over networks.
o The client/server architecture consists of many PCs and workstations which are connected via the
network.
o DBMS architecture depends upon how users are connected to the database to get their requests done.

Types of DBMS Architecture

Database architecture can be seen as single-tier or multi-tier. Logically, multi-tier database architecture is of two
types: 2-tier architecture and 3-tier architecture.

1-Tier Architecture

o In this architecture, the database is directly available to the user. It means the user works directly
on the DBMS.
o Any changes done here are made directly on the database itself. It doesn't provide a handy tool
for end users.
o The 1-Tier architecture is used for development of local applications, where programmers can
communicate directly with the database for a quick response.

2-Tier Architecture

o The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier architecture,
applications on the client end can directly communicate with the database at the server side. For this
interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs are run on the client side.
o The server side is responsible for providing functionalities such as query processing and transaction
management.
o To communicate with the DBMS, the client-side application establishes a connection with the server side.

[Figure: 2-tier architecture and 3-tier architecture]

3-Tier Architecture

o The 3-Tier architecture contains another layer between the client and server. In this architecture, client can't
directly communicate with the server.

o The application on the client-end interacts with an application server which further communicates with the
database system.

o The end user has no idea about the existence of the database beyond the application server. The database also has
no idea about any user beyond the application.

o The 3-Tier architecture is used for large web applications.
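
The separation of the three tiers can be sketched in a single Python file, reusing the hotel room availability idea. This is an in-process illustration only: in a real deployment each tier would normally run as a separate program connected over a network. The class and function names (RoomDatabase, BookingService, client_view) and the room data are hypothetical.

```python
import sqlite3


class RoomDatabase:
    """Data tier: the only code that talks to the database."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE room (number INTEGER PRIMARY KEY, is_free INTEGER NOT NULL)"
        )
        self._conn.executemany(
            "INSERT INTO room VALUES (?, ?)", [(101, 1), (102, 0), (103, 1)]
        )

    def free_rooms(self) -> list[int]:
        rows = self._conn.execute(
            "SELECT number FROM room WHERE is_free = 1 ORDER BY number"
        )
        return [row[0] for row in rows]


class BookingService:
    """Application tier: business logic between the client and the database."""

    def __init__(self, db: RoomDatabase) -> None:
        self._db = db

    def availability_message(self) -> str:
        free = self._db.free_rooms()
        if not free:
            return "No rooms available"
        return "Rooms available: " + ", ".join(str(n) for n in free)


def client_view(service: BookingService) -> str:
    """Client tier: sees only the application server, never the database."""
    return service.availability_message()
```

The client function never receives a database connection or writes SQL, which mirrors the rule that the client cannot communicate directly with the database server.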


DATA BASE USERS:
Different types of Database Users
Database users are categorized based on their interaction with the database.
There are seven types of database users in a DBMS.
1. Database Administrator (DBA) :
Database Administrator (DBA) is a person/team who defines the schema and also controls
the 3 levels of the database.
• The DBA creates a new account ID and password for a user who needs to
access the database.
• The DBA is also responsible for providing security to the database and allows only
authorized users to access/modify it.
• The DBA also monitors backup and recovery and provides technical support.
• The DBA has a DBA account in the DBMS, which is called a system or superuser
account.
• The DBA repairs damage caused by hardware and/or software failures.

2. Naive / Parametric End Users :


Parametric end users are unsophisticated users who don’t have any DBMS knowledge but
frequently use database applications in their daily life to get the desired results.
For example, people booking railway tickets are naive users. Clerks in a bank are naive
users because they don’t have any DBMS knowledge but still use the database to
perform their given tasks.

3. System Analyst :
System Analyst is a user who analyzes the requirements of parametric end users. They
check whether all the requirements of end users are satisfied.

4. Sophisticated Users :
Sophisticated users can be engineers, scientists, or business analysts who are familiar with the
database. They can develop their own database applications according to their
requirements. They don’t write program code, but they interact with the database by writing
SQL queries directly through the query processor.

5. Data Base Designers :


Database designers are the users who design the structure of the database, which includes
tables, indexes, views, constraints, triggers, and stored procedures. They control what data
must be stored and how the data items are to be related.

6. Application Programmers :
Application programmers are the back-end programmers who write the code for the
application programs. They are computer professionals. These programs could be
written in programming languages such as Visual Basic, Developer, C, FORTRAN, COBOL,
etc.

7. Casual Users / Temporary Users :


Casual users are the users who occasionally use/access the database, but each time
they access it they require new information, for example, middle or higher
level managers.

OVER ALL DB SYSTEM ARCHITECTURE :

1. Database systems are partitioned into modules for different functions. Some functions (e.g. file systems)
may be provided by the operating system.
2. Components include:
o File manager manages allocation of disk space and data structures used to represent information on
disk.
o Database manager: The interface between low-level data and application programs and queries.
o Query processor translates statements in a query language into low-level instructions the database
manager understands. (May also attempt to find an equivalent but more efficient form.)
o DML precompiler converts DML statements embedded in an application program to normal
procedure calls in a host language. The precompiler interacts with the query processor.
o DDL compiler converts DDL statements to a set of tables containing metadata stored in a data
dictionary.

In addition, several data structures are required for physical system implementation:

o Data files: store the database itself.
o Data dictionary: stores information about the structure of the database. It is used heavily. Great
emphasis should be placed on developing a good design and efficient implementation of the
dictionary.
o Indices: provide fast access to data items holding particular values.

LEVEL OF ABSTRACTION:
Data Abstraction and Data Independence

Database systems comprise complex data structures. To make the system efficient in terms of data retrieval
and to reduce complexity for users, developers use abstraction, i.e. they hide irrelevant
details from the users. This approach simplifies database design.

There are mainly 3 levels of data abstraction:

Physical: This is the lowest level of data abstraction. It tells us how the data is actually stored in memory.
Access methods such as sequential or random access and file organization methods such as B+ trees and hashing are
used at this level. Usability, size of memory, and the number of times the records are accessed are factors that we
need to know while designing the database.

Suppose we need to store the details of an employee. Blocks of storage and the amount of memory used for
these purposes are kept hidden from the user.

Logical: This level comprises the information that is actually stored in the database in the form of tables. It also
stores the relationship among the data entities in relatively simple structures. At this level, the information
available to the user at the view level is unknown.

We can store the various attributes of an employee, and relationships, e.g. with the manager, can also be stored.

View: This is the highest level of abstraction. Only a part of the actual database is viewed by the users. This
level exists to ease the accessibility of the database by an individual user. Users view data in the form of rows
and columns. Tables and relations are used to store data. Multiple views of the same database may exist. Users
can just view the data and interact with the database; storage and implementation details are hidden from
them.

The main purpose of data abstraction is to achieve data independence in order to save time and cost required
when the database is modified or altered.

There are two levels of data independence arising from these levels of abstraction:

Physical level data independence: It refers to the ability to modify the physical schema
without any alterations to the conceptual or logical schema, usually for optimization purposes. For example, the
conceptual structure of the database is not affected by a change in the storage size of the database system server.
Changing from sequential to random access files is another such example. These alterations or modifications to the
physical structure may include:
• Utilizing new storage devices.
• Modifying data structures used for storage.
• Altering indexes or using alternative file organization techniques etc.

Logical level data independence: It refers to the ability to modify the logical schema without
affecting the external schema or application programs. The user view of the data is not affected by
changes to the conceptual view of the data. These changes may include insertion or deletion of attributes,
altering table structures, or adding entities or relationships to the logical schema, etc.

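Both kinds of data independence can be observed with a view. In the sketch below (Python's built-in sqlite3 module; the employee table, the employee_public view, and the function names are hypothetical), a user-level view keeps returning the same rows after a logical change (a new attribute) and a physical change (a new index).

```python
import sqlite3


def view_rows(conn: sqlite3.Connection) -> list[tuple]:
    """What a user at the view level sees."""
    return conn.execute(
        "SELECT name, dept FROM employee_public ORDER BY name"
    ).fetchall()


def run_independence_demo() -> tuple[list[tuple], list[tuple]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        " id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [(1, "Asha", "HR", 50000), (2, "Ravi", "IT", 60000)],
    )
    # View level: expose only part of the logical schema.
    conn.execute("CREATE VIEW employee_public AS SELECT name, dept FROM employee")
    before = view_rows(conn)

    # Logical change: add an attribute to the table.
    conn.execute("ALTER TABLE employee ADD COLUMN manager_id INTEGER")
    # Physical change: add an index to speed up lookups by department.
    conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

    after = view_rows(conn)
    conn.close()
    return before, after
```

The query against the view is unchanged throughout, which is the practical benefit of separating the levels.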
ER_ MODEL:
ENTITY SET:
A database can be modeled as:

● a collection of entities,
● relationships among entities.

An entity is an object that exists and is distinguishable from other objects.
● Example: specific person, company, event, plant

Entities have attributes.
● Example: people have names and addresses

An entity set is a set of entities of the same type that share the same properties.
● Example: set of all persons, companies, trees, holidays

Entity Sets customer and loan

customer: cust_id, cust_name, cust_street, cust_city
loan: loan_no, amount

COMPONENTS OF ER_DIAGRAM:
1. Entity:

An entity may be any object, class, person or place. In an ER diagram, an entity is represented as a
rectangle.

Consider an organization as an example: manager, product, employee, department, etc. can each be taken as an
entity.

a. Weak Entity

An entity that depends on another entity is called a weak entity. A weak entity doesn't contain any key
attribute of its own. It is represented by a double rectangle.

2. Attribute

An attribute describes a property of an entity. An ellipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key. The
key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The composite
attribute is represented by an ellipse, and the ellipses of its component attributes are connected to it.

c. Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A double
oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.

d. Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute. It is represented
by a dashed ellipse.

For example, a person's age changes over time and can be derived from another attribute such as date of birth.

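The attribute kinds above map naturally onto a small record type. The following Python sketch is illustrative only: the Student class and its field names are hypothetical. It shows a key attribute, a composite name split into parts, a multivalued phone attribute, and an age derived from date of birth rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Student:
    student_id: int        # key attribute: identifies the student
    first_name: str        # composite attribute "name", part 1
    last_name: str         # composite attribute "name", part 2
    date_of_birth: date    # stored attribute
    phone_numbers: list[str] = field(default_factory=list)  # multivalued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (
            self.date_of_birth.month,
            self.date_of_birth.day,
        ):
            years -= 1
        return years
```

Because age is computed on demand, it can never go out of date the way a stored age column could.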
3. Relationship

A relationship is used to describe the relation between entities. Diamond or rhombus is used to represent the
relationship.

Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of each entity is associated with the relationship, it is known as a one-to-one
relationship.

For example, a female can marry one male, and a male can marry one female.

b. One-to-many relationship

When only one instance of the entity on the left and more than one instance of the entity on the right
are associated with the relationship, it is known as a one-to-many relationship.

For example, a scientist can make many inventions, but each invention is made by one specific scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left and only one instance of the entity on the right
are associated with the relationship, it is known as a many-to-one relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left and more than one instance of the entity on the right
are associated with the relationship, it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.

MAPPING CARDINALITY:
o A mapping constraint is a data constraint that expresses the number of entities to which another entity
can be related via a relationship set.
o It is most useful in describing binary relationship sets, although it can also help describe relationship sets
that involve more than two entity sets.
o For a binary relationship set R between entity sets E1 and E2, there are four possible mapping cardinalities.
These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)

One-to-one

In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is
associated with at most one entity in E1.

E.g.: One-to-one relationship:


● A customer is associated with at most one loan via the relationship borrower
● A loan is associated with at most one customer via borrower

One-to-many

In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2
is associated with at most one entity in E1.

E.g.: One-to-Many relationship:


● A loan is associated with at most one customer via borrower
● A customer is associated with several (including 0) loans via borrower

Many-to-one

In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2 is
associated with any number of entities in E1.

E.g.: Many-to-One relationship:

● A loan is associated with several (including 0) customers via borrower
● A customer is associated with at most one loan via borrower

Many-to-many

In many-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity in E2
is associated with any number of entities in E1.

E.g.: Many-to-Many relationship:


● A customer is associated with several (possibly 0) loans via borrower
● A loan is associated with several (possibly 0) customers via borrower
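
The four definitions above can be checked mechanically on a small relationship set. The following is a minimal sketch, assuming the relationship is given as a list of (E1 entity, E2 entity) pairs; the function name mapping_cardinality and the sample customers and loans are made up for illustration. Note that sample data only shows which cardinality the current pairs are consistent with; the actual constraint is a design decision.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (e1, e2) pairs."""
    pairs = set(pairs)  # ignore duplicate pairs
    per_e1 = Counter(e1 for e1, _ in pairs)  # number of E2 partners per E1 entity
    per_e2 = Counter(e2 for _, e2 in pairs)  # number of E1 partners per E2 entity
    e1_at_most_one = all(n <= 1 for n in per_e1.values())
    e2_at_most_one = all(n <= 1 for n in per_e2.values())
    if e1_at_most_one and e2_at_most_one:
        return "one-to-one"
    if e2_at_most_one:
        return "one-to-many"  # an E1 entity may have many E2 partners
    if e1_at_most_one:
        return "many-to-one"  # an E2 entity may have many E1 partners
    return "many-to-many"


# borrower as (customer, loan) pairs: Ann holds two loans, each loan has one customer.
borrower = [("Ann", "L1"), ("Ann", "L2"), ("Bob", "L3")]
print(mapping_cardinality(borrower))  # one-to-many
```
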

KEYS:

o Keys play an important role in the relational database.

o A key is used to uniquely identify any record or row of data in a table. Keys are also used to establish and
identify relationships between tables.

For example: In the STUDENT table, ID is used as a key because it is unique for each student. In the PERSON table,
passport_number, license_number, and SSN are keys since they are unique for each person.

Types of key:

1. Primary key

o The primary key is the key chosen to identify one and only one instance of an entity uniquely. An entity
can have multiple keys, as we saw in the PERSON table. The most suitable of those keys becomes the
primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. We could also
select License_Number or Passport_Number as the primary key since they are also unique.
o For each entity, the selection of the primary key depends on the requirements and on the developers.

2. Candidate key

o A candidate key is a minimal attribute or set of attributes that can uniquely identify a tuple.
o The primary key is chosen from the candidate keys. The candidate keys that are not chosen are as strong
as the primary key.

For example: In the EMPLOYEE table, ID is best suited for the primary key. The other unique attributes, such as
SSN, Passport_Number, and License_Number, are also candidate keys.
3. Super key

A super key is a set of attributes that can uniquely identify a tuple. A super key is a superset of a
candidate key.

For example: In the EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME). The names of two
employees can be the same, but their EMPLOYEE_ID values cannot be the same. Hence, this combination can
also be a key.

The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
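
The difference between a super key and a candidate key can be sketched on sample rows. This is a rough illustration, not a definitive method: the helper names is_super_key and candidate_keys and the three sample employees are hypothetical. Uniqueness in sample data only suggests a key; whether an attribute set really is a key depends on the rules of the real-world domain.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Minimal super keys: no proper subset is also a super key."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller key, so it is not minimal
            if is_super_key(rows, attrs):
                found.append(attrs)
    return found


# Hypothetical sample rows: two employees share a name.
employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha", "SSN": "111"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Asha", "SSN": "222"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Ravi", "SSN": "333"},
]

print(is_super_key(employees, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))  # True
print(is_super_key(employees, ("EMPLOYEE_NAME",)))  # False
print(candidate_keys(employees, ["EMPLOYEE_ID", "EMPLOYEE_NAME", "SSN"]))
```

Here (EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone already identifies each row.
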

4. Foreign key

o A foreign key is a column (or set of columns) of a table that points to the primary key of another table.
o In a company, every employee works in a specific department, and employee and department are two
different entities. So we should not store the information of the department in the employee table.
That's why we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the EMPLOYEE
table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and the two tables are related.
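
The EMPLOYEE and DEPARTMENT link can be tried out with Python's built-in sqlite3 module. This is a small sketch under assumptions: the Name columns, the sample rows, and the helper insert_rejected are invented for illustration, and SQLite is used only because it ships with Python. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE DEPARTMENT (Department_Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    """CREATE TABLE EMPLOYEE (
           ID INTEGER PRIMARY KEY,
           Name TEXT,
           Department_Id INTEGER REFERENCES DEPARTMENT(Department_Id)
       )"""
)
conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Testing')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Asha', 10)")  # department 10 exists


def insert_rejected(connection, row):
    """Try to insert an employee; return True if the database rejects the row."""
    try:
        connection.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False


# Department 99 does not exist, so the foreign key constraint rejects this row.
print(insert_rejected(conn, (2, "Ravi", 99)))  # True
```
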
GENERALIZATION, SPECIALIZATION and AGGREGATION:
Generalization
o Generalization is a bottom-up approach in which two or more lower-level entities combine to
form a higher-level entity if they have some attributes in common.
o In generalization, a higher-level entity can also combine with lower-level entities to
form a further higher-level entity.
o Generalization resembles a subclass and superclass system; the only difference is the approach.
Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
combined to make a superclass.

For example, the Faculty and Student entities can be generalized to create a higher-level entity Person.

Specialization
o Specialization is a top-down approach, and it is the opposite of generalization. In specialization, one
higher-level entity is broken down into two or more lower-level entities.
o Specialization is used to identify a subset of an entity set that shares some distinguishing
characteristics.
o Normally, the superclass is defined first, the subclasses and their related attributes are defined next, and
relationship sets are then added.

For example: In an employee management system, the EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on the role played in the company.
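
Since specialization resembles a superclass and subclass system, the EMPLOYEE example can be sketched with Python classes. This is only an analogy under assumptions: the attributes employee_id, name, test_tool, and programming_language are hypothetical and do not come from the notes.

```python
from dataclasses import dataclass


@dataclass
class Employee:  # higher-level entity (superclass)
    employee_id: int
    name: str


@dataclass
class Tester(Employee):  # lower-level entity with its own extra attribute
    test_tool: str


@dataclass
class Developer(Employee):  # another lower-level entity
    programming_language: str


staff = [Tester(1, "Asha", "Selenium"), Developer(2, "Ravi", "Java")]
# Every specialized entity is still an Employee and inherits its attributes.
print(all(isinstance(s, Employee) for s in staff))  # True
print([type(s).__name__ for s in staff])  # ['Tester', 'Developer']
```
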
Aggregation

In aggregation, the relationship between two entities is treated as a single entity. The relationship, together
with its corresponding entities, is aggregated into a higher-level entity.

For example: The relationship in which the Center entity offers the Course entity acts as a single entity, which
is in turn related to another entity, Visitor. In the real world, a visitor to a coaching center will not enquire
about only the Course or only the Center; instead, he will enquire about both.
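
The coaching center example can be sketched as follows. The class names Offers and Enquiry and the sample values are assumptions made for illustration; the idea is only that an enquiry refers to the Center-offers-Course relationship as a whole, not to a Center or a Course alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Center:
    name: str


@dataclass(frozen=True)
class Course:
    title: str


@dataclass(frozen=True)
class Offers:  # the relationship Center-offers-Course, treated as one entity
    center: Center
    course: Course


@dataclass(frozen=True)
class Enquiry:  # relates a visitor to the aggregated Offers entity
    visitor: str
    offering: Offers


offering = Offers(Center("City Coaching"), Course("DBMS"))
enquiry = Enquiry("Meena", offering)
# The enquiry always carries both the center and the course.
print(enquiry.offering.center.name, enquiry.offering.course.title)  # City Coaching DBMS
```
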

ER Diagram Symbols and Notations:


Alternative E-R Notations:
