Notes of Dbms Unit-2


UNIT-II: - Relational Data Model

1) Concepts of Relation:-

The relational model represents the database as a collection of
relations. Informally, each relation resembles a table of values or, to some
extent, a flat file of records. It is called a flat file because each record has a
simple linear or flat structure.
When a relation is thought of as a table of values, each row in the table
represents a collection of related data values. A row represents a fact that
typically corresponds to a real-world entity or relationship. The table name
and column names help to interpret the meaning of the values in
each row. For example, the first table of the figure below is called STUDENT
because each row represents facts about a particular student entity. The
column names (Name, Student number, Class, and Major) specify how to
interpret the data values in each row, based on the column each value is in.
All values in a column are of the same data type.
In the formal relational model terminology, a row is called a tuple, a column
header is called an attribute, and the table is called a relation. The data type
describing the types of values that can appear in each column is represented
by a domain of possible values. We now define these terms (domain, tuple,
attribute, and relation) formally.

a) Domains, Attributes, Tuples, and Relations


A domain D is a set of atomic values. By atomic we mean that each
value in the domain is indivisible as far as the formal relational model is
concerned. A common method of specifying a domain is to specify a data
type from which the data values forming the domain are drawn. It is also
useful to specify a name for the domain, to help in interpreting its values.
Some examples of domains follow:

■ Usa_phone_numbers. The set of ten-digit phone numbers valid in the
United States.
■ Local_phone_numbers. The set of seven-digit phone numbers valid
within a particular area code in the United States. The use of local phone
numbers is quickly becoming obsolete, being replaced by standard ten-digit
numbers.
■ Social_security_numbers. The set of valid nine-digit Social Security
numbers. (This is a unique identifier assigned to each person in the United
States for employment, tax, and benefits purposes.)
■ Names. The set of character strings that represent names of persons.
■ Grade_point_averages. Possible values of computed grade point
averages; each must be a real (floating-point) number between 0 and 4.
■ Employee_ages. Possible ages of employees in a company; each must be
an integer value between 15 and 80.
■ Academic_department_names. The set of academic department names
in a university, such as Computer Science, Economics, and Physics.
■ Academic_department_codes. The set of academic department codes,
such as ‘CS’, ‘ECON’, and ‘PHYS’.

A relation schema R, denoted by R(A1, A2, ..., An), is made up of a
relation name R and a list of attributes A1, A2, ..., An. A relation schema
is used to describe a relation; R is called the name of this relation. The
degree (or arity) of a relation is the number of attributes n of its relation
schema.
A relation of degree seven, which stores information about university
students, would contain seven attributes describing each student, as follows:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
Using the data type of each attribute, the definition is sometimes
written as: STUDENT (Name: string, Ssn: string, Home_phone: string,
Address: string, Office_phone: string, Age: integer, Gpa: real)
For this relation schema, STUDENT is the name of the relation, which
has seven attributes. In the preceding definition, we showed assignment of
generic types such as string or integer to the attributes. More precisely, we
can specify the following previously defined domains for some of the
attributes of the STUDENT relation:
dom(Name) = Names; dom(Ssn) = Social_security_numbers;
dom(Home_phone) = Usa_phone_numbers; dom(Office_phone) =
Usa_phone_numbers; and dom(Gpa) = Grade_point_averages. It is also
possible to refer to attributes of a relation schema by their position within the
relation; thus, the second attribute of the STUDENT relation is Ssn, whereas
the fourth attribute is Address.
The figure above shows an example of a STUDENT relation, which
corresponds to the STUDENT schema just specified. Each tuple in the
relation represents a particular student entity (or object). We display the
relation as a table, where each tuple is shown as a row and each attribute
corresponds to a column header indicating a role or interpretation of the
values in that column. NULL values represent attributes whose values are
unknown or do not exist for some individual STUDENT tuple.
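
The ideas of domain, schema, degree, and tuple can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predicate functions, the generic Strings and Integers domains, and the sample tuple are hypothetical and are not part of the notes.

```python
import re

# Each domain is modelled as a predicate that accepts one atomic value.
# "Strings" and "Integers" are assumed generic domains for Address and Age.
DOMAINS = {
    "Names": lambda v: isinstance(v, str) and len(v) > 0,
    "Social_security_numbers": lambda v: isinstance(v, str)
    and re.fullmatch(r"\d{9}", v) is not None,
    "Usa_phone_numbers": lambda v: isinstance(v, str)
    and re.fullmatch(r"\d{10}", v) is not None,
    "Grade_point_averages": lambda v: isinstance(v, float) and 0.0 <= v <= 4.0,
    "Strings": lambda v: isinstance(v, str),
    "Integers": lambda v: isinstance(v, int) and not isinstance(v, bool),
}

# Relation schema STUDENT: attribute name -> domain name, in positional order.
STUDENT_SCHEMA = {
    "Name": "Names",
    "Ssn": "Social_security_numbers",
    "Home_phone": "Usa_phone_numbers",
    "Address": "Strings",
    "Office_phone": "Usa_phone_numbers",
    "Age": "Integers",
    "Gpa": "Grade_point_averages",
}


def degree(schema):
    """Degree (arity) = number of attributes in the relation schema."""
    return len(schema)


def is_valid_tuple(schema, t):
    """A tuple is valid if every value is NULL (None) or lies in its domain."""
    if set(t) != set(schema):
        return False
    return all(v is None or DOMAINS[schema[a]](v) for a, v in t.items())


# Hypothetical sample tuple; Home_phone is NULL (unknown).
student = {
    "Name": "Sample Student",
    "Ssn": "123456789",
    "Home_phone": None,
    "Address": "1 Example Road",
    "Office_phone": "5551234567",
    "Age": 19,
    "Gpa": 3.2,
}
bad_student = dict(student, Gpa=4.7)  # 4.7 is outside Grade_point_averages

print(degree(STUDENT_SCHEMA))                       # number of attributes
print(list(STUDENT_SCHEMA)[1])                      # second attribute
print(is_valid_tuple(STUDENT_SCHEMA, student))
print(is_valid_tuple(STUDENT_SCHEMA, bad_student))
```

Because the schema keeps its attributes in order, positions such as "the second attribute" can be read straight from it.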

2) Relational Databases and Relational Database Schemas

The definitions and constraints we have discussed so far apply to
single relations and their attributes. A relational database usually contains
many relations, with tuples in relations that are related in various ways. In
this section we define a relational database and a relational database schema.
A relational database schema S is a set of relation schemas S = {R1, R2, ..., Rm}
and a set of integrity constraints IC. A relational database state DB of S is
a set of relation states DB = {r1, r2, ..., rm} such that each ri is a state of Ri and
such that the ri relation states satisfy the integrity constraints specified in IC.
Figure 3.5 shows a relational database schema that we call COMPANY =
{EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON,
DEPENDENT}. The underlined attributes represent primary
keys. Figure 3.6 shows a relational database state corresponding to the
COMPANY schema. We will use this schema and database state for
developing sample queries in different relational languages. When we refer
to a relational database, we implicitly include both its schema and its current
state. A database state that does not obey all the integrity constraints is called
an invalid state, and a state that satisfies all the constraints in the defined set
of integrity constraints IC is called a valid state.

3) Primary Keys and Foreign Keys:-

PRIMARY KEY: - The primary key value is used to identify individual tuples in
a relation, so no primary key value can be NULL (the entity integrity constraint).
Having NULL values for the primary key implies that we cannot identify
some tuples. For example, if two or more tuples had NULL for their primary keys,
we may not be able to distinguish them if we try to reference them from other
relations.

FOREIGN KEY: - The conditions for a foreign key, given below, specify a
referential integrity constraint between the two relation schemas R1 and R2. A set
of attributes FK in relation schema R1 is a foreign key of R1 that references relation
R2 if it satisfies the following rules:
1. The attributes in FK have the same domain(s) as the primary key attributes
PK of R2; the attributes FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value
of PK for some tuple t2 in the current state r2(R2) or is NULL. In the former
case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers
to the tuple t2.
In this definition, R1 is called the referencing relation and R2 is the referenced
relation. If these two conditions hold, a referential integrity constraint from R1
to R2 is said to hold. In a database of many relations, there are usually many
referential integrity constraints.
In the EMPLOYEE relation in the figure below, the attribute Dno refers to the
department for which an employee works; hence, we designate Dno to be a foreign
key of EMPLOYEE referencing the DEPARTMENT relation. This means that a
value of Dno in any tuple t1 of the EMPLOYEE relation must match a value of the
primary key of DEPARTMENT—the Dnumber attribute—in some tuple t2 of the
DEPARTMENT relation, or the value of Dno can be NULL if the employee does
not belong to a department or will be assigned to a department later.
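
The second foreign key rule can be checked mechanically. The following is a minimal sketch, assuming relations are held as lists of Python dictionaries; the function name and the sample rows are hypothetical and only illustrate the rule.

```python
def referential_violations(referencing, fk, referenced, pk):
    """Return the tuples of the referencing relation that break rule 2:
    their FK value is neither NULL (None) nor a PK value of the referenced relation."""
    pk_values = {row[pk] for row in referenced}
    return [
        row for row in referencing
        if row[fk] is not None and row[fk] not in pk_values
    ]


# Hypothetical sample states of DEPARTMENT and EMPLOYEE.
department = [
    {"Dnumber": 4, "Dname": "Administration"},
    {"Dnumber": 5, "Dname": "Research"},
]
employee = [
    {"Ename": "A", "Dno": 5},     # references department 5
    {"Ename": "B", "Dno": None},  # NULL: not yet assigned to a department
    {"Ename": "C", "Dno": 9},     # no department 9 exists, so this violates the rule
]

violations = referential_violations(employee, "Dno", department, "Dnumber")
print(violations)
```

A database state for which this list is empty satisfies the referential integrity constraint from EMPLOYEE to DEPARTMENT; a non-empty list marks the state as invalid.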

Relational Calculus:
Relational calculus is a non-procedural query language: it tells the system what
data to retrieve but not how to retrieve it. Relational calculus exists in
two forms:
1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)

Tuple Relational Calculus (TRC)


Tuple relational calculus is used for selecting those tuples that satisfy the given
condition.
Table: Student
First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
Rajeev      Bhatia     27
Carl        Pratap     28

Let's write some relational calculus queries.

Question: - Query to display the last name of those students whose age is greater
than 30

Solution:- { t.Last_Name | Student(t) AND t.Age > 30 }


The above query has two parts separated by the | symbol. The
second part defines the condition that a tuple must satisfy, and the first
part specifies the fields to display for the selected tuples.

The result of the above query would be:

Last_Name
---------
Singh

Question: - Query to display all the details of students where Last name is ‘Singh’

Solution:- { t | Student(t) AND t.Last_Name = 'Singh' }

Output:
First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
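
A tuple relational calculus query reads much like a set comprehension. As a rough sketch (assuming the Student table is held as a Python set of named tuples, which is just one convenient representation), the two queries above can be written as:

```python
from collections import namedtuple

Student = namedtuple("Student", ["First_Name", "Last_Name", "Age"])

# The Student table from the notes, one named tuple per row.
student = {
    Student("Ajeet", "Singh", 30),
    Student("Chaitanya", "Singh", 31),
    Student("Rajeev", "Bhatia", 27),
    Student("Carl", "Pratap", 28),
}

# { t.Last_Name | Student(t) AND t.Age > 30 }
older_than_30 = {t.Last_Name for t in student if t.Age > 30}

# { t | Student(t) AND t.Last_Name = 'Singh' }
singhs = {t for t in student if t.Last_Name == "Singh"}

print(older_than_30)
print(sorted(singhs))
```

As in the calculus, the part before `for` says what to return and the part after `if` states the condition; nothing says how the rows are scanned.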

Ex:
Table-1: Customer
Customer name  Street  City
-------------  ------  ---------
Saurabh        A7      Patiala
Mehak          B6      Jalandhar
Sumiti         D9      Ludhiana
Ria            A5      Patiala

Table-2: Branch
Branch name  Branch city
-----------  -----------
ABC          Patiala
DEF          Ludhiana
GHI          Jalandhar

Table-3: Account
Account number  Branch name  Balance
--------------  -----------  -------
1111            ABC          50000
1112            DEF          10000
1113            GHI          9000
1114            ABC          7000

Table-4: Loan
Loan number  Branch name  Amount
-----------  -----------  ------
L33          ABC          10000
L35          DEF          15000
L49          GHI          9000
L98          DEF          65000

Table-5: Borrower
Customer name  Loan number
-------------  -----------
Saurabh        L33
Mehak          L49
Ria            L98

Table-6: Depositor
Customer name  Account number
-------------  --------------
Saurabh        1111
Mehak          1113
Sumiti         1114

Query-1: Find the loan number, branch name, and amount of loans with an amount
greater than or equal to 10000.
{t | t ∈ loan ∧ t[amount] >= 10000}
Resulting relation:
Loan number  Branch name  Amount
-----------  -----------  ------
L33          ABC          10000
L35          DEF          15000
L98          DEF          65000

Domain Relational Calculus (DRC)
In domain relational calculus, the variables range over the values of attribute
domains rather than over whole tuples. We take the same table again to see how
DRC works.

Table: Student
First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
Rajeev      Bhatia     27
Carl        Pratap     28

Question: Query to find the first name and age of students whose age is
greater than 27

Solution: {< First_Name, Age > | ∃ Last_Name (< First_Name, Last_Name, Age > ∈ Student ∧ Age > 27)}

Note: The symbols used for logical operators are: ∧ for AND, ∨ for OR and ¬ for
NOT. ∃ means "there exists".

Output:
First_Name  Age
----------  ---
Ajeet       30
Chaitanya   31
Carl        28
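
For comparison with the tuple form, here is a small sketch of the same DRC query in Python. The variable names f, l, and a are assumptions chosen for illustration; each one stands for a single domain value (first name, last name, age) rather than for a whole tuple.

```python
# The Student table as a set of plain (First_Name, Last_Name, Age) triples.
student = {
    ("Ajeet", "Singh", 30),
    ("Chaitanya", "Singh", 31),
    ("Rajeev", "Bhatia", 27),
    ("Carl", "Pratap", 28),
}

# {<f, a> | there exists l such that <f, l, a> is in Student and a > 27}
# The last name l is bound by the pattern but not returned.
result = {(f, a) for (f, l, a) in student if a > 27}

print(sorted(result))
```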

Question: - Consider the following database schema:

Employee (ename, SS#, Add, Salary, Sex)
Dept (D_name, Dno, Mgrss#, mgrstart_date)
Dept_Location (Dno, Dlocation)
Project (Pname, Pno, Plocation, Dno)
Works_On (SS#, Pno, hours)

Solve the following queries in relational algebra:

i) Retrieve the average salary of all female employees.
ii) Retrieve the names and addresses of all employees who work in the
“Research” department.
iii) For each project, list the project name and the total hours per week
spent on that project.
iv) Find all employees in department no. 4 who work more than 12
hours per week.

Solution: -

i) Retrieve the average salary of all female employees.

Soln:- Select the female employees: σ Sex = 'female' (Employee)

This selection gives the following result:

ename     SS#        Add                      Salary
--------  ---------  -----------------------  ------
Alicia    999887777  3321 Castle, Spring, TX  25000
Jennifer  987654321  291 Berry, Bellaire, TX  43000
Joyce     453453453  5631 Rice, Houston, TX   25000

Avg_salary = ℱ AVERAGE Salary (σ Sex = 'female' (Employee))

Avg_salary = (25,000 + 43,000 + 25,000) / 3 = 31,000

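ii) Retrieve the names and addresses of all employees who work in the
“Research” department.

Soln: - X = π Dno (σ D_name = 'Research' (Dept))

Result (the selected Dept tuple, of which X keeps only Dno):
Dno  D_name
---  --------
5    Research

π ename, Add (Employee ⋈ X)

The natural join matches on Dno, so this step assumes that Employee also
carries a Dno attribute, which the schema listed in the question omits.

Result:

ename     Add
--------  ------------------------
John      731 Fondren, Houston, TX
Franklin  638 Voss, Houston, TX
Ramesh    975 Fire Oak, Humble, TX
Joyce     5631 Rice, Houston, TX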

iii) For each project, list the project name and the total hours per week
spent on that project.
Soln: - Proj_hours (Pno, Total_hours) = Pno ℱ SUM hours (Works_On)
Result = π Pname, Total_hours (Proj_hours ⋈ Proj_hours.Pno = Project.Pno (Project))

iv) Find all employees in department no. 4 who work more than 12
hours per week.
Soln: - a) Join first
Emp_all = Employee ⋈ Employee.SS# = Works_On.SS# Works_On ⋈ Works_On.Pno = Project.Pno Project

Emp_ok = σ Employee.Dno = 4 AND Pname = 'ProductX' AND hours > 12 (Emp_all)

Answer = π ename (Emp_ok)

(The condition Pname = 'ProductX' restricts the result to the ProductX
project; drop it to answer the question exactly as stated.)

b) Select first

Emp_dept_4 = σ Dno = 4 (Employee)

Proj_prod_X = σ Pname = 'ProductX' (Project)
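
The selection, projection, and aggregate steps used in these solutions can be tried out with a small Python sketch. The helper functions select, project, and average are hypothetical names written for this illustration. The three female rows use the salaries from the result table of query i); the row for John is added only so that the selection has something to filter out, and his Sex and Salary values are assumed.

```python
def select(relation, predicate):
    """σ: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """π: keep only the named attributes and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def average(relation, attr):
    """ℱ AVERAGE: mean of one attribute over all rows of the relation."""
    return sum(row[attr] for row in relation) / len(relation)


employee = [
    {"ename": "Alicia", "Sex": "female", "Salary": 25000},
    {"ename": "Jennifer", "Sex": "female", "Salary": 43000},
    {"ename": "Joyce", "Sex": "female", "Salary": 25000},
    {"ename": "John", "Sex": "male", "Salary": 30000},  # assumed values
]

# σ Sex = 'female' (Employee)
females = select(employee, lambda row: row["Sex"] == "female")

# ℱ AVERAGE Salary (σ Sex = 'female' (Employee))
avg_salary = average(females, "Salary")

print(project(females, ["ename"]))
print(avg_salary)
```

The average is computed over the selected rows only, which is why the selection is nested inside the aggregate in the algebra expression.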

Indexing in Databases


Indexing is a way to optimize the performance of a database by
minimizing the number of disk accesses required when a query is
processed. An index is a data structure that is used to quickly locate
and access the data in a database.
Indexes are created using a few database columns.
■ The first column is the search key, which contains a copy of the primary
key or candidate key of the table. These values are stored in sorted
order so that the corresponding data can be accessed quickly.
Note: The data itself may or may not be stored in sorted order.
■ The second column is the data reference or pointer, which contains
a set of pointers holding the address of the disk block where that
particular key value can be found.

Indexing techniques are characterized by the following attributes:

Access Types: This refers to the type of access such as value-based
search, range access, etc.
Access Time: It refers to the time needed to find a particular data element
or set of elements.
Insertion Time: It refers to the time taken to find the appropriate space
and insert new data.
Deletion Time: Time taken to find an item and delete it as well as update
the index structure.
Space Overhead: It refers to the additional space required by the index.
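
The two-column picture of an index (sorted search keys plus pointers) can be sketched with Python's standard bisect module. This is only an in-memory illustration, not how a DBMS stores an index on disk; the class name SortedIndex and the block labels are hypothetical, and the keys reuse the account numbers from the Account table above.

```python
import bisect


class SortedIndex:
    """Search keys kept in sorted order, each paired with a block pointer."""

    def __init__(self):
        self._keys = []      # first column: search keys, always sorted
        self._pointers = []  # second column: block address for each key

    def insert(self, key, block):
        """Place the key at its sorted position (or update its pointer)."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._pointers[i] = block
        else:
            self._keys.insert(i, key)
            self._pointers.insert(i, block)

    def lookup(self, key):
        """Value-based access: binary search for one key."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._pointers[i]
        return None

    def range(self, low, high):
        """Range access: all (key, pointer) pairs with low <= key <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[lo:hi], self._pointers[lo:hi]))


index = SortedIndex()
# Keys arrive in arbitrary order; the data blocks themselves need not be sorted.
for key, block in [(1113, "block-2"), (1111, "block-0"),
                   (1114, "block-1"), (1112, "block-0")]:
    index.insert(key, block)

print(index.lookup(1113))
print(index.lookup(9999))
print(index.range(1112, 1113))
```

Both access types listed above appear here: lookup is a value-based search and range is a range access. Every insert also has to keep the key list sorted, which is the insertion-time cost, and the two lists are the space overhead.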

Functional Dependency and Attribute Closure


Functional Dependency

A functional dependency A->B holds in a relation if any two tuples that have
the same value for attribute A also have the same value for attribute B. For
example, in the relation STUDENT shown in table 1, the functional
dependencies

STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE hold,

but

STUD_NAME->STUD_STATE does not hold.

How to find functional dependencies for a relation?


Functional dependencies in a relation depend on the domain of the
relation, that is, on what its attributes mean in the real world. Consider
the STUDENT relation given in table 1.

We know that STUD_NO is unique for each student. So
STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE,
STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY and
STUD_NO->STUD_AGE all hold.
Similarly, STUD_STATE->STUD_COUNTRY holds because if two
records have the same STUD_STATE, they will have the same
STUD_COUNTRY as well.
For relation STUDENT_COURSE, COURSE_NO->COURSE_NAME
will be true as two records with same COURSE_NO will have same
COURSE_NAME.
Functional Dependency Set: Functional Dependency set or FD set of a
relation is the set of all FDs present in the relation. For Example, FD set
for relation STUDENT shown in table 1 is:

{STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE,
STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY,
STUD_NO -> STUD_AGE, STUD_STATE->STUD_COUNTRY}
Attribute Closure: The attribute closure of an attribute set is the
set of all attributes that can be functionally determined from it.
How to find attribute closure of an attribute set?
To find the attribute closure of an attribute set:

■ Add the elements of the attribute set to the result set.
■ Repeatedly add to the result set any attributes that can be functionally
determined from attributes already in the result set, until nothing new
can be added.

Using the FD set of table 1, attribute closures can be determined as:

(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}

■ If the attribute closure of an attribute set contains all attributes of the
relation, the attribute set is a super key of the relation.
■ If, in addition, no proper subset of this attribute set can functionally
determine all attributes of the relation, the set is a candidate key as well.
For example, using the FD set of table 1,
(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_NO, STUD_NAME) is a super key but not a candidate key
because the closure of its subset, (STUD_NO)+, already contains all
attributes of the relation. So, STUD_NO is a candidate key.
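
The closure procedure and the two key tests can be sketched in Python as follows. The function names are hypothetical; the FD set and attribute names are the ones given for the STUDENT relation above.

```python
def closure(attrs, fds):
    """Attribute closure: start from attrs and keep applying FDs
    whose left side is already contained in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_super_key(attrs, fds, all_attrs):
    """Super key: the closure covers every attribute of the relation."""
    return closure(attrs, fds) == set(all_attrs)


def is_candidate_key(attrs, fds, all_attrs):
    """Candidate key: a super key that stops being one if any attribute is removed."""
    attrs = set(attrs)
    if not is_super_key(attrs, fds, all_attrs):
        return False
    return not any(is_super_key(attrs - {a}, fds, all_attrs) for a in attrs)


# FD set of the STUDENT relation, each FD as (left side, right side).
FDS = [
    (frozenset({"STUD_NO"}), frozenset({"STUD_NAME"})),
    (frozenset({"STUD_NO"}), frozenset({"STUD_PHONE"})),
    (frozenset({"STUD_NO"}), frozenset({"STUD_STATE"})),
    (frozenset({"STUD_NO"}), frozenset({"STUD_COUNTRY"})),
    (frozenset({"STUD_NO"}), frozenset({"STUD_AGE"})),
    (frozenset({"STUD_STATE"}), frozenset({"STUD_COUNTRY"})),
]
ALL_ATTRS = {"STUD_NO", "STUD_NAME", "STUD_PHONE",
             "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}

print(sorted(closure({"STUD_STATE"}, FDS)))
print(is_super_key({"STUD_NO", "STUD_NAME"}, FDS, ALL_ATTRS))
print(is_candidate_key({"STUD_NO", "STUD_NAME"}, FDS, ALL_ATTRS))
print(is_candidate_key({"STUD_NO"}, FDS, ALL_ATTRS))
```

Removing one attribute at a time is enough for the minimality test, because any subset of a set that is not a super key cannot be a super key either.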

Introduction of Database Normalization

Database normalization is the process of organizing the attributes of the
database to reduce or eliminate data redundancy (having the same data
at different places).
Problems because of data redundancy: Data redundancy unnecessarily
increases the size of the database as the same data is repeated in many places.
Inconsistency problems also arise during insert, delete and update operations.
Functional Dependency: A functional dependency is a constraint between two
sets of attributes in a relation of a database. A functional dependency is denoted
by an arrow (→). If an attribute A functionally determines B, then it is written as
A → B.
For example, employee_id → name means employee_id functionally determines
the name of the employee. As another example in a timetable database,
{student_id, time} → {lecture_room}, student ID and time determine the lecture
room where the student should be.

What does functionally dependent mean?


A functional dependency A → B means that all tuples with a particular value of A
have the same value of B. For example, in the table below A → B is true, but
B → A is not true, as there are different values of A for B = 3.
A B
------
1 3
2 3
4 0
1 3
4 0
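
Whether a functional dependency holds in a given table can be checked by remembering, for each left-side value, the right-side value seen first. The sketch below uses a hypothetical function fd_holds and the five rows of the table above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows: equal lhs values
    must always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"A": 1, "B": 3},
    {"A": 2, "B": 3},
    {"A": 4, "B": 0},
    {"A": 1, "B": 3},
    {"A": 4, "B": 0},
]

print(fd_holds(rows, ["A"], ["B"]))  # A -> B
print(fd_holds(rows, ["B"], ["A"]))  # B -> A
```

A check like this can only show that a dependency is violated or not violated in the current rows; whether it is a real constraint still depends on the meaning of the attributes.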

Trivial Functional Dependency


X → Y is trivial if and only if Y is a subset of X.
Examples
ABC → AB
ABC → A
ABC → ABC

Non Trivial Functional Dependencies


X → Y is a non-trivial functional dependency when Y is not a subset of
X.
X → Y is called completely non-trivial when X and Y have no attribute in
common (X ∩ Y = ∅).

Example:
Id → Name,
Name → DOB

Semi Non Trivial Functional Dependencies


X → Y is called semi non-trivial when Y is not a subset of X but X and Y
share at least one attribute (X ∩ Y ≠ ∅).

Examples:
AB → BC,
AD → DC
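
The three cases can be told apart with two set tests. The following sketch uses a hypothetical function classify_fd and takes the two sides of the dependency as Python sets of attribute names.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets X and Y."""
    x, y = set(lhs), set(rhs)
    if y <= x:                 # Y is a subset of X
        return "trivial"
    if x & y:                  # some, but not all, of Y lies in X
        return "semi non-trivial"
    return "completely non-trivial"


print(classify_fd({"A", "B", "C"}, {"A", "B"}))  # ABC -> AB
print(classify_fd({"A", "B"}, {"B", "C"}))       # AB -> BC
print(classify_fd({"Id"}, {"Name"}))             # Id -> Name
```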

The features of database normalization are as follows:


Elimination of Data Redundancy: One of the main features of normalization is
to eliminate the data redundancy that can occur in a database. Data redundancy
refers to the repetition of data in different parts of the database. Normalization
helps in reducing or eliminating this redundancy, which can improve the
efficiency and consistency of the database.
Ensuring Data Consistency: Normalization helps in ensuring that the data in
the database is consistent and accurate. By eliminating redundancy,
normalization helps in preventing inconsistencies and contradictions that can
arise due to different versions of the same data.
Simplification of Data Management: Normalization simplifies the process of
managing data in a database. By breaking down a complex data structure into
simpler tables, normalization makes it easier to manage the data, update it, and
retrieve it.
Improved Database Design: Normalization helps in improving the overall
design of the database. By organizing the data in a structured and systematic
way, normalization makes it easier to design and maintain the database. It also
makes the database more flexible and adaptable to changing business needs.
Avoiding Update Anomalies: Normalization helps in avoiding update
anomalies, which occur when the same fact is stored in several rows and an
update changes some copies but not others. Normalization ensures that each
table describes one kind of fact and that the relationships between the tables
are clearly defined, which helps in avoiding such anomalies.
Standardization: Normalization helps in standardizing the data in the database.
By organizing the data into tables and defining relationships between them,
normalization helps in ensuring that the data is stored in a consistent and
uniform manner.
Normalization is an important process in database design that helps in improving
the efficiency, consistency, and accuracy of the database. It makes it easier to
manage and maintain the data and ensures that the database is adaptable to
changing business needs.
