Data Warehousing and Management (Compilation) Edited
CHAPTER 1:
DEVELOPMENT PROCESS
UNIVERSITY OF CALOOCAN CITY
Biglang Awa St. Grace Park East, Caloocan City
Definitions
Metadata: data that describes the properties and context of user data.
Program-Data Dependence: All programs maintain metadata for each file
they use.
formats.
Each application program needs to include code for the metadata of each
file.
Each application program must have its own processing routines for
convenient form.
hardware resources.
Program-data independence
Planned data redundancy
Improved data consistency
Improved data sharing
Enforcement of standards
need for these specialized skills. Installing such a system may also require
organization.
software that has a high initial cost. It requires trained personnel to install and
Conversion costs
organization that are based on file processing. The cost of converting these
organization.
raises the need to have backup copies of data for restoring a database when
Organizational conflict
Software
This is the set of programs used to control and manage the overall
database. This includes the DBMS software itself, the Operating System,
the network software being used to share the data among users, and the
Hardware
Data
A DBMS exists to collect, store, process, and access data, the most
Procedures
These are the instructions and rules that explain how to use the DBMS,
Database Access Language
This is used to move data to and from the database: to enter new
data, update existing data, or retrieve required data from databases. The
language, submits these to the DBMS, which then processes the data and
Query Processor
This transforms the user queries into a series of low-level instructions.
This reads the online user’s query and translates it into an efficient series
of operations in a form capable of being sent to the run time data manager
for execution.
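As a rough illustration of this translation step, the sketch below asks SQLite to show the low-level instructions it generates for a query. SQLite is used only because it ships with Python; the text does not name a specific DBMS, and the `employee` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# EXPLAIN makes SQLite return the virtual-machine program it compiled
# for the statement instead of running the statement itself.
program = conn.execute(
    "EXPLAIN SELECT name FROM employee WHERE salary > 50000"
).fetchall()

# Each row is one low-level instruction:
# (addr, opcode, p1, p2, p3, p4, p5, comment).
opcodes = [row[1] for row in program]
for addr, opcode, *_ in program:
    print(addr, opcode)
```

The exact instructions differ between SQLite versions; the point is only that one declarative query becomes a sequence of simple operations that the run-time component executes.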
Data Manager
Also called the cache manager, this is responsible for handling data in
Database Engine
The core service for storing, processing, and securing data, this provides
Data Dictionary
the database itself. A data dictionary is a set of read-only tables and views,
containing the different information about the data used in the enterprise
Report Writer
specified format. Most report writers allow the user to select records that
relation, which contains one or more data category columns. Each table
table.
table.
Many to One: More than one table record relates to another table
record.
Many to Many: More than one table record relates to more than one
where select is used for data retrieval, project identifies data attributes,
including:
child' nodes. They require the user to pass a hierarchy in order to access
specific uses.
supports many to many relationships, as child tables can have more than
one parent.
for development.
The SDLC is a
hardware and software. Project and program managers typically take part
PLANNING—ENTERPRISE MODELING
analyze the nature of the business area that is the subject of the
support the proposed new project. Only selected projects move into
the next phase based on the projected value of each project to the
organization.
should come to meeting the needs of the organization, and the less
data that must be managed for this information system. Every data
the Analysis phase that the conceptual data model is checked for
on.
IMPLEMENTATION—DATABASE IMPLEMENTATION
documentation, train users, and put procedures into place for the
The last step is to load data from existing information sources (files
files) and then loading these data into the new database. Finally,
the database and its associated applications are put into production
MAINTENANCE—DATABASE MAINTENANCE
because it lasts throughout the life of the database and its associated
Prototyping
prototype rapidly, (3) modify the prototype, and (4) stress the user
interface.
it: (1) the potential for changing the system early in its development,
the prototype.
implementation.
Agile modeling
agile approach are (1) short releases, (2) 40-hour workweek, (3)
appropriate test cases, writing the code, running the test cases,
Data Administrators
The database and the DBMS are corporate resources that must be
responsible for defining data elements, data names and their relationship
organization. They are also responsible for maintaining data security and
DBA Responsibilities
Performance Tuning
database performance.
data loss.
Documentation
database performance.
Security
Database Designers
the data (that is, the entities and attributes), the relationships
Application Developers
means of DML queries. These DML queries are written in the application
End Users
The end-users are the ‘clients’ for the database, which has been
information needs.
Naive Users: These are the users who use the existing application
Internal Level/Schema
the entire database. Like the actual storage of the data on the disk
The internal view tells us what data is stored in the database and how
Conceptual Schema/Level
This logical level comes between the user level and physical
database.
relationships
External Schema/Level
specific user is interested in. It hides the unrelated details of the database
from the user. There may be “n” number of external views for each
some specific user. For example, a user from the sales
which is needed for a certain user group and hides the remaining
Every user should be able to access the same data but be able to see
The user need not deal directly with physical database storage
detail.
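One common way to realize an external schema is a database view. The sketch below is a minimal illustration, assuming SQLite and a hypothetical `customer` table; the view name `sales_customer` and all columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full (hypothetical) customer table.
conn.execute("""
    CREATE TABLE customer (
        customer_id  INTEGER PRIMARY KEY,
        name         TEXT,
        region       TEXT,
        credit_limit REAL
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ana Cruz', 'North', 50000.0)")

# External level: a view for the sales department that hides credit_limit.
conn.execute(
    "CREATE VIEW sales_customer AS "
    "SELECT customer_id, name, region FROM customer"
)

cursor = conn.execute("SELECT * FROM sales_customer")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(visible_columns, rows)
```

A sales user who queries `sales_customer` works with the same underlying data as everyone else but never sees the hidden column, and never deals with how the rows are stored on disk.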
developers
If one tier fails, no data is lost, as you are always
CHAPTER 2:
ORGANIZATION
Almadin, Catherine M.
Laxamana, Marlon F.
DATA MODELING
system, using text and symbols to represent the way data will flow. The diagram
can be used to ensure efficient use of data as a blueprint for the construction of
Data modeling is an important skill for data scientists and others involved with
data analysis. Traditionally, data models were built during the analysis and
design phases of a project to ensure that the requirements for a new application
are understood. A data model can become the basis for building a more detailed
data schema.
points and structures. The goal is to illustrate the types of data used and stored
within the system, the relationships among these data types, the ways the data
Data models are built around business needs. Rules and requirements are
existing one.
users. These business rules are then translated into data structures to formulate
Ideally, data models are living documents that evolve along with changing
and planning IT architecture and strategy. Data models can be shared with
Ensures that all data objects required by the database are accurately
A data model helps design the database at the conceptual, physical and
logical levels.
Data Model structure helps to define the relational tables, primary and
It provides a clear picture of the base data and can be used by database
Though the initial creation of a data model is labor- and time-consuming, in
software. It helps developers understand the domain and organize their work
accordingly.
Higher Quality
data model helps define the problem, enabling you to consider different
Reduced cost
You can build applications at lower cost via data models. Data modeling typically
consumes less than 5-10 percent of a project budget, and can reduce the 65-75
catches errors and oversights early, when they are easy to fix. This is better than
fixing errors once the software has been written or – worse yet – is in customer
hands.
Clearer scope
tangible to help business sponsors and developers agree over precisely what is
included with the software and what is omitted. Business users can see what the
developers are building and compare it with their understanding. Models promote
A data model also promotes agreement on vocabulary and jargon. The model
highlights the chosen terms so that they can be driven forward into software
Faster performance
runs fast, often quicker than expected. To achieve optimal performance, the
concepts in a data model must be crisp and coherent. Then the proper rules
etc.) – but, rather, that the database is being used improperly. Once that problem
Better documentation
Models document important concepts and jargon, providing a basis for long-term
maintenance. The documentation will serve you well through staff turnover.
Today, most application vendors can provide a data model of their application
upon request. That is because the IT industry recognizes that models are
understandable manner.
Developers can still make detailed errors as they write application code, but they
are less likely to make deep errors that are difficult to resolve.
Data errors are worse than application errors. It is one thing to have an
large database.
A data model not only improves the conceptual quality of an application, it also
lets you leverage database features that improve data quality. Developers can
weave constraints into the fabric of a model and the resulting database. For
example, every table should normally have a primary key. The database can
enforce other unique combinations of fields. Referential integrity can ensure that
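The constraints mentioned in this paragraph can be sketched with SQLite as follows. This is only an illustration under assumed names: the `department` and `employee` tables, their columns, and the choice of which fields must be unique together are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        dept_id    INTEGER NOT NULL REFERENCES department(dept_id),
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        UNIQUE (dept_id, first_name, last_name)  -- a unique combination of fields
    );
""")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 1, 'Ana', 'Cruz')")


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Same department and name as an existing row: violates the UNIQUE constraint.
duplicate = try_insert("INSERT INTO employee VALUES (2, 1, 'Ana', 'Cruz')")
# Department 99 does not exist: violates referential integrity.
orphan = try_insert("INSERT INTO employee VALUES (3, 99, 'Ben', 'Reyes')")
print(duplicate)
print(orphan)
```

Both bad rows are refused by the database itself, so the rule holds no matter which application attempts the insert.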
Managed risk
You can use a data model to estimate the complexity of software, and gain
insight into the level of development effort and project risk. You should consider
equates a data model to a mathematical graph. He uses the graph as a basis for
development failure.
data mining. You can take day-to-day business data and load it into a dedicated
specifically for the purpose of data analysis, leveraging that data from routine
operations.
The better your data model, the more business benefits you receive on
carefully consider the discovered data types to avoid the over-modeling issues
BUSINESS RULE
Business rules are usually expressed at the atomic level -- that is, they cannot be
broken down any further. A business rule imposes some form of constraint on a specific aspect
of the database, such as the elements within a field specification for a particular
the way the organization perceives and uses its data, which you determine from
Business rules, the foundation of data models, are derived from policies,
procedures, events, functions, and other business objects, and they state
Business rules are important in data modeling because they govern how data are
handled and stored. Examples of basic business rules are data names and
definitions.
databases. Most organizations have a host of rules and/or policies that fall
outside this definition. For example, the rule “Friday is business casual dress
databases. In contrast, the rule “A student may register for a section of a course
only if he or she has successfully completed the prerequisites for that course” is
within our scope because it constrains the transactions that may be processed
student who does not have the necessary prerequisites to be rejected. Some
are stated in natural language, and some can be represented in the relational
data model.
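The prerequisite rule quoted above can be sketched as a check that rejects an invalid registration. This is a minimal illustration, not a prescribed implementation; the course codes, student IDs, and the `register` function are hypothetical.

```python
# Hypothetical data: the prerequisites of each course, and the courses
# each student has already completed successfully.
PREREQUISITES = {"CS201": {"CS101"}, "CS101": set()}
completed = {"S001": {"CS101"}, "S002": set()}


def register(student_id, course_id):
    """Accept the registration only if every prerequisite has been completed."""
    missing = PREREQUISITES.get(course_id, set()) - completed.get(student_id, set())
    if missing:
        # The business rule constrains the transaction: it is rejected.
        raise ValueError(
            f"{student_id} lacks prerequisites for {course_id}: {sorted(missing)}"
        )
    return (student_id, course_id)


print(register("S001", "CS201"))   # accepted
try:
    register("S002", "CS201")      # rejected: CS101 not completed
except ValueError as exc:
    print(exc)
```

In a real system the same rule might live in the database (for example as a trigger) rather than in application code, so that every program that registers students is bound by it.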
Business rules can be applied to computing systems and are designed to help an
organization achieve its goals. Software is used to automate business rules using
business logic.
example, a business can come up with business rules that are self-imposed to
standards. Experts also point out that while there is a system of strategic
processes governing business rules, the business rules themselves are not
ER model defines entity sets, not individual entities, but entity sets described in
data for an organization or for a business area. The E-R model is expressed in
among those entities, and the attributes (or properties) of both the entities and
model.
blueprint of a database which has mainly two components i.e., relationship set,
and entity set. The ER diagram is used to represent the relationships that exist
among the entity sets. An entity set is a group of entities of similar
system the entity is considered as a table and attributes are columns of a table.
So, the ER diagram shows the relationship among tables in the database. The
The entities have attributes that help to uniquely identify the entity. The entity set
The ER diagram is used to represent the database in diagram form. It
helps to properly understand the database. All the necessary details of the
represent all the tables of the database, attributes are the columns of tables and
consist of real-world entities and the associations that exist between them.
The ER model gives the complete idea of a database used for any application
and it is very easy to understand. The section below contains information about
1. Entity
entity has a noun name. Some examples of each of these kinds of entities follow:
Account, Course, Work Center. All types of entities have some attributes or
properties that help give a proper idea of the entity. The entity set can
can be some entities that contain similar types of values. For example,
the employee set will contain information from all employees. The entity set does
An entity is an object or event in our environment that we want to keep track of. A
finished product ready for sale, and a sales meeting (an event). An attribute is a
Weak entity: A weak entity is an entity that cannot be uniquely
identified by its own attributes and that requires a relationship with some
diagram, the double rectangle is used for representing a weak entity. For
entity as the bank account cannot be identified which bank the bank
(Some data modeling software, in fact, uses the term dependent entity.) A
weak entity type has no business meaning in an E-R diagram without the
entity on which it depends. The entity type on which the weak entity type
depends is called the identifying owner (or simply owner for short). A weak
entity type does not typically have its own identifier. Generally, on an E-R
identifier.
other entity types. (Some data modeling software, in fact, uses the term
capital letters for names of entity type(s). In an E-R diagram, the entity name is
It is the fundamental building block for describing the structure of data with the
Entity Data Model. In a conceptual model, entity types are constructed from
and orders in a business application. In the same way that a class definition in a
that entity type may be represented by data stored in the database. For example,
there is one EMPLOYEE entity type in most organizations, but there may be
hundreds (or even thousands) of instances of this entity type stored in the
database. We often use the single term entity rather than entity instance when
the entity type, but Cell_1 , Cell_2 , and Cell_3 would represent the actual items
In simple words:
characteristics
2. Attributes
The entities are represented using some properties and these properties are
known as attributes. All the attributes have some value. For example, the
employee entity can have the following attributes: employee name, employee
age, employee contact details. Each attribute has a
domain of values that can be assigned to it. For example, the
Attributes are facts or descriptions of entities. They are also often nouns and
become the columns of the table. For example, for entity student, the attributes
can be first name, last name, email, address, and phone numbers.
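The mapping from attributes to columns can be sketched with a small Python class. This is only an illustration: the `Student` class is hypothetical, and a single `phone_number` field is assumed for simplicity even though a student may have several.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    # Each attribute of the entity becomes one column of the table.
    first_name: str
    last_name: str
    email: str
    address: str
    phone_number: str  # assumption: one phone number per student


# The attribute names double as the column names of a student table.
columns = [f.name for f in fields(Student)]

# One entity instance corresponds to one row of that table.
row = Student("Ana", "Cruz", "ana@example.com", "Caloocan City", "555-0100")
print(columns)
print(row)
```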
Types of Attribute
that cannot be broken down into smaller components that are meaningful
for the organization. For example, all the attributes associated with
attribute in the group. For example, the employee’s name attribute can be
3. Derived attribute: A derived attribute does not physically exist in the
database; instead, its value is derived from other attributes that are
physically stored in the database. For example, the
stored in the database. The value can be derived from other attributes
(plus possibly data not in the database, such as today’s date, the current
name, as shown in Figure 2-8 for the Years Employed attribute. Some E-R
4. Single value attribute: The single attribute contains a single value. For
contains more than one value. For example, the employee can have more than
may take on more than one value for a given entity (or relationship)
around the attribute name, as shown for the Skill attribute in the
you can edit that attribute (column), select the Collection tab and choose
one of the options. (Typically, Multiset will be your choice, but one of the
other options may be more appropriate for a given situation.) Other E-R
diagramming tools may use an asterisk (*) after the attribute name, or you
attribute.
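Two of the attribute types above, derived and multivalued, can be sketched in Python. The text names Years Employed and Skill; the `Employee` class, the stored `date_employed` attribute, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Employee:
    name: str
    date_employed: date                         # stored attribute (assumed)
    skills: list = field(default_factory=list)  # multivalued attribute

    def years_employed(self, today=None):
        """Derived attribute: computed from date_employed, never stored."""
        today = today or date.today()
        years = today.year - self.date_employed.year
        # Subtract one year if this year's anniversary has not happened yet.
        if (today.month, today.day) < (self.date_employed.month, self.date_employed.day):
            years -= 1
        return years


emp = Employee("Ana Cruz", date(2015, 6, 1), ["SQL", "Data modeling"])
print(emp.years_employed(today=date(2024, 5, 31)))  # 8
print(emp.skills)
```

Because the derived value depends on today's date, storing it would make it go stale; computing it on demand keeps it correct.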
Primary Key
identifies an instance of the entity. For example, for a student entity, student
number is the primary key since no two students have the same student number.
We can have only one primary key in a table. It uniquely identifies every row and
it cannot be null.
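The two properties of a primary key, uniqueness and non-nullness, can be checked with a short SQLite sketch. The `student` table and the sample student numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, unlike most DBMSs, does not
# imply it for non-integer primary keys.
conn.execute(
    "CREATE TABLE student (student_number TEXT PRIMARY KEY NOT NULL, name TEXT)"
)
conn.execute("INSERT INTO student VALUES ('2024-0001', 'Ana Cruz')")

results = {}
attempts = [
    ("duplicate key", ("2024-0001", "Ben Reyes")),  # same student number again
    ("null key", (None, "Carla Santos")),           # no student number at all
]
for label, params in attempts:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", params)
        results[label] = "accepted"
    except sqlite3.IntegrityError:
        results[label] = "rejected"

print(results)
```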
Foreign key
A foreign key (sometimes called a referencing key) is a key used to link two
tables together. Typically, you take the primary key field from one table and insert
it into the other table, where it becomes a foreign key (it remains a primary key
in the original table). We can
How many entities are there in this diagram and what are they?
student_address
Not a trick question! There is only one primary key, but it is made up of two
STUDENT and COURSE contain no foreign keys in this diagram. This might
suggest that there are problems with the design; among them is the
many-to-many relationship here. This usually requires that we create a separate table to
describe the relationship. This type of table usually connects foreign ids to each
other.
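A separate connecting table of this kind can be sketched with SQLite as follows. The table name `registration`, the column names, and the sample rows are hypothetical; the composite primary key is made up of the two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Connecting table: one row per (student, course) pairing.
    CREATE TABLE registration (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO registration VALUES (1, 10), (1, 20), (2, 10);
""")

# Follow both foreign keys to list who is registered in which course.
roster = conn.execute("""
    SELECT c.title, s.name
    FROM registration r
    JOIN student s ON s.student_id = r.student_id
    JOIN course  c ON c.course_id  = r.course_id
    ORDER BY c.title, s.name
""").fetchall()
print(roster)
```

Ana appears under two courses and Databases lists two students, which is the many-to-many relationship expressed as two one-to-many links through the connecting table.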
In this case, let's add an
students probably sit in different seats for each course they are registered in, let's
3. Relationship
to show the dependency among the entities of the database. In the ER diagram,
that exists between the entities is shown as a connecting line in the ER
diagram.
One-to-one: When one entity is related to exactly one other entity, it is a
in one college.
One-to-many: When one entity is linked to more than one entity is a one-to-
multiple orders.
DEGREE OF A RELATIONSHIP
The degree of a relationship is the number of entity types that participate in that
because there are two entity types: EMPLOYEE and COURSE. The three most
common relationship
Higher-degree relationships are rarely encountered in practice, so we restrict
of unary, binary, and ternary relationships appear in Figure 2-12. (Attributes are
not shown in some figures for simplicity.) As you look at Figure 2-12, understand
develop an E-R model that you understand the business rules of the particular
UNARY RELATIONSHIP
examples are shown in Figure 2-12a. In the first example, Is Married To is shown
Because this is a one-to-one relationship, this notation indicates that only the
current marriage, if one exists, needs to be kept about a person. What would
change if we needed to retain the history of marriages for each person? See
Review Question 2-20 and Problem and Exercise 2-34 for other business rules
the EMPLOYEE entity type. Using this relationship, we could identify, for
example, the employees who report to a particular manager. The third example is
list. In this example, sports teams are related by their standing in their league
entity instance can repeat in the same relationship instance; we will introduce
which in turn are composed of subassemblies and parts, and so on. As shown in
relationship. In this figure, the entity type ITEM is used to represent all types of
components, and we use Has Components for the name of the relationship type
Each of these diagrams shows the immediate components of each item as well
as the quantities of that component. For example, item TX100 consists of item
BR450 (quantity 2) and item DX500 (quantity 1). You can easily verify that the
associations are in fact many-to-many. Several of the items have more than one
component type (e.g., item MX300 has three immediate component types:
HX100, TX100, and WX240). Also, some of the components are used in several
higher-level assemblies. For example, item WX240 is used in both item MX300
and item WX340, even at different levels of the bill-of-materials. The
many-to-many relationship guarantees that, for example, the same subassembly structure
of WX240 (not shown) is used each time item WX240 goes into making some
other item.
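The bill-of-materials structure described above can be sketched as a recursive lookup. The components and quantities of TX100 come from the text; the quantities under MX300 are assumed to be 1 for illustration, and WX240 is treated as a lowest-level part because its subassembly structure is not shown.

```python
# Immediate components of each item: {assembly: {component: quantity}}.
# TX100's entries follow the text; MX300's quantities are assumptions.
BOM = {
    "MX300": {"HX100": 1, "TX100": 1, "WX240": 1},
    "TX100": {"BR450": 2, "DX500": 1},
}


def explode(item, multiplier=1, totals=None):
    """Return total quantities of lowest-level parts needed to build `item`."""
    totals = {} if totals is None else totals
    for component, qty in BOM.get(item, {}).items():
        needed = multiplier * qty
        if component in BOM:   # a subassembly: recurse into its components
            explode(component, needed, totals)
        else:                  # a lowest-level part: accumulate its quantity
            totals[component] = totals.get(component, 0) + needed
    return totals


print(explode("TX100"))
print(explode("MX300"))
```

Because TX100 is defined once and referenced wherever it is used, every assembly that contains it gets the same subassembly structure, which is the guarantee the unary many-to-many relationship provides.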
The presence of the attribute Quantity on the relationship suggests that the
entity. Figure 2-13c shows the entity type BOM STRUCTURE, which forms an
(named Effective
added to BOM
STRUCTURE to
when this
component was
related assembly.
a history of values
is required. Other
data model structures can be used for unary relationships involving such
hierarchies;
BINARY RELATIONSHIP
Figure 2-12b shows three examples. The first (one-to-one) indicates that an
employee is assigned one parking place, and that each parking place is assigned
to one employee. The second (one-to-many) indicates that a product line may
contain several products, and that each product belongs to only one product line.
The third (many-to-many) shows that a student may register for more than one
course, and that each course may have many student registrants.
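The first two of these binary relationships can be sketched in SQLite, where the difference between one-to-one and one-to-many comes down to whether the foreign key is also declared unique. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    -- One-to-one: UNIQUE stops two parking places from naming the same employee.
    CREATE TABLE parking_place (
        place_id INTEGER PRIMARY KEY,
        emp_id   INTEGER UNIQUE REFERENCES employee(emp_id)
    );
    CREATE TABLE product_line (line_id INTEGER PRIMARY KEY, name TEXT);
    -- One-to-many: many products may carry the same line_id.
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        line_id    INTEGER NOT NULL REFERENCES product_line(line_id)
    );
    INSERT INTO employee VALUES (1, 'Ana');
    INSERT INTO parking_place VALUES (100, 1);
    INSERT INTO product_line VALUES (1, 'Chairs');
    INSERT INTO product VALUES (1, 1), (2, 1);
""")

# A second parking place for the same employee breaks the one-to-one rule.
try:
    conn.execute("INSERT INTO parking_place VALUES (101, 1)")
    second_place = "accepted"
except sqlite3.IntegrityError:
    second_place = "rejected"

products_in_line = conn.execute(
    "SELECT COUNT(*) FROM product WHERE line_id = 1"
).fetchone()[0]
print(second_place, products_in_line)
```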
CONCEPTS IN ACTION
2-A THE WALT DISNEY COMPANY
The Walt Disney Company is world-famous for its many entertainment ventures
but it is especially identified with its theme parks. First there was Disneyland in
Los Angeles, then the mammoth Walt Disney World in Orlando. These were
followed by parks in Paris and Tokyo, and one now under development in Hong
Kong. The Disney theme parks are so well run that they create a wonderful
feeling of natural harmony with everyone and everything being in the right place
at the right time. When you're there, it's too much fun to stop to think about how
all this is organized and carried off with such precision. But, is it any wonder to
One of the Disney theme parks' interesting database applications keeps track of
all of the costumes worn by the workers or “cast members” in the parks. The
system is called the Garment Utilization System or GUS (which was also the
name of one of the mice that helped Cinderella sew her dress!). Managing these
costumes is no small task. Virtually all of the cast members, from the actors and
dancers to the ride operators, wear some kind of costume. Disneyland in Los
several garments), each of which is uniquely bar-coded, for its 46,000 cast
members. The numbers in Orlando are three million garments and 90,000 cast
members. Using bar-code scanning, GUS tracks the life cycle of every garment.
This includes the points in time when a garment is in the storage facility, is
checked out to a cast member, is in the laundry, or is being repaired (in house or
costumes, the system also provides a rich data analysis capability. The industrial
decide how many garments to keep in stock and how many people to have
staffing the garment checkout windows based on the expected wait times. They
also use the data to determine whether certain fabrics or the garments made by
uses or of launderings.
GUS, which was inaugurated at Disneyland in Los Angeles in 1998 and then
Microsoft's SQL Server DBMS and runs on a Compaq server. It is also linked to
an SAP personnel database to help maintain the status of the cast members. If
GUS is ever down, the process shifts to a Palm Pilot-based backup system that
can later update the database. In order to keep track of the costume parts and
cast members, not surprisingly, there is a relational table for costume parts with
one record for each garment and there is a table for cast members with one
record for each cast member. The costume parts records include the type of
garment, its size, color, and even such details as whether its use is restricted to a
Correspondingly, the cast member records include the person's clothing sizes
fundamental managerial value. The Walt Disney Company feels that consistency
in how its visitors or “guests” look at a given ride gives them an important comfort
addition, GUS takes the worry out of an important part of each cast member's
workday. One of Disney's creeds is to strive to take good care of its cast
members so that they will take good care of Disney's guests. Database
Cardinality
relationship, which means that a single occurrence of one entity type can be
associated with a single occurrence of the other entity type and vice versa. A
this case they are all private offices!) has just one salesperson assigned to it.
Note the “bar” or “one” symbol on either end of the relationship in the diagram
indicating the maximum one cardinality. The way to read these diagrams is to
start at one entity, read the relationship on the connecting line, pick up the
cardinality on the other side of the line near the second entity, and then finally
reach the other entity. Thus, Figure 2.3a, reading from left to right, says, “A
salesperson works in one (really at most one, since it is a maximum) office.” The
bar or one symbol involved in this statement is the one just to the left of the office
entity box. Conversely, reading from right to left, “An office is occupied by one
salesperson.”
FIGURE 2.3 Binary relationships with cardinalities
One-to-Many Binary
The “crow's foot” device attached to the customer entity box represents the
multiple association. Reading from left to right, the diagram indicates that a
3, …n. It also means that the number is not restricted to being exactly one, which
would require the “one” or “bar” symbol instead of the crow's foot.) Reading from
right to left, Figure 2.3b says that a customer buys from only one salesperson.
exclusive territory and thus each customer can be sold to by only one
By the way, in some circumstances, in either the 1-M or M-M case, “many” can
company rule may set a limit of a maximum of ten customers in a sales territory.
Then the “many” in the 1-M relationship of Figure 2.3b can never be more than
10 (a salesperson can have many customers but not more than 10). Sometimes
people include this exact number or maximum next to or even instead of the
Modality
salesperson. This situation is recorded in Figure 2.4a, where the “inner” symbol,
“outer” symbol, which can be a one or a crow's foot, represents the cardinality—
works in a minimum of one and a maximum of one office, which is another way of
paying her for?!?) Actually, this allows for the case in which we have just hired a
new salesperson and have not as yet assigned her a territory or any customers.
anything from us when they need to? We never want to be in a position of losing
salesperson is authorized to sell at least one or many of our products and each
product can be sold by at least one or many of our salespersons. This includes
the extreme, but not surprising, case in which each salesperson is authorized to
sell all the products and each product can be sold by all the salespersons.
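The minimum participation (modality) described here can be checked against sample data with a short sketch. The function `check_participation`, the salesperson and product identifiers, and the authorization pairs are all hypothetical.

```python
def check_participation(pairs, left_items, right_items, min_each=1):
    """Return the items on each side that fall below the minimum (modality)."""
    left_counts = {item: 0 for item in left_items}
    right_counts = {item: 0 for item in right_items}
    for left, right in pairs:
        left_counts[left] += 1
        right_counts[right] += 1
    too_few_left = sorted(i for i, n in left_counts.items() if n < min_each)
    too_few_right = sorted(i for i, n in right_counts.items() if n < min_each)
    return too_few_left, too_few_right


salespersons = ["S1", "S2"]
products = ["P1", "P2", "P3"]
# Which salesperson is authorized to sell which product (many-to-many).
authorized = [("S1", "P1"), ("S1", "P2"), ("S2", "P2")]

print(check_participation(authorized, salespersons, products))  # ([], ['P3'])
```

Here every salesperson sells at least one product, but product P3 has no salesperson, so the data violates the rule that each product must be sold by at least one salesperson.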
FIGURE 2.4 Binary relationships with cardinalities (maximums) and modalities (minimums)
Many-to-Many Relationships
a year of hire. At the entity occurrence level, for example, one of the
these attributes are written or drawn together with the entity, as in Figure 2.1 and
the succeeding figures. This certainly appears to be very natural and obvious.
Are there ever any circumstances in which an attribute can describe something
number, name, commission percentage, and year of hire. Products are described
by their product number, name, and unit price. But, what if there is a requirement
to keep track of the number of units (call it “quantity”) of a particular product that
a particular salesperson has sold? Can we add the quantity attribute to the
product entity box? No, because for a particular product, while there is a single
product number, product name, and unit price, there would be lots of “quantities,”
one for each salesperson selling the product. Can we add the quantity attribute to
the salesperson entity box? No, because for a particular salesperson, while there
year of hire, there will be lots of “quantities,” one for each product that the
salesperson sells. It makes no sense to try to put the quantity attribute in either
the salesperson entity box or the product entity box. While each salesperson has
each salesperson has many “quantities,” one for each product he sells. Similarly,
while each product has a single product number, product name, and unit price,
each product has many “quantities,” one for each salesperson who sells that
product. But an entity box in an E-R diagram is designed to list the attributes that
simply and directly describe the entity, with no complications involving other
entities. Putting quantity in either the salesperson entity box or the product entity
The quantity attribute doesn't describe either the salesperson alone or the
particular occurrence of one entity type and a particular occurrence of the other
entity type. Let's say that since salesperson number 137 joined the company, she
has sold 170 units of product number 24013. The quantity 170 doesn't make
has sold many different kinds of products. To which one does the quantity 170
characteristic of product number 24013 alone. It has been sold by many different
salespersons.
In fact, the quantity 170 falls at the intersection of salesperson number 137 and
between that particular salesperson and that particular product and it is known
separate box attached to the relationship line. That is the natural place to draw it.
Pictorially, it looks as if it is at the intersection between the two entities, but there
the two entities. We know that an occurrence of the Sells relationship specifies
that salesperson 137 has sold some of product 24013. The quantity 170 is an
of the relationship. Not only do we know that salesperson 137 sold some of
product 24013 but we know how many units of that product that salesperson
sold.
FIGURE 2.5 Many-to-many intersection data
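One way to picture intersection data is as a lookup keyed by the pair of identifiers rather than by either identifier alone. The following is a minimal Python sketch under stated assumptions: the names `quantity_sold`, `products_sold_by`, and `sellers_of` are hypothetical, and only the first row (salesperson 137, product 24013, quantity 170) comes from the text; the other rows are made up for illustration.

```python
# Intersection data: the quantity belongs to the (salesperson, product) pair,
# not to the salesperson alone and not to the product alone.
quantity_sold = {
    (137, 24013): 170,  # from the text
    (137, 19440): 473,  # hypothetical row
    (186, 24013): 30,   # hypothetical row
}


def products_sold_by(salesperson_number):
    """Return {product_number: quantity} for one salesperson."""
    return {p: q for (s, p), q in quantity_sold.items() if s == salesperson_number}


def sellers_of(product_number):
    """Return {salesperson_number: quantity} for one product."""
    return {s: q for (s, p), q in quantity_sold.items() if p == product_number}
```

Looking up salesperson 137 alone returns several quantities, and so does looking up product 24013 alone; only the pair pins down the single value 170.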
The Unique Identifier in Many-to-Many Relationships
Since, as we
complete with attributes, it also follows that it should have a unique identifier, like
other entities. (If this seems a little strange or even unnecessary here, it will
become essential later in the book when we actually design databases based on
these E-R diagrams.) In its most basic form, the unique identifier of the many-to-
identifiers of the two entities in the many-to-many relationship. So, the unique
2.6, of the associative entity, is the combination of the Salesperson Number and
the company. Thus, there can be only one occurrence of SALES combining a
particular salesperson with a particular product. But if, for example, we wanted to
keep track of the sales on an annual basis, we would have to include a year
Number, and Year. Clearly, if we want to know how many units of each product
Number and Product Number would not be unique because for a particular
salesperson and a particular product, the combination of those two values would
be the same each year! Year must be added to produce uniqueness, not to
mention to make it clear in which year a particular value of the Quantity attribute
The third and last possibility occurs when the nature of the associative entity is
such that it has its own unique identifier. For example, a company might specify a
unique serial number for each sales record. Another example would be the
many-to-many relationship between motorists and police officers who give traffic
tickets for moving violations. (Hopefully it's not too many for each motorist!) The
unique identifier could be the combination of police officer number and motorist
driver's license number plus perhaps date and time. But, typically, each traffic
ticket has a unique serial number and this would serve as the unique identifier.
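The second possibility above, where Year is added to produce uniqueness, can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table name `sales`, the column names, and the sample years and quantities are hypothetical and are not taken from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Associative entity SALES: the unique identifier is the combination of
# salesperson number, product number, and year.
conn.execute("""
    CREATE TABLE sales (
        salesperson_number INTEGER NOT NULL,
        product_number     INTEGER NOT NULL,
        year               INTEGER NOT NULL,
        quantity           INTEGER NOT NULL,
        PRIMARY KEY (salesperson_number, product_number, year)
    )
""")

# Same salesperson and product in two different years: both rows are allowed.
conn.execute("INSERT INTO sales VALUES (137, 24013, 2022, 90)")
conn.execute("INSERT INTO sales VALUES (137, 24013, 2023, 80)")


def insert_is_rejected(row):
    """Return True if the database refuses the row as a duplicate identifier."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


# Repeating the same salesperson, product, and year violates uniqueness.
duplicate_rejected = insert_is_rejected((137, 24013, 2023, 5))
```

If the associative entity had its own serial number, as in the traffic ticket example, that single column would be declared as the primary key instead.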
TERNARY RELATIONSHIP
three entity types. A typical business situation that leads to a ternary relationship
is shown in Figure 2-12c. In this example, vendors can supply various parts to
warehouses. The relationship Supplies is used to record the specific parts that
are supplied by a given vendor to a particular warehouse. Thus, there are three
entity types: VENDOR, PART, and WAREHOUSE. There are two attributes on
the relationship Supplies: Shipping Mode and Unit Cost. For example, one
instance of Supplies might record the fact that vendor X can ship part C to
warehouse Y, that the shipping mode is next-day air, and that the cost is $5 per
unit.
in Figure 2-12c. Unit Cost cannot be properly associated with any one of the
three possible binary relationships among the three entity types, such as that
Thus, for example, if we were told that vendor X can ship part C for a unit cost of
$8, those data would be incomplete because they would not indicate to which
the (associative) entity type SUPPLY SCHEDULE is used to replace the Supplies
relationship from Figure 2-12c. Clearly, the entity type SUPPLY SCHEDULE is of
independent interest to users. However, notice that an identifier has not yet been
composite identifier whose components will consist of the identifier for each of
the participating entity types (in this example, PART, VENDOR, and
WAREHOUSE). Can you think of other attributes that might be associated with
SUPPLY SCHEDULE?
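The composite identifier of SUPPLY SCHEDULE can be sketched in Python as a key made of the three participating identifiers. The class, field, and function names below are assumptions made for illustration, and warehouse "Z" with ground shipping is a hypothetical second row; the vendor X, part C, warehouse Y, next-day air, $5 instance is the one described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplySchedule:
    vendor_id: str
    part_id: str
    warehouse_id: str
    shipping_mode: str  # attribute of the ternary relationship
    unit_cost: float    # attribute of the ternary relationship

    @property
    def identifier(self):
        """Composite identifier: one component per participating entity type."""
        return (self.vendor_id, self.part_id, self.warehouse_id)


schedules = {}


def add_schedule(entry):
    """Store an entry, refusing a second entry with the same identifier."""
    if entry.identifier in schedules:
        raise ValueError(f"duplicate supply schedule {entry.identifier}")
    schedules[entry.identifier] = entry


add_schedule(SupplySchedule("X", "C", "Y", "next-day air", 5.00))
add_schedule(SupplySchedule("X", "C", "Z", "ground", 8.00))  # hypothetical
```

Because the same vendor and part appear with two different unit costs, the cost is only meaningful once the warehouse is also known, which is why Unit Cost cannot sit on any binary relationship.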
As noted earlier, we do not label the lines from SUPPLY SCHEDULE to the three
keep the same meaning as the ternary relationship of Figure 2-12c, we cannot
already mentioned. So, here is a guideline to follow: Convert all ternary (or
relationship, given the notation with attributes on the relationship line. However,
represented. Also, many E-R diagram drawing tools, including most CASE tools,
you must use these tools to represent the ternary or higher order relationship
with an associative entity and three binary relationships, which have a mandatory
entity. A composite entity has only one function: to provide an indirect link
primary key.
The following graphic illustrates a composite entity that now indirectly links the
The M:N relationship between STUDENT and CLASS has been dissolved into
this way: for one instance of STUDENT, there exists zero, one, or many
this way: For one instance of CLASS, there exists zero, one, or many
attributes from one or both entities it links, because those attributes would be
CLASS should be removed to the composite entity. The designer makes this
participation in relationships.
CHAPTER 3:
Definitions
types).
relationship) with one or more subtypes and it contains attributes that are
reused as
The basic E-R model described in the previous chapter was first introduced
during the mid-1970s. It has been suitable for modeling most common business
problems and has enjoyed widespread use. However, the business environment
has changed dramatically since that time. Business relationships are more
complex, and as a result, business data are much more complex as well. For
have continued to enhance the E-R model so that it can more accurately
term enhanced entity-relationship (EER) model is used to identify the model that
has resulted from extending the original E-R model with these new modeling
At times, a few entities in a data model share some common properties
(attributes) while each also has one or more distinct attributes of its own.
Based on these attributes, the entities are categorized as supertype and subtype
entities.
A supertype is an entity type that has a parent-to-child relationship with one or
more subtypes and contains the attributes that are common to its
subtypes.
Subtypes are subgroups of the supertype entity and have unique attributes, but
Supertypes and Subtypes are parent and child entities respectively and the
When designing a data model for PEOPLE, you can have a supertype entity
PEOPLE whose subtype entities are vendor, customer, and employee. The
PEOPLE entity has attributes such as Name, Address, and Telephone Number,
which are common to its subtypes, and you can design the employee, vendor,
and customer entities with their own unique attributes. In this scenario, the
employee entity can be further classified into subtype entities such as HR
employee and IT employee. Here employee is the supertype for the HR
employee and IT employee entities, but it is still a subtype of the PEOPLE entity.
are some of the important attributes for each of these types of employees:
Hired, Hourly Rate
Notice that all of the employee types have several attributes in common:
Employee Number, Employee Name, Address, and Date Hired. In addition, each
type has one or more attributes distinct from the attributes of other types (e.g.,
this approach has the disadvantage that EMPLOYEE would have to contain all of
the attributes for the three types of employees. For an instance of an hourly
employee (for example), attributes such as Annual Salary and Contract Number
would not apply (optional attributes) and would be null or not used. When taken
2. Define a separate entity type for each of the three entities. This approach
would fail to exploit the common properties of employees, and users would have
to be careful to select the correct entity type when using the system.
type.
enhanced E-R notation. Attributes shared by all employees are associated with
the EMPLOYEE entity type. Attributes that are peculiar to each subtype are
subtypes.
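The idea that shared attributes live on the supertype while each subtype adds its own can be sketched with Python class inheritance. This is an analogy under stated assumptions, not a database design: the class and field names are hypothetical renderings of the attributes named in the text (Employee Number, Employee Name, Address, Date Hired, Hourly Rate, Annual Salary, Contract Number), and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Supertype: attributes shared by all employees."""
    employee_number: int
    employee_name: str
    address: str
    date_hired: str


@dataclass
class HourlyEmployee(Employee):
    """Subtype: inherits the shared attributes and adds its own."""
    hourly_rate: float


@dataclass
class SalariedEmployee(Employee):
    annual_salary: float


@dataclass
class Consultant(Employee):
    contract_number: int


# Hypothetical instance: an hourly employee carries no salary or contract field,
# so no attribute has to be left null.
worker = HourlyEmployee(1, "Sample Name", "Sample Address", "2020-01-15", 12.5)
```

Compared with a single EMPLOYEE type holding every attribute, each instance here carries only the attributes that apply to it.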
- For example, what will be the method of payment – cash, check or credit card?
model them.
Subdivide an Entity
This may be the case when a group of instances has special properties,
In this case, the entity is called a “supertype” and each group is called a
“subtype”.
Subtype Characteristics
A subtype:
subtype to make sure that your subtypes are exhaustive — that you are handling
- But being able to subtype is not the issue—having a reason to subtype is the
issue.
- When a need exists within the business to show similarities and differences
When modeling supertypes and subtypes, you can use three questions to
3. Does each instance fit into one and only one subtype? (mutually
exclusive)
subtype relationships.
Generalization
functions are combined to form a higher-level function, which is called an entity.
entity types have been defined: CAR, TRUCK, and MOTORCYCLE. At this
diagram. However, on closer examination, we see that the three entity types
(with components Make and Model), Price, and Engine Displacement. This fact
three entity types is really a version of a more general entity type. This more
Capacity and Cab Type. Thus, generalization has allowed us to group entity
subtype.
Notice that the entity type MOTORCYCLE is not included in the relationship. Is
does not satisfy the conditions for a subtype discussed earlier. Comparing the
two figures you will notice that the only attributes of MOTORCYCLE are those
that are common to all vehicles; there are no attributes specific to motorcycles.
subtypes.
Specialization
In specialization, things are broken down into smaller things to simplify them
further. In other words, a particular entity is divided into sub-entities
on the basis of its characteristics. Inheritance also takes place in
specialization.
entity type named PART, together with several of its attributes. The identifier is
Part No, and other attributes are Description, Unit Price, Location, Qty On Hand,
because there may be more than one supplier with an associated unit price for a
part.)
In discussions with users, we discover that there are two possible sources for
manufactured internally,
suppliers. Further, we
Some of the attributes in Figure 3-5a apply to all parts, regardless of source.
manufactured parts, whereas Supplier ID and Unit Price apply only to purchased
parts. These factors suggest that PART should be specialized by defining the
The data modeler initially planned to associate Supplier ID and Unit Price with
modeler suggested instead that they create a SUPPLIER entity type and an
entity (named SUPPLIES in Figure 3-5b) allows users to more easily associate
purchased parts with their suppliers. Notice that the attribute Unit Price is now
associated with the associative entity so that the unit price for a part may vary
Figure 3-5a
Figure 3-5b
rules. The disjoint rule forces subclasses to have disjoint sets of entities. The
DISJOINT RULE
more) subtypes.)
OVERLAP RULE
an instance of a
a member of at least one subtype. The total specialization rule demands that
every entity in the superclass belong to some subclass. Just as with a regular
entities. The partial specialization rule allows an entity to not belong to any of the
subtype.
combination of supertype/subtype
called a supertype/subtype
hierarchy, or generalization
hierarchy.
all the attributes of a supertype are propagated down the hierarchy to entities of a
lower type. Generalization may occur when a generic entity, which we call the
SUBTYPE DISCRIMINATOR
subtype. The attribute's values are what determine the target subtype.
Disjoint subtypes - simple attributes that must have alternative values to indicate
subtypes. Each subpart has a Boolean value that indicates whether or not the
Specialization and
Disjoint
Employee: Hourly,
Salaried, Consultant
Discriminator
Manufactured? And
Purchased?
- Where to be stored?
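The two kinds of subtype discriminator can be contrasted in a short Python sketch. Everything here is an assumption for illustration: the function names, the one-letter codes "H", "S", and "C", and the subtype labels are hypothetical; only the EMPLOYEE subtypes and the Manufactured?/Purchased? flags are taken from the text.

```python
def employee_subtype(employee_type):
    """Disjoint rule: one simple discriminator value selects exactly one subtype."""
    subtypes = {
        "H": "HOURLY EMPLOYEE",
        "S": "SALARIED EMPLOYEE",
        "C": "CONSULTANT",
    }
    return subtypes[employee_type]


def part_subtypes(manufactured, purchased):
    """Overlap rule: a composite discriminator with one Boolean per subtype."""
    result = []
    if manufactured:
        result.append("MANUFACTURED PART")
    if purchased:
        result.append("PURCHASED PART")
    return result
```

A single code can never name two subtypes at once, which is why the overlap case needs one Boolean component per subtype: `part_subtypes(True, True)` reports a part that is both manufactured and purchased.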
ENTITY CLUSTER
and relationship.
Manufacturing Cluster
b. Key strategic for long-term success, game changer for data modeling
g. Data model patterns code for programs (just a good start for success)
Easier to read
CHAPTER 4:
Baccol, Jonalyn G.
Mequin, Mary Joyce M.
PROPERTIES OF RELATIONS
tables are relations. Relations have several properties that distinguish them from
2. An entry at the intersection of each row and column is atomic (or single
valued).
There can be only one value associated with each attribute on a specific row of a
4. Each attribute (or column) within a table has a unique name.
5. The sequence of columns (left to right) is insignificant. The order of the
columns in a relation can be changed without changing the meaning or use of the
the order of the rows of a relation may be changed or stored in any sequence.
The second property of relations listed in the preceding segment states that no
multivalued attributes are allowed in a relation. Thus, a table that contains one or
more multivalued attributes is not a relation. For example, Figure 1(a) shows the
employee data from the EMPLOYEE1 relation extended to include courses that
may have been taken by those employees. Because a given employee may have
taken more than one course, Course Title and Date Completed are multivalued
attributes. For example, the employee with EmpID 100 has taken two courses. If
an employee has not taken any courses, the Course Title and Date Completed
attribute values are null. (See the employee with EmpID 190 for an example.)
We show how to eliminate the multivalued attributes in Figure 1(b) by filling the
relevant data values into the previously vacant cells of Figure 1(a). As a result,
the table in
Figure 1(b) has only single-valued attributes and now satisfies the atomic
distinguish it from EMPLOYEE1. However, as you will see, this new relation does
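The step of filling values into the vacant cells can be sketched in Python as flattening nested records into single-valued rows. The sample data is hypothetical; it only mirrors the two cases mentioned in the text (EmpID 100 with two courses, EmpID 190 with none), and the names, titles, and dates are placeholders, not values from Figure 1.

```python
# Hypothetical records with a multivalued "Courses" attribute.
employees_with_courses = [
    {"EmpID": 100, "Name": "Employee A",
     "Courses": [("Course 1", "2023-01-10"), ("Course 2", "2023-02-20")]},
    {"EmpID": 190, "Name": "Employee B", "Courses": []},
]


def to_first_normal_form(records):
    """Emit one row per (employee, course) so every cell holds a single value."""
    rows = []
    for rec in records:
        # An employee with no courses still gets one row, with null course values.
        courses = rec["Courses"] or [(None, None)]
        for title, completed in courses:
            rows.append({
                "EmpID": rec["EmpID"],
                "Name": rec["Name"],
                "CourseTitle": title,
                "DateCompleted": completed,
            })
    return rows


employee2 = to_first_normal_form(employees_with_courses)
```

The result has two rows for EmpID 100, which is exactly why EmpID alone no longer identifies a row in the flattened table.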
CANDIDATE KEYS
properties (Dutka and Hanson, 1989), which are a subset of the six properties
1. Unique identification: For every row, the value of the key must uniquely
identify that row. This property implies that each non key attribute is
Figure 2 Representing functional dependencies
We represent the functional dependencies for a relation using the notation shown
horizontal line in the figure portrays the functional dependencies. A vertical line
drops from the primary key (EmpID) and connects to this line. Vertical arrows
then point to each of the nonkey attributes that are functionally dependent on the
primary key.
For the relation EMPLOYEE2 (Figure 1(b)), notice that (unlike EMPLOYEE1)
EmpID does not uniquely identify a row in the relation. For example, there are
two rows in the table for EmpID number 100. There are two types of functional
The functional dependencies indicate that the combination of EmpID and Course
Title is the only candidate key (and therefore the primary key) for EMPLOYEE2.
EmpID nor Course Title uniquely identifies a row in this relation and therefore
Figure 1(b) to verify that the combination of EmpID and Course Title does
dependencies in this relation in Figure 2(b). Notice that Date Completed is the
only attribute that is functionally dependent on the full primary key consisting of
normalized relations have as their primary key the determinant for each of the
nonkeys, and within that relation there are no other functional dependencies.
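The unique identification property can be checked against sample rows with a few lines of Python. This is a sketch under stated assumptions: the function name `is_unique_identifier` and the sample rows are hypothetical. Note that such a check can only disprove a candidate key; passing on sample data does not prove that the attributes form a key for every possible row.

```python
# Hypothetical EMPLOYEE2-style rows (not the values from Figure 1(b)).
rows = [
    {"EmpID": 100, "CourseTitle": "Course 1", "DateCompleted": "2023-01-10"},
    {"EmpID": 100, "CourseTitle": "Course 2", "DateCompleted": "2023-02-20"},
    {"EmpID": 140, "CourseTitle": "Course 1", "DateCompleted": "2023-03-05"},
]


def is_unique_identifier(rows, attributes):
    """Return True if no two rows share the same values for the given attributes."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key in seen:
            return False
        seen.add(key)
    return True
```

On these rows, neither `["EmpID"]` nor `["CourseTitle"]` passes the check, while the combination `["EmpID", "CourseTitle"]` does.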
dependency)
A normal form is a state of a relation that requires that certain rules regarding
describe these rules briefly in this section and illustrate them in detail in the
following sections:
1. First normal form. Any multivalued attributes (also called repeating groups)
have been removed, so there is a single value (possibly null) at the intersection
2. Second normal form. Any partial functional dependencies have been
removed
3. Third normal form. Any transitive dependencies have been removed (i.e.,
4. Boyce-Codd normal form. Any remaining anomalies that result from
functional dependencies have been removed (because there was more than one
5. Fourth normal form. Any multivalued dependencies have been removed.
6. Fifth normal form. Any remaining anomalies have been removed.
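As an illustration of step 2, removing a partial functional dependency amounts to projecting the relation onto smaller attribute sets. The Python sketch below assumes hypothetical sample rows and hypothetical names (`project`, `employee`, `emp_course`). It assumes that Name depends on EmpID alone, while Date Completed depends on the full key of EmpID and Course Title.

```python
# Hypothetical first-normal-form rows; Name repeats for every course taken.
employee2 = [
    {"EmpID": 100, "Name": "Employee A", "CourseTitle": "Course 1", "DateCompleted": "2023-01-10"},
    {"EmpID": 100, "Name": "Employee A", "CourseTitle": "Course 2", "DateCompleted": "2023-02-20"},
    {"EmpID": 140, "Name": "Employee C", "CourseTitle": "Course 1", "DateCompleted": "2023-03-05"},
]


def project(rows, attributes):
    """Relational projection: keep the given attributes and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Attributes that depend only on EmpID move to their own relation.
employee = project(employee2, ["EmpID", "Name"])
# Attributes that depend on the full key stay with the full key.
emp_course = project(employee2, ["EmpID", "CourseTitle", "DateCompleted"])
```

After the split, each employee name is stored once, so changing it no longer requires updating several rows.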
Up to the Boyce-Codd normal form, normalization is based on the analysis of functional dependencies.
1. Synonyms
object
They are used mainly to make it easy for users to access database
Choose either of the two attribute names and eliminate the other synonym
For example:
2. Homonyms
Homonyms are data fields that have the same name but different
meanings.
The name of the attribute will be the same but the attribute refers to
different things.
For example: The example below compares student and customer in the
database. In the database for students, F Name can stand for the
first name of the father, while in customer, F Name can be the first name of
that customer. The two attributes share a name but have different meanings.
STUDENT CUSTOMER
Transitive dependencies
Even if relations are in third normal form prior to merging, they may not be
after merging.
functional dependency.
For example:
4. Supertype/subtype relationships
Is a generic entity type that has a relationship with one or more subtypes
If there are two or more different types of a relation but they contain some
For example:
Patient
relations (table)
An ER diagram shows the relationship among entity sets. An entity set is
a group of similar entities and these entities can have attributes. In terms of
relationship among tables and their attributes, ER diagram shows the complete
It helps you identify the entities that exist in a system and the
Provide a preview of how all your tables should connect, what fields are
Logical design
management system.
A logical design is a conceptual, abstract design. You do not deal with the
physical implementation details yet; you deal only with defining the types
Relations:
Properties
Well-Structured Relation
inconsistencies
A simple ER Diagram
college can have many students; however, a student cannot study in multiple
colleges at the same time. Student entities have attributes such as Stu_Id,
Stu_Name, and Stu_Addr, and College entities have attributes such as Col_ID and
Col_Name.
Here are the geometric shapes and their meaning in an E-R Diagram. We
Diagram) of this guide so don’t worry too much about these terms now, just go
Ellipses: Attributes
Lines: They link attributes to Entity Sets and Entity Sets to Relationship Set
A. Entity
B. Attribute
C. Relationship
A. Entity
Set is a group of similar entities and these entities can have attributes
For example: In the following ER diagram we have two entities, Student and
College, and these two entities have a many-to-one relationship, as many students
study in a single college. We will read more about relationships later; for now,
focus on entities.
For each strong entity set, create a new independent relational table that
A strong entity set with only simple attributes will require only one table in
Attributes of the table will be the attributes of the entity set. The primary
key of the table will be the key attribute of the entity set.
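The mapping rule for a strong entity set with only simple attributes can be sketched with Python's built-in `sqlite3` module. The column names reuse the Student attributes mentioned in this chapter (Stu_Id, Stu_Name, Stu_Addr); the table name `student`, the column types, and the sample row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table for the entity set; its key attribute becomes the primary key.
conn.execute("""
    CREATE TABLE student (
        Stu_Id   INTEGER PRIMARY KEY,
        Stu_Name TEXT NOT NULL,
        Stu_Addr TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Sample Student', 'Sample Address')")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
table_info = conn.execute("PRAGMA table_info(student)").fetchall()
columns = [row[1] for row in table_info]
primary_key = [row[1] for row in table_info if row[5] > 0]
```

The table's columns match the attributes of the entity set one for one, and the primary key is the single key attribute.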
A strong entity set with any number of composite attributes will require
Convert every weak entity set into a table by taking the
discriminator attribute of the weak entity set, adding the primary key of
the strong entity set as a foreign key, and then declaring the combination of
(“discriminator”) and the relationship they have with another entity set
(“identifying relationship”)
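The weak entity mapping can be sketched with `sqlite3` as follows. The entity names `employee` and `dependent`, their columns, and the sample rows are hypothetical and chosen only to illustrate the rule: the weak entity's table combines the owner's primary key (as a foreign key) with the discriminator to form its primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Strong (owner) entity set.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL)")

# Weak entity set: dependent_name is the discriminator, emp_id is the
# owner's primary key taken as a foreign key; together they form the key.
conn.execute("""
    CREATE TABLE dependent (
        emp_id         INTEGER NOT NULL REFERENCES employee(emp_id),
        dependent_name TEXT    NOT NULL,
        relationship   TEXT,
        PRIMARY KEY (emp_id, dependent_name)
    )
""")

conn.execute("INSERT INTO employee VALUES (100, 'Employee A')")
conn.execute("INSERT INTO employee VALUES (140, 'Employee C')")
# The same discriminator value may repeat under different owners.
conn.execute("INSERT INTO dependent VALUES (100, 'Sam', 'child')")
conn.execute("INSERT INTO dependent VALUES (140, 'Sam', 'spouse')")


def orphan_is_rejected():
    """A dependent row without an existing owner should be refused."""
    try:
        conn.execute("INSERT INTO dependent VALUES (999, 'Lee', 'child')")
        return False
    except sqlite3.IntegrityError:
        return True


dependent_count = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
```

The discriminator alone ("Sam") is not unique, but together with the owner's key it is, which reflects why a weak entity cannot be identified without its owner.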
A weak entity is a type of entity that doesn't have its own key attribute. It can
B. Attribute
1. Key attributes
of a relational database.
For example: Student roll numbers can uniquely identify a student from a set of
as other attributes
attribute is underlined.
2. Composite attribute
composite attribute.
There are values that are to be stored in an attribute that can be further
3. Multivalued attribute
For every multivalued attribute, we make a new table in which we
take the primary key of the main table as a foreign key and multi-valued
A strong entity set with any number of multivalued attributes will require
One table will contain all the simple attributes with the primary key.
The other table will contain the primary key and all the multivalued attributes.
multivalued.
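The two-table mapping for a multivalued attribute can be sketched with `sqlite3`. The choice of phone numbers as the multivalued attribute, the table and column names, and the sample values are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table 1: the simple attributes with the primary key.
    CREATE TABLE student (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT NOT NULL
    );
    -- Table 2: the primary key of the main table (as a foreign key)
    -- plus the multivalued attribute, one value per row.
    CREATE TABLE student_phone (
        stu_id INTEGER NOT NULL REFERENCES student(stu_id),
        phone  TEXT    NOT NULL,
        PRIMARY KEY (stu_id, phone)
    );
    INSERT INTO student VALUES (1, 'Sample Student');
    INSERT INTO student_phone VALUES (1, '555-0101');
    INSERT INTO student_phone VALUES (1, '555-0102');
""")

# All values of the multivalued attribute for one student.
phones = [row[0] for row in conn.execute(
    "SELECT phone FROM student_phone WHERE stu_id = ? ORDER BY phone", (1,))]
```

Each phone number occupies its own row, so every cell stays single valued no matter how many numbers a student has.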
4. Derived attribute
of birth).
C. Relationship
the relationship among entities. There are four types of cardinal relationships:
1. One-to-one relationships
For example 1
In a school database, each student has only one student ID, and each
In this example, the key field in each table, Student ID, is designed to
contain unique values. In the Students table, the Student ID field is the
primary key; in the Contact Info table, the Student ID field is a foreign key.
This relationship returns related records when the value in the Student ID
field in the Contact Info table is the same as the Student ID field in the
Students table.
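The Students and Contact Info example can be sketched with `sqlite3`. The table names follow the text; the column names, the `email` column, and the sample rows are assumptions. Declaring the foreign key as UNIQUE is one way to keep the relationship one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE contact_info (
        -- Foreign key to students; UNIQUE allows at most one contact row per student.
        student_id INTEGER NOT NULL UNIQUE REFERENCES students(student_id),
        email      TEXT
    );
    INSERT INTO students VALUES (1, 'Sample Student');
    INSERT INTO contact_info VALUES (1, 'student@example.com');
""")

# Related records are returned where the Student ID values match.
related = conn.execute("""
    SELECT s.name, c.email
    FROM students AS s
    JOIN contact_info AS c ON c.student_id = s.student_id
""").fetchall()


def second_contact_row_rejected():
    """A second contact row for the same student would break one-to-one."""
    try:
        conn.execute("INSERT INTO contact_info VALUES (1, 'other@example.com')")
        return False
    except sqlite3.IntegrityError:
        return True
```

Without the UNIQUE declaration the same schema would describe a one-to-many relationship instead.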
For example 3: a person has only one passport and a passport is given to one
person.
2. One-to-many relationship
For example: each customer can have many sales orders. A customer can
3. Many-to-one relationship
instance of an entity is
one relationship.
For example: many students can study in a single college but a student cannot
4. Many-to-many relationship
When more than one instance of an entity is associated with more than
relationships.
purchased by many
customers.
students.
integrity constraints.
Constraints
Constraints are used to limit the type of data that can go into a table.
tables
This ensures the accuracy and reliability of the data in the database.
level constraints are applied only to one column, whereas the table level
Integrity constraints
Example: A blood type must be A, B, AB, or O only; it cannot have any other
value.
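The blood type rule can be expressed as a column-level CHECK constraint. The sketch below uses `sqlite3`; the table name `patient`, the column names, and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column-level constraint: only the four listed values are accepted.
conn.execute("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        blood_type TEXT NOT NULL CHECK (blood_type IN ('A', 'B', 'AB', 'O'))
    )
""")


def try_insert(patient_id, blood_type):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO patient VALUES (?, ?)", (patient_id, blood_type))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, "AB")
rejected = not try_insert(2, "X")
```

The database itself refuses the invalid value, so the rule holds no matter which application inserts the data.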
1. Entity integrity
Each table should have a primary key and each record must be unique
This makes sure that records in a table are not duplicated and remain
identified by their primary key. The unique value requirement prohibits a null
Neither the PK nor any part of it can contain null values. This is because
null values for the primary key mean we cannot identify some rows.
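As a minimal sketch of entity integrity, the following uses Python's built-in sqlite3 module. The table layout and sample values are assumptions made for illustration. NOT NULL is written out explicitly because SQLite, unlike most DBMSs, would otherwise accept a null in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY enforces uniqueness; NOT NULL forbids a missing key value.
conn.execute(
    "CREATE TABLE Students (student_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO Students VALUES ('S001', 'Ana')")

def try_insert(student_id, name):
    """Return True if the row is accepted, False if entity integrity rejects it."""
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

A new key such as 'S002' is accepted, while a duplicate 'S001' or a null key (None in Python) is rejected.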
For example 1:
Example 2:
2. Referential integrity
Focuses on foreign keys.
Null
Is the total absence of a value in a certain field and means that the
Null is not the same as a zero value for a numerical field or space
value
tables. It means the reference from a row in one table to another table must
be valid.
keys. Each foreign key must have a matching primary key so that reference
Example 1
Rule 1: You can’t delete from a primary table if matching records exist in a
related table.
Rule 2: You can’t change a primary key value in the primary table of that records
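The two rules can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the Customers and Orders tables and their rows are hypothetical, and the foreign key uses SQLite's default behavior, which rejects a change that would leave a related record without a matching primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
    );
    INSERT INTO Customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Orders VALUES (100, 1);   -- customer 1 has a related record
""")

def is_blocked(sql):
    """Run a statement; return True if it is rejected to protect referential integrity."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

Deleting customer 1 or changing its key is blocked because an order refers to it; deleting customer 2, which has no related records, is allowed.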
Example 2
primary table.
Example 3:
Key Terms
Alternate key: all candidate keys not chosen as the primary key
Candidate key: a simple or composite key that is unique (no two rows in a table may have the
table
Dependent entities: these entities depend on other tables for their meaning
Derived attributes: attributes that contain values calculated from other attributes
Entity: a thing or object in the real world with an independent existence that can
represented by ER diagrams. These are well suited to data modeling for use with
databases.
Foreign key (FK): an attribute in a table that references the primary key in
Independent entity: as the building blocks of a database, these entities are what
Null: a special symbol, independent of data type, which means either unknown
defined
CHAPTER 5:
The physical design of your database optimizes performance while ensuring data
physical design is a job that truly never ends. You need to continually monitor the
performance and data integrity as time passes. Many factors necessitate periodic
Physical database design does not include implementing files and databases
(i.e., creating them and loading data into them). Physical database design
and others involved in information systems construction will use during the
implementation phase.
Purpose––translate the logical description of data into the
technical specifications for storing and retrieving data.
Goal––create a design for storing data that will provide adequate performance
Because physical design is related to how data are physically stored, we need to
consider a few underlying concepts about physical storage. One goal of physical
file organization, keeping in mind that the database software will communicate
these requirements:
these events)
Physical database design requires several critical decisions that will affect
include the following:
• Choosing the storage format (called data type) for each attribute
from the logical data model. The format and associated parameters are chosen
to maximize data
to group attributes from the logical data model into physical records. You will
discover that
although the columns of a relational table as specified in the logical design are a
natural definition for the contents of a physical record, this does not always form
the foundation for the most desirable grouping of attributes in the physical
design.
disks), using
can
be stored, retrieved, and updated rapidly. Consideration must also be given to
architecture) for storing and connecting files to make retrieving related data more
efficient.
will optimize performance and take advantage of the file organizations and
and the database management systems that handle those queries are tuned to
Data volume and frequency-of-use statistics are important inputs to the physical
size and usage patterns of the database throughout its life cycle.
Estimates of database size are used to select physical storage devices and to
estimate storage costs, and estimates of usage paths or patterns are used to
select file organization and access methods. Plans for the use of indexes, and
Data volume and usage estimation is crucial for the proper administration of
databases. As you all know, we need a storage space to store and maintain our
database. In order to make the proper storage size decision for our database we
severe. Think about an e-tailer (web-based retailer). Let's assume that the e-
tailer's management chose a database storage space using the cost as the sole
criterion. Because the e-tailer wanted to save on initial set-up costs, they
chose the smallest storage space offered by the vendor. After a serious
advertising campaign using web and other media, they started their online
operations. Everything was going fine, until one day they found out that their web
site had crashed due to data overload and a high level of usage. Now the
upset customers, who are waiting for their orders (most probably the
a bill from the vendor in order to fix the issue (the bill of course includes
the additional storage space. Because, right now the company deems it
An easy way to show the statistics about data volumes and usage is by adding
notation to the EER diagram that represents the final set of normalized relations
Figure 5-1 shows the EER diagram (without attributes) for a simple inventory
database for Pine Valley Furniture Company. This EER diagram represents the
normalized relations constructed during logical database design for the original
Both data volume and access frequencies are shown in Figure 5-1. For
example,
there are 3,000 PARTs in this database. The supertype PART has two subtypes,
subtypes, the percentages sum to more than 100 percent). The analysts at Pine
Valley estimate that there are typically 150 SUPPLIERs, and Pine Valley
total of 6,000 SUPPLIES. The dashed arrows represent access frequencies. So,
for example, across all applications that use this database, there are on average
20,000 accesses per hour of PART data, and these yield, based on subtype
this total of 20,000 accesses to PURCHASED PART, 8,000 accesses then also
require SUPPLIES data and of these 8,000 accesses to SUPPLIES, there are
applications, usage maps should show the accesses per second. Several usage
maps may be needed to show vastly different usage patterns for different times
and frequency statistics are generated during the systems analysis phase of the
systems development process when systems analysts are studying current and
proposed data processing and business activities. The data volume statistics
represent the size of the business and should be calculated assuming business
growth over a period of at least several years. The access frequencies are
accesses, and such accesses may change significantly over time, and known
database access can peak and dip over a day, week, or month, the access
frequencies tend to be less certain than the volume statistics.
Fortunately, precise numbers are not necessary. What is crucial is the relative
size of the numbers, which will suggest where the greatest attention needs to be
given during physical database design in order to achieve the best possible
• There are 3,000 PART instances, so if PART has many attributes and
some, like description, are quite long, then the efficient storage of PART might be
important.
• For each of the 4,000 times per hour that SUPPLIES is accessed via
suggest possibly combining these two co-accessed entities into a database table
PURCHASED parts, so it might make sense to have two separate tables for
these entities and redundantly store data for those parts that are both
PART and 6,000 independent access of PURCHASED PART) and only 8,000
due to the significantly different access volumes. It can be helpful for subsequent
physical
DESIGNING FIELDS
such as
to a
simple attribute in the logical data model, and so in the case of a composite
attribute, a
Describe the mechanisms that the DBMS should use to handle missing
As a typical company’s amount of data has grown exponentially, it has become even
more critical to optimize data storage. The size of your data doesn’t just impact
storage size and costs; it also affects query performance. A key factor in determining the size
Selecting a data type involves four objectives that will have different relative
space
data types. DECFLOAT and FLOAT are also options for very large
numbers.
If the data is date and time, use DATE, TIME, and TIMESTAMP data
types.
CODING TECHNIQUES
Some attributes have a sparse set of values or are so large that, given data
volumes, considerable storage space will be consumed. A field with a limited
number of possible values can be translated into a code that requires less space.
Consider the example of the ProductFinish field illustrated in Figure 5-2. Products
at Pine Valley Furniture come in only a limited number of woods: Birch, Maple,
and Oak. By creating a code or translation table, each ProductFinish field value
can be replaced by a code, a cross-reference to the lookup table, similar to a
foreign key. This will decrease the amount of space for the ProductFinish field
and hence for the PRODUCT file. There will be additional space for the
PRODUCT FINISH lookup table, and when the ProductFinish field value is
needed,
an extra access (called a join) to this lookup table will be required. If the
values is very large, the relative advantages of coding may outweigh the costs.
Note that the code table would not appear in the conceptual or logical model. The
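The coding technique can be sketched with Python's built-in sqlite3 module. The one-letter codes, the Product rows, and the column names are assumptions chosen for illustration; only the three finishes (Birch, Maple, Oak) come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup (code) table: one short code per finish value
    CREATE TABLE ProductFinish (code TEXT PRIMARY KEY, description TEXT NOT NULL);
    INSERT INTO ProductFinish VALUES ('A', 'Birch'), ('B', 'Maple'), ('C', 'Oak');

    -- PRODUCT stores only the short code, not the full finish description
    CREATE TABLE Product (
        product_no  INTEGER PRIMARY KEY,
        description TEXT,
        finish_code TEXT REFERENCES ProductFinish(code)
    );
    INSERT INTO Product VALUES (1, 'Chair', 'C'), (2, 'Dresser', 'A'), (3, 'Table', 'B');
""")

def finish_of(product_no):
    """Decode a product's finish through the extra access (join) to the lookup table."""
    row = conn.execute(
        "SELECT f.description FROM Product p "
        "JOIN ProductFinish f ON f.code = p.finish_code "
        "WHERE p.product_no = ?", (product_no,)).fetchone()
    return row[0] if row else None
```

Each Product row carries a one-character code instead of the full finish name, at the price of the join performed inside finish_of.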
Default Value - A default value is the value a field will assume unless a user
enters an explicit value for an instance of that field. Assigning a default value to a
field can reduce data entry time because entry of a value can be skipped. It can
also help to reduce data entry errors for the most common value.
Range control - A range control limits the set of permissible values a field may
values. Range controls must be used with caution because the limits of the range
may change over time. A combination of range controls and coding led to the
year 2000 problem that many organizations faced, in which a field for year was
Null value control - A null value was defined in Chapter 4 as an empty value.
Each primary key must have an integrity control that prohibits a null value. Any
also have a null value control placed on it if that is the policy of the organization.
Referential integrity on a field is a form of range control in which the value of that
field must exist as the value in some field in another row of the same or (most
commonly) a different table. That is, the range of legitimate values comes from
the dynamic contents of a field in a database table, not from some pre-specified
set of values. Note that referential integrity only guarantees that some existing
cross-referencing value is used, not that it is the correct one. A coded field will
have referential integrity with the primary key of the associated lookup table.
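The default value, range, and null value controls described in this section can be declared directly in a table definition. The sketch below uses Python's built-in sqlite3 module; the OrderLine table and the 1 to 1000 quantity range are hypothetical choices, not values from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderLine (
        order_id INTEGER NOT NULL,                     -- null value control
        quantity INTEGER NOT NULL DEFAULT 1            -- default value
                 CHECK (quantity BETWEEN 1 AND 1000)   -- range control
    )
""")

def accepts(sql, params=()):
    """Return True if the statement passes the field-level integrity controls."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Because range limits may change over time, as the text cautions, a limit such as 1000 would need to be revisited as the business changes.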
• Substitute an estimate of the missing value. For example, for a missing sales
value when computing monthly product sales, use a formula involving the mean
of the existing monthly sales values for that product indexed by total sales for
that month across all products. Such estimates must be marked so that users
• Track missing data so that special reports and other system elements cause
execute when some event occurs or time period passes. One trigger could log
the missing entry to a file when a null or other missing value is stored, and
another trigger could run periodically to create a report of the contents of this log
file.
• Perform sensitivity testing so that missing data are ignored unless knowing a
value might significantly change results (e.g., if total monthly sales for a particular
salesperson are almost over a threshold that would make a difference in that
person’s compensation). This is the most complex of the methods mentioned and
hence requires the most sophisticated programming. Such routines for handling
missing data may be written in application programs. All relevant modern DBMSs
FILE ORGANIZATION
File organization refers to the way data is stored in a file. File organization is very
important because it determines the methods of access, efficiency, flexibility and storage
devices to use.
f) Accommodating growth
file are stored in sequence according to a primary key value (see Figure 5-7a).
To locate a particular record, a program must normally scan the file from the
file is the alphabetical list of persons in the white pages of a telephone directory
Each record contains a field that contains the record key. A record key for a
An indexed file can also use alternate indexes, that is, record keys that let you
access the file using a different logical arrangement of the records. For example,
you could access a file through employee department rather than through
employee number.
The possible record transmission (access) modes for indexed files are
Hashed file organization - Hashed file organization computes a hash
function on some fields of each record. The hash function's output determines the
disk block in which the record is placed.
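A minimal sketch of one common hash function, the division-remainder method, follows. It assumes a prime divisor of 997, the value used in this section's worked example for employee 12,396; the function names and the second key (13,393) are hypothetical.

```python
from collections import defaultdict

def block_address(key: int, divisor: int = 997) -> int:
    """Division-remainder hash: the remainder selects the storage location."""
    return key % divisor

def place_records(keys, divisor: int = 997):
    """Group record keys by hashed location; keys sharing a location collide."""
    blocks = defaultdict(list)
    for key in keys:
        blocks[block_address(key, divisor)].append(key)
    return dict(blocks)
```

For key 12396 the function returns 432. A second key such as 13393 hashes to the same location, which is why a hashed file also needs a rule for handling collisions.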
be 997, because it is close to 1,000. Now consider the record for employee
12,396. When we divide this number by 997, the remainder is 432. Thus, this
In this method, two or more tables that are frequently joined are stored in the
same file, called a cluster. The file holds the related rows of these tables in the
same data block, and the key columns that map the tables to each other
are stored only once. This method hence reduces the cost of searching for
various records in different files. All the records are found at one place and hence
in determining how the data are actually stored on the storage media. The
logical relations are structured as database tables. The purpose of this section is
processing of data and quick access to stored data. It first describes the best-
physical table to avoid the need to bring related data back together when they
are retrieved from the database. Then the section will discuss another form of
logical data model and the physical tables, but in this case one relation is
combine attributes from several relations together into one physical record, or
back specific instances of redundant data after the data structure has been
that has never been normalized. Using normalization in SQL, a database will
store different but related types of data in separate logical tables, called relations.
When a query combines data from multiple tables into a single result table, it is
called a join. The performance of such a join in the face of complex queries is
alternative.
Another approach is to denormalize the logical data design. With care this can
does not become inconsistent. This is done by creating rules in the database
separate logical tables and attempt to minimize redundant data. We may strive to
we might have a Courses table and a Teachers table. Each entry in Courses
would store the teacherID for a Course but not the teacherName. When we need
to retrieve a list of all Courses with the Teacher’s name, we would do a join
between these two tables. In some ways, this is great; if a teacher changes his or
her name, we only have to update the name in one place. The drawback is that if
tables are large, we may spend an unnecessarily long time doing joins on tables.
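The Courses and Teachers trade-off can be sketched with Python's built-in sqlite3 module. The teacherID and teacherName columns follow the text; the other columns, the sample rows, and the CoursesDenorm table are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teachers (teacherID INTEGER PRIMARY KEY, teacherName TEXT);
    CREATE TABLE Courses  (courseID INTEGER PRIMARY KEY, title TEXT,
                           teacherID INTEGER REFERENCES Teachers(teacherID));
    INSERT INTO Teachers VALUES (1, 'Cruz'), (2, 'Santos');
    INSERT INTO Courses  VALUES (10, 'Databases', 1), (11, 'Networks', 2);

    -- Denormalized variant: teacherName is copied into every course row
    CREATE TABLE CoursesDenorm (courseID INTEGER PRIMARY KEY, title TEXT,
                                teacherID INTEGER, teacherName TEXT);
    INSERT INTO CoursesDenorm
        SELECT c.courseID, c.title, c.teacherID, t.teacherName
        FROM Courses c JOIN Teachers t ON t.teacherID = c.teacherID;
""")

def courses_normalized():
    """Normalized design: a join is needed to list courses with teacher names."""
    return conn.execute(
        "SELECT c.title, t.teacherName FROM Courses c "
        "JOIN Teachers t ON t.teacherID = c.teacherID "
        "ORDER BY c.courseID").fetchall()

def courses_denormalized():
    """Denormalized design: the same list comes from one table, with no join."""
    return conn.execute(
        "SELECT title, teacherName FROM CoursesDenorm ORDER BY courseID").fetchall()

def rename_teacher(teacher_id, new_name):
    """A rename touches one Teachers row but every redundant copy in CoursesDenorm."""
    conn.execute("UPDATE Teachers SET teacherName = ? WHERE teacherID = ?",
                 (new_name, teacher_id))
    conn.execute("UPDATE CoursesDenorm SET teacherName = ? WHERE teacherID = ?",
                 (new_name, teacher_id))
```

Both query functions return the same rows. The denormalized read avoids the join, but rename_teacher shows the extra update work that redundancy brings.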
we decide that we’re okay with some redundancy and some extra effort to update
It is into this world of normalization with its order and useful arrangement of data
If one went to such great lengths to arrange the data in normal form, why would
one change it? The answer is almost always to improve performance. In
fewer joins, and faster access paths. These are all very valid reasons for
evaluative decision however and should be based on the knowledge that the
normalized model shows no bias to either update or retrieval but gives advantage
Only one valid reason exists for denormalizing a relational design – to enhance
performance. However, there are several indicators which will help to identify
● Many critical queries and reports exist which rely upon data from more
than one table. Often these requests need to be processed in an online
environment.
individually.
● Many large primary keys exist which are clumsy to query and consume a
tables.
option.
Writing queries is much easier. If the table is properly reorganized for the
most common needs, you can extract data from only one table and not
waste time looking for join keys. However, one should remember about
No need to obtain data from dictionary tables where the values are
constant over time. Tables with country dictionaries are good examples. If
countries. In this case, it is worth adding a column with the name of the
Ability to add aggregate data, which can be used for more efficient
sales, etc., are very necessary to analyze various areas of the company’s
the table may significantly increase its size, which may be associated with
Increased costs of updating tables and inserts. In a table where data has
that contains data about customer’s address has been added. Updating
this data can be burdensome and costly if the customer changes the
dictionary table at a much lower cost. It is similar with inserts. Due to the
get to know the table thoroughly and to take into account data duplication.
The query that will extract the necessary data without a risk of data
Partitioning
A reserved part of a storage drive (hard disk, SSD) that is treated as a separate
drive. Even a single drive that takes all the storage space is assigned a partition.
For example, early Windows PCs came with the entire disk partitioned as drive
C:. New Windows PCs often come with the storage drive partitioned into C: and
D:. The main drive is C:, and D: contains a recovery system in the event Windows
has to be re-installed. In addition, users may wish to have several drives for
organizational purposes, and utility programs come with every computer for
adding and modifying partitions.
On Microsoft operating systems, a hard disk is divided into drives. The first
partition holds one drive, called the primary drive and generally labeled "C:", which
is the active partition that boots the OS. Extended partitions, such
as "D:" and "E:", can be added; they can hold more than one drive and are used for other storage such as
A Unix OS such as Linux and some older versions of Mac OS X use multiple
This type of partition scheme allows directories with a file system hierarchy
standard (FHS) or home directory to be assigned their own file systems. A typical
that hold a file system that is attached to “/”, which is located in the root directory
a Linux OS. A Mac OS X system uses one partition for the whole file system. It
uses a swap file method within the file system instead of a swap partition.
The partitioning can be done by either building separate smaller databases (each
Imagine you want to find a piece of information that is within a large database. To
get this information out of the database the computer will look through every row
until it finds it. If the data you are looking for is towards the very end, this query
If the table was ordered alphabetically, searching for a name could happen a lot
faster because we could skip looking for the data in certain rows. If we wanted to
search for “Zack” and we know the data is in alphabetical order we could jump
down to halfway through the data to see if Zack comes before or after that row.
We could then halve the remaining rows and make the same comparison.
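The halving search just described is a binary search. The sketch below is a minimal Python version; the function name and the sample list of names are assumptions for illustration, and the list must already be in alphabetical order.

```python
def find_name(sorted_names, target):
    """Binary search: return (index or None, number of rows compared)."""
    low, high = 0, len(sorted_names) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2          # jump to the middle of the remaining rows
        steps += 1
        if sorted_names[mid] == target:
            return mid, steps
        if sorted_names[mid] < target:
            low = mid + 1                # target sorts after the middle row
        else:
            high = mid - 1               # target sorts before the middle row
    return None, steps

names = ["Abby", "Ben", "Cara", "Dan", "Eve", "Finn", "Zack"]
```

With these seven names, find_name(names, "Zack") compares only three rows, whereas a scan from the top would look at all seven.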
An index is a structure that holds the field the index is sorting and a pointer from
each entry to its corresponding record in the original table where the data is
actually stored. Indexes are used in things like a contact list where the data may
be physically stored in the order you add people’s contact information but it is
Let’s look at the index from the previous example and see how it maps back to
We can see here that the table has the data stored ordered by an incrementing id
based on the order in which the data was added. And the Index has the names
database becomes larger and larger, the more likely you are to see benefits from
indexing.
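The effect of an index can be observed with Python's built-in sqlite3 module. In this sketch the Friends table, its rows, and the index name friends_name_idx are hypothetical; EXPLAIN QUERY PLAN is SQLite's own command for describing how a query will be executed, and its exact wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Rows are stored by an incrementing id, in the order they were added.
conn.execute("CREATE TABLE Friends (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO Friends (name, city) VALUES (?, ?)",
                 [("Zack", "Manila"), ("Matt", "Cebu"), ("Andrew", "Davao")])

def query_plan(sql):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT city FROM Friends WHERE name = 'Zack'"
plan_before = query_plan(lookup)     # no index on name yet: whole-table scan
conn.execute("CREATE INDEX friends_name_idx ON Friends (name)")
plan_after = query_plan(lookup)      # the plan now refers to the index
```

Comparing plan_before with plan_after shows the lookup changing from a scan of every row to a search through the index on name.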
When data is written to the database, the original table (the clustered index) is
updated first and then all of the indexes on that table are updated. Every write
therefore carries the extra cost of keeping each index current. If the database is
constantly receiving writes, this maintenance overhead can outweigh the benefit
to reads. This is why heavy indexing is typical of databases in data
warehouses that get new data loaded on a scheduled basis (off-peak hours),
and is applied more sparingly to production databases that receive new writes all the time.
CHAPTER 6:
Briones, Joshua
Ramos, Eden Marie C.
Reyes, Ana Marie
the vowels, because SEQUEL was a trade mark registered by the Hawker
A TABLE, also called a relation, is a data set organized into rows and
columns
(ISO) in 1987
variables.
The first commercial DBMS that supported SQL was Oracle in 1979. Oracle is
operating systems. IBM’s DB2, Informix, and Microsoft SQL Server are available
They used a language called Sequel, also developed at the San Jose IBM
Research Laboratory.
Purposes of SQL
languages
Advantages of SQL
with it from continued use. An organization can afford to invest in tools to help
IS professionals become more productive. Because they are familiar with the
language in which programs are written, programmers can more quickly maintain
existing programs.
is a standard language.
time; hence there will be little pressure to rewrite old applications. Rather,
or new versions of
used, it is easier to use different vendors for the DBMS, training and educational
services, application software, and consulting assistance; further, the market for
such vendors will be more competitive, which may lower prices and improve
service.
more easily communicate and cooperate in managing data and processing user
programs.
Disadvantages of SQL
Data Table
SELECT – used to get data from tables in a database. It is also one of the most
manipulation, and return a result. In this section, I talk about the phases involved
in logical query processing. I describe the logical order in which the different
Note that by “logical query processing,” I’m referring to the conceptual way in
which standard SQL defines how a query should be processed and the final
describe here seem inefficient. The Microsoft SQL Server engine doesn’t have to
follow logical query processing to the letter; rather, it is free to physically process
would be the same as that dictated by logical query processing. SQL Server can
query.
The FROM clause is the very first query clause that is logically processed. In this
clause, you specify the names of the tables that you want to query and table
In the WHERE clause, you specify a predicate or logical expression to filter the
rows returned by the FROM phase. Only rows for which the logical expression
evaluates to TRUE are returned by the WHERE phase to the subsequent logical
query processing phase. In the sample query in Listing 2-1, the WHERE phase
Referential integrity means that a value in the matching column on the many
side must correspond to a value in the primary key for some row in the table on
Referential integrity is a property of data stating that all its references are valid.
table that is declared a foreign key can only contain either null values or values
from a parent table's primary key or a candidate key.[2] In other words, when a
foreign key value is used it must reference a valid, existing primary key in the
parent table. For instance, deleting a record that contains a value referred to by a
foreign key in another table would break referential integrity. Some relational
normally either by deleting the foreign key rows as well to maintain integrity, or by
returning an error and not performing the delete. Which method is used may be
The adjective 'referential' describes the action that a foreign key performs,
integrity' guarantees that the target 'referred' to will be found. A lack of referential
Starting with this version, the standard name used a colon instead of a hyphen to
be consistent with the names of other ISO standards. This standard was
SQL.
statements, expressions, and so forth. This part is the most important for
language.
Three more parts, also considered part of SQL:1999, were published later.
SQL:1999 introduced many important features that are part of modern SQL.
is a very useful feature that lets you organize long and complex SQL
which includes features that are helpful when preparing business reports.
ORDER BY, the inclusion of data types for large objects (BLOB and CLOB),
The size of the SQL standard grew significantly between 1992 and 1999. The
SQL-92 standard had almost 600 pages, but it was still accessible to regular SQL
users. Books like A Guide to the SQL Standard by Christopher Date and Hugh
Starting with SQL:1999 the standard – now over 2,000 pages – was no longer
accessible to regular SQL users. It has become a resource for database experts
and database vendors. The standard guides the development of SQL in major
databases; it shows which new language features are worth implementing to stay
current. It also standardizes the syntax of new SQL features, making sure that
major databases implement them in a similar way, using similar syntax and
semantics.
The change in the role of the SQL standard is emphasized by the fact that there
is no longer an official body that certifies compliance with the standard. Until
management standards program certified SQL DBMS compliance with the SQL
In the 21st century, the SQL standard has been regularly updated.
significantly increased the expressive power of SQL. They are extremely useful
the standard coincided with the popularity of OLAP and data warehouses. People
only gaining momentum, thanks to the growing amount of data that all
After 2004, there were no major ground-breaking additions to the language. The
changes in the SQL standard reflected the changes in technology at the time.
databases and XML technologies, which were the hot new thing in the early
revision of the complete SQL standard, just Part 14, which deals with SQL-XML
interoperability.
common data exchange format; modern Internet applications use JSON instead
of XML as their data format. The emerging NoSQL movement also popularized
JSON; document databases store JSON files, and key-value stores are
compatible with the JSON format. The SQL standard added JSON support to
allow for interoperability with modern applications and new types of databases.
The current SQL standard is SQL:2019. It added Part 15, which defines
functionality, ten more are related to polymorphic table functions. The additions
expression pattern
delimited string
Now that we have explored some of the possibilities for working with a single
table, it’s time to bring out the light sabers, jet packs, and tools for heavy lifting:
realized when working with multiple tables. When relationships exist among
tables, the tables can be linked together in queries. Remember from the previous
foreign key in one table references the primary key in another, and the values in
both come from a common domain. We can use these columns to establish a link
The linking of related tables varies among different types of relational systems. In
SQL, the WHERE clause of the SELECT command is also used for multiple-table
operations. In fact, SELECT can include references to two, three, or more tables
in the same command. As illustrated next, SQL has two ways to use SELECT for
The most frequently used relational operation, which brings together data from
two or more related tables into one resultant table, is called a join. Originally,
common columns over which tables were joined. Since SQL-92, joins may also
be specified in the FROM clause. In either case, two tables may be joined when
each contains a column that shares a common domain with the other. As
mentioned previously, a primary key from one table and a foreign key that
references the table with the primary key will share a common domain and are
using columns that share a common domain but not the primary-foreign key
relationship, and that also works (e.g., we might join customers and salespersons
based on common postal codes, for which there is no relationship in the data
model for the database). The result of a join operation is a single table. Selected
columns from all the tables are included. Each row returned contains data from
rows in the different input tables where values for the common columns match.
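As a minimal sketch of the two join notations described above, the queries below join customers and salespersons on a shared postal code. The table and column names (customer, salesperson, customer_name, salesperson_name, postal_code) are assumptions made for illustration; they are not part of any schema defined in this text.

```sql
-- Older style: the join condition is written in the WHERE clause.
SELECT c.customer_name, s.salesperson_name, c.postal_code
FROM customer c, salesperson s
WHERE c.postal_code = s.postal_code;

-- SQL-92 style: the join condition is written in the FROM clause.
SELECT c.customer_name, s.salesperson_name, c.postal_code
FROM customer c
INNER JOIN salesperson s
    ON c.postal_code = s.postal_code;
```

Both forms should return the same rows. The second keeps the join condition apart from any row filters that are later added to WHERE.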
In other guides, you have learned how to write basic SQL queries to retrieve data
from a table. In real-life applications, you would need to fetch data from multiple
tables to achieve your goals. To do so, you would need to use SQL joins. In this
guide, you will learn how to query data from multiple tables using joins.
A JOIN clause is used when you need to combine data from two or more tables
into one data set. Records from both tables are matched based on a condition
met, the records are included in the output. According to the article in
using examples. So, before we go any further, let's take a look at the tables that
We are going to use tables from a fictional bank database. The first table
account_id   overdraft_amount   customer_id   type_id   segment
2556889      12000              4             2         RET
1323598795   1550               1             1         RET
2225546      5000               5             2         RET
5516229      6000               4             5         RET
5356222      7500               5             5         RET
2221889      5400               1             2         RET
2455688      12500              50            2         CORP
1322488656   2500               51            1         CORP
1323598795   3100               52            1         CORP
1323111595   1220               53            1         CORP
account table
segment – Contains the values ‘RET’ (for retail clients) and ‘CORP’ (for
corporate clients).
customer_id   name    lastname   gender   marital_status
1             MARC    TESCO      M        Y
2             ANNA    MARTIN     F        N
3             EMMA    JOHNSON    F        Y
4             DARIO   PENTAL     M        N
5             ELENA   SIMSON     F        N
6             TIM     ROBITH     M        N
7             MILA    MORRIS     F        N
8             JENNY   DWARTH     F        Y
customer table
Now that we have these two tables, we can combine them to display additional
questions like:
4. What is the total overdraft amount for all of Emma Johnson’s accounts?
case, customer_id). Once we merge the two tables, we will have account and
else.) Also, keep in mind that some customer IDs are not present in
There are several ways we can combine two tables. Or, put another way, we can
SQL JOIN types include:
type of JOIN.
Let's dive deeper into the first four SQL JOIN types. I will use an example to
explain the logic and the syntax of each type. Sometimes people use Venn
diagrams when explaining SQL JOIN types. I’m not going to use them here, but if
that’s your thing then check out the article HOW TO LEARN SQL JOINS.
INNER JOIN
INNER JOIN is used to display matching records from both tables. This is also
like LEFT, RIGHT, or FULL) and just use JOIN, this is the type of join you’ll get
by default.
There are usually two (or more) tables in a join statement. We call them the left
and right tables. The left table is in the FROM clause – and thus to the left of
record is included in the data set. It can be from either table. If the record does
not match the criteria, it’s not included. The image below shows what would
happen if the color blue was the join criteria for the left and right tables:
SELECT account.*,
       customer.name,
       customer.lastname,
       customer.gender,
       customer.marital_status
FROM account
JOIN customer
    ON account.customer_id = customer.customer_id;
the account and customer tables.
customer.customer_id
Records that share the same customer ID value are matched. (They are
shown in color in the above image.) Records that don’t have a match in
either table (shown in gray) are not included in the result set.
For records that have a match, all attributes from the account table are
displayed in the result set. The name, last name, gender, and marital
others are discarded. In business terms, we displayed all the retail accounts with
detailed information about their owners. Non-retail accounts were not displayed
because they have no matching record in the customer table.
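An inner join is often combined with aggregation. The sketch below assumes the account and customer tables used in this section and sums the overdraft amounts per customer; the alias total_overdraft is introduced here only for illustration.

```sql
-- Total overdraft per customer that has at least one matching account.
SELECT customer.customer_id,
       customer.name,
       customer.lastname,
       SUM(account.overdraft_amount) AS total_overdraft
FROM account
INNER JOIN customer
    ON account.customer_id = customer.customer_id
GROUP BY customer.customer_id, customer.name, customer.lastname;
```

Because this is an inner join, customers without any account and accounts without a customer record do not appear in the result at all.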
LEFT JOIN
Sometimes you’ll need to keep all records from the left table – even if some don't
have a match in the right table. In the last example, the gray rows were not
displayed in the output. Those are corporate accounts. In some cases, you may
want to have them in the data set, even if their customer data is left empty. If we
would like to return unpaired records from the left table, then we should write
a LEFT JOIN. Below, you can see that the LEFT JOIN returns everything in the
of INNER JOIN:
SELECT account.*,
       customer.name,
       customer.lastname,
       customer.gender,
       customer.marital_status
FROM account
LEFT JOIN customer
    ON account.customer_id = customer.customer_id;
The syntax is nearly identical; only the JOIN keyword changes. The result, however, is not the same. Now we can see
Notice how attributes like name, last name, gender, and marital status in the last
four rows are populated with NULLs. This is because these gray rows don’t have
are not present in the customer table). Thus, those attributes have been left
empty (NULL).
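The NULLs produced by a LEFT JOIN can also be used to find the unmatched rows themselves. The following sketch, which assumes the same account and customer tables, lists only the accounts whose customer ID has no match in the customer table.

```sql
-- Accounts whose customer_id has no match in the customer table.
SELECT account.account_id,
       account.customer_id,
       account.segment
FROM account
LEFT JOIN customer
    ON account.customer_id = customer.customer_id
WHERE customer.customer_id IS NULL;
```

With the sample data, this should return the four corporate accounts (customer IDs 50 to 53).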
RIGHT JOIN
Similar to LEFT JOIN, RIGHT JOIN keeps all records from the right table (even if
there is no matching record in the left table). Here’s that familiar image to show
JOIN with RIGHT JOIN:
SELECT account.account_id,
       account.overdraft_amount,
       account.type_id,
       account.segment,
       account.customer_id,
       customer.customer_id,
       customer.name,
       customer.lastname,
       customer.gender,
       customer.marital_status
FROM account
RIGHT JOIN customer
    ON account.customer_id = customer.customer_id;
The syntax is mostly the same. I’ve made one more small change: In addition
result set. I did this to show you what happens to records from
As you can see, all records from the right table have been included in the result
Unmatched customer IDs from the right table (numbers 2, 3, 6, 7, and 8,
shown in gray) have their account attributes set to NULL in this result set.
They are retail customers that don’t have a bank account – and thus no
records in the account table.
You might expect that the resulting table will have eight records because
not the case. We have 11 records because customer IDs 1, 4, and 5 each
displayed.
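A RIGHT JOIN can usually be rewritten as a LEFT JOIN with the table order swapped. The sketch below assumes the account and customer tables used in this section and is meant to return the same rows as a RIGHT JOIN from account to customer.

```sql
-- Same rows as "account RIGHT JOIN customer": every customer is kept,
-- and account columns are NULL for customers without accounts.
SELECT account.account_id,
       account.overdraft_amount,
       customer.customer_id,
       customer.name,
       customer.lastname
FROM customer
LEFT JOIN account
    ON account.customer_id = customer.customer_id;
```

Many people find the LEFT JOIN form easier to read, since the table whose rows are all kept is named first.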
I’ve shown you how to keep all records from the left or right tables. But what if
you want to keep all records from both tables? In our case, you’d want to display
all matching records plus all corporate accounts plus all customers without
accounts. To do this, you can use FULL OUTER JOIN. This JOIN type will pair
all matching rows and will also display all unmatched rows from both
image below:
SELECT account.*,
       CASE WHEN customer.customer_id IS NULL
            THEN account.customer_id
            ELSE customer.customer_id END customer_id,
       customer.name,
       customer.lastname,
       customer.gender,
       customer.marital_status
FROM account
FULL JOIN customer
    ON account.customer_id = customer.customer_id;
Notice how the last five rows have account attributes populated with NULLs. This
also how customers 50, 51, 52, and 53 have first or last names and other
CASE WHEN customer.customer_id IS NULL
THEN account.customer_id
ELSE customer.customer_id END customer_id
the other one). We could also display both columns in the output, but this CASE
used?
Solution:
A Natural Join is a join operation that matches rows on the columns that have
the same name in both of the tables being joined.
To understand the situations in which natural join is used, you need
The main difference between the Natural Join and the Inner Join lies in the number of
Now, if you apply INNER JOIN on these 2 tables, you will see an output as
below:
If you apply NATURAL JOIN, on the above two tables, the output will be as
below:
From the above example, you can clearly see that the Inner Join returns more
columns than the Natural Join. So, if you want an output with fewer columns,
you can use a Natural Join.
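To make the column difference concrete, here is a small sketch that reuses the account and customer tables from the earlier join examples (an assumption, since the two tables used for this question are not shown). Those tables share only one column name, customer_id. Note that NATURAL JOIN is not available in every database system; SQL Server, for example, does not support it.

```sql
-- INNER JOIN: customer_id appears twice in the result, once per table.
SELECT *
FROM account
INNER JOIN customer
    ON account.customer_id = customer.customer_id;

-- NATURAL JOIN: rows are matched on every column name the tables share
-- (here only customer_id), and that column appears once in the result.
SELECT *
FROM account
NATURAL JOIN customer;
```

Because a natural join silently picks up any column names the tables have in common, adding a same-named column to either table later can change the result. An explicit ON clause avoids that risk.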
Solution:
To map many to many relationships using joins, you need to use two JOIN
statements.
and let us assume that each employee is working on a single project. So, one
project cannot be assigned to more than one employee. So, this is basically, a
one-to-many relationship.
technologies, and any technology can be used in multiple projects, then this kind
To use joins for such relationships, you need to structure your database with 2
foreign keys. So, to do that, you have to create the following 3 tables:
Projects
Technologies
projects_to_technologies
in every row. This table maps the items on the projects table to the items on the
technologies.
Once the tables are created, use the following two JOIN statements to link all the
projects_to_technologies to projects
projects_to_technologies to technologies
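The three-table design described above can be sketched as follows. The table names come from the text; the column names (id, name, project_id, technology_id) and data types are assumptions chosen for illustration.

```sql
CREATE TABLE projects (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

CREATE TABLE technologies (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

-- Junction table: one row per (project, technology) pair, with two foreign keys.
CREATE TABLE projects_to_technologies (
    project_id    INT NOT NULL,
    technology_id INT NOT NULL,
    PRIMARY KEY (project_id, technology_id),
    FOREIGN KEY (project_id) REFERENCES projects (id),
    FOREIGN KEY (technology_id) REFERENCES technologies (id)
);

-- Two JOINs: junction table to projects, and junction table to technologies.
SELECT p.name AS project, t.name AS technology
FROM projects_to_technologies pt
INNER JOIN projects p
    ON pt.project_id = p.id
INNER JOIN technologies t
    ON pt.technology_id = t.id;
```

Each result row pairs one project with one technology it uses, so a project that uses three technologies appears three times.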
Solution:
Hash joins are a type of join used to join large tables, or in cases
where the user wants most of the rows of the joined tables.
The Hash Join algorithm is a two-step algorithm. Refer below for the steps:
Probe phase: Go through the right side input, each row at a time to find
Solution:
Self Join
SELF JOIN in other words is a join of a table to itself. This implies that each row
Cross Join
The CROSS JOIN is a type of join that pairs each row of one table with
every row of the other table. When a WHERE condition is used,
this type of JOIN behaves as an INNER JOIN, and when the WHERE condition is
not used, it returns the Cartesian product of the two tables.
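A short sketch of both join types, assuming the account (ten rows) and customer (eight rows) tables from the earlier join examples; the alias other_account_id is introduced here only for illustration.

```sql
-- Cross join: every account paired with every customer (10 x 8 = 80 rows).
SELECT account.account_id, customer.customer_id
FROM account
CROSS JOIN customer;

-- With a WHERE condition, only matching pairs remain, as in an INNER JOIN.
SELECT account.account_id, customer.customer_id
FROM account
CROSS JOIN customer
WHERE account.customer_id = customer.customer_id;

-- Self join: the same table under two aliases, pairing accounts that
-- belong to the same customer. The "<" comparison avoids listing a pair twice.
SELECT a.account_id,
       b.account_id AS other_account_id,
       a.customer_id
FROM account a
INNER JOIN account b
    ON a.customer_id = b.customer_id
   AND a.account_id < b.account_id;
```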
Solution:
statements. You can refer to the second question for an understanding of how to
o - A SELECT clause
o - A FROM clause
o - A WHERE clause
SELECT statement.
You can use the comparison operators, such as >, <, or =. The
ANY, or ALL.
select.
The inner query executes first before its parent query so that the results of
Syntax :
The subquery (inner query) executes once before the main query (outer
query) executes.
In this section, you will learn the requirements of using subqueries. We have the
following two tables 'student' and 'marks' with common field 'StudentID'.
Student Marks
Now we want to write a query to identify all students who get better marks than
that of the student whose StudentID is 'V002', but we do not know the marks of
'V002'.
- To solve the problem, we require two queries. One query returns the marks
(stored in Total_marks field) of 'V002' and a second query identifies the students
who get better marks than the result of the first query.
First query:
SELECT *
FROM `marks`
WHERE StudentID = 'V002';
Query result:
- Using the result of this query, here we have written another query to identify the
students who get better marks than 80. Here is the query :
Second query:
Query result:
The above two queries identify the students who get better marks than the student
You can combine the above two queries by placing one query inside the other.
The subquery (also called the 'inner query') is the query inside the parentheses.
SQL Code:
SELECT *
FROM marks
WHERE total_marks >
      (SELECT total_marks FROM marks WHERE StudentID = 'V002');
Query result:
Syntax:
[WHERE search_conditions]
[HAVING search_conditions])
Subqueries: Guidelines
clause in the main SELECT statement (outer query) which will be the last
clause.
If a subquery (inner query) returns a null value to the outer query, the
outer query will not return any rows when using certain comparison
Type of Subqueries
Sub-queries are queries within another query. The result of the inner sub-query
is fed to the outer query, which uses that to produce its outcome. If that outer
query is itself the inner query to a further query, then the query will continue until
There are two types of sub-queries in SQL, however: correlated sub-queries and
Uncorrelated Sub-query
depend upon the outer query for its execution. It can complete its execution as a
example.
Suppose you have a database “schooldb” which has two tables: student and
department. A department will have many students. This means that the student
table has a column “dep_id” which contains the id of the department to which that
student belongs. Now, suppose we want to retrieve records of all students from
The sub-query used in this case will be uncorrelated sub-query since the inner
query will retrieve the id of the computer department from the department table;
the result of this inner query will be directly fed into the outer query which
retrieves records of students from the student table where “dep_id” column’s
The inner query which retrieves the id of the department using name can be
Correlated Sub-query
A correlated sub-query is a type of query, where inner query depends upon the
above. We want to retrieve the name, age and gender of all the students whose
age is greater than the average age of students within their department.
In this case, the outer query will retrieve records of all the students iteratively and
each record is passed to the inner query. For each record, the inner query will
retrieve average age of the department for the student record passed by the
outer query. If the age of the student is greater than average age, the record of
the student will be included in the result; otherwise it is left out. Let’s see this in action.
Let’s create a database named “schooldb”. Run the following SQL in your query
window:
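The statement itself is missing at this point. A minimal version, using the standard CREATE DATABASE syntax and the database name given in the text, would presumably be:

```sql
CREATE DATABASE schooldb;
```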
The above command will create a database named “schooldb” on your database
server.
Next, we need to create a “department” table within the “schooldb” database. The
department table shall have three columns: id, name and capacity. To create
2 (
6 )
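The table definition did not survive intact here. The sketch below is one possible definition with the three columns named in the text; the data types and constraints are assumptions, modelled on the student table defined later in this section.

```sql
USE schooldb;

-- Column names come from the text; types and constraints are assumed.
CREATE TABLE department
(
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    capacity INT NOT NULL
);
```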
Next, let's add some dummy data to the table so that we can execute our sub-queries.
USE schooldb;
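The sample rows are not preserved in this text. The rows below are hypothetical stand-ins: only the 'Computer' department with id 2 is implied by the later examples, and ids 1 to 5 are needed because the student rows reference them. The other names and all capacity values are invented for illustration.

```sql
-- Hypothetical sample data; only (2, 'Computer') is implied by the text.
INSERT INTO department
VALUES (1, 'English', 100),
       (2, 'Computer', 80),
       (3, 'Civil', 110),
       (4, 'Maths', 70),
       (5, 'History', 50);
```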
Next we need to create a “student” table within our database. The student table
will have five columns: id, name, age, gender, and dep_id.
The dep_id column will act as the foreign key column and will have values from
the id column of the department table. This will create a one-to-many relationship
between the department and student tables. Execute the following query to create
the student table.
USE schooldb;

CREATE TABLE student
(
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    gender VARCHAR(50) NOT NULL,
    age INT NOT NULL,
    dep_id INT NOT NULL
);
USE schooldb;

INSERT INTO student
VALUES (1, 'Jolly', 'Female', 20, 4),
       (2, 'Jon', 'Male', 22, 3),
       (3, 'Sara', 'Female', 25, 4),
       (4, 'Laura', 'Female', 18, 2),
       (5, 'Alan', 'Male', 20, 3),
       (6, 'Kate', 'Female', 22, 2),
       (7, 'Joseph', 'Male', 18, 2),
       (8, 'Mice', 'Male', 23, 1),
       (9, 'Wise', 'Male', 21, 5),
       (10, 'Elis', 'Female', 27, 2);
Notice that the values in the “dep_id” column of the student table exist in the id column
USE schooldb;

SELECT * FROM student
WHERE dep_id =
    (
        SELECT id FROM department WHERE name = 'Computer'
    );
The output of the above SQL will be:
id   name     gender   age   dep_id
4    Laura    Female   18    2
6    Kate     Female   22    2
7    Joseph   Male     18    2
10   Elis     Female   27    2
You can see that there are two queries. The inner query retrieves id of the
“Computer” department while the outer query retrieves student records with that
We know that in the case of uncorrelated sub-queries the inner query can be
executed as a standalone query and it will still work. Let’s check if this is true in this
case.
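The standalone query is not preserved here. Presumably it is simply the inner query from the example above, run on its own:

```sql
USE schooldb;

-- The inner query from the uncorrelated example, executed by itself.
SELECT id FROM department WHERE name = 'Computer';
```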
The above query will execute successfully and will return 2, i.e. the id of the
Computer department.
We know that in case of correlated sub-queries, the inner query depends upon
Let's execute a correlated sub-query that retrieves the records of all the students with
age greater than average age within their department as discussed above.
USE schooldb;

SELECT name, gender, age
FROM student greater
WHERE age >
    (SELECT AVG(age)
     FROM student average
     WHERE greater.dep_id = average.dep_id);
The output of the above query will be:
name   gender   age
Kate   Female   22
Elis   Female   27
Jon    Male     22
Sara   Female   25
We know that in the case of a correlated sub-query, the inner query cannot be
executed as a standalone query. You can verify this by executing the inner query
on its own.
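A sketch of that check, assuming the student table and the aliases used in the correlated example: the inner query is run by itself, so the outer alias it refers to no longer exists.

```sql
USE schooldb;

-- Expected to fail: the alias "greater" is defined only in the outer query,
-- so the database cannot resolve greater.dep_id here.
SELECT AVG(age)
FROM student average
WHERE greater.dep_id = average.dep_id;
```

The exact error message depends on the database system, but it should report that greater.dep_id is an unknown column or identifier.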
1. The outer query executes before the inner query in the case of a
Any information which you retrieve from the database using subquery can be
retrieved by using different types of joins also. SQL is flexible and it provides
different ways of doing the same thing. Some people find SQL Joins confusing
1. Almost whatever you want to do with subquery can also be done using join, it
2. Subquery normally returns a scalar value as a result or result from one column
3. You can use subqueries in four places: subquery as a column in select clause,
4. In the case of correlated subquery outer query gets processed before the inner
query.
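As an illustration of the first point, the uncorrelated sub-query that finds students of the Computer department can be rewritten as a join. This is a sketch that assumes the student and department tables from the schooldb examples.

```sql
USE schooldb;

-- Join version of: SELECT * FROM student
--                  WHERE dep_id = (SELECT id FROM department WHERE name = 'Computer');
SELECT student.*
FROM student
INNER JOIN department
    ON student.dep_id = department.id
WHERE department.name = 'Computer';
```

The two forms return the same students as long as department names are unique. If two departments were both named 'Computer', the sub-query form with "=" would raise an error, while the join would return the students of both.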
SQL query-related problems. They are not just important from the SQL interview
point of view but also from the Data Analysis point of view.
transactions are either completed or rolled back if errors or problems occur. The
Commonly used SQL statements include select, add, insert, update, delete,
The first thing to understand about SQL is that SQL isn’t a procedural language,
specific operation after another until the task is complete. The procedure may be
a straightforward linear sequence or may loop back on itself, but in either case,
SQL, on the other hand, is nonprocedural. To solve a problem using SQL, simply
tell SQL what you want (as if you were talking to Aladdin’s genie) instead of
telling the system how to get you what you want. The database management
system (DBMS) decides the best way to get you what you request.
All right. You were just told that SQL is not a procedural language — and that’s
manner. So, in recent years, there has been a lot of pressure to add some
procedures. With these facilities added, you can store programs at the server,
To illustrate what is meant by “tell the system what you want,” suppose you have
an EMPLOYEE table from which you want to retrieve the rows that correspond to
all your senior people. You want to define a senior person as anyone older than
age 40 or anyone earning more than $100,000 per year. You can make the
This statement retrieves all rows from the EMPLOYEE table where either the
value in the Age column is greater than 40 or the value in the Salary column is
greater than 100,000. In SQL, you don’t have to specify how the information is
retrieved. The database engine examines the database and decides for itself
how to fulfill your request. You need only specify what data you want to retrieve.
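The statement described above is not preserved in this text. Based on the description, it would look roughly like the following, assuming the columns are named Age and Salary as the text suggests:

```sql
-- Retrieve every employee who is older than 40 or earns more than 100,000.
SELECT *
FROM EMPLOYEE
WHERE Age > 40 OR Salary > 100000;
```

Nothing in the statement says how to scan the table or which index to use; it only states the condition that the returned rows must satisfy.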
databases, it has to be modified for the Hadoop 1 model, which uses the Hadoop
Distributed File System and Map-Reduce or the Hadoop 2 model, which can
The different means for executing SQL in Hadoop environments can be divided
into (1) connectors that translate SQL into a MapReduce format; (2) "push down"
systems that forgo batch-oriented MapReduce and execute SQL within Hadoop
clusters; and (3) systems that apportion SQL work between MapReduce-HDFS
One of the earliest efforts to combine SQL and Hadoop resulted in the Hive data
warehouse, which featured HiveQL software for translating SQL-like queries into
MapReduce jobs. Other tools that help support SQL-on-Hadoop include BigSQL,
Drill, Hadapt, Hawq, H-SQL, Impala, JethroData, Polybase, Presto, Shark (Hive
theory. In the early 1970s, as IBM researchers developed early relational DBMS
release their query language as a product, they found that another company had
geniuses at IBM decided to give the released product a name that was different
from SEQUEL but still recognizable as a member of the same family. So they
pre-release days and continued to do so. That practice has persisted to the
present day; some people will say “Sequel” and others will say “S-Q-L,” but they
PL/SQL
90's to enhance the capabilities of SQL. PL/SQL is one of three key programming
languages embedded in the Oracle Database, along with SQL itself and Java.
This tutorial will give you a good understanding of PL/SQL to proceed with Oracle
late 1980s as procedural extension language for SQL and the Oracle relational
language.
environment.
PL/SQL can also directly be called from the command-line SQL*Plus interface.
Direct call can also be made from external programming language calls to
database.
language.
IBM DB2.
Features of PL/SQL
Advantages of PL/SQL
-SQL is the standard database language and PL/SQL is strongly integrated with
SQL. PL/SQL supports both static and dynamic SQL. Static SQL supports DML
operations and transaction control from PL/SQL block. -In Dynamic SQL, SQL
time. This reduces network traffic and provides high performance for the
applications.
-PL/SQL provides support for developing Web Applications and Server Pages.
In this chapter, we will discuss the Environment Setup of PL/SQL. PL/SQL is not
environment. SQL* Plus is an interactive tool that allows you to type SQL and
PL/SQL statements at the command prompt. These commands are then sent to
the database for processing. Once the statements are processed, the results are
To run PL/SQL programs, you should have the Oracle RDBMS Server installed
on your machine. This will take care of the execution of the SQL commands. This
tutorial uses Oracle RDBMS 11g (newer releases have since been published). You can download a trial version
You will have to download either the 32-bit or the 64-bit version of the installation
as per your operating system. Usually there are two files. We have downloaded
the 64-bit version. You will also use similar steps on your operating system, does
win64_11gR2_database_1of2.zip
win64_11gR2_database_2of2.zip
After downloading the above two files, you will need to unzip them in a single
directory database and under that you will find the following sub-directories −
Step 1
Let us now launch the Oracle Database Installer using the setup file. Following is
the first screen. You can provide your email ID and check the checkbox as
Oracle Install 1
Step 2
You will be directed to the following screen; uncheck the checkbox and click the
Step 3
Just select the first option Create and Configure Database using the radio button
Oracle Install 2
Step 4
We assume you are installing Oracle for the basic purpose of learning and that
you are installing it on your PC or Laptop. Thus, select the Desktop Class option
Oracle Install 3
Step 5
Provide a location, where you will install the Oracle Server. Just modify the
Oracle Base and the other locations will set automatically. You will also have to
provide a password; this will be used by the system DBA. Once you provide the
Oracle Install 4
Step 6
Oracle Install 5
Step 7
Click the Finish button to proceed; this will start the actual server installation.
Oracle Install 6
Step 8
This will take a few moments, until Oracle starts performing the required
configuration.
Oracle Install 7
Step 9
Here, Oracle installation will copy the required configuration files. This should
take a moment −
Oracle Configuration
Step 10
Once the database files are copied, you will have the following dialogue box. Just
Oracle Configuration
Step 11
Oracle Install 8
Final Step
It is now time to verify your installation. At the command prompt, use the
You should have the SQL prompt where you will write your PL/SQL commands
and scripts −
Text Editor
Running large programs from the command prompt may cause you to inadvertently
lose some of your work. It is always recommended to use command files. To
Type your code in a text editor, such as Notepad, Notepad++, or EditPlus.
Save the file with the .sql extension in the home directory.
Launch the SQL*Plus command prompt from the directory where you created
If you are not using a file to execute the PL/SQL scripts, then simply copy your
PL/SQL code and right-click on the black window that displays the SQL prompt;
use the paste option to paste the complete code at the command prompt. Finally,
In this chapter, we will discuss the Data Types in PL/SQL. The PL/SQL variables,
constants and parameters must have a valid data type, which specifies a storage
format, constraints, and a valid range of values. We will focus on the SCALAR
and the LOB data types in this chapter. The other two data types will be covered
in other chapters.
1 Scalar
BOOLEAN.
Pointers to large objects that are stored separately from other data items, such
3 Composite
Data items that have internal components that can be accessed individually. For
4 Reference
PL/SQL Scalar Data Types and Subtypes come under the following categories −
1 Numeric
2 Character
3 Boolean
4 Datetime
PL/SQL provides subtypes of data types. For example, the data type NUMBER
has a subtype called INTEGER. You can use the subtypes in your PL/SQL
program to make the data types compatible with data types in other programs
while embedding the PL/SQL code in another program, such as a Java program.
Following table lists out the PL/SQL pre-defined numeric data types and their
sub-types −
1 PLS_INTEGER
32 bits
2 BINARY_INTEGER
32 bits
3 BINARY_FLOAT
4 BINARY_DOUBLE
5 NUMBER(prec, scale)
6 DEC(prec, scale)
7 DECIMAL(prec, scale)
8 NUMERIC(prec, scale)
9 DOUBLE PRECISION
ANSI specific floating-point type with maximum precision of 126 binary digits
10 FLOAT
ANSI and IBM specific floating-point type with maximum precision of 126 binary
11 INT
12 INTEGER
ANSI and IBM specific integer type with maximum precision of 38 decimal digits
13 SMALLINT
ANSI and IBM specific integer type with maximum precision of 38 decimal digits
14 REAL
decimal digits)
DECLARE
num1 INTEGER;
num2 REAL;
BEGIN
null;
END;
When the above code is compiled and executed, it produces the following result
Following is the detail of PL/SQL pre-defined character data types and their sub-
types −
1 CHAR
2 VARCHAR2
3 RAW
Variable-length binary or byte string with maximum size of 32,767 bytes, not
interpreted by PL/SQL
4 NCHAR
5 NVARCHAR2
6 LONG
7 LONG RAW
Variable-length binary or byte string with maximum size of 32,760 bytes, not
interpreted by PL/SQL
8 ROWID
9 UROWID
The BOOLEAN data type stores logical values that are used in logical
operations. The logical values are the Boolean values TRUE and FALSE and the
value NULL.
SQL statements
The DATE datatype is used to store fixed-length datetimes, which include the
time of day in seconds since midnight. Valid dates range from January 1, 4712
includes a two-digit number for the day of the month, an abbreviation of the
month name, and the last two digits of the year. For example, 01-OCT-12.
Each DATE includes the century, year, month, day, hour, minute, and second.
The following table shows the valid values for each field −
MONTH 01 to 12 0 to 11
DAY 01 to 31 (limited by the values of MONTH and YEAR, according to the rules
HOUR 00 to 23 0 to 23
MINUTE 00 to 59 0 to 59
Large Object (LOB) data types refer to large data items such as text, graphic
images, video clips, and sound waveforms. LOB data types allow efficient,
random, piecewise access to this data. Following are the predefined PL/SQL
BFILE Used to store large binary objects in operating system files outside the
BLOB Used to store large binary objects in the database. 8 to 128 terabytes (TB)
CLOB Used to store large blocks of character data in the database. 8 to 128 TB
NCLOB Used to store large blocks of NCHAR data in the database. 8 to 128 TB
A subtype is a subset of another data type, which is called its base type. A
subtype has the same valid operations as its base type, but only a subset of its
valid values.
You can define and use your own subtypes. The following program illustrates
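DECLARE
   SUBTYPE name IS CHAR(20);          -- assumed base type; the original definition is missing
   SUBTYPE message IS VARCHAR2(100);  -- assumed base type; the original definition is missing
   salutation name;
   greetings message;
BEGIN
   NULL;  -- placeholder: the original statements are missing
END;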
When the above code is executed at the SQL prompt, it produces the following
result −
NULLs in PL/SQL
PL/SQL NULL values represent missing or unknown data and they are not an
integer, a character, or any other specific data type. Note that NULL is not the
same as an empty data string or the null character value '\0'. A null can be
assigned, but it cannot be equated with anything, including itself.
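The behaviour of NULL in comparisons can be checked with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for an Oracle session (an assumption made only so the example runs anywhere); the point it illustrates, that comparing NULL with NULL yields NULL rather than TRUE, is standard SQL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with anything, even itself, yields NULL (unknown), not TRUE.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the reliable way to test for a missing value.
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(null_equals_null)  # None (Python's representation of SQL NULL)
print(null_is_null)      # 1 (true)
```

This is why conditions on possibly missing values are written with IS NULL or IS NOT NULL instead of the = operator.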
PHP
web developers to create dynamic content that interacts with databases. PHP is
basically used for developing web based software applications. This tutorial helps
PHP started out as a small open source project that evolved as more and more
people found out how useful it was. Rasmus Lerdorf unleashed the first version
Domain. I will list down some of the key advantages of learning PHP:
commerce sites.
module on the Unix side. The MySQL server, once started, executes even very
PHP supports a large number of major protocols such as POP3, IMAP, and
LDAP. PHP4 added support for Java and distributed object architectures (COM
and CORBA), making n-tier development a possibility for the first time.
Characteristics of PHP
Simplicity
Efficiency
Security
Flexibility
Familiarity
of code which takes one more input in the form of parameter and does some
You already have seen many functions like fopen() and fread() etc. They are
built-in functions but PHP gives you the option to create your own functions as
well.
In fact, you hardly need to create your own PHP functions because there are
already more than 1,000 built-in library functions created for different areas and
Please refer to PHP Function Reference for a complete set of useful functions.
It's very easy to create your own PHP function. Suppose you want to create a
PHP function that simply writes a message to your browser when you call it.
The following example creates a function called writeMessage() and then calls it.
PHP gives you the option to pass parameters to a function. You can pass as
many parameters as you like. These parameters work like variables inside your
function. The following example takes two integer parameters and adds them
together.
reference to the variable is manipulated by the function rather than a copy of the
variable's value.
Any changes made to an argument in these cases will change the value of the
ampersand to the variable name in either the function call or the function
definition.
procedures
DATABASE TRIGGERS
Because a trigger resides in the database and anyone who has the required
privilege can use it, a trigger lets you write a set of SQL statements that multiple applications
can use. It lets you avoid redundant code when multiple programs need to
You can use triggers to perform the following actions, as well as others that are
Create an audit trail of activity in the database. For example, you can track
table.
Implement a business rule. For example, you can determine when an order
Derive additional data that is not available within a table or within the database.
For example, when an update occurs to the quantity column of the items table,
Enforce referential integrity. When you delete a customer, for example, you can
use a trigger to delete corresponding rows that have the same customer number
Benefits of Triggers
Auditing
INSTEAD OF Trigger: A special type. You will learn more about the further
STATEMENT level Trigger: It fires one time for the specified event statement.
ROW level Trigger: It fires once for each record affected by the specified event.
(INSERT/UPDATE/DELETE)
(LOGON/LOGOFF/STARTUP/SHUTDOWN)
Syntax Explanation:
The above syntax shows the different optional statements that are present in
trigger creation.
ON clause will specify on which object the above-mentioned event is valid. For
example, this will be the table name on which the DML event may occur in the
Command “FOR EACH ROW” will specify the ROW level trigger.
WHEN clause will specify the additional condition in which the trigger needs to
fire.
The declaration, execution, and exception handling parts are the same as those
of other PL/SQL blocks. The declaration and exception handling parts are
optional.
In a row-level trigger, the trigger fires for each affected row, and it is
sometimes necessary to know a column's value before and after the DML statement.
Oracle provides two clauses in the row-level trigger to hold these values. We
can use them to refer to the old and new values inside the trigger body.
:NEW – It holds a new value for the columns of the base table/view during the
trigger execution
:OLD – It holds old value of the columns of the base table/view during the trigger
execution
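A row-level trigger that reads both the old and the new value can be sketched as follows. The sketch runs on SQLite through Python's built-in sqlite3 module, not on Oracle, so the syntax differs slightly: SQLite writes NEW.salary and OLD.salary without the leading colon. The emp and emp_audit tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, salary INTEGER);
CREATE TABLE emp_audit (emp_id INTEGER, old_salary INTEGER, new_salary INTEGER);

-- Row-level trigger: fires once per updated row and records both values.
CREATE TRIGGER trg_salary_audit
AFTER UPDATE OF salary ON emp
FOR EACH ROW
WHEN NEW.salary <> OLD.salary
BEGIN
    INSERT INTO emp_audit VALUES (OLD.emp_id, OLD.salary, NEW.salary);
END;

INSERT INTO emp VALUES (1, 1000), (2, 2000);

-- One UPDATE statement touches two rows, so the trigger fires twice.
UPDATE emp SET salary = salary + 500;
""")

audit_rows = conn.execute(
    "SELECT emp_id, old_salary, new_salary FROM emp_audit ORDER BY emp_id"
).fetchall()
print(audit_rows)  # [(1, 1000, 1500), (2, 2000, 2500)]
```

The WHEN clause plays the role of the additional firing condition, and the audit table doubles as a simple audit trail.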
This clause should be used based on the DML event. Below table will specify
INSTEAD OF Trigger
triggers. It is used when any DML event is going to occur on the complex view.
Consider an example in which a view is made from 3 base tables. When any
DML event is issued over this view, that will become invalid because the data is
Example 1: In this example, we are going to create a complex view from two
base tables.
Then we are going to see how the INSTEAD OF trigger is used to issue an UPDATE
of the location detail on this complex view. We are also going to see how
:NEW and :OLD are useful in triggers.
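A minimal sketch of the idea, run on SQLite through Python's built-in sqlite3 module instead of Oracle (so NEW and OLD appear without the colon). The emp and dept tables, the emp_view view, the sample rows, and the trigger name are all assumptions made for illustration; the INSTEAD OF trigger redirects an UPDATE on the view to the base table that actually stores the location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, emp_name TEXT, dept_no INTEGER);
CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, dept_name TEXT, location TEXT);
INSERT INTO dept VALUES (10, 'HR', 'USA'), (20, 'SALES', 'UK');
INSERT INTO emp VALUES (1000, 'XXX5', 10), (1001, 'YYY5', 20);

-- Complex view built from two base tables.
CREATE VIEW emp_view AS
SELECT e.emp_name, d.dept_name, d.location
FROM emp e JOIN dept d ON e.dept_no = d.dept_no;

-- Instead of updating the view itself, update the table that owns the column.
CREATE TRIGGER trg_emp_view_update
INSTEAD OF UPDATE ON emp_view
FOR EACH ROW
BEGIN
    UPDATE dept SET location = NEW.location WHERE dept_name = OLD.dept_name;
END;
""")

# The UPDATE targets the view; the trigger body runs in its place.
conn.execute("UPDATE emp_view SET location = 'FRANCE' WHERE emp_name = 'XXX5'")

locations = conn.execute(
    "SELECT dept_name, location FROM dept ORDER BY dept_no"
).fetchall()
print(locations)  # [('HR', 'FRANCE'), ('SALES', 'UK')]
```

Without the trigger, SQLite rejects the UPDATE because a view cannot be modified directly, which mirrors the problem the INSTEAD OF trigger is meant to solve.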
the server, called and executed by database applications. It is very simple and
stored procedure is compiled and stored in the database catalog. It runs faster
Stored procedure reduces the traffic between application and database server
application has only to send the stored procedure name and get the result back.
use it. Stored procedure exposes the database interface to all applications so
developer doesn’t have to program the functions which are already supported in
Stored procedures place a high load on the database server in both memory and
processors. Instead of being focused on the storing and retrieving data, you
Stored procedures are difficult to debug in many RDBMSs, including MySQL.
There are some workarounds for this problem, but they are still not good
enough.
Writing and maintaining stored procedures usually requires a specialized skill
set that not all developers possess. This introduces problems in both application
CHAPTER 7:
DATABASE APPLICATION
DEVELOPMENT
Client-server System
logical parts: a server, which provides response for services, and a client, which
(2003)
component is responsible for formatting and presenting data on the user’s screen
or other output device and for managing user input from a keyboard or other
input device. Presentation logic often resides on the client and is the mechanism
Processing services
This handles data processing logic, business rules logic, and data
management logic. Processing logic resides on both the client and servers.
Business rules logic: have not been coded at the DBMS level
Storage services
The component responsible for data storage and retrieval from the
physical storage devices associated with the application. Storage logic usually
resides on the database server, close to the physical location of the data.
application.
In the fat client, the application processing occurs entirely on the client,
whereas in the thin client, this processing occurs primarily on the server. In the
distributed example or hybrid client, application processing is partitioned
between the client and the server.
designed to
communicate with a
are produced by
servers such as a
implements its own features. It may connect to servers but it remains mostly
It is a client-server application
It was built in
support up to 100 users.
The client sends a request to the server where it then processes the
architecture:
with the server without the presence of any intermediate application.
APIs are Open Database Connectivity (ODBC) and ADO.NET for the Microsoft
application. Reinstall
database changes, thus increases deployment cost.
Database connection:
a separate database connection.
of the network.
Bhuvana. (2006, August 24)
THREE-TIER ARCHITECTURE
hundreds of users.
application as a whole.
Applications
E-commerce Websites
VB.NET
The VB.NET code shown in figure 1 below uses the ADO.NET data
The .NET Framework has different data providers (or database drivers)
Figure 1-a shows the VB.NET code needed to create a simple form
that allows the user to input a name, department number, and student ID.
Figure 1-a - receiving user input.
boxes in the figure, you can see how the generic steps for accessing a
of a VB.NET program.
Figure 1-b: Connecting to a database and issuing an INSERT query.
Figure 1-c shows how you would access the database and process the results
for a SELECT query. The main
DELETE queries. The table that results from running a SELECT query are
the result by traversing the object, one row at a time. Each column in the
position in the query result (or by name). ADO.NET provides two main
choices with respect to handling the result of the query: DataReader (e.g.,
between the two options is that the first limits us to looping through the
result of a query one row at a time. This can be very cumbersome if the
in this chapter, we will see how .NET data controls (which use DataSet
program.
JAVA
students in the
Student table.
In this example, the Java program is using the JDBC API and an
Oracle thin driver to access the Oracle database. Notice that unlike the
mechanisms for this: the ResultSet and RowSet objects. The difference
points to its current row of data. When the ResultSet object is first
initialized, the cursor is positioned before the first row. This is why we
need to first call the next() method before retrieving data. The ResultSet
object is used to loop through and process each row of data and retrieve
the column values that we want to access. In this case, we access the
value in the name column using the rec.getString method, which is a part
of the JDBC API. For each of the common database types, there is a
corresponding get and set method that allows for retrieval and storage of
data in the database. It is important to note that while the ResultSet object
the table, the entire table (i.e., the result of the query) may or may not
actually be in memory on the client machine. How and when data are
as well.
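The cursor behaviour described for the JDBC ResultSet appears in most database APIs. The sketch below is not the Java program from the figure; it uses Python's built-in sqlite3 module as a stand-in, with a made-up student table, to show the same pattern: the cursor starts before the first row, each fetch advances it by one row, and column values are read by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cara")],  # illustrative rows
)

cursor = conn.execute("SELECT name FROM student ORDER BY student_id")

names = []
row = cursor.fetchone()       # move to the first row, like the first next() call
while row is not None:        # None signals that the rows are exhausted
    names.append(row[0])      # column accessed by its position in the result
    row = cursor.fetchone()   # advance the cursor by one row

print(names)  # ['Ana', 'Ben', 'Cara']
```

As in JDBC, rows are pulled one at a time, so the program never has to hold the whole query result in its own variables.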
Database Server
This server hosts the storage logic for the application and hosts the
DBMS. You have read about many of them, including Oracle, Microsoft SQL
Server, Informix, Sybase, DB2, Microsoft Access, and MySQL. The DBMS may
reside either on a separate machine or on the same machine as the Web server.
It can be configured to provide data access for authorized users only. This
type of server keeps the data in a central location that can be regularly backed
up. It also allows users and applications to centrally access the data across the
one server or a group of servers that are specifically configured to protect data
database files.
The hardware part of a database server is the server system used for
Web Server
The Web server provides the basic functionality needed to receive and
respond to requests from browser clients. These requests use HTTP or HTTPS
as a protocol.
The main job of a web server is to display website content through storing,
processing and delivering webpages to users. Besides HTTP, web servers also
support SMTP (Simple Mail Transfer Protocol) and FTP (File Transfer Protocol),
Application Server
This software provides the building blocks for creating dynamic Web sites
Microsoft; Java Platform, Enterprise Edition (Java EE); and ColdFusion. Also,
enables you to write applications in languages such as PHP, Python, and Perl
a server programmer providing business logic behind any application. This server
can be a part of the network or the distributed network. It helps the clients to
A web browser is software that allows you to find and view websites on
Apple’s Safari, Google’s Chrome, and Opera are examples. Sciencedirect (n.d.)
Information flow:
and the database itself. Its main role is to receive requests from client machines,
search for the required data, and pass back the results. Clients access a
database server through a front-end application that displays the requested data
on the client machine, or through a back-end application that runs on the server
server is the primary data location. Database slave servers are replicas of the
When a web browser, like Google Chrome or Firefox, needs a file that's
hosted on a web server, the browser will request the file by HTTP. When the
request is received by the web server, the HTTP server will accept the request,
find the content and send it back to the browser through HTTP.
that has 3-tier architecture. The position at which the application server fits
in is described below:
Tier 1 – This is a GUI interface that resides at the client end and is
Tier 2 – This is called the middle tier, which consists of the Application
Server.
Database
Server.
As we can see,
they usually
webserver for serving any request that is coming from clients. The client first
makes a request, which goes to the webserver. The web server then sends it to
the middle tier, i.e., the application server, which further gets the information from
3rd tier (e.g., database server) and sends it back to the webserver. The web
server further sends back the required information to the client. Different
approaches are being utilized to process requests through the web servers, and
Web Server
applications.
Application Server
(including HTTP).
(2021)
database. The data returned from the Web server is always in a format that can
be rendered by the browser (i.e., HTML or XML). As shown in Figure, if the Web
server determines that the request from the client can be satisfied without
passing the request on to the application server, it will process the request and
As with a normal page, your browser sends an HTTP request to the web
server.
The web server recognizes that the HTTP request is for a JSP page
and forwards it to a JSP engine. This is done by using the URL or JSP
A part of the web server called the servlet engine loads the Servlet
The web server forwards the HTTP response to your browser in terms
page inside the HTTP response exactly as if it were a static page. JSP
- Architecture. (n.d.)
The biggest difference between a Java web server and PHP is that
PHP doesn't have its own built-in web server. PHP itself is basically one
requests and invokes the PHP interpreter with the given requested PHP
source code file as argument, then delivers any output of that process
connects to the Internet and sends data to a server. The server then
retrieves that data, interprets it, performs the necessary actions and sends
it back to your phone. The application then interprets that data and
presents you with the information you wanted in a readable way. MySQL
provides much higher level of classes and objects through which data is
accessed easily. These classes hide all complex coding for connection,
form.
a database.
4. Once the connection is set up, you may save it for further use.
statement:
structure and format that can both be exchanged over the Internet and be
Most XML applications will work as expected even if new data is added (or
<from> <heading> <body>). Then imagine a version with added <date> and
The way XML is constructed, even older version of the application can
Example:
__
Old Version
Note
To: Tove
From: Jani
Reminder
New Version
Note
To: Tove
From: Jani
__
electronically share structured data via the public Internet, as well as via
corporate networks.
The tags in the example above (like <to> and <from>) are not defined in
any XML standard. These tags are "invented" by the author of the XML
document.
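The extensibility point can be illustrated with a short sketch: an "old" reader that only knows the original note elements keeps working when it receives a newer document. The body text, the date value, and the read_note helper are assumptions made for illustration; the tag names follow the note example.

```python
import xml.etree.ElementTree as ET

# Old version of the note: <to>, <from>, <heading>, <body>.
OLD_NOTE = """<note>
  <to>Tove</to><from>Jani</from>
  <heading>Reminder</heading>
  <body>placeholder body text</body>
</note>"""

# Newer version: a <date> element was added and <heading> was dropped.
NEW_NOTE = """<note>
  <date>2015-09-01</date>
  <to>Tove</to><from>Jani</from>
  <body>placeholder body text</body>
</note>"""


def read_note(xml_text):
    """Extract only the elements the old application knows about."""
    root = ET.fromstring(xml_text)
    # findtext() returns None for a missing element and ignores unknown ones.
    return {tag: root.findtext(tag) for tag in ("to", "from", "heading", "body")}


old_fields = read_note(OLD_NOTE)
new_fields = read_note(NEW_NOTE)
print(old_fields["to"], old_fields["from"], old_fields["heading"])
print(new_fields["to"], new_fields["from"], new_fields["heading"])
```

Because the reader looks elements up by name, the extra element in the newer document is skipped and the missing one is simply reported as absent.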
For example, XML has offshoots or subclasses such as XBRL (eXtensible Business Reporting Language)
With XML, being a standard of data exchange, data can be available to all
kinds of "reading machines" like people, computers, voice machines, news feeds,
What is XQuery?
that's designed to query collections of XML data -- not just XML files, but
anything that can appear as XML, including relational databases. The word
“query” derives from the Latin quaere (‘ask, seek’), which was used in
16th-century English as a noun meaning ‘query’. XQuery: Specifications,
Articles, Mailing List, and Vendors.
(n.d.)
CHAPTER 8:
DATA WAREHOUSING
Corto, Michelle T.
Manalo, Cklint Louisse M.
Romero, Agnes M.
DATA WAREHOUSING
performance. It is designed for query and analysis rather than for transaction
processing, and usually contains historical data derived from transaction data,
but can include data from other sources. Data warehouses separate analysis
be loosely described as any centralized data repository which can be queried for
superior and higher decisions. So, Data Warehousing supports architectures and
statistical analysis, reporting, data mining capabilities, client analysis tools, and
other applications that manage the process of gathering data, transforming it into
warehouse works with data collected from multiple sources. The source data
marketing, human resources and more. In today's world of big data, the data may
be many billions of individual clicks on web sites or the massive data streams
extraction, transformation, and loading (ETL) process from multiple data sources.
database that hosts the data warehouse. It is important to note that defining the
ETL process is a very large part of the design effort of a data warehouse.
Similarly, the speed and reliability of ETL operations are the foundation of the
Users of the data warehouse perform data analyses that are often time-
analysis, and profit by product and by customer. But time-focused or not, users
want to "slice and dice" their data however they see fit and a well-designed data
warehouse will be flexible enough to meet those demands. Users will sometimes
need highly aggregated data, and other times they will need to drill down to
details. More sophisticated analyses include trend analyses and data mining,
which use existing data to forecast trends or predict futures. The data warehouse
environments that serve reports, dashboards and other interfaces to end users.
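The difference between highly aggregated data and drilling down to detail can be sketched with two queries over the same fact rows. The sales table, its columns, and its figures are made up for illustration, and SQLite (through Python's built-in sqlite3 module) stands in for a real warehouse platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, quarter TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2020, "Q1", "A", 100),
        (2020, "Q1", "B", 50),
        (2020, "Q2", "A", 120),
        (2021, "Q1", "A", 130),
    ],
)

# Highly aggregated view: one total per year.
by_year = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()

# Drill down: slice on one year, then break the total out by quarter and product.
drill_2020 = conn.execute(
    "SELECT quarter, product, SUM(amount) FROM sales "
    "WHERE year = 2020 GROUP BY quarter, product ORDER BY quarter, product"
).fetchall()

print(by_year)     # [(2020, 270), (2021, 130)]
print(drill_2020)  # [('Q1', 'A', 100), ('Q1', 'B', 50), ('Q2', 'A', 120)]
```

The WHERE clause is the "slice" and the GROUP BY columns decide how finely the data is "diced".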
Subject-Oriented
emphasis on modeling and analysis of data for decision making. It also provides
a simple and concise view around the specific subject by excluding data which is
Integrated
unit of measure for all similar data from the dissimilar database. The data also
acceptable manner.
like a mainframe, relational databases, flat files, etc. Moreover, it must keep
and C. Information stored in these applications are Gender, Date, and Balance.
However, after the transformation and cleaning process all this data is
Time-Variant
The time horizon for data warehouses is quite extensive compared with
particular period and offers information from the historical point of view. It
One place where data warehouse data displays time variance is in the structure
of the record key. Every primary key contained within the DW should have,
either implicitly or explicitly, an element of time, such as the day, week, or
month.
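A key with an explicit time element might look like the following sketch. The balance_snapshot table, its columns, and its values are hypothetical; the point is only that the month is part of the primary key, so each period gets its own row and the history stays queryable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE balance_snapshot (
    account_id     INTEGER,
    snapshot_month TEXT,      -- time element of the key, e.g. '2021-01'
    balance        INTEGER,
    PRIMARY KEY (account_id, snapshot_month)
)
""")

# The same account appears once per month instead of being overwritten.
conn.executemany(
    "INSERT INTO balance_snapshot VALUES (?, ?, ?)",
    [(1, "2021-01", 500), (1, "2021-02", 650)],
)

history = conn.execute(
    "SELECT snapshot_month, balance FROM balance_snapshot "
    "WHERE account_id = 1 ORDER BY snapshot_month"
).fetchall()
print(history)  # [('2021-01', 500), ('2021-02', 650)]
```

An operational table would typically keep only the current balance per account; here the time element in the key is what preserves the earlier value.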
warehouse, it can’t be updated or changed. All the historical data along with the
Data warehouse
to retrieve data of any duration of time. If the business wants any reports,
graphs, etc., then to compare them with previous years and to analyze the
trends, all the old data (6 months old, 1 year old, or even older) are required.
Non-volatile
The data residing in the data warehouse is permanent, as the name implies. It
also means that the data in the data warehouse is not erased or deleted when
new data is inserted into it. In the data
time. Operations such as delete, update, and insert that are performed in a
software application are omitted in the data warehouse environment. There are
only two types of data operations that can be done in the data warehouse:
Data Loading
Data Access
maintain data warehouses and extract meaning from and help inform decision
making through the use of data in the data warehouses. Successful data
is named Data Warehousing. Data Warehousing helps to improve the speed and
efficiency of accessing different data sets and makes it easier for company
decision-makers to obtain insights that will help the business and promote
marketing tactics that set them apart from their competitors. We can say that it is
a blend of technologies and components which aids the strategic use of data and
historical data that can be retrieved and analyzed to supply helpful insight into
There are mainly three types of data warehousing, which are as follows:
representing data. With this warehouse at your end, you gain the ability to
classify the data as per the subject and grant the level of access to different
departments accordingly.
employees’ records.
designed for a specific business line like finance, accounts, sales, purchases,
or inventory. The warehouse allows you to collect data directly from the
sources.
(sometimes called systems of record because their role is to keep the official,
1960- Dartmouth and General Mills in a joint research project, develop the
1970- ACNielsen and IRI introduced dimensional data marts for retail
sales.
Data warehousing started in the late 1980s when IBM worker Paul Murphy
1988- Devlin and Murphy published the first article describing the
1992- Inmon published the first book describing data warehousing, and he
has subsequently become one of the most prolific authors in this field.
However, the real concept was given by Bill Inmon, who is considered
the father of the data warehouse. He had written about a variety of topics for
Information Factory.
architectural model for the flow of information from the operational system to
problems associated with the flow, mainly the high costs associated with it.
operate independently.
intelligence tool of an organization, the data warehouses can find out more
Two major factors drive the need for data warehousing in most organizations
today:
information.
data.
called silos, or islands, of data. They are also generally distributed on a variety of
DBMS, whereas another may be located on a SAP system. Yet, for decision-
information.
simple example shown in Figure 1. This figure shows three tables from three
separate systems of record, each containing similar student data. The STUDENT
DATA table is from the class registration system, the STUDENT EMPLOYEE
table is from the personnel system, and the STUDENT HEALTH table is from a
health center system. Each table contains some unique data concerning
students, but even common data (e.g., student names) are stored using different
formats.
STUDENT DATA
StudentNo LastName MI FirstName Telephone Status …
123-45-6789 Enright T Mark 483-1967 Soph
STUDENT EMPLOYEE
StudentID Address Dept Hours …
123-45-6789 1218 Elk Drive, Phoenix, AZ 91304 Soc 8
STUDENT HEALTH
StudentName Telephone Insurance ID …
Mark T. Enright 483-1967 Blue Cross 123-45-6789
Suppose you want to develop a profile for each student, consolidating all
data into a single file format. Some of the issues that you must resolve are as
follows:
Inconsistent key structures - The primary key of the first two tables is
some version of the student Social Security number, whereas the primary key
composite attribute) is broken into its component parts: LastName, MI, and
FirstName.
Missing data - The value for Insurance is missing (or null) for Elaine
single corporate view but fails to capture the complexity of that task. A real-life
scenario would likely have dozens (if not hundreds) of tables and thousands (or
millions) of records.
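A small sketch of the consolidation for the single student shown in the three tables. The dictionaries copy the example rows; split_name and build_profile are hypothetical helpers, and a real integration job would need far more validation than this.

```python
# Records copied from the three example tables; keys mirror the column names.
student_data = {"StudentNo": "123-45-6789", "LastName": "Enright", "MI": "T",
                "FirstName": "Mark", "Telephone": "483-1967", "Status": "Soph"}
student_employee = {"StudentID": "123-45-6789",
                    "Address": "1218 Elk Drive, Phoenix, AZ 91304",
                    "Dept": "Soc", "Hours": 8}
student_health = {"StudentName": "Mark T. Enright", "Telephone": "483-1967",
                  "Insurance": "Blue Cross", "ID": "123-45-6789"}


def split_name(full_name):
    """Break 'First M. Last' into the component parts used by STUDENT DATA."""
    first, middle, last = full_name.split()
    return {"FirstName": first, "MI": middle.rstrip("."), "LastName": last}


def build_profile(data, employee, health):
    """Merge the three records under one key (hypothetical helper)."""
    # Resolve the inconsistent key structures: all three must name one student.
    if not (data["StudentNo"] == employee["StudentID"] == health["ID"]):
        raise ValueError("records belong to different students")
    profile = dict(data)
    profile.update({k: v for k, v in employee.items() if k != "StudentID"})
    # Insurance may be missing in the source; .get() leaves it as None then.
    profile["Insurance"] = health.get("Insurance")
    return profile


profile = build_profile(student_data, student_employee, student_health)
name_parts = split_name(student_health["StudentName"])
print(profile)
print(name_parts)
```

Even this toy version has to decide which key is authoritative, how to reconcile the two name formats, and what to do with a missing value, which are the same decisions an ETL process makes at scale.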
example, due to increased health care as well as other costs. In general, certain
trends in organizations encourage the need for data warehousing; these trends
corporate mergers and acquisitions, and because of the sheer size of many
the metadata are controlled and made the same by one data administrator,
the data values for the same attributes will not agree. This is because of
different update cycles and separate places where the same data are
captured for each system. Thus, to get one view of the organization, the data
synchronized into one additional database. We will see that there can be
scorecard, analytical software working with the data warehouse can be used
to “drill down,” “slice and dice,” visualize, and in other ways mine business
intelligence.
Organizations in all sectors are realizing that there is value in having a total
picture of their interactions with customers across all touch points. Different
touch points (e.g., for a bank, these touch points include ATMs, online
data warehouse, a teller may not know to try to cross-sell a customer one of
appears on the teller’s screen. Having a total picture of the activity with a
systems.
Managing the supply chain has become a critical element in reducing costs
improved this situation by bringing many of these data into one database.
time, based on current data. Examples of operational systems are sales order
and provide fast response. Operational systems are also called systems of
record.
availability use
Volume Many constant updates and Periodic batch updates and
rows rows
historical point-in-time and prediction data. They are also designed for complex
systems for sales trend analysis, customer segmentation, and human resources
planning.
they have quite different communities of users. Operational systems are used by
operational systems and makes them readily available for decision support
applications.
processing.
data warehouse; the second is also a three-level data architecture that appears
usually from a more top-down approach that emphasizes more coordination and
reporting, and trend analysis. Data warehouses and their architectures vary
The independent data mart architecture for a data warehouse is shown in the
figure below. Building this architecture requires four basic steps (moving left to
1. Data are extracted from the various internal and external source system
2. The data from the various source systems are transformed and integrated
before being loaded into the data marts. Transactions may be sent to the
monthly. Thus, the data warehouse often does not have, nor does it need to
have, current data. Remember, the data warehouse is not (directly) supporting
such as account balances and inventory levels). For most data warehousing
applications, users are not looking for a reaction to an individual transaction but
rather for trends and patterns in the state of the organization across a large
subset of the data warehouse. At a minimum, five fiscal quarters of data are kept
discerned. Older data may be purged or archived. We will see later that one
independent data marts approach does not create one data warehouse. Instead,
this approach creates many separate data marts, each based on data
applications of a particular end-user group. Its contents are obtained either from
mart, or are derived from the data warehouse, which we will discuss in the next
finance data mart, a supply chain data mart, and so on, to support known
analytical processing. It is possible that each data mart is built using different
tools; for example, a financial data mart may be built using a proprietary
multidimensional tool like Hyperion’s Essbase, and a sales data mart may be
using MicroStrategy and other tools for reporting, querying, and data
visualization.
short-term objectives can be more compatible with the comparably lower cost
(money and organizational capital) to implement yet one more independent data
sets of short-term objectives means that you lose flexibility for the long term and
the ability to react to changing business conditions. And being able to react to
easier to have separate, small data warehouses than to get all organizational
Also, some data warehousing technologies have technical limitations for the size
of the data warehouse they can support—what we will call later a scalability
issue. Thus, technology, rather than the business, may dictate a data
requirements. We discuss the pros and cons of the independent data mart
architecture compared with its prime competing architecture in the next section.
Approach
1. A separate ETL process is developed for each data mart, which can yield
2. Data marts may not be consistent with one another because they are
often developed with different technologies, and thus they may not provide a
3. There is no capability to drill down into greater detail or into related facts in
very difficult (e.g., doing joins across separate platforms for different data
4. Scaling costs are excessive because every new application that creates a
separate data mart repeats all the extract and load steps. Usually, operational
systems have limited time windows for batch data extracting, so at some
point, the load on the operations systems may mean that new technology is
5. If there is an attempt to make the separate data marts consistent, the cost to
do so is quite high.
dependent data mart and operational data store architecture. Here the new level
is the operational data store, and the data and metadata storage level is
reconfigured. The first and second limitations are addressed by loading the
central, integrated data warehouse that is the control point and single “version of
the truth” made available to end users for decision support applications.
Dependent data marts still have a purpose to provide a simplified and high-
groups. A data mart may be a separate physical database (and different data
marts may be on different platforms) or can be a logical (user view) data mart
A user group can access its data mart, and then when other data are
needed, users can access the EDW. Redundancy across dependent data marts
is planned, and redundant data are consistent because each data mart is loaded
in a synchronized way from one common source of data (or is a view of the data
enterprise data warehouse; it is not the end users’ responsibility to integrate data
across independent data marts for each query or application. The dependent
data mart and operational data store architecture is often called a “hub and
spoke” approach, in which the EDW is the hub and the source data systems and
the data marts are at the ends of input and output spokes.
applications.
An ODS typically does not contain “deep” history, whereas an EDW holds
may be fed from the database of an ERP application, but because most
organizations do not have only one ERP database and do not run all operations
against one ERP, an ODS is usually different from an ERP database. The ODS
also serves as the staging area for loading data into the EDW. The ODS may
receive data immediately or with some delay from the systems of record,
it supports.
warehousing. Those that endorse the independent data mart approach argue
2. The length of time until there is some benefit from data warehousing is
reduced because the organization is not delayed until all data are centralized.
1. Logical data marts are not physically separate databases but rather
warehouse.
2. Data are moved into the data warehouse rather than to a separate staging
need to be written.
4. Data marts are always up to date because data in a view are created
when the view is referenced; views can be materialized if a user has a series
of queries and analysis that need to work off the same instantiation of the
data mart.
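As a minimal sketch of the point about views, the following Python example uses the standard-library sqlite3 module to define a logical data mart as a view over a warehouse table. The table name warehouse_sales, the view name finance_mart, and the sample rows are hypothetical and serve only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (region TEXT, amount INTEGER)")
conn.execute("INSERT INTO warehouse_sales VALUES ('North', 100), ('South', 250)")

# The logical data mart is only a view; it stores no rows of its own.
conn.execute(
    "CREATE VIEW finance_mart AS "
    "SELECT region, SUM(amount) AS total FROM warehouse_sales GROUP BY region"
)


def mart_rows():
    """Read the data mart; the view is evaluated at the time of this query."""
    return conn.execute(
        "SELECT region, total FROM finance_mart ORDER BY region"
    ).fetchall()


before = mart_rows()

# New data loaded into the warehouse are visible through the view at once.
conn.execute("INSERT INTO warehouse_sales VALUES ('North', 50)")
after = mart_rows()

print(before)
print(after)
```

Because the view is evaluated when it is referenced, no separate load step is needed to refresh the data mart. A materialized view would trade this freshness for a stable snapshot.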
data mart may not be small. Thus, scalable technology is often critical. A
significant burden and cost is placed on users when they themselves need to
integrate the data across separate physical data marts (if this is even possible).
As data marts are added, a data warehouse can be built in phases; the easiest
way for this to happen is to follow the logical data mart and real-time data
warehouse architecture.
The real-time data warehouse aspect of the architecture means that the
source data systems, decision support services, and the data warehouse
need for rapid response (i.e., action) to a current, comprehensive picture of the
questions and logging problem tickets will have a total picture of the customer’s
activities, and orders. With this information, the system supporting the help desk
sell what the analysis has shown to be a likely and profitable maintenance
similar profile. A critical event, such as entry of a new product order, can be
customer.
In addition to the information given above, here are three common
Operational System
Flat Files
A Flat file system is a system of files in which transactional data is stored, and
Meta Data
A set of data that defines and gives information about other data.
Metadata summarizes basic information about data, which can make
finding and working with particular instances of data easier. For example,
author, date created, date modified, and file size are examples of very basic
document metadata.
The area of the data warehouse saves all the predefined lightly and highly
Operational data must be cleaned and processed before it is put into the
warehouse.
A staging area simplifies data cleansing and consolidation for operational data
coming from multiple source systems, especially for enterprise data warehouses
production, etc.
The figure illustrates an example where purchasing, sales, and stocks are
separated. In this example, a financial analyst wants to analyze historical data for
customer behavior.
system:
1.
Separation: Analytical
and transactional
processing should be
possible.
the data volume, which has to be managed and processed, and the number of
To understand and model the data in each of the three layers of the data
architecture for a data warehouse, you need to learn some basic characteristics
processing a business transaction for a banking application. This log entry
contains both status and
event data: The “before image” and “after image” represent the status of the bank
account before and then after a withdrawal. Data representing the withdrawal (or
actions (create, update, or delete). The withdrawal transaction in the above figure
leads to a single update, which is the reduction in the account balance from 750
to 700. On the other hand, the transfer of money from one account to another
would lead to two actions: two updates to handle a withdrawal and a deposit.
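The relationship between status data (before and after images) and event data can be sketched in Python as follows. This is only an illustration: the LogEntry structure, the function names, the account identifiers "A" and "B", and the starting balance of account "B" are assumptions. The withdrawal of 50 that takes the balance from 750 to 700 follows the example in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    account: str
    action: str                    # "create", "update", or "delete"
    before_image: Optional[dict]   # status before the event
    after_image: Optional[dict]    # status after the event
    event: dict                    # the transaction that caused the change


def apply_update(balances, account, delta, event_type):
    """Change a balance and return a log entry holding both images."""
    before = balances[account]
    balances[account] = before + delta
    return LogEntry(
        account=account,
        action="update",
        before_image={"balance": before},
        after_image={"balance": balances[account]},
        event={"type": event_type, "amount": abs(delta)},
    )


def withdraw(balances, account, amount):
    return apply_update(balances, account, -amount, "withdrawal")


def deposit(balances, account, amount):
    return apply_update(balances, account, amount, "deposit")


def transfer(balances, source, target, amount):
    """A transfer leads to two updates: a withdrawal and a deposit."""
    return [withdraw(balances, source, amount), deposit(balances, target, amount)]


balances = {"A": 750, "B": 100}            # hypothetical accounts
entry = withdraw(balances, "A", 50)        # single update: 750 -> 700
transfer_log = transfer(balances, "A", "B", 200)

print(entry.before_image, entry.after_image)
print(len(transfer_log))
```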
signal or dropped network connection, or an item put in a shopping cart and then
taken out before checkout, can also be important activities that need to be
Both status data and event data can be stored in a database. However, in
practice, most of the data stored in databases (including data warehouses) are
data or a summary (say, an hourly total) of transaction or event data. Event data,
which represent transactions, may be stored for a defined period but are then
deleted or archived to save storage space. Both status and event data are
typically stored in database logs (as represented in the figure) for backup and
recovery purposes.
previous year’s sales on the same date or during the same period. Most
operational systems are based on the use of transient data. Transient data are
data in which changes to existing records are written over previous records, thus
destroying the previous data content. Records are deleted without preserving the
previous contents of those records. You can easily visualize transient data by
again referring to Figure 9-6. If the after image is written over the before image,
the before image (containing the previous balance) is lost. However, because
this is a database log, both images are normally preserved. Periodic data are
data that are never physically altered or deleted once added to the store. The
before and after images in Figure 9-6 represent periodic data. Notice that each
record contains a time stamp that indicates the date (and time, if needed) when
Besides the periodic changes to data values outlined previously, six other kinds
warehousing:
tables and allowing null values for existing rows (if historical data exist in
event already stored in the warehouse, such as a column C for the table in
is more difficult when the new facts are more refined, such as data
associated with days of the week, not just month and year.
sales data. This change is in the grain of the data, an extremely important
topic, which we discuss later in the chapter. This can be a very difficult
change to accommodate.
5. Descriptive data are related to one another: For example, store data are
6. New source of data: This is a very common change, in which some new
some new operational system is installed that must feed the warehouse.
This change can cause almost any of the previously mentioned changes,
as well as the need for new extract, transform, and load processes.
accommodate all of these kinds of changes for the whole data history
the data warehouse to meet new business conditions and information and
business intelligence needs. Thus, designing the warehouse for change is very
important.
data warehouse:
In general, fast query performance with high data throughput is the key to a
Data warehouse can be controlled when the user has a shared way of explaining
1. Subject-oriented –
theme. That means the data warehousing process is intended to deal with a
specific, well-defined theme. These themes can be sales, distribution,
marketing, etc.
2. Integrated –
Integration means establishing a shared standard by which all similar data from the
different databases are measured. The data also need to reside in the various data
A data warehouse is built by integrating data from various sources of data such
3. Time-Variant –
Here, data is maintained over different intervals of time, such as weekly, monthly,
or annually. It establishes various time limits, which are structured between the
large datasets and are held in online transaction processing (OLTP). The time horizon
for a data warehouse is much wider than that of operational systems. The data
4. Non-Volatile –
As the name implies, the data residing in a data warehouse is permanent. It also
means that data is not erased or deleted when new data is inserted. The warehouse
therefore accumulates a mammoth quantity of data over time.
analyzing historical data and in understanding the functionality. It does not need
Functionalities such as delete, update, and insert that are performed in an operational
application are absent in a data warehouse environment. Two types of data operations
Data Loading
Data Access
called lookup tables, are used to store the dimension members for all levels in
the hierarchy. This is the data layer associated with logical or physical data
marts. It is the layer with which users normally interact for their decision support
applications. Ideally, the reconciled data level is designed first and is the basis for
the derived layer, whether data marts are dependent, independent, or logical. In
order to derive any data mart we might need, it is necessary that the EDW
accommodating transient and periodic data; this gives us the greatest flexibility to
combine data into the simplest form for all user needs, even those that are
The objectives that are sought with derived data are quite different from
Support ad hoc queries and data mining and other analytical applications
data:
a. Detailed data are often (but not always) periodic—that is, they provide a
historical record.
common) queries.
Data are distributed to separate data marts for different user groups.
The data model that is most commonly used for a data mart is a
like model (such models are used by relational online analytical processing
[ROLAP] tools).
Star Schema
a data warehouse or business intelligence that uses a single large fact table to
tables that store attributes about the data. It is called a star schema because the
fact table sits at the center of the logical diagram, and the small dimensional
1996a).
A star schema consists of two types of tables: one fact table and one or
business, such as units sold, orders booked, and so on. Dimension tables hold
descriptive data (context) about the subjects of the business. The dimension
summarize facts in queries, reports, or graphs; thus, dimension data are usually
textual and discrete (even if numeric). A data mart might contain several star
schemas with similar dimension tables but each with a different fact table. Typical
schema database only has a single fact table. The fact table contains the specific
measurable (or
quantifiable) primary data to
be transactional -- in that
to a point in time.
schema database has at least one dimension table, but will often have many.
Each dimension table will relate to a column in the fact table with a dimension
3. In which stores are we losing money on which products? Does this vary by
quarter?
questions is shown in Figure 9-10. This example has three dimension tables:
PRODUCT, PERIOD, and STORE, and one fact table, named SALES. The fact
the most flexible, able to support answering almost any question. However, more
Some sample data for this schema are shown in Figure 9-11. From the
fact table, we find (for example) the following facts for product number 110 during
period 002:
1. Thirty units were sold in store S1. The total dollar sale was 1500, and total
dollar cost was 1200.
2. Forty units were sold in store S3. The total dollar sale was 2000, and total
dollar cost was 1200.
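The star schema described here can be sketched with Python's standard-library sqlite3 module. The table names PRODUCT, PERIOD, STORE, and SALES and the two fact rows come from the text; the column names, the product description, and the store names are assumptions made for illustration, and the sketch uses the business codes directly rather than surrogate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Three dimension tables hold descriptive context.
CREATE TABLE product (product_code INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_code TEXT PRIMARY KEY,
                      year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE store   (store_code TEXT PRIMARY KEY, store_name TEXT);

-- One fact table; its primary key is the composite of the dimension keys.
CREATE TABLE sales (
    product_code INTEGER REFERENCES product(product_code),
    period_code  TEXT    REFERENCES period(period_code),
    store_code   TEXT    REFERENCES store(store_code),
    units_sold   INTEGER,
    dollars_sold INTEGER,
    dollars_cost INTEGER,
    PRIMARY KEY (product_code, period_code, store_code)
);

INSERT INTO product VALUES (110, 'hypothetical product');
INSERT INTO period  VALUES ('002', 2010, 1, 5);
INSERT INTO store   VALUES ('S1', 'hypothetical store 1'),
                           ('S3', 'hypothetical store 3');
INSERT INTO sales   VALUES (110, '002', 'S1', 30, 1500, 1200),
                           (110, '002', 'S3', 40, 2000, 1200);
""")

# Each dimension is one join away from the facts.
rows = conn.execute("""
    SELECT st.store_code, pe.year, pe.quarter,
           s.units_sold, s.dollars_sold - s.dollars_cost AS margin
    FROM sales AS s
    JOIN store  AS st ON st.store_code  = s.store_code
    JOIN period AS pe ON pe.period_code = s.period_code
    WHERE s.product_code = 110 AND s.period_code = '002'
    ORDER BY st.store_code
""").fetchall()

print(rows)
```

The margin column (dollar sales minus dollar cost) is one way such a schema can help answer a question like "In which stores are we losing money on which products?".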
obtained from the dimension tables. For example, in the PERIOD table, we find
that period 002 corresponds to year 2010, quarter 1, month 5. Try tracing the
Surrogate Key
and every record in a Dimension table in any Data Warehouse. It joins between
the fact and dimension tables and is necessary to handle changes in dimension
table attributes.
Surrogate keys are typically meaningless integers used to connect the fact
to the dimension tables of a data warehouse. There are various reasons why we
making it immune to any operational changes. They are used to relate the facts
in the fact table to the appropriate rows in the dimension tables, with the
business keys only occurring in the (much smaller) dimension tables to keep the
Business keys change, often slowly, over time, and we need to remember
old and new business key values for the same business object. As we will see
attribute values for the same production key over time. Thus, if a product
package changes in size, we can associate the same product production key
with several surrogate keys, each for the different package sizes.
Surrogate keys are often simpler and shorter, especially when the
Surrogate keys can be of the same length and format for all keys, no
matter what business dimensions are involved in the database, even dates.
The primary key of each dimension table is its surrogate key. The primary
key of the fact table is the composite of all the surrogate keys for the related
dimension tables, and each of the composite key attributes is obviously a foreign
Fact tables provide the (usually) additive values that act as independent
variables by which dimensional attributes are analyzed. Fact tables are often
defined by their grain. The fact table grain functionality sets a new compound
primary key for a table. This means you no longer need to use connection points
for incremental uploads to fact tables. The grain of a fact table defines the lowest
The raw data of a star schema are kept in the fact table. All the data in a
fact table are determined by the same combination of composite key elements;
so, for example, if the most detailed data in a fact table are daily values, then all
measurement data must be daily in that fact table, and the lowest level of
characteristics for the period dimension must also be a day. Determining the
lowest level of detailed fact data stored is arguably the most important and
difficult data mart design step. The level of detail of this data is specified by the
intersection of all of the components of the primary key of the fact table. This
intersection of primary keys is called the grain of the fact table. Determining the
(i.e., the questions to be answered from the data mart). There is always a way to
way in the data mart to understand business activity at a level of detail finer than
design of a data mart is the amount of history to be kept; that is, the duration of
which is sufficient to see annual cycles in the data. Some businesses, such as
financial institutions, have a need for longer durations. Older data may be difficult
to source and cleanse if additional attributes are required from data sources.
Even if sources of old data are available, it may be most difficult to find old values
of dimension data, which are less likely than fact data to have been retained. Old
fact data without associated dimension data at the time of the fact may be
worthless.
As you would expect, the grain and duration of the fact table have a direct
impact on the size of that table. We can estimate the number of rows in the fact
table as follows:
with the fact table (in other words, the number of possible values for each
2. Multiply the values obtained in the first step after making any necessary
adjustments.
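The two estimation steps can be sketched in Python as follows. The function names are hypothetical. The figures (1000 stores, 10,000 products of which 50 percent record sales in a month, 24 monthly periods, six fields averaging four bytes each) are those used in the example that follows in the text.

```python
from math import prod


def estimate_fact_rows(dimension_sizes, active_fraction=None):
    """Step 1: take the number of possible values for each dimension.
    Step 2: multiply them, after adjusting any dimension for which only
    a fraction of the values is expected to appear in the fact table."""
    active_fraction = active_fraction or {}
    adjusted = [
        size * active_fraction.get(name, 1.0)
        for name, size in dimension_sizes.items()
    ]
    return round(prod(adjusted))


def estimate_fact_bytes(rows, field_count, avg_field_bytes):
    """Rough table size: rows x fields per row x average bytes per field."""
    return rows * field_count * avg_field_bytes


monthly_rows = estimate_fact_rows(
    {"store": 1000, "product": 10_000, "month": 24},
    active_fraction={"product": 0.5},
)
monthly_bytes = estimate_fact_bytes(monthly_rows, field_count=6, avg_field_bytes=4)

print(monthly_rows)   # 120000000
print(monthly_bytes)  # 2880000000
```

The estimate is an upper-bound style approximation: it assumes every combination of store, active product, and period produces a row.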
Let’s apply this approach to the star schema shown in Figure 9-11.
Although there are 10,000 total products, only a fraction of these products
are likely to record sales during a given month. Because item totals appear in the
fact table only for items that record sales during a given month, we need to adjust
this figure. Suppose that on average 50 percent (or 5000) items record sales
during a given month. Then an estimate of the number of rows in the fact table is
computed as follows:
Total rows = 1000 stores X 5000 active products X 24 months
= 120,000,000 rows
Thus, in our relatively small example, the fact table that contains two
years’ worth of monthly totals can be expected to have well over 100 million
rows. This example clearly illustrates that the size of the fact table is many times
larger than the dimension tables. For example, the STORE table has 1000 rows,
the PRODUCT table 10,000 rows, and the PERIOD table 24 rows. If we know the
size of each field in the fact table, we can further estimate the size (in bytes) of
that table. The fact table (named SALES) in Figure 9-11 has six fields. If each of
these fields averages four bytes in length, we can estimate the total size of the
The size of the fact table depends on both the number of dimensions and
the grain of the fact table. Suppose that after using the database shown in Figure
9-11 for a short period of time, the marketing department requests that daily
totals be accumulated in the fact table. (This is a typical evolution of a data mart.)
With the grain of the table changed to daily item totals, the number of rows is
computed as follows:
Total rows = 1000 stores X 2000 active products X 720 days (2 years)
= 1,440,000,000 rows
sales on a given day. The database can now be expected to contain well over 1
Because data warehouses and data marts record facts about dimensions
over time, date and time (henceforth simply called date) is always a dimension
table, and a date surrogate key is always one of the components of the primary
table. Because a user may want to aggregate facts on many different aspects of
date or different kinds of dates, a date
characteristics of dates are country or event specific (e.g., whether the date is a
football game), modeling the date dimension can be more complex than
illustrated so far.
Modeling Dates
The figure above shows a typical design for the date dimension. As we have
seen before, a date surrogate key appears as part of the primary key of the fact
table and is the primary key of the date dimension table. The nonkey attributes of
the date dimension table include all of the characteristics of dates that users use
to categorize, summarize, and group facts that do not vary by country or event.
example, suppose that various users require different levels of aggregation (in
different fact table for each level of aggregation. The obvious trade-off is that
storage requirements may increase dramatically with each new fact table. More
commonly, multiple fact tables are needed to store facts for different
Conformed Dimension
meaning to every fact with which it relates. Conformed dimensions allow facts
and measures to be categorized and described in the same way across multiple
One or more dimension tables associated with two or more fact tables for which
the dimension tables have the same business meaning and primary key with
multiple stars. They are used to compare the measures from each star schema.
Figure 9-13 illustrates a typical situation of multiple fact tables with two related
star schemas. In this example, there are two fact tables, one at the center of
each star:
date
warehouse on a date
As is common, data about one or more business subjects (in this case, Product
and Date) need to be stored in dimension tables for each fact table, Sales and
Receipts. Two approaches have been adopted in this design to handle shared
dimension tables. In one case, because the description of the product is quite
different for sales and receipts, two separate product dimension tables have
been created. On the other hand, because users want the same descriptions of
dates, one date dimension table is used. In each case, we have created a
conformed dimension, meaning that the dimension means the same thing with
each fact table and, hence, uses the same surrogate primary keys. Even when
the two star schemas are stored in separate physical data marts, if dimensions
are conformed, there is a potential for asking questions across the data marts
(e.g., Do certain vendors recognize sales more quickly, and are they able to
Work on facts and business subjects for which all users have the same
meaning.
A factless fact table is a fact table that does not have any measures. It is
keys). Factless facts are a simple collection of dimensional keys which define the
transactions or describing conditions for the time period of the fact. There are
two types of factless tables: One is for capturing an event, and one is for
describing conditions.
The most common example used for factless facts is student attendance in a
FACT_ATTENDANCE is an amalgamation of STUDENT_KEY, and the CLASS_KEY.
As you can see there is nothing we can measure about a student’s attendance at
a class. The student was there and the attendance was recorded or the student
was not there and no record is recorded. It is a fact, plain and simple. There is a
derivation of this fact where you can always load the full roster of individuals
registered for the class and add a flag stating the person was in attendance.
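A factless fact table for attendance can be sketched with Python's sqlite3 module as follows. The table name FACT_ATTENDANCE and the keys STUDENT_KEY and CLASS_KEY come from the text; the date_key column and all sample values are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table holds only dimension keys; there is no measure column.
conn.execute("""
    CREATE TABLE fact_attendance (
        date_key    INTEGER NOT NULL,  -- assumed column, not named in the text
        student_key INTEGER NOT NULL,
        class_key   INTEGER NOT NULL,
        PRIMARY KEY (date_key, student_key, class_key)
    )
""")

# One row per attendance event; an absent student simply has no row.
conn.executemany(
    "INSERT INTO fact_attendance VALUES (?, ?, ?)",
    [
        (20240105, 1, 10),
        (20240105, 2, 10),
        (20240112, 1, 10),
        (20240105, 3, 20),
    ],
)

# Although nothing is measured, counting rows still answers questions.
attendance_per_class = conn.execute("""
    SELECT class_key, COUNT(*) AS attendances
    FROM fact_attendance
    GROUP BY class_key
    ORDER BY class_key
""").fetchall()

print(attendance_per_class)
```

The roster variant mentioned above would instead load one row per registered student and add a flag column recording whether the student attended.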
In conclusion, factless fact tables are important dimensional data structures used
Fact tables are fully normalized because each fact depends on the whole
composite primary key and nothing but the composite key. However, dimension
tables may not be normalized. Most data warehouse experts find this acceptable
for a data mart optimized and simplified for a given user group, so that all the
dimension data are only one join away from associated facts. (Remember that
this can be done with logical data marts, so duplicate data do not need to be
denormalized dimension table cause add, update, and delete problems. In this
Multivalued Dimensions
A multivalued dimension arises when the relationship between the dimension members
and the facts is many-to-many, which means the dimension members are at a lower
granularity than the facts.
Fact table should contain a one-to-one relationship with the dimension. So, we
There are situations when your data need to represent a many-to-many
relationship such that your dimension members are at a lower grain than related
facts; this is known as a multivalued dimension. In these cases, a single fact record should
relate to multiple dimension values. Here are a few examples from the Kimball
Group.
Multivalued dimension
There may be a need for facts to be qualified by a set of values for the same
business subject. For example, consider the hospital example in Figure 9-15. In
this situation, a particular hospital charge and payment for a patient on a date
(e.g., for all foreign keys in the Finances fact table) is associated with one or
more diagnoses. (We indicate this with a dashed M:N relationship line between
the Diagnosis and Finances tables.) We could pick the most important diagnosis
as a component key for the Finances table, but that would mean we lose
Or, we could design the Finances table with a fixed number of diagnosis keys,
more than we think is ever possible to associate with one row of the Finances
table, but this would create null components of the primary key for many rows,
associative entity between Diagnosis and Finances, in this case the Diagnosis
group table. (Thus, the dashed relationship in the Figure is not needed.) In the
“helper table,” and we will see more examples of helper tables as we progress
through subsequent sections. A helper table may have nonkey attributes (as can
any table for an associative entity); for example, the weight factor in the
Diagnosis group table of Figure above indicates the relative role each diagnosis
plays in each group, presumably normalized to a total of 100 percent for all the
diagnoses in a group. Also note that it is not possible for more than one Finances
row to be associated with the same Diagnosis group key; thus, the Diagnosis
group key is really a surrogate for the composite primary key of the Finances fact
table.
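How the weight factor in a helper table might be used can be sketched in Python as follows. This is an illustration under stated assumptions: the diagnosis keys, patient keys, charges, and weights are hypothetical, and weights are expressed as whole percentages that total 100 for each group.

```python
from collections import defaultdict

# Helper table rows: (diagnosis group key, diagnosis key, weight in percent).
diagnosis_group = [
    (1, "D1", 60),
    (1, "D2", 40),
    (2, "D3", 100),
]

# Fact rows: each Finances row points to its own diagnosis group.
finances = [
    {"patient_key": 7, "diagnosis_group_key": 1, "charge": 500},
    {"patient_key": 8, "diagnosis_group_key": 2, "charge": 200},
]


def charge_by_diagnosis(finances, diagnosis_group):
    """Allocate each charge to its diagnoses in proportion to the weights."""
    members = defaultdict(list)
    for group_key, diagnosis_key, weight in diagnosis_group:
        members[group_key].append((diagnosis_key, weight))

    # Weights are assumed to be normalized to 100 percent per group.
    for group_key, rows in members.items():
        if sum(weight for _, weight in rows) != 100:
            raise ValueError(f"weights for group {group_key} do not total 100")

    totals = defaultdict(float)
    for row in finances:
        for diagnosis_key, weight in members[row["diagnosis_group_key"]]:
            totals[diagnosis_key] += row["charge"] * weight / 100
    return dict(totals)


allocated = charge_by_diagnosis(finances, diagnosis_group)
print(allocated)
```

Because the weights in each group total 100 percent, the allocated amounts add back up to the original charges, so no charge is double counted when facts are summarized by diagnosis.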
Hierarchies
Many times a dimension in a star schema forms a natural, fixed depth hierarchy.
For example, there are geographical hierarchies (e.g., markets within a state,
states within a region, and regions within a country) and product hierarchies
(packages or sizes within a product, products within bundles, and bundles within
1. Include all the information for each level of the hierarchy in a single
denormalized dimension table for the most detailed level of the hierarchy,
2. Normalize the dimension into a nested set of a fixed number of tables with
1:M relationships between them. Associate only the lowest level of the
hierarchy with the fact table. It will still be possible to aggregate the fact
data at any level of the hierarchy, but now the user will have to perform
nested joins along the hierarchy or be given a view of the hierarchy that is
prejoined.
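The second option, a normalized hierarchy with a prejoined view, can be sketched with Python's sqlite3 module. The product, product family, and product group levels follow the product hierarchy example in the text; the table and column names, the second product, and the sales figures are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A fixed number of tables with 1:M relationships between levels.
CREATE TABLE product_group  (group_key INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE product_family (family_key INTEGER PRIMARY KEY, family_name TEXT,
                             group_key INTEGER REFERENCES product_group(group_key));
CREATE TABLE product        (product_key INTEGER PRIMARY KEY, product_name TEXT,
                             family_key INTEGER REFERENCES product_family(family_key));

-- Only the lowest level of the hierarchy is associated with the fact table.
CREATE TABLE sales (product_key INTEGER REFERENCES product(product_key),
                    units_sold INTEGER);

INSERT INTO product_group  VALUES (1, 'Health and beauty');
INSERT INTO product_family VALUES (1, 'Crest', 1);
INSERT INTO product        VALUES (1, 'Crest with Tartar Control', 1),
                                  (2, 'Crest Whitening', 1);
INSERT INTO sales          VALUES (1, 30), (2, 40), (1, 5);

-- A prejoined view spares users from writing the nested joins themselves.
CREATE VIEW product_hierarchy AS
SELECT p.product_key, p.product_name, f.family_name, g.group_name
FROM product AS p
JOIN product_family AS f ON f.family_key = p.family_key
JOIN product_group  AS g ON g.group_key  = f.group_key;
""")

# Facts can still be aggregated at any level, here the product group.
units_by_group = conn.execute("""
    SELECT h.group_name, SUM(s.units_sold)
    FROM sales AS s
    JOIN product_hierarchy AS h ON h.product_key = s.product_key
    GROUP BY h.group_name
""").fetchall()

print(units_by_group)
```

The first option would instead store family_name and group_name as columns of a single denormalized product table, removing the joins at the cost of repeated values.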
When the depth of the hierarchy can be fixed, each level of the hierarchy is a
separate dimensional entity. Some hierarchies can more easily use this scheme
than can others. Consider the product hierarchy in this Figure. Here each product
is part of a product family (e.g., Crest with Tartar Control is part of Crest), and a
part of a product group (e.g., health and beauty). This works well if every product
follows this same hierarchy. Such hierarchies are very common in data
historical data. In other words, implementing one of the SCD types should enable
users to assign the proper dimension's attribute value for a given date. Examples
There are many approaches to deal with SCD. The most popular are:
Type 0 - The passive method. In this method no special action is performed upon
dimensional changes. Some dimension data can remain the same as it was first
Type 1 - Overwriting the old value. In this method no history of dimension
changes is kept in the database. The old dimension value is simply overwritten
by the new one. This type is easy to maintain and is often used for data which
Type 2 - Creating a new additional record. In this methodology all history of
adding a new row with a new surrogate key to the dimension table. Both the prior
and new rows contain as attributes the natural key (or other durable identifier).
Also 'effective date' and 'current indicator' columns are used in this method.
There can be only one record with the current indicator set to 'Y'. For 'effective
date' columns, i.e. start_date and end_date, the end_date for the current record
Type 3 - Adding a new column. In this type usually only the current and previous
value of dimension is kept in the database. The new value is loaded into the
'current/new' column and the old one into the 'old/previous' column. Generally
speaking, history is limited to the number of columns created for storing historical data.
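The difference between SCD Types 1, 2, and 3 can be sketched in Python on a small in-memory dimension table. This is a simplified illustration, not a definitive implementation: the function names, the product key "P100", the package sizes, and the dates are hypothetical, and an open end_date is represented here as None, which is an assumption.

```python
from datetime import date


def apply_type1(rows, natural_key, attribute, new_value):
    """Type 1: overwrite the old value; no history is kept."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row[attribute] = new_value


def apply_type2(rows, natural_key, attribute, new_value, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    current = next(
        r for r in rows
        if r["natural_key"] == natural_key and r["current"] == "Y"
    )
    current["current"] = "N"
    current["end_date"] = change_date

    new_row = dict(current)  # both rows keep the same natural key
    new_row.update({
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
        attribute: new_value,
        "start_date": change_date,
        "end_date": None,    # assumed marker for "still in effect"
        "current": "Y",
    })
    rows.append(new_row)
    return new_row


def apply_type3(row, attribute, new_value):
    """Type 3: keep only the previous value in an extra column."""
    row["previous_" + attribute] = row[attribute]
    row[attribute] = new_value


product_dim = [{
    "surrogate_key": 1, "natural_key": "P100", "package_size": "12 oz",
    "start_date": date(2023, 1, 1), "end_date": None, "current": "Y",
}]

apply_type2(product_dim, "P100", "package_size", "16 oz", date(2024, 6, 1))

for row in product_dim:
    print(row["surrogate_key"], row["package_size"], row["current"])
```

After the Type 2 change, facts recorded before the change date still point to surrogate key 1 and the old package size, while new facts use surrogate key 2.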
1. Use atomic facts: Eventually, users want detailed data, even if their initial
2. Create single-process fact tables: Each fact table should address the
5. Disallow null keys in fact tables: Facts apply to the combination of key
relationships.
dimension.
codes used in fact tables in associated dimension tables, which can then be
10. Balance requirements with actual data: Unfortunately, source data may
not precisely support all business requirements, so you must balance what is
Big Data
Big Data is an ill-defined term applied to databases whose size strains the ability
of commonly used relational DBMSs to capture, manage, and process the data
Big Data basically refers to the data which is in large volume and has complex
data sets. This large amount of data can be structured, semi-structured, or non-
making. Big data is a very powerful asset in today's world. Big data can also be
collected by organizations that can be mined for information and used in machine
Systems that process and store big data have become a common component
support big data analytics uses. Big data is often characterized by the three V's:
the wide variety of data types frequently stored in big data systems; and
processed.
Concept of 5V’s
Big data refers to data that is so large, fast or complex that it’s difficult or
impossible to process using traditional methods. The act of accessing and storing
large amounts of information for analytics has been around for a long time.
Volume
Volume, the first of the 5 V's of big data, refers to the amount of data that exists.
Volume is like the base of big data, as it is the initial size and amount of data that
is collected. If the volume of data is large enough, it can be considered big data.
What is considered to be big data is relative, though, and will change depending
Velocity
The next of the 5 V's of big data is velocity. It refers to how quickly data is
generated and how quickly that data moves. This is an important aspect for
companies that need their data to flow quickly, so it's available at the right times
An organization that uses big data will have a large and continuous flow of data
that is being created and sent to its end destination. Data could flow from
needs to be digested and analyzed quickly, and sometimes in near real time.
wearable devices, collected data needs to be sent to its destination and analyzed
quickly.
In some cases, however, it may be better to have a limited set of collected data
than to collect more data than an organization can handle -- since this can lead
Variety
The next V in the 5 V's of big data is variety. Variety refers to the diversity
of data types. An organization might obtain data from a number of different data
sources, which may vary in value. Data can come from sources in and outside an
data is data that has not been organized into a specialized repository but has
unstructured data. Structured data, meanwhile, is data that has been organized
into a formatted repository. This means the data is made more addressable for
Veracity
Veracity is the fourth V in the 5 V's of big data. It refers to the quality and
accuracy of data. Gathered data could have missing pieces, may be inaccurate
or may not be able to provide real, valuable insight. Veracity, overall, refers to the
Data can sometimes become messy and difficult to use. A large amount of data
can cause more confusion than insights if it's incomplete. For example,
concerning the medical field, if data about what drugs a patient is taking is
Both value and veracity help define the quality and insights gathered from data.
Value
The last V in the 5 V's of big data is value. This refers to the value that big data
can provide, and it relates directly to what organizations can do with that
collected data. Being able to pull value from big data is a requirement, as the
value of big data increases significantly depending on the insights that can be
Organizations can use the same big data tools to gather and analyze the data,
but how they derive value from that data should be unique to them.
The importance of big data doesn’t simply revolve around how much data you
have. The value lies in how you use it. By taking data from any source and
analyzing it, you can find answers that 1) streamline resource management, 2)
revenue and growth opportunities and 5) enable smart decision making. When
Spotting anomalies faster and more accurately than the human eye.
insights.
to changing variables.
actions that, ultimately, can increase revenue and profits. Businesses that use it
effectively hold a potential competitive advantage over those that don't because
For example, big data provides valuable insights into customers that companies
can use to refine their marketing, advertising and promotions in order to increase
customer engagement and conversion rates. Both historical and real-time data
needs.
Big data is also used by medical researchers to identify disease signs and risk
media sites, the web and other sources gives healthcare organizations and
outbreaks.
Here are some more examples of how big data is used by organizations:
In the energy industry, big data helps oil and gas companies identify
Financial services firms use big data systems for risk management
Columnar Databases
DBMS world. Both columnar and row databases can use traditional database
query languages like SQL to load data and perform queries. Both row and
columnar databases can become the backbone in a system to serve data for
storing data in columns rather than rows, the database can more precisely
access the data it needs to answer a query rather than scanning and discarding
A columnar database stores data by columns rather than by rows, which makes it
suitable for analytical query processing, and thus for data warehouses.
overall disk I/O requirements, and reduces the amount of data you need to load
from disk.
read data from disks only for those columns that are used in any given query.
The cost is that operations that affect whole rows become proportionally more
together. For example, all the values in column 1 are grouped together; then all
values in column 2 are grouped together; etc. The data is stored in record order,
so the 100th entry for column 1 and the 100th entry for column 2 belong to the
same input record. This enables individual data elements, such as customer
Here is an example of a simple database table with four columns and three rows.
Column-oriented storage:
0411,0412,0413;Moriarty,Richards,Diamond;Angela,Jason,Samantha;52.35,325.82,25.50.
Row-oriented storage:
0411,Moriarty,Angela,52.35;0412,Richards,Jason,325.82;0413,Diamond,Samantha,25.50.
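The two serializations above can be reproduced with a short Python sketch. It is only an illustration of the layout idea: the function names are made up for this example, and a real columnar DBMS stores compressed blocks on disk rather than delimited strings.

```python
# The three records from the example, as (id, last name, first name, amount).
records = [
    ("0411", "Moriarty", "Angela", "52.35"),
    ("0412", "Richards", "Jason", "325.82"),
    ("0413", "Diamond", "Samantha", "25.50"),
]


def row_layout(rows):
    """Serialize record by record: all fields of row 1, then row 2, and so on."""
    return ";".join(",".join(r) for r in rows)


def column_layout(rows):
    """Serialize column by column: all values of column 1, then column 2, and so on."""
    return ";".join(",".join(col) for col in zip(*rows))


def read_column(serialized_columns, index):
    """Pick out one column's values from the column-oriented string.

    A real column store would read only that column's blocks from disk.
    """
    return serialized_columns.split(";")[index].split(",")


row_store = row_layout(records)
column_store = column_layout(records)
amounts = [float(v) for v in read_column(column_store, 3)]
```

Summing the amounts needs only the fourth column in the column layout, whereas the row layout forces every record to be parsed in full. That is the disk I/O saving the text describes for analytical queries.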
NoSQL
Short for “Not only SQL,” NoSQL is a class of database technology used to store
and access textual and other unstructured data, using more flexible structures
than the rows and columns format of relational databases. The major purpose of
using a NoSQL database is to support distributed data stores with very large data
storage needs. NoSQL is used for big data and real-time web apps. For
example, companies like Twitter, Facebook and Google collect terabytes of user
data every single day. Carlo Strozzi first used the name NoSQL in 1998.
NoSQL databases (aka "not only SQL") are non-tabular databases and store
types based on their data model. The main types are document, key-value, wide-
column, and graph. They provide flexible schemas and scale easily with large
Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, Amazon, etc. who deal with huge volumes of data. The
system response time becomes slow when you use RDBMS for massive
volumes of data.
To resolve this problem, we could “scale up” our systems by upgrading our
process is expensive.
on multiple hosts whenever the load increases. This method is known as “scaling
out.”
NoSQL databases emerged in the late 2000s as the cost of storage dramatically
manage data model in order to avoid data duplication. Developers (rather than
As storage costs rapidly decreased, the amount of data that applications needed
to store and query increased. This data came in all shapes and sizes
were rethinking the way they developed software. They were recognizing the
need to rapidly adapt to changing requirements. They needed the ability to iterate
quickly and make changes throughout their software stack — all the way down to
Cloud computing also rose in popularity, and developers began using public
clouds to host their applications and data. They wanted the ability to distribute
data across multiple servers and regions to make their applications resilient, to
scale out instead of scale up, and to intelligently geo-place their data. Some
Each NoSQL database has its own unique features. At a high level, many
Flexible schemas
Horizontal scaling
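Both features can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how any particular NoSQL product works: the class name, the modulo sharding rule, and the sample documents are all invented for the example.

```python
import hashlib


class ShardedDocumentStore:
    """Toy key-value/document store spread over several in-memory 'hosts'."""

    def __init__(self, host_count):
        self.hosts = [{} for _ in range(host_count)]

    def _host_for(self, key):
        # A stable hash decides which host owns the key (simple modulo sharding).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.hosts[int(digest, 16) % len(self.hosts)]

    def put(self, key, document):
        self._host_for(key)[key] = document

    def get(self, key):
        return self._host_for(key).get(key)


store = ShardedDocumentStore(host_count=3)
# Flexible schema: the two documents do not share the same fields.
store.put("user:1", {"name": "Angela", "email": "angela@example.com"})
store.put("user:2", {"name": "Jason", "tags": ["vip"], "last_login": "2021-05-01"})
```

Adding hosts spreads the keys over more machines, which is the "scaling out" described earlier. Plain modulo sharding moves most keys when the host count changes, so real systems usually rely on schemes such as consistent hashing instead.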
The User Interface
A variety of tools are available to query and analyze data stored in data
OLAP, MOLAP, and ROLAP tools
Data-mining tools
Role of Metadata
that describes the data in the data mart in business terms that users can easily
understand.
The metadata associated with data marts are often referred to as a “data
“yellow pages” directory to the data in the data marts. The metadata should allow
1. What subjects are described in the data mart? (Typical subjects are
2. What dimensions and facts are included in the data mart? What is the
3. How are the data in the data mart derived from the enterprise data
4. How are the data in the enterprise data warehouse derived from
5. What reports and predefined queries are available to view the data?
7. Who is responsible for the quality of data in the data marts, and to whom
multidimensional views of their data. Such tools also usually offer users a
graphical interface so that they can easily analyze their data. In the simplest
Online analytical processing (OLAP) is the use of a set of query and reporting
tools that provides users with multidimensional views of their data and allows
them to analyze the data using simple windowing techniques. The term online
analytical processing is intended to contrast with the more traditional term online
data model. It allows managers and analysts to gain insight into the information
a general term for several categories of data warehouse and data mart access
Relational OLAP (ROLAP) tools use variations of SQL and view the database
or denormalized set of tables. ROLAP tools access the data warehouse or data
mart directly.
MOLAP in the next few sections because of its popularity. It is important to note
with MOLAP that the data are not simply viewed as a multidimensional
hypercube, but rather a MOLAP data mart is created by extracting data from the
data warehouse or data mart and then storing the data in a specialized separate
data store through which data can be viewed only through a multidimensional
(DOLAP), which includes OLAP functionality in the DBMS query language (there
are proprietary, non-ANSI standard SQL systems that do this), and hybrid OLAP
query languages.
OLAP Operations
Cube slicing– slicing the data cube to produce a simple two-dimensional table or view.
Drill-down–
In the Figure, this slice is for the product named shoes. The resulting table shows
the cube.
summary report for the total sales of three package sizes for a given brand of
paper towels: 2-pack, 3-pack, and 6-pack. However, the towels come in different
colors, and the analyst wants a further breakdown of sales by color within each of
these package sizes. Using an OLAP tool, this breakdown can be easily obtained
The result of the drill-down is shown in Figure 9-22b. Notice that a drill-down
presentation is equivalent to adding another column to the original report. (In this
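The paper towel drill-down, and a simple slice, can be sketched with plain Python. The sales figures and colors below are hypothetical and exist only to make the example run; an OLAP tool would do the same grouping against the cube.

```python
from collections import defaultdict

# Hypothetical sales rows: (package_size, color, units_sold).
sales = [
    ("2-pack", "white", 40), ("2-pack", "yellow", 35),
    ("3-pack", "white", 50), ("3-pack", "green", 25),
    ("6-pack", "white", 30), ("6-pack", "yellow", 20),
]


def summarize(rows, *dims):
    """Total units grouped by the chosen dimension columns (0 = size, 1 = color)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[2]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: keep only rows where one dimension equals a fixed value."""
    return [r for r in rows if r[dim] == value]


summary = summarize(sales, 0)        # summary report: one row per package size
drill_down = summarize(sales, 0, 1)  # drill-down: adds the color column
white_only = slice_rows(sales, 1, "white")
```

The drill-down result has the same grand total as the summary but one more grouping column, which matches the remark that a drill-down is equivalent to adding another column to the original report.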
Data Mining
It is the process of finding patterns and correlations within large data sets to
organization to predict customer behavior. Data mining tools are used to build
risk models and detect fraud. Data mining is used in market analysis and
families are more likely to buy family medical coverage than single-income
families
internally and externally. The goal of CPM is to use current and historical
ways.
For example, Figure 9-25 is a simple dashboard for one financial measure,
revenue. The left panel shows dials about revenue over the past three years,
with needles indicating where these measures fall within a desirable range. Other
panels show more details to help a manager find the source of out-of-tolerance
measures.
Data Visualization
formats for human analysis. Benefits of data visualization include the ability to
better observe trends and patterns and to identify correlations and clusters. Data
visualization is often used in conjunction with data mining and other analytical
techniques.
numbers and text but as graphs. Thus, precise values are often not shown, but
rather the intent is to more readily show relationships between the data.
CHAPTER 9:
DATA WAREHOUSING-
METHODOLOGIES
Donnabelle M. Durante
Shiela Mae E. Rosano
business. A DSS sifts through (filters) and analyzes massive amounts of data,
problems in decision-making.
sales figures or past ones from different time periods, and other inventory- or
operations-related data.
whose function is just to collect data. The DSS can either be completely
ideal systems analyze information and actually make decisions for the user. At
the very least, they allow human users to make more informed decisions at a
quicker pace.
specifications. For example, the DSS can generate information and output its
laptops. Certain DSS applications are also available through mobile devices. The
flexibility of the DSS is extremely beneficial for users who travel frequently. This
gives them the opportunity to be well-informed at all times, providing the ability to
make the best decisions for their company and customers on the go or even on
the spot.
1. Communication-driven
Example:
online collaboration and net meeting systems using Google Meet or Zoom
2. Data-driven
in particular a GIS
3. Document-driven
making.
4. Knowledge-driven
organization setting it up but may also include others interacting with the
domain.
Example:
information (tips) that can help one to improve his or her tax outcome and
financial wellness.
5. Model-driven
driven DSS use data and parameters provided by decision makers to aid
provides managers with models and analysis capabilities that can be used
are the quantities of TVs, stereos and speakers to build. The objective
function is to maximize total profits. The constraints are from the parts
inventory. Managers should be able to determine the best way to use the
data assets (anything valuable) developed with a vision of how those assets and
your information systems will inevitably interact with one another. This includes
transmitted.
warehouses are quicker and cheaper to set up. There is no need to spend more
running necessary systems such as power and cooling. And lastly, cloud-based
much faster because they use massively parallel processing (MPP), a term that
nothing". There are numerous physical nodes, each runs its instance or
each has its own task. Thus, making it a lot faster in terms of performance
Amazon Redshift uses MPP
architecture, breaking up large data sets into chunks which are assigned to slices
within each node. Queries perform faster because the compute nodes process
queries in each slice simultaneously. The Leader Node aggregates the results
Redshift using open source PostgreSQL JDBC and ODBC drivers. Analysts can
up in the form of clusters, which contain a collection of one or more nodes. Each
node has its own CPU, storage, and RAM. A leader node compiles queries and
a columnar storage, meaning each block of data contains values from a single
column across a number of rows, instead of a single row with values from
multiple columns.
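The division of work between slices and the leader node can be sketched as follows. This is a conceptual illustration only, not Redshift code: the round-robin distribution, the function names, and the sample values are assumptions, and Python threads are used here just to show the structure rather than to gain real parallel speed.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, slice_count):
    """Assign each row to a slice round-robin (one possible distribution style)."""
    slices = [[] for _ in range(slice_count)]
    for i, row in enumerate(rows):
        slices[i % slice_count].append(row)
    return slices


def slice_sum(rows):
    """Work done independently on one slice: a partial aggregate."""
    return sum(rows)


def leader_query(slices):
    """Leader step: run the partial query on every slice, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(slice_sum, slices))
    return sum(partials)


amounts = list(range(1, 101))  # hypothetical fact values 1..100
slices = distribute(amounts, slice_count=4)
total = leader_query(slices)
```

Each slice holds a quarter of the rows and computes its own partial sum; the leader only adds four numbers. The same pattern applies to counts, minimums, and maximums, which is why aggregates parallelize well on shared-nothing systems.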
2. Multi-Structured Data
Interprets Big Data or data set whose size or type is beyond the
the data with low latency and Analytics Infrastructure (concept that
process of extracting the value of a given data) for multiple storage data
Example:
Lazada) will use a NoSQL Store for storing the session state (a record that
tracks the user while browsing the app) of the users shopping on the website
while the payment system which captures the credit card information
implement different services to use different data stores and avoid building
lead to the entire business going down. The need for polyglot data stores
is not just for high availability but also for scalability demands of an
internet-scale application.
3. Lambda Architecture
Lambda architecture
proposes a simpler,
designed to tame
reconciling the requirements of two different user groups. On the one hand, there
are users who have always had to process and evaluate data of high quality.
These are usually enriched with additional, calculated key figures. The “classic”
users need the data for specific key dates in departments such as reporting,
accounting, risk or controlling. On the other hand, there are users with a short-
term need for information who have to react quickly to events. This can be the
defective ATM for the maintenance technician, but also the next boycott call for a
fed to the batch layer and the speed layer simultaneously. It looks at all
the data at once and eventually corrects the data in the stream layer.
Here we can find lots of ETL and a traditional data warehouse. This layer
The outputs from the batch layer in the form of batch views and
those coming from the speed layer in the form of near real-time views
(users see data that is only a few seconds old) get forwarded to the
serving layer. This layer indexes the batch views so that they can be queried in
This layer handles the data that are not already delivered in the
batch view due to the latency of the batch layer. In addition, it only deals
with recent data in order to provide a complete view of the data to the user
The query application reads data from the text file where the batch
layer stored its results. It combines and then sorts the data.
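How the layers fit together at query time can be sketched in Python. This is a simplified illustration: the views are in-memory dictionaries rather than files, and the page-view events and function names are hypothetical.

```python
# Hypothetical page-view events; names are illustrative only.
master_data = [("home", 1), ("cart", 1), ("home", 1)]  # already seen by the batch layer
recent_events = [("home", 1), ("checkout", 1)]         # arrived after the last batch run


def build_view(events):
    """Aggregate events into a view: page -> count."""
    view = {}
    for page, n in events:
        view[page] = view.get(page, 0) + n
    return view


def query(batch, realtime):
    """Query step: combine the batch and real-time views, then sort the result."""
    merged = dict(batch)
    for page, n in realtime.items():
        merged[page] = merged.get(page, 0) + n
    return sorted(merged.items())


batch_view = build_view(master_data)       # batch layer: complete but stale
realtime_view = build_view(recent_events)  # speed layer: recent data only
result = query(batch_view, realtime_view)
```

When the next batch run absorbs the recent events, the real-time view for that period can be discarded, which is how the batch layer eventually corrects the data in the speed layer.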
Example:
namely Hive, Spark and Kafka. The pre-system is an SAP Bank Analyzer 9 on a
HANA database.
The program (1) for loading the market data receives JSON files from the
ECB Statistical Data Warehouse via a REST call. These files are then parsed
(analyzed) to extract and re-bundle the relevant data. Using the Kafka-Java-API a
into a Kafka-Topic
(2) Since only the latest version of the market data is needed, such a
topic is an easy-to-use key-value store. Of course, this step can also be done
directly in Spark and you can also skip the caching of the data in Kafka Topics.
However, the focus was to test as many interfaces as possible with a simple use
The main program for loading cash flows (3) was developed using the
Spark-Java-API. Two versions of the program were created for this purpose, one
for stream processing and a second for batch processing. Thanks to the
possibility to use Spark-Streaming for batch processing via the trigger setting
Most of the code can be used for both cases. The processing mode is simply
selected as needed via a configuration file. Such a single processing brings all
are retained, such as the reduction of costs through targeted cluster startup and
shutdown.
Using the Spark-API, the HANA database (4) is accessed and the latest
record is retrieved. The recognition runs over a column with a continuous integer
of the datatype Long, which is generated from the timestamp of the data set. This
detour had to be taken because the SAP timestamp is not compatible with the
Spark timestamp in this case. The loading is then done in so-called microbatch
requests, which are sent to the HANA DB at certain time intervals and retrieve all
data since the last microbatch by querying the number just described. This
conventional Spark batch retrieval, all data from the last processed time stamp
would be retrieved, but would then have to be managed and stored by the user.
The Spark Streaming API does this automatically using the checkpoint files, as
explained above.
The latest market data is directly loaded from the aforementioned Kafka
Topic (5) via the Spark-Kafka implementation and is provided to the FTP library
for discounting cash flows. The library interpolates the grid points of the yield
curve to the due dates of the cash flow and discounts the cash flow accordingly.
The resulting dataframe is then checked with the help of a delta library for
changes of records already available in Hive and applies a filter if necessary. For
this purpose, the contents of the relevant fields are hashed and compared with
the values in the target table. If there is a match, the corresponding row is filtered
The result including the hash values is written to a partitioned Hive table (6) by
Spark. The partitioning by month and year helps to keep the performance of
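The hash-based change check described above can be sketched in plain Python. This is not the delta library from the example, whose interface is not shown in the text; the function names, the field names, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def row_hash(row, fields):
    """Hash the contents of the relevant fields into one comparable value."""
    joined = "|".join(str(row[f]) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def filter_unchanged(incoming, target_hashes, fields):
    """Keep only rows whose hash is not already present in the target table."""
    kept = []
    for row in incoming:
        h = row_hash(row, fields)
        if h not in target_hashes:
            kept.append({**row, "hash": h})
    return kept


fields = ["deal_id", "due_date", "amount"]  # hypothetical column names
existing = {"deal_id": "D1", "due_date": "2021-06-30", "amount": 100.0}
target_hashes = {row_hash(existing, fields)}

incoming = [
    existing,                                                      # unchanged: filtered out
    {"deal_id": "D1", "due_date": "2021-06-30", "amount": 120.0},  # changed: kept
]
to_write = filter_unchanged(incoming, target_hashes, fields)
```

Comparing one hash per row avoids a field-by-field comparison against the target table, and storing the hash with the row makes the next run's comparison cheap.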
4. Hybrid Architecture
cloud for your cloud adoption. Therefore, selecting the right cloud source
Service (IaaS) platform. IaaS is one of the three main categories of cloud
the internet.
often data warehouses, data
Data staging
transient (temporary)
prior to running an
ETL process or
immediately following
volumes of raw data from multiple sources, converts it for analysis, and loads
Extraction
In the first step, data sets are extracted from a source, for example
SQL Server, into a staging area. The staging area acts as a buffer between
the data warehouse and the source data. Since data may be coming from
the data to the warehouse may result in corrupted data. The staging area is used
Transformation
The data cleaning and organization stage is the transformation stage. All of
that data from multiple source systems will be normalized and converted to a
single system format — improving data quality and compliance. ETL yields
Loading
Finally, data that has been extracted to a staging area and transformed is
loaded into your data warehouse. Depending upon your business needs, data
can be loaded in batches or all at once. The exact nature of the loading will
depend upon the data source, ETL tools, and various other factors.
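The three ETL steps can be sketched end to end with Python's built-in sqlite3 module standing in for the warehouse. This is a minimal illustration under stated assumptions: the source rows, the table name sales, and the cleaning rules are invented for the example, and the staging area is just a list.

```python
import sqlite3


def extract(source_rows):
    """Extract: copy raw rows into a staging area (here, just a list)."""
    return list(source_rows)


def transform(staged):
    """Transform: bring names and amounts into one format and drop unparseable rows."""
    clean = []
    for name, amount in staged:
        try:
            clean.append((name.strip().title(), round(float(amount), 2)))
        except (TypeError, ValueError):
            continue  # skip rows whose amount cannot be parsed
    return clean


def load(conn, rows, batch_size=2):
    """Load: insert the transformed rows into the warehouse table in batches."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    for start in range(0, len(rows), batch_size):
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows[start:start + batch_size])
        conn.commit()


# Hypothetical raw rows with inconsistent formatting and one bad amount.
source = [(" moriarty ", "52.35"), ("RICHARDS", "325.82"), ("diamond", "n/a")]
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```

Because the bad row is rejected in the transform step, it never reaches the warehouse table, which is the role the staging area plays as a buffer between source and warehouse.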
Multidimensional Model
For example, a shop may create a sales data warehouse to keep records
of its sales for the dimensions time, item, and location. These dimensions allow
the store to keep track of things such as monthly sales of items and the locations
at which the items were sold. Each dimension has a table related to it, called a
dimensional table.
Consider the data of a shop for items sold per quarter in the city of Delhi.
The data is shown in the table. In this 2D representation, the sales for Delhi are
shown for the time dimension (organized in quarters) and the item dimension
(classified according to the types of an item sold). The fact or measure displayed
Now suppose we want to view the sales data with a third dimension. For
example, suppose the data according to time and item, as well as the location, is
considered for the cities Chennai, Kolkata, Mumbai, and Delhi. These 3D data
are shown in the table. The 3D data of the table are represented as a series of
2D tables.
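Representing 3D data as a series of 2D tables can be sketched with a dictionary keyed by the three dimensions. The sales figures and item names below are hypothetical and are not the values from the tables in the text.

```python
# Hypothetical sales figures keyed by (location, quarter, item).
cube = {
    ("Delhi", "Q1", "phone"): 10, ("Delhi", "Q2", "phone"): 12,
    ("Delhi", "Q1", "laptop"): 4,
    ("Mumbai", "Q1", "phone"): 7, ("Mumbai", "Q2", "laptop"): 5,
}


def table_for(cube, location):
    """Return the 2D table (quarter x item) for one location."""
    table = {}
    for (loc, quarter, item), units in cube.items():
        if loc == location:
            table.setdefault(quarter, {})[item] = units
    return table


def as_2d_series(cube):
    """Represent the 3D data as a series of 2D tables, one per location."""
    locations = sorted({loc for loc, _, _ in cube})
    return {loc: table_for(cube, loc) for loc in locations}


series = as_2d_series(cube)
```

Fixing the location dimension to a single city yields the same kind of time-by-item table as the 2D Delhi example.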
create cubes to support fast response times, and to provide a single data source
third-party solutions.
META-DATA
Think about the last time you searched Google. That search started with the
metadata you had in your mind about something you wanted to find. You may
have begun with a word, phrase, meme, place name, slang or something else.
The possibilities for describing things seem endless. Certainly metadata schema
can be simple or complex, but they all have some things in common.
EXAMPLE
information like the author, file size, the date the document was created, and
keywords to describe the document. Metadata for a music file might include the
In their earliest stages, many companies have databases. The data is
storage. Unless extrapolated and manually analyzed, this data sits where it is
and does not impact ongoing business functions. Transactions such as loading
This is the initial stage where data is simply copied to a server from an operational
system.
While not entirely up-to-date, offline Data Warehouses regularly update their
data structures, the organized data meets the particular objectives of the Data
Warehouse.
Offline Data Warehouse: In this stage, all the data warehouses are
Integrated Data Warehouse. Integrated Data Warehouses are the ideal Data
Warehouse stage with the data not just readily available but also updated and
accurate.
transactions which are used daily by the organization are passed back into
Storage is a fairly simple choice. You can host your data warehouse on-
is, according to some, on its way out. Cloud hosting is much cheaper and more
flexible because you’re renting space on another’s server. You don’t need to run
maintenance, you can expand and cut back as needed, and there is an ever-
expanding set of features added each year. Bridging the gap between these two
To get data into your data warehouse, you need to use a type
process where the data is extracted, made ready for use, then loaded into the
data warehouse.
keeping a data warehouse running because it’s not just a system; it’s a “full-
What is OLAP?
intelligence (BI) space over 20 years ago, at a time when computer hardware
and software technology weren’t nearly as powerful as they are today. OLAP
Aggregating, grouping, and joining data are the most difficult types of
queries for a relational database to process. The magic behind OLAP derives
from its ability to pre-calculate and pre-aggregate data. Otherwise, end users
would be spending most of their time waiting for query results to be returned by
the database.
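The idea of pre-calculating aggregates can be sketched as follows. This is a toy illustration, not how any OLAP product stores its cubes: the fact rows, dimension names, and function names are assumptions, and every combination of dimensions is materialized, which real tools avoid for large cubes.

```python
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales).
facts = [
    ("North", "shoes", 2020, 100), ("North", "hats", 2020, 40),
    ("South", "shoes", 2020, 70), ("South", "shoes", 2021, 90),
]
DIMS = ("region", "product", "year")


def precompute(facts):
    """Build every group-by total once, up front (the pre-aggregation step)."""
    cube = {}
    for size in range(len(DIMS) + 1):
        for dims in combinations(range(len(DIMS)), size):
            for row in facts:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] = cube.get(key, 0) + row[3]
    return cube


def lookup(cube, **filters):
    """Answer an aggregate query with a dictionary lookup instead of a scan."""
    dims = tuple(i for i, name in enumerate(DIMS) if name in filters)
    return cube.get((dims, tuple(filters[DIMS[i]] for i in dims)), 0)


cube = precompute(facts)
```

For example, lookup(cube, region="South") returns the stored total for the South region without touching the fact rows again. The cost is paid once at build time and in extra storage, which is the trade-off behind pre-aggregation.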
Vendors offer a variety of OLAP products that can be grouped into three
What is ROLAP?
data in columns and rows (also known as relational tables) and retrieves the
handle large data volumes, but the larger the data, the slower the processing
times.
What is MOLAP?
Its speedy data retrieval makes it the best for “slicing and dicing” operations.
What is HOLAP?
suggests, the HOLAP storage mode connects attributes of both MOLAP and
ROLAP. Since HOLAP involves storing part of your data in a ROLAP store and
With this use of the two OLAPs, the data is stored in both
one of the databases depends on which is most appropriate for the requested
processing application or type. This setup allows much more flexibility for
The problems associated with developing and managing a data warehouse are
as follows:
the data into the warehouse. It may take a significant proportion of the total
development time, although some tools exist to reduce the
In some cases, data that is important for the data warehouse is not captured
by the source systems. For example, the
date of registration for the property may not be used in the source system, but it may
High maintenance
the business processes and the source systems may affect the data warehouse
Data ownership
data. Sensitive data that is owned by one department has to be loaded into the data
warehouse for decision-making purposes, but this sometimes results in reluctance
CHAPTER 10:
Gabotero, Stephanie S.
Tiolo, Michelle Anne M.
jointly owned by IT and the business. Successful data governance will require
support from upper management in the firm. A key role in enabling success of
Data steward
applications properly support the organization’s enterprise goals for data quality.
corporate resource
organization, and
employees and the public from accounting errors and fraudulent financial
representatives from each major business unit who have the authority to make
business policy decisions can contribute to the establishment of high data quality
stewards.
3. Data stewards for different business units, data subjects, source systems,
goals, coordinate activities, and provide guidelines and standards for all
1. Transparency
The data that serves as the foundation of these systems must be good
High-quality data—that is, data that are accurate, consistent, and available
today.
in their data.
o Not only is quality data essential for SOX and Basel II (Europe)
know all aspects of customer activity with your organization will help
Redman (2004) summarizes data quality as “fit for their intended uses in
operations, decision making, and planning.” In other words, this means that data
1. Uniqueness
within the database, and there is a key that can be used to uniquely
2. Accuracy
3. Consistency
(database) are in agreement with the values for related data in another
4. Completeness
to have values.
5. Timeliness
when data are expected and when they are readily available for use
6. Currency
useful.
7. Conformance
8. Referential integrity
requirements to exist
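Several of the dimensions listed above can be checked mechanically. The following is a minimal sketch, assuming hypothetical `customers` and `orders` records held as Python dictionaries; the function names and sample values are illustrative and do not come from the text. It covers only uniqueness, completeness, and referential integrity.

```python
# Hypothetical sample data; the tables and columns are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ana", "email": "ana@example.com"},
    {"customer_id": 2, "name": "Ben", "email": None},
    {"customer_id": 2, "name": "Ben", "email": "ben@example.com"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]


def duplicate_keys(rows, key):
    """Uniqueness: return the key values that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def missing_values(rows, column):
    """Completeness: count the rows that have no value for the column."""
    return sum(1 for row in rows if row.get(column) in (None, ""))


def orphan_rows(child_rows, parent_rows, key):
    """Referential integrity: child rows whose key matches no parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


duplicate_ids = duplicate_keys(customers, "customer_id")
missing_emails = missing_values(customers, "email")
orphans = orphan_rows(orders, customers, "customer_id")
```

With this sample data, customer 2 is reported as a duplicate key, one email is missing, and order 11 refers to a customer that does not exist. Dimensions such as accuracy and timeliness need an external reference (the real-world value, or the expected arrival time) and cannot be checked from the data alone.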
Much of an organization’s data originates outside the organization,
data sources to comply with expectations of the receiving organization.
other improvements in data entry control— are tied for the number-one
For a variety of reasons, many organizations simply have not made the
than as an IT project.
with an audit of data to understand the extent and nature of data quality
problems.
accountable for the quality of the data for which they are responsible.
As noted earlier, lax data entry is a major source of poor data quality, so
improvement program.
For simplicity, we summarize what Inmon recommends only for the original data
capture step:
ii. Where data must be entered manually, ensure that it is selected from
preset options.
iv. Follow good user interface design principles that create consistent
screen layouts, easy to follow navigation paths, clear data entry masks
make sure that only high-quality data enter the database; when
the data.
Powerful software is now available that can assist users with the technical
Ensuring the quality of data that enters databases and data warehouses is
organization, one would likely find that certain categories of data are referenced
more frequently than others across the enterprise in operational and analytical
system.
2006).
3 popular architectures
particular data
3. Persistent Approach
Data Integration
unified view.
master server for data. The master server then intakes the needed data
from internal and external sources. The data is extracted from the
sources, then consolidated into a single, cohesive data set. This is served
Application Integration
Each individual application has a particular way it emits and accepts data,
You only need to enter your data into a system once, then your
interoperability.
(e.g., selling and billing) so that applications can be shared and more
systems.
approach:
1. Data Consolidation
data warehouse.
consolidation is to
2. Data Federation
as one.
without actually
physical,
centralized
database.
data:
3. Data Propagation
application to
from one location (source) to another location (destination).
Detailed
Historical Transient
application.
Timely
Often of poor quality
Quality
controlled
It helps you extract accurate and reliable information about the state
2. During subsequent updates to keep the EDW current and/or to expand it.
2. Extract
Static Extract
extracted.
Incremental Extract
the data which have changed from the time when the
3. Cleanse
quality.
data.
is to load the selected data into the target data warehouse and
The two basic modes for loading data to the target EDW:
Refresh mode
Update mode
warehouse.
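The difference between the two loading modes can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: refresh mode is taken to mean rewriting the whole target table, and update mode is taken to mean writing only the source rows that are new or changed. The `sales_fact` table and both function names are hypothetical.

```python
import sqlite3


def refresh_load(conn, rows):
    """Refresh mode (assumed meaning): empty the target and bulk-rewrite it."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales_fact")
        conn.executemany(
            "INSERT INTO sales_fact (sale_id, amount) VALUES (?, ?)", rows
        )


def update_load(conn, changed_rows):
    """Update mode (assumed meaning): write only new or changed source rows."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO sales_fact (sale_id, amount) VALUES (?, ?)",
            changed_rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)"
)

# Initial fill of the target, then a later load of only the changes.
refresh_load(conn, [(1, 100.0), (2, 250.0)])
update_load(conn, [(2, 300.0), (3, 75.0)])

loaded = conn.execute(
    "SELECT sale_id, amount FROM sales_fact ORDER BY sale_id"
).fetchall()
```

After both calls the table holds sales 1, 2, and 3, with sale 2 carrying the changed amount. This sketch overwrites a changed row in place; a warehouse that must keep history would typically append a new version of the row instead.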
Data transformation
the goal of data transformation is to convert the data format from the
1. Record-level functions
Selection
Joining
table or view.
Normalization
Aggregation
summary level.
2. Field-level functions
translate data
new form.
transforming a
data.
Multi-field transformation
- may involve more than one source record and/or more than
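The record-level functions named in this chapter (selection, joining, and aggregation) map directly onto SQL. The sketch below uses Python's built-in `sqlite3` module with hypothetical `customer` and `sale` tables; all table names, column names, and values are assumptions made for illustration. Normalization is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'North'), (2, 'South');
    INSERT INTO sale VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0);
    """
)

# Selection: keep only the source rows that satisfy a condition.
selected = conn.execute(
    "SELECT sale_id FROM sale WHERE amount >= 70 ORDER BY sale_id"
).fetchall()

# Joining: combine rows from two sources into a single view.
joined = conn.execute(
    """
    SELECT s.sale_id, c.region
    FROM sale AS s
    JOIN customer AS c ON c.customer_id = s.customer_id
    ORDER BY s.sale_id
    """
).fetchall()

# Aggregation: roll detailed rows up to a summary level.
aggregated = conn.execute(
    """
    SELECT c.region, SUM(s.amount)
    FROM sale AS s
    JOIN customer AS c ON c.customer_id = s.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
```

Here `selected` keeps sales 10 and 12, `joined` attaches a region to every sale, and `aggregated` reduces three detailed rows to one total per region.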
CHAPTER 11:
Garcia, Janah G.
Notario, Don
Trias, Angela B.
A data administrator is the one who is responsible for processing data which is
role with some technical role, which is also called a Data Analyst; that is why this is
more likely a high-level function which is responsible for the overall management
definition and standard. The head of data administration is a senior-level person
who is required to have a high level of both managerial and technical skill. Data
database technology.
Responsibilities:
person
technology, controls the organization design and use the database. It provides
person, but a person who understands the business well enough to administer the
database effectively.
Responsibilities:
hardware is suitable to use in the company. To know how much it will cost,
2. Managing Data Integrity- the DBA needs to protect the data from
unauthorized use.
3. Decides Data Recovery and Backup method- the DBA needs to back up
the entire database in case of a breach. The DBA also needs to recover the data
in case of loss.
authorized users.
5. Capacity Issues- the DBA needs to know the maximum amount of data that can be stored.
10. Decides content of the database- the DBA decides what will be the structure of
11. Provides help and support to users- the DBA is also responsible for helping the
13. Improves query processing performance- queries made by the users need
The open source movement is a term that refers to open source software.
Open source software is code that people can modify and share because it’s
software that computer users can’t see. Programmers who can access a
computer program’s source code can improve the program by either adding
features or fixing parts. An example of open source software is LibreOffice and
compatible with other office productivity software like Microsoft Office. It runs on Microsoft
It is often cheaper, more flexible, and longer-lived because it is developed
The most common reason why people choose the open source
1. Peer review- the open source code is free and accessible to all, that’s why
2. Transparency- open source helps to track and check whether there are any changes
3. Reliability- open source code is constantly updated through an active open
source community.
6. No vendor lock-in- because it is free to use and you can take your
7. Open collaboration- active open source communities can help you to get
manipulate the data itself, data format, field name, record and file structure,
Components of a DBMS
to store data
dictionary, the DBMS uses this to verify the user who requests the data.
3. Database access language- the DBMS must provide an API to access the
data.
signing into your Facebook account using your phone, the mobile
application
4. Lock manager- locks are required to make sure that multiple users can’t
5. Log Manager- records all data changes to make sure that the records are
accurate and efficient. The DBMS uses the log manager during shutdown
6. Data Utilities- include reorganization, run stats, backup and copy, recover,
Central storage and management of data within the DBMS provides the
following:
data security;
same data;
reservations.
access to those files even if they can only be opened by an authorized user; it can
virus can spread from one computer and computer system to another, this
The 2017 WannaCry Ransomware Attack Was One Of The Most Widespread
hostage. It does this either by locking you out of your computer so you can’t use it
(called lock ransomware) or by encrypting your valuable files so you aren’t able to
Here are some basics of data security that are often included in any data security
management plan:
1. Backups- ensure that you have another copy of all the files so they can be easily
recovered in case of events like a breach, computer viruses
3. Data Erasure- a method in which all the data in the computer is wiped clean
4. Encryption- the process by which the data is scrambled and encoded; only
an entity that holds the encryption key can decode the data
6. One-time password- a password that only works for one network session
or transaction
files, an intruder who accesses the data using a different server can’t read or
10. Cloud access security broker- Software that works between users of a
cloud service and the cloud applications. The software monitors activity
11. Big data security- securing extremely large amounts of data by adding
another level of security through security tools. Hadoop can be used
12. Payment security, mobile app security, web browser security, email
access
Database software is used to create, edit, and maintain database files and
records, enabling easier file and record creation, data entry, data editing,
updating, and reporting. The software also handles data storage, backup and
(DBMS). It’s primarily used for storing, modifying, extracting and searching for
security threats.
data in a structured form and then access it. It typically has a graphical interface
to help create and manage the data and, in some cases, users can construct
interface between the database and its end users or programs, allowing users to
retrieve, update, and manage how the information is organized and optimized. A
and recovery.
through the software. Administrators input SQL queries to prompt the system to
perform an action, such as retrieving a specific set of data. However, there are
also databases that use other means for retrieving information in addition to SQL.
The most widely-used databases consist of a basic set of columns and rows that
display information retrieved using SQL. However, more complex software has
such as XML.
support. While open-source software may lack this support, it makes up for it
Database software exists to protect the information in the database and ensure
that it’s both accurate and consistent. Its functions include storage, backup and
recovery, and presentation and reporting. It can also help your team with multi-
Database Challenges
Data breaches are happening everywhere these days, and hackers are
getting more inventive. It’s more important than ever to ensure that data is
of new opportunities.
and patches. As databases become more complex and data volumes grow,
companies are faced with the expense of hiring additional talent to monitor
A business needs to grow if it’s going to survive, and its data management
must grow along with it. But it’s very difficult for database administrators to
predict how much capacity the company will need, particularly with on-
premises databases.
Some organizations have use cases that are better suited to run on-premises.
optimized for running the database are ideal. Customers achieve higher
to input queries to direct you to the exact data you’re searching for.
location in the event of an outage or data breach. It can then use these
previous state.
what information users access, the frequency at which they access it,
efficiency.
USER ROLES
Part of what allows database software to improve efficiency and maintain security
is the ability to assign roles to users that authorize or restrict access to certain
portions of a network. This ensures that users only have access to the assets
they need to do their job. The primary roles include the following:
database. They are able to view and manage the most sensitive
more.
End users: These users typically have the most restricted access. At most,
they can retrieve, update, share and delete information only in the
applications that are essential to their jobs. In some cases, they are
confined to read-only access. This only allows users to view
USER INTERACTION
entry forms are created to input this information for each file. This
offers an ‘Edit’ mode to make these changes. However, each file will
permissions.
View and query data: Besides storing data, one of the primary uses of
database activity. It also has features that allow users to pull this
decisions.
There are multiple different types of database software that are typically broken
that can pull and store data from a variety of databases. Data sets from
to improve data integrity.
the internet.
columns and allows for easy querying using SQL. RDBMS are mostly
RDBMS. The name of this technology stands for “not only SQL.”
improved performance.
database.
customization.
It is also more scalable than on-premise software, as it’s not limited by hardware.
Because they have so many uses, there are dozens of database software
Microsoft SQL Server: Microsoft’s SQL server is one of the oldest players in the
game, first released in 1989. It’s mainly used for Windows-based systems but
Oracle RDBMS: This tool is one of the most popular database software options
for enterprise organizations as it can support large databases but maintains good
IBM DB2: IBM DB2 was also an early contender in the database software space,
introduced in 1983. It’s praised for its simple deployment, installation and
sharding.
hosting providers to bundle MySQL with their offerings making it a popular tool
for web developers. It can handle robust sets of data but its relatively simple
another DBaaS offering that is easy to use. It allows users to structure, connect
and extend data without the need for any coding. It’s already gained a notable
organizations can now leverage the data they collect to run more efficiently,
enable better decision-making, and become more agile and scalable. Optimizing
more data volume to track. It’s critical to have a platform that can deliver the
performance, scale, and agility that businesses need as they grow over time.
become more proactive with their data. By having direct control over the ability to
create and use databases, users gain control and autonomy while still
tuning, security, backups, updates, and other routine management tasks. With
secure their data, enabling performance advantages, lower costs, and improved
security.
Database security refers to the range of tools, controls, and measures designed
odds with database usability. The more accessible and usable the database, the
The physical database server and/or the virtual database server and the
underlying hardware
Encryption
unreadable to anyone without the decryption key. The general idea is to make
accessing the unauthorized data. There are two situations where data encryption
can be deployed: data in transit and data at rest. In a database context, data “at
rest” encryption protects data stored in the database, whereas data “in transit”
Encrypting data at rest is undertaken to prohibit “behind the scenes” snooping for
gains access to the data behind the scenes, without the decryption key the data
Encrypting data in transit protects against network packet sniffing. If the data is
encrypted before it is sent over the network and decrypted upon receipt at its
access the data when en route will receive only encrypted data. And again,
without the decryption key, the data cannot be deciphered. Data in transit
granular security scheme. LBAC can be set up to specify who can read and
LBAC is not for every application; it is geared more for top-secret, governmental,
Any attempted access to a protected column when the LBAC credentials do not
permit that access will fail. If users try to read protected rows not allowed by their
LBAC credentials, the DBMS simply acts as if those rows do not exist. This is
important because sometimes even the knowledge that the data exists (without
Data Masking
inappropriate visibility by replacing it with gibberish or realistic but not real data.
Protecting sensitive data using data masking can prevent fraud, identity theft,
A good data masking solution should offer the ability to mask using multiple
products.
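As a rough illustration of the idea behind data masking, the sketch below replaces most of a sensitive value with placeholder characters while keeping its shape. The two functions, their names, and the masking rules are assumptions made for this example; they are not taken from any particular masking product.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace every digit except the last `visible` ones with '*'.

    Separators such as '-' or spaces are kept so the format stays realistic.
    """
    total_digits = sum(1 for ch in card_number if ch.isdigit())
    keep_from = total_digits - visible
    masked, digits_seen = [], 0
    for ch in card_number:
        if ch.isdigit():
            masked.append(ch if digits_seen >= keep_from else "*")
            digits_seen += 1
        else:
            masked.append(ch)
    return "".join(masked)


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the whole domain."""
    local, separator, domain = email.partition("@")
    if not separator:
        # Not a recognizable address: hide everything.
        return "*" * len(email)
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


masked_card = mask_card_number("4111-1111-1111-1234")
masked_email = mask_email("ana@example.com")
```

Simple character replacement like this hides the value but still shows its length and format. Substituting realistic but fictitious values, as the text mentions, requires a lookup or generator that is not shown here.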
Staying Up-to-Date
your DBMS. Understand what is available to you and what you may need to
Database Backup
A database backup is a stored copy of the data. It is
a safeguard against unexpected data loss and application errors. It protects the
database against data loss. If the original data is lost, it can be
reconstructed from the backup.
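A minimal sketch of this backup-then-reconstruct cycle, using the `backup()` method of Python's built-in `sqlite3` connections. In-memory databases stand in for real files, and the `account` table and its values are hypothetical.

```python
import sqlite3

# The "original" database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
source.execute("INSERT INTO account VALUES (1, 500.0)")
source.commit()

# Take a backup: copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate data loss in the original database.
source.execute("DELETE FROM account")
source.commit()
rows_after_loss = source.execute("SELECT COUNT(*) FROM account").fetchone()[0]

# Reconstruct the data by copying the backup into a fresh database.
recovered = sqlite3.connect(":memory:")
backup.backup(recovered)
restored = recovered.execute("SELECT id, balance FROM account").fetchall()
```

The recovered database contains the row that was deleted from the original. Any change made after the backup was taken is not in the copy, which is why backups are usually combined with the transaction log methods described later in this chapter.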
Backups are divided into two types: physical backups and logical backups.
1. Physical backups
Physical Backups are the backups of the physical files used in storing and
some other location, such as disk or offline storage like magnetic tape.
Physical backups are the foundation of the recovery mechanism in the database.
They provide minute details about the transactions and modifications to the
database.
2. Logical backup
includes backup of logical data like views, procedures, functions, tables, etc. It is
sufficient protection against data loss without physical backups, because logical
Importance Of Backups
system, software and any other kind of failures that cause a serious data crash. It
Methods of Backup
Full Backup - This method takes a lot of time as the full copy of the database is
Transaction Log - Only the transaction logs are saved as the backup in this
method. To keep the backup file as small as possible, the previous transaction
Differential Backup - This is similar to a full backup in that it stores both the data
and the transaction records. However only that information is saved in the
backup that has changed since the last full backup. Because of this, differential
1. System Crash
external factors like a power failure. The data in secondary memory is not
affected when the system crashes, because the database has many integrity safeguards.
2. Transaction Failure
because of logical errors in the code. This failure occurs when there are system
transaction.
3. Network Failure
4. Disk Failure
Disk Failure occurs when there are issues with hard disks like formation of
Media failure is the most dangerous failure because it takes more time to
recover from than any other kind of failure. A disk controller or disk head crash is a
6. User Error
in a database. To rectify the error, the database needs to be restored to the point
Redundancy
technology in which the same piece of data is held in two separate places. This
can mean two different fields within a single database, or two different spots in
basically constitutes data redundancy. This can occur by accident, but is also
Hardware redundancy
been exhausted, hardware redundancy may be the only way to improve the
dependability of a system.
What Is Recovery?
transaction tables.
There are two methods that are primarily used for database recovery. These are:
failure, the database can recover the data. All log information, such as the
time of the transaction, its data etc. should be stored before the
transaction is executed.
database.
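The log-based recovery idea described above, in which information about a change is stored before the change is applied, can be sketched with a toy in-memory store. The class `LoggedStore`, its record layout, and the undo-only recovery rule are assumptions made for illustration; a real DBMS keeps its log on stable storage and also supports redo.

```python
class LoggedStore:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for a log file on stable storage

    def write(self, txn_id, key, new_value):
        old_value = self.data.get(key)
        # Log first, then change the data.
        self.log.append(("write", txn_id, key, old_value, new_value))
        self.data[key] = new_value

    def commit(self, txn_id):
        self.log.append(("commit", txn_id))

    def recover(self):
        """Undo, newest first, the writes of transactions that never committed."""
        committed = {record[1] for record in self.log if record[0] == "commit"}
        for record in reversed(self.log):
            if record[0] == "write" and record[1] not in committed:
                _, _, key, old_value, _ = record
                if old_value is None:
                    # Simplification: None means the key did not exist before.
                    self.data.pop(key, None)
                else:
                    self.data[key] = old_value


store = LoggedStore()
store.write("T1", "A", 100)
store.commit("T1")
store.write("T2", "A", 50)  # T2 never commits: imagine a crash here
store.write("T2", "B", 7)
store.recover()
```

After `recover()`, the committed value written by T1 survives and both of T2's uncommitted writes are undone, so the store holds only `A = 100`.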
Concurrency Control
Concurrent access is quite easy if all users are just reading data. There is no way
they can interfere with one another. However, any practical database has
a mix of READ and WRITE operations, and hence concurrency is a
challenge.
Lost Updates - occur when multiple transactions select the same row and update
same row several times and reads different data each time.
Incorrect Summary issue - occurs when one transaction takes a summary over the
values of all the instances of a repeated data item, and a second transaction updates
a few instances of that specific data item. In that situation, the resulting summary
obstructions
amount of concurrency they allow and the amount of overhead that they impose.
Read or Write the data until it acquires an appropriate lock. Lock based protocols
Binary Locks: A binary lock on a data item can be in either a locked or an unlocked state.
A shared lock is also called a Read-only lock. With the shared lock, the data item
can be shared between transactions. This is because you will never have
With the Exclusive Lock, a data item can be read as well as written. This is
exclusive and can’t be held concurrently on the same data item. X-lock is
requested using lock-x instruction. Transactions may unlock the data item after
object before beginning operation. Transactions may unlock the data item after
4. Pre-claiming Locking
required data items which are needed to initiate an execution process. In the
situation when all locks are granted, the transaction executes. After that, all locks
Starvation
Deadlock
Deadlock refers to a specific situation where two or more processes are waiting
for each other to release a resource or more than two processes are waiting for
transaction data which blocks other transactions from accessing the same data
simultaneously.
DBMS.
In the first phase, when the transaction begins to execute, it requires permission for
When a transaction releases its first lock, the third phase starts.
In the third phase, the transaction cannot demand any new locks. Instead, it only
Growing Phase: In this phase, a transaction may obtain locks but may not release
any locks.
Shrinking Phase: In this phase, a transaction may release locks but not obtain
The Strict two-phase locking system is very similar to 2PL. The only difference is
that Strict-2PL does not release a lock right after using it. It holds all the locks until the
commit point and releases them all at once when the transaction is over.
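The Strict-2PL rule, together with the shared and exclusive locks described earlier, can be sketched as a small lock table. This is an illustrative, single-threaded model only: the class `LockManager` and its methods are hypothetical, and a request that cannot be granted returns `False` instead of making the transaction wait.

```python
class LockManager:
    """Toy lock table with shared (read) and exclusive (write) locks."""

    def __init__(self):
        self.shared = {}     # data item -> set of transactions holding S-locks
        self.exclusive = {}  # data item -> transaction holding the X-lock

    def acquire_shared(self, txn, item):
        holder = self.exclusive.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction is writing this item
        self.shared.setdefault(item, set()).add(txn)
        return True

    def acquire_exclusive(self, txn, item):
        holder = self.exclusive.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction already holds the X-lock
        other_readers = self.shared.get(item, set()) - {txn}
        if other_readers:
            return False  # other transactions are still reading this item
        self.exclusive[item] = txn
        return True

    def release_all(self, txn):
        """Strict 2PL: locks are released together, only at commit or abort."""
        for item in list(self.exclusive):
            if self.exclusive[item] == txn:
                del self.exclusive[item]
        for holders in self.shared.values():
            holders.discard(txn)


locks = LockManager()
t1_reads = locks.acquire_shared("T1", "A")             # granted
t2_reads = locks.acquire_shared("T2", "A")             # granted, readers share
t2_writes_early = locks.acquire_exclusive("T2", "A")   # refused, T1 still reads
locks.release_all("T1")                                # T1 commits
t2_writes_late = locks.acquire_exclusive("T2", "A")    # granted now
t1_reads_again = locks.acquire_shared("T1", "A")       # refused, T2 is writing
```

Because there is no method for releasing a single lock, a transaction in this model cannot shrink its lock set before it finishes, which is the property that distinguishes Strict-2PL from basic 2PL.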
Centralized 2PL
In the primary copy 2PL mechanism, many lock managers are distributed to different
sites. After that, a particular lock manager is responsible for managing the lock
for a set of data items. When the primary copy has been updated, the change is
Distributed 2PL
In this kind of two-phase locking mechanism, Lock managers are distributed to all
sites. They are responsible for managing locks for data at that site. If no data is
The older transaction is always given priority in this method. It uses system time
to determine the time stamp of the transaction. This is the most commonly used
concurrency protocol.
Lock-based protocols help you to manage the order between the conflicting
protocol, the local copies of the transaction data are updated rather than the data
Read Phase
Validation Phase
Write Phase
Read Phase: In the Read Phase, the data values from the database can be read
by a transaction but the write operation or updates are only applied to the local
Validation Phase
Write Phase
In the Write Phase, the updates are applied to the database if the validation is
successful; otherwise, the updates are not applied, and the transaction is rolled back.
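The three phases can be sketched as follows. The text does not say how validation is carried out, so this example assumes a simple version counter per data item: a transaction passes validation only if nothing it read has changed since. The class name and the `(value, version)` storage layout are hypothetical.

```python
class OptimisticTransaction:
    """Toy validation-based transaction over a dict of key -> (value, version)."""

    def __init__(self, database):
        self.database = database
        self.read_versions = {}  # key -> version seen during the read phase
        self.local_writes = {}   # updates held in a local copy

    def read(self, key):
        # Read phase: read from the database, remember the version seen.
        if key in self.local_writes:
            return self.local_writes[key]
        value, version = self.database[key]
        self.read_versions[key] = version
        return value

    def write(self, key, value):
        # Read phase: updates go to the local copy only.
        self.local_writes[key] = value

    def commit(self):
        # Validation phase: fail if any item read has changed since.
        for key, version in self.read_versions.items():
            if self.database[key][1] != version:
                self.local_writes.clear()  # roll back: discard local updates
                return False
        # Write phase: apply local updates and bump each version.
        for key, value in self.local_writes.items():
            _, version = self.database.get(key, (None, 0))
            self.database[key] = (value, version + 1)
        return True


database = {"stock": (10, 1)}
t1 = OptimisticTransaction(database)
t2 = OptimisticTransaction(database)
t1.write("stock", t1.read("stock") - 1)
t2.write("stock", t2.read("stock") - 2)
t1_committed = t1.commit()
t2_committed = t2.commit()
```

Both transactions read the same starting value. T1 validates and writes first, so T2 fails validation and is rolled back without its update ever reaching the database.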
minimize overhead.
transactions.
meanings and purposes of data elements within the context of a project, and
Data Dictionary also provides metadata about data elements. The metadata
included in a Data Dictionary can assist in defining the scope and characteristics
of data elements, as well as the rules for their usage and application. A data
model for the benefit of programmers and others who need to refer to them.
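Many database engines expose this kind of metadata directly. As a small sketch, the code below builds data dictionary entries for one table from SQLite's `PRAGMA table_info`, using Python's built-in `sqlite3` module. The `student` table and the `describe_table` helper are hypothetical, and the table name passed in must come from a trusted source because it is placed directly into the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL,
        gpa        REAL DEFAULT 0.0
    )
    """
)


def describe_table(conn, table):
    """Return one dictionary entry per column of the given table."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "name": row[1],
            "type": row[2],
            "required": bool(row[3]),
            "default": row[4],
            "primary_key": bool(row[5]),
        }
        for row in rows
    ]


entries = describe_table(conn, "student")
column_names = [entry["name"] for entry in entries]
```

Each entry records the name, data type, whether a value is required, the default, and whether the column is part of the primary key. Descriptive text and business rules would have to be kept alongside these entries, since the engine does not store them.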
relationship to other objects. This process is called data modeling and results in a
structure that implicitly describes relationships. The type of data, such as text or
image or binary value, is described, possible predefined default values are listed
and a brief textual description is provided. This data collection can be organized
There are two types of data dictionaries: active and passive.
• Active data dictionaries. These are data dictionaries created within the
host databases. This avoids any discrepancies between the data dictionaries and
databases -- separate from the databases they describe -- for the purpose of
step to stay in sync with the databases they describe and must be handled with
Specific contents in a data dictionary can vary. In general, these components are
• System-level diagrams
• Reference data
• Business rules (such as for validation of data quality and schema objects)
Data dictionaries can be a valuable tool for the organization and management of
• Easily searchable
programs
• No data redundancy
Though they provide thorough listings of data attributes, data dictionaries may be
Data Standards are rules that govern the way data are collected, recorded, and
By using standards, researchers in the same disciplines will know that the way
their data are being collected and described will be the same across different
projects. Using Data Standards as part of a well-crafted Data Dictionary can help
increase the usability of your research data, and will ensure that data will be
Databases are the guts of an application; without them, you're left with just skins
and skeletons, which aren't as useful on their own. Therefore, the overall
are dozens of factors that affect performance including how indexes are used,
how queries are structured and how data is modeled. Consequently, making
minor adjustments to any of these elements can have a large impact. Database
tuning SQL Server or Oracle queries for enhanced performance. The goal of
best used, including deploying clusters, and working toward optimal database
companies can’t afford barriers to data access. One of the best ways to navigate
like a car needs standard tuning and maintenance, database engines and the
are working as they should and performing optimally. Database tuning can be an
incredibly difficult task, particularly when working with large-scale data where
even the most minor change can have a dramatic (positive or negative) impact
who have to perform DBA-like tasks; meanwhile, DBAs often struggle to work
Tuning the databases enhances the performance but it is only the first step in
organize data in a way that makes retrieving information much easier. Without
queries, even the response is incorrect or the query takes too long to perform.
Table statistics are used to generate optimal execution plans. If the performance
tuning tool is using out-of-date statistics, the plan won’t be optimized for the
current situation.
indexed field inside the table. If the database engine must scan all the rows in a
table to find what it’s looking for, the delivery speed of your query results suffers.
Other queries may suffer as well, since scanning all of that data into memory will
cause the CPU utilization to spike and not allow other queries any time in
memory.
3. Avoid SELECT *
This tip is particularly important if you have a large table (think hundreds of
include them individually instead of wasting time querying for all the data. Again,
reading extra data will cause CPU utilization to spike and memory to be
thrashed. You should check the Page Life Expectancy (PLE) to make sure you
4. Use constraints
Constraints are an effective way to speed up queries and help the SQL
optimizer come up with a better execution plan, but the improved performance
comes at the cost of the data requiring more memory. The increased query
speed may be worth it depending on the business objective, but it’s important to
The estimated execution plan is helpful when you are writing queries because it
gives you a preview of how the plan will run, but it is blind to parameter data
types which could be wrong. To get the best results when performance tuning,
it’s often better to review the actual execution plan because it uses the latest,
Making too many changes at once tends to muddy the waters. A better, more
efficient approach to query tuning is to make changes with the most expensive
Before you dive into troubleshooting I/O directly, first try adjusting indexes and
query tuning. Consider using a covering index that includes all the columns in the
query; this reduces the need to go back to the table because the engine can read all
the columns from the index. Adjusting indexes and query tuning have a high impact on almost
all areas of performance, so when they are optimized, many other performance
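
The covering index idea mentioned above can be sketched as follows, again assuming SQLite through Python's sqlite3 module. The table and index names are hypothetical. The query names only the two columns it needs, so an index holding both columns can answer it without touching the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# The query lists only the columns it needs instead of using SELECT *.
query = "SELECT customer_id, total FROM orders WHERE customer_id = 7"

# Index on the filter column only: 'total' must still be read from the table.
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
plain_plan = plan(query)

# Index that contains every column the query uses: no table lookup is needed.
conn.execute("DROP INDEX idx_customer")
conn.execute("CREATE INDEX idx_customer_total ON orders (customer_id, total)")
covering_plan = plan(query)

print("plain index:   ", plain_plan)
print("covering index:", covering_plan)
```

A wider index takes more storage and slows down writes, so it is usually reserved for queries that run often.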
Utilizing artificial intelligence to analyze your execution plan and determine how
When optimizing SQL queries, be sure to highlight changes in the SQL statement
so you can compare the original statement with the optimized version. Gather a
baseline metric such as logical I/O to compare against as you tune. Don’t make
any changes until you are sure the optimized version is accurate (i.e., includes
Automated SQL optimization tools not only analyze your SQL statement but can
also automatically rewrite it or optimize indexes until they find the variation that
DATA AVAILABILITY
have your data available 24x7x365, which will permit your business to run
with data management, so designing a system that can work around those
issues while still delivering data is essential. Data availability is primarily used to
create service level agreements (SLA) and similar service contracts, which define
ability to guarantee reliable access to data. Organizations must keep crucial data
available and shorten data outage times as much as possible. To achieve data
availability, organizations must be able to quickly repair all hardware failures and
services, policies and procedures that ensure that data is available in normal and
more. Storage area networks (SAN), network-attached storage (NAS) and RAID-based
data availability.
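
Service level agreements usually state availability as a percentage. The short sketch below converts such a percentage into the downtime it permits per year. The target values are illustrative assumptions and are not taken from any particular agreement.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours per year a system may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

Each extra "nine" in the target cuts the permitted downtime by a factor of ten, which is why higher targets need more redundancy.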
There are several issues that can affect the availability of your data:
Host server failures—if the server that stores your data fails, your data will
become unavailable.
Storage failures—if your physical storage device fails, you can no longer access
Network crash—if the network crashes, the host server becomes inaccessible
Legacy data—data that is too outdated can become unusable. You can use data
transformation tools to make older data readily accessible, but these do not
always work.
• The use of data loss prevention tools. DLP tools can help mitigate data
• Erasure coding. This data protection method breaks data into fragments,
expands them and then encodes them with redundant data pieces. The fragments are then
stored across a set of different locations or storage devices. If a drive fails or data
becomes corrupted, the data can be reconstructed from the segments stored on
is lost.
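
The idea behind erasure coding can be illustrated with a deliberately simplified scheme: split the data into k fragments and add one XOR parity fragment, so that any single lost fragment can be rebuilt. This is only a sketch under that assumption. Production systems typically use stronger codes (for example Reed-Solomon) that survive several losses, and the function names here are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)               # ceiling division
    padded = data.ljust(size * k, b"\0")    # pad so all fragments are equal
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments):
    """Rebuild the single missing fragment (marked None) from the survivors."""
    missing = fragments.index(None)
    survivors = [f for f in fragments if f is not None]
    repaired = list(fragments)
    repaired[missing] = reduce(xor_bytes, survivors)
    return repaired


original = b"warehouse data block"
stored = encode(original, k=4)   # 4 data fragments + 1 parity fragment
stored[2] = None                 # simulate losing one storage device
repaired = recover(stored)

# Join the data fragments and drop the padding (original length is known here).
restored = b"".join(repaired[:4])[:len(original)]
print(restored == original)
```

Each fragment would normally be placed on a different drive or site, so that one failure removes only one fragment.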