Download as pdf or txt
Download as pdf or txt
You are on page 1of 31

Data Management

and
Database Design
INFO 6210
Week #3

Northeastern University
What have we learnt last week?

DATA MODELING ENTITY AND ATTRIBUTES RELATIONSHIPS

2
Before we start....

• How did the Survey form modeling exercise go?


• Any questions?
• Can I see the model?

3
So, This week....

• database management system rules


• E.F. Codd rules
• Relational and Non-Relational databases
• Why we need
• When to use
• Normalization
• SQL Introduction

4
E. F. Codd Rules

5
12 Rules

Information Physical Data Logical Data


representation Independence Independence

High level
Guaranteed The Distribution
update, Insert,
Access Rule
Delete

Systematic
treatment of View updating Non-Subversion
NULL values

Comprehensive
Database
data sub Integrity Rule
description rule
language
6
12 Rules

• Information Rule
• All information including metadata stored in table.

• Guaranteed Access
• Every value of data must be logically addressable using combination of –
• TABLE Name [ENTITY]
• COLUMN Name [ATTRIBUTE]
• Primary Key

• Systematic treatment of NULL values


• NULL value is representation of missing information
• Support for NULL values must be consistent and independent of data types

7
12 Rules

• Database description Rule / Data Catalog


• Metadata to be stored in form of tables
• Allow users with appropriate authority to access catalog data
• Same query language should be used

• Comprehensive Data sublanguage


• RDBMS manageable through its own extension of SQL
• SQL should support –
• Data Definition
• Views
• Data Manipulation
• Integrity constraints
• Authorization
• Transaction boundary

8
12 Rules

• View Updatable
• All views that are updatable in theory should be updatable by the system

• Insert, Update, Delete Rule


• RDBMS must do more than just be able to retrieve relational data sets
• Should be capable to Modify data as a relational set

• Physical Data Independence


• Physical data storage should not impact the system
• User access must remain logically consistent even when storage changes
• Application must be limited to interfacing with logical layer

9
12 Rules

• Logical Data Independence


• Application programmers must be independent of table changes
• A table should be divisible into 1 or more other tables, Provided it –
• Preserves all the data (No Data loss)
• Maintains Primary Key in every fragment

• Distribution Rule
• RDBMS must have distribution independence
• Geographically distributed database with data stored in pieces is transparent to users.

10
12 Rules

• Non-Subversion
• Row level access to modify should not bypass integrity constraints
• RDBMS must be governed by relational rules as its primary laws

• Integrity Rule
• Database should be able to enforce its own integrity
• Integrity constraints defined in relation data and storable in catalog
• Should not depend on application programs

11
12 Rules

• Non-Subversion
• Row level access to modify should not bypass integrity constraints
• RDBMS must be governed by relational rules as its primary laws

• Integrity Rule
• Database should be able to enforce its own integrity
• Integrity constraints defined in relation data and storable in catalog
• Should not depend on application programs

Database that satisfies at least 6 of 12 is accepted as full-fledged RDBMS

12
Non-Relational Databases – NoSQL

• While a SQL database is a defined, concrete concept, NoSQL is not.


• Dynamic schema for unstructured data
• data is stored in many ways:
• column-oriented
• document-oriented
• Key Value store
Non-Relational Databases – NoSQL

Few good things…


• You can create documents without having to first define their structure
• Each document can have its own unique structure
• Do not require a defined schema structure
• Support wide range of modern programming languages and tools
• Primarily eventually-consistent by default
Popular Non-Relational Databases

• MongoDB by MongoDB Inc.


• DynamoDB by Amazon
• Cassandra from Apache foundation
• Coachbase from Apache foundation
Relational Vs Non-Relational
How to Decide?

• When it comes to choosing a database, one of the biggest decisions is picking SQL vs NoSQL

• Relational (SQL) is a Strong choice for –


• Clear business rules and predefined structure
• Applications that involve in transaction management
• Link information from different tables to uniquely identify data sets
• Maintain data integrity and relationships (Primary key, Foreign key)
• Consistency
• Support complex SQL to manage and maintain data
Relational Vs Non-Relational
How to Decide?

• Non-Relational (NoSQL) is a Strong choice for –

• Any businesses that have rapid growth or databases with no clear schema
definitions

• Applications that cannot define a schema for database, if you find yourself de-
normalizing data schemas, or if your schema continues to change
• Very high performance
• Eventually consistency
Terminology

• Relation is table with rows and columns


Attributes
• Attribute is a named column of a relation

• Degree of a Relation is number of attributes

• Tuple is row in a relation

Cardinality
Relation
• Cardinality of a relation is the number of tuples

Degree
Relation / Table characteristics

• A table is a two-dimensional structure composed of rows and columns.


• Each table row (Tuple) represents a single entity occurrence within the entity set.
• Each table column represents an attribute, and each column has a distinct name.
• Each row/column intersection represents a single data value.
• All values in a column must conform to the same data format.
• Each column has a specific range of values known as the attribute domain.
• The order of the rows and columns is immaterial to the DBMS.
• Each table must have an attribute or a combination of attributes that uniquely identifies each
row.
Keys

• In Relational model Keys are important as –


🔧It allows us to uniquely identify a tuple
🔧It can be a combination of one or more attributes

20
Keys

• In Relational model Keys are important as –


🔧It allows us to uniquely identify a tuple
🔧It can be a combination of one or more attributes

Different types of keys -


🔑 Natural Key
🔑 Super Key
🔑 Candidate Key
🔑 Primary Key
21
Natural Key

• Natural keys is –
• An Identifier which can be used to uniquely identify data
• Example: Invoice number, Registration number, Part number, etc.,
• If an entity has Natural identifier –
• We can use it as Primary key
• Most of the natural keys are acceptable as primary key identifiers

22
Natural Key

• Natural keys is –
• An Identifier which can be used to uniquely identify data
• Example: Invoice number, Registration number, Part number, etc.,
• If an entity has Natural identifier –
• We can use it as Primary key for the respective entity
• Most of the natural keys are acceptable as primary key identifiers

• Foreign key is an attribute that is mapped/linked to another table’s Primary key

23
Super Key

1 or more attributes, that uniquely identifies a tuple in a relation

Example: Entity Product (Product_Id, Product_Name, Manufacturer)


• Product_Id
• Product_Id, ProductName
• Product_Id, ProductName, Manufacturer
• Product_Id, Manufacturer
• Product_Name, Manufacturer

Each super key can uniquely identify each tuple

24
Candidate key

Candidate key is a super key with no redundancy


• A proper subset of a super key (Least number of columns)
• The set of candidate keys form the base for selection of a single primary key

Example: Entity Product (Product_Id, Product_Name, Manufacturer)


• Product_Id
• Product_Id, ProductName
• Product_Id, ProductName, Manufacturer
• Product_Id, Manufacturer
• Product_Name, Manufacturer
Product_id is eligible for Candidate Key

25
Primary key

• Characteristics
• Unique values
• No change over a period of time
• Non intelligent
• Minimum number of Attributes (Recommended One attribute)
• Numeric Datatype
• Not part of any PII data

26
From the below table which column is best fit for a Primary Key?

27
Composite Primary key

• A composite Primary key is combination


of more than one key that uniquely
identifies each tuple

• Combination of each Primary key is


allowed only once
Artist_Id and Genere_Id
Composite Primary Key

28
Surrogate key

• Surrogate key is created in case where a


table has more than one natural key

• Table has more than one column to


identify unique row

• Its not part of application data

• Usually, these values are auto generated

29
Exercise
• Requirement
• Basketball league has many divisions
• Division has many players
• Division has many teams

Design the model. Simple task, envision 3 entities


with sample attributes

30
Questions?

31

You might also like