De Unit 1-Database Concepts
UNIT CONTENTS
1 Understanding Data
2 Understanding Data Collection
3 Understanding Data Storage & Management
4 Understanding Data Visualization
SYLLABUS
I. Understanding Data
Data, information and knowledge
Types of data
Introduction to database management systems
Data modeling using ER Diagrams
Using relational DBMS
SQL: how to create a database, load data, insert/delete, and ask queries
II. Understanding Data Collection
Basics of Data Collection
Data Measurement & Scaling techniques
Data collection methods for Primary and Secondary data
Issues with data collection methods
III. Understanding Data Storage and Management
Data storage techniques
Data management techniques
IV. Understanding Data Visualization
Data Visualization need and concept
Data Visualization techniques
MODULE I –
UNDERSTANDING DATA
OBJECTIVES OF THIS SESSION
• Definition and Concept of DBMS
• Drawbacks of file processing systems
• Database environment Components
• Database Users
• Advantages of DBMS
• When Not to Use a DBMS
• Evolution of Database Systems
Information is the backbone of any organization. In a world that
focuses on achievement and advantage, information is the critical
factor that enables managers and organizations to gain a
competitive edge. It is the most critical resource of an organization.
Information is nothing but refined data.
Fig : Components of a Database (data items, relationships, constraints, schema)
RDBMS stands for Relational Database Management System.
Most widely used database management systems, such as MS SQL Server,
IBM DB2, Oracle, MySQL and Microsoft Access, are relational. SQL is the
query language these systems share; it is not itself a DBMS.
It is called a Relational Database Management System (RDBMS) because
it is based on the relational model introduced by E.F. Codd.
Data is represented as tuples (rows) of tables (relations) in an RDBMS.
The relational database is the most commonly used kind of database. It contains
a number of tables, and each table normally has its own primary key.
Because the data is organized as a set of tables, it can be accessed
easily in an RDBMS.
An RDBMS uses tables to store data. A table is a collection of
related data entries, arranged in rows and columns.
A table is the simplest form of data storage in an RDBMS.
Student Table
A field is a smaller entity of the table that holds one specific piece of information about every
record in the table. The fields of the Student table are id, name, age and course.
A row of a table is also called a record. It contains the specific information for one individual
entry in the table and is a horizontal entity. The Student table contains 5 records.
A NULL value specifies that a field was left blank when the record was created.
It is different from a field that contains zero or a field that contains a space.
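The terms above (field, record, NULL) can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the five rows are invented sample data, not the contents of the Student table shown on the slide, and SQLite stands in for any RDBMS.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Four fields (columns), matching the field names used in the text.
conn.execute("""
    CREATE TABLE student (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        age    INTEGER,
        course TEXT
    )
""")

# Five records (rows). The values are made up for illustration.
rows = [
    (1, "Asha", 20, "Physics"),
    (2, "Ben", 0, "Maths"),        # age stored as the number zero
    (3, "Chen", None, "Biology"),  # age left blank, stored as NULL
    (4, "Dev", 22, " "),           # course holds a single space
    (5, "Eva", 21, None),          # course left blank, stored as NULL
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

fields = [col[1] for col in conn.execute("PRAGMA table_info(student)")]
record_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# NULL is matched with IS NULL; it is not equal to zero or to a space.
null_age = [r[0] for r in conn.execute("SELECT id FROM student WHERE age IS NULL")]
zero_age = [r[0] for r in conn.execute("SELECT id FROM student WHERE age = 0")]
null_course = [r[0] for r in conn.execute("SELECT id FROM student WHERE course IS NULL")]

print(fields, record_count, null_age, zero_age, null_course)
```

The three queries at the end return different rows, which is the point of the NULL definition: a blank field, a zero and a space are three distinct things to the database.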
ER Model - Basic Concepts
Disadvantages of file processing systems
File processing systems are still widely used today (e.g. for backup) but have the following problems:
• Program-Data Dependence – file descriptions are stored within each
application that accesses the file, so a change to the file structure requires
changes to the file descriptions in every program.
• Data Redundancy (duplication of data) – wasteful and prone to inconsistency,
with loss of metadata integrity (the same data has different names in different
files, or the same name is used for different data in different files).
• Limited Data Sharing – users have little opportunity to share data outside
their own applications.
• Lengthy Development Times – little opportunity to re-use previous
development efforts.
• Excessive Program Maintenance – the factors above combine to create a heavy
maintenance load.
A simplified database system environment
When not to use a DBMS
Main costs of using a DBMS:
- High initial investment in hardware, software and training,
and the possible need for additional hardware.
- Overhead for providing generality, security, recovery, integrity
and concurrency control.
- Generality that a DBMS provides for defining and processing data.
Evolution of Database Systems
• 1960s – file processing systems: punch cards, paper tape, magnetic tape –
sequential access and batch processing
• 1970s – Hierarchical and Network (legacy, some still used today) – difficulties:
hard to access data (navigational record-at-a-time procedures), limited data
independence, no widely accepted theoretical model (unlike relational)
• 1980s – Relational – E.F. Codd and others developed this theoretically
well-founded model – all data represented in the form of tables – Oracle, DB2,
Ingres
• 1990s – Object-oriented, but some organisations have to handle large
amounts of both structured and unstructured data, so Object-relational
databases were developed.
• 2000 and beyond – multi-tier, client-server, distributed environments,
web-based, content-addressable storage, data mining
Example of a simple database (UNIVERSITY)
Example of a Database (with a Conceptual Data Model)
• Some mini-world relationships:
Potential for enforcing standards: this is crucial for the success of
database applications in large organizations. Standards refer to data item
names, display formats, screens, report structures, meta-data
(description of data) etc.
Reduced application development time: incremental time to add each
new application is reduced.
Flexibility to change data structures: database structure may evolve as
new requirements are defined.
Availability of current information: Extremely important for on-line
transaction systems such as airline, hotel, car reservations.
Economies of scale: by consolidating data and applications across
departments, wasteful overlap of resources and personnel can be
avoided.
Extending Database Capabilities
• Data Model: A set of concepts to describe the structure of a database, and certain constraints
that the database should obey.
• Data Model Structure and Constraints:
• Constructs are used to define the database structure
• Constructs typically include elements (and their data types) as well as groups of elements
(e.g. entity, record, table), and relationships among such groups
• Constraints specify some restrictions on valid data; these constraints must be enforced at
all times
• Data Model Operations: Operations for specifying database retrievals and updates by
referring to the concepts of the data model. Operations on the data model may include basic
operations and user-defined operations.
• By structure of a database, we mean the data types, relationships, and constraints that should
hold for the data.
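A minimal sketch of what "constraints must be enforced at all times" can look like in practice, using Python's built-in sqlite3 module. The employee table, its columns and the sample rows are assumptions made for illustration; they do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structure (columns and data types) plus three constraints:
# a primary key, a NOT NULL rule and a CHECK rule.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL CHECK (salary >= 0)
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000)")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        # Raised by SQLite when any declared constraint is violated.
        return "rejected"


results = [
    try_insert((2, "Ben", -10)),    # violates CHECK (salary >= 0)
    try_insert((1, "Chen", 100)),   # duplicate primary key
    try_insert((3, None, 100)),     # violates NOT NULL on name
    try_insert((4, "Dev", 100)),    # valid row
]
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(results, row_count)
```

The application code never checks the rules itself; the constraints are declared once as part of the structure and the DBMS applies them to every update.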
Categories of data models
1. Conceptual (high-level, semantic) data models: Provide concepts that are close to the
way many users perceive data. (Also called entity-based or object-based data models.)
2. Physical (low-level, internal) data models: Provide concepts that describe details of how
data is stored in the computer.
3. Implementation (representational) data models: Provide concepts that fall between the
above two, balancing user views with some computer storage details.
• Conceptual data models use concepts such as entities, attributes, and relationships.
• An entity represents a real-world object or concept, such as an employee or a project,
that is described in the database.
• An attribute represents some property of interest that further describes an entity, such
as the employee's name or salary.
• A relationship represents an association among two or more entities,
for example, a works-on relationship between an employee and a project.
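One common way to carry these conceptual ideas into a relational DBMS is sketched below with Python's sqlite3 module: each entity becomes a table, each attribute a column, and the works-on relationship a table of key pairs. The table names, column names and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    -- Entity: employee, with attributes name and salary.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL
    );
    -- Entity: project, with attribute title.
    CREATE TABLE project (
        proj_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    -- Relationship: works_on associates an employee with a project.
    CREATE TABLE works_on (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha', 50000), (2, 'Ben', 42000);
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 20);
""")

# Follow the relationship: which employee works on which project?
pairs = conn.execute("""
    SELECT e.name, p.title
    FROM works_on AS w
    JOIN employee AS e ON e.emp_id = w.emp_id
    JOIN project  AS p ON p.proj_id = w.proj_id
    ORDER BY e.name, p.title
""").fetchall()
print(pairs)
```

Keeping the relationship in its own table lets one employee work on many projects and one project have many employees without duplicating either entity's attributes.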
History of Data Models
Network Model
ADVANTAGES:
• Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and relationship
types.
• Language is navigational; uses constructs like FIND, FIND member, FIND
owner, FIND NEXT within set, GET etc.
• Programmers can do optimal navigation through the database.
DISADVANTAGES:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of
records.
• Little scope for automated "query optimization"
Hierarchical Model
ADVANTAGES:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized domains - e.g.,
assemblies in manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT,
GET NEXT WITHIN PARENT etc.
DISADVANTAGES:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"
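The "navigational and procedural" style listed above can be contrasted with a declarative query in a short Python sketch. The function get_next_within_parent is a hypothetical stand-in written for this illustration; it is not the API of any real hierarchical DBMS, and the department data is invented.

```python
import sqlite3

# Hypothetical hierarchical data: each parent record owns an ordered list of children.
hierarchy = {
    "Sales": ["Asha", "Ben"],
    "Research": ["Chen"],
}


def get_next_within_parent(parent, position):
    """Illustrative stand-in for a GET NEXT WITHIN PARENT call (not a real DBMS call)."""
    children = hierarchy[parent]
    return children[position] if position < len(children) else None


# Navigational style: the program itself steps through the records one at a
# time and keeps track of its current position.
navigational = []
position = 0
while (child := get_next_within_parent("Sales", position)) is not None:
    navigational.append(child)
    position += 1

# Declarative style: the same data in a relational table. The query states
# what is wanted and leaves the access path to the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(name, dept) for dept, names in hierarchy.items() for name in names],
)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE department = ? ORDER BY name",
        ("Sales",),
    )
]
print(navigational, declarative)
```

Because the loop fixes the access path in application code, the system has little room to choose a better one, which is the sense in which navigational models leave little scope for query optimization.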
Relational Data Model