
CSE221: Database Systems

Lecture 1: Course syllabus & introduction


Professor Shaker El-Sappagh
Shaker.elsappagh@gu.edu.eg
Spring 2024
Outline
• Welcome to CSE221: Database Systems
• Course syllabus, grading, and CANVAS
• Needed tools
• Projects and the SIS project
• Student groups
• Definitions of database and database management system
Course syllabus
Covered topics
Course Grading
• Attendance is crucial.
• Participation.
• Projects.
• Assignments.
• Quizzes.
• Midterm and final exams.
CANVAS
• The course will be delivered on CANVAS.
Needed tools
• DBMS: MySQL, Oracle, IBM DB2, MS SQL Server, MS Access, PostgreSQL
1. MySQL Workbench
2. SQL shell: https://github.com/mysql Check: db-engines ranking

• Database design:
1. Power designer, data architect, ERDplus.com, Toad data modeler, or MySQL Workbench:
Visual Database Design
2. Vertabelo: https://vertabelo.com/ Online
3. Navicat Data Modeler: https://navicat.com/en/products/navicat-data-modeler Online
4. Visual Paradigm: https://www.visual-paradigm.com/ Online
5. SQLDBM: https://sqldbm.com/Home/ Online
• Database programming:
1. Python programming language (MySQL Python connector).
• Database application with a GUI (Python):
1. Desktop application using Tkinter, PyQt, Kivy, wxPython, or PyGUI
2. Web application using Flask, Django, CherryPy, Pyramid, Web2Py, Tornado, Bottle, BlueBream,
or Quixote
Projects
Information Systems for different entities:
• SIS
• Pharmacy IS
• School IS
• Airport IS
• MIS
• LMS
• Inventory control management
• Electronic health record – Hospital management system
• Online retail application
• Railway management system
• Library management system
• Restaurant management system
• Hotel management system
• Salary management system
• Bank management system
• Electricity bill management system
• Telecommunication company management system
Student groups
• Groups of five students.
• Each group takes one project.
• Work on projects starts from the first day.
• The project’s 25 marks are distributed throughout the semester.
Steps to create a database system
The database life cycle
• In addition to these steps, the app that provides the business
logic must be implemented.
Steps
• Then implement the app interface and logic
Final database system usage
What is a database (DB) and a database
management system (DBMS)?
Basic Definitions
• Data:
• Known raw facts that can be recorded and have an implicit meaning.
• Information:
• Is the result of processing raw data to reveal its meaning. Data processing can be as simple as
organizing data to reveal patterns or as complex as making forecasts or drawing inferences using
statistical and machine learning modeling.
• Accurate, relevant, and timely information is the key to good decision making.
• Database (DB):
• Is a related, shared, and integrated computer structure that stores a collection of end-user data and
metadata. So, a DB is a collection of self-describing data.
• Mini-world:
• Some part of the real world about which data is stored in a database. For example, student grades and
transcripts at a university.
• Database Management System (DBMS):
• DBMS is a collection of programs that manages the database structure and controls access to the data
stored in the database.
• Database System: (Big) Data + DBMS + Application Programs
• The DBMS software together with the data itself. Sometimes, the applications are also included.
Simplified database system environment
• The DBMS serves as the intermediary between the user and the database. The database structure itself is
stored as a collection of files, and the only way to access the data in those files is through the DBMS.
• The DBMS receives all application requests and translates them into the complex operations required to fulfill
those requests.
• The DBMS hides much of the database’s internal complexity from the application programs and users.
• The application program might be written by a programmer using a programming language such as Python,
Visual Basic.NET, Java, or C++, or it might be created through a DBMS utility program.
Database Architecture
• Centralized databases
• One to a few cores, shared memory
• Client-server
• One server machine executes work on behalf of multiple client machines.
• Parallel databases
• Many core shared memory
• Shared disk
• Shared nothing
• Distributed databases
• Geographical distribution
• Schema/data heterogeneity
Centralized DBMS
Web server interaction with DBMS
N-tier architectures
• Two-tier architecture: the
application resides at the client
machine, where it invokes
database system functionality at
the server machine
• Three-tier architecture: the
client machine acts as a front end
and does not contain any direct
database calls.
• The client end
communicates with an
application server, usually
through a forms interface.
• The application server in
turn communicates with a
database system to access
data.
Two Tier Client-Server Architecture
• Client and server must install appropriate client module and server
module software for ODBC or JDBC
• A client program may connect to several DBMSs, sometimes called
the data sources.
• In general, data sources can be files or other non-DBMS software that
manages data.
• See Chapter 10 for details on Database Programming
Three-tier client-server architecture
• Common for Web applications
• Intermediate Layer called Application Server or
Web Server:
• Stores the web connectivity software and the
business logic part of the application used to
access the corresponding data from the
database server
• Acts like a conduit for sending partially
processed data between the database server
and the client.
• Three-tier Architecture Can Enhance Security:
• Database server only accessible via middle tier
• Clients cannot directly access database server
• Clients contain user interfaces and Web
browsers
• The client is typically a PC or a mobile device
connected to the Web
Distributed DBMS
Database management system layers: Application layer, Logical layer, Physical layer

https://medium.com/coderbyte/understanding-mysql-logical-architecture-526eaf72f66e
Database management system components
The Architecture of a Relational DBMS
• Front ends: Web forms, application front ends, SQL interface (all issue SQL commands)
• Query evaluation engine: parser, optimizer, operator evaluator, plan executor
• Files and access methods
• Buffer manager
• Disk space manager
• Concurrency control: transaction manager, lock manager
• Recovery manager
• Database: index files, system catalog, data files
Database Engine
• A database system is partitioned into modules that deal with each of the responsibilities
of the overall system.
• The functional components of a database system can be divided into:
• The storage manager
• The query processor
• The transaction manager
1. Storage Manager
• A program module that provides the interface between the low-level data stored in the database
and the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data
• The storage manager components include:
• Authorization and integrity manager
• Transaction manager
• File manager
• Buffer manager
• The storage manager implements several data structures as part of the physical system
implementation:
• Data files: store the database itself
• Data dictionary: stores metadata about the structure of the database, in particular the schema
of the database.
• Indices: can provide fast access to data items. A database index provides pointers to those
data items that hold a particular value.
2. Query Processor
• The query processor components include:
• DDL interpreter: it interprets DDL statements and records the definitions in the
data dictionary.
• DML compiler: it translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine
understands.
• The DML compiler performs query optimization; that is, it picks the lowest cost
evaluation plan from among the various alternatives.
• Query evaluation engine: it executes low-level instructions generated by the DML
compiler.
2. Query Processing (Cont’d)
1. Parsing and translation
2. Optimization
3. Evaluation
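The three steps can be observed from outside the DBMS by asking it for the plan it chose. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for the course DBMS; the index name idx_instructor_dept is hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " ID CHAR(5) PRIMARY KEY, name VARCHAR(20),"
    " dept_name VARCHAR(20), salary NUMERIC(8,2))"
)

QUERY = "SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.'"


def plan(conn, sql):
    """Return the plan chosen for sql as a list of readable steps."""
    # Each EXPLAIN QUERY PLAN row has the readable description in column 3.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index on dept_name, the only option is to scan the table.
plan_without_index = plan(conn, QUERY)

# Hypothetical index: it gives the optimizer a cheaper alternative.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")
plan_with_index = plan(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the set of available access paths changed, and the optimizer picked a different plan on its own.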
3. Transaction Management

• A transaction is a collection of operations that performs a single logical function in
a database application
• Transaction-management component ensures that the database remains in a
consistent (correct) state despite system failures (e.g., power failures and operating
system crashes) and transaction failures.
• Concurrency-control manager controls the interaction among the concurrent
transactions, to ensure the consistency of the database.
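As an illustration of "a single logical function", consider a bank transfer made of two updates. This is a sketch only, assuming SQLite via Python's sqlite3 module; the account table, its CHECK constraint, and the transfer helper are hypothetical and not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"  # consistency rule
)
with conn:  # commits on success
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A", 100.0), ("B", 50.0)])


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        # The with-block commits if both updates succeed and rolls back
        # every change made inside it if either one raises an error.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer(conn, "A", "B", 30.0)    # both updates are applied
ok_large = transfer(conn, "A", "B", 500.0)   # debit violates CHECK, credit is undone
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok_small, ok_large, balances)
```

In the failed transfer the credit to B had already been executed when the debit was rejected; the rollback is what keeps the database in a consistent state.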
Typical DBMS Functionality
• Define a particular database in terms of its data types, structures, and constraints
• Construct or Load the initial database contents on a secondary storage medium
• Manipulating the database:
• Retrieval: Querying, generating reports
• Modification: Insertions, deletions and updates to its content
• Accessing the database through Web applications
• Processing and Sharing by a set of concurrent users and application programs –
yet, keeping all data valid and consistent.
• Protection or Security measures to prevent unauthorized access
• “Active” processing to take internal actions on data
• Presentation and Visualization of data
• Maintenance of the database and associated programs over the lifetime of the
database application
Application Activities Against a Database
• Applications interact with a database by generating
- Queries: that access different parts of data and formulate the result
of a request
- Transactions: that may read some data and “update” certain values
or generate new data and store that in the database
• Applications must not allow unauthorized users to access data
• Applications must keep up with changing user requirements against
the database
Impact of Databases and Database Technology
• Businesses: Banking, Insurance, Retail, Transportation, Healthcare,
Manufacturing
• Service Industries: Financial, Real-estate, Legal, Electronic Commerce,
Small businesses
• Education: Resources for content and delivery
• More recently: Social Networks, Environmental and Scientific
Applications, Medicine and Genetics
• Personalized Applications: based on smart mobile devices
Data Models
▪ The user of a DBMS is ultimately concerned with some real-world enterprise (e.g., a
University)
▪ The data to be stored and managed by a DBMS describes various aspects of the enterprise
▪ E.g., the data in a university database describes the student, faculty, and course entities and the
relationships among them
▪ A data model is a collection of high-level data description constructs that hide many low-level
storage details. It describes the Data, Data relationships, Data semantics, and Data constraints
▪ For relational DB: A widely used data model called the entity-relationship (ER) model allows
users to pictorially denote entities and the relationships among them.
• Object-based data models (Object-oriented and Object-relational)
• Semi-structured data model (XML)
• Other older models:
– Network model
– Hierarchical model
The Relational Model
▪ Relational database with relational DBMS
▪ The ER model can be translated into a relational model, which is one of the most
widely used models
▪ The central data description construct in the relational model
is the relation
▪ A relation is basically a table (or an entity or a set) with rows (or records or
tuples) and columns (or fields or attributes)
▪ Every relation has a schema, which describes the columns of a relation
▪ Conditions that records in a relation must satisfy can be specified
▪ These are referred to as integrity constraints

Ted Codd
Turing Award 1981
CODD’S 12 RULES FOR RDBMS
• Rule 1 : The information rule. All information in the database should be represented in the same way, and stored in tables in the
form of rows and columns.
• Rule 2 : The guaranteed access rule. All data must be accessible logically using the table name, primary key (identifying the
row) and column (attribute value).
• Rule 3 : Systematic treatment of null values. Null values (missing or inapplicable information) should be handled
systematically and uniformly, independent of data type. An arithmetic expression involving Null yields Null.
• Rule 4 : Active online catalog. The structure of the entire database must be stored in an online catalog, as a data dictionary.
This data dictionary can be queried by users employing the same query language as used to query other tables in the database.
• Rule 5 : The comprehensive data sub language rule. The system must support at least one relational language that 1. Has a
linear syntax 2. Can be used both interactively and within application programs 3. Supports data definition operations, data
manipulation operations, security and integrity constraints, and transaction management operations. This rule necessitates a query
language like SQL.
• Rule 6 : The view updating rule. All views that are theoretically updatable must also be updatable by the system.
• Rule 7 : High-level insert, update, and delete. The system must support insert, update, and delete operations on the database. It
should also support operators that manipulate a set of rows instead of just a single row.
• Rule 8 : Physical data independence. Applications should remain unaffected when the way data is physically stored or
accessed (e.g., file structures or indexes) changes.
• Rule 9 : Logical data independence. Information-preserving changes to the logical structure (e.g., tables and columns)
should not impact the applications using it.
• Rule 10 : Integrity independence. The database should be able to enforce its own integrity rather than relying on other
programs. Key and check constraints, triggers, etc., should be stored in the data dictionary. This also makes the RDBMS
independent of the front end.
• Rule 11 : Distribution independence. The distribution of data to different servers and locations should be hidden from the user.
The user should not get impacted by the distribution of data.
• Rule 12 : The non-subversion rule. If the system offers a low-level (record-at-a-time) interface, that interface must not
be usable to bypass relational security or integrity constraints.
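Rule 3 is easy to check against a running system. The sketch below assumes SQLite through Python's sqlite3 module (other DBMSs follow the same SQL semantics for these three expressions); Python's None is how the module reports a SQL Null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three expressions evaluated by the DBMS itself:
#   NULL + 1      arithmetic with Null
#   NULL = NULL   comparing Null with Null
#   NULL IS NULL  the dedicated test for Null
row = conn.execute("SELECT NULL + 1, NULL = NULL, NULL IS NULL").fetchone()
arithmetic_result, equality_result, is_null_result = row

print(arithmetic_result)  # Null: the result of the arithmetic is unknown
print(equality_result)    # Null: even Null = Null is unknown, not true
print(is_null_result)     # 1: IS NULL is the reliable way to test for Null
```

This is why queries test for missing values with IS NULL rather than with the = operator.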
The Relational Model: An Example
▪ Let us consider the student entity in a university database
Students schema:

Students(sid: string, name: string, login: string, dob: string, gpa: real)

• Each of sid, name, login, dob, and gpa is an attribute (also called a field or column)
• Integrity constraint: every student has a unique sid value

An instance of the Students relation:

sid     name    login                 dob        gpa
512412  Khaled  khaled@qatar.cmu.edu  18-9-1995  3.5
512311  Jones   jones@qatar.cmu.edu   1-12-1994  3.2
512111  Maria   maria@qatar.cmu.edu   3-8-1995   3.85

• Each line of the instance is a record (also called a tuple or row)
• Each cell holds an atomic value
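The Students schema, its instance, and its integrity constraint can be tried out directly. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for the course DBMS, the schema types are mapped to TEXT and REAL, the unique sid is expressed as a primary key, and the fourth "duplicate" row is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the column names and types of the relation.
# PRIMARY KEY expresses "every student has a unique sid value".
conn.execute(
    "CREATE TABLE Students ("
    " sid TEXT PRIMARY KEY,"
    " name TEXT, login TEXT, dob TEXT, gpa REAL)"
)

# Instance: the three records from the slide.
rows = [
    ("512412", "Khaled", "khaled@qatar.cmu.edu", "18-9-1995", 3.5),
    ("512311", "Jones", "jones@qatar.cmu.edu", "1-12-1994", 3.2),
    ("512111", "Maria", "maria@qatar.cmu.edu", "3-8-1995", 3.85),
]
with conn:
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)

# Hypothetical record that reuses an existing sid: the DBMS must reject it.
try:
    with conn:
        conn.execute(
            "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
            ("512412", "Duplicate", "dup@example.org", "1-1-1996", 2.0),
        )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(duplicate_rejected, student_count)
```

The constraint is enforced by the DBMS, not by the application, so every program that inserts students is protected by it.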


Relational Model
• All the data is stored in various tables.
• Example of tabular data in the relational model: each table is made of columns and rows
A Sample Relational Database
Example of a Database
(with a Conceptual Data Model)
• Mini-world for the example:
• Part of a UNIVERSITY environment.
• Some mini-world entities:
• STUDENTs
• COURSEs
• SECTIONs (of COURSEs)
• (academic) DEPARTMENTs
• INSTRUCTORs
Example of a Database
(with a Conceptual Data Model)
• Some mini-world relationships:
• SECTIONs are of specific COURSEs
• STUDENTs take SECTIONs
• COURSEs have prerequisite COURSEs
• INSTRUCTORs teach SECTIONs
• COURSEs are offered by DEPARTMENTs
• STUDENTs major in DEPARTMENTs

• Note: The above entities and relationships are typically expressed in a conceptual
data model, such as the ENTITY-RELATIONSHIP data model (see Chapters 3, 4)
Example of a simple database
Data Independence
▪ One of the most important benefits of using a DBMS is data
independence.
▪ With data independence, application programs are insulated from how
data are structured and stored.
▪ Data independence entails two properties:
▪ Logical data independence: The capacity to change the conceptual
schema without having to change the external schemas and their
associated application programs.
▪ Physical data independence: The capacity to change the internal
schema without having to change the conceptual schema. For example,
the internal schema may be changed when certain file structures are
reorganized, or new indexes are created to improve database
performance.
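Logical data independence can be sketched with a view that plays the role of the external schema. Everything here is an assumption made for illustration: SQLite via Python's sqlite3 module, the view name student_contact, and the decision to split the table into StudentCore and StudentLogin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT, gpa REAL);
    INSERT INTO Students VALUES ('512412', 'Khaled', 'khaled@qatar.cmu.edu', 3.5);
    -- External schema: the only thing the application refers to.
    CREATE VIEW student_contact AS SELECT sid, name, login FROM Students;
""")

# The application query never changes in this sketch.
APP_QUERY = "SELECT name, login FROM student_contact WHERE sid = ?"
before = conn.execute(APP_QUERY, ("512412",)).fetchone()

# Conceptual schema change: Students is split into two tables,
# and the view is redefined on top of the new structure.
conn.executescript("""
    DROP VIEW student_contact;
    CREATE TABLE StudentCore (sid TEXT PRIMARY KEY, name TEXT, gpa REAL);
    CREATE TABLE StudentLogin (sid TEXT PRIMARY KEY, login TEXT);
    INSERT INTO StudentCore SELECT sid, name, gpa FROM Students;
    INSERT INTO StudentLogin SELECT sid, login FROM Students;
    DROP TABLE Students;
    CREATE VIEW student_contact AS
        SELECT c.sid, c.name, l.login
        FROM StudentCore AS c JOIN StudentLogin AS l ON l.sid = c.sid;
""")
after = conn.execute(APP_QUERY, ("512412",)).fetchone()

print(before, after)
```

Only the mapping between the external and conceptual levels was rewritten; the application's query string and its result stayed the same.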
Levels of Abstraction
▪ The data in a DBMS is described at three levels of abstraction: the conceptual (or
logical), physical, and external schemas

▪ The conceptual schema describes data in terms of a specific data model (e.g., the relational
model of data)

▪ The physical schema specifies how data described in the conceptual schema are stored on
secondary storage devices

▪ External schemas (or views) allow data access to be customized at the level of individual
users or groups of users (there can be one or many views)
Levels of Abstraction
• Mappings among schema levels
are needed to transform requests
and data.
• Programs refer to an external
schema, and are mapped by the
DBMS to the internal schema for
execution.
• Data extracted from the internal
DBMS level is reformatted to match
the user’s external view (e.g.
formatting the results of an SQL
query for display in a Web page)
Views
▪ A view is conceptually a relation

▪ Records in a view are computed as needed and usually not stored in a DBMS

▪ Example: University Database

Conceptual Schema
• Students(sid: string, name: string, login: string, dob: string, gpa: real)
• Courses(cid: string, cname: string, credits: integer)
• Enrolled(sid: string, cid: string, grade: string)

Physical Schema
• Relations stored as heap files
• Index on first column of Students

External Schema (View)
• Students can be allowed to find out course enrollments: Course_info(cid: string, enrollment: integer)
• Can be computed from the relations in the conceptual schema (so as to avoid data redundancy and inconsistency).
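One way the Course_info view could be computed from the conceptual schema is sketched below. It assumes SQLite via Python's sqlite3 module, and it assumes that "enrollment" means the number of students enrolled per course; the course and enrollment rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT, PRIMARY KEY (sid, cid));

    -- Illustrative data only.
    INSERT INTO Courses VALUES ('CSE221', 'Database Systems', 3);
    INSERT INTO Courses VALUES ('CSE101', 'Programming', 3);
    INSERT INTO Enrolled VALUES ('512412', 'CSE221', 'A');
    INSERT INTO Enrolled VALUES ('512311', 'CSE221', 'B');
    INSERT INTO Enrolled VALUES ('512111', 'CSE101', 'A');

    -- The view stores no rows of its own; it is a named query.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

enrollment = dict(conn.execute("SELECT cid, enrollment FROM Course_info"))

# A new enrollment shows up in the view without any extra bookkeeping.
conn.execute("INSERT INTO Enrolled VALUES ('512111', 'CSE221', 'B')")
enrollment_after = dict(conn.execute("SELECT cid, enrollment FROM Course_info"))

print(enrollment, enrollment_after)
```

Because the counts are recomputed from Enrolled on every access, they cannot drift out of step with the underlying data, which is the redundancy and inconsistency argument made above.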
Instances and Schemas
• Similar to types and variables in programming languages
• Logical Schema – the overall logical structure of the database
• Example: The database consists of information about a set of
customers and accounts in a bank and the relationship between them
• Analogous to type information of a variable in a program
• Physical schema – the overall physical structure of the database
• Instance – the actual content of the database at a particular point in time
• Analogous to the value of a variable
Accessing and managing databases
Special languages to deal with the databases:
• Data definition language (DDL)
• Data manipulation language (DML)
• Transaction control language (TCL)
• Data control language (DCL)
Data Definition Language (DDL)
• Main commands:
• CREATE: Creates a new database or object, such as a table, index, or view
• ALTER: Changes the structure of the database or object
• DROP: Deletes the database or existing objects
• RENAME: Renames the database or existing objects
• Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
• DDL compiler generates a set of table templates stored in a data dictionary
• Data dictionary contains metadata (i.e., data about data)
• Database schema
• Integrity constraints
• Primary key (ID uniquely identifies instructors)
• Authorization
• Who can access what
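The DDL commands and the data dictionary can be exercised together. The sketch assumes SQLite via Python's sqlite3 module, where the catalog is exposed through the sqlite_master table and PRAGMA table_info (MySQL exposes it differently); the email column and the new table name teacher are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(conn):
    """List user tables by querying the catalog with ordinary SQL."""
    sql = "SELECT name FROM sqlite_master WHERE type = 'table'"
    return [name for (name,) in conn.execute(sql)]


# CREATE: the instructor table from the slide, with ID as primary key.
conn.execute(
    "CREATE TABLE instructor ("
    " ID CHAR(5) PRIMARY KEY,"
    " name VARCHAR(20), dept_name VARCHAR(20), salary NUMERIC(8,2))"
)

# ALTER: change the structure by adding a (hypothetical) column.
conn.execute("ALTER TABLE instructor ADD COLUMN email VARCHAR(50)")

# RENAME: in SQLite this is spelled ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE instructor RENAME TO teacher")

tables = table_names(conn)
# Column 1 of each PRAGMA table_info row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(teacher)")]

# DROP: remove the table and its definition from the catalog.
conn.execute("DROP TABLE teacher")
tables_after_drop = table_names(conn)

print(tables, columns, tables_after_drop)
```

Every DDL statement changed only the metadata; no row of user data was involved at any point.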
Data Manipulation Language (DML)
• Main commands:
• INSERT: Adds new data to the existing database table
• UPDATE: Changes or updates values in the table
• DELETE: Removes records or rows from the table
• SELECT: Retrieves data from the table or multiple tables
• Language for accessing and updating the data organized by the appropriate data model.
• Strictly, the portion of a DML that involves information retrieval is called a query language;
in practice, the whole DML is often loosely called a query language.
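The four DML commands can be seen working on the instructor table used elsewhere in these slides. This is a sketch assuming SQLite via Python's sqlite3 module; the three instructor rows and the salary change are illustrative data, not course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " ID CHAR(5) PRIMARY KEY,"
    " name VARCHAR(20), dept_name VARCHAR(20), salary NUMERIC(8,2))"
)

with conn:  # one transaction for all three modifications
    # INSERT: add new rows.
    conn.executemany(
        "INSERT INTO instructor VALUES (?, ?, ?, ?)",
        [
            ("10101", "Srinivasan", "Comp. Sci.", 65000),
            ("12121", "Wu", "Finance", 90000),
            ("15151", "Mozart", "Music", 40000),
        ],
    )
    # UPDATE: change values in rows that match a condition.
    conn.execute(
        "UPDATE instructor SET salary = salary + 5000 "
        "WHERE dept_name = 'Comp. Sci.'"
    )
    # DELETE: remove rows that match a condition.
    conn.execute("DELETE FROM instructor WHERE dept_name = 'Music'")

# SELECT: retrieve what is left.
remaining = conn.execute(
    "SELECT ID, name, salary FROM instructor ORDER BY ID"
).fetchall()
print(remaining)
```

UPDATE and DELETE operate on every row that satisfies the WHERE condition, so a missing condition would affect the whole table.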
SQL Query Language
• SQL query language is nonprocedural. A query takes as input several tables (possibly only
one) and always returns a single table.
• Example to find all instructors in Comp. Sci. dept
select name
from instructor
where dept_name = 'Comp. Sci.'
• SQL is NOT a Turing machine equivalent language
• To be able to compute complex functions SQL is usually embedded in some higher-level
language
• Application programs generally access databases through one of:
• Language extensions to allow embedded SQL
• Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent
to a database
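The second option, sending SQL through an application program interface, might look as follows from Python. The sketch uses the standard sqlite3 module in place of ODBC/JDBC or the MySQL Python connector so that it runs without a server; the helper instructors_in and the sample rows are hypothetical.

```python
import sqlite3


def instructors_in(conn, dept_name):
    """Send one SQL query through the API and return plain Python strings."""
    # The ? placeholder keeps the value separate from the SQL text,
    # so the department name is never pasted into the statement.
    cursor = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name",
        (dept_name,),
    )
    return [name for (name,) in cursor]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " ID CHAR(5) PRIMARY KEY,"
    " name VARCHAR(20), dept_name VARCHAR(20), salary NUMERIC(8,2))"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("10101", "Srinivasan", "Comp. Sci.", 65000),
        ("45565", "Katz", "Comp. Sci.", 75000),
        ("12121", "Wu", "Finance", 90000),
    ],
)

names = instructors_in(conn, "Comp. Sci.")
for name in names:
    print(name)  # output to the user is the host language's job, not SQL's
```

The query itself is the one from the slide; the host language supplies the parameter, runs the loop, and handles the display.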
Data Control Language (DCL)
• Main commands:
• GRANT: Gives a user access to the database
• REVOKE: Removes a user's access to the database
• DCL controls access to the data that users store within a database.
• Essentially, this language controls the rights and permissions of the database
system.
• It allows users to grant or revoke privileges to the database.
Transaction Control Language (TCL)
• Main commands:
• COMMIT: Ends the current transaction and makes its changes permanent
• ROLLBACK: Undoes the changes of the current transaction, e.g., when one of its tasks fails
• SAVEPOINT: Marks a point within a transaction to which it can later be rolled back
• TCL manages the transactions within a database.
• Transactions group a set of related tasks into a single, executable task.
• All the tasks must succeed for the transaction to work.
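The three TCL commands can be issued by hand to see what each one keeps and discards. This is a sketch assuming SQLite via Python's sqlite3 module; the log table and the savepoint name after_step_1 are hypothetical.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN or COMMIT,
# so the TCL statements below reach the DBMS exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (step TEXT)")

# First transaction: partially undone, then committed.
conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('step 1')")
conn.execute("SAVEPOINT after_step_1")
conn.execute("INSERT INTO log VALUES ('step 2')")
conn.execute("ROLLBACK TO SAVEPOINT after_step_1")  # undoes step 2 only
conn.execute("COMMIT")                               # step 1 becomes permanent

# Second transaction: undone as a whole.
conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('step 3')")
conn.execute("ROLLBACK")                             # undoes step 3

steps = [step for (step,) in conn.execute("SELECT step FROM log")]
print(steps)
```

Only the work that was committed survives: the savepoint allowed part of the first transaction to be discarded without abandoning all of it.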
Database Access from Application Program
• Non-procedural query languages such as SQL are not as powerful as a universal
Turing machine.
• SQL does not support actions such as input from users, output to displays, or
communication over the network.
• Such computations and actions must be written in a host language, such as
C/C++, Java or Python, with embedded SQL queries that access the data in the
database.
• Application programs are programs that are used to interact with the database
in this fashion.
Main Characteristics of the Database Approach
• Self-describing nature of a database system:
• A DBMS catalog stores the description of a particular database (e.g., data structures,
types, and constraints)
• The description is called meta-data.
• This allows the DBMS software to work with different database applications.
• Insulation between programs and data:
• Called program-data independence.
• Allows changing data structures and storage organization without having to change the
DBMS access programs.
• Data Abstraction:
• A data model is used to hide storage details and present the users with a conceptual
view of the database.
• Programs refer to the data model constructs rather than data storage details
Main Characteristics of the Database Approach (continued)
• Support of multiple views of the data:
• Each user may see a different view of the database, which describes only the data of
interest to that user.
• Sharing of data and multi-user transaction processing:
• Allowing a set of concurrent users to retrieve from and to update the database.
• Concurrency control within the DBMS guarantees that each transaction is correctly
executed or aborted
• Recovery subsystem ensures each completed transaction has its effect permanently
recorded in the database
• OLTP (Online Transaction Processing) is a major part of database applications. This
allows hundreds of concurrent transactions to execute per second.
Database Users
• Users may be divided into
• Those who actually use and control the database content, and those who
design, develop and maintain database applications (called “Actors on the
Scene”), and
• Those who design and develop the DBMS software and related tools, and the
computer systems operators (called “Workers Behind the Scene”).
Database Users – Actors on the Scene
• Actors on the scene
• Database administrators:
• Responsible for authorizing access to the database, for coordinating and monitoring its
use, acquiring software and hardware resources, controlling its use and monitoring
efficiency of operations.
• Database Designers:
• Responsible for defining the content, the structure, the constraints, and functions or
transactions against the database. They must communicate with the end-users and
understand their needs.
Database Users – Actors on the Scene
• Actors on the scene (continued)
• End-users: They use the data for queries, reports and some of them update the
database content. End-users can be categorized into:
• Casual: access database occasionally when needed
• Naïve or Parametric: they make up a large section of the end-user population.
• They use previously well-defined functions in the form of “canned transactions” against the
database.
• Users of Mobile Apps mostly fall in this category
• Bank-tellers or reservation clerks are parametric users who do this activity for an entire shift of
operations.
• Social Media Users post and read information from websites
Database Users – Actors on the Scene (continued)
• System Analysts and Application Developers
This category currently accounts for a very large proportion of the IT work force.
• System Analysts: They understand the user requirements of naïve and sophisticated users and
design applications including canned transactions to meet those requirements.
• Application Programmers: Implement the specifications developed by analysts and test
and debug them before deployment.
• Business Analysts: There is an increasing need for such people who can analyze vast amounts
of business data and real-time data (“Big Data”) for better decision making related to planning,
advertising, marketing etc.
Database Users – Workers behind the Scene
• System Designers and Implementors: Design and implement DBMS packages in
the form of modules and interfaces and test and debug them. The DBMS must
interface with applications, language compilers, operating system components, etc.
• Tool Developers: Design and implement software systems called tools for modeling
and designing databases, performance monitoring, prototyping, test data generation,
user interface creation, simulation, etc. These tools facilitate building applications and
allow databases to be used effectively.
• Operators and Maintenance Personnel: They manage the actual running and
maintenance of the database system hardware and software environment.
Database Users
People Who Work With Databases
▪ There are five classes of people associated with databases:
1. End users
▪ Store and use data in DBMSs
▪ Usually not computer professionals
2. Application programmers
▪ Develop applications that facilitate the usage of DBMSs for end-users
▪ Computer professionals who know how to leverage host languages, query languages and DBMSs
together
3. Database Administrators (DBAs)
▪ Design the conceptual and physical schemas
▪ Ensure security and authorization
▪ Ensure data availability and recovery from failures
▪ Perform database tuning
4. Implementers
▪ Build DBMS software for vendors like IBM and Oracle
▪ Computer professionals who know how to build DBMS internals
5. Researchers
▪ Innovate new ideas which address evolving and new challenges/problems.
Advantages of Using the Database Approach
• Improved data sharing: Controlling redundancy in data storage and in development and
maintenance efforts.
• Sharing of data among multiple users.
• Improved data security: Restricting unauthorized access to data. Only the DBA staff uses
privileged commands and facilities.
• Providing Storage Structures (e.g., indexes) for efficient Query Processing.
• Better data integration: Wider access to well-managed data promotes an integrated view
of the organization’s operations and a clearer view of the big picture.
• Minimized data inconsistency.
• Improved data access: The DBMS makes it possible to produce quick answers to ad hoc
queries.
• Improved decision making: Better-managed data and improved data access make it
possible to generate better quality information, on which better decisions are based.
• Increased end-user productivity: The availability of data, combined with the tools that
transform data into usable information, empowers end users to make quick, informed
decisions.
Advantages of Using the Database Approach (Cont’d)

• Providing optimization of queries for efficient processing.


• Providing backup and recovery services.
• Providing multiple interfaces to different classes of users.
• Representing complex relationships among data.
• Enforcing integrity constraints on the database.
• Drawing inferences and actions from the stored data using deductive
and active rules and triggers.
Additional Implications of Using the Database Approach
• Potential for enforcing standards:
• This is very crucial for the success of database applications in large organizations.
Standards refer to data item names, display formats, screens, report structures, meta-
data (description of data), Web page layouts, etc.
• Reduced application development time:
• Incremental time to add each new application is reduced.
• Flexibility to change data structures:
• Database structure may evolve as new requirements are defined.
• Availability of current information:
• Extremely important for on-line transaction systems such as shopping, airline, hotel, car
reservations.
• Economies of scale:
• Wasteful overlap of resources and personnel can be avoided by consolidating data and
applications across departments.
Historical Development of Database Technology
• Early Database Applications:
• The Hierarchical and Network Models were introduced in mid 1960s and
dominated during the seventies.
• A bulk of the worldwide database processing still occurs using these models,
particularly, the hierarchical model using IBM’s IMS system.
• Relational Model based Systems:
• The relational model was originally introduced in 1970 and was heavily researched
and experimented with at IBM Research and several universities.
• Relational DBMS Products emerged in the early 1980s.
Historical Development of Database Technology (continued)

• Object-oriented and emerging applications:


• Object-Oriented Database Management Systems (OODBMSs) were
introduced in late 1980s and early 1990s to cater to the need of complex data
processing in CAD and other applications.
• Their use has not taken off much.
• Many relational DBMSs have incorporated object database concepts, leading
to a new category called object-relational DBMSs (ORDBMSs)
• Extended relational systems add further capabilities (e.g., for multimedia
data, text, XML, Spatial, temporal, and other data types)
Historical Development of Database Technology (continued)

• Data on the Web and E-commerce Applications:


• Web contains data in HTML (Hypertext markup language) with
links among pages.
• This has given rise to a new set of applications, and E-commerce is
using new standards like XML (eXtensible Markup Language).
• Script programming languages such as PHP and JavaScript allow
generation of dynamic Web pages that are partially generated
from a database.
Extending Database Capabilities (1)
• New functionality is being added to DBMSs in the following areas:
• Scientific Applications – Physics, Chemistry, Biology - Genetics
• Earth and Atmospheric Sciences and Astronomy
• XML (eXtensible Markup Language)
• Image Storage and Management
• Audio and Video Data Management
• Data Warehousing and Data Mining – a very major area for future development
using new technologies
• Spatial Data Management and Location Based Services
• Time Series and Historical Data Management
• The above gives rise to new research and development in incorporating new data types,
complex data structures, new operations and storage and indexing schemes in database
systems.
Extending Database Capabilities (2)
• Background since the advent of the 21st Century:

• First decade of the 21st century has seen tremendous growth in user generated data
and automatically collected data from applications and search engines.

• Social Media platforms such as Facebook and Twitter are generating millions of
transactions a day, and businesses are interested in tapping into this data to “understand”
the users

• Cloud storage, processing power, and backup are making virtually unlimited amounts of
storage available to users and applications, with the ability to process these data,
possibly in real time.
Extending Database Capabilities (3)
• Emergence of Big Data Technologies and NOSQL databases
• New data storage, management and analysis technology was necessary to deal with
the onslaught of data in petabytes a day (10**15 bytes or 1000 terabytes) in some
applications – this came to be commonly called “Big Data”.
• Hadoop (which originated from Yahoo) and the MapReduce programming approach to
distributed data processing (which originated from Google), as well as the Google
File System, have given rise to Big Data technologies. Further enhancements are
taking place in the form of Spark-based technology.
• NOSQL (Not Only SQL, where SQL is the de facto standard language for relational
DBMSs) systems have been designed for rapid search and retrieval from
documents, processing of huge graphs occurring on social networks, and other
forms of unstructured data with flexible models of transaction processing.
Thank you
