
4.11.2023

Advanced Database Applications

File Organization Terms and Concepts
• Database: Group of related files
• File: Group of records of same type
• Record: Group of related fields
• Field: Group of characters as word(s) or number(s)
• Entity: Person, place, thing on which we store information
• Attribute: Each characteristic, or quality, describing entity


Figure 1.1 The Data Hierarchy

Figure 1.2 Traditional File Processing


Database Management Systems


• Database
  • Serves many applications by centralizing data and controlling redundant data
• Database management system (DBMS)
  • Interfaces between applications and physical data files
  • Separates logical and physical views of data
  • Solves problems of traditional file environment
    • Controls redundancy
    • Eliminates inconsistency
    • Uncouples programs and data
    • Enables organization to centrally manage data and data security

Figure 1.3 Human Resources Database with Multiple Views


Relational DBMS
• Represents data as two-dimensional tables
• Each table contains data on an entity and its attributes
• Table: grid of columns and rows
  • Rows (tuples): Records for different entities
  • Fields (columns): Represent attributes of the entity
  • Key field: Field used to uniquely identify each record
  • Primary key: Field in table used for key fields
  • Foreign key: Primary key used in second table as look-up field to identify records from original table
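
The key concepts above can be sketched with Python's built-in sqlite3 module, used here as a stand-in for a full relational DBMS. The supplier and part tables, their column names, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,   -- primary key of supplier
    supplier_name   TEXT NOT NULL
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,   -- primary key of part
    part_name       TEXT NOT NULL,
    supplier_number INTEGER REFERENCES supplier(supplier_number)  -- foreign key
);
INSERT INTO supplier VALUES (8259, 'CBM Inc.');
INSERT INTO supplier VALUES (8261, 'B. R. Molds');
INSERT INTO part VALUES (137, 'Door latch', 8259);
INSERT INTO part VALUES (145, 'Side mirror', 8261);
""")

# Follow the foreign key from a part to its supplier record.
supplier_key = conn.execute(
    "SELECT supplier_number FROM part WHERE part_number = ?", (137,)
).fetchone()[0]
supplier_name = conn.execute(
    "SELECT supplier_name FROM supplier WHERE supplier_number = ?", (supplier_key,)
).fetchone()[0]

# A primary key must be unique, so a second part 137 is rejected.
try:
    conn.execute("INSERT INTO part VALUES (137, 'Duplicate', 8259)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(supplier_name, duplicate_rejected)
```

The foreign key column in part holds a value that appears as the primary key in supplier, which is what lets the second query look up the matching record.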

Figure 1.4 Relational Database Tables


Operations of a Relational DBMS


• Three basic operations used to develop useful sets of data
  • SELECT
    • Creates subset of data of all records that meet stated criteria
  • JOIN
    • Combines relational tables to provide user with more information than available in individual tables
  • PROJECT
    • Creates subset of columns in table, creating tables with only the information specified
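
A minimal sketch of the three operations, again using sqlite3 in place of a full DBMS and hypothetical part and supplier tables. In SQL, the relational select operation corresponds to the WHERE clause, join to JOIN ... ON, and project to the column list after the SELECT keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE part (part_number INTEGER PRIMARY KEY, part_name TEXT,
                   supplier_number INTEGER);
INSERT INTO supplier VALUES (8259, 'CBM Inc.');
INSERT INTO supplier VALUES (8263, 'Jackson Composites');
INSERT INTO supplier VALUES (8444, 'Bryant Corporation');
INSERT INTO part VALUES (137, 'Door latch', 8259);
INSERT INTO part VALUES (145, 'Side mirror', 8444);
INSERT INTO part VALUES (150, 'Door molding', 8263);
""")

# SELECT: keep only the rows that meet the stated criteria.
selected = conn.execute(
    "SELECT * FROM part WHERE part_number IN (137, 150) ORDER BY part_number"
).fetchall()

# JOIN: combine both tables on the shared supplier_number column.
joined = conn.execute("""
    SELECT p.part_number, p.part_name, s.supplier_name
    FROM part AS p
    JOIN supplier AS s ON p.supplier_number = s.supplier_number
    ORDER BY p.part_number
""").fetchall()

# PROJECT: keep only the columns of interest.
projected = conn.execute(
    "SELECT part_name FROM part ORDER BY part_number"
).fetchall()

print(selected)
print(joined)
print(projected)
```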

Figure 1.5 The Three Basic Operations of a Relational DBMS


Capabilities of Database Management Systems
• Data definition capability
• Data dictionary
• Querying and reporting
  • Data manipulation language
    • Structured Query Language (SQL)
  • Many DBMS have report generation capabilities for creating polished reports (SQL Server)
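
The capabilities above can be illustrated with a short sketch. It uses sqlite3 rather than SQL Server, and the employee table is hypothetical. SQLite exposes column metadata through PRAGMA table_info, which plays the role of a small data dictionary here; SQL Server offers comparable metadata through views such as INFORMATION_SCHEMA.COLUMNS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure of the table.
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salary      REAL
    )
""")

# Data dictionary: read back the stored definitions.
# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(employee)").fetchall()
dictionary = {name: col_type for _, name, col_type, _, _, _ in columns}

# Data manipulation: add a row, then query it with SQL.
conn.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Ana", 52000.0))
rows = conn.execute("SELECT name FROM employee WHERE salary > 50000").fetchall()

print(dictionary)
print(rows)
```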

Figure 1.7 Example of an SQL Query


Designing Databases
• Conceptual design vs. physical design
• Normalization
  • Streamlining complex groupings of data to minimize redundant data elements and awkward many-to-many relationships
• Referential integrity
  • Rules used by RDBMS to ensure relationships between tables remain consistent
• Entity-relationship diagram
• A correct data model is essential for a system serving the business well
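
As a rough sketch of normalization and referential integrity, the example below splits a few unnormalized order rows into separate supplier, part, and line_item tables, then shows the DBMS rejecting a change that would break a relationship. The data and table names are hypothetical, and sqlite3 stands in for a full RDBMS. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

# Unnormalized order lines: part and supplier details repeat on every row.
unnormalized = [
    (1634, 137, "Door latch", 8259, "CBM Inc."),
    (1634, 145, "Side mirror", 8444, "Bryant Corporation"),
    (1635, 137, "Door latch", 8259, "CBM Inc."),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks
conn.executescript("""
CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY,
                       supplier_name TEXT NOT NULL);
CREATE TABLE part (part_number INTEGER PRIMARY KEY,
                   part_name TEXT NOT NULL,
                   supplier_number INTEGER NOT NULL
                       REFERENCES supplier(supplier_number));
CREATE TABLE line_item (order_number INTEGER,
                        part_number INTEGER NOT NULL REFERENCES part(part_number),
                        PRIMARY KEY (order_number, part_number));
""")

# Normalize: store each supplier and each part once, keep only keys per line.
for order_no, part_no, part_name, sup_no, sup_name in unnormalized:
    conn.execute("INSERT OR IGNORE INTO supplier VALUES (?, ?)", (sup_no, sup_name))
    conn.execute("INSERT OR IGNORE INTO part VALUES (?, ?, ?)",
                 (part_no, part_name, sup_no))
    conn.execute("INSERT INTO line_item VALUES (?, ?)", (order_no, part_no))

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

counts = {t: count(t) for t in ("supplier", "part", "line_item")}

# Referential integrity: a supplier that parts still point to cannot be deleted.
try:
    conn.execute("DELETE FROM supplier WHERE supplier_number = 8259")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

print(counts, delete_rejected)
```

Three input rows become two suppliers, two parts, and three line items, so the repeated names are stored only once.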

Figure 1.9 An Unnormalized Relation for Order


Figure 1.10 Normalized Tables Created from Order

Figure 1.11 An Entity-Relationship Diagram


Non-Relational Databases and Databases in the Cloud
• Non-relational databases: “NoSQL”
  • More flexible data model
  • Data sets stored across distributed machines
  • Easier to scale
  • Handle large volumes of unstructured and structured data
• Databases in the cloud
  • Appeal to start-ups, smaller businesses
  • Amazon Relational Database Service, Microsoft SQL Azure
  • Private clouds

The Challenge of Big Data


• Big data
  • Massive sets of unstructured/semi-structured data from web traffic, social media, sensors, and so on
  • Volumes too great for typical DBMS
  • Petabytes, exabytes of data
  • Can reveal more patterns, relationships and anomalies
• Requires new tools and technologies to manage and analyze

Analytical Tools: Relationships, Patterns, Trends
• Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
  • Multidimensional data analysis (OLAP)
  • Data mining
  • Text mining
  • Web mining

Online Analytical Processing (OLAP)


• Supports multidimensional data analysis
  • Viewing data using multiple dimensions
  • Each aspect of information (product, pricing, cost, region, time period) is a different dimension
  • Example: How many washers sold in the East in June compared with other regions?
• OLAP enables rapid, online answers to ad hoc queries
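
The washer question above can be sketched as a query that slices one fact table along different dimensions. The sales figures are invented for illustration, and a plain GROUP BY in sqlite3 stands in for a real OLAP engine, which would typically precompute such aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("washer", "East", "June", 120),
        ("washer", "West", "June", 95),
        ("washer", "East", "May", 80),
        ("dryer", "East", "June", 60),
        ("washer", "North", "June", 70),
        ("washer", "East", "June", 30),
    ],
)

# Fix product and month, then compare along the region dimension.
by_region = dict(conn.execute("""
    SELECT region, SUM(units) FROM sales
    WHERE product = 'washer' AND month = 'June'
    GROUP BY region
""").fetchall())

# Same data, different view: fix product and region, compare along time.
by_month = dict(conn.execute("""
    SELECT month, SUM(units) FROM sales
    WHERE product = 'washer' AND region = 'East'
    GROUP BY month
""").fetchall())

print(by_region)
print(by_month)
```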


Figure 1.13 Multidimensional Data Model

Data Mining
• Finds hidden patterns, relationships in datasets
  • Example: customer buying patterns
• Infers rules to predict future behavior
• Types of information obtainable from data mining:
  • Associations
  • Sequences
  • Classification
  • Clustering
  • Forecasting
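
As one small illustration of the "associations" category, the sketch below counts how often items appear together in a handful of made-up shopping baskets and derives a simple rule strength (confidence). It is a toy calculation under these assumptions, not a full data mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola", "bread"},
    {"chips", "cola"},
]

# How many baskets contain each item, and each pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    frozenset(pair) for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(a, b):
    """Share of all baskets that contain both items."""
    return pair_counts[frozenset((a, b))] / len(baskets)

def confidence(antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    return pair_counts[frozenset((antecedent, consequent))] / item_counts[antecedent]

print(support("chips", "cola"), confidence("chips", "cola"))
```

Here three of the four baskets with chips also contain cola, which is the kind of buying pattern an association rule captures.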


Text Mining and Web Mining


• Text mining
  • Extracts key elements from large unstructured data sets
• Web mining
  • Discovery and analysis of useful patterns and information from the web

Databases and the Web


• Many companies use the web to make some internal databases available to customers or partners
• Typical configuration includes:
  • Web server
  • Application server/middleware/CGI scripts
  • Database server (hosting DBMS)
• Advantages of using the web for database access:
  • Ease of use of browser software
  • Web interface requires few or no changes to database
  • Inexpensive to add web interface to system


Figure 1.14 Linking Internal Databases to the Web

Establishing an Information Policy


• Firm’s rules, procedures, roles for sharing, managing, standardizing data
• Data administration
  • Establishes policies and procedures to manage data
• Data governance
  • Deals with policies and processes for managing availability, usability, integrity, and security of data, especially regarding government regulations
• Database administration
  • Creating and maintaining database


Ensuring Data Quality


• More than 25 percent of critical data in Fortune 1000 company databases are inaccurate or incomplete
• Before new database is in place, a firm must:
  • Identify and correct faulty data
  • Establish better routines for editing data once database is in operation
• Data quality audit
• Data cleansing
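
A data quality audit and a cleansing pass can be sketched in a few lines. The customer records, the field names, and the rules used here (trim whitespace, normalize case, flag empty fields and repeated e-mail addresses) are assumptions for illustration; real audits apply rules specific to the business.

```python
# Hypothetical customer records with typical quality problems.
records = [
    {"id": 1, "name": "Ana Diaz", "email": "ana@example.com"},
    {"id": 2, "name": "  bob lee ", "email": ""},
    {"id": 3, "name": "Ana Diaz", "email": "ANA@EXAMPLE.COM"},
]

def cleanse(record):
    """Return a copy with whitespace collapsed and case normalized."""
    return {
        "id": record["id"],
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }

cleaned = [cleanse(r) for r in records]

# Audit: which records are incomplete, and which repeat an earlier e-mail?
incomplete = [r["id"] for r in cleaned if not r["name"] or not r["email"]]
seen, duplicates = set(), []
for r in cleaned:
    if r["email"] and r["email"] in seen:
        duplicates.append(r["id"])
    seen.add(r["email"])

share_incomplete = len(incomplete) / len(cleaned)
print(cleaned[1]["name"], incomplete, duplicates)
```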

Database Applications

SQL Server


Data Types
Introduction
SQL data types define the type of value that can be stored in a table column. For example, if
you want a column to store only integer values, you can define its data type as INT.

Categories of SQL Server data types


SQL Server supports the following data type categories:
• Exact numeric: bit, tinyint, smallint, int, bigint, decimal, numeric, money, and smallmoney
• Approximate numeric: real and float
• Date and time: date, datetime, datetime2, datetimeoffset, smalldatetime, time
• Character strings: char, varchar, text
• Unicode character strings: nchar, nvarchar, ntext
• Binary strings: binary, image, and varbinary
• Other data types: cursor, hierarchyid, sql_variant, table, rowversion, uniqueidentifier, XML, spatial geometry, and spatial geography


Exact numeric SQL Server data type


We use exact numeric data types for integer, decimal, and money values. Each data type has its own lower and upper limits and storage requirements. We should use the smallest data type that fits the data in order to save storage. For example, we can use the bit data type for storing true (1) or false (0) values.

Data Type | Lower Range | Upper Range | Storage | Remarks
Bit | 0 | 1 | 1 byte | We can also store NULL values in this.
Tinyint | 0 | 255 | 1 byte | We can store whole numbers up to 255 in this data type.
Smallint | -2^15 (-32,768) | 2^15-1 (32,767) | 2 bytes | We can store whole numbers between the lower and upper range.
Int | -2^31 (-2,147,483,648) | 2^31-1 (2,147,483,647) | 4 bytes | It also stores whole numbers, like smallint, but with wider lower and upper limits.
Bigint | -2^63 (-9,223,372,036,854,775,808) | 2^63-1 (9,223,372,036,854,775,807) | 8 bytes | We should use the bigint data type if we cannot accommodate the data in the int data type.
Decimal | -10^38+1 | 10^38-1 | Depends on precision: 1-9 -> 5 bytes; 10-19 -> 9 bytes; 20-28 -> 13 bytes; 29-38 -> 17 bytes | We use the decimal data type for numbers with fixed precision and scale.
Numeric | -10^38+1 | 10^38-1 | Depends on precision: 1-9 -> 5 bytes; 10-19 -> 9 bytes; 20-28 -> 13 bytes; 29-38 -> 17 bytes | Decimal and numeric are synonyms. We can use them interchangeably.
Smallmoney | -214,748.3648 | +214,748.3647 | 4 bytes | We can use this data type for monetary or currency values.
Money | -922,337,203,685,477.5808 | +922,337,203,685,477.5807 | 8 bytes | We can use this data type for monetary or currency values.
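
The integer limits in the table follow directly from the storage size: a signed type with b bits covers -2^(b-1) through 2^(b-1)-1, while tinyint is unsigned and covers 0 through 2^8-1. The sketch below recomputes those ranges and picks the smallest type that fits a value. The helper names are hypothetical, and the code models only the arithmetic, not SQL Server itself.

```python
# Storage size in bytes and signedness for the whole-number types above.
INTEGER_TYPES = {
    "tinyint": (1, False),
    "smallint": (2, True),
    "int": (4, True),
    "bigint": (8, True),
}

def value_range(storage_bytes, signed):
    """Return (lowest, highest) value for an integer of the given size."""
    bits = storage_bytes * 8
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

def smallest_type(value):
    """Name of the smallest listed type whose range contains `value`."""
    for name, (size, signed) in INTEGER_TYPES.items():
        low, high = value_range(size, signed)
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in bigint")

for name, (size, signed) in INTEGER_TYPES.items():
    print(name, value_range(size, signed))
print(smallest_type(200), smallest_type(-5), smallest_type(40000))
```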

Approximate numeric SQL Server data type

Data Type | Lower Range | Upper Range | Storage | Remarks
Real | -3.40E+38 | 3.40E+38 | 4 bytes | We can use float(24) as the ISO synonym for real.
Float(n) | -1.79E+308 | 1.79E+308 | Depends on the value of n: n 1-24 -> 4 bytes; n 25-53 -> 8 bytes | It is an approximate-number data type. The default value of n is 53.
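
To show what "approximate" means in practice, the sketch below round-trips a value through 4-byte and 8-byte binary floating-point formats using Python's struct module, and contrasts the result with exact decimal arithmetic. It assumes IEEE 754 single and double precision as a model for the 4-byte and 8-byte storage sizes above; the helper names are hypothetical.

```python
import struct
from decimal import Decimal

def as_4_byte_float(value):
    """Round-trip through a 4-byte float, similar in size to real."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def as_8_byte_float(value):
    """Round-trip through an 8-byte float, similar in size to float(53)."""
    return struct.unpack("<d", struct.pack("<d", value))[0]

# The 4-byte version of 0.1 differs slightly from the 8-byte version.
small = as_4_byte_float(0.1)
large = as_8_byte_float(0.1)
print(small, large, small == large)

# Binary floating point cannot represent 0.1 or 0.2 exactly ...
print(0.1 + 0.2 == 0.3)
# ... while an exact decimal type keeps the sum exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

This is why exact numeric types such as decimal or money are the usual choice for currency values, while real and float suit measurements where small rounding differences are acceptable.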


Date and Time SQL Server Data types

Data Type | Lower Range | Upper Range | Storage | Remarks
Date | 0001-01-01 | 9999-12-31 | 3 bytes | 1. It stores only dates in SQL Server. 2. Its default value is 1900-01-01. 3. Its default format is YYYY-MM-DD.
Datetime | 1753-01-01 | 9999-12-31 | 8 bytes | 1. We can define a date along with a time with fractional seconds. 2. The default value for this data type is 1900-01-01 00:00:00. 3. It provides accuracy in increments of .000, .003, or .007 seconds. 4. We should avoid using this data type. We can use datetime2 instead.
Datetime2 | 0001-01-01 00:00:00 | 9999-12-31 23:59:59.9999999 | 6-8 bytes: precision < 3 -> 6 bytes; precision 3 or 4 -> 7 bytes; precision 5 to 7 -> 8 bytes | 1. The default format for this is YYYY-MM-DD hh:mm:ss[.fractional seconds]. 2. It provides precision from 0 to 7 digits, with an accuracy of 100 ns. 3. The default precision for datetime2 is 7 digits.
Datetimeoffset | 0001-01-01 00:00:00 | 9999-12-31 23:59:59.9999999 | 10 bytes | 1. It is similar to the datetime2 data type but includes a time zone offset as well. 2. The time zone offset is -14:00 through +14:00.
Smalldatetime | 1900-01-01 00:00:00 | 2079-06-06 23:59:59 | 4 bytes | 1. It defines a date with the time of the day. 2. Its default value is 1900-01-01 00:00:00. 3. It provides an accuracy of one minute.
Time | 00:00:00.0000000 | 23:59:59.9999999 | 5 bytes | 1. We can use it for storing only time data. 2. Its default format is hh:mm:ss[.nnnnnnn]. 3. It provides an accuracy of 100 nanoseconds.
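
The ranges and storage sizes above can be turned into a small helper for choosing a type. This is a sketch in plain Python rather than SQL Server code: the function names are hypothetical, upper limits are simplified to whole seconds or microseconds (Python's datetime does not hold 100 ns fractions), and the 8-byte case for datetime2 is taken to be the top of the 6-8 byte range listed in the table.

```python
from datetime import datetime

# Supported ranges for the combined date-and-time types, per the table above.
RANGES = {
    "datetime": (datetime(1753, 1, 1), datetime(9999, 12, 31, 23, 59, 59)),
    "datetime2": (datetime(1, 1, 1), datetime(9999, 12, 31, 23, 59, 59, 999999)),
    "smalldatetime": (datetime(1900, 1, 1), datetime(2079, 6, 6, 23, 59, 59)),
}

def types_that_fit(value):
    """List the types whose range contains `value`."""
    return [name for name, (low, high) in RANGES.items() if low <= value <= high]

def datetime2_storage_bytes(precision):
    """Storage for datetime2 given its fractional-second precision (0 to 7)."""
    if not 0 <= precision <= 7:
        raise ValueError("precision must be between 0 and 7")
    if precision < 3:
        return 6
    if precision <= 4:
        return 7
    return 8  # remaining precisions, the top of the 6-8 byte range

print(types_that_fit(datetime(1600, 5, 1)))
print(types_that_fit(datetime(2023, 11, 4)))
print([datetime2_storage_bytes(p) for p in range(8)])
```

A date in the year 1600 fits only datetime2, which is one practical reason the table recommends datetime2 over datetime.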

Character Strings SQL Server Data types

Data Type | Lower Range | Upper Range | Storage | Remarks
Char(n) | 0 characters | 8000 characters | n bytes | It provides a fixed-width character data type.
Varchar(n) | 0 characters | 8000 characters | n bytes + 2 bytes | 1. It is a variable-length character data type. 2. n defines the string size.
Varchar(max) | 0 characters | 2^31 characters | n bytes + 2 bytes, up to about 2 GB | We should avoid using this data type unless required, due to its huge storage requirement.
Text | 0 characters | 2,147,483,647 characters | n bytes + 4 bytes | 1. It is a variable-length character data type. 2. We should avoid using this data type as it might get deprecated in future versions of SQL Server.
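
The difference between fixed-width and variable-length storage can be worked through with a short sketch. It assumes a single-byte character set (one byte per character) and models only the sizes given in the table: n bytes for char(n), and the actual length plus 2 bytes for varchar. The helper names are hypothetical.

```python
def char_storage_bytes(n):
    """char(n) always occupies n bytes, whatever the stored value."""
    return n

def varchar_storage_bytes(value):
    """varchar occupies the actual length plus 2 bytes of overhead."""
    return len(value.encode("latin-1")) + 2  # assumes one byte per character

def pad_char(value, n):
    """Model how a fixed-width column pads a short value with spaces."""
    if len(value) > n:
        raise ValueError("value is longer than the column width")
    return value.ljust(n)

code = "abc"
print(repr(pad_char(code, 10)), char_storage_bytes(10))
print(varchar_storage_bytes(code))
```

For a three-character value in a ten-character column, the fixed-width type uses 10 bytes while the variable-length type uses 5, so varchar tends to save space when value lengths vary widely.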


Unicode character string SQL Server data types

Data Type | Lower Range | Upper Range | Storage | Remarks
Nchar | 0 characters | 4000 characters | 2 times n bytes | It is a Unicode string of fixed width.
Nvarchar | 0 characters | 4000 characters | 2 times n bytes | Nvarchar is a Unicode string of variable width.
Ntext | 0 characters | 1,073,741,823 characters | 2 times the string length | 1. It is a variable-length Unicode data type. 2. We should avoid using this data type as it will be deprecated in future SQL Server releases.
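
The "2 times n bytes" rule can be checked by encoding text as UTF-16, which uses two bytes for most characters. This is an illustration in Python, assuming UTF-16 as the Unicode encoding; characters outside the Basic Multilingual Plane, such as many emoji, need four bytes there, so the rule applies per two-byte unit rather than per visible character.

```python
def utf16_bytes(text):
    """Number of bytes needed to encode `text` as UTF-16 (little-endian, no BOM)."""
    return len(text.encode("utf-16-le"))

plain = "naive"
accented = "na\u00efve"          # same word with an accented letter
emoji = "\U0001F600"             # one character outside the BMP

print(utf16_bytes(plain))        # 5 characters, 2 bytes each
print(utf16_bytes(accented))     # still 5 characters, 2 bytes each
print(utf16_bytes(emoji))        # a surrogate pair: 4 bytes
```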

Binary SQL Server data types

Data Type | Lower Range | Upper Range | Storage | Remarks
Binary | 0 bytes | 8000 bytes | n bytes | This data type is a fixed-width binary string.
Varbinary | 0 bytes | 8000 bytes | Actual length + 2 bytes | Its storage is the actual length of the string + 2 bytes.
Image | 0 bytes | 2,147,483,647 bytes | | Avoid using this data type, as it will be deprecated in future SQL Server releases.


Other data types

A few other data types can be used as required:
• Cursor: Useful for variables or stored procedure OUTPUT parameters that reference a cursor.
• Rowversion: Returns automatically generated, unique binary numbers within a database.
• Hierarchyid: A variable-length system data type. We use it to represent a position in a hierarchy.
• Uniqueidentifier: Provides a 16-byte GUID.
• XML: A special data type for storing XML data in SQL Server tables.
• Spatial geometry type: We can use this for representing data in a flat (Euclidean) coordinate system.
• Spatial geography type: We can use this for storing ellipsoidal (round-earth) data, such as GPS latitude and longitude coordinates.
• Table: A special data type useful for storing a result set temporarily in a table-valued function. We can use data from this for later processing. It can be used in functions, stored procedures, and batches.
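
As a brief illustration of the 16-byte GUID mentioned for uniqueidentifier, the sketch below generates a random GUID with Python's uuid module and checks its size. It shows the value format only; how SQL Server generates and stores such values is outside this sketch.

```python
import uuid

guid = uuid.uuid4()            # a randomly generated GUID
raw = guid.bytes               # the underlying binary value
text = str(guid)               # the usual 8-4-4-4-12 hexadecimal form

print(text)
print(len(raw), len(text))
```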
