
Data & Business Intelligence

Databases
• A database is a collection of related data that can be stored in a
central location or in multiple locations
• Usually a group of files
• File: Group of records of same type
• Record: Group of related fields
• Field: Group of characters as word(s) or number(s)
• Entity: Person, place, or thing about which information is stored
• Attribute: Each characteristic, or quality, describing an entity
Databases
• Data hierarchy is the structure and organization of data, which
involves fields, records, and files.
• A database management system (DBMS) is software for creating,
storing, maintaining, and accessing database files.
• A DBMS makes using databases more efficient.
Databases

[Figure: Data hierarchy]
Databases
• ADVANTAGES
• Complex requests can be handled more easily.
• Data redundancy is eliminated or minimized.
• Programs and data are independent, so more than one program
can use the same data.
• Data management is improved.
• A variety of relationships among data can be easily maintained.
• More sophisticated security measures can be used.
• Storage space is reduced
Types of Data in a Database
• Internal data
• Collected within organization
• Transaction records, sales records, personnel records
• External data
• Competitors, customers, and suppliers
• Distribution networks
• Economic indicators (e.g., the consumer price index)
• Government regulations
• Labor and population statistics
• Tax records
Methods for Accessing Files
• Sequential file structure
• Records organized and processed in numerical or sequential order
• Organized based on a “primary key”
• Usually used for backup and archive files
• Because they need updating only rarely
• Random access file structure
• Records can be accessed in any order
• Fast and very effective when a small number of records need to be
processed daily or weekly
Methods for Accessing Files
• Indexed sequential access method (ISAM)
• Records accessed sequentially or randomly
• Depending on the number of records being accessed
• Uses an index structure with two parts:
• Indexed value
• Pointer to the disk location of the record matching the indexed
value
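The two-part index described above can be sketched in Python. This is a minimal illustration, not how any particular DBMS implements ISAM: the `records` list stands in for disk storage, list positions stand in for disk pointers, and all names are hypothetical.

```python
import bisect

# Hypothetical "disk": records stored in primary-key order.
records = [
    {"id": 101, "name": "Avery"},
    {"id": 205, "name": "Blake"},
    {"id": 309, "name": "Casey"},
    {"id": 412, "name": "Drew"},
]

# Each index entry has two parts: the indexed value and a pointer
# (here a list position standing in for a disk location).
index_keys = [rec["id"] for rec in records]
index_pointers = list(range(len(records)))


def random_lookup(key):
    """Random access: binary-search the index, then follow the pointer."""
    pos = bisect.bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return records[index_pointers[pos]]
    return None


def sequential_scan():
    """Sequential access: read every record in key order."""
    return [records[ptr] for ptr in index_pointers]
```

Random lookup suits requests for a few records, while the sequential scan suits processing most or all of the file.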
Logical Database Design
• Physical view
• How data is stored on and retrieved from storage media
• Logical view
• How information appears to users
• How it can be organized and retrieved
• Can be more than one logical view
Logical Database Design
• Data model
• Determines how data is created, represented, and organized
• Data structure—Describes how data is organized and the
relationship among records
• Operations—Describe methods, calculations that can be
performed on data, such as updating and querying data
• Integrity rules—Define the boundaries of a database, such as
maximum and minimum values allowed for a field, constraints
and access methods
Logical Database Design
• Hierarchical model
• Relationships between records form a treelike structure
• Records are called nodes
• Relationships among records are called branches.
• The node at the top is called the root
• Every other node (called a child) has a parent.
• Nodes with the same parents are called twins or siblings.
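The tree vocabulary above (root, parent, child, siblings) can be illustrated with a small Python sketch. The `Node` class and the supplier/product records are hypothetical and only show the shape of the structure.

```python
class Node:
    """A record in a hierarchical model: one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # the branch from parent to child

    def siblings(self):
        """Nodes that share this node's parent (twins or siblings)."""
        if self.parent is None:
            return []
        return [child for child in self.parent.children if child is not self]


root = Node("Supplier")              # the root has no parent
product_a = Node("Product A", root)  # children of the root
product_b = Node("Product B", root)
```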
Logical Database Design

[Figure: Hierarchical model]
Logical Database Design
• Network model
• Similar to the hierarchical model
• Records are organized differently
• Each record in the network model can have multiple parent and
child records
Logical Database Design

[Figure: Network model]
Logical Database Design
• Relational model
• Uses a two-dimensional table of rows and columns of data
• Data dictionary
• Field name—Student name, admission date, age, and major
• Field data type—Character (text), date, and number
• Default value—The value entered if none is available; for example,
if no major is declared, the value is “undecided.”
• Validation rule—A rule determining whether a value is valid; for
example, a student’s age cannot be a negative number.
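A data dictionary entry of this kind maps naturally onto a table definition. The sketch below uses Python's built-in `sqlite3` module; the `student` table and its column names are assumptions chosen to mirror the examples above, with `DEFAULT` standing in for the default value and `CHECK` for the validation rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        name           TEXT NOT NULL,             -- field name and data type
        admission_date TEXT,                      -- dates stored as ISO text here
        age            INTEGER CHECK (age >= 0),  -- validation rule
        major          TEXT DEFAULT 'undecided'   -- default value
    )
    """
)

# No major supplied, so the default value is used.
conn.execute(
    "INSERT INTO student (name, admission_date, age) VALUES ('Mary', '2024-09-01', 19)"
)
default_major = conn.execute(
    "SELECT major FROM student WHERE name = 'Mary'"
).fetchone()[0]

# A negative age violates the validation rule and is rejected.
try:
    conn.execute("INSERT INTO student (name, age) VALUES ('Bad', -5)")
    negative_age_rejected = False
except sqlite3.IntegrityError:
    negative_age_rejected = True
```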
Logical Database Design
• Relational model
• Each table contains data on an entity and its attributes
• Table: grid of columns and rows
• Rows (tuples): Records for different entities
• Fields (columns): Represent attributes of an entity
• Key field: Field used to uniquely identify each record
• Primary key: Key field chosen to uniquely identify each record in a table
• Foreign key: Primary key of one table used in a second table as a
look-up field to identify records from the original table
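As a rough illustration of primary and foreign keys, the sketch below uses Python's built-in `sqlite3` module. The `customer` and `invoice` tables are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other DBMS products behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),  -- foreign key
        amount      REAL
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 250.0)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a key rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second customer with the same primary key is rejected.
duplicate_key_ok = try_insert("INSERT INTO customer VALUES (1, 'Duplicate')")
# An invoice pointing at a customer that does not exist is rejected.
orphan_invoice_ok = try_insert("INSERT INTO invoice VALUES (11, 99, 75.0)")
```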
Logical Database Design

[Figure: Relational database tables]
Logical Database Design
• Relational model
• Normalization improves database efficiency by eliminating
redundant data and ensuring that only related data is stored in a
table.
• Data retrieval
• Three basic operations used to develop useful sets of data
• SELECT: Creates a subset containing all records that meet stated
criteria
• JOIN: Combines relational tables to provide user with more
information than available in individual tables
• PROJECT: Creates subset of columns in table, creating tables with
only the information specified
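The three retrieval operations can be sketched in plain Python over lists of dictionaries. The `students` and `grades` tables and the function names are hypothetical, and this is a teaching sketch rather than how a DBMS executes these operations.

```python
students = [
    {"student_id": 1, "name": "Mary", "major": "MIS"},
    {"student_id": 2, "name": "Sue", "major": "ACC"},
    {"student_id": 3, "name": "Tom", "major": "MIS"},
]
grades = [
    {"student_id": 1, "gpa": 3.7},
    {"student_id": 2, "gpa": 3.1},
    {"student_id": 3, "gpa": 2.9},
]


def select(table, predicate):
    """SELECT: keep only the rows that meet the stated criteria."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """PROJECT: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in table]


def join(left, right, key):
    """JOIN: combine rows from two tables that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


mis_students = select(students, lambda row: row["major"] == "MIS")
names_only = project(students, ["name"])
with_gpa = join(students, grades, "student_id")
```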
Logical Database Design

[Figure: Relational database tables]
Components of a DBMS
• Database Engine
• Heart of DBMS software
• Responsible for data storage, manipulation, and retrieval
• Converts logical requests from users into their physical equivalents
• Data Definition
• Create and maintain the data dictionary
• Define the structure of files in a database
• Adding fields
• Deleting fields
• Changing field size
• Changing data type
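As one example of a data definition task, the sketch below adds a field to an existing table using Python's built-in `sqlite3` module. The `employee` table is hypothetical. Support for deleting fields or changing a field's size or data type varies by DBMS, so only adding a field is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT)")


def column_names(table):
    """List a table's fields; PRAGMA table_info puts the name at index 1."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


before = column_names("employee")
conn.execute("ALTER TABLE employee ADD COLUMN title TEXT")  # adding a field
after = column_names("employee")
```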
Components of a DBMS
• Data Manipulation
• Add, delete, modify, and retrieve records from a database
• Query language
• Structured Query Language (SQL)
• Standard fourth-generation query language used by many
DBMS packages
• The basic format of an SQL query is as follows:
SELECT field FROM table or file WHERE conditions
SELECT NAME, EMPLOYEE.SSN, TITLE, GENDER, SALARY
FROM EMPLOYEE, PAYROLL
WHERE EMPLOYEE.SSN = PAYROLL.SSN AND
TITLE = 'ENGINEER'
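To try a query of this shape, the following sketch builds two small tables with Python's built-in `sqlite3` module. The column layout of `EMPLOYEE` and `PAYROLL` and the sample rows are assumptions made for illustration. `SSN` is qualified with a table name in the query because it appears in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, NAME TEXT, TITLE TEXT, GENDER TEXT);
    CREATE TABLE PAYROLL  (SSN TEXT PRIMARY KEY, SALARY REAL);
    INSERT INTO EMPLOYEE VALUES ('111', 'Lee', 'ENGINEER', 'F');
    INSERT INTO EMPLOYEE VALUES ('222', 'Kim', 'ANALYST', 'M');
    INSERT INTO PAYROLL VALUES ('111', 95000);
    INSERT INTO PAYROLL VALUES ('222', 70000);
    """
)

# Join the two tables on SSN and keep only the engineers.
rows = conn.execute(
    """
    SELECT NAME, EMPLOYEE.SSN, TITLE, GENDER, SALARY
    FROM EMPLOYEE, PAYROLL
    WHERE EMPLOYEE.SSN = PAYROLL.SSN AND TITLE = 'ENGINEER'
    """
).fetchall()
```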
Components of a DBMS
• Data Manipulation
• Query by example (QBE)
• Construct a query statement from query forms
• Graphical interface
• Fine-tune the query with AND, OR, and NOT operators
• AND: All conditions must be met.
• OR: At least one of the conditions must be met.
• NOT: Searches for records that do not meet the condition.
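The effect of the three operators can be seen on a small list of records. This Python sketch uses hypothetical employee data and is not tied to any QBE tool.

```python
employees = [
    {"name": "Lee", "title": "ENGINEER", "salary": 95000},
    {"name": "Kim", "title": "ANALYST", "salary": 70000},
    {"name": "Ray", "title": "ENGINEER", "salary": 60000},
]

# AND: both conditions must be met.
both = [e["name"] for e in employees
        if e["title"] == "ENGINEER" and e["salary"] > 80000]

# OR: at least one of the conditions must be met.
either = [e["name"] for e in employees
          if e["title"] == "ENGINEER" or e["salary"] > 80000]

# NOT: records that do not meet the condition.
negated = [e["name"] for e in employees if not e["title"] == "ENGINEER"]
```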
Components of a DBMS
• Application Generation
• Design elements of an application using a database
• Data entry screens
• Interactive menus
• Interfaces with other programming languages
• Data Administration
• Used for backup and recovery, security, and change management
• Determines who has permission to create, read, update, and delete (CRUD) data
Components of a DBMS
• Data Administration
• Database administrator (DBA): Individual or department responsible for:
• Designing and setting up a database
• Establishing security measures to determine users’ access
rights
• Developing recovery procedures in case data is lost or
corrupted
• Evaluating database performance
• Adding and finetuning database functions
Recent Trends in Database Design and Use
• Data-driven Web sites
• Interface to a database
• Retrieves data and allows users to enter data
• Improves access to information
• Useful for:
• E-commerce sites that need frequent updates
• News sites that need regular updating of content
• Forums and discussion groups
• Subscription services, such as newsletters
Recent Trends in Database Design and Use
• Distributed database system
• Data is stored on multiple servers placed throughout an
organization
• Reasons for choosing
• Decrease response time/network traffic
• Minimize effect of computer failures
• Small integrated systems may cost less than one large server
Recent Trends in Database Design and Use
• Distributed database system
• Approaches for setup
• Fragmentation: how tables are divided among multiple
locations.
• Horizontal fragmentation breaks a table into groups of rows,
storing all fields (columns) of each group in different locations.
• Vertical fragmentation stores a subset of columns in
different locations.
• Mixed fragmentation, which combines vertical and
horizontal fragmentation, stores only site-specific data in
each location.
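The three fragmentation approaches can be sketched on an in-memory table. The `customers` data, the site names, and the helper functions are hypothetical. Keeping the key column in every vertical fragment is an assumption made here so that fragments can be matched up again.

```python
customers = [
    {"id": 1, "name": "Acme", "region": "East", "credit_limit": 5000},
    {"id": 2, "name": "Bolt", "region": "West", "credit_limit": 8000},
    {"id": 3, "name": "Core", "region": "East", "credit_limit": 3000},
]


def horizontal(table, predicate):
    """Horizontal fragment: a subset of rows, with all columns kept."""
    return [dict(row) for row in table if predicate(row)]


def vertical(table, columns, key="id"):
    """Vertical fragment: a subset of columns, plus the key column."""
    keep = [key] + [col for col in columns if col != key]
    return [{col: row[col] for col in keep} for row in table]


east_site = horizontal(customers, lambda row: row["region"] == "East")
finance_site = vertical(customers, ["credit_limit"])
# Mixed: only the East rows, and only the columns that site needs.
east_finance_site = vertical(east_site, ["credit_limit"])
```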
Recent Trends in Database Design and Use
• Distributed database system
• Approaches for setup
• Replication: each site stores a copy of the data in the
organization’s database.
• Allocation: each site stores the data it uses most often.
• Security issues arise because of multiple access points from both
inside and outside the organization.
• Security policies, scope of user access, and user privileges must
be clearly defined, and authorized users must be identified.
Recent Trends in Database Design and Use
• Object-oriented database
• This data model represents real-world entities with database
objects.
• An object consists of attributes (characteristics describing an
entity) and methods (operations or calculations) that can be
performed on the object’s data.
Recent Trends in Database Design and Use
• Object-oriented database
• Encapsulation: Grouping objects along with their attributes and
methods into a class
• Inheritance: New objects can be created faster and more easily by
entering new data in attributes
• Interaction with an object-oriented database takes place via
methods
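These ideas map loosely onto classes in an object-oriented language. The Python sketch below uses hypothetical `Vehicle` and `Truck` classes. It illustrates attributes, methods, encapsulation, and inheritance in general, and is not the interface of any particular object-oriented DBMS.

```python
class Vehicle:
    """Encapsulation: attributes and the methods that use them live together."""

    def __init__(self, vin, price):
        self.vin = vin      # attribute
        self.price = price  # attribute

    def add_tax(self, rate):
        """Method: a calculation performed on the object's own data."""
        return round(self.price * (1 + rate), 2)


class Truck(Vehicle):
    """Inheritance: reuses Vehicle's attributes and methods, adds one more."""

    def __init__(self, vin, price, payload_tons):
        super().__init__(vin, price)
        self.payload_tons = payload_tons


truck = Truck("VIN123", 40000.0, 2.5)
price_with_tax = truck.add_tax(0.08)  # interaction happens through a method
```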
Recent Trends in Database Design and Use
• Cloud Databases
• Special appeal for businesses seeking database capabilities at a
lower cost than in-house database products.
• Microsoft Azure SQL Database
Recent Trends in Database Design and Use
• Blockchain
• Distributed database technology to create and verify transactions
on a network nearly instantaneously without a central authority.
• Distributed ledgers in a peer-to-peer distributed database
• Maintains a growing list of records and transactions shared by all participants
• Encryption used to identify participants and transactions
• Used for financial transactions, supply chain, and medical records
• Foundation of Bitcoin and other cryptocurrencies
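The idea of a growing, tamper-evident list of records can be sketched with a simple hash chain. This is a deliberately simplified illustration using Python's `hashlib`: it omits the peer-to-peer network, consensus, and participant signatures that real blockchains rely on, and every name in it is hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a record that is linked to the block before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "data": data, "prev_hash": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain):
    """Recompute every hash and link; any edited block breaks the chain."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
    return True


ledger = []
add_block(ledger, "Alice pays Bob 5")
add_block(ledger, "Bob pays Carol 2")
valid_before = is_valid(ledger)
ledger[0]["data"] = "Alice pays Bob 500"  # tamper with an earlier record
valid_after = is_valid(ledger)
```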
Recent Trends in Database Design and Use

[Figure: How blockchain works]
Business Intelligence Infrastructure
• Array of tools for obtaining information from separate systems and
from big data
• Data warehouse
• Stores current and historical data from many core operational
transaction systems
• Supports decision-making applications and generates business
intelligence
• Stores multidimensional data, organized into hypercubes
Business Intelligence Infrastructure
• Data warehouse
• Characteristics
• Subject oriented: Focused on a specific area
• Integrated: Comes from a variety of sources
• Time variant: Categorized based on time
• Type of data: Captures aggregated data
• Purpose: Used for analytical purposes
Business Intelligence Infrastructure
• Data warehouse
• Components
• Input: External data sources, databases, transaction files, ERP
systems, and CRM systems
• Extraction, transformation, and loading (ETL)
• Extraction
• Collecting data from a variety of sources
• Transformation processing
• Make sure data meets the data warehouse’s needs
• Loading
• Process of transferring data to the data warehouse
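The three ETL steps can be sketched end to end in a few lines. In this Python illustration, the source rows, the cleaning rules, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source systems with inconsistent formatting.
crm_rows = [{"customer": " acme ", "amount": "100.50"},
            {"customer": "Bolt", "amount": "n/a"}]
erp_rows = [{"customer": "ACME", "amount": "49.50"}]


def extract():
    """Extraction: collect data from a variety of sources."""
    return crm_rows + erp_rows


def transform(rows):
    """Transformation: standardize names and drop rows with unusable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        clean.append((row["customer"].strip().upper(), amount))
    return clean


def load(conn, rows):
    """Loading: transfer the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
).fetchall()
```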
Business Intelligence Infrastructure
• Data warehouse
• Components
• Storage
• Raw data: information in its original form
• Summary data: gives users subtotals of various categories
• Metadata: information about data—its content, quality,
condition, origin, and other characteristics.
Business Intelligence Infrastructure
• Data warehouse
• Components
• Output
• Online analytical processing (OLAP)
• Generates business intelligence
• Uses multiple sources of information and provides
multidimensional analysis
• Hypercube
• Drill down and drill up
• Data-mining analysis
• Discover patterns and relationships
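Drilling up and down amounts to aggregating the same facts at coarser or finer levels of a dimension. The Python sketch below uses hypothetical sales facts and a generic `aggregate` helper. Real OLAP tools precompute and index such summaries, which this example does not attempt.

```python
from collections import defaultdict

# Hypothetical facts with three dimensions (year, quarter, region) and one measure.
sales = [
    {"year": 2023, "quarter": "Q1", "region": "East", "amount": 100},
    {"year": 2023, "quarter": "Q1", "region": "West", "amount": 150},
    {"year": 2023, "quarter": "Q2", "region": "East", "amount": 120},
    {"year": 2024, "quarter": "Q1", "region": "East", "amount": 130},
]


def aggregate(facts, dimensions):
    """Total the measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[dim] for dim in dimensions)
        totals[key] += fact["amount"]
    return dict(totals)


by_year = aggregate(sales, ["year"])                     # drilled up
by_year_quarter = aggregate(sales, ["year", "quarter"])  # drilled down
```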
Business Intelligence Infrastructure
• Data warehouse
• Components
• Output
• Reports
• Cross-reference segments of an organization’s
operations for comparison purposes
• Find patterns and trends that can’t be found with
databases
• Analyze large amounts of historical data quickly
Business Intelligence Infrastructure
• Data mart
• Smaller version of data warehouse
• Used by single department or function
• Advantages over data warehouses: typically faster and less expensive to build
• Drawback: more limited scope than data warehouses
Business Intelligence Infrastructure
• Business analytics
• Uses data and statistical methods to gain insight into the data
• Provide decision makers with information they can act on
• Leverages and explores the data in a database, data warehouse, or
data mart system
• Descriptive analytics
• Reviews past events
• Analyzes the data
• Provides a report indicating what happened in a given period
and how to prepare for the future
• It is a reactive strategy.
Business Intelligence Infrastructure
• Business analytics
• Predictive analytics
• It is a proactive strategy
• It prepares a decision maker for future events
• Prescriptive analytics
• Recommending a course of action that a decision maker
should follow
• Shows the likely outcome of each decision
• Amazon Analytics, Google Analytics, and Twitter Analytics
• Web analytics: measures the efficiency and effectiveness of a Web site
• Mobile analytics: measures traffic among mobile devices and all
the apps used by these mobile devices
Big Data Era
• The Challenge of Big Data
• Massive sets of unstructured/semi-structured data from web
traffic, social media, sensors, and so on
• Can reveal more patterns, relationships and anomalies
• Requires new tools and technologies to manage and analyze
• Volume: Petabytes, exabytes of data
• Variety: Structured and unstructured data
• Velocity: Speed of gathering and processing data
• Veracity: Trustworthiness and accuracy of data
• Value: Value for the decision making process
Big Data Era
• Benefits from Big Data
• Retail
• Financial services
• Advertising and public relations
• Government
• Manufacturing
• Media and telecommunications
• Energy
• Healthcare
Big Data Era
• Tools and Technologies of Big Data
• Open-source Apache Hadoop
• Hadoop Distributed File System (HDFS) to manage storage.
• Distributed databases, including NoSQL databases such as Cassandra
• Examples of big data commercial platforms
• SAP Big Data Analytics (www.sap.com/BigData)
• Tableau (www.tableausoftware.com)
• SAS Big Data Analytics (www.sas.com/big-data)
• QlikView (www.qlikview.com).
Big Data Era
• Big Data Privacy Risks
• Discrimination
• Privacy breaches and embarrassments
• Unethical actions based on interpretations
• Loss of anonymity
Database Marketing
• Uses an organization’s database of customers and potential customers
to promote the products or services that the organization offers.
• Implement marketing strategies that eventually increase profits
and enhance competitiveness
• Use multivariate analysis, data segmentation, and automated
tools
• Loyalty programs, such as grocery chain club cards, airline mileage
programs, My Starbucks Rewards.
Database Marketing
• Successful database marketing campaigns rely on:
• Calculating customer lifetime value (CLTV): what the lifetime
relationship of a typical customer will be worth to a business.
• Recency, frequency, and monetary analysis (RFM): how
valuable a customer is, based on the recency of purchases,
frequency of purchases, and how much the customer spends.
• Customer communications: techniques for communicating
effectively with customers, which increase loyalty, customer
retention, and sales.
• Analytical software: tools for monitoring
customers’ behavior across a number of retail channels,
including Web sites, mobile apps, and social media.
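The three RFM measures can be computed directly from a purchase history. In this Python sketch, the purchase records, the `as_of` date, and the output field names are hypothetical. How the raw measures are turned into scores or segments is left out, since that varies by organization.

```python
from datetime import date

# Hypothetical purchase history: (customer, purchase date, amount spent).
purchases = [
    ("C1", date(2024, 6, 1), 40.0),
    ("C1", date(2024, 6, 20), 60.0),
    ("C2", date(2024, 1, 15), 500.0),
]


def rfm(purchases, as_of):
    """Compute recency (days since last purchase), frequency, and monetary total."""
    summary = {}
    for customer, day, amount in purchases:
        entry = summary.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], day)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in summary.items()
    }


scores = rfm(purchases, as_of=date(2024, 6, 30))
```

Here the first customer bought recently and often but spent little, while the second made a single large purchase months earlier.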
