Business Intelligence Systems
o Main requirements: database schema, data extraction and loading, database size
Data Warehouse: integrated, subject-oriented, time-variant, and non-volatile collection of
data that supports decision making
Transform operational data to decision support data
o Operational data is in tabular format, with each row representing a single
transaction
o Decision support (DSS) data covers a broader time span and is organized along multiple dimensions
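A minimal sketch of the transformation described above, assuming made-up transaction rows (the field names `date`, `region`, `product`, and `amount` are illustrative, not from the notes). It rolls individual transactions up into one summary row per month and region, which is the kind of broader, dimension-oriented view decision support data provides.

```javascript
// Hypothetical operational data: one row per sales transaction.
const transactions = [
  { date: "2023-01-05", region: "East", product: "Widget", amount: 120 },
  { date: "2023-01-19", region: "East", product: "Gadget", amount: 80 },
  { date: "2023-01-22", region: "West", product: "Widget", amount: 200 },
  { date: "2023-02-03", region: "East", product: "Widget", amount: 50 },
];

// Roll the transactions up to one summary row per (month, region).
function summarize(rows) {
  const totals = new Map();
  for (const row of rows) {
    const month = row.date.slice(0, 7); // keep only "YYYY-MM"
    const key = `${month}|${row.region}`;
    totals.set(key, (totals.get(key) ?? 0) + row.amount);
  }
  return [...totals].map(([key, total]) => {
    const [month, region] = key.split("|");
    return { month, region, total };
  });
}

const summary = summarize(transactions);
console.log(summary);
```

The four transaction rows collapse into three summary rows; the individual transactions are no longer visible, only the totals per month and region.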
ETL (Extract, Transform, Load) Process (check notes)
Star Schemas
o Data modeling technique that maps multidimensional decision support data into
relational data
o Components: facts, dimensions, attributes
o Ex. Sales: a sales fact (e.g., units sold) analyzed by dimensions such as product, location, and time
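The sales example can be sketched in memory as one fact table with foreign keys into two dimension tables. All table names, keys, and values below are hypothetical and chosen only to illustrate the shape of a star schema; a real warehouse would hold these as relational tables.

```javascript
// Dimension tables (hypothetical): descriptive attributes keyed by id.
const productDim = new Map([
  [1, { name: "Widget", category: "Hardware" }],
  [2, { name: "Gadget", category: "Hardware" }],
  [3, { name: "Manual", category: "Books" }],
]);
const timeDim = new Map([
  [10, { year: 2023, quarter: "Q1" }],
  [11, { year: 2023, quarter: "Q2" }],
]);

// Fact table: numeric measures plus a foreign key into each dimension.
const salesFact = [
  { productId: 1, timeId: 10, units: 5, revenue: 500 },
  { productId: 2, timeId: 10, units: 2, revenue: 300 },
  { productId: 3, timeId: 10, units: 4, revenue: 80 },
  { productId: 1, timeId: 11, units: 3, revenue: 300 },
];

// Total revenue per (category, quarter): look up each fact's dimension
// attributes, then aggregate the measure.
function revenueByCategoryAndQuarter(facts) {
  const result = {};
  for (const fact of facts) {
    const { category } = productDim.get(fact.productId);
    const { quarter } = timeDim.get(fact.timeId);
    const key = `${category} ${quarter}`;
    result[key] = (result[key] ?? 0) + fact.revenue;
  }
  return result;
}

const revenue = revenueByCategoryAndQuarter(salesFact);
console.log(revenue);
```

The facts carry the numbers, the dimensions carry the attributes used for grouping; swapping `category` for `name` changes the view without touching the fact table.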
Data Mining
o Use sophisticated statistical and mathematical techniques to find patterns and
relationships
o Analyze data, uncover problems hidden in data relationships and predict business
behavior
o Two modes: guided and automated
Data Mining Phases
o Data prep
Identify data set
Clean data set
Integrate data set
o Data Analysis and classification
Classification analysis
Clustering and sequential analysis
Link analysis
Trend and deviation analysis
o Knowledge acquisition
Select and apply algorithms
Neural networks
Inductive logic
Decision trees
Clustering
Regression tree
Nearest neighbor
Visualization, etc.
o Prognosis
Modeling
Forecasting
Predicting
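The phases above can be sketched end to end on a toy data set. This is only an illustration under stated assumptions: the monthly sales figures are invented, and the "algorithm" is an ordinary least-squares trend line, a simple stand-in rather than one of the techniques listed under knowledge acquisition.

```javascript
// Hypothetical raw data: monthly sales with one missing value.
const raw = [
  { month: 1, sales: 10 },
  { month: 2, sales: 12 },
  { month: 3, sales: null }, // missing reading
  { month: 4, sales: 16 },
  { month: 5, sales: 18 },
];

// Data prep: drop rows whose value is missing.
function clean(rows) {
  return rows.filter((r) => typeof r.sales === "number");
}

// Analysis / knowledge acquisition: fit sales = slope * month + intercept
// by ordinary least squares.
function fitTrend(rows) {
  const n = rows.length;
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (const { month, sales } of rows) {
    sumX += month;
    sumY += sales;
    sumXY += month * sales;
    sumXX += month * month;
  }
  const slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  const intercept = (sumY - slope * sumX) / n;
  return { slope, intercept };
}

// Prognosis: use the fitted model to forecast a future month.
function forecast(model, month) {
  return model.slope * month + model.intercept;
}

const model = fitTrend(clean(raw));
console.log(model, forecast(model, 6));
```

With these invented numbers the cleaned points lie exactly on a line, so the fit is slope 2, intercept 8, and the forecast for month 6 is 20. Real data would not fit this neatly.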
Big Data
o Graph databases: Neo4j
SQL limitations:
o Rigid schema, difficulty adding columns
o JOINs are expensive
o Cannot handle unstructured data
o Not adaptive to new requirements
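To make the schema and JOIN points concrete, here is a sketch of how a document model sidesteps them. The product documents and their field names are hypothetical. The second document adds fields and embeds its reviews, so no column has to be added and no join is needed to read them.

```javascript
// Hypothetical product documents; field names are illustrative only.
const products = [
  { _id: 1, name: "Widget", price: 9.99 },
  // A later document adds fields without any schema change.
  {
    _id: 2,
    name: "Gadget",
    price: 24.5,
    tags: ["new", "sale"],
    reviews: [
      { user: "ana", rating: 5 },
      { user: "ben", rating: 3 },
    ],
  },
];

// Average rating read straight from the embedded reviews.
// Returns null when a document has no reviews.
function averageRating(product) {
  const reviews = product.reviews ?? [];
  if (reviews.length === 0) return null;
  return reviews.reduce((sum, r) => sum + r.rating, 0) / reviews.length;
}

console.log(products.map((p) => [p.name, averageRating(p)]));
```

In a relational design the reviews would usually live in a separate table joined on the product key; the trade-off is that embedded data is harder to query across documents.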
When to use RDBMS
o Centralized application (ERP)
o Moderate to high availability
o Moderate velocity data
o Data from one to few locations
o Primarily structured data
o Complex/nested transactions
o Primary concern: scaling reads
o Scale UP for more data/users
o Maintain moderate data volume with purge
When to use NoSQL
o Decentralized application (IoT, mobile, web)
o Continuous availability, no downtime
o High velocity data (devices, sensors, etc.)
o Data from many locations (variety)
o Structured, semi-structured, and unstructured data
o Simple transactions
o Primary concern: scaling reads AND writes
o Scale OUT for more data/users
o Maintain high data volumes; retain forever
MongoDB
Primary Keys
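o MongoDB automatically creates a PK called _id when a document is inserted without one
o The default _id is a 12-byte ObjectId built from a timestamp, a random value unique to the machine and process (older versions used separate machine id and process id fields), and an incrementing counter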
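A rough sketch of how such an identifier is laid out, assuming the current 12-byte ObjectId format: a 4-byte timestamp in seconds, a 5-byte random value generated once per process (taking the place of the older machine id and process id fields), and a 3-byte counter. The function names are hypothetical and this is not the driver's implementation; `Math.random` stands in for a proper random source.

```javascript
// Per-process state: 5 random bytes and a counter starting at a random value.
const randomPart = Array.from({ length: 5 }, () => Math.floor(Math.random() * 256));
let counter = Math.floor(Math.random() * 0x1000000);

function toHex(byte) {
  return byte.toString(16).padStart(2, "0");
}

// Build a 24-character hex string: timestamp | random value | counter.
function makeObjectIdHex(date = new Date()) {
  const seconds = Math.floor(date.getTime() / 1000);
  const bytes = [
    (seconds >>> 24) & 0xff,
    (seconds >>> 16) & 0xff,
    (seconds >>> 8) & 0xff,
    seconds & 0xff,
    ...randomPart,
    (counter >>> 16) & 0xff,
    (counter >>> 8) & 0xff,
    counter & 0xff,
  ];
  counter = (counter + 1) % 0x1000000; // wrap at 3 bytes
  return bytes.map(toHex).join("");
}

// The creation time can be read back from the first 4 bytes.
function timestampOf(idHex) {
  return new Date(parseInt(idHex.slice(0, 8), 16) * 1000);
}

const id = makeObjectIdHex();
console.log(id, timestampOf(id));
```

Because the timestamp comes first, ids generated later sort after earlier ones at one-second granularity, which is why _id is sometimes used as a rough creation-time ordering.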
Query
o To query all documents: db.products.find()
o To filter a query: db.collection.find({ key1: value })
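To show what an equality filter does, here is a small in-memory stand-in for find(). The `products` array and the `find` helper are hypothetical and cover only exact-match filters; MongoDB's real matching also handles operators, arrays, and nested fields. The comments give the shell calls each line imitates.

```javascript
// Hypothetical in-memory "collection" of product documents.
const products = [
  { _id: 1, name: "Widget", category: "Hardware", price: 10 },
  { _id: 2, name: "Gadget", category: "Hardware", price: 25 },
  { _id: 3, name: "Manual", category: "Books", price: 10 },
];

// Return the documents whose fields equal every key/value pair in the filter.
// An empty filter matches everything, as with db.products.find().
function find(docs, filter = {}) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value)
  );
}

// Shell equivalent: db.products.find()
const all = find(products);

// Shell equivalent: db.products.find({ category: "Hardware" })
const hardware = find(products, { category: "Hardware" });

// Several keys in one filter are combined with AND:
// db.products.find({ category: "Hardware", price: 10 })
const cheapHardware = find(products, { category: "Hardware", price: 10 });

console.log(all.length, hardware.length, cheapHardware.length);
```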