APPENDIX

Normalization Rules

Normalization is the process of removing data redundancy by implementing normalization
rules. There are five degrees of normal forms, from the first normal form through the fifth
normal form, as described in this appendix.

First Normal Form


The following are the characteristics of first normal form (1NF):

• There must not be any repeating columns or groups of columns. An example of a
repeating column is a customer table with Phone Number 1 and Phone Number 2
columns. Using “table (column, column)” notation, an example of a repeating group of
columns is Order Table (Order ID, Order Date, Product ID, Price, Quantity, Product ID,
Price, Quantity). Product ID, Price, and Quantity are the repeating group of columns.

• Each table must have a primary key (PK) that uniquely identifies each row. The PK
can be a composite, that is, it can consist of several columns, for example, Order Table
(Order ID, Order Date, Customer ID, Product ID, Product Name, Price, Quantity). In
this notation, the PK columns are underlined; in this case, Order ID and Product ID
together form a composite PK.
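
As a minimal sketch of the first rule, the snippet below takes one wide order record with a repeating group and turns each occurrence of the group into its own row, keyed by Order ID and Product ID. The record layout, the numbered column names, and the helper `to_1nf` are hypothetical and only illustrate the idea.

```python
# Hypothetical wide order record with a repeating group (violates 1NF).
wide_order = {
    "order_id": 1001,
    "order_date": "2007-03-01",
    "product_id_1": "P10", "price_1": 9.99, "quantity_1": 2,
    "product_id_2": "P20", "price_2": 4.50, "quantity_2": 1,
}


def to_1nf(order, group_count):
    """Turn each occurrence of the repeating group into its own row."""
    rows = []
    for i in range(1, group_count + 1):
        rows.append({
            "order_id": order["order_id"],
            "order_date": order["order_date"],
            "product_id": order[f"product_id_{i}"],
            "price": order[f"price_{i}"],
            "quantity": order[f"quantity_{i}"],
        })
    return rows


rows_1nf = to_1nf(wide_order, group_count=2)

# The composite PK (order_id, product_id) must be unique across the rows.
keys = [(row["order_id"], row["product_id"]) for row in rows_1nf]
assert len(keys) == len(set(keys))
```

Note that Order Date is now repeated on every row of the same order; removing that redundancy is the job of 2NF.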

Second Normal Form


The following are the characteristics of second normal form (2NF):

• It must be in 1NF.

• When each value in column 1 is associated with a value in column 2, we say that
column 2 is dependent on column 1, for example, Customer (Customer ID, Customer
Name). Customer Name is dependent on Customer ID, noted as Customer ID ➤
Customer Name.


• In 2NF, all non-PK columns must be dependent on the entire PK, not just on part of
it, for example, Order Table (Order ID, Order Date, Product ID, Price, Quantity). The
underlined columns (Order ID and Product ID) are a composite PK. Order Date is
dependent on Order ID but not on Product ID. This violates 2NF.

• To make it 2NF, we need to break it into two tables: Order Header (Order ID, Order
Date) and Order Item (Order ID, Product ID, Price, Quantity). Now all non-PK columns
are dependent on the entire PK. In the Order Header table, Order Date is dependent on
Order ID. In the Order Item table, Price and Quantity are dependent on Order ID and
Product ID. Order ID in the Order Item table is a foreign key.
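
The dependency test and the split described above can be sketched in a few lines. The helper `determines` and the sample rows are hypothetical. A check on sample data can only disprove a dependency, never prove it for all possible data, so treat the result as an illustration rather than a design tool.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows of Order Table with composite PK (order_id, product_id).
order_table = [
    {"order_id": 1, "order_date": "2007-03-01", "product_id": "P10", "price": 9.99, "quantity": 2},
    {"order_id": 1, "order_date": "2007-03-01", "product_id": "P20", "price": 4.50, "quantity": 1},
    {"order_id": 2, "order_date": "2007-03-02", "product_id": "P10", "price": 9.99, "quantity": 5},
]

# Order Date depends on Order ID alone (part of the PK): a 2NF violation.
partial_dependency = determines(order_table, ["order_id"], ["order_date"])

# Quantity needs the whole PK: Order ID alone does not determine it.
quantity_needs_full_pk = not determines(order_table, ["order_id"], ["quantity"])

# Split into the two 2NF tables.
order_header = sorted({(r["order_id"], r["order_date"]) for r in order_table})
order_item = [(r["order_id"], r["product_id"], r["price"], r["quantity"]) for r in order_table]
```

After the split, each Order Date is stored once per order in `order_header`, and `order_item` keeps only the columns that depend on the full key.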

Third Normal Form


The following are the characteristics of third normal form (3NF):

• It must be in 2NF.

• If column 1 is dependent on column 2 and column 2 is dependent on column 3, we
say that column 1 is transitively dependent on column 3. In 3NF, no column is
transitively dependent on the PK, for example, Product (Product ID, Product Name,
Category ID, Category Name). Category Name is dependent on Category ID, and
Category ID is dependent on Product ID. Category Name is transitively dependent
on the PK (Product ID). This violates 3NF.

• To make it 3NF, we need to break it into two tables: Product (Product ID, Product Name,
Category ID) and Category (Category ID, Category Name). Now no column is
transitively dependent on the PK. Category ID in the Product table is a foreign key.
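
A minimal sketch of the Product and Category split, using Python's built-in `sqlite3` module with an in-memory database. The table and column names follow the example above; the sample rows and the renamed category are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the foreign key
conn.executescript("""
    CREATE TABLE category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id  INTEGER NOT NULL REFERENCES category (category_id)
    );
    INSERT INTO category VALUES (1, 'Music'), (2, 'Film');
    INSERT INTO product VALUES (10, 'Album A', 1), (11, 'Album B', 1), (12, 'Movie C', 2);
""")

# Because Category Name is stored once, renaming a category is a one-row update.
conn.execute("UPDATE category SET category_name = 'Music CDs' WHERE category_id = 1")

products_with_category = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM product AS p
    JOIN category AS c ON c.category_id = p.category_id
    ORDER BY p.product_id
""").fetchall()
```

In the unnormalized Product table, the same rename would have to touch every product row in that category, which is the redundancy 3NF removes.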

Boyce-Codd Normal Form


Boyce-Codd Normal Form (BCNF) is between 3NF and 4NF. The following are the
characteristics of BCNF:

• It must be in 3NF.

• In Customer ID ➤ Customer Name, we say that Customer ID is a determinant. In
BCNF, every determinant must be a candidate PK. A candidate PK is a column or set
of columns capable of being a PK; that is, it uniquely identifies each row.

• BCNF is applicable to situations where you have two or more candidate composite
PKs, such as with a cable TV service engineer visiting customers: Visit (Date, Route ID,
Shift ID, Customer ID, Engineer ID, Vehicle ID). A visit to a customer can be identified
using Date, Route ID, and Customer ID as the composite PK. Alternatively, the PK can
be Shift ID and Customer ID. Shift ID is the determinant of Date and Route ID, but
Shift ID on its own is not a candidate PK, so this table is not in BCNF.
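
The Visit example can be checked against sample data as a rough sketch. The helpers `determines` and `is_unique`, the sample rows, and the final split into a Shift table and a slimmer Visit table are all assumptions made for illustration. The split is one possible decomposition, not one given in the text.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, each lhs value maps to exactly one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_unique(rows, cols):
    """Return True if the given columns identify each row uniquely in these rows."""
    keys = [tuple(row[col] for col in cols) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical visits: shift S1 covers two customers on the same date and route.
visits = [
    {"date": "2007-03-01", "route_id": "R1", "shift_id": "S1",
     "customer_id": "C1", "engineer_id": "E1", "vehicle_id": "V1"},
    {"date": "2007-03-01", "route_id": "R1", "shift_id": "S1",
     "customer_id": "C2", "engineer_id": "E1", "vehicle_id": "V1"},
    {"date": "2007-03-02", "route_id": "R1", "shift_id": "S2",
     "customer_id": "C1", "engineer_id": "E2", "vehicle_id": "V2"},
]

# Both composite keys identify a visit in this sample.
key_a_ok = is_unique(visits, ["date", "route_id", "customer_id"])
key_b_ok = is_unique(visits, ["shift_id", "customer_id"])

# Shift ID is a determinant, yet it is not a candidate PK by itself.
shift_is_determinant = determines(visits, ["shift_id"], ["date", "route_id"])
shift_is_candidate_pk = is_unique(visits, ["shift_id"])
violates_bcnf = shift_is_determinant and not shift_is_candidate_pk

# One possible (assumed) decomposition that makes every determinant a key.
shift = sorted({(v["shift_id"], v["date"], v["route_id"]) for v in visits})
visit = [(v["shift_id"], v["customer_id"], v["engineer_id"], v["vehicle_id"]) for v in visits]
```

In the assumed `shift` table, Shift ID is the PK, so the dependency of Date and Route ID on Shift ID no longer breaks the rule.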

Higher Normal Forms


The following are the characteristics of other normal forms:

• A table is in fourth normal form (4NF) when it is in BCNF and there are no nontrivial
multivalued dependencies other than those on a candidate PK.

• A table is in fifth normal form (5NF) when it is in 4NF and there are no cyclic
(join) dependencies other than those implied by the candidate PKs.

It is good practice to apply 4NF and 5NF where they are applicable.
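
To make the idea of a multivalued dependency more concrete, here is a small sketch with a hypothetical table in which each customer has an independent set of phone numbers and an independent set of interests. The table, its columns, and the sample values are assumptions, not taken from the text. Because the two sets are independent, the table has to hold every phone and interest combination per customer, and it can be split into two tables without losing information.

```python
# Hypothetical table (customer_id, phone, interest): phones and interests are
# independent of each other, so every combination appears for each customer.
customer_phone_interest = {
    ("C1", "555-0100", "Music"),
    ("C1", "555-0100", "Film"),
    ("C1", "555-0101", "Music"),
    ("C1", "555-0101", "Film"),
    ("C2", "555-0200", "Music"),
}

# 4NF decomposition: one table per independent multivalued fact.
customer_phone = {(cust, phone) for cust, phone, _ in customer_phone_interest}
customer_interest = {(cust, interest) for cust, _, interest in customer_phone_interest}

# Joining the two tables on customer_id rebuilds the original rows exactly,
# which shows that the decomposition is lossless for this data.
rejoined = {
    (cust, phone, interest)
    for cust, phone in customer_phone
    for cust2, interest in customer_interest
    if cust == cust2
}
assert rejoined == customer_phone_interest
```

Adding a third phone number for customer C1 is one new row in `customer_phone`, whereas the combined table would need one new row per interest.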

■Note A sixth normal form (6NF) has been suggested, but it’s not widely accepted or implemented yet.
recursive queries, 308 risk_level column, 322
referential integrity, 75 ROLAP (relational online analytical
refresh frequency, 313 processing)
reject action (DQ rules), 294–295 applications, 415
relational databases defined, 14, 380
analytics and, 415 roles
defined, 30 defined, 63
extracting. See extracting relational security, 498
databases Ross, Margy, 82
rows, storing historical data as, 80
rules sequential methodology. See waterfall
data quality, 291–293 methodology
DQ, adjusting, 496–497 server+CAL licenses (SQL Server), 119
normalization, 109, 505–507 servers, sizing database, 116–118
rule-based logic, 278 service-oriented architecture (SOA). See SOA
rule category table, 322 (service-oriented architecture)
rule risk level table, 322 share nothing architecture, 175
rule (SQL Server keyword), 322 SharePoint Server, Microsoft Office, 438–439
rule type table, 322 Simon, Alan, 17
RUP (Rational Unified Process) methodology, single customer view, 18, 442–447
56 single DDS architecture example, 33–35
single login requirement, 70
■S sizing database servers, 116–118
sales taxes, 73 SK (surrogate key), defined, 223
SAN (storage area network), 115 slicing, defined (analytics), 413
scale-out deployment, 115 slowly changing dimension (SCD). See SCD
scanned documents, text analytics and, 473 (slowly changing dimension)
SCD (slowly changing dimension) smalldatetime, 131
DDS dimension tables and, 251–265 SMP (symmetric multiprocessing) database
defined, 11 system, 43
fundamentals, 80–82 snapshots
Slowly Changing Dimension Wizard defined, 11
(SSIS), 228–230 report output, 374
schemas, database snowflake schemas
design for campaign delivery/response basics, 7
data, 457–459 benefits of, 89
managing changes to, 501–502 SOA (service-oriented architecture), 26–27
snowflake, 7, 89 soft deletes (records), 184
updating, 501 sorting reports, 351, 354
scoring routines (search applications), 474 source data
scripts, metadata, 326 connecting to, 179–180
scrubbing, data, 277 profiles, 317–318
SCV (single customer view), 442–447 source system metadata
SDLC (system development life cycle). See overview, 302
system development methodology populating, 317
searching purposes of, 313
fundamentals, 25–26 source data profiles, 317–318
search facilities, 474 table components of, 314–317
search interface, 475 source systems
second normal form (2NF), 505 analysis. See data feasibility studies
security functional testing and, 481–482
groups, defined, 498 logic, replicating, 72
management by DWA, 498–499 mapping, 102–106
of MDBs, 397–399 moving data out of, 176
report, managing, 370–372 pushing data from, 270–271
testing, defined, 477 querying, 12
testing, fundamentals, 485–486 spam verdict (e-mail), defined, 98
segmentation specific performance requirements, 483
algorithm, 422 specific permissions (CRM), 450
campaign (CRM), 447–450 specific store view, 160
selection queries (campaigns), 448 spiral methodology. See iterative
self-authentication vs. cookies, 464 methodology
semiadditive aggregate functions, 119 spreadsheets (reports), 357–362
semistructured files, defined, 178 SQL (Structured Query Language)
Send Mail tasks (SSIS), 493 Native Client driver, 412
queries, exploring data with, 357
query formatting, 352 store dimension
statements, upsert using, 235–236 creating, 135
SQL Server designing (Amadeus), 86–87
Analysis Services 2005, 356 structured data, defined, 470
Configuration Manager, 367 structured files, extracting, 177
databases, design of. See physical subscribers
database design subscriber class attribute (Amadeus), 64
Enterprise Edition, 118–119 subscriber profitability, analyzing
licensing, 119 (Amadeus), 64
Management Studio, 232, 363 subscriptions
Notification Services, 438 Communication Subscriptions Fact Table
object catalog views, 311–313, 326 (example), 452
Profiler, performance testing and, 484 managing report, 372–374
Reporting Services. See SSRS (SQL Server membership, 452
Reporting Services) permissions (CRM), 451
system views, naming, 322 sales, analyzing (Amadeus), 63
SSAS (SQL Server Analysis Services) sales data mart (Amadeus), 89–94
data mining in, 20 sales fact table (partitioning), 163–166
KPIs and, 434–437 Subscription Management (Notification
as OLAP tool, 380 Services), 438
SSIS (SQL Server Integration Services) Subscription Processing (Notification
data extraction with, 191–200 Services), 438
failover clusters and, 115 Subscription Sales fact table
logging, 484 (partitioning), 162
packages, simulating incremental load summary tables
with, 70 application performance and, 484
populating dimension table with, 251–265 fundamentals, 161
populating NDS with, 228–235 supplier performance
practical tips, 249–250 analyzing (Amadeus), 64
scheduling cube processing with, 399–404 data mart (Amadeus), 94–95
Send Mail tasks in, 493 SupplyNet system, 44
SSRS (SQL Server Reporting Services) support, types of user, 53
building reports with, 329–330 surrogate keys, defined, 37
charts and tables with, 412 survivorship rules (metadata storage), 22
DQ reports and, 299 symmetric multiprocessing (SMP) database
NLB servers and, 114 system. See SMP (symmetric
report security and, 370–371 multiprocessing) database system
scheduling package (example), 403–404 sys.dm_db_index_physical_stats dynamic
stage data store management function, 500
defined, 30 system architecture design, 42–44
fundamentals, 33 system development methodology
stage loading (populating DW), 215–217 defined, 49
star schemas, 7, 89 iterative methodology, 54–59
statistical analysis, 13 waterfall methodology, 49–53
status columns, 320, 322 systematic comparisons (ETL monitoring),
status of objects, defined, 62 495
status table (ETL process metadata), system_user SQL variable, 325
318–320
steps, ETL, defined, 31 ■T
storage of data table grain, defined, 72
calculating database requirements, 120, table partitioning
123 defined, 10
customer data, 468 maintenance of, 500
estimating, 69 tables
unstructured data, 470 column types in DW, 306
data definition metadata, 303
data mapping, 307 travel industry
data quality audit. See audits, DQ auditing customer analysis and, 461
data quality metadata, 321–322 customer support in, 463
data structure metadata, 309–311, 314–317 treatment, defined (campaigns), 448
DDL of data definition, 305 Trend expression (MDX), 435
DDL of data mapping, 306 triggers
ETL process metadata, 318–320 database, 176
loading DDS fact, 215, 250, 266–269 detecting updates and inserts with, 184
log, data quality. See logging update, 184
naming, 137
normalization rules and, 505–507 ■U
populating DDS dimension, 215, Unicode, 131
250–266 unknown records, defined, 233
source system metadata, 314–317 unsegmented campaigns, 448
structure of stage, 216 unstructured data
updating related, 186 defined, 470
usage log (usage metadata), 325 fundamentals, 24–25
whole table every time extraction method, metadata and, 473
180–181 search facilities and, 475
tabular data, defined, 178 storing, 470
tabular report (example), 330 text analytics and, 471–473
telecommunications industry unstructured files, extracting, 178
customer analysis and, 460 update triggers, 184
customer support in, 463 updating
testing applications, 503
data leaks, 187 batch data, 15–16
database restore, 500 customer data store, 468
end-to-end testing, 487 data warehouse schemas, 501–502
ETL testing, 478–479 ETL process metadata, 327
functional testing, 480 periodic data, 6
performance testing, 482–484 upsert
security testing, 485–486 using Lookup transformation, 236–242
types of, 477–478 using SQL statements, 235–236
user acceptance testing (UAT), 477, usage metadata
486–487 maintaining, 327
waterfall methodology and, 52 overview, 302
text analytics purposes of, 324–325
for recruitment industry, 471 usage log table, 325
transforming documents with, 471–473 usage reports, 332
text data type (data mining), 419, 420 user acceptance testing (UAT)
third normal form (3NF), 506 defined, 477
time fundamentals, 486–487
consolidating data with different ranges, user-facing data store, defined, 30
5 users
excluding in MDM systems, 21 authentication of, 498
timestamps authorizing access of, 498
memorizing last extraction, 200–207 interface, search facility and, 475
reliable, 182 training, 489
transactions utilities industry
database transaction log files, 189 customer analysis and, 460
Transact SQL script, 326 customer support in, 463
transactional systems, 5, 12
transaction fact table, defined, 90 ■V
transaction tables, defined, 36, 106 validations, types of, 291
transition iteration, 56 VAT (value-added tax), 73
trap hit rate (e-mail), 98
views (database object) web services, extracting, 190
conform dimensions, creating, 158 WebTower9 system, 44
data mart view, 158–159 whole table every time extraction method,
defined, 157 180–181
increasing availability with, 160–161 Windows 2003 R2 Datacenter Edition, 118
last month view, 159 Windows 2003 R2 Enterprise Edition (EE),
purposes of, 157 118
specific store view, 160
virtual layers, creating, 158 ■X
virtual layers, creating (views), 158 XML files as source data, 190
volume, disk, 121 XMLA (XML for Analysis)
accessing MDBs with, 412
■W connecting to MDBMS with, 379
waste management, customer analysis and, processing mining models with, 417
461 scripts, backing up MDBs with, 406–408
waterfall methodology, 49–53
web analytics, 15 ■Y
web logs, 189 Year parameter example, 345–346
web parts (SharePoint), defined, 438
