
Data Strategy in Practice

Sid Adelman & Associates
sidadelman@aol.com
818.783.9634
Data Strategy
 Module 1 – Introduction to Data Strategy
 Module 2 – Data Quality
 Module 3 – Metadata
 Module 4 – Organization, Roles & Responsibilities
 Module 5 – Security & Privacy
 Module 6 – Business Intelligence
 Module 7 – Information Integration
 Module 8 – Software/Products
 Module 9 – Performance & Measurement

Copyright Sid Adelman, 2007 2


Module 1 – Introduction to Data
Strategy
 Components of a data strategy
 Why have a data strategy
 Do these problems exist in your organization?
 Gain control
 Support the IT strategy
 Data in the Dark Ages
 Enlightened data strategy
 Critical success factors
 How to implement a data strategy
 Best Practices



Components of a Data Strategy +
 RDBMS - Relational Database Management
System
 Data Quality
 Metadata
 Performance
 Data Distribution
 Organization
 Data Ownership



Components of a Data Strategy +
 Security and Privacy
 Total Cost of Ownership
 Subject area databases
 Data modeling
 Data sharing
 Business Intelligence
 Information integration



Components of a Data Strategy +
 Legacy/operational data
 Standards
 Data migration
 Application packages
 Software/products
 Personal/departmental databases



Components of a Data Strategy
 Categorization of data
 Communicating and selling the data
strategy
 Measurement



Why Have a Data Strategy
 Capitalize on the data asset
 Support the IT Strategy
 Gain control



Do these problems exist in your
organization? +
 Uncontrolled redundant data
 Data not easily accessible by the user
 Lack of knowledge of available data
 Poor data quality
 Each new application designs, builds and
populates its own database
 Inconsistent reports



Do these problems exist in your
organization?
 Private databases
 No central metadata repository
 Management unclear on the importance of
data
 No responsibility for data
 Data standards nonexistent, not understood
or not followed



Gain Control
 Consistent security implementation
 Understand, define and assign ownership
 Understand, define and assign stewardship
 Minimize redundancy
 Inventory data
 Develop consistent terminology



Support the IT Strategy
 Provide departments, projects and personnel with
guidelines for storing and accessing data
 Minimize the number of RDBMSs
 Establish, disseminate and maintain standards for
shared data resources
 Deliver a high level of service
– performance
– availability
– response time
– responsiveness to user requests



Data in the Dark Ages
 Data is kept locked by each application or
department
 Users do not trust the data
 Data is not well understood either by users
or by IT
 Data is difficult to access
 Senior Management does not understand
the value of data



Enlightened Organization
 Data is shared
 Users trust the accuracy of the data
 Data is inventoried and terminology is clear
 Data is easily accessed by IT and by the users
 Senior Management view data as an asset that
is critical to the organization and to decision
making



Critical Success Factors
 Data Strategy supports IT plans
 Quality data
 Support of legacy data
 Support of development efforts
 Infrastructure
– Organization
– Skills
– Tools
 Achieve short-term successes



How to Implement a Data
Strategy
 Data environment assessment
 Establish a target data environment
 Develop an implementation plan
 Sell Data Strategy within the organization
 Evaluate progress and justify your
existence
 Revisit the plan



Best Practices
 Don’t get into the details too soon
 Don’t be seen as a theorist -- your actions
must be pragmatic
 Don’t lead with long-term deliverables
 Don’t commit more than you can deliver
 Avoid unproven technology



Module 1 Workshop
Assessment of Existing Organization



Module 2 – Data Quality
 Management Support
 Evaluation/Diagnosis
 Timeliness
 ETL Validation
 Prioritization - Which Data to Clean First
 Cost of Cleansing
 Responsibility for Data Quality



Management Support
 Management awareness of importance of
data quality
 Cost justification of data quality initiative
 Ongoing commitment
 Finding a business management sponsor



Evaluation/Diagnosis
 Which source data is most correct
 Valid values (domains)
 Business rules
 Data types (e.g., hex, packed decimal)
 Completeness
 Inappropriate defaults
 Fields used for multiple purposes
 Accuracy
 Quality of historical data
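
A few of the checks above (valid values, completeness, inappropriate defaults) can be sketched in a few lines of Python. This is only an illustration: the function name `profile_column`, the sample state codes, and the choice of "XX" as a filler default are assumptions, not part of the course material.

```python
from collections import Counter


def profile_column(values, valid_values=None, suspect_defaults=()):
    """Return simple quality measures for one column of source data."""
    def is_missing(v):
        return v is None or str(v).strip() == ""

    total = len(values)
    present = [v for v in values if not is_missing(v)]
    # Values outside the agreed domain (only checked when a domain is supplied)
    invalid = [v for v in present
               if valid_values is not None and v not in valid_values]
    # Values that pass the domain check but look like data-entry filler
    default_hits = sum(1 for v in present if v in suspect_defaults)
    return {
        "completeness": len(present) / total if total else 0.0,
        "invalid_values": Counter(invalid),
        "suspect_default_count": default_hits,
    }


# Hypothetical sample: a blank, a null, an out-of-domain code and a filler code
states = ["CA", "NY", "", "ZZ", "CA", "XX", None, "TX"]
report = profile_column(
    states,
    valid_values={"CA", "NY", "TX", "XX"},
    suspect_defaults={"XX"},
)
print(report)
```

Running the same profile against each candidate source is one way to compare which source data is most correct.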



Data Timeliness
 Currency of data, e.g., last Friday
 Frequency of update, e.g., daily, weekly,
monthly, quarterly
 User awareness – how will the users know?



ETL Validation
 Validation of ETL process
 Tie-outs
– Number of records
– Dollar matching
– Quantitative matching
 Automatic versus manual checking
 Referential integrity?
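
An automatic tie-out on record counts and dollar totals might look like the sketch below. The function name `tie_out`, the `amount` field, and the sample rows are hypothetical; a real check would read from the actual source and target tables.

```python
from decimal import Decimal


def tie_out(source_rows, target_rows, amount_field="amount"):
    """Compare record counts and dollar totals between source and target."""
    def total(rows):
        # Decimal avoids floating-point rounding when matching dollar amounts
        return sum((Decimal(str(r[amount_field])) for r in rows), Decimal("0"))

    src_total, tgt_total = total(source_rows), total(target_rows)
    return {
        "record_count_matches": len(source_rows) == len(target_rows),
        "record_count_diff": len(target_rows) - len(source_rows),
        "amount_matches": src_total == tgt_total,
        "amount_diff": tgt_total - src_total,
    }


# Hypothetical load in which one source record never reached the target
source = [
    {"id": 1, "amount": "100.10"},
    {"id": 2, "amount": "250.00"},
    {"id": 3, "amount": "49.90"},
]
target = [
    {"id": 1, "amount": "100.10"},
    {"id": 2, "amount": "250.00"},
]
result = tie_out(source, target)
print(result)
```

A non-zero difference on either measure signals that the ETL run should be investigated before users see the data.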



Triage - Prioritization
 Which data to clean
 Justification for cleansing
 Ease of cleansing
 Possibility of cleansing
 Political support for cleansing
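
One possible way to turn these triage questions into a ranked list is sketched below. The 1-to-5 scales, the equal weighting, and the class name `CleansingCandidate` are all assumptions made for illustration; an organization would choose its own criteria and weights.

```python
from dataclasses import dataclass


@dataclass
class CleansingCandidate:
    name: str
    business_value: int     # 1 (low) to 5 (high): justification for cleansing
    ease: int               # 1 (hard) to 5 (easy)
    political_support: int  # 1 (none) to 5 (strong)
    cleansable: bool        # False when correct values cannot be recovered


def triage(candidates):
    """Drop data that cannot be cleansed, then rank the rest by total score."""
    feasible = [c for c in candidates if c.cleansable]
    return sorted(
        feasible,
        key=lambda c: c.business_value + c.ease + c.political_support,
        reverse=True,
    )


# Hypothetical candidates
candidates = [
    CleansingCandidate("customer address", 5, 3, 4, True),
    CleansingCandidate("legacy order codes", 4, 1, 2, True),
    CleansingCandidate("1990s call notes", 2, 1, 1, False),
]
ranked = [c.name for c in triage(candidates)]
print(ranked)
```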



Cost of Cleansing
 Automatic versus manual
– Tools to perform automatic cleansing
– Effort to support use of tools
 Use of defaults
 Knowledge/experience of those performing
manual cleansing



Responsibility for Data Quality
 “It’s not enough to say that data quality is
everyone’s responsibility.”
 Data Quality Administrator
 Ongoing commitment
 Data ownership responsibility
 Operational versus data warehouse
responsibility



Data Quality – Best Practices
 Inventory the quality of your data
 Sell the importance of data quality to
management
 Assign data quality responsibility
 Triage the cleansing process



Module 2 Workshop
Data Quality



Module 3 – Metadata
 Management Support
 Metadata as the Keystone
 Which Metadata to Capture
 Responsibility for Capture
 Responsibility for Maintenance
 Business Metadata
 Technical Metadata
 How will Metadata be Used
 Data Inventory



Metadata – Management Support
 IT and the Business
 Management understanding of the
importance of metadata
 Impact on project schedules
 Long term benefit of metadata
 Importance for operational and data
warehouse



Metadata as the Keystone
 Single version of the truth
 It’s the inventory of information
 Tears down dysfunctional information
fiefdoms
 Opportunities to reduce redundancy
 Opportunities for integration



Which Metadata to Capture
 Don’t boil the ocean
 What metadata is valuable
 Ease and cost of capture
 Political issues relating to capture



Responsibility for Capturing
Metadata
 Incentive for capturing
 Management direction
 Automatic and manual



Responsibility for Metadata
Maintenance
 Where does Metadata Repository
maintenance report?
 Why is maintenance important?
 Long-term commitment



Business Metadata
 Business definitions
 Source of data
 How data was derived (algorithms)
 Lineage (data genealogy)
 Timeliness
 Security
 Ownership
 Quality



Technical Metadata
 Field name
 Database
 Data type
 Source
 Length
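
As a rough illustration, the five technical attributes above could be held in a record like the one below. The class name, the in-memory `repository` dictionary, and the sample entries are hypothetical; a metadata repository product would provide its own storage and interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalMetadata:
    field_name: str
    database: str
    data_type: str
    source: str
    length: int


# Hypothetical in-memory repository keyed by (database, field name)
repository = {}


def register(entry):
    repository[(entry.database, entry.field_name)] = entry


def fields_from_source(source):
    """Simple impact analysis: which fields are fed by a given source?"""
    return sorted(e.field_name for e in repository.values() if e.source == source)


register(TechnicalMetadata("CUST_ID", "SALES_DW", "CHAR", "CRM extract", 10))
register(TechnicalMetadata("ORDER_AMT", "SALES_DW", "DECIMAL", "Order entry system", 11))
print(fields_from_source("CRM extract"))
```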



How Will Metadata be Captured
 Data modeling tools
 ETL tool
 Access and analysis tool
 Metadata Repository tool
 Data dictionary
 Copybooks
 Home grown application



How Will Metadata be Used
 Business
– Understanding the data
– Understanding the meaning of results
– Avoiding incorrect conclusions
 IT
– Research
– Impact analysis
– Tool interchange



Inventory
 Where is the data?
 How and where is it used?
 Quality of data
 Redundancy
 Ownership
 Documentation



Metadata – Best Practices
 Determine which metadata to capture and
use
 Determine how the tools will capture and
use metadata
 Sell management on the importance
 Assign metadata responsibility



Module 3 Workshop
Metadata



Module 4 Organization – Data-related Roles &
Responsibilities
 Database Administrator
 Data Administrator
 Data Quality Administrator
 Security
 Architect
 Data ownership



Database Administrator
 Database design
 Backup and recovery
 Reorganization
 Monitoring
 Tuning
 Index creation



Data Administrator
 Data modeling
 Source data evaluation
 Enterprise data integration
 Data quality analysis
 Metadata responsibility



Data Quality Administrator
 Uncovering data quality problems
 Communicating data quality problems
 ETL verification
 Responsibility for some cleansing



Security
 Responsibility for who can do what to the
data
– Data access
– Data create/update/delete
 Working with those administering the tools
that have security capabilities
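
"Who can do what to the data" can be pictured as a small permission table, as in the sketch below. The role names, the dataset name, and the default-deny rule are assumptions for illustration; in practice these rules live in the RDBMS and in the access tools that have security capabilities.

```python
# Hypothetical permission table: (role, dataset) -> allowed actions
PERMISSIONS = {
    ("analyst", "customer"): {"read"},
    ("data_steward", "customer"): {"read", "update"},
    ("dba", "customer"): {"read", "create", "update", "delete"},
}


def is_allowed(role, dataset, action):
    """Default deny: anything not explicitly granted is refused."""
    return action in PERMISSIONS.get((role, dataset), set())


print(is_allowed("analyst", "customer", "read"))    # granted in the table
print(is_allowed("analyst", "customer", "delete"))  # not granted
```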



Architect
 Knowing what the enterprise needs
 Evaluating technical options
 Developing an appropriate architecture
 Selling the architecture



Data Ownership +
 Creation
 Access
 Determine requirements for performance
 Determine requirements for availability
 Determine historical requirements



Creation
 Data Entry process
– Training
– Incentives for quality
 Quality of data
 Data edits



Access
 Need to know
 Opt in/Opt out
 Level of granularity
 By department
 By role
 External access by people outside the
organization



Performance Requirements
 Response time
 What is excellent response time worth?
 Timeliness



Availability Requirements
 How many hours and days does the system
need to be available?
 What is the availability requirement during
scheduled hours?



Historical Requirements
 How far back to keep the data
 How detailed does old data need to be?
 Impact of code changes and organizational
changes over time



Organization – Best Practices
 Establish the appropriate organization for
your enterprise
 Enumerate roles and responsibilities
 Gain concurrence for roles and
responsibilities
– Management
– Those performing the functions



Module 4 Workshop
Organization



Module 5 Security & Privacy
 Categorization for security
 Responsibility for determining
 Mechanism for establishing procedures
 Security audit
 Regulatory issues
 Data sharing



Categorization for
Security/Privacy
 Does all data have the same
security/privacy requirements?
 Who determines security/privacy
requirements of data?
 What are the regulatory requirements for
security and privacy?
 Does your organization have a Security
Office? What authority do they have?



Responsibility
 Security Office
 Internal auditors?
 Data Owners
 Responsibility for administering
 Testing security and privacy



Mechanism for Establishing
Procedures
 Security requirements
– Internal
– Regulatory
 Tools that implement security
 Communicating security requirements to
those who implement



Security Audit
 Validating procedures
 Validating training
 Testing and probing
 Recommending mitigation
 Frequency of audits



Regulatory Issues
 Health Care – HIPAA
 Finance
 Brokerage - SEC
 Insurance
 Media – FCC



Data Sharing
 Inhibitors
 Motivation/incentives to share
 Management directives on sharing



Inhibitors
 Power
 Fear of others
 Fear of boss micromanaging



Motivation/incentives to share
 Are there any?



Management Direction on
Sharing
 Direction to share must come from the
CEO
– Need to know
– Reason for withholding access must be
documented
– Access only given when directed



Security & Privacy – Best
Practices
 Raise the consciousness of security and
privacy requirements
 Connect with your Security Office
 Determine security capabilities of tools
 Assign responsibilities
 Test and validate



Module 5 Workshop
Security & Privacy



Module 6 Business Intelligence
 Goals and Objectives
 Architecture
 Data Mining
 Tools
 Methodology



Goals and Objectives
 Why have a data warehouse?
 Have goals and objectives been identified?
 Have they been communicated?
 Are they measured post-implementation?



Architecture
 Platform
 Tools/products
 How the data flows



Data Mining
 Discovery versus hypothesis testing
 Different tools
 Different people mining the data



Tools
 RDBMS
 Data Modeling
 ETL
 Access and Analysis
 Data quality (Cleansing)
 Measurement



Methodology
 Spiral versus waterfall
 Phasing more appropriate
 Tasks more difficult to estimate



Business Intelligence – Best
Practices
 Set goals and objectives
 Set expectations early and often
 Establish cost justification
 Find a terrific sponsor



Module 6 Workshop
Business Intelligence



Module 7 Information Integration
 Integrating business data
 Data redundancy
 Different RDBMSs and their impact
 Data migration



Integrating Business Data
 Understanding the customer
 ERPs
 Supply chain



Data Redundancy
 Goal to reduce data redundancy?
 Inconsistent data
 Single version of the truth
 Cost of data redundancy



Different RDBMSs & Their
Impact
 More interface programs
 Less depth in DBA pool
 More product expense
 Integration problems
 Less optimizer capability



Data Migration +
 Should data be dropped?
 Should data be converted?
 Should data be integrated/consolidated?



Should Data be Dropped?
 Is it even being used?
 What’s the cost of maintaining this data?
 Could another database be used in its
place?
 Any political issues?
 Any regulatory issues?



Should Data be Migrated?
 Can we consolidate RDBMSs?
 What is the cost of migration?
 What is the impact on other systems?



Should Data be
Integrated/Consolidated?
 Why do we want to integrate/consolidate?
 Costs of integration/consolidation
 Savings of integration/consolidation
 Political issues
 Regulatory issues



Information Integration – Best
Practices
 Determine information integration benefits
and costs
 Sell information integration to management
 Establish and execute priorities



Module 7 Workshop
Information Integration



Module 8 Software/Products
 RDBMS
 Tools/utilities
 Organization standards for products
 Criteria for selection
 Responsibility for Selection
 Single vendor/best of breed
 Deals/Negotiation
 Relationship with vendors
 Application packages



RDBMS
 Which RDBMS is the standard?
 Relation to platform
 What applications is it being used for?



RDBMS Choices
 IBM (DB2, IMS, Informix)
 Microsoft (SQL Server)
 Oracle
 Sybase
 Teradata



Why standardize the RDBMS?
 Minimize the number of RDBMSs
 Less training required
 More leverage on RDBMS vendor
 Flexible assignments
 Fewer interface problems
 Fewer interface programs



Relation to platform
 RDBMS performance impacted by
platform
 Platform may dictate (or strongly
recommend) RDBMS choice
 Which decision comes first?



What application is RDBMS
being used for
 Operational/OLTP
 Data Warehouse/Business Intelligence



Tools/Utilities
 Platform dependent
 RDBMS dependent
 Expensive
 33% on the shelf
 Lots of product duplication
 Necessary?



Organization Standards for
Products
 Who sets standards?
 Are the standards known?
 Are they standards or guidelines?
 Who can give dispensation?



Criteria for Selection
 Need
 Cost
 Vendor
– Support
– Reputation
– Financial stability



Responsibility for Selection
 Technical evaluators
 Strategic architect
 Management



Single Vendor vs Best of Breed
 Single vendor
– Possibly a better relationship
– Leverage
– Not always the best products
– Products should all work together
 Best-of-breed
– Need to integrate yourself
– Finger pointing when problems
– Potential incompatibilities



Deals/Negotiations
 Have someone else negotiate
 Don’t let vendor know you have chosen
them before you negotiate
 www.dobetterdeals.com (Joe Auer –
ComputerWorld)



Relationship with Vendors
 Partnerships
 Money Issues
 Support
 Conferences
 Being a reference



Databases Required by the
Application Packages
 Packages do not support all RDBMSs
 Packages do not support all RDBMSs
equally well
 Does preferred RDBMS violate
organization standard?
 Are support personnel (DBAs) available?



Impact of Package
 Machine Requirements
 Performance
 Availability



Software – Best Practices
 Determine real requirements
 Establish software standards
 Make use of existing software whenever
possible
 Talk to organizations who are using the
products



Module 8 Workshop
Software/Products



Module 9 – Performance and
Measurement
 Categorization for performance
 Capacity Planning
 Monitoring/Measuring
 Service Level Agreements
 Tuning
 Roles and Responsibilities
 Reporting performance



Categorization for Performance
 How good does response time need to be?
 How does it differ from application to
application?
 What is the cost-benefit of excellent
response time?
 Were performance considerations included
in the architecture?



Categorization for Availability
 Scheduled hours (24 X 7, 18 X 6,…)
 Availability during scheduled hours
 How does it differ from system to system?
 Is excellent availability cost justified?
 Was availability included in the
architecture?
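
The difference between scheduled hours and availability during those hours is easier to discuss with numbers. The sketch below converts a schedule and an availability percentage into permitted downtime per week; the 99.5% target is an assumed figure, not one taken from the course.

```python
def allowed_downtime_minutes(hours_per_day, days_per_week, availability_pct):
    """Weekly downtime permitted inside scheduled hours for a given target."""
    scheduled_minutes = hours_per_day * days_per_week * 60
    return scheduled_minutes * (1 - availability_pct / 100)


# Compare the two schedules named above under an assumed 99.5% target
for label, hours, days in [("24 X 7", 24, 7), ("18 X 6", 18, 6)]:
    minutes = allowed_downtime_minutes(hours, days, 99.5)
    print(f"{label}: {minutes:.1f} minutes of downtime allowed per week")
```

Under that assumed target, an 18 X 6 schedule allows roughly 32 minutes of downtime a week and a 24 X 7 schedule roughly 50, which is a useful starting point for asking whether excellent availability is cost justified.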



Capacity Planning
 Database size
 Number of users
 Number of transactions
 Number of queries/reports
 Time and day of usage
 Complexity of transactions/queries/reports
 Proactive response to capacity increase



Monitoring/Measuring
 Response time
 Resource utilization (CPU, disk access,
network)
 Who is using the system
 When is the system being used
 Chargebacks



Service Level Agreements
 Response time
 Availability
– Scheduled hours (hours/day, days/week)
– Availability during scheduled hours
 Timeliness of data
 Response to problems
 Response to new requests
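
A response-time clause in a service level agreement is often phrased as "some percentage of requests complete within some number of seconds". The sketch below checks measured response times against such a clause; the 2-second target, the 90% threshold, and the sample timings are assumptions chosen for illustration.

```python
def sla_compliance(response_times_sec, target_sec):
    """Fraction of requests that met the response-time target."""
    if not response_times_sec:
        return None  # nothing measured, so compliance is unknown
    within = sum(1 for t in response_times_sec if t <= target_sec)
    return within / len(response_times_sec)


# Hypothetical measurements, in seconds
samples = [0.8, 1.2, 2.5, 0.9, 3.1, 1.0, 1.1, 0.7, 1.9, 4.2]
compliance = sla_compliance(samples, target_sec=2.0)
meets_sla = compliance is not None and compliance >= 0.90
print(f"{compliance:.0%} within target; SLA met: {meets_sla}")
```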



Tuning
 Awareness of problems – measurement
tools and responsibilities
 Tuning capability of platform, RDBMS,
tools
 Responsibility for tuning



Roles and Responsibilities
 DBA - RDBMS
 Application performance
 Systems programmer – operating system
 System Architect
 Capacity Planner
 Performance testing



Reporting performance
 IT
– Who needs to take action
– Who needs to see reports/alerts
 Business
– Matching project agreements
– Expectations



Measurement Tools
 Performance
 Usage
 Resource utilization
 Network



Measurement Usage
 What do you do with the performance
measurement information?



Reporting to Management
 High level (not detailed)
 Problems, aberrations
 Frequency
 Form (tables, charts, graphs)



Service Level Agreements
 Response time
 Availability
 Who establishes agreements?
 What’s realistic?
 Incentives to meet SLAs



Performance & Measurement –
Best Practices
 Determine what is advantageous to measure
 Assign responsibilities
 Designate tools for measurement
 Report metrics to management



Module 9 Workshop
Performance & Measurement



Overall Data Strategy Best Practices
 Don’t get into the details too soon
 Don’t be seen as a theorist -- your actions
must be pragmatic
 Don’t lead with long-term deliverables
 Don’t commit more than you can deliver
 Avoid unproven technology



How to Implement a Data
Strategy
 Conduct a data environment assessment
 Establish a target data environment
 Develop an implementation plan
 Sell Data Strategy within the organization
 Evaluate progress and justify your
existence
 Revisit the plan



Summary
 Pitch the importance of a data strategy to
your CIO and CTO
 Ask to either lead the effort or to be a
permanent member of the team
