Module 1 – Introduction to Data Strategy Components of a data strategy Why have a data strategy Do these problems exist in your organization? Gain control Support the IT strategy Data in the Dark Ages Enlightened data strategy Critical success factors How to implement a data strategy Best Practices
Copyright Sid Adelman, 2007 3
Components of a Data Strategy + RDBMS - Relational Database Management System Data Quality Metadata Performance Data Distribution Organization Data Ownership
Components of a Data Strategy + Security and Privacy Total Cost of Ownership Subject area databases Data modeling Data sharing Business Intelligence Information integration
Components of a Data Strategy + Legacy/operational data Standards Data migration Application packages Software/products Personal/departmental databases
Components of a Data Strategy Categorization of data Communicating and selling the data strategy Measurement
Why Have a Data Strategy Capitalize on the data asset Support the IT Strategy Gain control
Do these problems exist in your organization? + Uncontrolled redundant data Data not easily accessible by the user Lack of knowledge of available data Poor data quality Each new application designs, builds and populates its own database Inconsistent reports
Do these problems exist in your organization? Private databases No central metadata repository Management unclear on the importance of data No responsibility for data Data standards nonexistent, not understood or not followed
Gain Control Consistent security implementation Understand, define and assign ownership Understand, define and assign stewardship Minimize redundancy Inventory data Develop consistent terminology
Support the IT Strategy Provide departments, projects and personnel with guidelines for storing and accessing data Minimize the number of RDBMSs Establish, disseminate and maintain standards for shared data resources Deliver a high level of service – Performance – Availability – Response time – Responsiveness to user requests
Data in the Dark Ages Data is kept locked by each application or department Users do not trust the data Data is not well understood either by users or by IT Data is difficult to access Senior Management does not understand the value of data
Enlightened Organization Data is shared Users trust the accuracy of the data Data is inventoried and terminology is clear Data is easily accessed by IT and by the users Senior Management views data as an asset that is critical to the organization and to decision making
Critical Success Factors Data Strategy supports IT plans Quality data Support of legacy data Support of development efforts Infrastructure – Organization – Skills – Tools Achieve short-term successes
How to Implement a Data Strategy Data environment assessment Establish a target data environment Develop an implementation plan Sell Data Strategy within the organization Evaluate progress and justify your existence Revisit the plan
Best Practices Don’t get into the details too soon Don’t be seen as a theorist -- your actions must be pragmatic Don’t lead with long-term deliverables Don’t commit more than you can deliver Avoid unproven technology
Module 1 Workshop Assessment of Existing Organization
Module 2 – Data Quality Management Support Evaluation/Diagnosis Timeliness ETL Validation Prioritization – Which Data to Clean First Cost of Cleansing Responsibility for Data Quality
Management Support Management awareness of importance of data quality Cost justification of data quality initiative Ongoing commitment Finding a business management sponsor
Evaluation/Diagnosis Which source data is most correct Valid values (domains) Business rules Data types (e.g., hex, packed decimal) Completeness Inappropriate defaults Fields used for multiple purposes Accuracy Quality of historical data
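Several of the diagnosis checks on this slide (valid values, completeness, inappropriate defaults) can be automated with a simple field profile. The sketch below is illustrative only: the function name `profile_field`, the list of suspect defaults, and the sample state codes are all assumptions, not part of the course material.

```python
from collections import Counter

def profile_field(values, valid_domain=None,
                  suspect_defaults=("", "N/A", "UNKNOWN", "9999-12-31")):
    """Summarize completeness, suspect defaults, and domain violations for one field.

    `suspect_defaults` is a hypothetical list; a real one would come from
    knowledge of the source systems.
    """
    total = len(values)
    present = [v for v in values if v is not None]
    defaults = sum(1 for v in present if v in suspect_defaults)
    # Count values that are neither in the valid domain nor a known default.
    invalid = Counter(
        v for v in present
        if valid_domain is not None
        and v not in valid_domain
        and v not in suspect_defaults
    )
    return {
        "total": total,
        "completeness": len(present) / total if total else 0.0,
        "suspect_defaults": defaults,
        "invalid_values": dict(invalid),
    }

# Hypothetical sample: a state-code field with a missing value, a default, and bad codes.
states = ["CA", "NY", None, "XX", "N/A", "CA", "ca"]
report = profile_field(states, valid_domain={"CA", "NY", "TX"})
print(report)
```

A profile like this only inventories the problems; judging accuracy and deciding which source is most correct still needs someone who knows the business rules.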
Data Timeliness Currency of data, e.g., last Friday Frequency of update, e.g., daily, weekly, monthly, quarterly User awareness – how will the users know?
ETL Validation Validation of ETL process Tie-outs – Number of records – Dollar matching – Quantitative matching Automatic versus manual checking Referential integrity?
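The tie-outs listed above (number of records, dollar matching) lend themselves to automatic checking. The following is a minimal sketch, assuming source and target rows are available as lists of dictionaries with an `amount` field; the function name `tie_out` and the sample rows are hypothetical.

```python
from decimal import Decimal

def tie_out(source_rows, target_rows, amount_key="amount"):
    """Compare record counts and dollar totals between an ETL source and target."""
    # Decimal avoids the rounding drift that floats introduce in dollar totals.
    src_total = sum((Decimal(str(r[amount_key])) for r in source_rows), Decimal("0"))
    tgt_total = sum((Decimal(str(r[amount_key])) for r in target_rows), Decimal("0"))
    return {
        "record_count_matches": len(source_rows) == len(target_rows),
        "dollar_total_matches": src_total == tgt_total,
        "count_difference": len(target_rows) - len(source_rows),
        "dollar_difference": tgt_total - src_total,
    }

# Hypothetical load in which one source record never reached the target.
source = [
    {"id": 1, "amount": "100.10"},
    {"id": 2, "amount": "250.00"},
    {"id": 3, "amount": "19.90"},
]
target = [
    {"id": 1, "amount": "100.10"},
    {"id": 2, "amount": "250.00"},
]
result = tie_out(source, target)
print(result)
```

A count or dollar mismatch shows that something was lost or altered, but not which record; finding it would take a key-by-key comparison, and referential integrity needs its own check.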
Triage - Prioritization Which data to clean Justification for cleansing Ease of cleansing Possibility of cleansing Political support for cleansing
Cost of Cleansing Automatic versus manual – Tools to perform automatic cleansing – Effort to support use of tools Use of defaults Knowledge/experience of those performing manual cleansing
Responsibility for Data Quality “It’s not enough to say that data quality is everyone’s responsibility.” Data Quality Administrator Ongoing commitment Data ownership responsibility Operational versus data warehouse responsibility
Data Quality – Best Practices Inventory the quality of your data Sell the importance of data quality to management Assign data quality responsibility Triage the cleansing process
Module 2 Workshop Data Quality
Module 3 – Metadata Management Support Metadata as the Keystone Which Metadata to Capture Responsibility for Capture Responsibility for Maintenance Business Metadata Technical Metadata How will Metadata be Used Data Inventory
Metadata – Management Support IT and the Business Management understanding of the importance of metadata Impact on project schedules Long term benefit of metadata Importance for operational and data warehouse
Metadata as the Keystone Single version of the truth It’s the inventory of information Tears down dysfunctional information fiefdoms Opportunities to reduce redundancy Opportunities for integration
Which Metadata to Capture Don’t boil the ocean What metadata is valuable Ease and cost of capture Political issues relating to capture
Responsibility for Capturing Metadata Incentive for capturing Management direction Automatic and manual
Responsibility for Metadata Maintenance Where does Metadata Repository maintenance report? Why is maintenance important? Long-term commitment
Business Metadata Business definitions Source of data How data was derived (algorithms) Lineage (data genealogy) Timeliness Security Ownership Quality
Technical Metadata Field name Database Data type Source Length
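One way to picture the technical metadata items above is as a single record per field. The sketch below is an assumption about how such a record might be structured, not a description of any particular repository tool; the class name `TechnicalMetadata` and the sample values are hypothetical.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TechnicalMetadata:
    """One technical metadata entry, mirroring the items on the slide."""
    field_name: str
    database: str
    data_type: str
    source: str
    length: int

# Hypothetical entry for a customer identifier.
entry = TechnicalMetadata(
    field_name="CUST_ID",
    database="SALES_DW",
    data_type="CHAR",
    source="CRM extract",
    length=10,
)
record = asdict(entry)  # plain dict, convenient for loading into a repository
print(record)
```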
How Will Metadata be Captured Data modeling tools ETL tool Access and analysis tool Metadata Repository tool Data dictionary Copybooks Home grown application
How Will Metadata be Used Business – Understanding the data – Understanding the meaning of results – Avoiding incorrect conclusions IT – Research – Impact analysis – Tool interchange
Inventory Where is the data? How and where is it used? Quality of data Redundancy Ownership Documentation
Metadata – Best Practices Determine which metadata to capture and use Determine how the tools will capture and use metadata Sell management on the importance Assign metadata responsibility
Module 3 Workshop Metadata
Module 4 Organization – Data-related Roles & Responsibilities Database Administrator Data Administrator Data Quality Administrator Security Architect Data ownership
Database Administrator Database design Backup and recovery Reorganization Monitoring Tuning Index creation
Data Administrator Data modeling Source data evaluation Enterprise data integration Data quality analysis Metadata responsibility
Data Quality Administrator Uncovering data quality problems Communicating data quality problems ETL verification Responsibility for some cleansing
Security Responsibility for who can do what to the data – Data access – Data create/update/delete Working with those administering the tools that have security capabilities
Architect Knowing what the enterprise needs Evaluating technical options Developing an appropriate architecture Selling the architecture
Data Ownership + Creation Access Determine requirements for performance Determine requirements for availability Determine historical requirements
Creation Data Entry process – Training – Incentives for quality Quality of data Data edits
Access Need to know Opt in/Opt out Level of granularity By department By role External access by people outside the organization
Performance Requirements Response time What is excellent response time worth? Timeliness
Availability Requirements How many hours and days does the system need to be available? What is the availability requirement during scheduled hours?
Historical Requirements How far back to keep the data How detailed does old data need to be? Impact of code changes and organizational changes over time
Organization – Best Practices Establish the appropriate organization for your enterprise Enumerate roles and responsibilities Gain concurrence for roles and responsibilities – Management – Those performing the functions
Module 4 Workshop Organization
Module 5 Security & Privacy Categorization for security Responsibility for determining Mechanism for establishing procedures Security audit Regulatory issues Data sharing
Categorization for Security/Privacy Does all data have the same security/privacy requirements? Who determines security/privacy requirements of data? What are the regulatory requirements for security and privacy? Does your organization have a Security Office? What authority do they have?
Responsibility Security Office Internal auditors? Data Owners Responsibility for administering Testing security and privacy
Mechanism for Establishing Procedures Security requirements – Internal – Regulatory Tools that implement security Communicating security requirements to those who implement
Security Audit Validating procedures Validating training Testing and probing Recommending mitigation Frequency of audits
Regulatory Issues Health Care – HIPAA Finance Brokerage – SEC Insurance Media – FCC
Data Sharing Inhibitors Motivation/incentives to share Management directives on sharing
Inhibitors Power Fear of others Fear of boss micromanaging
Motivation/incentives to share Are there any?
Management Direction on Sharing Direction to share must come from the CEO – Need to know – Reason for withholding access must be documented – Access only given when directed
Security & Privacy – Best Practices Raise the consciousness of security and privacy requirements Connect with your Security Office Determine security capabilities of tools Assign responsibilities Test and validate
Module 5 Workshop Security & Privacy
Module 6 Business Intelligence Goals and Objectives Architecture Data Mining Tools Methodology
Goals and Objectives Why have a data warehouse? Have goals and objectives been identified? Have they been communicated? Are they measured post-implementation?
Architecture Platform Tools/products How the data flows
Data Mining Discovery versus hypothesis testing Different tools Different people mining the data
Tools RDBMS Data Modeling ETL Access and Analysis Data quality (Cleansing) Measurement
Methodology Spiral versus waterfall Phasing more appropriate Tasks more difficult to estimate
Business Intelligence – Best Practices Set goals and objectives Set expectations early and often Establish cost justification Find a terrific sponsor
Module 6 Workshop Business Intelligence
Module 7 Information Integration Integrating business data Data redundancy Different RDBMSs and their impact Data migration
Integrating Business Data Understanding the customer ERPs Supply chain
Data Redundancy Goal to reduce data redundancy? Inconsistent data Single version of the truth Cost of data redundancy
Different RDBMSs & Their Impact More interface programs Less depth in DBA pool More product expense Integration problems Less optimizer capability
Data Migration + Should data be dropped? Should data be converted? Should data be integrated/consolidated?
Should Data be Dropped? Is it even being used? What’s the cost of maintaining this data? Could another database be used in its place? Any political issues? Any regulatory issues?
Should Data be Migrated? Can we consolidate RDBMSs? What is the cost of migration? What is the impact on other systems?
Should Data be Integrated/Consolidated? Why do we want to integrate/consolidate? Costs of integration/consolidation Savings of integration/consolidation Political issues Regulatory issues
Information Integration – Best Practices Determine information integration benefits and costs Sell information integration to management Establish and execute priorities
Module 7 Workshop Information Integration
Module 8 Software/Products RDBMS Tools/utilities Organization standards for products Criteria for selection Responsibility for Selection Single vendor/best of breed Deals/Negotiation Relationship with vendors Application packages
RDBMS Which RDBMS is the standard? Relation to platform What applications is it being used for?
RDBMS Choices IBM (DB2, IMS, Informix) Microsoft (SQL Server) Oracle Sybase Teradata
Why standardize the RDBMS? Minimize the number of RDBMSs Less training required More leverage on RDBMS vendor Flexible assignments Fewer interface problems Fewer interface programs
Relation to platform RDBMS performance impacted by platform Platform may dictate (or strongly recommend) RDBMS choice Which decision comes first?
What application is RDBMS being used for Operational/OLTP Data Warehouse/Business Intelligence
Tools/Utilities Platform dependent RDBMS dependent Expensive 33% on the shelf Lots of product duplication Necessary?
Organization Standards for Products Who sets standards? Are the standards known? Are they standards or guidelines? Who can give dispensation?
Criteria for Selection Need Cost Vendor – Support – Reputation – Financial stability
Single Vendor vs Best of Breed Single vendor – Possibly a better relationship – Leverage – Not always the best products – Products should all work together Best-of-breed – Need to integrate yourself – Finger pointing when problems – Potential incompatibilities
Deals/Negotiations Have someone else negotiate Don’t let vendor know you have chosen them before you negotiate www.dobetterdeals.com (Joe Auer – ComputerWorld)
Relationship with Vendors Partnerships Money Issues Support Conferences Being a reference
Databases Required by the Application Packages Packages do not support all RDBMSs Packages do not support all RDBMSs equally well Does preferred RDBMS violate organization standard? Are support personnel (DBAs) available?
Impact of Package Machine Requirements Performance Availability
Software – Best Practices Determine real requirements Establish software standards Make use of existing software whenever possible Talk to organizations who are using the products
Module 8 Workshop Software/Products
Module 9 – Performance and Measurement Categorization for performance Capacity Planning Monitoring/Measuring Service Level Agreements Tuning Roles and Responsibilities Reporting performance
Categorization for Performance How good does response time need to be? How does it differ from application to application? What is the cost-benefit of excellent response time? Were performance considerations included in the architecture?
Categorization for Availability Scheduled hours (24 X 7, 18 X 6,…) Availability during scheduled hours How does it differ from system to system? Is excellent availability cost justified? Was availability included in the architecture?
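The distinction between scheduled hours and availability during scheduled hours is easy to make concrete with a small calculation. This is a sketch under assumed numbers: the function name `availability_pct` and the outage figure are illustrative and not from the course.

```python
def availability_pct(scheduled_hours, outage_hours):
    """Percent of scheduled time the system was actually available."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= outage_hours <= scheduled_hours:
        raise ValueError("outage_hours must be between 0 and scheduled_hours")
    return 100.0 * (scheduled_hours - outage_hours) / scheduled_hours

# Weekly scheduled hours for the two schedules named on the slide.
weekly_24x7 = 24 * 7   # 168 hours
weekly_18x6 = 18 * 6   # 108 hours

# Hypothetical: 2.7 hours of unplanned outage in one week.
outage = 2.7
print(f"24 x 7: {availability_pct(weekly_24x7, outage):.2f}%")
print(f"18 x 6: {availability_pct(weekly_18x6, outage):.2f}%")
```

The same outage produces a lower availability figure against the shorter schedule, which is why the scheduled hours have to be agreed before an availability target means anything.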
Capacity Planning Database size Number of users Number of transactions Number of queries/reports Time and day of usage Complexity of transactions/queries/reports Proactive response to capacity increase
Monitoring/Measuring Response time Resource utilization (CPU, disk access, network) Who is using the system When is the system being used Chargebacks
Service Level Agreements Response time Availability – Scheduled hours (hours/day, days/week) – Availability during scheduled hours Timeliness of data Response to problems Response to new requests
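A response-time clause in an SLA is often stated as a percentile rather than an average, so that a few slow requests are not hidden. The sketch below assumes a hypothetical target of "95% of requests within 2 seconds" and uses the nearest-rank percentile method; the function names, the threshold, and the sample timings are illustrative, not taken from the course.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def meets_sla(response_times_sec, pct=95, threshold_sec=2.0):
    """True if the given percentile of response times is within the threshold."""
    return percentile(response_times_sec, pct) <= threshold_sec

# Hypothetical measurements (seconds) with one slow outlier.
times = [0.4, 0.6, 0.5, 1.1, 0.9, 0.7, 3.2, 0.8, 0.6, 0.5]
print("95th percentile:", percentile(times, 95))
print("Meets SLA at 95%:", meets_sla(times))
print("Meets SLA at 90%:", meets_sla(times, pct=90))
```

With only ten samples a single outlier decides the 95th percentile, so a realistic check would run over a much larger measurement window.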
Tuning Awareness of problems – measurement tools and responsibilities Tuning capability of platform, RDBMS, tools Responsibility for tuning
Roles and Responsibilities DBA - RDBMS Application performance Systems programmer – operating system System Architect Capacity Planner Performance testing
Reporting performance IT – Who needs to take action – Who needs to see reports/alerts Business – Matching project agreements – Expectations
Measurement Usage What do you do with the performance measurement information?
Reporting to Management High level (not detailed) Problems, aberrations Frequency Form (tables, charts, graphs)
Service Level Agreements Response time Availability Who establishes agreements? What’s realistic? Incentives to meet SLAs
Performance & Measurement – Best Practices Determine what is advantageous to measure Assign responsibilities Designate tools for measurement Report metrics to management
Module 9 Workshop Performance & Measurement
Overall Data Strategy Best Practices Don’t get into the details too soon Don’t be seen as a theorist -- your actions must be pragmatic Don’t lead with long-term deliverables Don’t commit more than you can deliver Avoid unproven technology
How to Implement a Data Strategy Conduct a data environment assessment Establish a target data environment Develop an implementation plan Sell Data Strategy within the organization Evaluate progress and justify your existence Revisit the plan
Summary Pitch the importance of a data strategy to your CIO and CTO Ask to either lead the effort or to be a permanent member of the team