WP3137 A DQ Survival Guide
Perhaps you recognize or have suffered at the hands of some of these problems. We
list and discuss them briefly here to establish a common understanding of what we face.
One thing that becomes apparent is that data quality problems exacerbate each other. For
example, if you have duplicate records, some of the duplicates will not be reconciled if
crucial data elements such as addresses are incorrect or not standardized.
Data quality is crucial to knowing who your customers are and reaching them in an
effective manner. In order for marketing efforts to gain the greatest benefits, a clear
and single view of the customer is necessary. Without this view, contacts are made
with the wrong prospects and the right prospects either are missed or have multiple
touches that are confusing and costly. A goal of marketing is to cross-sell and up-sell
existing customer accounts, which is not achievable if the multiple accounts for the
same customer are not matched and consolidated into a single view. This is where
data quality plays a direct role in delivering value through marketing efforts.
An organization usually first experiences the need for data quality and
other EIM capabilities when it builds a CRM or customer data integration (CDI)
system and finds that the data loaded into the system falls far short of expectations.
Throughout this paper, we use CRM as the surrogate for marketing data repositories
in general. Somewhere, somehow, customer and prospect data must be stored
and accessed, and CRM/CDI systems, whether homegrown or vendor-supplied,
are the common repositories for this data.
Your position within the marketing organization determines your visibility into the
data and the perceptions of the quality of that data. The higher in the organization,
the more removed a manager is from the data that drives their operations. A chief
marketing officer (CMO), for example, may be the person to ask the question, “How
do I know my data is defective?” The field-marketing specialist is more likely to
say, “I know the types of problems; I need the counts.” Fortunately, no matter
who is asking, the solution to both situations and other data integrity questions
is the same: Conduct a data quality assessment. Without the findings from an
assessment, you’ll have a number of issues to deal with:
• You won’t know the scope and depth of your problems. For example, are they
systemic or superficial?
• You won’t know the cause of the defects. Without knowing the types of problems,
you can’t track back in the process to isolate the source.
• You won’t know how effective the resulting cleansing operation was.
• The cleansing operation may very well miss whole categories of problems.
• You won’t be able to conduct trend analysis over time to see how your data is
regressing or progressing.
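Such an assessment can start with simple counts of blank and invalid values per field. A minimal sketch, with hypothetical records and validity rules (the ZIP and email checks are illustrative, not rules from this paper):

```python
import re

# Hypothetical prospect records pulled from a CRM extract.
records = [
    {"name": "Acme Corp", "zip": "54601", "email": "info@acme.com"},
    {"name": "", "zip": "5460", "email": "not-an-email"},
    {"name": "Acme Corporation", "zip": "54601", "email": "info@acme.com"},
]

def profile(records):
    """Count blank and invalid values per field to gauge defect levels."""
    report = {}
    for field in records[0]:
        values = [r[field] for r in records]
        blanks = sum(1 for v in values if not v.strip())
        report[field] = {"blank": blanks, "total": len(values)}
    # Field-specific validity rules (assumed for illustration).
    report["zip"]["invalid"] = sum(
        1 for r in records if not re.fullmatch(r"\d{5}(-\d{4})?", r["zip"])
    )
    report["email"]["invalid"] = sum(
        1 for r in records if "@" not in r["email"]
    )
    return report
```

Even a profile this crude answers the CMO's question (how defective?) and the specialist's (which fields, how many?) from the same run.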
Many data quality problems are process-based. That is, the problems result from
non-standard practices, unvalidated data entry, or simply faulty
application design. The results of a data quality assessment often uncover process
issues as you work through the cause and effect. The assessment exposes the
effect, and it’s a relatively simple matter for the marketing manager to backtrack
through the data distribution chain to, for example, the account management system
and verify field edits are being used or enforced.
A data quality assessment is something marketing managers can do themselves,
especially if they have an assessment tool suitable for business users. Another
alternative is for the marketing manager to engage IT to conduct the assessment,
or even contract with a third-party information management-consulting firm.
Regardless, there is little mystery to conducting a data quality assessment. The
hardest part belongs with the business–that is, marketing–to articulate the business
rules that define good or bad. What are the rules that govern a specific field, such
as product name? For example, is it a mandatory field? Can the field contain
abbreviations? Is there a maximum length? Are special characters allowed? How many generations
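Rules like these translate directly into executable checks. A sketch for the product-name field, with limits and allowed characters chosen purely for illustration:

```python
import re

# Illustrative business rules for a product-name field; the specific
# limits and allowed characters are assumptions, not from this paper.
RULES = {
    "mandatory": True,
    "max_length": 40,
    "pattern": re.compile(r"^[A-Za-z0-9 .\-]+$"),  # letters, digits, space, . -
}

def validate_product_name(value):
    """Return a list of rule violations for one field value."""
    violations = []
    if RULES["mandatory"] and not value.strip():
        violations.append("missing")
        return violations
    if len(value) > RULES["max_length"]:
        violations.append("too long")
    if not RULES["pattern"].match(value):
        violations.append("bad characters")
    return violations
```

The hard part remains articulating the rules; once written down, encoding them is mechanical.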
The best place for direct marketers to cleanse their data is as close to the point of
creation as possible. Consider if you will an information supply chain where at the
very beginning the data is captured from the prospect, perhaps at a trade show or
from a Web site registration form. See Figure 1.
The farther upstream you start cleansing, the earlier the ROI counter starts ticking.
TRANSACTIONAL UPDATES
This opportunity fits well with organizations that take a proactive approach to data
cleansing. Organizations can identify the entry points of information into the
organization–in this case, during transactions, such as a new customer login or
order entry–and where exposure to flawed data may occur. When a transaction is
processed, organizations have an opening to validate the data before it is saved to an
operational system. Transactional updating also affords the chance to validate data
as it arrives in its information packet, rich with contextual information. Because this
contextual setting is lost as soon as the data is sent downstream, it is
important to leverage it.
By their very nature, transaction updates force organizations to handle individual
information packets as they become available, which implies real-time processing,
low volumes, and a potentially wide distribution of implementation. In other words,
the cleansing functionality must be connected to or embedded in the transactional
environment and be able to respond in milliseconds, and also be able to service
multiple transactional applications.
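A transactional validation hook of that kind might be sketched as follows; the field names and checks are illustrative, and a real implementation would be embedded in the order-entry application itself:

```python
def validate_transaction(txn):
    """Synchronous pre-save check, fast enough to sit inline in the
    order-entry path. Field names and rules are illustrative."""
    errors = []
    if not txn.get("customer_name", "").strip():
        errors.append("customer_name required")
    email = txn.get("email", "")
    if email and "@" not in email:
        errors.append("email malformed")
    return errors

def save_order(txn, store):
    """Persist the transaction only if it passes validation."""
    errors = validate_transaction(txn)
    if not errors:
        store.append(txn)
    return errors
```

Rejecting the record at this point, while the prospect is still on the line or the form is still open, is exactly the contextual opportunity that is lost once the data moves downstream.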
PURCHASED DATA
The third opportunity to cleanse is when you purchase data from a third party. Many
organizations erroneously assume data to be clean when purchased. Not so. Buying
third-party data is in many ways like buying a used car. Do you really know what the
previous owner has done to it? Of course not, that’s why you take the car to your
mechanic to have him pop the hood and put it on the hoist. You should do the same
thing with purchased data; otherwise, you are essentially abdicating your data quality
standards to those of the vendor.
In the case of a purchased list for a marketing campaign, you can ask for a random
sample from the prospect list and conduct your own data quality assessment.
Rudimentary tests for field completion and validation are simple to run. Validating
purchased data extends to matching the purchased data against your current data
set. Merging two clean data sets is the equivalent of pouring a gallon of red
paint into blue. A merge will not equate to 1 + 1 = 2; it is more like 1.5, because
of duplication between the data sets, and that duplication may not be easily
reconciled. Two records may appear the same, but one might have a crucial field
that differs. The merged data sets must be matched and consolidated as one
new, entirely different set to ensure continuity. A hidden danger with purchased data
is that it arrives as an ad hoc event, which means no regular process (a cleansing job
with business rules) exists to incorporate the data into an existing system. The lack
of regularly occurring processes raises the specter that in the rush to get the file
loaded, “expedient” shortcuts may be taken.
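The sampling and overlap checks described above can be sketched as follows; matching on a lowercased email alone is a deliberate simplification, since production matching would compare several standardized fields:

```python
import random

def sample_records(purchased, n, seed=0):
    """Draw a reproducible random sample from a purchased list for assessment."""
    rng = random.Random(seed)
    return rng.sample(purchased, min(n, len(purchased)))

def overlap_rate(purchased, existing):
    """Estimate duplication between a purchased list and current data.
    Keying on lowercased email is a simplification for illustration."""
    existing_keys = {r["email"].lower() for r in existing}
    dupes = sum(1 for r in purchased if r["email"].lower() in existing_keys)
    return dupes / len(purchased)
```

A high overlap rate on the sample is a signal to negotiate on price or to budget for a serious match-and-consolidate pass before loading.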
REGULAR MAINTENANCE
The fifth opportunity is during regular maintenance. Even if an organization starts
with perfect data today, tomorrow it will be flawed. Data ages–and ages more quickly
than most expect. For example, 17% of U.S. households move each year, and in some
years, as many as 60% of phone records change in some way. Moreover, every day
people get married, divorced, have children, have birthdays, get new jobs, get promoted,
and change titles. And if that wasn’t enough, the companies we work for start up, go
bankrupt, merge, acquire, rename, and spin off. To account for this inexorable
aging process, organizations must implement regular data cleansing and consolidation
processes, be they nightly, weekly or monthly. The longer the interval between
regular data quality activities, the lower the overall value of your data.
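Those aging figures can be turned into a rough decay estimate. Assuming a constant annual rate of change compounded over time (a back-of-the-envelope model; only the 17% annual move rate comes from the text above):

```python
def share_still_current(annual_change_rate, months):
    """Fraction of records still valid after `months`, assuming a
    constant annual rate of change compounded over fractional years."""
    return (1 - annual_change_rate) ** (months / 12)

# With 17% of households moving per year, quarterly cleansing acts while
# roughly 95% of addresses are still current; annual cleansing waits
# until only 83% are.
quarterly = share_still_current(0.17, 3)
annual = share_still_current(0.17, 12)
```

The shorter the interval, the smaller each cleansing job and the higher the average quality of the data in between, which is the trade-off behind the nightly/weekly/monthly choice.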
[Figure: cleansing opportunities shown across systems and events, including maintenance, obsolete and home-grown systems, legacy and CRM systems, call center, and migration.]
There are nine data quality functions marketers call upon to cleanse their data. As
shown below and depicted in Figure 4, in order of their occurrence in a data quality
project, those functions are:
1. Measure
2. Analyze
3. Identify (Parse)
4. Standardize
5. Correct
6. Enhance
7. Match
8. Consolidate
9. Monitor
For a record to be useful, its various components must be identified and standardized–
as in changing corporation to corp and correcting divson to division. The record must then
be matched and consolidated with the other records pulled from the source systems.
Measurement and analysis kick off the process by providing metadata on the level
and types of defects found in the source data, so subsequent cleansing operations
can be tailored for the greatest effect.
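The parse, standardize, and correct steps can be sketched as table-driven transformations; the abbreviation and correction tables here are illustrative stand-ins for rules a real project would derive from the measure and analyze phases:

```python
import re

# Illustrative rule tables; a real project derives these from profiling.
CORRECTIONS = {"divson": "division"}       # known misspellings
ABBREVIATIONS = {"corporation": "corp"}    # standard abbreviations

def parse_company(raw):
    """Identify (parse) the components of a free-form company string."""
    return re.findall(r"[a-z0-9]+", raw.lower())

def standardize_and_correct(tokens):
    """Fix known misspellings, then map tokens to standard abbreviations."""
    out = []
    for token in tokens:
        token = CORRECTIONS.get(token, token)
        token = ABBREVIATIONS.get(token, token)
        out.append(token)
    return " ".join(out)
```

Once every source system's records pass through the same tables, two spellings of the same company reduce to one normalized string, which is what makes the later match step tractable.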
The first six functions–including enhancement, where additional data such as
demographic or geographic codes is appended–improve the data to the point where it
can be matched and consolidated. Matching and consolidation deliver tremendous value
to marketing: duplicate records are eliminated, best-of-breed records
are built, and the manager gains a single view of each prospect or customer
within the context of the applied source data. Now able to build a corporate or
retail household for target marketing, the marketing manager can identify the top
20% of the customer base or form demographic groups for segmentation in the
next campaign.
Last, monitoring uses the business rules and definitions created in the measure and
analyze phases to create an automated profiling project that provides managers with
defect information (metadata) at any time, so they can make decisions as to whether
the data is good enough to use or needs to be improved for the next operation.
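A monitoring check of this kind reduces to computing a defect rate and comparing it against a threshold; the 5% default below is an arbitrary example, not a recommendation from this paper:

```python
def defect_rate(records, is_defective):
    """Share of records failing a business rule (the monitoring metadata)."""
    return sum(1 for r in records if is_defective(r)) / len(records)

def good_enough(records, is_defective, threshold=0.05):
    """Decision helper: is the data fit for the next operation?
    The 5% default threshold is purely illustrative."""
    return defect_rate(records, is_defective) <= threshold
```

Because the rule is the same one written during measure and analyze, the monitor reports apples-to-apples numbers campaign after campaign.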
Once the marketing manager has determined the nature and scope of data problems
and has determined the data quality functionality needed, there are a number of
options available for connecting the functionality to the data. In broad terms, those
options are:
• On-premise software
• On-demand software
• Service bureaus
These options range from having the greatest control and largest footprint (on-premise
software) to least control and no footprint (service bureau). From a cost basis one
might suspect that on-premise software would be the most expensive. However, the
cost of maintaining a level of data quality is not limited to the initial expenditure.
Consider that the data and its usage will extend as far into the future as the organization
remains in existence. Maintenance fees, per-record charges (otherwise known as
click charges), or subscription fees will, over time, exceed initial software license
fees. What becomes important when calculating the cost of data quality processing
is the breadth and depth of functionality needed and the volume of records processed
each month.
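The license-versus-click-charge trade-off can be made concrete with a cumulative-cost comparison; all figures below are invented for illustration:

```python
def license_cost(initial_license, annual_maintenance, years):
    """Cumulative cost of on-premise software: license plus maintenance fees."""
    return initial_license + annual_maintenance * years

def click_charge_cost(records_per_month, price_per_record, years):
    """Cumulative cost of per-record ("click") charges over the same period."""
    return records_per_month * 12 * years * price_per_record
```

With invented figures of a $100,000 license plus $20,000 per year maintenance, against $0.01 per record on 1 million records a month, the two curves cross after about a year and click charges dominate thereafter, which is why monthly volume belongs in the calculation.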
ON-PREMISE SOFTWARE
Next to service bureaus, on-premise software is the oldest data quality delivery
mechanism. Early data quality software vendors such as Postalsoft began selling
and distributing on-premise data quality software in the mid 1980s. On-premise
software is simply that: a software application—either commercial or hand-coded—
that resides in your facility and is run by IT or the marketing staff. Sometimes the
software is run by third-party consulting or contracting agencies and can be operated
locally or remotely via a virtual private network (VPN) or Internet connection. The
advantages of on-premise software are you control the application, the parameter
settings, the computing environment, processing schedule, and so on. The disadvantage
is that your organization is responsible for all of the above. You need to have the
system resources, personnel, and training to run the software. For most firms, however,
the advantages far outweigh the disadvantages. Basically, if a firm has grown to the
size where it has any sort of customer data management system and an IT staff to
match, it usually has the capabilities to host and run data quality software internally.
ON-DEMAND
On-demand software is the newer incarnation of what was previously known as an application
service provider, or ASP. With on-demand software, the marketing manager contracts
with a third–party service provider and accesses the contracted software via the
Internet. The advantage of on-demand is the service provider bears the complete
burden of installing, running, and maintaining the software at its own facilities. The
disadvantage is the user must trust the provider to safeguard any data that is
stored at the provider’s facility, and functionality offered by the provider may be
limited when compared to on-premise software. With Internet reliability constantly
improving, access to contract software is rarely a problem, and almost always the
user interface is Web-enabled and therefore accessible via an Internet browser. A
downside of on-demand for data quality processing is that the customer’s data must be
uploaded to the service and then returned after cleansing. This round-tripping of data
adds latency to processing times. However, if the marketing manager is not
interested in real-time processing, the added latency may be of little concern.
On-demand actually allows the marketing manager the option of creating a hybrid
solution. Look at address cleansing as an example. A firm may have 5 million
customer and prospect records, 80% of which have U.S. addresses, 15% have
European addresses, and the remaining 5% have Japanese addresses. At the
volumes and frequency of processing the firm averages per month, in addition to
SERVICE BUREAUS
Service bureau processing is yet one step further removed than on-demand. With a
service bureau, a complete project–including the data file, processing rules, and
delivery instructions–is sent to a third-party agency that processes the job
in batch with relatively little interaction with the customer. The advantage of a service
bureau is the marketing manager or their IT counterpart need not run any software,
either on-premise or on-demand. Once the initial effort is taken to establish the
contract and job requirements, the hard work is done. The files are delivered to the
service bureau and the customer awaits either their return or, in the case of a direct
mail/email campaign, the marketing pieces are sent to the prospects. The disadvantage
of using a service bureau is the client must trust the bureau to follow all the requirements
and perform the proper cleansing, and the work performed is largely on the service
bureau’s schedule. There are ways, of course, to validate that the bureau has complied
with all the requirements, and when negotiating the contract the customer can set
the desired delivery date of the finished product.
In the grand scheme of things, marketing managers have numerous options for ensuring
and uplifting the quality of their data. EIM provides a framework for deploying those
options together in one streamlined process flow. The EIM framework contains everything
from data integration–extract, transform, and load (ETL) or enterprise information
integration (EII)–through metadata management, data quality, and building specialized
reporting marts. Data integration applications are a primary deployment mechanism for
data quality functionality, which makes it convenient to cleanse data when it is being
moved into or out of your CRM/CDI system.
Today’s data quality vendors have built rich and deep functionality that can
remediate almost any customer data problem, and they’ve structured their deployment
mechanisms to give you the greatest flexibility in deciding when, where, and how
to cleanse data. On-premise, internal hosting, on-demand, service bureaus, or any
combination thereof are the options that can be tailored to infrastructure and
marketing needs. With the latitude of options available, there is really no reason
why suboptimal data should be used to deliver suboptimal results in your marketing
efforts, whether you’re identifying cross-sell opportunities or distributing leads to
the appropriate sales person. The question for marketing managers becomes: Why
marginalize your marketing efforts when better results lie in improving your data?
© 2008 Business Objects. All rights reserved. May 2008 WP3137-A