
What would be considered data management best practices?

Best practices in data management are entirely dependent on the organization. For example, organizations focused primarily on purchasing packaged solutions have different data management objectives than very large organizations with a substantial amount of custom-developed applications, both legacy and newly integrated. For the most part, data management's objective is to standardize and reuse accurate data, focusing on the set of data that is clearly "corporate" in that it does not change across departments and applications. Examples of such data are "customer data" (customer ID, customer name, etc.), "employee data" (employee name, Social Security number, job title, etc.), "product data" and so on.

Other best practices include:


- Data stewardship and ownership
- Enforcement of data quality
- Integration of data across the enterprise
- Naming standards and consistency
- Required use of data models

Master data
Master data is information that is key to the operation of a business. It is the primary focus of the Information Technology (IT) discipline of Master Data Management (MDM), and can include reference data. This key business information may include data about customers, products, employees, materials, suppliers, and the like. While it is often non-transactional in nature, it is not limited to non-transactional data, and often supports transactional processes and operations. For example, analysis and reporting are greatly dependent on an organization's master data.

Master data management


Master Data Management (MDM) comprises a set of processes, governance, policies, standards and tools that consistently defines and manages the master data (i.e. non-transactional data entities) of an organization (which may include reference data).

An MDM tool can be used to support Master Data Management by removing duplicates, standardizing data (mass maintaining), and incorporating rules that prevent incorrect data from entering the system, in order to create an authoritative source of master data. Master data are the products, accounts and parties for which business transactions are completed.
MDM has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization, to ensure consistency and control in the ongoing maintenance and application use of this information. One of the most common reasons some large corporations experience massive issues with MDM is growth through mergers or acquisitions: two organizations that merge will typically create an entity with duplicate master data.
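The post-merger duplicate problem described above can be sketched in a few lines. This is a minimal, assumed illustration (field names, the normalization rule, and the "first record wins" survivorship policy are all invented for the example), not a real MDM tool's matching logic:

```python
# Sketch: consolidating duplicate customer master records after a merger.
# Field names and the matching rule are illustrative assumptions.

def normalize(name: str) -> str:
    """Build a simple match key: lowercase, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def consolidate(records):
    """Keep the first record seen for each normalized customer name."""
    golden = {}
    for rec in records:
        key = normalize(rec["customer_name"])
        golden.setdefault(key, rec)   # first occurrence becomes the survivor
    return list(golden.values())

org_a = [{"customer_id": "A-1", "customer_name": "Acme Corp."}]
org_b = [{"customer_id": "B-7", "customer_name": "ACME CORP"}]
masters = consolidate(org_a + org_b)   # both spellings collapse to one record
```

Real MDM tools use far richer matching (fuzzy comparison, survivorship rules, stewardship review), but the normalize-then-match shape is the same.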

Topics in Data Management, grouped by the DAMA DMBOK Framework,[3] include:

1. Data Governance
   - Data asset
   - Data governance
   - Data steward
2. Data Architecture, Analysis and Design
   - Data analysis
   - Data architecture
   - Data modeling
3. Database Management
   - Data maintenance
   - Database administration
   - Database management system
4. Data Security Management
   - Data access
   - Data erasure
   - Data privacy
   - Data security
5. Data Quality Management
   - Data cleansing
   - Data integrity
   - Data enrichment
   - Data quality
   - Data quality assurance
6. Reference and Master Data Management
   - Data integration
   - Master data management
   - Reference data
7. Data Warehousing and Business Intelligence Management
   - Business intelligence
   - Data mart
   - Data mining
   - Data movement (extract, transform and load)
   - Data warehousing
8. Document, Record and Content Management
   - Document management system
   - Records management
9. Meta Data Management
   - Meta-data management
   - Metadata
   - Metadata discovery
   - Metadata publishing
   - Metadata registry
10. Contact Data Management
    - Business continuity planning
    - Marketing operations
    - Customer data integration
    - Identity management
    - Identity theft
    - Data theft
    - ERP software
    - CRM software
    - Address (geography)
    - Postal code
    - Email address
    - Telephone number

Data governance
Data governance encompasses the people, processes and technology required to create a consistent, enterprise view of an organization's data in order to:

- Increase consistency and confidence in decision making
- Decrease the risk of regulatory fines
- Improve data security
- Maximize the income-generation potential of data
- Designate accountability for information quality

Data steward

A data steward is a person who is responsible for maintaining a data element in a metadata registry. A data steward may share some responsibilities with a data custodian. Data stewardship roles are common when organizations attempt to exchange data precisely and consistently between computer systems and to reuse data-related resources. Master data management often makes reference to the need for data stewardship for its implementation to succeed.

Data Architecture, Analysis and Design


Data analysis

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains. Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes. Business intelligence covers data analysis that relies heavily on aggregation, focusing on business information.
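The inspect–clean–transform–model steps above can be illustrated on a tiny sample. The values and the "-1.0 means sensor error" convention are assumptions made up for this sketch:

```python
# Illustrative sketch of inspect -> clean -> transform -> summarize.
# The data and the error-flag convention (-1.0) are invented for the example.

raw = [12.0, None, 15.5, 15.5, -1.0, 14.0]   # inspected: nulls and a flag value

# Cleaning: drop missing values and the error flag.
cleaned = [v for v in raw if v is not None and v >= 0]

# Modeling/summary: a simple aggregate supporting a decision.
mean = sum(cleaned) / len(cleaned)
```

Real analyses layer statistics or machine learning on top, but every pipeline passes through these same stages first.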
Data architecture

Data architecture is composed of models, policies, rules and standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organizations. Data architectures address data in storage and data in motion; descriptions of data stores, data groups and data items; and mappings of those data artifacts to data qualities, applications, locations, etc.

Data Quality Management


Data cleansing

Data cleansing, data cleaning or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data. After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry time, rather than being corrected in batches after the fact.
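A minimal batch-cleansing pass might look like the following sketch. The field names, the canonical country list, and the "drop records with no name" rule are assumptions for illustration only:

```python
# Sketch of batch data cleansing: incomplete records are removed and an
# inconsistent spelling is corrected against a small reference mapping.
# Field names and rules are invented for the example.

CANONICAL_COUNTRIES = {"usa": "USA", "u.s.a.": "USA", "united states": "USA"}

def cleanse(records):
    clean = []
    for rec in records:
        if not rec.get("name"):            # detect an incomplete record
            continue                        # remove the dirty record
        country = rec.get("country", "").strip().lower()
        # Correct inconsistent spellings where a canonical form is known.
        rec["country"] = CANONICAL_COUNTRIES.get(country, rec.get("country"))
        clean.append(rec)
    return clean

batch = [
    {"name": "Ada", "country": "u.s.a."},
    {"name": "",    "country": "USA"},     # incomplete: dropped
]
result = cleanse(batch)
```

Note the contrast with validation: here the bad spelling is corrected in a batch after the fact, whereas a validation rule would have rejected it at entry time.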
Data integrity

Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle,[1] and is an important feature of a database or RDBMS system. Data integrity means that the data contained in the database is accurate and reliable.

Data quality

Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, data are deemed of high quality if they correctly represent the real-world construct to which they refer.
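One concrete form of integrity checking is a referential check: every record in one table must reference an existing record in another. The table contents below are invented for the example; in a real RDBMS this is typically enforced declaratively with a foreign-key constraint rather than in application code:

```python
# Sketch of a referential-integrity check: every order must reference an
# existing customer. The data is invented for the example.

customers = {101, 102, 103}                          # known customer IDs
orders = [("O-1", 101), ("O-2", 102), ("O-3", 999)]  # (order_id, customer_id)

# Orders whose customer_id points at no known customer violate integrity.
violations = [oid for oid, cid in orders if cid not in customers]
```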

Data integration
Data integration involves combining data residing in different sources and providing users with a unified view of these data.[1]
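The "unified view" idea can be sketched as a join over two sources that hold different attributes for the same keys. The source names, schemas, and values are assumptions for illustration:

```python
# Sketch of data integration: two sources keyed by the same customer IDs
# are merged into one unified view. Schemas are invented for the example.

crm = {1: {"name": "Ada Lovelace"}, 2: {"name": "Alan Turing"}}
billing = {1: {"balance": 120.0}, 2: {"balance": 0.0}}

# Union of keys, merge of attributes per key: the unified view.
unified = {
    cid: {**crm.get(cid, {}), **billing.get(cid, {})}
    for cid in crm.keys() | billing.keys()
}
```

Real integration platforms add schema mapping, conflict resolution, and live federation on top, but the key-match-and-merge core is the same.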

Business intelligence (BI)


Business intelligence (BI) is a set of theories, methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information for business purposes.

Data mart
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.
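The "small slice" relationship between warehouse and mart can be sketched as a simple departmental subset. The row layout and department names below are assumptions for the example:

```python
# Sketch: a data mart as a department-specific slice of warehouse rows.
# Row layout and department names are invented for the example.

warehouse = [
    {"dept": "sales",     "quarter": "Q1", "revenue": 500},
    {"dept": "marketing", "quarter": "Q1", "revenue": 120},
    {"dept": "sales",     "quarter": "Q2", "revenue": 650},
]

def build_mart(rows, dept):
    """Extract the subset of warehouse rows for a single business line."""
    return [r for r in rows if r["dept"] == dept]

sales_mart = build_mart(warehouse, "sales")   # the sales team's access layer
```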

Data mining
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD),[1] an interdisciplinary subfield of computer science,[2][3][4] is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.[2] The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.[2] Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[2]
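As a toy illustration of pattern discovery, the sketch below counts item pairs that co-occur across transactions, which is the idea behind frequent-itemset mining (real miners such as Apriori prune the search space rather than enumerating every pair). The transaction data is invented for the example:

```python
# Toy pattern discovery: count co-occurring item pairs across transactions.
# The baskets are invented for the example.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1   # each pair's support across all baskets
```

The discovered counts are the "patterns"; downstream post-processing would then rank them by an interestingness metric and visualize the survivors.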

Data movement
In computing, Extract, Transform and Load (ETL) refers to a process in database usage, and especially in data warehousing, that involves:

- Extracting data from outside sources
- Transforming it to fit operational needs, which can include quality levels
- Loading it into the end target (a database; more specifically, an operational data store, data mart or data warehouse)
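The three steps can be sketched end to end in a few lines. The source format (inline CSV text), the transformation rule, and the in-memory target are all assumptions for illustration:

```python
# Minimal sketch of the three ETL steps; source, rules, and target invented.

def extract():
    """Extract: pull raw rows from an outside source (here, inline CSV text)."""
    raw = "ada,120\nalan,80\n"
    return [line.split(",") for line in raw.strip().splitlines()]

def transform(rows):
    """Transform: fit operational needs (typed amounts, title-cased names)."""
    return [{"name": name.title(), "amount": int(amount)}
            for name, amount in rows]

def load(rows, target):
    """Load: write into the end target (here, an in-memory list)."""
    target.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

Production ETL adds incremental extraction, error handling, and scheduling, but every pipeline decomposes into these three stages.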

Data warehouse
A data warehouse or enterprise data warehouse (DW, DWH, or EDW) is a database used for reporting and data analysis. It is a central repository of data created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used for creating trending reports for senior management, such as annual and quarterly comparisons. The data stored in the warehouse are uploaded from the operational systems (such as marketing, sales, etc.). The data may pass through an operational data store for additional operations before they are used in the DW for reporting.

The typical ETL-based data warehouse uses staging, data integration, and access layers to house its key functions. The staging layer, or staging database, stores raw data extracted from each of the disparate source data systems. The integration layer integrates the disparate data sets by transforming the data from the staging layer, often storing this transformed data in an operational data store (ODS) database. The integrated data are then moved to yet another database, often called the data warehouse database, where the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of facts and dimensions is sometimes called a star schema. The access layer helps users retrieve data.[1]

A data warehouse constructed from integrated source data systems does not require ETL, staging databases, or operational data store databases. The integrated source data systems may be considered to be a part of a distributed operational data store layer. Data federation or data virtualization methods may be used to access the distributed integrated source data systems to consolidate and aggregate data directly into the data warehouse database tables. Unlike the ETL-based data warehouse, the integrated source data systems and the data warehouse are all integrated, since there is no transformation of dimensional or reference data. This integrated data warehouse architecture supports drill-down from the aggregate data of the data warehouse to the transactional data of the integrated source data systems.

Data warehouses can be subdivided into data marts, which store subsets of data from the warehouse.
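The star schema of facts and dimensions can be made concrete with an in-memory SQLite database. The table and column names are invented for the example; real warehouse schemas carry many more dimensions and attributes:

```python
# A minimal star schema in sqlite3: one fact table keyed to one dimension
# table. Table and column names are invented for the example.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE fact_sales (product_id INTEGER, amount REAL)")
con.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Widget"), (2, "Gadget")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])

# Aggregate the facts, grouped through the dimension (the classic BI query).
rows = con.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d ON f.product_id = d.product_id
    GROUP BY d.name ORDER BY d.name
""").fetchall()
```

The fact table holds the measures (amounts); the dimension table holds the descriptive attributes users group and filter by.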

Metadata
The term metadata refers to "data about data". The term is ambiguous, as it is used for two fundamentally different concepts (types). Structural metadata is about the design and specification of data structures and is more properly called "data about the containers of data"; descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content", thus metacontent. Descriptive, guide and the National Information Standards Organization's concept of administrative metadata are all subtypes of metacontent.
