Data Classification Kit
Office of Information Technology and the Data Protection Subcommittee August 11, 2008
Preface
This document was created as a result of the research conducted by the Data Protection Subcommittee of the Multi-Agency CIO Advisory Council. Member agencies of the subcommittee include:
Administrative Services; Aging; Alcohol and Drug Addiction; Attorney General; Auditor of State; Board of Regents; Budget and Management; Commerce; Development; Education; EPA; eTech Ohio; Health; Industrial Commission; Insurance; Lottery Commission; Mental Health; Mental Retardation and Developmental Disabilities; Natural Resources; OAKS; Job and Family Services; Rehab and Corrections; OIT; Public Safety; PUCO; Rehab Services Commission; Taxation; Transportation; Workers' Compensation; Youth Services
For more information, contact:
Statewide IT Policy
Investment and Governance Division
Ohio Office of Information Technology
30 East Broad Street, 39th Floor
Columbus, Ohio 43215
Telephone: 614-644-9352
Facsimile: 614-644-9152
E-mail: State.ITPolicy.Manager@oit.ohio.gov
August 2008
Overview
The purpose of data classification is to assign data ownership, identify and document security requirements, and then translate such requirements into security controls and implementation costs. Data classification is not just the act of designating or labeling data as confidential or critical; it involves close collaboration between business units and IT organizations to work through issues that go well beyond IT.

This document is a compilation of resources to assist agencies in their data classification efforts. The resource kit consists of:

- A diagram that illustrates the data classification process. The data classification process is typically initiated by a triggering event; for example: intake of new data sets, a proposed new business application or infrastructure capability, or an incident requiring security remediation. The diagram helps frame the event by identifying the participants and input materials needed to begin the process, as well as the outcomes the group will need to work towards. A worksheet for this diagram is available as a separate document.
- A classification scale adapted to public-sector organizations and developed in the context of Ohio state government. The scale provides general parameters for low/moderate/high/extreme impact thresholds, and then relates these thresholds to the categories defined in Ohio IT Policy ITP-B.11, Data Classification.
- An account of an Ohio agency's experience with engaging its business units in data classification. The agency has created a worksheet and a set of process diagrams to help guide its data classifications.
- An account of the Ohio Department of Education's data classification experience using an ODE-developed tool called the Data Classification Meta Data Manager. This tool helps catalogue agency databases. Job descriptions are also included in this section for those individuals closely involved in the classification process, those who bridge the technical and business environments.
Table of Figures
Figure 1: Classification Process
Figure 2: Data Classification Activity Worksheet
Figure 3: Incident Impact Scale
Figure 4: ITP-B.11, State IT Policy Confidentiality Labels
Figure 5: ITP-B.11, State IT Policy Criticality Labels
Figure 6: Data Classification Process Diagram
Figure 7: Establish Confidentiality Subprocess Diagram
Figure 8: Establish Criticality Subprocess Diagram
Figure 9: Data Classification Meta Data Manager Startup Screen
Figure 10: Data Classification Meta Data Manager Data Entry Screen
Terms
The following terms are used throughout this document.

Availability - The assurance that information and services are delivered when needed. Certain data must be available on demand or on a timely basis.

Confidentiality - The assurance that information is disclosed only to those systems or persons who are intended to receive the information. Areas in which confidentiality may be important include nonpublic customer information, patient records, information about a pending criminal case, or infrastructure specifications.

Data - Coded representation of quantities, objects and actions. The word "data" is often used interchangeably with the word "information" in common usage.

Data owner - Individual or group responsible for classifying data and generating guidelines for its lifecycle management. Synonymous with "information owner."

Impact - A combination of data confidentiality, integrity and availability. Whether a set of data is of LOW, MEDIUM, HIGH, or VERY HIGH impact will inform the criticality designation and whether or not the set should be considered sensitive data.

Information - Data processed into a form that has meaning and value to the recipient to support an action or decision. "Information" is often used interchangeably with "data" in common usage.

Information owner - Individual or group responsible for classifying data and generating guidelines for its lifecycle management. Synonymous with "data owner."

Integrity - The assurance that information is not changed by accident or through a malicious or otherwise criminal act. Because businesses, citizens and governments depend upon the accuracy of data in state databases, agencies must ensure that data is protected from improper change.
Participants
Identify the principal stakeholders. Who is the business unit data owner? Who decides how the information is utilized in the business unit or whether it is shared with other organizations? Who can provide information on the handling requirements? Who maintains the data and the security controls protecting it? Both the business and IT sides of the organization will need to be adequately represented in the process. Over time, these roles will become more recognizable and allow for refinement of the agency classification process.

Materials
Gather any documentation on where data is maintained within the agency and how data is protected. Note that the scope of this task is beyond the IT organization. Has a risk assessment been performed for any of the agency's information systems? Are potential threats, vulnerabilities, and existing security controls already documented, such as in a Privacy Impact Assessment? Are there other (e.g., federal) data classification materials that might help inform the process? What federal or state laws inform the use of the data? What other materials were used in previous data classification events? Some materials will likely need to be reviewed by participants before meeting.

Activities
Once all stakeholders and relevant documentation have been identified, designate the most appropriate data owner: that is, someone who can determine data sensitivity and can also work with senior executives to determine data criticality. This is most often the unit that collects or uses the data. Ensure that existing security controls are consistently applied to data using the classification scheme. Identify whether any data is improperly categorized or secured, or even whether the data needs to be collected at all.

Outcomes
The ultimate goal of the data classification collaborative activity is to:
- Establish and document the identification of ownership for a particular set of data.
- Ensure that the minimum amount of data is collected to support the business use. This goal may involve reducing the amount of data that the agency collects. Streamlining the data collection process not only reduces the amount of storage needed to maintain a data set, it also minimizes the amount of risk and liability the agency assumes in collecting and maintaining information.
- Assess whether appropriate security controls are in place, and whether requirements need to be defined for additional controls.
- Begin a data classification practice that can be repeated for other data sets.
The data classification activity will give the IT part of the agency a clearer understanding of how data owners require their data to be handled during operations. IT's responsibility then becomes the proper administration of security controls.
Figure 3: Incident Impact Scale (impact categories include Victims Welfare and Reputation)

Low impact
- None or slight

Moderate impact
- Individual: loss of 1 month's income or benefits or service
- Business: loss of 1 month's income
- < $25K (or within agency spending authority)
- Percent of agency budget
- Limited risk associated with a civil suit
- Limited regulatory requirements
- Intense media scrutiny
- Budget reductions
- Loss of political capital

High impact
- Loss of benefits or service, privacy, doctor-patient privilege, attorney-client privilege, trade secrets, IP
- Individual: loss of 1+ months income or benefits or services
- Business: loss of 1+ months income
- Amount requires OIT or Controlling Board approval

Extreme impact
- Serious injury or loss of life
- Permanent loss of all income or benefits or service
- Equal to 100% of agency budget

High or extreme impact
- Percent of agency budget
- Significant risk associated with a civil suit
- Stringent regulatory requirements
- Loss of public confidence
- Loss of legislature support/funding
- Loss of statutory authority
Below is a possible mapping of Impact Scale to State of Ohio Data Classification Labels after considering the probability associated with the occurrence of a particular security breach incident.
Confidentiality Label: PUBLIC
Criticality Label: LOW
When using the incident impact scale, consider the following aspects of the data set:

Confidentiality - If data is disclosed, how serious are the consequences? A public record disclosure will likely have a very low impact on an agency; a breach involving restricted data may incur significant costs or a loss of public confidence.

Integrity - If data is lost (i.e., deleted), what is the impact on agency business processes? What amount of resources would be necessary to recapture the data? Does this data need to be collected anyway?

Availability - If information is not available for a business transaction, what is the result?
Use the incident impact scale to inform the appropriate categories for data as required by ITP-B.11, Data Classification. Each data set will require a confidentiality and criticality label from Figures 4 and 5.

Figure 4: ITP-B.11, State IT Policy Confidentiality Labels

Confidentiality

PUBLIC - Includes information that must be released under Ohio public records law or instances where an agency unconditionally waives an exception to the public records law.

LIMITED ACCESS - Applies to information that an agency may release if it chooses to waive an exception to the public records law and places conditions or limitations on such a release.

RESTRICTED - Applies to information, the release of which is prohibited by state or federal law. This label also applies to records that an agency has discretion to release under public records law exceptions but has chosen to treat the information as highly confidential.
Figure 5: ITP-B.11, State IT Policy Criticality Labels

Criticality

LOW - The loss of data integrity or availability would result in insignificant or no financial loss, legal liability, public distrust, or harm to public health and welfare.

MEDIUM - The loss of data integrity or availability would result in limited financial loss, legal liability, public distrust, or harm to public health and welfare.

HIGH - The loss of data integrity or availability would result in significant financial loss, legal liability, public distrust, or harm to public health and welfare.

VERY HIGH - The loss of data integrity or availability would result in catastrophic financial loss, legal liability, public distrust, or harm to public health and welfare.
The subprocess shown in Figure 7 is used to determine the appropriate confidentiality label for a dataset. This elaborates on the "Determine data classification labels" step shown in Figure 6. Data is categorized as either public or non-public information. If the data is not PUBLIC, an additional set of criteria is applied to determine if access or other security controls are necessary. The process concludes in the assignment of a confidentiality label.
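The public/non-public decision described above can be sketched in a few lines of Python. This is only an illustration built from the label definitions in Figure 4; the actual criteria in the Figure 7 diagram are not reproduced here. The function name, its parameters, and the order of the checks are assumptions made for the example, and the middle label name follows the "Limited Access" wording used in the agency worksheet in this kit.

```python
from enum import Enum


class Confidentiality(Enum):
    PUBLIC = "PUBLIC"
    LIMITED_ACCESS = "LIMITED ACCESS"
    RESTRICTED = "RESTRICTED"


def confidentiality_label(
    must_release: bool,
    release_prohibited_by_law: bool,
    exception_waived_unconditionally: bool = False,
    treat_as_highly_confidential: bool = False,
) -> Confidentiality:
    """Hypothetical decision helper; parameter names are illustrative only."""
    # Release prohibited by state or federal law: RESTRICTED.
    if release_prohibited_by_law:
        return Confidentiality.RESTRICTED
    # Must be released under public records law, or the agency
    # unconditionally waives an exception: PUBLIC.
    if must_release or exception_waived_unconditionally:
        return Confidentiality.PUBLIC
    # The agency has discretion but chooses to treat the data as
    # highly confidential: RESTRICTED.
    if treat_as_highly_confidential:
        return Confidentiality.RESTRICTED
    # Otherwise the data may be released with conditions or limitations.
    return Confidentiality.LIMITED_ACCESS


label = confidentiality_label(must_release=False, release_prohibited_by_law=False)
print(label.value)
```

An agency adopting something like this would replace the boolean inputs with the questions from its own subprocess diagram and record the reason for each answer.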
The subprocess shown in Figure 8 is used to determine the appropriate criticality label for a dataset. This elaborates on the "Determine data classification labels" step shown in Figure 6. Different forms of loss (down time, reputation, money) are used to determine the impact of losing data. The process concludes in the assignment of a LOW, MEDIUM or HIGH criticality rating.
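One simple way to combine the different forms of loss is to rate each form separately and take the worst case. The sketch below assumes that rule; the Figure 8 diagram may combine them differently. The names Impact and criticality_label are hypothetical, and the scale includes VERY HIGH because Figure 5 defines that label.

```python
from enum import IntEnum


class Impact(IntEnum):
    # Ordered so that a larger value means a more severe loss.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


def criticality_label(down_time: Impact, reputation: Impact, money: Impact) -> Impact:
    """Assumed rule: the criticality of a data set is the most severe
    impact rating across the forms of loss considered."""
    return max(down_time, reputation, money)


rating = criticality_label(
    down_time=Impact.LOW,
    reputation=Impact.HIGH,
    money=Impact.MEDIUM,
)
print(rating.name)
```

Taking the maximum is a conservative choice: a data set whose loss would be only a minor financial problem but a serious reputational one is still rated by the reputational impact.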
Unit:
Data Owner:

Document dataset(s), sequential files, data field(s), security matrix, external data feeds, etc. (attach listing if appropriate; for large systems, list generic naming convention/qualifier)

Data Classification (see process flows for assistance)
Determine data confidentiality class: Restricted / Limited Access / Public
Why:
Determine data criticality class: Very High / High / Medium / Low
Why:

Data Usage
Who accesses the data?
What levels of access are applicable (view/update)?
When was the last review of system users and their access levels?
Is the same data stored/duplicated anywhere else? If yes, where? Can the data be consolidated?
Is ALL of the data collected being used? Can any of the data be reduced or eliminated?

Data Origin
List Federal sources:
List State sources:
Other Agency System(s):
Customer(s): Business / Individual
Data Distribution
Does this data feed other internal system/business processes? List all that apply.
Is this data shared with external agencies or other entities? List all that apply.
If yes, what controls/safeguards are in place to protect this data when it leaves our systems?

Data Retention
How long is the data retained? Why?
What is the business requirement for retention?
What are the legal requirements for retention?
Where is the data retained? On-line / Off-line
How is old data destroyed?

Disaster Recovery
Can the Business Unit function if the data is unavailable? If yes, for how long?

Training
Does training need to be updated?
Disclosure training. Describe changes needed:
Security Awareness training. Describe changes needed:
A meeting will be scheduled to formally discuss and classify the applicable system/data. Please come to the meeting prepared to provide as much applicable information as possible.
Business Unit
The agency's data classification effort began in 2004 with the formation of a group of experienced business analysts involved in data management roles. This group was established to integrate data management into the agency program offices. Each of these analysts represents a major functional area within the agency:

- Center for Curriculum and Assessment
- Center for Operations
- Center for School Finance
- Center for School Improvement
- Center for Students, Families, and Communities
- Center for the Teaching Profession
- Superintendency/Chief of Staff/Policy Research and Accountability
Some functional areas have more than one analyst. A majority of these analysts are classified as Data Administration Managers 1 or 2, with one level 3 to manage the group (see appendixes for position descriptions). The group meets on a weekly basis. While the members are embedded in each of the functional business areas of the agency, they report through the Data Management Office, which is an element of the Information Technology Office.
Tool Development
With the development of the Information Security Program, the agency identified the need to have a tool to help catalog and classify the information being used by the agency. The first generation of the classification tool began as an MS Excel spreadsheet, though this format later proved too cumbersome. The agency also examined a university-developed application, though this tool did not meet the agency requirements. Finally, a decision was made to develop the application internally. Mini-data dictionaries from all the production databases were compiled into an MS Access database dubbed Data Classification Meta Data Manager. Database system tables (i.e., metadata) were determined to be out of scope for the effort and not included in the compilation. Once all table and field structures were absorbed into the Meta Data Manager, work began on identifying information owners for each dataset.
Entry into the application begins through the screen shown in Figure 9.

Figure 9: Data Classification Meta Data Manager Startup Screen
The user selects a particular database to work in. Actual data entry occurs in the screen shown in Figure 10.

Figure 10: Data Classification Meta Data Manager Data Entry Screen
Classification is table-based, i.e., users select a table within a database and then determine its classification and criticality. The currently selected database and table name appear in the upper-left, and the grid in the center displays each field contained inside the table. The set of controls in the upper-right permits the user to indicate:

- the Information Owner for the data set;
- the name of the Data Manager performing the most recent edit;
- the Classification assignment: PUBLIC, FOR INTERNAL USE (LIMITED USE), or RESTRICTED, as defined in Ohio IT Policy ITP-B.11, Data Classification;
- the Criticality assignment: LOW, MEDIUM, HIGH, or VERY HIGH, as defined in ITP-B.11; and
- whether Personally Identifiable Information exists in the table.
Where an owner for a particular set of data cannot be identified, the agency's chief information officer works with agency senior leadership to determine ownership.
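The table-level record captured on the data entry screen can be pictured as a small data structure. The following Python sketch is not the Meta Data Manager itself (which the agency built in MS Access); it only mirrors the fields listed above. The class name, field names, helper function, and sample database and table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Label values as listed for the data entry screen.
CLASSIFICATIONS = ("PUBLIC", "FOR INTERNAL USE (LIMITED USE)", "RESTRICTED")
CRITICALITIES = ("LOW", "MEDIUM", "HIGH", "VERY HIGH")


@dataclass
class TableRecord:
    """One catalogued table; field names are illustrative assumptions."""
    database: str
    table: str
    information_owner: Optional[str] = None
    data_manager: Optional[str] = None
    classification: Optional[str] = None
    criticality: Optional[str] = None
    contains_pii: bool = False

    def __post_init__(self) -> None:
        # Reject labels that are not on the lists above.
        if self.classification is not None and self.classification not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification}")
        if self.criticality is not None and self.criticality not in CRITICALITIES:
            raise ValueError(f"unknown criticality: {self.criticality}")


def tables_without_owner(records: List[TableRecord]) -> List[str]:
    """Tables whose ownership would need to be settled by the CIO
    together with senior leadership."""
    return [f"{r.database}.{r.table}" for r in records if not r.information_owner]


# Hypothetical sample catalogue.
catalog = [
    TableRecord("example_db", "students", "Example Program Office", "A. Manager",
                "RESTRICTED", "HIGH", contains_pii=True),
    TableRecord("example_db", "lookup_codes"),
]
print(tables_without_owner(catalog))
```

A listing like the one produced by tables_without_owner gives the data management group a concrete worklist of data sets that still need an information owner.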
Challenge: Organization - Using an embedded personnel structure requires clearly defined roles and responsibilities to reduce the tendency to think of the embedded role as another FTE to use for miscellaneous tasks.

Solution: The agency struggled with this initially, but through continued meetings with business unit leaders and support from senior leaders, the position has become more clearly defined. The position descriptions in the appendices are now a starting point for clarifying the roles and responsibilities of the data manager. More detailed job duties, including information security responsibilities, will be included in future position descriptions.
Future Benefits
ODE believes that it will realize many future benefits from using this approach to data classification. Legacy and production staff can use the classification tool to evaluate new systems before they are made available online, to assess whether sensitive data derived from older systems is present. When complete, the classification tool will aid report creators in properly classifying reports based upon the classification of the data being used. The agency currently leverages the Data Management team to minimize extraneous or unnecessary data collection and to further align information usage with Ohio Revised Code 1347. Using the classification tool, the Data Management team will be able to have a more comprehensive view of what is being collected in the various business units.
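Classifying a report from the data it uses can be illustrated with a short Python sketch. It assumes one plausible rule, that a report takes the most restrictive classification found among its source tables; the source does not state the rule the tool will apply. The function name and the numeric ranking are hypothetical.

```python
from typing import Iterable

# Assumed ranking: a larger number means a more restrictive label.
RESTRICTIVENESS = {
    "PUBLIC": 0,
    "FOR INTERNAL USE (LIMITED USE)": 1,
    "RESTRICTED": 2,
}


def report_classification(source_labels: Iterable[str]) -> str:
    """Return the most restrictive label among a report's source tables."""
    labels = list(source_labels)
    if not labels:
        raise ValueError("a report needs at least one classified source table")
    return max(labels, key=lambda label: RESTRICTIVENESS[label])


print(report_classification(["PUBLIC", "RESTRICTED", "PUBLIC"]))
```

Under this rule a report that joins a PUBLIC table with a RESTRICTED one is treated as RESTRICTED until an information owner decides otherwise.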