Data Classification Guide

List of Content
1 Data Classification
2 Data Classification Policy
3 Data Classification Best Practices (part 1)
4 Data Classification Best Practices (part 2)
5 Data Classification Types: Criteria, Levels, Methods, and More
6 Sensitive Data Discovery 101
7 Data Classification Framework: What, Why and How
8 Data Classification Examples
9 What are the Data Classification Levels?
10 Data Classification Examples and its Importance
Data Classification: Compliance, Concepts, and 4 Best Practices
What is Data Classification?
The term data classification refers to processes and tools designed to organize data into categories. The
purpose is to make data easier to store, manage, and secure.
Data classification systems support organizations in many efforts, including risk management, compliance,
and legal discovery. Additionally, data classification systems can improve the usability and accessibility of
data, helping organizations derive more value from their information assets.
Data classification can improve all three fundamental aspects of information security:
Confidentiality: enabling the application of stronger security measures for sensitive data.
Integrity: enabling adequate storage provisioning and access controls to prevent data loss,
unauthorized modification, or corruption.
Availability: providing controls that make data easily accessible to authorized users.
In this article:
Why Is Data Classification Important?
What Are the Four Data Classification Levels?
What Are the Different Types of Classification of Data?
Challenges of Data Classification
How Do Compliance Standards Impact Data Classification?
Data Classification Levels
Establishing a Data Classification Policy
4 Data Classification Best Practices
Conduct a Data Risk Assessment
Create a Data Inventory
Establish Data Security Controls
Maintenance and Monitoring
The information provided in this article and elsewhere on this website is meant purely for educational
discussion and contains only general information about legal, commercial and other matters. It is not legal
advice and should not be treated as such. Information on this website may not constitute the most
up-to-date legal or other information.
https://satoricyber.com | contact@satoricyber.com
The information in this article is provided “as is” without any representations or warranties, express or
implied. We make no representations or warranties in relation to the information in this article and all
liability with respect to actions taken or not taken based on the contents of this article are hereby expressly
disclaimed.
You must not rely on the information in this article as an alternative to legal advice from your attorney or
other professional legal services provider. If you have any specific questions about any legal matter you
should consult your attorney or other professional legal services provider.
This article may contain links to other third-party websites. Such links are only for the convenience of the
reader, user or browser; we do not recommend or endorse the contents of any third-party sites.
Why Is Data Classification Important?
Data classification provides an interface for organizations to implement controls and procedures across
data formats, structures and storage technologies. Classified data allows an organization to define and
implement a single policy for handling sensitive data across multiple systems and data objects. Defining
a separate policy for each type of data object is not realistic in today’s data-abundant environments.
There are several reasons why data classification is important:
Context: data classification adds business context to applications and processes. For example, based on
data classification, an organization can identify applications that handle sensitive data and define
stricter security requirements for those applications.
Compliance: data classification makes it easier to comply with regulatory frameworks such as GDPR, CCPA,
HIPAA, and PCI DSS, and to demonstrate that compliance.
Security: data classification makes the business aware of the data sensitivity, both as a whole and each
time data is introduced, and allows the business to use that context to apply the right level of security
control.
Governance: data classification makes it easier to map, track, and control data.
What Are the Four Data Classification Levels?
There are typically four data classification levels in information security:
Public: data that is in, or can be in, the public domain and can be openly shared with anyone outside of
the organization. For example: a data sheet about the company’s products and services.
Internal: company-wide data that is kept within the organization and, while not sensitive, should not be
shared externally. For example: a guide about how to get help from the IT helpdesk.
Confidential: domain-specific data that can be shared with specific people or teams and contains
sensitive company information. For example: a price list for one of the company’s products.
Restricted: highly sensitive information that should only be available on a need-to-know basis. For
example: employee agreements.
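The four levels form an ordered scale, which a short Python sketch can make explicit. The names `Level`, `can_share_externally`, and `strictest` are illustrative, and the rule that a mixed data set is handled at its most sensitive level is an assumption rather than something the levels themselves prescribe.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels described above, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def can_share_externally(level: Level) -> bool:
    # Only public data may be shared outside the organization.
    return level == Level.PUBLIC


def strictest(levels: list[Level]) -> Level:
    # Assumption: a data set that mixes levels is handled at its most sensitive level.
    return max(levels, default=Level.PUBLIC)
```

Because the enum is ordered, comparisons such as `Level.RESTRICTED > Level.CONFIDENTIAL` can drive access decisions directly.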
What Are the Different Types of Classification of Data?
While data is classified based on each individual business’s needs, there are a few types of data
classification that are more common:
Data-based classification: classification that describes the nature of the data. For example: a credit card
number or an email address.
Context-based classification: classification that describes the data’s business context. For example:
sensitive data or earnings data.
Source-based classification: classification that describes the source of the data. For example: customer
data collected from the webinar registration form.
Challenges of Data Classification
While data classification is essential for carrying out various functions, information security is mainly
concerned with sensitive data. In most organizations, sensitive data is classified into various sensitivity
levels and then mapped to different categories of sensitive data (e.g., personal information).
The challenges organizations usually face when classifying data are:
False positives: the same data could appear in different formats and different contexts. Classification
algorithms that do not take into account the data’s format and context are more likely to generate false
classifications. As huge amounts of data are usually involved in classification projects, even very low false
positive rates can prevent an organization from effectively classifying.
False negatives: under various regulatory standards, data might be considered sensitive in a specific
context but not in another. For example, a name might be considered non-sensitive by itself but
sensitive when alongside a medical record. Classifying data outside of the usage context can and often does
result in incorrect classification.
Big data: data lakes and data warehouses represent ever-growing, dynamic repositories of data, creating
a huge challenge for non-continuous classification tools.
Cost: for most classification tools, the cost of implementing and operating a data classification policy
depends on the amount of data and the number of controls established. This cost hinders an organization
that wants to classify large data sets with strict access requirements.
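The false-negative challenge above, where a name becomes sensitive once it sits next to a medical record, can be sketched in a few lines of Python. The type names in `HEALTH_TYPES` and `IDENTIFIER_TYPES` and the function `classify_columns` are hypothetical, and the sketch assumes the column types have already been detected.

```python
# Hypothetical column types; a real tool would detect these from the data.
HEALTH_TYPES = {"diagnosis", "blood_type", "medical_record"}
IDENTIFIER_TYPES = {"name", "email", "phone"}


def classify_columns(column_types: dict[str, str]) -> dict[str, str]:
    """Label each column, raising identifiers to 'sensitive' when health data is present."""
    has_health_data = any(t in HEALTH_TYPES for t in column_types.values())
    labels = {}
    for column, data_type in column_types.items():
        if data_type in HEALTH_TYPES:
            labels[column] = "sensitive"
        elif data_type in IDENTIFIER_TYPES and has_health_data:
            # The identifier links a person to health data, so context makes it sensitive.
            labels[column] = "sensitive"
        else:
            labels[column] = "non-sensitive"
    return labels
```

A classifier that looked at each column in isolation would label the name column the same way in both tables, which is exactly the context error described above.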
How Do Compliance Standards Impact Data Classification?
Many regulations and compliance standards require organizations to perform data classification.
Requirements may be different in each compliance standard, depending on the type of data each organization
uses, processes, collects, transmits, and stores.
Here are several common compliance standards and their data classification requirements:
GDPR: entities handling the personal data of European data subjects are required to classify all collected
data types. GDPR categorizes specific data related to race, political opinions, healthcare, ethnic origin,
and biometrics as “special”. This data requires additional protection.
PCI DSS: Requirement 9.6.1 (in PCI DSS v3.2.1) stipulates that entities must “classify media so the
sensitivity of the data can be determined.”
SOC 2: the Trust Services Criteria of SOC 2 require entities to demonstrate that they regularly identify
and maintain confidential information in a manner that meets their unique confidentiality objectives.
HIPAA: treats protected health information (PHI) as a high-risk asset. The HIPAA Security Rule
requires covered entities and relevant business associates (BAs) to identify PHI and implement safeguards
that ensure its integrity, availability, and confidentiality. The HIPAA Privacy Rule limits the uses and
disclosures of PHI, forcing covered entities and business associates to establish data classification
procedures.
Data sensitivity levels help determine how each type of classified data should be handled. The Center for
Internet Security (CIS), for example, recommends three information classes:
1. Public
2. Business Confidential
3. Sensitive
The US government has a more extensive classification, with seven levels of data sensitivity:
1. Controlled Unclassified Information (CUI)
2. Public Trust
3. Confidential
4. Secret
5. Top Secret
6. Code Word Classification
7. Restricted Data/Formerly Restricted Data
Using more than three levels can introduce complexities and make data classification hard to control and
maintain. Using fewer than three levels, on the other hand, is considered too simplistic and may lead to
insufficient protection and privacy. This is why the majority of organizations use three levels of classification,
as advised by the CIS.
Here is a generalized form of the CIS classification definitions which you can use in your data classification
efforts:
1. Low Sensitivity Data—public information that does not require access restrictions, such as public
web pages, blog posts, and job listings.
2. Medium Sensitivity Data—intended only for internal use, and can have a major impact on the
organization if breached. For example, business plans, customer lists, and non-identifiable personal
data.
3. High Sensitivity Data—data protected by regulations or compliance standards, requiring strict
access controls and protection measures. If breached, the data may cause significant harm to
individuals or the organization, and may also result in compliance penalties or fines.
Learn more in our detailed guide to data classification levels
Establishing a Data Classification Policy
A data classification policy defines how your organization manages its information lifecycle. The goal is to
ensure sensitive information is handled in a manner relevant to the level of risk it poses. A data
classification policy should address access and authorization, taking into account the data structure and its
day-to-day business uses.
Here are several key aspects your policy should cover:
Objectives: the motivation for implementing data classification and the goals to achieve, with
measurable key performance indicators (KPIs).
Workflows: clearly define how the entire classification process should be organized and structured.
Explain how this process will impact all employees, and how they should treat different levels of sensitive
data.
Location: identify where the data is stored, whether on premises, in the cloud, on backup systems, within
databases, file systems, etc.
Schema: determine and describe the categories chosen to classify data.
Data owners: clearly define all roles and responsibilities of all parties involved in the management of
data classification. Describe how each role should classify data and grant access.
Compliance: clearly define which information is subject to compliance regulations, and what measures
must be taken to ensure compliance.
Learn more in our detailed guide to data classification policy
4 Data Classification Best Practices
Here are a few best practices that can help you improve data classification in your organization.
Conduct a Data Risk Assessment

A data risk assessment can help you achieve a comprehensive understanding of all data requirements,
including those related to company policies and compliance regulations. You should also determine
contractual privacy and confidentiality requirements. Define data classification objectives in coordination
with all stakeholders—including IT, security, and legal teams.
Create a Data Inventory

Before you can classify data, you need to locate it using data discovery techniques and tools. Once you have
located all sensitive data, you need to identify and classify it to ensure each type of data is appropriately
protected.
To make the process efficient and accurate, you can label each sensitive data asset. This can significantly
improve your data classification policy enforcement process. You can label data manually or automatically.
Intelligent classification systems can automate this process. For example, a data classification system can
use predefined policies to automatically identify and classify data, and then tag it with the appropriate
classification label. These systems can continuously monitor data, ensuring that it is always classified
properly across the entire data lifecycle.
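As a rough illustration of labeling driven by predefined policies, the Python sketch below tags each asset with the label of the first policy it matches. The policy list, the label names, and the two regular expressions are assumptions chosen for the example, and they are far looser than what a production classifier would need.

```python
import re

# Hypothetical predefined policies: (label, pattern that triggers it).
# Policies are checked in order, most sensitive first.
POLICIES = [
    ("restricted", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),           # US SSN-like pattern
    ("confidential", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),  # email-like pattern
]
DEFAULT_LABEL = "internal"


def label_asset(text: str) -> str:
    """Return the label of the first policy whose pattern matches the text."""
    for label, pattern in POLICIES:
        if pattern.search(text):
            return label
    return DEFAULT_LABEL


def label_inventory(assets: dict[str, str]) -> dict[str, str]:
    """Tag every asset (name -> content) with a classification label."""
    return {name: label_asset(content) for name, content in assets.items()}
```

Running `label_inventory` again whenever an asset changes is one simple way to approximate the continuous monitoring described above.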
Establish Data Security Controls

Each data classification level requires a different level of security. To ensure each level is appropriately
protected, you should establish standard security measures. Then, define policy-based controls for each
classification label.
When defining security measures, you should take into account where each data type resides and the
value this data provides to the organization. You can then assess the risks and implement the appropriate
controls.
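One way to express policy-based controls per classification label is a simple lookup table, sketched below in Python. The `Controls` fields, the role names, and the specific settings per label are hypothetical; real values would come from your own risk assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    encrypt_at_rest: bool
    mask_on_read: bool
    audit_access: bool
    allowed_roles: frozenset


# Hypothetical control sets; real values come from your own risk assessment.
CONTROLS_BY_LABEL = {
    "public": Controls(False, False, False, frozenset({"anyone"})),
    "internal": Controls(False, False, False, frozenset({"employee"})),
    "confidential": Controls(True, False, True, frozenset({"data_owner", "analyst"})),
    "restricted": Controls(True, True, True, frozenset({"data_owner"})),
}


def is_access_allowed(label: str, role: str) -> bool:
    """Check a role against the controls attached to a classification label."""
    controls = CONTROLS_BY_LABEL[label]
    return "anyone" in controls.allowed_roles or role in controls.allowed_roles
```

Keeping the controls keyed by label, rather than by individual data object, is what lets a single policy cover many systems.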
Maintenance and Monitoring
Data is dynamic and requires ongoing monitoring and maintenance. It can be frequently copied, created,
modified, deleted, and moved. Since data may undergo many changes throughout its lifecycle, data
classification can quickly turn into a time-consuming effort.
An important way to reduce data classification efforts is to identify which data really needs to be protected,
and focus efforts there. Automated classification systems are another way to reduce workloads and ensure
fast detection and treatment of newly created sensitive data. Finally, ensure your data classification policies
are flexible enough to deal with changes to data structure, new data types, and growing data volumes.
Learn more in our detailed guide to data classification best practices
Data Classification with Satori
Satori offers continuous data classification and sensitive data discovery that requires no pre-configuration
and works out of the box. Learn more about Data Classification With Satori.
Data Classification Policy: Benefits, Examples, and Techniques
What is Data Classification Policy?
The main goal of a data classification policy is to standardize how a company manages its data assets. A
data classification policy ensures that sensitive information is properly handled throughout its entire
lifecycle by all relevant stakeholders. It can significantly reduce risks associated with data security, privacy,
and compliance.
A data classification policy is unique to each organization and is strongly dependent on industry standards
and regulations that affect the organization. It takes into account how data is collected and structured by
the organization, as well as the authorized parties allowed to access and use the information.
Data classification policies can help ensure that authorized stakeholders have access to the data while
preventing unauthorized access and abuse of privileges. By classifying the data stored in databases,
organizations can ensure that only those who are authorized can view, modify, delete, or add sensitive
information.
A data classification policy is based on the separation of data into several classification levels, according to
the sensitivity of the data. Learn more in our guide to data classification levels.
In this article:
What are the Benefits of a Data Classification Policy?
Examples of Data Classification Policy
Example #1: Healthcare
Example #2: Acquisitions
Data Classification Policy Techniques
Automated Classification Policy
User-Driven Classification Policy
What is the Difference Between Data Classification Policies, Security Policies, and Risk Assessments?
It is important to understand the difference between data classification policies, security policies, and risk
assessments:
Data classification policy: a plan that helps an organization determine risk tolerance across all its data
assets.
Security policy: a plan designed according to the overall security needs of the organization. It includes
security controls determined according to predefined risk tolerance. Data security policies are
dependent on the organization’s data classification policy.
Risk assessment: a technique used to assess the impact of threats on each asset. It helps in
understanding the level of security each asset requires, what safeguards to put in place, and what
countermeasures are required to mitigate risks. Risk assessments can complement data classification policies by
determining what concrete threats affect each category of data asset.
What are the Benefits of a Data Classification Policy?
A data classification policy can help you achieve the following:
Know how much data you are required to protect—and then easily implement security-related
resource allocation.
Gain a better understanding of data across the organization—learn what types of data are located in
each location and determine the security requirements of each data type. Additionally, you can learn
whether your current data protection situation is acceptable, from either a compliance regulation or
company standpoint.
Understand compliance requirements—by defining what types of data require certain levels of
protection.
Improve data visibility and control—properly categorized data can help gain accurate visibility into
data protection, which can help improve protection controls. You can learn if data is well protected,
identify weaknesses, and mitigate existing data security issues.
Examples of Data Classification Policy
Here are two examples of how data classification policies are used in practice by organizations.
Example #1: Healthcare

Healthcare technology companies that store sensitive patient information are required to comply with the
Health Insurance Portability and Accountability Act of 1996 (HIPAA), which defines special requirements for
the protection of protected health information (PHI).
A data classification policy can help organizations quickly provide proof that all personal healthcare
information is properly classified and protected. It details the measures the organization takes and what
security safeguards are applied to healthcare information. It ensures evidence is properly filed and remains
accessible for auditors.
Example #2: Acquisitions

When companies are in the process of being acquired by other entities, they enter into a short window of
due diligence. During this time, the company needs to demonstrate value and viability. This requires
compiling a list of all assets and liabilities. Additionally, the company is assessed for how well it manages
risks.
A data classification policy enables companies undergoing due diligence processes to accurately and
swiftly provide all necessary information. It helps the company show that data protection is treated
seriously and efficiently, and informs relevant stakeholders exactly how data is classified and protected. An
efficient classification system can significantly reduce data risks, minimize liability, and increase the
perceived value of the company, all of which can contribute to a successful acquisition.
Data Classification Policy Techniques
Here are two alternative techniques commonly used to classify data and define an appropriate data
classification policy. In many cases, organizations combine these two methods.

Automated Classification Policy

In this technique, classification is performed by software solutions. The classification relies on algorithms
that analyze phrases or keywords in the content in order to classify it. This approach is useful when specific
types of data are generated without user involvement, for instance, reports created by ERP systems, or
information featuring specific personal details which can be easily identified (such as credit card details or
social security numbers).
Automated solutions are useful for many use cases, but because they cannot appreciate context, they often
result in false positives—data wrongly classified as sensitive, resulting in unnecessary security measures
that can hinder business processes and annoy users. They may also give false negative errors that make
organizations vulnerable to the loss of sensitive information and may result in compliance violations.
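A common way to cut false positives when detecting credit card numbers is to pair a pattern match with the Luhn checksum, so that arbitrary 16-digit identifiers are not flagged. The Python sketch below is one possible version of that idea; the candidate pattern and the function names are illustrative assumptions, not a description of any particular product.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit sequences that look like card numbers and pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum only filters out random digit strings; it cannot tell whether a valid-looking number is used in a sensitive context, so the context limitation described above still applies.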

User-Driven Classification Policy

Data classification is more efficient when the user who is responsible for the data in their day-to-day role is in
charge. The user-driven classification approach gives employees themselves the responsibility to decide
which classification label fits the information they manage, applying a label while data is being edited,
created, saved, or sent.
User-driven classification has several benefits:
Taps into the user’s knowledge of business value, context, and sensitivity of specific data, making data
classification much more accurate
Improves security by reducing false negative classifications
Promotes a culture of data security, and makes it easier to keep track of user behavior
Makes it possible to isolate potential insider threats, and identify policy violations by specific users or
departments, which can be addressed by policy changes
Related content: Read our guide to data classification best practices
Best Practices (part 1)
Importance of Data Classification
Back when we were doing the manual classification project, we did not doubt the importance of data
classification. We fully understood the need for it, and the request made perfect sense. We knew so well how
crucial it is to know what data you have that we were willing to work long and hard to
execute the task.
As such, I think it is important to elaborate on the main reasons why you need to know where sensitive
data is:
Prioritizing Placement of Security Controls

Yes, everything needs to be properly secured, but we also need to be rational about our resources.
Classifying data helps avoid a “peanut butter approach” in which you spread your resources too thin.
Data classification helps determine a starting point and suggests where you should allocate the most
security resources. Based on risk analysis, the greatest need for security tends to be where
sensitive data is located.
Monitoring and Enforcing Access Controls Specific to PII

Similar to the last point, in many cases, it is beneficial to have specific auditing and access controls when
accessing sensitive data. For example, you may apply automatic data masking when sensitive data is being
accessed. Classifying your data allows you to enforce these additional controls on specific data.
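As a minimal sketch of classification-driven masking, the Python functions below mask only the columns that have been classified as sensitive, and only for users without permission to see them. The function names, the choice to keep the last four characters visible, and the `user_can_see_pii` flag are assumptions made for the example.

```python
def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_row(row: dict, sensitive_columns: set, user_can_see_pii: bool) -> dict:
    """Return a copy of the row with classified columns masked for unprivileged users."""
    if user_can_see_pii:
        return dict(row)
    return {
        column: mask_value(str(value)) if column in sensitive_columns else value
        for column, value in row.items()
    }
```

The set of sensitive columns is exactly what the classification step produces; without it, the only options are masking everything or masking nothing.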
Limiting Resource Access to Specific Individuals

When you know where sensitive data is, you gain an increased ability to limit access to those resources. For
example, if you have classified data as sensitive, you will think twice about granting access to this data to
other business units or entities outside of your company. You can also control which data you share,
granting access to certain data while keeping the sensitive data secured.
Data Classification for Compliance

The requirements for compliance vary based on the types of data stored, your industry, and other factors.
For example, access to sensitive data may need to be audited, with the audit records retained for a specific
period of time, or permissions to access the data may need to be controlled. Regardless of the specifics,
compliance requirements around access to sensitive data require knowing where the sensitive data is stored
and how it is being accessed.
Data Classification for Data Protection & Privacy Acts/Regulations

Due to data protection and privacy acts and regulations, there may be limitations on how you use data
based on its sensitivity. These limitations can include honoring requests such as “the right to be
forgotten” for users’ data, or meeting regional privacy requirements. Knowing where different data types
are located helps you scope out such projects and ensure you comply with the regulations.
Data Classification for Contractual Reasons

As with compliance requirements, you are often obligated to treat certain data differently
based on customer commitments. For example, a SaaS company may have an obligation towards
businesses in a specific region not to move their sensitive data out of that region.
Why Is Data Classification Hard?
Going back to the data classification project I performed, as I wrote previously, we had a perfect
understanding of the task’s importance, yet the project was very difficult and time-consuming. After
discussing the issue with a lot of data engineers and data owners, I have summarized the common
hardships surrounding data classification below:
Data Is a Moving Target

Data is often a moving target due to ETL or ELT processes in which data is moved to enrich it, anonymize it,
or apply other transformations to it. These movements can occur within the same platform (such as from
one Snowflake database to another), but they can also be across different public clouds or data platforms,
which can get very complicated to track.
Data Itself Is Changing

Not only is data a frequent flier that moves from one place to another, but it also changes.
You may have a table that does not have any sensitive data in it until someone changes something
somewhere, and, all of a sudden, you are dealing with sensitive PII. For example, I once dealt with a
product table that was not supposed to have any sensitive data in it, but then an application added custom
hidden products that contained the customer name as a custom field.
Data is Spread Across Different Platforms

As if data moving around and changing continuously weren’t challenging enough, one of the hardships
in the project I was running, as well as in other projects, was that the data was not all stored on the same
platform. Some of it may be stored in Parquet files on S3 and retrieved using Athena, some in
AWS Redshift, and some in Postgres.
Classifying Semi-Structured Data Is a Challenge

Semi-structured data (such as data stored in JSON files or in other semi-structured data objects in data
warehouses or data lakes) can add complexity to data classification. It makes it harder to classify and
discover sensitive data, maintain a report on it, and monitor it. For example, a column named event_data in
a Snowflake table may contain different types of semi-structured objects depending on the type of event,
and, in some cases, there is an item with sensitive data. Iterating through the data to discover sensitive data
becomes much more difficult with semi-structured data. For example, an event can look like the following,
with sensitive data in the customer_details.first_name and customer_details.last_name fields:
{
  event_type: "complaint",
  ts: "",
  tech_details: [
    {item_id: "item1", …},
    {item_id: "item2", …}
  ],
  customer_details: {
    first_name: "Ben",
    last_name: "Herzberg"
  }
  …
}
However, the sensitive data can also be in a totally different location (this time in the
matching_results.phone and matching_results.blood_type fields):
{
  event_type: "checkup",
  ts: "",
  tech_details: [
    {item_id: "item1", …},
    {item_id: "item2", …}
  ],
  matching_results: {
    phone: "555-6672",
    blood_type: "AB"
  }
  …
}
This is a relatively simple example, but often, semi-structured data is far more hierarchical, including lists,
and can be much larger in size. In many cases, it is collected without proper knowledge of what it may
contain, which adds to the complexity of performing data classification on semi-structured data.
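To give a sense of what iterating through semi-structured data involves, the Python sketch below walks nested objects and lists and reports the dotted paths of fields that look sensitive. It matches on key names only, using a hypothetical `SENSITIVE_KEYS` set; a real scanner would also need to inspect values. The two sample events are simplified versions of the ones shown above, with the elided parts left out.

```python
import json

# Hypothetical key names treated as sensitive; a real scanner would also inspect values.
SENSITIVE_KEYS = {"first_name", "last_name", "phone", "blood_type"}


def find_sensitive_paths(node, prefix=""):
    """Walk nested dicts and lists, returning dotted paths whose key is sensitive."""
    paths = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if key in SENSITIVE_KEYS:
                paths.append(path)
            paths.extend(find_sensitive_paths(value, path))
    elif isinstance(node, list):
        for item in node:
            # List items share one path ("[]"), so the item index is not recorded.
            paths.extend(find_sensitive_paths(item, f"{prefix}[]"))
    # Remove duplicates (for example from repeated list items) while keeping order.
    return list(dict.fromkeys(paths))


complaint = json.loads(
    '{"event_type": "complaint", "tech_details": [{"item_id": "item1"}],'
    ' "customer_details": {"first_name": "Ben", "last_name": "Herzberg"}}'
)
checkup = json.loads(
    '{"event_type": "checkup", "tech_details": [{"item_id": "item1"}],'
    ' "matching_results": {"phone": "555-6672", "blood_type": "AB"}}'
)
```

Calling `find_sensitive_paths` on the two events returns different paths for the same column, which is why a single column-level label is often not enough for semi-structured data.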
Best Practices (part 2)
How To Perform Data Classification
The following are questions you should ask before starting a data classification project:
What Is the Motivation Behind This Data Classification Project?

In many cases, the reasoning driving data classification is a demand from another team (such as GRC, legal,
privacy, or security). In these situations, it is important to understand the reason for the request as well as
the end goal. Sometimes, the team requesting data classification will be certain that they require a specific
quality or granularity (e.g. they may need all data types in a certain data store or a mapping of columns to
sensitive data parts).
Discussing these requirements with that team can also help you prioritize the data classification project
over other projects and understand its degree of urgency.
What Level of Granularity Is Required?

The required level of granularity is twofold: one is the granularity of the location of the classified
data, and the other is the granularity of the types of data classified. Let’s discuss these
levels of granularity:
Location Level Granularity of Data Classified
The requirement can be granular to a specific data store, database, schema, table, or column. It can be
even more granular to require an understanding of the location of the different data types within
semi-structured data located within a specific column.
Data Types Granularity of Data Classified
The requirement can be boolean, which means specifying the locations where we have sensitive data
versus the locations where we do not have sensitive data. However, in most cases, there is a requirement to
at least define the categories of data classified. For example, these categories can be PII or PHI data. In
many cases, the requirement is to be even more specific and classify the data as specific types such as
phone numbers, names, blood types, patient IDs, or social security numbers.
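The three levels of data type granularity can be pictured as three views of the same findings. The Python sketch below assumes that specific data types have already been detected per column; the `CATEGORY_BY_TYPE` mapping and the `summarize` function are illustrative only.

```python
# Hypothetical mapping from specific data types to broader categories.
CATEGORY_BY_TYPE = {
    "phone_number": "PII",
    "name": "PII",
    "social_security_number": "PII",
    "blood_type": "PHI",
    "patient_id": "PHI",
}


def summarize(column_types: dict[str, list[str]], granularity: str) -> dict:
    """Report classification per column at 'boolean', 'category', or 'type' granularity."""
    report = {}
    for column, types in column_types.items():
        known = [t for t in types if t in CATEGORY_BY_TYPE]
        if granularity == "boolean":
            report[column] = bool(known)
        elif granularity == "category":
            report[column] = sorted({CATEGORY_BY_TYPE[t] for t in known})
        elif granularity == "type":
            report[column] = sorted(known)
        else:
            raise ValueError(f"unknown granularity: {granularity}")
    return report
```

Collecting findings at the most specific level and rolling them up on demand keeps the coarser reports available if the requesting team later asks for less detail.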
How Often Does the Data Change?

Some data stores are relatively static, with constant additions of the same types of data. Some data stores
are continuously changing, often by contributions from many different teams. These changes include new
data being poured in and transformations, which can lead to ongoing shifts in the data types being stored,
processed, and accessed. In these situations, it is important to understand that an ad-hoc data
classification project can become stale very quickly.
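One inexpensive signal that a past classification may be stale is a change in the schema since the last scan. The Python sketch below fingerprints column names and types and compares the result with the fingerprint recorded at scan time. The function names are hypothetical, and the check is only a partial safeguard: it will not notice new sensitive values arriving in existing columns.

```python
import hashlib


def schema_fingerprint(columns: dict[str, str]) -> str:
    """Hash column names and types so any schema change alters the fingerprint."""
    canonical = "|".join(f"{name}:{dtype}" for name, dtype in sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(fingerprint_at_scan: str, current_columns: dict[str, str]) -> bool:
    """Return True when the schema changed after the classification scan."""
    return fingerprint_at_scan != schema_fingerprint(current_columns)
```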
Where Does the Data Come From?

In many cases, data is not produced and then stored in place; rather, it is taken from a different location
and goes through an ETL/ELT process. In some situations, the data classification of the source is already
known, and you can take this knowledge into account when planning a data classification project. If you
can get the inventory or catalog information about the source data, you can prioritize “following the
sensitive data.” However, you still need to keep in mind that sensitive data is often added in unexpected
places or without any conscious decisions being made.
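"Following the sensitive data" can be sketched as propagating known source labels along column-level lineage. The Python example below assumes you already have a mapping from each target column to its source columns; the column names and the `propagate_labels` function are hypothetical.

```python
def propagate_labels(source_labels: dict[str, set], lineage: dict[str, list]) -> dict[str, set]:
    """Give each target column the union of the labels of its source columns."""
    inherited = {}
    for target, sources in lineage.items():
        labels = set()
        for source in sources:
            # Sources with no known classification contribute nothing.
            labels |= source_labels.get(source, set())
        inherited[target] = labels
    return inherited
```

The inherited labels are a starting point for prioritizing scans, not a replacement for them, since sensitive data can still enter downstream tables from places the lineage does not cover.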
How Diverse Is the Data?

It is one thing to handle data that is pretty much consistent and another when the data is inconsistent. The
inconsistency can be in the data platforms (e.g. some of it is stored in S3 buckets and queried with AWS
Redshift Spectrum, some in MS-SQL, and some in Snowflake). Inconsistency in the data can also mean that
the data structures themselves are very different from one another, often due to semi-structured data. The
more diverse the data is, the more difficult a data classification project becomes.
The Data Classification Project
Once you have answered the questions above, you have good background knowledge about the data
classification project and can make an informed decision about the best path to completion. There are
three main paths you can take at this point:
Manual Data Classification

A manual data classification project is performed without any specific tools by accessing the data and
preparing an inventory of the types of data and their locations, depending on the level of granularity
required, as discussed above. This path is taken mainly when the data stack is too complicated or outdated
to run automated classifications or when running automated data classification is not an option for various
other reasons.
If the data is changing, or if it is important for the data classification to remain up to date, a manual data
classification is not a good option. Nevertheless, even though it is often not a very efficient strategy, manual
classification is still quite popular and is often completed by distributing the work across the data owners.
Automated Data Classification Tools

The more streamlined alternative to manual classification is running an automated data classification.
Automated classification is implemented by using data classification tools (or sometimes homebrewed
scripts) which access the stored data (either the files or by sending queries), analyze the data returned, and
suggest a classification for the data. This process should obviously be well-planned, so it does not create
any operational problems when scanning the data (such as data scan costs or performance impact).
Data classification tools use algorithms to identify different data types, and, depending on the
answers you provide to the questions in the section above and on the way they operate, these tools may
require manual validation to mitigate false positives.
Automated data classification is accurate as of the time the data is scanned, but any changes to the data
made after the scan make the results obsolete. It is therefore important to understand the motivation for
the project and how often the data changes.
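As a rough illustration of the scan-and-suggest step described above, here is a minimal Python sketch. It assumes the scan has already pulled a sample of rows into memory; the patterns, the labels, and the 80% match threshold are hypothetical choices made for this example and do not reflect any particular tool.

```python
import re

# Hypothetical, simplified detectors. More specific patterns are listed first
# because a tie between two patterns is resolved in favor of the earlier one.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def classify_column(values, min_match_ratio=0.8):
    """Suggest a data type for a sample of column values, or None."""
    samples = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not samples:
        return None
    best_label, best_ratio = None, 0.0
    for label, pattern in PATTERNS.items():
        matches = sum(1 for s in samples if pattern.fullmatch(s))
        ratio = matches / len(samples)
        if ratio > best_ratio:
            best_label, best_ratio = label, ratio
    # Requiring most samples to match is one way to limit false positives.
    return best_label if best_ratio >= min_match_ratio else None


def classify_table(sampled_rows):
    """Map each column name to a suggested label, given a list of row dicts."""
    columns = {}
    for row in sampled_rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: classify_column(vals) for name, vals in columns.items()}
```

Note that an SSN-shaped value also satisfies the looser phone pattern in this sketch, which is the kind of ambiguity that makes manual validation of the suggestions worthwhile.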
Continuous Data Classification

A continuous data classification process involves scanning data on-the-fly as it is being accessed. This is the
most fitting data classification method for organizations with data that changes rapidly or in any situation
where you would like to keep your data classification information up to date. As long as data is being
accessed, there is no additional overhead spent on scanning the data in this method.
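The idea of classifying data as it is accessed can be sketched as a thin wrapper around query execution. The following Python example is only an illustration under stated assumptions: it uses an in-memory SQLite database, a single simplified email pattern, and a hypothetical `catalog` dictionary, and it is not a description of how any specific product works.

```python
import re
import sqlite3

# Hypothetical, simplified detector used only for this illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def query_and_classify(conn, sql, catalog):
    """Run a query and record which result columns held email-like values.

    `catalog` maps a result column name to a set of labels and is updated in place.
    """
    cursor = conn.execute(sql)
    names = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    for row in rows:
        for name, value in zip(names, row):
            if isinstance(value, str) and EMAIL.fullmatch(value):
                catalog.setdefault(name, set()).add("EMAIL")
    return rows  # the caller receives the same rows it asked for


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER, body TEXT)")
conn.execute("INSERT INTO notes VALUES (1, 'call me')")
catalog = {}
query_and_classify(conn, "SELECT id, body FROM notes", catalog)  # nothing found yet
conn.execute("INSERT INTO notes VALUES (2, 'jane@example.com')")
query_and_classify(conn, "SELECT id, body FROM notes", catalog)  # 'body' is now labeled
```

Because the classification runs on rows that were being read anyway, sensitive values that appear in a column after an earlier scan are picked up the next time that column is queried.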
We, at Satori, chose this method of data classification, as it is the most suitable method for DataSecOps
because it is continuous (and not ad-hoc) and ensures that, even if sensitive data “found its way” to a new
location, it will get discovered. You can always manually override the data classifications performed by
Satori.
Detecting Data Which Is Not Accessed

Continuous data classification focuses on data in use, as it is scanned when being accessed in real time.
However, in some cases, you may want to initiate partial or full data scans in addition to the continuous
scans. The way to do this is straightforward: running a SQL query to query data from the locations you want
to scan. For more information on this method, feel free to contact us.
Data Classification With Satori
Data Classification Types: Criteria, Levels, Methods, and More
What Is Data Classification?
Data classification involves the organization of structured and unstructured data into logical categories.
The goal is to ensure data is used in a more secure and efficient manner. Data classification enables
organizations to easily locate and retrieve their data. It also facilitates better risk management, regulatory
compliance and legal discovery.
Data classification processes apply labels to personal information and sensitive data. Data classification
labels ensure that data can be effectively and accurately searched and tracked. Another key advantage of
data classification is that these processes eliminate duplicate data, reduce storage and backup costs, and
help minimize cyber security risks.
In this article:
3 Data Classification Criteria
Data Sensitivity Levels Used by Businesses
Data Sensitivity Levels Used in Government
Common Data Classification Methods
Paper-Based Classification Policy
3 Data Classification Criteria
Data classification involves assigning metadata to pieces of information according to certain parameters.
Here are three common criteria used for data classification:
1. Content-based classification—assigns tags based on the contents of certain pieces of data. This
scheme reviews the information stored in a database, document or other sources, and then applies
labels that define the data type and a sensitivity level.
2. Context-based classification—uses environmental information, like metadata, to create data
classification labels. For example, this method may automatically classify all documents produced
by a specific application or user as financial information. Additionally, you can use context-based
classification to generate labels based on predefined rules that define data type and the sensitivity
level.
3. User-based classification—a knowledgeable user decides how a certain classification label should
be applied to a specific piece of data. This user can be a specialized classification authority or the
creator of the data. However, this method may cause scalability issues in organizations that generate
large amounts of data.
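The three criteria above can be combined in a single labeling function. The sketch below is a simplified illustration: the card-number pattern, the `billing-app` rule, and the order of precedence (user decision first, then content, then context) are assumptions made for this example, not rules taken from a specific policy.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or dashes.
CARD_LIKE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def content_based(text):
    """Content-based: inspect the data itself."""
    return "cardholder data" if CARD_LIKE.search(text) else None


def context_based(metadata):
    """Context-based: rely on metadata such as the producing application."""
    rules = {"billing-app": "financial information"}  # hypothetical rule table
    return rules.get(metadata.get("application"))


def classify(text, metadata, user_label=None):
    """Prefer an explicit user decision, then content, then context."""
    return user_label or content_based(text) or context_based(metadata)
```

Letting a user-supplied label override the automatic ones mirrors the idea of a knowledgeable classification authority, while the automatic criteria cover the data that no user has reviewed.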
Here are several types of data sensitivity levels:
Data Sensitivity Levels Used by Businesses

Restricted—restrict the use and access of all data classified as highly sensitive. This type of level is often
handled on a “need-to-know” basis. Restricted data may include intellectual property, personally identifiable
information (PII), trade secrets, health information and cardholder data. Disclosure of this data can
have significant financial or legal implications.
Confidential—this data can be used across the organization. However, it must be contained within the
boundaries of business. Confidential data is usually subject to legal restrictions that regulate how the
data must be handled. Confidential data may include pricing, contracts and marketing plans. Disclosure
of this data can negatively affect operations and brand.
Internal—this type of information is made available company-wide but it is still considered internal data
that requires protection, albeit limited. Internal data may include company directories, company-wide
memos, and employee handbooks. Disclosure of this type of data may result in minimal impact on the
organization.
Public—you can share the information openly with the public. This type of data does not require any
security controls when used or stored.
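One way to make these four business levels actionable is to encode them as an ordered type and derive access decisions from it. The following Python sketch is illustrative only: the `is_employee` and `has_need_to_know` flags are hypothetical simplifications of what a real access-control system would evaluate.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Business sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def may_view(level, is_employee=False, has_need_to_know=False):
    """Decide whether a person may view data at the given level."""
    if level is Sensitivity.PUBLIC:
        return True  # can be shared openly
    if level is Sensitivity.RESTRICTED:
        return is_employee and has_need_to_know  # need-to-know basis
    return is_employee  # INTERNAL and CONFIDENTIAL stay inside the organization
```

Using an ordered enumeration also makes it possible to compare levels directly, for example `Sensitivity.RESTRICTED > Sensitivity.INTERNAL`.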
Data Sensitivity Levels Used in Government

Top Secret—information that requires the highest level of access control and protection. It is restricted to
people with a “need to know” clearance. Disclosed top-secret data can threaten national security.
Secret—information that requires a high level of protection. The disclosure of this information can cause
serious damage to national security.
Confidential—applies to the lowest level of classified government data. Confidential data requires less
protection than top-secret or secret data. Disclosed confidential information can cause some harm to
national security.
Sensitive but unclassified (SBU)—includes all information that is not otherwise classified. However, it is
still categorized as sensitive, which means it requires some protection. Disclosed SBU data may violate
the privacy rights of citizens.
Unclassified—applies to data labeled as not sensitive. This data does not require any protection.
Learn more in our detailed guide to data classification levels
Common Data Classification Methods
The following are several ways of addressing data classification using an organization-wide data
classification policy.
Related content: Read our guide to data classification policies
Paper-Based Classification Policy

This policy outlines how employees need to treat the various sorts of data they deal with, in keeping with the
organization’s overall approach to data security and strategy. A well-defined policy lets users make
intuitive and speedy decisions regarding the value of a piece of information and which handling rules apply,
for instance, who may access the information and whether a rights management template should be used. The
difficulty, without backing technology, is making sure that all parties have knowledge of the policy and put
it in place correctly.
Automated classification, by contrast, does not involve the user. It enforces a classification policy, making sure it is consistently
applied over all touchpoints, without major education programmes or communication.
Classifications are put in place by solutions which rely on software algorithms, which use phrases or
keywords from the content to classify and analyze it. This method is very effective where particular sorts of
data are developed without user involvement – such as reports developed by ERP systems, or where the
information includes particular personal information that can be quickly identified, for example credit card
data.
Yet, automated solutions cannot interpret context and are thus prone to inaccuracies. They can produce false
positives that annoy users and hinder business processes, and false negatives that
expose organizations to sensitive information loss.
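Credit card data is a case where a structural check can reduce false positives from plain pattern matching. Payment card numbers carry a Luhn check digit, so a digit sequence that fails the checksum can be discarded. The sketch below assumes a simplified candidate pattern (13 to 19 digits with optional spaces or dashes); it illustrates the idea and is not a complete card detector.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return card-like digit sequences that also pass the Luhn check."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

A checksum filters out many random digit runs, but it does not interpret context: a sequence that passes may still be an order number, so false positives remain possible.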

The data classification process could be entirely automated, yet it is more efficient if the user has control.
A user-driven approach makes employees responsible for choosing the appropriate label, and attaching it via a
software tool at the point of editing, creating, saving or sending. The benefit of including the user in this
exercise is that their understanding of the context, sensitivity and business value of a piece of information lets
them arrive at an accurate and informed decision regarding which label to use. User-driven classification is
an added layer of security often combined with automated classification.
Involving users in classification has other organizational advantages, including better security awareness
and enhanced capacity to monitor user behavior. This makes it easier to report issues and demonstrate
compliance. What’s more, managers can make use of this behavioral data to isolate potential insider
threats. They can attend to any issues by offering more guidance to users where fitting (for instance, via
additional training or fine tuning policy).
Automated Data Classification with Satori
Satori continuously classifies all data being accessed across your databases, data warehouses, and data
lakes. This means that even if new sensitive data is detected, you will know about it, and can even enforce
security policies that will prevent data exposure. To learn more, schedule or view a demo.
Sensitive Data Discovery 101

One of the most precious possessions any person could have is their sensitive personal information. It is
therefore important to keep track of its whereabouts, keep it confidential, and keep it secure.
In this article, we will discuss the following:
Integrity
Availability
Confidentiality
What is Sensitive Data?
Examples of Sensitive Data
Data Discovery
Sensitive Data Discovery
Advantages and Benefits of Sensitive Data Discovery
Importance of Sensitive Data Discovery
Challenges when Detecting Sensitive Data
Ad-hoc Discovery and Continuous Discovery
Before proceeding to what sensitive data and sensitive data discovery are, one must know what data
classification is.
Data classification refers to the methods and techniques used to categorize data. The goal is to
make storing, managing, and securing data easier. Risk management, compliance, and legal discovery are
just some of the tasks that data classification systems support. Moreover, data classification systems help
organizations get more value out of their information assets by improving the usefulness and accessibility
of data.
The three fundamental aspects of information security, namely integrity, confidentiality, and availability, are
also improved by data classification.
Integrity: Data classification enables proper storage availability and access restrictions to avoid data loss,
unlawful modification, or destruction.
Availability: It establishes controls to allow authorized individuals easy access to data.
Confidentiality: It authorizes and implements more stringent sensitive data security measures.
Data classification alerts the company to the sensitivity of its data, both overall and for each new piece of
data, and allows it to apply the appropriate level of security management in that context. It is simpler to
map, track, and handle data when it is classified.
What is Sensitive Data?
Sensitive data is private information that must be securely encrypted and kept out of the hands of anyone who
does not have the authorization to see it. Data security and information security measures should be in
place to limit access to sensitive data and prevent data leaks and intrusions.
Sensitive data remains sensitive whether it is the original or a duplicate. Below is a list of examples of
sensitive data.
Personal Data: This includes sensitive data that reveals ethnic or racial origins, genetic data, biometric
data, financial information, and health data.
Card Holder Data: Payment card information such as account data and card numbers. Organizations that handle
cards from the card schemes should know how to manage and secure this data.
Education Records: Educational information and records about students, including records that potential
employers, publicly financed schools and universities, and foreign governments may seek to access.
Protected Health Information: Any data about a person’s medical status, health care service, or health
care payment developed or collected by a covered entity or a third-party affiliate that you may link to
the person.
Customer Information: Personal information that financial institutions hold about their customers. These
institutions must disclose how they share and protect it.
Data Discovery
The method of locating specific subsets of data from unstructured and structured data sources is known as
data discovery. It is critical to determine what data gets stored in company repositories and where it is
stored.
The method of categorizing different types of data depending on its sensitivity and vulnerability is called
data classification. It goes hand in hand with data discovery. Sensitive data discovery and classification are
separate processes, and both are required for identifying and protecting business-critical data.
Sensitive Data Discovery
Sensitive data has always been in danger of being hacked, exposed, and exploited. When the sensitive personal
data a business holds is compromised, the results can be disastrous. This is why it is crucial to
understand where your personal information is kept. A sensitive data discovery and classification tool
aids in the discovery of sensitive data, its ownership, and the data regulations that are being
breached by storing sensitive data in insecure areas.
Advantages and Benefits of Sensitive Data Discovery

Every firm should treat sensitive data discovery as a critical data security activity. Not
only does it ensure trust and security, but it also comes with a number of benefits:
It identifies every occurrence of sensitive data in a company’s data store.
It makes data classification easier.
It monitors sensitive data that has been disclosed or may be exposed due to a security breach.
It creates the foundation for the development of a comprehensive data management system.
It facilitates the completion of data access requests.
Importance of Sensitive Data Discovery

The foundation of a successful business is dynamism, and data discovery is a key component of that
adaptability. Sensitive data discovery provides corporate executives and their teams with a
behind-the-scenes look at their processes, allowing them to better identify and manage any issues that
may arise.
As more firms see their data as an asset, sensitive data discovery is becoming more common. Businesses
may use the data they acquire about their consumers and operations to set themselves apart from their
competition. Furthermore, sensitive data discovery enables them to turn this insight into a competitive
advantage through product development, improved customer engagement, or increased productivity.
Challenges when Detecting Sensitive Data
Just like any other security measure, Sensitive Data Discovery also has its challenges.
Goals are not Set from the Beginning

The aim is to collect more data to inform decision-making, but the decisions that most need that
input are often not considered early enough. As a result, one may get results that are not worth the time
spent analyzing the data.
Sensitive Data Discovery is Client-Driven and is an Iterative Process

Using tools that are not well suited to business professionals is a typical error in data discovery. Traditional
tools in this field may confuse the user with a plethora of unrelated graphs and charts. Data fusion and
unification skills across numerous internal and external company data sources are critical components of
a successful data discovery strategy.
Sensitive Data Discovery and Classification Should not be Separate

Users will not be able to improve their data security and compliance status by simply locating and categorizing
the data. When organizations use discovery and classification in conjunction with other data security procedures, they will
realize significant value.
On the other hand, a network-based method allows businesses to find all known and undiscovered personal
data storage and processing. It also provides a comprehensive, frequently updated perspective of the
undiscovered uses and categories of private data.
Sensitive data discovery and classification are useful and vital, but they should not be done in isolation.
The true value appears when this functionality works hand in hand with permissions analysis, user and entity
behavior analytics, and change auditing.
Ad-hoc Discovery and Continuous Discovery
As data grows in quantity, so does its importance in commercial decision-making. However, for businesses to
fully realize the value of data at any given time, it must be freely available, accurate, and current.
Decision-makers will only be able to completely trust reports and analyses if they intuitively comprehend the
story that their data is telling.
Ad-hoc Discovery
Ad-hoc reporting is a business intelligence technique for swiftly generating reports on demand. Ad hoc
reports are typically produced on a one-time basis to answer a specific business challenge. Ad hoc analysis
goes a step further, elaborating on a report’s objective facts to derive new insights. Ad hoc analysis allows
business teams to connect not just what happened but why it happened as well.
Decision-makers need answers to important questions as soon as possible in today’s fast-paced business
world. However, when time is tight, employees cannot always rely on their regular, static reports to provide
business answers. Reports and analyses that take several days to arrive are frequently too late. Ad hoc reporting
and analysis is critical because it allows organizations to obtain answers to specific questions as
quickly as they are raised, speeding up the decision-making process.
In some companies, a separate team of product researchers and marketers conducts the discovery, which is then passed on to
delivery teams to construct what needs to be built. Other companies give employees more authority, and
product teams are in charge of both discovery and delivery. Some groups begin with a discovery phase,
followed by a continuous delivery phase, while others execute continuous discovery and continuous delivery
simultaneously.
Continuous Discovery
Continuous discovery refers to discovering a plan, a product, new features to develop, changing market
needs, or economic expansion required to accelerate growth. Continuous Discovery is a technique that
assists product teams in improving and polishing their ideas based on the demands of their customers to
enhance the product’s value. It involves doing modest research activities with clients regularly to get the
desired product outcome.
Sensitive Data Discovery with Satori
Using Satori, data is continuously classified as it’s being accessed. This means that even if new sensitive
information is added to your data stores, it will quickly be mapped. You can also integrate your sensitive
data locations with your data catalogs, as well as set security policies to apply automatically on your
sensitive data. Read here about how we do it.
In addition, with Satori, you can also set custom business-specific sensitive data to be continuously
discovered.
Conclusion
Protecting personal data is a top priority, considering that user traffic continues to rise. It is
important to take security measures seriously to avoid breaches and compromise. To keep the
public’s trust, every establishment should have security plans for the data it acquires. Confidentiality is a top
priority and should not be taken lightly.
Timely data gathering is also essential. Research is important for making fact-based decisions that
lead to more effective outcomes. Ad-hoc discovery and continuous discovery make it possible to obtain
sufficient and accurate data when time is limited.
Data Classification Framework: What, Why and How
What Is a Data Classification Framework?
The data classification process consists of content identification, categorization, and protection according
to sensitivity or impact levels. Data classification aims to protect data from unauthorized modification,
destruction, or disclosure.
A data classification framework is a formal policy typically executed enterprise-wide. It often consists of
three to five classification levels, which include three elements—name, description, and real-world
examples.
Ideally, you should use a maximum of five top-level parent labels, each with its own five sub-labels—25
in total. This limitation can help keep your user interface manageable.
Each data classification level is associated with certain controls. By themselves, levels are simply labels
(tags) that indicate the sensitivity level or value of the content. Data classification frameworks control
this content by defining controls for each level.
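The structure described above (three to five levels, each with a name, description, examples, and associated controls, and at most five sub-labels per level) can be captured in a small data model with a validation step. This is a minimal sketch: the `Level` class, its field names, and the validation messages are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One classification level in a hypothetical framework definition."""
    name: str
    description: str
    examples: list
    controls: list = field(default_factory=list)
    sub_labels: list = field(default_factory=list)


def validate_framework(levels, max_levels=5, max_sub_labels=5):
    """Return a list of problems; an empty list means the framework passes."""
    problems = []
    if not 3 <= len(levels) <= max_levels:
        problems.append(f"expected 3 to {max_levels} levels, found {len(levels)}")
    for level in levels:
        if not (level.name and level.description and level.examples):
            problems.append(f"{level.name or '<unnamed>'}: needs a name, description, and examples")
        if len(level.sub_labels) > max_sub_labels:
            problems.append(f"{level.name}: more than {max_sub_labels} sub-labels")
        if not level.controls:
            problems.append(f"{level.name}: no controls defined")
    return problems
```

Checking that every level has at least one control reflects the point that a label on its own only indicates sensitivity, and that it is the attached controls that protect the content.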
Related content: Read our guide to data classification levels
In this article:
What Information Should a Data Classification Framework Include?
Data Classification Matrix
Data Classification Framework Best Practices
Implement Data Classification Gradually
Write Framework Documentation for All Stakeholders
Minimal Number of Data Classification Levels
Balance Security Against Convenience
What Information Should a Data Classification Framework Include?
It is common to include the following information as part of a data classification framework:

Goal—why an organization wants to classify data and the benefits it brings.
Scope—the types of data that need to be classified, where the data is stored, and who in the organization
will perform the classification and use it.
Responsibilities—specifies which individuals are responsible for which tasks in the data classification
workflow.
Procedures—step-by-step processes for accessing, evaluating, and classifying data, taking into account
confidentiality, troubleshooting, and other important issues.
Impact level—mapping out data in the organization and its impact on business processes and compliance
requirements. This can help understand the criticality of data classification for each dataset.
Visual data classification guide—a visual chart showing types of data assets, brief description of these
assets, level of impact, and applicable data classification labels.
Glossary—a definition of terms used in the data classification framework, which should be clear to
everyone in the organization.
Data Classification Matrix
The data classification matrix allows you to evaluate various security grades. You can add the matrix to
your security specs, keeping all data classification information in one place along with any additional
information. You can use various templates for your data classification matrix.
Here is an example of a simple template that describes a data classification framework with three
security groups ranging from low to high risk:
Related content: Read our guide to data classification examples (coming soon)
Data Classification Framework Best Practices
Here are several practices that can help you create and refine your data classification framework.
Related content: For additional, general guidelines on improving data classification read our guide
to data classification best practices
Implement Data Classification Gradually

Start by prioritizing any feature critical to your organization and then map these features against a specific
timeline. When executing your plan, start by completing the first step. Once you ensure the success of step
1, you can move forward while applying any lessons learned. While creating your data classification, your
organization can remain exposed to risk. Starting small with a few classification levels and expanding later
on can help you manage this risk.
Write Framework Documentation for All Stakeholders

A data classification framework serves a broad audience, including all staff members, legal and compliance
teams, and IT teams. Write the framework clearly and concisely to help all stakeholders understand the
framework. You should also provide real-world examples when possible. Use clear definitions for data
classification levels, avoid jargon, and include a glossary for highly technical terms and acronyms.
Minimal Number of Data Classification Levels

The number of data classification levels per framework is typically between three and five. However,
that does not mean you should use the maximum number. Here are several aspects to consider when
determining the number of required classification levels:
Industry standards and any relevant regulatory obligations—highly regulated industries often require
more classification levels than other industries.
Operational overhead—complex frameworks typically incur high expenses.
Implementation complexities—users are required to implement and uphold the data classification
framework. A complex framework may not allow for proper implementation.
User experience and accessibility—when applying manual classification across different device types,
consider whether the framework allows for positive user experience and accessibility.
Balance Security Against Convenience

A secure but overly restrictive framework can be difficult to implement. Consider your users and whether
they can follow rigid, complex, and time-consuming procedures when applying the framework during
normal operations.
If users do not believe in the value of the framework, they will not follow the outlined procedures. This issue
can occur at all organizational levels, including executive-level (C-suite) management. You should balance
security against convenience and offer user-friendly tools to ensure users of all skill sets adopt and use the
framework.
As part of the DataSecOps platform provided by Satori, data is continuously classified as it’s
being accessed. This means that any new data being accessed is immediately scanned and sensitive
data is discovered in it. This helps you audit all access to sensitive data, set security policies in a simplified
way, and enable faster access to sensitive data.
To learn more refer to our product page, or schedule a demo.
Data Classification Examples: Data Types and Policies
What Is Data Classification?
Data classification is the process of organizing structured and unstructured data into predefined
categories that represent different types of data.
Data classification helps you understand the type and location of organizational data. This enables risk
management, compliance and legal discovery, and lets you apply appropriate security measures to data
according to its sensitivity. Data classification also improves user productivity and decision-making.
Another important impact of data classification is cost reduction—classifying data reduces storage
costs by identifying duplicate data that can be deleted, or moving low-importance or infrequently
accessed data to lower cost storage tiers.
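Identifying duplicate data, as mentioned above, is commonly done by comparing content hashes rather than the contents themselves. The sketch below assumes the objects are small enough to hold in memory and are passed in as a hypothetical name-to-bytes mapping; a real data store would more likely be read in a streaming fashion.

```python
import hashlib
from collections import defaultdict


def find_duplicates(objects):
    """Group object names by content hash; return groups with more than one name.

    `objects` maps a name to its content as bytes.
    """
    by_digest = defaultdict(list)
    for name, content in objects.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)
```

Each returned group lists objects with identical content, so all but one copy in a group are candidates for deletion or for a move to a cheaper storage tier.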
In this article:
Data Classification Examples by Type of Data
Public Data
Private Data
Internal Data
Confidential Data
Restricted Data
Data Classification Policy Examples
Data Classification Examples by Type of Data
Public Data
Public data classification means that when information is stored or used, it can be published and shared
without security controls.
Common examples of public data include: first and last names, company names, dates of birth, job
descriptions, the content of press releases, and license plate numbers.
Private Data
Private data is not intended for the public, but does not require high security. Nevertheless, it is prudent to
protect private data from public access to protect its integrity, and prevent malicious parties from making
use of it in combination with other data. Sharing, destroying or modifying private data carries some risk to
the organization or individual.
Common examples of private data include: personal contact information such as phone numbers, text
from messaging applications like Slack or WhatsApp, employee ID numbers, research data, recordings of
non-sensitive conversations.
Internal Data
Internal data is information used internally by an organization, which requires some protection. Unintended
exposure of this data can have a detrimental effect on a company.
Examples of internal data include: company catalogs, employee handbooks, business plans, a corporate
intranet, email messages, URLs and IPs of internal systems.
Confidential Data
Confidential data requires protection to ensure it remains within the organization. There may be legal
restrictions for handling this data and disclosure could result in legal or financial penalties and harm
business operations and reputation.
Examples of confidential data include: company data such as contracts or marketing plans and sensitive
personal information such as ID card and Social Security numbers, credit card information (i.e., account
data, card numbers, PINs), medical records and insurance provider information, biometric identifiers,
financial records, and employee certification license numbers.
Restricted Data
Restricted data is highly sensitive information that requires strict controls to ensure need-to-know access.
Exposure of this data both within and outside of the organization could result in significant legal or financial
consequences to the organization.
Examples of restricted data include: information covered by a confidentiality agreement, intellectual
property (IP) and trade secrets, personally identifiable information (PII), protected health information (PHI),
tax-related data, and cardholder data.
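The five data types above can be treated as an ordered scale, so that a record takes the strictest level among its fields. In the Python sketch below, the ordering follows the sequence in which the types are presented, the field names are hypothetical, and treating unknown fields as internal is an assumption made for this example.

```python
# Levels ordered from least to most protected, following the order used above.
LEVELS = ["public", "private", "internal", "confidential", "restricted"]

# Example assignments based on the lists above; field names are hypothetical.
FIELD_LEVELS = {
    "company_name": "public",
    "phone_number": "private",
    "employee_handbook": "internal",
    "social_security_number": "confidential",
    "trade_secret": "restricted",
}


def record_level(field_names, default="internal"):
    """Return the strictest level among a record's fields."""
    levels = [FIELD_LEVELS.get(name, default) for name in field_names]
    return max(levels, key=LEVELS.index, default="public")
```

For example, a record holding a company name and a Social Security number would be handled as confidential, because its most sensitive field determines how the whole record is treated.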
Related content: Read our guide to data classification types
Data Classification Policy Examples
Organizations use data classification policies to organize their stored data according to sensitivity levels.
These policies provide a comprehensive plan to ensure the correct handling of data and minimize
risk. They identify sensitive data and establish a framework for protecting it, including the rules,
procedures, and processes required for each category.
Organizations must identify the various types of data they hold, determine the value of all information,
evaluate the risks associated with the data, and establish guidelines for handling each type of data to
reduce and mitigate threats. They can then ensure the appropriate level of protection for each data
class. Data classification policies also help organizations avoid wasting resources to protect
non-sensitive data that doesn’t carry significant risk.
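As a rough illustration of how a policy can attach rules to each category, the sketch below maps the four
levels used in this guide to handling rules. The rule fields (`encrypt_at_rest`, `external_sharing`,
`access`) and their values are hypothetical assumptions chosen for the example, not requirements taken
from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingRule:
    """Hypothetical handling rules for one data class."""
    encrypt_at_rest: bool
    external_sharing: bool
    access: str  # who may read the data


# Illustrative values only; a real policy comes from your own risk assessment.
POLICY = {
    "public": HandlingRule(False, True, "anyone"),
    "internal": HandlingRule(False, False, "all employees"),
    "confidential": HandlingRule(True, False, "assigned teams"),
    "restricted": HandlingRule(True, False, "need-to-know"),
}


def handling_for(data_class: str) -> HandlingRule:
    """Look up the rules for a class, failing loudly on unknown labels."""
    try:
        return POLICY[data_class.lower()]
    except KeyError:
        raise ValueError(f"unknown data class: {data_class!r}") from None
```

Rejecting unknown labels, rather than silently falling back to a default, is one way to make sure
unclassified data is noticed instead of being handled under the wrong rules.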
Related content: Read our guide to data classification policy
Here are two examples of companies benefiting from a data classification policy:
Example 1: Company acquisition
When a large enterprise acquires a smaller company, the smaller company enters a short due diligence
period in which it must demonstrate its value and viability. The company under review must list all its
assets and liabilities. The larger company can then assess how the company it is acquiring manages risk.
A clear data classification policy ensures that employees can easily access all the information they need and
understand how data is classified and stored. An efficient data classification system makes it easier to
locate important data and helps reduce risks and liability, increasing the company’s value and enabling a
smooth acquisition.
Example 2: Healthcare company
If a company holds confidential patient data, it must comply with HIPAA security standards. Regulatory
authorities may request evidence of compliance and assess the company’s data protection processes.
A data classification policy enables the company to demonstrate how it classifies personal patient
information (i.e., as sensitive) and provides the highest level of security for this data. Staff file all evidence
according to the classification policy, making it easily accessible for regulators and auditors. Authorities can
view this evidence as proof that the company takes data security seriously, protecting the company from the
reputational damage and legal or financial penalties resulting from non-compliance with HIPAA.
Satori offers continuous data classification and sensitive data discovery that requires no
pre-configuration and works out of the box. Learn more about Data Classification With Satori.
What are the Data Classification Levels?
Data is often called the lifeblood of a business, but not all data is the same, so it shouldn’t all be
treated the same way. Securing data requires several layers of protection in order to prevent data
breaches and leaks.
One way to do so is data classification, which is a core requirement of several compliance standards.
Data classification also helps companies ensure that their critical and valuable data is
protected from security risks and compliance issues. This article covers:
What is the Purpose of Data Classification?
The Four Levels of Data Classification
Are the Levels of Data Classification Still Relevant?
Different Types of Data Classification
This is part of our extensive data classification guide.
Data classification is a focal point of compliance requirements and standards. It involves identifying,
categorizing, and maintaining data so that appropriate security controls can be applied and legal
risk reduced. In turn, this helps organizations allocate resources effectively.
Data classification hinges on knowing what data your organization collects, processes, and uses for its
operations, as well as the level of security that needs to be applied to each type of data. You therefore
classify each type of data in order to achieve compliance and reduce exposure to cyberattacks.
What is the Purpose of Data Classification?
Data classification is integral not just for meeting compliance requirements, but also for
implementing stronger security measures that protect companies from cyberattacks and other
threats. It also helps businesses perform a risk assessment of their operations. Once you understand
how your organization stores and processes data, you can implement data security
controls that reduce those risks.
When a risk assessment is conducted within the organization, it is crucial to locate sensitive
data in order to detect any threats or loopholes that might lead to a data breach. Classification can also be
cost-effective, since it lets companies allocate data security resources where they are most needed. Moreover,
it helps them comply with data privacy standards and contain any data breaches that do take place.
The Four Levels of Data Classification
There are various levels of data classification in an organization. Generally, government agencies have more
classification levels, namely top secret, secret, confidential, sensitive, and unclassified. However, these don’t
apply to other organizations, which is why they usually employ the following four classification levels.
Public
The first data classification level is known as public, and it involves public data that can be openly used and
shared on the company website, as well as with the general public. Public information can be used without
any additional controls and security protocols, and it can be discussed openly as well.
For instance, it could include a datasheet about the company’s products and services or other promotional
content.
Internal
The second level is internal information, which can be shared across the organization.
Although this information is not sensitive, it should not be shared externally.
Examples include the employee handbook and company memos which, even if disclosed to the general
public, would cause the company little or no harm.
Confidential
As the name suggests, confidential information is more sensitive. Access to it is strictly controlled and
limited to particular teams, and the data is for use within the business only.
Examples of confidential information include pricing policies, employee reviews, vendor contracts, and
other sensitive data. If this type of information is disclosed or leaked, it can have a negative impact on the
business or the brand.
Restricted
Finally, restricted data is a notch above confidential information, and access to it is even more tightly
controlled. It is shared on a need-to-know basis only and is protected through a Non-Disclosure
Agreement (NDA) to minimize legal risk and ensure compliance.
Examples of restricted information include trade secrets, personally identifiable information, credit card
information, financial data, and health information. If this type of information is revealed, it can cause
massive legal and financial damage to the organization.
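Because the four levels form a ladder from least to most sensitive, they can be modelled as an ordered
type. The following is a minimal sketch, assuming a simplified model in which a user holds a single
clearance level; real need-to-know access for restricted data is usually decided per person and per
dataset, not by rank alone. The names `Level` and `can_access` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels from this section, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def can_access(clearance: Level, data_level: Level) -> bool:
    """Simplified rule: a user may read data at or below their clearance."""
    return clearance >= data_level
```

For example, `can_access(Level.INTERNAL, Level.CONFIDENTIAL)` evaluates to `False`, while a user with
`Level.CONFIDENTIAL` clearance can read internal and public data.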
Are the Levels of Data Classification Still Relevant?
Data classification levels are critical for organizations to maintain the confidentiality, privacy,
and integrity of the data that is key to their operations. They also help mitigate the risk of sensitive
information being compromised.
If data classification levels aren’t maintained and enforced within an organization, the results can be
catastrophic. It is therefore still important for companies to categorize their data according to
the different classification levels, to maintain compliance and minimize the risks that lead to security
issues and data breaches. So the levels are certainly relevant.
However, many organizations classify data without using classification levels. As long as
that does not conflict with compliance requirements, and the result is clear access policies for
sensitive data, that is perfectly acceptable.
Different Types of Data Classification
Data classification is usually conducted according to each business’ requirements, but there are a few
common types of data classification, which are as follows:
Data-Based Classification – This type of classification describes the nature of the data, e.g. an
email address, phone number, or credit card number.
Context-Based Classification – This type of classification describes the business context of
the data, and it generally involves more sensitive data, such as the company’s revenue or earnings data.
Source-Based Classification – This type of classification describes the source of the data.
This can include data collected from customers through several sources, e.g. a webinar or a contact form.
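Data-based classification, as described in the list above, is often approximated with pattern matching.
The sketch below is one illustrative way to do it, not a production detector: the regular expressions are
deliberately simplified assumptions (US-style phone numbers, 16-digit card numbers, no checksum
validation), and the names `PATTERNS` and `detect_types` are hypothetical.

```python
import re

# Simplified, illustrative patterns; production detectors need validation
# (e.g. checksum tests for card numbers) and locale-aware phone formats.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}


def detect_types(text):
    """Return the names of all patterns that match somewhere in the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}
```

A call such as `detect_types("Contact jane@example.com or 555-123-4567")` reports both an email address
and a phone number, which could then be used to label the field or document that contains the text.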
There are several other types of data classification that are relative to individual businesses and their
requirements. Plus, various compliance standards and regulations require companies to classify their data
efficiently, although the requirements might vary from standard to standard.
By following the data classification methods and sticking to the levels, companies can improve their
compliance with, and reporting under, local and global regulations, and can manage
data access and authorization more effectively.
Summary
This concludes our guide on the data classification levels, and whether they are still relevant or not. If you
are looking to establish a data classification policy, you can start by conducting a data risk assessment and
follow it up with a data inventory, which can help you in setting up stricter data security controls.
To learn more about how you can use Satori to improve your data governance for data-driven organizations,
go here. Read here about our core capabilities:
Fine-Grained Access Control
Dynamic Data Masking
Decentralized Data Access Workflows
Data Access Auditing & Monitoring
Continuous Data Discovery & Classification
Data Classification Examples and its Importance
In today’s world, virtually every organization holds valuable data that needs to be
secured. This data can take the form of customers’ or clients’ personal information, transaction receipts,
financial details, or other sensitive types of data.
To protect this data, there is a strong need for data classification, which is often considered the first
tier in protecting individual or collective information and reducing the risk of data breaches.
By understanding data classification, you can develop professional workflows and processes
that can be used in any industry to protect data. In this article, we will discuss:
Data Classification Definition
Why is Data Classification Important?
When is Data Classification Required?
Examples of Data Classification Tools
This is part of our comprehensive data classification guide.
Data Classification Definition
Data classification refers to the method of assigning a category to data, depending on how sensitive it is. It
is necessary for determining the type of security controls that need to be implemented for particular data
based on its classification.
Data classification work is typically handled by data management professionals, such as data scientists
or data managers.
For example, such a professional might review all of an organization’s files and digital transactions,
classify the data into categories, and design and implement parameters to safeguard each
category.
Data classification is integral for several industries, and it applies to different organizations and roles as well.
To determine how data classification can be applied to your company, you need to consider several factors,
which include the following:
The nature and type of information collected from customers, clients, vendors, or other business entities
The information or data created by your company, such as files, spreadsheets, receipts, customer
profiles, etc.
The security or sensitivity level of the data
The people that need to access your data
The frequency of data access
The digital records maintained by your company
The duration for which each category is documented
Why is Data Classification Important?
Let’s face it: without data security, businesses can’t thrive. Data classification is crucial because
it helps you organize data to keep it secure, reducing the likelihood of data breaches, cyberattacks,
and hacking attempts. It acts as a kind of firewall for businesses, especially now that they rely on
digital platforms like email, cloud computing, online payment, and several others.
If your business data gets compromised or leaked, the impact might be low, moderate, or high. It is
therefore important to determine the level of risk and to implement protocols and measures that
protect the data. Data classification serves this purpose.
Basically, data classification can be used to:
Protect the integrity and confidentiality of the data
Prevent personally identifiable information and business information from getting leaked
Comply with data privacy regulations and laws
Determine who gets data access, as well as the frequency and method
Establish the duration of record-keeping, as well as the security measures needed to protect the records
Maintain client trust
Establish a culture of data security
Preserve the company and brand reputation
Save time and money by placing targeted controls on integral data
When is Data Classification Required?
As mentioned above, data classification is necessary for every organization, and it is required when you
have a large amount of sensitive data in your organization, as well as an influx of valuable data over time. If
you don’t implement data classification protocols at the right time, you might risk having your data
exposed to cyberattacks and data breaches.
When data classification is required depends largely on the sensitivity of the data that you are trying to
protect. For instance, if your company’s data consists of low-sensitivity information like public website
content, press releases, and marketing materials, you are at a lower risk of having your data
compromised.
On the other hand, if your company information contains medium-sensitivity data, e.g. supplier contracts, IT
service management information, and organizational correspondence, you will
need to implement data classification protocols as soon as possible.
Lastly, if you have high-sensitivity data like credit card information, customer personal data, privileged
information, and social security numbers, then it becomes crucial to
classify the data and implement security protocols immediately.
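The three tiers above can be turned into a quick triage check. The sketch below is only an illustration
under stated assumptions: the category keys are hypothetical labels for some of the examples in this
section, and treating unknown categories as high-sensitivity is a conservative choice made for the
example, not a rule from the text.

```python
# Category-to-tier mapping based on the examples in this section;
# the category keys themselves are hypothetical labels.
SENSITIVITY = {
    "press_release": "low",
    "marketing_material": "low",
    "supplier_contract": "medium",
    "organizational_correspondence": "medium",
    "credit_card_information": "high",
    "social_security_number": "high",
}

RANK = {"low": 0, "medium": 1, "high": 2}


def overall_sensitivity(categories):
    """Return the highest tier among the categories an organization holds.

    Unknown categories are treated as "high" so they get reviewed
    rather than silently ignored (a conservative assumption).
    """
    tiers = [SENSITIVITY.get(c, "high") for c in categories]
    return max(tiers, key=RANK.__getitem__, default="low")
```

An organization holding only press releases and supplier contracts would come out as "medium" here,
which under the guidance above already calls for classification protocols as soon as possible.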
Examples of Data Classification Tools
There are various types of data classification tools that companies can use to keep their data
security in check and prevent their valuable data from being compromised. These tools can
be divided according to the types of data they are used on, namely unstructured data,
structured/semi-structured data, and continuous data.
Unstructured Data
Unstructured data refers to data that has no predetermined data model or pattern. It is sometimes also
called qualitative or unorganized data, and it isn’t easily searchable with conventional tools.
In simpler terms, unstructured data is typically created by individuals rather than systems.
Some examples of unstructured data include audio files, text, presentations, social media data, videos,
mobile usage data, etc. It can also include source code, binary code, documents, and many others. Because
this type of data can’t be analyzed directly, it requires additional processing: data classification tools use
machine learning algorithms to classify it based on sensitivity, risk, availability, duplication, and
usefulness.
Structured or Semi-Structured Data
Compared to unstructured data, structured data can be readily processed and analyzed, and it can
also be indexed.
Examples of structured data include spreadsheets and database objects. Performing
data classification on structured data is much easier and less complex than classifying
unstructured data.
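One reason structured data is easier to classify is that a schema gives you column names and uniform
values to inspect. The following is a minimal sketch of column-level classification, assuming rows arrive
as dictionaries (for example from `csv.DictReader`). The detectors, the 0.8 threshold, and the function
name `classify_columns` are hypothetical choices for illustration.

```python
import re

# Hypothetical detectors: a column-name hint and a value pattern per label.
DETECTORS = {
    "email": (re.compile(r"e-?mail", re.I),
              re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    "ssn": (re.compile(r"ssn|social", re.I),
            re.compile(r"^\d{3}-\d{2}-\d{4}$")),
}


def classify_columns(rows, threshold=0.8):
    """Label each column whose name or values match a detector.

    rows: list of dicts sharing the same keys.
    threshold: fraction of non-empty values that must match the pattern.
    """
    labels = {}
    if not rows:
        return labels
    for column in rows[0]:
        values = [str(r[column]) for r in rows if r.get(column) not in (None, "")]
        for label, (name_hint, value_pattern) in DETECTORS.items():
            hits = sum(1 for v in values if value_pattern.match(v))
            if name_hint.search(column) or (values and hits / len(values) >= threshold):
                labels[column] = label
                break
    return labels
```

Checking both the column name and a sample of values means a column called `contact` that is full of
email addresses is still labeled, even though its name gives no hint.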
Continuous Data
Continuous data refers to numeric information that can take fractional values. This may include
a person’s height, the length or width of an object, and similar types of data. Basically, it represents
data that can be broken down into ever smaller units, which means a continuous variable can take any
value within its range.
Summary
This concludes our guide on data classification examples and how data classification tools help you protect
all types of sensitive company and customer data. Data privacy is a pressing need, especially given the
many data breaches of recent years. This is why organizations should protect the integrity
and availability of their data.
Satori, The DataSecOps platform, provides continuous sensitive data classification, to make sure you’re
always on top of your sensitive data, whether it’s in databases, data warehouses, or data lakes. Among the
other key capabilities of Satori are:
Fine-Grained Access Control
Dynamic Data Masking
Decentralized Data Access Workflows
Data Access Auditing & Monitoring
To learn more about Satori, go here.
For more information go to:
www.satoricyber.com
