What is the process of organizing data into categories or groups for its most effective and efficient use?

Learn about the different types of classification and how to effectively classify your data in Data Protection 101, our series on the fundamentals of data security.

Data classification is broadly defined as the process of organizing data by relevant categories so that it may be used and protected more efficiently. On a basic level, the classification process makes data easier to locate and retrieve. Data classification is of particular importance when it comes to risk management, compliance, and data security.

Data classification involves tagging data to make it easily searchable and trackable. It also helps identify and eliminate duplicate copies of data, which can reduce storage and backup costs while speeding up searches. Though the classification process may sound highly technical, it is a topic that your organization’s leadership should understand.

Reasons for Data Classification

Data classification has improved significantly over time. Today, the technology is used for a variety of purposes, often in support of data security initiatives. Data may be classified for a number of reasons, including easing access, maintaining regulatory compliance, and meeting various other business or personal objectives. In some cases, data classification is a regulatory requirement, as data must be searchable and retrievable within specified timeframes. For the purposes of data security, data classification is a useful tactic that facilitates proper security responses based on the type of data being retrieved, transmitted, or copied.

Types of Data Classification

Data classification often involves a multitude of tags and labels that define the type of data, its confidentiality, and its integrity. Availability may also be taken into consideration in data classification processes. Data’s level of sensitivity is often classified based on varying levels of importance or confidentiality, which then correlates to the security measures put in place to protect each classification level.

There are three main types of data classification that are considered industry standards:

  • Content-based classification inspects and interprets files looking for sensitive information
  • Context-based classification looks at application, location, or creator among other variables as indirect indicators of sensitive information
  • User-based classification depends on a manual, end-user selection of each document. User-based classification relies on user knowledge and discretion at creation, edit, review, or dissemination to flag sensitive documents.

Content-, context-, and user-based approaches can each be right or wrong depending on the business need and data type.
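
The three approaches can be contrasted in a minimal Python sketch. Everything here is an assumption made for illustration: the `Document` type, the two regular expressions, and the `/hr/` and `/finance/` locations are hypothetical, and real detectors need far more careful validation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

# Hypothetical locations treated as indirect indicators of sensitivity.
SENSITIVE_LOCATIONS = ("/hr/", "/finance/")


@dataclass
class Document:
    path: str
    text: str
    user_label: Optional[str] = None  # set manually by a person, if at all


def content_based(doc: Document) -> bool:
    """Inspect the contents for sensitive-looking patterns."""
    return bool(SSN_PATTERN.search(doc.text) or CARD_PATTERN.search(doc.text))


def context_based(doc: Document) -> bool:
    """Use an indirect indicator (here, only the storage location)."""
    return doc.path.startswith(SENSITIVE_LOCATIONS)


def user_based(doc: Document) -> bool:
    """Trust the label a person chose for the document."""
    return doc.user_label == "sensitive"


def is_sensitive(doc: Document) -> bool:
    """Flag the document if any of the three approaches flags it."""
    return content_based(doc) or context_based(doc) or user_based(doc)
```

Combining the approaches with a simple "or" is only one possible design; an organization might instead let a user label override automated results, or the reverse.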

Determining Data Risk

In addition to the types of classification, it’s wise for an organization to determine the relative risk associated with each type of data, how that data is handled, and where it is stored or sent (endpoints). A common practice is to separate data and systems into three levels of risk:

  • Low risk: If data is public and it’s not easy to permanently lose (e.g. recovery is easy), this data collection and the systems surrounding it are likely a lower risk than others.
  • Moderate risk: Essentially, this is data that isn’t public or is used internally (by your organization and/or partners). However, it is not critical enough to operations or sensitive enough to be “high risk.” Proprietary operating procedures, cost of goods and some company documentation may fall into the moderate category.
  • High risk: Anything remotely sensitive or crucial to operational security goes into the high risk category, as does data that would be extremely hard to recover if lost. All confidential, sensitive and necessary data falls into a high risk category.

Note: Some also use a more granular scale, adding “severe” risk or other categories to help further differentiate data.
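
The three levels above can be expressed as a small decision function. This is a sketch under stated assumptions: the three boolean inputs and the order in which they are checked are one reading of the descriptions above, not a standard.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def risk_level(is_public: bool, is_sensitive: bool, hard_to_recover: bool) -> Risk:
    """Assign a risk level from three yes/no questions about the data."""
    # Sensitive or hard-to-recover data is high risk regardless of audience.
    if is_sensitive or hard_to_recover:
        return Risk.HIGH
    # Public data that is easy to recover is low risk.
    if is_public:
        return Risk.LOW
    # Internal data that is neither sensitive nor hard to recover.
    return Risk.MODERATE
```

For example, `risk_level(is_public=False, is_sensitive=False, hard_to_recover=False)` returns `Risk.MODERATE`, which matches internal documentation that is easy to restore.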

Using a Data Classification Matrix

Classifying and labeling data may be easy for some organizations. If you don’t have a large number of data types, or your business handles fewer transactions, determining the risk of your data and systems is likely less difficult. That said, many organizations dealing with high volume or multiple types of data are likely to need a comprehensive way of determining their risk. For this, many use a “data classification matrix.”

Creating a matrix that rates data and/or systems by how likely they are to be compromised and how sensitive the data is will help you quickly determine how to classify and protect sensitive assets.
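
One way to picture such a matrix is as a lookup table with likelihood of compromise on one axis and sensitivity on the other. The cell values below are assumptions chosen for illustration; each organization would fill in its own.

```python
# Rows: likelihood of compromise. Columns: sensitivity of the data.
# The cell values are illustrative, not a prescribed standard.
MATRIX = {
    "low":      {"low": "low",      "moderate": "low",      "high": "moderate"},
    "moderate": {"low": "low",      "moderate": "moderate", "high": "high"},
    "high":     {"low": "moderate", "moderate": "high",     "high": "high"},
}


def overall_risk(likelihood: str, sensitivity: str) -> str:
    """Look up the combined risk for one data set or system."""
    return MATRIX[likelihood][sensitivity]
```

A customer database on an internet-facing server might be rated `overall_risk("high", "high")`, while a public brochure on the same server would be rated `overall_risk("high", "low")`.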

An Example of Data Classification

An organization may classify data as Restricted, Private or Public. In this instance, public data represents the least-sensitive data with the lowest security requirements, while restricted data is in the highest security classification and represents the most sensitive data. This type of data classification is often the starting point for many enterprises, followed by additional identification and tagging procedures that label data based on its relevance to the enterprise, quality, and other classifications. The most successful data classification processes employ follow-up processes and frameworks to keep sensitive data where it belongs.
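
As a rough sketch, the Restricted, Private and Public levels can be modeled as an ordered enumeration, with each level mapped to the controls it requires. The control names are hypothetical placeholders, not recommendations.

```python
from enum import IntEnum


class Label(IntEnum):
    PUBLIC = 0      # least sensitive
    PRIVATE = 1
    RESTRICTED = 2  # most sensitive


# Hypothetical controls per level; each level adds to the one below it.
CONTROLS = {
    Label.PUBLIC: set(),
    Label.PRIVATE: {"access_control"},
    Label.RESTRICTED: {"access_control", "encryption", "audit_logging"},
}


def required_controls(label: Label) -> set:
    """Return the controls required for one classification level."""
    return CONTROLS[label]


def strictest(labels) -> Label:
    """A folder or system holding mixed data takes the highest label present."""
    return max(labels, default=Label.PUBLIC)
```

Ordering the labels makes it straightforward to compare them, so a container can inherit the label of its most sensitive item.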

The Data Classification Process

Data classification can be a complex and cumbersome process. Automated systems can help streamline the process, but an enterprise must still do the following: determine the categories and criteria that will be used to classify data, understand and define its objectives, outline the roles and responsibilities of employees in maintaining proper data classification protocols, and implement security standards that correspond with data categories and tags. When done correctly, this process will provide employees and third parties involved in the storage, transmission, or retrieval of data with an operational framework. Techniques for classifying sensitive data are also covered in our webinar, How Classification Defines Your Data Security Strategy, which is presented by Garrett Bekker, Senior Analyst, Information Security at 451 Research.


Policies and procedures should be well-defined, considerate of the security requirements and confidentiality of data types, and straightforward enough for employees to interpret easily, which promotes compliance. For instance, each category should include information about the types of data included in the classification, security considerations with rules for retrieving, transmitting, and storing data, and potential risks associated with a breach of security policies.

GDPR Data Classification

With the General Data Protection Regulation (GDPR) in effect, data classification is more imperative than ever for companies that store, transfer, or process personal data of individuals in the EU. It is crucial for these companies to classify data so that anything covered by the GDPR is easily identifiable and the appropriate security precautions can be taken.

Additionally, GDPR provides elevated protection for certain categories of personal data. For instance, GDPR prohibits the processing of data revealing racial or ethnic origin, political opinions, and religious or philosophical beliefs unless one of its specified exceptions, such as the data subject’s explicit consent, applies. Classifying such data accordingly can significantly reduce the risk of compliance issues.

Steps for Effective Data Classification

  • Understand the Current Setup: Taking a detailed look at the location of current data and all regulations that pertain to your organization is perhaps the best starting point for effectively classifying data. You must know what data you have before you can classify it.
  • Create a Data Classification Policy: Staying compliant with data protection principles in an organization is nearly impossible without a proper policy. Creating a policy should be your top priority.
  • Prioritize and Organize Data: Now that you have a policy and a picture of your current data, it’s time to properly classify the data. Decide on the best way to tag your data based on its sensitivity and privacy.

There are more benefits to data classification than simply making data easier to find. Data classification is necessary to enable modern enterprises to make sense of the vast amounts of data available at any given moment.

Data classification provides a clear picture of all data within an organization’s control and an understanding of where data is stored, how to easily access it, and the best way to protect it from potential security risks. Once implemented, data classification provides an organized framework that facilitates more adequate data protection measures and promotes employee compliance with security policies.

Data classification is a vital component of any information security and compliance program, especially if your organization stores large volumes of data. It provides a solid foundation for your data security strategy by helping you understand where you store sensitive and regulated data, both on premises and in the cloud. Moreover, data classification improves user productivity and decision-making, and reduces storage and maintenance costs by enabling you to eliminate unneeded data.

In this article you will learn what benefits data classification offers, how to implement it and how to choose the right software solution.

Key Data Classification Terms and Definitions

Data classification is the process of organizing structured and unstructured data into defined categories that represent different types of data. Standard classifications used in data categorization include:

  • Public
  • Confidential
  • Sensitive
  • Personal

Sensitive data is a general term representing data restricted to use by specific people or groups. The terms “sensitive data” and “confidential data” are often used interchangeably. Examples of sensitive data include intellectual property and trade secrets.

Data reclassification is re-categorization of data to apply appropriate updates, for example, based on changes to legal or contractual obligations, data usage or value, or new or revised regulatory mandates.

Data tagging or labeling adds metadata to files indicating the classification results.
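
Tagging and reclassification can be sketched together as a label history attached to each file. The `TaggedFile` and `ClassificationTag` types are hypothetical; real products typically store the label in file metadata or a catalog rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ClassificationTag:
    label: str
    assigned_on: date
    reason: str


@dataclass
class TaggedFile:
    path: str
    history: List[ClassificationTag] = field(default_factory=list)

    @property
    def current_label(self) -> Optional[str]:
        """The most recent label, or None if the file was never classified."""
        return self.history[-1].label if self.history else None

    def tag(self, label: str, reason: str, on: Optional[date] = None) -> None:
        """Record a new label; calling this again is a reclassification."""
        self.history.append(ClassificationTag(label, on or date.today(), reason))
```

Keeping the full history, rather than overwriting the label, preserves a record of when and why a file was reclassified.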

Purpose of Data Classification

Data classification helps you understand what types of data you store and where that data is located. This intelligence:

  • Informs risk management, legal discovery and regulatory compliance processes
  • Helps prioritize security measures
  • Improves user productivity and decision-making by streamlining search and e-discovery
  • Reduces data maintenance and storage costs by identifying duplicate and stale data
  • Helps IT teams justify requests for investments in data security.
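
As an illustration of the duplicate and stale data point above, the following sketch groups files by a content hash and lists files not modified within a cutoff. The one-year default and the in-memory inputs are assumptions; a real scan would read from the file system or a storage API.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta


def find_duplicates(files):
    """Group paths whose contents are byte-for-byte identical.

    `files` maps a path to its contents as bytes.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


def find_stale(last_modified, now, max_age_days=365):
    """Return paths last modified more than `max_age_days` before `now`.

    `last_modified` maps a path to a datetime.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(path for path, ts in last_modified.items() if ts < cutoff)
```

Hashing catches only exact copies; near-duplicates such as edited drafts would need a different technique.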

Benefits of Data Classification

More broadly, data classification helps organizations improve data security and ensure regulatory compliance.

Data Security

Classification is an effective way to protect your valuable data. By identifying the types of data you store and pinpointing where sensitive data resides, you are well positioned to:

  • Prioritize your security measures, adjusting your security controls based on data sensitivity
  • Understand who can access, modify or delete data
  • Assess risks, such as the business impact of a breach, ransomware attack or other threat

Regulatory Compliance

Compliance regulations require organizations to protect specific data, such as cardholder information (PCI DSS) or the personal data of EU residents (GDPR). Data classification enables you to identify the data subject to particular regulations so you can apply the required controls and pass audits.

Here’s how data classification can help you meet common compliance standards:

  • GDPR — Data classification helps you uphold the rights of data subjects, including satisfying data subject access requests by retrieving the set of documents with data about a given individual.
  • HIPAA — Knowing where all health records are stored helps you implement security controls for proper data protection.
  • ISO 27001 — Classifying information according to value and sensitivity helps you meet requirements for preventing unauthorized disclosure or modification.
  • NIST SP 800-53 — Categorizing data helps federal agencies properly architect and manage their IT systems.
  • PCI DSS — Data classification enables you to identify and secure consumer financial information used in payment card transactions.
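
Once data is tagged, finding the regulations that apply to a data set can be a simple lookup. The tag names below are hypothetical; the pairings follow the examples in the list above and are not a complete legal mapping.

```python
# Hypothetical classification tags mapped to the standards they trigger.
REGULATIONS_BY_TAG = {
    "cardholder_data": {"PCI DSS"},
    "health_record": {"HIPAA"},
    "eu_personal_data": {"GDPR"},
}


def applicable_regulations(tags):
    """Return every regulation triggered by at least one of the given tags."""
    result = set()
    for tag in tags:
        # Unknown tags contribute nothing rather than raising an error.
        result |= REGULATIONS_BY_TAG.get(tag, set())
    return result
```

A data set tagged with both `cardholder_data` and `eu_personal_data` would need to satisfy the controls for both PCI DSS and GDPR.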

Types of Data Classification

  • Content-based classification inspects and interprets files to identify sensitive information.
  • Context-based classification looks at application, location, creator tags and other variables as indirect indicators of sensitive information.
  • User-based classification depends on manual selection of each document by a person.

Examples of Data Classification Categories

Example of a Basic Classification Scheme

The simplest scheme is three-level classification:

  • Public data — Data that can be freely disclosed to the public. Examples include your company contact information and browser cookie policy.
  • Internal data — Data that has low security requirements but is not meant for public disclosure, like marketing research.
  • Restricted data — Highly sensitive internal data. Disclosure could negatively affect operations and put the organization at financial or legal risk. Restricted data requires the highest level of security protection.

Example of a Government Classification Scheme

Government agencies often use three levels of sensitivity but give them different labels than listed above: top secret, secret and public. For more complex data structures, more levels may be added. Here is a five-level strategy with examples:

  • Top secret — Cryptologic and communications intelligence
  • Secret — Select military plans
  • Confidential — Data indicating the strength of ground forces
  • Sensitive unclassified — Data tagged “For Official Use Only”
  • Unclassified — Data that may be publicly released with authorization

Example of Commercial Classification

Typically, organizations that store and process commercial data use four levels to classify data: three confidential levels and one public level. Some expand that to a five-level system with the following levels:

  • Sensitive — Intellectual property, PHI
  • Confidential — Vendor contracts, employee reviews
  • Private — Customer names or images
  • Proprietary — Organizational processes
  • Public — Information that may be disclosed to anyone

Data Classification Process

Effective Information Classification in Five Steps

  1. Establish a data classification policy, including objectives, workflows, data classification scheme, data owners and handling.
  2. Identify the sensitive data you store.
  3. Apply labels by tagging data.
  4. Use results to improve security and compliance.
  5. Repeat regularly: data is dynamic, so classification is an ongoing process.
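
The five steps can be sketched end to end in a few lines. This is a deliberately simplified illustration: the keyword rules stand in for a real policy, the keywords themselves are made up, and the labels reuse the Public, Internal and Restricted scheme from earlier in the article.

```python
from collections import Counter

# Step 1 (policy): ordered rules, most sensitive first. Keywords are illustrative.
RULES = [
    ("trade secret", "Restricted"),
    ("salary", "Restricted"),
    ("internal", "Internal"),
]


def classify(text, rules=RULES, default="Public"):
    """Steps 2 and 3: identify sensitive content and choose a label."""
    lowered = text.lower()
    for keyword, label in rules:
        if keyword in lowered:
            return label
    return default


def run_classification(documents, rules=RULES):
    """Label every document and summarize the results (step 4).

    Step 5 is simply calling this again whenever data or rules change.
    """
    labels = {name: classify(text, rules) for name, text in documents.items()}
    return labels, Counter(labels.values())
```

The summary count is the kind of result that can feed security prioritization, for example by showing how many Restricted documents exist.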

Building an Effective Data Classification Policy

A data classification policy is a document that includes a classification framework, a list of responsibilities for identifying sensitive data, and descriptions of the various data classification levels.

A good classification policy:

  • Uses criteria that are straightforward and avoid ambiguity, but that are generic enough to apply to different data sets and circumstances
  • Is clear and written in simple language
  • Fits the organization’s business
  • Is limited to 3 or 4 classification levels
  • Contains a point of contact for clarification
  • Establishes a review schedule
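
Some of the criteria above are mechanical enough to check automatically. The sketch below assumes a hypothetical `ClassificationPolicy` record and checks only the level count, the point of contact and the review schedule; qualities such as clarity still need human review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClassificationPolicy:
    levels: List[str]
    contact: str
    review_interval_days: int

    def problems(self) -> List[str]:
        """Return a list of issues found; an empty list means none were found."""
        issues = []
        if not 3 <= len(self.levels) <= 4:
            issues.append("policy should define 3 or 4 classification levels")
        if not self.contact.strip():
            issues.append("policy needs a point of contact")
        if self.review_interval_days <= 0:
            issues.append("policy needs a review schedule")
        return issues
```

Returning a list of issues, rather than raising on the first one, lets a policy owner see everything that needs attention at once.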

How to Select a Data Classification Solution

Look for these features:

  • Compound term search — Improves accuracy by minimizing false positives and false negatives.
  • Index — Enables you to identify sensitive terms without re-crawling the data.
  • Flexible taxonomy manager — Makes it easy to add and modify terms and rules.
  • Workflows — Automatically takes specific actions when a document is classified in a certain way. For example, a workflow might move sensitive data away from a public share.
  • Breadth of coverage — Supports both cloud and on-premises data sources, including both structured and unstructured data.
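
To see why an index and compound terms matter, consider this minimal sketch. It is not how any particular product works: it assumes a plain inverted index over lowercase words and treats a compound term as a set of words that must all appear in the same document.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Crawl the documents once and map each word to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def compound_search(index, terms):
    """Return documents that contain every term, using only the index."""
    matches = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*matches) if matches else set()
```

Searching for "patient" alone would also match a parking notice for patients; requiring "patient" and "diagnosis" together narrows the result to likely health records, which reduces false positives. Because the search reads only the index, new terms can be tested without crawling the data again.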

FAQ

What is the purpose of data classification?  

Data classification sorts data into categories based on its value and sensitivity.

Why is data classification important? What benefits does it offer?

Data classification helps you prioritize your data protection efforts to improve data security and regulatory compliance. It also improves user productivity and decision-making, and reduces costs by enabling you to eliminate unneeded data.

What are common data classification levels? 

Data is often classified as public, confidential, sensitive or personal.

What are the data classification types? 

Classification can be content-based, context-based or user-based (manual).

What software should I use for data classification? 

Look for data classification software, like that offered by Netwrix, which:

  • Uses compound word search to ensure accurate classification that minimizes false positives
  • Has an index so you can find sensitive terms without re-crawling your data stores
  • Includes a flexible taxonomy manager that empowers you to customize your classification parameters
  • Provides workflows to automate processes such as migrating sensitive data from public shares
  • Supports both on-premises and cloud content sources, including both structured and unstructured data

Who is responsible for data classification in an organization? 

Organizations typically designate a Security and Risk Manager, a Data Protection Manager, a Compliance Committee or a similar entity.