Data Discovery and Classification

If you’re a data person, or even if you’re not, you may have heard the statistic cited by Eric Schmidt, executive chairman at Google: “There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.”

That’s not even the crazy part: Schmidt’s quote is from 2011. Now, roughly 330 exabytes of information are created every day.

In case your brain isn’t broken yet: a single exabyte is equal to one billion gigabytes. Burned onto standard 4.7 GB DVDs, one exabyte would fill more than 200 million discs, a stack roughly 250 kilometers high.

Clearly, that’s a nearly unfathomable amount of data. But what is collected? Why? Where is it stored? And, perhaps most importantly, why does it all matter? On the one hand, these are existential questions. But they’re also questions that every business needs to ask for the sake of compliance, for operational excellence, and to ensure they’re using the right data in the right way—because it’s the right thing to do.

Every organization has an ever-expanding structured and unstructured data footprint, which makes it challenging to understand where data resides and whether it is handled in compliance with privacy regulations like the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the California Privacy Rights Act (CPRA), and others.

A data map helps you figure out the answers by supporting data discovery and data classification. In turn, data discovery and classification enable you to fulfill a spectrum of compliance needs.

We’ll dive into what a data map is, how it relates to data discovery and classification, and how data discovery and classification can support your organization’s compliance.

Regulatory Compliance and Its Role in Data Discovery and Classification

Data discovery and classification will help your business align with various regulatory requirements you may be subject to, such as the GDPR, CPRA, and HIPAA. By integrating these concepts into your compliance strategies, your business can build consumer trust and prevent fines associated with data breaches and privacy violations.

Data regulations mandate transparency when it comes to protecting data. By understanding where your consumers' personal information resides and how it's categorized—i.e. by mapping all of your organization's data—your business can better ensure compliance with these regulations.

For example, under the CPRA, consumers have the right to access, correct, and delete their personal information. Additionally, if they've asked you to delete their information, you have to ensure that any third parties you've shared it with or sold it to have also deleted it. Through sensitive data discovery and classification and data mapping, your business can more easily locate your customers' data and respond on time.
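To make the deletion example concrete, here is a minimal sketch of how a data map could be used to list every system and third party that needs to act on a deletion request. The `SystemRecord` structure, the system names, and the `deletion_targets` helper are all hypothetical and exist only for illustration; a real data map would hold far more metadata.

```python
from dataclasses import dataclass, field


@dataclass
class SystemRecord:
    """One entry in a hypothetical data map: a system that stores personal information."""
    name: str
    is_third_party: bool = False
    # Names of downstream systems this system shares or sells personal information to.
    shares_with: list[str] = field(default_factory=list)


def deletion_targets(data_map: dict[str, SystemRecord], start: str) -> list[str]:
    """Return every system reachable from `start` via sharing flows, in visit order."""
    seen: list[str] = []
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in data_map:
            continue
        seen.append(name)
        # Push in reverse so downstream systems are visited in the order they are listed.
        stack.extend(reversed(data_map[name].shares_with))
    return seen


# Illustrative data map; none of these systems come from the article.
data_map = {
    "crm": SystemRecord("crm", shares_with=["email_vendor", "warehouse"]),
    "warehouse": SystemRecord("warehouse", shares_with=["ad_partner"]),
    "email_vendor": SystemRecord("email_vendor", is_third_party=True),
    "ad_partner": SystemRecord("ad_partner", is_third_party=True),
}

targets = deletion_targets(data_map, "crm")
third_parties = [name for name in targets if data_map[name].is_third_party]
print("Delete from:", targets)
print("Notify third parties:", third_parties)
```

Because the walk follows sharing flows transitively, a vendor that only receives data indirectly (here, the hypothetical `ad_partner`) still ends up on the notification list.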

Now that we've highlighted the importance of regulatory compliance, let's take a closer look at data mapping.

What Is Data Mapping?

First, let’s clarify what we mean by data mapping. It can mean different things in different contexts. A common definition is the technical process of mapping fields from one database to another. But that’s not the one we’re using for the purposes of this article.

Instead, we’ll focus on the privacy concept of a data map. That definition refers to a visualization of all the stores and flows of personal information across your organization. To create this visualization, you’ll first need a data inventory that lists out all the different applications and systems as well as metadata about those applications and systems, like owner/admin, connected data stores, types of data handled, and so on. That’s what we’re referring to when we talk about it in the context of data discovery and classification.
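As a rough sketch of what such a data inventory might look like in code, the example below models each system as a record carrying the metadata mentioned above (owner, connected data stores, types of data handled) and derives the flows a data map would draw. The `InventoryEntry` class, the `build_flows` helper, and the sample systems are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """One row of a hypothetical data inventory."""
    name: str
    owner: str  # the owner/admin responsible for the system
    connected_data_stores: list[str] = field(default_factory=list)
    data_types: list[str] = field(default_factory=list)


def build_flows(inventory: list[InventoryEntry]) -> list[tuple[str, str]]:
    """Derive (source, destination) edges for the data map from each entry's connections."""
    return [
        (entry.name, store)
        for entry in inventory
        for store in entry.connected_data_stores
    ]


# Illustrative inventory; the system names and owners are made up.
inventory = [
    InventoryEntry("web_app", "engineering", ["customer_db"], ["email", "ip_address"]),
    InventoryEntry("customer_db", "data_team", ["analytics"], ["email", "postal_address"]),
    InventoryEntry("analytics", "marketing", [], ["ip_address"]),
]

flows = build_flows(inventory)
for source, destination in flows:
    print(f"{source} -> {destination}")
```

The inventory is the list of records; the data map is the picture you get by drawing the derived flows between them.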

What Is Data Discovery?

So, in the context of data mapping for privacy compliance, what is data discovery?

Essentially, it’s the process of discovering data within your data map. Outside of privacy compliance, you could engage in data discovery for all sorts of purposes, such as finding ways to reduce spend, spotting redundant vendors, identifying new market opportunities, and more.

Typically, organizations interested in this broader sense of data discovery invest in powerful business intelligence tools and data science experts. This approach enables you to ask pretty much any question about your data, but the trouble is, that flexibility comes with complexity. Ultimately, that complexity translates into slower outcomes.

How Data Discovery Fits into Data Management

If your interest is primarily in achieving an outcome like privacy compliance, data discovery can be a much narrower and less complex concept. When it comes to keeping data private, data discovery is the process of finding data that must be managed to achieve compliance.

For example, with your data map, you’ll be able to discover:

The data needed to fulfill subject rights requests.
What data you are sending to third-party vendors.
Where you collect sensitive personal information.
And other compliance-related use cases for data.
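The discovery questions listed above can be sketched as simple queries over a data map. This is only an illustration under stated assumptions: the `System` record, the `SENSITIVE_TYPES` set, and the three query functions are hypothetical, and which data types count as sensitive depends on the laws that apply to you.

```python
from dataclasses import dataclass

# Illustrative only: the legal definition of "sensitive" varies by regulation.
SENSITIVE_TYPES = {"health", "biometric", "precise_geolocation"}


@dataclass
class System:
    """A hypothetical data map entry."""
    name: str
    data_types: set[str]
    vendor: bool = False  # True when the system is operated by a third party


def systems_for_subject_request(systems: list[System]) -> list[str]:
    """Systems holding any personal data, which must be searched for a rights request."""
    return sorted(s.name for s in systems if s.data_types)


def data_sent_to_vendors(systems: list[System]) -> dict[str, list[str]]:
    """For each third-party vendor, the data types it receives."""
    return {s.name: sorted(s.data_types) for s in systems if s.vendor}


def sensitive_collection_points(systems: list[System]) -> list[str]:
    """Systems that handle at least one sensitive data type."""
    return sorted(s.name for s in systems if s.data_types & SENSITIVE_TYPES)


systems = [
    System("crm", {"email", "name"}),
    System("wellness_app", {"email", "health"}),
    System("email_vendor", {"email"}, vendor=True),
    System("build_server", set()),
]

print(systems_for_subject_request(systems))
print(data_sent_to_vendors(systems))
print(sensitive_collection_points(systems))
```

Each query is narrow by design: it answers one compliance question rather than supporting open-ended analysis.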

While general business intelligence solutions for data discovery do exist, privacy professionals will want to invest in a privacy-focused solution for data mapping and data discovery that reduces complexity and operational headaches (such as being bottlenecked by in-demand data science resources) rather than adds to them.

What Is Data Classification?

Data classification is the process of categorizing data based on various characteristics, such as its sensitivity, importance, and access controls.

These categorizations are well articulated and documented by standards organizations. In general, NIST and ISO standards suggest four classification levels:

Public data: Information that is freely available to the public. This could include data from news articles, government data sets, or open-source software code.
Private or internal data: Data meant for internal use within the organization. This includes employee records, internal memos, or payroll data.
Confidential data: Data that requires strict access controls, such as personally identifiable information (data that can identify an individual, such as their address and phone number) or information found in financial records.
Highly confidential or restricted data: Sensitive personal information, such as personal health records, biometric data, data subject to privacy laws, identity or access management data, national security information, or other types of similarly sensitive data.
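One way to picture the four levels above is as an ordered scale that each type of data maps onto. The sketch below assumes a hand-written lookup table; the `Level` enum, the `FIELD_LEVELS` mapping, and the choice to treat unknown fields as confidential are illustrative assumptions, not part of any standard.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four classification levels, ordered from least to most restricted."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Hypothetical mapping from field type to level, loosely following the examples above.
FIELD_LEVELS = {
    "press_release": Level.PUBLIC,
    "internal_memo": Level.INTERNAL,
    "payroll": Level.INTERNAL,
    "postal_address": Level.CONFIDENTIAL,
    "phone_number": Level.CONFIDENTIAL,
    "health_record": Level.RESTRICTED,
    "biometric": Level.RESTRICTED,
}


def classify_field(field_type: str, default: Level = Level.CONFIDENTIAL) -> Level:
    """Look up a field's level; unknown fields fall back to a cautious default."""
    return FIELD_LEVELS.get(field_type, default)


for name in ("press_release", "phone_number", "biometric", "loyalty_id"):
    print(name, "->", classify_field(name).name)
```

Defaulting unknown fields to a stricter level is a design choice in this sketch: it errs toward over-protecting data until someone reviews it.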

How Classification Levels Align with Data Security Controls

In terms of privacy compliance, we’re most concerned about the latter two categories (although employee data, which would fall under “private or internal data,” is covered under the GDPR and CPRA).

Your data map should classify which systems and data flows handle sensitive personal information versus regular consumer information. Outside the context of privacy compliance, you may need additional classifications in your data map to align with data security requirements and other regulatory needs.
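As a minimal sketch of that idea, the example below labels both systems and data flows as handling sensitive or regular personal information. The `SENSITIVE_FIELDS` set, the function names, and the sample systems are hypothetical; what qualifies as sensitive depends on the regulations you are subject to.

```python
# Illustrative only: replace with the sensitive categories defined by the laws that apply to you.
SENSITIVE_FIELDS = frozenset({"health_record", "biometric", "government_id"})


def label(fields) -> str:
    """Return 'sensitive' if any field is in the sensitive set, otherwise 'regular'."""
    return "sensitive" if SENSITIVE_FIELDS.intersection(fields) else "regular"


def label_data_map(systems: dict, flows: list):
    """Label each system by the fields it stores and each flow by the fields it carries."""
    system_labels = {name: label(fields) for name, fields in systems.items()}
    flow_labels = {(src, dst): label(fields) for src, dst, fields in flows}
    return system_labels, flow_labels


# Hypothetical systems (name -> fields stored) and flows (source, destination, fields carried).
systems = {
    "crm": ["email", "name"],
    "clinic_portal": ["email", "health_record"],
}
flows = [
    ("clinic_portal", "crm", ["email"]),
    ("clinic_portal", "insurer", ["health_record"]),
]

system_labels, flow_labels = label_data_map(systems, flows)
print(system_labels)
print(flow_labels)
```

Labeling flows separately from systems matters here: the hypothetical `clinic_portal` is a sensitive system, but only one of its two outbound flows actually carries sensitive data.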

The Difference Between Data Discovery and Classification

Let's do a side-by-side to make sure we know what the differences are between these two data mapping concepts:

	Data Discovery	Data Classification
What is it?	The process of identifying and locating data across systems and repositories.	The process of categorizing data based on its sensitivity, importance, or purpose.
What does it do?	Uncovers and inventories all data assets in your organization.	Correctly labels data types for security, compliance, and data governance.
What does it focus on?	Finding unknown, hidden, and unstructured data sources.	Organizing data into established classifications or categories, e.g., confidential, private, sensitive, etc.
Outputs	A comprehensive list or inventory of data sources and their locations.	Data tags or classifications (e.g., confidential, internal, public).
Use cases	Regulatory compliance (e.g., GDPR, CCPA); risk assessment for unprotected or forgotten data.	Implementing data security protocols; ensuring access control and appropriate handling of data.
Challenges	Identifying all sources of data, especially unstructured data; managing large volumes of data spread across multiple systems.	Applying accurate classifications without disrupting workflows; keeping classifications up to date as data evolves.
Steps to compliance	Ensures organizations know where data resides for compliance.	Helps enforce regulations by ensuring data is appropriately handled.
Where does it fit?	Often the first step in understanding what data exists within an organization.	Builds on discovery to apply meaningful classifications to data.
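The order of operations in the comparison above (discover first, then classify) can be sketched in a few lines. This example uses simple regular-expression matching as a stand-in for discovery, which is an assumption made purely for illustration; real tools use far more robust detection, and the patterns, labels, and sample records here are all hypothetical.

```python
import re

# Illustrative patterns only; production-grade detection needs much more than two regexes.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical mapping from detected data type to classification label.
LABELS = {"email": "confidential", "us_ssn": "restricted"}
RANK = ["public", "internal", "confidential", "restricted"]


def discover(records: dict) -> dict:
    """Discovery: find which sources contain which types of personal data."""
    found = {}
    for source, text in records.items():
        types = {name for name, pattern in PATTERNS.items() if pattern.search(text)}
        if types:
            found[source] = types
    return found


def classify(found: dict) -> dict:
    """Classification: tag each discovered source with its most restrictive label."""
    return {
        source: max((LABELS[t] for t in types), key=RANK.index)
        for source, types in found.items()
    }


records = {
    "support_ticket.txt": "Contact jane@example.com about invoice 42",
    "hr_export.csv": "Jane Doe,123-45-6789,jane@example.com",
    "readme.md": "Build instructions only",
}

found = discover(records)
labels = classify(found)
print(found)
print(labels)
```

Note that `classify` only ever sees what `discover` returns, which mirrors the table: classification builds on the inventory that discovery produces.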

Challenges of Sensitive Data Discovery and Classification Implementation

Despite their many benefits, data discovery and data classification can be complex and resource-intensive to implement, especially when data has proliferated across the organization. The following challenges can make that complexity even harder to overcome:

Many Companies Are Still in Analog

Considering we live in a world of self-driving cars (kind of) and generative AI that can do anything from solving complex mathematical problems to giving relationship advice, it's surprising to know that 48% of businesses are still in the process of automating their processes.

When it comes to data management and protecting sensitive data, many companies are still handling these things manually, which not only slows down data mapping practices, but also reduces classification accuracy and increases the risk of data breaches.

Businesses Are Slow to Adopt a Modern Data Culture

Data management teams look to leadership to take a proactive approach to the discovery, classification, and protection of data. If your company still hasn't adopted a modern data culture, you may struggle to convince leadership that the data discovery and classification process is worthwhile.

This often happens when leadership teams falsely think that data privacy and protection aren't really a priority, or that data breaches can't happen to them. But it's like fire prevention: you don't avoid buying a smoke detector because you've taught your kids about fire safety; you install smoke detectors because you know that anything can happen, even if you've done everything to protect your home from fire.

Data Privacy and Data Security Still Aren't a Priority

Some businesses prioritize other operational aspects over their data management strategy. Unfortunately, wherever there is data, there is a need for a privacy strategy, and that includes data mapping. Business leaders may not see the immediate benefit of investing in data discovery tools, but of course, responsible data handling is a necessary operational cost.

Automation and Integration: The Keys to Effective Data Mapping and Classification

Privacy-focused data discovery and classification solutions need to provide an integrated approach to gaining visibility and control by combining data mapping capabilities with a suite of privacy compliance tools. Automation is also key: By automating data discovery, you can save significant time and effort.

For example, the Osano platform works like this:

Osano data mapping automatically discovers connected systems that process personal information by connecting with your organization’s single sign-on (SSO) provider or customer data platform (CDP).

Figure: Two sources of personal data stores, an Okta SSO instance and an assessment that will be sent to owners of data stores that sit outside of Okta.

The platform scans systems containing personal data and provides metadata about the data field types, vendor data flows, and more, enabling you to prioritize high-risk systems for assessment.

Figure: A data map showing flows of data between systems, as well as automatically and manually identified metadata.

For systems outside your SSO or CDP ecosystem, Osano provides automated workflows to quickly map and track those data assets while informing relevant stakeholders of outstanding tasks.

Then, the discovered data flows into Osano's Subject Rights Management to fulfill data subject access requests. Processing activities can also feed into Osano's Assessments to generate records of processing activities (RoPAs).

Additional capabilities within Osano, such as cookie consent, PIAs, vendor assessments, and more, are all unified around your organization’s central data map. So, the data you collect, the information you discover, and the processes you set around it become part of a unified, integrated privacy program that helps you reduce work, improve compliance, and get a handle on your own stratospheric stack of data.

Discover and Classify Your Data Across Your Organization with Osano

A strong data security posture starts with knowing what data you have and where to find it. If you’d like to learn how Osano Data Mapping’s data discovery and classification tools can help you build out a more comprehensive privacy program, schedule a demo today.


Data Mapping Checklist

Wondering how to get started with your first data map? Look no further: This checklist guides you through the essential steps.


Matt Davis, CIPM (IAPP)

Matt Davis is a writer at Osano, where he researches and writes about the latest in technology, legislation, and business to spread awareness about the most pressing issues in privacy today. When he’s not writing about data privacy, Matt spends his time exploring Vermont with his dog, Harper; playing piano; and writing short fiction.
