If you’re a data person, or even if you’re not, you may have heard the statistic cited by Eric Schmidt, executive chairman at Google: “There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.”
That’s not even the crazy part: Schmidt’s quote is from 2011. Now, roughly 330 exabytes of information are created every day.
If your brain isn’t broken yet, a single exabyte is equal to one billion gigabytes. If an exabyte were burned onto standard 4.7 GB DVDs, each 1.2 mm thick, the stack of discs would stand roughly 250 kilometers tall, well past the edge of space.
Clearly, that’s a nearly unfathomable amount of data. But what is collected? Why? Where is it stored? And, perhaps most importantly, why does it all matter? On the one hand, these are existential questions. But they’re also questions that every business needs to ask for the sake of compliance, for operational excellence, and to ensure they’re using the right data in the right way—because it’s the right thing to do.
Every organization has an ever-expanding data footprint, which makes it challenging to understand where data resides and whether it is handled in compliance with data privacy regulations like the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the California Privacy Rights Act (CPRA).
A data map helps you figure out the answers by supporting data discovery and data classification. In turn, data discovery and classification enable you to fulfill a spectrum of compliance needs. We’ll dive into what a data map is, how it relates to data discovery and classification, and how data discovery and classification can support your organization’s compliance.
First, let’s clarify what we mean by data mapping. It can mean different things in different contexts. A common definition is the technical process of mapping fields from one database to another. But that’s not the one we’re using for the purposes of this article.
Instead, we’ll focus on the data privacy concept of a data map: a visualization of all the stores and flows of personal information across your organization. To create this visualization, you’ll first need a data inventory that lists all your applications and systems along with metadata about each one, like its owner or admin, its connected data stores, the types of data it handles, and so on. That visualization is what we mean by a data map when we talk about data discovery and data classification.
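To make the relationship between a data inventory and a data map more concrete, here is a minimal Python sketch. It is not how any particular product models things: the `SystemRecord` class, its field names, and the example systems are all hypothetical, chosen only to mirror the kinds of metadata described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SystemRecord:
    """One entry in a data inventory (all field names are illustrative)."""
    name: str                      # application or system name
    owner: str                     # owner/admin responsible for the system
    data_types: List[str]          # types of personal data handled
    connected_stores: List[str] = field(default_factory=list)  # downstream stores


def build_data_map(inventory: List[SystemRecord]) -> List[Tuple[str, str]]:
    """Derive the flows of a data map as (source, destination) pairs."""
    flows = []
    for record in inventory:
        for store in record.connected_stores:
            flows.append((record.name, store))
    return sorted(flows)


# A made-up inventory of three systems.
inventory = [
    SystemRecord("crm", "sales-ops", ["name", "email"], ["warehouse"]),
    SystemRecord("support-desk", "support", ["email", "chat transcript"],
                 ["crm", "warehouse"]),
    SystemRecord("warehouse", "data-eng", ["name", "email", "chat transcript"]),
]

data_map = build_data_map(inventory)
for source, destination in data_map:
    print(f"{source} -> {destination}")
```

The inventory is the list of records; the data map is what you get when you draw those records and the flows between them. A real data map would carry far more metadata, but the shape is the same.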
So, in the context of data mapping for data privacy compliance, what is data discovery?
Essentially, it’s the process of discovering data within your data map. Outside of data privacy compliance, you could engage in data discovery for all sorts of purposes, such as finding ways to reduce spend, spotting redundant vendors, and identifying new market opportunities.
Typically, organizations interested in this broader sense of data discovery invest in powerful business intelligence tools and data science experts. This approach enables you to ask pretty much any question about your data, but the trouble is, that flexibility comes with complexity. Ultimately, that complexity translates into slower outcomes.
If your interest is primarily in achieving an outcome like data privacy compliance, data discovery can be a much narrower and less complex concept. When it comes to data privacy, data discovery is the process of finding data that must be managed to achieve compliance.
For example, with your data map, you’ll be able to discover:
While general business intelligence solutions for data discovery do exist, privacy professionals will want to invest in a privacy-focused solution for data mapping and data discovery that reduces complexity and operational headaches (such as being bottlenecked by in-demand data science resources) rather than adds to them.
Data classification is the process of categorizing data based on various characteristics, such as its sensitivity, importance, and access controls.
These categorizations are well articulated and documented by standards organizations. In general, NIST and ISO standards suggest four classification levels:
In terms of data privacy compliance, we’re most concerned about the latter two categories (although employee data, which would fall under “private or internal data,” is covered under the GDPR and CPRA). Your data map should classify which systems and data flows handle sensitive personal information versus regular consumer information. Outside the context of data privacy compliance, you may need additional classifications in your data map to align with cybersecurity requirements and other regulatory needs.
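As a rough illustration of that sensitive-versus-regular split, the Python sketch below labels each system by the types of data it handles. The `SENSITIVE_TYPES` set, the label strings, and the example systems are assumptions made up for this example; they are not a legal definition of sensitive personal information, and they are not taken from any standard or product.

```python
from typing import Dict, Iterable

# Hypothetical data types treated as sensitive for this example only.
# A real program would take these from the regulations that apply to it.
SENSITIVE_TYPES = {
    "health record",
    "biometric id",
    "government id",
    "precise geolocation",
}


def classify_system(data_types: Iterable[str]) -> str:
    """Label a system by the most sensitive kind of data it handles."""
    handled = {t.strip().lower() for t in data_types}
    if handled & SENSITIVE_TYPES:
        return "sensitive personal information"
    if handled:
        return "regular consumer information"
    return "no personal information recorded"


# Made-up systems and the data types each one handles.
systems: Dict[str, list] = {
    "crm": ["name", "email"],
    "benefits-portal": ["name", "Health Record"],
    "status-page": [],
}

labels = {name: classify_system(types) for name, types in systems.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Matching on declared data types is the simplest possible rule. It only works as well as the inventory behind it, which is one reason automated discovery matters.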
Privacy-focused data discovery and classification solutions need to provide an integrated approach to gaining visibility and control by combining data mapping capabilities with a suite of privacy compliance tools. Automation is also key: By automating data discovery, you can save significant time and effort.
For example, the Osano platform works like this:
Here we see two sources of personal data stores: an Okta SSO instance and an assessment that will be sent to owners of data stores that sit outside of Okta.
This data map shows flows of data between systems, as well as automatically and manually identified metadata.
Then, the discovered data flows into Osano's Subject Rights Management to fulfill data subject access requests. Processing activities can also feed into Osano's Assessments to generate records of processing activities (RoPAs).
Additional capabilities within Osano, such as cookie consent, privacy impact assessments (PIAs), vendor assessments, and more, are all unified around your organization’s central data map. So, the data you collect, the information you discover, and the processes you set around it become part of a unified, integrated privacy program that helps you reduce work, improve compliance, and get a handle on your own stratospheric stack of data.
If you’d like to learn how Osano Data Mapping’s data discovery and classification tools can help you build out a more comprehensive privacy program, schedule a demo today.