They say that every company is a tech company; if that’s true, then every company is also a data company. No matter which industry your organization is in, the odds are good that you participate in data collection: product data, business data, and consumer data.
That last category deserves special attention. Data privacy regulations impose strict responsibilities on businesses that process consumers’ personal information.
A key component of complying with these regulations is data discovery. When data subjects (the people you collect data from) exercise their rights under data privacy regulations, businesses need to be able to find, update, delete, communicate, and manage the relevant data. Without a data discovery capability, there is no sustainable way to accomplish this.
So: What is data discovery, really? How is data discovered? And what is the process?
Data discovery is the process of exploring, finding, and classifying data sources so that the data they hold can be put to use. Businesses use data discovery in a variety of ways: to uncover patterns, solve business problems, inform strategy, and gather other insights from advanced analytics.
In the context of privacy compliance, data discovery is the process of finding data that must be managed in some way to achieve compliance and data security. That could mean identifying systems that collect personal information so you can act on opt-out requests, finding where you send data to downstream vendors, or locating a data subject’s personal information so you can respond to a subject rights request.
Under laws such as the General Data Protection Regulation (GDPR) and the California Privacy Rights Act (CPRA), data subjects have the right to ask what personal information you hold about them, to request that it be updated or deleted, and to make other requests.
These laws require you to find that data across the organization and respond to the request in a timely manner: depending on the law, businesses typically have 30 to 45 days to respond to a data subject access request (DSAR). What’s more, attorneys general, privacy advocacy groups, and data protection authorities use DSAR obligations as a way to test a business’s compliance. If a business can’t fulfill a DSAR accurately and on time, these parties may take action themselves or report the violation to the relevant authorities.
Thirty days might sound like plenty of time, but that impression underestimates how far data sprawls across a modern business, especially if the rest of your data privacy program isn’t fully mature. On average, businesses use over 130 different SaaS applications. Each of those applications is likely to contain consumer information, whether it was collected directly from the consumer, copied from another system, or received from an upstream third party that shares its consumers’ information with you (information you must also find if that third party receives a DSAR).
Without a proper process in place, the only way to diligently fulfill a DSAR is to search each of those systems manually, and once your organization starts receiving multiple DSARs a month, that process quickly becomes untenable. You could limit your search to the main systems you use, but then you would be noncompliant through negligence and at risk of enforcement.
Things can be even more difficult when dealing with sensitive information, which includes any data that reveals the data subject’s:
To manage the flows of sensitive personal information throughout your organization and ensure it receives the protection it deserves, you first need to know where it is. Data discovery and data mapping can help you identify where personal information is stored, where it flows, and whether it’s adequately protected.
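One common way to find where personal information lives is pattern-based scanning: sample values from each data store and test them against patterns for known identifiers. The sketch below is a minimal illustration, assuming records arrive as dictionaries of strings. The `PATTERNS` table and `classify_columns` function are hypothetical, and the two regular expressions are deliberately simple (they will miss many formats and can produce false positives); this is not a production classifier or Osano’s method.

```python
import re

# Hypothetical detection rules: each maps a category label to a regex.
# Real classifiers combine many more signals; these patterns are illustrative only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}"),
}


def classify_columns(rows: list[dict[str, str]]) -> dict[str, set[str]]:
    """Return, for each column, the set of categories detected in any sampled value."""
    findings: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(column, set()).add(label)
    return findings


# Hypothetical sample rows pulled from one data store.
sample = [
    {"contact": "jane@example.com", "notes": "call (555) 123-4567", "plan": "pro"},
    {"contact": "sam@example.org", "notes": "no phone", "plan": "free"},
]
print(classify_columns(sample))
```

Columns with no matches (here, `plan`) are simply absent from the result, which makes it easy to see which fields need protection and which can be deprioritized.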
Data governance and data discovery work together to ensure data management is effective and secure. By locating and classifying data based on sensitivity, relevance, compliance requirements, and data quality, data discovery provides the granular details governance needs to operate effectively.
Discovery allows governance frameworks to properly enforce policies like access controls, retention schedules, and regulatory compliance. Governance sets the rules for discovery, like deciding which types of data are most important to focus on (e.g., PII for GDPR compliance) or which sensitive data needs to be tracked for updates or changes.
Without discovery, governance lacks the insight to enforce rules effectively; without governance, discovery lacks the structure to turn insights into actionable policies.
Data discovery also plays a critical role in data security by helping organizations understand the data they hold. Without a structured data discovery process, organizations risk leaving sensitive data unprotected, which increases their vulnerability, their risk of noncompliance, and ultimately their risk of data breaches.
Once organizations know where sensitive data resides, they can apply appropriate security measures, support compliance with regulations like the GDPR, safeguard their assets, and maintain trust.
There are several approaches to data discovery, and some suit a particular use case better than others. In organizations with a business intelligence function, data discovery is likely to be handled by a data scientist. However, if your goal is data privacy compliance, relying on a data science expert may not be the best approach.
For one, this expert is unlikely to be familiar with the intricacies and requirements of data privacy. They’re also likely to have multiple competing priorities.
If compliance is your goal, then it’s better to take the following approach to data discovery.
This process involves a few simple steps.
In most cases, data is scattered across multiple systems and departments, from human resources and customer support to marketing and finance. Discovering personal information in a single system isn’t too challenging, but any given data subject’s information is likely to be transferred to, copied to, and stored in multiple systems.
Thus, the first step in the process is to map your organization’s data systems. This isn’t about discovering data per se; rather, it is a preparatory step that will make it easier to discover specific data later on.
The data mapping process has benefits beyond enabling data discovery for, say, fulfilling DSARs. It’s also a key component of meeting other compliance requirements, such as data minimization, privacy risk assessments, generating Records of Processing Activities (RoPAs), and more.
However, given the sheer number of systems in a given organization’s ecosystem, manually mapping your data systems can be prohibitively time-consuming. Fortunately, there are data privacy compliance platforms like Osano that automate the process. In the case of Osano, the platform discovers systems connected to your organization’s single sign-on (SSO) provider, generating a map that you can use to direct your data discovery efforts.
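Whether it’s built by hand or generated from an SSO provider, a data map can be modeled as a directed graph: systems are nodes and data flows are edges. The sketch below is a minimal, hypothetical model; the `System` and `DataMap` classes, the `owner` attribute, and the system names are assumptions for illustration, not Osano’s data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    name: str
    owner: str  # hypothetical attribute: the team responsible for the system


@dataclass
class DataMap:
    systems: dict[str, System] = field(default_factory=dict)
    # Directed flows: source system name -> names of systems it sends data to.
    flows: dict[str, set[str]] = field(default_factory=dict)

    def add_system(self, system: System) -> None:
        self.systems[system.name] = system
        self.flows.setdefault(system.name, set())

    def add_flow(self, source: str, destination: str) -> None:
        # Refuse flows that reference systems missing from the map.
        for name in (source, destination):
            if name not in self.systems:
                raise KeyError(f"unknown system: {name}")
        self.flows[source].add(destination)

    def downstream_of(self, name: str) -> set[str]:
        """Systems this system sends data to."""
        return set(self.flows.get(name, set()))

    def upstream_of(self, name: str) -> set[str]:
        """Systems that send data to this system."""
        return {src for src, dests in self.flows.items() if name in dests}


# Hypothetical systems, e.g. as discovered through an SSO provider.
data_map = DataMap()
for name, owner in [("CRM", "Sales"), ("Helpdesk", "Support"), ("EmailTool", "Marketing")]:
    data_map.add_system(System(name, owner))
data_map.add_flow("Helpdesk", "CRM")
data_map.add_flow("CRM", "EmailTool")

print(data_map.downstream_of("CRM"))  # {'EmailTool'}
print(data_map.upstream_of("CRM"))    # {'Helpdesk'}
```

Storing flows as edges, rather than as free-text notes in a spreadsheet, is what later lets you answer “where else did this data go?” mechanically.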
Over the course of your organization’s growth, it’s likely that you’ll accumulate systems that could be used to store personal information but do not contain any such data, systems that are no longer used, and so on. You’ll want to flag these as such so you don’t waste any effort later on exploring and re-exploring these irrelevant or deprecated data stores.
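Flagging can be as simple as recording a status for each system and filtering on it before every search. The sketch below assumes a hypothetical three-value `Status` enum and an inventory keyed by system name; the names are illustrative only.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    NO_PERSONAL_DATA = "no_personal_data"  # could hold personal data, but doesn't
    DEPRECATED = "deprecated"              # no longer in use


def systems_to_search(inventory: dict[str, Status]) -> list[str]:
    """Return only the systems that still need to be explored, in sorted order."""
    return sorted(name for name, status in inventory.items() if status is Status.ACTIVE)


# Hypothetical inventory after a review pass.
inventory = {
    "CRM": Status.ACTIVE,
    "Helpdesk": Status.ACTIVE,
    "InternalWiki": Status.NO_PERSONAL_DATA,
    "LegacySurveyTool": Status.DEPRECATED,
}
print(systems_to_search(inventory))  # ['CRM', 'Helpdesk']
```

Keeping flagged systems in the inventory, rather than deleting them, preserves a record that they were reviewed, which helps if a system’s status changes later.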
Once you’ve mapped your data stores and identified stores that don’t need to be explored, you’ll want to start tagging your data stores with metadata that will facilitate the data discovery process.
This could include things like:
Again, it is possible to do all of this manually in a spreadsheet, but most organizations will benefit from using an automated solution. Osano Data Mapping is one such data discovery solution that automates and streamlines the mapping and tagging workflow.
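Once stores carry metadata, questions like “which systems hold health data?” become simple queries. The sketch below assumes two hypothetical tags per store, an owning team and the categories of personal information it holds; the `DataStore` class, category names, and store names are illustrative, and your own tags may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    name: str
    owner: str  # hypothetical tag: team responsible for the store
    # Hypothetical tag: categories of personal information the store holds.
    categories: set[str] = field(default_factory=set)


def stores_with_category(stores: list[DataStore], category: str) -> list[str]:
    """Names of stores tagged as holding the given category, in inventory order."""
    return [s.name for s in stores if category in s.categories]


# Hypothetical tagged inventory.
stores = [
    DataStore("CRM", "Sales", {"contact_info", "purchase_history"}),
    DataStore("Helpdesk", "Support", {"contact_info"}),
    DataStore("HRIS", "People Ops", {"contact_info", "health"}),
]
print(stores_with_category(stores, "health"))  # ['HRIS']
```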
Having mapped, filtered, and tagged your organization’s data stores and the data fields they contain, it will be relatively straightforward to search for the data you need to work with.
Often, you’ll perform data discovery to fulfill a DSAR, for example by searching your data stores for all fields associated with a given contact. Because your data map shows which data stores send what information to which other stores, you’ll know which upstream and downstream stores to investigate for relevant information.
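Following flows in both directions is a graph traversal. The sketch below assumes the data map is stored as a dictionary of directed flows and that each system’s records are dictionaries with an `email` field (both hypothetical simplifications). It uses breadth-first search to collect every system connected to the one where the subject was first found, then filters each system’s records by the subject’s email address. The function names and sample data are illustrative; this is not how Osano implements DSAR fulfillment.

```python
from collections import deque

Record = dict[str, str]


def systems_to_check(flows: dict[str, set[str]], start: set[str]) -> set[str]:
    """Breadth-first search over data flows, in both directions, from the starting systems."""
    # Reverse index so upstream neighbors (systems that send data here) are easy to find.
    upstream: dict[str, set[str]] = {}
    for src, dests in flows.items():
        for dst in dests:
            upstream.setdefault(dst, set()).add(src)

    seen = set(start)
    queue = deque(start)
    while queue:
        current = queue.popleft()
        # Downstream (where this system sends data) plus upstream (where it gets data from).
        neighbors = flows.get(current, set()) | upstream.get(current, set())
        for nxt in neighbors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def find_subject_records(
    records_by_system: dict[str, list[Record]], systems: set[str], email: str
) -> dict[str, list[Record]]:
    """For each system to check, return records whose 'email' field matches (case-insensitive)."""
    target = email.lower()
    return {
        name: [r for r in records_by_system.get(name, []) if r.get("email", "").lower() == target]
        for name in sorted(systems)
    }


# Hypothetical data map and sample records.
flows = {
    "WebForm": {"CRM"},
    "Helpdesk": {"CRM"},
    "CRM": {"EmailTool", "Warehouse"},
}
records_by_system = {
    "CRM": [{"email": "john.smith@example.com", "plan": "pro"}],
    "EmailTool": [{"email": "John.Smith@example.com", "subscribed": "yes"}],
    "Helpdesk": [{"email": "john.smith@example.com", "ticket": "123"}],
    "Warehouse": [{"email": "someone.else@example.com"}],
}

to_check = systems_to_check(flows, {"CRM"})
hits = find_subject_records(records_by_system, to_check, "john.smith@example.com")
for name, matches in hits.items():
    print(name, len(matches))
```

Walking flows in both directions visits every system in the connected part of the map, so some checked systems turn out to hold nothing (here, `Warehouse` and `WebForm`). For DSARs, that trade-off favors completeness over speed. Matching on email alone is also a simplification; real requests usually involve identity verification and several identifiers.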
Osano performs this process for you and has the added benefit of automating common DSAR types. If “John Smith” requests the deletion of their data, for instance, Osano will search through your data map, discover all of John Smith’s data, and then delete it for you (upon human verification).
Finally, record your findings and the challenges you encountered. Data discovery and mapping isn’t a one-time process. You should do it on a continuous basis, refining your process or analyzing your data from different angles.
Data mapping lays the foundation for your data discovery process. Many regulations now require businesses to keep records of all their processing activities. Data mapping itself isn’t specifically mandated, but it makes the compliance process much easier. It helps you identify key elements of your data processing flow, such as legal basis, transfer methods, access, and more.
Automated data discovery tools, which we’ll talk about more in the next section, can also help ensure that essential information is identified by avoiding the pitfalls that come with manual discovery methods.
Used together, data discovery and mapping help a company create unified data inventories. These make compliance much easier by:
Data discovery and data classification tools are an essential part of the data visualization process.
Manual discovery is tedious and, at scale, nearly impossible, especially when your data comes from many sources. Even without dozens of systems, smaller companies that feel they can cope without any discovery tools risk overlooking certain data sets.
The tools you use should be focused on compliance first and foremost. Compliance-focused tools help with DSARs and other regulation-specific requirements, taking some of the burden off your shoulders.
Ready to check out an intuitive data discovery and classification solution that saves you hundreds of hours and helps you on your journey to compliance? Osano Data Mapping might be the best place to start. Schedule a demo with us today to learn how we can help you.