The ultimate guide to personal data

Written by Osano Staff | December 3, 2021

What is personal data?

The term "personal data" has many definitions, depending on where you are in the world and what context you're using it in. However, as an overarching definition, it's generally understood that "personal data" refers to any information — digital or analog — that refers to a specific person and can be used — alone or in combination with other information — to identify that particular person.

Common examples of personal data include:

Name.
Address.
Email address.
IP address.
Phone number.
Unique identifiers like Social Security numbers or other national identification numbers.
Photographs that have people.
Videos that include people.
Credit card numbers or banking information.
Information about a person's health.
Biometric information, like fingerprints or iris scans.

However, context matters quite a lot. If you have a name like "John Smith," with no other information attached to it whatsoever, many jurisdictions would not consider that personal data for any legal purpose. There are many John Smiths, and there's no way to know which one you're talking about if all you have is that one name. But if you have John Smith, 123 Maple Street, Andover, Maine, 207-245-9876, all in one file, just about every jurisdiction would consider that personal data and may attach some legal obligations to your possession of that information.

The more information you have about a person collected in one place, such as a database, filing cabinet, folder or file, the more "personal" that data becomes and the more likely it becomes that you have a legal obligation related to that information. If compliance with legal obligations is important to you, you should make sure to understand the legal definition of personal data in every jurisdiction in which you do business.

Those definitions can vary by state and by industry in the United States, and they can vary by country or even by group of countries around the world.

Further, as we'll discuss below, certain information is considered "sensitive." It has additional legal obligations attached to it, depending on how you collected it and other information associated with it.

But first, there are some common laws you may encounter while doing business, depending on the location in which you're operating. We'll discuss the specific definitions they use for "personal data," which might also be called "personal information," "personally identifiable information," "PII," "personal health information" or "PHI."

What is personal data under the GDPR?

Unless you've been doing business in a cave, you have likely heard of the European Union's General Data Protection Regulation (GDPR). It is one of the strongest privacy laws globally and strictly regulates the use of personal data.

The GDPR defines personal data as: "any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person."

That's complicated, though! It might be helpful to simplify it this way:

Could a very smart person use the data to determine which person it belongs to or who created it?

Could a very smart person use the data to narrow it down to, for example, "the person who lives at a specific address" or "the person who is identified by this number"?

If you can answer "yes" to either of those questions, you've got personal data on your hands.

Why do we say "a very smart person"? Because very smart people are very good at identifying people. For example, some studies show that just four pieces of location data make it very easy to identify the person in those four places. If you just had a database of individual locations that people have visited, with each person a different entry in the database, a very smart person could tell you who created the data in each entry. If you had a database with zero names or email addresses and just a list of characteristics (hair color, eye color, skin color and illness they had at one time), a very smart person could figure out the specific person described. That's personal data.

While this can seem confusing and maybe even silly, the European Union considers privacy a human right. It considers the misuse of personal data a serious infraction, so it's essential to take these definitions of personal data seriously and handle data very cautiously and according to the law.

What is personal data under the CCPA?

In the United States, the "strictest" law that regulates personal data is generally considered the California Consumer Privacy Act, or CCPA.

The CCPA defines personal data this way: "information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household."

This definition is similar to the GDPR's, but the CCPA introduces the "reasonableness" standard, common in U.S. law. In this case, what a "very smart person" can do isn't as relevant, and it's more like, "what could your average person with a bit of skill do with the data?"

The CCPA also goes into some depth with examples of personal data:

Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, Social Security number, driver's license number, passport number or other similar identifiers.
Characteristics of protected classifications under California or federal law.
Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.
Biometric information.
Internet or other electronic network activity information, including, but not limited to, browsing history, search history and information regarding a consumer's interaction with an internet website, application, or advertisement.
Geolocation data.
Audio, electronic, visual, thermal, olfactory, or similar information.
Professional or employment-related information.
Education information, defined as information that is not publicly available personally identifiable information as defined in the Family Educational Rights and Privacy Act (20 U.S.C. Sec. 1232g; 34 C.F.R. Part 99).
Inferences drawn from any of the information identified in this subdivision to create a profile about a consumer reflecting the consumer's preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities and aptitudes.

Note the last category in particular: even placing a person in a category like "enjoys chocolate" creates a piece of personal information. This matters, for example, because the CCPA allows people to request a list of all personal information you have about them, and failing to produce something like "enjoys chocolate" could theoretically place you in violation of the CCPA. But that's probably the subject of another article entirely.

Also, the CCPA is clear that "de-identified" or "aggregate" information is not personal information. Just make sure you know what you're doing. De-identification and aggregation involve a lot more than just deleting the "names" column in your spreadsheet.
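To see why deleting the "names" column is not enough, here is a minimal Python sketch. It counts how many rows share the same combination of remaining fields (a check often called k-anonymity). The records, the field names and the choice of which fields count as quasi-identifiers are all hypothetical assumptions for illustration. Passing a check like this does not mean any law would treat the data as de-identified.

```python
from collections import Counter

# Hypothetical records with the "name" column already deleted.
records = [
    {"zip": "04216", "birth_year": 1980, "hair": "brown", "illness": "asthma"},
    {"zip": "04216", "birth_year": 1980, "hair": "brown", "illness": "flu"},
    {"zip": "04216", "birth_year": 1975, "hair": "red", "illness": "diabetes"},
    {"zip": "04101", "birth_year": 1990, "hair": "black", "illness": "flu"},
]

# Assumed quasi-identifiers: harmless alone, potentially identifying together.
QUASI_IDENTIFIERS = ("zip", "birth_year", "hair")


def group_sizes(rows, keys):
    """Count how many rows share each combination of key values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group_size(rows, keys):
    """Return k: the size of the smallest group of look-alike rows."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys):
    """Return the rows whose combination of key values appears only once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


k = smallest_group_size(records, QUASI_IDENTIFIERS)
singled_out = unique_rows(records, QUASI_IDENTIFIERS)
print(f"k = {k}; {len(singled_out)} of {len(records)} rows are unique")
```

In this made-up data set, two of the four rows are the only ones with their particular combination of ZIP code, birth year and hair color, so anyone who knows those three facts about a person can find that person's illness even though no name appears.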

What is personal data under the CPRA? Is it different from the CCPA?

The California Privacy Rights Act (CPRA) did provide several amendments and fixes to the CCPA, which go into effect on Jan. 1, 2023 (although there is a "lookback period" you should be aware of). However, the definition of personal information remained largely the same.

The biggest difference is the extension of the "reasonableness" standard. The CPRA makes it very clear that you can't be responsible for what every data scientist in the world is currently working on in terms of new impressive ways to re-identify data or figure out whom data is associated with via mathematical wizardry.

There's also an extension of the carve-out for "publicly available" data, which is not considered "personal" under the CPRA. If it's available via a government website or a person has posted it publicly, and collecting it doesn't violate the terms of service (for example, don't scrape social media sites), then you're free to use it, and you don't have to consider it personal information for your compliance program.

What is personal data under Virginia's VCDPA?

The Virginia Consumer Data Protection Act (VCDPA) was passed in early 2021 and comes into force on Jan. 1, 2023, alongside the CPRA. In many ways, it leverages the definition of personal information that the GDPR and the CCPA have created:

"Personal data," the law reads, "means any information that is linked or reasonably linkable to an identified or identifiable natural person. 'Personal data' does not include de-identified data or publicly available information."

Very few other details are provided to make it clear what is and what is not personal data. However, you can assume that it includes the same basic information that is considered personal by other laws across the globe, including the GDPR and CCPA. The language is so similar that it's easy to see they were not looking to reinvent the wheel when they drafted it.

Even the supplied definition of "Identified or identifiable natural person" is pretty brief: "a person who can be readily identified, directly or indirectly."

What is personal data under the Colorado Privacy Act (CPA)?

The Colorado Privacy Act (CPA) was passed in July of 2021 and takes effect July 1, 2023.

Like the Virginia Consumer Data Protection Act, it leverages the definitions in other laws. It defines personal data this way: "Information that is linked or reasonably linkable to an identified or identifiable individual." Also, it says that de-identified and publicly available data are expressly not personal data.

Further, it uses the reasonableness standard for "publicly available." If you think it's public, it basically is, unless it violates the terms of service to collect it.

However, Colorado does provide a more robust definition of "identifiable individual" than Virginia: "an individual who can be readily identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, specific geolocation data, or an online identifier."

This makes it clear that just scrubbing names is not enough. For example, you might collect all of the information posted by an anonymous poster on a message board, but if that poster uses a "handle," that's an online identifier, and therefore you likely need to treat the information you've collected as "personal."

What is sensitive data?

Sensitive data is personal data that various jurisdictions have determined should be treated with a different standard of care. Depending on the law, you might need a different type of consent to collect it, you might have to delete it sooner, or you might need to apply more security measures in protecting it.

Further, you might suffer worse penalties for allowing unauthorized access to it.

In general, sensitive data is data that could be used to harm or discriminate against a person, such as:

Race.
Sexual orientation or activity.
Religious beliefs.
Health information.
Exact location.

Also, data collected from children, variously defined as younger than 13 through 16, depending on location, is generally regarded as sensitive.

However, the laws differ, because what's considered sensitive can change considerably based on the culture of the jurisdiction. It's important to check the definition in each jurisdiction before collecting personal information.

What is considered sensitive personal data under the GDPR?

The GDPR explicitly lists the following as "special categories" of personal data that require a different level of consideration before collection or processing and so can be considered "sensitive":

Racial or ethnic origin.
Political opinions.
Religious or philosophical beliefs.
Trade union membership.
Genetic data.
Biometric data.
Health data.
Data concerning a person's sex life or sexual orientation.

However, there is also text in the GDPR that is vaguer, talking about "personal data which are, by their nature, particularly sensitive in relation to fundamental rights and freedoms" and what those personal data "should include." This implies that a person may be able to make an argument that a type of data not explicitly listed is, in fact, "sensitive" and should be handled that way.

As with many things related to the GDPR, when in doubt, get some advice from someone who has studied the law closely.

What is considered sensitive personal data under the CCPA?

The California Consumer Privacy Act as originally drafted discusses "sensitive information" in a number of areas but does not define it explicitly. The California Privacy Rights Act rectified this oversight.

However, under the declarations portion of the CCPA, there is a statement that there should be notification especially when a consumer's "most sensitive" information has been breached. This led some lawyers to advise notification when data commonly understood to be "sensitive" has been breached.

What is considered sensitive personal data under the CPRA?

The California Privacy Rights Act, which comes into force Jan. 1, 2023, amends the CCPA to be very explicit about what data is "sensitive":

Personal information that reveals:

A consumer's Social Security, driver's license, state identification card or passport number.
A consumer's account log-in, financial account, debit card, or credit card number in combination with any required security or access code, password or credentials allowing access to an account.
A consumer's precise geolocation.
A consumer's racial or ethnic origin, religious or philosophical beliefs or union membership.
The contents of a consumer's mail, email, and text messages unless the business is the intended recipient of the communication.
A consumer's genetic data.
The processing of biometric information for the purpose of uniquely identifying a consumer.
Personal information collected and analyzed concerning a consumer's health.
Personal information collected and analyzed concerning a consumer's sex life or sexual orientation.

However, as in other areas of the CCPA, sensitive personal information that is "publicly available" is not considered sensitive personal information or personal information.

This is among the most robust definitions of "sensitive personal data" anywhere in the world.

Note that the personal data of children is not explicitly labeled as "sensitive." Still, the law does contain this language: "penalties should be higher when the violation affects children," and children are defined as under the age of 16. There are specific requirements for children between the ages of 13 and 16 and children below 13 that you should make sure to understand in other parts of the law.

What is considered sensitive personal data under the VCDPA?

The Virginia Consumer Data Protection Act, which comes into force Jan. 1, 2023, defines the following as "sensitive data":

Racial or ethnic origin.
Religious beliefs.
Mental or physical health diagnosis.
Sexual orientation.
Citizenship or immigration status.
The processing of genetic or biometric data for the purpose of uniquely identifying a natural person.
The personal data collected from a known child.
Precise geolocation data.

A "child" is defined as under the age of 13.

What is considered sensitive personal data under the CPA?

The Colorado Privacy Act, which comes into force July 1, 2023, defines the following as "sensitive data":

Racial or ethnic origin.
Religious beliefs.
Mental or physical health condition or diagnosis.
Sexual orientation or sex life.
Citizenship or citizenship status.
The processing of genetic or biometric data for the purpose of uniquely identifying a natural person.
The personal data collected from a known child.

A "child" is defined as under the age of 13.
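If you track which categories each law treats as sensitive, a simple lookup table can help. The Python sketch below restates the lists from the sections above using coarse, made-up labels (for example, "religion" stands in for both "religious beliefs" and "religious or philosophical beliefs"). The labels and the function name are assumptions for illustration, not statutory wording, and the table is only as current as this article.

```python
# Coarse labels summarizing the lists above; not the statutory wording.
SENSITIVE_CATEGORIES = {
    "GDPR": {
        "race_ethnicity", "political_opinions", "religion", "union_membership",
        "genetic", "biometric", "health", "sexual_orientation",
    },
    "CPRA": {
        "government_id", "account_credentials", "precise_geolocation",
        "race_ethnicity", "religion", "union_membership",
        "private_communications", "genetic", "biometric", "health",
        "sexual_orientation",
    },
    "VCDPA": {
        "race_ethnicity", "religion", "health", "sexual_orientation",
        "citizenship", "genetic", "biometric", "child_data",
        "precise_geolocation",
    },
    "CPA": {
        "race_ethnicity", "religion", "health", "sexual_orientation",
        "citizenship", "genetic", "biometric", "child_data",
    },
}


def laws_treating_as_sensitive(category):
    """Return the laws (alphabetically) whose list includes this category."""
    return sorted(
        law for law, categories in SENSITIVE_CATEGORIES.items()
        if category in categories
    )


for label in ("health", "precise_geolocation", "political_opinions"):
    print(label, "->", laws_treating_as_sensitive(label))
```

A table like this is a starting point for a conversation with counsel, not a substitute for reading each definition. Several of the laws attach conditions (such as "for the purpose of uniquely identifying" a person) that a one-word label cannot capture.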

How much personal data should an organization collect and store?

While the answer to this question should be determined by an organization's business plan, area of business, tolerance for risk and many other factors, the basic principles of privacy and data protection largely agree:

You should collect and store only the personal data that is necessary for conducting the organization's business and for which you have consent.

In the privacy world, people use a handy phrase to help decide whether they should collect and store personal data: Surprise Minimization. Simply put, would a person be surprised you are collecting and storing their information? If so, don't do it. If not, you're probably in the clear.

However, try not to be too cynical about that. Just because a person might think, "Oh, nothing is private anymore," doesn't mean you have the right to collect their data.

Try to use a common-sense approach and remember that personal data can mean significant risk for your organization. If you have data you shouldn't have, it could lead to fines or other penalties for your business. And in some places, it can mean personal liability for the people responsible.

What is NOT considered personal data?

In general, personal data is data that either refers to a specific person or was created by a specific person in a way that could lead back to them. If a reasonable person could not figure out who the data is about, you're probably in the clear.

By itself, for example, a physical address is not personal. It's just a place, and places are listed on maps and exist in reality. But if that address is in a spreadsheet next to someone's name, then it likely becomes personal. It might also be personal if it's linked to a phone number or some other piece of information where you could put the two things together to figure out which person they're connected to.

Further, so-called "phone book data" (names, addresses, and landline phone numbers) is probably not personal, regardless of whether you could identify a person with it. If the information has been published publicly by the government, by the person who owns the data or by another commonly viewed source, the data is generally not considered personal in a lot of jurisdictions. But you might want to check the law to make sure.

Also, de-identification turns personal data into data that is no longer personal. This is usually done with special software that makes the data look like gobbledygook, but which your software can still understand and use to answer questions about the data. But you need to make sure that the data cannot be re-identified.

If someone can quickly just push a button and see the actual data, it's not really de-identified. It's just obscured.
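The difference between "obscured" and something harder to reverse can be sketched in a few lines of Python using only the standard library. The email address, the key handling and the function name `pseudonymize` are hypothetical. Note also that a keyed hash is a form of pseudonymization: whoever holds the key can still link records, so this is an illustration of the idea, not a claim that the result meets any law's definition of de-identified data.

```python
import base64
import hashlib
import hmac
import secrets

email = "john.smith@example.com"  # hypothetical value

# Obscured: Base64 looks like gobbledygook, but one call reverses it.
obscured = base64.b64encode(email.encode("utf-8")).decode("ascii")
recovered = base64.b64decode(obscured).decode("utf-8")

# Harder to reverse: a keyed hash (HMAC-SHA256). The secret key is assumed
# to be stored separately from the data, with tightly limited access.
secret_key = secrets.token_bytes(32)


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable token; the same value and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


token = pseudonymize(email, secret_key)

print("obscured: ", obscured)
print("recovered:", recovered)  # the original email, no key needed
print("token:    ", token)      # cannot be decoded back into the email
```

Because the token is stable, your software can still count or join records that belong to the same person without seeing the email address itself. That same property is why the key has to be protected: anyone holding it can test guesses against the tokens.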

Why do companies collect personal data?

Companies collect personal data all the time in the course of doing business. When someone buys something, schedules an appointment, asks for information, or does any of a million other things with a business, they are providing that company with personal information.

Of course, that doesn't mean you have to hold onto that personal information. You could just delete it immediately.

However, most consumers like it when companies retain some of their personal information. They like to create accounts they can easily access later. Or they like it when they don't have to re-enter their address whenever they want something shipped to them.

And if the company is a bank, a health care provider, or some other business where the very service involves the collection of personal data, there's no real way around it.

What does processing personal data mean?

A somewhat technical legal term, "processing" personal data generally just means collecting it or storing it. While it might imply that something is being done to the personal data, most laws that use the term "process" just mean that the data has come into your company's possession.

However, in law like the EU's General Data Protection Regulation, there is a distinction between a "controller" of personal data and a "processor" of personal data.

While both a controller and a processor of personal data process personal data, the controller is the "owner" of that data and bears more responsibility. A processor of personal data processes that data on behalf of the controller. For example, if your business uses a payroll company like ADP, you are the controller of your employees' data and ADP is the processor, even though both of you are processing your employees' data.

Essentially, if you can see the data, you're processing it.

What are lawful reasons to process personal data?

Every privacy law defines the reasons for the legal processing of personal data differently. It's important to understand the law in each jurisdiction in which you do business and collect personal data.

However, there are some reasons that are basically universal:

Consent: If you ask a person if it's okay to process their data and they say "yes," and they are not a child (defined anywhere from under 13 to under 16 around the world), you're legally allowed to process the data, unless you hold some kind of power over that person and they couldn't be expected to say "no" without consequences.

For example, in the European Union, employees can't give consent to employers. You need a different legal reason.

To fulfill a contract: If you have agreed to do something for someone, in exchange for value of some kind, and processing personal data is necessary to fulfill that contract, you're legally allowed to process that personal data. This is generally the legal reason you can process employee data in the European Union.

You've agreed to pay someone for their work, for example. It's hard to do that without processing their personal data.

To save someone from harm: If you believe you are saving someone from serious injury, illness, or other harm, it's generally okay to process their personal data.

The data is publicly available: While in some jurisdictions the data being public means it's not personal information at all, even in those places where it's still personal information, you can generally process the data. You just might still be responsible if that data is lost or stolen.

To comply with the law: If failing to process the personal information would require you to break the law in some way (for example, by making it impossible for you to respond to a law-enforcement request), it's legal to process the data in most places. One exception might be China, where the law explicitly says you need Chinese government permission to provide the data of a Chinese resident to a non-Chinese law enforcement agency.

"Legitimate interest": Many laws allow you to process data essentially if it makes sense for what you do as a business. If someone buys something from you, many laws say it's legal to send them an email offering a discount on another item.

Further, if lots of people have bought things from you, the laws usually allow you to analyze the data they've provided you in order to learn more about why they bought what they did and how you might be able to market to other similar people.

It's best to be careful when using this legal basis for processing, however. Your definition of "legitimate" might not be the same as your customer's or a judge's.

There are, however, many jurisdictions where you can process personal information as a matter of course, as long as the data isn't particularly sensitive. It depends on the legal environment. In the U.S., for example, you can legally send people emails as long as you stop when they ask.

If your operation is sophisticated enough to know for sure where a person is when they give you data — or when you come across it — you can figure out when you need a legal reason and when you don't.

What are lawful reasons to process sensitive personal data?

The lawful reasons for processing sensitive personal data are largely the same as for processing "regular" personal data, except in the matter of the consent you get.

For sensitive personal information, the consent usually needs to be more explicit than something like a pre-checked box or an opt-out. You need to explain expressly what you're going to do with the data and get an unambiguous response to a question of whether it's okay to collect the data or not.

This often involves a signature, digital or otherwise, or an active checking of a box or clicking of a button that says, "I agree."
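As a rough illustration of that standard, the Python sketch below checks a stored consent record against the three points above: the purpose was explained, the person took an active step, and the box was not pre-checked. The `ConsentRecord` fields and the action names are hypothetical, and real requirements vary by jurisdiction.

```python
from dataclasses import dataclass

# Hypothetical action names a consent form might record.
AFFIRMATIVE_ACTIONS = {"clicked_agree", "checked_box", "signed"}


@dataclass(frozen=True)
class ConsentRecord:
    purpose_shown: bool    # the person was told expressly what the data is for
    user_action: str       # what the person did, e.g. "clicked_agree" or "none"
    box_pre_checked: bool  # the box was already ticked when the form loaded


def is_explicit_consent(record: ConsentRecord) -> bool:
    """Return True only if the record shows an informed, active 'yes'."""
    return (
        record.purpose_shown
        and record.user_action in AFFIRMATIVE_ACTIONS
        and not record.box_pre_checked
    )


active_yes = ConsentRecord(purpose_shown=True, user_action="clicked_agree", box_pre_checked=False)
pre_checked = ConsentRecord(purpose_shown=True, user_action="checked_box", box_pre_checked=True)
silent_opt_out = ConsentRecord(purpose_shown=True, user_action="none", box_pre_checked=False)

for name, record in [("active yes", active_yes), ("pre-checked", pre_checked), ("opt-out", silent_opt_out)]:
    print(name, "->", is_explicit_consent(record))
```

Storing the evidence alongside the answer (what was shown, what the person did) makes it much easier to demonstrate later that the consent was explicit.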

Also, even in relatively lax jurisdictions that do not regulate personal information tightly, sensitive information is often regulated, so best not to assume that sensitive data is fair game, no matter where you are in the world.

How to protect personal data?

Most laws around the world that regulate personal data say that you have an obligation not to allow unauthorized people to access that data. This means you have to secure the data in some way against people who might want to look at it or steal it.

How much security is the right amount of security? That changes just about every day. However, some basic things your organization should be doing include:

Keeping personal data encrypted by default.
Requiring a username and password to access data and decrypt it.
Locking the doors of cabinets that contain personal information.
Deleting data that is no longer actively being used, or shredding physical paper.
Employing a security professional or security service that makes sure your barriers to accessing the data are up-to-date and sufficiently robust.
Not sharing spreadsheets and databases of personal information with people outside your organization unless there is a specific business purpose.
Using a contract to dictate the way personal data can be used if you're sharing personal data with an outside vendor for a business purpose.
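Of the items above, deleting data that is no longer in use is one of the easiest to automate. Here is a minimal Python sketch that splits records into "keep" and "delete" piles based on when each was last used. The one-year retention window, the record layout and the function name are assumptions for illustration; the right retention period depends on your business and the laws that apply to you.

```python
from datetime import date, timedelta

# Assumed policy for illustration: purge anything unused for more than a year.
RETENTION = timedelta(days=365)


def split_by_retention(records, today, retention=RETENTION):
    """Return (keep, delete) lists based on each record's 'last_used' date."""
    keep, delete = [], []
    for record in records:
        if today - record["last_used"] > retention:
            delete.append(record)
        else:
            keep.append(record)
    return keep, delete


# Hypothetical customer records.
customers = [
    {"id": 1, "last_used": date(2021, 11, 1)},
    {"id": 2, "last_used": date(2019, 6, 15)},
]

keep, delete = split_by_retention(customers, today=date(2021, 12, 3))
print("keep:", [c["id"] for c in keep])
print("delete:", [c["id"] for c in delete])
```

Passing `today` in as a parameter, rather than reading the clock inside the function, keeps the rule easy to test and to audit.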

The concept of "reasonable" security is the subject of much debate around the world and is a constantly moving target. However, the basic concept is relatively simple: Are you taking active and up-to-date steps to ensure only authorized people access the data? If you can answer "yes" to that question, you're probably in the clear.
