Ted is writing things

On privacy, research, and privacy research.

On “simplifying” GDPR's definition of personal data

Last November, the European Commission published a proposal named "Digital Omnibus", to change and clarify multiple regulations. You can find the full text of the proposal here, and a summary in their press release.

One of the proposed changes concerns the definition of personal data under GDPR. It's a critical definition, because if something is not personal data, then the GDPR doesn't apply. So classifying something as not personal data — for example, by applying robust anonymization methods — essentially gives organizations a "get-out-of-data-protection-compliance-obligations" card.

In this blog post, I want to make a simple point: this change would not lead to the desired goal of making it simpler for organizations to comply with data protection law. It would make it more difficult for organizations to achieve a good compliance posture, and more likely that they need expert help as part of their compliance programs. This would be great news for my business! But I don't think it would lead to great outcomes for society.

Currently, data is considered personal data if it relates to an identified or identifiable person. The definition doesn't depend on who holds a piece of data. This suggests that if someone, somewhere, can take that data and figure out the specific person it relates to, then this is personal data. And if you're the one holding such data, then you have to treat it carefully, and comply with a bunch of GDPR obligations.

The Digital Omnibus proposes to change this, to make that definition depend on who is holding the data. Say I give you some data, and I have a way of mapping each record in that data to a specific person. But I don't tell you how I can do it. You have no clear way of identifying the person it relates to. You could make the case that this data is anonymous for you, and avoid having to comply with all the annoying GDPR rules.

This changes an absolute definition into a relative one. If you want to categorize data as anonymized, you have to consider all the other data and capabilities that your organization has. Different organizations might classify the same data differently. Worse, changes in context (like your company merging with another company, or launching a new product generating new data) might change whether you can consider a dataset as anonymized. Already, this doesn't seem to achieve the "simplification" goal: the criterion gets more complex, and more likely to change over time.
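
To make the difference between the two definitions concrete, here is a toy sketch in Python. It is not a legal test, and it is not taken from the proposal: the records, the `key_table` lookup, and the function names are all made up for illustration, and a real re-identification analysis has to consider much more than a single lookup table.

```python
# Toy model only: all names and data below are hypothetical.

# Pseudonymized records, with no direct identifiers.
records = [
    {"pseudonym": "a91f", "diagnosis": "asthma"},
    {"pseudonym": "c27e", "diagnosis": "diabetes"},
]

# Lookup table mapping pseudonyms to people, kept by the sender only.
key_table = {"a91f": "Alice Martin", "c27e": "Bob Keller"}

sender_knowledge = key_table
recipient_knowledge = {}  # the recipient was not given the mapping


def is_personal_relative(records, holder_knowledge):
    """Relative view: can this specific holder link any record to a person?"""
    return any(r["pseudonym"] in holder_knowledge for r in records)


def is_personal_absolute(records, everyone_knowledge):
    """Absolute view: can anyone, anywhere, link any record to a person?"""
    return any(is_personal_relative(records, k) for k in everyone_knowledge)


# Same dataset, different answers depending on who holds it.
print(is_personal_relative(records, recipient_knowledge))  # False
print(is_personal_relative(records, sender_knowledge))     # True

# The absolute view does not depend on the holder.
print(is_personal_absolute(records, [sender_knowledge, recipient_knowledge]))  # True

# A change in context, such as a merger that gives the recipient access
# to the sender's table, flips the relative answer for the same data.
after_merger = {**recipient_knowledge, **sender_knowledge}
print(is_personal_relative(records, after_merger))  # True
```

In this sketch, the relative check needs an inventory of everything the holder knows, and its answer has to be recomputed whenever that inventory changes. The absolute check gives the same answer regardless of who is asking.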

But with this change, companies will be able to classify more of their data as anonymized, even if other organizations could re-identify it. This can be very valuable[1]: it would allow companies to keep data for longer, use it for secondary purposes, possibly re-share it, and so on. So we should expect companies to try to use this new determination for more of the data they hold.

Because the criterion is more complicated, this would require more work, and this work would require more specific expertise. That's why this would be great news for my independent consultancy: doing re-identification analysis and designing anonymization schemes is exactly the kind of service I provide to my clients. More demand would translate to more business and higher rates.

However, I fear that besides privacy consultancies like mine, only big companies would benefit from this change. Large organizations have lots of data, want to do many different things with it, and have money to invest into making it possible. It makes sense for them to hire people like me to help them pass the new anonymization bar while keeping as much data value as possible.

Smaller organizations, by contrast, typically don't have very mature privacy programs, and they have a lot less data. How will their behavior change if the new definition of personal data is adopted? I can foresee two likely outcomes.

  1. They might not change anything. Keep things simple, use a conservative, "absolute" approach to anonymization. This does not change the status quo, but means that larger companies can increase their competitive advantage over smaller players.
  2. They might try to adopt the new definition, but are likely to make the determination without the necessary expertise, leading to increased privacy and compliance risk. Incorrectly categorizing some data as anonymized is already a common occurrence: people convince themselves that the redaction scheme they just made up is robust, and make privacy claims that don't stand up to scrutiny. Making the definition more complex and less strict can only make this problem worse.

Neither outcome seems like an improvement to me.


For a more in-depth analysis of the proposed changes in the Digital Omnibus, I recommend this report by EDRi, a collective of civil and human rights organizations from across Europe.


  1. From a business perspective, not for end users. (There is a lot to say about the consequences of the proposed changes for end users, but this is not the point I'm making in this post.) 

Feedback on these posts is welcome! Reach out via e-mail (se.niatnofsed@neimad) for comments and suggestions.
Interested in using privacy-enhancing technology to do more with your data, with respect and compliance built-in? I can help! Check out the website of my independent consultancy, Hiding Nemo, to learn more.