October 05, 2021
7 min read

How to obfuscate personal data in practice

Learn how to preserve the confidentiality, integrity, and availability of the personal data your organisation has while staying compliant.

Setting the context

In order to reduce the possibility of a data breach, organisations are applying controls to ensure personal data is adequately protected. The main motivator for this shift is still the ‘dissuasive’ effect of the fines that can be applied by supervisory authorities under the GDPR (Article 83.1).

Fines for non-compliance have been steadily escalating towards the upper levels allowed by the Regulation, as in the case of Amazon, which faced a €746 million penalty (July 2021), or WhatsApp in Ireland, fined €225 million (September 2021).

Minimize or limit

The two main avenues to decrease the likelihood of a data breach are not new: either limit the data you ingest at the entry point, also called data minimisation, or reduce the personal data you already hold, termed storage limitation. The latter becomes a requirement once the data no longer has a valid legal basis for processing.

These two principles are ‘destructive’: they are applied by not collecting more data than is strictly necessary, and by deleting the data already stored.

The question is: how can you preserve the confidentiality, integrity, and availability of the personal data your organisation already has, as per GDPR Article 32?

There are many ways to accomplish this objective, with the ultimate decisions made at Board level in view of the Business Impact Analysis (BIA). In this article we will focus on one particular way to address the problem at the root: modification of the data so that it no longer points at individuals.

The benefits of this approach are twofold:

  • You still keep the data that can be used for metrics, analytics, and other purposes
  • You reduce the data that must be processed when a customer exercises one of their rights under the GDPR, such as the right of access

What techniques are available?

Below are some of the most well-known and widely used methods for modifying personal data so that individuals can no longer be identified. They can be applied to personal data in general, or only to the fields that contain special category data, to avoid the more stringent requirements for handling such data:

Obfuscation

This is the overarching concept of changing the real data so that it no longer identifies an individual. In the event this data is compromised, the impact will be reduced considerably, if not avoided altogether. Obfuscation is not generally permanent: the data could be reconstituted as personal data again, which makes it markedly distinct from anonymisation.

Masking

This technique replaces some of the personal data with other characters. It is mostly used when showing information on a display, for instance when a user enters their password or credit-card details and the screen shows asterisks in place of the characters being typed.
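As a minimal sketch (assuming Python; the function name and sample card number are purely illustrative), masking for display could keep only the last four digits:

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with asterisks for display."""
    digits = card_number.replace(" ", "")
    return "*" * (len(digits) - visible) + digits[-visible:]

print(mask_card_number("4111 1111 1111 1234"))  # ************1234
```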

Tokenization

Tokenization replaces personal data with meaningless values (tokens): data with no intrinsic value. Normally these tokens are in the same format as the original data.
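A rough sketch of the idea, assuming Python and its standard-library secrets module (the vault structure and function names are invented for illustration, and collision handling is omitted):

```python
import secrets

# Hypothetical in-memory token vault mapping tokens back to the original
# values, so authorised systems can de-tokenize when needed.
vault: dict[str, str] = {}

def tokenize(value: str) -> str:
    # Keep the original format by swapping each digit for a random digit.
    token = "".join(secrets.choice("0123456789") if ch.isdigit() else ch
                    for ch in value)
    vault[token] = value
    return token

def detokenize(token: str) -> str:
    return vault[token]

token = tokenize("4111 1111 1111 1234")
print(token)              # same shape as the card number, but meaningless
print(detokenize(token))  # original value, recoverable only via the vault
```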

Aggregation

Aggregation combines data from multiple sources with the purpose of diluting what can be inferred from it. If individuals cannot be identified after the data has been aggregated, aggregation has been applied at a satisfactory level of abstraction. A good example is changing a customer's stored age so that it reflects a range (e.g. 18-25), or reducing the accuracy of a postcode so that it associates with a wider area.
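A minimal sketch of both examples, assuming Python (the band width and the UK-style postcode handling are illustrative assumptions):

```python
def age_band(age: int, width: int = 8, minimum: int = 18) -> str:
    # Generalise an exact age into a range such as '18-25'.
    lower = minimum + ((age - minimum) // width) * width
    return f"{lower}-{lower + width - 1}"

def widen_postcode(postcode: str) -> str:
    # Keep only the outward part of a UK-style postcode, so it maps to a wider area.
    return postcode.split()[0]

print(age_band(23))               # '18-25'
print(widen_postcode("SW1A 1AA")) # 'SW1A'
```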

Redaction

Mostly used when fulfilling data subject requests, redaction operates on the basis that any superfluous personal data can, and should, be made unreadable. For instance, when responding to a Data Subject Access Request (DSAR), only the personal data of the claimant should be provided, and the personal data of other individuals should be removed or masked.
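As an illustrative sketch, assuming Python (the regular expression is deliberately simplistic and the sample text is invented), email addresses belonging to anyone other than the requester could be blanked out before the response is sent:

```python
import re

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.\w+\b")

def redact_other_emails(text: str, requester_email: str) -> str:
    # Keep the requester's own address; make everyone else's unreadable.
    def replace(match: re.Match) -> str:
        found = match.group(0)
        return found if found == requester_email else "[REDACTED]"
    return EMAIL_RE.sub(replace, text)

note = "Ticket raised by jane@example.com, escalated to john@example.com."
print(redact_other_emails(note, "jane@example.com"))
# Ticket raised by jane@example.com, escalated to [REDACTED].
```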

So far so good, is there a catch?

The issue is that although the concepts are relatively simple to understand, they can be difficult to implement. In fact, many organisations fail to apply these techniques in a sound way, which gives them a false sense of security. This disconnect is a risk in itself: an enterprise may be tempted to lower the controls applied to the data, considering it low risk.

The truth is that the technical implementation can be flawed in some respect, or other factors affecting how the data is processed and accessed may be overlooked.

Flaws in technical implementation

I have seen numerous examples of in-house obfuscation algorithms that are not fit for purpose. They tend to be overly simplistic, yet the organisation remains highly hopeful that its solution is adequate, even when told that it is not.

There are many reasons for this: for SMEs, and indeed larger companies, budget is the main obstacle, preventing them from using an external product, perhaps one that has been tested and subjected to scrutiny.

Beyond financial constraints, there are other factors, and some of them can be surprising.

Need to know security

I once provided consultancy services to a firm that handled large volumes of data and had applied an ‘algorithm’ for tokenization (although I wouldn’t call it an algorithm per se). It was a rather rudimentary process that replaced the fields associated with special category data with other ‘random’ data.

Truly random data is notoriously difficult to create by digital means (unless we are dealing with quantum computers), hence the term pseudo-random number generator (PRNG) when applied to numbers. Their algorithm was so poorly constructed that the sought-after randomness was in fact mostly deterministic, offering a choice of only three possible values for each token. Nor was it an algorithm based on mathematical insight by any stretch of the imagination; it was more like a ‘rule of thumb’ with three predefined outcomes.
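To make the gap concrete, here is a minimal sketch in Python (the three predefined outcomes and the token length are invented for illustration): a deterministic three-outcome ‘tokenizer’ of the kind described above, next to one backed by a cryptographically secure random source:

```python
import secrets
import string

# Illustrative only (not the client's actual code): a deterministic
# 'tokenizer' with three predefined outcomes, versus a token generated
# from a cryptographically secure source of randomness.

WEAK_OUTCOMES = ["AAA111", "BBB222", "CCC333"]  # hypothetical fixed outcomes

def weak_token(value: str) -> str:
    # The 'rule of thumb': the output depends only on the input's length
    # modulo 3, so an attacker needs at most three guesses.
    return WEAK_OUTCOMES[len(value) % 3]

ALPHABET = string.ascii_uppercase + string.digits

def strong_token(length: int = 12) -> str:
    # CSPRNG-backed: each character is drawn from a 36-symbol alphabet,
    # giving 36**12 possible tokens instead of 3.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(weak_token("hypertension"))  # always one of only three values
print(strong_token())              # e.g. 'Q7K2ZP9WACX4'
```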

Suffice to say, the solution wasn’t fit for purpose.

When I pointed this out, I was confronted with a defensive stance about their ‘novel algorithm’ (it was their CEO, after all, who had designed it). The argument of “well, I’m using a tokenization process” becomes a moot point when the purpose and the outcome are not aligned. It is easy to lose sight of the goal, which is compliance, when egos come into play.

As I pointed out, the inner workings of the algorithm were known to all employees, because the document describing its operation sat in a shared folder along with other policies and procedures! Not that it would have taken an attacker long to figure out how it worked, but it was now readily exposed to internal attack by disgruntled employees (or anyone else with access to the document).

I am no advocate for achieving security by obscurity — the belief that making something hidden is enough to ensure its security — but I had to educate my client on the principle of least knowledge and need to know.

It is all about the integration, not just individual components

This highlights another fundamental aspect of the use of technologies: no matter how well conceived each might be, it is the sum of all the parts, and how well they work together, that produces security.

For example, encryption schemes have been deployed in the past where a single symmetric key (the same key is used for both encryption and decryption) was shared among all entities. On the surface, encryption was in use, but it was so poorly configured that any user could access other users’ information, and the scheme was prone to the ‘Break Once, Break Everywhere’ (BOBE) effect: cracking or obtaining a single key then unlocks multiple systems.
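As a minimal sketch of the alternative, assuming Python and the third-party cryptography package (the user names and the in-memory key store are illustrative assumptions), giving each user their own key limits the blast radius of any single compromised key:

```python
from cryptography.fernet import Fernet, InvalidToken

# Each user gets their own symmetric key rather than one key shared by
# every entity, so a single leaked key no longer exposes everyone's data.
user_keys = {
    "alice": Fernet.generate_key(),
    "bob": Fernet.generate_key(),
}

def encrypt_for(user: str, plaintext: bytes) -> bytes:
    return Fernet(user_keys[user]).encrypt(plaintext)

def decrypt_for(user: str, ciphertext: bytes) -> bytes:
    return Fernet(user_keys[user]).decrypt(ciphertext)

alice_record = encrypt_for("alice", b"alice's special category data")

# Bob's key cannot open Alice's record: the impact of a compromised key
# is limited to that one user.
try:
    decrypt_for("bob", alice_record)
except InvalidToken:
    print("Bob's key does not decrypt Alice's data")
```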

Final thoughts

To protect your information assets, intent must be combined with the knowledge to do things effectively. A best effort is only good enough if it meets the compliance objectives you have set out; anything less can be a missed opportunity, costing you money and reputational damage.

Always strive to create the best environment possible with administrative, technical and physical controls, and foster a security culture that permeates the entire organisation.

Author
Anselmo
Experienced Principal Consultant and Associate Lecturer with an extensive academic background in Law, Information Security, and Engineering, including globally recognised certifications such as Fellow of Information Privacy (FIP), CIPP/E, CIPM, CIPT, CDPSE, and CISSP.
