You’ve heard of the GDPR and other regulations that protect personal data by threatening large fines, but it’s worth remembering that customers also take the protection of their data seriously and are ready to switch brands if they lose faith. In fact, one study showed 81% of consumers would stop engaging with a brand online following a data breach.
So we know that we need to control and protect personal data, but how do we do that?
The majority of companies start gaining control over the personal data they process by implementing standard processes. In other words, they try to create a process that gives them a full picture of the personal data the company holds: where it is stored, how it is used, and how it is protected.
This process starts with defining what personal data is.
Since one of the main motivators here is compliance, the company puts its legal team to work defining what personal data is. This definition is then handed over to IT. The problem is that regulations are written by legal professionals for legal professionals, so you need to translate the legalese into language your engineers can understand. This isn’t always as easy as it sounds.
Next, personal data is used everywhere in your company, which means you have to get pretty much every department involved: marketing, sales, support, analytics, and the list goes on.
At the end of all this, you have a document (let’s call it a map) that describes what personal data you have, where it is, and why. But this map is out of date as soon as it is created, because as the company operates, it collects new data and uses existing data for different purposes.
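If the map is kept as structured records rather than a free-form document, its staleness at least becomes measurable. The minimal sketch below assumes a hypothetical `DataMapEntry` record with a `last_reviewed` date and an arbitrary 90-day review window; neither the field names nor the threshold come from any standard or regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataMapEntry:
    """One row of a personal data map (hypothetical schema): what, where, and why."""
    data_category: str   # e.g. "email address"
    system: str          # where the data is stored or processed
    purpose: str         # why it is processed
    owner: str           # stakeholder accountable for keeping the entry accurate
    last_reviewed: date  # when someone last confirmed the entry


def stale_entries(entries, today, max_age_days=90):
    """Return entries that have not been reviewed within max_age_days of today."""
    cutoff = today - timedelta(days=max_age_days)
    return [entry for entry in entries if entry.last_reviewed < cutoff]


data_map = [
    DataMapEntry("email address", "CRM", "sales follow-up", "Head of Sales", date(2024, 1, 10)),
    DataMapEntry("shipping address", "orders DB", "order fulfilment", "CTO", date(2024, 5, 2)),
]

for entry in stale_entries(data_map, today=date(2024, 6, 1)):
    print(f"Needs review: {entry.data_category} in {entry.system} (owner: {entry.owner})")
```

A review date only tells you when someone last looked, not whether the entry is still true, which is exactly the gap the following paragraphs describe.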
As a result, you need to keep the map updated and control almost every process in the company, including marketing, development, support, and sales.
Requiring everyone to get approval for any new operation involving personal data, or any change to current practices, works, but it means every employee needs to know what personal data is. That isn’t a bad idea in theory. The problem is that this approach depends on the human factor and is subject to people making mistakes.
Another big downside is that it consumes time and resources, because the map needs to be kept up to date at all times, so each stakeholder has to dedicate time to it on an ongoing basis. You also have to work through stakeholders (usually managers), and they can only provide information if their peers asked them for approval in the first place. This chain of command, plus the bureaucracy around it, is a drag on the process.
Once you know how to do things manually, you will want to start automating the process. After you have created the initial document mapping all your personal data, you will realize that keeping it updated involves every employee and consumes a lot of working hours, so you will want to automate it somehow.
You will start looking into how to automate updates, and because personal data changes in many places, you will need automation in several of them. Automating a single process is not enough: personal data lives in different places and departments and is part of many processes, so automation is required everywhere.
The easiest place to start is to build this into CI/CD and the test chain. Luckily, most development processes nowadays are already well automated, so you don’t need to come up with something new. However, you will run into problems when you look for a solution that automatically notifies employees at the development stage that they are collecting new personal data.
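One low-tech way to hook this into CI is a check that reads new schema migrations and flags identifiers that look like personal data, so the change gets a privacy review before it is merged. The sketch below is a naive keyword heuristic: `PERSONAL_DATA_HINTS` and `flag_personal_data_identifiers` are hypothetical names, and a real hint list would have to come from your legal team’s definition of personal data.

```python
import re

# Hypothetical naming hints; a real list would be derived from your legal team's definition of personal data.
PERSONAL_DATA_HINTS = ("email", "phone", "first_name", "last_name", "birth", "address", "passport")

# SQL identifiers: a letter or underscore followed by letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def flag_personal_data_identifiers(migration_sql: str) -> list[str]:
    """Return identifiers whose names contain a personal data hint, in order of first appearance."""
    flagged: list[str] = []
    for identifier in IDENTIFIER_RE.findall(migration_sql):
        lowered = identifier.lower()
        if any(hint in lowered for hint in PERSONAL_DATA_HINTS) and identifier not in flagged:
            flagged.append(identifier)
    return flagged


migration = """
ALTER TABLE users ADD COLUMN phone_number VARCHAR(32);
ALTER TABLE users ADD COLUMN signup_source VARCHAR(64);
"""
print(flag_personal_data_identifiers(migration))  # ['phone_number']
```

In a pipeline, a small wrapper could run this over the migration files changed in a merge request and fail the job, or request a privacy reviewer, when the list is non-empty. Name matching misses badly named columns and raises false positives, which is part of why this alone doesn’t solve the problem.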
Interestingly, controlling shadow IT has been a problem for much longer than controlling personal data, yet it still hasn’t been solved. This is especially true since the emergence of SaaS products, such as email marketing automation tools, that departments like marketing can start using without involving IT.
In my experience, there have been cases where marketing came to the CTO asking to use the paid version of an email marketing tool. By the time they made the request, though, they were already using a trial version.
At the end of the day, the fact that the CTO learns the tool is in use is already a plus. But when it comes to personal data, the damage is done even with a trial version: customer personal data has already been shared with a third party.
There are several types of asset management and CMDB tools on the market, some of which can discover new apps in your perimeter. However, you will still need a complete view of all the systems you use, and to know which personal data is where and how it is being used.
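At its simplest, this kind of discovery compares what is observed on the network with what is known and approved. The minimal sketch below assumes proxy or DNS logs in a simple `timestamp user domain` format and a hypothetical domain-to-app catalogue; the `example` domains, app names, and `find_unapproved_saas` function are placeholders, not any vendor’s feed or API.

```python
from collections import defaultdict

# Hypothetical catalogue mapping domains to SaaS products (a real one might come from a vendor feed or your CMDB).
SAAS_CATALOGUE = {
    "app.mailtool.example": "Email marketing tool",
    "crm.vendor.example": "CRM",
    "forms.survey.example": "Survey builder",
}

# Apps that have been reviewed and approved for processing personal data.
APPROVED_APPS = {"CRM"}


def find_unapproved_saas(log_lines):
    """Map each unapproved SaaS app seen in 'timestamp user domain' lines to the users who accessed it."""
    seen = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines
        _, user, domain = parts
        app = SAAS_CATALOGUE.get(domain.lower())
        if app is not None and app not in APPROVED_APPS:
            seen[app].add(user)
    return dict(seen)


proxy_log = [
    "2024-06-01T09:12:00 alice crm.vendor.example",
    "2024-06-01T09:15:42 bob app.mailtool.example",
    "2024-06-01T10:01:13 carol app.mailtool.example",
]
print(find_unapproved_saas(proxy_log))
```

This would catch the trial email marketing tool from the story above, but only after data may already have been sent to it, and it says nothing about which personal data went there.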
The next step in managing your personal data is to automate its discovery. The standard approach is to use storage scanners. There is a whole range of products out there that can connect to your data stores and apps to search for and identify personal data.
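One common approach is to sample values from each table and column and run them through detectors. The sketch below assumes an SQLite database and uses only two illustrative detectors: a regular expression for email addresses, and a pattern for payment card numbers whose matches are then checked with the Luhn checksum to cut false positives. Real products use far richer rules, context, and connectors; `classify`, `luhn_valid`, and `scan_sqlite` are hypothetical names.

```python
import re
import sqlite3

# Illustrative detectors only; production scanners use far richer rules and context.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")  # 13-19 digits, optionally separated by spaces or dashes


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits in number."""
    digits = [int(d) for d in number if d.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> set[str]:
    """Return the personal data categories detected in a single value."""
    found = set()
    if EMAIL_RE.search(value):
        found.add("email")
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(value)):
        found.add("payment_card")
    return found


def scan_sqlite(conn: sqlite3.Connection, sample_size: int = 100) -> dict[tuple[str, str], set[str]]:
    """Sample up to sample_size rows per table and report detected categories per (table, column)."""
    findings: dict[tuple[str, str], set[str]] = {}
    tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        # Table names come from sqlite_master; quoting keeps simple names with odd characters working.
        cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (sample_size,))
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            for column, value in zip(columns, row):
                if isinstance(value, str):
                    categories = classify(value)
                    if categories:
                        findings.setdefault((table, column), set()).update(categories)
    return findings


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, contact TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "jane@example.com", "paid with 4111 1111 1111 1111"), (2, "call me maybe", "order #12345")],
)
print(scan_sqlite(conn))
```

Note that the `note` column would be missed by any schema-level review: the card number only shows up in the data itself.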
If you’ve completed all of the previous steps, you will know about all the apps that the company is using, about all the new things in development inside your product, and about all of the data stores that you are using. One thing remains: how can you integrate automated data scanners across all sources?
Luckily for you, the majority of data scanners can work with pretty much any database. SaaS applications are a different story, however, and there are thousands of B2B SaaS apps that process and store personal data.
Now that you understand that the standard process-system-personal data approach doesn’t work, you are probably wondering whether there’s another way. For example, can you discover personal data wherever it is, and then work out why it is there and how it is protected?
The answer is: you can, if you focus solely on the data, not on the systems that store or process it, and not on the processes or people.
There is only one way to do this: catch and identify the data in all communications between all systems in the company environment. Luckily, most communications in the modern world are machine to machine, fairly structured, and can be monitored at the network layer.
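To make the data-centric idea concrete, the minimal sketch below walks captured JSON payloads and records which personal data fields travel between which systems, building the map from observed traffic instead of from interviews. It assumes payloads are already captured and readable in plaintext at some inspection point (how you get there, for example via a proxy, is out of scope). The `Flow` type, `PERSONAL_KEYS` set, and function names are hypothetical.

```python
import json
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
# Hypothetical key names treated as personal data regardless of their value.
PERSONAL_KEYS = {"email", "phone", "full_name", "date_of_birth", "address"}


@dataclass(frozen=True)
class Flow:
    source: str       # calling service
    destination: str  # receiving service or external API
    body: str         # captured payload, assumed to be readable plaintext


def personal_fields(node, path=""):
    """Yield JSON paths whose key name, or string value, looks like personal data."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key.lower() in PERSONAL_KEYS:
                yield child
            else:
                yield from personal_fields(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from personal_fields(item, f"{path}[{index}]")
    elif isinstance(node, str) and EMAIL_RE.match(node):
        yield path


def map_flows(flows):
    """Build {(source, destination): sorted personal data paths} from observed traffic."""
    data_map = {}
    for flow in flows:
        try:
            payload = json.loads(flow.body)
        except json.JSONDecodeError:
            continue  # non-JSON traffic would need its own parser
        paths = set(personal_fields(payload))
        if paths:
            data_map.setdefault((flow.source, flow.destination), set()).update(paths)
    return {edge: sorted(paths) for edge, paths in data_map.items()}


flows = [
    Flow("checkout", "payments-api", '{"order_id": 7, "customer": {"email": "jane@example.com"}}'),
    Flow("checkout", "mail-tool", '{"contacts": [{"id": 1, "note": "ann@example.org"}]}'),
    Flow("inventory", "warehouse", '{"sku": "A-1", "qty": 3}'),
    Flow("legacy", "ftp-gateway", "not json"),
]
print(map_flows(flows))
```

Checking values as well as key names matters here: the second flow leaks an email address through a generic `note` field that no field-name rule would catch.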
Unfortunately, this is where problems with encrypted traffic, paper documents, and outdated systems crop up. However, this doesn’t mean that a data-centric approach doesn’t work. It is actually something you should be working towards. I mean, if it has been figured out for PCI DSS then why not for personal data?
Sometimes a data-centric approach will require some changes in your infrastructure and sometimes it should be supplemented by specific processes and procedures. Yes, this is extra work, but at the end of the day, we are talking about one of the most valuable and important assets of every company: customer data.
Personal data, by its very nature, is an asset that is difficult to control. It’s involved in almost every process in any company and controlling its usage is problematic. However, the current situation with regulations and costs of potential data exposure leaves no choice but to establish transparent, up-to-date, and precise monitoring of personal data usage.
Right now, no single tool or product can solve this, so it takes a combination of processes, tools that identify data in the various data stores, and tools that catch data in every communication.