CISO story: how to protect personal data
Figuring out how to build your systems to keep data secure is a tricky task, and there is a lot to consider. Over more than a decade as a CISO, working for banks and e-commerce companies and as a consultant, I have encountered and helped implement many different data security setups. I have put together this 10-step guide to help you get your data practices in shape and to point out some of the finer points.
1. Types of personal data
For international companies, the main difficulty with personal data lies in the technical and organizational measures that need to be implemented. These measures are hard to get right whether they come from legal requirements in the various jurisdictions or from internal rules the company sets for itself.
We can split personal data into two camps:
- Sensitive personal data, such as a scan of your passport or your medical records
- Technical personal data, such as IP addresses, geolocation, email addresses, and the like
When assessing your data protection, also take into account the volume of personal data you process, store, and transfer, as well as the channels used to collect and transfer it.
Don't forget about your employees' personal data either. Generally speaking, people will tell you that the requirements for protecting employee personal data aren't as strict as those for customer data, and that protecting it is simpler to implement.
2. Enriching data
Basic data such as name, telephone number, and address is collected first, e.g. via job-search platforms like linkedin.com and indeed.com. If the application is successful, this data is then enriched with even more sensitive data, such as passport details and employment history.
This data might be enriched further with salary data or even medical data (e.g. whether a person has passed certain medical checks, as required by companies hiring drivers). As a result, we end up with a fairly complete set of personal data, whose collection and storage are justified by company rules that comply with the laws of the jurisdictions in which the company operates.
Once enriched data includes special category personal data (such as health data), extra safeguards need to be put in place.
3. Data storage
When developing a way to store data, we are guided by two things:
- Required actions
Required actions are the rules for data storage mandated by law, covering how data is processed and held.
- Recommended actions
Recommended actions are industry standards, or practices the company considers best practice and sufficient for its data handling.
More often than not, it is impossible to satisfy every regulation to the letter, so organizations have to come up with compensating measures and accept some additional risk.
The first step is to define the data flow from the employee to HR and from HR to the data storage location.
Let’s say that the scope of the flow includes systems X and Y, in addition to several departments which process data types A, B, and C, and the resulting personal data is stored on paper in a safe, with an electronic copy stored in system Z.
So how do we know what measures to put in place to keep the data secure?
We can be guided by a set of standard controls for data protection:
- Access management
- Change management
- Endpoint management
- Monitoring and alert management
These controls are well formulated and described in documents and standards such as:
- PCI DSS
Taking one of these standards as a starting point, you can go through its controls and select the ones most applicable to your situation.
4. Read-access management
- Access to personal data should be limited to personnel who actually need it.
- Access to the systems and workstations from which personal data can be reached should likewise be limited to personnel who actually need it.
- Access to personal data should be granted on a graduated basis, with rights set according to the employee's function and responsibilities.
- You need set procedures for monitoring what employees do with personal data.
- Every access to personal data, and every action taken with it, should be logged and the logs retained.
- The relevant people should be notified of any attempt at unlawful access to personal data.
- You should have mechanisms that prevent unlawful access to personal data (e.g. a limit on the number of login attempts from a single HR employee's computer).
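Several of these requirements (need-to-know access, logging every access, alerting on unlawful attempts, limiting login attempts) fit in a few lines of code. This is a minimal sketch, not a production design: the `AccessGuard` class, the limit of 5 failed logins, and the in-memory log and alert lists are all hypothetical choices.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

MAX_FAILED_LOGINS = 5  # hypothetical limit per workstation


@dataclass
class AccessGuard:
    # Need-to-know permissions: employee -> data categories they may read.
    permissions: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)
    alerts: list[str] = field(default_factory=list)
    failed_logins: dict[str, int] = field(default_factory=dict)

    def is_locked(self, workstation: str) -> bool:
        return self.failed_logins.get(workstation, 0) >= MAX_FAILED_LOGINS

    def record_failed_login(self, workstation: str) -> None:
        self.failed_logins[workstation] = self.failed_logins.get(workstation, 0) + 1
        # Alert once, at the moment the workstation becomes locked.
        if self.failed_logins[workstation] == MAX_FAILED_LOGINS:
            self.alerts.append(f"workstation {workstation} locked after repeated failed logins")

    def read(self, employee: str, category: str) -> bool:
        allowed = category in self.permissions.get(employee, set())
        # Every attempt is logged, whether or not it is allowed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "employee": employee,
            "category": category,
            "allowed": allowed,
        })
        if not allowed:
            self.alerts.append(f"{employee} tried to read {category} without permission")
        return allowed
```

An unlock procedure and durable, tamper-resistant storage for the audit log are deliberately left out.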
Taking the above into account, let's consider two examples:
Example 1. Employee data is entered into HR system X. An HR employee then processes this personal data and forwards it to storage system Z. While in use, the data resides in both X and Z, and only HR (A) has access to these systems. This means we need to consider the security measures protecting personal data in X, Z, and A: two systems, several people (the HR department) and their workstations, plus the network environment through which the data flows and from which it can be accessed.
Example 2. Employee data is entered into HR system X. An accounting employee (B) who has access to system X copies part of the data into the 1C accounting system (Y1), manually via Excel (Y2), enriches it, and sends the personal data back to HR system X. The number of measures needed, and the risks involved, immediately grow and change compared with example 1: we have added a completely different department (B), two systems (Y1 and Y2), and three new data flows: X to Y1, Y1 to Y2, and Y2 to X.
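One way to make the comparison between the two examples concrete is to describe each process as a list of data flows and derive the protection scope from that list. The sketch below uses the article's labels (X, Z, Y1, Y2 for systems; A, B for departments); the `Flow` type and `scope` function are hypothetical, and workstations and network segments are left out for brevity.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    source: str
    target: str
    handled_by: str  # department that moves or processes the data


def scope(flows: list[Flow]) -> dict[str, set]:
    """Everything that needs security measures: systems, departments, flows."""
    systems = {f.source for f in flows} | {f.target for f in flows}
    departments = {f.handled_by for f in flows}
    return {"systems": systems, "departments": departments, "flows": set(flows)}


# Example 1: HR (A) moves data from system X to storage system Z.
example_1 = [Flow("X", "Z", "A")]

# Example 2: accounting (B) additionally copies data X -> Y1 -> Y2 -> X.
example_2 = example_1 + [
    Flow("X", "Y1", "B"),
    Flow("Y1", "Y2", "B"),
    Flow("Y2", "X", "B"),
]

s1, s2 = scope(example_1), scope(example_2)
print(sorted(s2["systems"] - s1["systems"]))          # ['Y1', 'Y2']
print(sorted(s2["departments"] - s1["departments"]))  # ['B']
print(len(s2["flows"] - s1["flows"]))                 # 3
```

Keeping the flows as data has a side benefit: the recorded list of people, systems, and flows is exactly what a formalized process description needs.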
5. Conclusion: formalize and define
The first step is to formalize the process. Yes, processes change all the time. Nevertheless, it's no coincidence that auditors so often ask whether processes are formalized: when a process is formalized and described, and the people, systems, and data flows involved are recorded, building a framework for handling data safely becomes much easier.
The next step is to precisely define the set of data that needs to be processed, stored, and transferred, along with the measures required to provide at least minimum protection: let's call this a “simplified risk analysis”. Without knowing what you are protecting personal data from, and how, you can't choose sensible security measures, especially when protecting the data could cost more than the data itself is worth.
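The article doesn't prescribe a formula for the simplified risk analysis. A common quantitative shortcut, swapped in here purely as an illustration, is annualized loss expectancy (ALE = single loss expectancy × annual rate of occurrence): a control pays off when the yearly loss it prevents exceeds its yearly cost. All the figures below are invented for the example.

```python
def annualized_loss(single_loss: float, annual_rate: float) -> float:
    """ALE = SLE x ARO: expected loss per year from one kind of incident."""
    return single_loss * annual_rate


def control_is_worth_it(single_loss: float, rate_before: float,
                        rate_after: float, annual_control_cost: float) -> bool:
    """A control pays off if the yearly loss it prevents exceeds its yearly cost."""
    saved = annualized_loss(single_loss, rate_before) - annualized_loss(single_loss, rate_after)
    return saved > annual_control_cost


# Hypothetical numbers: a leaked HR file costs 50,000 and is expected once every
# 4 years (0.25/yr); the control cuts that to once every 20 years (0.05/yr),
# which saves 10,000 a year.
print(control_is_worth_it(50_000, 0.25, 0.05, 8_000))   # True
print(control_is_worth_it(50_000, 0.25, 0.05, 15_000))  # False
```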
The third step is to define a sufficient set of measures that covers all the relevant security controls, regulatory requirements, and standards, and that can reasonably be implemented to protect the data. In essence, you are looking to cover all the risks that come with handling personal data.
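To check that the chosen measures really cover every identified risk, a simple mapping from each measure to the risks it mitigates is enough to flag the gaps. The risk and measure names here are hypothetical placeholders.

```python
def uncovered_risks(risks: set[str], measures: dict[str, set[str]]) -> set[str]:
    """Return the risks that no selected measure mitigates."""
    covered = set().union(*measures.values())
    return risks - covered


risks = {"unauthorized read", "tampering", "loss", "leak via logs"}
measures = {
    "role-based access": {"unauthorized read"},
    "audit logging": {"unauthorized read", "tampering"},
    "encrypted backups": {"loss"},
}
print(uncovered_risks(risks, measures))  # {'leak via logs'}
```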
6. Customer data
This is where things go international. Rather than rewriting your data protection standards for each country, choose the one that covers them all. In general, when handling customer data:
- You may collect personal data only if you have a genuine need and a justification that complies with the law
- You must protect personal data while you are processing, transferring, and storing it
- You must delete or mask data after a set period
- You cannot publish personal data, wilfully allow data breaches, or disclose personal data to third parties without the permission of the data owner (in general, you shouldn’t do anything with the data that the data owner didn’t give you permission for)
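The last two rules (delete or mask data after a set period; do nothing the data owner didn't agree to) can be enforced as a check that runs before any processing. A minimal sketch; the `Consent` record, the purpose names, and the one-year retention period are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Consent:
    purposes: frozenset[str]   # what the data owner agreed to
    collected_on: date
    retention: timedelta       # how long the data may be kept


def may_process(consent: Consent, purpose: str, today: date) -> bool:
    """Allow processing only for an agreed purpose and within the retention period."""
    within_retention = today <= consent.collected_on + consent.retention
    return purpose in consent.purposes and within_retention


c = Consent(frozenset({"order_delivery"}), date(2023, 1, 1), timedelta(days=365))
print(may_process(c, "order_delivery", date(2023, 6, 1)))  # True
print(may_process(c, "marketing", date(2023, 6, 1)))       # False: no consent
print(may_process(c, "order_delivery", date(2024, 6, 1)))  # False: retention expired
```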
The most important of the above is the obligation to protect personal data. This deserves a section of its own.
7. Protecting personal data
The main way we protect personal data in my company is by following the strictest rules for personal data out there: the GDPR.
Let’s have a look at the key parts of the regulation.
Data entry point
In our case, personal data enters our remit through our website or mobile app: the homepage or any other part of the site, the registration page, the order page, or the form for customs documents. Because these are the personal data entry points, we can say with full confidence what personal data we collect or receive from customers and in what form (name, telephone number, address, etc.).
The data isn't tucked away in some internal technical system; it lives in the customer's personal account (profile) area. Both employees and customers have access to this area, and we can consider it public in the sense that it can be reached from anywhere online.
The main risks here are therefore:
- Unauthorized access to the account by third parties
- Unauthorized access to the data by company employees (personnel that shouldn’t have access)
- Unauthorized changes, deletion, or copying of the personal data
To protect against access by external actors, you can offer different login methods: via social networks, Google, or email with a password. For every method other than email with a password, more of the responsibility shifts to the customer, since they are already signed in to those accounts. For email with a password, it is best to add extra security measures such as stricter password requirements or login confirmation via email.
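For the email-and-password path, the two extra measures mentioned above might look roughly like this. The policy (at least 12 characters with lowercase, uppercase, and a digit) and the six-digit confirmation code are hypothetical choices; actually sending the email, and expiring the code after a few minutes, are left out.

```python
import hmac
import re
import secrets


def password_is_strong(password: str) -> bool:
    """Hypothetical policy: 12+ characters with lowercase, uppercase, and a digit."""
    return (
        len(password) >= 12
        and re.search(r"[a-z]", password) is not None
        and re.search(r"[A-Z]", password) is not None
        and re.search(r"\d", password) is not None
    )


def new_login_code() -> str:
    """Six-digit one-time code to email to the account owner."""
    return f"{secrets.randbelow(10**6):06d}"


def code_matches(expected: str, submitted: str) -> bool:
    # Constant-time comparison, so timing doesn't reveal how many digits matched.
    return hmac.compare_digest(expected, submitted)
```

`secrets` is used instead of `random` because confirmation codes must be unpredictable.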
Next come anti-fraud measures: a set of monitoring rules for customer logins. Built on rules and metrics, they track things like login location (geolocation), the number of login attempts, and login successes and failures. Analyzing these signals together tells us who has been logging in: the account owner or someone else. If we suspect the latter, we can hide almost all of the customer's personal data, especially card numbers, addresses, and order history, until we are sure the person accessing the account is the owner.
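The anti-fraud logic can be pictured as a handful of rules, each adding to a risk score; above a threshold, the account's personal data is hidden until the owner confirms their identity. The signals, weights, threshold, and field names below are hypothetical, not Joom's actual rules.

```python
from dataclasses import dataclass


@dataclass
class LoginEvent:
    country: str
    usual_countries: set[str]
    failed_attempts_last_hour: int
    new_device: bool


RISK_THRESHOLD = 50  # hypothetical cut-off
HIDDEN_FIELDS = {"card_number", "address", "order_history", "phone"}


def risk_score(event: LoginEvent) -> int:
    score = 0
    if event.country not in event.usual_countries:
        score += 40  # login from an unusual location
    if event.failed_attempts_last_hour >= 3:
        score += 30  # possible password guessing
    if event.new_device:
        score += 20
    return score


def visible_profile(profile: dict, event: LoginEvent) -> dict:
    """Hide personal fields while the login looks suspicious."""
    if risk_score(event) < RISK_THRESHOLD:
        return profile
    return {k: ("<hidden>" if k in HIDDEN_FIELDS else v) for k, v in profile.items()}
```

Real anti-fraud systems usually tune such weights from historical data; fixed numbers are used here only to keep the idea visible.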
8. Data localization
In some countries where Joom operates, the law requires customers' personal data to be stored in the country where they live.
To satisfy this requirement, we store data locally and use a mechanism to determine where the data is coming from (geolocation, IP address, or other technical data). Once we know where the data has come from, it is sent either to the main data center or to the one local to the customer.
Take a look at the schema below:
Later, the data flow is split: one copy goes to the main data center, and the local copy is replicated in the localized data center for that region. As a result, we have the main database plus many smaller localized ones. For security reasons, the data in the localized databases is encrypted and access to it is restricted; even with a legitimate reason to access it, you have to jump through several security hoops.
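The routing and split described above can be sketched with plain dictionaries standing in for the data centers. The set of countries requiring local storage is a hypothetical example list, the country is assumed to have already been determined (from geolocation, IP, or other technical data), and encryption of the local copy is omitted because the Python standard library has no symmetric cipher.

```python
from collections import defaultdict

# Hypothetical example list of countries whose law requires local storage.
LOCALIZED_COUNTRIES = {"RU", "KZ"}


class Storage:
    def __init__(self) -> None:
        self.main: dict[str, dict] = {}
        self.local: defaultdict[str, dict[str, dict]] = defaultdict(dict)

    def save(self, customer_id: str, country: str, record: dict) -> None:
        if country in LOCALIZED_COUNTRIES:
            # The copy required by law lives in the customer's country...
            self.local[country][customer_id] = dict(record)
        # ...and a copy also flows to the main data center.
        self.main[customer_id] = dict(record)
```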
So what are the requirements when transferring data from the backend to customers? For example, communication between the client and our server infrastructure over public networks uses encrypted HTTPS, so the data cannot be seen while it is in transit. Customer authorization works in the standard way, through the client backend API.
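On the client side of such a connection, "encrypted HTTPS only" boils down to a TLS context that verifies the server's certificate and hostname and refuses outdated protocol versions. A sketch with Python's standard `ssl` module; the TLS 1.2 floor is an assumption, not a requirement stated in the text.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    # create_default_context() already enables certificate and hostname checks.
    context = ssl.create_default_context()
    # Refuse SSLv3, TLS 1.0, and TLS 1.1.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The context can then be passed to standard clients, e.g. `urllib.request.urlopen(url, context=strict_client_context())`.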
9. Data processing and transferral
Data is processed both manually and automatically. Processing is a danger point: there is a fairly high risk that the data could be tampered with, lost, or accessed by third parties. Problems start piling up as your company's infrastructure grows and more systems and services take part in processing and transferring data.
Problem 1. Data transfer format
When it comes to personal data, especially the most sensitive types such as passport data, you need to make sure that traffic and requests containing the actual data are never sent in clear text, even within your own infrastructure. This brings its own technical complications, such as the need for two-way encryption of traffic, and implementing that comes with its own hurdles. On top of that, somewhere along the transit path there will be a service that terminates TLS and therefore sees the data unencrypted.
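Two-way encryption inside the infrastructure is usually implemented as mutual TLS: the server also demands and verifies a certificate from every calling service. A sketch of the server-side context with Python's `ssl` module; the certificate file paths are hypothetical and optional only so that the sketch runs without real files. In production you would always pass your internal CA and the service's own certificate and key.

```python
import ssl
from typing import Optional


def mutual_tls_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client that doesn't present a certificate signed by a trusted CA.
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Internal CA that signs the certificates of calling services.
        context.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # This service's own certificate and private key.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```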
Problem 2. Request log
Everyone wants to know what broke and where: which request failed and where the error happened. That matters, but logging and system messages create a problem of their own: sooner or later you discover that some logs contain fragments of requests with personal data. Even in the best-run systems, six months of log archives will end up containing personal data.
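One mitigation is to scrub personal data before log records are written. A minimal sketch using the standard `logging.Filter` hook; the regular expressions for emails, card-like numbers, and international phone numbers are simplified assumptions and will miss many real-world formats.

```python
import logging
import re

# Simplified patterns; order matters (cards are replaced before phones).
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),
    (re.compile(r"\b\d(?:[ -]?\d){12,18}\b"), "<card>"),
    (re.compile(r"\+\d[\d ()-]{7,}\d"), "<phone>"),
]


def redact(text: str) -> str:
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message, scrub it, and drop the original arguments.
        record.msg = redact(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to handlers (`handler.addFilter(RedactingFilter())`) rather than to a single logger means it also applies to records propagated from child loggers.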
Problem 3. Accessing the data
You should restrict access to data with different access levels, depending on whether a person needs to view or change the data. This sounds easy but isn't in practice. Every system that works with the data must grant employees access correctly: separating who has access to what and organizing the data so that it isn't one big pile. At Joom, we spent days creating a role-based access matrix that sets out who can access what, based on their needs.
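At its core, a role-based access matrix can be a mapping from role to the actions allowed on each data category, with deny-by-default for anything not listed. The roles and data categories below are hypothetical, not Joom's actual matrix.

```python
from enum import Enum


class Action(Enum):
    VIEW = "view"
    CHANGE = "change"


# Hypothetical matrix: role -> data category -> allowed actions.
ACCESS_MATRIX: dict[str, dict[str, set[Action]]] = {
    "support_agent": {"contact_details": {Action.VIEW}, "order_history": {Action.VIEW}},
    "support_lead": {"contact_details": {Action.VIEW, Action.CHANGE}},
    "analyst": {},  # works only with anonymized data
}


def is_allowed(role: str, category: str, action: Action) -> bool:
    """Deny by default: unknown roles or categories get no access."""
    return action in ACCESS_MATRIX.get(role, {}).get(category, set())
```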
You will also need to monitor and limit how often data is accessed. For example, up to 100 requests for customer data per day is normal and most likely means a person is working with the data manually. Between 100 and 150 is above normal but could simply mean the person is working very diligently. More than 150 requests is suspicious and suggests that a script or bot, not a person, is accessing the data. We want to be notified about suspicious activity.
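These thresholds translate directly into a per-employee daily counter. The 100 and 150 limits come from the example above; the `RequestMonitor` class, the `notify` callback, and the in-memory counter are hypothetical. A real system would count in a shared store and discard old days.

```python
from collections import Counter
from datetime import date
from typing import Callable

NORMAL_LIMIT = 100    # up to 100 requests/day: normal manual work
ELEVATED_LIMIT = 150  # 101-150: above normal, possibly just diligent


def classify(requests_today: int) -> str:
    if requests_today <= NORMAL_LIMIT:
        return "normal"
    if requests_today <= ELEVATED_LIMIT:
        return "elevated"
    return "suspicious"  # likely a script or bot rather than a person


class RequestMonitor:
    def __init__(self, notify: Callable[[str], None]) -> None:
        self.counts: Counter = Counter()  # (employee, day) -> request count
        self.notify = notify

    def record(self, employee: str, day: date) -> str:
        self.counts[(employee, day)] += 1
        count = self.counts[(employee, day)]
        # Alert once, on the request that first crosses into "suspicious".
        if count == ELEVATED_LIMIT + 1:
            self.notify(f"{employee}: over {ELEVATED_LIMIT} customer-data requests on {day}")
        return classify(count)
```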
Problem 4. Ant nest of data
A widespread problem that doesn't always look like one is the set of places where data is sent and stored temporarily. I'm not talking about the databases that store the data, but about temporary memory dumps, documents, analytics platforms, logistics systems, etc. As data travels from its entry point to the place where it is stored, it leaves breadcrumbs behind. The places where it stops on the way to the database are often poorly protected or simply unsuitable for storing data, so the risk of unauthorized access can balloon.
Problem 5. External access and transfer
Security when transferring data to third parties should never be an afterthought. Making sure you follow legal requirements and regulatory documents on the technical and organizational measures for protecting data takes up much of a data security officer's work; the rest includes access management, secure data transfer, and monitoring, among other things.
There are several ways to transfer data to third parties, from building VPN tunnels to creating microservices and integrating APIs. Each of these transfers is a chink in your armour: a potential weak spot in your data security perimeter. The main principle for keeping things secure is to hand data over, but never let it be taken. This also spares you the problem of dealing with too many requests for the data in question.
For example, you could create a service that securely passes sensitive data to a third-party service (e.g. Visa, Stripe, etc.).
The idea is that you send the request not to the third party's API directly, but through a special proxy that automatically converts the tokens it receives (in this example, your database stores only tokens) into the sensitive data, which is then passed to the third-party API endpoint.
The API's response also goes through a tokenization filter, so you receive only information that you can safely store in your database.
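The tokenization proxy can be sketched with an in-memory vault: tokens in an outgoing request are swapped for the real values just before it leaves, and sensitive values in the response are swapped back for tokens before anything reaches your database. The `TokenVault` class, the `tok_` prefix, and the stubbed third-party call are hypothetical; real payment providers offer their own tokenization APIs.

```python
import secrets
from typing import Callable


class TokenVault:
    """Lives inside the protected perimeter: the only place real values exist."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._values[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._values[token]


def proxy_call(vault: TokenVault, payload: dict, sensitive_fields: set[str],
               third_party: Callable[[dict], dict]) -> dict:
    # Outbound: replace tokens with real data only for the third-party request.
    outbound = {k: vault.detokenize(v) if k in sensitive_fields else v
                for k, v in payload.items()}
    response = third_party(outbound)
    # Inbound: tokenize sensitive values so the caller never stores them.
    return {k: vault.tokenize(v) if k in sensitive_fields else v
            for k, v in response.items()}
```

The calling backend only ever sees tokens, so a breach of its database exposes nothing usable; the vault becomes the one place that needs the strongest protection.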
One of the biggest problems is how and where to store personal data. Let’s take a look at how we can simplify the problems I listed above and group them together.
Create an independent, isolated system (a contour) to which all personal or sensitive data is transferred, and build the network security around it; PCI DSS contours can be built the same way. You need to formulate your defence measures and design the solution architecture so that data is sent there and you can be sure that 100% of it arrives and none of it goes astray.
This kind of system means you:
- Store the data only in one place
- Receive all requests for the data in a structured way, so you can determine who and what is making them
- Develop a role-based data access matrix to define the basis for who can access the data and for what purposes
- Don't need to expose the data itself everywhere: you can give the backend hashes or masked values instead
- Can conduct centralized audits and monitoring of the access to the system, data, etc.
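For the point about handing the backend hashes or masked values instead of the data itself, here is a minimal sketch. A keyed hash (HMAC-SHA-256) gives other systems a stable identifier they can join on without seeing the value, and unlike a plain hash it can't be brute-forced for low-entropy values such as phone numbers without the key. The function names and the demo key are hypothetical; in practice the key would stay inside the contour.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Stable, non-reversible identifier: same value and key give the same output."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Keep only the last `visible` characters, e.g. for displaying a card number."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


print(mask("4111111111111111"))  # ************1111
```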
I could spend a while listing the benefits of this, but there are downsides too. The biggest is that all critical data is stored in one place, so you have to be very sure that the security measures in place are enough to protect it. Another is that, although access to the contour is limited to a set number of systems and requests, those requests come from automated systems, so it can be impossible to trace who is really behind them.
For example, an employee is using a new customer support system that needs information about a customer's order. The order data is in the CRM, which is authorized to receive data from the contour. But instead of querying the CRM directly, the support system goes through another system (let's call it a CRM interface) that was deployed in just half a day without much thought, because it was an unwelcome distraction from core work. The interface authenticates to the CRM with a single system account shared by everyone who uses it. As a result, all requests are logged in the CRM under this one system user, and all of the contour's logs show requests from the CRM itself. The end result is that we have no idea what personal data was passed on via the CRM, or whether it went further, to a place outside our control where it could remain.
To avoid this kind of situation, you either have to know exactly what is being developed and where, and react to changes in time, or you need to make direct requests to the contour the only option allowed by default (no intermediary services). Moreover, some systems will need to be taught not to store data at all, or to work with it in other forms, such as hashes or masked values. In general, this is a big job with quite a few technical aspects.
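One way to enforce "direct requests only" and keep the contour's audit trail meaningful is a gateway that accepts calls only from an allowlist of systems and requires every request to name the employee it is made for, so a shared system account with no person behind it is rejected. The `ContourGateway` class and field names are hypothetical, and in a real deployment the caller identity would come from mutual TLS or signed tokens rather than a plain string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContourGateway:
    allowed_callers: set[str]          # systems allowed to call the contour directly
    records: dict[str, dict]           # customer_id -> personal data
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def fetch(self, caller: str, end_user: Optional[str], customer_id: str) -> dict:
        if caller not in self.allowed_callers:
            raise PermissionError(f"{caller} is not allowed to call the contour directly")
        if not end_user:
            # Requests with no person behind them can't be audited, so refuse them.
            raise PermissionError("request must name the employee it is made for")
        self.audit_log.append((caller, end_user, customer_id))
        return self.records[customer_id]
```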
10. Data deletion
The last, but certainly not least, requirement is to delete personal data at the end of set time periods. There are a couple of ways to do this:
- Completely delete the data (a real headache for the company)
- Anonymize the data so that analysts can make reports on historical data
Deleting data is a headache because, more often than not, parts of your architecture aren't set up for data to be deleted. For example, client data deleted from the CRM isn't automatically wiped from all the databases connected to it, or from analysts' reports, workflow apps, Excel files, etc.
The second option lets you build a process and automate data deletion as far as possible. In addition, if you have implemented the single-contour storage option, the personal data is collected and held in one place, so it can be deleted there whenever needed.
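The anonymization option can run as a scheduled job: once a record passes its retention period, direct identifiers are removed and only the fields analysts need for historical reports are kept. The three-year retention period and the list of identifying fields are hypothetical examples. Note that under the GDPR, dropping direct identifiers is not always enough for data to count as anonymous, because the remaining fields may still allow someone to be re-identified.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=3 * 365)  # hypothetical retention period
IDENTIFYING_FIELDS = {"name", "email", "phone", "address"}


def apply_retention(records: list[dict], today: date) -> list[dict]:
    """Anonymize records older than RETENTION; keep the fields analysts need."""
    result = []
    for record in records:
        if today - record["created_on"] > RETENTION:
            # Build a new dict so the input records are not modified.
            record = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
            record["anonymized"] = True
        result.append(record)
    return result
```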
Keeping these basic ten points in mind, you can take steps to improve your security architecture and increase the protection of your customers’ personal data.