GDPR data mapping: ultimate guide
GDPR data mapping is a great way to start tracking data in your organization and to have a complete inventory of personal data. However, it can be a difficult task and you may not know where to even start. We have created an ultimate guide to mapping personal data in your organization to simplify the entire process and get you on the right track to being fully GDPR compliant.
Read our in-depth guides one by one to learn the step-by-step process to GDPR data mapping:
- GDPR data mapping: where do I start?
- GDPR data mapping: where is personal data held?
- GDPR data mapping: documenting basis and retention
- GDPR data mapping: getting granular
What is GDPR data mapping?
Data mapping allows organizations to have a full picture of the data they store, and how this data is processed and flows across the organization’s different systems. As such, to identify data flows, you must first understand what data you have and where you are storing it. This inventory is generally referred to as a data map.
Holding a complete map of the customer data you store across all of your different systems is useful should you need to locate some data, and it also supports a legal requirement. Article 30 of the General Data Protection Regulation (GDPR) requires most organizations to keep records of their processing activities and to make them available to the supervisory authority on request. These records should show exactly how consumer data is being processed: what data is collected, why it has been collected, where it is stored, the safeguards employed, etc.
A data map is the go-to instrument for a full overview of all of your customer personal data. You first need to find out about all of the different data points you store. Here it is worth noting that if you are a new business or organization, it makes sense to define policies in the form of a data map to plan what data you will collect before you even start interacting with consumers. This “privacy-by-design” approach will keep you compliant from the ground up.
To give you an overview of the data you store, you can break down the compilation of your GDPR data map into two separate sections:
- Categorizing personal data
- Understanding processing purposes
How to start GDPR data mapping
Let’s start by taking a look at how to identify specific categories of personal data. Starting here will make the whole GDPR mapping process easier at later stages because it gives you a logical foundation on which to structure your inventory.
GDPR data mapping categories
Personal data can be grouped into several different types, with some categories being considered more sensitive than others. A great way to categorize your personal data is to group it according to usage purposes. For example, identity documents are a type of personal data used to verify who a person is.
The point of this exercise is to come up with logical categories that anyone looking at your GDPR data map can understand; the document isn’t just for legal professionals. Don’t aim to create a category for everything either: too many categories will overlap, and you will struggle to decide which one a given piece of data belongs to.
Take a look at the suggested possible categories of personal data for your GDPR data map:
- Personal data
- Financial data
- Identity documents
- Employment details
- Tracking data
- Special category
Since we are making a data map to comply with the GDPR, it is important to keep the regulation’s definition of personal data in mind when we are categorizing. The GDPR defines personal data as:
“any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier”
Article 4 of the GDPR, Definitions (1)
Being able to identify someone using the data is the core factor according to the law. This means that something like a person’s date of birth is generally not personal data on its own, but it is personal data when it can be combined with other data points to single out an individual. The main question here is: can this data point be used to identify an individual, either alone or together with other data you hold? If yes, it is personal data that you need to categorize.
Now that we have a better understanding of the legislation and its definitions, let’s take a look at the specific categories that were listed above.
Personal data
As we saw just above, personal data is information which can be used to identify someone. This is a very broad category that can include an email, name, address, etc.
Financial data
This kind of data is generally related to anything that is collected for finance and accounting purposes. This can include customer payment details such as credit card numbers.
Identity documents
Data generated by collecting and scanning any documents to identify customers or staff is a separate category of personal data for your GDPR data map. Think: passports, driving licenses, etc.
Employment details
Details about someone’s employment history should also be documented separately. Unless you work in a recruitment agency, generally this category concerns your employees.
Tracking data
We are all aware of tracking data such as cookies, but this category also includes CCTV in and around the office. By recording people’s behavior you are documenting personal data.
Special category data
Special category data is especially sensitive data that needs extra safeguards. It includes health data, data on sexual orientation or religious belief, and other sensitive information.
Now that we have a better understanding of how to categorize the personal data that we may store, let’s move on to think about how we can classify what we use each category for.
Personal data usage
With our base of categories sorted, we can now look at how we use each category in our organization. Here we need to find out what we actually use these data categories for. Generally, we can say that each category is used for:
- Accounting
- Customer support
- Legal risk management
- Sales and marketing
- Profiling
You may process personal data for different or additional purposes. If so, add them. It is best to be specific while keeping the categories clear. Let’s quickly run through the different categories to make clear how to choose.
Accounting
Processing personal data for accounting purposes relates to any incoming or outgoing payments. For example, this covers the billing details of your customers or bank accounts of your employees where you send their salaries.
Customer support
You may process customer personal data in order to offer them support. For example, if a customer rings you with a query, you will ask their name, document the time they rang, and the number that they called from.
Legal risk management
In order to comply with various sets of regulations, you are required by law to keep specific records. For example, in accordance with anti-money-laundering regulations, companies are obligated to store financial information of customers for certain periods.
Sales and marketing
The majority of organizations collect personal data from consumers for sales and marketing purposes. For example, address information may be used by organizations to launch marketing campaigns in specific localities.
Profiling
While profiling is often linked to marketing activities because it involves categorizing people into certain profiles, it can also be considered separately. This is because in some instances it can be used to generate automated decisions. For example, profiling based on age may take place for decisions about whether to issue credit.
The usage categories above feature in the vast majority of organizations. You may also have several uses for personal data that are specific to your company. Bear in mind that categories and purposes do not map one-to-one: financial data will be used for accounting purposes, but so will other personal data such as names and billing addresses. And this is a good point to remember: you will likely use each data category for several purposes.
By understanding and structuring your data into categories and knowing the uses for these categories, you are well on your way to identifying your entire data inventory.
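As a minimal sketch of the structure described so far, a data map can be modelled as a list of entries that tie each data point to one category and one or more purposes. The names below (`Category`, `Purpose`, `MapEntry`, `purposes_by_category`) and the sample data points are illustrative assumptions, not part of any standard or tool; swap in your own categories and purposes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    # Illustrative category names; replace with the ones in your own map.
    FINANCIAL = "Financial data"
    IDENTITY_DOCUMENTS = "Identity documents"
    EMPLOYMENT_DETAILS = "Employment details"
    TRACKING_DATA = "Tracking data"
    SPECIAL_CATEGORY = "Special category"


class Purpose(Enum):
    # Illustrative purposes; add any that are specific to your organization.
    ACCOUNTING = "Accounting"
    CUSTOMER_SUPPORT = "Customer support"
    LEGAL_RISK_MANAGEMENT = "Legal risk management"
    SALES_AND_MARKETING = "Sales and marketing"
    PROFILING = "Profiling"


@dataclass
class MapEntry:
    data_point: str
    category: Category
    purposes: set = field(default_factory=set)  # one data point, several uses


def purposes_by_category(entries):
    """Collect every purpose recorded for each category."""
    summary = {}
    for entry in entries:
        summary.setdefault(entry.category, set()).update(entry.purposes)
    return summary


# Hypothetical sample entries.
entries = [
    MapEntry("card number", Category.FINANCIAL, {Purpose.ACCOUNTING}),
    MapEntry("billing address", Category.FINANCIAL,
             {Purpose.ACCOUNTING, Purpose.SALES_AND_MARKETING}),
    MapEntry("cookie ID", Category.TRACKING_DATA, {Purpose.SALES_AND_MARKETING}),
]
summary = purposes_by_category(entries)
```

Keeping purposes as a set per entry reflects the point that one category is usually used for several purposes.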
While you are completing your data map, you may come across data for which you cannot find the reason why your organization stores and processes it. This is when you should look to erase this data in accordance with data minimization principles of the GDPR.
Article 5(1)(c) of the GDPR stipulates that personal data should be:
“adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed (‘data minimization’)”
As your data map becomes more complete, it will become more obvious where your organization is collecting and storing data but has no reason to do so. Use your data map to identify these areas and erase data as part of your data minimization commitments.
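One way to surface these gaps, sketched here under the assumption that your inventory records a list of purposes per data point, is to flag every item whose purpose list is empty. `InventoryItem` and `erasure_candidates` are hypothetical names, and the rows are invented sample data.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryItem:
    data_point: str
    system: str
    purposes: list = field(default_factory=list)  # empty = no documented reason


def erasure_candidates(items):
    """Return the items that have no documented processing purpose."""
    return [item for item in items if not item.purposes]


# Hypothetical inventory rows.
inventory = [
    InventoryItem("email address", "CRM", ["Sales and marketing"]),
    InventoryItem("date of birth", "CRM"),
    InventoryItem("bank account", "Payroll", ["Accounting"]),
]
candidates = erasure_candidates(inventory)
```

The flagged items are only candidates: check with the system owner before erasing anything, since the purpose may simply be undocumented.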
Now that we have got our personal data and usage categorization understood, we can now move on to looking at where the data is specifically held.
GDPR data mapping: where is personal data held?
Categorizing data is just the very beginning; you also need to know where your data is held. This means finding out which databases and tools your organization uses in its day-to-day business.
GDPR data mapping: finding the data
Finding out where personal data is stored across your organization is one of the fundamental tasks of data mapping. But this can be one of the most difficult aspects of creating a GDPR data inventory. If your organization has been operating for a long time, it is likely that you have personal data stored all over the place.
Personal data in databases
Find out if your organization uses its own databases that have been customized for your needs. These custom databases may be on premise or cloud based. You will need to find out who is the owner of the different systems. This person should be able to tell you which data is stored in the systems they control.
Custom databases come in all shapes and forms. Some examples include:
- Microsoft SQL Server
A point of note here is that your organization could use applications that it has built itself and these apps operate using custom databases. Speak with the relevant departmental stakeholders to find out the apps you are using and the databases which they are built on. Then you will be able to dive a little deeper into which personal data is stored there.
In addition to custom databases, there are many different paid services that you probably use in the running of your business or organization. These services are likely to process personal data on behalf of your organization.
Personal data in applications
There are many technologies you can purchase to save your organization from building its own custom applications. These are generally referred to as “as-a-service” tools, and your organization probably uses purchased software that processes personal data on your behalf. These software-as-a-service (SaaS) tools need to be accounted for in your GDPR data map.
Each organization is specific in what it implements, but some SaaS tools are more popular than others. Let’s run through the more popular services, what their function is, and the personal data they are most likely to process for you.
Google Workspace and personal data
Google Workspace’s business operations tools are used across a range of businesses and organizations. The software stores a colossal amount of data for your organization and much of this is personal data. The types of personal data stored there can range from emails from your staff, customers, and contractors, to billing documents and audio recordings of conversations.
If your organization has been using Google Workspace (formerly G Suite) for a while, this will be a monumental task in which you will need to trawl through Google Drive to discover and classify the different types of personal data that it has been processing for you. Get other members of staff involved by creating a survey about what they store there and why. By making the survey detailed and getting as many people as you can involved, you will be able to drastically reduce the time it takes to find and classify the data that is collected there.
Slack and personal data
The growing popularity of workplace messaging tools has facilitated a shift away from emails. Slack is one of the most popular such tools and it will process personal data for you. The most obvious categories of personal data that will be found here are employee data and details on customers that employees share using Slack.
If you use Slack in your organization, go ahead and create a poll right in Slack to check who is doing what with personal data. You can then take steps to minimize the sharing of personal data and have a full picture of what is going on in Slack that relates to your work and the GDPR.
HubSpot and personal data
Marketing tools such as HubSpot are really handy for businesses of all sizes. SaaS tools like HubSpot offer a wide range of features that gather personal data: tracking data, customer data such as names and emails, and data relating to a person’s professional life.
HubSpot is principally a marketing service, but it is a good idea to speak to the system owner to find out exactly which data points are collected through it.
Salesforce and personal data
Premier marketing automation tools such as Salesforce are generally used in larger organizations as they offer a full kit for sales and marketing teams. Salesforce and similar SaaS tools have a big focus on collecting all sorts of information so that businesses can make data-driven decisions. This means that, other than the usual personal data related to marketing (such as tracking data), Salesforce may be processing financial data, in addition to profiling data on your employees.
Salesforce and similar SaaS giants demonstrate the importance of speaking to the relevant stakeholders to find out the different categories of personal data that are collected.
Stripe and personal data
There are also SaaS tools out there that handle accounting and finance for organizations. One of the most popular is Stripe. This software tool facilitates online payments, but the personal data it collects on your behalf also expands to email addresses, names, and shipping addresses.
Just as with sales and marketing tools, SaaS tools for financial applications collect several categories of personal data, so you need to check which ones in order to document everything properly.
Zendesk and personal data
Another type of SaaS is for customer service and a popular tool out there is Zendesk. This customer service software boosts businesses’ customer engagement and sales. However, it may also be profiling your customers and employees, in addition to collecting geolocation data and audio-visual information.
Since customer service software like Zendesk processes many different types of personal data, speak to the system owner to find out exactly which so that you can complete your GDPR data map.
Don’t shy away from asking for help: get in touch with different departments in your organization to find out which services you use. When you have this information, speak with the system owner to find out which personal data is being collected and for what purposes.
You might be asking, “who is this ‘system owner’ that keeps being mentioned?”
The system owner is the employee in your organization who is responsible for the SaaS tool or custom database. This colleague should know all the ins and outs of the system and be able to show you the different data that it processes. This person is a key stakeholder for your organization’s data practices and it is best to document this person and their contact details should you need to get in touch.
Furthermore, remind this person to get in touch with you if anything changes regarding the practices or settings concerning the collection of personal data via the system they own. By getting them to reach out to you, you can keep your data map updated without having to pester people periodically.
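A simple register of systems and their owners makes this easier to keep track of. The sketch below assumes you record, for each system, its owner, a contact address, and the categories of personal data it holds; `SystemRecord`, `owners_to_ask`, and all names and addresses are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRecord:
    name: str
    kind: str            # e.g. "SaaS" or "custom database"
    owner: str           # the system owner
    owner_contact: str
    categories: tuple    # categories of personal data held in the system


def owners_to_ask(category, systems):
    """Map each system holding the category to its owner's contact details."""
    return {s.name: s.owner_contact for s in systems if category in s.categories}


# Hypothetical register; names and addresses are placeholders.
register = [
    SystemRecord("Support desk", "SaaS", "A. Owner", "a.owner@example.com",
                 ("Tracking data", "Employment details")),
    SystemRecord("Payroll DB", "custom database", "B. Owner", "b.owner@example.com",
                 ("Financial data", "Employment details")),
]
contacts = owners_to_ask("Employment details", register)
```

With a register like this, a question such as "who can tell me about our employment data?" becomes a single lookup.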
Personal data location
If you are using custom databases on premise, then the question of where your data source is located is simple. However, with cloud-based SaaS tools you will have to check. Many of the tools listed above are run by US companies and may store their data in the US.
Follow changes to legislation and relevant court decisions to make sure you stay compliant.
Keeping track of any changes in the legal sphere of data protection is paramount for any DPO. Changes such as UK-EU adequacy or rulings on transfers to third countries such as the US could have a big impact on the way your organization does business. For example, if a transfer to a third country is not covered by an adequacy decision, you will need another safeguard for it, such as standard contractual clauses, or, in limited cases, the explicit consent of the individuals concerned. This is very relevant if you are using US services from the UK or EU.
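To keep an eye on this in the data map itself, you could record the hosting country of each system together with whatever basis for the transfer your DPO has documented, and flag the systems where that field is empty. This is only a sketch: `HostingRecord`, `transfers_to_review`, and the `EEA_SAMPLE` set (a deliberately incomplete list of country codes) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of EEA country codes, NOT a complete list.
EEA_SAMPLE = frozenset({"DE", "FR", "IE", "NL", "SE"})


@dataclass
class HostingRecord:
    system: str
    country: str                          # ISO 3166-1 alpha-2 code
    transfer_basis: Optional[str] = None  # documented basis for the transfer, if any


def transfers_to_review(records, home_region=EEA_SAMPLE):
    """List systems hosted outside the home region with no documented transfer basis."""
    return [
        r.system
        for r in records
        if r.country not in home_region and not r.transfer_basis
    ]


# Hypothetical records.
records = [
    HostingRecord("On-premise CRM", "DE"),
    HostingRecord("Chat tool", "US"),
    HostingRecord("Payments", "US", transfer_basis="documented by DPO"),
]
flagged = transfers_to_review(records)
```

The check only tells you where documentation is missing; whether a given basis is still valid after a court ruling remains a legal judgment.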
The responsibility is yours
When it comes to using handy SaaS tools in the day-to-day running of operations in your organization, you need to keep a close eye on which personal data they are collecting. This is because they are being directed by you to collect such data, making what the services do your responsibility. Stay in the loop about the data practices in your organization and, if you are unsure of anything, reach out to your colleagues in different departments to find out which data they are collecting and what they are doing with it.
Once you have a full handle on what type of data is being stored, for what purpose, and where it resides, you can move on to documenting the legal basis for why you are processing this data, and the period you will retain it for.
GDPR data mapping: documenting basis and retention
Consent and retention
As we saw above, the various types of personal data are collected for different reasons. This is what makes defining the categories and uses such a foundational part of GDPR mapping. When you have a clear overview of what data is held where and what it is used for, you need to start documenting the legal basis for why you collect this data and the length of time you store it for.
GDPR legal bases for processing
The GDPR stipulates six different legal bases for storing and processing personal data:
- Consent
- Contract
- Legal obligation
- Vital interest
- Public task
- Legitimate interest
So what does each mean and how do they justify your organization processing people’s personal data?
Consent
Consent is a clear affirmative act by which the individual whose data you are processing gives you permission to do so. It does not have to be in writing, but the consumer has to actively opt in to their personal data being collected. Moreover, withdrawing consent at any time must be as easy as giving it.
Pre-ticked, opt-out checkboxes do not comply with legislation such as GDPR.
The explicit nature of consent means that it should be separate from any other conditions and be specific to the reason for the processing. An example of explicit consent would be a checkbox separate from other fields which needs to be ticked for the consumer to agree to marketing materials being sent to them. You should then store this consent so that it can be shown on demand to indicate that it was freely given.
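Storing consent so that it can be shown on demand could look something like the sketch below: one record per person and purpose, with timestamps for when consent was given and, if applicable, withdrawn. `ConsentRecord` and its fields are hypothetical; a real system would also persist these records and capture what the person was shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    subject_id: str
    purpose: str                      # consent is specific to one purpose
    given_at: datetime
    withdrawn_at: Optional[datetime] = None

    def withdraw(self, at: datetime) -> None:
        """Record the moment consent was revoked; the original record is kept."""
        self.withdrawn_at = at

    def is_active(self, at: datetime) -> bool:
        """True if consent had been given and not yet withdrawn at the given time."""
        if at < self.given_at:
            return False
        return self.withdrawn_at is None or at < self.withdrawn_at


# Hypothetical example.
consent = ConsentRecord(
    subject_id="customer-42",
    purpose="marketing emails",
    given_at=datetime(2024, 1, 10, tzinfo=timezone.utc),
)
active_before = consent.is_active(datetime(2024, 2, 1, tzinfo=timezone.utc))
consent.withdraw(datetime(2024, 3, 1, tzinfo=timezone.utc))
active_after = consent.is_active(datetime(2024, 3, 2, tzinfo=timezone.utc))
```

Keeping the withdrawal as a timestamp rather than deleting the record means you can still show that consent existed for the period in which you relied on it.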
Contract
When entering into a contract and in order to fulfil it, the consumer gives you the right to process their data. As such, this is another basis on which you can justify processing personal data. An example of this would be a purchase agreement in which you need to collect a customer’s name and billing address in order to sell them a good or service.
A point of note here is that other personal information which is not needed to fulfil the contract cannot be processed on the basis of contract. This means that you cannot collect data on a customer’s age if you do not need this information to make the sale; you need to receive additional consent or use another legal basis to process this data.
Legal obligation
Another legal basis which permits the processing of personal data is the need to comply with other laws. For example, when you conclude an employment contract, you need to collect relevant personal data and store it for tax purposes. As such, tax and employment law serve as the reason for the processing.
To use this basis, you need to be able to point to the specific legal obligation that requires you to process the individual’s personal data.
Vital interest
Vital interests of individuals generally are reasons which relate to protecting a person’s life. This means that this basis is often used for processing data when an individual’s life is in danger. An example of this would be collecting data on a person in order to give them emergency medical care.
It is important to note here that this basis is intended for situations in which the person is physically or legally unable to consent to the processing. If they are able, then another basis, such as explicit consent, should be used.
Public task
The public task legal basis for the most part concerns public authorities like government or local councils. However, your organization may process data to perform a public task if it has been granted permission to act in the public interest. For example, there are many private companies that government authorities contract out services to.
It is worth noting here that the task being performed should be in the public interest and/or be directly connected to the official authority. Moreover, the task should be stipulated in law. For example, should the local council contract out information collection for census purposes, the organization employed to collect the data would be processing it on the grounds of carrying out a public task.
Legitimate interest
Lastly is legitimate interest. This legal basis is one of the most widely used since it is very broad. It covers processing that is necessary for the legitimate interests of your organization or a third party, as long as those interests are not overridden by the rights and interests of the individual. These interests can be commercial, individual, or of benefit to society as a whole.
As you can guess, the legitimate interests of people span across many different spheres: from the right to be informed about something, to the right to free speech. Because of this, legitimate interest covers even B2B marketing activities where it can be argued that information is processed and stored in order to inform individuals of better products or services on offer.
GDPR special category data: bases for processing
In addition to the legal bases above, the GDPR specifies additional bases for data which is classified as sensitive. Some sensitive data is classified as “special category” and has additional legal grounds which cater to the specific needs of handling these types of information:
- Racial or ethnic origin
- Political opinions
- Religious or philosophical beliefs
- Trade union membership
- Genetic data (DNA)
- Biometric data (to identify a person)
- Health data
- A person’s sex life
- A person’s sexual orientation
Since this data is especially sensitive, it needs additional safeguards which require separate legal bases for processing the information. As you will see from the list below, some of the bases are the same, but there are additional grounds:
- Explicit consent
- Legal obligation
- Vital interests
- Public interest
- Legitimate activity
- Publicized by the subject
- Judicial purpose
- Medical diagnosis
- Public health
- Archiving purposes
It is worth noting that if you were to conclude a contract and use this for the basis of processing, yet asked the person for data which is considered special category, then you would need to obtain consent separately.
Since the bases repeat themselves, let’s run through the items which were not covered above.
Legitimate activity
Legitimate activity generally concerns processing activities which are needed to carry out the functions of organizations. For example, should the person be a member of a political organization, their personal data will need to be processed to carry out the functions of the organization. This would be processing on the grounds of legitimate activity.
Publicized by the subject
Personal data that an individual has clearly made public themselves can be processed by organizations on the basis that it is already in the public domain. This is especially relevant for special category data since its sensitive nature means that all reasons for processing it should be recorded. An example of such a situation would be if an individual shares their health status publicly. An organization relying on this ground could process the data, and the fact that it was shared publicly would need to be documented.
Judicial purpose
As suggested in the title, if the special category data is needed in court proceedings, it can be processed on these grounds.
Medical diagnosis
For a medical diagnosis to be given, special category data, such as health data, needs to be processed.
Public health
When disaster strikes and public health is at risk, special category data needs to be processed to protect people. The law accounts for this.
Archiving purposes
Lastly, data protection law accounts for processing activities concerned with archiving when it is in the public interest to archive special category data on living people.
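When you record these bases in your data map, a small check can catch entries where the documentation is incomplete: every entry needs one of the six legal bases, and special category data needs an additional ground on top. The sketch below is illustrative only; `ProcessingRecord`, `find_gaps`, and the sample rows are hypothetical, and the additional ground is stored as free text.

```python
from dataclasses import dataclass
from typing import Optional

# The six GDPR legal bases for processing.
LEGAL_BASES = frozenset({
    "consent", "contract", "legal obligation",
    "vital interest", "public task", "legitimate interest",
})


@dataclass
class ProcessingRecord:
    data_point: str
    legal_basis: Optional[str] = None
    special_category: bool = False
    special_condition: Optional[str] = None  # e.g. "medical diagnosis"


def find_gaps(record):
    """Return a list of documentation gaps for one record (empty if none)."""
    gaps = []
    if record.legal_basis not in LEGAL_BASES:
        gaps.append("no recognized legal basis documented")
    if record.special_category and not record.special_condition:
        gaps.append("special category data without an additional ground")
    return gaps


# Hypothetical records.
billing = ProcessingRecord("billing address", legal_basis="contract")
allergy = ProcessingRecord("allergy notes", legal_basis="contract",
                           special_category=True)
unknown = ProcessingRecord("date of birth")
```

Here `find_gaps(billing)` returns an empty list, while the other two records each come back with one gap to resolve.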
So now we understand the different legal bases for processing personal data, we can take a look at assigning the retention periods.
Personal data retention
A retention period is the time that you store personal data for before anonymizing or erasing it. Anonymization is a method that depersonalizes data so that the person it relates to can no longer be identified from this data.
Data protection legislation specifies that organizations should not keep personal data longer than they need it for. As such, you need to justify why you are storing the data and this is where you use the legal bases above.
The problem is that the GDPR doesn’t mandate specific rules and time periods during which you should retain data. You need to formulate your own policies.
By creating a retention schedule and keeping it together with your legal bases for personal data processing, you make your policy crystal clear as to how long you store different categories of data and why.
You should review the personal data you store regularly and check whether you still need it. This will help you minimize the data you hold and reduce the impact of any data breach.
Set your own policies for data retention based on market best practice. Look to industry guidelines to see if there are already policies out there that you can implement. Remember that other legal obligations will be relevant and define the retention periods for certain categories of personal data. For example, financial data must be stored on record for accounting purposes in line with the law.
If you are unsure of how long you should store the data and there is no information available in the form of regulations or guidelines, you need to make a decision based on the proportionality of storing the data. This means that storing CCTV records for 10 years will probably be unjustifiable, whereas storing the records for 6 months may be justifiable. Here it depends on the nature of your business and what is being recorded.
In general, you need to set your own guidelines and include what you will do with the personal data when the retention period ends. Keep best practices in mind, along with the proportionality of your reasons for the length of time that you will store this data.
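A retention schedule of this kind can be sketched as a lookup from category to a retention period and an end-of-period action. The periods below are placeholders chosen for illustration, not recommendations, and `RETENTION_SCHEDULE`, `retention_end`, and `is_overdue` are hypothetical names.

```python
from datetime import date, timedelta

# Placeholder schedule: category -> (retention in days, action when it ends).
# The periods are illustrative, not recommendations.
RETENTION_SCHEDULE = {
    "CCTV footage": (180, "erase"),
    "Support tickets": (730, "anonymize"),
}


def retention_end(category, collected_on, schedule=RETENTION_SCHEDULE):
    """Return the date the retention period ends and the action to take then."""
    days, action = schedule[category]
    return collected_on + timedelta(days=days), action


def is_overdue(category, collected_on, today, schedule=RETENTION_SCHEDULE):
    """True once the retention period for this record has ended."""
    end_date, _ = retention_end(category, collected_on, schedule)
    return today >= end_date


# Footage recorded on 1 January 2024 reaches the end of its period on 29 June 2024.
end_date, action = retention_end("CCTV footage", date(2024, 1, 1))
```

Storing the action next to the period keeps the "what happens when the period ends" decision in the same place as the period itself.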
Taking the steps to document all of the legal bases for why you store different categories of personal data and being concrete about the time you will store the data for demonstrates that your organization takes data privacy seriously. Setting records of processing activities in the form of policies in your data map is a key foundation from which you can build out your GDPR compliance.
Now that we have the bases for GDPR compliance firmly in place, you can go into more detail about the exact data points in the categories of personal data that you store.
GDPR data mapping: getting granular
Having categorized the personal data your organization stores, with the knowledge of where it is stored, why you have it, and how long you should keep it for, it is time to get granular with each piece of data to show why you have it.
Granular knowledge of personal data
You’ve now reached the final stage of creating a data map for your organization and it involves digging into all the different categories of data you store to find out exactly what is there and for what purpose. While you already categorized the personal data you have, you now need to get your hands dirty to understand:
- What precise data you hold
- Who you hold the data on
- The reason for having this data
Having a complete, granular overview of all the data you hold is best practice in the sphere of data protection. If you are in a situation where you need to report to executives in your organization or to the authorities, you will be able to show that you have structured the documenting of processing activities in a logical and complete manner.
Define the fields in the categories
We already sorted the personal data categories above, so let’s start there. Go through each category and list which data points you collect. You may have chosen to include others, but at the start of this guide we suggested six categories that should cover all the personal data your organization collects:
- Personal data
- Financial data
- Identity documents
- Employment details
- Tracking data
- Special categories
After you have dug up all the different data points for each category, you then need to classify the categories of people whose personal data you process. Earlier we suggested the following universal categories:
- Potential staff
- Emergency contacts
You may have different classifications for your data subjects: specify them in your data map in accordance with your organization’s needs.
There are two categories above which we haven’t covered before: potential staff and emergency contacts.
Potential staff are the people who send your organization large amounts of information about themselves in the hope that you will employ them. Generally, this information is held on file for future reference. Potential staff differ from employees in that they send you their personal data and give explicit consent to process it, whereas employee personal data is processed on the basis of the legal obligations stipulated in employment law.
Emergency contact data concerns data about another individual that a person provides as a point of contact in case of emergency. This personal data will be processed based on the legitimate interest of the data subject (the person who is the emergency contact) as we can assume that this person would want to be informed if something happened to the person who indicated the emergency contact.
Personal data usage
Having generated a full list of data points for your personal data categories and identified any personas which you may collect data on, you can now move to the reasons that you have the data: what you use the data for. As in the first part of your GDPR data map, we propose you use the following reasons for processing personal data in your organization:
- Accounting
- Customer support
- Legal risk management
- Sales and marketing
- Profiling
If you process personal data for other reasons: add them in or modify the ones we propose.
Indicate the uses for each data point for every persona type. If you use the data for several purposes, indicate that too.
If you have gone through the ultimate guide and completed all of the steps properly, you should have a great GDPR data map that shows you take data privacy seriously. This will be really helpful should you need to show the report to the board or data protection authorities. Moreover, should you need to find certain personal data on a specific person, you can open your map and use it to find out where the data points are stored.
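The lookup described here can be sketched as follows, assuming each row of the granular map records the data point, its category, the type of person it relates to, its purposes, and the systems that store it. `DataPoint`, `locate`, and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    name: str
    category: str
    subject_type: str     # e.g. "customer", "employee"
    purposes: tuple
    systems: tuple        # where the data point is stored


def locate(data_map, subject_type):
    """Map each data point held on a subject type to the systems storing it."""
    found = {}
    for point in data_map:
        if point.subject_type == subject_type:
            found.setdefault(point.name, set()).update(point.systems)
    return {name: sorted(systems) for name, systems in found.items()}


# Hypothetical map rows.
data_map = [
    DataPoint("email address", "Personal data", "customer",
              ("Customer support", "Sales and marketing"), ("Helpdesk", "CRM")),
    DataPoint("shipping address", "Personal data", "customer",
              ("Accounting",), ("Payments",)),
    DataPoint("bank account", "Financial data", "employee",
              ("Accounting",), ("Payroll DB",)),
]
customer_locations = locate(data_map, "customer")
```

For a request about a specific customer, the result tells you which systems to search; the map itself holds no personal data, only where to find it.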
Not a one-off
Keep in mind that this is a living document that needs to be updated periodically. This is not a one-time exercise and the job is never finished. Keep on top of your data map, keeping it up to date to stay GDPR Article 30 audit ready.
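One lightweight way to keep the map current is to store a last-reviewed date per entry and list the entries that have gone too long without a review. The one-year interval below is an assumption, as are the function name `entries_due_for_review` and the sample log.

```python
from datetime import date, timedelta


def entries_due_for_review(last_reviewed, today, max_age_days=365):
    """Return the names of entries last reviewed more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items() if reviewed < cutoff)


# Hypothetical review log: entry name -> date of last review.
review_log = {
    "CRM": date(2023, 1, 15),
    "Payroll DB": date(2024, 5, 1),
    "Helpdesk": date(2022, 11, 30),
}
due = entries_due_for_review(review_log, today=date(2024, 6, 1))
```

Running a check like this on a schedule complements, rather than replaces, system owners telling you when something changes.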