Anonymisation step-by-step

Anonymisation ensures that the risk of identifying a data subject in released data is negligible. Careful consideration must be given to the applicable legislation, as this will influence the definition of anonymised data.

In the UK, the Information Commissioner’s Office (ICO) provides in-depth information about personal data, including definitions and considerations for pseudonymisation and anonymisation, within the scope of the UK General Data Protection Regulation and the Data Protection Act 2018.

Anonymisation techniques can differ between quantitative (e.g. survey) data and qualitative (e.g. transcript) data. Before you anonymise your data, consider carrying out a data situation audit. A data situation audit considers every situation in which data is going to be shared or published, from data excerpts in journal articles to full datasets archived with a responsible repository.

Anonymisation is not a single procedure carried out at the end of data collection. It should be applied, together with the other strategies that help to protect participants, throughout the project and wherever full or partial data is published. Once you’ve considered all the situations in which data would be shared or published, you are ready to begin the anonymisation process.

A simplified, high-level approach to sharing data for future reuse through a responsible repository can consist of the following three core steps.

Step 1: Find and assess identifiers

Start by listing potential identifiers. This should include both direct identifiers (information that identifies data subjects directly, e.g. names, addresses) and indirect identifiers (information that might identify data subjects when combined, e.g. age, sex, educational attainment, occupation).
Evaluate the likelihood of reidentification by considering both the data itself and the potential availability of external information that could be linked to it.
Key questions to consider:
- Can the identity of a participant be known from information in the data file?
- Is there a possibility of inadvertently disclosing or causing harm to a third party based on the information in the data file?
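
One way to start assessing indirect identifiers in tabular data is to count how many records share each combination of values. The sketch below is only an illustration, not a full disclosure risk assessment: the records, the field names and the threshold of 2 are all assumptions, and it does not account for external information that could be linked to the data.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"age": 34, "sex": "F", "occupation": "nurse"},
    {"age": 34, "sex": "F", "occupation": "nurse"},
    {"age": 51, "sex": "M", "occupation": "farrier"},
    {"age": 29, "sex": "M", "occupation": "teacher"},
    {"age": 29, "sex": "M", "occupation": "teacher"},
]

INDIRECT_IDENTIFIERS = ("age", "sex", "occupation")


def combination_counts(rows, keys):
    """Count how many records share each combination of indirect identifiers."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def rare_combinations(rows, keys, threshold=2):
    """Return the combinations shared by fewer than `threshold` records."""
    counts = combination_counts(rows, keys)
    return [combo for combo, n in counts.items() if n < threshold]


# Combinations held by a single record are the most likely to single someone out.
print(rare_combinations(records, INDIRECT_IDENTIFIERS))
```

Here the only combination flagged is the 51-year-old male farrier, which suggests that age or occupation would need attention in Step 2.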

Step 2: Implement anonymisation techniques

Ensure that all direct identifiers have been removed (deleted) or pseudonymised (replaced with fake names or codes).
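
As a minimal sketch of this step, the example below deletes one direct identifier and replaces another with a code. The `pseudonymise` helper, the field names and the `P001`-style codes are hypothetical. Note that the returned key links codes back to real names, so it would need to be stored securely and separately from the data, or destroyed.

```python
def pseudonymise(rows, name_field, drop_fields=(), prefix="P"):
    """Replace names with stable codes and delete other direct identifiers.

    Returns the cleaned rows and the name-to-code key.
    """
    codes = {}
    cleaned = []
    for row in rows:
        name = row[name_field]
        if name not in codes:
            # Assign codes in order of first appearance: P001, P002, ...
            codes[name] = f"{prefix}{len(codes) + 1:03d}"
        new_row = {k: v for k, v in row.items() if k not in drop_fields}
        new_row[name_field] = codes[name]
        cleaned.append(new_row)
    return cleaned, codes


# Hypothetical participants; names and addresses are invented.
participants = [
    {"name": "Alice Example", "address": "1 Sample Street", "answer": "yes"},
    {"name": "Bob Example", "address": "2 Sample Street", "answer": "no"},
    {"name": "Alice Example", "address": "1 Sample Street", "answer": "maybe"},
]

cleaned, key = pseudonymise(participants, "name", drop_fields=("address",))
print(cleaned)
```

The same participant receives the same code each time they appear, so responses can still be linked within the dataset.
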
Next, address the indirect identifiers you have identified as potentially leading to identification. Techniques may include:
- Banding, binning and aggregation: Group data points to decrease identifiability. For example, rather than using specific ages, categorise them into broader age ranges such as 20-24, 25-29, 30-34, 35-39, etc.
- Generalisation: Replace detailed information with more general terms to prevent identification. This is highly applicable to qualitative data such as transcripts, but also to survey data containing string variables. For example, generalise “living in the city of Preston in Lancashire” to “living in a city in the North West of England”.
- Data-specific techniques: Incorporate specialised techniques such as recoding, top/bottom coding or statistical disclosure control in quantitative data, and methods like blurring or altering features in visual data, or voice distortion in audio data. Methods such as blurring or voice distortion need careful consideration because, depending on the context and envisioned usage of the data, they might compromise its usability. Find out more about applicable techniques for qualitative data and for quantitative data, and consult the Government Statistical Service policy for the release of social survey microdata.
Key questions to consider:
- How can the data be altered to prevent identification while retaining its utility for secondary analysis?
- Are the anonymisation techniques employed sufficient to protect against reidentification? Make sure to consider the nature of the data, including the data type and format, sensitivity and uniqueness, as well as the intended usage.
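
The banding and top coding techniques listed above can be sketched for an age variable as follows. The `age_band` helper is hypothetical, and the five-year width and the top code of 85 are assumptions chosen to match the example ranges, not recommended values.

```python
def age_band(age, width=5, top_code=None):
    """Map an exact age to a band such as '25-29'.

    If `top_code` is given, every age at or above it is collapsed into a
    single open-ended band such as '85+'. `top_code` should be a multiple
    of `width` so that the open-ended band does not overlap a regular one.
    """
    if top_code is not None and age >= top_code:
        return f"{top_code}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


ages = [20, 27, 39, 93]  # hypothetical values
print([age_band(a, top_code=85) for a in ages])
# prints ['20-24', '25-29', '35-39', '85+']
```

Wider bands reduce identifiability further but also reduce the detail available for secondary analysis, so the width is a trade-off to settle for each dataset.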

Step 3: Review the data and reassess any remaining disclosure risk

Ensure that the anonymisation process has been consistently applied across the data.
Conduct a thorough review to confirm that no meaningful residual risk of disclosing personal or sensitive information remains. If there is a low residual risk of disclosure, consider an effective anonymisation approach, a concept introduced by the ICO. Check our licensing and access framework web page for further information on data classification.
Key questions to consider:
- Have all identifiers, both direct and indirect, been adequately anonymised or removed?
- Is there any remaining information that, when combined with other available data, could lead to the identification of individuals?
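
Part of this review can be supported by a simple automated check, for example searching anonymised text for terms that should have been removed or generalised. The sketch below assumes a hypothetical `find_residual_terms` helper and an invented transcript and term list. It only catches exact whole-word matches, so it complements a manual read-through rather than replacing it.

```python
import re


def find_residual_terms(text, terms):
    """Return the terms that still appear in `text` (case-insensitive, whole words)."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Hypothetical anonymised transcript and list of terms that should be gone.
transcript = "P001 said that preston was too busy, so she moved closer to her sister."
known_identifiers = ["Alice Example", "Preston", "Lancashire"]

print(find_residual_terms(transcript, known_identifiers))
# prints ['Preston']
```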
