Anonymising qualitative data

When anonymising qualitative data, whether textual (such as transcribed interviews) or audio-visual, use pseudonyms or generic descriptors to replace identifying information rather than blanking it out.

Consideration should be given to the level of anonymity required to meet the needs agreed during the informed consent process. Agreeing with participants in advance, during the consent process, what may and may not be recorded or transcribed can be a much more effective way of creating data that accurately represents the research process and the contribution of participants.

For example, if an employer’s name cannot be disclosed, it should be agreed in advance that it will not be mentioned during an interview. This is easier than spending time later removing it from a recording or transcript.

  • Do not collect disclosive data unless this is necessary. For example, do not ask for full names if they cannot be used in the data.
  • Plan anonymisation at the time of transcription, or initial write up (longitudinal studies may be an exception if relationships between waves of interviews need special attention for harmonised editing).
  • Use pseudonyms or replacements that are consistent within the research team and throughout the project. For example, use the same pseudonyms in publications and follow-up research.
  • Use ‘search and replace’ techniques carefully, so that unintended changes are not made and misspelt words are not missed.
  • Identify replacements in text clearly, for example with [brackets] or using XML tags, such as <seg>word to be anonymised</seg>.
  • Keep unedited versions of data for use within the research team and for preservation.
  • Create an anonymisation log of all replacements, aggregations or removals made and store such a log separately from the anonymised data files.
  • Consider redacting statements where there is an increased risk of harm or disclosure.
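
The search-and-replace, marking and logging practices in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool provided with this guidance: the function names `pseudonymise` and `write_log`, the `REPLACEMENTS` mapping and the descriptors in it are all hypothetical, and the names are borrowed from the example passage later in this guidance. The sketch matches whole words only, so that replacing "Julie" does not alter "Julien", and marks each replacement with [brackets].

```python
import csv
import re

# Hypothetical mapping agreed within the research team (illustration only).
# Replacements are wrapped in [brackets] so edits stay visible in the text.
REPLACEMENTS = {
    "Arronal": "[employer A]",
    "Julie": "[colleague 1]",
    "Norwich": "[city in East Anglia]",
}


def pseudonymise(text, replacements):
    """Replace whole-word matches in one pass; return (new_text, log rows)."""
    # Longest keys first, so a longer name wins over a shorter one it contains.
    keys = sorted(replacements, key=len, reverse=True)
    # \b anchors restrict matches to whole words, avoiding unintended changes.
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    counts = {key: 0 for key in replacements}

    def swap(match):
        counts[match.group(1)] += 1
        return replacements[match.group(1)]

    new_text = pattern.sub(swap, text)
    log = [
        {"original": key, "replacement": replacements[key], "count": counts[key]}
        for key in replacements
    ]
    return new_text, log


def write_log(log, path):
    """Write the anonymisation log as CSV; keep it apart from the data files."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["original", "replacement", "count"])
        writer.writeheader()
        writer.writerows(log)


if __name__ == "__main__":
    sample = "I met Julie in Norwich. Julie's friend Julien works at Arronal."
    edited, rows = pseudonymise(sample, REPLACEMENTS)
    print(edited)
    for row in rows:
        print(row)
```

A count of zero in the log is a prompt to check by hand for misspelt variants of that name, which exact matching like this will not catch. Because the log contains the original identifiers, it should be stored separately from the anonymised files, as the list above recommends.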

Our text anonymisation helper tool can help you find disclosive information to remove or pseudonymise in qualitative data files. The tool does not anonymise or make changes to data, but uses MS Word macros to find and highlight numbers and words starting with capital letters in text. Numbers and capitalised words are often disclosive, for example names, companies, birth dates, addresses, educational institutions and countries.
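
The same find-and-highlight idea can be sketched outside MS Word. The Python below is not the helper tool itself and makes no claim about how its macros work; it is a rough approximation under simple assumptions, and the names `CANDIDATE`, `find_candidates` and `highlight`, as well as the `>>` and `<<` markers, are hypothetical. Like the tool, it changes nothing in the source data: it only flags tokens for a person to review.

```python
import re

# A candidate is any token that starts with an ASCII capital letter or a digit.
# Assumption: names starting with accented capitals (e.g. "É") are not caught
# by this simple pattern and would need a broader character class.
CANDIDATE = re.compile(r"\b[A-Z0-9]\w*")


def find_candidates(text):
    """Return (position, token) pairs for possibly disclosive tokens."""
    return [(match.start(), match.group()) for match in CANDIDATE.finditer(text)]


def highlight(text, left=">>", right="<<"):
    """Return a copy of the text with each candidate wrapped in markers."""
    return CANDIDATE.sub(lambda match: f"{left}{match.group()}{right}", text)


if __name__ == "__main__":
    sample = "So my first workplace was Arronal, about 20 minutes from Norwich."
    for position, token in find_candidates(sample):
        print(position, token)
    print(highlight(sample))
```

Sentence-initial words such as "So" are flagged along with real names, so the output is a list of candidates to check rather than a list of identifiers. Deciding what to replace, and with what, remains a manual judgement.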

In an interview transcript a person’s name is replaced with a pseudonym or with a tag that typifies the person [farmer Bob, paternal grandmother, council employee]. This is also done when reference is made to other identifiable people.

An exact geographical location may be replaced with a meaningful descriptive term that typifies the location [southern part of town, near the local river, a moorland farm or his native village]. See this example with markups.

Examples of ‘over’ and ‘under’ anonymisation

Original: So my first workplace was Arronal, which was about 20 minutes from my home in Norwich. My best colleagues from day one were Andy, Julie and Louise and in fact, I am still very good friends with Julie to this day. She lives in the same parish still with her husband Owen and their son Ryan.

Example A, too heavy: So my first workplace was X, which was about X minutes from my home in X. My best colleagues from day one were X, X and X and in fact, I am still very good friends with X to this day. X lives in the same parish still with her husband X and their X X.

Example B, too light: So my first workplace was [name], which was about 20 minutes from my home in Norwich. My best colleagues from day one were Andy, Julie and Louise and in fact, I am still very good friends with Julie to this day. She lives in the same parish still with her husband Owen and their son Ryan.

Anonymisation of audio-visual data, such as editing of digital images or audio recordings, should be done sensitively. Bleeping out real names or place names is acceptable, but disguising voices by altering the pitch in a recording, or obscuring faces by pixellating sections of a video image, significantly reduces the usefulness of data. These processes are also highly labour intensive and expensive.

If confidentiality of audio-visual data is an issue, it is better to obtain the participant’s consent to use and share the data unaltered. Where anonymisation would result in too much loss of data content, regulating access to data can be considered as a better strategy.

We urge researchers to consider and judge at an early stage the implications of depositing materials containing confidential information and to get in touch to consult on any potential issues.