Research data management
The importance of managing and sharing data
In the digital age, the generation of research data has grown exponentially, and data are now easily stored, kept and exchanged around the world. Digital infrastructures and the internet facilitate both the creation of ever larger amounts of research data and their sharing.
Since 2000 we have seen a boom in both the drivers of data sharing and the development of the human and technological capability to do so. Data sharing is increasingly encouraged or required by research funders and journal publishers, and also by the research community itself. Research funders want to maximise the scientific outputs and benefits to society from their investments and make sure that data can be reused for future research.
Data lifecycle
Increase opportunities for learning and innovation
Data often have a longer lifespan than the research project that creates them. Researchers may continue to work on data after funding has ceased. Follow-up projects may also analyse or add to the data, and data are often reused by other researchers.
Well organised, well documented, preserved and shared data are invaluable to advance scientific inquiry and to increase opportunities for learning and innovation.
Plan to share
Plan ahead to create high-quality, shareable research data
In research projects, early planning ensures that activities are considered in detail and well organised, so that the work is completed efficiently and successfully.
The same applies to planning how research data will be managed over the length of a research project and beyond. In this digital age, most research projects are data centric, so research needs to be planned around the data. A data management plan is the ideal planning tool for this purpose.
Rights in data
Rights relating to research data
Intellectual property (IP) rights apply to research data and play a role when creating, sharing and reusing data.
Many kinds of data created as part of a research project are subject to the same rights as literary or artistic work. Such items acquire rights like copyright or more general intellectual property rights when they are created. This gives the rights owner control over the exploitation of their work, such as the right to copy and adapt the work, the right to rent or lend it, the right to communicate it to the public and the right to license and distribute it.
These rights need to be taken into account when creating, using and sharing data.
Collaborative research
Provide a data management framework for researchers
Large-scale and collaborative research is becoming more commonplace, with many projects taking a cross-national and interdisciplinary approach.
This brings additional data management challenges: providing shared storage and access, and transferring research data between the various partners or institutions.
Due to the nature of these projects, the coordination and streamlining of data management become important tasks.
Data Protection
The legal landscape
Much research data – even sensitive data – can be shared legally if researchers employ strategies of informed consent, anonymisation and controlling access to data.
Researchers obtaining data from people are expected to comply with the relevant legislation, such as data protection legislation (e.g. the UK General Data Protection Regulation (UK GDPR) and the UK Data Protection Act 2018). Carrying out an assessment of disclosure risk can help to apply best practices of gaining consent, anonymising data and regulating access to enable data to be shared.
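One simple starting point for assessing disclosure risk in tabular data is to count how many records share each combination of indirect identifiers; combinations held by very few records are the most likely to identify someone. The sketch below is illustrative only. The field names (`occupation`, `age_band`, `region`), the example records and the threshold are all assumptions, and this counting approach is not a method prescribed by the legislation or by this guidance.

```python
from collections import Counter


def risky_combinations(records, indirect_identifiers, threshold=2):
    """Return identifier combinations shared by fewer than `threshold` records."""
    counts = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < threshold}


# Hypothetical example data; field names and values are invented.
records = [
    {"occupation": "teacher", "age_band": "30-39", "region": "North"},
    {"occupation": "teacher", "age_band": "30-39", "region": "North"},
    {"occupation": "surgeon", "age_band": "60-69", "region": "South"},
]

flagged = risky_combinations(records, ["occupation", "age_band", "region"])
print(flagged)  # {('surgeon', '60-69', 'South'): 1}
```

Records flagged in this way are candidates for further generalisation, for consent checks, or for release under controlled access rather than openly.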
Some ethical issues, such as the duty of confidentiality, are legally binding.
Ethical issues
Upholding ethical standards
Most researchers will confront ethical considerations in practice when writing a research proposal, applying for ethics approval, or having to deal with ethical dilemmas that arise during a research project.
Collecting, using and sharing data in research with people all require that researchers meet ethical obligations. These responsibilities relate to respecting people, being fair with both research participation and the benefits of research, and minimising harm.
Upholding scientific standards (integrity) and complying with the law (data protection and intellectual property) are often also considered ethical duties.
In the aftermath of the Facebook and Cambridge Analytica scandal, which involved inappropriate data sharing, it is even more imperative that the research community upholds ethical standards.
Interactive module
Ethical consent and data sharing
Storing data
Keep your digital data safe, secure and recoverable
Ensuring your data are safe is crucial to any research project. A good storage and backup strategy will help prevent potential data loss.
Ensuring the security of data requires attention to physical security, network security, and the security of computer systems and files, in order to prevent unauthorised access to data, unwanted changes, disclosure or destruction. Data security arrangements need to be proportionate to the nature of the data and the risks involved.
Encryption can be used for safely storing and sending files. Regular backups protect against accidental or malicious data loss and this procedure can be easily automated. Data needs to be securely destroyed once it is no longer needed, as merely deleting files and reformatting a hard drive will not prevent data recovery.
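As an illustration of how a backup step might be automated, the sketch below copies a file to a backup folder and then compares SHA-256 checksums to confirm that the copy is intact. It uses only the Python standard library. The function names and the idea of a single local backup folder are assumptions made for the example; a real strategy would normally also keep copies in a separate location and, for sensitive data, encrypt them.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_dir):
    """Copy `source` into `backup_dir` and verify the copy by checksum."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} failed verification")
    return target
```

A function like `backup_file` could be called from a scheduled job so that backups do not depend on anyone remembering to run them.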
Formatting data
Create well organised and sustainable data
Research data exist in many different forms: textual, numerical, databases, geospatial, images, audio-visual recordings and data generated by machines or instruments. Digital data exist in specific file formats, which are coded so that a software program can read and interpret these data.
Using standard, interchangeable or open lossless data formats ensures longer-term usability of data. For long-term preservation, digital data are converted to such formats.
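For tabular data, one commonly used open format is comma-separated values (CSV) saved as UTF-8 text. The sketch below shows one way such an export might look using Python's standard `csv` module. The function name, the column names and the example rows are hypothetical, and CSV is offered here only as an example of an open format, not as a recommendation for every kind of data.

```python
import csv


def export_to_csv(rows, fieldnames, path):
    """Write a list of dictionaries to a UTF-8 CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical example rows; column names and values are invented.
rows = [
    {"participant": "P01", "response": "Agree"},
    {"participant": "P02", "response": "Disagree"},
]
```

Calling `export_to_csv(rows, ["participant", "response"], "responses.csv")` would produce a plain text file that can be opened without any proprietary software.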
Data files should be clearly named, well organised and structured, and quality- and version-controlled throughout the research. It is vital to develop suitable procedures before data gathering starts in order to adhere to any conventions, instructions, guidelines or templates that will help to ensure quality and consistency across a data collection.
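A naming convention is easier to follow consistently when file names are generated rather than typed by hand. The sketch below assumes a hypothetical convention of project, description, ISO date and two-digit version number separated by underscores; the convention itself and the function name are illustrative assumptions, not a prescribed standard.

```python
import re
from datetime import date


def build_file_name(project, description, created, version, extension):
    """Build a file name of the form project_description_YYYY-MM-DD_vNN.ext."""

    def slug(text):
        # Lower-case the text and replace runs of other characters with hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(description)}_"
        f"{created.isoformat()}_v{version:02d}.{extension}"
    )


name = build_file_name(
    "HealthSurvey", "Interview transcripts", date(2024, 3, 1), 2, "csv"
)
print(name)  # healthsurvey_interview-transcripts_2024-03-01_v02.csv
```

Putting the date in ISO order and zero-padding the version number means that files sort chronologically and by version in an ordinary directory listing.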
Anonymising data
Preserving the privacy of participants
Anonymisation is a valuable tool that allows data to be shared, whilst preserving privacy. The process of anonymising data requires that identifiers are changed in some way, such as being removed, substituted, distorted, generalised or aggregated.
A person’s identity can be disclosed from:
- Direct identifiers such as names, postcode information or pictures.
- Indirect identifiers which, when linked with other available information, could identify someone, for example information on workplace, occupation, salary or age.
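The removal, substitution and generalisation techniques described above can be sketched for a single tabular record as follows. Everything specific here is an assumption made for illustration: the field names, the ten-year age bands, and the use of a keyed hash (HMAC) to substitute the participant identifier. Note that a keyed pseudonym is a form of pseudonymisation rather than full anonymisation, and that the retained indirect identifiers (such as occupation) still need to be assessed for disclosure risk.

```python
import hashlib
import hmac

# Hypothetical names of fields holding direct identifiers.
DIRECT_IDENTIFIERS = {"name", "postcode"}


def pseudonym(participant_id, secret_key):
    """Substitute an identifier with a keyed hash that cannot be read back."""
    digest = hmac.new(
        secret_key.encode("utf-8"),
        str(participant_id).encode("utf-8"),
        hashlib.sha256,
    )
    return digest.hexdigest()[:12]


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record, secret_key):
    """Remove direct identifiers, pseudonymise the ID and generalise age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonym(record["participant_id"], secret_key)
    cleaned["age"] = age_band(record["age"])
    return cleaned


# Hypothetical example record; all values are invented.
record = {
    "participant_id": 17,
    "name": "Jane Example",
    "postcode": "AB1 2CD",
    "age": 34,
    "occupation": "nurse",
}
result = deidentify(record, secret_key="keep-this-key-separate-from-the-data")
```

The secret key would need to be stored securely and separately from the shared data, since anyone holding it could regenerate the pseudonyms from known identifiers.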
Balancing anonymisation with keeping data useful
You decide which information to keep for data to be useful and which to change. Removing key variables, applying pseudonyms, generalising and removing contextual information from textual files and blurring image or video data could result in important details being missed or incorrect inferences being made.
Anonymising research data is best planned early in the research process, to help reduce anonymisation costs. It should also be considered alongside obtaining informed consent for data sharing or imposing access restrictions.
Personal data should never be disclosed from research information, unless a participant has given consent to do so, ideally in writing.
Interactive modules
De-identification and anonymisation of transcript data
De-identification and anonymisation of quantitative data
Documenting data
Make data clear to understand and easy to use
A crucial part of ensuring that research data can be shared and reused by a wide range of researchers for a variety of purposes is taking care that those data are accessible, understandable and (re)usable.
Original researchers wishing to return to their data some time later, or new users wanting to use data, need sufficient contextual and explanatory information to make sense of those data.
Documentation deposited alongside data files should enable users, with no prior knowledge of the research project and data collected, to understand exactly how the research was carried out and what the data mean, in order to (re)use the data correctly in their respective projects and for their respective purposes.
This requires clear and detailed data description and annotation. Besides the information that is needed to reuse the data, data also need to be accompanied by information for citing and discovering the data.
To prepare data for secondary research, researchers should document data appropriately. They should also explain the procedures and fieldwork methods, the objectives and methodology of the research, and explicitly describe the meanings of variables and codes used. Additionally, they should describe any derivation, transformations, de-identification (pseudonymisation/anonymisation) or data cleaning carried out.
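Variable-level documentation of this kind can be kept in a machine-readable codebook stored alongside the data files. The sketch below is one possible shape for such a codebook, saved as JSON, together with a small check that every column in a data file has an entry. The variable names, labels, codes and function names are all hypothetical, and the layout is an assumption rather than an established documentation standard.

```python
import json

# Hypothetical codebook: variable names, labels and codes are invented.
codebook = {
    "sex": {
        "label": "Sex of respondent",
        "codes": {"1": "Female", "2": "Male", "9": "Not stated"},
    },
    "age_band": {
        "label": "Age in ten-year bands",
        "derivation": "Derived from exact age, which was removed",
    },
}


def undocumented_variables(column_names, codebook):
    """Return the columns that have no entry in the codebook."""
    return [name for name in column_names if name not in codebook]


def save_codebook(codebook, path):
    """Write the codebook as readable UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(codebook, handle, indent=2, ensure_ascii=False)


missing = undocumented_variables(["sex", "age_band", "region"], codebook)
print(missing)  # ['region']
```

Running a check like this before deposit helps to catch variables that were added late in a project and never described.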
They should also ensure that data are held in an organised manner. Documentation is invaluable in enabling secondary users to contextualise data and to make better-informed reuse of the material.
Any consent and confidentiality concerns that may inhibit archiving data should be resolved before the deposit is made. See our guidance on consent for data sharing.
Creating comprehensive data documentation is easiest when begun at the onset of a project and continued throughout the research.
Interactive module
Best practices for documenting data collections