Prepare your data collection for deposit

Whether depositing large-scale population-representative collections in our curated repository or smaller research collections via our self-deposit repository, ReShare, data creators should follow our guidance below on preparing data. Ideally, plan this preparation before the start of fieldwork or data collection.

In addition to the summary points noted below, we provide comprehensive best practice guidance for individual researchers, teams and research support staff, which can be found in the Research data management section of our Learning Hub.

We run a regular training workshop programme covering key areas of managing and sharing research data. Please contact us if you would like to discuss any of these issues further.

Data collected from people, including personal and/or sensitive data, can be shared ethically and legally by paying attention to three critical aspects:

  • Inform research participants about data-sharing plans. Read about informed consent and data sharing; while other lawful bases might be used for processing personal data, it is an ethical duty to keep participants informed on how their data will be used. Careful attention must be given when processing special category personal data as an additional lawful basis must be identified as per Article 9 of the General Data Protection Regulation (GDPR).
  • De-identify or anonymise the data as needed. Remove or mask identifiable information to ensure participants’ privacy while maintaining the utility of the data for research.
  • Consider appropriate restrictions to access the data.

Additionally, Research Ethics Committees (RECs) may place particular demands on the handling of research data, such as requiring that data be destroyed, which can create tension between data protection and data sharing. Raise data sharing during the ethical review and consult our guidance for RECs on sharing research data without breaching ethical or legal responsibilities.

Proper preparation of data files is essential to ensure they meet the required standards for deposit and reuse. Please allocate enough time during and at the end of your project to carry out the following steps, including thorough quality control checks for data capture and cleaning.

  • Use consistent and meaningful filenames that reflect the content of the files. Avoid spaces and special characters, and indicate sensitivity or restrictions in filenames, if applicable.
  • Include detailed metadata, such as descriptions of variables and, for transcripts, a clear header with key information at the beginning of each transcript, together with speaker identification.
  • Verify that copyright permissions are secured for all relevant data, particularly for third-party content.
  • Use recommended file formats and ensure no data or internal metadata is lost or altered during file format conversions.
  • Ensure the data files are complete and ready for a single deposit, with any issues resolved before submission.
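
The filename advice above can be checked automatically before deposit. The following is a minimal sketch in Python, assuming a convention of letters, digits, hyphens and underscores followed by a single extension; that convention and the function names are illustrative assumptions, not a UK Data Service requirement.

```python
import re

# Assumed convention (not taken from the guidance): letters, digits, hyphens
# and underscores, followed by one dot-separated extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def problem_filenames(names):
    """Return the filenames that contain spaces or special characters."""
    return [name for name in names if not SAFE_NAME.match(name)]


def suggest_filename(name):
    """Suggest a replacement: each run of disallowed characters becomes one underscore."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return f"{cleaned}.{ext}" if ext else cleaned


names = ["int01_transcript.rtf", "Interview 1 (final).docx"]
for name in problem_filenames(names):
    print(name, "->", suggest_filename(name))
```

A suggested name should still be reviewed by hand so that it remains meaningful and, where applicable, indicates sensitivity or restrictions.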

Additional checklist for preparing quantitative data files

  • Use meaningful and self-explanatory variable names, value labels, and abbreviations.
  • Ensure variable and value labels are accurate and consistent across datasets for both questionnaire and derived variables.
  • Remove temporary, administrative, or dummy variables that are irrelevant to secondary researchers.
  • Eliminate redundant or duplicate derived variables.
  • Include weights as variables, but do not apply them in the data files.
  • Provide anonymised primary sampling unit (PSU) information where possible to support advanced analyses.
  • Apply anonymisation techniques, such as aggregating categories, top/bottom coding outliers, and anonymising unique identifiers.
  • Check textual variables to ensure no disclosive information or internal comments are included.
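
Two of the anonymisation techniques listed above, top/bottom coding and aggregating categories, can be sketched in a few lines of Python. The thresholds, band width and function names below are hypothetical; appropriate values depend on the dataset and its disclosure risk.

```python
def top_bottom_code(values, lower, upper):
    """Clamp numeric values to [lower, upper] so extreme cases are no longer unique."""
    return [min(max(value, lower), upper) for value in values]


def band_age(age, width=10, top=80):
    """Aggregate an exact age into a band; ages at or above `top` share one band."""
    if age >= top:
        return f"{top}+"
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


incomes = [5, 50, 500]  # illustrative values only
print(top_bottom_code(incomes, lower=10, upper=100))
print([band_age(age) for age in (37, 80, 92)])
```

Recording the thresholds and bands used in the study-level documentation lets secondary researchers understand how the variables were altered.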

Additional checklist for preparing qualitative transcript data files

  • Follow consistent formatting for textual data, such as interview transcripts or observational notes.
  • Include clear headers with metadata at the beginning of each transcript.
  • Check our recommended transcription format for qualitative textual data.
  • Apply anonymisation techniques and remove or aggregate personally identifiable information and sensitive content.
  • Use pseudonyms or codes consistently to replace names or identifiers.
  • Use clear and consistent methods to indicate non-verbal communication (e.g. [pause], [laughter]) or contextual notes.
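
Consistent use of pseudonyms across transcripts is easier with a single replacement table applied to every file. This is a minimal sketch assuming a hand-built mapping of real names to pseudonyms; the mapping, the bracketed pseudonym style and the function name are illustrative, and automated replacement does not remove the need to read each transcript for indirect identifiers.

```python
import re


def pseudonymise(text, mapping):
    """Replace each real name in `mapping` with its pseudonym, matching whole words only."""
    if not mapping:
        return text
    # Longest names first so "Anna Smith" is matched before "Anna".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(0)], text)


# Hypothetical replacement table, kept securely and separately from the deposited files.
mapping = {
    "Anna Smith": "[Participant A]",
    "Anna": "[Participant A]",
    "Leeds": "[City 1]",
}
print(pseudonymise("Anna Smith moved to Leeds. [pause] Anna liked it.", mapping))
```

Matching is case-sensitive and limited to whole words, so a name such as "Annabel" is left untouched; spelling variants and nicknames need their own entries in the table.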

Data documentation should provide secondary researchers with sufficient information to understand and reuse the data effectively and ethically. Consider what information is necessary to explain the data’s meaning, including the methods and tools used to collect and prepare it.

Study-level documentation

Study-level documentation provides an overview of the research and its context. It may be a technical report, user guide, or a combination of supplementary materials that outline the purpose, content, and processes underlying the data collection and preparation.

  • Include details about the purpose of the data collection, such as the project’s history, aims, objectives, and research questions or hypotheses. Specify the names of investigators and funders, information about who collected the data, the geographic locations of data collection, the temporal coverage or dates of collections, and any publications or outputs associated with the research.
  • Provide an overview of the content of the data files, including the types of data (e.g. interviews, surveys, observational data, multimedia), the structure of the data files (e.g. number of records, cases, or variables), and any relationships among data elements.
  • Describe how the data was collected, including methodologies and protocols such as template consent forms and Participant Information Sheets, sampling design, sample structure and representation, and the tools or software used. If applicable, include details of digitisation, transcription, or coding methods and the use of any secondary data sources.
  • Provide details on data processing and preparation, including editing, cleaning, coding, or classification processes. Include information about any anonymisation methods and the tools or software used.
  • Document the quality assurance procedures applied, such as validation, checking, or cleaning processes. Address any measures to ensure accuracy, such as calibration, transcription checking, or resolution adjustments. If the data was modified, explain any changes made to the methodology, variables, or labelling. For longitudinal or time-series data, include details of any changes over time, such as adjustments to question wording or sampling methods.
  • Include information about how the data can be accessed. Specify where the data collection is available, the persistent identifier and details about access conditions, licensing, and terms of use. Provide copyright information and how to cite the data collection.

Where multiple files are archived, create and deposit a ReadMe file (Word) to provide an overview of the data collection.

Responsible repositories might be unable to accept data for deposit unless templates of consent forms or statements and the Participant Information Sheet used during the study are provided. These materials are essential to demonstrate that ethical and legal requirements for data sharing have been met. If providing these documents is not possible, please get in touch to discuss specific circumstances and explore potential solutions.

Additional study-level documentation for quantitative data collections

  • Detailed codebooks or data dictionaries with links to the questions from which variables were derived.
  • Syntax and scripts for derived variables and any transformations applied.
  • For longitudinal studies, a summary of changes in methodology, variable content, or sampling over time.
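
A simple codebook can be generated from variable metadata held in code rather than typed by hand. The sketch below writes one row per variable to CSV; the column names, the example variables and the `write_codebook` function are assumptions made for illustration, not a prescribed format.

```python
import csv
import io


def write_codebook(variables, out):
    """Write one codebook row per variable to the open text file `out`."""
    writer = csv.writer(out)
    writer.writerow(["variable", "label", "source_question", "values"])
    for var in variables:
        # Value labels are flattened to "code=label" pairs separated by "; ".
        values = "; ".join(
            f"{code}={label}" for code, label in var.get("values", {}).items()
        )
        writer.writerow([var["name"], var["label"], var.get("question", ""), values])


# Hypothetical variables: one derived from a questionnaire item, one weight.
example = [
    {"name": "agegrp", "label": "Age group (derived)", "question": "Q2",
     "values": {1: "16-29", 2: "30-59", 3: "60+"}},
    {"name": "wt_main", "label": "Main survey weight"},
]
buffer = io.StringIO()
write_codebook(example, buffer)
print(buffer.getvalue())
```

The `source_question` column is where the link back to the originating questionnaire item would go; for derived variables it can name every item used in the derivation.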

Additional study-level documentation for qualitative data collections

  • Data list to serve as a finding aid.
  • For collections including audio or video recordings, provide supplementary documentation. The data list can be modified accordingly to include information about recording dates, locations, file formats, and any technical specifications.
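
Before deposit, the data list can be compared against the files it describes. The following sketch assumes the data list has been loaded as a list of dictionaries; the column names in `FIELDS` and the function name are hypothetical and should be adapted to the data list actually used.

```python
# Assumed data list columns (illustrative, not a prescribed template).
FIELDS = ["interview_id", "pseudonym", "interview_date", "location", "filename"]


def data_list_problems(rows, filenames):
    """Return readable problems found when comparing the data list with the files."""
    problems = []
    listed = [row["filename"] for row in rows if row.get("filename")]
    for name in sorted(set(filenames) - set(listed)):
        problems.append(f"file not in data list: {name}")
    for name in sorted(set(listed) - set(filenames)):
        problems.append(f"listed file missing: {name}")
    for row in rows:
        empty = [field for field in FIELDS if not row.get(field)]
        if empty:
            problems.append(f"{row.get('interview_id', '?')}: empty {', '.join(empty)}")
    return problems


rows = [
    {"interview_id": "int01", "pseudonym": "A", "interview_date": "2024-03-01",
     "location": "North West", "filename": "int01.rtf"},
    {"interview_id": "int02", "pseudonym": "B", "interview_date": "",
     "location": "Wales", "filename": "int02.rtf"},
]
for problem in data_list_problems(rows, ["int01.rtf", "int03.rtf"]):
    print(problem)
```

An empty result means every file is listed once and every listed entry has the assumed fields filled in; it says nothing about whether the values themselves are accurate or non-disclosive.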

Additional study-level documentation for secondary data collections

  • Variable log (Excel) to ensure transparency and ethical use of the original data used.

Every data collection requires a plan for its pathway to access. Planning appropriate access should take into account ethical and legal considerations.

Most data collection organisations have guidance on disclosure control for the data they publish. These bodies advise on the appropriate level of detail to be included in published data for any given access level. Obtain legal advice where necessary.

Where possible, categorise your data according to its information classification level, which can then be used to determine the most appropriate access conditions for onward sharing. We encourage depositors to consider preparing multiple versions of data under different access levels according to our licensing and access framework.

Before data can be made available to secondary researchers:

  • Choose appropriate access conditions.
  • Clear any rights permissions and confirm copyright.
  • Choose the most appropriate licence (PDF).

Contact our Collections Development team if you have any questions.

Funded by UKRI through the ESRC with contributions from our partners.

© 2025 UK Data Service. We are supported by the University of Essex, University of Manchester, Jisc, UCL and University of Edinburgh. We are funded by UKRI through the Economic and Social Research Council.