Guidance on preparing and managing data

Whether you are depositing large-scale survey data in our curated collection or smaller research collections via our self-deposit system, ReShare, consult our guidance below on preparing data. Ideally, do this before fieldwork or data collection starts.

In addition to the summary points noted below, we provide comprehensive best practice guidance aimed at individual researchers, teams and research support staff, which can be found in the Research data management section of our Learning hub.

We run a programme of regular training workshops covering key areas of managing and sharing research data. Please get in touch with us if you would like to discuss any of these issues further.

Checklist for preparing data files

Allow sufficient time during and towards the end of a project for the preparations listed below. Build in quality control checks for your data capture and cleaning processes.

  • Use consistent and meaningful filenames that reflect the file content, avoiding spaces and special characters; if data are sensitive or restricted, indicate this in the file name.
  • Use meaningful and self-explanatory variable names, codes and abbreviations.
  • Ensure internal consistency checks are completed.
  • Ensure variable and value labels are complete and consistent – both questionnaire and derived variables.
  • Remove any temporary, administrative or dummy variables that you created for internal purposes and that are of no use to other researchers.
  • Ensure no repetition of variables, especially redundancy in derived variables.
  • Check that the level of detail included in the data is suitable for the agreed access arrangements and licensing.
  • Apply an appropriate level of anonymisation, e.g. anonymise serial numbers so that they cannot be linked to other sources, apply top coding, or remove cases.
  • Provide anonymised Primary Sampling Unit information if possible, so that researchers can incorporate the sampling design into their analyses.
  • Check that any textual variables included are suitable for dissemination, e.g. no disclosive information or internal comments in free-text variables.
  • Ensure consistent treatment and labelling of missing values.
  • Include weights as variables but do not apply them in the deposited data files.
  • Use our recommended file formats.
  • Check our recommended transcription format for qualitative textual data.
  • If converting data across file formats, check that no data or internal metadata have been lost or changed.
  • Check whether copyright permission needs to be sought with regard to data ownership.
  • Finally, make sure that data are complete and try to ensure one deposit only, with any data issues resolved before deposit.
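
Some of the checklist items above can be partly automated. The following is a minimal sketch in Python, not an official UK Data Service tool: the function names, the filename rule (letters, digits, underscores and hyphens, followed by one extension) and the cap value are all assumptions made for illustration, and the right rules depend on your project and the agreed access arrangements.

```python
import re

# Assumed rule: letters, digits, underscores and hyphens, then a single extension.
SAFE_FILENAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_safe_filename(name):
    """Return True if the name contains no spaces or special characters."""
    return SAFE_FILENAME.fullmatch(name) is not None


def top_code(values, cap):
    """Replace every value above `cap` with `cap`; missing values (None) are kept."""
    return [v if v is None or v <= cap else cap for v in values]


if __name__ == "__main__":
    # Hypothetical filenames and ages, for illustration only.
    print(is_safe_filename("survey_2019_adults_restricted.csv"))  # True
    print(is_safe_filename("final data (v2).csv"))                # False
    print(top_code([23, 41, 97, None, 85], cap=80))               # [23, 41, 80, None, 80]
```

Top coding is only one possible anonymisation step; whether it is sufficient is a disclosure control decision, not something the code can settle.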

Requirements for publishing survey data in Nesstar

Nesstar is the UK Data Service’s online tool for browsing, analysing, subsetting and downloading data, which enables easy access to richly documented variables. It is used to publish the major survey series.

Users can produce instant tabulations and graphs. Full question text, universe and routing information are displayed alongside variable names, code values, labels and frequencies.

A selection of key data, typically from government departments, are made available through our Nesstar service. These require additional processing work to render them suitable for user-friendly online browsing. The requirements include:

  • Variable and value labels must be clear, consistent and not truncated.
  • Non-compliant characters, such as &, @ and <>, should be removed.
  • Question text should be made available in as structured a format as possible, e.g. XML or spreadsheet.
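
The character requirement above can be checked before deposit. The sketch below assumes labels are held in a plain dictionary keyed by variable name. The helper names are hypothetical, and the character set covers only the examples named above (&, @, < and >), which may not be the full list.

```python
import re

# Only the characters given as examples in the guidance; the full set may be wider.
NON_COMPLIANT = "&@<>"


def clean_label(label):
    """Remove non-compliant characters and collapse any leftover whitespace."""
    stripped = "".join(ch for ch in label if ch not in NON_COMPLIANT)
    return re.sub(r"\s+", " ", stripped).strip()


def find_non_compliant(labels):
    """Return the variable names whose labels contain a non-compliant character."""
    return [name for name, label in labels.items()
            if any(ch in NON_COMPLIANT for ch in label)]


if __name__ == "__main__":
    # Hypothetical variable labels, for illustration only.
    labels = {"hhsize": "Household size", "inc": "Income & benefits <banded>"}
    print(find_non_compliant(labels))    # ['inc']
    print(clean_label(labels["inc"]))    # Income benefits banded
```

Removing a character can change the sense of a label (here "&" is dropped rather than replaced with "and"), so it is worth reviewing each flagged label by hand.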

Confidential and sensitive data

These data can be shared ethically and legally by paying attention to three important aspects:

  • Inform research participants or discuss data sharing with them. Read about informed consent for participation in research and data sharing.
  • Anonymise or pseudonymise data where needed.
  • Consider controlling access to data.
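
As one illustration of pseudonymisation, serial numbers can be replaced with keyed hashes so that they cannot be matched to other sources without the key. This is a sketch under stated assumptions, not a recommended procedure: the function name, the example key and the 16-character truncation are hypothetical choices. Keyed hashing is pseudonymisation rather than anonymisation, because anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac


def pseudonymise_id(serial, secret_key):
    """Return a keyed SHA-256 digest of a serial number, truncated to 16 hex characters."""
    digest = hmac.new(secret_key, str(serial).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical key: in practice, generate it randomly and store it securely,
    # separately from the deposited data.
    key = b"replace-with-a-secret-random-key"
    for serial in (1001, 1002):
        print(serial, pseudonymise_id(serial, key))
```

The same serial number always maps to the same pseudonym under a given key, so records can still be linked within the collection.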

Research Ethics Committees (RECs) may place particular demands on how research data are handled, such as requiring data to be destroyed, which can create tension between data protection and data sharing. Raise data sharing during ethical review and consult our guidance for RECs on how to share research data without breaching ethical or legal responsibilities.

Preparing data documentation

Data documentation should give future researchers sufficient information to be able to understand and reuse the data.

Consider what kind of documentation from the research can help explain what the data mean and provide context. For example, what methods and tools were used to collect and prepare the data? For survey series, ensure that documentation refers to the current year’s data.

Types of survey documentation to include:

  • Survey technical report with standard headings, describing sampling, achieved sample size, fieldwork and weighting and so on – the level of detail may vary depending on the scale and resourcing of the survey. See the Health Survey for England Report for gold-standard documentation.
  • Information leaflets and consent forms.
  • Questionnaires with universe and routing instructions.
  • Showcards.
  • Interviewer instructions – if these are commercially sensitive, then a summary of briefing content.
  • Coding frames and coding instructions.
  • Links to primary reports and publications.
  • Information about questionnaire variables that have been removed and the reason for this e.g. confidentiality.
  • Information about any known errors or issues in the data.
  • Structured information for the UK Data Service collection-level metadata record, using DDI-compliant metadata and controlled vocabularies, as set out in our deposit form.

Types of data documentation

Survey and numeric data:

  • Variable list.
  • Links from variables to questions in the questionnaire (CAI or otherwise).
  • Codebook or data dictionary (we can also generate a data dictionary from the data files at the ingest stage).
  • Clear, unique definitions of variables.
  • Geographical identifiers, or spatial units, should be defined using a unique name, a referenced definition, or an authority and a date stamp showing when the unit boundaries were defined (not when the sampling was recorded).
  • Weighting variables described.
  • Syntax for any derived variables.
  • Where possible or practical, a description of changes in key content over time (questions and variables), e.g. the summary of changes in topics and sampling from the User Guide for the Health Survey for England.
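
A basic variable list or data dictionary can be drafted directly from a data file. The sketch below reads CSV text with Python's standard library and reports, for each variable, the number of non-missing values and the distinct values observed. It is not the process used at ingest: the function name, the sample data and the missing-value codes ("" and "NA") are assumptions, and a real codebook would also carry variable and value labels.

```python
import csv
import io


def build_data_dictionary(csv_text, missing=("", "NA")):
    """Summarise each variable: count of non-missing values and distinct values seen."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    summary = {name: {"n_valid": 0, "values": set()} for name in names}
    for row in reader:
        for name in names:
            value = row.get(name)
            # Skip absent cells and the assumed missing-value codes.
            if value is None or value in missing:
                continue
            summary[name]["n_valid"] += 1
            summary[name]["values"].add(value)
    return summary


if __name__ == "__main__":
    # Hypothetical three-case extract, for illustration only.
    sample = "serial,sex,age\n1,1,34\n2,2,NA\n3,2,51\n"
    for name, info in build_data_dictionary(sample).items():
        print(name, info["n_valid"], sorted(info["values"]))
```

Listing distinct values is useful for coded variables, but for continuous or identifying variables a range or a count is usually more appropriate.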

Qualitative data:

  • Topic guide for qualitative data.
  • Data list to serve as finding aid.

Access and licensing

Every data collection requires a plan for its pathway to access. Planning appropriate access should take into account ethical and legal issues.

Most data collection organisations have guidance on disclosure control for data they release, such as local Release Panels for government departments. These bodies advise on the appropriate level of detail to be included in published data for any given access level. Obtain legal advice where necessary.

Where possible, categorise your data according to their information classification level, which can then be used to determine the most appropriate access conditions for onward sharing. We encourage depositors to consider preparing multiple versions of data under different access levels according to our licensing and access framework. We can provide access to data under open, safeguarded or controlled licences and conditions.

Before data can be made available to users:

  • Choose appropriate access conditions.
  • Clear any rights permissions and confirm copyright.
  • Choose the most appropriate licence.

Get in touch with our Collections Development team if you have any questions.