Glossary

Data archives

A data archive is a centralised database system that collects, manages, and stores datasets for later use. Similar to a data repository.

Data licensing

Data licensing is a legal arrangement between the creator of the data and the end-user specifying what users can do with the data.

Source: How to FAIR.

Data linkage

Data linkage is the process of joining together records from different sources that pertain to the same entity.

Source: ONS; Understanding Society.
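
As an illustration of linkage, the following minimal sketch joins records from two sources on a shared identifier. The record layout, the `person_id` key and the `link_records` helper are hypothetical, chosen only for this example. Real linkage projects often have no clean shared key and rely on probabilistic matching instead.

```python
# Hypothetical records from two sources. "person_id" is an assumed shared
# identifier; real sources may not share a clean key like this.
survey_records = [
    {"person_id": 101, "age": 34},
    {"person_id": 102, "age": 58},
    {"person_id": 103, "age": 45},
]
admin_records = [
    {"person_id": 101, "region": "North West"},
    {"person_id": 103, "region": "Wales"},
    {"person_id": 104, "region": "Scotland"},
]


def link_records(left, right, key):
    """Join records from two sources that share the same value of `key`."""
    # Index the right-hand source by key so each lookup is a dictionary access.
    right_by_key = {record[key]: record for record in right}
    linked = []
    for record in left:
        match = right_by_key.get(record[key])
        if match is not None:
            # Combine the fields held about the same entity in both sources.
            linked.append({**record, **match})
    return linked


linked = link_records(survey_records, admin_records, "person_id")
for row in linked:
    print(row)
```

Only entities present in both sources appear in the result. Whether to keep unmatched records is a design choice that depends on the analysis.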

Data manipulation

Data manipulation is the process of arranging and organising data to make it easier to use, analyse and interpret.

Data mining

Data mining is the process of extracting useful information from large datasets, using any relevant data analysis technique, to help people make better decisions.

Source: SAGE Research Methods.

Data repository

A data repository is a centralised database system that collects, manages, and stores datasets for later use, similar to a data archive.

Dataset

Any computer file (or set of files) which is organised under a single title and is capable of being described as a coherent unit.

Derived variable

A variable that is created from one or more existing variables by applying a calculation or other data processing technique. For example, each respondent’s estimated annual income from savings and investments could be derived from several reported income variables.
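
The income example can be sketched in a few lines. The variable names (`savings_interest`, `dividends`, `bond_income`), the figures and the simple summation rule are all assumptions made for illustration. An actual study would document its own derivation.

```python
# Hypothetical reported income variables for two respondents (annual amounts).
respondents = [
    {"id": 1, "savings_interest": 120.0, "dividends": 300.0, "bond_income": 0.0},
    {"id": 2, "savings_interest": 45.5, "dividends": 0.0, "bond_income": 80.0},
]

# Assumed list of source variables that feed the derived variable.
INCOME_VARIABLES = ("savings_interest", "dividends", "bond_income")


def add_derived_income(record):
    """Return a copy of the record with a derived 'investment_income' variable."""
    derived = dict(record)
    # The derived variable is calculated from the existing variables,
    # which are kept unchanged alongside it.
    derived["investment_income"] = sum(record[name] for name in INCOME_VARIABLES)
    return derived


with_derived = [add_derived_income(record) for record in respondents]
for row in with_derived:
    print(row["id"], row["investment_income"])
```

Keeping the source variables next to the derived one lets other users check or repeat the calculation.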

Descriptive statistic

Descriptive statistics are those that describe data. Examples include means, medians, variances, standard deviations, correlation coefficients, etc.

Source: SAGE Research Methods.
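
As a small sketch of how some of these statistics are calculated, the example below uses Python's standard `statistics` module on an invented list of ages. The data and the `describe` helper are hypothetical. The variance and standard deviation shown are the sample versions.

```python
import statistics

# Invented example data: ages of six respondents.
ages = [23, 35, 35, 41, 52, 60]


def describe(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample variance and standard deviation (divide by n - 1).
        "variance": statistics.variance(values),
        "standard_deviation": statistics.stdev(values),
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```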

Documentation

Accompanying files that enable users to understand a dataset, exactly how the research was carried out, and what the data mean. Documentation usually consists of data-level documentation (about individual databases or data files) and study-level documentation (high-level information on the research context and design, the data collection methods used, and any data preparations and manipulations, plus summaries of findings based on the data).