Developing a new tool to assess data quality

The UK Data Service is leading a project to create a simple, open-source tool to assess the quality of quantitative data files, and to create and deliver associated training materials for it.

The year-long project, QAMYData, has been funded by the ESRC's National Centre for Research Methods (NCRM) and is led by Louise Corti, Director of Collections Development and Producer Relations at the UK Data Service. She is joined by co-investigator Vernon Gayle from the Applied Quantitative Methods Network (AQMeN) at the University of Edinburgh.

Louise says, “We know that across the social science community we need to improve transparency and replicability. Research funders and journal publishers now expect researchers to make explicit and share the data sources they have used to underpin their findings. But sharing data is often the last thing on the priority list of a busy researcher, so data often suffer from a ‘quick and dirty’ upload.”

“Research data get uploaded to repositories around the world, and almost every ‘data publisher’ uses a different way of checking the data it acquires. Data quality is not always rigorously assessed, partly because repository managers may lack the skills to appreciate disciplinary issues or the detail of the data.”

“The aim of this project is to pass on expertise in what makes a high-quality dataset to the research and data publishing communities, through an easy-to-use tool that assesses quantitative data for known quality issues.”

The tool will automatically detect some of the most common problems in numeric data and generate a ‘data health check’; it will also help shape the associated training materials. Data can be submitted multiple times until any identified problems have been remedied and the data have a ‘clean bill of health’. The tool will also produce a high-quality codebook/data dictionary to demonstrate quality assurance to a journal or data repository.

Louise adds: “The tool will be useful for anyone wanting to share their research data, or to reuse less-than-clean data. The associated training through the UK Data Service, AQMeN and NCRM can help to improve awareness of what makes high-quality data.”

The project runs until January 2019.
