Developing a new tool to assess data quality

The UK Data Service is leading a project to create a simple, open-source tool to assess the quality of quantitative data files, and to create and deliver associated training materials for it.

The year-long project, QAMYData, has been funded by the ESRC's National Centre for Research Methods (NCRM) and is led by Louise Corti, Director of Collections Development and Producer Relations at the UK Data Service. She is joined by co-investigator Vernon Gayle from the Applied Quantitative Methods Network (AQMeN) at the University of Edinburgh.

Louise says, “We know that across the social science community we need to improve transparency and replicability. Research funders and journal publishers now expect researchers to make explicit and share the data sources they have used to underpin their findings. But sharing data is often the last thing on the priority list of a busy researcher, so data often suffer from a ‘quick and dirty’ upload.”

“Research data get uploaded to repositories around the world, and almost every ‘data publisher’ uses a different way of checking the data it acquires. Data quality is not always rigorously assessed, partly because repository managers may lack the skills to appreciate disciplinary issues or the detail of the data.”

“The aim of this project is to pass on expertise in what makes a high-quality dataset to the research and data publishing communities, through an easy-to-use tool that assesses quantitative data for known quality issues.”

The tool will automatically detect some of the most common problems in numeric data and generate a ‘data health check’; it will also help shape the associated training materials. Data can be submitted multiple times until any identified problems have been remedied and the data have a ‘clean bill of health’. The tool will also produce a high-quality codebook/data dictionary to demonstrate quality assurance to a journal or data repository.

Louise adds: “The tool will be useful for anyone wanting to share their research data, or to reuse less-than-clean data. The associated training through the UK Data Service, AQMeN and NCRM can help to improve awareness of what makes high-quality data.”

The project runs until January 2019.
