The UK Data Service has developed a free, simple, open-source
tool that provides a data health check.
QAMyData automatically assesses the quality and integrity of numeric data,
identifying the most common errors found in numeric data files and metadata
and highlighting the most obvious disclosure issues.
It has been in development since January 2018, when staff
from the UK Data Service and AQMeN began
working on a tool which researchers could use to check the quality of their
data before using them or uploading them to a repository.
Louise Corti, Director of Data Publishing and Access at the UK Data Service, explains: “Evidence to show that
data have been assessed is important for a number of reasons. Not only does our
QAMyData health check help data creators show they have produced high-quality
data, and have met GDPR requirements when processing and de-identifying data,
it also supports the transparency and reproducibility agendas of funders
and journals. It’s also useful to have a tool simply to check third-party data
sources before using them.
“QAMyData automatically assesses key elements of quality in
numeric data. These include the amount of missing data, duplication, outliers
and direct identifiers. The tool can reduce the manual effort repository staff
spend checking data, and can also be integrated into data cleaning and processing
pipelines for data creators, users, reviewers and publishers. It’s useful for
hands-on training, allowing students to experiment with evaluating data for
integrity and metadata problems.
“To carry out a data health check, users simply edit a
configuration file to choose the tests and set an acceptable threshold for each
one, such as ‘no missing data’ or ‘data must be fully labelled’. Issues
are reported in both a summary and a detailed report. Files in Stata,
SPSS, SAS and CSV formats can be checked.
“QAMyData is open source, free to use and quite
straightforward, and it goes some way towards helping us all make sure data are FAIR:
findable, accessible, interoperable and re-usable.”
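QAMyData's own tests, configuration file format and report layout are not reproduced here. As a rough illustration of the kinds of checks Louise describes (missing data, duplication, outliers and direct identifiers, each judged against a user-set threshold), the following is a minimal Python sketch using only the standard library. The threshold names, the z-score outlier rule and the email-address pattern are illustrative assumptions, not QAMyData's actual tests or configuration keys.

```python
import csv
import io
import re
import statistics
from collections import Counter

# Hypothetical thresholds, loosely mirroring the idea of a configuration
# file. These keys are illustrative only, NOT QAMyData's real options.
DEFAULT_THRESHOLDS = {
    "max_missing_pct": 0.0,        # 'no missing data'
    "allow_duplicate_rows": False,
    "outlier_z": 3.0,              # flag values > 3 population SDs from the mean
}

# Crude direct-identifier check (an assumption): cells that look like emails.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def as_floats(values):
    """Return the values as floats, or None if any value is non-numeric."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def health_check(rows, thresholds=DEFAULT_THRESHOLDS):
    """Return (test, column, detail) tuples for every check that fails."""
    failures = []
    if not rows:
        return failures
    columns = list(rows[0].keys())

    for col in columns:
        cells = [(r[col] or "").strip() for r in rows]
        present = [c for c in cells if c]

        # Missing data: empty cells as a percentage of all rows.
        pct = 100.0 * (len(cells) - len(present)) / len(cells)
        if pct > thresholds["max_missing_pct"]:
            failures.append(("missing", col, f"{pct:.1f}% missing"))

        # Outliers: z-scores, only on columns whose non-empty cells are all numeric.
        nums = as_floats(present)
        if nums and len(nums) >= 2:
            mean = statistics.fmean(nums)
            sd = statistics.pstdev(nums)
            for x in nums:
                if sd > 0 and abs(x - mean) / sd > thresholds["outlier_z"]:
                    failures.append(("outlier", col, f"value {x:g}"))

        # Direct identifiers: any cell that looks like an email address.
        for c in present:
            if EMAIL.fullmatch(c):
                failures.append(("identifier", col, "email-like value"))

    # Duplicates: rows identical to an earlier row in every column.
    if not thresholds["allow_duplicate_rows"]:
        seen = set()
        for i, r in enumerate(rows):
            key = tuple(r[c] for c in columns)
            if key in seen:
                failures.append(("duplicate", None, f"row {i} repeats an earlier row"))
            seen.add(key)

    return failures


if __name__ == "__main__":
    data = "respondent,age,email\nr1,34,\nr2,35,\nr3,36,\nr3,36,\nr5,200,someone@example.org\n"
    rows = list(csv.DictReader(io.StringIO(data)))
    report = health_check(rows, {**DEFAULT_THRESHOLDS, "outlier_z": 1.5})
    # A per-test count, as a rough analogue of a summary report.
    print(Counter(test for test, _col, _detail in report))
    # The full list of failures, as a rough analogue of a detailed report.
    for failure in report:
        print(failure)
```

Keeping the thresholds in a plain dictionary separate from the checks reflects the idea of editing a configuration rather than code; a real tool would read them from a file.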
Find out more, or read Louise’s Impact Blog post about the new tool.
We are running a webinar introducing QAMyData on 2 December at 3pm GMT.