UK Data Service launches QAMyData quality tool

The UK Data Service has developed a free, simple, open
source tool that provides a data health check.

QAMyData automatically
assesses numeric data quality and integrity to address the most common errors
found in numeric data files and metadata, and to highlight the most obvious
disclosure issues.

It has been in development since January 2018, when staff
from the UK Data Service and AQMeN began
working on a tool which researchers could use to check the quality of their
data before using them or uploading them to a repository.

Louise Corti, Director of Data Publishing and Access at the UK Data Service,
explains: “Evidence to show that data have been assessed is important for a
number of reasons. Not only does our QAMyData health check help data creators
show that they have produced high-quality data and have met GDPR requirements
when processing and de-identifying data, it also supports the transparency and
reproducibility agendas of funders and journals. It’s also useful to have a
tool simply to check third-party data sources before using them.

“QAMyData automatically assesses key elements of quality in
numeric data. These include the amount of missing data, duplication, outliers
and direct identifiers. The tool can reduce the manual effort repository staff
spend checking data, and can also be integrated into data cleaning and
processing pipelines for data creators, users, reviewers and publishers. It’s
useful for hands-on training too, allowing students to experiment with
evaluating data for integrity and metadata problems.

“To carry out a data health check, users simply edit a
configuration file to specify acceptable thresholds for each test in the tool,
such as ‘no missing data’ or ‘data must be fully labelled’. Issues are
identified in both a summary report and a detailed report. Files generated in
Stata, SPSS, SAS and CSV formats can be checked.

“QAMyData is open source, free to use and quite
straightforward, and it goes some way to helping us all make sure data are
FAIR: findable, accessible, interoperable and re-usable.”

Find out more or read Louise’s Impact Blog post about the new tool.

We are running a webinar introducing QAMyData on 2 December at 3pm GMT.