Generating Synthetic Data for Statistical Disclosure Control
This short course provides a detailed overview of the synthetic data approach
to statistical disclosure control. It starts with a short introduction to data
confidentiality in general and synthetic data in particular, then discusses the
different approaches to generating synthetic datasets in detail. Possible
modelling strategies and ways to evaluate analytical validity will be assessed,
and measures to quantify the remaining risk of disclosure will be presented.
To give participants hands-on experience, the course will include practical
sessions using R, in which participants generate and evaluate synthetic data
based on real data examples.
Target Audience
The course summarizes the state of the art in synthetic data. The main focus
will be on practical implementation rather than on the motivation of the
underlying statistical theory. Participants may be academic researchers or
practitioners from statistical agencies working in the area of data
confidentiality and data access. Basic knowledge of R is expected. Some
background in Bayesian statistics is helpful but not required.
Further course details can be found here.
More information regarding our courses can be found here.
Podcasts for some of our previous courses can be found here.
Course Leader: Dr Jörg Drechsler (IAB)
Course Outline:
The course covers:
- the fully synthetic data approach
- the partially synthetic data approach
- modelling strategies for generating synthetic data
- data utility evaluations
- disclosure risk assessment
By the end of the course, participants will:
- have a practical understanding of the concept of synthetic data
- be able to judge in which situations the approach could be useful
- know how to generate synthetic data from their own data
- have a number of tools available to evaluate the analytical validity of
the synthetic datasets
- know how to assess the disclosure risk of the generated data