Introduction to synthetic data for Trusted Research Environments data experts
Format
Interactive presentations followed by a live coding demonstration via Zoom.
Overview
Are you working in a Trusted Research Environment? We are excited to invite you to this workshop on synthetic data, designed for those working in Trusted Research Environments (TREs). The workshop will introduce the fundamentals of synthetic data, with a particular focus on its relevance to TREs and data owners. This event is part of our broader project, Balancing the Data Scales: A Cost-Benefit Analysis of Low-Fidelity Synthetic Data for Data Owners and Providers, funded by the Economic and Social Research Council.
Workshop details
The workshop is structured into four parts:
- Introduction to our project on the benefits, costs and utility of synthetic data
We will begin with a brief presentation of our ongoing project, focusing on one of its aims: to critically assess the value and implications of using synthetic data within TREs.
- Introduction to synthetic data
This part will cover the basic concepts of synthetic data, showcase relevant examples, and discuss its significance, particularly in the context of TREs. We will explore key questions such as:
- What is synthetic data?
- What types of synthetic data are there?
- How can synthetic data benefit TREs and data owners?
- What are the cost and resource implications for TREs in managing synthetic data?
During this segment, we will also touch upon our project’s objectives, highlighting how synthetic data can benefit TREs and data owners.
- Live coding demonstration
After a short break, the workshop will switch to Jupyter notebooks to demonstrate several methods in Python for generating synthetic data of various forms.
- Q&A session
We will conclude the workshop with an open Q&A session, offering participants the opportunity to engage directly with the content and ask questions related to the presentation or live coding demonstration.
Presenters: Jools Kasmire, Cristina Magder and Hina Zahid.
Resources
Participants will have access to all workshop materials, including slide decks and Jupyter notebooks, via a GitHub repository. Additionally, a recording of the workshop will be made available post-event.
Prerequisites
No formal prerequisites are required to attend. However, those who wish to actively participate in the coding demonstration should have:
• Access to a computer with Python installed, or an online Python environment.
• Basic Python knowledge (e.g., loading packages and handling data).
Additional opportunity
As part of this project, we will also be conducting a focus group with TRE representatives to further explore the practical implications and efficiencies of synthetic data. More details will be provided during the workshop for those interested in participating.
Any questions regarding this workshop can be sent to comms@ukdataservice.ac.uk.