Introduction to synthetic data for Trusted Research Environments data experts

18 Nov 2024 10:00 am - 12:30 pm
Online
Training
Data skills
Synthetic data
Workshop

Format
Interactive presentations followed by a live coding demonstration via Zoom.

Overview

Are you working in a Trusted Research Environment (TRE)? We are excited to invite you to this workshop on synthetic data, designed for those working in TREs. The workshop will introduce the fundamentals of synthetic data, with a particular focus on its relevance to TREs and data owners. This event is part of our broader project, Balancing the Data Scales: A Cost-Benefit Analysis of Low-Fidelity Synthetic Data for Data Owners and Providers, funded by the Economic and Social Research Council.

Workshop details
The workshop is structured into four parts:

  1. Introduction to benefits, costs and utility of synthetic data project
    We will begin with a brief presentation on our ongoing project, focusing on one of its aims: to critically assess the value and implications of using synthetic data within TREs.
  2. Introduction to synthetic data
    This part will cover the basic concepts of synthetic data, showcase relevant examples, and discuss its significance, particularly in the context of TREs. We will explore key questions such as:
     - What is synthetic data?
     - What types of synthetic data are there?
     - How can synthetic data benefit TREs and data owners?
     - What are the cost and resource implications for TREs in managing synthetic data?
    During this segment, we will also touch upon our project’s objectives, highlighting how synthetic data can be beneficial to TREs and data owners.
  3. Live coding demonstration
    After a short break, the workshop switches over to a Jupyter notebook to demonstrate several methods in Python for generating synthetic data of various forms.
  4. Q&A session
    We will conclude the workshop with an open Q&A session, offering participants the opportunity to engage directly with the content and ask questions related to the presentation or live coding demonstration.
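
For a flavour of what the live coding demonstration might involve, here is a minimal sketch of one simple way to generate low-fidelity synthetic data in Python. It is not taken from the workshop materials: the function name synthesise_low_fidelity and the toy records are hypothetical, invented purely for illustration. The sketch samples each column independently from its observed values, so per-column distributions are roughly preserved while relationships between columns are deliberately broken.

```python
import random


def synthesise_low_fidelity(rows, n, seed=0):
    """Build n synthetic records by sampling every column independently.

    Hypothetical helper for illustration only. Each synthetic value is drawn
    from the values observed in that column, so relationships between
    columns in the original records are not carried over.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    columns = list(rows[0].keys())
    # Pool of observed values for each column.
    pools = {col: [row[col] for row in rows] for col in columns}
    return [
        {col: rng.choice(pools[col]) for col in columns}
        for _ in range(n)
    ]


# Toy records invented for this example; they describe no real people.
real_rows = [
    {"age_band": "18-29", "region": "North", "employed": True},
    {"age_band": "30-44", "region": "South", "employed": True},
    {"age_band": "45-64", "region": "North", "employed": False},
    {"age_band": "65+", "region": "East", "employed": False},
]

synthetic_rows = synthesise_low_fidelity(real_rows, n=5, seed=42)
for row in synthetic_rows:
    print(row)
```

Because each value is still copied from an original record, a sketch like this is not a disclosure-control method in its own right; it only illustrates the shape of the task.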

Presenters: Jools Kasmire, Cristina Magder and Hina Zahid.

Resources
Participants will have access to all workshop materials, including slide decks and Jupyter notebooks, via a GitHub repository. Additionally, a recording of the workshop will be made available post-event.

Prerequisites
There are no formal prerequisites for attending. However, those who wish to actively participate in the coding demonstration should have:
• Access to a computer with Python installed, or an online Python environment.
• Basic Python knowledge (e.g., loading packages and handling data).

Additional opportunity
As part of this project, we will also be conducting a focus group with TRE representatives to further explore the practical implications and efficiencies of synthetic data. More details will be provided during the workshop for those interested in participating.

Any questions regarding this workshop can be sent to comms@ukdataservice.ac.uk.