Using agent-based models as a simulation tool to generate synthetic data
This free online workshop explores the intersection of two ideas from computational social science. First, synthetic data are generated rather than observed, and can be useful when a project must balance research demands against data availability or security. Second, agent-based models are a kind of simulation that can generate synthetic data for diverse, anticipated or even counterfactual scenarios. Both are popular tools in social science research, but both benefit from careful attention to whether the synthetic data generation method and the research problem match in terms of being top-down or bottom-up.
Top-down problems arise in systems with a single, centralised controller and a clear, shared perspective. Examples include engines, some infrastructures, and some highly formalised organisations. Research on top-down problems can use data generated through top-down processes, such as most machine learning models or random number generators. Fortunately, these methods are typically fast, cheap and easy to understand.
In contrast, bottom-up problems emerge in systems with distributed interactions and unique perspectives. Examples include ecosystems, traffic jams, and social injustice. Research on bottom-up problems benefits most from data generated by bottom-up methods, including agent-based models and other distributed generation methods. These can sometimes be fast and cheap but may need a bit more explanation.
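The contrast between the two kinds of generation can be sketched in a few lines of Python. This is only an illustration, not workshop material: the function names, parameters and the simple wealth-exchange rule are all assumptions chosen for the example, and the workshop itself demonstrates NetLogo rather than Python. The top-down function draws every value from one centrally specified distribution, while the bottom-up function lets the values arise from repeated interactions between agents.

```python
import random


def top_down_incomes(n_people, mean, spread, seed=0):
    """Top-down: one central rule draws every value independently."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(n_people)]


def bottom_up_wealth(n_agents, n_steps, seed=0):
    """Bottom-up: values emerge from repeated agent-to-agent exchanges.

    Hypothetical rule for illustration: on each step, every agent that
    has at least one unit gives one unit to a randomly chosen other agent.
    """
    rng = random.Random(seed)
    wealth = [1] * n_agents  # every agent starts with one unit
    for _ in range(n_steps):
        for giver in range(n_agents):
            if wealth[giver] == 0:
                continue  # agents with nothing cannot give
            receiver = rng.randrange(n_agents)
            if receiver == giver:
                continue  # skip self-exchange
            wealth[giver] -= 1
            wealth[receiver] += 1
    return wealth


if __name__ == "__main__":
    incomes = top_down_incomes(n_people=5, mean=100.0, spread=15.0)
    print("top-down sample:", [round(x, 1) for x in incomes])

    wealth = bottom_up_wealth(n_agents=50, n_steps=100)
    print("bottom-up total:", sum(wealth), "richest agent:", max(wealth))
```

In the top-down case the shape of the data is written directly into the call (a mean and a spread). In the bottom-up case no distribution is specified anywhere; whatever spread of wealth appears is a product of the interactions, which is the property that makes agent-based generation a natural fit for bottom-up problems.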
Research projects should match the synthetic data generation methods they want to use to the problem they want to address. This workshop will help researchers on that path by laying out the key ideas behind top-down and bottom-up processes and how agent-based modelling can be a relatively easy way to generate bottom-up synthetic data for bottom-up research problems.
Workshop details
The workshop is structured into four parts:
- Introduction to synthetic data
What is and is not synthetic data? What is it good for? What are some common ways to generate it?
- Introduction to top-down versus bottom-up
What do top-down and bottom-up mean? How do these relate to research problems and data generation methods?
- Introduction to agent-based models
What are agent-based models? How do they work? What data do they generate? This part includes demonstrations of free and user-friendly agent-based modelling software.
- Q&A session
Participants can engage directly with the content, ask questions related to the presentation or live coding demonstration, and get links to useful resources.
There will be a comfort break during the workshop.
Presenter: Jools Kasmire, UK Data Service
Resources
Participants will have access to all workshop materials (slide decks, web-only agent-based models, and links to free agent-based modelling software). Additionally, a recording of the workshop will be made available post-event.
Prerequisites
There are no formal prerequisites for attending. However, those who wish to actively participate in the software demonstration should have access to a computer with NetLogo installed or have a browser open to the NetLogo web environment.
Level: Suitable for all
Experience/knowledge required: None, although participants who want to follow along with the live software demonstration may benefit from exploring NetLogo tutorials
Target audience: Anyone who wants to learn more about how to:
- generate synthetic data
- gain insight into matching data to a research problem, and/or
- use agent-based models to generate data for social science research
Any questions regarding this workshop can be sent to booking@ukdataservice.ac.uk.
This event will be livestreamed on our UK Data Service YouTube channel, but the chat there will be disabled. To ask questions and interact, register for and attend the Zoom event.
Recordings of UK Data Service events are made available on our YouTube channel and, together with the slides, on our past events pages soon after the event has taken place.