Handling missing data in administrative studies: multiple imputation and inverse probability weighting
Course Summary
This ADRC-E course will consider the issues raised by missing data (both item and unit non-response) in studies using routinely collected data, such as electronic health records. After reviewing these issues, we will focus on two methods of analysis: multiple imputation and inverse probability weighting. We will also discuss how the two can be used together. The concepts will be illustrated with examples from medical and social data.
Target Audience
The course is aimed at quantitative researchers with an interest in, or experience of, analysing administrative data. PhD students are also welcome. Detailed technical arguments will not be presented; instead, the focus will be on concepts and examples, and participants are encouraged to bring their own data for discussion.
This course includes computer workshops using the statistical software package Stata. Full details of all commands will be given, so no previous experience with Stata is necessary, though it will be an advantage.
Pre-requisites
Practical experience of regression modelling (including survival data modelling) and, preferably, multilevel modelling.
Course Leader: Professor James Carpenter
Course contents
- Issues raised by missing data in the administrative setting: when is a complete records analysis sufficient?
- Shortcomings of ad hoc methods
- Introduction to multiple imputation, including algorithms, common pitfalls, reporting and examples
- Introduction to inverse probability weighting for missing data, and its pros and cons vis-à-vis multiple imputation
- Combining inverse probability weighting and multiple imputation to improve robustness
- Strategies for large datasets, including the two-fold multiple imputation algorithm
- Discussion of participants' data