Combining Data from Multiple Administrative and Survey Sources for Statistical Purposes
Course Summary:
Day one provides a
general introduction to combining multiple administrative and survey datasets
for statistical purposes. A total-error framework is presented for integrated
statistical data, which provides a systematic overview of the origin and nature
of the various potential errors. The most typical data configurations are
illustrated and the relevant statistical methods reviewed.
Day two covers a
handful of selected statistical methods. Training will be given on the
techniques of data fusion, or statistical matching, by which joint statistical
data is created from separate marginal observations. The participants will be
introduced to several imputation or adjustment techniques, in the presence of
constraints arising from overlapping data sources.
Target Audience:
This course is ideal for social and
medical researchers with an interest in combining or analysing data from
multiple sources, and for staff at National Statistical Institutes
(or similar organisations) who are involved in the design, management and
quality assurance of statistical processes based on data from multiple sources,
including censuses, administrative data and sample surveys.
Pre-requisites:
An understanding of the following is required:
central concepts of statistical
uncertainty (such as bias, variance and confidence intervals) and distribution,
basic knowledge of data cleaning and imputation, and basic experience with R
for statistical computing. Methodological
training, knowledge and experience will be helpful.
Course Leader: Prof Li-Chun Zhang
Course Contents:
- Life-cycle of integrated statistical data and transformation processes
- A framework of error sources associated with data integration
- Population coverage and unit errors
- Uncertainty and techniques of categorical data fusion, or statistical matching
- Imputation and adjustment methods subject to micro- and macro-level constraints
By the end of the course participants will have gained:
- Understanding of potential errors and statistical uncertainty involved in data integration
- Ability to apply relevant concepts and methods in practice
- Appreciation of opportunities and challenges of inference based on data integration