Multiple Imputation Under Violated Distributional Assumptions: A Systematic Evaluation of the Assumed Robustness of Predictive Mean Matching

Kristian Kleinke

Abstract

Predictive mean matching (PMM) is a standard technique for the imputation of incomplete continuous data. PMM imputes an actually observed value, drawn from the k ≥ 1 cases (the so-called donor pool) whose predicted values are closest to the value predicted for the missing case. When the empirical data deviate from the distributional assumptions of fully parametric multiple imputation (MI) approaches, PMM is usually better able to preserve the original distribution of the data. Using PMM is therefore especially worthwhile in situations where the model assumptions of fully parametric MI procedures are violated and where those procedures would yield highly implausible estimates. Unfortunately, only a handful of studies to date have systematically tested the robustness of PMM, and it is still largely unknown where exactly the limits of the procedure lie. I examined the performance of PMM with data skewed to varying degrees, under different sample sizes and missing data percentages, and with different settings of the PMM approach. Small donor pools overall yielded better results than large donor pools, and PMM generally worked well unless the data were highly skewed and more than about 20% to 30% of the data had to be imputed. PMM also generally performed better when the sample size was sufficiently large.
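To make the matching step concrete, below is a minimal sketch of the donor-pool mechanism described above, written in Python with NumPy. The function name pmm_impute and all parameter names are illustrative and do not correspond to any particular software implementation. A proper multiple-imputation version would additionally draw the regression coefficients from their posterior distribution (or via a bootstrap) before computing the predictions; that step is omitted here for brevity.

```python
import numpy as np

def pmm_impute(y, X, k=5, seed=None):
    """Illustrative single predictive-mean-matching pass for one incomplete variable.

    y : 1-D float array with np.nan marking missing entries
    X : 2-D array of complete predictors
    k : donor pool size (number of closest observed cases to draw from)
    """
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    mis = ~obs

    # Fit an ordinary least-squares regression on the observed cases.
    Xd = np.column_stack([np.ones(len(y)), X])              # add intercept
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)

    # Predicted values for observed and missing cases.
    yhat_obs = Xd[obs] @ beta
    yhat_mis = Xd[mis] @ beta

    # For each missing case: find the k observed cases whose predicted values
    # are closest, then impute the *observed* y of one randomly drawn donor.
    donors_y = y[obs]
    imputed = []
    for yh in yhat_mis:
        dist = np.abs(yhat_obs - yh)
        pool = np.argsort(dist)[:k]                          # donor pool indices
        imputed.append(donors_y[rng.choice(pool)])

    y_imp = y.copy()
    y_imp[mis] = imputed
    return y_imp
```

Because the imputed values are always actually observed values, the procedure cannot produce impossible values (for example, negative counts) and tends to retain features such as skewness in the observed data; the donor pool size k is the main tuning setting varied in the study, with small pools yielding better results overall.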

Publication
Journal of Educational and Behavioral Statistics, 42(4), 371–404