Background
Both CD4 count and viral load in HIV-infected persons are measured with error. Multiple overimputation, a method from the political sciences that is closely related to multiple imputation, can be used to account for both missing-at-random data and measurement error in HIV research under a general framework. Multiple overimputation treats mismeasured data as an extreme case of missing data. Values measured with error are replaced with values obtained from an imputation model. This model incorporates the mismeasured values, together with knowledge and assumptions about the measurement error process, through prior distributions on individual measurements. After multiple overimputed datasets have been generated, standard multiple imputation combining rules can be applied to obtain valid inference under assumptions similar to missing at random. The method has three main advantages: (i) it is easy to implement with existing software, (ii) it is applicable to a wide range of analysis models and settings, including longitudinal data analyses, and (iii) it addresses measurement error and missing data simultaneously. The method has been tested in the political sciences, and first simulations showed promising results for linear and logistic regression models. However, little is known about its assumptions, behaviour and success in the context of HIV analyses, particularly survival analyses. We therefore aim 1) to identify an appropriate measurement error model for CD4 count and viral load; 2) to investigate the implications, assumptions and challenges related to implementing multiple overimputation in HIV research, using South African HIV treatment cohort data from patients starting highly active antiretroviral treatment (HAART); and 3) to quantify the association of both baseline and follow-up CD4 count and viral load with all-cause mortality, and to explore the possible bias resulting from ignoring measurement error and missing data in this illustrative example.
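The following sketch shows what a prior on an individual measurement means in the simplest case. It assumes classical, normally distributed measurement error with known variance, and a normal predictive distribution from the imputation model. Under those assumptions, combining the two gives a normal distribution whose mean is a precision-weighted average of the measured value and the model prediction. The function names and numbers are hypothetical. The code is not taken from any particular software package.

```python
import numpy as np

def overimpute_posterior(w, err_var, pred_mean, pred_var):
    """Combine a mismeasured value w with the imputation model's prediction.

    Assumption (illustrative): w = x + u with u ~ N(0, err_var), so the
    measurement contributes a normal density for x centred at w with variance
    err_var; the imputation model predicts x ~ N(pred_mean, pred_var).
    The product of the two normal densities is normal with this mean/variance.
    """
    w = np.asarray(w, dtype=float)
    pred_mean = np.asarray(pred_mean, dtype=float)
    prec_meas = 1.0 / err_var            # precision of the measurement
    prec_model = 1.0 / pred_var          # precision of the model prediction
    post_var = 1.0 / (prec_meas + prec_model)
    post_mean = post_var * (prec_meas * w + prec_model * pred_mean)
    return post_mean, post_var

def overimpute_draws(w, err_var, pred_mean, pred_var, M=5, seed=0):
    """Draw M overimputed versions of each mismeasured value (one row per draw)."""
    rng = np.random.default_rng(seed)
    mean, var = overimpute_posterior(w, err_var, pred_mean, pred_var)
    return rng.normal(mean, np.sqrt(var), size=(M,) + np.shape(mean))

# Hypothetical square-root CD4 measurements and model predictions
w = [20.0, 12.5, 8.0]
pred = [16.0, 13.0, 10.0]
mean, var = overimpute_posterior(w, err_var=4.0, pred_mean=pred, pred_var=4.0)
print(mean, var)
print(overimpute_draws(w, 4.0, pred, 4.0, M=5))
```

In this example the two variances are equal, so each posterior mean lies halfway between the measurement and the prediction. A smaller assumed error variance pulls the overimputed values towards the measured ones. A larger error variance lets the imputation model dominate.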
In addition, 4) simulations are used to evaluate the extent to which multiple overimputation is able to reduce bias arising from measurement error and missing data in a wide range of survival analysis settings.

Methods

Framework of multiple overimputation in general and for HIV research

Multiple overimputation

Multiple overimputation builds on multiple imputation. It interprets mismeasured values as missing data but includes the mismeasured values as prior information in the imputation model. The procedure is as follows:

1. Multiply impute (say, on M=5 occasions) missing values and multiply overimpute (i.e. replace, or overwrite) mismeasured values, based on an appropriate imputation model that uses assumptions about the mismeasured data as prior information.
2. Conduct any statistical inference (Cox model, Kaplan-Meier estimator, …) on each overimputed dataset.
3. Combine the M estimates from the M overimputed datasets according to standard multiple imputation combining rules ("Rubin's rules") 26.

For example, suppose we had 1000 patients and 800 of them had available baseline CD4 counts. We would impute the remaining 200. The 800 measured CD4 counts would be treated as mismeasured, because we know that they do not exactly represent a patient's true CD4 count but differ randomly from the true value. We would therefore overwrite these values with draws from an imputation model that uses our assumptions about the measurement error process as prior information. We would then perform our analysis on each overimputed dataset and combine the results accordingly.

Multiple imputation with Amelia II

It is known from multiple imputation theory that multiple imputations (yielding valid inference under the missing-at-random assumption) are realized via draws from the posterior predictive distribution of the unobserved data given the observed data 25.
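The three-step procedure and the 1000-patient example can be sketched end to end under a deliberately simple toy model. The model assumes true values x ~ N(mu, tau2), observed values w = x + N(0, err_var) with known err_var, and values missing completely at random. The sketch draws parameter uncertainty from a bootstrap. The helper names and the simulated data are hypothetical. The analysis step uses the proportion of patients below a threshold as a stand-in for the Cox model or Kaplan-Meier estimator named in the text. This proportion is affected by measurement error, which is why it was chosen here.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool M estimates and their within-imputation variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                       # pooled point estimate
    u_bar = u.mean()                       # mean within-imputation variance
    b = q.var(ddof=1)                      # between-imputation variance
    t = u_bar + (1 + 1 / m) * b            # total variance
    df = np.inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df

def overimpute_once(w, err_var, rng):
    """One dataset: impute NaNs and overimpute the observed values of w.

    Toy model (an assumption, not the paper's model): true x ~ N(mu, tau2),
    observed w = x + N(0, err_var), values missing completely at random.
    """
    miss = np.isnan(w)
    obs = w[~miss]
    boot = rng.choice(obs, size=obs.size, replace=True)   # parameter uncertainty
    mu = boot.mean()
    tau2 = max(boot.var(ddof=1) - err_var, 1e-6)          # variance of true values
    x = np.empty_like(w)
    x[miss] = rng.normal(mu, np.sqrt(tau2), size=miss.sum())
    post_var = 1 / (1 / tau2 + 1 / err_var)
    post_mean = post_var * (mu / tau2 + obs / err_var)
    x[~miss] = rng.normal(post_mean, np.sqrt(post_var))
    return x

# Hypothetical data mirroring the text: 1000 patients, 200 missing baseline values
rng = np.random.default_rng(2024)
n, err_var = 1000, 4.0
true_x = rng.normal(15.0, 4.0, size=n)                     # e.g. sqrt(CD4), illustrative
w = true_x + rng.normal(0.0, np.sqrt(err_var), size=n)
w[rng.choice(n, size=200, replace=False)] = np.nan
threshold = np.sqrt(200)                                   # CD4 < 200 on the sqrt scale

M = 5
est, var = [], []
for _ in range(M):
    x = overimpute_once(w, err_var, rng)                   # step 1: (over)impute
    p = np.mean(x < threshold)                             # step 2: analysis
    est.append(p)
    var.append(p * (1 - p) / n)
pooled, total_var, df = rubin_combine(est, var)            # step 3: combine
print(f"pooled proportion {pooled:.3f}, SE {np.sqrt(total_var):.3f}, df {df:.1f}")
```

The bootstrap inside `overimpute_once` is what makes the imputations reflect uncertainty in mu and tau2, not only in the individual values. Without it, Rubin's between-imputation variance would be understated.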
These draws can, for example, be generated by specifying a multivariate distribution for the data and simulating the posterior predictive distribution with a suitable algorithm. For our analysis we consider the Expectation Maximization Bootstrap (EMB) algorithm 27. In this algorithm, bootstrap samples of the data (including missing values) are drawn. In each bootstrap sample, the EM algorithm 29 is applied to obtain estimates of μ and Σ, the mean vector and covariance matrix of the assumed multivariate normal distribution. These estimates can then be used to generate proper imputations by means of the sweep operator 27 30. Of note, the algorithm can handle highly skewed variables by …
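The EMB steps described above can be sketched as follows, assuming a multivariate normal model for the data. This is a simplified illustration, not Amelia II's implementation. The helper names (`conditional_normal`, `em_mvn`, `emb_impute`) are hypothetical. Conditional distributions are computed with linear solves instead of the sweep operator. The sketch also omits priors, transformations for skewed variables and overimputation, so it only imputes missing cells.

```python
import numpy as np

def conditional_normal(mu, sigma, x, m, o):
    """Mean and covariance of the missing part x[m] given the observed part x[o]."""
    if not o.any():
        return mu[m], sigma[np.ix_(m, m)]
    s_oo = sigma[np.ix_(o, o)]
    s_mo = sigma[np.ix_(m, o)]
    coef = np.linalg.solve(s_oo, s_mo.T).T                 # S_mo S_oo^{-1}
    cond_mu = mu[m] + coef @ (x[o] - mu[o])
    cond_cov = sigma[np.ix_(m, m)] - coef @ s_mo.T
    return cond_mu, (cond_cov + cond_cov.T) / 2             # symmetrise round-off

def em_mvn(X, n_iter=100, tol=1e-6):
    """EM estimates of mu and Sigma for multivariate normal data containing NaNs."""
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x, sum_xx = np.zeros(p), np.zeros((p, p))
        for i in range(n):
            x, c = X[i].copy(), np.zeros((p, p))
            m, o = miss[i], ~miss[i]
            if m.any():                                     # E-step for row i
                x[m], c[np.ix_(m, m)] = conditional_normal(mu, sigma, x, m, o)
            sum_x += x
            sum_xx += np.outer(x, x) + c
        new_mu = sum_x / n                                  # M-step
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        done = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max()) < tol
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma

def emb_impute(X, M=5, seed=0):
    """M imputed copies of X: bootstrap rows, run EM, draw each row's missing cells."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    miss = np.isnan(X)
    completed = []
    for _ in range(M):
        boot = X[rng.integers(0, n, size=n)]                # bootstrap sample, NaNs kept
        mu, sigma = em_mvn(boot)
        Xi = X.copy()
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            cond_mu, cond_cov = conditional_normal(mu, sigma, X[i], m, o)
            Xi[i, m] = rng.multivariate_normal(cond_mu, cond_cov)
        completed.append(Xi)
    return completed

# Hypothetical bivariate data with about 20% missing values in the second column
rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=200)
X[rng.random(200) < 0.2, 1] = np.nan
imputed = emb_impute(X, M=5)
print([round(d[:, 1].mean(), 3) for d in imputed])
```

Running EM on a fresh bootstrap sample for each imputation replaces the posterior draws of μ and Σ that a fully Bayesian approach would need. This is the reason for combining the bootstrap with EM.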