U.S. Bureau of the Census

09/05/2024 | Press release | Distributed by Public on 09/05/2024 14:20

Comparison of Tests and Confidence Intervals for Univariate Normal Mean Based on Multiply Imputed Synthetic Data Obtained by Posterior Predictive Sampling

There is an extensive literature on data analysis under privacy or confidentiality protection. Among the many inferential methods based on parametric models, analysis of data obtained by perturbing the original sensitive data through plug-in sampling or posterior predictive sampling is quite common. In this paper we consider a very basic inferential problem: tests and confidence intervals for a normal mean with unknown variance, based on synthetic data obtained from multiple imputations under the posterior predictive sampling method. Several methods are suggested and compared. A general expression for the local power of a class of tests is also derived, which can be used in a design context to determine a combination of sample size and number of imputations that guarantees a desired level of local power. A measure of privacy protection is derived to demonstrate that privacy would be compromised if too many imputations are released. An application to inference about household earnings, based on U.S. Census Bureau data, is illustrated.
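
To make the data-generating step concrete, the following is a minimal sketch of how multiply imputed synthetic data might be produced by posterior predictive sampling for a univariate normal model. It is not taken from the paper: the noninformative prior proportional to 1/sigma^2, the function name `posterior_predictive_synthetic`, and the simulated "original" sample are all assumptions made for illustration, and the final line computes only a naive pooled mean, not any of the tests or intervals the paper compares.

```python
import math
import random
import statistics


def posterior_predictive_synthetic(x, m, rng):
    """Return m synthetic copies of x drawn by posterior predictive sampling.

    Assumes x is an i.i.d. N(mu, sigma^2) sample and (as an illustrative
    assumption, not the paper's stated choice) the prior p(mu, sigma^2)
    proportional to 1/sigma^2.
    """
    n = len(x)
    xbar = statistics.mean(x)
    s2 = statistics.variance(x)  # sample variance with divisor n - 1

    synthetic = []
    for _ in range(m):
        # sigma^2 | x  ~  (n - 1) s^2 / chi-square(n - 1);
        # chi-square(k) is Gamma(shape=k/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, x  ~  N(xbar, sigma^2 / n)
        mu = rng.gauss(xbar, math.sqrt(sigma2 / n))
        # Synthetic records: y_i ~ N(mu, sigma^2), same size as the original.
        synthetic.append([rng.gauss(mu, math.sqrt(sigma2)) for _ in range(n)])
    return synthetic


rng = random.Random(0)
# Hypothetical "original" confidential data, simulated for the example.
original = [rng.gauss(50.0, 10.0) for _ in range(200)]
synthetic = posterior_predictive_synthetic(original, m=5, rng=rng)

# Naive pooled point estimate: the average of the m synthetic sample means.
point_estimate = statistics.mean(statistics.mean(y) for y in synthetic)
```

Each synthetic copy uses a fresh posterior draw of the mean and variance, so only the synthetic values would be released, never the original records. Releasing more copies (larger `m`) tightens the pooled estimate but also reveals more about the original sample, which is the trade-off the abstract's privacy measure addresses.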