Estimation of the mean of positively skewed distributions with applications to estimation of exposure to contaminated soils
We consider estimating the mean of a positively skewed distribution from both random and purposive samples. It has been noted that, because of this skewness, the sample mean from a random sample has a high probability of falling below the mean of the distribution. Various ad hoc procedures have been proposed to correct this low coverage of the mean so as to obtain conservative estimates of long-term exposure to contaminated soils at toxic waste sites. We propose a direct estimate of the mean based on a penalized loss function: squared error loss plus a penalty for each observation that falls above the estimate. The resulting minimum-risk estimate, which we call the penalized mean, is computed iteratively and is shown to be biased toward greater coverage. We show that a one-step iterate of the penalized mean is asymptotically unbiased, converges almost surely to the true mean, and, under mild assumptions on the form of the penalty, is asymptotically normally distributed. Under penalized loss, we show that the sample mean is inadmissible for the normal distribution and for a class of positively skewed distributions. Suitable choices of the penalty parameter yield a penalized estimate of the mean with greater coverage than the sample mean, and approaches to choosing this parameter optimally are also investigated. Finally, we extend the procedure to nonrandom purposive samples that arise from selective sampling in highly contaminated regions.
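The abstract does not give the exact form of the penalty, so the following is only an illustrative sketch under an assumed penalty: each observation x_i above the estimate a adds lam * (x_i - a) to the squared-error loss, with lam >= 0 a hypothetical penalty weight. This makes the loss convex, and setting its derivative to zero gives a = xbar + lam * #{x_i > a} / (2n), so the minimizer always lies at or above the sample mean. Freezing the count at a = xbar gives an analogous "one-step" value. The function names (`penalized_loss`, `one_step_penalized_mean`, `penalized_mean`, `coverage`) and the lognormal simulation are assumptions for illustration, not the authors' estimator; in particular, with a fixed lam the estimate stays shifted above the mean as n grows, so this sketch does not reproduce the asymptotic unbiasedness described above, which presumably depends on the paper's own penalty.

```python
import math
import random


def penalized_loss(a, xs, lam):
    """Hypothetical penalized loss: squared error plus lam * (x - a) for each x above a."""
    return sum((x - a) ** 2 for x in xs) + lam * sum(x - a for x in xs if x > a)


def one_step_penalized_mean(xs, lam):
    """One fixed-point step from the sample mean: xbar + lam * #{x > xbar} / (2n)."""
    n = len(xs)
    xbar = sum(xs) / n
    above = sum(1 for x in xs if x > xbar)
    return xbar + lam * above / (2 * n)


def penalized_mean(xs, lam, iters=100):
    """Minimize the (assumed) convex loss by bisection on its right derivative,
    2*n*a - 2*sum(xs) - lam * #{x > a}, which is nondecreasing in a."""
    n = len(xs)
    total = sum(xs)

    def right_deriv(a):
        return 2 * n * a - 2 * total - lam * sum(1 for x in xs if x > a)

    lo = total / n              # right derivative <= 0 at the sample mean
    hi = total / n + lam / 2    # count above is <= n, so right derivative >= 0 here
    for _ in range(iters):      # fixed iteration count avoids float-precision stalls
        mid = (lo + hi) / 2
        if right_deriv(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


def coverage(estimator, n=10, reps=2000, mu=0.0, sigma=1.0, seed=1):
    """Fraction of simulated lognormal samples whose estimate is >= the true mean."""
    rng = random.Random(seed)
    true_mean = math.exp(mu + sigma ** 2 / 2)  # mean of a lognormal(mu, sigma)
    hits = 0
    for _ in range(reps):
        xs = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        if estimator(xs) >= true_mean:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    lam = 1.0  # hypothetical penalty weight
    print("sample mean coverage:", coverage(lambda xs: sum(xs) / len(xs)))
    print("penalized mean coverage:", coverage(lambda xs: penalized_mean(xs, lam)))
```

Because the same seed generates the same samples for both estimators and the penalized estimate is never below the sample mean, its simulated coverage can only match or exceed that of the sample mean; how much it increases depends on the assumed lam.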