Bayesian Approach for Sample Selection Bias Correction in Regression
Selection bias occurs when samples are self-selected rather than randomly drawn from the target population. This is a well-known problem that has been studied extensively in statistics and economics. In this work, I adopt a Bayesian approach to correcting sample selection bias under the self-selection setup of the Heckman model. Bayesian methods treat the population parameters of interest as random variables rather than unknown constants. The distributions assigned to these parameters before the data are observed are called prior distributions. Statistical inference is based on the posterior distribution, which combines information from the data and the prior. Markov chain Monte Carlo (MCMC) methods are used to compute the posterior distributions. The results of the proposed Bayesian method are compared with those of Heckman's two-step estimator in a comprehensive simulation study that covers a range of scenarios for the simulation setup and design. In addition to the most common self-selection setup, the approach is extended to handle self-selection with a binary outcome model.
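To make the self-selection setup concrete, the following is a minimal simulation sketch, not the method developed in this work. It assumes the textbook form of the Heckman model: an outcome equation y = b0 + b1*x + e, a selection rule that observes y only when g0 + g1*z + u > 0, and bivariate normal errors (e, u) with correlation rho and Var(u) = 1. All names and parameter values (BETA, GAMMA, RHO, SIGMA, simulate, ols, inverse_mills) are illustrative assumptions. For brevity, the correction step uses the true selection coefficients where Heckman's two-step estimator would first estimate them with a probit fit.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf

# Illustrative parameter values (assumptions, not taken from the source).
BETA = (1.0, 2.0)    # outcome equation: y = b0 + b1 * x + e
GAMMA = (0.0, 1.0)   # selection equation: observed if g0 + g1 * z + u > 0
RHO, SIGMA = 0.7, 1.0


def inverse_mills(c):
    """Inverse Mills ratio phi(c) / Phi(c), i.e. E[u | u > -c] for u ~ N(0, 1)."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def simulate(n, seed=0):
    """Draw n units and return only the self-selected ones as (x, z, y)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x, z, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # e has standard deviation SIGMA and correlation RHO with u.
        e = SIGMA * (RHO * u + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1))
        if GAMMA[0] + GAMMA[1] * z + u > 0:
            kept.append((x, z, BETA[0] + BETA[1] * x + e))
    return kept


def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


sample = simulate(20000)
y = [row[2] for row in sample]

# Naive regression on the selected sample ignores E[e | selected] != 0.
naive = ols([[1.0, x] for x, _, _ in sample], y)

# Adding the inverse Mills ratio as a regressor absorbs that conditional mean;
# its coefficient estimates RHO * SIGMA.
corrected = ols(
    [[1.0, x, inverse_mills(GAMMA[0] + GAMMA[1] * z)] for x, z, _ in sample], y
)

print("naive     (b0, b1):", naive)
print("corrected (b0, b1, rho*sigma):", corrected)
```

Because x is drawn independently of z in this sketch, the bias shows up in the naive intercept; the slope would also be affected if x entered the selection equation. A Bayesian treatment would instead place priors on the outcome, selection, and error parameters and sample their joint posterior by MCMC, which is not shown here.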