Unsupervised domain adaptation under hidden confounding

Published on arXiv, 2024

We introduce a new predictive mechanism, Generative Invariance, that operates in the presence of hidden confounding across distributionally diverse data sources while ensuring consistent estimation of causal parameters, even though causal parameters are widely regarded in the literature as suboptimal for prediction. Our method is based on a novel estimand that captures the dependence structure between the response noise and the covariates, incorporating causal parameters into a generative model that adaptively replicates the conditional distribution of the test environment. Identifiability is achieved under a straightforward, empirically verifiable assumption. Our approach guarantees probabilistic alignment with the test distribution uniformly over arbitrary interventions, enabling valid predictions without worst-case optimization or assumptions about the strength of perturbations at test time. In extensive simulations, our method outperforms state-of-the-art invariance-based and domain adaptation approaches. We further validate its practical applicability and superior target risk on a cardiovascular disease dataset.
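To illustrate the challenge the abstract addresses, the following minimal sketch (not the paper's method; the structural equations, coefficients, and variable names are hypothetical choices for illustration) simulates a hidden confounder that induces dependence between the response noise and the covariate, so that ordinary least squares conflates the causal effect with the confounding path:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden confounder H affects both the covariate X and the response Y.
H = rng.normal(size=n)
X = 2.0 * H + rng.normal(size=n)             # covariate, confounded by H
beta = 1.0                                   # true causal effect of X on Y
Y = beta * X + 3.0 * H + rng.normal(size=n)  # response noise depends on X through H

# OLS targets Cov(X, Y) / Var(X) = beta + 3 * Cov(X, H) / Var(X) = 1 + 6/5 = 2.2,
# so the estimate is biased away from the causal parameter beta = 1.
beta_ols = (X @ Y) / (X @ X)
print(f"true causal beta: {beta:.2f}, OLS estimate: {beta_ols:.2f}")
```

Under such confounding, neither the biased regression fit nor the causal parameter alone yields optimal prediction in a shifted test environment, which is the gap the proposed estimand is designed to close.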