Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations

Journal article

Elliot I. Layne, Dhanya Sridhar, Jason S. Hartford, M. Blanchette
2022

Semantic Scholar


APA
Layne, E. I., Sridhar, D., Hartford, J. S., & Blanchette, M. (2022). Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations.

Chicago/Turabian
Layne, Elliot I., Dhanya Sridhar, Jason S. Hartford, and M. Blanchette. “Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations” (2022).

MLA
Layne, Elliot I., et al. Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations. 2022.

BibTeX

@article{elliot2022a,
  title = {Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations},
  year = {2022},
  author = {Layne, Elliot I. and Sridhar, Dhanya and Hartford, Jason S. and Blanchette, M.}
}

Abstract

Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for biological problem settings, where data is often naturally heterogeneous: biological samples often originate from different distributions, or environments. However, the standard "invariant prediction" setting may not fully fit biological contexts, because the optimal predictor may in fact vary across biological environments. There is also strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, prior knowledge about the environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle the causal factors that generate the observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.
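The abstract refers to "a particular form of regularization applied when learning a set of predictors" but does not spell it out. The following is a minimal sketch of one common way to turn a phylogeny into a regularizer. It is assumed here for illustration and is not necessarily the authors' exact formulation. Each environment (a leaf of the tree) and each ancestral node gets its own parameter vector, and squared parameter differences along tree edges are penalized. The tree, species names, branch lengths, the `phylo_penalty` helper, and the choice to divide by branch length are all hypothetical.

```python
import numpy as np

# Hypothetical tree: child node -> parent node. Leaves are environments
# (e.g. species); internal nodes ("anc1", "root") hold ancestral parameters.
PARENT = {"human": "anc1", "chimp": "anc1", "mouse": "root", "anc1": "root"}
# Hypothetical branch length of the edge above each child node.
BRANCH_LENGTH = {"human": 0.1, "chimp": 0.1, "mouse": 0.9, "anc1": 0.4}


def phylo_penalty(weights, parent, branch_length, lam=1.0):
    """Hypothetical tree regularizer: sum over edges of ||w_child - w_parent||^2 / length.

    Dividing by branch length (an assumption of this sketch) penalizes short
    branches more strongly, so closely related environments are pushed toward
    similar parameters while distant ones are allowed to differ.
    """
    total = 0.0
    for child, par in parent.items():
        diff = weights[child] - weights[par]
        total += float(np.dot(diff, diff)) / branch_length[child]
    return lam * total


# Usage: all parameters start equal, then one leaf drifts from its ancestor.
weights = {n: np.zeros(3) for n in ["human", "chimp", "mouse", "anc1", "root"]}
weights["human"] = np.array([1.0, 0.0, 0.0])
print(phylo_penalty(weights, PARENT, BRANCH_LENGTH))
```

In a training loop built on this sketch, the penalty would be added to the sum of per-environment losses, with `lam` controlling how strongly the tree structure constrains the set of predictors.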