A probabilistic graphical model for estimating selection coefficient of nonsynonymous variants from human population sequence data

Zhao Y, Zhong G, Hagen J, Pan H, Chung WK, Shen Y

medRxiv, 2023.

Lab members marked as bold

Abstract

Accurately predicting the effect of missense variants is important for discovering disease risk genes and for clinical genetic diagnostics. Commonly used computational methods predict pathogenicity, which does not capture the quantitative impact on fitness in humans. We developed a method, MisFit, to estimate the fitness effect of missense variants using a graphical model. MisFit jointly models the effect at a molecular level (d) and a population level (selection coefficient, s), assuming that in the same gene, missense variants with similar d have similar s. We trained it by maximizing the probability of observed allele counts in 236,017 European individuals. We show that s is informative in predicting allele frequency across ancestries and is consistent with the fraction of de novo mutations in sites under strong selection. Further, s outperforms previous methods in prioritizing de novo missense variants in individuals with neurodevelopmental disorders. In conclusion, MisFit accurately predicts s and yields new insights from genomic data.
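
To give some intuition for what "maximizing the probability of observed allele counts" can mean, the following is a minimal toy sketch and not the MisFit model itself. It assumes a single variant at deterministic mutation-selection balance, so that the expected allele frequency is roughly mu / s, and it treats the observed allele count as Poisson distributed. The function names, the mutation rate of 1e-8, and the grid search are illustrative assumptions; only the sample size of 236,017 comes from the abstract. The sketch ignores genetic drift, demographic history, and the sharing of information between variants with similar d that the abstract describes.

```python
import math

N_INDIVIDUALS = 236_017   # sample size quoted in the abstract
MU = 1e-8                 # assumed per-site mutation rate (illustrative only)


def poisson_log_likelihood(s, allele_count, n_individuals=N_INDIVIDUALS, mu=MU):
    """Log-probability of the observed allele count for a given s (toy model)."""
    # Expected count: 2N chromosomes times the assumed equilibrium frequency mu / s.
    expected = 2 * n_individuals * mu / s
    return (allele_count * math.log(expected)
            - expected
            - math.lgamma(allele_count + 1))


def estimate_s(allele_count, n_individuals=N_INDIVIDUALS, mu=MU, grid_size=2000):
    """Grid-search maximum-likelihood estimate of s on a log-spaced grid in [1e-6, 1]."""
    best_s, best_ll = None, -math.inf
    for i in range(grid_size + 1):
        s = 10 ** (-6 + 6 * i / grid_size)
        ll = poisson_log_likelihood(s, allele_count, n_individuals, mu)
        if ll > best_ll:
            best_s, best_ll = s, ll
    return best_s


if __name__ == "__main__":
    for count in (0, 1, 10, 100):
        print(count, estimate_s(count))
```

Under these assumptions, variants seen less often in the population receive larger estimates of s, and a variant that is never observed is pushed to the upper bound of the grid. A per-variant estimate like this is very noisy for rare variants, which is one motivation for pooling information across variants as the abstract describes.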