Accurate pathogenicity prediction of missense variants is critically important in genetic studies and clinical diagnosis. Previously published prediction methods have facilitated the interpretation of missense variants but have limited performance. Here we describe MVP, a new prediction method that uses a deep residual network to leverage large training data sets and many correlated predictors. We trained the model separately in genes that are intolerant of loss-of-function variants and in those that are tolerant, to account for potentially different genetic effect sizes and modes of action. We compiled cancer mutation hotspots and de novo variants from developmental disorders for benchmarking. Overall, MVP achieved better performance in prioritizing pathogenic missense variants than previous methods, especially in genes tolerant of loss-of-function variants. Finally, using MVP, we estimated that de novo coding variants contribute to 7.8% of isolated congenital heart disease cases, nearly doubling the previous estimate.
A preprint of an earlier version is available on bioRxiv: https://doi.org/10.1101/259390
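
The idea of training separately in loss-of-function-intolerant and loss-of-function-tolerant genes can be illustrated with a minimal sketch. This is not the MVP implementation: the intolerance score, the 0.5 cutoff, the treatment of unlisted genes as tolerant, and all names (`INTOLERANCE_CUTOFF`, `split_by_tolerance`, `score_variant`) are assumptions made for illustration, and the two models are stand-in callables rather than deep residual networks.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cutoff separating LoF-intolerant from LoF-tolerant genes.
# The abstract does not state the metric or the threshold; 0.5 is assumed.
INTOLERANCE_CUTOFF = 0.5

Variant = Tuple[str, List[float]]      # (gene symbol, predictor features)
Scorer = Callable[[List[float]], float]


def is_intolerant(gene: str, intolerance: Dict[str, float]) -> bool:
    """Genes without a score are treated as tolerant (an assumption)."""
    return intolerance.get(gene, 0.0) >= INTOLERANCE_CUTOFF


def split_by_tolerance(
    variants: List[Variant], intolerance: Dict[str, float]
) -> Tuple[List[Variant], List[Variant]]:
    """Partition variants so each gene group can train its own model."""
    intolerant = [v for v in variants if is_intolerant(v[0], intolerance)]
    tolerant = [v for v in variants if not is_intolerant(v[0], intolerance)]
    return intolerant, tolerant


def score_variant(
    variant: Variant,
    intolerance: Dict[str, float],
    intolerant_model: Scorer,
    tolerant_model: Scorer,
) -> float:
    """Route a variant to the model trained on its gene group."""
    gene, features = variant
    model = intolerant_model if is_intolerant(gene, intolerance) else tolerant_model
    return model(features)


if __name__ == "__main__":
    # Toy data: gene names, scores, and features are made up.
    intolerance = {"GENE_A": 0.9, "GENE_B": 0.1}
    variants = [("GENE_A", [0.2, 0.8]), ("GENE_B", [0.2, 0.8]), ("GENE_C", [0.5])]
    intolerant, tolerant = split_by_tolerance(variants, intolerance)
    print(len(intolerant), len(tolerant))
    # Stand-in scorers; real models would be trained on each partition.
    print(score_variant(variants[0], intolerance, max, min))
```

Keeping the routing rule in a single helper ensures that the partition used for training and the model chosen at prediction time cannot drift apart.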