Whole Genome De Novo Variant Identification with FreeBayes and Neural Network Approaches

Richter F*, Morton SU*, Qi H*, Kitaygorodsky A*, Wang J, Homsy J, DePalma S, Patel N, Gelb BD, Seidman JG, Seidman CE, Shen Y

bioRxiv, 2020.

Lab members are marked in bold; * denotes authors with equal contribution.

Abstract

Motivation: De novo variant (DNV) calling typically relies on heuristic filters intrinsic to specific platforms and variant calling algorithms. FreeBayes and neural network approaches have overcome this limitation for variant calling, and we implemented a similar approach for DNV identification.

Results: We developed a DNV calling framework that uses the Genome Analysis Toolkit (GATK), FreeBayes, and a neural network trained on Integrative Genomics Viewer pile-up plots (IGV-bot). We identified DNVs in 2,390 WGS trios and benchmarked the results against heuristics based on GATK parameters. Results were validated in silico and with Sanger sequencing, with the latter showing true positive rates of 98.4% and 97.3% for SNVs and indels, respectively. Taken together, we describe a scalable framework for DNV identification based on both FreeBayes and neural network methods.

Availability: Source code and documentation are available at https://github.com/ShenLab/igv-classifier and https://github.com/frichter/dnv_pipeline under the MIT license.