A hidden Markov model for copy number variant prediction from whole genome resequencing data.

Shen Y, Gu Y, Pe'er I

BMC Bioinformatics, 2011.

MOTIVATION: Copy number variants (CNVs) are important genetic factors in the study of human diseases. While high-throughput whole genome resequencing provides multiple lines of evidence for detecting CNVs, computational algorithms need to be tailored to different types and sizes of CNVs under different experimental designs.

RESULTS: To achieve optimal power and resolution for detecting CNVs at low depth of coverage, we implemented a hidden Markov model that integrates both depth of coverage and mate-pair relationships. The novelty of our algorithm is that we infer the likelihood of carrying a deletion jointly from multiple mate pairs in a region, without requiring any single mate pair to be an obvious outlier. By integrating all useful information in a comprehensive model, our method is able to detect medium-size deletions (200-2000 bp) at low depth (<10× per sample). We applied the method to simulated data and demonstrated that its power to detect medium-size deletions is close to theoretical values.

AVAILABILITY: Zinfandel, a program implemented in Java, is available at http://www.cs.columbia.edu/~itsik/zinfandel/
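The abstract describes the model only at a high level. As a rough illustration of the core idea, the Python sketch below runs Viterbi decoding over a hypothetical two-state HMM (normal vs. deletion) on fixed-size genomic bins. Each bin's emission combines a Poisson likelihood for read depth with the summed log-likelihoods of all mate pairs in that bin, so evidence accumulates across pairs. This is not Zinfandel's implementation: the two states, the heterozygous-deletion assumption, the Gaussian insert-size model, and the parameters `lam`, `mu`, `sigma`, `del_len`, and `p_switch` are all assumptions made for this example.

```python
import math

# Hypothetical two-state HMM for illustration only; NOT Zinfandel's model.
NORMAL, DELETION = 0, 1


def poisson_logpmf(k, lam):
    """Log-probability of observing k reads in a bin when lam are expected."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def normal_logpdf(x, mu, sigma):
    """Log-density of an (assumed) Gaussian insert-size distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))


def log_half_mix(a, b):
    """Stable log(0.5 * e^a + 0.5 * e^b)."""
    m = max(a, b)
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


def emission(state, depth, inserts, lam, mu, sigma, del_len):
    """Joint log-likelihood of one bin's depth and ALL of its mate-pair insert sizes."""
    if state == NORMAL:
        ll = poisson_logpmf(depth, lam)
        for x in inserts:
            ll += normal_logpdf(x, mu, sigma)
    else:
        # Assumed heterozygous deletion: half the expected depth, and each pair
        # comes from the deleted haplotype with probability 0.5, in which case
        # its apparent insert size is stretched by del_len.
        ll = poisson_logpmf(depth, lam / 2)
        for x in inserts:
            ll += log_half_mix(normal_logpdf(x, mu, sigma),
                               normal_logpdf(x, mu + del_len, sigma))
    return ll


def viterbi(depths, inserts, lam=10.0, mu=400.0, sigma=40.0,
            del_len=120.0, p_switch=0.01):
    """Most likely NORMAL/DELETION label for each bin (all parameters hypothetical)."""
    log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)
    params = (lam, mu, sigma, del_len)
    states = (NORMAL, DELETION)
    # Start in NORMAL with probability 1 - p_switch.
    score = [log_stay + emission(NORMAL, depths[0], inserts[0], *params),
             log_switch + emission(DELETION, depths[0], inserts[0], *params)]
    back = []
    for t in range(1, len(depths)):
        new_score, ptr = [], []
        for s in states:
            cands = [score[r] + (log_stay if r == s else log_switch) for r in states]
            best = max(states, key=lambda r: cands[r])
            new_score.append(cands[best] + emission(s, depths[t], inserts[t], *params))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy data: bins 4-7 have roughly halved depth and some mildly stretched pairs.
depths = [10, 11, 9, 10, 5, 4, 6, 5, 10, 9, 11, 10]
inserts = [[400, 410]] * 4 + [[520, 515, 400]] * 4 + [[395, 405]] * 4
print(viterbi(depths, inserts))  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

In this toy example the stretched pairs (515 and 520 bp against an assumed 400 ± 40 bp library) are each only about three standard deviations from the mean, so none would be flagged as an obvious outlier on its own; summed over the bin and combined with the reduced depth, they still outweigh the transition penalty. A realistic caller would also have to handle deletion length, genotype, breakpoint position, and pairs that span bin boundaries.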