Human Genetics
Identifying the genetic causes of human conditions provides the foundation for precise diagnosis and risk prediction, an in-depth understanding of disease mechanisms, and effective targets for intervention. However, the genetic basis of most complex conditions is still unknown. To improve our ability to identify risk variants and genes, we develop new computational methods that integrate gene expression and epigenomic data with genetic data and that leverage large-scale population genome data. We apply these methods in genetic studies of a broad range of human diseases and conditions. Recently we have been working on autism, congenital diaphragmatic hernia, congenital heart disease, pulmonary hypertension, tracheoesophageal defects, and breast cancer.
Selected papers
- Identification and validation of novel candidate risk genes in endocytic vesicular trafficking associated with esophageal atresia and tracheoesophageal fistulas. HGG Advances. 2022.
- Integrating de novo and inherited variants in over 42,607 autism cases identifies mutations in new moderate risk genes. medRxiv. 2021.
- Imputing cognitive impairment in SPARK, a large autism cohort. Autism Research. 2022.
- Rare variant analysis of 4,241 pulmonary arterial hypertension cases from an international consortium implicates FBLN2, PDGFD and rare de novo variants in PAH. Genome Medicine. 2021.
- Penetrance of breast cancer genes from the eMERGE III Network. JNCI Cancer Spectrum. 2021.
- Functional interrogation of DNA damage response variants with base editing screens. Cell. 2021.
- Genomic analyses implicate noncoding de novo variants in congenital heart disease. Nature Genetics. 2020.
- Likely damaging de novo variants in congenital diaphragmatic hernia patients are associated with worse clinical outcomes. Genetics in Medicine. 2020.
- Novel Candidate Genes in Esophageal Atresia/Tracheoesophageal Fistula Identified by Exome Sequencing. European Journal of Human Genetics. 2020.
- Dissecting Autism Genetic Risk Using Single-cell RNA-seq Data. bioRxiv. 2020.
- EM-mosaic detects mosaic point mutations that contribute to congenital heart disease. Genome Medicine. 2020.
- Exome sequencing of 457 autism families recruited online provides evidence for autism risk genes. npj Genomic Medicine. 2019.
- De novo variants in congenital diaphragmatic hernia identify MYRF as a new syndrome and reveal genetic overlaps with other developmental disorders. PLoS Genetics. 2018.
- Rare variants in SOX17 are associated with pulmonary arterial hypertension with congenital heart disease. Genome Medicine. 2018.
- A Cell Type-Specific Expression Signature Predicts Haploinsufficient Autism-Susceptibility Genes. Human Mutation. 2016.
- Deep Genetic Connection Between Cancer and Developmental Disorders. Human Mutation. 2016.
- De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science. 2015.
- Increased burden of de novo predicted deleterious variants in complex congenital diaphragmatic hernia. Human Molecular Genetics. 2015.
* * * * * *
Software
- gMVP (updated 2021), a graph attention neural network method for predicting the functional effect of missense variants. The preprint is on bioRxiv.
- MVP (updated 2021), an ensemble predictor of the deleterious genetic effect of missense variants using residual convolutional neural networks, as described in Qi et al 2021.
- EM-mosaic (updated 2019), a method to detect mosaic mutations from trio exome or whole genome sequencing data. It uses an EM algorithm to infer the overall rate of mosaicism among de novo mutations. Published in Hsieh et al 2020.
- A-risk (updated 2020), a method to predict how plausible a gene is as an autism risk gene, based on its expression patterns in single brain cells. Manuscript in preparation.
- Episcore (updated 2018), a method to predict gene haploinsufficiency based on human epigenomic profiles under normal conditions, as described in Han, Chen et al 2018.
- HotSpots (updated 2016), a method to infer cancer somatic mutation hotspots, as described in Qi et al 2016.
- CANOES (updated 2014), a method to call copy number variants (CNVs) from exome sequencing data with an arbitrary number of reference samples.
- Repertoire (updated 2015), scripts for analyzing T cell receptor repertoire sequencing data.
- OPERA (updated 2012), an online power calculator for genome sequencing and genetic studies.
- Zinfandel (updated 2012), a method for calling CNVs from whole genome data based on both depth of coverage and mate pair information.
- Atlas-SNP (last updated 2010), bioinformatics analysis of next-generation sequencing data, optimized for variant calling from 454 data.
- Genometer (last updated 2011), a method for estimating genome size in the presence of repeats.
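To give a feel for the idea behind the EM-mosaic entry above (using an EM algorithm to estimate the overall rate of mosaicism among de novo mutations), here is a deliberately simplified sketch. It is not the EM-mosaic implementation. It assumes each variant's alternate read count follows a plain binomial distribution, with germline heterozygous variants at an allele fraction of 0.5 and all mosaic variants sharing one unknown, lower fraction. The function names, the starting values, and the toy read counts are hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the binomial probability of k alternate reads out of n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def em_mosaic_rate(alt, depth, n_iter=200, tol=1e-9):
    """Fit a two-component binomial mixture with EM.

    Returns (pi, f): pi is the estimated fraction of variants that are
    mosaic, f is the shared mosaic allele fraction. Germline variants are
    assumed (simplification) to have an allele fraction of exactly 0.5.
    """
    pi, f = 0.2, 0.2  # arbitrary starting values (assumption)
    for _ in range(n_iter):
        # E-step: posterior probability that each variant is mosaic.
        resp = []
        for k, n in zip(alt, depth):
            log_mosaic = math.log(pi) + binom_logpmf(k, n, f)
            log_germline = math.log(1 - pi) + binom_logpmf(k, n, 0.5)
            m = max(log_mosaic, log_germline)
            w_mosaic = math.exp(log_mosaic - m)
            w_germline = math.exp(log_germline - m)
            resp.append(w_mosaic / (w_mosaic + w_germline))

        # M-step: update the mixing proportion and the mosaic fraction.
        new_pi = sum(resp) / len(resp)
        weighted_depth = sum(r * n for r, n in zip(resp, depth))
        if weighted_depth > 0:
            new_f = sum(r * k for r, k in zip(resp, alt)) / weighted_depth
        else:
            new_f = f

        # Keep parameters inside the open intervals the model requires.
        new_pi = min(max(new_pi, 1e-6), 1 - 1e-6)
        new_f = min(max(new_f, 1e-6), 0.5 - 1e-6)

        converged = abs(new_pi - pi) < tol and abs(new_f - f) < tol
        pi, f = new_pi, new_f
        if converged:
            break
    return pi, f


# Toy data (hypothetical): eight germline-like and two mosaic-like variants.
alt_reads = [50, 48, 52, 47, 51, 49, 53, 50, 10, 12]
read_depth = [100] * 10
mosaic_rate, mosaic_fraction = em_mosaic_rate(alt_reads, read_depth)
print(f"mosaic rate ~ {mosaic_rate:.2f}, mosaic allele fraction ~ {mosaic_fraction:.2f}")
```

Real sequencing data are overdispersed and mosaic variants do not share a single allele fraction, so a practical method needs a richer model than this; the sketch only shows the alternation between assigning variants to classes and re-estimating the class parameters.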
Data
Homsy et al, Science, 2015: de novo mutations from 1,120 congenital heart disease cases
All supplementary tables (a zip file):
- S1: Phenotypes for each case proband, including cardiac, neurodevelopmental disorders and extra-cardiac congenital anomalies.
- S2: List of de novo mutations in the CHD case cohort.
- S3: List of de novo mutations in the control cohort.
- S4: List of de novo probabilities for each variant class in each protein-coding gene on the Nimblegen V2 exome, adjusted for depth in cases.
- S5: List of de novo probabilities for each variant class in each protein-coding gene on the Nimblegen V2 exome, adjusted for depth in controls.
- S6: Functional term enrichment analysis of all genes with damaging (loss of function + deleterious missense) de novo mutations in all cases.
- S7: Functional term enrichment analysis of all genes with loss of function de novo mutations in 860 new cases.
- S8: List of 1,563 variants (1,161 unique genes) with damaging de novo mutations from 7 independent NDD cohorts.
- S9: Functional term enrichment analysis among 69 genes with damaging de novo mutations overlapping between CHD cases and the published NDD (P-NDD) cohort.
- S10: Percentile ranks of genes by expression in the developing mouse heart and brain.
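As an illustration of how per-gene de novo probabilities such as those in S4 and S5 are commonly used, the sketch below compares an observed number of de novo mutations in one gene with the number expected by chance, using a one-sided Poisson test. This is a generic approach and not necessarily the exact analysis in the paper. It assumes the tabulated value is the probability of a de novo mutation of the given class per proband; if the table instead gives a per-chromosome probability, the expected count would need to be doubled. The function names and the example numbers are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), with expected > 0."""
    if observed <= 0:
        return 1.0
    # Sum P(X = k) for k < observed in log space, then take the complement.
    cdf = sum(
        math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)


def denovo_enrichment(observed, per_proband_prob, n_probands):
    """Return (expected count, one-sided Poisson p-value) for one gene."""
    expected = per_proband_prob * n_probands
    return expected, poisson_upper_tail(observed, expected)


# Hypothetical gene: probability 1e-5 per proband, 3 mutations seen in 1,120 cases.
expected, p_value = denovo_enrichment(observed=3, per_proband_prob=1e-5, n_probands=1120)
print(f"expected = {expected:.4f}, p = {p_value:.2e}")
```

Testing many genes this way requires a multiple-testing correction, for example a Bonferroni threshold based on the number of genes tested.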
GWAS of drug adverse reactions
GWAS data sets from the Serious Adverse Event Consortium (SAEC) related to these papers (Daly et al 2009; Shen et al 2011; Lucena et al 2011; Overby et al 2014) can be found at the SAEC Data Portal (registration required). Processed files ready for PLINK are available upon request (yshen@c2b2.columbia.edu).
Detecting CNVs from exome sequencing
The exome sequencing and genotyping data used in CANOES (Backenroth et al 2014) are from the NHLBI Pediatric Cardiac Genomics Consortium (PCGC) and are available through dbGaP.