Our laboratory uses computation and quantitative modeling to understand how molecular interactions between regulatory proteins and DNA shape the behavior of the biological cell. Genetic coding sequence makes up only 2% of the human genome sequence, and is highly conserved across the tree of life. The remaining 98% of the genome consists of non-coding sequence that is far less conserved, and holds the key to the biological complexity of higher organisms. Non-coding DNA determines how gene expression changes between genetically different individuals, between cell types, or in response to signals from outside the cell. The language that the cellular machinery uses to interpret the regulatory programs contained in the non-coding sequence, however, remains largely undeciphered. Being able to predict gene expression levels from the genome sequence is therefore one of the key challenges in the post-genomic era.
Technological advances have given rise to a variety of high-throughput methods for probing the state of the cell. Two of the most important types of functional genomics data, both generated at an ever-increasing pace in laboratories around the world, are: (i) genome-wide messenger RNA expression data, which are available for a wide range of tissues and conditions; and (ii) genome-wide protein-DNA interaction data for a variety of transcription factors (DNA-binding proteins known to contribute to gene expression regulation). However, three key characteristics of transcription factors – their DNA sequence specificity, their tissue/condition-specific activity, and their in vivo connectivity to their target genes – are not measured directly in such experiments. They have to be inferred from the data. Our laboratory has pioneered novel computational approaches for doing so.
Through integrative computational analysis of steady-state gene expression data and genome sequence data, we have discovered that the half-lives of most RNA transcripts in the model organism yeast are dynamically controlled in response to external signals, and that this mode of regulation is far more important than had previously been assumed. In addition, we have demonstrated both computationally and experimentally that the RNA-binding protein Puf3p is a downstream effector of the target of rapamycin (TOR) pathway; such a factor had been actively sought for many years, but the focus had been on DNA-binding proteins.
We have developed, validated, and applied an accurate biophysical model of the protein-DNA interactions that shape the gene regulatory network. Our work updates and simplifies a 20-year-old paradigm for modeling DNA sequence specificity, which was never designed to take full advantage of the genome-scale datasets now available. Our MatrixREDUCE software is available through our lab webpage.
Our work has also provided new insight into the relation between transcription factor binding and gene expression. Through joint modeling of transcription factor binding and RNA expression data, we were able to demonstrate that roughly 40% of all transcription factor binding in yeast is non-functional. Thus, transcription factor binding data should be interpreted with care. Our approach provides a practical alternative to the use of comparative genomics for distinguishing functional from non-functional transcription factor binding sites.
Finally, in collaboration with Bas van Steensel and Kevin White, we have made the surprising discovery that 5% of the Drosophila genome is simultaneously targeted by most euchromatic transcription factors through a mechanism that does not require the DNA-binding domain of those factors.
MedLine Listing of Dr. Bussemaker's Publications