The principal question being pursued in our laboratory is
how the cellular splicing machinery recognizes the exons it must join during
the maturation of mRNA from long primary transcripts. The three sequence motifs
that are almost always associated with exons -- the branch site, the upstream
acceptor splice site, and the downstream donor splice site -- provide
insufficient information for molecular recognition. “Pseudo” exons bordered by
these elements outnumber the real exons by at least an order of magnitude. We
are trying to provide a global definition of the additional informational
elements that play roles in defining exons for constitutive and alternative
splicing and to uncover their mode of action.
Over the last several years we have used
computational methods to ferret out some of this information. Using machine
learning techniques, we have found that the 50 nt intronic stretches on either side
of exons, beyond the splice site consensus sequences themselves, contain
information that is necessary for the efficient splicing of most human exons (9,6).
The information at this stage is in the form of 5-mers that are overrepresented
in these regions. We would now like to know the exact nature of these signaling
elements (intronic splicing enhancers, ISEs), the step(s) in splicing at which
they act, and the proteins that mediate their effects.
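The idea of "overrepresented 5-mers" can be illustrated with a minimal sketch. It assumes two hypothetical inputs: a list of 50 nt intronic flank sequences taken from real exons, and a background list (for example, flanks of pseudo exons). It scores each 5-mer by a smoothed log2 frequency ratio. This simple enrichment count is a swapped-in illustration, not the machine learning method the laboratory actually used. The function names, pseudocount and threshold are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def count_kmers(seqs, k=5):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented_kmers(flanks, background, k=5, pseudocount=1.0, min_log2=1.0):
    """Return {k-mer: log2 ratio} for k-mers enriched in `flanks` vs `background`.

    Hypothetical helper: frequencies are smoothed with a pseudocount, and only
    k-mers whose log2(foreground / background) reaches `min_log2` are kept,
    sorted from most to least enriched.
    """
    fg = count_kmers(flanks, k)
    bg = count_kmers(background, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    scores = {}
    for kmer in vocab:
        f = (fg[kmer] + pseudocount) / fg_total
        b = (bg[kmer] + pseudocount) / bg_total
        score = log2(f / b)
        if score >= min_log2:
            scores[kmer] = score
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In practice the foreground and background sets would contain thousands of flanks. A real analysis would also need to control for base composition and multiple testing, and this sketch does neither.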
Additional information lies within the
exon bodies in the form of exonic splicing enhancers (ESEs) and exonic splicing
silencers (ESSs). Using genomic statistical analysis, we compiled lists of
8-mers that act as putative ESEs (PESEs) and putative ESSs (PESSs), and
showed that most of the predicted motifs can function as expected (8,7).
Our online PESX utility locates these 8-mers
in any sequence you supply; see reference 5 for
our computational approaches.
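The kind of lookup PESX performs, together with a per-nucleotide profile similar to the one in the figure below, can be sketched as follows. The sketch assumes that a motif list is simply a set of 8-mer strings. `EXAMPLE_PESES` is a hypothetical placeholder and is not taken from the published PESE or PESS lists. PESX's actual scoring is not reproduced here.

```python
def find_motifs(seq, motifs, k=8):
    """Return (0-based position, motif) for every overlapping k-mer of seq found in motifs."""
    seq = seq.upper()
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1) if seq[i:i + k] in motifs]


def motif_profile(seq, motifs, k=8):
    """For each nucleotide, count how many motif hits cover that position."""
    profile = [0] * len(seq)
    for start, _ in find_motifs(seq, motifs, k):
        for j in range(start, start + k):
            profile[j] += 1
    return profile


# Hypothetical placeholder set; substitute the real PESE (or PESS) 8-mer list.
EXAMPLE_PESES = {"GAAGAAGA"}

if __name__ == "__main__":
    exon = "CCGAAGAAGACC"
    print(find_motifs(exon, EXAMPLE_PESES))    # [(2, 'GAAGAAGA')]
    print(motif_profile(exon, EXAMPLE_PESES))
```

Set membership keeps each lookup constant-time, so scanning an exon is linear in its length even with lists containing thousands of 8-mers.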
This work along with similar successes of
other laboratories now makes it clear that exons and their flanks are filled
with a dense population of regulatory elements. Our task now lies in figuring
out how this rich sequence information is integrated to make what is usually a
binary decision to splice. However, this high density makes it difficult to
alter a single element in isolation and renders the interpretation of mutations
ambiguous. To circumvent this problem we have turned to
synthetic exons that we design to contain isolated enhancer, silencer and
neutral modules (1). We hope that the
rules governing splicing will become more apparent through the targeted manipulation of
these “designer exons.”
Comparative genomics is another tool we
are using to decipher splicing information and to view the evolutionary
pressures exerted upon these sequences. In the course of these experiments we
discovered that the most recently created mammalian exons stem largely from
repeated sequences, are spliced inefficiently, and are often non-protein-coding
(5). We also find that new ESEs are constantly being created, and ESSs
destroyed, as the genome strives to maintain splicing efficiency in the face of
continual mutation (3).
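A crude way to picture motif turnover between species is to count which motif occurrences one ortholog has and the other lacks. The sketch below assumes an ancestral and a derived exon sequence of the same length with no insertions or deletions, plus a set of 8-mers such as the placeholder set used earlier. The function names are hypothetical, and the comparative analysis of reference 3 is not reproduced here.

```python
from collections import Counter


def motif_counts(seq, motifs, k=8):
    """Count overlapping occurrences of each motif (k-mer) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1) if seq[i:i + k] in motifs)


def motif_turnover(ancestral, derived, motifs, k=8):
    """Return (gained, lost) Counters of motif occurrences from ancestral to derived.

    Counter subtraction keeps only positive differences, so `gained` holds motifs
    that occur more often in the derived sequence and `lost` those that occur less often.
    """
    anc = motif_counts(ancestral, motifs, k)
    der = motif_counts(derived, motifs, k)
    return der - anc, anc - der
```

Run with an ESE list, `gained` tracks newly created enhancers. Run with an ESS list, `lost` tracks destroyed silencers.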
Our review dealing with the definition of
splicing regulatory motifs can be found in reference 4.
Our present efforts include:
A) Algorithm development to predict
splice sites based on information available to the cell, including secondary
structure prediction.
B) Genetic definition of flank signals
that function as ISEs, including parameters that define a branch point.
C) Learning the rules of ESE, ESS, ISE
and ISS interaction through the de novo design of synthetic exons.
D) Testing the hypothesis that ESSs play
a general role in the repression of false splice sites.
E) Defining the effect of sequence
context on the action of splicing regulatory elements, using deep sequencing
technology.
F) Defining the splicing factor interactions
necessary for exon definition.
G) Computational analysis of the effect
of secondary structure on splice site selection.
____________________________________________________________________
A second project in the area of
biotechnology: we are isolating engineered derivatives of Chinese hamster ovary
(CHO) cells that are capable of rapid gene amplification, to speed up the
development of recombinant protein-based therapeutics, and we are developing vectors
that increase recombinant protein production through more efficient
posttranscriptional processing of recombinant transcripts.
_________________________________________________________________
Figure: PESE score profile of human CHUK exon 8 (black curve) and the effect of mutations on PESE score (red or blue curves) and on splicing (rectangles).