The Effect of Nonsense Codons on Splicing: A Genomic Analysis
Xiang Zhang, James Lee and Lawrence A. Chasin
Department of Biological Sciences, Columbia University, New York, NY 10027
Two recent papers have bolstered the idea that the translatability of an exon can influence its splicing. Wang et al. (2002) have shown not only that an exon in the T-cell receptor transcript is excluded when it contains a nonsense codon but also that this exclusion is accompanied by the inclusion of an overlapping cryptic alternative exon. The new exon restores an open reading frame. Independently, Li et al. (2002) have shown that latent 5' splice sites downstream of an exon can be activated if all in-frame nonsense codons are removed from the region between it and the upstream exon. In these mutated pre-mRNA molecules a new enlarged exon is chosen for splicing, based on its new-found extended translatability. The explanations offered for these two results entail nuclear recognition of the translatability of an exon before it is spliced.
The recognition of exons or introns during the splicing process cannot rest on the splice site sequences alone, since similar sequences abound in large pre-mRNA molecules (Senapathy et al. 1990). There is considerable evidence that it is the exon that is the initial element of recognition, providing a size constraint for choosing real splice sites. However, if we define “pseudo exons” as intronic regions of typical exon size (50-250 nt) bounded by sequences that closely match the consensuses for 3' and 5' splice sites, their abundance still outweighs that of real exons by an order of magnitude (Sun and Chasin 2000). Splicing enhancers, assayed mostly in the context of alternatively spliced exons, are also quite varied and degenerate, and candidate sequences are easily found in pseudo exons (data not shown). How then does the cell distinguish real exons from pseudo exons? Exon translatability could be providing the missing information.
To test this idea we examined the predicted translatability of pseudo exons. If pseudo exons are being generally discriminated against because they contain nonsense codons, then each should contain at least one in-frame nonsense codon. If on the other hand nonsense codons play no such role, then nonsense codons should occur at a frequency dictated simply by chance. We culled pseudo exons from a human intron-exon database (Saxonov et al. 2000) after eliminating redundant and predicted genes. We defined these pseudo exons as intronic sequences that have at the upstream end a pseudo splice site with a consensus 3’ matrix score of 78 and at the downstream end a pseudo splice site with a consensus 5’ matrix score of 75; using these criteria, 75% of real exons are captured. In addition, pseudo exons had to be far from real exons (>400 nt), not overlap, be free of highly repeated sequences, and not resemble any sequence found in the human EST database. Only exons of length less than 110 nt were considered; this size limitation was imposed because the chance occurrence of a nonsense codon approaches 100% for longer sequences. In-frame nonsense codons were counted in these pseudo exons, the reading frame being defined by that of the upstream real exon. Out-of-frame nonsense codons were also tallied for comparison. We calculated the proportion of pseudo exons expected to have at least one in-frame stop codon by chance based on the overall frequency of stop triplets in repeat-free intron sequences (0.049 per triplet, or slightly more than 3/64=0.047). The results for these 2850 pseudo exons are shown as a function of pseudo exon size in Fig. 1A. The frequency of in-frame nonsense codons was not different from that expected by chance alone (indicated by the line in Fig. 1A). 
For instance, for pseudo exons up to length 70, the proportion with at least one in-frame nonsense codon was 0.46, whereas that expected by chance is 0.44 (p=0.11) and that expected from the translatability hypothesis is 1.0 (p<10⁻¹⁰, even taking the expectation to be 0.75 rather than 1.0). We conclude that nonsense codons do not play a role in the cellular exclusion of these pseudo exons.
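As an illustration of the chance expectation used above, the following Python sketch counts in-frame stop triplets in a sequence and estimates the fraction of sequences of a given length expected to carry at least one by chance. It assumes a Poisson model (as in the legend to Fig. 1) with the per-triplet stop frequency of 0.049 quoted above. The function names, the `phase` argument, and the handling of the DNA alphabet are illustrative assumptions, not the code used for the analysis.

```python
import math

# DNA spelling of the three nonsense codons UAA, UAG and UGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_in_frame_stops(seq, phase=0):
    """Count stop triplets in the reading frame that starts at index `phase`.

    `phase` (0, 1 or 2) is a hypothetical parameter standing in for the frame
    inherited from the upstream real exon.
    """
    seq = seq.upper().replace("U", "T")
    return sum(
        seq[i:i + 3] in STOP_CODONS
        for i in range(phase, len(seq) - 2, 3)
    )


def expected_fraction_with_stop(length_nt, p_stop=0.049, phase=0):
    """Poisson estimate of P(at least one in-frame stop) by chance.

    `p_stop` is the per-triplet stop frequency; 0.049 is the value reported
    for repeat-free intron sequences.
    """
    n_triplets = max(0, (length_nt - phase) // 3)
    return 1.0 - math.exp(-p_stop * n_triplets)


if __name__ == "__main__":
    print(count_in_frame_stops("ATGTAAGGC"))          # frame 0: ATG TAA GGC
    print(count_in_frame_stops("ATGTAAGGC", phase=1))  # frame 1: TGT AAG
    print(round(expected_fraction_with_stop(60), 3))
```

For a 60-nt sequence the estimate is 1 - exp(-0.049 × 20), or about 0.62. The value climbs toward 1 as length increases, which is why the comparison is informative only for short pseudo exons.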
Similarly, we tested the idea that nonsense codons are interposed between real 5’ splice sites and downstream latent (pseudo) 5’ splice sites so as to subvert the use of the downstream site. If this were a general mechanism for 5’ splice site definition then the proportion of such intron regions containing at least one in-frame nonsense codon should be 1.0. If not, nonsense codons should occur simply by chance. To calculate the proportion based on chance, we considered the intron flank to be made up of 2 components: the real splice site itself, which has a high probability of harboring a nonsense codon due to the consensus (C or A)AG/GURAGU, which carries a nonsense URA triplet at positions +2 to +4; and the remaining sequence. For the former, we used the weight matrix of real sites to calculate the probability of finding a stop codon. For the latter, we used the overall frequency of stop codons in the downstream 100 nt flanks of the real exons in our database. This frequency was 0.038 per triplet, which is less than 3/64 (0.047) expected on a random basis, reflecting the non-random nature of sequences flanking exons (Nussinov, 1989; Engelbrecht et al. 1992). We analyzed 1164 real 5’ splice sites situated less than or equal to 100 nt upstream of a latent 5’ splice site that had a consensus matrix score greater than the median of real sites. The occurrence of nonsense codons as a function of the distance between the splice site and the downstream latent 5’ splice site is shown in Fig. 1B. The proportion of intervening intron sequences having at least 1 in-frame nonsense codon was not greater than that expected by chance (the line in Fig. 1B), and considerably less than the 1.0 predicted by a translatability hypothesis. For a statistical treatment of this data we considered the 598 sequences with false sites within 50 nt in the downstream flanks. 
Of this set, 0.42 have at least one stop codon between the false sites and the 5' splice sites (0.18 within the splice site and 0.24 in the region beyond). The corresponding expectation for the total based on chance is 0.43. The values for the totals are not statistically different (p=0.71). In contrast, the observed value of 0.42 differs from the expectation of 1.0 that follows from a requirement for a nonsense codon to inactivate the latent splice site (p<10⁻⁸⁰, even taking the expectation to be 0.75 rather than 1.0). We conclude that nonsense codons do not act in a general way to stifle potential competition by a neighboring latent 5’ splice site.
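The two-component chance calculation described above can be sketched as follows. This minimal Python example assumes that the splice-site component and the remaining flank are independent, and that the flank follows a Poisson model with the 0.038 per-triplet frequency quoted above. The function names and the example inputs are hypothetical. In practice the probability for the splice site itself would be derived from the weight matrix of real sites, which is not reproduced here.

```python
import math


def p_stop_in_flank(n_triplets, p_stop=0.038):
    """Poisson estimate of P(at least one in-frame stop) in the flank.

    `p_stop` is the per-triplet stop frequency; 0.038 is the value reported
    for the 100-nt flanks downstream of real exons.
    """
    return 1.0 - math.exp(-p_stop * n_triplets)


def p_stop_combined(p_site, p_flank):
    """Combine the two components, assuming they are independent.

    P(at least one stop) = 1 - P(no stop in site) * P(no stop in flank).
    """
    return 1.0 - (1.0 - p_site) * (1.0 - p_flank)


if __name__ == "__main__":
    # Hypothetical inputs: a splice-site probability of 0.2 and a flank of
    # 10 in-frame triplets between the real and the latent 5' splice site.
    p_site = 0.2
    p_flank = p_stop_in_flank(10)
    print(round(p_stop_combined(p_site, p_flank), 3))
```

Combining the components through the probability of no stop in either one avoids double counting sequences that carry a stop codon in both the splice site and the flank.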
These conclusions are in agreement with several genetic studies in which nonsense mutations that did not affect splicing were described, for example in the genes for dhfr (Urlaub et al. 1989), aprt (Kessler and Chasin 1996) or hprt (Valentine 1998). Similarly, no nonsense mutations were found among 70 mutants of a 3-exon dhfr minigene selected for skipping of the central exon (Chen and Chasin 1993).
The work that instigated this analysis (Li et al. 2002; Wang et al. 2002) has provided impressive evidence that translatability can affect splicing decisions. Nuclear recognition of translatability presumably involves a complex mechanism. Our analysis now adds the question of why such complexity would emerge if it were not to be used generally.


Fig. 1. The occurrence of in-frame nonsense codons (solid circles) in intron regions. The average for the 2 types of out-of-frame nonsense codons is included for comparison (open circles). The line represents the result expected by chance alone for an in-frame nonsense codon, using a Poisson distribution. A. Pseudo exons; windows of 10 were grouped. B. Flanks downstream of exons; windows of 5 were grouped; the starting point of each window is plotted. Lists of the sequences underlying these data can be found at www.columbia.edu/data/cu/biology/faculty/chasin/rnajournal
References
Chen, I.T. and L.A. Chasin. 1993. Direct selection for mutations affecting specific splice sites in a hamster dihydrofolate reductase minigene. Mol Cell Biol 13: 289-300.
Engelbrecht, J., S. Knudsen, and S. Brunak. 1992. G+C-rich tract in 5' end of human introns. J Mol Biol 227: 108-113.
Kessler, O. and L.A. Chasin. 1996. Effects of nonsense mutations on nuclear and cytoplasmic adenine phosphoribosyltransferase RNA. Mol Cell Biol 16: 4426-4435.
Li, B., C. Wachtel, E. Miriami, G. Yahalom, G. Friedlander, G. Sharon, R. Sperling, and J. Sperling. 2002. Stop codons affect 5' splice site selection by surveillance of splicing. Proc Natl Acad Sci U S A 99: 5277-5282.
Nussinov, R. 1989. Conserved signals around the 5' splice sites in eukaryotic nuclear precursor mRNAs: G-runs are frequent in the introns and C in the exons near both 5' and 3' splice sites. J Biomol Struct Dyn 6: 985-1000.
Saxonov, S., I. Daizadeh, A. Fedorov, and W. Gilbert. 2000. EID: the Exon-Intron Database-an exhaustive database of protein-coding intron-containing genes. Nucleic Acids Res 28: 185-190.
Senapathy, P., M.B. Shapiro, and N.L. Harris. 1990. Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol 183: 252-278.
Sun, H. and L.A. Chasin. 2000. Multiple splicing defects in an intronic false exon. Mol Cell Biol 20: 6414-6425.
Urlaub, G., P.J. Mitchell, C.J. Ciudad, and L.A. Chasin. 1989. Nonsense mutations in the dihydrofolate reductase gene affect RNA processing. Mol Cell Biol 9: 2868-2880.
Valentine, C.R. 1998. The association of nonsense codons with exon skipping. Mutat Res 411: 87-117.
Wang, J., J.I. Hamilton, M.S. Carter, S. Li, and M.F. Wilkinson. 2002. Alternatively spliced TCR mRNA induced by disruption of reading frame. Science 297: 108-110.