Web supplement to
"Exon Trapping Genome"

Nick Stepankiw1,2*, Ally W.H. Yang1,2*, and Timothy R. Hughes1,2§

1Department of Molecular Genetics, University of Toronto, Toronto, ON
2Donnelly Centre, University of Toronto, Toronto, ON

§To whom correspondence should be addressed:

Abstract

Eukaryotic mRNA and lncRNA exons are often small compared to introns. The exon definition model predicts that exons splice autonomously, dependent on proximal exon sequence features, explaining their delineation within large introns. This model has not been examined on a genome-wide scale, however, leaving open the question of how often mRNA and lncRNA exons are autonomous. It is also unknown how frequently such exons can arise by chance. Here, we directly assayed large fragments (500-1000 bp) of the human genome by exon trapping, which detects exons spliced into a heterologous transgene, here designed with a large intron context. We define these exons as “autonomous”. We obtained ~1.25 million exons, including most known mRNA and well-annotated lncRNA internal exons, demonstrating that human exons are predominantly autonomous. mRNA exons are trapped with the highest efficiency. Nearly a million of the trapped exons are unannotated, most located in intergenic regions and antisense to mRNA, with depletion from the forward strand of introns. These exons are not conserved, indicating they are non-functional and likely arose from random mutations. They are nonetheless highly enriched for the splicing-promoting sequence features that delineate known exons. Novel autonomous exons are more abundant than annotated lncRNA exons, and computational models also indicate they will occur with similar frequency in any randomly generated sequence. These results show that most human coding exons splice autonomously, and provide an explanation for the existence of many unconserved lncRNAs, as well as a new annotation of spliceable loci in the human genome, together with their inclusion levels.

Supplemental data

Supplemental data (large zip file)

Source code

Code on GitHub