Web supplement to
"A Unified Model for Yeast Transcript Definition"

Carl G. de Boer1, Harm van Bakel2, Kyle Tsui3, Joyce Li2, Quaid D. Morris1,2, Corey Nislow1,2, Jack F. Greenblatt1,2, Timothy R. Hughes1,2*

Affiliations: 1 Department of Molecular Genetics, 2 Banting and Best Department of Medical Research and Donnelly Centre for Cellular and Biomolecular Research, 3 Department of Pharmaceutical Sciences, University of Toronto, Toronto, ON, M5S 3E1, CANADA
Corresponding authors: and

The predictions of the model across the genome are available in the genome browser.

Abstract

Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derived simple classifiers that identify where transcription will initiate and terminate using nucleic acid sequence features detectable by the yeast cell, which we integrate into a Unified Model (UM) that models transcription as a whole. The cis-elements that denote where transcription initiates function primarily through nucleosome depletion, and, using a synthetic promoter system, we show that most of these elements are sufficient to initiate transcription in vivo. Hrp1 binding sites are the major characteristic of terminators; these binding sites are often clustered in terminator regions, and can terminate transcription bidirectionally. The UM predicts global transcript structure by modeling the entire process of transcription using a hidden Markov model whose emissions are the outputs of the initiation and termination classifiers. We validated the novel predictions of the UM with available RNA-Seq data, and tested it further by directly comparing the transcript structure predicted by the model to the transcription generated by the cell for synthetic DNA segments of random design. We show that the UM identifies transcription start sites more accurately than the initiation classifier alone, indicating that the relative arrangement of promoter and terminator elements influences their function. Our model presents a concrete description of how the cell defines transcript units, explains the existence of non-genic transcripts, and provides insight into genome evolution.

Supplementary information

Models

Combinatorial promoter library

Note: the following sequencing files are very large, so please download them only during off-peak hours.

6KB random DNA sequences

Genome Browser Files

May 2008 Genome Version


20110203 R64 Genome Version

These are the same tracks as those available in the genome browser.