The functional landscape of mouse gene expression
Wen Zhang[1,2*], Quaid D. Morris[1,3*], Richard Chang[1], Ofer Shai[3], Malina
A. Bakowski[1], Nicholas Mitsakakis[1], Naveed Mohammad[1], Mark D. Robinson[1],
Ralph Zirnglibl[2], Eszter Somogyi[2], Nancy Laurin[2], Eftekhar Eftekharpour[4],
Eric Sat[5], Jörg Grigull[1], Qun Pan[1], Wen-Tao Peng[1], Nevan Krogan[1],
Jack Greenblatt[1,2], Michael Fehlings[4,6], Derek van der Kooy[2], Jane Aubin[2],
Benoit G. Bruneau[2,7], Janet Rossant[2,5], Benjamin J. Blencowe[1,2], Brendan
J. Frey[3], and Timothy R. Hughes[1,2§]
[1] Banting and Best Department of Medical Research
[2] Department of Medical Genetics and Microbiology
[3] Department of Electrical and Computer Engineering
[4] Department of Surgery, University of Toronto
1 King's College Circle, Toronto, Ontario M5S 1A8
[5] Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University
Avenue, Toronto, Ontario M5G 1X5
[6] Division of Cell and Molecular Biology
Toronto Western Research Institute and Krembil Neuroscience Center, 399 Bathurst
St., Toronto, Ontario M5T 2S8
[7] The Hospital for Sick Children, 555 University Ave., Toronto, ON M5G 1X8
*These authors contributed equally
§ To whom correspondence should be addressed:
t.hughes@utoronto.ca 416-946-8260
FAX: 416-978-8528
Running title: functional landscape of mouse gene expression
Abstract
Background. Large-scale quantitative analysis of transcriptional co-expression
has been used to dissect regulatory networks and to predict the functions
of new genes discovered by genome sequencing in model organisms such as yeast.
Although the idea that tissue-specific expression is indicative of gene function
in mammals is widely accepted, it has been neither objectively tested nor compared
with the related but distinct strategy of correlating gene co-expression as
a means to predict gene function.
Results. We generated microarray expression data for nearly 40,000
known and predicted mRNAs in 55 mouse tissues, using custom-built oligonucleotide
arrays. We show that quantitative transcriptional co-expression is a powerful
predictor of mammalian gene function. Hundreds of functional categories, as
defined by Gene Ontology "Biological Processes", are associated with characteristic
expression patterns across all tissues, including categories that bear no
overt relationship to the tissue of origin. In contrast, simple tissue-specific
restriction of expression is a poor predictor of which genes are in which
functional categories. As an example, the highly conserved mouse gene PWP1
is widely expressed across different tissues but is co-expressed with many
RNA-processing genes; we show that the uncharacterized yeast homolog of PWP1
is required for rRNA biogenesis.
Conclusions. We conclude that 'functional genomics' strategies based
on quantitative transcriptional co-expression will be as fruitful in mammals
as they have been in simpler organisms, and that transcriptional control of
mammalian physiology is more modular than is generally appreciated. Our data
and analyses provide a public resource for mammalian functional genomics.
Supplementary Data:
- These are downloadable files. A graphical user interface can be found
at http://mgpd.med.utoronto.ca
- XM/XP sequences
- XM (predicted mRNA sequences) [FASTA]
- XP (encoded protein sequences) [FASTA]
- Array Design
- Array 1 (spot map) [text]
- Array 2 (spot map) [text]
- Master file - 41,699 probes [Excel, ZIP]
The Master file contains probe sequences, columns listing the closest
sequences in RIKEN, ENSEMBL, Refseq, and Unigene; GenBank description,
EST overlap, GO-BP annotations, Domain (most significant)
- Array Images
- Array Atlas (1-30 ~700 MB) [ZIP]
- Array Atlas (31-69 ~500 MB) [ZIP]
- Array Atlas (71-100 ~600 MB) [ZIP]
- Array Atlas (101-140 ~900 MB) [ZIP]
- Array Atlas (141-172 ~800 MB) [ZIP]
- Array info after removal of redundant probes
- Probe combinations [text]
- Master file - 39,309 presumed distinct transcripts [Excel, ZIP]
- Hybridization records [Excel]
- Figure Data
- Supplementary Figures
- Data
- 41,699 probes - single channel, arcsinh intensities [text, ZIP]
- 41,699 probes
- 41,699 probes - median-subtracted, zeroed [text, ZIP]
- 39,309 presumed distinct transcripts
- 21,622 presumed distinct transcripts - median-subtracted and zeroed,
expressed above 99% of negative-control spots [text, ZIP]
- Binary matrix of expression above 99% of negative-control spots [text, ZIP]
- Raw data (GPR files output by GenePix) WARNING: 175 MB [ZIP]
- Annotations
- GO annotations among 39,309 presumed distinct transcripts (12,543
annotated genes, 47,900 annotations)
- Annotations [ZIP, two-column text]
- Genes annotated in a GO-BP category not among the 992 considered (these genes are always treated as negatives for classification purposes) [text]
- GO annotations among 21,622 presumed distinct transcripts (9,499
annotated genes, 37,876 annotations)
- Annotations [ZIP, two-column text]
- Genes annotated in a GO-BP category not among the 992 considered (these genes are always treated as negatives for classification purposes) [text]
- superGO annotations [ZIP, two-column text]
- map between GO and superGO annotations [Excel]
- SVM Predictions
- GO-BP SVM Predictions, 15% precision [Excel]
- GO-BP SVM Predictions, 50% precision [Excel, Text]
- superGO SVM Predictions, 15% precision [Excel]
- superGO SVM Predictions, 50% precision [Excel, Text]
- Other
- RT-PCR primer sequences [Excel]
- XM genes with gene trap lines [Excel]
- XM motifs [Excel]
- 779 known and putative DNA-binding transcription factors among XM
genes [Text]
- Table of accession numbers of genes common to Zhang, Su, and Bono
data [Text]
- Tech report on spatial detrending [PDF]
- Map locations of XM genes (BLASTed against Build 32), retained
only if the top hits of the XM sequence and the array probe overlap (30,387
presumed distinct transcripts) [Excel]
- SVM functional predictions for 7,147 unannotated mapped transcripts
(may be useful for positional cloning); see also http://mgpd.med.utoronto.ca
[Excel]
- 175 lists of genes that are expressed in individual tissues, highest
in individual tissues, or specific to individual tissues [Text]
- Latest Mapping for the 39,309 presumed distinct transcripts (Feb. 2006).
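
The data files listed above are described as arcsinh intensities, median-subtracted and zeroed values, and a binary matrix of expression above 99% of negative-control spots. The sketch below shows one plausible reading of those three transformations. It is not the authors' pipeline: the function names are hypothetical, and three points are assumptions, namely that the median is taken per transcript across tissues, that "zeroed" means negative values are set to zero, and that the 99% cutoff is the 99th percentile of negative-control intensities on the same scale as the data.

```python
import math
from statistics import median


def arcsinh_transform(matrix):
    """Apply the inverse hyperbolic sine to every intensity."""
    return [[math.asinh(v) for v in row] for row in matrix]


def median_subtract_and_zero(matrix):
    """Subtract each row's median, then clip negative values to zero.

    Assumption: rows are transcripts, columns are tissues, and the
    median is computed per transcript across tissues.
    """
    out = []
    for row in matrix:
        m = median(row)
        out.append([max(v - m, 0.0) for v in row])
    return out


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def expressed_matrix(matrix, negative_controls, q=99):
    """Return 1 where a value exceeds the q-th percentile of the
    negative-control values, else 0.

    Assumption: matrix and negative_controls are on the same scale.
    """
    cutoff = percentile(negative_controls, q)
    return [[1 if v > cutoff else 0 for v in row] for row in matrix]


if __name__ == "__main__":
    # Made-up numbers for illustration only; not taken from the dataset.
    raw = [[100.0, 2000.0, 150.0], [50.0, 60.0, 55.0]]
    controls = [float(i) for i in range(1, 101)]
    print(median_subtract_and_zero(arcsinh_transform(raw)))
    print(expressed_matrix(raw, controls))
```

The arcsinh transform behaves like a logarithm for large intensities but stays defined at zero and for negative background-subtracted values, which is a common reason for choosing it over a plain log.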