Web supplement to
"A resource of RNA-binding protein motifs across eukaryotes reveals evolutionary dynamics and gene-regulatory function"
Alexander Sasse1,2,3,4,*, Debashish Ray2,*, Kaitlin Laverty1,2,4,5,*,
Cyrus L. Tam5,6, Mihai Albu2, Hong Zheng2, Yevgen Levdansky7, Olga Lyudovyk5,6,
Kate Nie1,2,4, Cedrik Magis8,9, Cedric Notredame8,9, Eugene Valkov7,
Matthew T. Weirauch10,11,‡, Timothy R. Hughes1,2,‡,
Quaid Morris1,2,4,5,6,12,‡
1Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
2Donnelly Centre, University of Toronto, Toronto, ON, Canada
3Department of Computer Science, University of Washington, Seattle, WA, USA
4Vector Institute, Toronto, ON, Canada
5Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
6Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
7National Cancer Institute, National Institutes of Health, Frederick, MD, USA
8Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
9Universitat Pompeu Fabra, Barcelona, Spain
10Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital, Cincinnati, OH, USA
11Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
12Ontario Institute for Cancer Research, Toronto, ON, Canada
*these authors contributed equally
‡To whom correspondence should be addressed:
Abstract
RNA-binding proteins (RBPs) are key regulators of gene expression; however, their RNA-binding specificities (that is, motifs) have not been comprehensively determined. Here, we introduce Eukaryotic Protein–RNA Interactions (EuPRI), a freely available resource of RNA motifs for 34,746 RBPs from 690 eukaryotes. EuPRI includes in vitro binding data for 504 RBPs, including newly collected RNAcompete data for 174 RBPs, along with thousands of predicted motifs. We predict these motifs with an algorithm, Joint Protein–Ligand Embedding, which can detect distant homology relationships and map specificity-determining peptides. EuPRI quadruples the number of available RBP motifs, expanding the motif repertoire across all major eukaryotic clades and assigning motifs to the majority of human RBPs. We demonstrate the utility of EuPRI for inferring post-transcriptional function and evolutionary relationships by identifying rapid, recent evolution of post-transcriptional regulatory networks in worms and plants, in contrast to the vertebrate RNA motif set, which has remained relatively stable after a large expansion between the metazoan and vertebrate ancestors.
Supplementary Data Tables
- Table S1. RNAcompete experimental details. Table S1(.xlsx)
- Table S2. Performance of JPLE and other RNA-specificity prediction methods for the 355 training set proteins. Table S2(.xlsx)
- Table S3. Performance of residue importance scores and other prediction metrics for 26 PDB co-complex structures. Table S3(.xlsx)
- Table S4. Count of identified RBPs and RBPs with assigned motifs across 690 eukaryotes. Table S4(.xlsx)
- Table S5. Conserved RNA motif group assignments for 8,957 RBPs from 53 species. Table S5(.xlsx)
- Table S6. Conserved RNA motif group ages and clade assignments. Table S6(.xlsx)
- Table S7. Half-life data for putative stability-regulating A. thaliana RBPs. Table S7(.xlsx)
- Table S8. Deadenylation assay quantification. Table S8(.xlsx)
- File S1. Extended profile HMM for the RRM domain. File S1(.hmm)
Array Information, Raw and Processed Data
Z-Scores & Motifs
Z-scores and motifs for previously published RNAcompete experiments used in this study are found here.
Z-score bootstrap analysis results
For all 420 RNAcompete experiments, probes were resampled 100 times and the mean and standard deviation of k-mer Z-scores were calculated. Results are contained in the following files.
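The resampling procedure described above can be illustrated with a minimal Python sketch. It is not the code used to produce the files listed here. It assumes that a k-mer's score is the plain mean intensity of the probes containing it and that Z-scores are standardized across all k-mers observed in a resample; the function names `kmer_zscores` and `bootstrap_kmer_zscores`, the `seed` parameter, and the toy probes are hypothetical and introduced only for illustration.

```python
import random
import statistics
from collections import defaultdict


def kmer_zscores(probes, intensities, k):
    """Return {k-mer: Z-score} for one set of probes.

    Assumption (for illustration only): a k-mer's score is the mean
    intensity of the probes that contain it, and the Z-score standardizes
    that score across all observed k-mers.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, value in zip(probes, intensities):
        # Count each k-mer at most once per probe.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            sums[kmer] += value
            counts[kmer] += 1
    means = {kmer: sums[kmer] / counts[kmer] for kmer in sums}
    values = list(means.values())
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return {kmer: 0.0 for kmer in means}
    return {kmer: (m - mu) / sd for kmer, m in means.items()}


def bootstrap_kmer_zscores(probes, intensities, k, n_boot=100, seed=0):
    """Resample probes with replacement n_boot times.

    Returns {k-mer: (mean Z-score, standard deviation of Z-score)},
    computed over the resamples in which the k-mer was observed.
    """
    rng = random.Random(seed)
    n = len(probes)
    samples = defaultdict(list)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        z = kmer_zscores([probes[i] for i in idx],
                         [intensities[i] for i in idx], k)
        for kmer, value in z.items():
            samples[kmer].append(value)
    return {kmer: (statistics.fmean(v), statistics.pstdev(v))
            for kmer, v in samples.items()}


# Toy example: two probes containing GCAUG have high intensity.
toy_probes = ["AAGCAUGAA", "CCGCAUGCC", "UUUUUUUUU",
              "ACACACACA", "GGAGGAGGA", "CUCUCUCUC"]
toy_intensities = [10.0, 9.0, 1.0, 1.2, 0.8, 1.1]
toy_result = bootstrap_kmer_zscores(toy_probes, toy_intensities, k=5)
```

Each entry of `toy_result` pairs a k-mer with the mean and standard deviation of its Z-score across resamples. A k-mer that is absent from a given resample simply contributes no value for that round, which is one of several reasonable conventions.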
JPLE training data
Code