Web supplement to
"Reconstructing the sequence specificities of RNA-binding proteins across eukaryotes"
Alexander Sasse1,2,3,4,*, Debashish Ray2,*, Kaitlin Laverty1,2,4,5,*,
Cyrus L. Tam5,6, Mihai Albu2, Hong Zheng2, Yevgen Levdansky7, Olga Lyudovyk5,6,
Kate Nie1,2,4, Cedrik Magis8,9, Cedric Notredame8,9, Eugene Valkov7,
Matthew T. Weirauch10,11,‡, Timothy R. Hughes1,2,‡,
Quaid Morris1,2,4,5,6,12,‡
1Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
2Donnelly Centre, University of Toronto, Toronto, ON, Canada
3Department of Computer Science, University of Washington, Seattle, WA, USA
4Vector Institute, Toronto, ON, Canada
5Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
6Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
7National Cancer Institute, National Institutes of Health, Frederick, MD, USA
8Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
9Universitat Pompeu Fabra, Barcelona, Spain
10Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital, Cincinnati, OH, USA
11Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
12Ontario Institute for Cancer Research, Toronto, ON, Canada
*these authors contributed equally
‡To whom correspondence should be addressed:
Abstract
RNA-binding proteins (RBPs) are key regulators of gene expression. Here, we introduce EuPRI (Eukaryotic Protein-RNA Interactions) – a freely available resource of RNA motifs for 34,746 RBPs from 690 eukaryotes. EuPRI includes in vitro binding data for 504 RBPs, including newly collected RNAcompete data for 174 RBPs, along with thousands of predicted motifs. We predict these motifs with a new computational platform — Joint Protein-Ligand Embedding (JPLE) — which can detect distant homology relationships and map specificity-determining peptides. EuPRI quadruples the number of known RBP motifs, expanding the motif repertoire across all major eukaryotic clades, and assigning motifs to the majority of human RBPs. EuPRI drastically improves knowledge of RBP motifs in flowering plants. For example, it increases the number of Arabidopsis thaliana RBP motifs 7-fold, from 14 to 105. EuPRI also has broad utility for inferring post-transcriptional function and evolutionary relationships. We demonstrate this by predicting and validating a role for a set of Arabidopsis thaliana RBPs in RNA stability and identifying rapid and recent evolution of post-transcriptional regulatory networks in worms and plants. In contrast, the vertebrate RNA motif set has remained relatively stable after its drastic expansion between the metazoan and vertebrate ancestors. EuPRI represents a powerful resource for the study of gene regulation across eukaryotes.
Supplementary Data Tables
- Table S1. RNAcompete experimental details. Table S1 (.xlsx)
- Table S2. Performance of JPLE and other RNA-specificity prediction methods for the 355 training set proteins. Table S2 (.xlsx)
- Table S3. Performance of residue importance scores and other prediction metrics for 26 PDB co-complex structures. Table S3 (.xlsx)
- Table S4. Count of identified RBPs and RBPs with assigned motifs across 690 eukaryotes. Table S4 (.xlsx)
- Table S5. Conserved RNA motif group assignments for 8,957 RBPs from 53 species. Table S5 (.xlsx)
- Table S6. Conserved RNA motif group ages and clade assignments. Table S6 (.xlsx)
- Table S7. Half-life data for putative stability-regulating A. thaliana RBPs. Table S7 (.xlsx)
- Table S8. Deadenylation assay quantification. Table S8 (.xlsx)
- File S1. Extended profile HMM for the RRM domain. File S1 (.hmm)
Array Information, Raw and Processed Data
Z-scores & Motifs
Z-scores and motifs for previously published RNAcompete experiments used in this study are found here.
Z-score bootstrap analysis results
For all 420 RNAcompete experiments, probes were resampled 100 times and the mean and standard deviation of k-mer Z-scores were calculated. Results are contained in the following files.
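The resampling procedure described above can be illustrated with a minimal Python sketch. Because the supplement does not specify the details, the sketch assumes that probes are resampled with replacement (a standard nonparametric bootstrap) and that a k-mer's raw score is the mean intensity of the probes containing it, standardized across all observed k-mers. The published RNAcompete pipeline may use a different per-k-mer statistic (for example, a trimmed mean) and additional normalization. The function names `kmer_zscores` and `bootstrap_kmer_zscores` and the toy probes are hypothetical.

```python
import random
import statistics
from itertools import product

ALPHABET = "ACGU"


def kmer_zscores(probes, intensities, k):
    """Return {k-mer: Z-score} for every k-mer found in at least one probe.

    ASSUMPTION: a k-mer's raw score is the mean intensity of the probes
    containing it; raw scores are then standardized across k-mers.
    """
    raw = {}
    for kmer in ("".join(p) for p in product(ALPHABET, repeat=k)):
        hits = [x for seq, x in zip(probes, intensities) if kmer in seq]
        if hits:
            raw[kmer] = statistics.fmean(hits)
    if not raw:
        return {}
    values = list(raw.values())
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # Degenerate case (all raw scores equal): no k-mer stands out.
        return {kmer: 0.0 for kmer in raw}
    return {kmer: (v - mu) / sd for kmer, v in raw.items()}


def bootstrap_kmer_zscores(probes, intensities, k, n_boot=100, seed=0):
    """Resample probes n_boot times; return {k-mer: (mean Z, SD of Z)}.

    ASSUMPTION: each replicate draws len(probes) probes with replacement.
    """
    rng = random.Random(seed)
    n = len(probes)
    samples = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        z = kmer_zscores([probes[i] for i in idx],
                         [intensities[i] for i in idx], k)
        for kmer, value in z.items():
            samples.setdefault(kmer, []).append(value)
    # Population SD, so a k-mer seen in only one replicate still gets a value.
    return {kmer: (statistics.fmean(v), statistics.pstdev(v))
            for kmer, v in samples.items()}


# Toy data (hypothetical): four short probes and their intensities.
probes = ["AAGG", "UUCC", "AAUU", "GGCC"]
intensities = [5.0, 1.0, 3.0, 2.0]
z = kmer_zscores(probes, intensities, k=2)
boot = bootstrap_kmer_zscores(probes, intensities, k=2, n_boot=100)
```

With `n_boot=100` this mirrors the 100 resamples described above. A k-mer that is absent from some replicates has its mean and standard deviation computed only over the replicates in which it appears; a real analysis would use 7-mers and full-length probe sets, for which a precomputed k-mer-to-probe index would be much faster than the substring scan used here.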
JPLE training data
Code