Web supplement to
"Reconstructing the sequence specificities of RNA-binding proteins across eukaryotes"

Alexander Sasse1,2,3,4,*, Debashish Ray2,*, Kaitlin Laverty1,2,4,5,*, Cyrus L. Tam5,6, Mihai Albu2, Hong Zheng2, Yevgen Levdansky7, Olga Lyudovyk5,6, Kate Nie1,2,4, Cedrik Magis8,9, Cedric Notredame8,9, Eugene Valkov7, Matthew T. Weirauch10,11,‡, Timothy R. Hughes1,2,‡, Quaid Morris1,2,4,5,6,12,‡

1Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
2Donnelly Centre, University of Toronto, Toronto, ON, Canada
3Department of Computer Science, University of Washington, Seattle, WA, USA
4Vector Institute, Toronto, ON, Canada
5Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
6Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
7National Cancer Institute, National Institutes of Health, Frederick, MD, USA
8Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
9Universitat Pompeu Fabra, Barcelona, Spain
10Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital, Cincinnati, OH, USA
11Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
12Ontario Institute for Cancer Research, Toronto, ON, Canada

*these authors contributed equally
‡To whom correspondence should be addressed:

Abstract

RNA-binding proteins (RBPs) are key regulators of gene expression. Here, we introduce EuPRI (Eukaryotic Protein-RNA Interactions) – a freely available resource of RNA motifs for 34,746 RBPs from 690 eukaryotes. EuPRI includes in vitro binding data for 504 RBPs, including newly collected RNAcompete data for 174 RBPs, along with thousands of predicted motifs. We predict these motifs with a new computational platform — Joint Protein-Ligand Embedding (JPLE) — which can detect distant homology relationships and map specificity-determining peptides. EuPRI quadruples the number of known RBP motifs, expanding the motif repertoire across all major eukaryotic clades, and assigning motifs to the majority of human RBPs. EuPRI drastically improves knowledge of RBP motifs in flowering plants. For example, it increases the number of Arabidopsis thaliana RBP motifs 7-fold, from 14 to 105. EuPRI also has broad utility for inferring post-transcriptional function and evolutionary relationships. We demonstrate this by predicting and validating a role for a set of Arabidopsis thaliana RBPs in RNA stability and identifying rapid and recent evolution of post-transcriptional regulatory networks in worms and plants. In contrast, the vertebrate RNA motif set has remained relatively stable after its drastic expansion between the metazoan and vertebrate ancestors. EuPRI represents a powerful resource for the study of gene regulation across eukaryotes.

Supplementary Data Tables

Array Information, Raw and Processed Data

Z-scores & Motifs

Z-scores & Motifs for previously published RNAcompete experiments used in this study are found here.

Z-score bootstrap analysis results

For all 420 RNAcompete experiments, probes were resampled 100 times and the mean and standard deviation of k-mer Z-scores were calculated. Results are contained in the following files.
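
The resampling described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to generate the files. It assumes that a k-mer score is the plain mean intensity of the probes containing that k-mer, Z-scored across all k-mers, and that probes are resampled with replacement. The function names, the ACGU alphabet, and the default k=7 are hypothetical choices made for this example.

```python
import itertools
import numpy as np


def kmer_zscores(probes, intensities, k):
    """Score each k-mer by the mean intensity of probes containing it,
    then Z-score across k-mers (assumed scoring scheme, for illustration)."""
    kmers = ["".join(p) for p in itertools.product("ACGU", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    sums = np.zeros(len(kmers))
    counts = np.zeros(len(kmers))
    for seq, value in zip(probes, intensities):
        # Count each k-mer once per probe, even if it occurs several times.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            j = index.get(kmer)
            if j is not None:  # skip k-mers with characters outside ACGU
                sums[j] += value
                counts[j] += 1
    means = np.full(len(kmers), np.nan)
    seen = counts > 0
    means[seen] = sums[seen] / counts[seen]
    z = (means - np.nanmean(means)) / np.nanstd(means)
    return kmers, z


def bootstrap_kmer_zscores(probes, intensities, k=7, n_resamples=100, seed=0):
    """Resample probes with replacement and return, per k-mer, the mean and
    standard deviation of its Z-score across resamples."""
    rng = np.random.default_rng(seed)
    probes = np.asarray(probes)
    intensities = np.asarray(intensities, dtype=float)
    n = len(probes)
    samples = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of probe indices
        kmers, z = kmer_zscores(probes[idx], intensities[idx], k)
        samples.append(z)
    samples = np.vstack(samples)
    return kmers, np.nanmean(samples, axis=0), np.nanstd(samples, axis=0)
```

A k-mer with a high mean Z-score and a small standard deviation across resamples is supported consistently by many probes rather than by a few outliers.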

JPLE training data

Code