Web supplement to
"A resource of RNA-binding protein motifs across eukaryotes reveals evolutionary dynamics and gene-regulatory function"

Alexander Sasse1,2,3,4,*, Debashish Ray2,*, Kaitlin Laverty1,2,4,5,*, Cyrus L. Tam5,6, Mihai Albu2, Hong Zheng2, Yevgen Levdansky7, Olga Lyudovyk5,6, Kate Nie1,2,4, Cedrik Magis8,9, Cedric Notredame8,9, Eugene Valkov7, Matthew T. Weirauch10,11,‡, Timothy R. Hughes1,2,‡, Quaid Morris1,2,4,5,6,12,‡

1Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
2Donnelly Centre, University of Toronto, Toronto, ON, Canada
3Department of Computer Science, University of Washington, Seattle, WA, USA
4Vector Institute, Toronto, ON, Canada
5Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
6Graduate Program in Computational Biology and Medicine, Weill-Cornell Graduate School, New York, NY, USA
7National Cancer Institute, National Institutes of Health, Frederick, MD, USA
8Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
9Universitat Pompeu Fabra, Barcelona, Spain
10Center for Autoimmune Genomics and Etiology, Divisions of Allergy & Immunology, Human Genetics, Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital, Cincinnati, OH, USA
11Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
12Ontario Institute for Cancer Research, Toronto, ON, Canada

*these authors contributed equally
‡To whom correspondence should be addressed:

Abstract

RNA-binding proteins (RBPs) are key regulators of gene expression; however, their RNA-binding specificities (that is, their motifs) have not been comprehensively determined. Here, we introduce Eukaryotic Protein–RNA Interactions (EuPRI), a freely available resource of RNA motifs for 34,746 RBPs from 690 eukaryotes. EuPRI includes in vitro binding data for 504 RBPs, including newly collected RNAcompete data for 174 RBPs, along with thousands of predicted motifs. We predict these motifs with an algorithm, Joint Protein–Ligand Embedding, which can detect distant homology relationships and map specificity-determining peptides. EuPRI quadruples the number of available RBP motifs, expanding the motif repertoire across all major eukaryotic clades and assigning motifs to the majority of human RBPs. We demonstrate the utility of EuPRI for inferring post-transcriptional function and evolutionary relationships by identifying rapid, recent evolution of post-transcriptional regulatory networks in worms and plants, in contrast to the vertebrate RNA motif set, which has remained relatively stable after a large expansion between the metazoan and vertebrate ancestors.

Supplementary Data Tables

Array Information, Raw and Processed Data

Z-Scores & Motifs

Z-scores & Motifs for previously published RNAcompete experiments used in this study are found here.

Z-score bootstrap analysis results

For all 420 RNAcompete experiments, probes were resampled 100 times and the mean and standard deviation of k-mer Z-scores were calculated. Results are contained in the following files.
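
The resampling described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. It assumes that probes are resampled with replacement, that a k-mer's score is the plain mean intensity of the probes containing it, and that Z-scores are obtained by standardizing those means across all k-mers. The function names (`kmer_zscores`, `bootstrap_kmer_zscores`) and the default `k=7` are hypothetical and chosen for illustration only.

```python
import numpy as np


def kmer_zscores(probe_seqs, intensities, k):
    """Mean probe intensity per k-mer, standardized across k-mers (assumed scoring)."""
    sums, counts = {}, {}
    for seq, val in zip(probe_seqs, intensities):
        # Count each k-mer at most once per probe.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            sums[kmer] = sums.get(kmer, 0.0) + val
            counts[kmer] = counts.get(kmer, 0) + 1
    kmers = sorted(sums)
    means = np.array([sums[m] / counts[m] for m in kmers])
    sd = means.std()
    # Guard against a degenerate resample in which all k-mer means are equal.
    z = (means - means.mean()) / sd if sd > 0 else np.zeros_like(means)
    return dict(zip(kmers, z))


def bootstrap_kmer_zscores(probe_seqs, intensities, k=7, n_boot=100, seed=0):
    """Return {k-mer: (mean Z, SD of Z)} over n_boot probe resamples."""
    rng = np.random.default_rng(seed)
    intensities = np.asarray(intensities, dtype=float)
    n = len(probe_seqs)
    samples = {}
    for _ in range(n_boot):
        # Assumption: draw n probes with replacement (standard bootstrap).
        idx = rng.integers(0, n, size=n)
        z = kmer_zscores([probe_seqs[i] for i in idx], intensities[idx], k)
        for kmer, val in z.items():
            samples.setdefault(kmer, []).append(val)
    # A k-mer absent from a resample contributes no value for that replicate.
    return {m: (float(np.mean(v)), float(np.std(v))) for m, v in samples.items()}
```

Calling `bootstrap_kmer_zscores(seqs, values, k=7, n_boot=100)` on a list of probe sequences and their intensities returns, for each k-mer, the mean and standard deviation of its Z-score across the resamples.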

JPLE training data

Code