[1] Department of Molecular Genetics, [2] Donnelly Centre, [3] Department of Computer Science, [4] Department of Electrical and Computer Engineering, University of Toronto
[*] Current address: Department of Genome Sciences, University of Washington
[5] Corresponding authors:
RNA-binding proteins recognize RNA sequences and structures, but there is currently no systematic and accurate method for deriving large (>12-base) motifs de novo that reflect intrinsic preferences for both sequence and structure. To address this gap, we introduce RNAcompete-S, which couples a single-step competitive binding reaction, performed with an excess of random RNA 40-mers, to a custom computational pipeline that interrogates the bound RNA sequences and derives Sequence and Structure Models (SSMs). RNAcompete-S confirms that HuR, QKI, and SRSF1 prefer single-stranded binding sites, and recapitulates known 8-10 bp sequence and structure preferences for Vts1p and RBMY. We also derive an 18-base-long SSM for Drosophila SLBP, which to our knowledge has not previously been determined by selection from purely random sequence, and which accurately discriminates human replication-dependent histone mRNAs. Thus, RNAcompete-S enables accurate identification of large, intrinsic sequence-structure specificities with a uniform assay.