Web supplement to
"Diverse CGG binding proteins across eukaryotes have been produced by independent domestications of hAT transposons"

Isaac Yellan1, Ally W.H. Yang2, and Timothy R. Hughes1,2*

1Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8,
2Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1.
*To whom correspondence should be addressed:

Abstract

The human transcription factor (TF) CGGBP1 (“CGG Binding Protein”) is conserved only in amniotes, and is believed to derive from the zf-BED and Hermes transposase DNA-binding domains (DBDs) of a hAT DNA transposon. Here, we show that TFs with this bipartite domain structure have resulted from dozens of independent hAT domestications in different eukaryotic lineages. CGGBPs display a wide range of sequence specificity, usually including preferences for CGG or CGC trinucleotides, while some bind AT-rich motifs. The CGGBPs are almost entirely non-syntenic, and their protein sequences, DNA binding motifs, and patterns of presence or absence in genomes are uncharacteristic of ancestry via speciation. At least eight CGGBPs in the coelacanth Latimeria chalumnae bind distinct motifs, and the expression of the corresponding genes varies considerably across tissues, indicating two overlapping modes of neofunctionalization.

Web supplementary files

Data underlying figures

PBM data availability

PBM data are available at GEO accession GSE157085. The HK and ME array designs are available at GEO accession GPL11260.