Web supplement to
"C2H2 zinc finger proteins greatly expand the human regulatory lexicon"

Hamed S. Najafabadi*,1, Sanie Mnaimneh*,1, Frank W. Schmitges*,1, Michael Garton1, Kathy N. Lam2, Ally Yang1, Mihai Albu1, Matthew T. Weirauch3,6, Ernest Radovani2, Philip M. Kim1,2,4, Jack Greenblatt1,2, Brendan J. Frey1,4-6, and Timothy R. Hughes**,1,2,6

1 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto M5S 3E1, Canada
2 Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
3 Center for Autoimmune Genomics and Etiology (CAGE) and Divisions of Rheumatology and Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
4 Department of Computer Science, University of Toronto, Toronto M5S 2E4, Canada
5 Department of Electrical and Computer Engineering, University of Toronto, Toronto, M5S 3G4, Canada
6 Canadian Institute for Advanced Research

*These authors made equal contributions to the manuscript.

**To whom correspondence should be addressed:

Abstract

Cys2-His2 zinc finger (C2H2-ZF) proteins represent the largest class of putative human transcription factors (TFs). Their expansion and diversification in animals, and frequent association with the KRAB domain in vertebrates, suggest a widespread role in silencing endogenous retroelements (EREs). However, it is unknown whether most C2H2-ZFs even bind DNA, or what sequences they bind. We show that most natural C2H2-ZFs bind DNA both in vitro and in vivo, and infer a new DNA recognition code using DNA-binding motifs for thousands of natural C2H2-ZFs. In vivo binding data for dozens of human C2H2-ZF proteins are generally consistent with our recognition code and indicate that C2H2-ZF proteins encode the majority of motifs among human TFs. We show for the first time that most KRAB-C2H2-ZF proteins do bind specific EREs, ranging from currently active to ancient families. The majority of C2H2-ZF proteins, including KRAB proteins, also show widespread binding to regulatory regions, indicating that humans contain an extensive and largely unstudied adaptive C2H2-ZF regulatory network that targets a diverse range of genes and pathways.

C2H2 sequences, vectors, and experiment log

Bacterial one-hybrid (B1H) data

Protein binding microarray (PBM) data

Gold standard C2H2-ZF motifs

ChIP-seq data

B1H-based recognition code