Variation in homeodomain DNA-binding revealed by high-resolution analysis of sequence preferences

Michael F. Berger*,1,3, Gwenael Badis*,5, Andrew R. Gehrke*,1, Shaheynoor Talukder*,5, Anthony A. Philippakis1,3,6, Lourdes Peña-Castillo4, Trevis M. Alleyne5, Sanie Mnaimneh4, Savina Jaeger1, Esther T. Chan5, Olga B. Botvinnik1,7, Faiqua Khalid4, Wen Zhang5, Daniel Newburger1, Quaid D. Morris4,5, Martha L. Bulyk†,1-3,6 and Timothy R. Hughes†,4,5


1 Division of Genetics, Department of Medicine, and 2 Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115.
3 Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138.
4 Banting and Best Department of Medical Research, and 5 Department of Medical Genetics and Microbiology, University of Toronto, Toronto, ON M4T 2J4.
6 Harvard/MIT Division of Health Sciences and Technology (HST), Harvard Medical School, Boston, MA 02115.
7 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139.
* Co-first authors
† To whom correspondence should be addressed:
t.hughes@utoronto.ca, mlbulyk@receptor.med.harvard.edu



  • Abstract
Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity, and showing for the first time that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.






  • Plasmids and proteins
  • 8-mer data
  • Mouse HD data
  • Predicted sequence preferences
  • Figures' data