Polyglutamine (polyQ) repeats are implicated in several neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias. The length of the polyQ repeat is critical to pathogenesis; however, other protein factors, including the location, type and number of flanking domains, are thought to modulate it. Many other human proteins not related to disease also contain polyQ repeats, which are intrinsically prone to expansion at the genetic level. The PolyQ database provides a tool to compare the polyQ repeat location, the occurrence and type of domains, and the number of domain repeats present across disease and non-disease proteins. All human (Homo sapiens) protein sequences in the NCBI non-redundant database (NCBI NR) that contain glutamine repeats of more than 7 residues were compiled, and their Pfam domains were mapped. We have developed a web interface to the data to allow other researchers to search it and perform their own analyses.


PolyQ was constructed by extracting Homo sapiens sequences from the NCBI NR that contain runs of more than 7 consecutive Gln residues. We then used Pfam's domain search to find protein domains within this subset of sequences. Once the domains were assigned, we divided the data into different sequence classifications.

We have also further reduced the redundancy in the data by clustering sequence homologues, and have tagged known disease proteins (these appear with a grey background colour in search results).
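The repeat-extraction step described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build PolyQ: the function names are hypothetical, the input is assumed to be FASTA-formatted protein text, and ">7" is read as eight or more consecutive glutamine (Q) residues. Species filtering, Pfam domain mapping and homologue clustering are not covered.

```python
import re

# Assumption: "more than 7" consecutive glutamines means a run of 8 or more Q.
POLYQ_RUN = re.compile(r"Q{8,}")


def find_polyq_repeats(sequence):
    """Return (start, end, length) for each polyQ run.

    Positions are 1-based and the end position is inclusive.
    """
    return [
        (m.start() + 1, m.end(), m.end() - m.start())
        for m in POLYQ_RUN.finditer(sequence.upper())
    ]


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_polyq_sequences(fasta_text):
    """Map each header to its polyQ runs, keeping only sequences with a run."""
    hits = {}
    for header, seq in parse_fasta(fasta_text):
        repeats = find_polyq_repeats(seq)
        if repeats:
            hits[header] = repeats
    return hits
```

Calling `select_polyq_sequences` on the text of a FASTA file returns only those records that contain at least one qualifying run, together with the position and length of each run, which is the kind of information needed to relate repeat location to the mapped domains.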

PolyQ was created by Amy Robertson, Mark Bate, Steve Androulakis, Stephen Bottomley and Ashley Buckle.

Please cite the following when referring to PolyQ:

Amy L. Robertson, Mark A. Bate, Steve G. Androulakis, Stephen P. Bottomley, and Ashley M. Buckle (2010) PolyQ: a database describing the sequence and domain context of polyglutamine repeats in proteins. Nucl. Acids Res. doi: 10.1093/nar/gkq1100

