What are intrinsically disordered proteins?
Intrinsic order and disorder are defined at the amino acid residue level.
The backbone atoms, hence also the residues, of ordered regions or
proteins undergo small-amplitude, thermally-driven motions about their
equilibrium positions determined as time-averaged values. In some cases,
ordered regions cooperatively switch between two or more specific
conformations. In contrast, intrinsically disordered proteins or regions exist
as dynamic ensembles in which the atom positions and the backbone Ramachandran
angles vary significantly over time with no specific equilibrium values and
typically involve non-cooperative conformational changes. Thus, the existence of
disorder is determined by a protein's dynamical properties, and not necessarily
by the presence or absence of local secondary structure. We define an
intrinsically disordered protein as one that contains at least one disordered
region.
Although intrinsically disordered proteins fail to form a fixed
3-D structure under physiological conditions, existing instead as ensembles
of conformations, they carry out critically important biological functions
(1,2). Several recent reviews (e.g. 3-6) attest to the growing interest in
these proteins. In addition, whole-cell NMR experiments demonstrate that
intrinsic disorder can exist in vivo (7) and thus does not result merely
from the failure to find the correct conditions or ligand for folding to
occur. A collection of about 100 intrinsically disordered examples, many of
which were characterized by two or more experimental methods, were assigned
one of 28 specific functions and grouped into four functional classes: (i)
molecular recognition; (ii) molecular assembly; (iii) protein modification;
and (iv) entropic chain activities (1). Intrinsically disordered regions are
typically involved in regulation, signaling and control pathways in which
interactions with multiple partners and high-specificity/low-affinity
interactions are often requisite. In this way, the functional diversity
provided by disordered regions complements that of ordered protein regions.
At the structural level, we proposed that intrinsically disordered regions
may exist in molten globule-like (collapsed) and random coil-like (extended)
forms (8). Another form of disorder, the pre-molten globule, has been
proposed (4), but it is unclear whether it is truly distinct from the random
coil-like class. We also suggested that functions of disordered proteins may
arise from the specific disorder form, from interconversion of disordered
forms, or from transitions between disordered and ordered conformations (8).
These function-associated conformational changes may be brought about by
alterations in environmental or cellular conditions (e.g. disorder-to-order
transition upon binding during signal transduction).
To understand the commonness, flavors, complexity and function
of protein disorder,
we assembled a database of known disordered protein sequence segments
and used it to develop predictors of protein disorder from primary
sequence information. Preliminary results were obtained by analyzing
sequences from the Protein Data Bank (PDB), the Swiss-Prot (SwissProt)
database and 34 complete or nearly complete genomes. In summary,
these prior studies provide strong evidence that: (1) disorder is
a very common element of protein structure; (2) the strength of
disorder prediction is correlated with sequence complexity; and
(3) eukaryotes evidently have a much larger fraction of proteins
with intrinsic disorder than eubacteria or archaebacteria.
Prediction of disorder from sequence
Since amino acid sequence determines protein 3-D structure, we reasoned
that, if disorder were crucial to function, then amino acid sequence
would determine lack of 3-D structure, or disorder, as well. To test
the hypothesis that disorder is encoded by the sequence, we have
assembled a dataset of ordered and disordered protein sequence segments
and used it to develop several predictors of disorder. Observed
prediction accuracies were in the 70-83% range
[Romero P, Obradovic Z, Kissinger CR, Villafranca JE, and Dunker AK. Proc. Pacific Symposium on Biocomputing, Hawaii, 1998, vol. 3, pp. 435-446]
[Romero P, Obradovic Z, and Dunker AK. Artificial Intelligence Review, 2000, Vol. 14, No. 6, S2, pp. 447-484]
[Romero P, Obradovic Z, and Dunker AK. Proc. IEEE Int. Conf. on Neural Networks, Houston, TX, 1997, vol. 1, pp. 90-95]
[Garner E, Cannon P, Romero P, Obradovic Z, and Dunker AK. Proc. Genome Informatics 1998, Tokyo, Japan, pp. 201-213]
[Li X, Romero P, Rani M, Dunker AK, and Obradovic Z. Proc. Genome Informatics 10, Tokyo, Japan, 1999, pp. 30-40].
These accuracies far exceed the 50% expected by chance, indicating
that disorder is very likely to be encoded by the sequence.
Our most accurate predictor [Vucetic S, Radivojac P, Obradovic Z, Brown CJ, and Dunker AK.
Proc. 2001 IEEE/INNS International Joint Conference on Neural Networks,
Washington D.C., 2001, vol. 4, pp. 2718-2723] with 82.6% overall
accuracy (88.8% accuracy on ordered proteins, and 76.5% accuracy
on disordered proteins) is an ensemble of neural networks. However,
the difference in accuracy as compared to logistic regression classifiers
is smaller than 1% [Vucetic S, Radivojac P, Obradovic Z, Brown CJ, and Dunker AK.
Proc. 2001 IEEE/INNS International Joint Conference on Neural Networks,
Washington D.C., 2001, vol. 4, pp. 2718-2723]. Such relatively
high accuracies strongly support the hypothesis that disorder is
an element of native protein structure that is encoded by the amino
acid sequence.
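The overall accuracy quoted above is consistent with a simple average of the two per-class accuracies, as would arise from evaluation on a set balanced between ordered and disordered residues. The following is a minimal sketch under that assumption; the cited paper defines the actual evaluation protocol, and the function name is illustrative only.

```python
def balanced_accuracy(acc_ordered: float, acc_disordered: float) -> float:
    """Mean of the per-class accuracies, both given in percent.

    Assumption: ordered and disordered classes are weighted equally.
    """
    return (acc_ordered + acc_disordered) / 2.0


# Per-class figures quoted in the text for the neural network ensemble.
overall = balanced_accuracy(88.8, 76.5)
print(f"{overall:.2f}")
```

The result agrees with the reported 82.6% to within the rounding of the per-class figures.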
Understanding the relationship between protein sequence and disordered protein
We have constructed more than 6,000 composition-based and 265 property-based
sequence attributes and compared them with respect to their ability to discriminate
protein order and disorder
[Li X, Obradovic Z, Brown CJ, Garner EC, and Dunker AK. Proc. Genome Informatics 11, Tokyo, Japan, 2000, pp. 172-184]
[Williams RM, Obradovic Z, Mathura V, Braun W, Garner EC, Young J, Takayama S, Brown CJ, and Dunker AK. 2000, Proc. 6th Pacific Symposium on Biocomputing, Maui, Hawaii, pp. 89-100].
Our studies [Romero P, Obradovic Z, Kissinger CR, Villafranca JE, and Dunker AK. Proc. IEEE Int.
Conf. on Neural Networks, Houston, TX, 1997, vol. 1, pp. 90-95]
[Xie Q, Arnold GE, Romero P, Obradovic Z, Garner E, and Dunker AK. Proc. Genome Informatics 1998, Tokyo, Japan, pp. 193-200]
suggest that, compared to ordered sequences, disordered sequences
tend to have lower aromatic content, higher net charge, higher values
for the flexibility indices, and greater values for hydropathy as
well as other identifiable characteristics. Although ordered globular
proteins apparently have a lower bound for sequence
complexity [Romero P, Obradovic Z, and Dunker AK. FEBS Letters, 1999, vol. 462, pp. 363-367],
disorder does not have such a lower bound
[Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, and Dunker AK. Proteins: Structure, Function and Genetics, 2001, vol. 42, pp. 38-48]. Overall,
the sequence differences observed between ordered and disordered
proteins make biochemical sense. Having amino acid compositions
that would be expected to lead to disorder adds weight to the view
that disorder is indeed encoded by the sequence.
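Attributes of the kind discussed above can be computed directly from a sequence. The sketch below is illustrative only: the residue groupings, the function names, and the use of Shannon entropy of the composition as a complexity measure are assumptions made for this example, not the attribute definitions used in the cited studies.

```python
import math
from collections import Counter

AROMATIC = set("FWY")   # assumption: Phe, Trp, Tyr counted as aromatic
POSITIVE = set("KR")    # assumption: Lys and Arg carry +1; His is ignored
NEGATIVE = set("DE")    # assumption: Asp and Glu carry -1


def aromatic_fraction(seq: str) -> float:
    """Fraction of residues that are aromatic."""
    return sum(aa in AROMATIC for aa in seq) / len(seq)


def net_charge_per_residue(seq: str) -> float:
    """Signed net charge divided by length; take abs() for magnitude."""
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    return (pos - neg) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the amino acid composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


# Hypothetical toy segment, not taken from any database.
segment = "KKEESPKKEESP"
print(aromatic_fraction(segment), net_charge_per_residue(segment),
      shannon_entropy(segment))
```

In practice such attributes would be evaluated over sliding windows along the sequence rather than over a whole protein, so that they can be assigned per residue.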
Estimation of the commonness of protein disorder
Proteins with long disordered regions (>40 amino acids) were
occasionally found in protein structures characterized by X-ray
diffraction [Romero P, Obradovic Z, Kissinger CR, Villafranca JE, and Dunker AK. Proc. IEEE Int. Conf. on Neural Networks, Houston, TX, 1997,
vol. 1, pp. 90-95]. We applied our predictors to sequence and
structure databases (SwissProt and PDB, respectively) with the result
that disorder appears to be much more common than previously thought.
Conservative estimates indicate that at least 25% of the sequences
in SwissProt contain long disordered regions [Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, and Dunker AK. Proteins: Structure, Function and Genetics, 2001, vol. 42,
pp. 38-48][Romero P, Obradovic Z, Kissinger CR, Villafranca JE, Guilliot S, Garner E, and Dunker AK. Proc. Pacific Symposium on Biocomputing,
Hawaii, 1998, vol. 3, pp. 435-446]. A similar analysis of 34 complete
or nearly complete genomes yielded estimates that the percentage of proteins
with long disordered regions in 22 bacteria, 7 archaea, and 5 eukaryotes ranges
from 7-33%, 9-37%, and 36-63%, respectively
[Dunker AK, Obradovic Z, Romero P, Garner EC, and Brown CJ. Proc. Genome Informatics 11, Tokyo, Japan, 2000, pp. 161-171].
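The kind of estimate described above can be sketched as follows, assuming that a predictor has already produced a per-residue label string for each protein ('D' for predicted disorder, 'O' for predicted order) and that a long disordered region is a run of more than 40 consecutive 'D' labels. The label encoding and function names are hypothetical, and the cited studies may apply additional criteria.

```python
from itertools import groupby


def longest_disordered_run(labels: str) -> int:
    """Length of the longest run of consecutive 'D' labels."""
    return max((len(list(run)) for key, run in groupby(labels) if key == "D"),
               default=0)


def has_long_disorder(labels: str, min_len: int = 41) -> bool:
    """True if some disordered run exceeds 40 residues (at least 41)."""
    return longest_disordered_run(labels) >= min_len


def fraction_with_long_disorder(proteins: dict) -> float:
    """Fraction of proteins containing at least one long disordered region."""
    return sum(has_long_disorder(lab) for lab in proteins.values()) / len(proteins)


# Hypothetical toy predictions, not real data.
toy = {
    "p1": "O" * 10 + "D" * 45 + "O" * 5,   # one run of 45: long
    "p2": "O" * 60,                         # fully ordered
    "p3": "D" * 40 + "O" + "D" * 40,        # two runs of exactly 40: not long
    "p4": "D" * 41,                         # one run of 41: long
}
print(fraction_with_long_disorder(toy))
```

Because single mispredicted residues split a run, a real pipeline might smooth the predictions first; that choice is left out of this sketch.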
Evolution of disordered protein
Differences in the amino-acid composition of ordered and disordered
protein may result in or from evolutionary differences between these
two types of protein. We find that both the quantity and the quality
of amino-acid replacements in disordered protein differ from those in ordered protein.
We recently completed an evolutionary study of 28 protein families
with ordered and disordered regions, and found that 20 of the families
have disordered regions that evolve significantly more rapidly than
their ordered regions, and 3 families have disordered regions that
evolve more slowly [Brown, C.J., Takayama, S., Campen, A.M., Vise,
P., Marshall, T., Oldfield, C.J., Williams, C.J., and Dunker, A.K.,
2002]. Differences in amino-acid composition may also affect the
types of amino acid replacements that accumulate in disordered protein.
Matrices that furnish the probability for replacing a given amino
acid by another are generally based on ordered protein sequences.
We are developing scoring matrices using disordered protein families.
We find that scoring matrices based on disordered protein are more
successful in aligning homologous disordered protein sequences than
the commonly used scoring matrices [Radivojac P, Obradovic Z, Brown CJ, and Dunker AK. Proc. 7th Pacific
Symposium on Biocomputing, Hawaii, 2002, pp. 589-600].
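Substitution scoring matrices are conventionally built as log-odds scores, comparing how often two residues are observed aligned with how often they would be paired by chance given their background frequencies. The sketch below illustrates that general construction on toy data. It is not the procedure of the cited paper: there is no sequence weighting, clustering, or pseudocount handling, and the function name is hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Log-odds scores (bits) from (a, b) residue pairs in alignment columns.

    Pairs are treated as unordered. Pairs never observed get no entry,
    since this sketch uses no pseudocounts.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in aligned_pairs)
    total = sum(pair_counts.values())

    # Background frequency of each residue over both positions of all pairs.
    residue_counts = Counter()
    for (a, b), count in pair_counts.items():
        residue_counts[a] += count
        residue_counts[b] += count
    p = {r: c / (2 * total) for r, c in residue_counts.items()}

    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total
        # An unordered mixed pair can arise in two orders, hence the factor 2.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Hypothetical toy alignment columns, not real data.
pairs = [("A", "A")] * 6 + [("A", "S")] * 1 + [("S", "A")] * 1 + [("S", "S")] * 2
scores = log_odds_matrix(pairs)
print(scores)
```

A positive score means the pair is seen aligned more often than chance predicts; a matrix derived from disordered families would therefore reward the replacements that actually accumulate in disordered regions.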
Function of confirmed disordered proteins
We recently completed a survey of the functions associated with disordered
regions in over 100 proteins [Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, and Obradovic Z. Biochemistry, 2002,
May 28th, vol. 41, issue 21, pp. 6573-6582]. Disordered protein
was identified either by missing electron density in X-ray crystal
structure entries in PDB, or by word searches for "NMR" or "circular dichroism" and
"disordered" or "unstructured" or "unfolded" in PubMed. The circular dichroism
papers generally had detailed discussions of the functions of their
disordered protein. NMR papers had somewhat less functional information,
and X-ray crystallography papers had very little functional information
for disordered regions. In order to find as much functional information
as possible for each disordered region, the SwissProt database was
searched, in-depth literature reviews were performed, and corresponding
authors were contacted by email. We found 28 functions performed
by the disordered regions of proteins. These functions can be summarized
into four broad categories: molecular recognition, molecular assembly/disassembly,
protein modification and entropic chains.
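The literature search described above can be approximated by a simple keyword filter over abstracts. The sketch assumes the query is read as (method term) AND (disorder term), which is one interpretation of the wording, and it does not reproduce PubMed's own search syntax or behavior; the names are hypothetical.

```python
METHOD_TERMS = ("nmr", "circular dichroism")
DISORDER_TERMS = ("disordered", "unstructured", "unfolded")


def matches_query(text: str) -> bool:
    """True if text mentions a method term and a disorder term.

    Assumption: case-insensitive substring matching is good enough here.
    """
    lowered = text.lower()
    return (any(term in lowered for term in METHOD_TERMS)
            and any(term in lowered for term in DISORDER_TERMS))


# Hypothetical example sentences, not real abstracts.
print(matches_query("NMR shows that the N-terminal domain is unstructured."))
print(matches_query("Crystal structure reveals a disordered loop."))
```

Hits from such a filter would still need manual review, as the text describes, to extract the functional information for each disordered region.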
Disorder in cell-signaling and cancer
Many disordered regions are involved in binding to DNA, RNA, or other proteins
[Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, and Obradovic Z. Biochemistry, 2002, May 28th, vol. 41, issue 21, pp. 6573-6582].
This observation led to the hypothesis that disorder plays
an important role in the processes of molecular recognition, signaling
and regulation. To test this hypothesis, we applied our predictor
of disorder to a database of signaling proteins involved in the
broadest cascade of macromolecular interactions. Cancer-associated
proteins were also tested, since they are closely linked to
the cell signaling machinery; many are transcription factors overexpressed
as a result of activation during tumorigenesis. We found that there
is significantly more predicted disorder in signaling and cancer-associated
proteins than in several other categories of protein function, such
as metabolism, biosynthesis and degradation [Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, and Dunker AK.
Journal of Molecular Biology, 2002, vol. 323, pp. 573-584].