About Disordered Protein

What are intrinsically disordered proteins?

Intrinsic order and disorder are defined at the amino acid residue level. The backbone atoms, hence also the residues, of ordered regions or proteins undergo small-amplitude, thermally-driven motions about their equilibrium positions determined as time-averaged values. In some cases, ordered regions cooperatively switch between two or more specific conformations. In contrast, intrinsically disordered proteins or regions exist as dynamic ensembles in which the atom positions and the backbone Ramachandran angles vary significantly over time with no specific equilibrium values and typically involve non-cooperative conformational changes. Thus, the existence of disorder is determined by a protein's dynamical properties, and not necessarily by the presence or absence of local secondary structure. We define an intrinsically disordered protein as one that contains at least one disordered region.

Background

Although intrinsically disordered proteins fail to form a fixed 3-D structure under physiological conditions, existing instead as ensembles of conformations, they carry out critically important biological functions (1,2). Several recent reviews (e.g. 3-6) attest to the growing interest in these proteins. In addition, whole-cell NMR experiments demonstrate that intrinsic disorder can exist in vivo (7) and thus does not result merely from the failure to find the correct conditions or ligand for folding to occur. About 100 intrinsically disordered examples, many of which were characterized by two or more experimental methods, were each assigned one of 28 specific functions, and these functions were grouped into four functional classes: (i) molecular recognition; (ii) molecular assembly; (iii) protein modification; and (iv) entropic chain activities (1). Intrinsically disordered regions are typically involved in regulation, signaling and control pathways in which interactions with multiple partners and high-specificity/low-affinity interactions are often requisite. In this way, the functional diversity provided by disordered regions complements that of ordered protein regions.

At the structural level, we proposed that intrinsically disordered regions may exist in molten globule-like (collapsed) and random coil-like (extended) forms (8). Another form of disorder, the pre-molten globule, has been proposed (4), but it is unclear whether it is truly distinct from the random coil-like class. We also suggested that functions of disordered proteins may arise from the specific disorder form, from interconversion of disordered forms, or from transitions between disordered and ordered conformations (8). These function-associated conformational changes may be brought about by alterations in environmental or cellular conditions (e.g. disorder-to-order transition upon binding during signal transduction).

Summary

To understand the commonness, flavors, complexity and function of protein disorder, we assembled a database of known disordered protein sequence segments and used it to develop predictors of protein disorder from primary sequence information. The preliminary results were obtained by analyzing sequences from the Protein Data Bank (PDB), the Swiss Protein (SwissProt) database and 34 complete or nearly complete genomes. In summary, these prior studies provide strong evidence that: (1) disorder is a very common element of protein structure; (2) the strength of disorder prediction is correlated with sequence complexity; and (3) eukaryotes evidently have a much larger fraction of proteins with intrinsic disorder than eubacteria or archaebacteria.

Prediction of disorder from sequence

Since amino acid sequence determines protein 3-D structure, we reasoned that, if disorder were crucial to function, then amino acid sequence would determine lack of 3-D structure, or disorder, as well. To test the hypothesis that disorder is encoded by the sequence, we assembled a dataset of ordered and disordered protein sequence segments and used it to develop several predictors of disorder. Observed prediction accuracies were in the 70-83% range [Romero P, Obradovic Z, Kissinger CR, Villafranca JE, and Dunker AK. Proc. Pacific Symposium on Biocomputing, Hawaii, 1998, vol. 3, pp. 435-446] [Romero P, Obradovic Z, and Dunker AK. Artificial Intelligence Review, 2000, vol. 14, no. 6, S2, pp. 447-484] [Romero P, Obradovic Z, and Dunker AK. Proc. IEEE Int. Conf. on Neural Networks, Houston, TX, 1997, vol. 1, pp. 90-95] [Garner E, Cannon P, Romero P, Obradovic Z, and Dunker AK. Proc. Genome Informatics 1998, Tokyo, Japan, pp. 201-213] [Li X, Romero P, Rani M, Dunker AK, and Obradovic Z. Proc. Genome Informatics 10, Tokyo, Japan, 1999, pp. 30-40]. These accuracies far exceed the 50% expected by chance, demonstrating that disorder is indeed very likely to be encoded by the sequence. Our most accurate predictor [Vucetic S, Radivojac P, Obradovic Z, Brown CJ, and Dunker AK. Proc. 2001 IEEE/INNS International Joint Conference on Neural Networks, Washington D.C., 2001, vol. 4, pp. 2718-2723], with 82.6% overall accuracy (88.8% accuracy on ordered proteins, and 76.5% accuracy on disordered proteins), is an ensemble of neural networks. However, its accuracy differs from that of logistic regression classifiers by less than 1% [Vucetic S, Radivojac P, Obradovic Z, Brown CJ, and Dunker AK. Proc. 2001 IEEE/INNS International Joint Conference on Neural Networks, Washington D.C., 2001, vol. 4, pp. 2718-2723]. Such relatively high accuracies strongly support the hypothesis that disorder is an element of native protein structure that is encoded by the amino acid sequence.
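
The accuracy figures above are consistent with an overall accuracy taken as the unweighted mean of the two per-class accuracies: (88.8 + 76.5) / 2 = 82.65, with a 50% chance level for two classes. The Python sketch below illustrates that reading only; it is an assumption on our part and not a description of how the cited work computed its numbers. The function names and the "O"/"D" label encoding are hypothetical.

```python
def per_class_accuracy(y_true, y_pred, label):
    """Fraction of residues whose true class is `label` that were predicted as `label`."""
    hits = [pred == true for true, pred in zip(y_true, y_pred) if true == label]
    if not hits:
        raise ValueError(f"no residues with true label {label!r}")
    return sum(hits) / len(hits)


def balanced_accuracy(y_true, y_pred, labels=("O", "D")):
    """Unweighted mean of the per-class accuracies (assumed definition of 'overall')."""
    return sum(per_class_accuracy(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example: 'O' = ordered residue, 'D' = disordered residue (hypothetical encoding).
true_labels = "OOOODDDD"
pred_labels = "OOODDDOO"
print(balanced_accuracy(true_labels, pred_labels))  # 0.625
```

Averaging per class rather than per residue keeps a predictor that always answers "ordered" at 50%, however unbalanced the dataset is.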

Understanding the relationship between protein sequence and disordered protein

We have constructed more than 6,000 composition-based and 265 property-based sequence attributes and compared them with respect to their ability to discriminate protein order and disorder [Li X, Obradovic Z, Brown CJ, Garner EC, and Dunker AK. Proc. Genome Informatics 11, Tokyo, Japan, 2000, pp. 172-184] [Williams RM, Obradovic Z, Mathura V, Braun W, Garner EC, Young J, Takayama S, Brown CJ, and Dunker AK. 2000, Proc. 6th Pacific Symposium on Biocomputing, Maui, Hawaii, pp. 89-100]. Our studies [Romero P, Obradovic Z, Kissinger CR, Villafranca JE, and Dunker AK. Proc. IEEE Int. Conf. on Neural Networks, Houston, TX, 1997, vol. 1, pp. 90-95] [Xie Q, Arnold GE, Romero P, Obradovic Z, Garner E, and Dunker AK. Proc. Genome Informatics 1998, Tokyo, Japan, pp. 193-200] suggest that, compared to ordered sequences, disordered sequences tend to have lower aromatic content, higher net charge, higher values for the flexibility indices, and greater values for hydropathy, as well as other identifiable characteristics. Although ordered globular proteins apparently have a lower bound for sequence complexity [Romero P, Obradovic Z, and Dunker AK. FEBS Letters, 1999, vol. 462, pp. 363-367], disorder does not have such a lower bound [Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, and Dunker AK. Proteins: Structure, Function and Genetics, 2001, vol. 42, pp. 38-48]. Overall, the sequence differences observed between ordered and disordered proteins make biochemical sense. That disordered sequences have amino acid compositions expected to lead to disorder adds weight to the view that disorder is indeed encoded by the sequence.
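
To make the idea of a sequence attribute concrete, the Python sketch below computes three simple ones for a segment: aromatic content, net charge per residue, and Shannon entropy as a stand-in for sequence complexity. These definitions are illustrative assumptions (for example, histidine is ignored in the charge, and entropy is computed over the whole segment rather than a sliding window); they are not the attribute set or the complexity measure used in the cited studies.

```python
import math
from collections import Counter

AROMATIC = set("FWY")   # Phe, Trp, Tyr
POSITIVE = set("KR")    # Lys, Arg (His ignored: an assumption of this sketch)
NEGATIVE = set("DE")    # Asp, Glu


def aromatic_fraction(seq):
    """Fraction of residues that are aromatic."""
    return sum(res in AROMATIC for res in seq) / len(seq)


def net_charge_per_residue(seq):
    """(number of K/R minus number of D/E) divided by segment length."""
    positive = sum(res in POSITIVE for res in seq)
    negative = sum(res in NEGATIVE for res in seq)
    return (positive - negative) / len(seq)


def shannon_entropy(seq):
    """Shannon entropy in bits of the residue composition of `seq`."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


# Hypothetical segment, chosen only to exercise the functions.
segment = "KKEESPSPKKEE"
print(aromatic_fraction(segment), net_charge_per_residue(segment), shannon_entropy(segment))
```

A homopolymer has entropy 0 and a segment using all 20 amino acids equally has log2(20), about 4.32 bits, so low values flag low-complexity sequence.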

Estimation of the commonness of protein disorder

Proteins with long disordered regions (>40 amino acids) were occasionally found in protein structures characterized by X-ray diffraction [Romero P, Obradovic Z, Kissinger CR, Villafranca JE, and Dunker AK. Proc. IEEE Int. Conf. on Neural Networks, Houston, TX, 1997, vol. 1, pp. 90-95]. We applied our predictors to sequence and structure databases (SwissProt and PDB, respectively) with the result that disorder appears to be much more common than previously thought. Conservative estimates indicate that at least 25% of the sequences in SwissProt contain long disordered regions [Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, and Dunker AK. Proteins: Structure, Function and Genetics, 2001, vol. 42, pp. 38-48] [Romero P, Obradovic Z, Kissinger CR, Villafranca JE, Guilliot S, Garner E, and Dunker AK. Proc. Pacific Symposium on Biocomputing, Hawaii, 1998, vol. 3, pp. 435-446]. A similar analysis of 34 complete genomes yielded estimates that the percentage of proteins with long disorder in 22 bacteria, 7 archaea, and 5 eukaryotes ranges from 7-33%, 9-37%, and 36-63%, respectively [Dunker AK, Obradovic Z, Romero P, Garner EC, and Brown CJ. Proc. Genome Informatics 11, Tokyo, Japan, 2000, pp. 161-171].
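
The kind of count behind such estimates can be sketched as follows: given a per-residue order/disorder call for each protein, find the longest consecutive disordered run and report the fraction of proteins whose longest run exceeds 40 residues. This is a minimal Python illustration; the "O"/"D" string encoding and the function names are hypothetical, and the cited studies may have post-processed predictions differently.

```python
from itertools import groupby


def longest_disordered_run(labels, disordered="D"):
    """Length of the longest run of consecutive disordered residues."""
    return max(
        (sum(1 for _ in run) for key, run in groupby(labels) if key == disordered),
        default=0,
    )


def has_long_disorder(labels, min_length=41):
    """True if some disordered run is longer than 40 residues (>= 41)."""
    return longest_disordered_run(labels) >= min_length


def fraction_with_long_disorder(proteins):
    """Fraction of per-protein label strings that contain a long disordered run."""
    proteins = list(proteins)
    return sum(has_long_disorder(p) for p in proteins) / len(proteins)


# Toy "proteome" of three per-residue prediction strings.
proteome = [
    "O" * 10 + "D" * 41 + "O" * 5,   # one 41-residue run: counted
    "D" * 40,                        # exactly 40: not counted
    "D" * 20 + "O" + "D" * 20,       # two short runs: not counted
]
print(fraction_with_long_disorder(proteome))
```

Requiring one uninterrupted run, rather than a total of 40 disordered residues, is what keeps the estimate conservative.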

Evolution of disordered protein

Differences in the amino-acid composition of ordered and disordered protein may result in or from evolutionary differences between these two types of protein. We find that both the quantity and quality of amino-acid replacements in disordered protein differ from those in ordered protein. We recently completed an evolutionary study of 28 protein families with ordered and disordered regions, and found that 20 of the families have disordered regions that evolve significantly more rapidly than their ordered regions, and 3 families have disordered regions that evolve more slowly [Brown, C.J., Takayama, S., Campen, A.M., Vise, P., Marshall, T., Oldfield, C.J., Williams, C.J., and Dunker, A.K., 2002]. Differences in amino-acid composition may also affect the types of amino acid replacements that accumulate in disordered protein. Matrices that furnish the probability for replacing a given amino acid by another are generally based on ordered protein sequences. We are developing scoring matrices using disordered protein families. We find that scoring matrices based on disordered protein are more successful in aligning homologous disordered protein sequences than the commonly used scoring matrices [Radivojac P, Obradovic Z, Brown CJ, and Dunker AK. Proc. 7th Pacific Symposium on Biocomputing, Hawaii, 2002, pp. 589-600].
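
For readers unfamiliar with how a substitution scoring matrix is derived from aligned sequences, the Python sketch below shows the generic BLOSUM-style log-odds construction: the observed frequency of each aligned residue pair is compared with the frequency expected from the background composition. This is the textbook recipe, offered as an illustration under that assumption; it is not necessarily the procedure used to build the disorder-based matrices in the cited work, and the function name and input format are hypothetical.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Log-odds score (in bits) for each unordered residue pair seen in `aligned_pairs`.

    `aligned_pairs` is an iterable of (residue, residue) tuples taken from
    aligned columns of homologous sequences.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in aligned_pairs)
    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue: each pair contributes half to each member.
    background = Counter()
    for (a, b), freq in observed.items():
        background[a] += freq / 2
        background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        # Unordered mismatches can arise two ways (a,b) or (b,a), hence the factor 2.
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = math.log2(freq / expected)
    return scores


# Toy data: 6 A-A columns, 2 S-S columns, 2 A-S columns.
pairs = [("A", "A")] * 6 + [("S", "S")] * 2 + [("A", "S")] * 2
print(log_odds_scores(pairs))
```

Pairs that appear more often than composition alone predicts receive positive scores, so a matrix built from disordered families reflects the replacements those families actually tolerate.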

Function of confirmed disordered proteins

We recently completed a survey of functions associated with disordered protein from over 100 proteins [Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, and Obradovic Z. Biochemistry, 2002, May 28th, vol. 41, issue 21, pp. 6573-6582]. Disordered protein was identified either by missing electron density in X-ray crystal structure entries in PDB, or by word searches for "NMR" or "circular dichroism" and "disordered" or "unstructured" or "unfolded" in PubMed. The circular dichroism papers generally had detailed discussions of the functions of their disordered protein. NMR papers had somewhat less functional information, and X-ray crystallography papers had very little functional information for disordered regions. To find as much functional information as possible for each disordered region, the SwissProt database was searched, in-depth literature reviews were performed, and corresponding authors were contacted by email. We found 28 functions performed by the disordered regions of proteins. These functions can be summarized into four broad categories: molecular recognition, molecular assembly/disassembly, protein modification and entropic chains.

Disorder in cell-signaling and cancer

Many disordered regions are involved in binding to DNA, RNA, or other proteins [Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, and Obradovic Z. Biochemistry, 2002, May 28th, vol. 41, issue 21, pp. 6573-6582]. This observation led to the hypothesis that disorder plays an important role in the processes of molecular recognition, signaling and regulation. To test this hypothesis, we applied our predictor of disorder to a database of signaling proteins involved in the broadest cascade of macromolecular interactions. Cancer-associated proteins were also tested, since they are closely interrelated with the cell signaling machinery; many are transcription factors overexpressed as a result of activation during tumorigenesis. We found that there is significantly more predicted disorder in signaling and cancer-associated proteins than in several other categories of protein function, such as metabolism, biosynthesis and degradation [Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, and Dunker AK. Journal of Molecular Biology, 2002, vol. 323, pp. 573-584].


