DisProt explained
Description: Manually curated database of intrinsically disordered proteins (IDPs) and regions (IDRs)
Scope: Intrinsically disordered proteins
Organism: all
Laboratory: BioComputing UP laboratory (Dept. of Biomedical Sciences, University of Padova)
PMID: 34850135
URL: https://disprot.org/
Download: https://disprot.org/download
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Curation: Manual curation by professional and community biocurators
DisProt is a manually curated biological database of intrinsically disordered proteins (IDPs) and regions (IDRs).[1] [2] [3] DisProt annotations cover the structural state of a protein and, when available, its state transitions, interactions and functional aspects of disorder, as detected by specific experimental methods. DisProt is hosted and maintained by the BioComputing UP laboratory (Dept. of Biomedical Sciences, University of Padua).
Website
The latest DisProt version, DisProt 9,[4] includes more than 2300 protein entries and more than 4500 pieces of evidence of structural state, state transitions, interactions and functions, annotated from more than 2500 scientific publications.
Biocuration in DisProt
DisProt entries are annotated by professional and community biocurators from experimental data published in the scientific literature. The DisProt home page features example entries, such as p53 and Catenin beta-1, along with entries for proteins of the SARS-CoV-2 virus, e.g. Nucleoprotein.
Thematic datasets
Since 2020, DisProt has released ‘thematic datasets’ describing biological areas in which IDPs play a crucial role.[5] All the entries belonging to these datasets are tagged with the name of the theme.
- Unicellular toxins and antitoxins (DisProt release 2020_12)
- Extracellular matrix proteins (DisProt release 2021_06)
- Viral proteins (DisProt release 2021_12)
Model organism entries
On the DisProt home page, each model organism is represented by an icon, the name of the species and the number of DisProt entries belonging to that organism. Entries from the following organisms are accessible from the DisProt home page under the ‘Organisms’ section and can be downloaded as single files: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, Escherichia coli, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans.
DisProt versions and releases
DisProt versions and releases include changes to the website and to the manually curated content of the database.
- DisProt 7[6] (2016): more than 800 protein entries and 1000 publications annotated. DisProt 7 featured a new web interface based on AngularJS. Each protein entry in DisProt is characterized by a DisProt identifier, which takes the form of the prefix DP followed by a five-digit protein identifier, e.g. DP00016 corresponds to the Cyclin-dependent kinase inhibitor 1 protein.
- DisProt 8[7] (2019): more than 1400 protein entries and over 3000 disordered protein regions. DisProt 8 also introduced the concept of a stable DisProt region identifier. DisProt has been widely used to train machine learning (ML) methods to predict disordered regions in proteins. In addition, DisProt has been used to understand the properties of intrinsically unstructured proteins.[8] DisProt 8 featured a new web interface, an extended API, and a new annotation interface integrating text mining technologies.
- DisProt 9[9] (2021): more than 2300 protein entries and more than 4500 pieces of evidence, annotated from over 2500 scientific articles. DisProt 9 features a restyled web interface and a refactored Intrinsically Disordered Proteins Ontology (IDPO). Better interoperability is provided through the adoption of the Gene Ontology (annotations of interactions and functions of IDPs and IDRs) and the Evidence and Conclusion Ontology (annotations of experimental methods).
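The protein identifier format described under DisProt 7 above (the prefix DP followed by five digits) can be checked with a short pattern match. The following is a minimal Python sketch; the function name `is_disprot_id` is hypothetical and not part of any DisProt tooling, and the pattern is inferred only from the description above. It checks the format of a string, not whether an entry with that identifier actually exists.

```python
import re

# Format check only: "DP" followed by exactly five ASCII digits, e.g. DP00016.
# The pattern is an assumption derived from the prose description,
# not taken from a DisProt specification.
_DISPROT_ID = re.compile(r"DP[0-9]{5}")


def is_disprot_id(text: str) -> bool:
    """Return True if `text` has the shape of a DisProt protein identifier."""
    return _DISPROT_ID.fullmatch(text) is not None
```

Using `fullmatch` rather than `match` rejects strings that merely begin with a valid identifier, such as an identifier with extra trailing characters.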
DisProt ontologies
DisProt uses three different ontologies to annotate disordered regions: the Intrinsically Disordered Proteins Ontology (IDPO), the Evidence and Conclusion Ontology (ECO) and the Gene Ontology (GO). DisProt has a dedicated page for each IDPO term that includes the identifier, name and definition of the term and cross-references to external ontologies, e.g. Gene Ontology. Each IDPO term page lists all the DisProt entries annotated with that specific term.
- Intrinsically Disordered Proteins Ontology: used to annotate the following types of evidence, 1. structural state (i.e. disorder, pre-molten globule, molten globule, order), 2. structural transition (transitions between structural states), and 3. self-functions (e.g. self-inhibition) and functions associated with the unstructured state of the protein (e.g. flexible linker/spacer)
- Evidence and Conclusion Ontology: used to annotate the experimental methods used to assess the presence of intrinsic disorder or one of its aspects, e.g. circular dichroism evidence.
- Gene Ontology: used to annotate binding partners, e.g. protein binding, and other functions, e.g. RNA folding chaperone.
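The division of labour between the three ontologies can be illustrated with a small data model. This is only a sketch under stated assumptions: the `TermAnnotation` record, its fields and the `group_by_ontology` helper are hypothetical and do not reflect DisProt's actual schema or API. The term names are taken from the examples above, while the entry identifier and residue coordinates are placeholders invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Ontology(Enum):
    IDPO = "Intrinsically Disordered Proteins Ontology"
    ECO = "Evidence and Conclusion Ontology"
    GO = "Gene Ontology"


@dataclass(frozen=True)
class TermAnnotation:
    """Hypothetical record: one ontology term attached to a protein region."""
    entry_id: str       # placeholder identifier in the DP + five digits format
    start: int          # first residue of the region (illustrative)
    end: int            # last residue of the region (illustrative)
    ontology: Ontology
    term_name: str


def group_by_ontology(annotations):
    """Map each ontology to the term names drawn from it, in input order."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.ontology].append(ann.term_name)
    return dict(grouped)


# Placeholder entry and coordinates; only the term names come from the text.
example = [
    TermAnnotation("DP00000", 1, 50, Ontology.IDPO, "disorder"),
    TermAnnotation("DP00000", 1, 50, Ontology.ECO, "circular dichroism evidence"),
    TermAnnotation("DP00000", 1, 50, Ontology.GO, "protein binding"),
]
grouped = group_by_ontology(example)
```

In this sketch a single region carries one term from each ontology: IDPO for the structural state, ECO for the experimental method and GO for the binding function.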
Notes and References
- Vucetic, Slobodan; Obradovic, Zoran; Vacic, Vladimir; Radivojac, Predrag; Peng, Kang; Iakoucheva, Lilia M.; Cortese, Marc S.; Lawson, J. David; Brown, Celeste J. (2005-01-01). "DisProt: a database of protein disorder". Bioinformatics. 21 (1): 137–140. doi:10.1093/bioinformatics/bth476. ISSN 1367-4803. PMID 15310560.
- Sickmeier, Megan; Hamilton, Justin A.; LeGall, Tanguy; Vacic, Vladimir; Cortese, Marc S.; Tantos, Agnes; Szabo, Beata; Tompa, Peter; Chen, Jake (2007-01-01). "DisProt: the Database of Disordered Proteins". Nucleic Acids Research. 35 (Database issue): D786–793. doi:10.1093/nar/gkl893. ISSN 1362-4962. PMC 1751543. PMID 17145717.
- Quaglia, Federica; Mészáros, Bálint; Salladini, Edoardo; Hatos, András; Pancsa, Rita; Chemes, Lucía B.; Pajkos, Mátyás; Lazar, Tamas; Peña-Díaz, Samuel; Santos, Jaime; Ács, Veronika (2021-11-25). "DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation". Nucleic Acids Research. 50 (D1): D480–D487. doi:10.1093/nar/gkab1082. ISSN 1362-4962. PMID 34850135. PMC 8728214.
- Quaglia, Federica; Mészáros, Bálint; Salladini, Edoardo; Hatos, András; Pancsa, Rita; Chemes, Lucía B.; Pajkos, Mátyás; Lazar, Tamas; Peña-Díaz, Samuel; Santos, Jaime; Ács, Veronika (2021-11-25). "DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation". Nucleic Acids Research. 50 (D1): D480–D487. doi:10.1093/nar/gkab1082. ISSN 1362-4962. PMID 34850135. PMC 8728214.
- Quaglia, Federica; Mészáros, Bálint; Salladini, Edoardo; Hatos, András; Pancsa, Rita; Chemes, Lucía B.; Pajkos, Mátyás; Lazar, Tamas; Peña-Díaz, Samuel; Santos, Jaime; Ács, Veronika (2021-11-25). "DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation". Nucleic Acids Research. 50 (D1): D480–D487. doi:10.1093/nar/gkab1082. ISSN 1362-4962. PMID 34850135. PMC 8728214.
- Piovesan, Damiano; Tabaro, Francesco; Mičetić, Ivan; Necci, Marco; Quaglia, Federica; Oldfield, Christopher J.; Aspromonte, Maria Cristina; Davey, Norman E.; Davidović, Radoslav (2016-11-28). "DisProt 7.0: a major update of the database of disordered proteins". Nucleic Acids Research. 45 (D1): D219–D227. doi:10.1093/nar/gkw1056. ISSN 1362-4962. PMC 5210544. PMID 27899601.
- Hatos, András; Hajdu-Soltész, Borbála; Monzon, Alexander M.; Palopoli, Nicolas; Álvarez, Lucía; Aykac-Fas, Burcu; Bassot, Claudio; Benítez, Guillermo I.; Bevilacqua, Martina; Chasapi, Anastasia; Chemes, Lucia (2019). "DisProt: intrinsic protein disorder annotation in 2020". Nucleic Acids Research. 48 (D1): D269–D276. doi:10.1093/nar/gkz975. PMID 31713636. PMC 7145575.
- Kovačević JJ (June 2012). "Computational analysis of position-dependent disorder content in DisProt database". Genomics Proteomics Bioinformatics. 10 (3): 158–65. doi:10.1016/j.gpb.2012.01.002. PMC 5056116. PMID 22917189.
- Quaglia, Federica; Mészáros, Bálint; Salladini, Edoardo; Hatos, András; Pancsa, Rita; Chemes, Lucía B.; Pajkos, Mátyás; Lazar, Tamas; Peña-Díaz, Samuel; Santos, Jaime; Ács, Veronika (2021-11-25). "DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation". Nucleic Acids Research. 50 (D1): D480–D487. doi:10.1093/nar/gkab1082. ISSN 1362-4962. PMID 34850135. PMC 8728214.