List of software to detect low complexity regions in proteins

Computational methods can analyze protein sequences to identify low complexity regions (LCRs), which can have particular functional and structural properties.

| Name | Last update | Usage | Description | Open source? | Reference |
|---|---|---|---|---|---|
| SAPS | 1992 | downloadable / web | It describes several protein sequence statistics for the evaluation of distinctive characteristics of residue content and arrangement in primary structures. | yes | [1] |
| SEG | 1993 | downloadable | A two-pass algorithm: it first identifies the LCRs and then performs local optimization, masking the LCRs with Xs. | yes | [2] |
| fLPS | 2017 | downloadable / web | It can readily handle very large protein data sets, such as might come from metagenomics projects. It is useful in searching for proteins with similar compositionally biased regions (CBRs) and for making functional inferences about the CBRs of a protein of interest. | yes | [3] |
| CAST | 2000 | web | It identifies LCRs using dynamic programming. | no | [4] |
| SIMPLE | 2002 | downloadable / web | It facilitates the quantification of the amount of simple sequence in proteins and determines the type of short motifs that show clustering above a certain threshold. | yes | [5] |
| 0j.py | 2001 | on request | A tool for demarcating low complexity protein domains. | no | [6] |
| DSR | 2003 | on request | It calculates complexity using reciprocal complexity. | no | [7] |
| ScanCom | 2003 | on request | It calculates the compositional complexity using the linguistic complexity measure. | no | [8] |
| CARD | 2005 | on request | Based on the complexity analysis of subsequences delimited by pairs of identical, repeating subsequences. | no | [9] |
| BIAS | 2006 | downloadable / web | It uses discrete scan statistics that provide a highly accurate multiple test correction to compute analytical estimates of the significance of each compositionally biased segment. | yes | [10] |
| GBA | 2006 | on request | A graph-based algorithm that constructs a graph of the sequence. | no | [11] |
| SubSeqer | 2008 | web | A graph-based approach for the detection and identification of repetitive elements in low-complexity sequences. | no | [12] |
| ANNIE | 2009 | web | It automates the sequence analysis process. | no | [13] |
| LPS-annotate | 2011 | on request | It defines compositional bias through a thorough search for lowest-probability subsequences (LPSs) and serves as a workbench of tools with which molecular biologists can generate hypotheses and inferences about the proteins they are investigating. | no | [14] |
| LCR-eXXXplorer | 2015 | web | A web platform to search, visualize and share data for low complexity regions in protein sequences. LCR-eXXXplorer offers tools for displaying LCRs from the UniProt/SwissProt knowledgebase, in combination with other relevant protein features, predicted or experimentally verified. Users may also perform queries against a custom-designed sequence/LCR-centric database. | no | [15] |
| XNU | 1993 | downloadable | It uses the PAM120 scoring matrix for the calculation of complexity. | yes | [16] |
| AlcoR | 2022 | downloadable | A compression-based and alignment-free tool for detecting low-complexity regions in biological data. | yes | [17] |

For a comprehensive review of the various methods and tools, see Mier et al.[18]
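
Several of the tools listed above score the compositional complexity of a sequence window and mask the windows that fall below a cutoff. The following Python sketch illustrates that general idea with Shannon entropy over a sliding window. It is a simplified illustration only and does not reproduce SEG or any other listed tool; the function names, the window length of 12 and the 2.2-bit threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace every residue covered by a low-entropy window with 'X'.

    Hypothetical helper: `window` and `threshold` are illustrative values,
    not the parameters of any specific published tool.
    """
    masked = list(seq)
    # Slide a fixed-length window along the sequence, one residue at a time.
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


if __name__ == "__main__":
    flank = "ACDEFGHIKLMN"  # 12 distinct residues, high entropy
    print(mask_low_complexity(flank + "Q" * 12 + flank))
```

Because any window that overlaps the homopolymeric run strongly enough also falls below the threshold, the masked stretch extends a few residues into the flanking sequence. Real tools typically refine the boundaries in a second step.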

In addition, a web meta-server named PLAtform of TOols for LOw COmplexity (PlaToLoCo) has been developed for the visualization and annotation of low complexity regions in proteins.[19] PlaToLoCo collects and integrates the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. Furthermore, the union or intersection of the search results for a query sequence can be obtained.
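
The union and intersection of predictions from several tools can be sketched as set operations on residue positions. The Python example below is a minimal illustration under stated assumptions and is not PlaToLoCo's implementation: the interval convention (1-based, inclusive), the function names and the tool labels are all hypothetical.

```python
Interval = tuple[int, int]  # assumed convention: 1-based, inclusive (start, end)


def to_positions(intervals: list[Interval]) -> set[int]:
    """Expand intervals into the set of residue positions they cover."""
    return {p for start, end in intervals for p in range(start, end + 1)}


def to_intervals(positions: set[int]) -> list[Interval]:
    """Collapse a set of positions back into sorted, merged intervals."""
    result: list[Interval] = []
    for p in sorted(positions):
        if result and p == result[-1][1] + 1:
            result[-1] = (result[-1][0], p)  # extend the current interval
        else:
            result.append((p, p))  # start a new interval
    return result


def combine(predictions: dict[str, list[Interval]], mode: str = "union") -> list[Interval]:
    """Combine per-tool LCR predictions by 'union' or 'intersection'."""
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")
    sets = [to_positions(iv) for iv in predictions.values()]
    if not sets:
        return []
    combined = set.union(*sets) if mode == "union" else set.intersection(*sets)
    return to_intervals(combined)


if __name__ == "__main__":
    # Hypothetical outputs of two tools for the same query sequence.
    predictions = {"tool_a": [(5, 20), (40, 50)], "tool_b": [(15, 30)]}
    print(combine(predictions, "union"))         # [(5, 30), (40, 50)]
    print(combine(predictions, "intersection"))  # [(15, 20)]
```

The union is the permissive choice (a residue is reported if any tool flags it), while the intersection keeps only residues on which all tools agree.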

A neural network web server named LCR-hound has been developed to predict the function of prokaryotic and eukaryotic LCRs based on their amino acid or di-amino acid content.[20]
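
Amino acid and di-amino acid content can be expressed as frequency vectors, which are the kind of input a classifier could consume. The Python sketch below only computes such frequencies; it does not reproduce LCR-hound's features or model, and the function names and the restriction to the 20 standard residues are assumptions made for the example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each standard amino acid in `seq`."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}


def dipeptide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping pair of adjacent residues."""
    valid = set(DIPEPTIDES)
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in valid)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DIPEPTIDES}


if __name__ == "__main__":
    lcr = "QQQA"  # toy example, not a real LCR
    print(amino_acid_frequencies(lcr)["Q"])   # 0.75
    print(len(dipeptide_frequencies(lcr)))    # 400
```

Concatenating the 20 single-residue and 400 pair frequencies gives a fixed-length vector regardless of the length of the region.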

Notes and References

  1. Brendel V, Bucher P, Nourbakhsh IR, Blaisdell BE, Karlin S. Methods and algorithms for statistical analysis of protein sequences. Proc Natl Acad Sci U S A. 15 Mar 1992; 89 (6): 2002–2006. doi:10.1073/pnas.89.6.2002 (free access). PMID 1549558. PMC 48584. Bibcode 1992PNAS...89.2002B.
  2. Wootton JC, Federhen S. Statistics of local complexity in amino acid sequences and sequence databases. Computers and Chemistry. June 1993; 17 (2): 149–163. doi:10.1016/0097-8485(93)85006-X.
  3. Harrison PM. fLPS: Fast discovery of compositional biases for the protein universe. BMC Bioinformatics. 13 Nov 2017; 18 (1): 476. doi:10.1186/s12859-017-1906-3 (free access). PMID 29132292. PMC 5684748.
  4. Promponas VJ, Enright AJ, Tsoka S, Kreil DP, Leroy C, Hamodrakas S, Sander C, Ouzounis CA. CAST: an iterative algorithm for the complexity analysis of sequence tracts. Complexity analysis of sequence tracts. Bioinformatics. Oct 2000; 16 (10): 915–922. doi:10.1093/bioinformatics/16.10.915 (free access). PMID 11120681.
  5. Albà MM, Laskowski RA, Hancock JM. Detecting cryptically simple protein sequences using the SIMPLE algorithm. Bioinformatics. May 2002; 18 (5): 672–678. doi:10.1093/bioinformatics/18.5.672 (free access). PMID 12050063.
  6. Wise MJ. 0j.py: a software tool for low complexity proteins and protein domains. Bioinformatics. 2001; 17 (Suppl 1): S288–S295. doi:10.1093/bioinformatics/17.suppl_1.s288 (free access). PMID 11473020.
  7. Wan H, Li L, Federhen S, Wootton JC. Discovering simple regions in biological sequences associated with scoring schemes. J Comput Biol. 2003; 10 (2): 171–185. doi:10.1089/106652703321825955. PMID 12804090.
  8. Nandi T, Dash D, Ghai R, B-Rao C, Kannan K, Brahmachari SK, Ramakrishnan C, Ramachandran S. A new algorithm for detecting low-complexity regions in protein sequences. J Biomol Struct Dyn. 2003; 20 (5): 657–668. doi:10.1080/07391102.2003.10506882. PMID 12643768. S2CID 45635217.
  9. Shin SW, Kim SM. A novel complexity measure for comparative analysis of protein sequences from complete genomes. Bioinformatics. 15 Jan 2005; 21 (2): 160–170. doi:10.1093/bioinformatics/bth497 (free access). PMID 15333459.
  10. Kuznetsov IB, Hwang S. A novel sensitive method for the detection of user-defined compositional bias in biological sequences. Bioinformatics. 1 May 2006; 22 (9): 1055–1063. doi:10.1093/bioinformatics/btl049 (free access). PMID 16500936.
  11. Li X, Kahveci T. A novel algorithm for identifying low-complexity regions in a protein sequence. Bioinformatics. 15 Dec 2006; 22 (24): 2980–2987. doi:10.1093/bioinformatics/btl495 (free access). PMID 17018537.
  12. He D, Parkinson J. SubSeqer: a graph-based approach for the detection and identification of repetitive elements in low-complexity sequences. Bioinformatics. 1 Apr 2008; 24 (7): 1016–1017. doi:10.1093/bioinformatics/btn073 (free access). PMID 18304932.
  13. Ooi HS, Kwo CY, Wildpaner M, Sirota FL, Eisenhaber B, Maurer-Stroh S, Wong WC, Schleiffer A, Eisenhaber F, Schneider G. ANNIE: integrated de novo protein sequence annotation. Nucleic Acids Res. Jul 2009; 37 (Web server issue): W435–W440. doi:10.1093/nar/gkp254. PMID 19389726. PMC 2703921.
  14. Harbi D, Kumar M, Harrison PM. LPS-annotate: complete annotation of compositionally biased regions in the protein knowledgebase. Database (Oxford). 6 Jan 2011; 2011: baq031. doi:10.1093/database/baq031. PMID 21216786. PMC 3017391.
  15. Kirmitzoglou I, Promponas VJ. LCR-eXXXplorer: a web platform to search, visualize and share data for low complexity regions in protein sequences. Bioinformatics. 1 Jul 2015; 31 (13): 2208–2210. doi:10.1093/bioinformatics/btv115. PMID 25712690. PMC 4481844.
  16. Claverie JM, States D. Information enhancement methods for large scale sequence analysis. Computers and Chemistry. June 1993; 17 (2): 191–201. doi:10.1016/0097-8485(93)85010-a.
  17. Silva JM, Qi W, Pinho AJ, Pratas D. AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data. GigaScience. 28 Dec 2022; 12. doi:10.1093/gigascience/giad101. ISSN 2047-217X. PMID 38091509. PMC 10716826.
  18. Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z. Disentangling the complexity of low complexity proteins. Briefings in Bioinformatics. 23 Mar 2020; 21 (2): 458–472. doi:10.1093/bib/bbz007. ISSN 1467-5463. PMID 30698641. PMC 7299295.
  19. Jarnot P, Ziemska-Legiecka J, Dobson L, Merski M, Mier P, Andrade-Navarro MA, Hancock JM, Dosztányi Z, Paladin L, Necci M, Piovesan D. PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins. Nucleic Acids Research. 2 Jul 2020; 48 (W1): W77–W84. doi:10.1093/nar/gkaa339. ISSN 0305-1048. PMID 32421769. PMC 7319588.
  20. Ntountoumi C, Vlastaridis P, Mossialos D, Stathopoulos C, Iliopoulos I, Promponas V, Oliver SG, Amoutzias GD. Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved. Nucleic Acids Research. 4 Nov 2019; 47 (19): 9998–10009. doi:10.1093/nar/gkz730. ISSN 0305-1048. PMID 31504783. PMC 6821194.