LLPS often involves sequence regions with distinctive functional characteristics, as well as prion-like and RNA-binding domains. Only a few methods are currently available to predict the propensity of a protein to drive LLPS. The range of biological mechanisms involved, the limited knowledge about these mechanisms and the strong context dependence of LLPS make this prediction problem challenging. The LLPS-specific predictors developed in recent years aim to capture the relationship between protein sequence properties and the capability to drive LLPS. Here we review the state-of-the-art sequence-based LLPS predictors, briefly introducing each one and explaining which protein characteristics it identifies in the context of LLPS.
PSPer[1] | 2019 | PSPer is a method trained to identify prion-like RNA-binding phase-separating proteins (PSPs). It focuses on this particular class of LLPS proteins and provides an overall score for a given protein depending on the presence of the corresponding features. The method is trained on an experimental dataset of FUS-like PSPs, and the biophysical characteristics of each region (prion-like domain (PLD), RNA-binding domain, RNA-recognition motif, disordered regions and additional domains) are implemented in a probabilistic model. Because the training also included a negative dataset of ordered proteins, the method is expected to perform better on disordered proteins that drive LLPS. | |
PLAAC[2] | 2014 | PLAAC uses a hidden Markov model (HMM) to identify regions with prion-like amino acid composition, which are usually enriched in polar residues. The method was developed before the involvement of PLDs in LLPS was recognized, and consequently it is not trained to identify the majority of phase-separating regions. | |
PScore[3] | 2017 | PScore is a statistical scoring algorithm that predicts pi-pi interactions. It compares the pi-pi interactions predicted in the target protein with those of all proteins found in the Protein Data Bank (PDB) to assign a phase-separation propensity score. | |
catGRANULE[4] | 2016 | catGRANULE is a method that was originally trained on yeast proteins but has been shown to be useful for predicting human phase-separating proteins.[5] The algorithm uses sequence composition statistics to differentiate proteins localized in yeast granules from the rest of the yeast proteome. The features used to weight the residues are disorder and nucleic-acid binding propensities, as well as properties of some amino acids. | |
PSPredictor[6] | 2019 | PSPredictor is a machine learning approach to predict proteins that phase separate, trained on a set of experimentally validated protein sequences from the LLPSDB database. | |
PSAP[7] | 2021 | PSAP is a random forest classifier that predicts the probability that a protein mediates phase separation. The classifier is trained on a set of 90 high-confidence human proteins that drive LLPS. | |
FuzDrop[8] | 2020 | FuzDrop is a method to predict droplet-promoting regions and droplet-driving proteins. The algorithm was trained on a dataset of drivers collected from different public databases, and the output is a per-residue probability of droplet formation. | |
ParSe[9] | 2022 | ParSe v2 explores the possibility that protein-mediated phase separation can be predicted from sequence-based calculations of hydrophobicity, α-helix propensity, and a model of the polymer scaling exponent (ν_model). The algorithm was trained on a curated dataset of homotypic phase-separating intrinsically disordered sequences that were experimentally verified to phase separate in vitro. |
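Several of the predictors listed above derive their scores from simple sequence statistics, such as amino acid composition or hydrophobicity. The following is a minimal Python sketch of that general idea only; it does not reproduce any of the published methods. The function names, the choice of the Kyte-Doolittle hydropathy scale, the polar residue subset (Q, N, S, G, Y) and the window size are all illustrative assumptions.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = frozenset("FWY")
# Assumption: an illustrative subset of residues often enriched in
# prion-like regions; not the residue set used by any specific tool.
POLAR_SUBSET = frozenset("QNSGY")


def _clean(seq):
    """Normalize a sequence and reject empty or non-standard input."""
    seq = seq.strip().upper()
    unknown = set(seq).difference(KD_HYDROPATHY)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return seq


def composition_features(seq):
    """Return whole-sequence composition features as a dict."""
    seq = _clean(seq)
    n = len(seq)
    counts = Counter(seq)
    return {
        "length": n,
        "frac_aromatic": sum(counts[a] for a in AROMATIC) / n,
        "frac_polar_subset": sum(counts[a] for a in POLAR_SUBSET) / n,
        "mean_hydropathy": sum(KD_HYDROPATHY[a] for a in seq) / n,
    }


def hydropathy_profile(seq, window=5):
    """Per-residue mean hydropathy over a centered window (truncated at the ends)."""
    seq = _clean(seq)
    half = window // 2
    values = [KD_HYDROPATHY[a] for a in seq]
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


if __name__ == "__main__":
    toy = "GYGYSQNGSYGQ"  # hypothetical toy sequence, not a real protein
    print(composition_features(toy))
    print(hydropathy_profile(toy, window=5))
```

Features like these would typically serve as inputs to a trained model (for example a classifier), whereas a per-residue profile is closer in spirit to tools that report a score along the sequence.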
Another important computational resource in the field of LLPS is the theoretical simulation of proteins, particularly intrinsically disordered proteins (IDPs), driving LLPS. These simulations are complementary to experiments and provide important insights into the molecular mechanisms by which individual proteins drive LLPS. A review by Dignon et al.[10] discussed how these simulations can be applied to interpret experimental results, to explain phase behavior and to provide predictive frameworks for designing proteins with tunable phase transition properties. The challenge is the compromise between the resolution of the model and computational efficiency, since all-atom simulations of large systems involving IDPs are still difficult to perform. Moreover, the molecular interactions among IDPs in the droplet state are still poorly understood, and the combination of experimental data and simulations is indispensable to elucidate them. Improvements in sampling and simulation methods over the next few years may help to shed light on these mechanisms.[11]
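The trade-off between resolution and efficiency can be illustrated with a rough particle count. The sketch below compares a one-bead-per-residue coarse-grained representation with an all-atom one for the same set of chains. The function names are hypothetical, the value of 16 atoms per residue is an assumed round figure, solvent is ignored, and the number of particle pairs is only a naive upper bound on interaction cost, since real simulation codes use cutoffs and neighbor lists.

```python
def pair_count(n):
    """Number of distinct particle pairs among n particles."""
    return n * (n - 1) // 2


def compare_resolutions(n_chains, residues_per_chain, atoms_per_residue=16):
    """Compare particle and pair counts for coarse-grained vs all-atom models.

    atoms_per_residue is an assumed average, not a measured value;
    solvent and ions are not counted.
    """
    beads = n_chains * residues_per_chain   # one bead per residue
    atoms = beads * atoms_per_residue       # protein atoms only
    return {
        "cg_particles": beads,
        "aa_particles": atoms,
        "cg_pairs": pair_count(beads),
        "aa_pairs": pair_count(atoms),
        "pair_ratio": pair_count(atoms) / pair_count(beads),
    }


if __name__ == "__main__":
    # Hypothetical system: 100 chains of 150 residues each.
    for key, value in compare_resolutions(100, 150).items():
        print(f"{key}: {value}")
```

Under these assumptions the particle count grows linearly with the number of atoms per residue, while the naive pair count grows roughly with its square, which is one reason coarse-grained models are attractive for droplet-scale systems.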