PSL
Developer: LINQS Lab
Latest release version: 2.2.2
Programming language: Java
Platform: Linux, macOS, Windows
Genre: Machine learning, statistical relational learning
License: Apache License 2.0
Probabilistic Soft Logic (PSL) is a statistical relational learning (SRL) framework for modeling probabilistic and relational domains. It is applicable to a variety of machine learning problems, such as collective classification, entity resolution, link prediction, and ontology alignment. PSL combines two tools: first-order logic, with its ability to succinctly represent complex phenomena, and probabilistic graphical models, which capture the uncertainty and incompleteness inherent in real-world knowledge. More specifically, PSL uses "soft" logic as its logical component and Markov random fields as its statistical model. PSL provides sophisticated inference techniques for finding the most likely answer (i.e., the maximum a posteriori (MAP) state). The "softening" of the logical formulas makes inference a polynomial-time operation rather than an NP-hard one.
The SRL community has introduced multiple approaches that combine graphical models and first-order logic to allow the development of complex probabilistic models with relational structures. A notable example of such approaches is Markov logic networks (MLNs). Like MLNs, PSL is a modelling language (with an accompanying implementation) for learning and predicting in relational domains. Unlike MLNs, PSL uses soft truth values for predicates, drawn from the interval [0,1]. This allows the underlying inference to be solved quickly as a convex optimization problem. This is useful in problems such as collective classification, link prediction, social network modelling, and object identification/entity resolution/record linkage.
Probabilistic Soft Logic was first released in 2009 by Lise Getoor and Matthias Broecheler. This first version focused heavily on reasoning about similarities between entities. Later versions of PSL would still keep the ability to reason about similarities, but generalize the language to be more expressive.
In 2017, a Journal of Machine Learning Research article detailing PSL and the underlying graphical model was published along with the release of a new major version of PSL (2.0.0). The major new features in PSL 2.0.0 were a new type of rule mainly used in specifying constraints and a command-line interface.
A PSL model is composed of a series of weighted rules and constraints. PSL supports two types of rules: logical and arithmetic.
Logical rules are composed of an implication with only a single atom or a conjunction of atoms in the body and a single atom or a disjunction of atoms in the head. Since PSL uses soft logic, hard logic operators are replaced with Łukasiewicz soft logical operators. An example of a logical rule expression is:
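The following rule, borrowed from the social-network example later in this article, illustrates the form (the weight 10 and the predicates Likes and Knows are specific to that example):

/* Two people with similar interests are more likely to know one another. */
10: Likes(P1, X) & Likes(P2, X) & (P1 != P2) -> Knows(P1, P2) ^2

The body is a conjunction of atoms, the head is a single atom, the leading 10 is the rule weight, and the trailing ^2 selects a squared hinge-loss potential.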
Arithmetic rules are relations of two linear combinations of atoms. Restricting each side to a linear combination ensures that the resulting potential is convex. The following relational operators are supported: =, <=, and >=.
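For example, the symmetry constraint from the social-network example later in this article is an unweighted arithmetic rule; the trailing period marks it as a hard constraint rather than a weighted potential:

/* Knowing one another is symmetric. */
Knows(P1, P2) = Knows(P2, P1) .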
A commonly used feature of arithmetic rules is the summation operation. The summation operation can be used to aggregate multiple atoms. When used, the atom is replaced with the sum of all possible atoms where the non-summation variables are fixed. Summation variables are made by prefixing a variable with a +. For example:
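The following is a minimal sketch of a summation constraint, assuming a hypothetical Label(Item, L) predicate for a classification task:

/* Each item has exactly one label (hypothetical Label predicate). */
Label(Item, +L) = 1 .

The summation variable +L expands the left-hand side into the sum of Label(Item, L) over every label constant L in the data, so each item's label truth values must sum to one.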
A PSL program defines a family of probabilistic graphical models that are parameterized by data. More specifically, the family of graphical models it defines belongs to a special class of Markov random field known as a hinge-loss Markov random field (HL-MRF). An HL-MRF determines a density function over a set of continuous variables y = (y_1, \ldots, y_n) with joint domain [0,1]^n, using a set of evidence x = (x_1, \ldots, x_m), weights w = (w_1, \ldots, w_k), and potential functions \phi = (\phi_1, \ldots, \phi_k), where \phi_i(x, y) = \max(\ell_i(x, y), 0)^{d_i}, \ell_i is a linear function, and d_i \in \{1, 2\}. The conditional distribution of y given the observed data x is defined as

P(y \mid x) = \frac{1}{Z(x)} \exp\left( -\sum_{i=1}^{k} w_i \phi_i(x, y) \right)

where Z(x) = \int_{y} \exp\left( -\sum_{i=1}^{k} w_i \phi_i(x, y) \right) dy is the partition function.
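As a small numerical illustration (the weight and potential below are made up for exposition, not taken from the PSL documentation), consider an HL-MRF over a single variable y_1 with weight w_1 = 5 and potential

\phi_1(x, y) = \max(0.9 - y_1,\ 0)^2 .

Setting y_1 = 0.4 gives \phi_1 = 0.25 and an unnormalized density of \exp(-5 \cdot 0.25) = \exp(-1.25), while any y_1 \geq 0.9 incurs no penalty at all; the density therefore favors assignments that come close to satisfying the soft constraint y_1 \geq 0.9.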
Predicates in PSL can be labeled as open or closed.
When a predicate is labeled closed, PSL makes the closed-world assumption: any atoms of that predicate that are not explicitly provided to PSL are assumed to be false. In other words, the closed-world assumption treats the data provided for that predicate as complete: observed atoms, even partially true ones, are taken as known, and all other atoms are given a truth value of 0. For example, if the data contained the constants {Alice, Bob} for representing people and {Avatar} for representing movies, and the only observation provided for a closed rating(·, ·) predicate were {rating(Alice, Avatar) = 0.8}, then PSL would assume {rating(Bob, Avatar) = 0}.
If a predicate is labeled as open, then PSL does not make the closed-world assumption. Instead, PSL will attempt to collectively infer the unobserved instances.
Data is used to instantiate several potential functions in a process called grounding. The resulting potential functions are then used to define the HL-MRF. Grounding predicates in PSL is the process of making all possible substitutions of the variables in each predicate with the existing constants in the data, resulting in a collection of ground atoms, y = \{y_1, \ldots, y_n\}. Substituting these ground atoms into the rules of the model then produces a collection of ground rules.
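As a sketch of this process, the following template rule (taken from the social-network example later in this article), grounded over two hypothetical people, Alice and Bob, and a single item, Avatar, produces one ground rule per substitution that satisfies the (P1 != P2) filter:

/* Template rule. */
10: Likes(P1, X) & Likes(P2, X) & (P1 != P2) -> Knows(P1, P2) ^2

/* Ground rules for people {Alice, Bob} and the item {Avatar}. */
10: Likes(Alice, Avatar) & Likes(Bob, Avatar) -> Knows(Alice, Bob) ^2
10: Likes(Bob, Avatar) & Likes(Alice, Avatar) -> Knows(Bob, Alice) ^2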
Each of the ground rules is interpreted as either a potential or a hard constraint in the induced HL-MRF. A logical rule is translated as a continuous relaxation of Boolean connectives using Łukasiewicz logic. A ground logical rule is first transformed into its disjunctive normal form. Let I^{+} be the set of indices of the variables corresponding to non-negated atoms in the resulting clause, and I^{-} the set of indices corresponding to negated atoms. Then the logical rule translates to the inequality

1 - \sum_{i \in I^{+}} y_i - \sum_{i \in I^{-}} (1 - y_i) \leq 0 .

If the logical rule is weighted with a weight w and exponentiated with d \in \{1, 2\}, then the potential

\phi(y) = \left( \max\left\{ 1 - \sum_{i \in I^{+}} y_i - \sum_{i \in I^{-}} (1 - y_i),\ 0 \right\} \right)^{d}

is added to the HL-MRF with a weight parameter of w.
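As a worked illustration (using the transitivity rule from the example below, with y_{AB}, y_{BC}, and y_{AC} denoting the truth values of the three ground Knows atoms), consider the ground rule Knows(A, B) \wedge Knows(B, C) \rightarrow Knows(A, C). Its disjunctive normal form is \neg Knows(A, B) \vee \neg Knows(B, C) \vee Knows(A, C), so I^{+} indexes y_{AC} and I^{-} indexes y_{AB} and y_{BC}, giving the potential

\phi(y) = \left( \max\{\ 1 - y_{AC} - (1 - y_{AB}) - (1 - y_{BC}),\ 0 \ \} \right)^{d} = \left( \max\{\ y_{AB} + y_{BC} - y_{AC} - 1,\ 0 \ \} \right)^{d} ,

which is zero whenever the rule is satisfied in the Łukasiewicz sense (for example, y_{AB} = y_{BC} = y_{AC} = 1) and grows as y_{AB} and y_{BC} approach 1 while y_{AC} stays small.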
An arithmetic rule is manipulated into the form \ell(y) \leq 0, and the corresponding potential \phi(y) = (\max\{\ell(y),\ 0\})^{d} is added to the HL-MRF.
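For instance (an illustrative case, not drawn from the article), a weighted ground arithmetic rule y_1 + y_2 \leq 1 is rewritten as \ell(y) = y_1 + y_2 - 1 \leq 0 and contributes the potential

\phi(y) = \left( \max\{\ y_1 + y_2 - 1,\ 0 \ \} \right)^{d} ,

which penalizes an assignment only when the two truth values sum to more than one.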
PSL is available via three different language interfaces: CLI, Java, and Python. PSL's command-line interface (CLI) is the recommended way to use PSL. It supports all the commonly used features in a reproducible form that does not require compilation. Since PSL is written in Java, the PSL Java interface is the most expansive, and users can call directly into the core of PSL. The Java interface is available through the Maven central repository. The PSL Python interface is available through PyPI and uses pandas DataFrames to pass data between PSL and the user.
PSL previously provided a Groovy interface. It was deprecated in the 2.2.1 release of PSL and is scheduled to be removed in the 2.3.0 release.
The LINQS lab, developers of the official PSL implementation, maintain a collection of PSL examples. These examples cover both synthetic and real-world datasets and include examples from academic publications using PSL. Below is a toy example from this repository that can be used to infer relations in a social network. Along with each rule is a comment describing the motivating intuition behind the statements.
/* People who have not lived in the same location are not likely to know one another. */
5: Lived(P1, L1) & Lived(P2, L2) & (P1 != P2) & (L1 != L2) -> !Knows(P1, P2) ^2

/* Two people with similar interests are more likely to know one another. */
10: Likes(P1, X) & Likes(P2, X) & (P1 != P2) -> Knows(P1, P2) ^2

/* People in the same circles tend to know one another (transitivity). */
5: Knows(P1, P2) & Knows(P2, P3) & (P1 != P3) -> Knows(P1, P3) ^2

/* Knowing one another is symmetric. */
Knows(P1, P2) = Knows(P2, P1) .

/* By default, assume that two arbitrary people do not know one another (negative prior). */
5: !Knows(P1, P2) ^2