Discriminative models, also referred to as conditional models, are a class of models frequently used for classification. They are typically used to solve binary classification problems, i.e. assigning labels such as pass/fail, win/lose, alive/dead or healthy/sick to existing data points.
Types of discriminative models include logistic regression (LR), conditional random fields (CRFs) and decision trees, among many others. Generative model approaches, which use a joint probability distribution instead, include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, generative adversarial networks and others.
Unlike generative modelling, which studies the joint probability P(x, y), discriminative modelling studies the conditional probability P(y|x), that is, it maps a given input x to a class label y. Within a probabilistic framework, this is done by modelling the conditional probability distribution P(y|x), which can then be used to predict y from x.
A conditional model explicitly models this conditional probability distribution, while the traditional discriminative model aims to learn a direct mapping from the input to the label of the most similar training samples.[1]
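As a minimal worked illustration with hypothetical numbers, the conditional distribution that a discriminative model targets can be read off a joint distribution by normalising over the labels for the observed input:

```latex
% Hypothetical joint distribution over x, y in {0, 1}:
% P(x=1, y=0) = 0.1 and P(x=1, y=1) = 0.3.  Conditioning on x = 1:
P(y=1 \mid x=1)
  = \frac{P(x=1,\, y=1)}{P(x=1,\, y=0) + P(x=1,\, y=1)}
  = \frac{0.3}{0.1 + 0.3}
  = 0.75
```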
The following approach is based on the assumption that we are given a training data set

D = \{(x_i, y_i) \mid i \le N,\ i \in \mathbb{Z}\},

where y_i is the output corresponding to the input x_i.
We intend to use a function f(x) to simulate the behaviour observed in the training data set via the linear classifier method. Using the joint feature vector \phi(x, y), the decision function is defined as

f(x; w) = \arg\max_y w^T \phi(x, y).

The score w^T \phi(x, y), also written c(x, y; w), measures the compatibility of the input x with the potential output y, and the \arg\max then selects the class with the highest score.
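A minimal sketch of this decision rule in Python, assuming a hypothetical joint feature map phi(x, y) that places the input features in the block of positions belonging to class y (the names and the one-hot block construction are illustrative, not taken from the source):

```python
import numpy as np

def phi(x, y, num_classes):
    """Hypothetical joint feature map: copy x into the block for class y."""
    out = np.zeros(num_classes * x.shape[0])
    out[y * x.shape[0]:(y + 1) * x.shape[0]] = x
    return out

def f(x, w, num_classes):
    """Decision function f(x; w) = argmax_y w^T phi(x, y)."""
    scores = [w @ phi(x, y, num_classes) for y in range(num_classes)]
    return int(np.argmax(scores))

# Toy usage: 3 classes, 2 input features, one weight block per class.
w = np.array([0.5, -1.0, 2.0, 0.1, -0.3, 0.7])
print(f(np.array([1.0, 2.0]), w, num_classes=3))  # prints 1
```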
For the logistic regression model, the conditional probability distribution P(y|x; w), where w is a parameter vector estimated from the training data, is defined as

P(y|x; w) = \frac{1}{Z(x; w)} \exp(w^T \phi(x, y)), \qquad Z(x; w) = \sum_y \exp(w^T \phi(x, y)).

The parameters w can be estimated by maximizing the conditional log-likelihood

L(w) = \sum_i \log p(y_i | x_i; w),

or, equivalently, by minimizing the log-loss

l_{\log}(x_i, y_i, c(x_i; w)) = -\log p(y_i | x_i; w) = \log Z(x_i; w) - w^T \phi(x_i, y_i).

Since the log-loss is differentiable and convex in w, a gradient-based method can be used to optimize the model and is guaranteed to reach the global optimum. The gradient of the log-likelihood is

\frac{\partial L(w)}{\partial w} = \sum_i \left( \phi(x_i, y_i) - E_{p(y|x_i; w)}\left[\phi(x_i, y)\right] \right),

where E_{p(y|x_i; w)} denotes the expectation taken with respect to p(y|x_i; w).
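A sketch of these quantities in Python, reusing the hypothetical feature map phi from the previous sketch (again illustrative rather than the source's implementation): the conditional probabilities are a softmax over the class scores, and the gradient of the log-likelihood is the observed feature vector minus its expectation under the model.

```python
import numpy as np

def conditional_prob(x, w, phi, num_classes):
    """P(y|x; w) = exp(w^T phi(x, y)) / Z(x; w) for every class y."""
    s = np.array([w @ phi(x, y, num_classes) for y in range(num_classes)])
    s -= s.max()                      # subtract the max score for numerical stability
    p = np.exp(s)
    return p / p.sum()                # division by the sum implements 1 / Z(x; w)

def log_loss(x, y, w, phi, num_classes):
    """l_log = -log P(y|x; w) = log Z(x; w) - w^T phi(x, y)."""
    return -np.log(conditional_prob(x, w, phi, num_classes)[y])

def grad_log_likelihood(data, w, phi, num_classes):
    """dL/dw = sum_i ( phi(x_i, y_i) - E_{P(y|x_i; w)}[phi(x_i, y)] )."""
    g = np.zeros_like(w)
    for x, y in data:
        p = conditional_prob(x, w, phi, num_classes)
        expected = sum(p[k] * phi(x, k, num_classes) for k in range(num_classes))
        g += phi(x, y, num_classes) - expected
    return g
```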
The above method provides efficient computation when the number of classes is relatively small.
Suppose we are given m class labels and n feature variables

Y = \{y_1, y_2, \ldots, y_m\}, \qquad X = \{x_1, x_2, \ldots, x_n\},

as the training samples. A generative model estimates the joint probability P(x, y), where x is the input and y is the label, and predicts the most probable known label \widetilde{y} \in Y for an unknown input \widetilde{x} using Bayes' theorem.
Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance (in part because they have fewer variables to compute).[4] [5] On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning. Application-specific details ultimately dictate the suitability of selecting a discriminative versus generative model.
Discriminative models and generative models also differ in how they introduce the posterior probability.[6] To attain the least expected loss, the misclassification rate must be minimized. In a discriminative model, the posterior probabilities P(y|x) are inferred directly from a parametric model whose parameters are estimated from the training data. A generative model instead models the class priors and class-conditional likelihoods, and obtains the posterior via Bayes' theorem:

P(y|x) = \frac{p(x|y)\, p(y)}{\sum_i p(x|i)\, p(i)} = \frac{p(x|y)\, p(y)}{p(x)}.
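A minimal Python sketch of this generative route to the posterior, with hypothetical class priors and Gaussian class-conditional likelihoods (illustrative values only):

```python
import numpy as np
from scipy.stats import norm

def posterior(x, priors, likelihoods):
    """P(y|x) = p(x|y) p(y) / sum_i p(x|i) p(i), i.e. Bayes' theorem."""
    joint = np.array([lik(x) * prior for lik, prior in zip(likelihoods, priors)])
    return joint / joint.sum()                    # the denominator is p(x)

# Hypothetical two-class example.
priors = [0.6, 0.4]                                         # p(y=0), p(y=1)
likelihoods = [lambda x: norm.pdf(x, loc=0.0, scale=1.0),   # p(x|y=0)
               lambda x: norm.pdf(x, loc=2.0, scale=1.0)]   # p(x|y=1)
print(posterior(1.0, priors, likelihoods))        # posterior over the two classes
```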
In repeated experiments in which logistic regression and naive Bayes are applied to the same binary classification tasks, the discriminative learner attains a lower asymptotic error, while the generative learner reaches its (higher) asymptotic error faster. However, in Ulusoy and Bishop's joint work, Comparison of Generative and Discriminative Techniques for Object Detection and Classification, they state that this holds only when the model is appropriate for the data (i.e. the data distribution is correctly modeled by the generative model).
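A sketch of such a comparison with scikit-learn (the synthetic data set and settings are illustrative, not those of the cited experiments): both models are trained on increasing amounts of data and their test errors compared.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification task (illustrative only).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for n in (50, 200, 1000, len(X_train)):            # growing training-set sizes
    lr = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    nb = GaussianNB().fit(X_train[:n], y_train[:n])
    print(n,
          round(1 - lr.score(X_test, y_test), 3),  # discriminative test error
          round(1 - nb.score(X_test, y_test), 3))  # generative test error
```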
Significant advantages of using discriminative modeling are:
- It models P(y|x) directly, which is all that classification and regression require.
- It tends to yield lower asymptotic error and has fewer variables to compute, which can translate into higher accuracy.

Compared with the advantages of using generative modeling:
- A generative model can also generate samples from the joint distribution of observed and target variables.
- It is typically more flexible in expressing dependencies in complex learning tasks and can more readily support unsupervised learning.
Since both ways of modeling have advantages and disadvantages, combining the two approaches is often good practice. For example, in Marras' article A Joint Discriminative Generative Model for Deformable Model Construction and Classification,[7] he and his coauthors apply a combination of the two models to face classification and obtain higher accuracy than with the traditional approach.
Similarly, Kelm[8] also proposed combining the two models for pixel classification in his article Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning.
When extracting discriminative features prior to clustering, principal component analysis (PCA), though commonly used, is not a necessarily discriminative approach. In contrast, linear discriminant analysis (LDA) is a discriminative one[9] and offers an efficient way of mitigating the disadvantage noted above: a discriminative model must combine multiple subtasks before classification, and LDA addresses this by reducing the dimensionality of the input.
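A brief scikit-learn sketch of using LDA in this role (the data set and number of components are illustrative): the features are projected onto a low-dimensional discriminative subspace before any downstream classification or clustering step.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project onto at most (number of classes - 1) discriminative directions.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)                            # (150, 2)
```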
Examples of discriminative models include:
- Logistic regression
- Conditional random fields
- Decision trees
- Linear discriminant analysis