In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter (not necessarily a mean) can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses, each coming from a particular statistical distribution with possibly different parameter values.
Vector generalized linear models are described in detail in Yee (2015).[1] The central algorithm adopted is the iteratively reweighted least squares (IRLS) method, for maximum likelihood estimation of usually all the model parameters. In particular, Fisher scoring is used, which, for most models, requires the first and expected second derivatives of the log-likelihood function.
GLMs essentially cover one-parameter models from the classical exponential family, and include three of the most important statistical regression models: the linear model, Poisson regression for counts, and logistic regression for binary responses. However, the exponential family is far too limiting for regular data analysis. For example, for counts, zero-inflation, zero-truncation and overdispersion are regularly encountered, and the makeshift adaptations made to the binomial and Poisson models in the form of quasi-binomial and quasi-Poisson can be argued as being ad hoc and unsatisfactory. In contrast, the VGLM framework readily handles models such as zero-inflated Poisson regression, zero-altered Poisson (hurdle) regression, positive-Poisson regression, and negative binomial regression. As another example, in the linear model the variance of a normal distribution is relegated to a scale parameter and is often treated as a nuisance parameter (if it is considered a parameter at all). But the VGLM framework allows the variance to be modelled using covariates.
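As an illustrative sketch (not from the original article), the following R code fits two such models with the VGAM package, assuming a hypothetical data frame mydata with a count response y, a continuous response z, and a covariate x2; zipoisson and uninormal are VGAM family functions for zero-inflated Poisson regression and for a normal model whose standard deviation may also be modelled with covariates.

    library(VGAM)

    # Zero-inflated Poisson: the zero-inflation probability and the
    # Poisson mean each get their own linear predictor (M = 2).
    fit1 <- vglm(y ~ x2, family = zipoisson, data = mydata)

    # Normal model in which the standard deviation is also modelled
    # with covariates (zero = NULL frees it from being intercept-only).
    fit2 <- vglm(z ~ x2, family = uninormal(zero = NULL), data = mydata)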
As a whole, one can loosely think of VGLMs as GLMs that handle many models outside the classical exponential family and that are not restricted to estimating a single mean. During estimation, rather than using weighted least squares within IRLS, one uses generalized least squares to handle the correlation between the M linear predictors.
We suppose that the response or outcome or the dependent variable(s), \boldsymbol{y} = (y_1, \ldots, y_{Q_1})^T, are assumed to come from a particular distribution. Most distributions are univariate, so that Q_1 = 1, and an example of Q_1 = 2 is the bivariate normal distribution.
Sometimes we write our data as (\boldsymbol{x}_i, w_i, \boldsymbol{y}_i) for i = 1, \ldots, n, where \boldsymbol{y}_i = (y_{i1}, \ldots, y_{iQ_1})^T and the w_i are known positive prior weights, often simply w_i = 1.
The explanatory or independent variables are written \boldsymbol{x} = (x_1, \ldots, x_p)^T, or, when the index i is needed, as \boldsymbol{x}_i = (x_{i1}, \ldots, x_{ip})^T. Usually there is an intercept, in which case x_1 = 1 or x_{i1} = 1.
Actually, the VGLM framework allows for S responses, each of dimension Q_1, so that \boldsymbol{y}_i has dimension Q = S × Q_1. Multiple responses are handled by code such as

vglm(cbind(y1, y2, y3) ~ x2 + x3, ..., data = mydata)

for S = 3. To simplify things, most of this article has S = 1.

The VGLM usually consists of four elements:
1. A probability density function or probability mass function from some statistical distribution, with a log-likelihood \ell and first derivatives \partial\ell/\partial\theta_j that can be computed.
2. Linear predictors \eta_j used to model each parameter \theta_j, for j = 1, \ldots, M.
3. Link functions g_j such that \theta_j = g_j^{-1}(\eta_j).
4. Constraint matrices \boldsymbol{H}_k, for k = 1, \ldots, p, each of full column rank and known.
Each linear predictor is a quantity which incorporates information about the independent variables into the model. The symbol \eta_j (the Greek "eta") denotes the jth linear predictor. It is a linear combination involving the regression coefficients \boldsymbol{\beta}_j, whose elements are written \beta_{(j)k}. The jth parameter, \theta_j, of the distribution depends on the independent variables \boldsymbol{x} through

g_j(\theta_j) = \eta_j = \boldsymbol{\beta}_j^T \boldsymbol{x}.
Let \boldsymbol{\eta} = (\eta_1, \ldots, \eta_M)^T be the vector of all the linear predictors. Thus all the covariates in \boldsymbol{x} potentially affect all the parameters through the linear predictors \eta_j. Later, the linear predictors will be generalized to additive predictors, which are sums of smooth functions of each x_k.
Each link function provides the relationship between a linear predictor and a parameter of the distribution. There are many commonly used link functions, and their choice can be somewhat arbitrary. It makes sense to try to match the domain of the link function to the range of the distribution's parameter value. Notice above that the g_j allows a different link function for each parameter. For example, the logit link suits a parameter lying in (0,1), the log link suits a positive parameter, and the VGAM package has the function identitylink for parameters that can assume both positive and negative values. More generally, the VGLM framework allows for any linear constraints between the regression coefficients \beta_{(j)k} of each linear predictor:
\boldsymbol{\eta} = \sum_{k=1}^{p} \boldsymbol{\beta}_{(k)} x_k = \sum_{k=1}^{p} \boldsymbol{H}_k \boldsymbol{\beta}^{*}_{(k)} x_k,
where the \boldsymbol{H}_k are known constraint matrices of full column rank, and \boldsymbol{\beta}^{*}_{(k)} is a possibly reduced set of regression coefficients. For example, a parallelism assumption, as in the proportional odds model, corresponds to \boldsymbol{H}_k = \boldsymbol{1}_M for k = 2, \ldots, p, with \boldsymbol{H}_1 = \boldsymbol{I}_M for k = 1 (the intercepts). Trivial constraints, \boldsymbol{H}_k = \boldsymbol{I}_M for k = 1, \ldots, p, mean that every parameter \theta_j is modelled using all the covariates. And a parameter \theta_j that is intercept-only corresponds to the jth row of \boldsymbol{H}_k being \boldsymbol{0}^T for k = 2, \ldots, p, so that \eta_j = \beta^{*}_{(j)1}.
The unknown parameters, \boldsymbol{\beta}^{*} = (\boldsymbol{\beta}^{*T}_{(1)}, \ldots, \boldsymbol{\beta}^{*T}_{(p)})^T, are typically estimated by maximum likelihood. All the regression coefficients may be collected into a matrix \boldsymbol{B}, so that

\boldsymbol{\eta}_i = \boldsymbol{B}^T \boldsymbol{x}_i = \begin{pmatrix} \boldsymbol{\beta}_1^T \boldsymbol{x}_i \\ \vdots \\ \boldsymbol{\beta}_M^T \boldsymbol{x}_i \end{pmatrix} = \left( \boldsymbol{\beta}_{(1)}, \ldots, \boldsymbol{\beta}_{(p)} \right) \boldsymbol{x}_i.
Even more generally, one can allow the value of a variable x_k to take a different value for each \eta_j, written x_{ij}. The xij facility of VGAM allows one to generalize \eta_j(\boldsymbol{x}_i) to \eta_j(\boldsymbol{x}_{ij}). The most general formula is

\boldsymbol{\eta}_i = \boldsymbol{o}_i + \sum_{k=1}^{p} \mathrm{diag}(x_{ik1}, \ldots, x_{ikM}) \, \boldsymbol{H}_k \boldsymbol{\beta}^{*}_{(k)}.
Here \boldsymbol{o}_i is an optional offset (collected over all i, an n × M matrix in practice), and the VGAM package has an xij argument that allows the successive elements of the diagonal matrix to be inputted.

Yee (2015)[1] describes an R package implementation called VGAM.[2] Currently this software fits approximately 150 models/distributions. The central modelling functions are vglm and vgam. The family argument is assigned a VGAM family function, e.g., family = negbinomial for negative binomial regression, family = poissonff for Poisson regression, family = propodds for the proportional odds model or cumulative logit model for ordinal categorical regression.
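For instance, a minimal sketch of such a call (with a hypothetical data frame and variable names) might be as follows; coef(fit, matrix = TRUE) returns the fitted B matrix, with one column per \eta_j:

    library(VGAM)
    fit <- vglm(y ~ x2 + x3, family = negbinomial, data = mydata)
    summary(fit)
    coef(fit, matrix = TRUE)   # the fitted B matrix, one column per eta_j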
We are maximizing a log-likelihood

\ell = \sum_{i=1}^{n} w_i \, \ell_i,

where the w_i are known positive prior weights. The maximum likelihood estimates are obtained by Fisher scoring iterations of the form

\boldsymbol{\beta}^{(a+1)} = \boldsymbol{\beta}^{(a)} + \mathcal{I}^{-1}(\boldsymbol{\beta}^{(a)}) \, \boldsymbol{u}(\boldsymbol{\beta}^{(a)}),

where \mathcal{I}(\boldsymbol{\beta}^{(a)}) is the expected (Fisher) information matrix and \boldsymbol{u}(\boldsymbol{\beta}^{(a)}) is the score vector, both evaluated at the ath iterate.
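To make the update concrete, here is a minimal sketch of Fisher scoring for the simplest case, a Poisson GLM with log link (M = 1, all w_i = 1, hypothetical inputs X and y); for this canonical link the update coincides with Newton-Raphson and can be written as an iteratively reweighted least squares step:

    # Fisher scoring for a Poisson GLM with log link, written as IRLS.
    # X is an n x p model matrix and y a vector of counts (hypothetical).
    irls_poisson <- function(X, y, iters = 25) {
      beta <- rep(0, ncol(X))
      for (a in seq_len(iters)) {
        eta <- drop(X %*% beta)
        mu  <- exp(eta)              # inverse of the log link
        W   <- mu                    # working weights (the EIM for this model)
        z   <- eta + (y - mu) / mu   # working response z = eta + W^{-1} u
        beta <- solve(t(X) %*% (W * X), t(X) %*% (W * z))
      }
      drop(beta)
    }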
For the computation, the (small) model matrix constructed from the RHS of the formula in vglm and the constraint matrices are combined to form a big model matrix. The IRLS is applied to this big X. This matrix is known as the VLM matrix, since the vector linear model is the underlying least squares problem being solved. A VLM is a weighted multivariate regression where the variance-covariance matrix of each row of the response matrix is not necessarily the same and is known. (In classical multivariate regression, all the errors have the same variance-covariance matrix, and it is unknown.) In particular, the VLM minimizes the weighted sum of squares
\mathrm{ResSS} = \sum_{i=1}^{n} w_i \left\{ \boldsymbol{z}_i^{(a-1)} - \boldsymbol{\eta}_i^{(a-1)} \right\}^T \boldsymbol{W}_i^{(a-1)} \left\{ \boldsymbol{z}_i^{(a-1)} - \boldsymbol{\eta}_i^{(a-1)} \right\}.
This quantity is minimized at each IRLS iteration. The working responses (also known as pseudo-responses or adjusted dependent vectors) are

\boldsymbol{z}_i = \boldsymbol{\eta}_i + \boldsymbol{W}_i^{-1} \boldsymbol{u}_i,

where the \boldsymbol{W}_i are the working weight matrices.
Computationally, the Cholesky decomposition is used to invert the working weight matrices and to convert the overall generalized least squares problem into an ordinary least squares problem.
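As a sketch of this reduction for a single observation (Wi, Xi and zi are hypothetical working-weight, model-matrix and working-response objects):

    # Converting one weighted block to ordinary least squares via Cholesky.
    U  <- chol(Wi)        # Wi = t(U) %*% U, with U upper-triangular
    Xt <- U %*% Xi        # transformed block of the VLM model matrix
    zt <- U %*% zi        # transformed working response
    # Stacking all n such blocks yields one ordinary least squares problem.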
Of course, all generalized linear models are special cases of VGLMs. But in the VGLM framework one often estimates all parameters by full maximum likelihood estimation rather than using the method of moments for the scale parameter.
If the response variable is an ordinal measurement with M + 1 levels, then one may fit a model of the form

g(\theta_j) = \eta_j, where \theta_j = \Pr(Y \leq j),

for j = 1, \ldots, M. For example, the VGAM family function cumulative(link = probit) assigns a probit link to the cumulative probabilities, so this model is also called the cumulative probit model. In general such models are called cumulative link models.

For categorical and multinomial distributions, the fitted values are an (M + 1)-vector of probabilities, with the property that all probabilities add up to 1. Each probability indicates the likelihood of occurrence of one of the M + 1 possible values.
If the response variable is a nominal measurement, or the data do not satisfy the assumptions of an ordered model, then one may fit a model of the form

\log\left[ \frac{\Pr(Y = j)}{\Pr(Y = M + 1)} \right] = \eta_j,

for j = 1, \ldots, M. The VGAM family function multinomial fits the above model, and it has an argument called refLevel that can be assigned the level to be used as the reference group.
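A brief sketch (hypothetical factor response y with M + 1 levels):

    library(VGAM)
    # Multinomial logit model; refLevel selects the reference group.
    fit <- vglm(y ~ x2 + x3, family = multinomial(refLevel = 1), data = mydata)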
Classical GLM theory performs Poisson regression for count data. The link is typically the logarithm, which is known as the canonical link, and the variance function is proportional to the mean:

\operatorname{Var}(Y_i) = \tau \mu_i,

where the dispersion parameter \tau is typically fixed at exactly one. When it is not, the resulting quasi-Poisson model handles overdispersion only in an ad hoc way, with \tau usually estimated by the method of moments.
In contrast, VGLMs offer a much richer set of models to handle overdispersion with respect to the Poisson, e.g., the negative binomial distribution and several variants thereof. Another count regression model is the generalized Poisson distribution. Other possible models are the zeta distribution and the Zipf distribution.
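For example, one hedged way to compare a Poisson fit against a negative binomial fit on the same (hypothetical) data:

    library(VGAM)
    fit.pois <- vglm(y ~ x2, family = poissonff,   data = mydata)
    fit.nb   <- vglm(y ~ x2, family = negbinomial, data = mydata)
    c(AIC(fit.pois), AIC(fit.nb))   # the lower value suggests the better fit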
RR-VGLMs are VGLMs where a subset of the B matrix is of a lower rank. Without loss of generality, suppose that \boldsymbol{x} = (\boldsymbol{x}_1^T, \boldsymbol{x}_2^T)^T is a partition of the covariate vector. Then the part of the B matrix corresponding to \boldsymbol{x}_2 is of the form \boldsymbol{A}\boldsymbol{C}^T, where \boldsymbol{A} and \boldsymbol{C} are thin matrices with R columns each; this can greatly reduce the number of regression coefficients compared with the M × p coefficients of a full VGLM. The R linear combinations \boldsymbol{\nu} = \boldsymbol{C}^T \boldsymbol{x}_2 = (\nu_1, \ldots, \nu_R)^T are called latent variables; for rank 1, \nu = \boldsymbol{c}^T \boldsymbol{x}_2 is a scalar. Thus an RR-VGLM replaces \boldsymbol{x}_2 by a small number of latent variables, and is effectively a VGLM fitted to (\boldsymbol{x}_1, \boldsymbol{\nu}).
It can be shown that RR-VGLMs are simply VGLMs where the constraint matrices for the variables in \boldsymbol{x}_2 are unknown and to be estimated, namely \boldsymbol{H}_k = \boldsymbol{A}. Given \boldsymbol{A}, the coefficients \boldsymbol{C} can be estimated, and given \boldsymbol{C}, the matrix \boldsymbol{A} can be estimated, so an alternating algorithm is natural.
In practice, some uniqueness constraints are needed for \boldsymbol{A} and/or \boldsymbol{C}. In VGAM, the rrvglm function uses corner constraints by default, which means that the top R rows of \boldsymbol{A} are set to \boldsymbol{I}_R.
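A hedged sketch of such a fit (hypothetical count response and covariates; the Rank argument sets R):

    library(VGAM)
    # Rank-1 reduced-rank VGLM; zero = NULL lets both negative binomial
    # parameters depend on covariates, so there is something to reduce.
    fit <- rrvglm(y ~ x2 + x3 + x4, family = negbinomial(zero = NULL),
                  data = mydata, Rank = 1)
    Coef(fit)   # includes the estimated A and C (corner constraints on A)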
A special case of RR-VGLMs is when R = 1 and M = 2. This is dimension reduction from 2 parameters to 1 parameter. Then it can be shown that

\theta_2 = g_2^{-1}\left( t_1 + a_{21} \, g_1(\theta_1) \right),

where the elements t_1 and a_{21} are estimated. Equivalently,

\eta_2 = t_1 + a_{21} \, \eta_1.

This formula provides a coupling of \eta_1 and \eta_2. For the negative binomial distribution it induces the variance function

\operatorname{Var}(Y \mid \boldsymbol{x}) = \mu(\boldsymbol{x}) + \delta_1 \, \mu(\boldsymbol{x})^{\delta_2}.
This has been called the NB-P variant by some authors. The parameters \delta_1 and \delta_2 are estimated. Incidentally, several other useful NB variants can also be fitted, by selecting the right combination of constraint matrices, for example NB-1, NB-2 (the negbinomial default), and NB-H; see Yee (2014)[4] and Table 11.3 of Yee (2015).[1]
The subclass of row-column interaction models (RCIMs) has also been proposed; these are a special type of RR-VGLM. RCIMs apply only to a matrix response Y, and there are no explicit explanatory variables \boldsymbol{x}; instead, indicator variables for the rows and columns are set up, and a rank-R interaction of the form \boldsymbol{A}\boldsymbol{C}^T is allowed. Special cases include the Goodman RC association model and the quasi-variances methodology as implemented by the qvcalc R package. RCIMs can be defined as an RR-VGLM applied to Y with

g_1(\theta_1) \equiv \eta_{1ij} = \beta_0 + \alpha_i + \gamma_j + \sum_{r=1}^{R} c_{ir} a_{jr}.
For the Goodman RC association model, we have \eta_{1ij} = \log \mu_{ij}, i.e., a log link applied to the mean of each cell count. Another example of an RCIM is when g_1 is the identity link and the response has an asymmetric Laplace distribution; then a simple RCIM is similar to the median polish technique.
In VGAM, the rcim and grc functions fit the above models. Yee and Hadi (2014)[5] also show that RCIMs can be used to fit unconstrained quadratic ordination models to species data; this is an example of indirect gradient analysis in ordination (a topic in statistical ecology).
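A brief hedged sketch (Ymat a hypothetical matrix of counts):

    library(VGAM)
    # Goodman RC(1) association model: row and column main effects plus
    # a rank-1 interaction A C^T, fitted to a matrix of counts.
    fit <- grc(Ymat, Rank = 1)
    summary(fit)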
Vector generalized additive models (VGAMs) are a major extension to VGLMs in which the linear predictor \eta_j is not restricted to be linear in the covariates x_k but is a sum of smooth functions of the x_k:

\boldsymbol{\eta}(\boldsymbol{x}) = \boldsymbol{H}_1 \boldsymbol{\beta}^{*}_{(1)} + \boldsymbol{H}_2 \boldsymbol{f}^{*}_{(2)}(x_2) + \boldsymbol{H}_3 \boldsymbol{f}^{*}_{(3)}(x_3) + \cdots

where \boldsymbol{f}^{*}_{(k)}(x_k) = (f^{*}_{(1)k}(x_k), f^{*}_{(2)k}(x_k), \ldots)^T. Each component function f^{*}_{(j)k} is estimated by a smoother; smoothing splines are currently implemented in the VGAM package. For M > 1 they are actually vector splines, which estimate the component functions in \boldsymbol{f}^{*}_{(k)}(x_k) simultaneously.
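A hedged sketch of such a fit (hypothetical data; s() requests a smooth term for each covariate):

    library(VGAM)
    fit <- vgam(y ~ s(x2) + s(x3), family = poissonff, data = mydata)
    plot(fit, se = TRUE)   # estimated component functions with SE bands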
Currently, work is being done on estimating VGAMs using the P-splines of Eilers and Marx (1996).[9] These offer several advantages over smoothing splines and vector backfitting, such as making automatic smoothing parameter selection easier.
These add a quadratic in the latent variable to the RR-VGLM class. The result is that a bell-shaped curve can be fitted to each response as a function of the latent variable. For R = 2, one has bell-shaped surfaces as a function of the two latent variables, somewhat similar to a bivariate normal distribution. Particular applications of QRR-VGLMs can be found in ecology, in a field of multivariate analysis called ordination.
As a specific rank-1 example of a QRR-VGLM, consider Poisson data with S species. The model for species s is the Poisson regression

\log \mu_s(\nu) = \eta_s(\nu) = \beta_{(s)1} + \beta_{(s)2} \nu + \beta_{(s)3} \nu^2 = \alpha_s - \frac{1}{2} \left( \frac{\nu - u_s}{t_s} \right)^2,

for s = 1, \ldots, S. The rightmost parameterization, in terms of \alpha_s, u_s and t_s, has ecological meaning: u_s is the species' optimum, t_s its tolerance, and \alpha_s relates to its maximum abundance. It requires \beta_{(s)3} < 0, so that the curve is bell-shaped.
QRR-VGLMs fit Gaussian ordination models by maximum likelihood estimation, and they are an example of direct gradient analysis. The cqo function in the VGAM package currently calls optim to search for the optimal \boldsymbol{C}; given such a \boldsymbol{C}, the site scores \boldsymbol{\nu} can be computed, and fitting the linear predictors \boldsymbol{\eta} is then straightforward.
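A hedged sketch of a rank-1 constrained quadratic ordination (hypothetical species count columns spp1, spp2, spp3 and environmental covariates):

    library(VGAM)
    set.seed(1)   # cqo uses random initial values for C
    fit <- cqo(cbind(spp1, spp2, spp3) ~ x2 + x3 + x4,
               family = poissonff, data = mydata, Rank = 1)
    Coef(fit)   # the species' optimums, tolerances and maximums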