Confidence distribution explained

Confidence distribution should not be confused with Confidence interval.

In statistical inference, the concept of a confidence distribution (CD) has often been loosely referred to as a distribution function on the parameter space that can represent confidence intervals of all levels for a parameter of interest. Historically, it was typically constructed by inverting the upper limits of lower-sided confidence intervals of all levels, and it was commonly associated with a fiducial interpretation (fiducial distribution), although it is a purely frequentist concept. A confidence distribution is NOT a probability distribution function of the parameter of interest, but it may still be a function useful for making inferences.

In recent years, there has been a surge of renewed interest in confidence distributions. In the more recent developments, the concept of confidence distribution has emerged as a purely frequentist concept, without any fiducial interpretation or reasoning. Conceptually, a confidence distribution is no different from a point estimator or an interval estimator (confidence interval), but it uses a sample-dependent distribution function on the parameter space (instead of a point or an interval) to estimate the parameter of interest.

A simple example of a confidence distribution, that has been broadly used in statistical practice, is a bootstrap distribution. The development and interpretation of a bootstrap distribution does not involve any fiducial reasoning; the same is true for the concept of a confidence distribution. But the notion of confidence distribution is much broader than that of a bootstrap distribution. In particular, recent research suggests that it encompasses and unifies a wide range of examples, from regular parametric cases (including most examples of the classical development of Fisher's fiducial distribution) to bootstrap distributions, p-value functions, normalized likelihood functions and, in some cases, Bayesian priors and Bayesian posteriors.
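As a concrete illustration (a minimal sketch with made-up data, not an example from the source), a bootstrap distribution for a sample mean can be built directly by resampling; its quantiles then yield percentile confidence intervals of all levels, which is exactly the role a confidence distribution plays:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)   # made-up observed sample

# Bootstrap distribution of the sample mean: resample the data with
# replacement many times and recompute the statistic each time.
B = 5000
boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(B)])

# The empirical distribution of boot_means serves as an (asymptotic)
# confidence distribution for the mean: its quantiles give percentile
# confidence intervals of every level.
ci_90 = np.quantile(boot_means, [0.05, 0.95])
```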

Just as a Bayesian posterior distribution contains a wealth of information for any type of Bayesian inference, a confidence distribution contains a wealth of information for constructing almost all types of frequentist inferences, including point estimates, confidence intervals, critical values, statistical power and p-values,[1] among others. Some recent developments have highlighted the promising potential of the CD concept as an effective inferential tool.

History

Neyman (1937) introduced the idea of "confidence" in his seminal paper on confidence intervals, which clarified the frequentist repetition property. According to Fraser, the seed of the idea of a confidence distribution can be traced back even to Bayes (1763) and Fisher (1930), although the phrase itself seems first to have been used by Cox (1958).[2] Some researchers view the confidence distribution as "the Neymanian interpretation of Fisher's fiducial distributions", which was "furiously disputed by Fisher". It is also believed that these "unproductive disputes" and Fisher's "stubborn insistence" might be the reason that the concept of confidence distribution has long been misconstrued as a fiducial concept and has not been fully developed under the frequentist framework. Indeed, the confidence distribution is a purely frequentist concept with a purely frequentist interpretation, although it also has ties to Bayesian and fiducial inference concepts.

Definition

Classical definition

Classically, a confidence distribution is defined by inverting the upper limits of a series of lower-sided confidence intervals. In particular,

For every α in (0, 1), let (−∞, ξ_n(α)] be a 100α% lower-sided confidence interval for θ, where ξ_n(α) = ξ_n(X_n, α) is continuous and increasing in α for each sample X_n. Then H_n(·) = ξ_n^{-1}(·) is a confidence distribution for θ.
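As an illustrative sketch of this inversion (not from the source; made-up numbers, using the known-variance normal mean): the 100α% lower-sided interval is (−∞, X̄ + z_α σ/√n], and inverting ξ_n(α) = X̄ + z_α σ/√n in α gives H_n(θ) = Φ(√n (θ − X̄)/σ):

```python
from statistics import NormalDist

std_norm = NormalDist()   # standard normal; Phi = std_norm.cdf

xbar, sigma, n = 1.3, 2.0, 25   # made-up sample summary, sigma known

def xi(alpha):
    """Upper endpoint of the 100*alpha% lower-sided CI (-inf, xi(alpha)]
    for theta, based on xbar ~ N(theta, sigma^2/n)."""
    return xbar + std_norm.inv_cdf(alpha) * sigma / n ** 0.5

def H(theta):
    """Confidence distribution obtained by inverting xi in alpha:
    H(theta) = Phi(sqrt(n) * (theta - xbar) / sigma)."""
    return std_norm.cdf(n ** 0.5 * (theta - xbar) / sigma)

# Inversion check: xi(H(theta)) recovers theta, and H(xbar) = 1/2.
theta = 2.0
alpha = H(theta)
```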

Efron stated that this distribution "assigns probability 0.05 to θ lying between the upper endpoints of the 0.90 and 0.95 confidence interval, etc." and that "it has powerful intuitive appeal". In the classical literature, the confidence distribution function was interpreted as a distribution function of the parameter θ, which is impossible unless fiducial reasoning is involved, since in a frequentist setting the parameters are fixed and nonrandom.

Interpreting the CD function entirely from a frequentist viewpoint, and not as a distribution function of a (fixed, nonrandom) parameter, is one of the major departures of the recent development from the classical approach. The advantage of treating confidence distributions as a purely frequentist concept (similar to a point estimator) is that the concept is then free from those restrictive, if not controversial, constraints set forth by Fisher on fiducial distributions.

The modern definition

The following definition applies; Θ is the parameter space of the unknown parameter of interest θ, and χ is the sample space corresponding to data X_n:

A function H_n(·) = H_n(X_n, ·) on χ × Θ → [0, 1] is called a confidence distribution (CD) for a parameter θ if it meets two requirements:

(R1) For each given sample X_n ∈ χ, H_n(·) is a continuous cumulative distribution function on the parameter space Θ;

(R2) At the true parameter value θ = θ₀, H_n(θ₀) = H_n(X_n, θ₀), as a function of the sample X_n, follows the uniform distribution U[0, 1].

Also, the function H is an asymptotic CD (aCD), if the U[0, 1] requirement is true only asymptotically and the continuity requirement on Hn(•) is dropped.

In nontechnical terms, a confidence distribution is a function of both the parameter and the random sample, subject to two requirements. The first requirement (R1) simply says that a CD should be a distribution on the parameter space. The second requirement (R2) restricts the function so that inferences (point estimators, confidence intervals, hypothesis tests, etc.) based on the confidence distribution have the desired frequentist properties. This is similar to the restrictions in point estimation that ensure certain desired properties, such as unbiasedness, consistency and efficiency.
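Requirement R2 can be checked by simulation. The sketch below (a minimal illustration assuming the known-variance normal-mean CD, with made-up parameter values) verifies that H_n(θ₀) evaluated at the true θ₀ behaves like a U[0, 1] draw across repeated samples:

```python
import random
from statistics import NormalDist

std_norm = NormalDist()
random.seed(1)

theta0, sigma, n = 0.0, 1.0, 30   # true parameter and known sd (made up)

def H_at_true(sample):
    """Evaluate H_n(theta0) = Phi(sqrt(n) * (theta0 - xbar) / sigma)."""
    xbar = sum(sample) / len(sample)
    return std_norm.cdf(n ** 0.5 * (theta0 - xbar) / sigma)

# Across repeated samples, H_n(theta0) should be uniform on [0, 1] (R2).
reps = 20000
u = [H_at_true([random.gauss(theta0, sigma) for _ in range(n)])
     for _ in range(reps)]
mean_u = sum(u) / reps                          # ~ 1/2 under R2
frac_below = sum(v < 0.25 for v in u) / reps    # ~ 1/4 under R2
```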

A confidence distribution derived by inverting the upper limits of confidence intervals (classical definition) also satisfies the requirements in the above definition and this version of the definition is consistent with the classical definition.

Unlike in classical fiducial inference, more than one confidence distribution may be available to estimate a parameter under any specific setting. Also unlike in classical fiducial inference, optimality is not part of the requirement. Depending on the setting and the criterion used, sometimes there is a unique "best" (in terms of optimality) confidence distribution. But sometimes there is no optimal confidence distribution available or, in some extreme cases, we may not even be able to find a meaningful confidence distribution. This is no different from the practice of point estimation.

A definition with measurable spaces

A confidence distribution[3] C for a parameter γ in a measurable space is a distribution estimator with C(A_p) = p for a family of confidence regions A_p for γ with level p, for all levels 0 < p < 1. The family of confidence regions is not unique.[4] If A_p only exists for p ∈ I ⊂ (0, 1), then C is a confidence distribution with level set I. Both C and all A_p are measurable functions of the data. This implies that C is a random measure and A_p is a random set. If the defining requirement P(γ ∈ A_p) ≥ p holds with equality, then the confidence distribution is by definition exact. If, additionally, γ is a real parameter, then the measure-theoretic definition coincides with the classical definition above.

Examples

Example 1: Normal mean and variance

Suppose a normal sample X_i ~ N(μ, σ²), i = 1, 2, ..., n is given.

(1) Variance σ2 is known

Let Φ be the cumulative distribution function of the standard normal distribution and F_{t_{n−1}} the cumulative distribution function of the Student t_{n−1} distribution. The two functions

H_Φ(μ) = Φ(√n (μ − X̄)/σ)  and  H_t(μ) = F_{t_{n−1}}(√n (μ − X̄)/s)

both satisfy the two requirements in the CD definition, so each is a confidence distribution function for μ. Furthermore,

H_A(μ) = Φ(√n (μ − X̄)/s)

satisfies the definition of an asymptotic confidence distribution as n → ∞, so it is an asymptotic confidence distribution for μ. Using H_Φ(μ) and H_A(μ) is equivalent to saying that we use N(X̄, σ²/n) and N(X̄, s²/n), respectively, to estimate μ.
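The three functions can be evaluated directly; a minimal sketch with SciPy and made-up sample values (not data from the source):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 20
sigma = 3.0                                    # known sd (for H_Phi)
x = rng.normal(loc=10.0, scale=sigma, size=n)  # made-up sample
xbar, s = x.mean(), x.std(ddof=1)

def H_phi(mu):   # exact CD for mu when sigma is known
    return stats.norm.cdf(np.sqrt(n) * (mu - xbar) / sigma)

def H_t(mu):     # exact CD for mu based on the Student t pivot
    return stats.t.cdf(np.sqrt(n) * (mu - xbar) / s, df=n - 1)

def H_a(mu):     # asymptotic CD: normal with the estimated sd plugged in
    return stats.norm.cdf(np.sqrt(n) * (mu - xbar) / s)

# Each is an increasing distribution function equal to 1/2 at mu = xbar.
half = (H_phi(xbar), H_t(xbar), H_a(xbar))
```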

(2) Variance σ2 is unknown

For the parameter μ, since H_Φ(μ) involves the unknown parameter σ, it violates the two requirements in the CD definition and is no longer a "distribution estimator" or a confidence distribution for μ. However, H_t(μ) is still a CD for μ and H_A(μ) is an aCD for μ.

For the parameter σ², the sample-dependent cumulative distribution function

H_{χ²}(θ) = 1 − F_{χ²_{n−1}}((n − 1)s²/θ)

is a confidence distribution function for σ². Here, F_{χ²_{n−1}} is the cumulative distribution function of the χ²_{n−1} distribution.
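A sketch (with a made-up sample, not from the source) of this chi-square-based CD, checking that its quantiles reproduce the textbook chi-square interval for σ²:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 30
x = rng.normal(loc=0.0, scale=2.0, size=n)   # made-up sample
s2 = x.var(ddof=1)

def H_chi2(theta):
    """CD for sigma^2: H(theta) = 1 - F_{chi2_{n-1}}((n - 1) * s2 / theta)."""
    return 1.0 - stats.chi2.cdf((n - 1) * s2 / theta, df=n - 1)

# Its quantiles reproduce the usual chi-square interval for sigma^2:
# the central 95% interval runs from the 2.5% to the 97.5% CD quantile.
lo = (n - 1) * s2 / stats.chi2.ppf(0.975, df=n - 1)
hi = (n - 1) * s2 / stats.chi2.ppf(0.025, df=n - 1)
```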

In the case when the variance σ² is known,

H_Φ(μ) = Φ(√n (μ − X̄)/σ)

is optimal in terms of producing the shortest confidence intervals at any given level. In the case when the variance σ² is unknown,

H_t(μ) = F_{t_{n−1}}(√n (μ − X̄)/s)

is an optimal confidence distribution for μ.

Example 2: Bivariate normal correlation

Let ρ denote the correlation coefficient of a bivariate normal population. It is well known that Fisher's z, defined by the Fisher transformation

z = (1/2) ln((1 + r)/(1 − r)),

is approximately distributed as

N((1/2) ln((1 + ρ)/(1 − ρ)), 1/(n − 3))

with a fast rate of convergence, where r is the sample correlation and n is the sample size.

The function

H_n(ρ) = 1 − Φ(√(n − 3) ((1/2) ln((1 + r)/(1 − r)) − (1/2) ln((1 + ρ)/(1 − ρ))))

is an asymptotic confidence distribution for ρ.
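A sketch of this asymptotic CD (with hypothetical r and n, which are assumptions for illustration), whose quantiles recover the familiar Fisher-z confidence limits:

```python
import numpy as np
from scipy import stats

def H_rho(rho, r, n):
    """Asymptotic CD for the correlation via the Fisher z-transform."""
    z_r = np.arctanh(r)        # = (1/2) ln((1 + r) / (1 - r))
    z_rho = np.arctanh(rho)
    return 1.0 - stats.norm.cdf(np.sqrt(n - 3) * (z_r - z_rho))

r, n = 0.6, 50                 # hypothetical sample correlation and size

# H_rho is 1/2 at rho = r, and its 0.025 and 0.975 quantiles are the
# familiar Fisher-z 95% confidence limits.
mid = H_rho(r, r, n)
z975 = stats.norm.ppf(0.975)
ci = [float(np.tanh(np.arctanh(r) + sgn * z975 / np.sqrt(n - 3)))
      for sgn in (-1.0, 1.0)]
```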

An exact confidence density for ρ is[5][6]

π(ρ | r) = (ν(ν − 1)Γ(ν − 1)) / (√(2π) Γ(ν + 1/2)) · (1 − r²)^((ν−1)/2) (1 − ρ²)^((ν−2)/2) (1 − rρ)^((1−2ν)/2) F(3/2, −1/2; ν + 1/2; (1 + rρ)/2),

where F is the Gaussian hypergeometric function and ν = n − 1 > 1. This is also the posterior density of a Bayes matching prior for the five parameters in the binormal distribution.[7]

The very last formula in the classical book by Fisher gives

π(ρ | r) = ((1 − r²)^((ν−1)/2) (1 − ρ²)^((ν−2)/2)) / (π (ν − 2)!) · ∂^(ν−2)/∂(ρr)^(ν−2) { (θ − (1/2) sin 2θ) / sin³θ },

where cos θ = −ρr and 0 < θ < π. This formula was derived by C. R. Rao.[8]

Example 3: Binormal mean

Let data be generated by Y = γ + U, where γ is an unknown vector in the plane and U has a binormal and known distribution in the plane. The distribution of Γ_y = y − U defines a confidence distribution for γ. The confidence regions A_p can be chosen as the interiors of ellipses centered at γ with axes given by the eigenvectors of the covariance matrix of Γ_y. The confidence distribution is in this case binormal with mean γ, and the confidence regions can be chosen in many other ways.[4] The confidence distribution coincides in this case with the Bayesian posterior using the right Haar prior.[9] The argument generalizes to the case of an unknown mean γ in an infinite-dimensional Hilbert space, but in this case the confidence distribution is not a Bayesian posterior.[10]
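A simulation sketch of the elliptical confidence regions (the plane vector and covariance below are made up for illustration), confirming the frequentist coverage of the level-p regions:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma_true = np.array([1.0, -2.0])            # unknown plane vector (made up)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])    # known covariance of U (made up)
Sigma_inv = np.linalg.inv(Sigma)

# Region A_p = {g : (g - y)' Sigma^{-1} (g - y) <= q_p}, an ellipse around
# the observed y; q_p is the level-p chi-square (2 df) quantile, which has
# the closed form -2 ln(1 - p).
p = 0.9
q_p = -2.0 * np.log(1.0 - p)

# Simulate Y = gamma + U repeatedly and check the frequentist coverage.
reps = 20000
U = rng.multivariate_normal(np.zeros(2), Sigma, size=reps)
y = gamma_true + U
d = y - gamma_true
mahal = np.einsum('ij,jk,ik->i', d, Sigma_inv, d)
coverage = float(np.mean(mahal <= q_p))        # should be close to p
```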

Using confidence distributions for inference

Confidence interval

From the CD definition, it is evident that the intervals

(−∞, H_n^{-1}(1 − α)],  [H_n^{-1}(α), ∞)  and  [H_n^{-1}(α/2), H_n^{-1}(1 − α/2)]

provide 100(1 − α)%-level confidence intervals of different kinds for θ, for any α ∈ (0, 1). Also,

[H_n^{-1}(α₁), H_n^{-1}(1 − α₂)]

is a 100(1 − α₁ − α₂)%-level confidence interval for the parameter θ for any α₁ > 0, α₂ > 0 with α₁ + α₂ < 1. Here, H_n^{-1}(β) is the 100β% quantile of H_n(θ); that is, it solves H_n(θ) = β for θ. The same holds for an aCD, where the confidence level is achieved in the limit. Some authors have proposed using confidence distributions for graphically viewing which parameter values are consistent with the data, rather than for coverage or performance purposes.[11] [12]

Point estimation

Point estimators can also be constructed given a confidence distribution estimator for the parameter of interest. For example, given the CD H_n(θ) for a parameter θ, natural choices of point estimators include the median M_n = H_n^{-1}(1/2), the mean

θ̄_n = ∫_{−∞}^{∞} t dH_n(t),

and the maximum point of the CD density,

θ̂_n = argmax_θ h_n(θ),  where h_n(θ) = H_n′(θ).

Under some modest conditions, among other properties, one can prove that these point estimators are all consistent. Certain confidence distributions can give optimal frequentist estimators.[10]
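A sketch using the chi-square CD for σ² from Example 1 (with made-up ν and s²), where the three estimators differ and have closed forms against which the numerics can be checked:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

nu, s2 = 24, 3.0   # made-up degrees of freedom (n - 1) and sample variance

def H(theta):
    """CD for sigma^2: H(theta) = 1 - F_{chi2_nu}(nu * s2 / theta)."""
    return 1.0 - stats.chi2.cdf(nu * s2 / theta, df=nu)

def h(theta):
    """CD density dH/dtheta, by the chain rule."""
    return stats.chi2.pdf(nu * s2 / theta, df=nu) * nu * s2 / theta ** 2

median_est = nu * s2 / stats.chi2.ppf(0.5, df=nu)   # M_n = H^{-1}(1/2)
mean_est = nu * s2 / (nu - 2)                       # mean of the CD (nu > 2)
mode_est = minimize_scalar(lambda t: -h(t),         # argmax of the CD density
                           bounds=(0.1, 10 * s2), method='bounded').x
```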

Hypothesis testing

One can derive a p-value for a test, either one-sided or two-sided, concerning the parameter θ from its confidence distribution H_n(θ). Denote by p_s(C) the probability mass of a set C under the confidence distribution function,

p_s(C) = H_n(C) = ∫_C dH_n(θ).

This p_s(C) is called "support" in the CD inference and is also known as "belief" in the fiducial literature. We have:

(1) For the one-sided test K₀: θ ∈ C vs. K₁: θ ∈ Cᶜ, where C is of the type (−∞, b] or [b, ∞), one can show from the CD definition that sup_{θ ∈ C} P_θ(p_s(C) ≤ α) = α. Thus, p_s(C) = H_n(C) is the corresponding p-value of the test.

(2) For the singleton test K₀: θ = b vs. K₁: θ ≠ b, one can show from the CD definition that P_{θ = b}(2 min{p_s(C_lo), p_s(C_up)} ≤ α) = α. Thus, 2 min{p_s(C_lo), p_s(C_up)} = 2 min{H_n(b), 1 − H_n(b)} is the corresponding p-value of the test. Here, C_lo = (−∞, b] and C_up = [b, ∞).

See Figure 1 of Xie and Singh (2011) for a graphical illustration of the CD inference.

Implementations

A few statistical programs have implemented the ability to construct and graph confidence distributions:

R, via the concurve,[13] pvaluefunctions, and episheet packages

Excel, via episheet[14]

Stata, via concurve


Notes and References

  1. Fraser, D. A. S. (2019). "The p-value Function and Statistical Inference". The American Statistician, 73(sup1), 135–147. doi:10.1080/00031305.2018.1556735.
  2. Cox, D. R. (1958). "Some Problems Connected with Statistical Inference". The Annals of Mathematical Statistics, 29(2), 357–372. doi:10.1214/aoms/1177706618.
  3. Taraldsen, Gunnar (2021). "Joint Confidence Distributions". doi:10.13140/RG.2.2.33079.85920.
  4. Liu, Dungang; Liu, Regina Y.; Xie, Min-ge (2021). "Nonparametric Fusion Learning for Multiparameters: Synthesize Inferences From Diverse Sources Using Data Depth and Confidence Distribution". Journal of the American Statistical Association, 117(540), 2086–2104. doi:10.1080/01621459.2021.1902817.
  5. Taraldsen, Gunnar (2021). "The Confidence Density for Correlation". Sankhya A, 85, 600–616. doi:10.1007/s13171-021-00267-y.
  6. Taraldsen, Gunnar (2020). "Confidence in Correlation". doi:10.13140/RG.2.2.23673.49769.
  7. Berger, James O.; Sun, Dongchu (2008). "Objective priors for the bivariate normal model". The Annals of Statistics, 36(2). doi:10.1214/07-AOS501.
  8. Fisher, Ronald Aylmer (1973). Statistical Methods and Scientific Inference (3rd ed., rev. and enl.). New York: Hafner Press. ISBN 0-02-844740-9.
  9. Eaton, Morris L.; Sudderth, William D. (2012). "Invariance, model matching and probability matching". Sankhyā: The Indian Journal of Statistics, Series A, 74(2), 170–193. doi:10.1007/s13171-012-0018-4.
  10. Taraldsen, Gunnar; Lindqvist, Bo Henry (2013). "Fiducial theory and optimal inference". The Annals of Statistics, 41(1). doi:10.1214/13-AOS1083.
  11. Cox, D. R.; Hinkley, D. V. (1979). Theoretical Statistics. Chapman and Hall/CRC. ISBN 978-0-429-17021-8. doi:10.1201/b14832.
  12. Rafi, Zad; Greenland, Sander (2020). "Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise". BMC Medical Research Methodology, 20(1), 244. doi:10.1186/s12874-020-01105-9.
  13. "Concurve plots consonance curves, p-value functions, and S-value functions". Statistical Modeling, Causal Inference, and Social Science (statmodeling.stat.columbia.edu). Retrieved 2020-04-15.
  14. "Modern Epidemiology, 2nd Edition". www.krothman.org. Archived 2020-01-29 at the Wayback Machine; retrieved 2020-04-15.