Law of the unconscious statistician

In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem which expresses the expected value of a function g(X) of a random variable X in terms of g and the probability distribution of X.

The form of the law depends on the type of random variable in question. If the distribution of X is discrete and one knows its probability mass function p_X, then the expected value of g(X) is

\operatorname{E}[g(X)] = \sum_x g(x)\, p_X(x),

where the sum is over all possible values x of X. If instead the distribution of X is continuous with probability density function f_X, then the expected value of g(X) is

\operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x)\, f_X(x)\, \mathrm{d}x.
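A quick numerical sanity check of the discrete formula (with an arbitrarily chosen pmf and function, not anything from the theorem itself): compute E[g(X)] once via LOTUS and once from the definition, by first building the distribution of Y = g(X):

```python
# Hypothetical pmf of X on a small support.
p_X = {1: 0.2, 2: 0.5, 3: 0.3}
g = lambda x: 2 * x + 1

# LOTUS: no need to know the distribution of g(X) itself.
lotus = sum(g(x) * p for x, p in p_X.items())

# Direct definition: build the pmf of Y = g(X) (here g is injective,
# so this is a plain relabelling), then apply E[Y] = sum of y * p_Y(y).
p_Y = {g(x): p for x, p in p_X.items()}
direct = sum(y * p for y, p in p_Y.items())

print(lotus, direct)  # both 5.2
```

The "unconscious" shortcut is the first sum: it never constructs p_Y at all.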

Both of these special cases can be expressed in terms of the cumulative distribution function F_X of X, with the expected value of g(X) now given by the Lebesgue–Stieltjes integral

\operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x) \, \mathrm{d}F_X(x).
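The Lebesgue–Stieltjes form can be approximated by Riemann–Stieltjes sums against the CDF. A sketch, assuming (hypothetically, for illustration) that X is exponential with rate 1, so that E[X] = 1, and taking g to be the identity:

```python
import math

# CDF of an Exp(1) random variable (a hypothetical choice for illustration).
F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
g = lambda x: x  # identity, so E[g(X)] = E[X] = 1 for Exp(1)

# Riemann-Stieltjes sum: sum of g(x_i) * (F(x_{i+1}) - F(x_i)) on a fine grid,
# truncated at hi = 50 where the remaining mass is negligible.
n, hi = 200_000, 50.0
h = hi / n
approx = sum(g(i * h) * (F((i + 1) * h) - F(i * h)) for i in range(n))

print(approx)  # close to 1.0
```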

In even greater generality, X could be a random element in any measurable space, in which case the law is given in terms of measure theory and the Lebesgue integral. In this setting, there is no need to restrict the context to probability measures, and the law becomes a general theorem of mathematical analysis on Lebesgue integration relative to a pushforward measure.

Etymology

This proposition is (sometimes) known as the law of the unconscious statistician because of a purported tendency to think of the identity as the very definition of the expected value, rather than (more formally) as a consequence of its true definition. The naming is sometimes attributed to Sheldon Ross' textbook Introduction to Probability Models, although he removed the reference in later editions. Many statistics textbooks do present the result as the definition of expected value.

Joint distributions

A similar property holds for joint distributions, or equivalently, for random vectors. For discrete random variables X and Y, a function of two variables g, and joint probability mass function p_{X,Y}(x, y):

\operatorname{E}[g(X, Y)] = \sum_y \sum_x g(x, y)\, p_{X,Y}(x, y).

In the absolutely continuous case, with f_{X,Y}(x, y) being the joint probability density function,

\operatorname{E}[g(X, Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty g(x, y)\, f_{X,Y}(x, y) \, \mathrm{d}x \, \mathrm{d}y.
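A small numerical check of the discrete joint case, with an arbitrarily chosen joint pmf; linearity of expectation gives an independent way to confirm the value:

```python
# Hypothetical joint pmf of (X, Y) on a small support.
p_XY = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
g = lambda x, y: x + 2 * y

# LOTUS for random vectors: sum of g(x, y) * p_{X,Y}(x, y).
expected = sum(g(x, y) * p for (x, y), p in p_XY.items())

# Cross-check via marginals and linearity: E[X + 2Y] = E[X] + 2 E[Y].
exp_x = sum(x * p for (x, _), p in p_XY.items())
exp_y = sum(y * p for (_, y), p in p_XY.items())

print(expected, exp_x + 2 * exp_y)  # both 1.9
```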

Special cases

A number of special cases are given here. In the simplest case, where the random variable X takes on countably many values (so that its distribution is discrete), the proof is particularly simple, and holds without modification if X is a discrete random vector or even a discrete random element.

The case of a continuous random variable is more subtle, since the proof in full generality requires delicate forms of the change-of-variables formula for integration. However, in the framework of measure theory, the discrete case generalizes straightforwardly to general (not necessarily discrete) random elements, and the case of a continuous random variable then follows as a special case by making use of the Radon–Nikodym theorem.

Discrete case

Suppose that X is a random variable which takes on only finitely or countably many different values x_1, x_2, \ldots, with probabilities p_X(x_1), p_X(x_2), \ldots. Then for any function g of these values, the random variable g(X) has values g(x_1), g(x_2), \ldots, although some of these may coincide with each other. For example, this is the case if X can take on both values 1 and -1 and g(x) = x^2.

Let y_1, y_2, \ldots enumerate the possible distinct values of g(X), and for each i let A_i denote the collection of all x_j with g(x_j) = y_i. Then, according to the definition of expected value, there is

\operatorname{E}[g(X)] = \sum_i y_i\, p_{g(X)}(y_i).

Since a given y_i can be the image of multiple, distinct x_j, it holds that

p_{g(X)}(y_i) = \sum_{j:\, g(x_j) = y_i} p_X(x_j).

Then the expected value can be rewritten as

\sum_i y_i\, p_{g(X)}(y_i) = \sum_i y_i \sum_{j:\, g(x_j) = y_i} p_X(x_j) = \sum_i \sum_{j:\, g(x_j) = y_i} g(x_j)\, p_X(x_j) = \sum_x g(x)\, p_X(x).

This equality relates the average of the outputs of g(X), weighted by the probabilities of those outputs themselves, to the average of the outputs of g, weighted by the probabilities of the values of X.
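The regrouping in this computation can be mirrored directly in code: group the probabilities of the x_j by their common image y_i, and check that both weighted averages agree. A sketch with an arbitrarily chosen pmf and a deliberately non-injective g:

```python
from collections import defaultdict

p_X = {-2: 0.25, -1: 0.25, 1: 0.25, 2: 0.25}  # hypothetical pmf
g = lambda x: x ** 2  # -2 and 2 (and -1 and 1) share an image

# Group the probabilities of the x_j by their common image y_i = g(x_j):
# p_{g(X)}(y_i) is the sum of p_X(x_j) over the group.
groups = defaultdict(float)
for x, p in p_X.items():
    groups[g(x)] += p

grouped_sum = sum(y * p for y, p in groups.items())  # sum over i of y_i p_{g(X)}(y_i)
direct_sum = sum(g(x) * p for x, p in p_X.items())   # sum over x of g(x) p_X(x)

print(grouped_sum, direct_sum)  # both 2.5
```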

If X takes on only finitely many possible values, the above is fully rigorous. However, if X takes on countably many values, the last equality given does not always hold, as seen by the Riemann series theorem. Because of this, it is necessary to assume the absolute convergence of the sums in question.

Continuous case

Suppose that X is a random variable whose distribution has a continuous density f. If g is a general function, then the probability that g(X) is valued in a set K of real numbers equals the probability that X is valued in g^{-1}(K), which is given by

\int_{g^{-1}(K)} f(x)\, \mathrm{d}x.

Under various conditions on g, the change-of-variables formula for integration can be applied to relate this to an integral over K, and hence to identify the density of g(X) in terms of the density of X. In the simplest case, if g is differentiable with nowhere-vanishing derivative, then the above integral can be written as

\int_K f(g^{-1}(y))\, |(g^{-1})'(y)|\, \mathrm{d}y,

thereby identifying g(X) as possessing the density f(g^{-1}(y))\, |(g^{-1})'(y)|. The expected value of g(X) is then identified as

\int_{-\infty}^\infty y\, f(g^{-1}(y))\, |(g^{-1})'(y)|\, \mathrm{d}y = \int_{-\infty}^\infty g(x)\, f(x)\, \mathrm{d}x,

where the equality follows by another use of the change-of-variables formula for integration. This shows that the expected value of g(X) is encoded entirely by the function g and the density f of X.
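The two integrals can be compared numerically. The sketch below uses hypothetical choices: X standard normal and g(x) = e^x, which is increasing with nowhere-vanishing derivative, so g^{-1}(y) = ln y and (g^{-1})'(y) = 1/y. Both sides are approximated by midpoint Riemann sums:

```python
import math

f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # N(0,1) density
g = lambda x: math.exp(x)        # increasing, derivative never vanishes
g_inv = math.log                 # g^{-1}
g_inv_prime = lambda y: 1.0 / y  # (g^{-1})'

def midpoint(h_fn, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of h_fn over [lo, hi]."""
    step = (hi - lo) / n
    return sum(h_fn(lo + (i + 0.5) * step) for i in range(n)) * step

# Density of Y = g(X) from the change of variables, integrated against y.
lhs = midpoint(lambda y: y * f(g_inv(y)) * abs(g_inv_prime(y)), 1e-9, 200.0)
# LOTUS form: integrate g(x) * f(x) against the density of X directly.
rhs = midpoint(lambda x: g(x) * f(x), -12.0, 12.0)

print(lhs, rhs)  # both near E[e^X] = e^{1/2} ≈ 1.6487
```

Here E[e^X] for standard normal X is known in closed form (e^{1/2}, the lognormal mean), which makes the agreement easy to verify.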

The assumption that g is differentiable with nonvanishing derivative, which is necessary for applying the usual change-of-variables formula, excludes many typical cases, such as g(x) = x^2. The result still holds true in these broader settings, although the proof requires more sophisticated results from mathematical analysis, such as Sard's theorem and the coarea formula. In even greater generality, using the Lebesgue theory as below, it can be found that the identity

\operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x)\, f(x)\, \mathrm{d}x

holds true whenever X has a density f (which does not have to be continuous) and whenever g is a measurable function for which g(X) has finite expected value. (Every continuous function is measurable.) Furthermore, without modification to the proof, this holds even if X is a random vector (with density f) and g is a multivariable function; the integral is then taken over the multi-dimensional range of values of X.
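Even for a non-injective g such as g(x) = x^2, the identity can be checked numerically: the density of Y = X^2 must sum the contributions of the two branches x = ±√y, yet the LOTUS integral needs no such bookkeeping. A sketch, assuming (hypothetically) that X is standard normal, so E[X^2] = 1:

```python
import math

f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # N(0,1) density

def midpoint(h_fn, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of h_fn over [lo, hi]."""
    step = (hi - lo) / n
    return sum(h_fn(lo + (i + 0.5) * step) for i in range(n)) * step

# LOTUS with the non-injective g(x) = x^2: E[X^2] = integral of x^2 f(x) dx.
lotus = midpoint(lambda x: x * x * f(x), -12.0, 12.0)

# Direct route: the density of Y = X^2 sums the two branches x = +-sqrt(y),
# giving f_Y(y) = f(sqrt(y)) / sqrt(y) for y > 0 (a chi-squared(1) density).
f_Y = lambda y: f(math.sqrt(y)) / math.sqrt(y)
direct = midpoint(lambda y: y * f_Y(y), 1e-9, 200.0)

print(lotus, direct)  # both near E[X^2] = 1
```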

Measure-theoretic formulation

An abstract and general form of the result is available using the framework of measure theory and the Lebesgue integral. Here, the setting is that of a measure space (\Omega, \mu) and a measurable map X from \Omega to a measurable space \Omega'. The theorem then says that for any measurable function g on \Omega' which is valued in real numbers (or even the extended real number line), there is

\int_\Omega g \circ X \, \mathrm{d}\mu = \int_{\Omega'} g \, \mathrm{d}(X_\sharp \mu)

(interpreted as saying, in particular, that either side of the equality exists if the other side exists). Here X_\sharp \mu denotes the pushforward measure on \Omega'. The 'discrete case' given above is the special case arising when X takes on only countably many values and \mu is a probability measure. In fact, the discrete case (although without the restriction to probability measures) is the first step in proving the general measure-theoretic formulation, as the general version follows therefrom by an application of the monotone convergence theorem. Without any major changes, the result can also be formulated in the setting of outer measures.
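As noted, the discrete identity needs no probability measure. The following sketch pushes forward a hypothetical finite measure of total mass 5 (so not a probability measure) along a map X and checks the change-of-variables identity on both sides:

```python
# A hypothetical finite measure mu on Omega = {'a', 'b', 'c'} with total
# mass 5 (not a probability measure), a map X into the reals, and a g.
mu = {'a': 2.0, 'b': 0.5, 'c': 2.5}
X = {'a': 1.0, 'b': 1.0, 'c': 3.0}  # X sends 'a' and 'b' to the same point
g = lambda t: t * t

# Left side: integral over Omega of (g composed with X) with respect to mu.
lhs = sum(g(X[w]) * m for w, m in mu.items())

# Right side: integral of g against the pushforward measure X_sharp(mu),
# which assigns to each point y the total mu-mass of X^{-1}({y}).
push = {}
for w, m in mu.items():
    push[X[w]] = push.get(X[w], 0.0) + m
rhs = sum(g(y) * m for y, m in push.items())

print(lhs, rhs)  # both 25.0
```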

If \mu is a σ-finite measure, the theory of the Radon–Nikodym derivative is applicable. In the special case that the measure X_\sharp \mu is absolutely continuous relative to some background σ-finite measure \nu on \Omega', there is a real-valued function f_X on \Omega' representing the Radon–Nikodym derivative of the two measures, and then

\int_{\Omega'} g \, \mathrm{d}(X_\sharp \mu) = \int_{\Omega'} g\, f_X \, \mathrm{d}\nu.

In the further special case that \Omega' is the real number line, as in the contexts discussed above, it is natural to take \nu to be the Lebesgue measure, and this then recovers the 'continuous case' given above whenever \mu is a probability measure. (In this special case, the condition of σ-finiteness is vacuous, since Lebesgue measure and every probability measure are trivially σ-finite.)
