Probability mass function explained

In probability and statistics, a probability mass function (sometimes called probability function or frequency function[1]) is a function that gives the probability that a discrete random variable is exactly equal to some value.[2] Sometimes it is also known as the discrete probability density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.

A probability mass function differs from a probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be integrated over an interval to yield a probability.[3]

The value of the random variable having the largest probability mass is called the mode.
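As an illustration, the mode can be read off a probability mass function by taking an argmax over the values. The loaded-die pmf below is a made-up example, not from the text:

```python
# Mode of a discrete distribution: the value carrying the largest probability mass.
# The pmf of a hypothetical loaded die, chosen purely for illustration.
pmf = {1: 0.1, 2: 0.1, 3: 0.4, 4: 0.2, 5: 0.1, 6: 0.1}

# The mode is the key whose probability mass is largest.
mode = max(pmf, key=pmf.get)
print(mode)  # 3, since p(3) = 0.4 is the largest mass
```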

Formal definition

A probability mass function is the probability distribution of a discrete random variable, and provides the possible values and their associated probabilities. It is the function

p \colon \R \to [0,1]

defined by

p_X(x) = P(X = x)

for

-\infty < x < \infty,

where P is a probability measure. p_X(x) can also be simplified as p(x).[4]

The probabilities associated with all (hypothetical) values must be non-negative and sum up to 1:

\sum_x p_X(x) = 1 \quad \text{and} \quad p_X(x) \geq 0.

Thinking of probability as mass helps to avoid mistakes, since the physical mass is conserved, as is the total probability for all hypothetical outcomes x.
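These two defining conditions can be checked mechanically. A minimal sketch in Python, using the pmf of a fair six-sided die as an assumed example:

```python
# Verify the two defining properties of a pmf: non-negativity and unit total mass.
# The fair six-sided die is an illustrative choice; exact arithmetic via Fraction
# avoids floating-point rounding in the sum.
from fractions import Fraction

pmf = {face: Fraction(1, 6) for face in range(1, 7)}

assert all(p >= 0 for p in pmf.values())  # p_X(x) >= 0 for every x
assert sum(pmf.values()) == 1             # sum over x of p_X(x) = 1
```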

Measure theoretic formulation

A probability mass function of a discrete random variable X can be seen as a special case of two more general measure-theoretic constructions: the distribution of X and the probability density function of X with respect to the counting measure. We make this more precise below.

Suppose that (A, \mathcal{A}, P) is a probability space and that (B, \mathcal{B}) is a measurable space whose underlying σ-algebra is discrete, so in particular contains singleton sets of B. In this setting, a random variable X \colon A \to B is discrete provided its image is countable. The pushforward measure X_*(P) (called the distribution of X in this context) is a probability measure on B whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) f_X \colon B \to \R, since

f_X(b) = P(X^{-1}(b)) = P(X = b)

for each b \in B.

Now suppose that (B, \mathcal{B}, \mu) is a measure space equipped with the counting measure \mu. The probability density function f of X with respect to the counting measure, if it exists, is the Radon–Nikodym derivative of the pushforward measure of X (with respect to the counting measure), so

f = \frac{dX_*P}{d\mu},

and f is a function from B to the non-negative reals. As a consequence, for any b \in B we have

P(X = b) = P(X^{-1}(b)) = X_*(P)(\{b\}) = \int_{\{b\}} f \, d\mu = f(b),

demonstrating that f is in fact a probability mass function.
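A finite sketch of the pushforward construction, using an assumed sample space of two fair coin tosses (not from the text): the pmf of X assigns to each value b the mass of its preimage, P(X^{-1}({b})).

```python
# Pushforward of a probability measure through a discrete random variable:
# the pmf of X is obtained by summing the masses of all sample points that
# X maps to each value. Sample space, measure, and X are illustrative choices.
from fractions import Fraction

A = ["hh", "ht", "th", "tt"]            # outcomes of two fair coin tosses
P = {a: Fraction(1, 4) for a in A}      # probability measure on A

def X(a):
    return a.count("h")                 # X = number of heads

# Restriction of the pushforward X_*(P) to singletons: f(b) = P(X^{-1}({b}))
f = {}
for a in A:
    b = X(a)
    f[b] = f.get(b, Fraction(0)) + P[a]

# Two of the four equally likely outcomes have exactly one head.
assert f[1] == Fraction(1, 2)
```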

When there is a natural order among the potential outcomes x, it may be convenient to assign numerical values to them (or n-tuples in case of a discrete multivariate random variable) and to consider also values not in the image of X. That is, f_X may be defined for all real numbers, with f_X(x) = 0 for all x \notin X(S), as shown in the figure.

The image of X has a countable subset on which the total probability mass is one. Consequently, the probability mass function is zero for all but a countable number of values of x.

The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. If X is a discrete random variable, then P(X = x) = 1 means that the event (X = x) is certain (it occurs in 100% of the trials); on the contrary, P(X = x) = 0 means that the event (X = x) is impossible. This statement is not true for a continuous random variable X, for which P(X = x) = 0 for every possible x. Discretization is the process of converting a continuous random variable into a discrete one.
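A minimal discretization sketch: rounding a continuous random variable to the nearest integer yields a discrete one, whose pmf can be estimated from samples. The choice of a standard normal variable and the sample size here are arbitrary assumptions:

```python
# Discretization sketch: round a continuous (standard normal) random variable
# to the nearest integer and estimate the resulting pmf from samples.
import random
from collections import Counter

random.seed(0)  # fixed seed for reproducibility
samples = [round(random.gauss(0, 1)) for _ in range(100_000)]
counts = Counter(samples)
pmf_est = {k: c / len(samples) for k, c in counts.items()}

# Estimated masses sum to 1, and the most mass sits near the mean.
assert abs(sum(pmf_est.values()) - 1) < 1e-9
assert pmf_est[0] == max(pmf_est.values())
```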

Examples

See main article: Bernoulli distribution, Binomial distribution and Geometric distribution.

Finite

There are three major associated distributions: the Bernoulli distribution, the binomial distribution, and the geometric distribution.

The Bernoulli distribution models an experiment with only two possible outcomes, often encoded as 1 and 0, and has probability mass function

p_X(x) = \begin{cases} p, & \text{if } x \text{ is } 1, \\ 1-p, & \text{if } x \text{ is } 0. \end{cases}

An example of the Bernoulli distribution is tossing a coin. Suppose that S is the sample space of all outcomes of a single toss of a fair coin, and X is the random variable defined on S assigning 0 to the category "tails" and 1 to the category "heads". Since the coin is fair, the probability mass function is

p_X(x) = \begin{cases} \frac{1}{2}, & x = 0, \\ \frac{1}{2}, & x = 1, \\ 0, & x \notin \{0, 1\}. \end{cases}
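The fair-coin pmf above (mass 1/2 at 0 and at 1, zero elsewhere) can be written as a small Python function, a sketch rather than anything canonical:

```python
# The fair-coin pmf from the text, defined on all of the reals:
# p_X(x) = 1/2 for x in {0, 1} and 0 elsewhere.
from fractions import Fraction

def p_X(x):
    return Fraction(1, 2) if x in (0, 1) else Fraction(0)

assert p_X(0) == p_X(1) == Fraction(1, 2)
assert p_X(2) == 0                    # zero outside the image of X
assert p_X(0) + p_X(1) == 1           # total probability mass is 1
```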

The binomial distribution models the number of successes when someone draws n times with replacement, with success probability p on each draw. The geometric distribution describes the number of trials needed to obtain one success; an example is tossing a coin until the first "heads" appears, where p denotes the probability of the outcome "heads" and k denotes the number of necessary coin tosses. Other distributions that can be modeled using a probability mass function are the categorical distribution (also known as the generalized Bernoulli distribution) and the multinomial distribution.
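As a hedged illustration of the tosses-until-first-heads setting with heads-probability p and toss count k, the standard geometric pmf p_X(k) = (1 - p)^{k-1} p (this formula is the conventional one, assumed here rather than stated in the text) can be checked numerically:

```python
# Standard geometric pmf (assumed formula): probability that the first "heads"
# occurs on toss k, for a coin whose heads-probability is p.
def geometric_pmf(k, p=0.5):
    return (1 - p) ** (k - 1) * p

# First head on toss 1 of a fair coin: probability 1/2; on toss 3: 1/8.
assert geometric_pmf(1) == 0.5
assert geometric_pmf(3) == 0.125

# Partial sums over many k approach 1, as a pmf requires.
assert abs(sum(geometric_pmf(k) for k in range(1, 60)) - 1) < 1e-9
```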

Infinite

The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes, namely all the positive integers:

\Pr(X = i) = \frac{1}{2^i} \qquad \text{for } i = 1, 2, 3, \dots

Despite the infinite number of possible outcomes, the total probability mass is 1/2 + 1/4 + 1/8 + ⋯ = 1, satisfying the unit total probability requirement for a probability distribution.
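The convergence of the total mass to 1 can be seen from exact partial sums; a quick sketch:

```python
# Partial sums of the pmf P(X = i) = 1/2**i approach 1 as more outcomes are
# included, consistent with unit total probability. Exact arithmetic via Fraction.
from fractions import Fraction

partial = sum(Fraction(1, 2**i) for i in range(1, 11))
assert partial == Fraction(1023, 1024)   # equals 1 - 1/2**10
```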

Multivariate case

See main article: Joint probability distribution.

Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables.
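A small sketch of a joint pmf, using two independent fair dice as an assumed example: each of the 36 ordered pairs gets mass 1/36, and summing over one coordinate recovers the marginal pmf of the other.

```python
# Joint pmf of two independent fair dice: uniform mass 1/36 on the 36 ordered
# pairs. Summing out the second coordinate recovers the first die's marginal pmf.
from fractions import Fraction

joint = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

assert sum(joint.values()) == 1   # joint masses sum to 1

marginal_first = {
    i: sum(p for (a, b), p in joint.items() if a == i) for i in range(1, 7)
}
assert all(p == Fraction(1, 6) for p in marginal_first.values())
```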


Notes and References

  1. "7.2 - Probability Mass Functions | STAT 414", PennState Eberly College of Science. https://online.stat.psu.edu/stat414/lesson/7/7.2
  2. Stewart, William J. (2011). Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling. Princeton University Press. p. 105. ISBN 978-1-4008-3281-1.
  3. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. ISBN 978-1-85233-896-1.
  4. Rao, Singiresu S. (1996). Engineering Optimization: Theory and Practice (3rd ed.). New York: Wiley. ISBN 0-471-55034-5.