Probability mass function explained

In probability and statistics, a probability mass function (sometimes called probability function or frequency function^[1]) is a function that gives the probability that a discrete random variable is exactly equal to some value.^[2] Sometimes it is also known as the discrete probability density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.

A probability mass function differs from a probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables. A PDF must be integrated over an interval to yield a probability.^[3]

The value of the random variable having the largest probability mass is called the mode.

Formal definition

A probability mass function is the probability distribution of a discrete random variable; it gives the possible values and their associated probabilities. It is the function $p\colon \mathbb{R} \to [0,1]$ defined by $p_X(x) = P(X = x)$ for $-\infty < x < \infty$, where $P$ is a probability measure. $p_X(x)$ can also be simplified as $p(x)$.^[4]

The probabilities associated with all (hypothetical) values must be non-negative and sum up to 1,

$\sum_x p_X(x) = 1$ and $p_X(x)\geq 0.$

Thinking of probability as mass helps to avoid mistakes, since physical mass is conserved, as is the total probability over all hypothetical outcomes.
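
As a minimal sketch of the two conditions above, the following Python snippet checks whether a table of probabilities is a valid probability mass function. The dictionary representation and the name `is_valid_pmf` are illustrative assumptions rather than notation from the cited sources. Exact fractions are used so that the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Return True if every probability is non-negative and they sum to 1.

    `pmf` is a hypothetical representation: a dict mapping each possible
    value x to its probability p_X(x), stored as an exact Fraction.
    """
    probabilities = list(pmf.values())
    return all(p >= 0 for p in probabilities) and sum(probabilities) == 1

# A fair six-sided die: each face has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Not a PMF: the probabilities sum to 3/4, so some mass is missing.
incomplete = {0: Fraction(1, 2), 1: Fraction(1, 4)}

print(is_valid_pmf(fair_die))    # True
print(is_valid_pmf(incomplete))  # False
```

With floating-point probabilities the equality test would need a tolerance (for example `math.isclose`), which is why the sketch sticks to `Fraction`.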

Measure theoretic formulation

A probability mass function of a discrete random variable $X$ can be seen as a special case of two more general measure theoretic constructions: the distribution of $X$ and the probability density function of $X$ with respect to the counting measure. We make this more precise below.

Suppose that $(A, \mathcal{A}, P)$ is a probability space and that $(B, \mathcal{B})$ is a measurable space whose underlying σ-algebra is discrete, so in particular contains singleton sets of $B$. In this setting, a random variable $X \colon A \to B$ is discrete provided its image is countable. The pushforward measure $X_*(P)$, called the distribution of $X$ in this context, is a probability measure on $B$ whose restriction to singleton sets induces the probability mass function (as mentioned in the previous section) $f_X \colon B \to \mathbb{R}$, since $f_X(b) = P(X^{-1}(b)) = P(X = b)$ for each $b \in B$.

Now suppose that $(B, \mathcal{B}, \mu)$ is a measure space equipped with the counting measure $\mu$. The probability density function $f$ of $X$ with respect to the counting measure, if it exists, is the Radon–Nikodym derivative of the pushforward measure of $X$ (with respect to the counting measure), so $f = dX_*P/d\mu$ and $f$ is a function from $B$ to the non-negative reals. As a consequence, for any $b \in B$ we have

$P(X = b) = P(X^{-1}(b)) = X_*(P)(b) = \int_{b} f \, d\mu = f(b),$

demonstrating that $f$ is in fact a probability mass function.
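
For a finite probability space the pushforward construction can be carried out by hand. The sketch below is an illustration under assumed choices: the space (two tosses of a fair coin), the random variable (number of heads), and the function name `pushforward_pmf` are all hypothetical and not taken from the sources. It computes $f_X(b) = P(X^{-1}(b))$ by summing $P$ over each preimage.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two tosses of a fair coin.
# Each of the four outcomes in A has probability 1/4.
A = list(product("HT", repeat=2))
P = {outcome: Fraction(1, 4) for outcome in A}

def X(outcome):
    """Random variable X: A -> B, the number of heads in the outcome."""
    return outcome.count("H")

def pushforward_pmf(P, X):
    """Compute f_X(b) = P(X^{-1}(b)) by summing P over each preimage."""
    f = {}
    for outcome, prob in P.items():
        b = X(outcome)
        f[b] = f.get(b, Fraction(0)) + prob
    return f

f_X = pushforward_pmf(P, X)
# f_X[0] == 1/4, f_X[1] == 1/2, f_X[2] == 1/4
for b in sorted(f_X):
    print(b, f_X[b])
```

Because the counting measure assigns mass 1 to every singleton, integrating $f$ over $\{b\}$ simply returns $f(b)$, which is the value stored in the dictionary.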

When there is a natural order among the potential outcomes $x$, it may be convenient to assign numerical values to them (or n-tuples in case of a discrete multivariate random variable) and to consider also values not in the image of $X$. That is, $f_X$ may be defined for all real numbers, with $f_X(x) = 0$ for all $x \notin X(S)$, where $S$ is the sample space, as shown in the figure.

The image of $X$ has a countable subset on which the total probability mass of $f_X(x)$ is one. Consequently, the probability mass function is zero for all but a countable number of values of $x$.

The discontinuity of probability mass functions is related to the fact that the cumulative distribution function of a discrete random variable is also discontinuous. If $X$ is a discrete random variable, then $P(X = x) = 1$ means that the event $(X = x)$ is certain (it is true in 100% of the occurrences); conversely, $P(X = x) = 0$ means that the event $(X = x)$ is impossible. This does not carry over to a continuous random variable $X$, for which $P(X = x) = 0$ for any possible $x$. Discretization is the process of converting a continuous random variable into a discrete one.
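
The link between the probability mass function and the jumps of the cumulative distribution function can be illustrated with a short Python sketch. The fair-die example and the helper name `cdf` are assumptions made for illustration: the CDF is obtained by adding up the mass at every support point not exceeding $x$, so it stays flat between support points and jumps by exactly $p_X(x)$ at each of them.

```python
from fractions import Fraction

def cdf(pmf, x):
    """F(x) = P(X <= x): add up the mass at every support point <= x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

# Hypothetical example: a fair six-sided die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# The CDF is flat between support points...
print(cdf(fair_die, 2), cdf(fair_die, 2.9))   # 1/3 1/3
# ...and jumps by exactly p_X(3) = 1/6 at x = 3.
print(cdf(fair_die, 3) - cdf(fair_die, 2.9))  # 1/6
```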

Examples

See main article: Bernoulli distribution, Binomial distribution and Geometric distribution.

Finite

Three major distributions are commonly described by probability mass functions: the Bernoulli distribution, the binomial distribution and the geometric distribution.

Bernoulli distribution: $\operatorname{ber}(p)$ is used to model an experiment with only two possible outcomes. The two outcomes are often encoded as 1 and 0. $p_X(x) = \begin{cases} p, & \text{if } x \text{ is 1} \\ 1-p, & \text{if } x \text{ is 0} \end{cases}$ An example of the Bernoulli distribution is tossing a coin. Suppose that $S$ is the sample space of all outcomes of a single toss of a fair coin, and $X$ is the random variable defined on $S$ assigning 0 to the category "tails" and 1 to the category "heads". Since the coin is fair, the probability mass function is $p_X(x) = \begin{cases} \frac{1}{2}, & x = 0,\\ \frac{1}{2}, & x = 1,\\ 0, & x \notin \{0, 1\}. \end{cases}$

Binomial distribution models the number of successes when someone draws n times with replacement. Each draw or experiment is independent, with two possible outcomes. The associated probability mass function is $\binom{n}{k} p^k (1-p)^{n-k}$. An example of the binomial distribution is the probability of getting exactly one 6 when someone rolls a fair die three times.
Geometric distribution describes the number of trials needed to get one success. Its probability mass function is $p_X(k) = (1-p)^{k-1} p$. An example is tossing a coin until the first "heads" appears. Here $p$ denotes the probability of the outcome "heads", and $k$ denotes the number of necessary coin tosses.

Other distributions that can be modeled using a probability mass function are the categorical distribution (also known as the generalized Bernoulli distribution) and the multinomial distribution.

If a discrete distribution has two or more categories, exactly one of which occurs in a single trial (draw), it is a categorical distribution, whether or not the categories have a natural ordering.
An example of a multivariate discrete distribution, and of its probability mass function, is provided by the multinomial distribution. Here the multiple random variables are the numbers of successes in each of the categories after a given number of trials, and each non-zero probability mass gives the probability of a certain combination of numbers of successes in the various categories.
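
The Bernoulli, binomial and geometric formulas given in this section can be written out directly. The following Python sketch is illustrative only: the function names are hypothetical, and the probabilities are passed as exact `Fraction` values so the results can be read off as fractions. It also works through the die example mentioned above (exactly one 6 in three rolls of a fair die).

```python
from fractions import Fraction
from math import comb

def bernoulli_pmf(x, p):
    """p_X(x) = p if x == 1, 1 - p if x == 0, and 0 otherwise."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return Fraction(0)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    if k < 1:
        return Fraction(0)
    return (1 - p)**(k - 1) * p

# Fair coin: heads (encoded as 1) has probability 1/2.
print(bernoulli_pmf(1, Fraction(1, 2)))    # 1/2
# Exactly one 6 in three rolls of a fair die: 3 * (1/6) * (5/6)^2.
print(binomial_pmf(1, 3, Fraction(1, 6)))  # 25/72
# First heads on the third toss of a fair coin: (1/2)^2 * (1/2).
print(geometric_pmf(3, Fraction(1, 2)))    # 1/8
```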

Infinite

The following exponentially declining distribution is an example of a distribution with an infinite number of possible outcomes, namely all the positive integers: $\text{Pr}(X=i)= \frac{1}{2^i}\qquad \text{for } i=1, 2, 3, \dots$ Despite the infinite number of possible outcomes, the total probability mass is 1/2 + 1/4 + 1/8 + ⋯ = 1, satisfying the unit total probability requirement for a probability distribution.
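
The convergence of the total mass to 1 can be checked numerically. This short Python sketch (the helper name `partial_sum` is an assumption for illustration) adds up the first n probabilities exactly and prints the mass still unaccounted for, which is $1/2^n$ after n terms.

```python
from fractions import Fraction

def partial_sum(n):
    """Sum of Pr(X = i) = 1/2**i for i = 1, ..., n, as an exact fraction."""
    return sum((Fraction(1, 2**i) for i in range(1, n + 1)), Fraction(0))

# Columns: n, mass of the first n outcomes, remaining mass.
for n in (1, 2, 3, 10):
    print(n, partial_sum(n), 1 - partial_sum(n))
# 1 1/2 1/2
# 2 3/4 1/4
# 3 7/8 1/8
# 10 1023/1024 1/1024
```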

Multivariate case

See main article: Joint probability distribution.

Two or more discrete random variables have a joint probability mass function, which gives the probability of each possible combination of realizations for the random variables.
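
As a hedged illustration, a joint probability mass function for two variables can be stored as a table keyed by pairs of values. The example below assumes two independent fair dice, a choice made here for illustration and not taken from the sources. It looks up the probability of one combination and adds up the mass of all combinations that make up an event.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X and Y are the results of two independent fair dice,
# so each of the 36 combinations (x, y) has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# The joint PMF is itself a PMF: its values sum to 1.
print(sum(joint.values()))                                  # 1

# Probability of one particular combination of realizations.
print(joint[(3, 4)])                                        # 1/36

# Probability of an event: add the mass of every combination in it.
print(sum(p for (x, y), p in joint.items() if x + y == 7))  # 1/6
```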

Notes and References

1. "7.2 - Probability Mass Functions | STAT 414". PennState, Eberly College of Science. https://online.stat.psu.edu/stat414/lesson/7/7.2
2. Stewart, William J. (2011). Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling. Princeton University Press. p. 105. ISBN 978-1-4008-3281-1.
3. Dekking, Michel (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. London: Springer. ISBN 978-1-85233-896-1. OCLC 262680588.
4. Rao, Singiresu S. (1996). Engineering Optimization: Theory and Practice (3rd ed.). New York: Wiley. ISBN 0-471-55034-5. OCLC 62080932.