Hardy distribution explained

Hardy Distribution

Type:

mass

Pdf Caption:

The horizontal axis represents the hole score $n$. The vertical axis represents the probability of the hole score given the par of the hole and the probabilities $p = 0.20$ and $q = 0.10$. The blue points represent the probabilities for a par three, the green points for a par four and the red points for a par five. The function is defined only at integer values of $n$. The connecting lines are only guides for the eye.

Cdf Caption:

The horizontal axis represents the hole score $n$. The vertical axis represents the cumulative probability of the hole score given the par of the hole and the probabilities $p = 0.20$ and $q = 0.10$. The blue points represent the probabilities for a par three, the green points for a par four and the red points for a par five. The cumulative distribution function (CDF) is discontinuous at the integer values of $n$ and flat everywhere else because a variable that is Hardy distributed takes on only integer values.

Notation:

\operatorname{Hardy}(p,q;m)

Parameters:

p,q\in(0,1), p+q\in(0,1) and m=1,2,3,\ldots

Support:

n\in\{1,2,3,\ldots\}

(Natural numbers starting from 1; the probability is zero for n<\lceil m/2\rceil)

Pdf:

For m odd:

P\left(X=n\right)=\sum_{j=\frac{m+1}{2}}^{m}{n-1\choose n-j}q^{n-j}\left(A_{j,m}+B_{j,m}\right)

For m even:

P\left(X=n\right)=\sum_{j=\frac{m}{2}}^{m}{n-1\choose n-j}q^{n-j}\left(A_{j,m}+B_{j,m}\right)

with

A_{j,m}={j\choose m-j}p^{m-j}\left(1-p-q\right)^{2j-m}

and

B_{j,m}={j-1\choose m-j}p^{m-j+1}\left(1-p-q\right)^{2j-m-1}

Mean:

-\sum_{j=1}^{m}\left(m+1-j\right)\frac{p^{j-1}}{\left(q-1\right)^{j}}

Mgf:

For m odd:

M_m\left(t\right)=\sum_{j=\frac{m+1}{2}}^{m}\frac{\left(X_{j,m}+Y_{j,m}\right)e^{jt}}{\left(1-qe^{t}\right)^{j}}

For m even:

M_m\left(t\right)=\sum_{j=\frac{m}{2}}^{m}\frac{\left(X_{j,m}+Y_{j,m}\right)e^{jt}}{\left(1-qe^{t}\right)^{j}}

with

X_{j,m}={j\choose m-j}p^{m-j}\left(1-p-q\right)^{2j-m}

and

Y_{j,m}={j-1\choose m-j}p^{m-j+1}\left(1-p-q\right)^{2j-m-1}

In probability theory and statistics, the Hardy distribution is a discrete probability distribution that expresses the probability of the hole score for a given golf player. It is based on Hardy's (Hardy, 1945) basic assumption that there are three types of shots: good $(G)$, bad $(B)$ and ordinary $(O)$, where the probability of a good shot equals $p$, the probability of a bad shot equals $q$ and the probability of an ordinary shot equals $1-p-q$. Hardy further assigned a value of 2 to a good stroke, a value of 0 to a bad stroke and a value of 1 to a regular or ordinary stroke.

Once the sum of the values is greater than or equal to the par of the hole, the number of strokes played is the score achieved on that hole. A birdie on a par three could then have come about in three ways: $OG$, $GO$ and $GG$, with probabilities $(1-p-q)p$, $p(1-p-q)$ and $p^2$, respectively.
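The stopping rule described above can be checked by simulation. The following is a minimal Python sketch and is not taken from the cited papers: the function names `play_hole` and `estimate_score_probability`, the sample size, the seed and the values 0.20 and 0.10 for the good-shot and bad-shot probabilities are all illustrative assumptions. It compares the simulated frequency of a birdie on a par three with the sum of the three probabilities listed above.

```python
import random


def play_hole(par, p, q, rng):
    """Simulate one hole under the three-shot model; return the stroke count."""
    total, strokes = 0, 0
    while total < par:
        u = rng.random()
        if u < p:          # good shot, value 2
            total += 2
        elif u < p + q:    # bad shot, value 0
            pass
        else:              # ordinary shot, value 1
            total += 1
        strokes += 1
    return strokes


def estimate_score_probability(par, score, p, q, n_holes=200_000, seed=0):
    """Monte Carlo estimate of P(hole score == score); seed fixed for repeatability."""
    rng = random.Random(seed)
    hits = sum(play_hole(par, p, q, rng) == score for _ in range(n_holes))
    return hits / n_holes


p, q = 0.20, 0.10  # assumed illustrative values
# Birdie on a par three: sequences OG, GO and GG
birdie_exact = (1 - p - q) * p + p * (1 - p - q) + p ** 2
birdie_sim = estimate_score_probability(3, 2, p, q)
print(f"exact {birdie_exact:.4f}, simulated {birdie_sim:.4f}")
```

The two numbers should agree up to sampling noise; a larger `n_holes` tightens the agreement at the cost of run time.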

Definitions

Probability mass function

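A discrete random variable $X$ is said to have a Hardy distribution, with parameters $p$, $q$ and $m$, if it has a probability mass function given by:

P\left(X=n\right)=\sum_{j=\frac{m+1}{2}}^{m}{n-1\choose n-j}q^{n-j}\left(A_{j,m}+B_{j,m}\right) if m is odd

and

P\left(X=n\right)=\sum_{j=\frac{m}{2}}^{m}{n-1\choose n-j}q^{n-j}\left(A_{j,m}+B_{j,m}\right) if m is even

with

A_{j,m}={j\choose m-j}p^{m-j}\left(1-p-q\right)^{2j-m}

and

B_{j,m}={j-1\choose m-j}p^{m-j+1}\left(1-p-q\right)^{2j-m-1}

Here a binomial coefficient ${a\choose b}$ is taken to be zero when $b<0$ or $b>a$, so terms with $j>n$ vanish and $B_{m/2,m}=0$ for even $m$.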

where

$m$ is the par of the hole ($m=1,2,\ldots$)

$n$ is the golf hole score ($n=\frac{m}{2},\frac{m}{2}+1,\frac{m}{2}+2,\ldots$) if $m$ is even

$n$ is the golf hole score ($n=\frac{m+1}{2},\frac{m+1}{2}+1,\frac{m+1}{2}+2,\ldots$) if $m$ is odd

$p$ is the probability of a good shot ($0<p<1$)

$q$ is the probability of a bad shot ($0<q<1$), with $0<p+q<1$
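The probability mass function can be evaluated directly. The Python sketch below is one possible implementation under stated assumptions: the name `hardy_pmf` is hypothetical, and the explicit forms used for the two inner terms (called `a` and `b` in the code) are obtained by counting sequences of good and ordinary shots whose values first reach the par, which reproduces the coefficients of the par three, four and five expressions in this article. The parameter values in the usage lines are illustrative only.

```python
from math import comb


def hardy_pmf(n, m, p, q):
    """P(X = n) for par m, good-shot probability p and bad-shot probability q."""
    r = 1.0 - p - q  # probability of an ordinary shot
    total = 0.0
    # j counts the non-bad shots; the lower limit is m/2 (m even) or (m+1)/2 (m odd)
    for j in range((m + 1) // 2, m + 1):
        if n < j:
            continue  # cannot play j non-bad shots in fewer than j strokes
        g = m - j  # good shots needed to land exactly on the par value
        a = comb(j, g) * p ** g * r ** (2 * j - m)
        # Overshoot by one: the last shot is good and the first j - 1 sum to m - 1
        if 2 * j - m - 1 >= 0:
            b = comb(j - 1, g) * p ** (g + 1) * r ** (2 * j - m - 1)
        else:
            b = 0.0
        # The n - j bad shots can sit anywhere among the first n - 1 strokes
        total += comb(n - 1, n - j) * q ** (n - j) * (a + b)
    return total


# Usage with assumed values p = 0.20 and q = 0.10 on a par three
for score in range(2, 7):
    print(score, round(hardy_pmf(score, 3, 0.20, 0.10), 4))
```

A quick sanity check on any implementation of this kind is that the probabilities summed over all scores come out at 1.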

The moment generating function is given by:

M_m\left(t\right)=\sum_{j=\frac{m+1}{2}}^{m}\frac{\left(X_{j,m}+Y_{j,m}\right)e^{jt}}{\left(1-qe^{t}\right)^{j}} if m is odd

and

M_m\left(t\right)=\sum_{j=\frac{m}{2}}^{m}\frac{\left(X_{j,m}+Y_{j,m}\right)e^{jt}}{\left(1-qe^{t}\right)^{j}} if m is even

with

X_{j,m}={j\choose m-j}p^{m-j}\left(1-p-q\right)^{2j-m}

and

Y_{j,m}={j-1\choose m-j}p^{m-j+1}\left(1-p-q\right)^{2j-m-1}

Each raw moment and each central moment can be easily determined with the moment generating function, but the formulas involved are too large to present here.
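When only numerical values are needed, the moments can also be obtained without the closed-form expressions. The Python sketch below is an illustration rather than the method of the cited papers: it steps through the shot model stroke by stroke to build the score distribution and then sums to get the mean and variance. The names `score_distribution` and `moments`, the truncation at 200 strokes and the parameter values are assumptions made for the example.

```python
def score_distribution(par, p, q, max_strokes=200):
    """Return {n: P(score = n)} by propagating the running total one stroke at a time."""
    r = 1.0 - p - q
    state = {0: 1.0}  # probability of each running total still below par
    dist = {}
    for n in range(1, max_strokes + 1):
        nxt, finished = {}, 0.0
        for total, prob in state.items():
            for value, pr in ((2, p), (0, q), (1, r)):
                if total + value >= par:
                    finished += prob * pr  # hole completed on stroke n
                else:
                    nxt[total + value] = nxt.get(total + value, 0.0) + prob * pr
        dist[n] = finished
        state = nxt
    return dist


def moments(dist):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(n * pr for n, pr in dist.items())
    var = sum((n - mean) ** 2 * pr for n, pr in dist.items())
    return mean, var


dist = score_distribution(4, 0.20, 0.10)  # assumed values: par four, p = 0.20, q = 0.10
mean, var = moments(dist)
print(f"mean {mean:.4f}, variance {var:.4f}")
```

The truncation is harmless here because the probability of needing many strokes decays geometrically in the bad-shot probability.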

Hardy distribution for a par three, four and five

For a par three:

$\begin{aligned}P\left(T_{3}=n\right)&={n-1\choose n-2}q^{n-2}\left(p^{2}+2\,p\,\left(1-p-q\right)\right)+\\&+{n-1\choose n-3}q^{n-3}\left(p\,\left(1-p-q\right)^{2}+\left(1-p-q\right)^{3}\right)\end{aligned}$

For a par four:

$\begin{aligned}P\left(T_{4}=n\right)&={n-1\choose n-2}q^{n-2}p^{2}+\\&+{n-1\choose n-3}q^{n-3}\left(2\,p^{2}\left(1-p-q\right)+3\,p\,\left(1-p-q\right)^{2}\right)+\\&+{n-1\choose n-4}q^{n-4}\left(p\,\left(1-p-q\right)^{3}+\left(1-p-q\right)^{4}\right)\end{aligned}$

Note the resemblance with $P(T_3=n)$.

For a par five:

$\begin{aligned}P\left(T_{5}=n\right)&={n-1\choose n-3}q^{n-3}\left(p^{3}+3\,p^{2}\left(1-p-q\right)\right)+\\&+{n-1\choose n-4}q^{n-4}\left(3\,p^{2}\left(1-p-q\right)^{2}+4\,p\,\left(1-p-q\right)^{3}\right)+\\&+{n-1\choose n-5}q^{n-5}\left(p\,\left(1-p-q\right)^{4}+\left(1-p-q\right)^{5}\right)\end{aligned}$

Note the resemblance with the formulas for $P(T_3=n)$ and $P(T_4=n)$.
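As a worked instance of the par-three expression, take the illustrative values $p=0.20$ and $q=0.10$ (an assumption for this example), so that $1-p-q=0.70$. For a score of $n=3$, the two binomial coefficients are ${2\choose 1}=2$ and ${2\choose 0}=1$:

```latex
\begin{aligned}
P\left(T_{3}=3\right) &= {2\choose 1}\,q\left(p^{2}+2\,p\left(1-p-q\right)\right)
  +{2\choose 0}\left(p\left(1-p-q\right)^{2}+\left(1-p-q\right)^{3}\right)\\
&= 2\,(0.10)\left(0.04+0.28\right)+\left(0.098+0.343\right)\\
&= 0.064+0.441 = 0.505
\end{aligned}
```

The first term collects the rounds with one bad shot followed or preceded by a birdie-type pair of shots; the second collects the rounds with three non-bad shots and no bad shot.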

History

When trying to construct a probability distribution in golf that describes the frequency distribution of the number of strokes on a hole, the simplest setup is to assume that there are only two types of strokes:

A good stroke with a probability of $p$

A bad stroke with a probability of $1-p$

A good shot then gets the value 1 and a bad shot gets the value 0.

Once the sum of the shot values equals the par of the hole, the number of strokes played is the score for the hole. With this setup a birdie is not possible, because the smallest number of strokes one can get is the par of the hole. Hardy (1945) probably realized that too and then came up with the idea of assuming not just two types of strokes, good $(G)$ and bad $(B)$, but three types:

good $(G)$ with probability $p$

bad $(B)$ with probability $q$

ordinary $(O)$ with probability $1-p-q$

In fact, Hardy called a good shot a supershot and a bad shot a subshot.^[1] Minton later called Hardy's supershot an excellent shot $(E)$ and Hardy's subshot a bad shot $(B)$.^[2] In this article, Minton's excellent shot is called a good shot $(G)$. Hardy came up with the idea of three types of shots in 1945, but the actual derivation of the probability distribution of the hole score was not given until 2012 by van der Ven.^[3]

Hardy assumed that the probability of a good stroke was equal to the probability of a bad stroke, namely $p=q$. This was confirmed by Kang:

In retrospect, Hardy might well have been right, as the data in Table 2 in van der Ven (2013) show. This table shows the estimated $p$- and $q$-values for holes 1-18 for rounds 1 and 2 of the 2012 British Open Championship. The mean values were equal to 0.0633 and 0.0697, respectively. Later Cohen (2002) introduced the idea that $p$ and $q$ should be different. Kang says about this:

For the Hardy distribution the values of $p$ and $q$ may be different.

Goodness of fit

The Hardy distribution gives the probability distribution of a single player's hole score. It takes several observations to perform a goodness-of-fit test to check whether the Hardy distribution applies. This can be done with a single individual by having the individual play the same hole multiple times. Goodness-of-fit tests assume pure replications, which means that there should be no change in the player's golfing ability during repeated play of the hole; for example, there should not be an ongoing learning process. Such effects cannot really be ruled out.

One way around this problem is to use multiple players who can be assumed to have approximately the same golf proficiency, such as the participants in professional golf tournaments. Before using a goodness-of-fit test, it should first be checked that the participants indeed have approximately the same golf proficiency. This can be done separately for each hole by using, for example, the Pearson correlation coefficient between the hole score on the first day and the second day of a tournament. If there are no systematic differences between players, the correlation between the score achieved on Day 1 on a hole and the score achieved on Day 2 on that hole will not differ significantly from zero. This can be easily tested statistically.

In a study by van der Ven,^[4] the results of a goodness-of-fit test of the Hardy distribution were reported using the hole-by-hole scores from the 2012 Open Championship played at the Royal Lytham & St Annes Golf Club. The distribution was tested separately for each hole. Pearson's chi-squared test was used to determine whether the observed sample frequencies of the hole scores differed significantly from the expected frequencies according to the Hardy distribution. The fit between observed and expected frequencies was generally very satisfactory.
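The mechanics of such a test can be sketched in Python. This is an illustration only and does not reproduce the cited study: the scores are synthetic, drawn from the shot model itself with a fixed seed, and the field size of 150 players, the parameter values, the pooling of scores of six or more into one cell and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def play_hole(par, p, q, rng):
    """Simulate one hole under the three-shot model; return the stroke count."""
    total = strokes = 0
    while total < par:
        u = rng.random()
        total += 2 if u < p else (0 if u < p + q else 1)
        strokes += 1
    return strokes


def score_probabilities(par, p, q, max_score):
    """P(score = n) for n = 1..max_score, by stepping through the shot model."""
    r = 1.0 - p - q
    state, probs = {0: 1.0}, {}
    for n in range(1, max_score + 1):
        nxt, done = {}, 0.0
        for total, prob in state.items():
            for value, pr in ((2, p), (0, q), (1, r)):
                if total + value >= par:
                    done += prob * pr
                else:
                    nxt[total + value] = nxt.get(total + value, 0.0) + prob * pr
        probs[n], state = done, nxt
    return probs


def pearson_statistic(scores, par, p, q, top):
    """Pearson statistic with cells ceil(par/2)..top-1 plus a pooled cell 'top or more'."""
    low = (par + 1) // 2
    probs = score_probabilities(par, p, q, top - 1)
    cell_probs = [probs[n] for n in range(low, top)]
    cell_probs.append(1.0 - sum(cell_probs))  # pooled upper tail
    counts = Counter(min(s, top) for s in scores)
    observed = [counts.get(n, 0) for n in range(low, top + 1)]
    expected = [len(scores) * pr for pr in cell_probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, observed, expected


rng = random.Random(2012)
scores = [play_hole(4, 0.20, 0.10, rng) for _ in range(150)]  # synthetic par-four scores
stat, observed, expected = pearson_statistic(scores, 4, 0.20, 0.10, top=6)
print(f"chi-squared = {stat:.2f} on {len(observed) - 1} degrees of freedom")
```

With the two shot probabilities fixed in advance, the statistic is referred to a chi-squared distribution with one fewer degree of freedom than the number of cells; if the probabilities are estimated from the same scores, the degrees of freedom are reduced further by the number of estimated parameters.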

Notes and References

[1] Hardy, G. H. (1945). A mathematical theorem about golf. The Mathematical Gazette, 29: 226–227. doi:10.2307/3609265. JSTOR 3609265.
[2] Minton, R. B. (2010). G. H. Hardy's Golfing Adventure. In Gallian, Joseph A. (ed.), Mathematics and Sports. Mathematical Association of America. ISBN 9780883853498. doi:10.5948/UPO9781614442004.
[3] van der Ven, A. H. G. S. (2012). The Hardy distribution for golf hole-by-hole scores. The Mathematical Gazette, 96: 428–438. doi:10.1017/S0025557200005052. S2CID 233357735.
[4] van der Ven, A. H. G. S. (2013). Applying the Hardy Distribution to the Hole Scores of the 2012 British Open Championship. International Journal of Golf Science, 2 (2): 152–161. doi:10.1123/ijgs.2013-0014.