Statistical inference might be thought of as gambling theory applied to the world around us. The myriad applications for logarithmic information measures tell us precisely how to take the best guess in the face of partial information.[1] In that sense, information theory might be considered a formal expression of the theory of gambling. It is no surprise, therefore, that information theory has applications to games of chance.[2]
See main article: Kelly criterion. Kelly betting or proportional betting is an application of information theory to investing and gambling. Its discoverer was John Larry Kelly, Jr.
Part of Kelly's insight was to have the gambler maximize the expectation of the logarithm of his capital, rather than the expected profit from each bet. This is important, since in the latter case, one would be led to gamble all he had when presented with a favorable bet, and if he lost, would have no capital with which to place subsequent bets. Kelly realized that it was the logarithm of the gambler's capital which is additive in sequential bets, and "to which the law of large numbers applies."
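The difference between the two objectives can be checked numerically. The sketch below is a minimal illustration, assuming a repeated even-money bet that is won with probability 0.6; the function names and the 1% search grid are illustrative choices rather than anything taken from Kelly's paper.

```python
import math

def expected_log_growth(fraction, p_win):
    """Expected log2 growth per even-money bet when staking `fraction` of capital."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction == 1.0:
        # A single loss wipes out the capital, so the expected logarithm
        # is minus infinity whenever a loss is possible.
        return -math.inf if p_win < 1.0 else 1.0
    return (p_win * math.log2(1.0 + fraction)
            + (1.0 - p_win) * math.log2(1.0 - fraction))

def expected_profit(fraction, p_win):
    """Expected profit per unit of capital for the same even-money bet."""
    return fraction * (2.0 * p_win - 1.0)

p = 0.6  # assumed probability of winning each bet
grid = [i / 100 for i in range(101)]  # candidate stakes: 0%, 1%, ..., 100%

best_for_profit = max(grid, key=lambda f: expected_profit(f, p))
best_for_log = max(grid, key=lambda f: expected_log_growth(f, p))
print(best_for_profit, best_for_log)
```

Under these assumptions the expected profit is largest when the whole capital is staked, whereas the expected logarithm of capital is largest at a stake of 20% and falls to minus infinity when everything is staked.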
A bit is the amount of entropy in a bettable event with two possible outcomes and even odds. Obviously we could double our money if we knew beforehand for certain what the outcome of that event would be. Kelly's insight was that no matter how complicated the betting scenario is, we can use an optimum betting strategy, called the Kelly criterion, to make our money grow exponentially with whatever side information we are able to obtain. The value of this "illicit" side information is measured as mutual information relative to the outcome of the bettable event:
\begin{align}I(X;Y)&=\mathbb{E}_Y\{D_{\mathrm{KL}}(P(X|Y)\|P(X|I))\}\\ &=\mathbb{E}_Y\{D_{\mathrm{KL}}(P(X|\mathrm{side\ information}\ Y)\|P(X|\mathrm{stated\ odds}\ I))\}.\end{align}
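As a rough illustration of this quantity, the sketch below computes the mutual information as the average Kullback-Leibler divergence between the posterior P(X|Y) and the prior over outcomes. It assumes that the prior implied by the stated odds equals the marginal distribution of X; the joint distribution (a two-horse race with a tipster who is right 75% of the time) and all function names are hypothetical.

```python
import math

def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information_bits(joint):
    """Mutual information I(X;Y) in bits, where joint[y][x] = P(X=x, Y=y)."""
    p_y = [sum(row) for row in joint]
    n_x = len(joint[0])
    p_x = [sum(row[x] for row in joint) for x in range(n_x)]
    total = 0.0
    for row, py in zip(joint, p_y):
        if py == 0:
            continue  # a tip that never occurs contributes nothing
        posterior = [value / py for value in row]  # P(X | Y=y)
        total += py * kl_divergence_bits(posterior, p_x)
    return total

# Hypothetical joint distribution: rows are tips, columns are winners.
tipster = [[0.375, 0.125],
           [0.125, 0.375]]
print(mutual_information_bits(tipster))
```

With these made-up numbers the tips are worth roughly 0.19 bits per race, while a tipster whose tips are independent of the outcome would be worth nothing.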
The nature of side information is extremely finicky. We have already seen that it can affect the actual event as well as our knowledge of the outcome. Suppose we have an informer, who tells us that a certain horse is going to win. We certainly do not want to bet all our money on that horse just upon a rumor: that informer may be betting on another horse, and may be spreading rumors just so he can get better odds himself. Instead, as we have indicated, we need to evaluate our side information in the long term to see how it correlates with the outcomes of the races. This way we can determine exactly how reliable our informer is, and place our bets precisely to maximize the expected logarithm of our capital according to the Kelly criterion. Even if our informer is lying to us, we can still profit from his lies if we can find some reverse correlation between his tips and the actual race results.
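The doubling rate in gambling on a horse race is[3]
\begin{align}W(b,p)=\mathbb{E}[\log_2 S(X)]=\sum_{i=1}^{m}p_i\log_2 b_i o_i\end{align}
where there are $m$ horses, the probability of the $i$th horse winning is $p_i$, the proportion of wealth bet on that horse is $b_i$, and the odds (payoff) are $o_i$ (for example, $o_i=2$ if a win by the $i$th horse pays double the amount bet). This quantity is maximized by proportional (Kelly) gambling,
\begin{align}b=p\end{align}
for which
\begin{align}\max_b W(b,p)=\sum_i p_i\log_2 o_i-H(p)\end{align}
where $H(p)$ is the information entropy.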
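The doubling rate, written out as the sum over horses of p_i log2(b_i o_i), can be evaluated directly. The following sketch assumes a hypothetical three-horse race in which every horse pays 3-for-1; the probabilities, odds and function names are made up for illustration. It compares proportional betting (b = p) with two other allocations and with the closed-form maximum, the sum of p_i log2(o_i) minus the entropy H(p).

```python
import math

def doubling_rate(b, p, o):
    """W(b, p): sum over horses of p_i * log2(b_i * o_i)."""
    total = 0.0
    for pi, bi, oi in zip(p, b, o):
        if pi == 0:
            continue  # a horse that cannot win contributes nothing
        if bi * oi == 0:
            return -math.inf  # nothing staked on a horse that can win
        total += pi * math.log2(bi * oi)
    return total

def entropy_bits(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]  # assumed win probabilities
o = [3.0, 3.0, 3.0]    # assumed payoffs (3-for-1 on every horse)

kelly = doubling_rate(p, p, o)                   # proportional betting, b = p
uniform = doubling_rate([1 / 3] * 3, p, o)       # equal stakes on every horse
overbet = doubling_rate([0.6, 0.2, 0.2], p, o)   # too much on the favourite
bound = sum(pi * math.log2(oi) for pi, oi in zip(p, o)) - entropy_bits(p)
print(kelly, uniform, overbet, bound)
```

In this example the proportional allocation attains the closed-form maximum, and both alternative allocations grow more slowly.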
An important but simple relation exists between the amount of side information a gambler obtains and the expected exponential growth of his capital (Kelly):
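\begin{align}\mathbb{E}\log K_t=\log K_0+\sum_{i=1}^{t}H_i\end{align}
for an optimal betting strategy, where $K_0$ is the initial capital, $K_t$ is the capital after the $t$th bet, and $H_i$ is the amount of side information obtained concerning the $i$th bet (in particular, the mutual information relative to the outcome of each bettable event).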
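A simple special case makes the relation between side information and expected log-capital concrete. The sketch below assumes a sequence of fair coin flips paying 2-for-1, a tip that is correct with a fixed probability, and logarithms taken to base 2 so that the side information is measured in bits; these assumptions and the function names are illustrative only. A Kelly bettor splits capital between the tipped and untipped outcomes in proportion to their posterior probabilities, and the expected gain per bet then equals the mutual information 1 - H(accuracy).

```python
import math

def binary_entropy_bits(a):
    """Entropy in bits of a two-outcome distribution (a, 1 - a)."""
    return -sum(x * math.log2(x) for x in (a, 1.0 - a) if x > 0)

def expected_log2_capital(k0, accuracy, t):
    """Expected log2 capital after t Kelly bets on tipped fair coin flips.

    Each round the bettor stakes `accuracy` of the capital on the tipped
    outcome and the remainder on the other outcome; both pay 2-for-1.
    """
    per_bet = sum(x * math.log2(2.0 * x)
                  for x in (accuracy, 1.0 - accuracy) if x > 0)
    return math.log2(k0) + t * per_bet

k0, accuracy, t = 100.0, 0.75, 10  # assumed starting capital, tip accuracy, bets
side_information = 1.0 - binary_entropy_bits(accuracy)  # bits per bet
print(expected_log2_capital(k0, accuracy, t),
      math.log2(k0) + t * side_information)
```

The two printed values agree, which is the relation above with every bet carrying the same amount of side information.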
This equation was the first application of Shannon's theory of information outside its prevailing paradigm of data communications (Pierce).
The logarithmic probability measure self-information or surprisal,[4] whose average is information entropy/uncertainty and whose average difference is KL-divergence, has applications to odds-analysis all by itself. Its two primary strengths are that surprisals: (i) reduce minuscule probabilities to numbers of manageable size, and (ii) add whenever probabilities multiply.
For example, one might say that "the number of states equals two to the number of bits", i.e. $\#\text{states} = 2^{\#\text{bits}}$. Here the quantity that is measured in bits is the logarithmic information measure mentioned above. Hence there are N bits of surprisal in landing all heads on one's first toss of N coins.
The additive nature of surprisals, and one's ability to get a feel for their meaning with a handful of coins, can help one put improbable events (like winning the lottery, or having an accident) into context. For example if one out of 17 million tickets is a winner, then the surprisal of winning from a single random selection is about 24 bits. Tossing 24 coins a few times might give you a feel for the surprisal of getting all heads on the first try.
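The lottery figure can be reproduced in a few lines. This is a minimal sketch; the function name is an illustrative choice.

```python
import math

def surprisal_bits(probability):
    """Surprisal (self-information) in bits of an event with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(probability)

lottery = surprisal_bits(1 / 17_000_000)  # one winning ticket in 17 million
all_heads = surprisal_bits(0.5 ** 24)     # 24 fair coins all landing heads
# Probabilities of independent events multiply, so their surprisals add.
both = surprisal_bits((1 / 17_000_000) * 0.5 ** 24)
print(round(lottery, 2), all_heads, round(both, 2))
```

The lottery win comes out at about 24 bits, the same as 24 heads in a row, and the surprisal of both happening together is the sum of the two.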
The additive nature of this measure also comes in handy when weighing alternatives. For example, imagine that the surprisal of harm from a vaccination is 20 bits. If the surprisal of catching a disease without it is 16 bits, but the surprisal of harm from the disease if you catch it is 2 bits, then the surprisal of harm from NOT getting the vaccination is only 16+2=18 bits. Whether or not you decide to get the vaccination (other factors, such as the monetary cost of paying for it, are not included in this discussion), you can in that way at least take responsibility for a decision informed by the fact that not getting the vaccination involves two bits of additional risk.
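Expressed as probabilities, the figures in this example work out as follows (a worked restatement of the numbers above, not additional data):
\begin{align}P(\text{harm}\mid\text{vaccinated})&=2^{-20}\approx 9.5\times10^{-7},\\ P(\text{harm}\mid\text{not vaccinated})&=2^{-16}\cdot 2^{-2}=2^{-18}\approx 3.8\times10^{-6}.\end{align}
The ratio of the two probabilities is $2^{20-18}=4$, so a difference of two bits corresponds to a fourfold difference in risk.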
More generally, one can relate probability $p$ to bits of surprisal $s_\text{bits}$ as $p = 1/2^{s_\text{bits}}$. As suggested above, this is mainly useful with small probabilities. However, Jaynes pointed out that with true-false assertions one can also define bits of evidence $e_\text{bits}$ as the surprisal against minus the surprisal for. This evidence in bits relates simply to the odds ratio, $p/(1-p) = 2^{e_\text{bits}}$, and has advantages similar to those of self-information itself.
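The conversion between probability, odds and bits of evidence can be sketched as follows. The function names are illustrative; the definitions follow the sentence above (surprisal against minus surprisal for).

```python
import math

def evidence_bits(p):
    """Bits of evidence for an assertion that is true with probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    surprisal_for = -math.log2(p)
    surprisal_against = -math.log2(1.0 - p)
    return surprisal_against - surprisal_for  # equals log2(p / (1 - p))

def probability_from_evidence(e_bits):
    """Invert the relation: odds = 2 ** e_bits, p = odds / (1 + odds)."""
    odds = 2.0 ** e_bits
    return odds / (1.0 + odds)

print(evidence_bits(0.5), evidence_bits(0.8), probability_from_evidence(2.0))
```

Even odds correspond to zero bits of evidence, and each additional bit doubles the odds in favour of the assertion.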
Information theory can be thought of as a way of quantifying information so as to make the best decision in the face of imperfect information, that is, the best decision using only the information available. The point of betting is to rationally assess all relevant variables of an uncertain game, race or match, then compare them to the bookmaker's assessments, which usually come in the form of odds or spreads, and place the proper bet if the assessments differ sufficiently.[5] The area of gambling where this has the most use is sports betting. Sports handicapping lends itself to information theory extremely well because of the availability of statistics. For many years noted economists have tested different mathematical theories using sports as their laboratory, with vastly differing results.
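One way to make the comparison with the bookmaker concrete is sketched below. It assumes decimal odds (the total returned per unit staked, including the stake), ignores the bookmaker's margin, and sizes the bet with the Kelly fraction for a single two-outcome bet; the function names and example numbers are hypothetical.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must exceed 1")
    return 1.0 / decimal_odds

def kelly_fraction(own_probability, decimal_odds):
    """Kelly stake as a fraction of capital; zero when there is no edge."""
    edge = own_probability * decimal_odds - 1.0  # expected profit per unit staked
    return max(0.0, edge / (decimal_odds - 1.0))

odds = 2.0  # bookmaker pays 2-for-1, implying a 50% chance
print(implied_probability(odds))
print(kelly_fraction(0.55, odds))  # own estimate above the implied probability
print(kelly_fraction(0.45, odds))  # own estimate below it: no bet
```

In this sketch a bet is placed only when the bettor's own probability exceeds the one implied by the odds, and the stake grows with the size of that disagreement.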
One theory regarding sports betting is that it is a random walk. A random walk is a scenario in which new information, prices and returns fluctuate by chance; the idea is part of the efficient-market hypothesis. The underlying belief of the efficient-market hypothesis is that the market will always adjust for any new information. Therefore no one can beat the market, because everyone is trading on the same information to which the market has already adjusted. However, according to Fama,[6] three conditions must be met for a market to be efficient:
Statisticians have shown that it is the third condition that allows information theory to be useful in sports handicapping. When not everyone agrees on how information will affect the outcome of an event, opinions differ.