The Will Rogers phenomenon, also rarely called the Okie paradox,[1] is when moving an observation from one group to another increases the average of both groups. It is named after a joke by the comedian Will Rogers in the 1930s about migration during the Great Depression:[2]
All Rogers was attempting to say was that, in his view, a group of Okies whose mean intelligence is lower than the average of all Okies may be smarter than the average Californian.
The apparent paradox comes from the rise in intelligence of both groups, which makes it seem as though intelligence has been "created." However, the overall population maintains the same average intelligence: moving a person from a low-intelligence group to a high-intelligence group makes the high-intelligence group larger, so the population mean (which is a weighted average of the two groups' intelligence) is unaffected.
Consider the following sets R and S, whose arithmetic mean are 2.5 and 7, respectively.
\left\{\begin{array}{lll} R&=\{1,2,3,4\}&(mean=2.5)\\ S&=\{{\bf5},6,7,8,9\}&(mean=7) \end{array}\right\} → \left\{\begin{array}{lll} R&=\{1,2,3,4,{\bf5}\}&(mean=3)\\ S&=\{6,7,8,9\}&(mean=7.5) \end{array}\right\}
If 5 is moved from S to R, then the arithmetic mean of R increases to 3, and the arithmetic mean of S increases to 7.5, even though the total set of numbers themselves, and therefore their overall average, have not changed.
Consider this more dramatic example, where the arithmetic means of sets R and S, are 1.5 and, respectively:
\left\{\begin{array}{lll} R&=\{1,2\}&(mean=1.5)\\ S&=\{{\bf99},10000,20000\}&(mean=10033) \end{array}\right\} → \left\{\begin{array}{lll} R&=\{1,2,{\bf99}\}&(mean=34)\\ S&=\{10000,20000\}&(mean=15000) \end{array}\right\}
If 99 is moved from S to R, then the arithmetic means increase to 34 and . The number 99 is orders of magnitude above 1 and 2, and orders of magnitude below and .
The element which is moved does not have to be the very lowest or highest of its set; it merely has to have a value that lies between the means of the two sets. And the sets themselves can have overlapping ranges. Consider this example:
\left\{\begin{array}{lll} R&=\{1,3,5,7,9,11,13\}&(mean=7)\\ S&=\{6,8,{\bf10},12,14,16,18\}&(mean=12) \end{array}\right\} → \left\{\begin{array}{lll} R&=\{1,3,5,7,9,{\bf10},11,13\}&(mean=7.375)\\ S&=\{6,8,12,14,16,18\}&(mean=12.333) \end{array}\right\}
If 10, which is larger than Rs mean of 7 and smaller than Ss mean of 12, is moved from S to R, the arithmetic means still increase slightly, to 7.375 and 12.333.
The phenomenon is a potential risk when comparing averages of subpopulations, because the outcome can change depending on how the population is divided rather than on changes of the individuals of the population. By extension, the average of the subpopulation averages would change even though there is no actual change within the population.[1] This effect can result in non-intuitive or objectionable results in Pareto and related utilitarian analysis.[1]
One real-world example of the phenomenon is seen in the medical concept of cancer stage migration,[3] which led clinician Alvan Feinstein to coin the term Will Rogers phenomenon in 1985, based on a remark by a friend who attributed the quote to Rogers.[4]
In medical stage migration, improved detection of illness leads to the movement of people from the set of healthy people to the set of unhealthy people. Because these people are actually not healthy — merely misclassified as healthy due to an imperfect earlier diagnosis — removing them from the set of healthy people increases the average lifespan of the healthy group. Likewise, the migrated people are healthier than the people already in the unhealthy set: their illness was so minor that only the newer more sensitive test could detect it. Adding them to the unhealthy raises the average lifespan of that group as well. Both lifespans are statistically lengthened, even if early detection of a cancer does not lead to better treatment: because it is detected earlier, more time is lived in the "unhealthy" set of people. In this form, the paradox can be viewed an instance of the equivocation fallacy. Equivocation occurs when one term is used with multiple meanings in order to mislead the listener into unwarranted comparisons, and life span statistics before and after a stage migration use different meanings of "unhealthy", as the cutoff for detection is different.