Do Bettors Create their own "Hot Hand" Winning Streaks?
Posted 18th January 2016
There are two fallacies commonly exhibited by gamblers. The first, the gambler's fallacy, also known as the Monte Carlo fallacy or the fallacy of the maturity of chances, is the mistaken belief that if something happens more frequently than normal during some period, it will happen less frequently in the future, and vice versa. It arises from confusion between the (meaningless) law of averages (things must even out) and the (correct) law of large numbers (things have a tendency to even out but don't have to). The second, the hot hand fallacy, sometimes called the reverse gambler's fallacy, is the mistaken belief that a person who has experienced success with a random event has a greater chance of further success in additional attempts. In the context of sports betting, bettors suffering from the hot hand fallacy might unreasonably expect winning streaks to continue, whilst those suffering from the gambler's fallacy might unreasonably expect losing streaks to reverse.
Analysing a history of 565,915 bets placed by 776 gamblers during the 2010 calendar year, Juemin Xu and Nigel Harvey from the Department of Cognitive, Perceptual and Brain Sciences at University College London set out to investigate the incidence of these fallacies among online gamblers betting on a range of sports, predominantly football and horse racing. They published their findings in the journal Cognition (2014, volume 131, pages 173 to 180). By looking at winning and losing streaks, Xu and Harvey concluded that gamblers who won were more likely to win again (apparently because they then chose safer odds than before), whereas those who lost were more likely to lose again (apparently because they then chose riskier odds than before). Since choosing safer odds after winning and riskier ones after losing would indicate that online sports gamblers expected their luck to reverse, Xu and Harvey concluded that these gamblers suffered from the gambler's fallacy: by believing in it, they created their own hot hands. This article seeks to debunk those conclusions, most specifically the claim that gamblers choose to bet shorter odds after winning and longer odds after losing. Everything can be explained by biases in the data. Moreover, the assumption that the gamblers' bets were truly sequential, with each bet placed only after the previous one had been settled, may not even be valid.
Xu and Harvey analysed British, European and American gamblers separately. British gamblers contributed the biggest share of bets (66%), and 48% of all the bets they placed won. Next, Xu and Harvey looked at the winning percentage of all bets placed after one win: it was 49%. They repeated this for bets placed after two, three, four, five and six consecutive wins; the respective winning percentages for the next bet were 57%, 67%, 72%, 75% and 76%. The European and American sub-samples showed similar findings.
If successive betting outcomes are independent of each other, how could these increases in winning percentage after consecutive wins occur? One explanation would be that gamblers intentionally shorten their odds after wins: the more consecutive wins, the shorter the odds become. Among all British gamblers, the average odds bet was 8.72. [Readers should note that with an average winning percentage of 48%, some very long odds must have skewed this average. Indeed, the authors report a standard deviation in betting odds of 37.73. Given this skew, it would arguably have been far more sensible to report implied expectations (1/odds).] After a winning bet, the average odds for the next bet were 7.19 (standard deviation 35.02). Following two consecutive winning bets, the mean odds decreased to 4.60 (standard deviation 24.69). By the time gamblers had won 6 consecutive bets, the average odds for the next wager were 1.85 (standard deviation 9.82). Xu and Harvey concluded that people who had won on more consecutive occasions chose to bet less risky odds.
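A quick way to see the skew described in the bracketed note is to compare the mean of a set of odds with the mean of their implied probabilities. The Python sketch below is illustrative only: it assumes fair odds (implied probability = 1/odds) and uses, as a hypothetical example rather than Xu and Harvey's data, 99 prices at implied probabilities of 1% to 99% in 1% steps, the same set as the model described later in this article.

```python
from statistics import mean, pstdev

def summarise_odds(odds):
    """Return (mean odds, population SD of odds, mean implied probability)."""
    implied = [1.0 / o for o in odds]  # assumes fair odds, i.e. no bookmaker margin
    return mean(odds), pstdev(odds), mean(implied)

# Hypothetical illustrative set: fair odds for win probabilities 1%, 2%, ..., 99%
odds = [100.0 / k for k in range(1, 100)]
avg_odds, sd_odds, avg_implied = summarise_odds(odds)
print(f"mean odds {avg_odds:.2f}, SD {sd_odds:.2f}, mean implied probability {avg_implied:.0%}")
```

With these inputs the mean odds come out at about 5.23 with a standard deviation of about 11.74, against a mean implied probability of 50%. The handful of very long prices inflates the mean odds, whereas implied probabilities are bounded between 0 and 1 and so are far less sensitive to them.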
Readers paying close attention to detail might already have spotted a flaw. Demaree, Weaver and Juergensen clearly did, and offered a critique of Xu and Harvey's conclusions in the same journal that the original authors had used (A fallacious "Gambler's Fallacy"? Commentary on Xu and Harvey. Cognition, 2014, volume 139, pages 168 to 170). Specifically, they argued that Xu and Harvey's statistical methods were prone to a serious selection bias: participants on winning or losing streaks may already have been choosing safer or riskier wagers respectively before their streaks began, in which case no deliberate shortening of odds after winning streaks (or lengthening after losing streaks) need have taken place. In other words, longer winning streaks survived longer simply because the gamblers experiencing them were betting shorter odds overall anyway. Hence, as a consequence of this survivorship bias, the next bet in a longer winning sequence would tend to be at shorter odds.
We can see how this survivorship bias might arise by constructing our own hypothetical data set of gamblers and their wagers. For this task I have eliminated the bookmaker's margin, so that all odds are fair. Consider a set of 99 gamblers. The 1st places 1,000 wagers, all at an implied win expectation of 99%, i.e. odds of 1.0101. The 2nd places 1,000 wagers at an implied win expectation of 98%, i.e. odds of 1.0204. Each subsequent gambler bets at a win expectation 1% lower than the previous one, down to the 99th, who bets at a 1% win expectation (odds of 100). In total there are 99,000 bets with an average win expectation of 50% and average odds of 5.23 (standard deviation 11.74). I have used this model simply to replicate, as best I can (without too much effort), the skewed odds data set used by Xu and Harvey. A random number generator determines the outcome of each bet. The table below reports the average odds and win percentage of the next bet after winning streaks of 1 through 6.
| Consecutive wins | Average odds | Win percentage |
|---|---|---|
| 1 | 3.00 | 50% |
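The pattern can also be derived directly, which shows why it requires no change of behaviour at all. This is a sketch under stated assumptions, not the calculation behind the table: gambler i places n_i independent bets, all at the same win probability p_i, and a bet counts as following a winning streak of exactly k if the k bets before it won and the bet before those lost (a definition that appears consistent with the first row of the table, though that is an inference). Ignoring edge effects at the start of each betting history, gambler i contributes about n_i(1 − p_i)p_i^k such bets, each winning with probability p_i, so the pooled win rate W_k and average odds O_k of the next bet are weighted averages:

```latex
% Pooled next-bet win rate and average odds after a streak of exactly k wins
W_k \approx \frac{\sum_i n_i (1-p_i)\, p_i^{k+1}}{\sum_i n_i (1-p_i)\, p_i^{k}},
\qquad
O_k \approx \frac{\sum_i n_i (1-p_i)\, p_i^{k-1}}{\sum_i n_i (1-p_i)\, p_i^{k}}

% Continuous approximation to the 99-gambler model: equal n_i, p uniform on (0, 1)
\int_0^1 (1-p)\, p^{m}\, dp = \frac{1}{(m+1)(m+2)}
\;\Longrightarrow\;
W_k = \frac{k+1}{k+3}, \qquad O_k = \frac{k+2}{k}
```

Writing μ_m for Σ_i n_i(1 − p_i)p_i^m, the Cauchy–Schwarz inequality gives μ_{k+1}² ≤ μ_k μ_{k+2}, so W_k = μ_{k+1}/μ_k can never fall as k grows and O_k = μ_{k−1}/μ_k can never rise, even though no gambler ever changes his odds. The uniform approximation gives 50% and 3.00 for k = 1, about 78% and 1.33 for k = 6, and about 91% for k = 20, in line with the simulated figures.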
Whilst the average odds are considerably shorter than Xu and Harvey's (probably because my model replicates their data only imprecisely), the same odds shortening that they report after increasing numbers of consecutive wins is clearly evident. Indeed, by the time we get to winning streaks of 20 bets, the 21st bet has a win probability of 92%. Moreover, the increase in winning percentages looks very much like the values reported by Xu and Harvey. Remember, however, that none of this 'hot hand' streak can have anything to do with choice. In my data, each gambler bet the same odds for every one of his 1,000 wagers. The odds shortening (and win percentage increase) can therefore only have arisen because of the survivorship bias discussed above.
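The hypothetical model can also be reproduced in a few lines of Python. This is a sketch, not the code behind the table: the function name `simulate_streak_effect`, the seed, and the choice to count a streak of k as exactly k wins since the gambler's most recent loss (or first bet) are illustrative assumptions, so its figures will differ somewhat from those reported.

```python
import random

def simulate_streak_effect(max_k=6, bets_per_gambler=1000, seed=2016):
    """Hypothetical 99-gambler model: gambler j bets every wager at fair odds
    with win probability j/100 and never changes those odds.

    Returns {k: (average odds, win rate)} for bets that immediately follow a
    winning streak of exactly k wins, counted back to the gambler's most
    recent loss (or to the first bet)."""
    rng = random.Random(seed)
    odds_sum = {k: 0.0 for k in range(1, max_k + 1)}
    wins = {k: 0 for k in range(1, max_k + 1)}
    count = {k: 0 for k in range(1, max_k + 1)}
    for j in range(1, 100):
        p = j / 100
        odds = 1 / p              # fair odds, no bookmaker margin
        run = 0                   # consecutive wins immediately before this bet
        for _ in range(bets_per_gambler):
            won = rng.random() < p
            if 1 <= run <= max_k:     # this bet follows a streak of exactly `run` wins
                odds_sum[run] += odds
                wins[run] += int(won)
                count[run] += 1
            run = run + 1 if won else 0
    return {k: (odds_sum[k] / count[k], wins[k] / count[k])
            for k in range(1, max_k + 1) if count[k]}

for k, (avg_odds, win_rate) in simulate_streak_effect().items():
    print(f"after {k} win(s): next-bet average odds {avg_odds:.2f}, win rate {win_rate:.0%}")
```

Each gambler's odds are fixed inside the loop, so any shortening of the next-bet odds as k rises in the output can only come from which gamblers survive into longer streaks.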
In a follow-up paper responding to the criticism, Xu and Harvey looked at the betting behaviour of gamblers with winning streaks as a whole. If survivorship bias is the explanation for the original 'hot hand' finding, gamblers with longer winning streaks should be relatively safer gamblers overall, betting shorter odds on average than gamblers who do not experience such long winning streaks, both during winning streaks and during betting activity outside those streaks. Xu and Harvey argued that if the bets placed by those with longer winning streaks do not have lower mean odds than the sample as a whole, then survivorship bias can be eliminated as an explanation of the results originally reported. If only that were so.
In performing this secondary analysis, Xu and Harvey reported results which showed no sign that gamblers who experienced longer winning streaks generally placed safer bets, or that gamblers who suffered longer losing streaks generally placed riskier bets. Unfortunately, their analysis failed to consider the influence of differing betting-history lengths. Gamblers who have placed far more wagers can expect to have longer winning sequences simply by virtue of being in the game longer, even if the odds they bet are the same as those of gamblers who have placed far fewer wagers. A little simulation (using a random number generator) of wagers placed at odds of 2.00 shows this nicely, as reported in the table below.
| Player | Number of bets placed | Maximum length of winning streak |
|---|---|---|
| 1 | 8 | 3 |
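The coin-flip simulation described above can be sketched in Python. The bet counts and seed are arbitrary and hypothetical, not those behind the table; the sketch only illustrates that, at identical odds of 2.00, the longest winning streak tends to grow (roughly with the logarithm of the number of bets) as a player places more bets.

```python
import random

def longest_winning_streak(outcomes):
    """Length of the longest run of consecutive wins (True values)."""
    best = run = 0
    for won in outcomes:
        run = run + 1 if won else 0
        best = max(best, run)
    return best

def simulate_players(bet_counts, p=0.5, seed=1):
    """For each hypothetical player, place n bets at fair odds 1/p
    (2.00 when p = 0.5) and report (bets placed, longest winning streak)."""
    rng = random.Random(seed)
    results = []
    for n in bet_counts:
        outcomes = [rng.random() < p for _ in range(n)]
        results.append((n, longest_winning_streak(outcomes)))
    return results

for player, (n, streak) in enumerate(simulate_players([8, 50, 200, 1000, 5000, 20000]), start=1):
    print(f"Player {player}: {n} bets placed, longest winning streak {streak}")
```

Individual runs are noisy, so a player with fewer bets can occasionally show a longer streak than one with more, but across a wide range of bet counts the tendency is clear.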
Indeed, an analysis of about a million tips (using betting odds from Pinnacle Sports) from the online tipping community Pyckio.com reveals similar findings. Despite little variation in the betting odds, the maximum length of winning streak increases with the total number of bets a tipster has advised.
| Maximum winning streak | Average number of bets | Average odds |
|---|---|---|
| 1 | 4 | 2.15 |
Gamblers vary significantly in the way they place wagers, not least in how many they place. As with many other natural phenomena, we typically find that about 80% of the wagers are placed by only 20% of the gamblers. This is the so-called Pareto Principle. The rule is recursive, such that 20% of the 20% place 80% of the 80% (that is, 4% of gamblers place 64% of wagers), and so on. Hence the number of wagers placed by different gamblers will be highly skewed. By not reporting the influence this variable wagering may have had, Xu and Harvey have failed to rule out the original survivorship bias (in addition to this secondary bias) as a possible cause of their 'hot hand' finding, and their conclusion that gamblers intentionally choose to bet shorter odds after wins cannot be sustained.
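To connect skewed betting volumes with streak length, the following Python sketch gives every gambler identical odds of 2.00 but Pareto-distributed numbers of bets, then groups gamblers by their longest winning streak. All parameters (population size, shape `alpha`, minimum bets, cap, seed) are hypothetical; the shape of about 1.16 is the value usually associated with an 80/20 split. Under these assumptions the gamblers with the longest streaks are simply those who bet most often, which is the uncontrolled variable the text describes.

```python
import random
from statistics import mean

def longest_winning_streak(outcomes):
    """Length of the longest run of consecutive wins (True values)."""
    best = run = 0
    for won in outcomes:
        run = run + 1 if won else 0
        best = max(best, run)
    return best

def pareto_population(n_gamblers=2000, alpha=1.16, min_bets=10, cap=100_000, seed=7):
    """Hypothetical population: every gambler bets at fair odds of 2.00 (p = 0.5),
    but the number of bets each one places is Pareto-distributed.
    `cap` only bounds the run time."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_gamblers):
        n_bets = min(int(min_bets * rng.paretovariate(alpha)), cap)
        outcomes = [rng.random() < 0.5 for _ in range(n_bets)]
        population.append((n_bets, longest_winning_streak(outcomes)))
    return population

def mean_bets_by_streak(population):
    """Average number of bets placed, grouped by each gambler's longest winning streak."""
    groups = {}
    for n_bets, streak in population:
        groups.setdefault(streak, []).append(n_bets)
    return {streak: mean(bets) for streak, bets in sorted(groups.items())}

for streak, avg_bets in mean_bets_by_streak(pareto_population()).items():
    print(f"longest streak {streak:2d}: average bets placed {avg_bets:8.1f}")
```

Because the odds are identical for everyone, comparing mean odds across streak lengths would show nothing here, even though longest streak and betting volume are strongly linked.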
Corresponding with Xu by e-mail, I put it to him that the best way to see whether gamblers choose to shorten odds after wins is to look at what they are doing during their winning sequences. According to Xu, this is not the right way to analyse the data. If that is the case, I'm at a loss to know what is more appropriate. If we see actual odds shorten during winning sequences, this would imply real choice on the part of the gambler, and real causation of the effect. Analysing the Pyckio data again, I looked at the behaviour of gamblers who had managed 6 consecutive wins (13,951 of them). The average odds for the 1st, 2nd, 3rd, 4th, 5th and 6th bets in these winning sequences were as follows.
| Bet number in winning sequence | Average odds |
|---|---|
| 1 | 1.64 |
The results speak for themselves. Furthermore, if we assign a score of +1 for every shortening of odds from one winning bet to the next, and -1 for every lengthening, the next table shows the average score for each pair of consecutive bets.
| Odds pair | Average movement score |
|---|---|
| Bet 1 to bet 2 | -0.04 |
| Bet 2 to bet 3 | 0.00 |
| Bet 3 to bet 4 | -0.03 |
| Bet 4 to bet 5 | -0.02 |
| Bet 5 to bet 6 | -0.04 |
Overall, the average was -0.03, about as close to no odds movement between bets as could be expected given random variation.
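The movement score can be computed as in the following Python sketch. Scoring unchanged odds as 0 is an assumption (the text defines only the +1 and -1 cases), and the two six-win sequences are made-up illustrative odds, not Pyckio data.

```python
from statistics import mean

def movement_scores(odds_sequence):
    """Score each consecutive pair of bets: +1 if the odds shortened,
    -1 if they lengthened, 0 if unchanged (the 0 case is an assumption)."""
    # (a > b) - (a < b) evaluates to 1, -1 or 0 as integers
    return [(a > b) - (a < b) for a, b in zip(odds_sequence, odds_sequence[1:])]

def average_scores(sequences):
    """Average score for each pair position across winning sequences of equal
    length (e.g. the odds of bets 1 to 6 in each six-win run)."""
    per_sequence = [movement_scores(seq) for seq in sequences]
    return [mean(pair) for pair in zip(*per_sequence)]

# Two hypothetical six-win sequences (illustrative odds only)
example = [[1.80, 1.65, 1.65, 1.90, 1.70, 1.75],
           [1.50, 1.60, 1.55, 1.55, 1.40, 1.45]]
print(average_scores(example))
```

As a check on the table's summary, averaging its five pair scores gives (−0.04 + 0.00 − 0.03 − 0.02 − 0.04)/5 = −0.026, which rounds to the −0.03 quoted, assuming each pair carries equal weight (as it does when every sequence contributes one score per pair).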
Despite all of the analysis above, there remains one elephant in the room that has the potential to void everything: Xu and Harvey had no idea whether 'next bets' were placed before or after the results of 'previous bets' became known. Indeed, in our correspondence, Xu conceded that he had no idea what proportion were placed after the previous bet had been settled. If you have no idea whether 'next bets' are placed after 'previous bets' are settled, how can you begin to draw conclusions about causality in odds movement between sequential winning bets, and specifically that gamblers are choosing to change odds in response to previous wins and losses? Of course, the same argument can be applied to the Pyckio data set. Fortunately, I have been able to filter that sample for bets that were placed after the previous ones had been settled. [The opportunity to filter Xu and Harvey's data was not available to me. Xu explained he was unable to provide the raw data because the provider (presumably the bookmaker) would not allow this. So much for the integrity of scientific method.] These amounted to only 11.6% of the total sample: the vast majority of bets were struck before the previous ones had been settled. Consequently, for the vast majority of the sample, choice, if any was being exerted, could have had nothing to do with changes in odds during winning (and losing) sequences. Analysing that 11.6% in the manner above (looking at what gamblers were doing within winning sequences) showed no evidence of any meaningful price changes between consecutive wins. Yet even for this sample, we cannot be sure that the tipsters advising or betting these picks were not also advising or betting other picks elsewhere, with other bookmakers. Similarly, even if Xu and Harvey were to revisit their data and test for previous bet settlement, they would still have no idea whether these gamblers had been placing bets with other bookmakers in between consecutive wins in their data sample, which came from only one bookmaker.
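The settlement filter described above might look something like this in Python. The `Bet` record and its field names are hypothetical, and "placed after the previous ones had been settled" is interpreted strictly here as after every earlier-placed bet by the same bettor had settled; the filter applied to the Pyckio sample may have been defined differently. It should be run separately for each bettor, and it cannot detect bets placed with other bookmakers.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Bet:
    # Hypothetical record layout; real data sets will use their own field names
    placed_at: datetime
    settled_at: datetime
    odds: float
    won: bool

def placed_after_settlement(bets):
    """Keep only bets placed strictly after every earlier-placed bet had been
    settled, so the bettor could have known all previous results when choosing
    odds. The first bet has no predecessor and is always kept."""
    kept = []
    latest_settlement = None
    for bet in sorted(bets, key=lambda b: b.placed_at):
        if latest_settlement is None or bet.placed_at > latest_settlement:
            kept.append(bet)
        # Track the latest settlement time of all bets placed so far
        latest_settlement = (bet.settled_at if latest_settlement is None
                             else max(latest_settlement, bet.settled_at))
    return kept

def at(hour):
    """Helper for the illustrative timestamps below."""
    return datetime(2016, 1, 18, hour)

bets = [Bet(at(10), at(12), 2.00, True),   # first bet: kept
        Bet(at(11), at(13), 1.80, True),   # placed before bet 1 settled: dropped
        Bet(at(14), at(16), 1.50, False)]  # placed after both settled: kept
print([b.odds for b in placed_after_settlement(bets)])
```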
Sports bettors typically place more than one bet at a time.
In concluding from their analysis that, by believing in the gambler's fallacy, sports bettors create their own hot hands and choose to shorten the odds they bet after wins, Xu and Harvey have, in my opinion, completely misread their data. Quite apart from their failure to consider how survivorship bias generates longer winning sequences at shorter odds, and how differing betting-history lengths (which experience shows do exist) generate longer winning sequences at the same odds, they have inferred causality where it is fundamentally impossible to do so, given the lack of information about the timing of sequential bets. In my view, their original paper should never have been accepted as a serious piece of work as it stood, since it really only presented a conclusion based on an assumption (that next bets always followed settlement of previous bets) that is completely untestable. Sports bettors, like other gamblers, may very well suffer from both the gambler's fallacy and the hot hand fallacy, but in the context of sequential odds choice, and specifically the deliberate shortening of odds after wins, we really still have no idea. Moreover, in Asian handicap and American spread markets, bettors do not even have the option of shortening their odds after wins.
If nothing else, hopefully readers who like to analyse data, whether investigating potential betting systems or running the numbers over their betting performance, will recognise that they should always consider potential sources of bias and what impact they might have on the conclusions they draw.