Is Football Home Advantage Reduced Playing Behind Closed Doors?
Posted 17th August 2020
In July 2020 Pinnacle published my article looking at whether playing behind closed doors as a consequence of the COVID-19 pandemic had reduced the influence of home advantage. Arguments have been made that without spectators, referees are less influenced by home fans and are consequently more forgiving to away teams than they would otherwise have been. When coupled with a potential reduction in the home team's desire to perform for their supporters, potentially as a form of loss aversion, one might expect to see a reduction in the advantage experienced by home football teams. Typically, home teams win 40 to 50% of games depending on the division, with away teams winning only a quarter to a third.
Since the COVID-19 restrictions were introduced, the following major European football divisions decided to complete their seasons by playing matches without spectators.
- English Premiership
- English Championship
- German Bundesliga 1
- German Bundesliga 2
- Italian Serie A
- Italian Serie B
- Spanish La Liga 1
- Spanish La Liga 2
- Portuguese Primeira Liga
- Turkish Super League
- Greek Super League
Prior to the restrictions a total of 2,924 matches had been played in these divisions. At the time of my writing, a further 502 had been played behind closed doors. From analysing the data, it was clear that whilst some match statistics had seen significant changes, these seemed to only weakly influence the strength of home advantage.
With the 2019/20 seasons now completed for all these divisions, I have revisited the analysis. There is now a total of 1,074 matches played behind closed doors, more than double the previous sample size. Can we find an impact on home advantage with this larger sample?
The table image below compares a number of different averages between games played with and without spectators. The first two data columns are fairly obvious.
The third column (Sigma) provides a measure of the statistical significance of the difference between the data with and without spectators. Sigma is the number of standard deviations separating the two averages. Broadly speaking, a figure of 3 corresponds to roughly a 1-in-1,000 probability that a difference this large would arise by chance, a level that I would consider a minimum in this context to have any confidence that something is statistically significant. Saying that something would occur by chance with less than a 1-in-1,000 probability is subjectively saying we think that chance probably isn't the only explanation and that something causal is at work.
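As a rough check on the sigma-to-probability conversion, here is a minimal sketch that assumes the difference is approximately normally distributed. That is an assumption for illustration, not necessarily how the Sigma column in the table was calculated, and the function name `one_tailed_p` is hypothetical.

```python
import math


def one_tailed_p(sigma: float) -> float:
    """Probability that a standard normal variable exceeds `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


# Convert a few sigma values into "1 in N" chance probabilities.
for s in (2, 3, 9):
    p = one_tailed_p(s)
    print(f"{s} sigma: p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

Under that assumption, 3 sigma works out at roughly 1 in 740 one-tailed, the same order of magnitude as the 1-in-1,000 rule of thumb.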
Some notable differences stand out. Firstly, it has been confirmed that away teams are indeed punished far less in terms of bookings. 9 sigmas is off the charts in terms of happening by chance, hence we can assume that something causal is going on. The most likely candidate is that referees are no longer being swayed by a baying home crowd. At the same time, the referees appear to award significantly more fouls against the home team.
Secondly, it appears that home teams are playing less aggressively, with statistically fewer shots and shots on target and fewer corners. Players and commentators alike have noted that games behind closed doors have taken on more of a training ground feel to them. This might offer an explanation for the less attacking play.
But there are also some notable absences of any difference. Most importantly there appears to be no statistically significant difference in the number of goals scored by either home or away teams, or in the distribution of home, draw and away results. Yes, there is a slight drop in home win percentage and a slight increase in away win percentage, but the differences are not statistically significant in my view. Whatever influence the changes in referee and player behaviour are having, it is evidently not translating into a change in goals scored or the outcome of the game.
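One common way to put a sigma figure on a difference in win percentages is a pooled two-proportion z-test. The sketch below is an illustration under that assumption, not necessarily the calculation behind the table, and `two_proportion_sigma` is a hypothetical helper. It plugs in the sample sizes (2,924 and 1,074) and the observed home win percentages (43.1% and 42.0%) as quoted in this article.

```python
import math


def two_proportion_sigma(p1: float, n1: int, p2: float, n2: int) -> float:
    """z-score for the difference between two observed proportions."""
    # Pool the two samples to estimate the common proportion under
    # the null hypothesis that there is no real difference.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


# Home win rate with spectators versus behind closed doors.
sigma = two_proportion_sigma(0.431, 2924, 0.420, 1074)
print(f"home win difference: {sigma:.2f} sigma")
```

On those inputs the result falls well short of the 3-sigma threshold.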
In fact, the change in home and away wins is even smaller than this data would suggest. Using Pinnacle's closing match odds with the margin removed to estimate the average home win expectation for the two groups of matches, we find that this particular set of 1,074 matches had a slightly lower average home win percentage expectation anyway. For the games played with spectators it was 44.7% (compared to 43.1% observed). For the games played without spectators it was 42.0% (compared to 42.0% observed).
Similarly, for the away wins, expected and observed were 29.2% and 29.3% respectively for games with spectators, whilst for games without fans the figures were 31.9% and 32.0%.
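For readers unfamiliar with removing a bookmaker's margin, the simplest approach is to scale the inverse odds so that they sum to 1. The sketch below shows that proportional method only; other methods exist, and this is not necessarily the one used for the figures above. The function name and the example odds are hypothetical.

```python
def fair_probabilities(home_odds: float, draw_odds: float, away_odds: float):
    """Strip the margin from decimal 1X2 odds by proportional scaling."""
    implied = [1 / home_odds, 1 / draw_odds, 1 / away_odds]
    overround = sum(implied)  # exceeds 1 by the bookmaker's margin
    return [p / overround for p in implied]


# Hypothetical closing odds for a single match (home, draw, away).
probs = fair_probabilities(2.25, 3.40, 3.45)
print([f"{p:.1%}" for p in probs])
```

Averaging the first element over every match in a group gives that group's expected home win percentage.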
One might reasonably argue that the drop in average expected home win probability and increase in average expected away win probability is evidence that Pinnacle have factored in a change in home advantage. If this is true then the change is small and so far, like the change in actual outcome percentages, statistically insignificant. Equally it might simply be that the home teams playing without spectators were relatively worse than those playing with them in the first part of the season.
We could test this proposition if we had some measure of team quality independent of the betting odds. For this purpose I've used the ELO team ratings provided by ELOfootball.com. The average home and away team ELO ratings for the games with spectators in my sample were 1,784 and 1,783 respectively. For the games played behind closed doors the figures are 1,790 and 1,789. On this evidence we can rule out a different relative home versus away team quality as a possible cause.
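To see how little a 1-point rating gap matters, the sketch below applies the textbook Elo expected-score formula with a 400-point scale and no home advantage term. That formula is an assumption here, since ELOfootball.com may use a different variant, and applying it to average ratings rather than match by match is only a rough illustration.

```python
def elo_expected_score(home_rating: float, away_rating: float) -> float:
    """Textbook Elo expected score for the home side (no home bonus)."""
    return 1 / (1 + 10 ** (-(home_rating - away_rating) / 400))


# Average ratings quoted above for the two groups of matches.
with_fans = elo_expected_score(1784, 1783)
no_fans = elo_expected_score(1790, 1789)
print(f"with spectators: {with_fans:.4f}, without: {no_fans:.4f}")
```

Both groups show the same 1-point gap, so the two expected scores are identical and only marginally above 0.5.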
Nevertheless, the drop in expected home win percentage is only about 2 sigmas, not really enough to totally rule out the role of chance at this stage. Hence from this aggregated data analysis, whilst there appears to be some weakening of the home advantage for a football team, the change is not large (and arguably not yet statistically significant with this size of data set), despite much more significant changes in specific match metrics one would assume might influence it. I should stress that this analysis uses an aggregated data set and I haven't investigated the presence of divisional differences.