One report from Zion Market Research suggested that the sports betting market was worth over $104 billion in 2017, and would reach $155 billion by 2024. More recent analysis shows that the above was a gross understatement. Another report suggests the market will be worth over $600 billion by 2027. As bookmakers are seemingly permitted to advertise in practically every form possible, we may see sports betting hit the trillion-dollar mark by the end of the 2020s. And we at Mercurius have built an Artificial Intelligence for profiting from it.
With so much money involved, it is hardly a shock that the quest for the Holy Grail of betting continues apace: the ability to consistently make accurate sports predictions, especially for football. Any sports pundit or tipster with a half-decent record is feted as a demi-god. At least, until these supposed seers fall victim to the laws of probability. In reality, many of these tipsters are merely on a ‘hot streak,’ and their methodology is seldom one with long-term success.
The desperation to find people capable of forecasting football, in particular, leads to some very strange phenomena. Between 2008 and 2010, Paul the Octopus gained global fame for his uncannily accurate predictions. His handlers presented him with two boxes of food, each of which was covered with the flag of a nation involved in a fixture. Whichever box he ate from first was deemed his prediction for a match.
Overall, Paul correctly predicted 12 of 14 games, a success rate of over 85%. Of course, it was nothing more than a statistical anomaly, the kind one often sees across a small betting sample size.
What we intend to do in this article is to outline different ways to make accurate football predictions and forecast football. What? You’re wondering why we’re using forecasting and prediction interchangeably? In reality, they are technically different things.
Download the data used for writing this article!
By downloading this data sample you will get an Excel spreadsheet containing Pinnacle's opening and closing odds for the "Home Draw Away" market of the 17/18 & 18/19 seasons of the Premier League.
In this way you will be able to try out the methodologies described below with real data!
Forecasting Football Outcomes & Making Football Predictions: What’s the Difference?
Although a prediction is often called a ‘forecast,’ there is a subtle difference between the two. A prediction is based on the use of cross-sectional data. This is information gained from observing different subjects (such as people, businesses, or countries) at one point in time. Analysing cross-sectional data usually means comparing the differences between the selected subjects.
A forecast is based on the use of time-series data. As the name suggests, it involves the use of a series of data points listed in time order. This type of data is used in statistics, mathematical finance, and weather forecasting.
Despite the difference, the process of attaining accuracy in both is the same.
Uncertainty is Certain in Sports Betting
Whether you plan to forecast football or make a prediction, there is simply no way of knowing the future with any certainty. This is why a forecast/prediction is expressed in probability terms, i.e. odds. For example, odds of 2.00 mean there is a 50% chance of an outcome occurring (not allowing for the bookmaker’s ‘edge’).
It would behove all bettors to understand this simple concept before investing money. Even a very strong favourite at odds of 1.25 has a 20% chance of NOT winning! In May 1999, a horse named Victory Spin had a Starting Price of 1/66, or 1.015 in decimal odds. This equates to a 98.52% chance of winning, yet he lost!
I once backed a team in the women’s Lithuanian football league to win at odds of 67.00, a 1.49% chance of success, and they did so by a score of 4-1!
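The conversions quoted above can be checked with a few lines of Python. This is a minimal sketch: the function names `implied_probability` and `fractional_to_decimal` are our own, and the bookmaker's margin is ignored, as in the text. Note that the exact 1/66 price gives a figure a shade below 98.52%, because that number was computed from the rounded decimal odds of 1.015.

```python
def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def fractional_to_decimal(numerator: int, denominator: int) -> float:
    """Convert fractional odds such as 1/66 into decimal odds."""
    return 1.0 + numerator / denominator


examples = [
    ("even money", 2.00),
    ("strong favourite", 1.25),
    ("Victory Spin (rounded decimal price)", 1.015),
    ("Victory Spin (exact 1/66)", fractional_to_decimal(1, 66)),
    ("Lithuanian long shot", 67.00),
]

for label, odds in examples:
    print(f"{label}: odds {odds:.3f} -> {implied_probability(odds):.2%}")
```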
A sad fact is that in football, as in all sports, there are often attempts made to ‘fix’ the outcome. It destroys the integrity of sport but it happens more than people want to believe. The perpetrators pay sums of money to players to help them achieve a specific outcome. A number of football players have been implicated in match fixing scandals over the years.
However, even match fixing isn’t a guarantee of success! Kelong Kings is a fascinating book that looks at the life of the world’s most prolific match-fixer, Wilson Raj Perumal. The Singaporean was involved in countless fixes and spent spells behind bars. He even had the uncanny knack of knowing a fixed match when he wasn’t involved.
Despite his skill (after many early setbacks), even Perumal occasionally lost out due to the unpredictability of football. One memorable occasion involved the attempts of Perumal and a partner to fix an obscure international match between Bosnia and Zimbabwe. The idea was to bet heavily on the Bosnian team to win by four goals, although, as the fixer explains, even a two-goal victory was profitable.
Perumal and his accomplice paid five Zimbabwean players to perform poorly. However, they misjudged the gap in quality between the sides. With 20 minutes left, one of the paid Zimbabweans accidentally scored a 40-yard volley when Bosnia was leading 2-1. Perumal wrote that the player put his hands to his head in despair after scoring!
Judging the ‘Distance’
You can quickly measure the accuracy of any football prediction or forecast by determining the ‘distance’. It is a term used in the field of probability. It has a very simple formula:
Distance = 1 – The Probability for the outcome if it comes up
The shorter the distance, the better the accuracy. If you get a forecast completely wrong, you end up with a maximum score of 1. A score of 0 means complete 100% certainty. The only way one can guarantee a score of 0 is if you are a God capable of seeing the future. A prime example is Odin, who had an eye that ensured he could do just that!
Unfortunately, Odin doesn’t exist and if he did, surely he would have better things to do than forecast football!
In any case, let’s say you assign a probability of 80% to an outcome. If this outcome occurs, the distance is 1 – 0.8 = 0.2. Not Odin, but pretty good nonetheless!
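As a minimal sketch, the distance can be written as a one-line function, with the average over many matches on top of it. The names `distance` and `average_distance` are our own, and the three-match sample is made up purely for illustration.

```python
from typing import Iterable


def distance(prob_of_actual_outcome: float) -> float:
    """Distance = 1 - probability you assigned to the outcome that occurred."""
    if not 0.0 <= prob_of_actual_outcome <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return 1.0 - prob_of_actual_outcome


def average_distance(probs_of_actual_outcomes: Iterable[float]) -> float:
    """Mean distance over a set of matches."""
    values = [distance(p) for p in probs_of_actual_outcomes]
    if not values:
        raise ValueError("need at least one match")
    return sum(values) / len(values)


# The 80% example from the text.
print(f"single forecast: {distance(0.8):.2f}")

# Hypothetical probabilities assigned to what actually happened in 3 matches.
print(f"average: {average_distance([0.8, 0.45, 0.30]):.3f}")
```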
The Easiest Way to Predict Football Outcomes
We don’t wish to mislead our readers here. When we say ‘easy,’ we mean in terms of trying to understand the concept. You still need to put in a LOT of effort to gain sufficient understanding. Below, we have compiled three relatively simple ways of forecasting football. We follow this with a pair of more complex methods of creating football predictions.
3 Basic Methods to Predict and Forecast Football Outcomes
1 – Pinnacle Closing Odds
We firmly believe that using Pinnacle’s closing odds is the easiest way to predict cross-sectional football outcomes. In this case, you also benefit from market efficiency because Pinnacle’s closing odds lines are the most consistently accurate around.
Remember, betting odds represent the probabilities of an expected outcome. Obviously, results are a black and white case of ‘you win’ or ‘you lose.’ Detailed research has shown that Pinnacle’s closing odds are remarkably efficient.
For instance, its closing odds from the 2018/19 English Premier League season transformed into probabilities with an average distance of 53.71%. Using opening odds, the distance was 53.69%. Practically no difference. However, if you bet on the opening odds using the information contained in closing odds, you would have earned a net profit of 21.6 units.
Looking at the raw numbers, it is puzzling. However, it is unwise to rely solely on averages because it only takes one or two outliers to completely skew the data. For instance, if the average weight of 30 people on a bus is 60kg and a 140kg man comes on board, the average is significantly increased.
There is an old saying that says you should "not cross a river that is on average four feet deep." While it might be four feet in many places, all it takes is one outlier, a 20-foot deep torrent, to place your life in danger. Likewise, our bus could have an average weight per person of 60kg, but not have a SINGLE passenger that actually weighs 60kg!
Pinnacle’s closing odds emerge only after an extraordinarily detailed analysis of an immense array of factors. Much of this information is not yet available when the opening odds are set. Then you must also factor in things such as the Wisdom of the Crowd and Weight of Money.
While the ‘average’ distance is 53.69% for opening and 53.71% for closing, there are outliers that mask the true difference between the two sets of odds. This is evidenced by the ability to earn a significant profit from betting on the opening line using information contained in the Pinnacle closing odds. In the end, this efficiency is essential in value betting, which is in turn how you can expect to earn a profit.
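To try this on the downloaded spreadsheet, the sketch below shows one possible way to turn Home/Draw/Away odds into probabilities, measure the average distance, and back selections at the opening price when the closing line implies they were value. Several things here are assumptions and not the exact procedure behind the figures above: the margin is removed by simple proportional normalisation, the staking is one flat unit, and the two matches are invented sample data.

```python
from typing import Dict, List

Odds = Dict[str, float]  # keys "H", "D", "A" mapped to decimal odds


def fair_probabilities(odds: Odds) -> Dict[str, float]:
    """Remove the bookmaker margin by proportional normalisation (an assumption)."""
    raw = {outcome: 1.0 / price for outcome, price in odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def average_distance(matches: List[dict], line: str) -> float:
    """Mean of (1 - probability given to the actual result) for one odds line."""
    distances = [1.0 - fair_probabilities(m[line])[m["result"]] for m in matches]
    return sum(distances) / len(distances)


def profit_using_closing_info(matches: List[dict], stake: float = 1.0) -> float:
    """Bet at the opening price whenever the closing probability makes it value."""
    profit = 0.0
    for m in matches:
        closing = fair_probabilities(m["closing"])
        for outcome, open_price in m["opening"].items():
            if closing[outcome] * open_price > 1.0:  # positive expected value
                if outcome == m["result"]:
                    profit += stake * (open_price - 1.0)
                else:
                    profit -= stake
    return profit


# Invented sample rows; replace with rows from the spreadsheet.
matches = [
    {"opening": {"H": 2.10, "D": 3.40, "A": 3.80},
     "closing": {"H": 1.95, "D": 3.50, "A": 4.20}, "result": "H"},
    {"opening": {"H": 1.50, "D": 4.30, "A": 7.00},
     "closing": {"H": 1.55, "D": 4.20, "A": 6.40}, "result": "D"},
]

print(f"opening distance: {average_distance(matches, 'opening'):.4f}")
print(f"closing distance: {average_distance(matches, 'closing'):.4f}")
print(f"profit in units:  {profit_using_closing_info(matches):.2f}")
```

On real data the two average distances can be almost identical while the profit figure is not, which is the point the averages discussion makes.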
2 – Previous Meetings
Perhaps the easiest way to forecast football outcomes using time-series data is to focus on the last game between two football teams. The idea is to say the same outcome will definitely happen again, and assign a probability of 1 to it. Oddly enough, there is method in this apparent madness! Remember, certain teams have wretched records against certain opposition and on specific grounds. For example, Everton hasn’t beaten Liverpool at Anfield since 1999.
In the 2018/19 season, Liverpool defeated Everton 1-0 at Anfield. For their 2019/20 season match at Liverpool’s ground, you assign a probability of 1 for a home win. It would be a successful wager as Liverpool won 5-2.
If you follow the above process and remove newly promoted sides (such as Norwich in the EPL in 2019/20), you could see very interesting results. For instance, following this method, and using 2017/18 outcomes to predict 2018/19 games resulted in an average distance of 53.3%. Pinnacle’s closing odds had a distance of 51.3%. However, the P & L was -19.5 units from 272 matches in 2019.
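Because this method assigns a probability of 1 to a single outcome, each match scores a distance of either 0 (same result as last time) or 1 (different result), so the average distance is simply the share of fixtures that did not repeat. A minimal sketch follows; apart from the Liverpool v Everton results mentioned above, the fixtures and team names are placeholders.

```python
from typing import Dict, Tuple

Fixture = Tuple[str, str]  # (home team, away team)


def previous_meeting_distance(last_season: Dict[Fixture, str],
                              this_season: Dict[Fixture, str]) -> float:
    """Average distance when last season's result gets probability 1."""
    distances = []
    for fixture, result in this_season.items():
        previous = last_season.get(fixture)
        if previous is None:
            continue  # no meeting last season, e.g. a newly promoted side
        distances.append(0.0 if previous == result else 1.0)
    if not distances:
        raise ValueError("no fixtures with a previous meeting")
    return sum(distances) / len(distances)


last_season = {
    ("Liverpool", "Everton"): "H",   # 1-0 in 2018/19
    ("Team A", "Team B"): "D",       # placeholder
    ("Team C", "Team D"): "A",       # placeholder
}
this_season = {
    ("Liverpool", "Everton"): "H",   # 5-2 in 2019/20
    ("Team A", "Team B"): "H",       # placeholder
    ("Team C", "Team D"): "A",       # placeholder
    ("Norwich", "Team A"): "D",      # promoted side, skipped
}

print(f"average distance: {previous_meeting_distance(last_season, this_season):.3f}")
```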
3 – Home – Draw – Away
This method involves the use of cross-sectional data. In the 2017/18 English Premier League season, there were 173 home wins, 99 draws, and 108 away wins. We clearly see the importance of home advantage and the fact that the draw is the least likely outcome. If you assign the same probability to all three outcomes, you get an average distance of 66.66%. However, the distance becomes 63.91% when you take the actual results into account.
Once again, however, following this strategy and betting on Pinnacle closing odds yields another loss, this time of 31.92 units.
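The Home – Draw – Away comparison can be sketched from the 2017/18 counts quoted above. One assumption to flag: the code scores the frequency-based probabilities against the same 2017/18 season, whereas the text does not say which set of matches the 63.91% figure was measured on, so the second number printed here need not match it exactly.

```python
from typing import Dict

# Outcome counts for the 2017/18 Premier League season, as quoted in the text.
counts = {"H": 173, "D": 99, "A": 108}
total_matches = sum(counts.values())

# Same probability for every outcome versus observed frequencies.
uniform = {outcome: 1.0 / 3.0 for outcome in counts}
frequencies = {outcome: n / total_matches for outcome, n in counts.items()}


def average_distance(probabilities: Dict[str, float],
                     results: Dict[str, int]) -> float:
    """Mean distance when every match gets the same H/D/A probabilities."""
    matches = sum(results.values())
    weighted = sum(n * (1.0 - probabilities[o]) for o, n in results.items())
    return weighted / matches


uniform_distance = average_distance(uniform, counts)
frequency_distance = average_distance(frequencies, counts)

print(f"equal probabilities:    {uniform_distance:.2%}")
print(f"observed frequencies:   {frequency_distance:.2%}")
```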
The three methods above are useful ways of dipping your toe into the water. However, you need to invest more time and effort into digging much deeper. Rudimentary analysis won’t come close to giving you an edge in the long-term. In case you’re interested, check out the difference in Pinnacle’s opening and closing odds for the Premier League for yourself.
2 Complex Ways to Forecast Football Outcomes
Although the above methods of trying to predict football outcomes have merit, one needs to delve into more complicated matters to improve results. This means implementing a statistical model based on probability and mathematics. However, as statistician George Box once said: "The only question of interest is: ‘Is the model illuminating and useful?’"
In this instance, a ‘model’ doesn’t refer to Tyra Banks and her ilk! Instead, it is a ‘set of assumptions’ we have on any given dataset. Such assumptions are as flawed as they are necessary. Box was also quoted as saying that ‘all models are wrong.’ Fortunately, in sports betting, we don’t need 100% accuracy. Instead, we simply need a useful enough model to identify, and take advantage of, value bets as and when they arrive.
This is the theory behind the following two methods of forecasting football outcomes:
- Dixon and Coles
- The Bayesian Hierarchical Model
Dixon and Coles Model for Football Predictions
Up until the 1990s, many people believed that forecasting the outcome of football matches was pointless because it was down to chance. However, the research of Stuart Coles and Mark Dixon changed everything. The duo used a ‘Poisson Process,’ named after the French mathematician and physicist Siméon Denis Poisson.
When you assume something follows this process, you do so in the belief that events occur at a fixed rate. After all, what happened in the past is no guarantee of what happens in the future, right? A game with no goals in the first half isn’t any more likely to have goals in the second half than if it has already had at least one goal. At least, that was the theory.
Therefore, the initial Dixon and Coles model worked on the assumption that goals were scored at a fixed and consistent rate during a match. They also assumed that the goal total varied depending on the teams in question. Now, they needed to calculate how many goals they could expect each side to score.
Ultimately, Dixon and Coles divided matters into attack and defence. The expected goals scored by the home side depended on:
Their attacking ability * The defensive weakness of the away side * Home advantage.
Expected goals by away sides depended on:
Their attacking ability * The defensive weakness of the home side.
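The two products above can be turned into match probabilities with a plain Poisson calculation. The sketch below is a simplified illustration and not a reproduction of Dixon and Coles' fitted model: it treats home and away goals as independent, caps the score grid at 10 goals per side (an assumption), and uses made-up ratings.

```python
import math
from typing import Dict, Tuple


def poisson_pmf(goals: int, rate: float) -> float:
    """Probability of exactly `goals` goals when the expected number is `rate`."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)


def expected_goals(home_attack: float, home_defence_weakness: float,
                   away_attack: float, away_defence_weakness: float,
                   home_advantage: float) -> Tuple[float, float]:
    """Expected goals for each side, following the two products in the text."""
    home_rate = home_attack * away_defence_weakness * home_advantage
    away_rate = away_attack * home_defence_weakness
    return home_rate, away_rate


def outcome_probabilities(home_rate: float, away_rate: float,
                          max_goals: int = 10) -> Dict[str, float]:
    """Sum independent Poisson score probabilities into H/D/A probabilities."""
    probs = {"H": 0.0, "D": 0.0, "A": 0.0}
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            key = "H" if h > a else "D" if h == a else "A"
            probs[key] += p
    total = sum(probs.values())  # renormalise the truncated grid
    return {k: v / total for k, v in probs.items()}


# Hypothetical ratings, for illustration only.
home_rate, away_rate = expected_goals(
    home_attack=1.2, home_defence_weakness=0.9,
    away_attack=1.0, away_defence_weakness=1.1,
    home_advantage=1.3,
)
probs = outcome_probabilities(home_rate, away_rate)
print(f"expected goals: home {home_rate:.3f}, away {away_rate:.3f}")
print({k: round(v, 3) for k, v in probs.items()})
```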
The duo collected data from the English football league’s four divisions across several years. With 92 clubs, each needing an attack rating and a defence rating, plus a single home advantage factor, it equated to 185 factors to analyse. The addition of promotion and relegation made things more complicated. As a result, they used sophisticated computational methods. When using the model for the 1995/96 season, they decided only to bet on games where a result was 10% more likely than the bookmaker’s odds.
The Dixon and Coles model was the first complex model for predicting football outcomes and had profitability potential. However, you can probably guess the flaws.
Poisson is Imperfect in Practice for Football Betting
What Dixon and Coles did was astounding, but it needed to be built upon. The Poisson Process is not ideal for football. For a start, there are more draws in reality than it predicts. Two researchers from Germany analysed Bundesliga games from a period of over 40 years. They found that teams take fewer risks at 0-0 with 10 or fewer minutes to go, on average. Draws are perhaps less common in the era of three points for a win, but it is still worth keeping in mind.
It is also a fact that goals are not scored at a fixed rate. In general, there are more goals in the last 15 minutes before half-time and the last 15 minutes of a match, than during other periods. Dixon and Coles’ model also didn’t account for the fact that players get tired during games. Therefore, their fixed attacking and defensive ratings were incorrect towards the end of matches.
What does this mean for prospective sports bettors then? If you are serious about betting professionally, follow these simple words of advice from Michael Kent: "It’s the model building that’s important. You have to know how to build a model. And you never stop building the model." Kent was part of the Computer Group, a betting syndicate that made a profit of $14 million between 1980 and 1985, with an ROI of almost 10%.
If you want success in betting, you need to create a model and work on it constantly, just as we do at Mercurius. Gathering the data is easy; the challenging parts include analysing it correctly and constantly taking relevant new information into account. The rest is just noise.
Bayesian Hierarchical Network for Forecasting Football Outcomes
Bettors are so focused on being ‘right’ with their forecasts and predictions, that they don’t stop and think about what this really means. In essence, being a successful punter is about being ‘less wrong,’ according to Nate Silver in The Signal and the Noise. The old saying suggests that ‘God is in the detail.’ In the world of sports betting, ‘God is in the data.’ However, it is even more important to analyse the data correctly if you wish to make consistently accurate predictions.
What AI is potentially capable of doing is a complete game-changer in sports betting. We have already witnessed it in pursuits such as chess and ‘Go.’ AI can adapt to circumstances and change strategies to suit. In sports betting, it means updating the algorithm to incorporate relevant new data.
Even Pinnacle has fully climbed on board this particular train. Marco Blume, its head of trading, openly admitted that Pinnacle’s traders use machine learning models. Specifically, they use the scikit-learn package in Python and the caret package in R. According to Andrew Mack in Statistical Sports Models in Excel, there will come a time when such platforms are the only possible method of profiting from sports betting in the long-term. Time will tell.
What the Bayesian Hierarchical Network does is help you make the best use of the information we know so we can be wrong less often! Like AI, the Bayesian model uses relevant updated data to change its calculations. It gets its name from Thomas Bayes, an English minister who lived in the 18th century. The Bayesian Theorem is a philosophical and mathematical expression of getting closer to the truth as we gather more evidence.
It is important to note that while Bayes had the ideas, it was a French mathematician named Pierre-Simon Laplace who devised the ‘formula’ we use today. Although it is a complex idea, it is simple in mathematical terms. The Bayesian formula is an algebraic equation with four variables, three of which you already know. You use the three pieces of data you know to come up with the answer to the unknown. Here’s the formula:
P(A|B) = P(A) * P(B|A) / P(B)
The probability of Event A given B equals the probability of Event A multiplied by the probability of Event B given A, divided by the probability of Event B.
Using Bayes in Sports Betting
Let’s say you see that Manchester United are playing a home game against Leicester City on a Monday night. You know the following:
- Man United have won 6 of the last 10 meetings at Old Trafford.
- Leicester has won 2 of the last 10 games at Old Trafford with 2 draws.
Therefore, we could say United have a 60% chance of victory, with a 20% chance of an away win and a 20% chance of a draw. If the odds are greater than 1.67, you would back a United win in the expectation it is a value bet.
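The 1.67 threshold is just the break-even price for a 60% chance. A minimal sketch of the value check follows; the function names are our own and the 1.80 and 1.60 prices are hypothetical.

```python
def break_even_odds(probability: float) -> float:
    """Decimal odds at which a bet with this win probability has zero expected value."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 1.0 / probability


def expected_value(probability: float, decimal_odds: float) -> float:
    """Expected profit per unit staked."""
    return probability * decimal_odds - 1.0


def is_value_bet(probability: float, decimal_odds: float) -> bool:
    """True when the price on offer is better than the break-even price."""
    return expected_value(probability, decimal_odds) > 0.0


united_win = 0.60  # estimate from 6 wins in the last 10 meetings
print(f"break-even odds: {break_even_odds(united_win):.2f}")
for price in (1.60, 1.80):  # hypothetical prices
    print(f"odds {price:.2f}: value={is_value_bet(united_win, price)}, "
          f"EV per unit={expected_value(united_win, price):+.3f}")
```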
However, what if you find out that they have played 4 times on a Monday night, and Leicester have won 2 with 1 Man United win and 1 draw?
This is the best time to use the Bayesian theorem! It is designed to use the initial probability, and take into account a new condition that could change the odds of the outcomes.
Remember, Leicester has 2 wins at Old Trafford on a Monday night with no wins on any other day of the week for 2 wins total. There have been 4 Monday night games. We can use the probability of Leicester winning this fixture on a Monday night.
P(A) (Leicester winning any meeting at Old Trafford) = 2 / 10 = 0.2
P(B|A) (the game was on a Monday night, given that Leicester won) = 2 / 2 = 1.0
P(B) (Monday night games) = 4 / 10 = 0.4
P(A|B) = 0.2 * 1.0 / 0.4 = 0.5 = 50%
On these numbers, the fact the game is on a Monday night raises the chances of a Leicester win from 20% to 50%. Four games is a very small sample, so treat a jump of that size with caution. Still, even a 5% shift in probability matters, because 5% is effectively the ROI that a professional gambler lives off.
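Working strictly from the counts listed above (10 meetings, 4 of them on a Monday night, 2 Leicester wins, both on a Monday), the theorem can be checked in a few lines. This is a minimal sketch with our own variable names, taking A as "Leicester win" and B as "Monday night game". With those definitions the result equals the direct count of 2 Leicester wins in 4 Monday games.

```python
def bayes(prior_a: float, likelihood_b_given_a: float, evidence_b: float) -> float:
    """P(A|B) = P(A) * P(B|A) / P(B)."""
    if evidence_b <= 0.0:
        raise ValueError("P(B) must be positive")
    return prior_a * likelihood_b_given_a / evidence_b


# Counts taken from the example in the text.
total_meetings = 10
monday_games = 4
leicester_wins = 2
leicester_monday_wins = 2

p_a = leicester_wins / total_meetings                  # P(Leicester win)
p_b_given_a = leicester_monday_wins / leicester_wins   # P(Monday | Leicester win)
p_b = monday_games / total_meetings                    # P(Monday)

posterior = bayes(p_a, p_b_given_a, p_b)
direct_count = leicester_monday_wins / monday_games

print(f"P(Leicester win | Monday) via Bayes: {posterior:.2f}")
print(f"Direct count on Monday games:        {direct_count:.2f}")
```

With only four Monday games behind it, an estimate like this is best read as a prompt to look for more evidence.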
Jettison Preconceived Notions!
One of the biggest mistakes a punter can make is to rigidly stick to an outcome despite changing circumstances. When you use the Bayesian hierarchical network, you constantly test new evidence that comes to light against your current position. However, you also have to be smart when the conditions enter the equation.
Bookmakers and bettors are still guilty of assigning too much importance to small data samples, especially when they are recent developments. What statisticians know is that the data they collect is useless without the context.
For example, there was data from February 2020 that showed Barcelona’s record with and without Lionel Messi since the beginning of the 2017/18 season. Across a sample size of 155 games, Messi played 129 of them. With Messi, Barcelona:
- Won 68.2% of games.
- Lost 6.2% of games.
- Scored 2.3 goals per game and conceded just 0.8.
- Averaged 2.1 points per game.
Without Messi, Barcelona:
- Won 57.7% of games.
- Lost 26.9% of games.
- Scored 2.2 goals per game and conceded 1.2.
- Averaged 1.7 points per game.
It is no surprise that Barcelona suffers without one of the greatest players ever. What we need to do is understand why the team does so badly, and why the opposition performs better. Using a scientific method involves the following:
Step Taken | Example in Sports Betting |
---|---|
Observe the phenomenon | Barcelona lose more often and win less often without Messi |
Develop a hypothesis to explain | Barcelona have less belief without the talisman, opposition get bolder and attack more often as seen by the higher rate of opposition goals scored |
Formulate a prediction | Barcelona will continue to struggle without Messi. As a result, opposition teams have a greater chance of winning, there is more chance of a higher scoring game. |
Test the prediction | Place your bet, perhaps on opposition goals or match goals if match odds are not suitable |
However, you must also apply more context. What quality of opposition has Barcelona faced without Messi? It is a relatively small sample size, so it could involve tricky away fixtures at a higher rate than when he plays. Not having Messi away to Real Madrid or Atletico is a different matter to not having him at home to a relegation struggler.
Final Thoughts on Forecasting Football Outcomes
Ultimately, you must toss away your preconceived beliefs if you wish to accurately predict and forecast football matches. It doesn’t help your case if you assign 100% or 0% belief to anything. It is imperative that you provide a level of probability to every single possible outcome in a match. During the 2018/19 Premier League season, Crystal Palace were 176.00 to win on the Exchange when losing 1-0 away to Man City. They came back to win 3-2. Need we mention the 5000/1 title winners Leicester City?