Fun with Graphs

David Moyes ponders Everton's chances of winning given a one-goal lead at home in the 58th minute. (hint: it's 78.7%)

A two-goal lead is the most dangerous lead in football. We hear it all the time, but is it really true? What if you could know the exact odds of your favorite team (Everton, let's say, since this is an Everton blog after all) winning, losing, or drawing at any given moment in a match?

Well guess what? Now you can!

I've always been fascinated with statistics, especially if the statistics are sports-related. Now I'm certainly no mathematician, but with just a basic knowledge of the principles of statistics it becomes possible to look at sports in different and interesting ways. One such way is the "win probability" graph, which is a visual representation of each team's chances of winning over the course of a game. Win probability is most closely associated with baseball, as the people at Fangraphs popularized the idea. There have been win probability models for other sports as well (including soccer), but I decided to create my own.

Now before I start boring all of you with a lot of math-speak, I'll give an example of what I'm talking about. Here's a graph from one of the most famous matches in Everton's history: the Great Escape of 1994. For those not familiar with the situation, I'll paint the picture: Everton entered the final match of the 1993-94 season in severe danger of being relegated out of the top division for the first time since 1954. As it turned out, only winning would have secured Everton's safety... and that's exactly what they did, defeating Wimbledon 3-2 in an incredible game at Goodison Park:


So what do all those lines mean? As you can see, the graph shows the chances of an Everton win, a Wimbledon win, and a draw over the 90+ minutes of the match. With this information, we can see that at two different points in the game (around the 20th minute and the 67th minute), Everton essentially were 95% sure to be relegated, which goes to show just how amazing it was that Everton managed to avoid the drop. Cool, huh?

Win probability generally considers two possible results (a win or a loss), but in soccer there are of course three possible results (win, loss, or draw). For that reason I've decided to call my system "outcome expectancy" rather than "win probability," as I think this better describes the information the graphs are conveying. I really like these charts because they almost read like an "emotional barometer" over the course of the match, with the same peaks and valleys you experience during the game as a fan. Anyway, I hope you guys find these half as interesting as I do, and I'll definitely try to include them with my match reports going forward. Just for kicks, here's the visual representation of Sunday's match with West Brom:


Now I would like to provide anyone who's interested with a little background on the mathematics that went into creating this model, though I can understand if most of you would prefer to skip the technical stuff, so fair warning before you read on...

To create the model, I used something called a Poisson distribution. This basically means that you can figure out the probability of an event happening a given number of times (say, a team scoring exactly two goals) if you first make some assumptions. In order to use a Poisson distribution, you have to be dealing with a fixed interval of time (in soccer, that's the 90+ minutes of the match). Then, you have to assume there is a known average rate (in this case, how many goals each team can be expected to score on average) and that the events occur independently (in other words, a goal being scored does not affect the average rate at which future goals will be scored).
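For anyone who wants to see the formula in action, here is a minimal Python sketch of the standard Poisson probability calculation. The function name `poisson_pmf` and the rate of 1.5 goals per match are just placeholders I picked for illustration, not numbers from the model itself.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events when `rate` events are expected on average."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical average rate, for illustration only: 1.5 goals per match.
example_rate = 1.5

# Chance of the team scoring exactly 0, 1, 2, 3, or 4 goals in the match.
for goals in range(5):
    print(f"{goals} goals: {poisson_pmf(goals, example_rate):.1%}")
```

The only input is the average rate; everything else follows from the fixed-interval and independence assumptions described above.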

Now if you've been paying attention, you've probably already spotted a problem. As anyone who has watched a lot of soccer can tell you, goals do not occur independently. For example, if the away team goes down 1-0, that generally opens up the game and increases the likelihood that more goals will be scored. This means that goals in a soccer match do not follow a true Poisson distribution, but for our purposes the numbers are close enough to continue on.

In order to create the model, the first thing we have to do is find the known average rate. How many goals should we expect a home and away team to score respectively? Using the last ten years of Premier League data, I determined that on average a home team scores 1.508 goals per match and an away team scores 1.098 goals per match. With that information, you can determine the probability of a home win, an away win, or a draw for any scoreline at any point in the game.
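Here is a rough Python sketch of how a calculation like this could work in the middle of a match. It is not necessarily how my spreadsheet does it: the function name `outcome_expectancy`, the flat 90-minute match length, and the idea of shrinking each team's expected goals in proportion to the time remaining are all simplifying assumptions made for the example.

```python
import math

HOME_RATE = 1.508   # average home goals per match (from the article)
AWAY_RATE = 1.098   # average away goals per match (from the article)
MATCH_MINUTES = 90  # assumption: stoppage time is ignored

def poisson_pmf(k, rate):
    """Probability of exactly k goals when `rate` goals are expected."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def outcome_expectancy(minute, home_goals, away_goals, max_goals=15):
    """Return (home win, draw, away win) probabilities given the current state."""
    # Assumption: goals arrive at a constant rate, so the expected number of
    # remaining goals shrinks in proportion to the time left.
    fraction_left = max(MATCH_MINUTES - minute, 0) / MATCH_MINUTES
    home_rate = HOME_RATE * fraction_left
    away_rate = AWAY_RATE * fraction_left

    home_win = draw = away_win = 0.0
    # Add up every combination of further goals (h for home, a for away).
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            final_margin = (home_goals + h) - (away_goals + a)
            if final_margin > 0:
                home_win += p
            elif final_margin == 0:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Home team leading 1-0 in the 58th minute.
home, draw, away = outcome_expectancy(58, 1, 0)
print(f"home win {home:.1%}, draw {draw:.1%}, away win {away:.1%}")
```

Under these assumptions the home-win figure lands within a point or so of the 78.7% quoted in the caption at the top; the small gap presumably comes from details like how stoppage time is handled.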

Another important thing to note is that these graphs assume that both teams are of equal skill level. This is a necessary assumption in order to make the statistics work properly. According to my calculations, on average the home team should win 46.8% of the time, the away team should win 27.5% of the time, and the teams should draw 25.7% of the time (last year's actual totals were 47.1% home wins, 23.7% away wins, and 29.2% draws). Now of course, if Everton are playing Manchester United at home they probably won't have a 46.8% chance of winning the match. Therefore, the best way to look at these graphs is to consider them in the context of the actual match-up on the field.
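If you want to check those pre-match percentages yourself, the short Python sketch below adds up the probability of every possible scoreline using the two average rates. The cutoff of 15 goals per team is my own assumption to keep the loop finite; scorelines beyond that are vanishingly unlikely.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals when `rate` goals are expected."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

home_rate, away_rate = 1.508, 1.098  # per-match averages from the article
max_goals = 15  # assumption: ignore scorelines with more goals than this

home_win = draw = away_win = 0.0
for h in range(max_goals + 1):
    for a in range(max_goals + 1):
        # Independence assumption: multiply the two teams' probabilities.
        p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
        if h > a:
            home_win += p
        elif h == a:
            draw += p
        else:
            away_win += p

print(f"home win {home_win:.1%}, draw {draw:.1%}, away win {away_win:.1%}")
```

The results should come out within a fraction of a percentage point of the figures quoted above, with any leftover difference down to rounding.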

One more caveat... these models don't take into account the effect of a player being sent off. I'm currently working on adding that variable into the mix, so that should be in version 2.0.

Got all that? Whew. If you have any questions about any of this, feel free to comment below!