For those who are interested in the details: we model the strength of teams using the well-known Elo model (used, e.g., for official chess ratings).
We innovated on two aspects of the traditional Elo model, in which every team has an independent parameter:
1) We actually model the strength of players instead of teams. This makes it possible to learn from games that were played in championships between clubs, and transfer this knowledge to games between countries.
2) There is a not-so-widely-known connection between Elo-type comparison models and Gaussian process classification. We leverage this, and get a full posterior distribution for each team's strength. (Information on the uncertainty of our estimates helps a lot in coming up with sensible predictions)
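To make point 1 a bit more concrete, here is a minimal sketch. It assumes that a team's strength is simply the sum of its players' strengths and that the win probability is a logistic function of the strength difference. Both are illustrative assumptions rather than a description of what the site actually does, and the function names and ratings are hypothetical. It also works with point estimates only, so it says nothing about the posterior distribution from point 2.

```python
import math


def team_strength(player_ratings):
    """Assumed aggregation: a team's strength is the sum of its players' ratings."""
    return sum(player_ratings)


def win_probability(home_players, away_players):
    """Elo-style (logistic) probability that the home lineup beats the away lineup.

    Only the difference in strength matters, as in the vanilla Elo model.
    Draws are ignored here: the outcome is binary.
    """
    diff = team_strength(home_players) - team_strength(away_players)
    return 1.0 / (1.0 + math.exp(-diff))


# Hypothetical per-player ratings. The same player keeps the same rating
# whether he appears in a club lineup or in a national-team lineup, which is
# what lets club games inform predictions for games between countries.
club_a = [0.4, 0.1, -0.2]
club_b = [0.0, 0.0, 0.0]
p = win_probability(club_a, club_b)
```

Swapping one player in a lineup changes the sum, and therefore the prediction, without touching any other parameter.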
If anyone wants to know more (explanations on the web page are very superficial at the moment), please drop me a line!
But how does that make sense in a game where you can have a draw? By definition 100% cannot cover only home/away. Draw must be factored in. So doesn't that make all the Elo assumptions false?
Just wondering how you can take draw into consideration...
Excellent point - this one is definitely on our todo list. There are several simple extensions of the Elo model that take draws into consideration (i.e., give a non-zero probability to draws), for example the Rao-Kupper model. There are only minimal changes needed w.r.t. the original model, but still we didn't manage to make the changes in time for this version of the site.
In short: at its core, the "Elo assumption" postulates that every team can be represented by a real number (that can be interpreted as the strength of the team), and that the probability of the outcome depends on the difference in strength. In the vanilla Elo model, the outcome is binary, but it's easy to make it ternary.
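As a rough illustration of the ternary case, here is a sketch of the Rao-Kupper outcome probabilities written in terms of the strength difference. The function name and the value of the draw parameter `theta` are made up for the example. In practice `theta` would be fitted from data, and this is not the code behind the site.

```python
import math


def rao_kupper(s_home, s_away, theta=1.5):
    """Return (p_home_win, p_draw, p_away_win) under the Rao-Kupper model.

    s_home, s_away: real-valued strengths (only their difference matters).
    theta >= 1: draw parameter; theta == 1 recovers the binary model
    with zero draw probability.
    """
    if theta < 1.0:
        raise ValueError("theta must be >= 1")
    d = s_home - s_away
    # With pi = exp(s): P(home wins) = pi_h / (pi_h + theta * pi_a)
    p_home = 1.0 / (1.0 + theta * math.exp(-d))
    # Symmetrically: P(away wins) = pi_a / (pi_a + theta * pi_h)
    p_away = 1.0 / (1.0 + theta * math.exp(d))
    # Whatever probability mass is left goes to the draw.
    p_draw = 1.0 - p_home - p_away
    return p_home, p_draw, p_away


# Hypothetical strengths and draw parameter, for illustration only.
probs = rao_kupper(0.3, -0.1, theta=1.5)
```

The three probabilities sum to one by construction, and the draw probability is largest when the two teams are equally strong.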
The thing is that football is a time-based sport, so a draw is an outcome with pretty good chances.
Usually weak teams will try to delay as much as they can to get the draw. Given enough time, the strong team would have much bigger chances to win. Also, depending on the context (points needed by each team), a team might have a bigger motive to go for a draw than a win.
Predicting outcome probabilities in football is a very complicated story; I doubt it can be solved in a simple way like Elo-ranking the players or teams.
That said, the knock-out phase might be more suitable for that model.
A few years ago I had a similar idea for trying to build team models, so you can make a better guess at the performance of national teams, since they don't play very often, or for league teams due to transfers at the start/during the season, but hadn't got as far as you :)
We did some preliminary experiments in this direction. Basically, we tried to do a regression on the score difference instead of using only binary outcomes. In our experiments it didn't improve the predictive accuracy - but there are many more things to try. It does feel a bit wasteful not to take score data into account.
One common way to represent the probability of outcomes in chess as a single number (e.g. in comparing opening lines) is to say "white gets 55% of the points", which aggregates wins and draws. (So for instance if white wins 50%, draws 25% and loses 25% of the games, it gets 0.5 * 1 + 0.25 * 0.5 + 0.25 * 0 = 62.5% of the points.)
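For what it's worth, that aggregation is a one-liner. A small sketch (the function name is just for illustration):

```python
def expected_points(p_win, p_draw, p_loss):
    """Share of the points: a win counts 1, a draw 0.5, a loss 0."""
    if abs(p_win + p_draw + p_loss - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return p_win * 1.0 + p_draw * 0.5 + p_loss * 0.0


# The example above: 50% wins, 25% draws, 25% losses.
share = expected_points(0.5, 0.25, 0.25)  # 0.625, i.e. 62.5% of the points
```

Note that this throws information away: 50/25/25 and 62.5/0/37.5 both give 62.5% of the points, so the draw probability cannot be recovered from the single number.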
One specific criticism regarding Portugal-Iceland: the main squad will not be exactly as shown there. E.g., Cristiano Ronaldo is listed as a substitute. Does it make a difference to your predictions?
You are correct about Portugal: that is not exactly the squad that will likely play against Iceland. We used the starting lineup of the last official game (I believe in this case it was Portugal vs England on June 2nd), which did not include e.g. Cristiano Ronaldo.
The actual lineup does impact the predictions - we will update the lineups before the start of each game, when the lineup is announced.
In a future version, we would like to make it possible for visitors to change the players in the team, and automatically update the prediction.
I think the same is also true for Ireland. The last friendly match was used to view mainly fringe players. I guess 7/8 of the likely starters are here as substitutes. Perhaps you should look at the last competitive game in the qualifiers.
Have you had any chance to compare your player strength values against other models to check how well they are aligned? You can find from Premier League's web site "Player Performance Index", e.g. for last season the top three players according to PL PPI were Harry Kane, Riyad Mahrez and Jamie Vardy.
Another obvious question: since you are not yet taking draws into account, have you checked your model against the odds on betting sites that offer "Draw-no-bet" bets?
I'm interested in how you model the strength of players: is that for the whole squad or the expected starting eleven? The prediction that stood out for me was Wales vs Slovakia, which FIFA rankings and betting odds would both suggest will be closer. I would love to hear more about the factors behind that particular prediction.
The Wales starting lineup is missing Gareth Bale (arguably in the top 5 best players in the world). His 'kickscore' is listed as higher than any of the other Welsh players', so I don't understand why the model has excluded him from the starting lineup. See also Portugal and Ronaldo (kickscore 100!).
Interesting but perhaps needs some tweaking to match the expected starting lineups.
Hi! I'm Victor, one of the researchers behind this project. We used the most recent lineups (up to yesterday) to do the predictions you see on the web page. We will update them with the latest friendlies (e.g., Portugal indeed) and before every game, as soon as we have the official lineups!
It may be worth taking competitive matches into account over friendly matches, as the purpose of competitive matches is to win at all cost. Friendly matches, on the other hand, tend to be used primarily for match fitness (especially leading up to tournaments), and for trialling new tactics before the competitive games begin.
It will be interesting to see how things change when you get the official lineups. As I'm sure you are aware, teams often rest their 'star' players in friendlies immediately prior to a tournament in order to minimize the risk of injury.