This post is part of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found here.
For the last two years, I’ve been publishing the output of a statistical model that predicts the results of the Eurovision Song Contest. This year’s final takes place on Saturday in an abandoned shipyard in Copenhagen, so it’s time for some more predictions. I’ve made some small changes to the model this year. They have had a huge effect on the results, which I think should now be a lot more accurate.
What’s new in model-land?
The big change in this year’s model is that I’ve incorporated data from betting markets, specifically the Betfair Eurovision winner market. In previous years, the model has had no real information about song quality to go on, apart from what we know from previous contests. This year, I took a look at the relationship between a song’s betting odds and the quality score estimated by the model.
Using data from 2004-2010 [1], I’ve plotted the betting odds available in the week before each contest against each song’s overall quality score, as estimated by the model after the fact. There’s a pretty clear relationship: better songs get shorter odds, as you might expect. It’s not perfect, but it’s definitely better than what we had before.
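One simple way to put a number on a relationship like this is an ordinary least-squares fit of quality score against the log of the decimal odds. The functional form the model actually uses isn't stated here, so the sketch below assumes decimal odds and a straight line in log-odds; the helper name `fit_quality_vs_odds` and the toy numbers are hypothetical and are not real contest data.

```python
import math
from typing import Sequence, Tuple


def fit_quality_vs_odds(odds: Sequence[float], quality: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of quality = intercept + slope * log(odds).

    Hypothetical helper. Assumes decimal odds (4.0 means a 3-to-1 shot).
    """
    if len(odds) != len(quality) or len(odds) < 2:
        raise ValueError("need at least two paired observations")
    x = [math.log(o) for o in odds]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(quality) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("odds must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, quality))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy numbers for illustration only: shorter odds paired with higher quality.
toy_odds = [3.0, 6.0, 12.0, 40.0, 100.0]
toy_quality = [2.1, 1.4, 0.9, -0.2, -0.8]
intercept, slope = fit_quality_vs_odds(toy_odds, toy_quality)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # slope < 0: longer odds, lower quality
```

Working on the log scale stops a handful of 200-to-1 outsiders from dominating the fit; a negative slope is the "better songs get shorter odds" pattern.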
Interestingly, there doesn’t seem to be as good a relationship between the betting odds and actual performance in the contest. It seems that gamblers are better at taking into account the quality of a song than the complicated voting patterns which exist. This is good for us, because it means that we can use the betting odds as a proxy for song quality without worrying about double-counting voting relationships.
I’ve also removed the effect of performer gender. After a bit of experimentation, it seems that this wasn’t helping much, and may even have been making things worse. It was also a bit of a pain to classify objectively, so I’ve dropped it. There’s not much effect on the final result.
On a technical level, I’ve reimplemented the model in Julia as a learning exercise. In general, I’m pretty impressed with Julia as a language. There are some mild annoyances with the type system, but I expect that’s more a result of my slightly dodgy beginner’s code than anything to do with the language itself. Performance is pretty fantastic, and all that’s really missing is the mature package ecosystem that more established languages have.
Enough with the nerding, what about the contest?
A few more countries have dropped out (Serbia, Croatia, Bulgaria, Cyprus), mostly citing economic worries. This leaves a bit of a hole in the Balkan region, which is historically one of the stronger voting blocs. It’s not immediately clear to me what effect that will have, but it’s probably good news for the other large blocs in Scandinavia and the former Soviet Union.
Returning to the contest are Portugal and Poland. Portugal will probably give 12 points to Spain, but this isn’t likely to affect the outcome of the contest by much. Poland are more of a wildcard, but it’s doubtful they’ll have a huge effect either. Overall, there has probably been a slight rebalancing of the contest from east to west.
From a geopolitical perspective, the obvious question is what effect recent events in Ukraine will have. The EBU have ruled that televotes from the Crimea will be treated as Ukrainian votes, as Ukrainian telecoms operators are still active in the region. As Ukraine usually gives Russia a fairly high score anyway, this is unlikely to skew things greatly.
Generally speaking, it’s quite rare for international events to have a big effect on voting in the contest, but it’s conceivable that there could be a small sympathy boost for Ukraine. Given that Ukraine is in semi-serious contention anyway, a small increase in votes could be all they need. It’s unlikely that there will be much negative backlash against Russia, for the simple reason that it’s impossible to cast votes against a country in Eurovision.
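The reason a backlash can't subtract anything is visible in the scoring rule itself: each voting country hands out 12, 10 and 8 down to 1 points to its top ten songs, and everything else simply gets zero. The sketch below assumes each voter's combined ranking is already known; the helper names and the country labels "A" to "D" are hypothetical placeholders, not real votes.

```python
from typing import Dict, List

# Points a voting country awards to its top ten songs, best first.
EUROVISION_POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]


def award_points(ranking: List[str]) -> Dict[str, int]:
    """Map one voter's ranking (best first) to points; outside the top ten gets 0."""
    awarded = {country: 0 for country in ranking}
    for country, pts in zip(ranking, EUROVISION_POINTS):
        awarded[country] = pts
    return awarded


def tally(rankings: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum points over all voters. Hypothetical helper; no negative points exist."""
    totals: Dict[str, int] = {}
    for voter, ranking in rankings.items():
        if voter in ranking:
            raise ValueError(f"{voter} cannot vote for itself")
        for country, pts in award_points(ranking).items():
            totals[country] = totals.get(country, 0) + pts
    return totals


# Made-up rankings: voter "C" ranking "A" last still gives "A" points, never a penalty.
example = {"A": ["B", "C", "D"], "C": ["D", "B", "A"]}
print(tally(example))  # {'B': 22, 'C': 10, 'D': 20, 'A': 8}
```

The worst a voter can do to a country is leave it out of its top ten, which is why sympathy can raise Ukraine's total while hostility can only fail to raise Russia's.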
Here are the results of the Bayesian jury
Anyway, what are the predictions? The betting public seem to have chosen Armenia’s Aram MP3 as their favourite, but the model likes Sweden’s Sanna Nielsen a little bit more. As I said before, Ukraine are in with an outside shot, and the probabilities drop off very quickly after that.
Compared to previous years, the model is showing a very high degree of certainty, but this is largely due to having incorporated the Betfair data. In reality, this is a year with no stand-out entries, so it’s probably more open than usual to a strong performance on the night.
If we compare the model probabilities with the implied probabilities from the Betfair data alone, some interesting patterns emerge. Betfair gives the UK a much higher win probability than the model does, which might be explained by Betfair’s largely UK-based customer base. Similarly, the chances of my personal favourite, Austria’s Conchita Wurst, might be overestimated by some in the west.
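Turning odds into "implied probabilities" is a short calculation: the raw probability of each outcome is 1 / (decimal odds), and because these rarely sum to exactly 1 across a market, they're rescaled so they do. The sketch below uses proportional rescaling, which is only one of several ways to strip out the margin and is an assumption here; the function name and the odds for entries "A" to "D" are hypothetical.

```python
from typing import Dict


def implied_probabilities(decimal_odds: Dict[str, float]) -> Dict[str, float]:
    """Convert decimal odds to win probabilities that sum to 1.

    Raw implied probability is 1 / odds. The raw values are then divided by
    their total (proportional normalisation, an assumed method).
    """
    for name, odds in decimal_odds.items():
        if odds <= 1.0:
            raise ValueError(f"decimal odds for {name} must be greater than 1")
    raw = {name: 1.0 / odds for name, odds in decimal_odds.items()}
    total = sum(raw.values())  # the "book"; above 1 means there is a margin
    return {name: p / total for name, p in raw.items()}


# Hypothetical market: raw probabilities 0.5 + 0.25 + 0.2 + 0.1 = 1.05.
example_odds = {"A": 2.0, "B": 4.0, "C": 5.0, "D": 10.0}
for name, p in implied_probabilities(example_odds).items():
    print(f"{name}: {p:.1%}")
```

Comparing these numbers entry by entry with the model's own win probabilities is what surfaces gaps like the one for the UK.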
Interestingly, the three countries the model projects as probable winners are all competing in the first semi-final on Tuesday night, along with Azerbaijan, Russia and Hungary, all of which are also highly rated. The only entrants in the second semi-final with more than 1% chance of winning are Norway and Greece, both of which clock in around 3%. The draw for the semi-finals is largely designed to prevent regional bloc voting, and doesn’t do much to prevent unbalanced draws like this one.
Such a strong field makes qualification from the first semi-final fairly predictable. The six countries I’ve mentioned so far all have a more than 95% chance of qualifying. Of the others, the Netherlands and Belgium should be back in the final, repeating last year’s success after a long absence. Moldova and Estonia are the likeliest last two qualifiers, but Iceland have an outside shot. San Marino, having sent the same performer every year they’ve entered, are very likely to get the same result: immediate elimination. Sorry, San Marino.
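A figure like "more than 95% chance of qualifying" is naturally read as a frequency over many simulated contests: the share of simulated semi-finals in which a country lands in the top ten. The sketch below assumes the model can produce one score per entry per simulation; `toy_simulations` is a hypothetical stand-in (quality plus Gaussian noise) for the real model output, and the quality values are made up.

```python
import random
from collections import Counter
from typing import Dict, List


def toy_simulations(quality: Dict[str, float], n_sims: int, seed: int = 1) -> List[Dict[str, float]]:
    """Hypothetical stand-in for model output: score = quality + N(0, 1) noise."""
    rng = random.Random(seed)
    return [{c: q + rng.gauss(0.0, 1.0) for c, q in quality.items()} for _ in range(n_sims)]


def qualification_probabilities(
    simulated_scores: List[Dict[str, float]], n_qualifiers: int = 10
) -> Dict[str, float]:
    """Fraction of simulations in which each entry finishes in the top n_qualifiers.

    Ties keep dictionary order (sorted() is stable), which is fine for
    continuous simulated scores.
    """
    if not simulated_scores:
        raise ValueError("need at least one simulation")
    counts: Counter = Counter()
    for scores in simulated_scores:
        ranked = sorted(scores, key=scores.get, reverse=True)
        counts.update(ranked[:n_qualifiers])
    n = len(simulated_scores)
    return {c: counts[c] / n for c in simulated_scores[0]}


# Made-up quality scores for a 16-entry semi-final, best first.
quality = {f"Entry {i}": 3.0 - 0.25 * i for i in range(1, 17)}
probs = qualification_probabilities(toy_simulations(quality, n_sims=5000))
for name, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}: {p:.1%}")
```

Running the same function on simulated final scores with `n_qualifiers=1` gives win probabilities in the same way.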
[1] This is the only data I could get hold of. If anyone reading has more recent data, please get in touch.