Recent Posts

Eurovision 2013: Final predictions

This post is part of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found here.


After Tuesday night’s disappointing result, I was somewhat worried about the changes to the model for this year, and considered reverting to last year’s model. However, this would be both intellectually dishonest and, more importantly, a lot of work, so I decided against it. In any event, the second semi-final threw up fewer surprises than the first, and the model did fairly admirably, predicting 8 out of 10 qualifiers. This is better than random by quite a bit, but not an improvement on last year’s model. 14 out of 20 overall is respectable, but nothing to write home about.

Let’s get this out of the way

We now have all of the information we’re going to get before the final itself takes place on Saturday night. That means it’s time to make some forecasts.

As Macedonia failed to qualify, this will be the first Eurovision final since 1985 not to feature any (former) Yugoslavian entries. It will definitely be interesting to see what this does to the voting, as all of these countries’ points become up for grabs. The former USSR, on the other hand, will be there in strength, with only Latvia letting the side down. This is probably not great news for any of the ex-Soviet states, but Russia, Ukraine and Azerbaijan can probably weather the storm.

Winning probabilities

Now that they’ve qualified, Azerbaijan have retaken the top spot from Russia, and even extended their lead. This is probably because they’re better at drawing votes from outside the former Soviet Union, whereas Russia will now be competing somewhat with the other eight former Soviet republics. Scandinavia is also very well represented in the final, which may be a blow to the chances for everybody’s favourite, Denmark (now at an implied win probability of 55% on Betfair).

Overall, the chance of an ex-Soviet winner is a very respectable 47%: with nine entries, they must have a decent song in there somewhere.

Are you as good as you think you are?

Of course, all of these probabilities are based on a very vague idea of what each song’s quality level is. The model hasn’t heard any of the songs, so there’s some very important information missing in these calculations.

It’s possibly more interesting to ask, rather than “who will win?”, “how good does a song have to be to win?”. If we have an answer to that, we can apply our own judgment to the songs which we hear tomorrow night. I’ve plotted the quality level a song has to reach before its country has a 50-50 shot at victory. As the model’s quality units are a little abstract, I’ve also included five recent winners (and one not-winner) for comparison.

Threshold qualities

For reference:

By this measure, Russia have the easiest run of things, but they’ll still need a better song than they’ve ever produced to reach this level. In fact, only five countries have produced songs which, if they entered them this year, would give them a better than 50% chance of winning. From the graph, we can see that Norway, Greece and Finland have done so - if they can replicate these performances, they’ll have an excellent chance of victory. Azerbaijan have also managed this, but interestingly not with their winning song - the model claims that their 2009 entry, “Always”, was considerably better.

The other country is maybe more interesting. The United Kingdom have produced two entries which were good enough to win, but failed to do so for various reasons. In 1998, Imaani came a very close second with “Where Are You?”, losing out by only six points. In 2009, Jade Ewen sang the Andrew Lloyd Webber/Diane Warren number “It’s My Time”, but lost out to the Alexander Rybak juggernaut. In a less strong year like this one, either of these songs would easily be in with a good chance of winning.

The UK are one of the most variable countries, and this isn’t something the model takes into account. In a good year, they can significantly outperform the model’s predictions. In a bad year, they can be among the worst countries out there. It’s up to you whether you think Bonnie Tyler is at the high or low end of that spectrum.

Early indications

Some countries vote very predictably, and other countries less so. Like last year, the voting order this year will be rigged for maximum excitement, so it’s likely that the more predictable countries will be got out of the way early in the voting, to increase the suspense. However, we can still look at which countries are likely to be good predictors of the final winner.

In this case, we’re looking at the “bellwether probability”, the chance that the entry that each country gives 12 points to goes on to win the contest. The more predictable countries tend to be very low on this score. Cyprus gives its 12 to Greece almost all the time, so like a stopped clock it’s only “right” when Greece wins. On the other hand, Hungary has no particular alignments, so its votes are more likely to match with those of Europe as a whole.
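
To make that concrete, here’s a small sketch of the calculation in Python, using made-up simulation output rather than the model’s real draws (the entry list and probabilities below are purely illustrative):

```python
import numpy as np

# Illustrative stand-ins for the real simulation output: winners[s] is the
# simulated winner of the final in simulation s, and twelves[c][s] is the
# entry that country c gave its 12 points to in that simulation.
rng = np.random.default_rng(0)
entries = ["Azerbaijan", "Russia", "Denmark", "Greece", "Ukraine"]
winners = rng.choice(entries, size=100_000)
twelves = {
    "Cyprus": rng.choice(entries, size=100_000, p=[0.02, 0.03, 0.02, 0.90, 0.03]),
    "Hungary": rng.choice(entries, size=100_000),
}

# Bellwether probability: how often a country's 12 points go to the winner.
for country, votes in twelves.items():
    print(country, np.mean(votes == winners))
```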

Bellwether probabilities

Last year, the best predictors were a diverse group of central European countries and outliers. This year we’ve added a new and intriguing group of bellwethers. As there are no former Yugoslavian entries in the final (nor their neighbour Albania), this normally completely predictable area has sprung wide open. If an entry can appeal to this area of the map, there are a lot of points available. If only everyone had known that beforehand.

Old friends

At the other end of the scale, there are the perennial relationships that lead people to claim that the voting is “rigged”. I’m reliably informed that people last year used these as the basis of a drinking game. I couldn’t possibly condone such behaviour, but I feel I should list them for completeness.

Actually, in the absence of the Balkans and Turkey, many of the longstanding relationships are left dangling. This could be one of the most unpredictable sets of voting in recent memory. However, some relationships remain strong:

  • Lithuania → Georgia (42%)
  • Ukraine → Azerbaijan (43%)
  • Albania → Greece (45%)
  • Belarus → Russia (48%)
  • Italy → Romania (49%)
  • France → Armenia (51%)
  • Armenia → Russia (56%)
  • Moldova → Romania (59%)
  • Romania → Moldova (74%)
  • Cyprus → Greece (90%)

As I said, these are less certain than last year, so adjust beverage sizes accordingly.

I don’t have time to read all that nerd stuff

To summarise, if this is a typical year, then Azerbaijan have the best shot at things. Russia have the easiest ride of things, but don’t have quite as consistent a record as the Azeris. The UK could probably win this thing if they bother to try this year, and avoid a Humperdinck-style disaster.

Things are a little unpredictable this year, because the qualifiers are a little bit unbalanced. You can still rely on Cyprus loving Greece to prove you haven’t slipped into an alternate timeline. He who controls the Balkans controls the universe.

For listeners in the UK, I’ll be doing an interview with BBC Radio Wales on Saturday night, around 7:50pm, as part of their Eurovision coverage, live from my (and Bonnie Tyler’s) local pub.


  1. This is, according to the model, the best song never to have won Eurovision. It actually came third in 2004, behind Ukraine and Serbia/Montenegro, both of which benefitted greatly from their regional voting blocs.


Eurovision 2013: Predictions for semi final 2

This post is part of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found here.


Didn’t we do well?

I think it would be fair to say that the results of the first semi-final were surprising all round. Western Europe and the former Soviet Union both had very strong nights, while the former Yugoslavia managed zero qualifications from four attempts. From the standpoint of the model predictions, the most surprising thing was that Serbia failed to qualify. It’s a little unclear to me why the model was predicting such a high probability for Serbia (they failed to qualify in 2009 as well), though I too would have expected votes from the other former Yugoslavs to push them through. Perhaps these bonds are weakening somewhat with time, or perhaps the slight changes in the voting system have made things more difficult for them. Or maybe nobody liked the song (I quite enjoyed it, but probably wouldn’t have voted for it).

The model managed 6 out of 10 correct qualifications, which doesn’t sound bad, or 2 out of 6 knockouts, which does. In most of these cases, the probabilities were fairly balanced, so we got Belarus (50%), Netherlands (51%) and Ireland (55%) instead of Slovenia (58%), Cyprus (58%) and Croatia (74%). It would have been more surprising if there weren’t a few switches like this. Losing Serbia (91%) for Belgium (46%) seems a bit more serious, but it’s still something that’s going to happen from time to time. Overall, 6 out of 10 is less good than we might have expected, but still not terrible - it’s definitely not a statistically significant failure1.

Another day, another prediction

So, given that we’ve lost one of our most favoured countries (Serbia) and have gained a few surprise qualifiers, how do the winning probabilities change? Obviously, everyone who’s qualified has gained a bit of a bonus. Moldova, particularly, have jumped from respectable also-ran to outside shot. The big winner is obviously Russia, with a strong ex-Soviet showing pushing them into the first place slot. Bookies’ favourites Denmark have leap-frogged hosts Sweden to become the top Scandinavian country. I’ve colour-coded the graph below by qualification status: green have qualified from the first semi-final, purple have yet to qualify, and grey are the automatic qualifiers.

Winning probabilities

The model is confident enough about the qualification prospects of Azerbaijan and Greece that it doesn’t really matter that they’re yet to qualify. Of the automatic qualifiers, it looks like a tussle between Sweden, the hosts and an established powerhouse, and Italy, recently returned from a decade in the Eurovision wilderness. Spain, suffering from the lack of their old friends Portugal and Andorra, will not be doing so well.

A lot will now depend on the qualifiers from the second semi-final. If Azerbaijan do qualify, as expected, then they’ll retake the top spot from Russia. If not, the model predicts a straight fight between Russia and Ukraine. However, it’s always possible that some dark horse candidate, particularly from the West, could swoop in and change things completely.

A better tomorrow

So let’s look at tomorrow night’s second semi-final and see who’s going to qualify. This is a larger semi-final (17 contestants rather than 16), but there are still only 10 qualification slots.

Qualification probabilities

After the first semi-final I feel rather silly claiming anything is certain, but Greece, Armenia and Azerbaijan look like they’re safer than most. After that there’s a fairly smooth decline in probability: Albania and Romania look pretty good, Norway, Israel and Iceland maybe a little less likely, and Malta and Georgia round things out to ten. According to the model, Switzerland have only a 19% chance of qualification, but given the strong performance from Western Europe in the first semi-final, it would be silly to rule them out completely. The former Yugoslavia’s only hope rests with Macedonia, and having heard the song, I don’t think they’ll be celebrating.

Note that Azerbaijan have a relatively tough semi-final compared to how the model predicts they’ll do in the final. This semi-final is fairly low on former Soviet republics, and one of the four that are here is their old enemy Armenia. Assuming they qualify, they’ll do a lot better when they can get votes from Russia, Ukraine, Moldova, etc.

So as an overall prediction, we have:

  • Greece (90%)
  • Armenia (87%)
  • Azerbaijan (84%)
  • Albania (77%)
  • Romania (72%)
  • Norway (68%)
  • Israel (67%)
  • Iceland (66%)
  • Malta (65%)
  • Georgia (60%)

Let’s see how wrong I can be this time, and I leave you with my personal favourite from Tuesday night, the Montenegrin space program.


Next post in the series


  1. p ≈ 0.17


Eurovision 2013: First predictions

This post is part of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found here.


Previously on the Eurovision Song Contest

Last year around this time, I wrote a series of blog posts outlining a Bayesian predictive model for the Eurovision Song Contest, and making a set of predictions about last year’s contest. Apart from a few hiccups relating to Malta, the model was a fairly qualified success. This year, by “popular” demand, I’ve revisited the model and brought it up to date for 2013, ahead of Saturday’s showdown in Malmö.

For those just coming in, I’m going to give a quick recap of the model that I used last year. If you’re already familiar, feel free to skip ahead to the next section, where I’ll talk about what’s new for this year. If you’re looking for a more detailed look at last year’s contest, take a peek at the series of posts from the start.

Essentially, we can look at people’s voting preferences in the Eurovision Song Contest as composed of two components: song quality, and a “friendship” score, which takes into account how much the voting country likes or dislikes1 the country being voted on. If we want to know whether a voter V will rank country A or country B higher, we add up the song quality and the friendship score in each case, and subtract the two. Then we can take this difference, feed it through a logistic curve and use the result as a probability.
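
In code, the core of that comparison is only a few lines. Here’s a Python sketch (not the model’s actual implementation, and the quality and friendship numbers are invented purely for illustration):

```python
import math

def prefer_probability(quality_a, friend_v_a, quality_b, friend_v_b):
    """Probability that voter V ranks country A above country B: each side's
    score is song quality plus V's friendship towards that country, and the
    difference goes through a logistic curve."""
    diff = (quality_a + friend_v_a) - (quality_b + friend_v_b)
    return 1.0 / (1.0 + math.exp(-diff))

# A slightly better song from a country the voter is friendly towards:
print(prefer_probability(0.5, 1.2, 0.3, -0.1))  # ~0.82
```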

I’ve taken voting results from both the Eurovision finals (going back to the introduction of televoting in 1998) and the semi-finals (going back to their introduction in 2004). I’ve then used a Markov Chain Monte Carlo sampler2 to calculate the song qualities and friendship scores, assuming that they’re both normally distributed.

Once I’ve got the parameters, it’s relatively straightforward to run a simulation of this year’s contest, including the semifinals and all the voting procedures. Last year I ran 10,000 simulations like this, and looked at both who the likely qualifiers from the semifinals were, and the overall winner. The model managed fifteen out of twenty of the qualifiers, as well as the eventual winner. Although this level of success, particularly predicting the winner, was probably mostly luck, this year I’d like to do the same.
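
For the curious, the shape of the simulation looks roughly like this. It’s a heavily simplified Python sketch with made-up numbers (the real thing samples full posteriors and uses the pairwise logistic model above), but it shows how win probabilities fall out of repeated simulated finals:

```python
import numpy as np

rng = np.random.default_rng(42)
POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]

def simulate_final(countries, mean_quality, friendship, n_sims=10_000):
    """Heavily simplified contest simulation: each simulation draws song
    qualities, lets every country rank the field by quality + friendship +
    noise, and hands out Eurovision points to its top ten. Returns estimated
    win probabilities."""
    n = len(countries)
    wins = np.zeros(n)
    for _ in range(n_sims):
        quality = rng.normal(mean_quality, 1.0)
        scores = np.zeros(n)
        for voter in range(n):
            appeal = quality + friendship[voter] + rng.gumbel(size=n)
            order = np.argsort(-appeal)
            order = order[order != voter][:10]      # can't vote for yourself
            scores[order] += POINTS[:len(order)]
        wins[np.argmax(scores)] += 1
    return dict(zip(countries, wins / n_sims))

# Toy example with five countries and made-up parameters.
countries = ["AZ", "RU", "DK", "SE", "GR"]
mean_quality = np.array([1.0, 0.9, 0.6, 0.7, 0.5])
friendship = rng.normal(0, 0.5, size=(5, 5))
print(simulate_final(countries, mean_quality, friendship, n_sims=2_000))
```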

What’s new?

The biggest change to the contest since last year is that four countries have pulled out (Bosnia-Herzegovina, Portugal, Slovakia, Turkey), and one has returned (welcome back, Armenia!). This obviously has effects on the voting dynamics, both directly (e.g. all the Balkan countries no longer get votes from Bosnia-Herzegovina) and indirectly (e.g. Germany will have an extra voting slot free, rather than spending it on Turkey). Overall, we should expect a slight rebalancing of the votes, although the changes are fairly evenly spread across Europe, so it’s unclear what the overall effect should be.

In terms of the model, I’ve added two terms to try to increase the accuracy. The first of these is a term which sets the average song quality for a given country. If we look at countries’ past record in Eurovision, some stick out as more consistently successful (or unsuccessful) than others. For example, Azerbaijan have never finished outside the top ten, and only once outside the top five. On the other hand, Switzerland seem to have trouble even qualifying for the final, and on one occasion even scored “nul points” in a semi-final, which is quite an achievement, given the competition. This leads us to an idea that some countries might, through greater enthusiasm for the contest, a larger talent pool, or some quirk of their selection process, just be plain better at producing Eurovision entries than others.

The term I’ve added simply affects the mean song quality for the country: all countries still have the same variance, and they all still have the potential to produce songs of any quality. However, on average, some countries do end up better than others. The effect here is smaller than the general variation in song quality, but still relatively large overall: it ends up explaining about a third to a half of the variation in song quality. I’ve plotted the value of this term below for all the countries which have competed since 1998. This year’s entrants are in green, and the bars show one standard deviation of song quality: roughly speaking the song quality should be inside the bars about two-thirds of the time.

Average song qualities

Countries near the top of the list (e.g. Azerbaijan, Russia) tend to be those which put a lot of resources into the contest, and see it as a way of promoting their culture throughout Europe. We also see high places for countries such as Sweden and Italy which have very well-developed national song competitions, which gives a strong talent pool to draw on. Near the bottom of the list we see a lot of smaller countries (Andorra, Monaco, San Marino) which simply don’t have the resources or the talent pool to compete successfully.

It’s also fairly notable that (barring Turkey and Bosnia-Herzegovina), the countries which are not competing this year are largely those which typically produce low-quality songs. It’s hard to tell which way the relationship works here. It’s possible that these countries have low enthusiasm for Eurovision, and thus have small talent pools to pick from, and pulling out is less of a big deal. It’s also possible that a string of poor performances could lead to a country becoming disillusioned with the competition.

The second change I’ve made to the model is to introduce a term accounting for the gender of the performers3. There’s a definite effect there: all-female entries are slightly better than all-male entries, which are a lot better than mixed entries. However, the overall magnitude is fairly small, around 0.3 quality units, roughly equal to the quality bonus a song gets for being from Malta.

Qualified success

This year, I’ve run 100,000 simulations of the full contest. Looking just at the first semi-final for now, there are 10 qualification places available, from a field of 16 countries. From a completely naïve standpoint, this means each country has a baseline qualification probability of 62.5%. In reality though, some countries are more likely than others. For the first semi-final, this is what the model says.

Qualification probabilities

In general, this year’s model is more certain about its predictions than last year’s; time will tell if it’s any more accurate. Anyway, as I think most people would predict, Russia and Serbia are dead certs for qualification. Montenegro, on the other hand, have a mountain to climb. There aren’t enough Montenegrins for the diaspora to have any significant effect on most countries’ voting patterns (although Serbia, Croatia and Slovenia will probably give them a few).

The interesting stuff is in the middle of the table. Ireland benefit from having the UK and Denmark voting in their semifinal, while Slovenia will have a boost from having three other Balkan countries in the mix. I’m not sure whose idea it was to put Netherlands and Belgium in the same semifinal, but it’ll be interesting to see if it gives either of them a big enough boost.

Overall, I think Russia (94%), Serbia (91%) and Ukraine (86%) are safe. Denmark (76%), Croatia (74%), Estonia (69%) and Moldova (66%) are also pretty good bets. Beyond that, it looks like Lithuania (60%), Cyprus (58%) and Slovenia (58%) but I don’t think it’s safe to rule out Ireland (55%).

Back in Baku?

Looking on to the final then, once again the model is more confident than last year (although possibly no more accurate). The top predictions are similar, but not identical, to the list of countries with highest average song quality. Quirks of the voting have boosted Serbia’s relative chances, for example. Overall though, the lists are very similar.

Winning probabilities

Overall, we’d be foolish not to plump for either Azerbaijan or Russia as winner. The bookies, on the other hand, are heavily pushing the Danish entry. At time of writing, the available odds on Betfair are around 2.5, implying a win probability of 40%. I’d take this with a note of caution though. While the wisdom of the crowds is often right, Eurovision betting has something of an echo chamber effect, with lots of people piling on the perceived favourite, and driving the odds downwards. Two years ago, the favourite was France which finished in an ignominious 15th place, losing out to… Azerbaijan.


Next post in the series


  1. Let’s ignore for now what it means for a country to “like” another country. Countries that like each other vote for each other.

  2. Last year I used JAGS, this year I’ve used Stan.

  3. Gender is performance, and Eurovision doubly so. In each case I’ve assigned people the gender that they appear to be performing as. For the avoidance of doubt, Dustin the Turkey is male, and Verka Serduchka is female.


EMI Music Hackathon: How I Did It

Last weekend, Kaggle and Data Science London ran a second hackathon, this time focused around the EMI One Million Interview Dataset, a large database of musical preferences. I took third place globally in this competition, and this is an attempt to explain how my model worked. The code is available on GitHub.

For this competition I took a “blitz” approach. Rather than focusing on one model, trying to tweak all the performance I could out of it, I threw together a bunch of simple models, and blended their results. In the end, I combined ten separate predictions for my final submission.

Because I knew that I was going to be blending the results, for each model I retained a set of cross-validation predictions for the training set. These were used as input to the blending process, as well as to give me an idea of how well each model was performing, without having to use up my submission quota. In general, I used ten-fold cross-validation, but for models which used random forests, I simply used the out-of-bag predictions for each data point.
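
The mechanics of this are simple enough. Here’s a sketch of the out-of-fold idea, assuming scikit-learn-style models (the function and variable names are mine, not from the competition code):

```python
import numpy as np
from sklearn.model_selection import KFold

def out_of_fold_predictions(model, X, y, n_splits=10, seed=0):
    """Fit `model` on nine folds and predict the tenth, so that every
    training row gets a prediction from a model that never saw it. One
    such column per base model becomes the input to the final blend."""
    oof = np.zeros(len(y))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        oof[test_idx] = model.predict(X[test_idx])
    return oof

# e.g. oof_rf = out_of_fold_predictions(RandomForestRegressor(), X, y)
# (for random forest models, the out-of-bag predictions are used instead)
```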

Preprocessing

As given, the data consists of a table of (user, artist, track, rating) quadruples, along with tables of data about users (demographic information, etc), and about user-artist pairs (descriptive words about each artist, whether the user owned any of their music, etc). These secondary tables were quite messy, with lots of missing data and a lack of standard values.

I generally don’t enjoy data cleaning, so I did one quick pass through this data to tidy it up a little, then used it as-is for all the models. I merged the two “Good lyrics” columns, which differed only in capitalisation. For the “OWN_ARTIST_MUSIC” column, I collapsed the multiple encodings for “Don’t know”. Similarly, I collapsed several of the levels of the “HEARD_OF” column. The responses for “LIST_OWN” and “LIST_BACK” needed to be converted to numbers, rather than the mish-mash of numeric and text values which were there to begin with.

To fill in missing values, I used the median value for numeric columns, and the most common value for categorical columns. I then joined these tables with the training data.
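
A minimal version of that imputation, assuming the tables are pandas DataFrames (the actual cleaning code differs in the details):

```python
import pandas as pd

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Median for numeric columns, most common value for everything else."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out
```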

In most cases, the results were aided by first removing “global effects”. I subtracted the overall mean rating, and then estimated effects for users and tracks, with Bayesian priors which reduced these effects towards zero for poorly sampled users and tracks. These effects were then added back in after the model prediction had been made.
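
A rough sketch of that shrinkage, assuming the ratings sit in a pandas DataFrame with columns user, track and rating; the column names and the prior strength k are illustrative choices, not the actual values used:

```python
import pandas as pd

def shrunken_effect(residual: pd.Series, group: pd.Series, k: float = 25.0) -> pd.Series:
    """Per-group mean of the residual, pulled towards zero when a group has
    few ratings: effect = sum(residual) / (count + k). The constant k plays
    the role of the prior strength."""
    stats = residual.groupby(group).agg(["sum", "count"])
    effect = stats["sum"] / (stats["count"] + k)
    return group.map(effect)

def remove_global_effects(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["residual"] = out["rating"] - out["rating"].mean()
    out["user_effect"] = shrunken_effect(out["residual"], out["user"])
    out["residual"] -= out["user_effect"]
    out["track_effect"] = shrunken_effect(out["residual"], out["track"])
    out["residual"] -= out["track_effect"]
    return out  # models fit to `residual`; effects added back at prediction time
```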

Chuck it all in a Random Forest/GBM/Linear Regression

The first thing I tried was an attempt to mimic Ben Hamner’s success in the last hackathon, by throwing everything into a random forest and hoping for the best. It turned out that while this was a pretty good approach, it was also extremely slow. I was only able to run a limited number of trees, with a reduced sample size, so the results probably weren’t as good as they could have been. I also originally ran this without removing global effects, and didn’t have time to go back and do it again.

As variations on the same theme, I tried a GBM and a simple linear regression. These were less successful, but much faster.
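
In outline, the “chuck it all in” family of models is just the following, assuming scikit-learn estimators and a numeric (one-hot encoded) feature matrix; the hyperparameters are placeholders rather than the ones actually used:

```python
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

# X: the fully joined (and one-hot encoded) feature table, y: the ratings.
models = {
    "rf": RandomForestRegressor(n_estimators=200, n_jobs=-1),
    "gbm": GradientBoostingRegressor(),
    "lm": LinearRegression(),
}
# for name, model in models.items():
#     oof[name] = out_of_fold_predictions(model, X, y)
```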

Partitioning the data

While a simple linear regression seemed unlikely to be successful, I thought it might be reasonable to try a separate regression for each artist. The results were surprisingly good.

Given that the random forest was so successful, and that the per-artist linear regression was quite good, it seemed like a good idea to try a per-artist random forest approach. This was also good, but not as good as I’d hoped.

Weirdly, the per-artist approach was much more successful than a per-track approach. A linear model on a per-track basis was much worse than the per-artist model – so bad that I didn’t even try a random forest.
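
The per-artist idea looks roughly like this (a Python sketch of the linear-regression variant, with illustrative column names; the per-artist random forest just swaps in a different model class):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_per_artist(train: pd.DataFrame, feature_cols: list) -> dict:
    """One linear regression per artist. At prediction time, look the artist
    up in this dict; unseen artists would need a global fallback model."""
    models = {}
    for artist, grp in train.groupby("artist"):
        model = LinearRegression()
        model.fit(grp[feature_cols], grp["rating"])
        models[artist] = model
    return models
```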

SVD

The models I’ve used up to now have been the kind of tools that one would use on a generic machine learning problem. However, the kind of user-item-rating data we’re given here leads to what is called a “collaborative filtering” problem, for which there are a number of specialised techniques available. The most successful approaches in the past have come from matrix factorisation models, the simplest of which is SVD (singular value decomposition).

The particular form of SVD I used is one which was developed for the Netflix Prize competition by Simon Funk, who wrote an excellent blog post about how it works, and its implementation. Oddly, there doesn’t seem to be a standard implementation available in R, so I wrote my own in C and interfaced with that.
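
In outline, Funk-style SVD is a stochastic gradient descent over the individual ratings. Here’s a compact Python sketch (the actual implementation was written in C and called from R; the hyperparameters below are typical illustrative values), assuming user and track ids are dense integers starting at zero:

```python
import numpy as np

def funk_svd(user_ids, item_ids, ratings, n_factors=20, n_epochs=30,
             lr=0.01, reg=0.02, seed=0):
    """Funk-style matrix factorisation by stochastic gradient descent:
    rating ~ p_user . q_item, learned one rating at a time."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (user_ids.max() + 1, n_factors))
    Q = rng.normal(0, 0.1, (item_ids.max() + 1, n_factors))
    for _ in range(n_epochs):
        for u, i, r in zip(user_ids, item_ids, ratings):
            pu, qi = P[u].copy(), Q[i]
            err = r - pu @ qi
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
    return P, Q

# The predicted rating for (u, i) is then P[u] @ Q[i], on top of the
# global effects that were removed earlier.
```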

I ended up using results from two separate SVD runs: a quick early one, and a longer one with more features. The results were not fantastic, but they contributed greatly to the blend.

Nearest neighbour

Another common approach to collaborative filtering problems is nearest-neighbour filtering. I calculated a distance measure between tracks, based on the correlation between ratings given by people who rated both tracks. To predict new ratings, I looked through the user’s rating history and calculated a weighted average of the “most similar” tracks this user had rated. The results were fairly disappointing — most users didn’t have enough ratings to make this approach viable.
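
Roughly, the approach looks like this in Python (a sketch with illustrative column names and thresholds; the actual distance measure and fallback logic differ in detail):

```python
import numpy as np
import pandas as pd

def item_similarity(ratings: pd.DataFrame) -> pd.DataFrame:
    """Correlation between tracks, computed over users who rated both.
    `ratings` has columns user, track, rating."""
    matrix = ratings.pivot_table(index="user", columns="track", values="rating")
    return matrix.corr(min_periods=5)   # require at least 5 common raters

def predict_knn(user_history: pd.Series, track, sim: pd.DataFrame, k: int = 10):
    """Weighted average of the user's k most similar rated tracks."""
    sims = sim[track].reindex(user_history.index).dropna()
    sims = sims[sims > 0].nlargest(k)
    if sims.empty:
        return np.nan                   # fall back to the global effects
    return float((user_history[sims.index] * sims).sum() / sims.sum())
```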

Demographics

As a final attempt at adding some variety to the blend, I tried an approach based purely on the demographic info given. I divided the users into five age quantiles, then for each track calculated the average score for each age quantile and gender. These turned out to be pretty terrible predictors of the actual ratings.
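
The whole model fits in a few lines; here’s a sketch with illustrative column names:

```python
import pandas as pd

def demographic_means(train: pd.DataFrame) -> pd.Series:
    """Average rating per (track, age quantile, gender) cell."""
    df = train.copy()
    df["age_band"] = pd.qcut(df["age"], 5, labels=False, duplicates="drop")
    return df.groupby(["track", "age_band", "gender"])["rating"].mean()
```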

Blending

For the final blend I used a neural network with five hidden units, and skip-layer connections. Given that there were only ten inputs and over a hundred thousand training examples, overfitting was never a huge concern. However, a small weight decay did help with convergence.
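
As a rough analogue of that setup (a Python sketch only: scikit-learn’s MLPRegressor has no skip-layer connections, unlike, say, R’s nnet with skip = TRUE, and alpha stands in for the small weight decay):

```python
from sklearn.neural_network import MLPRegressor

# X_blend: one column per base model's cross-validated predictions (ten
# columns here), y: the true ratings.
blender = MLPRegressor(hidden_layer_sizes=(5,), alpha=0.01, max_iter=2000)
# blender.fit(X_blend, y)
# final_predictions = blender.predict(X_blend_test)
```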

Code, etc

The code is now available on GitHub. This is a slightly cleaned-up version of the actual code I wrote during the competition – most of the results I actually generated were done with a lot of tweaking of hard-coded parameters and so on. However, the overall results should be pretty similar to what I submitted. As always, questions and comments are welcome, but suggestions for improvement are probably not going to be followed up.


Eurovision visualisation

This is part five of a series of posts describing a predictive model for the Eurovision Song Contest. The full set of posts can be found here.

A brief diversion

I still haven’t gotten around to doing a full assessment of the Eurovision model’s performance on the night, but I did spend an afternoon messing about in D3.js, and I managed to come up with the network graph you see below. This is based on updated values of the friendship matrix, including the 2012 data, so it’s not identical to the graph I showed originally, but it’s quite similar. You can drag the slider underneath to change the threshold from 2 (approximately one standard deviation above average) to 6 (approximately three standard deviations above average).

Countries are represented by ISO 3166-1 alpha-2 codes, but you knew that already. This probably needs a modern browser, but I don’t have any non-modern browsers to test it in, so who knows?