Who is the best soccer player?

In this article we use data science to answer the question: who is the best soccer player? We apply TrueSkill, the ranking algorithm developed by Microsoft Research and used in the Xbox matchmaking system, to data from the UEFA Champions League.
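To give a feel for how a TrueSkill-style update works, here is a minimal sketch of the two-player, no-draw case. It is not the code used in the article: the function name `rate_1vs1`, the starting values (mu = 25, sigma = 25/3, beta = 25/6) and the omission of draws, teams and the dynamics term are all simplifying assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Commonly quoted starting values, assumed here for illustration only.
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = MU0 / 6

_STD_NORMAL = NormalDist()


def rate_1vs1(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and the loser.

    Simplified two-player update: no draws, no teams, no dynamics term.
    """
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    # Total uncertainty about the observed performance difference.
    c = sqrt(2 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    # v shifts the means, w shrinks the variances.
    v = _STD_NORMAL.pdf(t) / _STD_NORMAL.cdf(t)
    w = v * (v + t)
    new_winner = (mu_w + sigma_w**2 / c * v,
                  sigma_w * sqrt(1 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v,
                 sigma_l * sqrt(1 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


if __name__ == "__main__":
    a, b = rate_1vs1((MU0, SIGMA0), (MU0, SIGMA0))
    print(a, b)
```

An upset (a low-rated player beating a high-rated one) moves both means more than an expected result does, and every observed match reduces both players' uncertainty.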

read more


I like to go to sleep with the radio on, but I find it extremely annoying when a commercial wakes me up. Fortunately, machine learning can help us get a better rest: a trained algorithm can detect and remove those ads. I have now published the software so that everybody can use it. Have a very good night! [work in progress]

read more

FFracer: from Bergen to Oslo

FFracer is an independent racing video game. It was created using footage from the Bergensbanen documentary, which shows every minute of the scenic train ride from Bergen on the Norwegian west coast, across the mountains, to the capital, Oslo.

The concept was to take a popular documentary of Norway's picturesque landscape and let people experience it interactively. Players coast through Scandinavian forests and along snow-clad mountainsides, glimpsing colorful cabins, with music playing and the speed steadily increasing.

The game includes 17 tracks that correspond to the 17 train stops between Bergen and Oslo. The player must complete each track before the time runs out by avoiding crashes and going at full speed. There is no maximum speed -- the train accelerates as long as the player makes turns correctly.

read more

World Battleground, 1000 years of war in 5 minutes

This animation shows all the important battles that took place over the last ten centuries. The sizes of the explosions and labels are proportional to the number of casualties. The music is "Ride of the Valkyries" by Richard Wagner. The data comes from the Wikipedia article List of Battles. It has over 3.5 million views as of July 4, 2015.

read more


btcat is a command-line tool that downloads a file using the BitTorrent protocol and writes its contents to standard output. btcat streams the data sequentially, which allows the file to be processed in a pipeline before the whole transfer has completed. It is possible, for instance, to play a media file while it is still downloading.

read more

The Mopping Puzzle

The Mopping Puzzle is a game written in JavaScript that runs in the browser. The objective is to clean the floor without stepping on the parts that are already clean. Sounds easy? Give it a try!

read more

Designing new borders to optimize happiness in the population of a country

In some sovereign states there is a high degree of heterogeneity within the population across different territories. Election results are a great source for observing this phenomenon empirically. In this article we analyse the results of the 2011 Spanish general election to detect the different communities within the Spanish state. We then describe a model in which we measure the global happiness of the population using the election results. The final section of the article shows an algorithm that is able to improve the global happiness by more than 12% by creating a few new borders.

read more

Elo ranking for the Spanish soccer league

This is an analysis of the strength of the soccer teams in the Spanish league. We use the same technique that is used in chess to rate grandmasters. (in Spanish)
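For readers unfamiliar with the chess technique, the standard Elo update can be sketched in a few lines. This is a generic illustration, not the code behind the article: the function names and the K-factor of 20 are assumptions, and the article may use a different K or extra terms such as home advantage.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B (between 0 and 1) under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=20):
    """Return the new ratings after one match.

    score_a is 1.0 if A wins, 0.5 for a draw and 0.0 if A loses.
    k (the K-factor) controls how fast ratings move; 20 is an assumed value.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the sum of the two ratings is preserved.
    new_b = rating_b - k * (score_a - exp_a)
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated teams; the first one wins.
    print(update_elo(1500, 1500, 1.0))  # (1510.0, 1490.0)
```

A win against a much stronger team earns more points than a win against a weaker one, because the expected score was lower to begin with.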

read more

Language similarity measure

In this study we calculate the similarity between any two given languages on a scale from 0 to 100. We show the similarity matrix that contains the similarities for all possible pairs of languages in our sample.

read more

Magic squares

This is a magic trick I invented many years ago. I got the idea while studying the binary numeral system in college. Watch the video to see how it works.

read more

Xavier Sala-i-Martin income distribution visualization

In this visualization we show the income distribution for each country of the world over time. It is easy to observe how the distribution shifts to the right as time goes on, which means that there are fewer poor people on the planet.

read more

The Music Planetarium

This visualization shows the top 1000 music artists by popularity and their relationships. The arrows represent the similarity between two artists. The data comes from the website last.fm. Click on the image to see a larger version where you can zoom in (it's a big file: 11 MB).

read more

The Wikipedia Planetarium

This visualization shows a sample of Wikipedia articles and the relations between them. The sample is extracted from this list: List_of_articles_every_Wikipedia_should_have. We create a link from article A to article B if there are at least 3 references to article B in article A. We also delete from the graph the articles that have more than 15 links (91 articles), as they just clutter the final result. All articles that remain isolated are also removed. The number of articles shown in the visualization is 729. Click on the image to see a larger version where you can zoom in (it's a big file: 8 MB).
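The filtering rules above can be sketched as follows. This is only an illustration under assumptions: the function name and the input format (a dictionary mapping `(source, target)` pairs to reference counts) are hypothetical, and "links" is taken to mean incoming plus outgoing links, which the description does not specify.

```python
from collections import Counter


def filter_article_graph(ref_counts, min_refs=3, max_links=15):
    """Return (nodes, edges) after applying the three rules.

    ref_counts maps (source, target) to the number of references to
    `target` found in `source` (hypothetical input format).
    """
    # Rule 1: keep a link A -> B only if A references B at least min_refs times.
    edges = {(a, b) for (a, b), n in ref_counts.items()
             if n >= min_refs and a != b}

    # Rule 2: drop articles with more than max_links links
    # (assumption: incoming and outgoing links are counted together).
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    hubs = {node for node, d in degree.items() if d > max_links}
    edges = {(a, b) for a, b in edges if a not in hubs and b not in hubs}

    # Rule 3: isolated articles vanish because nodes are derived from edges.
    nodes = {node for edge in edges for node in edge}
    return nodes, edges


if __name__ == "__main__":
    counts = {("Physics", "Energy"): 5, ("Physics", "Poetry"): 1}
    print(filter_article_graph(counts))
```

Deriving the node set from the surviving edges keeps the last step implicit: any article left without links after the first two rules simply never appears in the result.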

read more


OpenPajek is network analysis software that aims to be an open source replacement for proprietary software such as Pajek or UCINET. Its core is igraph, a robust and efficient C library for analyzing networks. It can be used to help understand how the micro behavior of individual nodes leads to important changes in the global structure of the network.

It was created for use in the class An introduction to social networks for Management PhD students at Columbia Business School.

read more

Peer to speaker

Peer to speaker is an open p2p application that enables you to listen to music on demand. There is no longer any need to maintain an mp3 collection: your collection is now virtually all existing music, and you can access it from any computer connected to the internet.

read more
I'm a data scientist, and my main subjects of interest are artificial intelligence, machine learning and complex networks. I also spend some of my spare time working on personal projects, which are published on this website.

Bachelor thesis: Small worlds of economic networks

Master thesis: New clustering proposals for Twitter based on the Louvain algorithm