Sunday, February 19, 2012

Emergent Similarity

Computers and humans seem to have different notions of similarity. Here are a couple of examples that appeared almost simultaneously on Twitter.

@socialtechno "Twitter told me @dansabbagh is 'similar to' @rupertmurdoch. Computers do not make skilful editors."

@markhillary "Got my Spotify tuned to music similar to the Pogues and Meat Loaf just came on..?"

When statistical analysis throws up a surprising similarity or juxtaposition, we humans naturally wish to explain it by finding something the two items have in common. It is quite possible that there is a common characteristic that nobody has previously noticed, and that the statistical analysis leads us to a new way of classifying things: scientific progress has sometimes taken this route. But statistical analysis is also perfectly capable of throwing up meaningless similarities - or at least similarities that we humans, with our limited intelligence, are unable to find meaningful.

In practice, computers usually infer similarity between two items not from their intrinsic characteristics (affinity clustering) but from their relationship to external activities (interaction clustering). People who liked A also liked B. People who searched for A also searched for B. Two items can therefore score as "similar" simply because the same people interacted with both, whatever the items themselves are like. We just need to get used to the fact that this notion of "similarity" is not the same as ours.
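To make the "people who liked A also liked B" idea concrete, here is a minimal Python sketch. It is only an illustration, not how Twitter or Spotify actually compute their recommendations: the users and their likes are invented, the function name interaction_similarity is hypothetical, and Jaccard overlap of audiences is just one simple measure chosen for the example.

```python
from itertools import combinations

# Hypothetical data, invented for illustration: which users liked which artists.
likes = {
    "alice": {"The Pogues", "Meat Loaf"},
    "bob": {"The Pogues", "Meat Loaf"},
    "carol": {"The Pogues", "The Dubliners"},
    "dave": {"Meat Loaf"},
}


def interaction_similarity(likes):
    """Score each pair of items by the overlap of their audiences (Jaccard).

    Nothing about the items themselves (genre, era, instruments) is used;
    only who interacted with them.
    """
    # Invert the data: item -> set of users who liked it.
    fans = {}
    for user, items in likes.items():
        for item in items:
            fans.setdefault(item, set()).add(user)

    scores = {}
    for a, b in combinations(sorted(fans), 2):
        shared = fans[a] & fans[b]   # users who liked both
        either = fans[a] | fans[b]   # users who liked at least one
        scores[(a, b)] = len(shared) / len(either)
    return scores


scores = interaction_similarity(likes)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

With this made-up data, Meat Loaf scores as more "similar" to the Pogues than the Dubliners do, purely because more of the same listeners liked both. A human editor grouping by musical style would probably rank them the other way round.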

