The Statistics of Deduction, or, Sherlock Is a Computational Linguist

This isn’t a fandom blog by any means, but I like books and movies and TV as much as the next guy, and I’ve been kind of quietly excited about Season 3 of Sherlock that premiered today.

Anyway, while watching the live-stream, this frame caught my eye.

*perks up*  Mary Morstan is a linguist?

Wait, how could Sherlock possibly know that just from looking at her?

He could because Sherlock is a computational linguist—or rather, Sherlock is something a computational linguist would create.

Much of the heavy lifting of computational linguistics these days revolves around classifiers: methods of identifying which of a set of categories a new observation belongs to by comparing the new observation to a set of observations whose categories are already known.

Given a category C with a set of features X, and an object O with a set of features X’, what is the probability that O belongs in the category C?

This is basically how a computer does Sherlock-style “deductive” reasoning (really, probabilistic inference), and Sherlock Holmes just takes this trope and turns it up to eleven.

So how could he tell that Mary is a linguist just by looking at her?

To answer this, we must first ask the question: what does a linguist look like?

Contrary to popular opinion, most of us don’t walk around with “Kiss me, I’m a linguist” cards pasted to our foreheads, so that’s not it.  We assume instead that there must be a set of subtle visual cues that someone like Sherlock can pick up on to determine someone’s occupation.  So let’s throw out some dummy data.  Let’s assume:

  • 80% of female linguists wear dangly earrings
  • 65% of female linguists wear faux-fur-trimmed coats
  • 75% of female linguists have hair shorter than shoulder-length

(I don’t actually remember what she was wearing in the scene, but just pretend I’m right).  Also:

  • 1% of all women are linguists (they’re not, but these are really bad features, so the numbers aren’t realistic)

A category must always be defined in opposition to another category, so let’s see some more data:

  • 10% of female butchers wear dangly earrings
  • 50% of female butchers wear faux-fur-trimmed coats
  • 95% of female butchers have hair shorter than shoulder-length
  • and 5% of all women are butchers (again, unrealistic numbers)

Now we look at the algorithm:

P(C|F1,F2,F3) = [P(C) x P(F1,F2,F3|C)] / P(F1,F2,F3)

This is Bayes’ Theorem.  In plain English it can be read as posterior = (prior x likelihood) / evidence.  For our example here, it can be read as (put on your Sherlock voice):

The probability that Mary, who is wearing dangly earrings and a faux-fur-trimmed coat and has shorter-than-shoulder-length hair, is a linguist (as opposed to a butcher), is equal to the probability that any given woman is a linguist (instead of a butcher), times the probability that a woman who is a linguist has all three of those traits, divided by the probability that a woman has all three of those traits, regardless of occupation.
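That plain-English reading is literally a one-line function. Here’s a toy sketch (the function name is my own, not anything standard):

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' Theorem: posterior = (prior x likelihood) / evidence."""
    return prior * likelihood / evidence

# With the numbers worked out later in this post:
# prior P(linguist) = 0.01, likelihood = 0.8 x 0.65 x 0.75 = 0.39,
# evidence = 0.006275
print(round(bayes_posterior(0.01, 0.39, 0.006275), 4))  # ≈ 0.6215
```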

The evidence, or probability that a woman has all three relevant traits, has to be normalized over all categories.  Since we only have two categories here, it works out as follows (note that we’re also assuming the three features are independent of each other within a category, which is what lets the likelihood factor into a simple product):

P(F1,F2,F3) = [P(C) x P(F1|C) x P(F2|C) x P(F3|C)] + [P(C’) x P(F1|C’) x P(F2|C’) x P(F3|C’)]

Plug in the numbers from above:

P(Mary is a linguist given her appearance) = [.01 x (.8x.65x.75)] / (.01x.8x.65x.75 + .05x.1x.5x.95) = [.01 x (.8x.65x.75)] / (.006275) ≈ 0.6215 = 62.15%

P(Mary is a butcher given her appearance) = [.05 x (.1x.5x.95)] / (.006275) ≈ .3785 = 37.85%

Given what Sherlock knows at this point, there’s a 62% chance of Mary being a linguist and a 38% chance of her being a butcher.
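Putting the whole calculation together, here’s a minimal Naive Bayes sketch that reproduces those numbers (function and variable names are my own invention, not any library’s API):

```python
from math import prod

def naive_bayes_posteriors(priors, likelihoods):
    """priors: {category: P(C)}; likelihoods: {category: [P(F1|C), P(F2|C), ...]}.

    Returns P(C | F1, F2, F3) for each category, naively assuming
    the features are independent within each category.
    """
    joint = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    evidence = sum(joint.values())  # normalize over all categories
    return {c: p / evidence for c, p in joint.items()}

posteriors = naive_bayes_posteriors(
    priors={"linguist": 0.01, "butcher": 0.05},
    likelihoods={
        "linguist": [0.8, 0.65, 0.75],  # earrings, coat, short hair
        "butcher": [0.1, 0.5, 0.95],
    },
)
print({c: round(p, 4) for c, p in posteriors.items()})
# -> {'linguist': 0.6215, 'butcher': 0.3785}
```

The nice thing about writing it this way is that adding a third category (say, “surgeon”) is just one more entry in each dict.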

So Sherlock can see that Mary is more likely a linguist than a butcher, assuming, of course, that his only choices are “linguist” and “butcher.”

And that’s basically how he’d do it if he were a kind of stupid computer.

The classifier demonstrated is called a Naive Bayes classifier.  It’s by far the easiest to demonstrate in a Tumblr post, but also far from the best algorithm.  There are plenty of others.

The catch is that you have to train these algorithms on HUGE amounts of data (on the order of billions of data points) to build a decent generalist classifier with actually good discriminative features.  But it’s been canonically established that Sherlock has a huge fact bank in his brain, so that’s not really a problem.

Basically, Sherlock apparently has some sweet classifier algorithms downloaded into his head, and that plus an enormous bank of facts allows for his “magical” deductive capability.