User authentication is a big, big deal. It seems like every other day there’s some data breach at an internet giant.
I wanted to talk in this blog post a little bit about how password storage and authentication actually works on the Internet. It’s an interesting topic, and one that I didn’t know much about before starting at App Academy.
How Password Storage Actually Works
Most people think that when they log into their favorite website, it goes something like this:
- They send the server their password
- Server receives password and compares it to their recorded password
- If it’s a match, the server lets you in
That’s not at all what happens on any remotely competent website. You might guess: are the passwords encrypted then maybe, and then later decrypted when you check the passwords? I used to think this, but that’s wrong as well.
The passwords are actually hashed. What does it mean for a password to be hashed? It means that it’s run through a one-way, irreversible filter which can turn any piece of data into a random-looking string.
That’s essentially what a hash function is. For any input data, it produces a random-looking string. But two important points. 1) A good hashing function will never produce the same random string for two different inputs, and 2) it’ll produce completely different strings if the input is changed even slightly. So the hashed version of “password″ might be “aslmdp20824” and the hashed version of “passworD” might be “64t09u02g.”
Note: this is not a code. It’s not transforming each letter by shifting it a certain amount or anything like that. A code is reversible; by definition, if you know how a code works, you can decode it. A hashed password is irreversible. The output is effectively random. “password″ will always produce “aslmdp20824″ if you run it through this function, but there’s no realistic way to get from the hashed password back to the original.
This is important. In our database, we’re never actually storing any passwords, nor are we storing encrypted passwords (i.e., password “codes”). We’re storing hashed passwords. That means there’s no way to get from a hashed password back to the original password.
So how do we check if the typed password matches the stored one? When someone sends our website their password, we hash the password they give us through that exact same hashing function, and then compare that output hash to the hash that we saved in our database. Remember, hash functions always produce the same output for the same input. So if those hashes match, we let them in.
Their real password isn’t saved anywhere on the database, it’s simply used in the business logic of our app and then immediately discarded. And here’s why this matters:
Let’s say our database gets stolen.
A team of tireless Chinese hackers have exploited a vulnerability in our hardware and stolen a copy of our database. As we speak, they’re hard at work on trying to break the passwords of all of our users so that they can wreak havoc.
In our mind’s eye, we imagine the moment when they finally crack our database encryption. There are “oohs” and “ahhs” (or their Chinese equivalent) as they crowd around a dimly-lit screen and stare at a giant table. A giant table of… hashed passwords.
What can they do with hashed passwords?
Not a lot. Remember, good hashing algorithms are computationally impossible to reverse. This means that if a team of hackers can’t go from a hashed password back to the original password, they’re forced to go the other way. They have to guess a password, hash it, and see if the hashed output matches with what’s in the database. With a good (i.e., computationally expensive) hashing algorithm, this will take them a lot of time, a lot of guesses, and hundreds of thousands of dollars in computing power to crack even a moderate number of passwords.
Or, at least, it should.
The problem is that it won’t. At least, not if that’s the only precaution we’ve taken. In fact, within minutes of opening our hashed database, this team of Chinese hackers will have already broken hundreds of passwords. Our security will have failed.
Rainbow Tables are why.
I thought we hashed all of our passwords? Well, we did. The problem is not with our hashing algorithms, but with our users. Most of our users’ passwords are among the most common passwords. Hundreds of them use passwords like “qwerty123″ or “12345678.″ In fact, 91% of users use passwords that are within the 1000 most common passwords, and 99.8% of users use ones within the 10,000 most common (source).
Much like us, our users are incompetent and can’t be trusted to get anything right.
So here’s the thing: if these hackers know that many people use passwords like “qwerty123,” if they were smart, they and their hacker buddies would join forces and compute a table of the hashes of the most common passwords in advance. Then anytime anyone cracks a database, the hackers can just check for hash values that show up anywhere in this giant pre-computed table, and then they instantly know what commonly-used password is being used in that database.
But where would a rogue team of Chinese hackers find something as amazing as that? Answer: by Googling it. They’re available on the Internet for free, and are fairly well-maintained at that. They’re known as rainbow tables, and they are the bane of poorly secured websites. And right now, that’s ours. If anyone stole our database, their rainbow tables would grind us into the dust.
Here’s where salting comes in.
Salting is simply taking an (ideally random) arbitrary string, and appending it to each password before running it through our hashing function. So let’s say that our simplistic salt is to generate a number based on the seconds hand of the clock when our user signs up (0 - 60). So if you sign up at 2:52:42 PM, your salt is 42. What we do is we add that 42 to your password, and then we run your (salt + password) through our hashing algorithm.
Remember, slightly different inputs create totally different outputs. That’s what saves us here. Your “42qwerty123″ will produce a completely different hash from “qwerty123.” And your dumb buddy who also uses “qwerty123″ will now be “16qwerty123,” which also produces a completely different hash.
The salt is still stored in the database. So when our hackers get access to the database, won’t they see the salts and just break our passwords then?
No, because the rainbow table is what gave our hackers the ability to break a bunch of passwords. And on that rainbow table, they don’t have entries for “12qwerty123″ and “45qwerty123,” or all of the other permutations of all the other most popular passwords. (And of course, in a real database, we’d have a much more complex salt than a two-digit number.)
So their rainbow tables are useless. They’re back to guessing passwords, adding them to the salt, and trying to hash them to see if it matches what’s saved in the database. So for each user, they’d take the salt, and go down the list of most common passwords, adding the salt to the password to try to see if it matches.
And then, even if they’re finally able to crack one, they have to do the same thing for every single password in the database, even if the users’ passwords were exactly the same.
This is what’s known in expert circles as “winning.”
And that, in a nutshell, is how modern password authentication works. TL;DR: People are dumb.
Hope you learned something!