Don’t hash trash.

What is hashing?

For those of you who don’t know, hashing is simply passing data through a formula that produces a fixed-size result. Hashing is very useful because it’s one-way: you can hash some data to create a hash, but you can’t take a hash and recreate the original data. Some famous hashing algorithms you may know include MD5, SHA-1 and bcrypt.

One of the main uses of hashing, in terms of security, is for hashing users’ passwords. When a user signs up to your website, you need to store their password so that you can log them in the next time they give it to you. Storing passwords in plain text (not hashed), instead of storing a hash of each password, has security issues. One is that anyone who has access to your database (employees, contractors) can see your users’ passwords! Another is that if the worst happens and your database gets leaked by hackers, they then have everyone’s passwords as well.

An explanation of hashing without talking about rainbow tables would be incomplete. For the purposes of this article, a rainbow table is a table that stores values and what they hash into. (Strictly speaking, real rainbow tables use chains of hashes to save space, but a plain lookup table shows the idea.) Using a rainbow table you can “reverse” a hash. Of course you don’t actually reverse the hash at all: you can only find what a hash corresponds to if it already exists within your rainbow table.

The table above shows a very basic rainbow table of the top 10 common passwords. All of these passwords are terrible and you should never use any of them. If I were to obtain all of the users for a website along with their hashed passwords, I could use this table to look up what their real passwords are. If I found a user with the password hash 25d55ad283aa400af464c76d713c07ad, I would know that their real password is 12345678. I could then perform an attack called credential stuffing. This is where I take their username and password and try these credentials on different websites to see if they’ve reused their password. This type of attack is interesting but a topic for another article.
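To make the lookup concrete, here is a minimal Python sketch of such a table. It assumes the hashes are unsalted MD5 (which the example hash above is) and uses a short, illustrative password list; `COMMON_PASSWORDS`, `md5_hex` and `build_lookup_table` are hypothetical names chosen for this example.

```python
import hashlib

# Illustrative list only; a real attacker's list would hold millions of entries.
COMMON_PASSWORDS = ["123456", "password", "12345678", "qwerty", "abc123"]


def md5_hex(text: str) -> str:
    """Return the unsalted MD5 hash of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup_table(passwords):
    """Map each hash back to the password that produced it."""
    return {md5_hex(password): password for password in passwords}


table = build_lookup_table(COMMON_PASSWORDS)

# The hash from the example above. Nothing is reversed here: the lookup
# only succeeds because the password was already in the list.
leaked_hash = "25d55ad283aa400af464c76d713c07ad"
print(table.get(leaked_hash))
```

A hash whose password is not in the list simply returns nothing, which is why long, unusual passwords survive this kind of attack.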

Finally, let’s talk about salts. A salt is random data added to the data you’re trying to hash. This extra data completely changes the resulting hash, which is great for making precomputed rainbow tables useless. It’s like reinforcing your users’ passwords with your own password. Obviously the salt you choose should be unpredictable, otherwise there could exist a rainbow table for your salt!
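As a rough illustration of what a salt does, the sketch below hashes the same password with two different random salts. It uses SHA-256 purely to show the effect; a fast hash like this is not a recommendation for storing real passwords, and `salted_hash` is a hypothetical helper.

```python
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> str:
    """Hash salt + password. Illustration only, not for real password storage."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


password = "12345678"

# Two unpredictable 16-byte salts from the operating system's secure generator.
salt_a = os.urandom(16)
salt_b = os.urandom(16)

hash_a = salted_hash(password, salt_a)
hash_b = salted_hash(password, salt_b)

# Same password, different salts, different hashes: a table precomputed
# without the salt no longer matches either of them.
print(hash_a != hash_b)
```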

My history with hashing

When I first started programming in PHP, there was a shift from using md5 or sha1 to using crypt. One of the reasons for moving away from md5 and sha1 is that the average computer got faster and faster. Calculating these hashes became quick, even for standard desktop computers. An attacker can generate billions of password hashes per second on a single GPU and create a rainbow table based on your salt very quickly. This would reveal most users’ passwords, salted or not, unless they were super complex.

Later, crypt was replaced with the password_hash and password_verify PHP functions. This move forced the use of salts. The password_hash function generates a hash with a random salt using PHP’s current default algorithm (bcrypt at the time of writing). The resulting hash looks like this.

As you can see, each resulting hash contains its own salt. This means every user on your site has their own personal salt. This greatly increases how long it takes to brute force people’s passwords, because you now have to brute force each password individually. If an attacker cracks one password, they don’t automatically learn anyone else’s, even if it’s the same password, because every other hash uses a different salt and therefore looks different.

The password_hash function and the bcrypt algorithm have been in use for a while now. The ability to increase the number of rounds (the cost) is very beneficial as computers get faster.
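The same two ideas, a fresh salt per password and a tunable amount of work, can be sketched in Python. Python’s standard library does not ship bcrypt, so this example swaps in PBKDF2-HMAC-SHA256 from `hashlib`; it is not what password_hash does internally. The function names, the `$`-separated storage format and the iteration count are all assumptions made for illustration.

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 200_000) -> str:
    """Hash a password with a fresh random salt and a tunable work factor."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the work factor and salt next to the hash so it can be verified later.
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and work factor, then compare."""
    iterations_text, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations_text),
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest.hex(), digest_hex)


stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", stored))
print(verify_password("12345678", stored))
```

Raising `iterations` over time plays the same role as increasing the bcrypt rounds: each guess costs the attacker more, while a single login stays fast enough for users.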

How to hash… badly

Imagine the scenario where you want to create a pin-based login system. You don’t have passwords to log in to your website or mobile app. When you type your email in, you get emailed a one-time 6-digit pin which you then type into your app to log in.

Passwordless authentication seems to have become quite a popular move for apps recently. People are bad at making passwords, and removing passwords means removing bad passwords. You no longer have to worry about hashing or storing passwords securely. How could this go wrong?

The problem

During my career I came across some code which generated login pins using a hashing function. There is nothing wrong with this approach as long as what you’re putting into the hashing function is unpredictable. The code I looked at generated the login pin using the user’s email and the unique identifier for that login request (for example #1028829). An example input for the hashing function would be 1028829hello@jpb.dev and the output 150455.
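The shape of that function can be sketched as follows. The hash function used by the original code is not given here, so this stand-in assumes MD5 reduced to six digits; `generate_pin` is a hypothetical name, and the sketch will not reproduce the 150455 value quoted above.

```python
import hashlib


def generate_pin(request_id: int, email: str) -> str:
    """Derive a 6-digit pin from the request identifier and the email.

    ASSUMPTION: MD5 reduced modulo 1,000,000 stands in for the unknown
    hash function of the original code.
    """
    digest = hashlib.md5(f"{request_id}{email}".encode("utf-8")).hexdigest()
    return f"{int(digest, 16) % 1_000_000:06d}"


# Input is the identifier followed directly by the email: "1028829hello@jpb.dev".
pin = generate_pin(1028829, "hello@jpb.dev")
print(pin)
```

The important property is that the function is deterministic: anyone who knows both inputs can compute the same pin.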

When I first looked at this function to generate login pins, I didn’t see the issue straight away. To predict the next generated login pin you need two components: the email address of the user who is logging in, and the unique identifier for the login request. Email addresses aren’t very private, so the first could be predicted quite easily. But the second component is an identifier which could be over a million. How could you predict that?

I initially glanced over this code and came to the conclusion it was secure enough, because guessing an auto-incrementing number over a million did not seem possible.

The eureka moment

One night when I was lying in bed struggling to sleep, I was thinking about this generate pin function. I had a eureka moment! Of the three pieces involved (the email, the identifier and the generated pin), you can get two. You have the generated pin, and (if you use your own account) you have your email. You can actually make a rainbow table which looks like this…

You then generate this table up to 10 million rows locally. When I did this myself it only took a couple of minutes to generate that many hashes.
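Generating such a table might look like the sketch below. It uses a stand-in `generate_pin` (MD5 reduced to six digits), since the real hash function is not given, and `build_table` is a hypothetical helper. The demo builds only 50,000 rows to stay quick; the table described above went up to 10 million.

```python
import hashlib
from collections import defaultdict


def generate_pin(request_id: int, email: str) -> str:
    # ASSUMPTION: stand-in for the unknown hash function of the original code.
    digest = hashlib.md5(f"{request_id}{email}".encode("utf-8")).hexdigest()
    return f"{int(digest, 16) % 1_000_000:06d}"


def build_table(email: str, max_id: int):
    """Map every pin to the request identifiers that would have produced it."""
    table = defaultdict(list)
    for request_id in range(1, max_id + 1):
        table[generate_pin(request_id, email)].append(request_id)
    return table


# Your own email is the only one you need: you control that account.
table = build_table("hello@jpb.dev", 50_000)
print(len(table), "distinct pins for 50,000 identifiers")
```

Because there are only a million possible pins, several identifiers will share the same pin once the table is large enough, which is why each pin maps to a list.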

Now you go and log in to the app with your email hello@jpb.dev. You then get your login code 738853. You now search your rainbow table for this login code and you’ll find a match! Right? Unfortunately it’s not quite that easy. The login pin 738853 had hundreds of matches in my table and the input used could have been any one of them!

So you do it again (100081). Then you do it a third time (279607) and a fourth (367066) and fifth (120415). Eventually you will start to see a pattern in your local table. You should see your generated login pins clumped together like this.

Well now the impossible is looking more possible! You now know what the current identifier is set to (#1023493), and can predict what the next identifier will be (#1023494).
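Spotting the clump can be automated. The sketch below is one possible approach, not the exact method I used: for each identifier that could explain the first pin, it checks whether the following pins can be explained by slightly larger identifiers. It relies on a stand-in `generate_pin` (MD5 reduced to six digits), simulated observations, and a hypothetical `window` of 100 for how far the counter may have moved between your requests.

```python
import hashlib
from collections import defaultdict


def generate_pin(request_id: int, email: str) -> str:
    # ASSUMPTION: stand-in for the unknown hash function of the original code.
    digest = hashlib.md5(f"{request_id}{email}".encode("utf-8")).hexdigest()
    return f"{int(digest, 16) % 1_000_000:06d}"


def build_table(email: str, max_id: int):
    table = defaultdict(list)
    for request_id in range(1, max_id + 1):
        table[generate_pin(request_id, email)].append(request_id)
    return table


def locate_current_id(table, observed_pins, window: int = 100):
    """Return the identifier behind the last observed pin, or None."""
    for start in table.get(observed_pins[0], []):
        current = start
        for pin in observed_pins[1:]:
            # Identifiers only grow, and not by much between our own requests.
            following = [i for i in table.get(pin, []) if current < i <= current + window]
            if not following:
                break  # this starting point cannot explain the whole sequence
            current = min(following)
        else:
            return current
    return None


# Simulated observations: other users logged in between some of our requests.
email = "hello@jpb.dev"
true_ids = [20_001, 20_002, 20_004, 20_005, 20_007]
observed = [generate_pin(request_id, email) for request_id in true_ids]

table = build_table(email, 30_000)
print(locate_current_id(table, observed))
```

A single pin is ambiguous, but each extra observation rules out almost every false candidate, which is why a handful of logins is enough.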

Provided you have someone else’s email address, you can now predict what the next generated login pin will be for that user. You hash 1023494dave@jpb.dev and get 398812, which is their next login pin. This assumes that no one else has tried to log in between now and when you found the last login request identifier.

To ensure the login pins are generated using login request identifiers close to each other, you can fire two login requests at the same time: one for your account, and one for their account. When you receive the login pin for your account, you can search your local table for it. You know its identifier must be greater than 1023493, but not by much, maybe only a hundred or so. You then find that identifier, increment it by one, generate the login pin for Dave and you should be pretty close!
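The final step is small enough to sketch. Since you can’t be sure exactly how far the counter has moved, this hypothetical `candidate_pins` helper produces the pins for a short run of upcoming identifiers instead of just one. As before, `generate_pin` is a stand-in (MD5 reduced to six digits) for the unknown hash function, and the `lookahead` of 5 is an arbitrary choice.

```python
import hashlib


def generate_pin(request_id: int, email: str) -> str:
    # ASSUMPTION: stand-in for the unknown hash function of the original code.
    digest = hashlib.md5(f"{request_id}{email}".encode("utf-8")).hexdigest()
    return f"{int(digest, 16) % 1_000_000:06d}"


def candidate_pins(last_known_id: int, victim_email: str, lookahead: int = 5):
    """Pins the victim would receive for the next few request identifiers."""
    first = last_known_id + 1
    return {
        request_id: generate_pin(request_id, victim_email)
        for request_id in range(first, first + lookahead)
    }


# 1023493 is the identifier recovered for your own login request.
for request_id, pin in candidate_pins(1023493, "dave@jpb.dev").items():
    print(request_id, pin)
```

A handful of guesses out of a million possible pins is what turns the "impossible" prediction into a practical attack.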

Conclusion

Hashing is great, but only when done correctly. If you need to store user passwords, please find an up-to-date article explaining the current best practices. What to use changes often, and security should be at the top of everyone’s priorities. The last thing you want is to be sending an email to all your customers saying everyone’s passwords have been stolen. Not to mention the fines that will ensue.

If you need to generate something random, a hash function typically isn’t what you should be using. Furthermore, if you’re generating random login pins you need to be super careful that they can’t be predicted! Even functions like rand in PHP can be predicted, so ensure you’re using a cryptographically secure random generator, such as random_int in PHP.
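For comparison, here is a minimal sketch of a pin generator that does not depend on any guessable input. It uses Python’s `secrets` module, which draws from the operating system’s cryptographically secure generator; `generate_login_pin` is a hypothetical name.

```python
import secrets


def generate_login_pin(digits: int = 6) -> str:
    """Return a zero-padded random pin from a cryptographically secure source."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


print(generate_login_pin())
```

Because the pin no longer depends on the email or the request identifier, knowing either of them gives an attacker nothing to work with.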

The above security issue was mainly introduced because the authentication was custom built and the person who built it was not as experienced as they could have been. When you custom build your authentication flows, you take on all of the security responsibility yourself. For small teams it might be worth considering services like AWS Cognito or Auth0, which take care of all of this.
