MD5, SHA1, SHA256 and SHA512 hex digests in Erlang

jwatte's picture

Original post 2011-06-10:
In various web APIs, there is some confusion about how a hash value should be represented.
There exist APIs where a password is validated as, say, MD5(challenge + MD5(salt + password)).
Let's leave aside the fact that MD5 is not a secure algorithm anymore (collisions, meaning two different inputs that produce the same MD5 hash value, can be generated in cheap-to-compute time).
However, there are still problems with this specification. An MD5 checksum can be represented as a sequence of 16 binary bytes, or it can be represented as a sequence of 32 hexadecimal digits. And, in the latter case, those digits can be upper or lower case (although lower case generally prevails).
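
To make the ambiguity concrete, here is a minimal sketch of the scheme above computed both ways. The module and function names are made up for illustration, and "+" is assumed to mean plain concatenation; a real API may define it differently.

```erlang
-module(digest_repr).
-export([response_raw/3, response_hex/3]).

%% Inner digest fed to the outer MD5 as 16 raw bytes.
response_raw(Challenge, Salt, Password) ->
    Inner = erlang:md5([Salt, Password]),
    erlang:md5([Challenge, Inner]).

%% Inner digest fed to the outer MD5 as 32 lowercase hex digits.
response_hex(Challenge, Salt, Password) ->
    Inner = erlang:md5([Salt, Password]),
    InnerHex = [io_lib:format("~2.16.0b", [B]) || <<B>> <= Inner],
    erlang:md5([Challenge, InnerHex]).
```

Both functions return a valid 16-byte digest, but the two results differ, so a client and a server that disagree on the representation will never authenticate.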

Having to interface with one of the hex-digit varieties, from code that runs in the Erlang programming language, is somewhat annoying, because the Erlang built-in MD5 generates binary bytes only. By contrast, the Python MD5 API generates both, and the PHP MD5 API generates hex by default but in modern versions has an option to generate binary. Googling for this will quickly show a couple of blog posts that convert from binary to hex. However, they do it the hard way, writing their own "hex digit" conversion functions, and generally making life difficult. Turns out, Erlang is a lot more elegant than those examples. I started out with this function to convert a binary value to a hex string in Erlang:

hexstring(Binary) when is_binary(Binary) ->
    lists:flatten(lists:map(
        fun(X) -> io_lib:format("~2.16.0b", [X]) end, 
        binary_to_list(Binary))).

That's it. One line of code (broken into 4 for readability). You might even write it where you need it, instead of putting it in a function (although having it in a function is convenient). It's also totally functional in its expression, not requiring any non-standard dependencies.
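
As a quick sanity check, the byte-by-byte conversion can be run against the MD5 test vectors published in RFC 1321. This is only a sketch: the module name hexdemo and the check/0 helper are made up for this example.

```erlang
-module(hexdemo).
-export([hexstring/1, check/0]).

%% Format every byte as two lowercase hex digits, then flatten to a string.
hexstring(Binary) when is_binary(Binary) ->
    lists:flatten(lists:map(
        fun(X) -> io_lib:format("~2.16.0b", [X]) end,
        binary_to_list(Binary))).

check() ->
    Digest = erlang:md5("abc"),
    %% The built-in returns 16 raw bytes, not hex digits.
    16 = byte_size(Digest),
    "900150983cd24fb0d6963f7d28e17f72" = hexstring(Digest),
    "d41d8cd98f00b204e9800998ecf8427e" = hexstring(erlang:md5("")),
    ok.
```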

The trick to this expression is the io_lib:format format string for integers. io_lib:format() is similar to sprintf() in C/C++, except it's type safe, and uses a different syntax. Each format term is introduced through a tilde. There are three (mostly optional) fields: width, precision and pad, followed by the conversion character. "B" means "uppercase integer of given base," and "b" means "lowercase integer of given base." The width in this case is 2 characters, the precision is 16, which is interpreted as base for this conversion, and the padding character is 0. If it didn't pad by 0, then single-digit values would be default padded to 2 width by a space, which is not what we want.
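
A few variations on the format string show what each field does. The module fmt_demo and the fmt/2 helper are hypothetical names used only for this sketch.

```erlang
-module(fmt_demo).
-export([examples/0]).

%% Format a single integer and flatten the result to a plain string.
fmt(Format, Value) ->
    lists:flatten(io_lib:format(Format, [Value])).

examples() ->
    "0a" = fmt("~2.16.0b", 10),   % width 2, base 16, pad with 0, lowercase
    "0A" = fmt("~2.16.0B", 10),   % same, uppercase
    " a" = fmt("~2.16b", 10),     % no pad character given: padded with a space
    "ff" = fmt("~2.16.0b", 255),  % two digits already, nothing to pad
    "a"  = fmt("~.16b", 10),      % no width given: no padding at all
    ok.
```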

But then I started thinking: Erlang has wonderful built-in bigint support that will go as large as you want it to. The integer format conversion works for those, too. So, behold an even better version:

hexstring(<<X:128/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~32.16.0b", [X])).

This uses binary matching syntax (a really cool part of Erlang designed to ease implementation of network protocols). The input argument is exactly a binary value of 128 bits, interpreted as an unsigned big-endian integer. It's then formatted as a 32-digit, 0-padded, lowercase, base-16 (hexadecimal) number. Done!
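
Two properties of this version are worth checking: leading zero bytes must not be lost when the digest is treated as one big number, and input of the wrong size should be rejected. A small sketch, with a made-up module name and check/0 helper:

```erlang
-module(hexdemo_bigint).
-export([hexstring/1, check/0]).

%% Match exactly 128 bits as one unsigned big-endian integer.
hexstring(<<X:128/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~32.16.0b", [X])).

check() ->
    %% Leading zero bytes survive because the field is zero-padded to 32 digits.
    Expected = lists:duplicate(31, $0) ++ "1",
    Expected = hexstring(<<1:128>>),
    %% A real digest still matches the RFC 1321 test vector for "abc".
    "900150983cd24fb0d6963f7d28e17f72" = hexstring(erlang:md5("abc")),
    %% Anything that is not exactly 16 bytes fails to match the clause.
    {'EXIT', {function_clause, _}} = (catch hexstring(<<1, 2, 3>>)),
    ok.
```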

Functional Programming with Standard Libraries: 1
Crappy Web Blogs: 0

Update 2011-03-16: if you are also working with SHA1, SHA256 or SHA512 hashes, then you can extend the function as appropriate:

hexstring(<<X:128/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~32.16.0b", [X]));
hexstring(<<X:160/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~40.16.0b", [X]));
hexstring(<<X:256/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~64.16.0b", [X]));
hexstring(<<X:512/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~128.16.0b", [X])).
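
Here is a sketch of how the extended function might be called. It assumes an OTP release that provides crypto:hash/2, which is newer than this post; the module name and check/0 are made up for the example.

```erlang
-module(hexdemo_sha).
-export([hexstring/1, check/0]).

%% One clause per digest size: MD5, SHA1, SHA256, SHA512.
hexstring(<<X:128/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~32.16.0b", [X]));
hexstring(<<X:160/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~40.16.0b", [X]));
hexstring(<<X:256/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~64.16.0b", [X]));
hexstring(<<X:512/big-unsigned-integer>>) ->
    lists:flatten(io_lib:format("~128.16.0b", [X])).

check() ->
    %% Well-known SHA1 test vector for "abc".
    "a9993e364706816aba3e25717850c26c9cd0d89d" =
        hexstring(crypto:hash(sha, "abc")),
    %% Each digest size yields two hex digits per byte.
    32  = length(hexstring(crypto:hash(md5, "abc"))),
    64  = length(hexstring(crypto:hash(sha256, "abc"))),
    128 = length(hexstring(crypto:hash(sha512, "abc"))),
    ok.
```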

Yay matching!

(Extending it to an arbitrary precision is also possible and left as an exercise to the reader, although I prefer to be explicit about the data I accept, so as to catch unexpected data early!)
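
For readers who want to skip the exercise, one possible sketch of the arbitrary-size variant follows. It relies on the "*" field width, which tells io_lib:format to take the width from the argument list. The module name is made up, and rejecting the empty binary is my own choice rather than anything required by the technique.

```erlang
-module(hexdemo_any).
-export([hexstring/1, check/0]).

%% Works for any non-empty binary: read all bits as one big-endian integer
%% and pad to two hex digits per byte.
hexstring(Bin) when is_binary(Bin), byte_size(Bin) > 0 ->
    Bits = bit_size(Bin),
    <<X:Bits/big-unsigned-integer>> = Bin,
    lists:flatten(io_lib:format("~*.16.0b", [2 * byte_size(Bin), X])).

check() ->
    "00ff10" = hexstring(<<0, 255, 16>>),
    "900150983cd24fb0d6963f7d28e17f72" = hexstring(erlang:md5("abc")),
    ok.
```

The trade-off is the one mentioned above: this accepts binaries of any length, so a truncated or otherwise unexpected digest will be converted instead of failing early.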