TenMinuteTutor

Programming tutorials

Cryptographic hashes

A hash function takes a message of any length and creates a fixed-size hash value that corresponds to the message. A hash is rather like a checksum, but it uses a more sophisticated algorithm and generally produces a slightly longer result. There are many algorithms, and they typically produce hash codes of length 64 to 512 bits.
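As an illustration of that range, here is a small sketch in Python. Python is chosen only for the example, since the tutorial itself names no language. It asks the standard hashlib module how long the output of a few common algorithms is. hashlib reports digest_size in bytes, so the sketch multiplies by 8. The helper name digest_bits is hypothetical.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the hash length in bits for a hashlib algorithm name (hypothetical helper)."""
    # digest_size is given in bytes; multiply by 8 to get bits.
    return hashlib.new(name).digest_size * 8

for name in ["md5", "sha1", "sha256", "sha512"]:
    print(f"{name}: {digest_bits(name)} bits")
```

This prints 128, 160, 256 and 512 bits respectively. However long the input, each algorithm always produces output of its own fixed length.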

For example, consider this message

The quick brown fox jumps over a lazy dog

If we calculate its hash value using the MD5 algorithm, the result is

30DED807D65EE0370FC6D73D6AB55A95

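expressed as a hexadecimal number. MD5 is a well-known hash algorithm that produces a 128-bit hash value. Note that MD5 is no longer considered secure, because practical methods are known for finding two different messages with the same MD5 hash (a collision), although it is still in common use. Now consider a very slightly different message: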
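A hash like the one above can be computed with a few lines of Python using the standard hashlib module. This is a minimal sketch. The helper name md5_hex is hypothetical, and it assumes the text is encoded as UTF-8 with no trailing newline. Hash functions operate on bytes, so a different encoding or an extra newline would give a different value.

```python
import hashlib

def md5_hex(message: str) -> str:
    """MD5 hash of a text message as an uppercase hex string (hypothetical helper)."""
    # Hash functions work on bytes, so encode the text first (UTF-8 assumed).
    return hashlib.md5(message.encode("utf-8")).hexdigest().upper()

h = md5_hex("The quick brown fox jumps over a lazy dog")
print(h)                    # 32 hexadecimal digits
print(len(h) * 4, "bits")   # each hex digit encodes 4 bits, so this prints 128 bits
```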

The slick brown fox jumps over a lazy dog

has the MD5 hash value

D35D4ACAFBEB409BCE78BE5794E6B2C6

As you can see, a tiny change in the message results in a totally different hash value.
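One way to measure how different the two hash values are is to compare them bit by bit. The sketch below does this in Python with the standard hashlib module. It XORs the two 128-bit hashes and counts the bits that differ. The function name md5_bits_differing is hypothetical. For a well-behaved hash, you would expect roughly half of the bits to change, though the exact count depends on the messages.

```python
import hashlib

def md5_bits_differing(a: str, b: str) -> int:
    """Count how many of the 128 bits differ between the MD5 hashes of a and b."""
    ha = int.from_bytes(hashlib.md5(a.encode("utf-8")).digest(), "big")
    hb = int.from_bytes(hashlib.md5(b.encode("utf-8")).digest(), "big")
    # XOR leaves a 1 wherever the two hashes disagree; count those 1 bits.
    return bin(ha ^ hb).count("1")

m1 = "The quick brown fox jumps over a lazy dog"
m2 = "The slick brown fox jumps over a lazy dog"
print(md5_bits_differing(m1, m2), "of 128 bits differ")
```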

You can calculate a hash value for any message, from a single byte to an extremely long file. The hash value is always the same size: 128 bits in the case of MD5.
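In practice, a very large file is usually hashed in pieces rather than loaded into memory all at once. Python's hashlib objects support this through repeated update() calls, and the result is the same as hashing the whole input in one go. The sketch below assumes any binary file-like object as input. The function name md5_of_stream and the chunk size are illustrative choices, and "movie.iso" in the comment is only a placeholder file name.

```python
import hashlib
import io

def md5_of_stream(f, chunk_size: int = 1 << 20) -> str:
    """MD5 of a binary file-like object, read in chunks so memory use stays small."""
    h = hashlib.md5()
    # Read until f.read() returns an empty bytes object (end of input).
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)  # feeding pieces gives the same result as hashing everything at once
    return h.hexdigest().upper()

# With a real file (placeholder name):
#   with open("movie.iso", "rb") as f:
#       print(md5_of_stream(f))

# Inputs of very different lengths all produce a 128-bit MD5 hash.
for data in [b"A", b"A" * 10_000_000]:
    digest = md5_of_stream(io.BytesIO(data))
    print(f"{len(data):>10} bytes -> {len(digest) * 4} bits")
```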

As we have seen, an interesting and very useful feature of a hash is that any change to the message creates a totally different hash code. This applies not only to short messages but to large ones too: even a tiny change to a very large file results in a big change to the hash value.

For example, consider the MD5 hash value of a full-length DVD movie file, which contains billions of pixels across all its frames. If you were to change the color of just one pixel, anywhere, in any frame, the hash value of the new file would be totally different. If you changed a different pixel instead, you would get yet another totally different hash value. The hash is highly sensitive to every single byte (indeed, every single bit) of the message. In a sense, that short hash value represents the entire multi-gigabyte file.
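The single-bit sensitivity is easy to try without a real movie file. The sketch below uses a buffer of random bytes as a stand-in for a large file. The 8 MB size and the position of the flipped bit are arbitrary choices. It flips one bit in the middle of the buffer and hashes the data before and after with Python's hashlib.

```python
import hashlib
import os

# Stand-in for a large file: 8 MB of random bytes (size chosen arbitrarily).
data = bytearray(os.urandom(8 * 1024 * 1024))
before = hashlib.md5(data).hexdigest()

data[len(data) // 2] ^= 0x01  # flip a single bit in the middle of the data
after = hashlib.md5(data).hexdigest()

print(before)
print(after)
print("hash changed:", before != after)
```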

This property of a hash leads to various alternative names. A hash is sometimes called a message digest, a fingerprint or a signature. The term signature is best avoided, because the field of cryptography also includes digital signatures, which are something quite different. We will use the term hash.