TenMinuteTutor

Programming tutorials

Applications of hashes

Hash functions have many applications, some of which are described below.

Message Integrity

One of the simplest applications is to verify that the content of a message has not changed. Due to the characteristics of a cryptographic hash, if two messages have the same hash value you assume them to be identical with a very high degree of certainty.

The benefit of the hash code is that it is much shorter than the file. While a large file has to be stored on disk or transmitted over a network (where it might be vulnerable to alteration), a hash code can be written down on a piece of paper (in your own handwriting), read out to someone over the phone, published on a website or in a personal ad in the Times, or even memorised if you try hard enough. There are many alternative ways of storing or communicating a hash which are less vulnerable to hacking than a computer file.

For example, consider a free software library or application, distributed via the Internet. It is impossible to know which servers it might end up on, or how it might get passed between people or organisations. For all anyone knows, a copy of the binary version might become infected with a virus, or someone might distribute a bogus version which performs some malicious activity.

One way to protect against this would be to publish the hash value of the binary, on a secure website, so that anyone obtaining the binary copy can verify its integrity by checking its hash value. Of course, it is crucial that the hash value can be obtained from a trusted source (eg the official library website). It would be pointless to distribute the hash with the binary, because a hacker could simply alter the library and alter the hash to match.

Message Signing

Digital signatures are discussed elsewhere in detail. One characteristic of most digital signatures is that they work best on relatively short messages. If I wanted to digitally sign this tutorial, I wouldn’t try to apply a digital signature to the whole tutorial. I would find the hash value of the zipped source files and sign the hash value.

If anyone subsequently altered the tutorial, they could not forge my signatuure of the new hash (because they don’t know my private signing key). Any reader who wished to check could easily discover that I had not signed the modified tutorial and would therefore be suspicious of it.

One-way Password Files

Many systems use passwords to allow users to authenticate their identity as they log on. The system must store every user’s password in some way, but this can be a problem. If you store the data in a plaintext file, even if you take the precaution of making the file read and write protected, someone with sufficient privileges would be able to access every user’s password, which is clearly undesirable.

One particular problem is that many people use the same password on different system, or else they use related passwords. So on one level you might say it doesn’t matter that your system administrator at work can see your password (he has complete control of the system anyway), if it helps him guess your internet banking password then it might not be so good.

Encrypting the passwords doesn’t help too much either. The encryption uses a key, and that key is either stored somewhere on the system or hardcoded into the program which stores the passwords. However it is done, it would be difficult to avoid the possibility of a privileged user discovering the key and consequently reading everyone’s passwords.

A more secure method is to store the hash value of each user’s password. It is impossible to recover someone’s password from the hash value, but it is possible to verify someone’s password against the hash value. Simply calculate the hash value of the password they typed in, and compare that with the stored password hash.

Even this method is not secure against a dictionary attack. An attacker could create a dictionary containing the hash values of every common password, and then search the password files for matches (assuming he had sufficient privileges on the system to access the password files). A first line of defence against this type of attack is to prevent users from choosing weak, easily guessable passwords. Other techniques discussed in the chapter on Key Derivation can also be applied.

Key Derivation

Most algorithms which permit user selected keys (such as symmetric encryption and MACs) require a binary key, typically 128 bits or more. This equates to a hexadecimal string of at least 32 characters. Most of us would struggle to remember such a key, or indeed to type it in accurately.

Generally most of us prefer using a password rather than a long binary key. Converting a password of arbitrary length into a fixed size binary key is easily accomplished using a hash function. You simply treat the password as the message, and the has as the binary key. It is possible to truncate the key (eg just take the first 64 bits if that is all that is required) to derive a shorter key, or to choose a different algorithm to create a longer key (SHA-512 produce a key of up to 512 bits).

In fact, hashing alone doesn’t necessarily improve security. To take an extreme example, suppose your password is actually a 4 digit number (eg a PIN, quite a weak form of password) represented as a 4 character string. When you hash the string, you obtain what appears to be a random 128 bit key. However the central weakness remains - there are only 10,000 possible numbers, and so there will only be 10,000 possible 128 bit keys, and the hash does nothing to prevent a brute force attack.

What hashing does allow you to do is to use much longer strings (a phrase, sentence or an entire paragraph if the data is important enough), which are resistant to brute force attacks because the number of combinations is so large. It is also used in more sophisticated key derivation algorithms discussed later.

Random number generation

A cryptographically strong random number generator essentially produces a stream of random bits (which can be assembled into random numbers of any required size). It should have the property that it is impossible to predict the value of the next bit with better than a 50% chance of getting it right.

Many real life sources of random data are not quite as random as they really need to be. They often contain biases (for example, a source of random bits might produce more 1’s than 0’s). A hash function can mask small biases simply by the complex way it jumbles data, and it can also increase the degree of randomness by compressing a number of semi-random bits into a smaller number of truly random bits. This is discussed elsewhere.