Comparison of encoding schemes

By Martin McBride, 2017-04-09
Tags: binary encoding
Categories: binary encoding data formats

Of the four main binary-to-text encoding schemes, Hex and Base64 are the most commonly used. Base32 is less common, and ASCII85 is only really used in the context of PostScript and PDF.

Hex encoding has several major plus points. It is very easy to understand and implement. Each byte is encoded as a separate pair of characters. Looking at hex-encoded data in a text editor is exactly like looking at binary data in a hex editor. You can search it, edit it, and if you have spent enough time working with hex files you might even be able to read it, decoding it in your head as you go along. These advantages come at a cost: hex is the least efficient of the four schemes. Every byte becomes two characters, so it increases the size of the data by 100%.
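As a quick illustration of the one-byte-to-two-characters mapping, here is a minimal Python sketch using the built-in `bytes.hex()` and `bytes.fromhex()` methods. The sample data is made up for the example.

```python
# Sample data chosen for illustration: three printable bytes and two non-printable ones.
data = b"Hi!\x00\xff"

# Each byte becomes exactly two hex characters.
encoded = data.hex()
print(encoded)  # 48692100ff

# Splitting into pairs recovers the per-byte view you would see in a hex editor.
pairs = [encoded[i:i + 2] for i in range(0, len(encoded), 2)]
print(pairs)  # ['48', '69', '21', '00', 'ff']

# Decoding is the exact reverse, and the encoded form is twice the size.
assert bytes.fromhex(encoded) == data
assert len(encoded) == 2 * len(data)
```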

Base64 uses a larger character set to achieve a more efficient encoding. It is not intended to be in any way human-readable, but it is designed to be compatible with as many systems as possible. The 64 characters are available in normal ASCII, in older variants of ASCII, and in EBCDIC (IBM's mainframe character encoding, which was developed separately from ASCII). The algorithm is more complex than hex encoding, but every 3 bytes become 4 characters, so the data size is only increased by about 33%.
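To show where the 3-bytes-to-4-characters ratio comes from, here is a rough sketch that encodes a single 3-byte group with bit shifting and checks the result against Python's standard `base64` module. The helper name `encode_group` is invented for this example, and padding of incomplete groups is deliberately left out.

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (hypothetical helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    n = int.from_bytes(three_bytes, "big")  # 24 bits in total
    # Slice the 24 bits into four 6-bit values, most significant first.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))

print(encode_group(b"Man"))  # TWFu
assert encode_group(b"Man") == base64.b64encode(b"Man").decode("ascii")
```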

Base32 uses a more restricted character set than Base64, and is therefore less efficient. The standard alphabet uses only upper case letters and the digits 2 to 7. The digits 0 and 1 are excluded to avoid confusion with the letters O and I. Base32 can also be used in place of Base64 if there is a danger that the case of letters might be altered. Every 5 bytes become 8 characters, so data size is increased by 60%.
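A short check of those two properties, assuming the RFC 4648 alphabet that Python's `base64.b32encode` implements. The sample data is arbitrary.

```python
import base64

# The RFC 4648 Base32 alphabet: 26 upper case letters plus the digits 2-7.
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

# 100 bytes is a multiple of 5, so no '=' padding is needed.
data = bytes(range(100))
encoded = base64.b32encode(data).decode("ascii")

# Only characters from the restricted alphabet appear in the output.
assert set(encoded) <= set(BASE32_ALPHABET)

# Every 5 bytes become 8 characters: 100 bytes -> 160 characters.
print(len(data), len(encoded))  # 100 160
```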

Base32 falls somewhere between Hex and Base64. It lacks the simplicity of Hex, and it is less efficient than Base64. In most cases, if you are opting for a slightly more complex algorithm, you might as well go for Base64, which produces smaller data.

There is one area where Base32 is quite useful. If you need a user to manually enter a binary key, for example, a product activation code, Base32 is worth considering. It is case-insensitive and uses only letters and numerals - for manual entry, this is less confusing than Base64 (which is case-sensitive and uses punctuation symbols), but more compact than Hex.
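Here is one way such a key might be handled in Python. This is only a sketch: the function names `format_key` and `parse_key`, the 10-byte key, and the grouping into blocks of four characters are all assumptions made for the example, not part of any standard.

```python
import base64

def format_key(key: bytes, group: int = 4) -> str:
    """Render a binary key as dash-separated Base32 groups (hypothetical helper)."""
    code = base64.b32encode(key).decode("ascii").rstrip("=")
    return "-".join(code[i:i + group] for i in range(0, len(code), group))

def parse_key(text: str) -> bytes:
    """Recover the key from what the user typed, ignoring case, dashes and spaces."""
    code = text.replace("-", "").replace(" ", "")
    code += "=" * (-len(code) % 8)  # restore the padding that format_key stripped
    return base64.b32decode(code, casefold=True)  # casefold accepts lower case input

# An arbitrary 10-byte (80-bit) key gives exactly 16 Base32 characters.
key = bytes.fromhex("0123456789abcdef0123")
printed = format_key(key)

# The user can type the code in lower case and still get the same key back.
assert parse_key(printed.lower()) == key
```

The same 10-byte key would need 20 characters in hex, so the Base32 form is shorter to type while still avoiding mixed case and punctuation.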

ASCII85 is the most efficient coding system: every 4 bytes become 5 characters, so data size increases by just 25%. It has a couple of minor disadvantages. It uses a larger character set, so it is only compatible with ASCII (unlike Base64, which supports various close relatives of ASCII). It is also slightly more demanding computationally since it uses division rather than bit shifting. However, these factors are becoming increasingly irrelevant in the context of modern computer systems. The main reason that Base64 continues to be used more than ASCII85 is probably the simple fact that it has been around for longer.
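The sketch below shows the division step for a single 4-byte group, then compares the output size of all four schemes on the same input. The helper `a85_group` is invented for this example and ignores the special cases a full encoder needs (partial final groups, and the `z` shorthand that Python's `base64.a85encode` emits for a group of four zero bytes).

```python
import base64

def a85_group(four_bytes: bytes) -> str:
    """Encode exactly four bytes as five ASCII85 characters (hypothetical helper)."""
    if len(four_bytes) != 4:
        raise ValueError("this sketch only handles complete 4-byte groups")
    n = int.from_bytes(four_bytes, "big")  # one 32-bit number
    digits = []
    for _ in range(5):
        n, d = divmod(n, 85)           # repeated division by 85, not bit shifting
        digits.append(chr(d + 33))     # digit 0 maps to '!', digit 84 to 'u'
    return "".join(reversed(digits))   # most significant digit first

assert a85_group(b"Man ") == base64.a85encode(b"Man ").decode("ascii")

# 60 bytes divides evenly by 3, 4 and 5, so no scheme needs padding here.
# The data contains no all-zero group, so the 'z' shorthand does not apply.
data = bytes(range(1, 61))
sizes = {
    "Hex": len(data.hex()),
    "Base32": len(base64.b32encode(data)),
    "Base64": len(base64.b64encode(data)),
    "ASCII85": len(base64.a85encode(data)),
}
for name, size in sizes.items():
    overhead = 100 * (size - len(data)) / len(data)
    print(f"{name:8} {size:4d} chars  +{overhead:.0f}%")
# Hex       120 chars  +100%
# Base32     96 chars  +60%
# Base64     80 chars  +33%
# ASCII85    75 chars  +25%
```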
