TenMinuteTutor

Coding, maths and art

yEnc encoding

yEncode (yEnc) is a binary encoding format designed for sending binary data via email and newsgroups. It differs from other methods (such as Base64 encoding) in that its output is itself binary data: the input is modified so that certain “critical” byte values are escaped and never appear as themselves.

The key to yEnc is the recognition that most modern news and email systems can transfer 8-bit binary data, apart from a few problems with certain specific byte values. A null byte (hex 0x00) can be misinterpreted as the end of the data, and can cause problems on some systems. Similarly, the hex values 0x0A (LF) and 0x0D (CR) can be misinterpreted as end-of-line characters, and some systems may automatically substitute them. The main purpose of yEnc is to ensure that these byte values never appear in the encoded data.

Key Characteristics

yEnc is not a binary-to-text encoding: it does not produce printable ASCII data. It is therefore not suitable for ASCII-only networks or protocols, for user entry of binary keys, or for including binary strings in filenames or URLs.

yEnc is useful in specific situations such as email and newsgroups, where the main requirement is to escape problem characters such as null, CR and LF. The encoded size depends on the data, but in most cases yEnc is very efficient, typically increasing the data size by only 1 or 2%. In the worst case, where every byte needs escaping, the size increases by about 100%, but this is unlikely to happen in practice.
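The data dependence is easy to see by counting escapes. The following is a minimal Python sketch (the helper `escaped_size` is hypothetical); it applies the encoding rule from the Encoding section below and ignores line breaks, the header and the trailer, all of which add a little more.

```python
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR and the escape character "="


def escaped_size(data: bytes) -> int:
    """Size of the encoded payload, ignoring line breaks, header and trailer."""
    # Every input byte produces one output byte, plus an escape byte
    # when its shifted value (byte + 42) mod 256 is critical.
    return sum(2 if (b + 42) % 256 in CRITICAL else 1 for b in data)


every_byte = bytes(range(256))     # each byte value exactly once
worst_case = bytes([0xD6]) * 256   # 0xD6 + 42 wraps to 0x00, so every byte is escaped

print(escaped_size(every_byte) / len(every_byte) - 1)  # 0.015625, about 1.6%
print(escaped_size(worst_case) / len(worst_case) - 1)  # 1.0, i.e. 100%
```

Data in which every byte value is equally likely needs only 4 escapes per 256 bytes, while a file made entirely of one unlucky byte value doubles in size.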

It is worth noting that this algorithm has not been adopted by any official standards organisation. It is, however, supported by a number of newsreaders and email clients.

Encoding

The encoding algorithm operates as follows:

The escape character is “=” (hex 0x3D).

The critical characters are 0x00 (NUL), 0x0A (LF), 0x0D (CR) and 0x3D (“=”). 0x3D is included because it is the escape character, and therefore cannot appear “as itself” in the encoded data.

n is the output line length; 128 is typical.

  1. Get a byte from the input stream.

  2. Add 42 (decimal) to the byte, modulo 256.

  3. If the result is a critical character, output an escape character and add a further 64 to the result (modulo 256).

  4. Output the resulting byte.

This is repeated until all the input bytes have been used up. To ensure that the data can be transmitted by most standard protocols, a CRLF pair should be inserted after every n output bytes. However, if the escape character of a two-byte escape sequence would be the last byte on a line, the escaped byte that follows it must be output on the same line. In that case only, the line is permitted to be n+1 bytes long.
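The steps above, including the line-wrapping rule, can be sketched in Python. This is a minimal sketch, not a reference implementation: the function name `yenc_encode` is hypothetical, it escapes only the four critical characters listed above, and it puts CRLF between lines without adding one after the last line.

```python
ESCAPE = 0x3D                        # "="
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR and the escape character


def yenc_encode(data: bytes, line_length: int = 128) -> bytes:
    """Encode raw bytes as CRLF-separated yEnc lines (header and trailer not included)."""
    lines = []
    line = bytearray()
    for byte in data:
        value = (byte + 42) % 256        # step 2: shift by 42
        if value in CRITICAL:
            line.append(ESCAPE)          # step 3: emit "=" ...
            value = (value + 64) % 256   # ... and shift by a further 64
        line.append(value)               # step 4: output the byte
        # Both bytes of an escape pair are appended before this check,
        # so a full line is n bytes, or n+1 when a pair straddles position n.
        if len(line) >= line_length:
            lines.append(bytes(line))
            line = bytearray()
    if line:
        lines.append(bytes(line))        # final, possibly short, line
    return b"\r\n".join(lines)
```

For example, `yenc_encode(b"\x00\xd6")` returns `b"*=@"`: 0x00 shifts to 0x2A (“*”), while 0xD6 shifts to the critical value 0x00, which is escaped and emitted as 0x40 (“@”).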

If you are curious about why 42 is added to each byte of the input data, the reason is quite simple. Binary data often contains a disproportionate number of zero bytes, which would all need to be escaped, doubling their size. Adding 42 to every byte removes this source of inefficiency: a zero byte becomes 0x2A (“*”), which needs no escaping. Almost any value could have been chosen; presumably 42 was chosen as a nod to The Hitchhiker’s Guide to the Galaxy.
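The effect of the shift can be checked by listing which input byte values need an escape. The following hypothetical helper, `bytes_needing_escape`, is written for illustration only. With no shift, the common zero byte is among those values. With a shift of 42, the set becomes 0x13, 0xD6, 0xE0 and 0xE3.

```python
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, "="


def bytes_needing_escape(shift: int) -> list[int]:
    """Input byte values whose shifted value (byte + shift) mod 256 is critical."""
    return [b for b in range(256) if (b + shift) % 256 in CRITICAL]


print([hex(b) for b in bytes_needing_escape(0)])   # ['0x0', '0xa', '0xd', '0x3d']
print([hex(b) for b in bytes_needing_escape(42)])  # ['0x13', '0xd6', '0xe0', '0xe3']
```

A run of zero bytes in the input therefore encodes as a run of “*” characters, with no escapes at all.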

Header and Trailer

The binary data is immediately preceded by the header line:

=ybegin line=128 size=123456 name=mybinary.dat

All three parameters must be present, and name must be the final parameter. These parameters give the typical line length (i.e. “n”), the number of bytes in the unencoded data, and the name of the original binary file. The following trailer must immediately follow the data:
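The following is a minimal sketch of building and reading the header line. The helper names `make_header` and `parse_header` are hypothetical, and only the three parameters described here are handled. Because name is the final parameter, a parser can treat everything after `name=` as the file name. The likely reason for this rule is that file names may contain spaces.

```python
def make_header(name: str, size: int, line_length: int = 128) -> str:
    """Build a single-part =ybegin line, with name as the final parameter."""
    return f"=ybegin line={line_length} size={size} name={name}"


def parse_header(line: str) -> dict:
    """Parse a =ybegin line into its line, size and name parameters (simplified)."""
    if not line.startswith("=ybegin "):
        raise ValueError("not a yEnc header line")
    # Split off the name first: it runs to the end of the line and may contain spaces.
    head, sep, name = line.partition(" name=")
    if not sep:
        raise ValueError("missing name parameter")
    # The remaining parameters are space-separated key=value pairs.
    params = dict(item.split("=", 1) for item in head.split()[1:])
    return {"line": int(params["line"]), "size": int(params["size"]), "name": name}


print(parse_header("=ybegin line=128 size=123456 name=my file.dat"))
# {'line': 128, 'size': 123456, 'name': 'my file.dat'}
```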

=yend size=123456 crc32=abcdef12

The size parameter must have the same value as in the header. The crc32 value is optional, but if present it must contain the 32-bit CRC of the original data. One thing that is not made entirely explicit in the yEnc specification is that the CRC value is an 8-digit hexadecimal number, whereas all other parameters are decimal.
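Putting the pieces together, the sketch below wraps encoded data in a header and a trailer and computes the CRC with Python's `zlib.crc32`. It includes a compact version of the encoding steps so that it runs on its own. Several details are assumptions rather than requirements: the function names `encode_body` and `yenc_message` are hypothetical, the name is encoded as UTF-8, and the CRC is written as zero-padded lowercase hex.

```python
import zlib

ESCAPE = 0x3D                        # "="
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, "="


def encode_body(data: bytes, line_length: int) -> bytes:
    """Encode data as CRLF-separated lines using the steps in the Encoding section."""
    lines, line = [], bytearray()
    for byte in data:
        value = (byte + 42) % 256
        if value in CRITICAL:
            line.append(ESCAPE)
            value = (value + 64) % 256
        line.append(value)
        if len(line) >= line_length:  # n bytes, or n+1 after an escape pair
            lines.append(bytes(line))
            line = bytearray()
    if line:
        lines.append(bytes(line))
    return b"\r\n".join(lines)


def yenc_message(data: bytes, name: str, line_length: int = 128) -> bytes:
    """Wrap the encoded data in a single-part =ybegin header and =yend trailer."""
    size = len(data)  # size of the original, unencoded data
    header = f"=ybegin line={line_length} size={size} name={name}".encode()
    crc = zlib.crc32(data)  # unsigned 32-bit CRC of the original data
    trailer = f"=yend size={size} crc32={crc:08x}".encode()  # always 8 hex digits
    body = encode_body(data, line_length)
    parts = [header, body, trailer] if body else [header, trailer]
    return b"\r\n".join(parts) + b"\r\n"


print(yenc_message(b"123456789", "check.txt"))
```

For the input `b"123456789"`, the trailer ends with `crc32=cbf43926`, which is the standard CRC-32 check value for that string. The `:08x` format keeps leading zeros, so a small CRC still occupies all 8 digits.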

Multipart Encodings

yEnc supports multipart encoded binaries. The specification also contains recommendations for subject line conventions when posting multipart encoded binaries to newsgroups. These are not described here; refer to the yEnc website for more details.