What is Base 64 Encoded Data?
Put simply, base64 encoded data is a string of character that contains only a-z, A-Z, 0-9, + and / characters and is often used in situations when sending non-text information via a text only transmission protocol.
Why use Base 64 Encoding?
The transport system of e-mail protocol is designed for sending plain ASCII text. The plain ASCII text includes a-z, A-Z, 0-9, ", %, &, ', (, ), *, +, -, _ and , (the comma). They will be referred to as the universal ASCII character set. There are other ASCII characters, but these are not universal. If an e-mail is sent using non-universal character (for example the £ symbol) the receiver of the e-mail may see a completely different character on their screen or printer. The situation is ever worse when trying to send e-mails in non English languages. Often these languages will have characters that are not part of the ASCII set. What is needed is a method to convert non-universal ASCII characters to a common form. This common form is called Base64 encoding.
What does Base 64 Encoding Do?
Base 64 Encoding takes a stream of characters and converts them to characters that belong to the universal ASCII character set. Once a stream of characters has been converted to characters that belong to the universal ASCII character set (Base 64 encoded) they can be transported with ease over the Internet using the e-mail protocols.
How does Base 64 Encoding Work?
ASCII characters belong to the 256 byte character space and includes the usual alpha, numeric, special characters (e.g. %), control characters (e.g. CR, LF, BELL) and other characters. Each character is 8 bits in size. For some languages the ASCII character set cannot be used (e.g. oriental pictographic languages). These use a UNICODE character set where each characters needs 16 bits to represent it.
The number of bits per character is not a problem for Base 64 Encoding. Base 64 Encoding takes a stream of bits and converts them to 8 bit characters that belong to the universal ASCII character set. Base 64 Encoding does not care about how many bits (8 or 16) are necessary to make a character as it works at the bit level. Once a stream of bits have been converted to characters that belong to the universal ASCII character set (Base 64 encoded) they can be transported with ease over the Internet using the e-mail protocols.
Base64 Encoding takes three bytes of character data (3 ASCII characters or 1½ UNICODE character) and produces four bytes of data from the universal character set. That is, Base64 Encoding takes 24 bits of input data and converts it to 32 bits of encoded data.
Base64 Encoding does not care what the data is. If the data contains line feeds, nulls or special characters, Base64 Encoding does not care - it simply sees it as a sting of bytes.
Encoding "Mary had" to Base 64
In this example we are using a simple text string ("Mary had") but the principle holds no matter what the data is (e.g. graphics file). To convert each 24 bits of input data to 32 bits of output, Base 64 encoding splits the 24 bits into 4 chunks of 6 bits. The first problem we notice is that "Mary had" is not a multiple of 3 bytes - it is 8 bytes long. Because of this, the last group of bits is only 4 bits long. To remedy this we add two extra bits of '0' and remember this fact by putting a '=' at the end. If the text string to be converted to Base 64 was 7 bytes long, the last group would have had 2 bits. In this case we would have added four extra bits of '0' and remember this fact by putting '==' at the end.
Each of these groups is then converted to a decimal number from 0 to 63. Base 64 gets its name from the fact that there are 64 states for each element. This is the same as decimal being called Base 10 are there are 10 states (0-9) for each element.
And finally, the decimal number is used as an index into the universal ASCII character set of
In our example:
Decoding "TWFyeSBoYWQ=" from Base 64 to ASCII
The first thing to note is the '=' at the end of the Base 64 encoded string. A Base 64 encoded string will have zero, one or two '='s at the end. As '=' is not part of the Base 64 encoding, it can only ever appear at the end and has a special meaning.
The stages to convert the Base 64 encoded data to ASCII is: