This article will cover the mechanisms of Base64 encoding. If you’re into cybersecurity or programming then you might have come across Base64. These days it’s used in a huge number of applications for easy data transmission, encoding, etc. We see a bunch of random letters with equals sign at the end and the next moment we try to decode it. Ever thought how those bunch of letters store data?

Contents


Base64? What is it?

Base64 encoding was defined in RFC4648 along with Base16 and Base32. Base64 is a binary-to-text encoding scheme which is very common on the Internet these days. It is used for easy data transmission, encoding, to embed images in HTML, CSS via Data URIs.

It simply represents binary data of 8-bit bytes which is converted to 6-bit bytes for the base64 conversion.


Common Uses of Base64

Base64 is being used A LOT these days, here are some of the common uses of base64:

  • JSON Web Token uses Base64 encoding algorithm.
  • Maintaining data integrity during transfers.
  • Obfuscating data (security by obscurity)
  • Embedding files inside static HTML, CSS code to prevent dependancy from external sources.

Getting Technical

Base64 (MIME Implementation) uses the following:

  • 26 uppercase letters
  • 26 lowercase letters
  • 0 to 9 numbers
  • \
  • +
  • .
  • = (used for padding)

Base64 is the name of this encoding scheme because it consists of 64 main characters but there is an extra character “=” which is for padding so we can say that 65 characters can be used in this encoding.

Let’s take a look at the Base64 index below:

index_table

Each of the letter corresponds to its 6-bit binary representation which has an index (decimal value) to it.

Encoding Process

In the encoding process, the algorithm uses the 8-bit binary stream and processes it from left to right and re-arranges it as 24-bit binary stream and are used as concatenated 6-bit chunks of binary values.

The result from the above steps can be used to match the numbers in the index table to obtain single letter values and finally obtaining the encoded result value of the initial input.


Manual Encoding

From the above part of the article, you must have understood the steps and the flow of the encoding/decoding process and if not, then you will get practical understanding of it in this part of the article.

Let’s take an ASCII value: Hacked

We will try encoding our ASCII value Hacked into base64 (manually) to get better understanding of the process.

  • Step 1: 8-bit Binary Representation

The binary representation of our value is 01001000 01100001 01100011 01101011 01100101 01100100.

  • Step 2: Re-arranging into 24-bit Binary Representation

Make sure to do this from left to right. The value after re-arranging is 010010 000110 000101 100011 011010 110110 010101 100100.

  • Step 3: Match the 6-bit chunks from the index table to obtain ASCII value

You have to do this chunk-by-chunk and obtain single letters each time.

010010: S
000110: G
000101: F
100011: j
011010: a
110110: 2
010101: V
100100: k

Concatenating all the obtained letters, we get SGFja2Vk. We manually encoded a string into Base64! If you did it right, give yourselves a pat on the back.

To confirm it, you can use a tool to check if you encoded it properly and if are missing any letters, values. Decode your obtained value for the word you encoded and see if that is exactly the word you tried to encode. We will also take a look at manually decoding base64 further in this article.

Encoding with Padding

Sometimes there are values which can’t have all 6-bits in the chunks as number of bits could be less in that data than regular 24-bit streams. In that case, we use the concept of padding where we add extra zeroes (0) at the end of the chunks which are shorter than 6-bits.

Let’s try encoding an ASCII string which requires the use of Padding while encoding into base64…

ASCII value to encode: Deep

  • Step 1: 8-bit Binary Representation

The binary representation of our value is 01000100 01100101 01100101 01110000.

  • Step 2: Re-arranging into 24-bit Binary Representation

Make sure to do this from left to right. The value after re-arranging is 010001 000110 010101 100101 011100 00. In the last chunk there are only 2 bits, so we will add extra zeroes in order to make it a 6-bit chunk.

The final 6-bit groups representation should be: 010001 000110 010101 100101 011100 000000 - (Added 4 zeroes at the end).

  • Step 3: Match the 6-bit chunks from the index table to obtain ASCII value

You have to do this chunk-by-chunk and obtain single letters each time.

010001: R
000110: G
010101: V
100101: l
011100: c
000000: A

After concatenating each of the obtained letters, we get RGVlcA but we still have to add the padding, we can do that by adding =. The last chunk had only 2 bits and 4 bits were still spare so we added 4 zeroes, in case of 2 spare bits we always have to add double = and in case of a single spare bit (e.g. 0010 in the last chunk) we can add a single =.

In our case we had 2 bits spare (4 zeroes added) so we have to add == at the end of our obtained letters.

The final encoded output is: RGVlcA== (this decodes to “Deep”).


Manual Decoding

We will now decode a base64 encoded string by reversing the whole encoding process we followed.

Base64 string to decode: V2luCg==.

Let’s decode this manually now…

Step 1: Remove the padding

Remove the padding and we now have V2luCg.

Step 2: Match each character with 6-bit binary representation from index table

V: 010101
2: 110110
l: 100101
u: 101110
C: 000010
g: 100000

We got 010101 110110 100101 101110 000010 100000.

Step 3: Re-arrange as 8-bit chunks

Re-arranging as 8 bit chunks from left-to-right, we get 01010111 01101001 01101110 00001010. We removed the last four zeroes because they were for padding, decoding requires no padding.

We now have our 8-bit binary representation of our data and converting this to ASCII will get us our final decoded value Win!


Filename & URL Safe Base64

Filename & URL safe base64 is also a part of RFC4648. This type of base64 is to be called “base64url” because of the difference in this and regular base64 encoding.

The index number 62 and 63 are for + and / respectively. These 2 characters have a special meaning if used in URLs or filenames. In a URL, + is for whitespace and / is for directory (in filenames as well). Due to this, these 2 characters cannot be used in a regular base64 encoding becuase if that happens and base64 encoded data is transferred through URLs, there could be many problems as the browser will interpret the signs as whitespace and for directory so in this case, + (plus sign) has been replaced by - (hyphen) and / has been replaced by _ (underscore) and with these replacements placed, the encoded string is known as “base64url”.


I hope you enjoyed this article and gained some valuable knowledge about our friendly encoding algorithm Base64. Feel free to contact me for any suggestions, feedbacks and I would really appreciate those. Contact Me

You can also Buy Me A Coffee to support this blog page!

Back to Top