戴兜

戴兜的小屋

An analysis of how base64 data steganography works

Before starting this article, let's take a look at a group of base64 encoded strings:

ZG==
YY==
aW==
ZF==
cm==
aM==
b2==
dc==
c2==
Zf==

Decoded line by line, the content is "daidrhouse", which seems fine. But look closely: the first and fourth lines both decode to "d", yet their encoded forms differ (ZG== versus ZF==).

Encoded normally, one character per line, "daidrhouse" should give the following:

ZA==
YQ==
aQ==
ZA==
cg==
aA==
bw==
dQ==
cw==
ZQ==

Compared with the first group, the second character of each base64 string has been changed, yet the decoded content remains the same. To see why, we need to look at how base64 encoding works.
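Before moving on, the claim is easy to check. The sketch below assumes Python's standard `base64.b64decode` in its default lenient mode, which discards the leftover bits in front of the padding; `decode_all` is a helper name made up for this illustration.

```python
import base64

# The strings from the start of the article, and their normal encodings.
modified = ["ZG==", "YY==", "aW==", "ZF==", "cm==",
            "aM==", "b2==", "dc==", "c2==", "Zf=="]
canonical = ["ZA==", "YQ==", "aQ==", "ZA==", "cg==",
             "aA==", "bw==", "dQ==", "cw==", "ZQ=="]


def decode_all(lines):
    """Decode each base64 line on its own and join the resulting bytes."""
    return b"".join(base64.b64decode(line) for line in lines).decode("ascii")


print(decode_all(modified))   # daidrhouse
print(decode_all(canonical))  # daidrhouse

# Re-encoding the decoded bytes always yields the canonical form,
# so whatever was tucked into the modified strings is lost on a round trip.
print([base64.b64encode(base64.b64decode(s)).decode() for s in modified[:2]])
```

Both groups decode to the same text even though every line differs in its second character.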

What is base64?#

As the name suggests, base64 encodes binary content using an alphabet of 64 ASCII characters. You may have seen base64-encoded images embedded in web pages, or base64 used when transferring lyrics files in QQ Music. Encoding binary data as ASCII characters makes it easier to read and transmit in certain scenarios. Of course, representing arbitrary binary data with only 64 characters costs space: the encoded output is about 1/3 larger than the input, and the reason for this is explained below.

Index Table#

Base64 has a standard encoding table: 64 ASCII characters in a fixed order, each assigned an index from 0 to 63.

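| Index | Character | Index | Character | Index | Character | Index | Character |
|-------|-----------|-------|-----------|-------|-----------|-------|-----------|
| 0     | A         | 16    | Q         | 32    | g         | 48    | w         |
| 1     | B         | 17    | R         | 33    | h         | 49    | x         |
| 2     | C         | 18    | S         | 34    | i         | 50    | y         |
| 3     | D         | 19    | T         | 35    | j         | 51    | z         |
| 4     | E         | 20    | U         | 36    | k         | 52    | 0         |
| 5     | F         | 21    | V         | 37    | l         | 53    | 1         |
| 6     | G         | 22    | W         | 38    | m         | 54    | 2         |
| 7     | H         | 23    | X         | 39    | n         | 55    | 3         |
| 8     | I         | 24    | Y         | 40    | o         | 56    | 4         |
| 9     | J         | 25    | Z         | 41    | p         | 57    | 5         |
| 10    | K         | 26    | a         | 42    | q         | 58    | 6         |
| 11    | L         | 27    | b         | 43    | r         | 59    | 7         |
| 12    | M         | 28    | c         | 44    | s         | 60    | 8         |
| 13    | N         | 29    | d         | 45    | t         | 61    | 9         |
| 14    | O         | 30    | e         | 46    | u         | 62    | +         |
| 15    | P         | 31    | f         | 47    | v         | 63    | /         |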

Sometimes, where + and / would cause trouble (such as in URLs), - and _ are used in their place in the index table.
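The table does not need to be memorized, since it is simply A-Z, a-z, 0-9 and two symbols in that order. As a small sketch (the names `STANDARD` and `URL_SAFE` are made up here, and the URL-safe alphabet shown is the one from RFC 4648, which Python exposes as `base64.urlsafe_b64encode`):

```python
import base64
import string

# Standard alphabet: A-Z, a-z, 0-9, then '+' and '/'.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# URL-safe variant from RFC 4648: '-' and '_' take the place of '+' and '/'.
URL_SAFE = STANDARD[:62] + "-_"

print(STANDARD.index("G"))  # 6
print(STANDARD[48])         # w

# 0xFB 0xFF splits into the 6-bit values 62, 63, 60, so both symbols appear.
data = bytes([0xFB, 0xFF])
print(base64.b64encode(data))          # b'+/8='
print(base64.urlsafe_b64encode(data))  # b'-_8='
```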

Encoding Method#

Base64 processes the input in groups of 3 bytes (24 bits). If the last group has fewer than 3 bytes, it is padded with zero bits, and one = is appended to the output for each missing byte. Each 24-bit group is then split into 4 chunks of 6 bits. A 6-bit chunk has 2^6 = 64 possible values, so each chunk can be represented by one of the 64 characters in the index table. (This also explains the size increase: every 3 input bytes become 4 output characters, which is 1/3 more.)
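The procedure can be written out by hand in a few lines. This is a minimal sketch for illustration, not how any particular library implements it; `b64_encode` and `ALPHABET` are names chosen here, and the result is compared against Python's `base64.b64encode`.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes by hand: 3 bytes -> 24 bits -> four 6-bit indices."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full group
        # Pad the group with zero bytes so it is always 24 bits wide.
        group = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit indices, most significant first.
        indices = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        chars = [ALPHABET[idx] for idx in indices]
        # Characters that consist only of padding are written as '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


print(b64_encode(b"d"))    # ZA==
print(b64_encode(b"da"))   # ZGE=
print(b64_encode(b"dai"))  # ZGFp
print(b64_encode(b"daidr") == base64.b64encode(b"daidr").decode())  # True
```

Note that a character sitting right before the padding still carries some of the zero bits used for padding. Those are the bits the rest of the article is about.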

Examples#

image

image

Steganography Principle#

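When decoding base64, the number of = characters at the end of the string determines how many padded bytes are removed. You may have noticed that when the final group holds only 1 or 2 bytes, some bits are ignored during decoding. One byte (8 bits) is carried by 2 characters (12 bits), leaving 4 unused bits; two bytes (16 bits) are carried by 3 characters (18 bits), leaving 2 unused bits. These are indicated by the red marks in the following images.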

image

image

These red-marked bits are carried in the encoded string but ignored during decoding. A normal encoder sets them to 0, but modifying them does not affect the decoded data, so they can be used to hide information.
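To make this concrete, here is a rough sketch of how data could be written into those spare bits. The function `hide_bits` is hypothetical, written only for this illustration, and the check at the end assumes Python's `base64.b64decode` in its default lenient mode.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def hide_bits(encoded: str, bits: str) -> str:
    """Overwrite the unused low bits of a padded base64 string with `bits`.

    A string ending in '==' has 4 spare bits, one ending in '=' has 2,
    and an unpadded string has none.
    """
    pad = len(encoded) - len(encoded.rstrip("="))
    spare = pad * 2
    if len(bits) != spare:
        raise ValueError(f"expected {spare} bits, got {len(bits)}")
    if spare == 0:
        return encoded
    pos = len(encoded) - pad - 1  # last character in front of the padding
    index = ALPHABET.index(encoded[pos])
    # Clear the spare low bits, then put the payload bits in their place.
    index = ((index >> spare) << spare) | int(bits, 2)
    return encoded[:pos] + ALPHABET[index] + encoded[pos + 1:]


print(hide_bits("ZA==", "0110"))                    # ZG==
print(base64.b64decode(hide_bits("ZA==", "0110")))  # b'd'
print(hide_bits("ZGE=", "11"))                      # ZGH=
print(base64.b64decode(hide_bits("ZGE=", "11")))    # b'da'
```

For instance, `hide_bits("ZA==", "0110")` yields `ZG==`, which is the first line of the puzzle at the start of the article.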

Problem Solving#

Now, let's try to solve the problem mentioned at the beginning of the article. What is hidden in that group of base64 encoded strings?

[Images: one bit-level breakdown for each of the ten encoded strings, with the ignored bits marked in red]

Concatenating all the red-marked bits (4 per line, 40 bits in total) and reading them as five 8-bit ASCII characters gives the final result: "hello".
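The same extraction can be scripted. The sketch below is one possible way to do it, assuming the standard alphabet and one base64 string per line; `extract_hidden` is a name invented for this example.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def extract_hidden(lines):
    """Collect the spare bits in front of each line's padding and pack them into bytes."""
    bits = ""
    for line in lines:
        stripped = line.rstrip("=")
        pad = len(line) - len(stripped)
        if pad == 0 or not stripped:
            continue  # no padding means no spare bits on this line
        spare = pad * 2  # '==' leaves 4 spare bits, '=' leaves 2
        index = ALPHABET.index(stripped[-1])
        # Keep only the low `spare` bits of the last real character.
        bits += format(index & ((1 << spare) - 1), f"0{spare}b")
    usable = len(bits) - len(bits) % 8  # drop any incomplete trailing byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


puzzle = ["ZG==", "YY==", "aW==", "ZF==", "cm==",
          "aM==", "b2==", "dc==", "c2==", "Zf=="]
print(extract_hidden(puzzle))  # b'hello'
```

Running it on the normally encoded strings (ZA==, YQ==, and so on) returns only zero bytes, since an ordinary encoder leaves the spare bits at 0.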

image
