Deobfuscation techniques: Peephole deobfuscation

In this article we describe a basic deobfuscation technique that leverages code snippet substitution. For concrete examples we'll analyse a publicly available Lumma sample using Ghidra.

Problem statement

IT Security is a never-ending battlefield between attackers and defenders - black hats and white hats, red and blue teams, penetration testers and website owners, antivirus companies and malware operators, exploit and software developers.

Reverse engineering is no exception. Since it's virtually impossible¹ to prevent a determined analyst from understanding everything about a program, those looking to protect their secrets rely on techniques that make analysis more time consuming. There are many different ways to make life harder for reverse-engineers (and antivirus tools). Some key techniques include:

Packers and protectors (wrapping a program in one or more layers to hide the real code).
Anti-debugging (making it hard or impossible to run the unmodified program under a debugger).
Anti-VM techniques (preventing the program from being executed in a virtual machine).
Obfuscation (the main topic of this post).

Obfuscation means "intentionally making the code hard to understand". The goal is to make the code as unreadable as possible to a human analyst, while still preserving its original behaviour. It's easy to understand when applied to scripting languages, like Javascript:

For binary code, the idea is the same, even though the techniques differ. There are many different obfuscation methods, for example:

Junk code insertion - adds meaningless and useless instruction to a function, to make the real code harder to find. Modern decompilers usually have no problems removing this kind of noise.
Instruction substitution - a basic form of obfuscation, such as replacing a + b with a mathematically equivalent operation like (a ^ b) + ((a & b) << 1). Substitutions like this make assembly code unreadable to humans, but modern decompilers are also good at simplifying them.
Control flow obfuscation - replace standard programming language constructs, like "if/else" or loops, with complex series of jumps that are hard to understand for reverse-engineering tools. Often the jump condition is intentionally complex, to confuse the decompiler even more.
Control flow flattening - a specific, advanced version of the previous technique. We won't cover this technique in this blog post.
Opaque predicates - another control flow obfuscation technique, where simple statements like if (x) { ... } are replaced with logic that's hard for a decompiler to reason about, like if (always_false | x) { ... } else { /* dead code */ }. This can make the code significantly harder to read.

In this blog post we'll describe basic (but powerful) ways to untangle all but the most advanced protection methods.

Peephole (de)obfuscation

As one mights guess, deobfuscation is the process of reversing obfuscation. In principle it can be done manually, but in the context of binary analysis we almost always mean automated deobfuscation. In this post we'll apply a technique I'll refer to peephole deobfuscation² from now on.

The name comes from analogy with peephole obfuscation (and peephole optimization). The idea is that we can perform binary transformations by only examining only small portions of a program at once (just like "looking through the peephole"³). In case of obfuscation, the goal is to replace simple easy-to-understand fragments with more complex but equivalent code⁴. A typical example is junk code insertion. This code:

mov rax, [rdx]
inc rax
mov [rdx], rax

May be obfuscated to:

and rbx, rbx
mov rax, [rdx]
push rcx
xchg rcx, rdx
inc rax
xor rdx, rax
mov rdx, rcx
pop rcx
mov [rdx], rax

The code looks significantly more complex, but after careful inspection we'll discover that it works in exactly the same way.

and rbx, rbx     ; no-op
mov rax, [rdx]   ;
push rcx         ; save rcx -----\
xchg rcx, rdx    ; rcx = rdx --\ |
inc rax          ;             | |
xor rdx, rax     ; rdx = junk  | |
mov rdx, rcx     ; rdx = rcx <-/ |
pop rcx          ; restore rcx <-/
mov [rdx], rax   ;

Performing this type of obfuscation relatively simple, since it only requires looking at small pieces of assembly code at a time⁵. Because of this, it's also fairly easy to implement. But it's also easy to analyse for all but the most basic reverse-engineering tools.

Instruction substitution is another clear example of peephole obfuscation - we operate at a single expression at a time. For contrast, control flow flattening is a global obfuscation technique - it requires looking at whole functions at once.

That's where peephole deobfuscation name comes from - it encompasses deobfuscation methods that can be applied by only looking at a relatively small number of opcodes at a time. This will be the focus of the rest of this blog post.

Our reverse-engineering environment

We will test our deobfuscation techniques on a Lumma sample with sha256 hash 44573a7526d5053a28d4e3e70c6ad8adf8eec148d8fe81302140b6bb3df179c0, or more precisely, on the unpacked version of this sample. To make it easier to follow along, both samples are available on malwarebazaar - the unpacked sample is here.

In the code examples we'll use Ghidra along with a helper library called ghidralib. Ghidra was chosen, because it's free, and it was the tool used during the initial analysis of the sample. The library will be used, because it significantly simplifies our code snippets.

That said, the core ideas are tool agnostic. You can easily follow along using other setups, like IDA, Binary Ninja, or even pure Python ⁶.

Humble beginnings

Let's start with a simple example. The binary contains several assembly snippets like this:

MOV        ECX,dword ptr [EDI]  ; 00411e63 8b 0f
MOV        EAX,ECX              ; 00411e65 89 c8
AND        EAX,0x2              ; 00411e67 83 e0 02
OR         ECX,0x2              ; 00411e6a 83 c9 02
ADD        ECX,EAX              ; 00411e6d 01 c1

This might look like a relatively complex operation. To understand it, we can rewrite it to C-like pseudocode: (ECX & 2) + (ECX | 2). Now, if we take a closer look, we may realise:

If second bit of ECX is not set, then this expression simplifies to (0) + (ECX | 2) == ECX + 2
If second bit of ECX is set, then this expression simplifies to (2) + (ECX) == ECX + 2

In other words, this entire obfuscated pattern can be replaced with a simple ADD ECX, 2. As a warmup exercise, we'll implement a deobfuscator that detects and simplifies this pattern.

This is actually very straightforward! We'll generalize this a bit later, but for now let's focus on handling this specific case. To replace an assembly pattern, we can do just:

from ghidralib import *

assemble_at(0x411e65, [
    "ADD ECX, 2",
], pad_to=10)  # add NOPs until 10 bytes

This is enough to deobfuscate that snippet:

MOV        ECX,dword ptr [EDI]  ; 00411e63 8b 0f
ADD        ECX,0x2              ; 00411e65 83 c1 02
NOP                             ; 00411e68 90
NOP                             ; 00411e69 90
NOP                             ; 00411e6a 90
NOP                             ; 00411e6b 90
NOP                             ; 00411e6c 90
NOP                             ; 00411e6d 90
NOP                             ; 00411e6e 90

But, of course, we don't want to manually run this code on every instance of this pattern. Instead, we'd like to find and replace all occurrences at once. Since (for now) we're only targeting this specific pattern, we can just look for the exact byte sequence we expect:

# Bytes of the original instructions - see assembly snippet above
pattern = "89 c8 83 e0 02 83 c9 02 01 c1"
for addr in findall_pattern(pattern):
    print(hex(addr))
    # use assemble_at to patch instructions at `addr`

One downside of this approach is that it's a bit too strict. For example 89 c8 matches mov eax, ecx, but the obfuscation works for other register pairs, like mov edx, eax. It turns out that for most x86 instructions we generalize this with simple wildcards. Let's take a look at a few examples:

x86 instruction	bytes
`mov eax, ebx`	`89d8`
`mov eax, ecx`	`89c8`
`mov eax, edx`	`89d0`
`mov eax, esi`	`89f0`
`mov edi, eax`	`89c7`
`mov edi, ebx`	`89df`
`mov edi, edx`	`89d7`

Clearly we can find our instructions by just masking out the second byte⁸. Let's update our rule:

pattern = "89 ?? 83 e? 02 83 c? 02 01 ??"
for addr in findall_pattern(pattern):
    print(hex(addr), disassemble_at(addr, 4))
    # use assemble_at to patch instructions at `addr`

That way we will not only match MOV EAX, ECX, but also, for example, MOV EBX, EAX and other register combinations⁷. Knowing which bytes to mask out is not always obvious⁹. With some experience it can usually be done by eye - but when in doubt it's always a good idea to consult x86 instruction reference.

A pedantic note about correctness:

Byte patterns like this are not completely reliable. Technically we should disassemble the matched bytes and confirm they look like we expect. For example, each instruction is matched independently, so mov eax, ebx; and ecx, 0x2; ... will also match - even though the meaning is different. In our experience, this doesn't usually matter and for reverse-engineering purposes a small risk of incorrect deobfuscation is acceptable. But if we do require 100% correctness (for instance, we want our deobfuscated binary to run correctly), then extra care is needed.
A dirty secret: even with perfect opcode matching, 100% correctness is impossible. If some part of code jumps into the middle of our pattern, the deobfuscated result will always be incorrect. Because of how obfuscators are implemented, this shouldn't happen often, but obfuscators are specifically designed to make our life miserable, so it can't be ruled out entirely.

It's a job of reverse-engineer to make sure such edge cases don't occur. Having said that, we'll go ahead and ignore these issues and assume our patterns work (because in practice they do).

Lumma control flow (de)obfuscation

Operation obfuscation makes the code harder to read - like (ECX & 2) + (ECX | 2) instead of ECX + 2 - but it doesn't fundamentally change the approach to reverse-engineering. It's still possible to analyse the obfuscated binary with some patience.

Control flow obfuscation is different - it can completely hide most of the code from the eyes of the reverse-engineer. We can observe this in our test binary. If we just load it into our decompiler, we'll immediately see that many functions appear almost empty and end abruptly:

void FUN_0040e020(void) {
  HRESULT HVar1 = CoInitializeEx(NULL,2);
                    // WARNING: Could not recover jumptable at 0x0040e051. Too many branches
                    // WARNING: Treating indirect jump as call
  (*(code *)(*(int *)(&DAT_00444f88 + (HVar1 >> 0x1f) * -4) + (_DAT_00444f90 ^ 0x5a65af47) + 1))();
  return;
}

In situations like this, it's usually a good idea to take a look at the assembly code, to understand what's going on:

CALL       dword ptr [->OLE32.DLL::CoInitializeEx]
SHR        EAX,0x1f
MOV        EAX,dword ptr [EAX*0x4 + DAT_00444f88]
MOV        ECX,0x5a65af47
XOR        ECX,dword ptr [DAT_00444f90]
ADD        EAX,ECX
INC        EAX
JMP        EAX

What's going on here? Let's analyse this step by step:

CALL       dword ptr [->OLE32.DLL::CoInitializeEx]

CoInitializeEx function is called. It returns a HRESULT, success is denoted by S_OK (zero), and errors are denoted by negative values (i.e. the highest bit is set).

SHR        EAX,0x1f

This is the key operation here - shifting EAX 31 places to the right. As you may recall, most computer architectures, including x86, store numbers using a method called two's complement. One property of this notation is that negative numbers always have the highest bit set. If we shift a 32bit number 31 places to the right, we're left with only the highest bit. In short, after this operation EAX=1 for negative numbers, and EAX=0 otherwise.

MOV        EAX,dword ptr [EAX*0x4 + DAT_00444f88]

Now we read a value from a memory location that depends on EAX. But we know that EAX may only be equal to 0 or 1! So, in pseudocode, we have:

if (CoInitializeEx() >= 0) {
    EAX = DAT_00444f88;  // dword at DAT_00444f88
} else {
    EAX = DAT_00444f8C;  // dword at (DAT_00444f88 + 4)
}

The rest of the dispatcher is straightforward:

; compute DAT_00444f90 xor 0x5a65af47
MOV        ECX,0x5a65af47
XOR        ECX,dword ptr [DAT_00444f90]
; compute AND of previously loaded constant with that xor
ADD        EAX,ECX
; increment the result by one
INC        EAX
; jump to the computed address
JMP        EAX

Since everything here is constant and depends only on EAX's sign, this simplifies to just:

if (CoInitializeEx() >= 0) {
    // Jump to constant address A
} else {
    // Jump to constant address B
}

This is a typical control flow obfuscation (as mentioned at the beginning of this blog post). Obfuscated jumps like these can replace all standard conditional jumps in a binary, effectively hiding if statements, for and while loops, and more. When decompiler - like Ghidra in this case - fails to understand obfuscated control flow, it severely limits automated analysis and makes manual analysis very hard.

Now let's apply our conclusions from this and the previous section to our Lumma sample, and evaluate our results.

Deobfuscating Lumma

First, let's take a look at raw bytes of the dispatcher:

MOV        EAX,dword ptr [EAX*0x4 + DAT_00444f88] ; 8b 04 85 88 4f 44 00
MOV        ECX,0x5a65af47                         ; b9 47 af 65 5a
XOR        ECX,dword ptr [DAT_00444f90]           ; 33 0d 90 4f 44 00
ADD        EAX,ECX                                ; 01 c8
INC        EAX                                    ; 40
JMP        EAX                                    ; ff e0

If we mask out the pointers and registers:

MOV        EAX, dword ptr [EAX*0x4 + ????????]    ; 8b 04 85 ?? ?? ?? ??
MOV        ???, ????????                          ; b? ?? ?? ?? ??
XOR        ???,dword ptr [????????]               ; 3? ?? ?? ?? ?? ??
ADD        ???, ???                               ; 01 ??
INC        ???                                    ; 4?

We get something that can be used to match interesting code:

pattern = "8B 04 85 ?? ?? ?? ?? b? ?? ?? ?? ?? 3? ?? ?? ?? ?? ?? 01 ?? 4?"
for addr in findall_pattern(pattern):
    for op in disassemble_at(addr, 10):
        if op.mnemonic == "JMP" and op.operands[0].is_register:
            jump_to = op.operands[0].register
            break
    else:
        # no JMP found
        continue
    ...

Note that the byte pattern is quite conservative - since we're just looking for a very unusual MOV followed by a conditional JMP, we could probably get away with just the first three bytes of the pattern (at a cost of few more disassembly operations).

Now, because we know the operation order, we could extract the constants and compute the target address directly. But I'd like to demonstrate a more generic approach. Ghidra comes with a built-in emulator which we can easily apply here:

emu = Emulator()
emu["EAX"] = 0                  # Assume EAX=0
emu.emulate(addr, op.address)   # Emulate until JMP
iffalse = emu[jump_to]          # Target address at JMP if EAX=0

emu = Emulator()
emu["EAX"] = 1                  # Assume EAX=1
emu.emulate(addr, op.address)   # Emulate until JMP
iftrue = emu[jump_to]           # Target address at JMP if EAX=1

This approach saves us some coding, reduces the risk of a mistake, and is much more universal.

Finally, we just need to patch the binary with equivalent but deobfuscated code:

assemble_at(addr, [
    "TEST EAX, EAX",                # Test EAX
    "JZ  0x{:x}".format(iffalse),   # Jump to `iffalse` if shifted number was nonnegative
    "JMP 0x{:x}".format(iftrue),    # Otherwise jump to `iftrue`
], pad_to=op.address - addr + 2)    # Pad the snippet (not mandatory, since we JMP)

And that's it. We can now combine these parts into one script:

from ghidralib import *

pattern = "8B 04 85 ?? ?? ?? ?? b? ?? ?? ?? ?? 3? ?? ?? ?? ?? ?? 01 ?? 4?"
for addr in findall_pattern(pattern):
    for op in disassemble_at(addr, 10):
        if op.mnemonic == "JMP" and op.operands[0].is_register:
            jump_to = op.operands[0].register
            break
    else:
        # no JMP found
        continue

    emu = Emulator()
    emu.emulate(addr, op.address)
    iffalse = emu[jump_to]

    emu = Emulator()
    emu["EAX"] = 1
    emu.emulate(addr, op.address)
    iftrue = emu[jump_to]

    assemble_at(addr, [
        "TEST EAX, EAX",
        "JGE 0x{:x}".format(iffalse),
        "JMP 0x{:x}".format(iftrue),
    ], pad_to=op.address - addr + 2)

Now, when we run our script, it will deobfuscate the binary:

undefined4 FUN_0040e020(void) {
  HRESULT HVar1 = CoInitializeEx(NULL,2);
  if (-1 < HVar1) {
    HVar1 = CoInitializeSecurity(NULL,-1,NULL,NULL,0,3,NULL,0,NULL);
    if (-1 < HVar1) {
      undefined4 uVar2 = (*(code *)(((~DAT_00444fc0 | 0x8eece992) & (DAT_00444fc0 | 0x7113166d)) + 0x700ae7))();
      return uVar2;
    }
    CoUninitialize();
  }
  return 0;
}

There are two things to improve left:

For some reason, instead of HVar1 >= 0 Ghidra prefers to write -1 < HVar1. We'll have to live with that.
While some code was successfully deobfuscated, just a few statements later there is another obfuscated jump!

So why doesn't Ghidra resolve that second jump automatically? The reason is simple - the region where those constants are located is not marked as constant.

Fortunately, fixing that is easy. We just need to change the "mutability" setting. Select the memory range 0x444000-0x449e1d, and change Data -> Settings -> Mutability to Constant. Alternatively, it's possible to mark the whole section as readonly using a memory window¹⁰. This leaves us with a completely deobfuscated control flow:

undefined4 FUN_0040e020(void) {
  // ... (snip)
  HVar4 = CoInitializeEx(NULL,2);
  if (-1 < HVar4) {
    HVar4 = CoInitializeSecurity(NULL,-1,NULL,NULL,0,3,NULL,0,NULL);
    if (-1 < HVar4) {
      // ... (snip)
      uStack_c10 = 0x9aaebea3;
      uStack_c0c = 0xa2bba4ae;
      uStack_c08 = 0xa2b3a8d6;
      uStack_c04 = 0xb2a3a2ab;
      uStack_c00 = 0xe35090ad;
      uStack_bfc = 0x59ad9993;
      uStack_bf8 = 0x9d58535e;
      uStack_bf4 = 0x65a5839f;
      uStack_bf0 &= 0xffff0000;
      iVar7 = 0x8e;
      bVar11 = 0x70;
      do {
        bVar2 = abStack_c9e[iVar7];
        abStack_c9e[iVar7] = (bVar2 & bVar11 | ~(bVar2 * '\x02') + bVar2 & (char)iVar7 + 1U) + 0x35;
        iVar7 += 1;
        bVar11 -= 1;
      } while (iVar7 != 0xb0);
      // ... (snip)
      cVar3 = FUN_0040ce00();
      if (cVar3 != '\0') {
        GetSystemDirectoryW((LPWSTR)&uStack_c10,0x104);
        uVar5 = (uint)(iStack_dc4 * 3) >> 2;
        cVar3 = *(char *)(iStack_dd4 + -1 + iStack_dc4);
        if ((cVar3 == '=') || (cVar3 == '.')) {
          uVar5 -= 1;
        }
        cVar3 = *(char *)(iStack_dd4 + -2 + iStack_dc4);
        if ((cVar3 == '=') || (cVar3 == '.')) {
          uVar5 -= 1;
        }
        puVar6 = (undefined4 *)thunk_FUN_0043a220(uVar5);
        FUN_00438620(iStack_dd4,puVar6);
        FUN_0043a2b0(iStack_dd4);
      }
      // ... (snip)
      return uVar8;
    }
    CoUninitialize();
  }
  return 0;
}

The control flow is now completely clear. There are some obfuscated jumps left, but the code is now readable enough to continue the manual analysis. However, this doesn't mean that the binary is now completely deobfuscated - for example notice the data obfuscation used to hide a string. Decoding patterns like this is outside the scope of this post, but it's worth recommending a high-quality tool called FLOSS. In many cases it can automatically decode such packed strings completely automatically.

Results and next steps

This concludes our short post about deobfuscation. We covered a simple but powerful technique known here as peephole deobfuscation, based on localized pattern-based substitution.

Of course that only scratches the surface of the broader field of deobfuscation. Still, knowing how to write quick-and-dirty deobfuscators is an extremely valuable skill in any reverse-engineers toolbox. It can let us analyse binaries that would otherwise be unapproachable.

The malware used in this blog post was inspired by a great article by Mandiant: LummaC2: Obfuscation THrough Indirect Control Flow. It presents an alternative approach to Lumma deobfuscation, using a symbolic execution engine and backward slicing. I highly recommend checking it out - it was very inspiring to me. That said, I believe it's a huge overkill for practical incident response and day-to-day reverse engineering. This post shows an alternative, lightweight approach.

Cover image at the top created by Max Harlynking

At least currently. There are research topics that in the future could make perfectly opaque computation possible, but that's outside of the scope of this article. ↩
This is not a standard term. As far as I know there's no name for this particular approach. ↩
https://en.wikipedia.org/wiki/Peephole ↩
In peephole optimization, we replace slow assembly fragments with faster ones, and in peephole deobfuscation we'll obviously replace complex code with simpler one. ↩
Technical nitpick: we assume that all transformation run on a single basic blocks, i.e. there are no jumps in the middle of the replaced instruction sequence. ↩
For a dated example, see deobfuscator in https://github.com/CERT-Polska/nymaim-tools. It's a bit more advanced approach to the opcode-level deobfuscation engine. ↩
Head's up - this particular pattern occurs in the test binary only once. It was chosen because it's easy to explain, not because it's important. Looking for other patterns and deobfuscating them is a good exercise. ↩
In fact, the first bits of the second byte are always set. But patterns like this should only be used for initial filtering anyway - ideally we should disassemble the opcodes and do a sanity check if they look like we expect. ↩
I'm not aware of any automated tool that can do it automatically, but it's entirely possible that nowadays there is a tool or a large language model that can do this mostly correctly. ↩
But one should be careful doing that - marking data as constant will allow Ghidra to inline it, and variable expressions may be incorrectly inlined. ↩