Introduction
Today we'll continue topics started in the first part of the tutorial. We'll learn about malduck, what it can do, and how to write your own modules. Later we'll also show how to integrate it with Karton using karton-config-extractor.
Mal-🦆
Malduck is a utility library designed for malware researchers. The most important features include:
- Extraction engine (modular extraction framework for config extraction from files/dumps)
- Cryptography (AES, Blowfish, Camellia, ChaCha20, Serpent and many others)
- Compression algorithms (aPLib, gzip, LZNT1 (RtlDecompressBuffer))
- Memory model objects (work on memory dumps, PE/ELF, raw files and IDA dumps with the same code)
- Fixed integer types (like Uint64) and bitwise utilities
- String operations (chunks, padding, packing/unpacking etc.)
- Hashing algorithms (CRC32, MD5, SHA1, SHA256)
In this tutorial, we'll focus on the first one - the extraction engine. But we'll also showcase other library features in the code snippets.
To get a better overview of the library, check out code examples in the README or the official documentation.
Now, install malduck in a temporary virtual environment:
$ cd $(mktemp -d)
$ python3 -m venv venv
$ source ./venv/bin/activate
$ pip install malduck
$ malduck --version
malduck, version 4.1.0
Your first extractor module
First, download a malware sample for tests. You can download one from our GitHub. Don't worry: it's a memory dump and won't harm anyone, though it may trigger your AV, so make sure you're ready:
wget https://github.com/CERT-Polska/training-mwdb/raw/main/citadelmalware.bin
This is a dump of a (pretty old) citadel sample. Now let's try to write a malduck module for it.
First, we need a Yara rule. For example, this one:
rule citadel
{
    meta:
        author = "mak"
        module = "citadel"
    strings:
        $briankerbs = "Coded by BRIAN KREBS for personal use only. I love my job & wife."
        $cit_aes_xor = {81 30 [4] 0F B6 50 03 0F B6 78 02 81 70 04 [4] 81 70 08 [4] 81 70 0C [4] C1 E2 08 0B D7 }
        $cit_salt = { 8A D1 80 E2 07 C0 E9 03 47 83 FF 04 }
        $cit_login = { 30 [1-2] 8A 8? [4] 32 }
        $cit_getpes = { 68 [2] 00 00 8D ( 84 24 | 85) [4] 50 8D ( 85 ?? ?? ?? ?? | 44 24 ?? ) 50 E8 [4] B8 [2] 00 00 50 68 }
        $cit_base_off = { 5? 8D 85 [4] E8 [4] 6A 20 68 [4] 8D [2] 50 E8 [4] 8D 85 [4] 50 }
    condition:
        3 of them
}
Now check that it works:
$ yara -rs citadel.yar citadelmalware.bin
citadel citadelmalware.bin
0x33795:$cit_aes_xor: 81 30 3E 4A BB 01 0F B6 50 03 0F B6 78 02 81 70 04 84 1B B2 98 81 70 08 12 2B B5 EF 81 70 0C B1 ...
0x32e19:$cit_salt: 8A D1 80 E2 07 C0 E9 03 47 83 FF 04
0x32f2f:$cit_login: 30 04 3E 8A 89 B8 5F 40 00 32
0x16593:$cit_base_off: 57 8D 85 20 F9 FF FF E8 12 96 00 00 6A 20 68 B8 5F 40 00 8D 45 EC 50 E8 88 C5 01 00 8D 85 77 FA ...
0x1fbe7:$cit_base_off: 57 8D 85 DC FA FF FF E8 BE FF FF FF 6A 20 68 B8 5F 40 00 8D 45 F0 50 E8 34 2F 01 00 8D 85 33 FC ...
It looks like it does, and multiple symbols matched.
We usually try to match the most exciting or specific segments of code. For example, cit_aes_xor is related to the AES encryption code, cit_salt is the code that reads the salt, etc. Those code fragments were picked because they're stable and don't often change between different compilations, but also because we can extract useful information with them.
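To make the hex-string syntax a bit more concrete, here is a rough standard-library sketch that expresses the $cit_login pattern as a Python byte regex. The translation is written by hand for this one pattern and is not how Yara works internally. The input is the match printed by yara above.

```python
import re

# Hand translation of the Yara hex string { 30 [1-2] 8A 8? [4] 32 }:
#   [1-2] -> one or two arbitrary bytes
#   8?    -> any byte whose high nibble is 8
#   [4]   -> exactly four arbitrary bytes
CIT_LOGIN = re.compile(rb"\x30.{1,2}\x8a[\x80-\x8f].{4}\x32", re.DOTALL)

# Bytes reported by `yara -s` for $cit_login in the sample dump.
matched = bytes.fromhex("30043E8A89B85F400032")

m = CIT_LOGIN.search(matched)
print(m is not None, m.start(), m.end())
```

The wildcards are what keep the rule stable across builds: the fixed bytes describe the instructions, while the jumps cover registers, offsets and addresses that vary between compilations.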
Enter malduck modules. Your role as a programmer is to provide callbacks for interesting symbols and extract additional information with them:
import logging

from malduck.extractor import Extractor

log = logging.getLogger()

class Citadel(Extractor):  # @Extractor
    family = "citadel"
    yara_rules = "citadel",  # mind the comma (this is a tuple, not a string)

    # Callback for "briankerbs" symbol (by default function name is used).
    @Extractor.extractor("briankerbs")
    def citadel_found(self, p, addr):
        log.info('[+] `Coded by Brian Krebs` str @ %X' % addr)
        return {'family': 'citadel'}

    @Extractor.extractor
    def cit_salt(self, p, addr):  # @Callbacks
        salt = p.uint32v(addr - 8)  # @Procmem
        log.info('[+] Found salt @ %X - %x' % (addr, salt))
        return {'salt': salt}
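What's going on here? This is a pretty simple module with two callbacks: citadel_found, bound explicitly to the briankerbs symbol, and cit_salt, bound to the symbol of the same name (so the name matters).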
Extractor
We have just created an "extractor". An extractor is responsible for extracting configs from dumps of a given family (in this case, citadel). It needs a corresponding .yar file with one or many rules.
Callbacks
Extractors usually have multiple callbacks. Every callback is called for every occurrence of its matching symbol in the Yara rules.
In this case, the cit_salt callback will be called with addr=0x32e19 (the address of the symbol in the dump - see above).
Callbacks are responsible for extracting simple pieces of information, and they return them as Python dict objects. In the "real world", there are usually multiple callbacks, and their results are combined. For example, if one callback returns:
{ "salt": "xyz123" }
And the other one returns:
{ "key": "ilovemalware13" }
Then the final config is:
{
"salt": "xyz123",
"key": "ilovemalware13"
}
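The dispatch-and-combine idea can be sketched in a few lines of plain Python. This is only an illustration: run_callbacks and the cit_key symbol are hypothetical names, the addresses are made up, and malduck's real merging logic may treat conflicting keys differently from the naive "later value wins" rule used here.

```python
from typing import Callable, Dict, List, Tuple

Callback = Callable[[int], dict]


def run_callbacks(matches: List[Tuple[str, int]],
                  callbacks: Dict[str, Callback]) -> dict:
    """Call the callback registered for each (symbol, address) match
    and fold the returned dicts into a single config."""
    config: dict = {}
    for name, addr in matches:
        callback = callbacks.get(name)
        if callback is None:
            continue  # no callback registered for this symbol
        result = callback(addr)
        if result:  # a callback may return None when it finds nothing
            config.update(result)  # naive merge: later values win
    return config


# Hypothetical callbacks returning the example values from the text.
callbacks = {
    "cit_salt": lambda addr: {"salt": "xyz123"},
    "cit_key": lambda addr: {"key": "ilovemalware13"},
}
# (symbol, address) pairs as a Yara scan might report them.
matches = [("cit_salt", 0x32E19), ("cit_key", 0x1000), ("unused", 0x2000)]

config = run_callbacks(matches, callbacks)
print(config)
```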
Callbacks can be very simple, like the citadel_found function. It will only be called when the symbol of interest is found in the binary.
Beyond uint32v
Of course, just reading a uint32 is not overly impressive. Let's look at a slightly more advanced callback (from the full version of the Citadel extractor):
    @Extractor.extractor
    def cit_login(self, p, addr):
        log.info('[+] Found login_key xor @ %X' % addr)
        hit = p.uint32v(addr + 4)
        if p.is_addr(hit):
            return {'login_key': p.asciiz(hit)}
        hit = p.uint32v(addr + 5)
        if p.is_addr(hit):
            return {'login_key': p.asciiz(hit)}
To understand what's going on here, we need to look at the assembly code. Recall the Yara matches:
0x32f2f:$cit_login: 30 04 3E 8A 89 B8 5F 40 00 32
Let's disassemble it:
$ echo 30043E8A89B85F400032 | xxd -r -ps | ndisasm -b 32 -
00000000 30043E xor [esi+edi],al
00000003 8A89B85F4000 mov cl,[ecx+0x405fb8]
00000009 32 db 0x32
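As we can see, this is just a simple piece of code that moves data around. It's interesting because the mov instruction copies a byte from the AES key to the cl register. This means that we can use it to get the location of the AES key in memory - in this case, it's 0x405fb8 (the displacement of the mov's ModR/M operand).
So we can get the address of the AES key by reading that 32-bit displacement. Because the [1-2] jump in the Yara rule can match one or two bytes, the displacement starts at addr + 4 or addr + 5 (addr + 5 in this sample), which is why the callback tries both offsets: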
hit = p.uint32v(addr + 4)
Of course, the address alone is not very useful - it may be different in every analysed binary (or even change between executions). We also need to read the key:
return {'login_key': p.asciiz(hit)}
asciiz is one of many helper methods useful for reading various types of data. As the name suggests, it reads an ASCII string starting from the hit address until a null byte is found.
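To see why the callback probes two offsets, here is a standard-library sketch that replays its logic on a toy image. Everything except the matched bytes is an assumption made for illustration: the 0x400000 base address, the image size, and the placeholder key. The helper functions only mimic the behaviour described above and are not malduck's implementation.

```python
import struct
from typing import Optional

IMAGE_BASE = 0x400000  # assumed load address of the toy image


def build_toy_image() -> bytes:
    """Fake image: the $cit_login match at offset 0x100 and a made-up,
    null-terminated key at virtual address 0x405fb8."""
    image = bytearray(0x6000)
    code = bytes.fromhex("30043E8A89B85F400032")
    image[0x100:0x100 + len(code)] = code
    key = b"NOT_A_REAL_KEY\x00"  # placeholder, not taken from the sample
    off = 0x405FB8 - IMAGE_BASE
    image[off:off + len(key)] = key
    return bytes(image)


def is_addr(image: bytes, va: int) -> bool:
    """True if the virtual address falls inside the image."""
    return IMAGE_BASE <= va < IMAGE_BASE + len(image)


def uint32v(image: bytes, va: int) -> int:
    """Read a little-endian 32-bit integer at a virtual address."""
    return struct.unpack_from("<I", image, va - IMAGE_BASE)[0]


def asciiz(image: bytes, va: int) -> bytes:
    """Read bytes from a virtual address up to the first null byte."""
    start = va - IMAGE_BASE
    return image[start:image.index(b"\x00", start)]


def find_login_key(image: bytes, addr: int) -> Optional[bytes]:
    """Try both possible positions of the mov displacement."""
    for delta in (4, 5):
        hit = uint32v(image, addr + delta)
        if is_addr(image, hit):
            return asciiz(image, hit)
    return None


image = build_toy_image()
addr = IMAGE_BASE + 0x100
print(hex(uint32v(image, addr + 4)))  # misaligned read, not a valid address
print(hex(uint32v(image, addr + 5)))  # the real displacement
print(find_login_key(image, addr))
```

The read at addr + 4 still includes the ModR/M byte, so it yields a value far outside the image and is rejected by the address check. The read at addr + 5 gives the actual displacement.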
Malduck ninjutsu
Sometimes you really need to flex your module-writing skills. For example, imagine that the key is XOR-ed at runtime by assembly code (the XOR key is not stored in a data segment somewhere), and that the assembly code changes after every recompilation. This is precisely what happens in Citadel. Let's disassemble the cit_aes_xor hit:
$ echo 81303e4abb010fb650030fb67802817004841bb298817008122bb5ef81700cb1bed171c1e2080bd7 | xxd -r -ps | ndisasm -b32 -
00000000 81303E4ABB01 xor dword [eax],0x1bb4a3e
00000006 0FB65003 movzx edx,byte [eax+0x3]
0000000A 0FB67802 movzx edi,byte [eax+0x2]
0000000E 817004841BB298 xor dword [eax+0x4],0x98b21b84
00000015 817008122BB5EF xor dword [eax+0x8],0xefb52b12
0000001C 81700CB1BED171 xor dword [eax+0xc],0x71d1beb1
00000023 C1E208 shl edx,byte 0x8
00000026 0BD7 or edx,edi
How do you write a module for it? Well, there are multiple options. But the easiest and most readable one is to use a disassembler to our advantage:
    # This snippet also needs `import malduck` and `from malduck import p32`
    # at the top of the module.
    @Extractor.extractor
    def cit_aes_xor(self, p, addr):
        log.info('[+] Found aes_xor key @ %X' % addr)
        r = []
        for c in p.disasmv(addr, 40):  # disassemble 40 bytes starting from addr
            if len(r) == 4:  # key is always 4 dwords long
                break
            if c.mnem == 'xor':
                r.append(c.op2.value)
        return {'aes_xor': malduck.enhex(b''.join(map(p32, r)))}
We disassemble the code until we find four xor instructions and concatenate their operands into the final aes_xor config key.
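The packing step can be checked by hand with the standard library. The sketch below assumes that p32 packs an unsigned 32-bit little-endian value, which is how it is used here, and takes the four immediates from the disassembly above. For comparison, it also shows one of the other "multiple options": a byte regex written by hand for this exact instruction sequence. The regex is more brittle than the disassembler approach and is not part of the original extractor.

```python
import re
import struct

# The 40 bytes matched by $cit_aes_xor, split per instruction.
code = bytes.fromhex(
    "81303e4abb01" "0fb65003" "0fb67802"
    "817004841bb298" "817008122bb5ef" "81700cb1bed171"
    "c1e208" "0bd7"
)

# Option 1: pack the immediates printed by the disassembler.
immediates = [0x01BB4A3E, 0x98B21B84, 0xEFB52B12, 0x71D1BEB1]
key_from_disasm = struct.pack("<4I", *immediates).hex()

# Option 2 (alternative sketch): grab the imm32 of `xor dword [eax], imm32`
# (81 30) and `xor dword [eax+disp8], imm32` (81 70 04/08/0c) with a regex.
XOR_IMM = re.compile(rb"\x81(?:\x30|\x70[\x04\x08\x0c])(.{4})", re.DOTALL)
key_from_regex = b"".join(XOR_IMM.findall(code)).hex()

print(key_from_disasm)
print(key_from_regex)
```

Both approaches yield the same 16-byte value for this sample. The disassembler version stays readable when the compiler reorders instructions or changes the addressing mode, which is why it is preferred above.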
Procmem
Last but not least, we should talk about process memory objects.
The files we work on are various kinds of memory maps (like PE files, ELF files, or memory dumps). We usually care more about their in-memory layout than their on-disk layout. For example, we often ask "read 5 bytes from address 0x400100", rather than "what is the 117th byte of the file".
Process memory objects are the abstraction that makes this possible. They load various types of files into memory and implement functions like .readv (read a chunk of memory from a given virtual address).
Right now, the supported formats are:
- PE files
- memory dumps
- ELF files
- IDA interactive session (IDAMem objects)
- memory dumps in Cuckoo 2.x format
But it's not hard to add a new format when necessary.
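To illustrate the idea of addressing data by virtual address rather than file offset, here is a deliberately tiny model written with the standard library only. ToyMemory and Region are hypothetical names, the regions are made up, and real process memory objects handle far more (file formats, page permissions, reads spanning regions). Only the method names readv and uint32v mirror those used in this tutorial.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    va: int      # virtual address where this chunk is mapped
    data: bytes  # raw contents of the chunk


class ToyMemory:
    """Hypothetical stand-in for a process memory object (not malduck code)."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = regions

    def readv(self, va: int, size: int) -> bytes:
        """Read up to `size` bytes starting at virtual address `va`."""
        for region in self.regions:
            if region.va <= va < region.va + len(region.data):
                start = va - region.va
                return region.data[start:start + size]
        return b""  # address is not mapped

    def uint32v(self, va: int) -> Optional[int]:
        """Read a little-endian 32-bit integer, or None if unavailable."""
        raw = self.readv(va, 4)
        return struct.unpack("<I", raw)[0] if len(raw) == 4 else None


# Two made-up regions with an unmapped gap between them.
mem = ToyMemory([
    Region(va=0x400000, data=b"MZ" + b"\x00" * 14),
    Region(va=0x401000, data=b"\xef\xbe\xad\xde" + b"hello\x00"),
])

print(mem.readv(0x400000, 2))      # start of the first region
print(hex(mem.uint32v(0x401000)))  # dword at the start of the second region
print(mem.readv(0x401004, 5))      # any mapped address works the same way
print(mem.readv(0x400800, 4))      # unmapped address, empty result
```

The caller never needs to know where a region sits in the underlying file, which is what lets the same extractor code run on PE files, ELF files and raw dumps.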
Try it out!
Now it's time to try our module. Copy and paste the Yara and Python files, or download them from our GitHub:
wget https://github.com/CERT-Polska/training-mwdb/raw/main/modules.7z
7z x modules.7z
The modules/ directory should look like this:
$ find
.
./modules
./modules/citadel
./modules/citadel/citadel.yar
./modules/citadel/citadel.py
./modules/citadel/__init__.py
./modules/__init__.py
Now, try to run the extractor on a downloaded Citadel sample:
$ malduck extract citadelmalware.bin --modules modules
[+] Ripped 'citadel' from citadelmalware.bin:
{
"family": "citadel",
"salt": 4073311727
}
It looks like it worked!
Karton integration with karton-config-extractor
How does it all relate to the Karton framework? Malduck's extraction engine is packaged as a Karton service, karton-config-extractor, and you can easily plug it into your pipeline. See Karton Gems 1 for a longer description of that topic.
Like in Karton Gems 1, you need karton-playground (a docker-compose setup with a dev environment) running on your local machine:
$ git clone https://github.com/CERT-Polska/karton-playground.git
$ cd karton-playground
$ sudo docker-compose up # this may take a while
Long story short, just install the config-extractor package and run it:
$ python3 -m venv venv; source ./venv/bin/activate
$ pip install karton-config-extractor
$ karton-config-extractor --modules modules
[2021-05-13 15:27:10,085][INFO] Service karton.config-extractor started
[2021-05-13 15:27:10,098][INFO] Binds changed, old service instances should exit soon.
[2021-05-13 15:27:10,099][INFO] Binding on: {'type': 'sample', 'stage': 'recognized', 'kind': 'runnable', 'platform': 'win32'}
[2021-05-13 15:27:10,100][INFO] Binding on: {'type': 'sample', 'stage': 'recognized', 'kind': 'runnable', 'platform': 'win64'}
[2021-05-13 15:27:10,100][INFO] Binding on: {'type': 'sample', 'stage': 'recognized', 'kind': 'runnable', 'platform': 'linux'}
[2021-05-13 15:27:10,101][INFO] Binding on: {'type': 'analysis', 'kind': 'drakrun-prod'}
[2021-05-13 15:27:10,101][INFO] Binding on: {'type': 'analysis', 'kind': 'drakrun'}
Now upload an executable file to mwdb, and new logs should appear:
[2021-05-13 15:27:20,940][INFO] Received new task - ee2abafc-271b-441b-b81c-77f264c8e120
[2021-05-13 15:27:20,981][INFO] Processing drakmon OSS analysis, sample: 3a153c52aa82a667091dff9a4b4defb7a6e395c3d0604d7aa18f75ca6a27e77e
[2021-05-13 15:27:24,130][INFO] Merging and reporting extracted configs
[2021-05-13 15:27:24,131][INFO] done analysing, results: {"analysed": 94, "crashed": 0}
[2021-05-13 15:27:24,156][INFO] Task done - ee2abafc-271b-441b-b81c-77f264c8e120
When a config is extracted successfully, it's added to mwdb automatically.
What's next
That's it, enough kartoning for today. Porting all your modules to malduck may be a long and exhausting endeavour, but it was worth it for us.
You may use the community to your advantage. There is a small but growing repository with publicly available modules at https://github.com/c3rb3ru5d3d53c/mwcfg-modules. You can use it as a starting point for your modules or get a better feel of malduck. If possible, try to contribute back.
In future instalments of the series, we'll talk a bit about other open-sourced kartons and deployment options.