I’m preparing a new StringEncryption pass for Hikari, aiming to fix various drawbacks of previous implementations.
The most famous of those implementations is probably GoSSIP-SJTU/Armariris.
You can read the pass’s full source HERE
Armariris uses one single uint8_t as the encryption key across the full ConstantAggregate.
This is trivial to bypass: a uint8_t has at most 256 possible values, so even brute force is a feasible option. Plus, extracting a XOR instruction from the assembly shouldn’t be too hard.
We know that:
- Last character of the unencrypted c string is 0x00
- We can extract the last character of the encrypted string.
- We know one single key is used for all characters.
- We know the “encryption” is XOR.
Which means? : )
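Which means the key falls out directly: the encrypted terminator is 0x00 ^ key, which is the key itself. Here is a minimal sketch of that recovery, assuming the attacker already has the whole encrypted buffer including its last byte. `xorEncrypt` is a hypothetical helper that only mimics the scheme described above for the demo; none of this is Armariris code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Recover the single-byte XOR key from an encrypted C string.
// Assumption: `enc` holds the full buffer, encrypted NUL terminator included.
std::uint8_t recoverKey(const std::vector<std::uint8_t> &enc) {
  assert(!enc.empty() && "need at least the encrypted terminator");
  // The plaintext terminator is 0x00, and 0x00 ^ key == key.
  return enc.back();
}

// Undo the XOR with the recovered key and drop the terminator.
std::string decryptWithRecoveredKey(const std::vector<std::uint8_t> &enc) {
  const std::uint8_t key = recoverKey(enc);
  std::string plain;
  for (std::size_t i = 0; i + 1 < enc.size(); ++i)
    plain.push_back(static_cast<char>(enc[i] ^ key));
  return plain;
}

// Hypothetical demo helper: one key for every byte, terminator included.
std::vector<std::uint8_t> xorEncrypt(const std::string &s, std::uint8_t key) {
  std::vector<std::uint8_t> out;
  for (unsigned char c : s)
    out.push_back(static_cast<std::uint8_t>(c ^ key));
  out.push_back(static_cast<std::uint8_t>(0x00 ^ key));
  return out;
}
```

No brute force is needed at all: one byte read and one XOR loop give back the plaintext.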
Armariris injects its global decryptor at llvm.global_ctors; for the C/ObjC/C++ guys reading this, this is equivalent to
__attribute__((constructor)). (LLVM also has a global destructor list, llvm.global_dtors, but that’s not the topic of this post.) This makes it trivial to
dump decrypted strings at the main executable’s entrypoint, by which point the system loader (dyld on Darwin) has finished calling ctors and thus decrypting our string GVs.
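To make the timing concrete, here is a small sketch of the same pattern written by hand, assuming a GCC or Clang toolchain that supports `__attribute__((constructor))`. The key `kKey`, the buffer `gSecret` and both function names are made up for illustration; the real pass emits its decryptor as IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical key, standing in for whatever the obfuscator picked.
static constexpr std::uint8_t kKey = 0x42;

// "hello" plus its terminator, stored XOR-encrypted in the data segment.
static char gSecret[] = {'h' ^ kKey, 'e' ^ kKey, 'l' ^ kKey,
                         'l' ^ kKey, 'o' ^ kKey, '\0' ^ kKey};

// Runs before main(), like an entry in llvm.global_ctors.
__attribute__((constructor)) static void decryptGlobals() {
  for (std::size_t i = 0; i < sizeof(gSecret); ++i)
    gSecret[i] = static_cast<char>(gSecret[i] ^ kKey);
}

// What anyone sees once execution has reached the entrypoint.
const char *secretAtEntry() { return gSecret; }
```

By the time `main` runs, `std::strcmp(secretAtEntry(), "hello") == 0` already holds, so a memory dump taken at the entrypoint contains every string in plaintext.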
Armariris is not handling ObjC strings correctly. Below is GV filtering code extracted from
This piece of code did filter out LLVM Metadata usage, Non-CDS usage and GVs that are stored in
However, consider the following code:
Which yields the following LLVM IR(Unrelated part stripped out):
@.str = private unnamed_addr constant [6 x i8] c"FOOOO\00", section "__TEXT,__cstring,cstring_literals", align 1
@OBJC_CLASS_NAME_ = private unnamed_addr constant [4 x i8] c"foo\00", section "__TEXT,__objc_classname,cstring_literals", align 1
OBJC_CLASS_NAME_ here is also handled by Armariris. In other words, the class is registered in the ObjC Runtime with completely wrong names and types. The result could be catastrophic.
Let’s see dyld.cpp. Note line 1090:
and the following comment right above
static void addRootImage(ImageLoader* image)
In order for register_func_for_add_image() callbacks to to be called bottom up,
In other words, dyld runs initializers for libraries first, then for the main executable. That means that while our binary is starting up, string constants like ObjC class names are still encrypted, which could be troublesome for ObjC and various system libraries.
The correct implementation would use this to distinguish ObjC strings and C strings:
set<GlobalVariable *> cstrings;
which correctly splits CFStrings and C-style strings. This works by collecting all strings first, then iterating over all CFStrings and removing the C-style strings referenced by each CFString from the list.
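The two-pass idea can be sketched without any LLVM dependency. The types below (`ToyGlobal`, `SplitResult`) are hypothetical stand-ins for GlobalVariables, not LLVM API, and the sketch assumes each CFString-like global points at exactly one backing buffer.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a global variable; hypothetical, not an LLVM class.
struct ToyGlobal {
  std::string name;
  const ToyGlobal *backingString = nullptr; // set only for CFString-like globals
  bool isStringData = false;                // true for raw character arrays
};

struct SplitResult {
  std::set<const ToyGlobal *> cstrings;
  std::set<const ToyGlobal *> objcStrings;
};

SplitResult splitStrings(const std::vector<ToyGlobal> &globals) {
  SplitResult result;
  // Pass 1: treat every character array as a C-style string candidate.
  for (const ToyGlobal &g : globals)
    if (g.isStringData)
      result.cstrings.insert(&g);
  // Pass 2: each CFString claims its backing buffer, so that buffer
  // is removed from the C-style list.
  for (const ToyGlobal &g : globals)
    if (g.backingString != nullptr) {
      assert(g.backingString->isStringData && "CFString must back onto string data");
      result.objcStrings.insert(&g);
      result.cstrings.erase(g.backingString);
    }
  return result;
}
```

Doing the collection first and the removal second means the order in which globals appear in the module does not matter.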
First of all, we need to analyze the GV’s def-use chain and locate any instructions referencing it. This is not as trivial as it seems because direct users are usually BitCast ConstantExprs, so we need to iterate through the def-use chain.
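A rough model of that traversal, using a toy value graph instead of LLVM’s own classes: `UseNode` and `UseKind` are hypothetical types invented for this sketch, and the walk simply looks through ConstantExpr wrappers until it reaches instructions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy value graph; hypothetical types, not LLVM's Value/User hierarchy.
enum class UseKind { Global, ConstantExpr, Instruction };

struct UseNode {
  UseKind kind;
  std::string name;
  std::vector<UseNode *> users; // values that use this node
};

// Collect every instruction that reaches `root`, either directly or
// through any number of intermediate constant expressions.
void collectInstructionUsers(UseNode *root, std::set<UseNode *> &out) {
  assert(root != nullptr);
  std::vector<UseNode *> worklist{root};
  std::set<UseNode *> visited;
  while (!worklist.empty()) {
    UseNode *cur = worklist.back();
    worklist.pop_back();
    if (!visited.insert(cur).second)
      continue; // already expanded this value
    for (UseNode *user : cur->users) {
      if (user->kind == UseKind::Instruction)
        out.insert(user);
      else if (user->kind == UseKind::ConstantExpr)
        worklist.push_back(user); // look through the cast wrapper
    }
  }
}
```

The worklist plus visited set keeps the walk iterative and guards against a constant expression that is shared by several users being expanded twice.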
Usually the def-use chain we are looking for looks like this:
Then we can either create a new AllocaInst at the function entrypoint or re-use the existing GVs. The decryption can be done at the function entrypoint, and the GVs re-encrypted at terminators, unless we are dealing with malformed BasicBlocks, which shouldn’t happen unless the frontend has gone wild.
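As a source-level analogy for the “re-use existing GVs” variant, the decrypt-at-entry and re-encrypt-at-exit behaviour can be sketched as an RAII guard. This is only an illustration of the intended runtime behaviour: the actual pass would insert instructions at the entry block and before each terminator, and `ScopedDecrypt`, `gGreeting` and `kGreetingKey` are names made up here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::uint8_t kGreetingKey = 0x37; // hypothetical key

// "hi" plus its terminator, stored XOR-encrypted.
char gGreeting[] = {'h' ^ kGreetingKey, 'i' ^ kGreetingKey, '\0' ^ kGreetingKey};

class ScopedDecrypt {
public:
  ScopedDecrypt(char *buf, std::size_t len, std::uint8_t key)
      : buf_(buf), len_(len), key_(key) {
    assert(buf != nullptr);
    xorAll(); // decrypt at "function entry"
  }
  ~ScopedDecrypt() { xorAll(); } // re-encrypt on every exit path
  ScopedDecrypt(const ScopedDecrypt &) = delete;
  ScopedDecrypt &operator=(const ScopedDecrypt &) = delete;

private:
  void xorAll() {
    for (std::size_t i = 0; i < len_; ++i)
      buf_[i] = static_cast<char>(buf_[i] ^ key_);
  }
  char *buf_;
  std::size_t len_;
  std::uint8_t key_;
};

std::string useGreeting() {
  ScopedDecrypt guard(gGreeting, sizeof(gGreeting), kGreetingKey);
  return std::string(gGreeting); // copied while the buffer is plaintext
}
```

After `useGreeting()` returns, `gGreeting` holds ciphertext again, so a dump taken at the entrypoint (or between calls) no longer reveals the string.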
There is a lot to do to make a workable obfuscator, and
GoSSIP-SJTU surely did some remarkable work. Props to them.