I’m preparing a new StringEncryption pass for Hikari, aiming to fix various drawbacks of previous implementations.
The most famous of those implementations is probably GoSSIP-SJTU/Armariris.
You can read the pass’s full source HERE
Drawbacks
Armariris uses a single uint8_t as the encryption key across the whole ConstantAggregate. This is trivial to bypass: a uint8_t has at most 256 possible values, so even brute force is a feasible option. Plus, extracting a XOR instruction from the assembly shouldn’t be too hard.
We know that:
- The last character of the unencrypted C string is 0x00.
- We can extract the last character of the encrypted string.
- A single key is used for all characters.
- The “encryption” is XOR.
Which means? : )
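To spell the answer out: since x XOR 0 = x, the last byte of the encrypted string is the key itself. The following is a minimal sketch of that recovery, not code from Armariris; the helper names (`xorWithKey`, `recoverKey`, `recoverString`) are hypothetical and only model the single-byte XOR scheme described above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the scheme: every byte is XORed with the same key.
std::vector<uint8_t> xorWithKey(const std::vector<uint8_t> &data, uint8_t key) {
  std::vector<uint8_t> out(data);
  for (uint8_t &b : out)
    b ^= key;
  return out;
}

// The plaintext ends in 0x00, and 0x00 ^ key == key,
// so the last ciphertext byte is the key.
uint8_t recoverKey(const std::vector<uint8_t> &encrypted) {
  assert(!encrypted.empty());
  return encrypted.back();
}

// Decrypt with the recovered key and drop the trailing NUL.
std::string recoverString(const std::vector<uint8_t> &encrypted) {
  std::vector<uint8_t> plain = xorWithKey(encrypted, recoverKey(encrypted));
  plain.pop_back();
  return std::string(plain.begin(), plain.end());
}
```

No brute force and no disassembly is needed in this model: one byte read from the encrypted data is enough to decrypt the whole string.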
Armariris injects its global decryptor into llvm.global_ctors; for the C/ObjC/C++ guys reading this, this is equivalent to __attribute__((constructor)). (LLVM also has a global destructor list, llvm.global_dtors, but that’s not the topic of this post.) This makes it trivial to dump decrypted strings at the main executable’s entry point, by which time the system loader (dyld on Darwin) has finished calling ctors and has thus decrypted our string GVs.
Armariris does not handle ObjC strings correctly. Below is the GV filtering code extracted from Armariris:
std::string section(gv->getSection());
This piece of code does filter out LLVM metadata usage, non-CDS (ConstantDataSequential) usage, and GVs that are stored in __objc_methname.
However, consider the following code:
@interface foo:NSObject
Which yields the following LLVM IR (unrelated parts stripped out):
@.str = private unnamed_addr constant [6 x i8] c"FOOOO\00", section "__TEXT,__cstring,cstring_literals", align 1
Note:
@OBJC_CLASS_NAME_ = private unnamed_addr constant [4 x i8] c"foo\00", section "__TEXT,__objc_classname,cstring_literals", align 1
The OBJC_CLASS_NAME_ here lives in __objc_classname, not __objc_methname, so it passes the filter and is also handled by Armariris. In other words, the class is registered in the ObjC runtime with completely wrong names and types. The result could be catastrophic.
Let’s see dyld.cpp. Note line 1090:
void initializeMainExecutable()
and the following comment right above static void addRootImage(ImageLoader* image):
In order for register_func_for_add_image() callbacks to to be called bottom up,
In other words, dyld runs initializers for libraries first and only then for the main executable. That means that while our binary is starting up, string constants such as ObjC class names are still encrypted, which could be troublesome for the ObjC runtime and various system libraries.
Improvements
Splitting
The correct implementation would use this to distinguish ObjC strings and C strings:
set<GlobalVariable *> cstrings;
This correctly splits CFStrings from C-style strings. It works by collecting all strings first, then iterating over all CFStrings and removing the C-style strings each CFString references from the list.
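The two-step idea can be sketched without any LLVM dependency. This is a toy model under stated assumptions, not the pass’s actual code: `ToyGlobal` is a hypothetical stand-in for llvm::GlobalVariable, and `backing` stands for the C string a CFString’s initializer points at.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::GlobalVariable.
struct ToyGlobal {
  std::string name;
  bool isCString = false;             // initializer is a NUL-terminated byte array
  const ToyGlobal *backing = nullptr; // set for a CFString: the C string it wraps
};

// Returns the C-style strings that are NOT referenced by any CFString.
std::set<const ToyGlobal *>
selectPlainCStrings(const std::vector<ToyGlobal> &module) {
  std::set<const ToyGlobal *> cstrings;
  // Step 1: collect every C-style string.
  for (const ToyGlobal &g : module)
    if (g.isCString)
      cstrings.insert(&g);
  // Step 2: walk the CFStrings and drop the C strings they reference.
  for (const ToyGlobal &g : module) {
    if (g.backing == nullptr)
      continue;
    assert(g.backing->isCString); // a CFString must wrap a C string
    cstrings.erase(g.backing);
  }
  return cstrings;
}
```

What remains in the returned set are plain C strings; the erased ones belong to CFStrings and can be handled on a separate path.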
Crypto
First of all, we need to analyze the GV’s def-use chain and locate every instruction that references it. This is not as trivial as it seems: direct users are usually BitCast ConstantExprs, so we need to walk further down the def-use chain.
Usually the def-use chain we are looking for looks like this: ConstantExpr -> Instruction -> BasicBlock -> Function
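The walk can be illustrated with a small toy graph. This is a sketch under assumptions, not LLVM code: `ToyValue`, `Kind`, and `collectUsingFunctions` are hypothetical names, and the graph is assumed to be acyclic. In a real pass the same shape would be expressed with LLVM’s user iteration and casts.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of a value and its users.
enum class Kind { Global, ConstantExpr, Instruction };

struct ToyValue {
  Kind kind;
  std::string parentFunction;          // only meaningful for Instruction
  std::vector<const ToyValue *> users; // values that use this value
};

// Collect the names of all functions that reach `v`, looking through
// intermediate ConstantExpr users (for example bitcasts of the global).
void collectUsingFunctions(const ToyValue &v, std::set<std::string> &out) {
  for (const ToyValue *user : v.users) {
    assert(user != nullptr);
    if (user->kind == Kind::Instruction)
      out.insert(user->parentFunction); // Instruction -> BasicBlock -> Function
    else
      collectUsingFunctions(*user, out); // not an instruction yet: keep walking
  }
}
```

The point of the recursion is that stopping at direct users would miss every instruction that only sees the global through a constant expression.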
Then we can either create a new AllocaInst at the function’s entry point or re-use the existing GVs. Decryption can be done at the function’s entry point, and the GVs can possibly be re-encrypted at the terminators. That is, unless we are dealing with malformed BasicBlocks, which shouldn’t happen unless the frontend has gone wild.
There is a lot to do to make a workable obfuscator, and GoSSIP-SJTU surely did some remarkable work; props to them.
Zhang