Ollvm 虚假控制流混淆源码阅读

如题

前置知识

BCF

源码路径:llvm/lib/Transforms/Obfuscation/BogusControlFlow.cpp

虚假控制流,在原来的基本块前添加新的基本块 condition,新的基本块condition 包含不透明谓词(一个表达式,表达式的值在执行到某处时已知,值运行时确定因此可以对抗静态分析),运行时会固定地跳到原来的基本块,并且复制原来的基本块并随机添加垃圾指令为一个新的基本块 Altered。这一过程可以多次进行(-mllvm -bcf_loop=3 指定循环次数),-mllvm -bcf_prob=40 可以按指定概率进行混淆

pass 定义

单独摘出来

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
namespace {
struct BogusControlFlow : public FunctionPass {
static char ID; // Pass identification
bool flag;
BogusControlFlow() : FunctionPass(ID) {}
BogusControlFlow(bool flag) : FunctionPass(ID) {this->flag = flag; BogusControlFlow();}

/* runOnFunction
*
* Overwrite FunctionPass method to apply the transformation
* to the function. See header for more details.
*/
virtual bool runOnFunction(Function &F);
void bogus(Function &F);
virtual void addBogusFlow(BasicBlock * basicBlock, Function &F);
virtual BasicBlock* createAlteredBasicBlock(BasicBlock * basicBlock,
const Twine & Name = "gen", Function * F = 0);
bool doF(Module &M);
}
char BogusControlFlow::ID = 0;
static RegisterPass<BogusControlFlow> X("boguscf", "inserting bogus control flow");

bogus

忽略开头输出调试信息的代码,主要逻辑在下面 do-while 循环(混淆次数)。和控制流平坦化一样先保存所有的 BB

1
2
3
4
5
// Put all the function's block in a list
std::list<BasicBlock *> basicBlocks;
for (Function::iterator i=F.begin();i!=F.end();++i) {
basicBlocks.push_back(&*i);
}

根据指定概率选择 BB 混淆,直到所有 BB 经过一次循环处理

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
while(!basicBlocks.empty()){
NumBasicBlocks ++;
// Basic Blocks' selection
// 通过产生随机数实现按概率选择 BB
if((int)llvm::cryptoutils->get_range(100) <= ObfProbRate){
DEBUG_WITH_TYPE("opt", errs() << "bcf: Block "
<< NumBasicBlocks <<" selected. \n");
hasBeenModified = true;
++NumModifiedBasicBlocks;
// 每增加一个虚假控制流就会增加3个 BB
NumAddedBasicBlocks += 3;
FinalNumBasicBlocks += 3;
// Add bogus flow to the given Basic Block (see description)
BasicBlock *basicBlock = basicBlocks.front();
// 添加虚假控制流
addBogusFlow(basicBlock, F);
}
else{
DEBUG_WITH_TYPE("opt", errs() << "bcf: Block "
<< NumBasicBlocks <<" not selected.\n");
}
// remove the block from the list
basicBlocks.pop_front();

if(firstTime){ // first time we iterate on this function
++InitNumBasicBlocks;
++FinalNumBasicBlocks;
}
} // end of while(!basicBlocks.empty())

addBogusFlow

把传入的 BB 分成两部分,第一部分只有 phi 指令、调试信息,第二部分是剩下的所有指令。这里对 BB 进行分割是为了防止 phi 指令可能产生的干扰

llvm ir 是 SSA(static single assignment)语言

  • 所有变量只能被赋值一次
  • 所有变量在使用前必须定义

phi 指令是 绕过 SSA 机制的一种方式,格式如下
<result> = phi <ty> [<val0>, <label0>], [<val1>, <label1>]
会从之前执行过的 basic block 中选择值

如果直接添加虚假控制流可能会打乱 phi 指令执行,所以就先拆分出来

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
// Split the block: first part with only the phi nodes and debug info and terminator
// created by splitBasicBlock. (-> No instruction)
// Second part with every instructions from the original block
// We do this way, so we don't have to adjust all the phi nodes, metadatas and so on
// for the first block. We have to let the phi nodes in the first part, because they
// actually are updated in the second part according to them.
BasicBlock::iterator i1 = basicBlock->begin();
if(basicBlock->getFirstNonPHIOrDbgOrLifetime())
// 找到分割点
i1 = (BasicBlock::iterator)basicBlock->getFirstNonPHIOrDbgOrLifetime();
Twine *var;
var = new Twine("originalBB");
// 拆分
BasicBlock *originalBB = basicBlock->splitBasicBlock(i1, *var);
DEBUG_WITH_TYPE("gen", errs() << "bcf: First and original basic blocks: ok\n");

经过拆分后复制 orignalBB 随机添加垃圾指令得到 Altered BBcreateAlteredBasicBlock 函数下面细说

1
2
3
4
5
6
7
8
9
10
11
12
// Creating the altered basic block on which the first basicBlock will jump
Twine * var3 = new Twine("alteredBB");
BasicBlock *alteredBB = createAlteredBasicBlock(originalBB, *var3, &F);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Altered basic block: ok\n");

// Now that all the blocks are created,
// we modify the terminators to adjust the control flow.
// 去掉 Terminator 为之后重建控制流
alteredBB->getTerminator()->eraseFromParent();
basicBlock->getTerminator()->eraseFromParent();
DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator removed from the altered"
<<" and first basic blocks\n");

创建 condition 块,重建控制流

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Preparing a condition..
// For now, the condition is an always true comparaison between 2 float
// This will be complicated after the pass (in doFinalization())
// 结果永远为真,在 doFinalization 内复杂化条件表达式
Value * LHS = ConstantFP::get(Type::getFloatTy(F.getContext()), 1.0);
Value * RHS = ConstantFP::get(Type::getFloatTy(F.getContext()), 1.0);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Value LHS and RHS created\n");


// The always true condition. End of the first block
Twine * var4 = new Twine("condition");
FCmpInst * condition = new FCmpInst(*basicBlock, FCmpInst::FCMP_TRUE , LHS, RHS, *var4);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Always true condition created\n");

// Jump to the original basic block if the condition is true or
// to the altered block if false.
BranchInst::Create(originalBB, alteredBB, (Value *)condition, basicBlock);
DEBUG_WITH_TYPE("gen",
errs() << "bcf: Terminator instruction in first basic block: ok\n");

// The altered block loop back on the original one.
BranchInst::Create(originalBB, alteredBB);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator instruction in altered block: ok\n");

拆分出 originalBB 的 Terminator 作为新的 BB,在拆分后的 originalBB 最后添加条件跳转语句,条件始终为真跳转到原来的 Terminator(这部分对照文章最开始的那个 cfg 更好理解)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// The end of the originalBB is modified to give the impression that sometimes
// it continues in the loop, and sometimes it return the desired value
// (of course it's always true, so it always use the original terminator..
// but this will be obfuscated too;) )

// iterate on instruction just before the terminator of the originalBB
BasicBlock::iterator i = originalBB->end();

// Split at this point (we only want the terminator in the second part)
Twine * var5 = new Twine("originalBBpart2");
BasicBlock * originalBBpart2 = originalBB->splitBasicBlock(--i , *var5);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator part of the original basic block"
<< " is isolated\n");
// the first part go either on the return statement or on the begining
// of the altered block.. So we erase the terminator created when splitting.
originalBB->getTerminator()->eraseFromParent();
// We add at the end a new always true condition
Twine * var6 = new Twine("condition2");
// 条件始终为 True
FCmpInst * condition2 = new FCmpInst(*originalBB, CmpInst::FCMP_TRUE , LHS, RHS, *var6);
BranchInst::Create(originalBBpart2, alteredBB, (Value *)condition2, originalBB);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Terminator original basic block: ok\n");
DEBUG_WITH_TYPE("gen", errs() << "bcf: End of addBogusFlow().\n");

createAlteredBasicBlock

调用 CloneBasicBlock 复制一个 BB

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
/// Return a copy of the specified basic block, but without
/// embedding the block into a particular function. The block returned is an
/// exact copy of the specified basic block, without any remapping having been
/// performed. Because of this, this is only suitable for applications where
/// the basic block will be inserted into the same function that it was cloned
/// from (loop unrolling would use this, for example).
///
/// Also, note that this function makes a direct copy of the basic block, and
/// can thus produce illegal LLVM code. In particular, it will copy any PHI
/// nodes from the original block, even though there are no predecessors for the
/// newly cloned block (thus, phi nodes will have to be updated). Also, this
/// block will branch to the old successors of the original block: these
/// successors will have to have any PHI nodes updated to account for the new
/// incoming edges.
///
/// The correlation between instructions in the source and result basic blocks
/// is recorded in the VMap map.
///
/// If you have a particular suffix you'd like to use to add to any cloned
/// names, specify it as the optional third parameter.
///
/// If you would like the basic block to be auto-inserted into the end of a
/// function, you can specify it as the optional fourth parameter.
///
/// If you would like to collect additional information about the cloned
/// function, you can specify a ClonedCodeInfo object with the optional fifth
/// parameter.
BasicBlock *CloneBasicBlock(const BasicBlock *BB, ValueToValueMapTy &VMap,
const Twine &NameSuffix = "", Function *F = nullptr,
ClonedCodeInfo *CodeInfo = nullptr,
DebugInfoFinder *DIFinder = nullptr);

这一大段注释的大概意思是 CloneBasicBlock 会返回传入的 BB 的副本但不会把返回的副本插入到当前处理的函数内,这时返回的副本 BB 还存在问题需要修正(remapping)

副本是传入 BB 的直接拷贝,不会直接插入到函数内,如果传入的 BB 存在 phi 指令那么拷贝的副本因为没有前一个 BB 就会产生错误。同时,拷贝的副本会有到原来传入 BB 后继块的分支,原来 BB 后继块内的 phi 结构就需要调整

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
ValueToValueMapTy VMap;
BasicBlock * alteredBB = llvm::CloneBasicBlock (basicBlock, VMap, Name, F);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Original basic block cloned\n");
// Remap operands.
BasicBlock::iterator ji = basicBlock->begin();
for (BasicBlock::iterator i = alteredBB->begin(), e = alteredBB->end() ; i != e; ++i){
// Loop over the operands of the instruction
// 重新调整操作数
for(User::op_iterator opi = i->op_begin (), ope = i->op_end(); opi != ope; ++opi){
// get the value for the operand
Value *v = MapValue(*opi, VMap, RF_None, 0);
if (v != 0){
*opi = v;
DEBUG_WITH_TYPE("gen", errs() << "bcf: Value's operand has been setted\n");
}
}
DEBUG_WITH_TYPE("gen", errs() << "bcf: Operands remapped\n");
// Remap phi nodes' incoming blocks.
// 这里对 phi 的处理有些看不懂,好像没有改原来 BB 后继块的 phi 指令?
if (PHINode *pn = dyn_cast<PHINode>(i)) {
for (unsigned j = 0, e = pn->getNumIncomingValues(); j != e; ++j) {
Value *v = MapValue(pn->getIncomingBlock(j), VMap, RF_None, 0);
if (v != 0){
pn->setIncomingBlock(j, cast<BasicBlock>(v));
}
}
}
// 下面 remap metadata 和调试信息
DEBUG_WITH_TYPE("gen", errs() << "bcf: PHINodes remapped\n");
// Remap attached metadata.
SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;
i->getAllMetadata(MDs);
DEBUG_WITH_TYPE("gen", errs() << "bcf: Metadatas remapped\n");
// important for compiling with DWARF, using option -g.
i->setDebugLoc(ji->getDebugLoc());
ji++;
DEBUG_WITH_TYPE("gen", errs() << "bcf: Debug information location setted\n");

} // The instructions' informations are now all correct
// 到此结束副本的修正

PHINode 类注释
PHINode - The PHINode class is used to represent the magical mystical PHI
node, that can not exist in nature, but can be synthesized in a computer
scientist’s overactive imagination.

看起来 phi 没有在实际当中使用

接下来插入随机的垃圾指令,分为整数和浮点数两种情况进行处理,遇到运算指令和比较指令就进行随机的插入或替换。

doF

这个函数会对之前添加的虚假块(condution)最后的跳转条件复杂化,创建两个全局变量 x 和 y,两层 for 循环记录下所有虚假块的跳转条件、分支跳转语句分别到 toDeletetoEdit 两个数组

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
for(Module::iterator mi = M.begin(), me = M.end(); mi != me; ++mi){
for(Function::iterator fi = mi->begin(), fe = mi->end(); fi != fe; ++fi){
//fi->setName("");
TerminatorInst * tbb= fi->getTerminator();
if(tbb->getOpcode() == Instruction::Br){
BranchInst * br = (BranchInst *)(tbb);
// 判断是否是条件跳转
if(br->isConditional()){
FCmpInst * cond = (FCmpInst *)br->getCondition();
unsigned opcode = cond->getOpcode();
if(opcode == Instruction::FCmp){
// 条件固定为真就说明是前面过程添加的 condition 块
if (cond->getPredicate() == FCmpInst::FCMP_TRUE){
DEBUG_WITH_TYPE("gen",
errs()<<"bcf: an always true predicate !\n");
toDelete.push_back(cond); // The condition
toEdit.push_back(tbb); // The branch using the condition
}
}
}
}
/*
for (BasicBlock::iterator bi = fi->begin(), be = fi->end() ; bi != be; ++bi){
bi->setName(""); // setting the basic blocks' names
}
*/
}
}

替换所有之前记录的分支跳转

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// Replacing all the branches we found
for(std::vector<Instruction*>::iterator i =toEdit.begin();i!=toEdit.end();++i){
//if y < 10 || x*(x+1) % 2 == 0
opX = new LoadInst ((Value *)x, "", (*i));
opY = new LoadInst ((Value *)y, "", (*i));

// x-1 和上面注释写的不太一样
op = BinaryOperator::Create(Instruction::Sub, (Value *)opX,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 1,
false), "", (*i));
// x*(x-1)
op1 = BinaryOperator::Create(Instruction::Mul, (Value *)opX, op, "", (*i));
// x*(x-1)%2
op = BinaryOperator::Create(Instruction::URem, op1,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 2,
false), "", (*i));
// x*(x-1)%2 == 0
condition = new ICmpInst((*i), ICmpInst::ICMP_EQ, op,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 0,
false));
// y < 10
condition2 = new ICmpInst((*i), ICmpInst::ICMP_SLT, opY,
ConstantInt::get(Type::getInt32Ty(M.getContext()), 10,
false));
// 最后产生的条件 y < 10 || x*(x-1) % 2 == 0 x=y=0 条件固定为真
op1 = BinaryOperator::Create(Instruction::Or, (Value *)condition,
(Value *)condition2, "", (*i));

// 替换分支
BranchInst::Create(((BranchInst*)*i)->getSuccessor(0),
((BranchInst*)*i)->getSuccessor(1),(Value *) op1,
((BranchInst*)*i)->getParent());
DEBUG_WITH_TYPE("gen", errs() << "bcf: Erase branch instruction:"
<< *((BranchInst*)*i) << "\n");
(*i)->eraseFromParent(); // erase the branch
}

到此结束整个 bcf 混淆过程

混淆前后对比

ollvm 的虚假控制流混淆实际上还有许多能够优化的地方,比如在代码执行过程更改不透明谓词中全局变量的值(当前代码全局变量固定值 0),添加随机垃圾指令时加花之类的

Ref

作者

0wl

发布于

2022-11-20

更新于

2022-11-20

许可协议

评论