LLVM IR Instructions' trick

LLVM IR instructions are the basic elements while we are writing LLVM PASS codes. Here are some interesting experiences.

A basic classification of the commonly used LLVM IR instructions

Here is a brief classification of the instructions based on the operand numbers. This classification would be useful when you want to analyze the operands and result op of instructions.
The type in different positions in the instruction usually needs to be guaranteed to be the same (be more careful especially when you need to change the type of an instruction).

Alloca Instruction

1
%i = alloca type

The type of this instruction is `type*.

Call Instruction

There are two different kinds of call instruction in LLVM.

1
%i = call retType @func_name (type %p1, ...)

and

1
call void @llvm.dbg.declare/value (metadata type %p, ...)

The second one is quite interesting and will be explained in the next section.

Load Instruction

1
%i = load type, type* %op

The type of this instruction is type, but not type*.

Store Instruction

1
store type %op1, type* %op2

GetElementPtr Instruction

1
%i = gep type, type1* %op1, type2 %op2, (type3 %op3)

Binary Instruction

1
%i = binaryInst type %op1, %op2

binaryInst here is a representative word, it can be Add, FAdd, etc.

Unary Instruction

1
%i = unaryInst type %op

Same as the binaryInst in the Binary Instruction section, unaryInst in the above code is a representative word, which can be FNeg, etc.

Cast Instruction

1
%i = castInst type1 %op1 to type2

Same as the binaryInst in the Binary Instruction section, castInst in the above code is a representative word, it actually can be FPToUI, FPToSI, SIToFP, UIToFP, ZExt, SExt, FPExt, Trunc, FPTrunc, BitCast.

PHI Instruction

1
%.i = phi type [%op1, %bb1], [%op2, %bb2], ...

Get debug information from Instruction::Call

For the LLVM-define call instruction, like llvm.dbg.value and llvm.dbg.declare, we can easily get almost every debug information (as long as you compile with debug config, -O0 -g) from the LLVM metadata.
Here’s a piece of code about how to get the debug information you want from IR.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
case Instruction::Call:  
if (auto llvmIrCallInstruction = dyn_cast<CallInst>(&llvmIrInstruction))
{
Function * calledFunction = llvmIrCallInstruction->getCalledFunction();
if (calledFunction == nullptr || !calledFunction->hasName() || calledFunction->getName().empty())
break;
if (calledFunction->getName().startswith("llvm.dbg.value") ||
calledFunction->getName().startswith("llvm.dbg.declare"))
{
if (!isa<MetadataAsValue>(llvmIrCallInstruction->getOperand(0)))
break;
auto firstOperator = cast<MetadataAsValue>(llvmIrCallInstruction->getOperand(0));
if (!isa<ValueAsMetadata>(firstOperator->getMetadata()))
break;
auto localVariableAddressAsMetadata = cast<ValueAsMetadata>(firstOperator->getMetadata());
auto localVariableAddress = localVariableAddressAsMetadata->getValue();

auto variableMetadata = cast<MetadataAsValue>(llvmIrCallInstruction->getOperand(1));
if (!isa<DIVariable>(variableMetadata->getMetadata()))
break;
auto debugInfoVariable = cast<DIVariable>(variableMetadata->getMetadata());
const DIType * variableType = debugInfoVariable->getType();

if (const auto * compositeVariableType = dyn_cast<DICompositeType>(variableType))
{
/*
* It's a composite type, including structure, union, array, and enumeration
* Extract from composite type
* */
auto typeTag = compositeVariableType->getTag();
if (typeTag == dwarf::DW_TAG_union_type)
{
// do something
}
else if (typeTag == dwarf::DW_TAG_structure_type)
{
// do something
}
else if (typeTag == dwarf::DW_TAG_array_type)
{
const DIType * ElemType = compositeVariableType->getBaseType();
// do something with element type
}
else if (typeTag == dwarf::DW_TAG_enumeration_type)
{
// do something
}
}
}

Instructions that imply sign meanings

LLVM’s type system doesn’t explicitly specify the sign of the operand or instruction but uses different instructions to hint at the sign bit.

Suggest the sign symbol by the instruction name

LLVM uses UDiv, URem, and LShr to calculate the unsigned operands and get the positive result. Correspondingly, SDiv, Srem, and AShr are for the signed calculation.
Also, some type cast instructions, including FPToUI, UIToFP, and ZExt mean the results of these instructions are unsigned values, FPToSI, SIToFP, and SExt are on the opposite.

Sign hint in the ICmp instruction

There are hints like sgt, sge, slt, and sle in ICmp to compare as the signed operands, hints like ugt, uge, ult, and ule are for the unsigned operands.

Warning flag in the instruction

nsw (No Signed Wrap) and nuw (No Unsigned Wrap) are flags to generate poison value if signed and/or unsigned overflow.

Anyone is free to use it, and please indicate the reference when using or publishing. Thank you!

References

LLVM Language Reference Manual

Compare and Remove the same (redundant) functions in LLVM

Background

In my last blog, I introduced how to clone functions in LLVM. After a range of specific optimization passes, some cloned functions may not change as a redundant function. This blog shows how to delete these redundant functions.

I didn’t find related discussion on Google. Maybe no one has the need to clone the functions and delete them like I do. But class MergeFunctions in llvm/lib/Transforms/IPO/MergeFunctions.cpp gives a good example. In this class, it didn’t provide an external API, but we can still learn some techniques from it. LLVM’s doc also introduces this clearly.

Solution

The key point is to use std::set, an STL container with unique key value, to filter the unique functions. This means we need to program our own compare functions for the set data structure. diralik provides 5 ways to design a custom std::set comparator. In this blog, I chose the traditional old way with () operator.

I used a FunctionNode class to store the Function in LLVM and its FunctionComparator::FunctionHash. Then I had a FunctionNodeCmp class with a custom () operator, and use the FunctionComparator::FunctionHash and FunctionComparator::compare() to compare if two functions are the same.

The last step is to insert each function into the std::set. And remember,

Code

Anyone is free to use it, and please indicate the reference when using or publishing. Thank you!

References

Clone Function in LLVM

LLVM is a powerful compiler infrastructure, but sometimes compiler engineers need to design their own functions for their own purposes. This Blog will introduce how to clone an existing function in LLVM to the same module.

Background

In my current project, CoSense (hope will soon be published and open-sourced), I’m going to clone some functions first and optimize each of these cloned functions with specific algorithms. Then my compiler will eliminate the redundant functions.

There’s no existing API to clone functions quickly and conveniently, so I did a bit of research and wrote this blog.

While browsing on Google, there are lots of good experiences.

KeksimvsMaximvs[1] asked how to clone function in another module, and gave a good solution to it by CloneFunctionInto. Scott A. Carr[2] shared his experience of how to search and fix this problem, which is really helpful, but didn’t provide a usable source code. And both of them didn’t mention that when a callee function is cloned, the callInst node in the caller function should change the called name. In LLVM’s mailing list, Zhang[3] answered this question with CallInst::setCalledFunction. But the problems still didn’t have a good solution.

Solution

In my solution, I use a std::map<std::string, int> to store the cloned function’s ID with the same base function, as they are separate functions with their own names. And the IDs are used to generate the new name of the cloned functions.

Then I create the new function with the new name and set arguments to it. Next, I use CloneFunctionInto to generate the new function’s implementation, set visibility and linkage, and copy metadata from the base function.

The last step is to insert the new function into the module (also the base function’s parent node), and changed the call instruction.

Code

Here’s the partial but sufficient key code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
std::map<std::string, int> calleeCounter;

/*
* collect the inner bound info for each callee function.
*/
auto llvmIrCallInstruction = dyn_cast<CallInst>(&llvmIrInstruction);
Function * calledFunction = llvmIrCallInstruction->getCalledFunction();
auto calleeCounterIt = calleeCounter.find(calledFunction->getName().str());
if (calleeCounterIt != calleeCounter.end()) {
calleeCounterIt->second++;
} else {
calleeCounter.emplace(calledFunction->getName().str(), 0);
}
std::string newFuncName = calledFunction->getName().str() + '_' + std::to_string(calleeCounterIt->second);
/*
* rename the llvmIrCallInstruction to the new function name
*/
ValueToValueMapTy vMap;
auto overloadFunc = Function::Create(calledFunction->getFunctionType(),
calledFunction->getLinkage(),
calledFunction->getAddressSpace(),
newFuncName);
auto *newFuncArgIt = overloadFunc->arg_begin();
for (auto &arg : calledFunction->args()) {
auto argName = arg.getName();
newFuncArgIt->setName(argName);
vMap[&arg] = &(*newFuncArgIt++);
}
SmallVector<ReturnInst*, 8> Returns;
CloneFunctionInto(overloadFunc, calledFunction, vMap,
CloneFunctionChangeType::LocalChangesOnly, Returns);
// Set the linkage and visibility late as CloneFunctionInto has some
// implicit requirements.
overloadFunc->setVisibility(GlobalValue::DefaultVisibility);
overloadFunc->setLinkage(GlobalValue::PrivateLinkage);

// Copy metadata
SmallVector<std::pair<unsigned, MDNode *>, 1> MDs;
calledFunction->getAllMetadata(MDs);
for (auto MDIt : MDs) {
if (!overloadFunc->hasMetadata()) {
overloadFunc->addMetadata(MDIt.first, *MDIt.second);
}
}

Module &funcModule = *calledFunction->getParent();
funcModule.getFunctionList().insert(calledFunction->getIterator(), overloadFunc);
overloadFunc->setDSOLocal(true);
llvmIrCallInstruction->setCalledFunction(overloadFunc);

PS: I used LLVM-13, and the function CloneFunctionInto may change in different LLVM versions.

Conclusion

So this is how to clone functions in LLVM. I will introduce how to eliminate the same (redundant) function after cloning. It will also be fun.

Anyone is free to use it, and please indicate the reference when using or publishing. Thank you!

References

[1] https://stackoverflow.com/questions/61189720/llvm-clone-function-pass-to-different-module
[2] http://scottcarr.github.io/2016/01/24/my-week-in-llvm.html
[3] https://lists.llvm.org/pipermail/llvm-dev/2018-March/122114.html