LLVM IR Instruction Tricks

LLVM IR instructions are the basic building blocks you work with when writing LLVM passes. Here are some things I learned along the way.

A basic classification of the commonly used LLVM IR instructions

Here is a brief classification of the instructions by the number and kind of their operands. It is useful when you need to analyze an instruction's operands and its result.
The types that appear in different positions of an instruction usually have to match, so take extra care when you change the type of an instruction.

Alloca Instruction

%i = alloca type

The type of this instruction is type*, a pointer to the allocated type.

Call Instruction

Two kinds of call instructions commonly show up in LLVM IR: calls to ordinary functions and calls to LLVM debug intrinsics.

%i = call retType @func_name (type %p1, ...)

and

call void @llvm.dbg.declare/value (metadata type %p, ...)

The second one is quite interesting and is explained in the section on getting debug information from Instruction::Call below.

Load Instruction

%i = load type, type* %op

The type of this instruction is type, not type*.

Store Instruction

store type %op1, type* %op2

GetElementPtr Instruction

%i = getelementptr type, type1* %op1, type2 %op2, (type3 %op3)

Binary Instruction

%i = binaryInst type %op1, %op2

Here binaryInst is a placeholder; it can be Add, FAdd, etc.

Unary Instruction

%i = unaryInst type %op

As with binaryInst in the Binary Instruction section, unaryInst is a placeholder; it can be, for example, FNeg.

Cast Instruction

%i = castInst type1 %op1 to type2

As with binaryInst in the Binary Instruction section, castInst is a placeholder; it can be FPToUI, FPToSI, SIToFP, UIToFP, ZExt, SExt, FPExt, Trunc, FPTrunc, or BitCast.

PHI Instruction

%i = phi type [%op1, %bb1], [%op2, %bb2], ...

Get debug information from Instruction::Call

For calls to LLVM-defined intrinsics such as llvm.dbg.value and llvm.dbg.declare, we can recover almost all of the debug information from the LLVM metadata, as long as the code is compiled with debug info (for example -O0 -g).
The following snippet shows how to get the debug information you want from the IR.

case Instruction::Call:
    if (auto llvmIrCallInstruction = dyn_cast<CallInst>(&llvmIrInstruction))
    {
        Function * calledFunction = llvmIrCallInstruction->getCalledFunction();
        // Indirect calls have no statically known callee, so skip them.
        if (calledFunction == nullptr || !calledFunction->hasName() || calledFunction->getName().empty())
            break;
        if (calledFunction->getName().startswith("llvm.dbg.value") ||
            calledFunction->getName().startswith("llvm.dbg.declare"))
        {
            // Operand 0 wraps the described value (dbg.value) or its address (dbg.declare) as metadata.
            if (!isa<MetadataAsValue>(llvmIrCallInstruction->getOperand(0)))
                break;
            auto firstOperand = cast<MetadataAsValue>(llvmIrCallInstruction->getOperand(0));
            if (!isa<ValueAsMetadata>(firstOperand->getMetadata()))
                break;
            auto localVariableAddressAsMetadata = cast<ValueAsMetadata>(firstOperand->getMetadata());
            auto localVariableAddress = localVariableAddressAsMetadata->getValue();

            // Operand 1 wraps the DIVariable that describes the source-level variable.
            auto variableMetadata = dyn_cast<MetadataAsValue>(llvmIrCallInstruction->getOperand(1));
            if (variableMetadata == nullptr || !isa<DIVariable>(variableMetadata->getMetadata()))
                break;
            auto debugInfoVariable = cast<DIVariable>(variableMetadata->getMetadata());
            const DIType * variableType = debugInfoVariable->getType();

            // dyn_cast_or_null tolerates a variable that carries no type information.
            if (const auto * compositeVariableType = dyn_cast_or_null<DICompositeType>(variableType))
            {
                /*
                 * It's a composite type, including structure, union, array, and enumeration
                 * Extract from composite type
                 * */
                auto typeTag = compositeVariableType->getTag();
                if (typeTag == dwarf::DW_TAG_union_type)
                {
                    // do something
                }
                else if (typeTag == dwarf::DW_TAG_structure_type)
                {
                    // do something
                }
                else if (typeTag == dwarf::DW_TAG_array_type)
                {
                    const DIType * ElemType = compositeVariableType->getBaseType();
                    // do something with element type
                }
                else if (typeTag == dwarf::DW_TAG_enumeration_type)
                {
                    // do something
                }
            }
        }
    }
    break;

Instructions that imply sign meanings

LLVM’s integer types carry no signedness. Instead, the choice of instruction tells you whether the operands are interpreted as signed or unsigned.

Signedness implied by the instruction name

LLVM uses UDiv, URem, and LShr to treat the operands as unsigned integers and produce an unsigned result. Correspondingly, SDiv, SRem, and AShr perform the signed calculations.
Some cast instructions carry the same information: FPToUI, UIToFP, and ZExt treat the integer involved as unsigned (the result of FPToUI, the operand of UIToFP and ZExt), while FPToSI, SIToFP, and SExt treat it as signed.
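
To make the difference concrete, here is a small C++ sketch that applies the unsigned and the signed flavor of each operation to the same 32-bit pattern. The helper names (udiv, sdiv, lshr, ashr, zext, sext, asSigned, kBits) are hypothetical and only mirror the opcode names; they are not LLVM API. The sketch assumes a two's-complement target where converting to a signed type wraps and >> on a negative value is an arithmetic shift, which C++20 guarantees and mainstream compilers already did before that.

```cpp
#include <cstdint>

// One 32-bit pattern, two readings: 4294967288 as unsigned, -8 as signed.
constexpr std::uint32_t kBits = 0xFFFFFFF8u;

// Reinterpret the bits as a signed value (assumed to wrap, see the lead-in).
constexpr std::int32_t asSigned(std::uint32_t a) { return static_cast<std::int32_t>(a); }

// UDiv, URem, LShr: the operands are read as unsigned integers.
constexpr std::uint32_t udiv(std::uint32_t a, std::uint32_t b) { return a / b; }
constexpr std::uint32_t urem(std::uint32_t a, std::uint32_t b) { return a % b; }
constexpr std::uint32_t lshr(std::uint32_t a, unsigned n) { return a >> n; }

// SDiv, SRem, AShr: the same bits are read as signed integers.
constexpr std::int32_t sdiv(std::uint32_t a, std::int32_t b) { return asSigned(a) / b; }
constexpr std::int32_t srem(std::uint32_t a, std::int32_t b) { return asSigned(a) % b; }
constexpr std::int32_t ashr(std::uint32_t a, unsigned n) { return asSigned(a) >> n; }

// ZExt fills the new high bits with zeros; SExt copies the sign bit into them.
constexpr std::uint64_t zext(std::uint32_t a) { return static_cast<std::uint64_t>(a); }
constexpr std::int64_t sext(std::uint32_t a) { return static_cast<std::int64_t>(asSigned(a)); }

// Same input bits, different results depending on the chosen instruction.
static_assert(udiv(kBits, 3) == 1431655762u, "unsigned division");
static_assert(sdiv(kBits, 3) == -2, "signed division rounds toward zero");
static_assert(lshr(kBits, 1) == 0x7FFFFFFCu, "logical shift inserts a zero bit");
static_assert(ashr(kBits, 1) == -4, "arithmetic shift keeps the sign");
static_assert(zext(kBits) == 4294967288ull, "zero extension");
static_assert(sext(kBits) == -8, "sign extension");
```

The checks run at compile time, so compiling the file is enough to confirm the values. Division by zero is deliberately left unhandled in this sketch.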

Sign hint in the ICmp instruction

ICmp predicates such as sgt, sge, slt, and sle compare the operands as signed integers, whereas ugt, uge, ult, and ule compare them as unsigned integers. The eq and ne predicates do not depend on the sign.
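
The effect of the predicate can be sketched in plain C++ as follows. The helpers icmp_slt, icmp_ult, and icmp_eq and the constant kMinusOne are hypothetical names chosen to mirror the predicates; the sketch assumes the usual two's-complement conversion from unsigned to signed.

```cpp
#include <cstdint>

// slt: both operands are read as signed integers before comparing.
constexpr bool icmp_slt(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b);
}

// ult: both operands are read as unsigned integers.
constexpr bool icmp_ult(std::uint32_t a, std::uint32_t b) { return a < b; }

// eq: a plain bitwise comparison, so the sign does not matter.
constexpr bool icmp_eq(std::uint32_t a, std::uint32_t b) { return a == b; }

// All bits set: -1 when read as signed, 4294967295 when read as unsigned.
constexpr std::uint32_t kMinusOne = 0xFFFFFFFFu;

static_assert(icmp_slt(kMinusOne, 1u), "signed view: -1 < 1");
static_assert(!icmp_ult(kMinusOne, 1u), "unsigned view: 4294967295 is not below 1");
static_assert(icmp_eq(kMinusOne, 0xFFFFFFFFu), "equality ignores the sign");
```

When a pass needs to infer whether a value is signed, the predicate of an ICmp that uses the value is therefore a useful hint.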

Overflow flags on the instruction

nsw (No Signed Wrap) and nuw (No Unsigned Wrap) are flags that make the result a poison value if signed overflow or unsigned overflow occurs, respectively.
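
As a rough illustration, the following C++ sketch checks when a 32-bit add would violate each flag. The function names addViolatesNuw and addViolatesNsw are hypothetical; this only models the overflow condition and is not how LLVM itself represents poison.

```cpp
#include <cstdint>
#include <limits>

// True when `add nuw i32 a, b` would be poison: the unsigned sum wraps around.
constexpr bool addViolatesNuw(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b) < a;
}

// True when `add nsw i32 a, b` would be poison: the exact signed sum,
// computed in 64 bits, does not fit into 32 bits.
constexpr bool addViolatesNsw(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) + b > std::numeric_limits<std::int32_t>::max() ||
           static_cast<std::int64_t>(a) + b < std::numeric_limits<std::int32_t>::min();
}

// 0xFFFFFFFF + 1 wraps as unsigned, but is just -1 + 1 = 0 as signed.
static_assert(addViolatesNuw(0xFFFFFFFFu, 1u), "unsigned wrap");
static_assert(!addViolatesNsw(-1, 1), "no signed wrap");

// 0x7FFFFFFF + 1 overflows as signed, but fits as unsigned.
static_assert(addViolatesNsw(std::numeric_limits<std::int32_t>::max(), 1), "signed wrap");
static_assert(!addViolatesNuw(0x7FFFFFFFu, 1u), "no unsigned wrap");
```

The two pairs of checks show that the flags are independent: the same bit patterns can violate one flag without violating the other.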

Anyone is free to use this material; please cite the source when using or publishing it. Thank you!

References

LLVM Language Reference Manual