LLVM and GC¶
In this blog, I will introduce how I integrate a precise, relocation GC with LLVM using its statepoint api. We will also talk about some optimization techniques, debug support and so on.
LLVM gc.root and statepoint¶
LLVM statepoint is an eperimental api to support precise relocation GC. Comparing with the old gc.root
api, it has the following advantages:
- can be generate by builtin pass
- works well with other optimization passes
Also, it has some disadvantages:
- not stable, which means it may change in the future, and have some bugs
- not well documented--makes it really hard to use
- cannot customize the stack map -- at least in LLVM 16
If you have tried to use gc.root
to support a relocation GC, you will find that it is really hard to cover all stack roots. And it is even harder to play with all those derived pointers--A relocation GC may change the address of objects during collection, so after any points that may trigger a GC(those points were called safepoints), we have to reload all GC pointers that we may use later from stack or register, include base pointers and derived pointers(base pointer with offset, like the result of getelementptr of a base pointer).
The gc.root
api is so difficult to get right for a compiler frontend, that I've spent days and nights to make it work. And until its retirement, it still has some bugs that I never figured out(which crash the program from time to time). Also, it's performance is really poor, probably because I'm generating it from frontend, and those instructions prevent many optimizations in LLVM.
The statepoint api, on the other hand, doesn't need frontend to do so much. There's a builtin pass called RewriteStatepointsForGC
, which can help us to generate statepoint instructions.
LLVM statepoint usage¶
The statepoint
api is a combination of gc.statepoint
gc.result
and gc.relocate
instructions. The gc.statepoint
instruction is used to mark a safepoint, it's basically a wrapper around a call instruction, but it doesn't directly return the result of the call, instead, it returns a token
the pointers that may relocate after the statepoint is placed in the gc-live
operand bundle just at the end of the instruction. The gc.result
instruction is used to get the result of the call by accepting the token
as its arguments. The gc.relocate
instruction is used to get the relocated address of a GC pointer, it accepts the token
and the index of the pointer in the gc-live
operand bundle as its arguments, and produce the relocated pointer. The exaple usage is as follow[1]:
%statepoint_token46 = call token (i64, i32, ptr, i32, i32, ...) @llvm.experimental.gc.statepoint.p0(i64 2882400000, i32 0, ptr elementtype(ptr addrspace(1) (i64, i8, i64)) @DioGC__malloc, i32 3, i32 0, i64 32, i8 2, i64 %3, i32 0, i32 0) [ "gc-live"(ptr addrspace(1) %heapptr_retvalue3.relocated45) ]
%heapptr_string47 = call ptr addrspace(1) @llvm.experimental.gc.result.p1(token %statepoint_token46)
%heapptr_retvalue3.relocated48 = call coldcc ptr addrspace(1) @llvm.experimental.gc.relocate.p1(token %statepoint_token46, i32 0, i32 0) ; (%heapptr_retvalue3.relocated45, %heapptr_retvalue3.relocated45)
Apparently, the gc.statepoint
instruction is really complicated, so I won't recommend you to generate it by yourself. Instead, you shall use the builtin pass RewriteStatepointsForGC
, which treats all call instructions as safepoints, and generate gc.statepoint
instructions for them. The RewriteStatepointsForGC
pass will analyze the liveness of all GC pointers and generate gc.relocate
instructions as well.
How to integrate a GC with LLVM statepoint¶
Now, we know the basis about the statepoint api, but how could we use it in GC?
GC strategy¶
The first thing we need to do is to choose or implement a GCStraegy. The GCStraegy can be used to distinguish the GC pointers from other pointers. However, as the statepoint api is still experimental, before LLVM 18, only the builtin GCStraegy statepoint-example
and coreclr
can be used with RewriteStatepointsForGC
pass. So in this blog, we will choose the statepoint-example
GCStraegy as an example.
Frontend support¶
Both of two builtin GCStraegies assume that all GC pointers are in address space 1, so we need to make sure that all GC pointers are correctly annotated in the frontend.
Should I use stack alloca?
Of course you can, but I don't recommend you to do so. The RewriteStatepointsForGC
pass will generate stack roots automatically for all GC pointers it identified, so you don't need to do it by yourself. If you still did so, and store the GC pointers in stack alloca, remember your alloca won't be treated as a GC root, so you must manually tell your GC about this root or the heap pointer in it won't be correct after relocation. Generally, you should never use stack alloca when you are using LLVM statepoint api. The allocation added by RewriteStatepointsForGC
pass is enough for you, and can be optimized better by LLVM.
Parse stackmap¶
After you have successfully generated the llvm ir and run the RewriteStatepointsForGC
pass, next you need to implement a stackmap parser.
The llvm stackmap is a binary format that contains all the information about the GC roots at safepoints, you need to parse it to something like a hashtable for runtime usage. The format is pretty simple[2] [3], you can also refer to my stackmap parser[4] for more details. Basically, the parsed stackmap will become a hashtable with the safepoint(call site) address as the key, and the GC roots offset(from sp
) as the value.
Use stackmap during GC¶
Now, you have all the information you need at runtime, theoretically, you can use it to locate all GC roots, but how?
To locate all GC roots, you need to walk the stack, and find all the call site addresses, then use the stackmap to find all the GC roots at those call sites.
I'll provide you two ways to do so, one is easy to implement but not so fast, the other is more hacky but much more efficient.
Use an unwind library to walk the stack¶
The first way is to use an unwind library to walk the stack, there are many choices, like libunwind, libbacktrace, libgcc, etc. I'm using Rust so I choose the backtrace-rs
crate, which is merely a wrapper around those C libraries.
Those libraries can walk the stack by themselves, with some unwinding information in the binary (like eh_frame) or register, the can provide you every stack frame's ip
and sp
value. The key of our parsed stackmap hashtable is expected to be the same as the ip
register at call site, so it's really easy to use them to locate all stack roots. But most of them are pretty heavy, and may introduse some synchronization overheads(I believe there's a global lock in backtrace-rs
), so if your GC is designed to be parallel, you may not want to use them.
Manually walk the stack¶
If you look at the stackmap format carefully, you will find that besides the root offset, it also contains the stack size at the call site. With this information, and some basic knowledge about the stack frame layout, we can manually walk the stack using the sp
register's value at the beginning of the GC.
Stack frame layout & stack walking algorithm¶
The stack frame layout is different on different platforms, but they are all similar if you are using C calling convention. The stack frame layout of a function on aarch64 is like this:
+-------------------+ high address
| main |
| .... |
| | <- previous sp ----
| return addr | ↑
| func a frame... | frame size
| .... | ↓
| | <- sp // safe point call site, before the GC logic was called, the sp is pointing here
| return addr | // return address, key of the stackmap hashtable
| GC frames ... | // GC function frames
The algorithm to walk the stack is pretty simple:
The stack layout on x86_64 is a little bit different, it's actual frame size is frame_size + 8
, but the algorithm is the same.
Linking and stackmap
The stackmap is really low level, and may not work well with some linker options. For example, if you are targeting x86_64 windows, you must specify the /INCREMENTAL:NO
option to the linker, or the stackmap would not work correctly. The reason is that the incremental linking will change the function address, but the stackmap is generated before linking, so it will become invalid after linking.
How to get sp pointer value in LLVM
You can use inline assembly to get the sp pointer value in LLVM, for example, on aarch64:
On x86_64:Call convention
The stack walking algorithm above is based on C calling convention, if you are using other calling convention, you may need to adjust the algorithm a little bit. I personally recommend you to use C calling convention for it's simplicity and compatibility.
Optimization¶
LLVM's most optimization passes are not aware of the statepoint api, so we should do as much optimization as possible before the RewriteStatepointsForGC
pass.
LLVM has a builtin capture tracking pass, which is just like the escape analysis in Go, it can automatically determine whether a pointer is escaping or not, and remove unnecessary heap allocations, but you need to give it some hints. Basically, add allockind("alloc")
[5] to your gcmalloc
function would be enough.
Debug support¶
The relocation GC is really hard to debug, because the GC may move things around during collection.
I recommend you to disable the relocation during debugging, and only enable it when you are sure that your program is correct. Also, I've wrote a simple pass [6] to remove the relocation instructions, which remove the usage of relocated pointers. It's based on the origin StripGCRelocates
pass, but it make sure the stackmap is still valid after stripping.
Statepoint Bugs¶
There's several bugs about passing large struct as argument or return value:
https://github.com/llvm/llvm-project/issues/75162
https://github.com/llvm/llvm-project/issues/74612
Conclusion¶
The LLVM statepoint api is really hard to use, and I've really learned a lot during the integration. I hope this blog can help you to integrate your GC with LLVM.