Conscious Memory Management
This post is an accumulation of my thoughts over the past 2 months, when I learned a bit of new stuff about memory management. It ultimately convinced me that no mainstream language does memory management consciously (Rust, I’m also looking at you).
Consider this post a short introduction to arena allocators with high-quality further reads linked at the end.
Sloppy memory allocation habits§
You might have them and certainly I had them!
I bet you’ve heard this:
Heap is slow.
If not, you are the one who said this.
Thinking from the CPU’s perspective it doesn’t make sense. R in RAM stands for random, meaning access to any memory location x takes the same time. Same thing applies to all CPU caches.
So clearly heap is not slow in terms of memory access. Your CPU doesn’t distinguish it from other memory locations (*some caveats here, but it’s off-topic).
Speaking more precisely, heap memory allocations are slow. And the reason why is not hard to see. Modern “allocators” have to serve your sloppy habits and therefore do a lot of bookkeeping.
Just to give you a glimpse of why malloc is not trivial, consider this memory allocation pattern:
void* x = malloc(X);
void* y = malloc(Y);
void* z = malloc(Z);
free(y);
y = malloc(Y);
free(z);
free(y);
free(x);
Returning the buffer y creates “a hole” in a contiguous block of memory allocator has got from the OS,
and we would like our allocator to somehow be smart enough and return it next time we ask it for Y or less bytes (otherwise the hole will be lost).
This creates a bit of complexity and allows memory fragmentation to creep in. On top of that, add support for multithreading.
Hopefully the complexity of malloc’s innerworkings doesn’t come as a surprise to you. As interfaces, malloc and free
impose almost no restrictions, making their use convenient & simple at the cost of non-trivial implementations.
These issue have a lot of solutions, all with different trade-offs and considerations, but they all stem from this thinking of an allocator as a magical box I can query any time with no . This is I want to focus on:
Such allocators create individual lifetimes for each allocation
This is actually why people are afraid of manual memory management. How do you know when to
call free? Are you sure you aren’t double-freeing? Perhaps you didn’t free at all.
In large code-bases, it’s very hard to manage, especially when these lifetimes interact in
subtle ways. This what RAII and Rust’s borrowck partially solve. They hide management
complexity from you while gross memory allocation patterns do remain.
Thankfully this needn’t be the case, we can design the whole problem out in 95% of cases and be smarter than those Rustaceans.
Arena allocators§
Arenas are stupidly simple. Think of them as buffers and an offset.
Every time you ask for x bytes, an arena returns you arena.ptr + arena.offset
and advances (bumps) its offset forward by x. And most importantly: there is no way to
free an individual allocation. To free something from the arena, you have to delete
everything.
This property is crucial and forces you to stop thinking about each object as an individually managed entity, but tie everything together into 1 large lifetime. A cliche example is a game with per-frame bump allocator: at the beginning of each frame, allocator’s offset is reset and further memory requests are super cheap. Just this tremendously simplifies the code by bypassing standard allocator (and it’s internals), and most-importantly makes manual memory management simple and tolerable.
Another example is a web server with short-lived requests: just create a bump allocator for each request and use it for everything “request related”. Before returning the response, free the bump allocator. You can also query this allocator to see how much memory you used - something you don’t get for free with most of standard allocators.
Hopefully you can see the pattern emerge. Instead of managing gross lifetime interactions
and freeing objects one at a time, we deal with a single “context-lifetime” and destroy
each object when the context logically terminates. This one single change makes everything so
trivial we don’t even need borrowck to verify correctness for us. Ah, and less importantly,
we’ve gained a good perfomance boost for free.
If don’t know anything about virtual memory you may be wondering how does one know how much memory
is required for an arena? Allocating a gibibyte buffer with malloc for an arena is obviously a
non-solution. The key is to use your operating system and the built-in memory paging
mechanism on modern CPUs. Just ask your OS to virtually commit a portion of the address space
(like 64 GiB). On subsuquent uses, your OS will allocate backing physical pages as you use
the arena. And if you hit the 64GiB limit, you probably have to reconsider some of your life
choices.
Because of this simplicity, arenas wear a lot of hats. They are most commonly known as:
- Bump allocators
- Linear allocators
- Stacks
Conclusions§
Linear allocators are so simple they are built-in into all CPUs. Yet very few codebases consider them seriously despite gains in simplicity and performance which is kinda embarrasing. Modern software-engineers tend to care more about a new AI slop model instead of fixing skill-issues and actually learning how to program. This general lack of education is the ultimate reason software has been declining.
For what it’s worse, this stuff isn’t even told generally. Popular YouTube try to convince me that manual memory management is error prone – my faulty brain would never do that right. CS colleges are no better.
Languge wise, I can only think of 3 languages which decently support/encourage arenas:
- C
- Zig
- Odin, maybe?
Rust, obviously, doesn’t fall into that category: as it literally has a built-in RAII on steroids.
You could argue that crates like bumpalo add this missing functionality but until this “low-level”
language implements allocator API it’s of no use.
While Zig is more fun to write (it even forces these ideas of passing the allocator around) we can get around without modern unstable tooling to just use arenas. Try to implement and use them in C – it is dang simple.
Further reading§
Hopefully I convinced you that there is a better way to do manual memory management. Check out this talk and this blogpost by Ryan Fleury. Memory Allocation Strategies series by Ginger Bill are also nice.
I didn’t discuss some catches with the arena allocators (scratch arenas subtleties) and techniques on how to deal with them as these sources already do a great job at this.