Writing a simple memory allocator

The first time I wrote my own malloc, the whole concept of dynamic memory stopped feeling like a black box. An allocator is just a program that manages one big block of memory and hands out pieces of it on request. The hard parts are bookkeeping and fragmentation, not anything mystical. Let me walk through a minimal version.

Where the memory comes from

Your allocator needs raw memory to carve up. On Unix you get it from the kernel with sbrk, which moves the program break (the top of the heap), or with mmap for larger regions. sbrk is the simplest to reason about: call it with a positive number and the heap grows, returning a pointer to the new space.

void *region = sbrk(4096);   // ask the kernel for one page
if (region == (void *) -1) {
    // out of memory
}

The whole game is now: take that region and parcel it out, remembering which parts are in use and which are free.

Block headers: the core trick

The fundamental idea is to store metadata right before each block you hand out. When the caller asks for N bytes, you actually reserve N bytes plus the size of a small header. You return a pointer past the header, so the caller never sees it. When they call free with that pointer, you step backward to find the header again.

typedef struct block {
    size_t size;          // payload size in bytes
    int free;             // 1 if available, 0 if in use
    struct block *next;   // next block in the list
} block_t;

#define HEADER_SIZE sizeof(block_t)

I keep a linked list of these headers. To allocate, I walk the list looking for a free block big enough. This is the first-fit strategy: take the first block that works. Best-fit (the smallest block that fits) wastes less space but is slower to search. Both are fine for learning.

The allocate path

static block_t *head = NULL;

void *my_malloc(size_t size) {
    block_t *cur = head;
    while (cur) {
        if (cur->free && cur->size >= size) {
            cur->free = 0;
            return (void *)(cur + 1);   // memory just after the header
        }
        cur = cur->next;
    }
    // nothing free, grow the heap
    block_t *blk = sbrk(HEADER_SIZE + size);
    if (blk == (void *) -1) return NULL;
    blk->size = size;
    blk->free = 0;
    blk->next = head;
    head = blk;
    return (void *)(blk + 1);
}

The expression cur + 1 is pointer arithmetic on a block_t pointer, so it jumps exactly past the header. If that line looks strange, my post on how pointers really work explains why adding one moves by a whole struct, not one byte.

Freeing and the fragmentation problem

Freeing is almost too easy: find the header and flip the free flag. The memory is not returned to the kernel, it is just marked reusable by the next allocation.

void my_free(void *ptr) {
    if (!ptr) return;
    block_t *blk = (block_t *)ptr - 1;   // step back to the header
    blk->free = 1;
}

This naive version has a real flaw: fragmentation. Free a 100 byte block and a 100 byte block next to each other, then ask for 150 bytes, and my allocator fails even though 200 contiguous bytes are free. The fix is coalescing: when freeing, check if the neighboring blocks are also free and merge them into one larger block. Real allocators also split oversized blocks so a request for 16 bytes does not consume a 4 KB chunk.

Alignment, the detail that bites later

There is one correctness issue my toy version glosses over: alignment. The CPU expects an eight byte value to sit at an address divisible by eight, and on some architectures a misaligned access faults outright. A real allocator rounds every request up so the payload always starts on a properly aligned boundary, usually 16 bytes on a 64 bit system. My version happens to work because the header is already a multiple of the alignment, but the moment you start splitting blocks you have to round the sizes or you will hand back addresses that crash on certain reads. It is the kind of bug that hides for months and then surfaces only on one platform.

Why this is worth building

You stop fearing malloc and start seeing it as a tunable component.
You understand why allocation patterns matter for performance, which ties directly into data-oriented design and CPU caches.
You see exactly why double-free and use-after-free corrupt memory: they scribble on these headers.

Production allocators like jemalloc and tcmalloc add size classes, per-thread caches, and clever data structures, but the skeleton is what I just described. Knowing the skeleton is the difference between using memory and understanding it.