I don't know how fast SuperMalloc is for other people, but in my experience it's not as fast as some other mallocs (like Hoard or tcmalloc) and, in the end, it's still just another malloc().
Giving up the malloc() interface allows a very simple page-long allocator to outperform every malloc() I've ever tried (including SuperMalloc), and it would truly surprise me to learn there was some exotic memory technique (HTM, caching, etc) that could beat me.
Basically, you ask for an amount of memory, and you get back a pointer to memory. But the interface does not allow you to communicate anything but the amount on an allocation, and you can only give back a pointer on freeing. If, for example, malloc had an interface like:
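void* malloc(size_t size, hint_t hint); /* hint_t is a hypothetical hint type: say, expected lifetime or access pattern */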
One can imagine the allocator optimizing based on the given hint. And once you open up one kind of hint, you can imagine many other things users could communicate explicitly to the memory allocator about the allocation and access patterns of their data.
Every implementation of free also needs to find the metadata associated with the given pointer. We could imagine an interface for free that required the user to maintain that metadata, which could perhaps speed things up:
void free(void* ptr, meta_t d); /* caller hands back the metadata the allocator would otherwise have to look up */
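For that to work, the allocation side would presumably have to hand the metadata out in the first place; a hypothetical counterpart (malloc_ex and meta_t are made-up names) might be:

void* malloc_ex(size_t size, meta_t* d); /* fills *d; the caller keeps it and passes it back to free */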
Or, we could also imagine communicating how important it is to free this memory:
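void free(void* ptr, urgency_t urgency); /* urgency_t is hypothetical: say, FREE_IMMEDIATE vs FREE_DELAYABLE */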
In the latter case, maybe the memory allocator could put off doing delayable requests, and do them in a batch later. (Making it a bit more like garbage collection.)
I'm not sure any of these ideas would help, but the point is that the limited interface means we can't really explore them. Well, we can, but then convincing the rest of the world to change such a basic building block of C code is quite hard.
> In the latter case, maybe the memory allocator could put off doing delayable requests, and do them in a batch later. (Making it a bit more like garbage collection.)
Amusingly, when I've used an IMMEDIATE/DELAYABLE style hint, it was for the opposite purpose: I had some batched deallocs that I would either delay (to spread out over multiple frames instead of handling as a single batch, to eliminate the framerate hitch we were getting), or perform immediately as a single batch (to achieve greater throughput when switching scenes as delayed deallocation was adding untenable amounts of overhead.)
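A minimal sketch of that kind of deferred-free scheme, assuming a fixed-size queue drained against a per-frame budget (the names here are made up, not from any particular engine):

    #include <stddef.h>
    #include <stdlib.h>

    #define DEFER_CAP 4096
    static void* deferred[DEFER_CAP];
    static size_t deferred_len = 0;

    /* DELAYABLE: queue the pointer instead of freeing it now */
    void free_delayable(void* ptr) {
        if (deferred_len < DEFER_CAP)
            deferred[deferred_len++] = ptr;
        else
            free(ptr); /* queue full: just free immediately */
    }

    /* Call once per frame: spreads the cost out to avoid a framerate hitch */
    void drain_budgeted(size_t budget) {
        while (deferred_len > 0 && budget-- > 0)
            free(deferred[--deferred_len]);
    }

    /* IMMEDIATE: e.g. at a scene switch, take the whole batch for throughput */
    void drain_all(void) {
        while (deferred_len > 0)
            free(deferred[--deferred_len]);
    }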
> We can, but then convincing the rest of the world to change such a basic building block of C code is quite hard.
Changing such a fundamental building block of C is impossible.
However, providing a second, alternative interface for those applications that could really benefit from such fiddly high-performance tweaks already happens a good bit, in games at least. Pool allocators, allocators with extra debug information, allocation of entirely different styles of memory (e.g. write-combined memory for texture uploads)... lots of stuff out there. Some low-level graphics APIs now make you decide whether, e.g., you want to put shader constants in their own GPU buffers, or just interleave them into the command buffers themselves...
"jemalloc has an alternative API that allows specifying the size of the allocation"
I would like to see an API for malloc where you don't need to specify the size of the allocation :-)
For those who wonder: it takes additional flags specifying alignment, whether to zero memory, whether to store data in a thread-specific cache, or an arena to use.
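For the curious, a small usage sketch against jemalloc's documented non-standard API (mallocx/sdallocx and the MALLOCX_* flags are from the jemalloc man page; details may vary by version):

    #include <jemalloc/jemalloc.h>

    int main(void) {
        /* 64-byte-aligned, zeroed allocation that bypasses the thread cache */
        void* p = mallocx(1024, MALLOCX_ALIGN(64) | MALLOCX_ZERO | MALLOCX_TCACHE_NONE);
        /* ... */
        sdallocx(p, 1024, 0); /* sized deallocation: the caller supplies the size */
        return 0;
    }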
Another avenue of improvement is to use alloca() for small objects that are local to a function, but there is no direct way to know whether a variable stays local (in C and C++ at least).
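To illustrate why that's hard to automate: alloca() storage dies when the function returns, so it is only safe if you can prove the pointer never escapes. (alloca is non-standard but widely available; the functions below are made-up examples.)

    #include <alloca.h>
    #include <string.h>

    void fine(const char* s) {
        char* tmp = alloca(strlen(s) + 1); /* released automatically on return */
        strcpy(tmp, s);
        /* ... use tmp only inside this function ... */
    }

    char* broken(const char* s) {
        char* tmp = alloca(strlen(s) + 1);
        strcpy(tmp, s);
        return tmp; /* BUG: tmp escapes, but its storage is gone after return */
    }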
> We can, but then convincing the rest of the world to change such a basic building block of C code is quite hard.
It can be done with static analysis or binary instrumentation.
malloc(sizeof(foo)) is slower than alloc_foo() because the latter can simply be a pointer increment, but there is no way to tell the malloc() interface that you're going to be doing nothing but allocating foo for a while.
malloc() can guess this with heuristics, but a good malloc() needs to perform well for a wide variety of use cases. Surely you can appreciate that balance has a cost that the specialised allocator simply doesn't have to pay.
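To make the comparison concrete, here is roughly what such an alloc_foo() bump/pool allocator looks like (a sketch with made-up names; growth and thread safety elided):

    #include <stddef.h>

    typedef struct foo { int x, y; } foo; /* stand-in payload type */

    #define POOL_CAP 4096
    static foo pool[POOL_CAP];
    static size_t pool_next = 0;

    foo* alloc_foo(void) {
        if (pool_next == POOL_CAP)
            return NULL; /* pool exhausted */
        return &pool[pool_next++]; /* allocation is just a pointer increment */
    }

    void free_all_foos(void) {
        pool_next = 0; /* everything is released at once; no per-object free */
    }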