Allocate Little, Gain Much

Recently, I’ve been working on bindings for Copper with Irrlicht – a project I’ll probably write about later. I started to consider how much memory I was using by creating wrappers for every little thing. Naturally, I wanted things both small and convenient. (It’s true that it would be better just to write a native GUI for Copper, but that’s asking a lot of myself.) I wanted the easiest way to have maximum control, but the Irrlicht bindings send everything through a tunnel – i.e. when you want one attribute of something, you get all of the attributes. I could dump the ones I didn’t want, but what’s the point? Isn’t it faster to just load all of them into a Copper function?

That got me thinking… how much memory are Copper function-objects actually using?

It turned out, quite a lot. By default, Copper scopes allocated about 100 slots for member variables. That doesn’t seem too bad until you realize that each of those slots is a container holding a 32-bit (or 64-bit) pointer, so the allocation is really more like 400 bytes (or 800) than 100. And the larger the requested memory allocation, the longer it takes to find and allocate the space. Perhaps it seemed like a good idea at the time (since it left very little chance of a bottleneck from reallocating), but how many variables are going to need 100 members? Only a stack frame needs that many, and even on the global scope, it’s unlikely anyone will use much more than 100. The scopes can resize, of course, so why allocate so much up front when variables don’t need it?
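To put a rough number on it (back-of-the-envelope, not measured from the Copper source, and ignoring whatever the containers themselves cost), the table alone costs one pointer per slot:

```cpp
#include <cstdio>

int main() {
	// Back-of-the-envelope cost of the old default: 100 slots,
	// each holding one pointer to a variable container.
	const unsigned long slots = 100;
	printf( "per-scope table: %lu bytes\n",
	        slots * (unsigned long)sizeof( void* ) ); // 400 on 32-bit, 800 on 64-bit
	return 0;
}
```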

I decided to add a size parameter to the Scope class and let the default size be set by a #define with a value of 15. Then I benchmarked the change using my old “first 5000 primes” test (which actually finds the primes less than 5000). The benchmark time went from 22 seconds down to only 15! I saved a whopping 7 seconds – roughly a 30% speedup. Awesome!
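For anyone curious, here is a minimal sketch of the shape of that change – the names (including the COPPER_DEFAULT_SCOPE_SIZE macro) are stand-ins rather than the actual Copper source, and std::unordered_map fills in for Copper’s own hash table just so the sketch compiles:

```cpp
#include <string>
#include <unordered_map> // stand-in for Copper's robin-hood table

// Compile-time default for new scopes: 15 slots instead of 100.
// Overridable at build time, e.g. -DCOPPER_DEFAULT_SCOPE_SIZE=30.
#ifndef COPPER_DEFAULT_SCOPE_SIZE
#define COPPER_DEFAULT_SCOPE_SIZE 15
#endif

struct Variable; // opaque for this sketch

class Scope {
	// Copper uses its own robin-hood hash table; std::unordered_map
	// is only a placeholder here so the sketch is self-contained.
	std::unordered_map<std::string, Variable*> table;

public:
	explicit Scope( unsigned size = COPPER_DEFAULT_SCOPE_SIZE ) {
		table.reserve( size ); // request the slots up front
	}
};

int main() {
	Scope ordinary;          // gets the small default
	Scope stackFrame( 100 ); // a heavy scope can still ask for more
	return 0;
}
```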

Ok, so what if the default size were 10? At 10, the time dropped to 14.5 seconds – another noticeable drop, but obviously not as large. Still worth it, in my opinion. The danger of too small an allocation is more frequent collisions inside the robin-hood hash table the scope uses.
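For context, robin-hood hashing resolves collisions by letting an incoming entry displace any resident entry that sits closer to its ideal bucket (“stealing from the rich”), which keeps probe distances evenly short. Here is a minimal sketch of the insertion step using linear probing – a generic illustration, not Copper’s actual table, and it assumes unique keys and at least one free slot:

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Slot {
	std::string key;
	int value = 0;
	bool used = false;
};

class RobinHoodTable {
	std::vector<Slot> slots;

	// Ideal (home) bucket for a key.
	std::size_t home( const std::string& key ) const {
		return std::hash<std::string>{}( key ) % slots.size();
	}

	// Distance of the resident entry at 'pos' from its home bucket.
	std::size_t dib( std::size_t pos ) const {
		return ( pos + slots.size() - home( slots[pos].key ) ) % slots.size();
	}

public:
	explicit RobinHoodTable( std::size_t size ) : slots( size ) {}

	void insert( std::string key, int value ) {
		std::size_t pos = home( key );
		std::size_t dist = 0; // how far the carried entry is from its home
		for (;;) {
			if ( !slots[pos].used ) {
				slots[pos].key = std::move( key );
				slots[pos].value = value;
				slots[pos].used = true;
				return;
			}
			const std::size_t residentDib = dib( pos );
			if ( residentDib < dist ) {
				// Resident is "richer" (closer to home): evict it and
				// keep probing with the displaced entry instead.
				std::swap( key, slots[pos].key );
				std::swap( value, slots[pos].value );
				dist = residentDib;
			}
			pos = ( pos + 1 ) % slots.size();
			++dist;
		}
	}
};

int main() {
	RobinHoodTable table( 50 );
	table.insert( "print", 1 );
	table.insert( "assert", 2 );
	table.insert( "list", 3 );
	printf( "inserted 3 entries\n" );
	return 0;
}
```

The fewer the slots, the higher the load factor for the same number of entries, and the longer those probe chains get – which is the risk of shrinking the default too far.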

With this change, I think I’ve found the biggest bottleneck in my codebase. That’s not to say it wouldn’t be nice to make it even faster, but it’s harder to complain about 14.5 seconds than about 22. It’s possible that some collisions are still causing slow-downs: the hash table for built-in functions has 39 of its 50 slots full, a load factor of nearly 0.8. I would hope that robin-hood hash tables do – as has been claimed – stay fast when loaded like this, but maybe that hope is ill-founded.

Even though it was obvious from the first test that the time reduction was significant, I decided to also test the change on computing the first 100 Fibonacci numbers. The results fluctuated (as expected for such small times), but all but one run was notably smaller than the previous results, by roughly 5-10%. On average, it was definitely less. I would have liked to run the test longer, but even C++ doubles (the value type being used) can only hold values up to a certain size.
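For the record, precision runs out well before range does: a double’s 53-bit mantissa holds integers exactly only up to 2^53, and the Fibonacci sequence crosses that line at its 79th term. A quick check (this is standard IEEE-754 behavior, nothing Copper-specific):

```cpp
#include <cstdio>

int main() {
	// A double's 53-bit mantissa holds integers exactly only up to 2^53.
	const double limit = (double)( 1ULL << 53 );
	double a = 1, b = 1; // F(1), F(2)
	int n = 2;
	while ( b < limit ) {
		double next = a + b;
		a = b;
		b = next;
		++n;
	}
	// Prints 79: F(79) is the first Fibonacci number past 2^53, so
	// terms from here on are no longer guaranteed to be exact integers.
	printf( "first term past 2^53: F(%d)\n", n );
	return 0;
}
```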

The code for both tests is in the FAQ section of the documentation. The benchmark times listed there are now outdated.
