Internal Variable Storage

Continuing the theme of memory management, I’m going to discuss how variables are stored.

Variable storage is tricky: on one hand, you want it to be fast to store. On the other hand, you want it to be fast to access (for both reading and writing). I would argue that the latter is slightly more important, but that all depends on how your language is implemented and how it is supposed to be used (and yes, I can imagine a language where variable storage speed is more important than random variable access speed).

Linked lists are very fast for storage – O(1) to be exact, assuming you keep a pointer at the node where you add the next new node. But for fast access, you must put the most used variables near the end of the list that your algorithm checks first when processing an access request. Generally, the last used variable goes last, and since programmers generally create variables shortly before using them, your algorithm should check the last-added variables first. The bummer is that eventually those other variables may need to be used. Again, this depends on the design of your program. The greatest benefit of a linked list is virtually unlimited storage with a constant element-append time. Therefore, such a structure is suitable for the global namespace, which could go on “forever”. For smaller scopes, it’s advantages vanish.

Raw arrays are no faster than linked lists in the long run. They scale horribly, and access time is just as slow. I only speak of them for the sake of being complete.

Another issue with linked lists stems from the type of data we’re trying to access: names. Names are stored as strings, and over the course of a program, many names will end up starting with the same characters. It would be nice to check these characters only once and filter from there. That’s where trees are nice.

Trees sacrifice being fast at addition for being a filter. Worst case time is O(n/2), but this requires balancing the tree. Balancing the tree isn’t necessarily a complicated act per-se, but it does requiring storing information about weights, and that uses precious memory. Furthermore, trees are messy to clean up. They are, like linked lists, ideally suited for the global namespace (where cleanup time is unimportant). But even here, they use up memory faster than linked lists, and with a search time of O(n/2), they aren’t offering all that much faster performance, especially if you consider that the last-added (and most frequently-used) variables will have such a search time, whereas for the linked list, such search time would be most likely significantly better if not comparable. I put this in realistic terms rather, since O(n) vs O(n/2) is deceptive. Ok, so trees are out. That saves production time since trees aren’t so fun to implement.

Hash tables are the best option for random access speed. They are supposedly O(1) for access (for best case). However, a generic hash table suffers from being either slow when there are conflicts (using a jump of 1 to look for the next open slot) or very wasteful (using a quadratic jump) and may end in failure (since quadratic jumps don’t guarantee finding an open location). The alternative is a Robin Hood hash table, which is much better about not wasting memory. In such a hash table, the access time is reduced to a kind of average for conflicts, so in practice, the access time is greater than O(1) but much less than O(n). Adding variables is also O(1) in best case, and pleasantly enough, it doesn’t perform much worse than that in practice. The major caviot [sic] is resizing.

To compare linked list additions to hash tables, I did a speed test that involved strings. The Robin Hood hash table performed slightly better, despite having to move some things around. However, as soon as the table needed to be resized, any speed advantage the hash table had over the list was completely lost.

~ Final notes ~

Since the variable storage is different for the global namespace as it is for the local space, the scopes must act as a generic interface for accessing and storage data. The global namespace can use a linked list, and the local scopes can use Robin Hood hash tables. All set? Onward to the next subject!


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s