Variable Storage Tree Model

It’s time to introduce – with pictures – the memory management of Copper and explain why it does it this way.

Background

Many new languages use garbage collection. In fact, I dare say all but the static and stack-based languages use garbage collection so far as I can tell. It’s understandable how this comes about: many language designers want to have dynamic (a.k.a. automatic) variable typing so you can store an “int” where a “float” was once held. Because a variable should hold any type, you end up with the null type (sometimes called “nil”), which often leads to crashes or hacks. However, having dynamic typing means saving alot of keyboard typing. It also means we can throw objects around and have functions perform actions on multiple different types so long as those types share certain method names (and prototypes) in common. However, in order to store such data, there must be a separation between the data and the variable. On the bright side, this allows variables to easily share data without interfering with each other (consider, for example, that Python primitives are immutable). However, designers often think this means separating the data from the lifetime of the variable. Sometimes complex ways can be used to ensure safety – such as only one variable controlling the data at a time – but these are cumbersome to implement and cumbersome to use. What’s the point of having dynamic typing if you still have to babysit it?

Separating data from variable lifetimes leads to the problematic challenge of leaving data in limbo. When should that data die? The usual solution is to keeping the data in a kind of table where it is reference counted. Once the references to it are removed and the garbage collection algorithm kicks in, the data are deleted. The downside of this is that the program must pause for the garbage collector, which can be a bad thing if the program is in the middle of something important that shouldn’t be interrupted. Some languages allow you to request garbage collection before important code is executed, but the very fact that you must do so means that the language still forces you to perform manual memory management no matter how “hidden” it may be.

In addition to this, there is the added blessing and curse of being able to freely pass around data and let it reference anything, such that it can create a cyclic loop of reference. While sometimes this may be beneficial for a short term activity, the fact is that cyclic loops of reference in memory, if not broken, prevent data from being destroyed (because the reference count never reaches zero). This is a memory leak, and the very fact that it can occur means that programmers – despite working with a “high-level language” – must be aware of the memory handling of the virtual machine. What’s scary is that this kind of memory leak is impossible to watch out for in all its forms. Consider, for example, the following chain of references:

a->b
b->c
b->d
c->e
c->a

If we tried checking whether or not “a” was in a cyclic loop, we could try going through the references and marking them. There is a tree of paths that must be explored before determining if “a” is referenced somewhere in that chain. The tree looks something like this:

a->{b}
b->{c,d}
c->{a,e}

… where the curly brackets ({}) denote branches. To save memory, we could perform depth-first search. This is a very time-consuming process, and in order to prevent other loops from interfering (that is, by getting caught in one loop while searching for the possibility of another), this process would have to be done every time a variable is assigned. If we try to mark variables, we have to somehow remove the marks when we’re done.
It’s easier to let the programmer handle this, but that means the programmer must be ever careful to avoid this, and that defeats part of the purpose for using a dynamic language: to be more carefree.

Variable Storage Tree Model

Data needs to have a lifetime, just as variables do. A garbage collector doesn’t provide such a lifetime, but stack-based scopes do. The creators of Rust realized this and decided to address this issue in tandem with data races, resulting in stack-based object instantiation and a limit to a single object owning an object or data at a time. This model is excellent for low-level software and programs requiring multi-threading, but it’s hardly suitable for a high-level language where programmers are more interested in tossing around objects for quick prototyping.
The solution devised for Copper is one that also stems from its simplicity. Copper only has one data-type that can be stored in variables: function-objects. Syntactically, this immediately allows two things: No type declarations and universal variable syntax. There are no data-type-context-based operators in Copper. Within the virtual machine, the single data type means that all variables can be easily linked to a single object without need for internal casting and type checking. That is, variables do not store merely “Object”, they store specifically “Function”.
Multiple variables can link to the same function-object but only one of them owns it. This means that the lifetime of that function-object is bound to the variable. When the variable itself dies, so does the function-object. This is different than having reference counting track the life of the function-object because there may be a number of references left over.
What happens to all those leftover pointers pointing to the data that is now gone? There are a couple of solutions. One is to keep the data in a table and mark it as gone. Such a table requires a look-up algorithm and wastes space (because you have to pre-allocate space for the slots and you don’t know how many you need). It also leads the virtual machine being susceptible to cracking via tampering with the slot addresses in pointers.
The alternative is the solution used by Copper: an intermediate, by which the cycle of references can be broken without leaving pointers accessing null. The article “pointer~item” already describes how this works, but let’s see a visual representation:

variable storage tree model of Copper
Green links show ownership by a variable of a shared function.

In the diagram, the FunctionContainer is used for containing the Function and being the go-to point for all variables. The FunctionContainer itself acts as the object, but all of the data is held in Function (including the member variables that would contain self-references and cause the cyclic references). The FunctionContainer is owned by whatever variable it is assigned to first (or it just dies if not assigned). It will persist as long as there are variables that point to it. When its owner variable is destroyed, the FunctionContainer destroys its Function data.
Variables that try to access the destroyed data of a FunctionContainer are informed that the data is gone, and subsequently, they release their reference to the FunctionContainer itself, bringing it closer to its destruction. The variable, however, is assigned its own new empty FunctionContainer, to which members can be added.

variable storage tree model of Copper after variable destruction
After the destruction of a variable, the function data is destroyed but the container remains. Pointers that attempt to access the data detach and create new, empty functions and containers.

The creation of a new function from a pointer avoids the pitfall of segfaults, allows processing to continue, and can result in warning messages rather than the entire program crashing.
Doing this silently is undesirable because it may not be an intended action. Therefore, the attempted access of an empty FunctionContainer by a pointer results in a warning message. Such warnings, passed through the Logger interface of the virtual machine, can be used to stop processing.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s