FFI = cu.translate(other)

Interpreted languages would be useless without extensions or some foreign function interface. Unfortunately, this is a very, very annoying position to be in because we’re effectively going back and forth between “trusted code” and “untrusted code”. This gives rise to a number of problems, and coming up with solutions for Copper wasn’t easy because they had to be custom fits for a language and engine that have never existed before.

Data Transmission Problem

One such problem is how to transmit data between the systems. Internally, the interpreter engine can use smart pointers. However, if there are no smart pointers externally, or another smart pointer system is used, you end up having to decide between passing raw data, passing by wrapper, or inventing some interface to handle data transfer “safely”. I wanted my system to be simple, so raw pointers were more appealing than wrappers. However, I didn’t see anything wrong with exposing part of the system to allow less ambiguous transfer of data. Rather than forcing the user to implement a roundabout way of setting required data, I decided it would be better – via the foreign function interface – to pass them the address of a smart pointer handling internal data. That way, all that is ever required from them is to call a set() function – no needless complication of memory management. Well… ok, so when I worked with it, I started to balk at the templated code I was using just to set the result of a function call, and I figured it would be better to hide that, as I’ll explain.
Also, functions in Copper are variadic by default. Extra parameters are ignored and unset ones default to empty functions. That’s not true of most other languages, and it makes sense why (not that you can’t make variadic functions in C++, but no one should be doing that anymore). Directly exposing the parameters passed to the function was convenient for the VM, but it made implementing extensions overly time-consuming and resulted in a lot of duplicate code. Rather than continue on that path, I wanted something easy.
Implementing functions for an embedded language should be easy. A number of languages are like this, though I suspect their weird quirks leak out when you start to do things with their macros out of order. I don’t want macros. They hide important details and raw pointers in plain sight. The fact that you have to wrap pointers in things like SOME_LANG_DEREF() shows that the creator was trying too hard to simplify pointer operations. I don’t want to keep track of pointer referencing unless I create the pointer, in which case, I should know that it’s a pointer rather than hiding it behind some silly macro facade.
I decided it was time to consolidate everything into a single, safe parameter that could be passed to extension functions: FFIServices. This service would allow for the safe and easy extraction of parameters, allow for saving the function result (if there is one, while keeping track of whether there is one or not), and provide other useful features like the ability to print warnings and errors and to kill the engine.
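To make the idea concrete, here is a minimal sketch of what a services object along these lines could look like. It is not Copper's actual FFIServices: the Object and Integer types, the method names (getNextArg, setResult, hasResult, getResult, printWarning) and the add function are all assumptions made up for illustration, and the sketch leans on the standard library for brevity.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for engine types; not Copper's real classes.
struct Object {
    virtual ~Object() = default;
};

struct Integer : Object {
    int value;
    explicit Integer(int v) : value(v) {}
};

// Illustrative services object: the single parameter handed to an extension.
class FFIServices {
public:
    explicit FFIServices(const std::vector<Object*>& args) : args_(args) {}

    // Returns the next argument, or nullptr once the arguments run out.
    Object* getNextArg() {
        return next_ < args_.size() ? args_[next_++] : nullptr;
    }

    // Stores the result; the services object owns it from here on.
    void setResult(std::unique_ptr<Object> result) { result_ = std::move(result); }
    bool hasResult() const { return result_ != nullptr; }
    const Object* getResult() const { return result_.get(); }

    // Collects warnings instead of printing them, to keep the sketch testable.
    void printWarning(const std::string& message) { warnings_.push_back(message); }
    const std::vector<std::string>& warnings() const { return warnings_; }

private:
    std::vector<Object*> args_;  // borrowed from the engine, not owned
    std::size_t next_ = 0;
    std::unique_ptr<Object> result_;
    std::vector<std::string> warnings_;
};

// An extension function written against the sketch above.
bool add(FFIServices& ffi) {
    auto* a = dynamic_cast<Integer*>(ffi.getNextArg());
    auto* b = dynamic_cast<Integer*>(ffi.getNextArg());
    if (!a || !b) {
        ffi.printWarning("add expects two int arguments");
        return false;
    }
    ffi.setResult(std::unique_ptr<Object>(new Integer(a->value + b->value)));
    return true;
}
```

The extension never touches reference counts or smart pointer templates; it asks for arguments, checks them, and hands back a result, which is the kind of simplification described above.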
While I had something like FFIServices in mind when I first thought of FFI, I didn’t know in advance what programming an extension would look like nor had I really considered what I would need. I thought the user might want full control (at least I did at first), but having some things wrapped, automated, and guaranteed is nice. It was hard to know in advance what I really needed.
One of the benefits of the change should be a lower likelihood of null pointer errors, which I’m really hoping for. Users won’t need to perform complicated type-checking tasks that require passing pointers around just to get mundane things done. The user also won’t need to perform any reference counting on data to protect it from destruction should they pass it on to further engine processing.
The downside of the major change is that I have to rewrite a number of extensions, including variadic numeric ones. Sadly, I can’t preserve them, but at least I didn’t write very many and I don’t have to ditch all my code. Changing them should be much faster with the new interface. It’s a small (though time-consuming) price to pay for a better interface.
Another downside is the speed cost. The first FFI was extremely simple so as to allow calling foreign functions in the fastest way possible. In fact, anything other than that will be slower. The speed decrease, however, could be very small and quite worth it. The change mostly relocates boilerplate code into the interpreter engine, so the speed difference won’t be very much. Given the simplicity of the old version, it almost (almost) makes sense to keep both the new and old versions (so I don’t have to re-implement all my existing extensions). Unfortunately, this would only serve to confuse users, and the new version is intended to replace the old one completely anyways, not co-exist with it.

Incorporation

Another problem was how to integrate foreign functions into the actual mechanics of the engine. Sure, giving each foreign function an interface is easy (uh, the giving part, not figuring out WHAT they should have), but how are these meant to be stored? Obviously, with a limited number, they could be readily stored in a constant-size array to not waste space. But then we need to either iterate through the array and perform name matching – as we would for a list – or use a hash function for placing these names. But what if this list grows? After all, if you have several groups of functions to add, the array might need to grow. You end up with a slow build-up period. There is no “perfect size structure” since we’re not working with either the global scope or a local scope (where, in both cases, average occupancy can be approximated). However, since we care more about the time to access such functions than the speed at which they are stored, it makes more sense to store them in a hash table.
The current engine uses a round-robin hash table for storing foreign functions. Given the tests I made with built-in system functions, it seems readily apparent that the speed increase is definitely worth the little extra wasted space.
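For readers who want to see the trade-off in code, the following is a generic sketch of a name-to-function hash table using open addressing with linear probing. It is a stand-in, not the engine's own table: the ForeignFn signature, the ForeignFuncTable class, the simple multiplicative string hash and the two example functions are all assumptions for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical foreign-function signature, for illustration only.
using ForeignFn = int (*)(int);

class ForeignFuncTable {
public:
    explicit ForeignFuncTable(std::size_t capacity = 16)
        : slots_(capacity < 4 ? 4 : capacity) {}

    // Inserts or overwrites an entry; doubles the table before it passes half full.
    void add(const std::string& name, ForeignFn fn) {
        if ((count_ + 1) * 2 > slots_.size()) grow();
        insert(name, fn);
    }

    // Returns the function registered under name, or nullptr if there is none.
    ForeignFn find(const std::string& name) const {
        std::size_t i = hash(name) % slots_.size();
        while (slots_[i].used) {
            if (slots_[i].name == name) return slots_[i].fn;
            i = (i + 1) % slots_.size();  // linear probing
        }
        return nullptr;
    }

    std::size_t size() const { return count_; }

private:
    struct Slot {
        std::string name;
        ForeignFn fn = nullptr;
        bool used = false;
    };

    static std::size_t hash(const std::string& s) {
        std::size_t h = 0;
        for (unsigned char c : s) h = h * 31 + c;
        return h;
    }

    void insert(const std::string& name, ForeignFn fn) {
        std::size_t i = hash(name) % slots_.size();
        while (slots_[i].used && slots_[i].name != name) i = (i + 1) % slots_.size();
        if (!slots_[i].used) ++count_;
        slots_[i].name = name;
        slots_[i].fn = fn;
        slots_[i].used = true;
    }

    // Rehashes every entry into a table twice the size.
    void grow() {
        std::vector<Slot> old;
        old.swap(slots_);
        slots_.resize(old.size() * 2);
        count_ = 0;
        for (const Slot& s : old)
            if (s.used) insert(s.name, s.fn);
    }

    std::vector<Slot> slots_;
    std::size_t count_ = 0;
};

// Example foreign functions used to exercise the table.
int doubleIt(int x) { return 2 * x; }
int negateIt(int x) { return -x; }
```

Keeping the table at most half full is where the "little extra wasted space" goes: lookups stay short because a probe sequence always hits an empty slot quickly, while the occasional rehash is paid during registration rather than during script execution.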

Namespaces

Being an embedded language intended for modest scripting, namespaces were not in the original game-plan, though they possibly should have been. I’m still considering how to go about this if I do. You can somewhat simulate this by appending functions to other functions, but the creation process is tedious for extensions and the “namespace” itself would not be constant.
The lack of namespaces has a significant impact on extensions, which could otherwise utilize the restricted nature to hide their functions rather than clutter the foreign function table. Depending on its implementation and the number of actual extension functions, this may or may not provide a speed advantage, and it may or may not provide code clarity.
A namespace is nothing more than a scope, so I may implement it the same way (if not merely disguise or wrap scopes in/as namespaces). Given the current syntax, however, something like “::” could not be used. It’s more likely that namespace name-joining syntax would use something like “@”, which I have (heehee) reserved for this purpose should the need arise.
Within Copper, the alternatives to namespaces are adding another way for grouping functions which are then stored in objects (a disguised form of object creation) or simply storing via the parameters component. The benefit of these approaches is that it keeps the virtual machine simple, allows for extending “namespaces” and sharing functions.
Foreign functions are not part of scopes (at the moment), so there is no way of even mimicking the within-Copper-alternative. Making them part of Scopes doesn’t necessarily simplify the process for the user who is creating extensions. The creation process should be as straight-forward and simple as possible without sacrificing functionality. Something has to give way, and in this case, it’s likely to be speed.
In short, the need for namespaces is primarily for extensions where it currently isn’t possible to mimic them with object creation (because foreign functions aren’t part of scopes). Making them part of scopes still forces a semi-complex process that occurs within another foreign function, accessed from the global scope. Within that function would be something like the following:

Function* f = new Function();
FunctionContainer* fc = new FunctionContainer(f);
f->getPersistentScope().addForeignFunction( myForeignFunc );
f->deref();
// return the function container

This is a process of rebuilding the same “namespace” function, over and over, every time the entry function is called. Unacceptable.
The current situation is still in limbo, but no doubt, I intend to resolve it relatively soon so I can decide whether or not to put math functions within one.
If one considers the desire to “import” functions, there is possibly the “using” syntax (similar to C++), which would appear as a function but be a special structure like “own”. It would then march through the tokens and parse out which would be the correct one.

Arguments Against Namespaces

Aside from the possibility that it might be easier than it initially appears to make adding foreign functions to scopes, there are a number of other reasons to dismiss and even dislike namespaces.
First, one of the goals of Copper is to simplify syntax. Objects, hash tables, and namespaces are nothing more than containers. It makes little sense to differentiate between them internally if syntactically it is easy to simulate all of them. Object-functions, as it is, get all the advantages of copy-constructors, pointers, scope members, and the ability to expand – all features that would have to be reimplemented for namespaces in order to comply with their specific requirements (such as the inability to delete namespaces or namespace members).
Second, namespaces complicate the virtual machine. Perhaps this reason is even more important, but I give it secondary notice because if I really wanted the feature, the desire would outweigh this reason. Namespace syntax requires searching a namespace path and then an object member path for structures like “own”, “is_ptr”, and so forth. The bulk isn’t justified for an embedded language, especially if most scripts are likely to be so small that they don’t need namespaces.
Third, the implementation of namespaces does not guarantee a speed advantage. It will certainly cost speed in various parts of the virtual machine, and the primary speed-up it provides is simply in lookup time (which could be a huge factor, though it’s hard to tell without actual code to compare).

Implementation

The implementation of the foreign function mechanisms is easier said than done. What does it look like? How is the user expected to use it? Ideally, the user should be able to write clean-looking code in a short time. If you’ve ever tried to write a Python extension (*raises hand*), you’ll have found out that it isn’t pretty looking. Yes, that’s C, but Copper is in C++ and everyone who uses it will probably expect a nice object-oriented solution.
The Copper interpreter FFI allows the user to directly interact with Copper Objects. The interpreter uses a class called “Object” that isn’t aliased or hidden. It’s the parent class from which all other classes (used in operation) are derived. Python, Java, and many other programming languages are like this, so it’s nothing new. A foreign function receives from the interpreter the raw Object parameters (passed to the foreign function via a call in the Copper code) and can optionally set the return (there is a default in case it is not set). This was exposed directly in branches fn and Bobcat; branch Cheetah now hides it in FFIServices, but the raw Objects are still exposed as pointers, and each one must be downcast according to its typeName listing as given by ForeignFunc::getParameterName(). The object type can be checked before being cast, but if you are creating a normal (non-variadic) function, you probably don’t want to do this (since it’s automatically done by the interpreter). That said, the typing mechanism isn’t “safe” in the strictest sense of that word; if you make the wrong cast, you can mess things up. However, I would like to think it would take a little more than an accidental oversight to make a mistake here.
Currently, all versions of the interpreter use the same class: ForeignFunc, which has a single method that gets called by the engine when the “name” for the foreign function is used in Copper code. Implementing such a class is straightforward, but working with it for my extensions, I found out very quickly that it gets time consuming. For the previous two versions of the interpreter, I wrote a couple of convenience functions for generating wrapper classes so that I didn’t need to recreate a class every time. This only handled variadic functions and thus won’t work for the new interpreter. The new interpreter needs a list of object type names, which it gets via two methods: ForeignFunc::getParameterName() and ForeignFunc::getParameterCount(). However, this leads to the issue of how to create an appropriate wrapper.
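As a rough sketch of how the two getters could drive the interpreter's automatic type check, consider the following. Only the names ForeignFunc, getParameterName() and getParameterCount() come from the text above; their signatures, the call() method, the Object/Integer/Text stand-ins, the AddInts class and the invoke() helper are assumptions. The sketch also simplifies Copper's rules by rejecting a call with missing arguments instead of defaulting them.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the engine's object types.
struct Object {
    virtual ~Object() = default;
    virtual const char* typeName() const = 0;
};
struct Integer : Object {
    int value;
    explicit Integer(int v) : value(v) {}
    const char* typeName() const override { return "int"; }
};
struct Text : Object {
    const char* typeName() const override { return "string"; }
};

// Assumed shape of the interface; only the two getter names come from the post.
class ForeignFunc {
public:
    virtual ~ForeignFunc() = default;
    virtual bool call(const std::vector<Object*>& args, int& result) = 0;
    virtual const char* getParameterName(std::size_t index) const = 0;
    virtual std::size_t getParameterCount() const = 0;
};

class AddInts : public ForeignFunc {
public:
    bool call(const std::vector<Object*>& args, int& result) override {
        // The unchecked downcast is safe only because invoke() has already
        // compared the type names.
        result = static_cast<Integer*>(args[0])->value +
                 static_cast<Integer*>(args[1])->value;
        return true;
    }
    const char* getParameterName(std::size_t) const override { return "int"; }
    std::size_t getParameterCount() const override { return 2; }
};

// Engine-side check: every declared parameter must be present with a matching
// type name. Extra arguments are ignored.
bool invoke(ForeignFunc& fn, const std::vector<Object*>& args, int& result) {
    if (args.size() < fn.getParameterCount()) return false;
    for (std::size_t i = 0; i < fn.getParameterCount(); ++i) {
        if (std::strcmp(args[i]->typeName(), fn.getParameterName(i)) != 0) return false;
    }
    return fn.call(args, result);
}
```

With the check living in the engine, the body of call() can downcast without repeating the type test, which is the reason a non-variadic function would skip its own checking.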
Creating a wrapper is certainly possible, but I would like something that is both easy and requires little memory. Using a linked list for saving parameters fulfills neither, though it does avoid having to implement a new data structure for arrays. If I were using the standard library, I could make this easy by using std::vector and using the array initialization pattern:

std::vector<String> params = { "int", "double", "double" };

Alas, that’s out of the question.
One alternative is passing in a null-terminated array of strings, but this is ugly. It forces the user to remember such a tiny but critically important detail. A similar alternative is having the interface require a size parameter. Then users could utilize std::vector when creating the list of parameters and std::vector::size() for the function pass. Neither idea is ideal.
In any of these cases, to avoid boxing the user in, it may be best to create a generic container interface for returning the size and elements. The wrapper could use it to construct an internal array, and the user would only need to create a class that extends std::vector and the container.
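A minimal sketch of that container idea might look like the following. Every name in it (ParamList, VectorParams, ParamNames and their methods) is hypothetical, and the user-side adaptor holds a std::vector as a member rather than inheriting from it, which sidesteps the clash between the interface's size()/at() and std::vector's own.

```cpp
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <string>
#include <vector>

// Hypothetical read-only container interface the engine would accept.
class ParamList {
public:
    virtual ~ParamList() = default;
    virtual std::size_t size() const = 0;
    virtual const char* at(std::size_t index) const = 0;
};

// User-side adaptor backed by std::vector (held as a member, not inherited).
class VectorParams : public ParamList {
public:
    VectorParams(std::initializer_list<std::string> names) : names_(names) {}
    std::size_t size() const override { return names_.size(); }
    const char* at(std::size_t index) const override { return names_.at(index).c_str(); }

private:
    std::vector<std::string> names_;
};

// Engine-side wrapper: copies the names once into an array of exactly the
// right size, so the user's container does not need to outlive registration.
class ParamNames {
public:
    explicit ParamNames(const ParamList& list)
        : count_(list.size()), names_(new std::string[list.size()]) {
        for (std::size_t i = 0; i < count_; ++i) names_[i] = list.at(i);
    }
    std::size_t getParameterCount() const { return count_; }
    const std::string& getParameterName(std::size_t index) const { return names_[index]; }

private:
    std::size_t count_;
    std::unique_ptr<std::string[]> names_;
};
```

On the user side this reduces to something like `VectorParams params{"int", "double", "double"};` followed by handing `params` to the wrapper, with no size argument or terminator to remember.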
Instead of all this, I’ve considered having the user pass in a second function – one that returns the parameter names. Again, this doesn’t help with the size-handling problem, and in fact offers no remedy aside from macros (which would be an ugly solution).
Ignoring safety for the sake of convenience, the dreaded but most succinct way to do this is by creating a variadic function. It seems rather fitting that Copper would use such a feature, though it could be a serious security hole. On the positive side, the variadic part (the arguments after the first 4) would all be String, so there would be no need to worry about demanding a particular order. The usage would look like:

engine.addForeignFunction("add", &add, false, 2, "int", "int");

(The “false” indicates this is not variadic.) That in itself is somewhat ambiguous. It’s terse – which is nice – but it hurts readability.
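For reference, a C-style variadic registration function along those lines could be sketched as below, using va_list from <cstdarg>. The Engine class, the Registration record, the ForeignFn signature and the registry() accessor are assumptions for illustration; only the call shape mirrors the usage line above.

```cpp
#include <cstdarg>
#include <string>
#include <vector>

// Hypothetical foreign-function signature and registry record.
using ForeignFn = bool (*)();

struct Registration {
    std::string name;
    ForeignFn fn;
    bool variadic;
    std::vector<std::string> paramTypes;
};

class Engine {
public:
    // The arguments after paramCount must be exactly paramCount C strings.
    // Nothing checks this, and a mismatch is undefined behavior.
    void addForeignFunction(const char* name, ForeignFn fn, bool variadic,
                            int paramCount, ...) {
        Registration r;
        r.name = name;
        r.fn = fn;
        r.variadic = variadic;

        va_list ap;
        va_start(ap, paramCount);
        for (int i = 0; i < paramCount; ++i)
            r.paramTypes.push_back(va_arg(ap, const char*));
        va_end(ap);

        registry_.push_back(r);
    }

    const std::vector<Registration>& registry() const { return registry_; }

private:
    std::vector<Registration> registry_;
};

// Placeholder foreign function so the usage line has something to register.
bool add() { return true; }
```

The safety concern is visible in the loop: va_arg trusts both the count and the type, so passing 3 with only two strings, or passing a non-string, reads garbage without any compiler diagnostic.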
The Boost Library has some interesting assignment mechanisms. These are somewhat slow; they sacrifice speed for the sake of convenience. Considering that the creation of foreign functions would precede any operations by the interpreter, this isn’t such a bad trade-off. But in any case, it still requires creating a new class for handling the creation. For what it’s worth, I would rather create the container interface I mentioned and let users hook it up to std::vector.

Conclusion

Whatever I end up doing, there will be downsides. That’s just part of building anything. Picking any approach to developing a program comes with trade-offs. Picking portability has cost me the optimal approaches for creating foreign functions but has given me the power to work on any machine (… uh, well, any modern computer). Using an all-encompassing interface for the FFI cleans up code at the cost of speed.
The final decision on the foreign functions wrapper will more likely appear in the documentation than in a blog post, if it even comes to fruition. I will likely try containers first to see if I like them, and if that seems too tedious… I guess we’ll find out what happens when I get there.
