Progress Report #3

Sometimes the easiest way to see if something will work is to start coding it, writing down ideas as you go. The adage is that 100 hours of programming will save you 10 hours of planning. That can be very true, but sometimes it isn’t possible to figure out whether an idea will work until you try putting it together in code. You’re going to need to write some pseudocode, and you may get to the point where you’re just writing code.

I use a restricted subset of C++ (so it’s cross-platform and C++ version agnostic, at least according to what I’ve read), but C++ is already very restrictive. The most annoying aspect of C++ is its inability to identify types at runtime without RTTI. If I want to know the type of an object, I need to build an entire system around it, which tends to slow everything down because it involves either enumerations or strings. There are C++11 ways of creating an “any” class. I’ve read about them, but I hesitate to use them so as to keep my code C++ version agnostic. I may need a library or framework that isn’t C++11 friendly, or one that would require hacking someone else’s behemoth codebase just to make a few tiny changes for compatibility. That’s one of the reasons I elected to not use ChaiScript.
Because of the type system dilemma, I’ve chosen to sacrifice memory for the sake of speed and security. Computers – even old ones – can get more memory (up to a point), but speed and security come with the program.
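To make the enumeration-based approach concrete, here is a minimal sketch, not the actual Copper code. Every name in it (`ObjectType`, `Object`, `IntegerObject`, `StringObject`, `asInteger`) is hypothetical. Each object carries a type tag, costing a little memory per object, so that a type check is a single integer comparison and needs neither RTTI nor C++11.

```cpp
#include <cstddef>

// Hypothetical type tags; a real engine would list every built-in type here.
enum ObjectType {
    TYPE_INTEGER,
    TYPE_STRING
};

// Base class that stores its own tag: one extra field per object (the memory
// cost) in exchange for a cheap integer comparison at runtime.
class Object {
    ObjectType type;
public:
    explicit Object(ObjectType t) : type(t) {}
    virtual ~Object() {}
    ObjectType getType() const { return type; }
};

class IntegerObject : public Object {
    int value;
public:
    explicit IntegerObject(int v) : Object(TYPE_INTEGER), value(v) {}
    int getValue() const { return value; }
};

class StringObject : public Object {
    const char* text;
public:
    explicit StringObject(const char* t) : Object(TYPE_STRING), text(t) {}
    const char* getText() const { return text; }
};

// Checked downcast without dynamic_cast: returns NULL on a type mismatch.
const IntegerObject* asInteger(const Object* obj) {
    if (obj != NULL && obj->getType() == TYPE_INTEGER)
        return static_cast<const IntegerObject*>(obj);
    return NULL;
}
```

A caller would test the result of `asInteger(ptr)` against `NULL` before using it. That keeps the check explicit, which is one way to read the "security" half of the trade-off.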
In designing the new engine (the core of the virtual machine that parses and executes the code), I elected to use an approach similar to the previous one. This has a number of advantages. First, it preserves one of the roles for which the VM was intended: to be a command-line language, dependent on user input and therefore required to be able to pause and save state. After quite a bit of thinking, I couldn’t think of a better system. On top of that, it’s fairly simple. It’s made even simpler by the fact that I need fewer states, since more can be done in a parse cycle (as opposed to the processing cycles I was using before, which handled one token at a time).
Another advantage of the new design is that I can reuse much of the old token-parsing code. That saves a great deal of work.
Finally, the VM itself will need only a minor change in function bodies. Some stuff will definitely be deleted or rewritten – specifically, everything from the Task Processing Model mentioned in a previous post. That all needs to be redone.
The task system will have a reduced role. Not only will it contain less data, but it will be used by a new opcode system I’m implementing for performing the actual actions.
The new engine will use “opcode strands” (lists of operations) to perform actions. In the future, I’ll be able to compile Copper to bytecode and use the same engine to read it. The real advantage, however, is the speed benefit. My previous engine design was an attempt to speed up the command-line interpreting process by avoiding having to write bytecode. Given that I’m creating the new engine in the same style, I’d say that attempt was successful in a number of ways.
The old engine could quickly parse code and run it when that code needed to be run only once (running it more than once, such as in loops, revealed its true sluggishness). Thus, if you need a virtual machine that quickly parses simple input with a simple-yet-powerful syntax, by all means, use the original Copper VM. In other words, the “old VM” is not garbage. I’m not going to throw it away, and I’ll likely continue to maintain it as long as I need it. It does, however, need changes to the foreign function interface (FFI), something I intend to implement first in the new engine but which is, no doubt, needed everywhere. I intend to make all the FFI class code the same so I’ll be able to use foreign functions with both versions of the virtual machine.
Of course, if it turns out that the new VM is superior to the old one even for inputs that are run once, I may put the old VM on the back-burner until I can get some ideas for optimizing it. This is quite possible, depending on how those inputs come. The old VM can preserve processing state at every token, so it would work perfectly for accepting code directly over a network. The new VM will only be able to preserve a subset of parsing states, and it must redo some processing if it is not given enough tokens. Granted, it can still do it, but if things like function bodies are big and the stream of code is interrupted (or paused), the parsing of the function body must be redone. There are ways I can fix this, but doing so complicates the code.
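The "redo" cost can be sketched as follows. This is an illustration under assumptions, not the real parser. The names `ParseResult` and `parseBody` are hypothetical, tokens are plain strings, and a function body is assumed to be delimited by `{` and `}` tokens. When the token stream ends before the body closes, the function reports that it needs more input and keeps no partial state, so the next call scans the body again from its first token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum ParseResult {
    PARSE_DONE,       // a complete body was found
    PARSE_NEED_MORE,  // ran out of tokens; call again once more have arrived
    PARSE_ERROR       // the input cannot be a body at all
};

// Tries to find one brace-delimited body starting at tokens[start].
// On PARSE_DONE, `end` is set to the index one past the closing brace.
// On PARSE_NEED_MORE nothing is remembered: the next call rescans from
// `start`, so a large interrupted body is walked more than once.
ParseResult parseBody(const std::vector<std::string>& tokens,
                      std::size_t start, std::size_t& end) {
    if (start >= tokens.size())
        return PARSE_NEED_MORE;
    if (tokens[start] != "{")
        return PARSE_ERROR;
    int depth = 0;
    for (std::size_t i = start; i < tokens.size(); ++i) {
        if (tokens[i] == "{") {
            ++depth;
        } else if (tokens[i] == "}") {
            --depth;
            if (depth == 0) {
                end = i + 1;
                return PARSE_DONE;
            }
        }
    }
    return PARSE_NEED_MORE;  // body still open when the tokens ran out
}
```

Avoiding the rescan would mean saving `depth` and the current index between calls. That is the kind of extra saved state that complicates the code.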

Using opcodes does not automatically make the engine faster. The entire parsing rigmarole will likely slow it down initially. The speed should come from the simplicity of the code that executes the opcodes. There are four function calls in the main opcode execution loop, one of which contains a “switch” for picking out the opcode and processing it. While I still end up doing a number of operations for each task, the execution gets to handle chunks of parsed tokens rather than individual ones. Function calls may benefit the most; in the old VM, I realized, they required tons of processing cycles. C++ may be fast, but even with optimizations, it was bogged down by the slow nature of the procedure.
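For readers unfamiliar with the pattern, here is a minimal sketch of an opcode strand and a switch-based execution loop. It is not the Copper engine: the opcode set, the `Opcode` and `OpStrand` types, and the functions `runStrand` and `makeExampleStrand` are all hypothetical, and a simple integer stack machine is assumed purely for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a tiny stack machine.
enum OpcodeType {
    OP_PUSH,  // push the operand onto the stack
    OP_ADD,   // pop two values, push their sum
    OP_MUL    // pop two values, push their product
};

struct Opcode {
    OpcodeType type;
    int operand;  // only meaningful for OP_PUSH
    Opcode(OpcodeType t, int v = 0) : type(t), operand(v) {}
};

// A "strand" here is just an ordered list of operations.
typedef std::vector<Opcode> OpStrand;

// Executes every opcode in order. Returns false if the strand is malformed
// (an operator without enough operands, or nothing left on the stack);
// otherwise stores the top of the stack in `result` and returns true.
bool runStrand(const OpStrand& strand, int& result) {
    std::vector<int> stack;
    for (std::size_t pc = 0; pc < strand.size(); ++pc) {
        const Opcode& op = strand[pc];
        switch (op.type) {
        case OP_PUSH:
            stack.push_back(op.operand);
            break;
        case OP_ADD:
        case OP_MUL: {
            if (stack.size() < 2)
                return false;
            int rhs = stack.back(); stack.pop_back();
            int lhs = stack.back(); stack.pop_back();
            stack.push_back(op.type == OP_ADD ? lhs + rhs : lhs * rhs);
            break;
        }
        }
    }
    if (stack.empty())
        return false;
    result = stack.back();
    return true;
}

// Example strand for (2 + 3) * 4, as a parser might emit it.
OpStrand makeExampleStrand() {
    OpStrand s;
    s.push_back(Opcode(OP_PUSH, 2));
    s.push_back(Opcode(OP_PUSH, 3));
    s.push_back(Opcode(OP_ADD));
    s.push_back(Opcode(OP_PUSH, 4));
    s.push_back(Opcode(OP_MUL));
    return s;
}
```

The parser pays its cost once, when it builds the strand. Each later run, such as every iteration of a loop, only walks the vector and dispatches through the `switch`.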

Coding Progress?

I spent some time planning out ideas and seeing if they would work. There’s no sense in jumping in immediately to coding when you don’t know what you’re supposed to be coding. I believe I toyed with ideas for the better part of a week, but even so, it was hard to know what things would look like without sitting down and coding them. C++ has a way of making you feel very lost in its complexity. For safety, I tend to use wrappers (especially when saving pointers in lists since I would like something to automatically dereference them when the list is destroyed). However, too many things end up needing wrappers, and if you’re not careful, it can be easy to forget what you’re doing or think that you can’t do it. I came up with a model for execution, but even though it was good, that model is dictated in part by what I can actually do in the parser. At the same time, it’s important to remember that the parser itself is designed for making the execution process easy, not for being easy on itself. There is a balance that needs to be struck while simultaneously keeping the C++ compiler happy.
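As an illustration of the wrapper idea, here is a minimal sketch that reads "dereference" as "drop a reference count"; that reading is an assumption. `RefCounted` and `RefList` are hypothetical names, not classes from the Copper source. The list takes a reference on each pointer it stores and releases it in its destructor, so the cleanup cannot be forgotten.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical intrusive reference count; the object deletes itself when
// the last reference is dropped.
class RefCounted {
    int refs;
public:
    RefCounted() : refs(1) {}
    virtual ~RefCounted() {}
    void ref() { ++refs; }
    void deref() { if (--refs == 0) delete this; }
    int getRefCount() const { return refs; }
};

// List wrapper that references every pointer it stores and dereferences
// each one again when the list itself is destroyed.
class RefList {
    std::vector<RefCounted*> items;
    RefList(const RefList&);             // copying disabled, C++98 style
    RefList& operator=(const RefList&);
public:
    RefList() {}
    ~RefList() {
        for (std::size_t i = 0; i < items.size(); ++i)
            items[i]->deref();
    }
    void push(RefCounted* item) {
        if (item == NULL)
            return;  // ignore null pointers rather than store them
        item->ref();
        items.push_back(item);
    }
    std::size_t size() const { return items.size(); }
};
```

The copy constructor and assignment operator are declared private and left undefined. That is the pre-C++11 way to forbid copying a list, which would otherwise release each stored reference twice.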
When will this be finished? Hopefully in much, much less time than it took to make the VM the first time. It helps to know, in part, what needs to be done and to not need to rewrite everything.

