Progress Report #1

Feels odd that this is technically the first complete progress report being made on this project, but the good news is that there are a number of positive things to report.

First, the virtual machine (VM) is fully functional. It was a breeze to debug (which I may discuss later), and over the past few days, I’ve been adding internal functions to make it easier to use. Very soon, work will shift to extensions.

More time was spent developing the engine than writing about it (that’s a good thing – I actually got work done!), and now that the project has hit an exciting milestone, I can save this blog from decay. Here’s a run-down of the capabilities provided by the virtual machine:

  • The user is able to create functions, with or without parameters.
  • Variables are passed by address to functions.
  • The user is able to create variables, save functions to them, call functions from those variables, and assign new functions to those variables.
  • The user is able to check for built-in types.
  • The user may create boolean, string, and numeric data.
  • Functions can have child-member variables and act as classes, having their own special pointers (“this” and “super”) that allow for accessing their child-member variables.
  • The user can use an “if” control structure that has both “else” and “elif” keywords and functionality.
  • The user can use a “loop” control structure to repeat code indefinitely or until a “stop” is processed. There is also a “skip” keyword for resetting the loop processing cycle.
  • Variables can be converted to pointers.
  • Variables are automatically initialized to the one and only type, functions.

Pointers are only one level deep. This means modifying a variable via a pointer modifies the original variable. You can’t manipulate one pointer via another pointer. While I’ve found doubly-nested pointers useful when programming in C++, I believe Java clearly demonstrates that they aren’t necessary when language functionality suffices. I believe it suffices in the case of this language.

That said, functions are the only object type that a variable may “store” (point to). Technically, all variables are pointers to functions; it’s just that one of them happens to be the “owner” to which the function’s lifetime is tied. This simplifies things immensely, and consequently, the overall language is simplified. Getting pointers and ownership to work correctly was tricky, but I spent quite some time testing the features to make sure everything worked without error, so I believe I can safely say this VM is in beta release form (as opposed to alpha). I wish I could say it’s beyond that, but there are a couple of sticky issues.
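The owner/pointer relationship described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not Copper's actual implementation: the `Function` and `Variable` types and the `pointTo`, `isOwner`, and `get` names are hypothetical, and `std::shared_ptr`/`std::weak_ptr` are used here simply because they make the lifetime rule easy to see.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a function object; not Copper's real type.
struct Function {
    std::string body;
};

// Hypothetical variable: it either owns a function or points at a
// function owned by some other variable.
class Variable {
public:
    // A new variable owns a fresh, empty function.
    Variable() : owned_(std::make_shared<Function>()), target_(owned_) {}

    Variable(const Variable&) = delete;
    Variable& operator=(const Variable&) = delete;

    // Turn this variable into a non-owning pointer to other's function.
    // Pointing at a pointer resolves straight to the underlying function,
    // so indirection never gets more than one level deep.
    void pointTo(const Variable& other) {
        if (&other == this) return;
        owned_.reset();            // give up any function this variable owned
        target_ = other.target_;
    }

    bool isOwner() const { return owned_ != nullptr; }

    // Returns the function, or nullptr if its owner no longer exists.
    std::shared_ptr<Function> get() const { return target_.lock(); }

private:
    std::shared_ptr<Function> owned_;  // non-null only in the owner
    std::weak_ptr<Function> target_;   // the function this variable refers to
};
```

In this sketch, a change made through a pointer variable is visible through the owner, and once the owner is destroyed every pointer to its function resolves to nothing.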

~ Strings ~

First, string creation is possible. It is not, by default, filtered. I intended to have it filtered to UTF-8, but I decided against this. I have a default-off preprocessor option for filtering down to ASCII-only characters, but no warning is given when non-ASCII characters are encountered. In part, I’m influenced by Lua, which is string-encoding-agnostic. Python 2 is also like this, and while there is a lot of hoopla over the switch to Unicode support, I’m inclined to think such a feature would be better implemented by whoever needs the Unicode functionality. Adding Unicode filtering would have added bulk to the virtual machine or required a library (neither of which I want), and I’m under the impression Unicode isn’t necessary for many user projects where this sort of virtual machine would be used. A library can be used to add this functionality, though admittedly, it is disappointing to not have it built in.
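For illustration, an ASCII-only filter of the kind mentioned above might look something like this. It is a minimal sketch: the function names are hypothetical and not taken from the Copper code base, and the preprocessor switch that would enable it is left out.

```cpp
#include <string>

// Returns true if every byte in s is 7-bit ASCII (0x00 to 0x7F).
bool isAsciiOnly(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Returns a copy of s with every non-ASCII byte dropped.
// No warning is produced for the bytes that are removed.
std::string filterToAscii(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s) {
        if (c <= 0x7F) out.push_back(static_cast<char>(c));
    }
    return out;
}
```

Note that a filter like this works byte by byte, so a multi-byte UTF-8 character is removed entirely rather than replaced with a placeholder.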

~ Numbers ~

Second, number creation is possible, but it’s sort of an alias. There are different ways of converting ASCII numeric strings (which is what I have now) to binary strings representing numbers. I could use bitset (which requires either creating my own specifically for this task or including a library, which I don’t want to do). I could also convert each individual digit into its corresponding unsigned int value and add them together, for which I have a few functions. I have opted to retain the literal string form. Why? It’s easier to use with existing libraries, such as GNU MP (which I intend to use), and it can be easily parsed, which allows for fast printing. Converting to and from binary is a bit annoying.

Byte format (as opposed to literal format) has a number of other downsides. First, some memory is saved, but not as much as you might think. If you’re worried about memory, you shouldn’t be using a virtual machine like this one to begin with. Every number would need to save its format (if the format isn’t universal), have some extra mechanism for handling overflow, or both. For some implementations, this means keeping a byte buffer size and letting the numbers grow indefinitely (er, at least until unsigned int max or unsigned long max). This means for every number, you actually use at least twice the memory of the base-language code counterpart. On top of that, you need functions for handling such a large number. I’d rather not do that, so like I said, I intend to use GNU MP.

I also plan on creating a small-number library that abuses overflow but would be suitable for games where precision and accuracy are rather low on the priority list. How many features (and exactly what features) this latter library will have has not been decided, but it will likely be limited to addition, subtraction, multiplication, division, modulus, power, sine, and cosine.

In addition to a small-number library, I am seriously considering a numeric tables library (a.k.a. matrices), wherein the contents would be raw C++ numeric type data (most likely of type “double”) and could therefore have bulk operations performed on them. In some cases, this would be far more useful than handling numbers individually, and it would limit the memory footprint of matrices.

Such a numeric tables library could be replicated for larger number data types, such as the GNU MP data type, and could utilize the HDF5 library for storage. This is a prospective idea, but since it has great usefulness in scientific applications, it’s likely I will eventually implement it.
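As a rough illustration of the digit-by-digit approach mentioned in this section (each digit is turned into its unsigned int value and accumulated by place value), here is a minimal sketch. The function name `digitsToUInt` is hypothetical and is not one of the existing Copper functions. Unlike a library that "abuses overflow", this version rejects input that would not fit.

```cpp
#include <climits>
#include <string>

// Converts a string of ASCII digits to an unsigned int.
// Returns false (leaving result untouched) if the string is empty,
// contains a non-digit character, or is too large for unsigned int.
bool digitsToUInt(const std::string& digits, unsigned int& result) {
    if (digits.empty()) return false;
    unsigned int value = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') return false;
        unsigned int d = static_cast<unsigned int>(c - '0');
        // Check that value * 10 + d fits before computing it.
        if (value > (UINT_MAX - d) / 10) return false;
        value = value * 10 + d;
    }
    result = value;
    return true;
}
```

The overflow check is the "extra mechanism" that any fixed-size binary format needs, which is part of why keeping the literal string and handing it to an arbitrary-precision library is attractive.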

~ Input/Output ~

Copper is not a language where “import” or “include” have any meaning. Being an embedded language, it doesn’t really make sense to have these anyways as the desired functionality will be provided by whoever includes this in their software. On top of that, “importing” functions could be done by processing other files or adding a function via the FFI that performs such a task. Even though such a function would be the equivalent in action to the Python “eval” on a file, it’s not built-in, so I would consider it safe in this case.

To be useful as a language, Copper should at least have a standard library for printing to console and performing file input/output operations.

Currently, I have completed a couple of tools. One is a printer for printing to console. The other is a logging tool that writes error messages to a FILE* (which could be stdout). The latter prints the line and column of the error and uses a function (in another header file) for converting error codes into strings.

I opted for error codes instead of strings for error messages. The motivation for this was to make it easier for IDEs running Copper to handle error messages.

Finally, there needs to be a small library (acting through the FFI) for handling file reading, writing, creating, and deleting. Directory handling is also needed and could be in this library. I consider these things essential because they are the doors to the world for a programming language. Consequently, they are next on the list for implementation, behind numeric libraries.

Paths are likely to be UNIX style but using the basic string class. File operation functions will have to convert to Windows paths internally. File path handling is sort of a no-win situation. Windows treats capitalized files as having the same name as their lower-case counterparts, whereas Unix and Linux-based distros do not. I plan on looking at how other cross-platform software handles this, so while it may not be an all-out-winning situation, at least it can be compatible with other software.
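The internal conversion could be as simple as the sketch below. Both function names are hypothetical, and the case comparison handles ASCII letters only; real Windows file name matching has more rules than this.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Converts a UNIX-style path to a Windows-style path by swapping separators.
std::string toWindowsPath(std::string path) {
    std::replace(path.begin(), path.end(), '/', '\\');
    return path;
}

// Returns true if two paths differ only by the case of ASCII letters,
// meaning they would likely name the same file on Windows but
// different files on Unix and Linux.
bool pathsCollideIgnoringCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return false;
    }
    return true;
}
```

A check like the second function could be used to warn about scripts that rely on two files whose names differ only by case.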

~ Documentation ~

The code base, being functional, deserves to be published now. However, I don’t feel this is a good idea without first publishing more documentation. What’s the point of sharing a car immediately when it will take you at least another month to explain it? The documentation isn’t “patchy” per se; I have a number of posts ready, but some articles are incomplete and are holding up the line, so I’ll be working on those.

The documentation isn’t just blog posts, though many of the details are explained in blog posts, and all the docs will probably be published as blog posts (why not?). A definitive guide to functions in Copper is in progress, and it should serve as both a reference guide and quick tutorial for anyone interested in using the language. A rough draft of the guide has already been completed and will be reviewed and edited alongside the completion of other tasks. The guide itself may be one of the last things to be published. In part, this is because it is built on previous information. In part, it is because it should be one of the most “recent” posts when users go to look up Copper. While it will have a dedicated link on the pages section, it being posted is sort of a milestone in the progress of this entire project. In short, it’s symbolic.

~ Conclusion ~

The virtual machine is now functional. I’ve tested it quite thoroughly and found it to work correctly. It lacks completed documentation, and it lacks some essential functionality. Strings have been left as raw bytes and numbers are still literal numbers, both in an effort to keep the VM small and on the expectation that users will add the functionality they need.

Schedule-wise, I should have documentation completed by early January, though I intend to release it in a spaced fashion (no bulk doc-dump all at once on the blog). The libraries for math are the next priority. How long this will take depends on the difficulty in learning the existing code-base. I plan to use MPFR C++ as a wrapper to GNU MPFR (itself an extension of GNU MP) for the arbitrary precision math operation libraries, but I intend to convert Copper numbers to primitive C++ data types for the simple library. The simple library would normally be completed in around a week (plus another week for debugging), but the Christmas season is important to me, so I may be on hiatus until after Christmas.

~ Merry Christmas! ~
