The Task Processing Model – Branch Bobcat

ATTENTION: This applies to the first two branches of Copper, known as “fn” (original, using “fn” for function start syntax) and “Bobcat” (the version to begin using the square bracket syntax). The task-processing system has been replaced in the new master branch known as “Cheetah”.

Copper is a very simple language. Unsurprisingly, the language requires a number of little quirks on the backside that make the model I’ve chosen for processing tokens to look like gold in hindsight. … Maybe I’ve just done this before and learned from experience. 😉

I wanted the virtual machine (VM) to double as a REPL, not merely be for file-reading. File reading is nice in that the number of tokens being input are limited to the file size, and we can play all sorts of wonderful little games that require having a limited set of tokens. For example, we could require syntax rules that allow for naming functions that aren’t defined until later in the file (or perhaps even in another file). We could perform preprocessing techniques that could simplify the code or allow it to be compiled into a smaller bytecode. These things are nice, and there are plenty of other languages that utilize this amazing benefit. Copper doesn’t need it, and neither does this VM.
One of the defining qualities of a REPL (at least in my book) is that it can process virtually infinite number of tokens (assuming we have the RAM for it). The only limiting factor should be the amount of memory that is required for each variable that is to be used. It should not be limited by old data that is unreachable and (at least should be) erased. As I spoke before in a previous article (Interpreter > Semantics), the nature of languages that don’t terminate with newlines tends to make it rather difficult to create an abstract parse tree when the VM needs to work as a REPL. Yes, a tightly-controlled queue of tokens could be used, but the ideas I had for such a system were slower than a task-queue system, which my VM currently uses.
For the sake of completeness, let me have a word about the controlled-token-queue-based systems that I had in mind for Copper specifically (as opposed to other languages). Queues could be used in a number of ways, and as the reader may well know, I haven’t thought of all of them. Let’s stick with the simplest and fastest.
In one system, we could have the different duties or tasks request a pattern match of tokens. Tokens would be gathered until that pattern is fulfilled, in which case they would be submitted to the rest of the engine for further processing. Failing to fulfill the patterns available would result in an error. Such a system is very, very nice for complex patterns. However, I believe such a system is better to implement when the language itself is compiled and the overhead of such a system is eliminated.
A second type of queued-token-based system is one where a requested number of tokens is required to be buffered or perhaps certain tokens are considered deliminators and cap the buffer input. Here, the buffer is the current limitation. The easiest way to attack such a system (presuming it uses a managed array as a buffer) created for Copper would be to begin with a function-body-opening-token and never follow it with a closing token. Using a list for the buffer counters this problem, but if you want to perform pattern matching on the entire strand of tokens, lists are slower than arrays.
Buffering is fine when we’re loading files, but the console should have both a limited buffer size and still enable continuous reading. Obedient to this prescription, the main run() function of the virtual machine Engine uses a buffer, but it doesn’t control the termination of runtime or influence what processing occurs for tokens (except in the case of strings, numbers, and comments which need to be completed; but this may change).
Getting to the true heart of the token processing of Copper, I suppose I should also admit that the language of Copper is designed to be such that the first token always determines the task and subsequent actions can be inferred from successive tokens. Having a full string of tokens in order to perform pattern matching is unnecessary.
Tokens in my virtual machine are processed using a task-queue system. A task could be things such as the duty of constructing a function or the duty of converting ownership of pointers. In the VM, tasks are represented by task struct instances, which contain all of the variables related to the task at hand. These structs are generated whenever a token cannot be immediately interpreted as data. The task is put on a queue and only removed when it has finally been given a token that ends the task or an error occurs.
Some task instances handle more than one task in the VM. This is due, in part, to the fact that the first token does not immediately indicate the exact task that is going to occur, but does indicate a set of related tasks (or rather “sub-tasks”) that can occur at that point. For example, the struct TaskFunctionFound in the VM handles all tasks where a random variable is encountered. Its sub-tasks include:

  • Directly returning the variable’s function’s data as an object.
  • Assigning a new function to a variable.
  • Converting a variable into a pointer.

Tasks can require several processing cycles to complete, so each task struct has its own state to indicate which method is called next to continue its processing. Tasks generally begin by processing the next token. If they can’t use it, it may be an error, but it is often left for further processing to interpret that token and convert it into an object. The object is then processed with the task again where it is either “eaten” or causes the task to pop off the task queue. It is also possible that the new object itself may create a task because its actual value is contextually determined.
For example:
a(b(c))

In this case, a variable named “a” is found. A task is created, requiring that the next token be checked. If the next token isn’t usable, the function pointed to by “a” is returned. In this case, however, the next token indicates that the function stored in “a” should be run. This task then waits for parameters. The parameter “b” is encountered, and like “a”, its return is contextually determined, so yet another task is created for “b”. Finally, the variable “c” is encountered, and the token encountered on the next processing cycle is not useful for “c”, so its task ends and the function stored in “c” is returned. This value is encountered by the task for “b” in the object-processing section of processing. The task for “b” can handle this object, but it continues to sit on the task stack to see if there are more parameters. On the next processing cycle, the parameter-body-closing token (the closing parethesis, “)”) is encountered, which is processed by the task for “b”. This results in the function stored in “b” to be run, followed by the task for “b” being popped from the task stack. The same happens with “a”, and the task stack becomes empty.

The inner mechanics of all this are quite fast and the true C++ stack remains small. If an error occurs, the VM function stack and the VM task stack can be easily cleared. There are very few nested processing loops, so the need to write baby-sitting code is unnecessary. What do I mean? Here’s an overly simplified layout of the two approaches I had tried for object processing. The first is Goldfish, and the latter is Copper.

Goldfish


ErrorFlag Engine::process() {
	// let objects interpret tokens
	object->process(this, token)
	// handle interpreting token (if object process didn't use it)
	interpret_token()
	// let objects handle objects
	object = get_last_object_being_processed()
	object->process(this, obj)
}
class Object {
	process(engine, token) {
		// babysitting, from nesting of process()
		switch( engine->process() ) { handle error case }
		// more stuff
		// babysitting, from nesting of process()
		switch( engine->process() ) { handle error case }
	}
	reprocess(engine, obj) {
	}
}

Copper


ErrorFlag Engine::process() {
	// let tasks interpret tokens
	process_with_topmost_task(token)
	// handle interpreting token (if task didn't use it)
	interpret_token()
	// let tasks handle objects
	process_with_topmost_task(obj)
}

process_with_topmost_task(token) {
	switch( current_task.state ) {
	state 1:
		do prep
		break
	state 2:
		add_task_to_top
		break
	}
}

process_with_topmost_task(obj) {
	switch( current_task.state ) {
	state 1:
		error
	state 2:
		use_object(task, obj)
		pop_task_stack
	}
}

To put it simply, the task replaces the object in the process() method, though the model inside of process() itself is strikingly similar.
At any rate, by not having to babysit code, I reduce typing, simplify required error handling, and – here’s a nice thing – I can STOP processing at any time.
Being able to stop processing at any time is an awesome benefit. It means that it’s much, much easier save the state of the virtual machine and reload it (assuming I or someone else implements that feature). Saving the state is a complicated task on its own, but at least this way, I wouldn’t be reconstructing a C++ stack or rerunning processing in order to return it to the state where it left off.

Conclusion

In short, Copper uses a task-queue for processing, which is superior to the nested processing that was used in my previous virtual machine for Goldfish. It has also simplified debugging (though it has created a great amount of similar but slightly different code, interestingly enough). Task structures are used for holding states which are then processed by the engine itself as opposed to objects containing the states and processing methods that apply to themselves.
The limitation of the task-queue method is that the syntax of the language itself is not extensible. For Goldfish, the syntax was (which gave a great deal of power and flexibility to the programmer creating an extension Object). For Copper, the syntax is not, and I consider this to be a very good thing (as it guarantees reliability of the code).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s