benchmarking follow-up, optimization comments (master / branch 2, v 0.17)

After the disappointing times in the previous benchmarking tests, some things received optimization… and some things, reconsideration.

I changed the way system functions are searched for. Now a round-robin hash table is used. I tested it again – with optimizations – using the following test:


benchmark = {
	m = MSec_ClockTime()
	i = int(0)
	loop {
		if ( int_equal(i: 1000000) ) { stop }
		i = int+(i: 1)
	}
	mm = MSec_ClockTime()
	print(MSecTime_rdc(mm: m:))
}

Times in milliseconds:
10311
10434
10483
10472

I managed to save about a half-second.
I also reran the second loop test (the one that pushes onto a list), again with optimizations:

benchmark = {
	l = List()
	m = MSec_ClockTime()
	i = int(0)
	loop {
		if ( int_equal(i: 1000000) ) { stop }
		i = int+(i: 1)
		Push_top(l: i:)
	}
	mm = MSec_ClockTime()
	print(MSecTime_rdc(mm: m:))
}

Times in milliseconds:
15637
15684
15779
15614

So I only managed to save close to 1 second instead of 2. 😦 Yes, some work is still needed.

The next optimization is the removal of lists in scopes. This also eliminates the need to check for non-null variable tables (since there can only be one table, which must always be initialized).
Again, I ran the above code.

Loop 1 times in milliseconds:
10108
9818
9980
9831

Loop 2 times in milliseconds:
15247
15597
15290
15337

Around 3/10ths of a second time savings in total.

The biggest question with this is whether some of the changes I’ve made will break other things. So far, after some testing, it looks like they haven’t, so that’s a good sign.

The time savings from removing list scopes were minor, as expected. On the bright side, 3/10ths of a second in savings is pretty huge considering that the target times are about that size.
The reality is, the direct copying of scopes is expensive. The repeated saving to the variable “i” (as well as the destruction of its old components) is a time sink that could be dramatically improved by using functions that edit the data WITHIN the function rather than replacing the function entirely. For example, for integers, such functions are incrementors (e.g. a++ ), decrementors (e.g. a-- ), and math-applied-to-self (e.g. operator+= and operator-= in C++).
Copper does not have these at the moment because they prevent safe multi-threading. However, as data is copied by default, it makes sense to allow such an incrementor.
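
To make the difference concrete, here is a minimal sketch in Python (not the engine's C++). IntObject, add_replacing, and increment_in_place are hypothetical names made up for illustration, and counting constructions is only a rough proxy for the copy-and-destroy cost described above.

```python
class IntObject:
    """Hypothetical stand-in for an engine-side integer object."""
    allocations = 0  # counts how many objects were ever constructed

    def __init__(self, value):
        self.value = value
        IntObject.allocations += 1


def add_replacing(variables, name, amount):
    """Like `i = int+(i: 1)`: build a new object and swap it into the variable."""
    old = variables[name]
    variables[name] = IntObject(old.value + amount)  # the old object is discarded


def increment_in_place(variables, name):
    """Like an incrementor: edit the data the variable already holds."""
    variables[name].value += 1


# Replace-on-assign: one new object per iteration.
variables = {"i": IntObject(0)}
for _ in range(1000):
    add_replacing(variables, "i", 1)
replacing_allocations = IntObject.allocations

# In-place: only the initial object is ever constructed.
IntObject.allocations = 0
variables = {"j": IntObject(0)}
for _ in range(1000):
    increment_in_place(variables, "j")
in_place_allocations = IntObject.allocations

print(replacing_allocations, in_place_allocations)  # prints: 1001 1
```

Both loops end with the same value stored in the variable; only the number of objects created and thrown away differs.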

Using ++

For experimentation, I added an incrementor (called “++”), and repeated the loop using it. Note that this is NOT the same as a math-applied-to-self operator.

benchmark = {
	m = MSec_ClockTime()
	i = int(0)
	loop {
		if ( int_equal(i: 1000000) ) { stop }
		++(i:)
	}
	mm = MSec_ClockTime()
	print(MSecTime_rdc(mm: m:))
}

Times in milliseconds:
8532
8312
8390
8619

A speedup of almost a second and a half! It’s worth noting that the function takes one less parameter. The more parameters a function has, the more it costs to set them up and pass them. Every time a parameter is passed, the process loop has to go through a number of function calls in order to collect the value being passed. This operation is slower than calling the actual function. The only way to speed this up is to parse this stuff in advance and make the execution less dependent on performing long processing cycles.

VM Engine Remake

To speed things up, parsing should occur only once. While it is possible to do this within the current engine, it would only complicate the system: much of the processing would be duplicated, and it would occur in such a different way that it makes sense to have only one such system.
The idea of remaking the entire engine is not appealing, but it is likely to be faster and less work (surprise!). The remake would separate the parsing from the execution of operations. It would likely require fewer functions, since the current engine requires several methods per task.
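
As a rough illustration of what separating parsing from execution buys, here is a small Python sketch. The two-operation mini-language ("add N" and "double") and the names parse and execute are invented for this example and have nothing to do with Copper's actual syntax or the planned engine design.

```python
def parse(source):
    """Parse once: turn text into a list of (opcode, operand) pairs."""
    program = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op = parts[0]
        operand = int(parts[1]) if len(parts) > 1 else None
        program.append((op, operand))
    return program


def execute(program, counter=0):
    """Run the pre-parsed operations; no text handling happens here."""
    for op, operand in program:
        if op == "add":
            counter += operand
        elif op == "double":
            counter *= 2
        else:
            raise ValueError(f"unknown operation: {op}")
    return counter


# The loop body is parsed a single time...
body = parse("add 1\ndouble\nadd 3")

# ...and then executed repeatedly without touching the source text again.
result = 0
for _ in range(3):
    result = execute(body, result)

print(result)  # prints: 35
```

The point is only that the per-iteration cost is limited to walking a prepared list; all the token handling is paid for once, up front.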
One of the benefits of the current system is the immediate execution without the immediate function parsing. The latter functionality can be reproduced in the remake, but the former requires knowing where the end of an expression occurs. Separating the parsing from the execution for the sake of speed means that the execution part can’t be processing while the parsing is occurring. In fact, it even requires that the next, unrelated token occur. This is fine in only some cases. One exception is if-statements (specifically, at the end of a file).
Here’s the problem: The following code would immediately print in the current engine but not the remake:

if ( true ) { print("hi") }

In the remake, the parser would be waiting for the possibility of “elif” and “else” blocks that, if NOT found, result in the if-statement being passed to the execution part of the engine.
In the remake, to get the execution to start, you would need to do something else, like init a variable:

if ( true ) { print("hi") }
a

I’m fine with it, but someone using the Copper VM as a command-line program might find this odd. It depends on whether they realize the evaluation won’t occur until after the if-statement has been shown to completely end. I have figured out a way to avoid this issue when the stream ends, but this won’t work in the terminal or command prompt.
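
The waiting behavior can be sketched in a few lines of Python. This is purely illustrative: StatementFeeder and its methods are hypothetical, each "token" stands for a whole block, and releasing the pending statement at end of stream is just one assumed way to handle that case, not necessarily the one I have in mind for the engine.

```python
class StatementFeeder:
    """Hypothetical parser front end that receives one token at a time."""

    def __init__(self):
        self.pending_if = None  # if-statement still waiting for elif/else
        self.ready = []         # statements handed to the execution part

    def feed(self, token):
        if token in ("elif", "else"):
            if self.pending_if is None:
                raise ValueError(f"'{token}' without a preceding if")
            self.pending_if.append(token)
            if token == "else":
                self._release()  # nothing can follow an else block
            return
        # Any other token proves that a pending if-statement has ended.
        self._release()
        if token == "if":
            self.pending_if = ["if"]
        else:
            self.ready.append([token])

    def end_of_stream(self):
        """Assumed escape hatch: a finished stream also ends the statement."""
        self._release()

    def _release(self):
        if self.pending_if is not None:
            self.ready.append(self.pending_if)
            self.pending_if = None


feeder = StatementFeeder()
feeder.feed("if")
ready_after_if = list(feeder.ready)    # nothing can run yet
feeder.feed("a")
ready_after_next = list(feeder.ready)  # the if-statement is released first

print(ready_after_if)    # prints: []
print(ready_after_next)  # prints: [['if'], ['a']]
```

In an interactive terminal the stream never ends, so end_of_stream is never called and the if-statement stays pending until the user types something else.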

The new system would likely be “stackless” (that is, the stack would not be shared with the C++ stack – in other words, no recursion). This allows for integration with debugging hooks. How so? For example, the “stackless” version allows for easily sharing the engine stack – which allows for full description of the engine state – as well as pausing the processing / execution at almost any time.
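
Here is a minimal Python sketch of the "stackless" idea, using a toy arithmetic evaluator rather than anything Copper-specific. StacklessEvaluator, its task tuples, and the two operators are assumptions made up for the example. All pending work lives in lists owned by the evaluator instead of in recursive calls, so the caller can stop between any two steps and inspect the state.

```python
class StacklessEvaluator:
    """Evaluates nested ("+" or "*", left, right) tuples without recursion."""

    def __init__(self, expression):
        self.tasks = [("eval", expression)]  # engine-owned work stack
        self.values = []                     # engine-owned value stack

    def done(self):
        return not self.tasks

    def step(self):
        """Perform one small unit of work, then return control to the caller."""
        kind, payload = self.tasks.pop()
        if kind == "eval":
            if isinstance(payload, int):
                self.values.append(payload)
            else:
                op, left, right = payload
                # The stack is LIFO: left runs first, then right, then the operator.
                self.tasks.append(("apply", op))
                self.tasks.append(("eval", right))
                self.tasks.append(("eval", left))
        else:  # kind == "apply"
            right = self.values.pop()
            left = self.values.pop()
            self.values.append(left + right if payload == "+" else left * right)


engine = StacklessEvaluator(("+", 1, ("*", 2, 3)))

# Run a few steps, then "pause" and look at the engine state.
for _ in range(3):
    engine.step()
paused_values = list(engine.values)
paused_depth = len(engine.tasks)

# Resume until finished.
while not engine.done():
    engine.step()

print(paused_values, paused_depth, engine.values)  # prints: [1] 4 [7]
```

A debugging hook would sit in the loop that calls step(), which is exactly the place where a recursive design gives you nothing to hold on to.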
Based on the plans I have thus far, I should be able to reuse tons of the token parsing (including within the engine) since it’s primarily the processing part that needs to be changed.
Also, it’s quite possible I’ll be able to add “fn” as a keyword again. I don’t plan on reinstating parentheses as parameter body wrappings, however.
Finally, I’ve figured out that I can actually avoid a total stack crash when syntax errors are encountered within functions. That means the global scope isn’t reset (as it is now), which also means users won’t need to worry about their REPL contents being deleted because of a syntax mistake. This robustness complements the intention of Copper and would enhance the virtual machine.

All that said, I’m leaning strongly toward remaking the processing part of the VM. It will be a lot of work, and it means I have to walk away from all the bug fixes and great progress I made building up the current system, but that stuff will at least get to sit in the repository. I’m fairly confident that, while I may not see the exact solutions to all the eventual issues with the new design just yet, it will be easier to see solutions as things become more concrete.
Now to publish this blog post before it becomes outdated.
