The more complicated the syntax, the more likely it is for me to forget something. I would rather focus on design than on the quirks of the syntax in whatever language I happen to be working.
To simplify the syntax, you must simplify the overall mechanisms of the language. Does this translate into a loss of capability? Yes and no. You can do a great number of things in LISP, which has a simple syntax. The real loss is in ease. It is inconvenient to simulate classes in LISP, and in a way, you probably wouldn’t want that anyways. Python, on the other hand, maintains a relatively simple syntax overall, but has a number of language features that add both complexity and power. Java is beautiful, for the most part, but there are some extra features (like generics) that add a measure of complexity detracting from readability and ease of writing.
To maintain both simple syntax and power, a language must be built around the correct paradigm. Often, languages resort to simplifying storage rather than simplifying the underlying paradigm. Dynamic languages are this way: every variable becomes an automatically-typed variable. But this inevitably leaves such languages needing a placeholder for variables that do not yet hold anything meaningful. We could assign zero to all our variables, but that would not be suitable in every case, and almost instinctively, programmers resort to using that horrible stand-in: the “null” object.
The correct paradigm can be formulated by understanding the minimum criteria necessary for a language to function. Null is not necessary, even when interacting with languages that use it, so long as there is an appropriate default. There are, arguably, only a few things that a high-level programming language needs for it to be capable of completing any desired programmatic task (ignoring, of course, the underlying mechanisms to make these things possible):
- Mutable Storage
- Containers for storing other storage
- Byte data
Notice that “goto” is not included. While memory registers and the goto command, as shown by Assembly, can perform the tasks we need, these approaches are not characteristic of high-level programming. (Incidentally, my programming language Goldfish had capabilities more powerful than a simple “goto” command would have had, had one been implemented, but such functionality was almost unreadable in code.)
“Mutable Storage” is often translated as “Variable”, but it can appear in other forms. The basic idea is to tie a name to some data. For Python, such things are dynamic variables. For LISP, such things are created with “define”. We don’t really care how the data is saved in the storage, only that the storage can eventually hold something else.
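As a rough illustration in Python, here are two forms that mutable storage can take: an ordinary dynamic variable, and an explicit name-to-data table. The `storage` dictionary and the `define` helper are hypothetical names chosen for this sketch (loosely echoing LISP's “define”), not part of any particular language.

```python
# Form 1: an ordinary dynamically typed variable.
value = 42
value = "forty-two"   # the same name now holds different data

# Form 2: an explicit table that ties names to data.
# 'storage' and 'define' are hypothetical names for this sketch.
storage = {}

def define(name, data):
    """Tie a name to some data, replacing whatever was stored before."""
    storage[name] = data

define("value", 42)
define("value", [4, 2])   # the storage now holds something else
```

In both forms, how the data is kept internally does not matter; what matters is that the name can later hold something different.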
“Containers for storing other storage” could be interpreted in a number of ways. Your mutable storage might contain lists or it might have child members or both. Technically, only one of these two options is needed, though you should have the appropriate built-in functions to compensate for the lack of one or the other.
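To make the “only one is needed” point concrete, here is a minimal Python sketch that builds list-like behavior out of child members alone. The names `Cell`, `END`, `nth`, and `length` are assumptions made for this example; the helper functions stand in for the built-in functions that would compensate for the missing list type.

```python
# An explicit end marker, so the sketch does not lean on a null value.
END = object()

class Cell:
    """Storage with two child members: 'head' (one item) and 'tail' (the rest)."""
    def __init__(self, head, tail=END):
        self.head = head
        self.tail = tail

def nth(cell, index):
    """Return the item at the given position by walking the 'tail' members."""
    while index > 0:
        cell = cell.tail
        index -= 1
    return cell.head

def length(cell):
    """Count the items by walking until the end marker is reached."""
    count = 0
    while cell is not END:
        count += 1
        cell = cell.tail
    return count

items = Cell("a", Cell("b", Cell("c")))
second = nth(items, 1)    # "b"
size = length(items)      # 3
```

The reverse direction, simulating child members with a list of name and value pairs, works along the same lines.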
There is also some consideration as to whether or not pointers are required. That is to say, do I need a storage (perhaps mutable) that directly affects what is stored somewhere else? I cannot prove that pointers are necessary, but I would find any language without them to be ridiculously difficult. In many cases, a language’s syntax hides the fact that there are, in fact, addresses being passed around and saved in pointers. This reduces the tediousness of using the language, but it is often responsible for subtle bugs.
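The kind of subtle bug that hidden addresses cause can be shown with a short Python sketch. The function and variable names here are made up for the example. Python passes a reference to the list, so the function quietly changes the caller's data.

```python
def add_default_tag(tags):
    # 'tags' refers to the caller's list; no copy is made.
    tags.append("default")
    return tags

original = ["urgent"]
tagged = add_default_tag(original)

# Both names now refer to the same list, and 'original' has been modified.
same_object = tagged is original          # True

# Copying first keeps the caller's list untouched by this second call.
safe = add_default_tag(list(original))
```

Nothing in the call `add_default_tag(original)` signals that an address is being handed over, which is exactly why this sort of mistake is easy to make.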
“Byte data” is a necessity of any good language (not to say that you can’t make a language without it). Some languages embrace it and some try to hide it. Java seems to go to great lengths to make you pretend you aren’t working with actual byte data while still giving you the optional datatype “byte”. Python 2.7 works with raw binary strings while Python 3 tries to shield the character system from them (which has irritated a number of programmers, apparently). C++ does very little to disguise the raw bytes, even letting you set enumeration values using hexadecimal values.
In many languages, byte data shows up in at least one basic (restricted) form. At the very least, a language could perform all its necessary byte operations using just booleans and the appropriate built-in functions. However, that’s painful, and we usually shortcut it by making it possible to write in terms of base-10 numbers and strings. Numbers and characters are byte data, yes, but not always raw byte data (as in C), and the operations you are able to perform on them are limited by the language’s built-in functionality for those data types (unless type casting or conversion is available).
Some languages prefer to treat this data as constant or at least immutable. This is the safe approach, since changing a value in one place cannot silently change it anywhere else. It does require more work on the part of the programmer, however. One reason is that you have to work not only with mutable storage but also with the containers of that mutable storage, so that pointers always remain capable of accessing that storage (and don’t simply become detached when the data is replaced). In Python, for instance, strings are immutable. If you assign a new string to a variable, the variable stops pointing to its current data and points to the new string. If you want to change the string while continuing to retain access to the same variable, your pointer has to point to the parent of the variable you wish to change.
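Here is a small Python sketch of that last point. The variable names are invented for the example. The first half shows a second reference becoming detached when a string variable is reassigned; the second half shows how referring to the parent container keeps both references in step.

```python
# Rebinding: 'alias' keeps pointing at the old string.
greeting = "hello"
alias = greeting
greeting = greeting + " world"   # builds a new string and rebinds 'greeting'
# alias is still "hello"

# Pointing at the parent instead: both names see the replacement.
holder = {"text": "hello"}
view = holder                    # a second reference to the parent container
holder["text"] = holder["text"] + " world"
# view["text"] is now "hello world"
```

The string itself never changes in either case; only the slot inside the parent is made to hold a new string.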
The minimum basic forms of byte data needed for sane programming are raw data (that is, data that exists internally in what I’ll call the “exact” form in which it is represented to the programmer), booleans, numbers, and strings (sometimes consisting of single characters, especially if the language makes no provisions for single characters). That does not mean other forms of byte data cannot be included. Byte data might include objects or pointers to objects from another language. This allows for interesting cross-language interactions. For example, it might be helpful to represent a data table in one language with an object in another, storing references to data tables in the higher-level language and using that higher-level language to control how that data is used in bulk.
Raw byte data sequences are slightly different from strings (specifically, strings formed by sequences of bytes, as opposed to character objects in an array or list). Strings tend to be limited, for psychological reasons, in what bytes they hold (ASCII, UTF-8, UTF-16, UTF-32), whereas raw byte data sequences might be the random bits you obtain from a file and need to interpret. Obviously, to handle all possible file formats, handling raw byte data and having functionality for manipulating it is required. At the same time, string objects greatly simplify the task of handling text. Since the two are closely related, it makes more sense to simply have functions that take raw byte data sequences and filter out the garbage that does not belong in a string, thus “converting” such byte data into a string. Therefore, strings are superfluous. One might think that having dedicated strings decreases the likelihood of security holes, but I question the extent of protection. You still have to filter your inputs, but at least when you have only raw data, it’s less transparent and should therefore cause you to think more about what the input actually is.
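A filtering function of the kind described might look like the following Python sketch. The function names `bytes_to_text` and `bytes_to_ascii` are hypothetical; the first relies on Python's standard `bytes.decode` with `errors="ignore"`, and the second filters by hand to printable ASCII.

```python
def bytes_to_text(raw: bytes) -> str:
    """Drop bytes that are not valid UTF-8 and return the rest as text."""
    return raw.decode("utf-8", errors="ignore")

def bytes_to_ascii(raw: bytes) -> str:
    """Keep only printable ASCII bytes (32 through 126) and return them as text."""
    return bytes(b for b in raw if 32 <= b < 127).decode("ascii")

raw = b"caf\xc3\xa9 \xff\xfeok"    # valid UTF-8 mixed with two stray bytes
as_utf8 = bytes_to_text(raw)       # "café ok"
as_ascii = bytes_to_ascii(raw)     # "caf ok"
```

Which filter is appropriate depends on what the input is supposed to be, which is the decision this approach forces you to make explicitly.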
With all this in mind, I came to some very straightforward conclusions:
- Variables can store objects that double as both member containers and functions.
- Data can be static.
- Access to editing byte data is necessary.
As long as there are built-in functions for handling byte data and returning such data (for replacing old data with new data), it shouldn’t matter if byte data is constant.
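As a sketch of what such a built-in function could look like, here is a Python example working on immutable `bytes`. The function name `set_bit` and its bit-numbering convention are assumptions made for this illustration. The function never alters its input; it returns new data that replaces the old.

```python
def set_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of 'data' with one bit set.

    Bit 0 is the least significant bit of the first byte (an assumed convention).
    """
    byte_index, offset = divmod(bit_index, 8)
    changed = data[byte_index] | (1 << offset)
    return data[:byte_index] + bytes([changed]) + data[byte_index + 1:]

original = bytes([0b00000000, 0b00000001])
updated = set_bit(original, 3)     # first byte becomes 0b00001000
updated = set_bit(updated, 9)      # second byte becomes 0b00000011
# 'original' is unchanged throughout.
```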
Objects doubling as member storage and functions does something very important: It reduces object typing to a single type. This means no null-pointers. Every object works exactly the same. This simplifies the syntax and simplifies the rules for expected behavior. (More on that later.) A variable thus has a single type: mutable storage of an object pointer.
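The language itself is not shown here, so the following Python sketch is only one guess at how a single object type could serve as both member container and function. Everything in it (`Obj`, `get`, `body`, `data`, and the choice to return a fresh empty object for a missing member) is an assumption made for illustration.

```python
def _return_self(obj, *args):
    """Default behavior for an object with no explicit body."""
    return obj

class Obj:
    """Hypothetical single object type: holds members, byte data, and a callable body."""
    def __init__(self, body=_return_self):
        self.body = body
        self.members = {}
        self.data = b""

    def get(self, name):
        # A missing member yields a fresh empty object instead of null.
        if name not in self.members:
            self.members[name] = Obj()
        return self.members[name]

    def __call__(self, *args):
        return self.body(self, *args)

def sum_xy(obj, *args):
    """Example body: add the first bytes of members 'x' and 'y'."""
    return obj.get("x").data[0] + obj.get("y").data[0]

point = Obj(body=sum_xy)
point.get("x").data = bytes([3])
point.get("y").data = bytes([4])
total = point()             # 7: the object is used as a function
missing = point.get("z")    # an empty Obj, never null
```

Because every value is an `Obj`, code that receives one can always call it or ask it for a member without first checking what kind of thing it is.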
One last thing before ending this article…
Context-free grammar is desired. This means keywords (words with special meaning in certain constructions) should be avoided. While they make sense in Python and certain other languages for readability, they are perhaps one of the reasons people despise FORTRAN. They require the programmer to remember certain rules in certain contexts and to not make the mistake of giving their variables the same names as the keywords. While remembering such keywords (and not to use them) is not much more strain on the brain than remembering any other reserved word, their rules add complexity to the syntax.
In conclusion, hopefully you can see how the information here rationalizes my approach for writing this language. I’ll try to explain more as I cover the different aspects of the language.