It has been a while since I posted any progress on the NewScript in C front, so I'm pleased to announce that there is yet another version of the NewScript programming environment in C! What has stayed the same is the instruction encoding, the literal encoding, and the main execution loop. What is different is the instruction set, naming conventions, code size, and formatting. As I've been doing more and more programming with the NewScript compilers, I've been revising the instruction set to better match the kinds of code that I've been writing. Some of the changes are cosmetic, but most have to do with reducing the semantic gap between what you can say and what you'd want to say. In this spirit, I've espoused the following design principles:
NewScript - Design Principles
- Personal Mastery - anyone should be able to learn the system in its entirety
- Direct Manipulation - code is data, data is directly manipulable by the programmer.
- Contextual Semantics - meanings are contextual, as in all human languages
- Expressive and Concise - both code and documentation must convey meaning for humans
- Comprehensible by Design - the software is simple enough to be fully understood
- Sustainable Software - the software must be clean, efficient, and maintainable
NewScript - The Instruction Set
The new instruction set has 32 methods, broken down into 5 basic categories:
- Flow
. ! ; ? ( )
- Stack
_ <- -> ^ :
- Register
% #% , # @ #@ $ #$
- Math
- + * / << >>
- Logic
~ & | \ = < >
This instruction set is the list of methods for the Core object, which defines the base context for all code. If you read the documentation on the old NewScript instruction set, or used the current web app, you'll find that quite a few of the terms have changed. There are also 10 fewer instructions than in the current web app version. Of particular note is how flow control has changed. The words in the flow control method list consist largely of typical punctuation marks. This is intentional, to bridge the gap between English and your programs.
Flow
- . return
- The period represents return: execution returns to the address on the top of the return stack
- ! call
- The exclamation point represents a function call to the value on the top of the data stack. This is useful for vectoring.
- ; continue
- The semicolon produces a coroutine call, by calling the value on the top of the return stack. This allows you to write simple green threaded code.
- ? conditional branch
- The question mark will branch to the address on the top of the stack if the next value on the stack is non-zero
- ( for
- The left parenthesis mark indicates the start of a counted loop, and pops the loop count off of the stack
- ) next
- The right parenthesis mark decrements the loop count and tests whether it is zero; if not, it jumps to the start of the loop, whose address is on the return stack
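The counted loop described above can be sketched in C. This is only a model under stated assumptions, not the VM's actual code: `LoopFrame`, `loop_for`, `loop_next`, and `count_iterations` are hypothetical names, and the sketch assumes the count is an unsigned 32-bit cell that `)` decrements before testing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of what `(` leaves on the return stack. */
typedef struct {
    uint32_t count;   /* remaining iterations, popped from the data stack */
} LoopFrame;

/* `(` : pop the loop count off the data stack and start a loop. */
LoopFrame loop_for(uint32_t count_from_stack) {
    LoopFrame frame = { count_from_stack };
    return frame;
}

/* `)` : decrement the count; 1 means jump back to the loop start. */
int loop_next(LoopFrame *frame) {
    frame->count -= 1;
    return frame->count != 0;
}

/* Run an empty loop body and report how many times it executed. */
uint32_t count_iterations(uint32_t n) {
    /* Assumption: decrement-then-test means a count of 0 would wrap
       around and run 2^32 times, so this sketch rejects it. */
    assert(n != 0);
    uint32_t runs = 0;
    LoopFrame frame = loop_for(n);
    do {
        runs += 1;                 /* the loop body would go here */
    } while (loop_next(&frame));
    return runs;
}
```

Under these assumptions a count of -1, read as an unsigned cell, is the largest count the loop can express.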
Stack
- _ drop
- The underscore drops the value on the top of the stack. Since the stack is 8 cells and circular, 8 drops in a row amount to a no-op
- <- push
- The left arrow pushes the top of the data stack onto the return stack
- -> pop
- The right arrow pops the top of the return stack onto the data stack
- ^ over
- The caret copies the next value on the data stack above the top of the stack, so ab ^ yields aba
- : duplicate
- The colon duplicates the top of the data stack
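To make the circular stack concrete, here is a minimal C sketch of an 8-cell data stack with drop, over, and duplicate. The type and function names are hypothetical, and the layout (an array plus a wrapping top index) is an assumption rather than the VM's real representation.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_CELLS = 8 };            /* from the text: 8 cells, circular */

typedef struct {
    int32_t cell[STACK_CELLS];
    unsigned top;                    /* index of the top of stack, 0..7 */
} Stack;

/* Pushing wraps around and overwrites the oldest cell. */
void stack_push(Stack *s, int32_t v) {
    s->top = (s->top + 1) % STACK_CELLS;
    s->cell[s->top] = v;
}

int32_t stack_tos(const Stack *s) { return s->cell[s->top]; }

int32_t stack_nos(const Stack *s) {
    return s->cell[(s->top + STACK_CELLS - 1) % STACK_CELLS];
}

/* _ : drop just steps the top index back, wrapping at the start. */
void stack_drop(Stack *s) {
    s->top = (s->top + STACK_CELLS - 1) % STACK_CELLS;
}

void stack_over(Stack *s) { stack_push(s, stack_nos(s)); }   /* ^ : ab -> aba */
void stack_dup(Stack *s)  { stack_push(s, stack_tos(s)); }   /* : : a -> aa  */

/* Returns 1 if eight drops in a row leave the top of stack unchanged. */
int eight_drops_is_nop(void) {
    Stack s = { {0}, 0 };
    for (int i = 1; i <= STACK_CELLS; i++) stack_push(&s, i * 10);
    int32_t before = stack_tos(&s);
    for (int i = 0; i < STACK_CELLS; i++) stack_drop(&s);
    assert(s.top < STACK_CELLS);
    return stack_tos(&s) == before;
}

/* Pushes a then b, applies over, and returns the new top of stack. */
int32_t over_result(int32_t a, int32_t b) {
    Stack s = { {0}, 0 };
    stack_push(&s, a);
    stack_push(&s, b);
    stack_over(&s);
    return stack_tos(&s);
}
```

Because drop never clears a cell, eight of them simply walk the index all the way around the ring, which is why the sequence behaves like a no-op.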
Register
- % object
- The percentage mark sets the obj register to the top of the stack
- #% get object
- The combination hash percentage gets the value in the obj register
- , set
- The comma stores the value on the top of the stack to the address stored in the destination register, $, and increments the destination register
- # get
- The hash mark fetches the value stored at the address contained in the source register, @, and increments the source register
- @ source
- The at sign sets the value of the source register to the top of the stack
- #@ get source
- The hash at sign combo fetches the value of the source register
- $ destination
- The dollar sign sets the value of the destination register to the value on top of the stack
- #$ get destination
- The hash dollar sign combo fetches the value of the destination register
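The auto-incrementing source and destination registers behave much like a pair of walking pointers. The C sketch below is an illustration only: the `Registers` type and every function name are hypothetical, and it assumes "increments" means advancing by one cell.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t *src;   /* @ register: address read by #  */
    int32_t *dst;   /* $ register: address written by , */
} Registers;

void set_source(Registers *r, int32_t *addr)      { r->src = addr; }   /* @ */
void set_destination(Registers *r, int32_t *addr) { r->dst = addr; }   /* $ */

/* # : fetch through the source register, then advance it one cell. */
int32_t fetch(Registers *r) { return *r->src++; }

/* , : store through the destination register, then advance it one cell. */
void store(Registers *r, int32_t v) { *r->dst++ = v; }

/* Copy n cells by alternating fetch and store. */
void copy_cells(int32_t *to, int32_t *from, int n) {
    assert(n >= 0);
    Registers r;
    set_source(&r, from);
    set_destination(&r, to);
    for (int i = 0; i < n; i++) store(&r, fetch(&r));
}

/* Returns 1 if three cells were copied in order. */
int copy_demo(void) {
    int32_t from[3] = { 7, 8, 9 };
    int32_t to[3]   = { 0, 0, 0 };
    copy_cells(to, from, 3);
    return to[0] == 7 && to[1] == 8 && to[2] == 9;
}
```

With both registers advancing on every access, a block copy needs no explicit address arithmetic, which is presumably why set and get increment by default.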
Math and Logic
The math and logic methods are rather easy to understand. Unlike the oddly named ones above, these are pretty straightforward:
| Math | Operation | Logic | Operation |
|---|---|---|---|
| - | negate | ~ | complement |
| + | add | & | and |
| * | multiply | \| | or |
| / | divide & modulus | \ | xor |
| << | shift left | = | equality |
| >> | shift right | < > | less than, greater than |
The only tricky thing here is that there is no subtraction, and literal values can only range from 0 to 07fffffff in hexadecimal notation. So if you want to perform subtraction, what is typically done is adding a negative. Hence you'll see code such as:
		subtract - + .
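In C terms, the definition above amounts to the following sketch. The `ns_` function names are hypothetical, and the sketch assumes 32-bit two's complement cells; the arithmetic is done on unsigned values so that wraparound is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* - : negate the top of the stack. Converting the unsigned result back
   to int32_t is implementation-defined in C, but yields two's
   complement on every mainstream compiler. */
int32_t ns_negate(int32_t a) {
    return (int32_t)(0u - (uint32_t)a);
}

/* + : add the top two cells, wrapping on overflow. */
int32_t ns_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* subtract - + . : negate the top of stack, then add. */
int32_t ns_subtract(int32_t a, int32_t b) {
    int32_t result = ns_add(a, ns_negate(b));
    assert(ns_add(result, b) == a);   /* sanity check: (a - b) + b == a */
    return result;
}

/* Literals cannot be negative, so -1 has to be built as "1 -". */
int32_t ns_minus_one(void) {
    return ns_negate(1);
}
```

The same trick explains the `1 -` that opens the Term methods later on: it is the only way to spell the address -1.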
A Simple Program
Now that you understand the Core object methods, I can demonstrate a little program that I wrote to test the keyboard handling and printing through an ncurses interface. Currently, if you read or write to an address of -1, you'll interface with the keyboard and terminal window. This is the only device interface on the new C VM for now, and I'll bring it in line with the OpenGL + stereo sound of the old one soon. But for experimenting with new ideas, the current code base is sufficient. Without further ado, a simple program:
Copyright 2009 David J. Goehrig
The Term object provides basic interaction with the terminal window by driving character values out the port
located at 0ffffffff. Currently that address is the read/write port for all text io.
	Term
		key 1 - @ # .
		emit 1 - $ , .
The Character object tests to see if the tos has the given key value
	Character
		space 32 .
		tab 9 .
		enter 10 .
		whitespace? : space = ^ tab = | ^ enter = | .
The KeyTester object tests to see if various characters entered at the keyboard are white space
	KeyTester
		run Term key : emit Character whitespace? 66 + Term emit _ 32 emit .
Finally the App start method is the thing that kicks this whole thing off.
	App
		start 1 - ( KeyTester run ) .
The End
Comments, Legible Code, and Formatting
In this example, you can see how comments (lines which start with no tabs) are intertwined with code (lines which start with 1 or more tabs). This design allows documentation to be a first-class citizen. The excuse programmers give for not documenting their code is that as the code changes, the documentation drifts out of date. In most programming languages, comments require special formatting: extra delimiting characters which set them off from code. In NewScript, it is code that we distinguish, by indenting it. This makes it easy to maintain your inline documentation, as you're not fighting your code formatting. Unlike other languages, NewScript has no end-of-line comments.
Objects are declared by writing a word on a line with a single tab before it. By convention the object's name should start with a capital letter. This makes it easy to distinguish from ordinary method calls. When an object's name appears in code, it changes the context, so that all method invocations are sent to it rather than to the object currently being defined. This allows you to change context, as necessary, within a definition.
Methods of objects are declared by placing two tabs on the line before the name of the method for the current object. If you changed context in the last definition, the context will be reset to the proper object. Methods can be called recursively, and require no special care, beyond the correct context being set. A method of the current object is invoked simply by writing the method name in a definition. A method which appears before a period or question mark will be optimized as a tail call. As a rule, loops containing method calls should not contain nested function calls more than 6 deep.
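The tab rules above are simple enough to sketch as a line classifier in C. This is an illustration, not the NewScript compiler's code: `LineKind`, `leading_tabs`, and `classify_line` are hypothetical names, and treating lines with more than two tabs as method definitions is an assumption the text does not settle.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LINE_COMMENT, LINE_OBJECT, LINE_METHOD } LineKind;

/* Count the tab characters at the start of a line. */
size_t leading_tabs(const char *line) {
    size_t n = 0;
    while (line[n] == '\t') n++;
    return n;
}

/* Classify a source line by its indentation alone. */
LineKind classify_line(const char *line) {
    assert(line != NULL);
    switch (leading_tabs(line)) {
    case 0:  return LINE_COMMENT;   /* no tabs: documentation */
    case 1:  return LINE_OBJECT;    /* one tab: object declaration */
    default: return LINE_METHOD;    /* assumption: two or more tabs */
    }
}
```

Since the first character of a line decides everything, prose never needs a delimiter and can contain any punctuation at all.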
NewScript requires highly factored code. The flow control structures are designed to promote definitions that fit on a single line. The use of parentheses to indicate a looping construct intentionally matches some data-flow analysis techniques, which use similar notation to indicate repetition. This sample application uses a loop with a loop count of -1, which is the largest loop count you can have. To perform a truly infinite loop, one would skip the stack-expensive loop construct and use a recursive function instead, like this:
	App
		start KeyTester run App start .
NB: We had to write out the recursive call to App start in full, because we had changed the context to KeyTester! This is similar to how KeyTester run switches between Term and Character and back to Term to handle the io and value tests. All in all, this syntax pattern closely matches the subject-verb agreement that most English speakers will find refreshingly familiar.