Mycelium/hyphae/instructions.toml
Ava Affine 258ee56149
WIP: elaborate on Hyphae in instructions.toml
Signed-off-by: Ava Affine <ava@sunnypup.io>
2025-08-14 21:35:30 +00:00

description = """
HyphaeVM is a bytecode VM that aims to provide a simplified instruction set to
language implementors and other programmers who wish to use higher level
features without making too many compromises on overhead or performance.
The simplified instruction set greatly reduces the work in language design and
allows for simpler compilers overall. Meanwhile, the VM still meets performance
needs for modern application development.
HyphaeVM contains an instruction set, instruction set implementation, garbage
collection (reference counting), error handling, a dynamic number package,
vector-based data types, cons-cell-based dynamic data types, programmatically
extensible trap functions, and faux-registers for mutable access to datum in an
otherwise immutable stack-based VM.
"""
datum = """
HyphaeVM instructions operate on Datum. A Datum can hold one of many data types
(see data types). The Datum type is implemented as a union type over each
data type's underlying form. Each Datum as stored in the VM is reference
counted. Each Datum will be automatically deallocated when it is no longer
referenced anywhere in the VM state.
Given that datum are reference counted, it is possible to make both shallow and
deep copies of a source datum (see instructions: link and dupl). Information on
whether a datum is a shallow or deep copy of another datum is not accessible at
runtime without custom trap functions. It is up to the programmer to track what
they themselves have created.
Best of luck, friend.
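As an illustration only, the difference between link and dupl can be modeled in
Rust with Rc. This is a sketch, not the VM's implementation: the Datum alias
and both helper functions below are hypothetical stand-ins.
```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for a reference-counted datum holding some bytes.
type Datum = Rc<RefCell<Vec<u8>>>;

// Shallow copy, as link is described: a new handle to the same allocation.
// The reference count goes up by one.
fn link(src: &Datum) -> Datum {
    Rc::clone(src)
}

// Deep copy, as dupl is described: a fresh allocation with its own count.
fn dupl(src: &Datum) -> Datum {
    Rc::new(RefCell::new(src.borrow().clone()))
}
```
In this model a write through the original is visible through the shallow copy
but not through the deep copy.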
"""
error_handling = """
The VM has an error_state flag and can store any given datum as an error.
Use the PANIC instruction to store an error, set the error state, and halt
HyphaeVM.
"""
sym_table = """
A symbol table is provided as part of HyphaeVM. It maps symbols to valid
addresses (see addressing modes). This is not provided for the implementation of
variables in languages. It is recommended that any {trans|com}piler implemented
for HyphaeVM reduce variables to Datum on the stack. However, the symbol table
is very useful for linking with library code or adding debug symbols to an
application.
"""
traps = """
HyphaeVM includes a trap vector. VM extenders can use this to store platform or
language specific functions that can then be called from bytecode.
"""
[[registers]]
name = "expr"
description = """
The expr register acts as a default return value store for instructions that
generate new data. Many instructions will set expr. Some instructions will even
use expr as an input.
The expr register provides mutable access.
"""
[[registers]]
name = "operand"
description = """
There are four operand registers. Each can be used as scratch space for
operating on Datum without pushing to or popping from the stack.
The operand registers provide mutable access.
"""
[[registers]]
name = "error"
description = """
The error register is set by PANIC and is accessed by the VM to explain an
error state.
The error register does not provide mutable access.
"""
[[registers]]
name = "ictr"
description = """
The ictr register plays the role of the familiar "pc" register in many CPUs,
with the caveat that the program is indexed per instruction and not per byte.
Because the VM has its own logic to deserialize instructions from bytecode,
this rules out a whole class of errors where a bad offset causes the
instruction loader to start decoding in the middle of an operand.
The ictr register does not hold a datum, just a native unsigned integer
(usize).
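A minimal sketch of per-instruction indexing, assuming a program that has
already been deserialized into a list of instructions. The Instr enum, the run
function, and its max_steps bound are hypothetical and exist only for this
illustration.
```rust
// Hypothetical three-instruction subset, already deserialized from bytecode.
enum Instr {
    Nop,
    Jmp(usize), // target is an instruction index, like @N
    Halt,
}

// Runs the program and returns the sequence of ictr values visited.
// max_steps is a safety bound for this sketch, not a VM feature.
fn run(program: &[Instr], max_steps: usize) -> Vec<usize> {
    let mut ictr: usize = 0;
    let mut trace = Vec::new();
    for _ in 0..max_steps {
        let instr = match program.get(ictr) {
            Some(instr) => instr,
            None => break, // ran off the end of the program
        };
        trace.push(ictr);
        match instr {
            Instr::Nop => ictr += 1,
            Instr::Jmp(target) => ictr = *target,
            Instr::Halt => break,
        }
    }
    trace
}
```
Because ictr counts instructions, a jump target can never land inside an
operand in this model.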
"""
[[data_types]]
name = "number"
description = """
The dynamic number type is defined in the 'Organelle' package. It is a number
built to enable implementation of the Scheme R7RS "small" specification. The
number type may be stored using any of a variety of underlying implementations.
NOTE: The number type is currently undergoing a redesign and will be
reimplemented as a more efficient and predictable type.
"""
[[data_types]]
name = "string"
description = """
The string type is implemented by a vector of bytes. It implements a superset
of the functionality that a bytevector implements.
"""
[[data_types]]
name = "bool"
description = """
The boolean type is represented by Rust's native bool type.
"""
[[data_types]]
name = "cons"
description = """
The cons cell is implemented as a pair of datum. This can contain any type in
either field. Data is referenced and not fully encapsulated within this type.
The cons cell can be used to create linked lists, or any other dynamic data type
that relies on heap allocated units.
"""
[[data_types]]
name = "char"
description = "a single byte"
[[data_types]]
name = "vector"
description = """
A vector is a list of Datum stored in a contiguous block of memory. It is
represented by the Rust Vector type.
"""
[[data_types]]
name = "ByteVector"
description = "A bytevector is a vector that only contains individual bytes"
[[data_types]]
name = "None"
description = """
The none datum is a null type. It is not checkable or creatable by any
instruction except clear.
It is requested that programmers refrain from implementing custom traps to use
this type. Doing so is incredibly bad form. If you find yourself attempting to
use None datums, it is advised that you rethink your program
logic.
"""
[[addressing_modes]]
name = "expression"
mutable = true
symbol = "$expr"
example = "inc $expr"
description = """
The expression register is used as a default output, or input, by many
instructions (see registers).
"""
[[addressing_modes]]
name = "operand"
mutable = true
symbol = "$oper<N>"
example = "add $oper1, $oper2"
description = """
There are four operand registers N=(0, 1, 2, 3, and 4) (see registers).
"""
[[addressing_modes]]
name = "stack"
mutable = false
symbol = "%N"
example = "dupl %0, $expr"
description = """
Stack addressing mode takes an index (N). This index is used to get the Nth
element from the top of the stack.
Keep in mind that any push instruction will then shift the element that a given
stack index refers to.
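A small sketch of that shifting effect. It assumes indexes are zero based, so
that %0 is the top of the stack, and it models the stack as a slice whose last
element is the top. The stack_ref helper is hypothetical.
```rust
// Hypothetical helper: resolve %N by counting down from the top of the stack.
// Returns None when N reaches past the bottom.
fn stack_ref<T>(stack: &[T], n: usize) -> Option<&T> {
    stack.iter().rev().nth(n)
}
```
After one more push, the element that was at index 0 is found at index 1.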
"""
[[addressing_modes]]
name = "instruction"
mutable = false
symbol = "@N"
example = "jmp @100"
description = """
Instruction addressing takes an index (N). The index represents the Nth
instruction in the program. Given how deserialization works in HyphaeVM, this
index does not have to account for operands... just instructions.
"""
[[addressing_modes]]
name = "numeric"
mutable = false
symbol = "N"
example = "const $expr, 100"
description = """
Numeric addressing mode accepts a single unsigned 8-bit integer as an argument.
Not many instructions will read constants. Most will require that you use the
CONST instruction to construct a real datum for use in the program.
"""
[[addressing_modes]]
name = "character"
mutable = false
symbol = "'N'"
example = "const $expr, 'c'"
description = """
Character addressing mode accepts a single character as an argument.
Not many instructions will read constants. Most will require that you use the
CONST instruction to construct a real datum for use in the program.
"""
[[addressing_modes]]
name = "boolean"
mutable = false
symbol = "{true|false}"
example = "const $expr, true"
description = """
Boolean addressing mode accepts a single boolean (true or false) as an argument.
Not many instructions will read constants. Most will require that you use the
CONST instruction to construct a real datum for use in the program.
"""
[[instructions]]
name = "trap"
args = ["index"]
output = "result of function"
description = """
The trap instruction will accept as its argument only a numeric constant.
This constant will be used as an index into the VM trap vector. Once accessed,
the VM triggers the corresponding callback, which may vastly mutate VM state.
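A minimal sketch of a trap vector, assuming callbacks are plain functions that
receive mutable access to the VM. The TrapVm struct, its fields, and
double_expr are hypothetical; the real VM state and callback signature may
differ.
```rust
// Hypothetical minimal VM state: one integer stands in for the expr register.
struct TrapVm {
    expr: i64,
    traps: Vec<fn(&mut TrapVm)>,
}

impl TrapVm {
    // Look up the callback at index and run it against the VM state.
    // The index is a u8 to match the numeric addressing mode.
    fn trap(&mut self, index: u8) -> Result<(), String> {
        let callback = self.traps.get(usize::from(index)).copied();
        match callback {
            Some(callback) => {
                callback(self);
                Ok(())
            }
            None => Err(format!("no trap registered at index {}", index)),
        }
    }
}

// Example callback a VM extender might register.
fn double_expr(vm: &mut TrapVm) {
    vm.expr *= 2;
}
```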
"""
[[instructions]]
name = "bind"
args = ["name", "operand"]
output = ""
description = """
The bind instruction will accept only a string datum as its name input. It
then maps the name to whatever address the operand input references in the VM's
symbol table.
"""
[[instructions]]
name = "unbind"
args = ["name"]
output = ""
description = """
The unbind instruction will accept only a string datum as its name operand. It
then removes the mapping that corresponds to name from the VM's symbol table.
"""
[[instructions]]
name = "bound"
args = ["name"]
output = "expr = true if name is bound"
description = """
The bound instruction will accept only a string datum as its name operand. It
will test if the name is already bound in the VM's symbol table. The expression
register will be set to a boolean datum representing whether or not the name is
bound.
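The bind, unbind, and bound instructions can be pictured as operations on a
map from names to addresses. This is a sketch under assumptions: the Address
enum and the SymTable type are hypothetical and cover only two addressing
modes.
```rust
use std::collections::HashMap;

// Hypothetical subset of the addresses a name could be bound to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Address {
    Stack(usize),       // like %N
    Instruction(usize), // like @N
}

// Hypothetical symbol table: names mapped to addresses.
struct SymTable {
    map: HashMap<String, Address>,
}

impl SymTable {
    fn new() -> Self {
        SymTable { map: HashMap::new() }
    }

    // bind: map name to addr.
    fn bind(&mut self, name: &str, addr: Address) {
        self.map.insert(name.to_string(), addr);
    }

    // unbind: remove the mapping for name, if any.
    fn unbind(&mut self, name: &str) {
        self.map.remove(name);
    }

    // bound: report whether name currently has a mapping.
    fn bound(&self, name: &str) -> bool {
        self.map.contains_key(name)
    }
}
```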
"""
[[instructions]]
name = "push"
args = ["operand"]
output = ""
description = """
The push instruction accepts one operand of any type. It will push a deep copy
of the input onto the VM's stack.
"""
[[instructions]]
name = "pop"
args = []
output = "first datum on top of stack"
description = """
The pop instruction removes the element at the top of the VM's stack and sets
the expression register to the removed element.
"""
[[instructions]]
name = "enter"
args = []
output = ""
description = """
The enter instruction creates a new stack frame. Subsequent push instructions
apply new elements to a separate stack that corresponds to this frame. Stack
indexes will still access across all frames as if they were one unified stack.
"""
[[instructions]]
name = "exit"
args = []
output = ""
description = """
The exit instruction deletes the current stack frame. All information in it is
simply discarded. The stack fragment corresponding to the previous stack frame
is then subject to subsequent push or pop operations.
Together, enter and exit are useful for making sure that a dynamic routine that
makes use of the stack is properly cleaned up afterward.
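One way to picture enter and exit is a stack of frames, where push and pop act
on the newest frame while indexes read across all frames. This is a sketch
under assumptions: the FramedStack type is hypothetical, indexes are taken to
be zero based from the top, and exit on the outermost frame is treated as a
no-op because that case is not specified here.
```rust
// Hypothetical model: one Vec per frame; the last frame is the active one.
struct FramedStack<T> {
    frames: Vec<Vec<T>>,
}

impl<T> FramedStack<T> {
    fn new() -> Self {
        FramedStack { frames: vec![Vec::new()] }
    }

    // push and pop only touch the active frame.
    fn push(&mut self, item: T) {
        self.frames.last_mut().expect("at least one frame").push(item);
    }

    fn pop(&mut self) -> Option<T> {
        self.frames.last_mut().and_then(|frame| frame.pop())
    }

    // enter: start a new, empty frame.
    fn enter(&mut self) {
        self.frames.push(Vec::new());
    }

    // exit: discard the active frame and everything pushed into it.
    fn exit(&mut self) {
        if self.frames.len() > 1 {
            self.frames.pop();
        }
    }

    // Stack indexing reads across all frames as one unified stack.
    fn get(&self, n: usize) -> Option<&T> {
        self.frames
            .iter()
            .rev()
            .flat_map(|frame| frame.iter().rev())
            .nth(n)
    }
}
```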
"""
[[instructions]]
name = "link"
args = ["src", "dest"]
output = ""
description = """
The link instruction shallow copies the src operand into the destination that
the dest operand specifies. A shallow copy increases the reference count of the
source datum.
Destination operand requires mutable access.
For more information on shallow vs deep copy see datum.
"""
[[instructions]]
name = "dupl"
args = ["src", "dest"]
output = ""
description = """
The dupl instruction deep copies the src operand into the destination that the
dest operand specifies.
Destination operand requires mutable access.
For more information on shallow vs deep copy see datum.
"""
[[instructions]]
name = "clear"
args = ["dest"]
output = ""
description = """
The clear instruction sets whatever destination is specified by its operand to
a None datum.
Destination operand requires mutable access.
Please do not use the clear instruction to try to work with None datum. It is
provided for cleanup/cleanliness purposes. This can be used to destroy a
shallow copy, decreasing its reference count.
"""
[[instructions]]
name = "nop"
args = []
output = ""
description = "no operation"
[[instructions]]
name = "halt"
args = []
output = ""
description = """
The halt instruction sets the VM running state to false. This halts the VM.
"""
[[instructions]]
name = "panic"
args = ["error"]
output = ""
description = """
The panic instruction accepts an error operand and shallow copies it into the
error register. Then the error_state flag in the VM is set and the VM is halted.
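The three effects of panic can be sketched as follows. The ErrVm struct and its
field names are hypothetical, and a reference-counted String stands in for an
arbitrary error datum.
```rust
use std::rc::Rc;

// Hypothetical minimal VM state for this illustration.
struct ErrVm {
    running: bool,
    error_state: bool,
    error: Option<Rc<String>>,
}

impl ErrVm {
    fn panic(&mut self, err: &Rc<String>) {
        // Shallow copy into the error register: shares the allocation.
        self.error = Some(Rc::clone(err));
        // Mark the VM as being in an error state.
        self.error_state = true;
        // Halt.
        self.running = false;
    }
}
```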
"""
[[instructions]]
name = "jmp"
args = ["addr"]
output = ""
description = """
The jump (jmp) instruction accepts only an instruction address (see addressing
modes). It sets the ictr register to the referenced instruction index.
"""
[[instructions]]
name = "jmpif"
args = ["addr"]
output = ""
description = """
The conditional jump (jmpif) instruction accepts only an instruction address
(see addressing modes). It sets the ictr register to the referenced instruction
index if and only if the expression register holds a boolean true value, so
make sure to set the expression register first.
"""
[[instructions]]
name = "eq"
args = ["a", "b"]
output = "a == b"
description = """
The eq instruction performs an equality test and sets the expression register
to the resulting boolean value. In this case "equality" is defined by the Rust
PartialEq trait logic as derived across the datum type (hyphae/src/heap.rs).
"""
[[instructions]]
name = "lt"
args = ["a", "b"]
output = "a < b"
description = """
The lt instruction accepts two number datum and performs a numeric less than
test. The expression register is set to a boolean value based on whether the
first input is strictly less than the second input.
"""
[[instructions]]
name = "gt"
args = ["a", "b"]
output = "a > b"
description = """
The gt instruction accepts two number datum and performs a numeric greater than
test. The expression register is set to a boolean value based on whether the
first input is strictly greater than the second input.
"""
[[instructions]]
name = "lte"
args = ["a", "b"]
output = "a <= b"
description = """
The lte instruction accepts two number datum and performs a numeric
less-than-or-equal test. The expression register is set to a boolean value
based on whether the first input is less than or equal to the second input.
"""
[[instructions]]
name = "gte"
args = ["a", "b"]
output = "a >= b"
description = """
The gte instruction accepts two number datum and performs a numeric
greater-than-or-equal test. The expression register is set to a boolean value
based on whether the first input is greater than or equal to the second input.
"""
[[instructions]]
name = "bool_not"
args = []
output = "expr = !expr"
description = """
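The bool_not instruction takes no arguments. It replaces the boolean value held
in the expression register with its logical negation.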
"""
[[instructions]]
name = "bool_and"
args = ["a", "b"]
output = "a && b"
description = "boolean and"
[[instructions]]
name = "bool_or"
args = ["a", "b"]
output = "a || b"
description = "boolean or"
[[instructions]]
name = "byte_and"
args = ["a", "b"]
output = "a & b"
description = "bitwise and"
[[instructions]]
name = "byte_or"
args = ["a", "b"]
output = "a | b"
description = "bitwise or"
[[instructions]]
name = "xor"
args = ["a", "b"]
output = "a xor b"
description = "bitwise exclusive or"
[[instructions]]
name = "byte_not"
args = []
output = "expr = !expr"
description = "bitwise not"
[[instructions]]
name = "add"
args = ["a", "b"]
output = "a + b"
description = "numeric addition"
[[instructions]]
name = "sub"
args = ["a", "b"]
output = "a - b"
description = "numeric subtraction"
[[instructions]]
name = "mul"
args = ["a", "b"]
output = "a * b"
description = "numeric multiplication"
[[instructions]]
name = "fdiv"
args = ["a", "b"]
output = "a / b"
description = "numeric FLOAT division"
[[instructions]]
name = "idiv"
args = ["a", "b"]
output = "a / b"
description = "numeric INTEGER division"
[[instructions]]
name = "pow"
args = ["a", "b"]
output = "a ^ b"
description = "numeric operation to raise a to the power of b"
[[instructions]]
name = "modulo"
args = ["a", "b"]
output = "a % b"
description = "numeric modulo operation"
[[instructions]]
name = "rem"
args = ["a", "b"]
output = "remainder from a / b"
description = "remainder from integer division"
[[instructions]]
name = "inc"
args = ["src"]
output = ""
description = "increments number at source"
[[instructions]]
name = "dec"
args = ["src"]
output = ""
description = "decrements number at source"
[[instructions]]
name = "ctos"
args = ["src"]
output = ""
description = "mutates a char datum into a string datum"
[[instructions]]
name = "cton"
args = ["src"]
output = ""
description = "mutates a char datum into a number datum"
[[instructions]]
name = "ntoc"
args = ["src"]
output = ""
description = "mutates a number datum into a char datum"
[[instructions]]
name = "ntoi"
args = ["src"]
output = ""
description = "mutates a number datum into its exact form"
[[instructions]]
name = "ntoe"
args = ["src"]
output = ""
description = "mutates a number datum into its inexact form"
[[instructions]]
name = "const"
args = ["dst", "data"]
output = ""
description = "sets dst location to a datum built from constant data (numeric, character, or boolean)"
[[instructions]]
name = "mkvec"
args = []
output = "a blank vector"
description = "creates a new vector"
[[instructions]]
name = "mkbvec"
args = []
output = "a blank bytevector"
description = "creates a blank bytevector"
[[instructions]]
name = "mkstr"
args = []
output = "an empty string"
description = "creates a new empty string"
[[instructions]]
name = "index"
args = ["collection", "index"]
output = "collection[index]"
description = "extracts element from collection at index"
[[instructions]]
name = "length"
args = ["collection"]
output = "length of collection"
description = "calculates length of collection"
[[instructions]]
name = "subsl"
args = ["collection", "start", "end"]
output = "collection[start:end]"
description = "returns a subset from collection denoted by start and end indexes"
[[instructions]]
name = "inser"
args = ["collection", "elem", "idx"]
output = ""
description = "inserts an element at specified index into a collection"
[[instructions]]
name = "cons"
args = ["left", "right"]
output = "resulting collection"
description = "either append right to left or make new list from both"
[[instructions]]
name = "car"
args = ["list"]
output = "returns first element in cons cell"
description = "takes an AST and returns the first element in its top level cons cell"
[[instructions]]
name = "cdr"
args = ["list"]
output = "returns last element in cons cell"
description = "takes an AST and returns the last element in its top level cons cell"
[[instructions]]
name = "concat"
args = ["string_l", "string_r"]
output = "string_l+string_r"
description = "concatenates string r to string l and returns a new string"
[[instructions]]
name = "s_append"
args = ["parent", "child"]
output = ""
description = "appends child character to parent string in place"