WIP: elaborate on Hyphae in instructions.toml
All checks were successful
per-push tests / build (push) Successful in 1m26s
per-push tests / test-frontend (push) Successful in 1m35s
per-push tests / test-utility (push) Successful in 1m41s
per-push tests / test-backend (push) Successful in 1m50s
per-push tests / timed-decomposer-parse (push) Successful in 1m50s

Signed-off-by: Ava Affine <ava@sunnypup.io>
Ava Apples Affine 2025-08-14 07:20:03 +00:00
parent f48867db42
commit 258ee56149
3 changed files with 307 additions and 45 deletions


@ -1,127 +1,357 @@
# TODO: add the following info description = """
# - introductory VM info (description, list of components) HyphaeVM is a bytecode VM that aims to provide a simplified instruction set to
# - info on the different data types language implementors and other programmers who wish to use higher level
# - info on garbage collection features without making too many compromises on overhead or performance.
# - info on program execution
# - info on error handling The simplified instruction set greatly reduces the work in language design and
# - info on traps allows for simpler compilers overall. Meanwhile, the VM still meets performance
# - info on numbers needs for modern application development.
# - info on symtable (and its uses)
HyphaeVM contains an instruction set, instruction set implementation, garbage
collection (reference counting), error handling, dynamic number package, vector
based data types, cons cell based dynamic data types, trap functions that
are programmatically extendable, as well as faux-registers for mutable access
to datum in an otherwise immutable stack based VM.
"""
datum = """
HyphaeVM instructions operate on Datum. A Datum can hold one of many data types
(see data types). The Datum type is implemented as a union type over each
data type's underlying form. Each Datum as stored in the VM is reference
counted. Each Datum will be automatically deallocated when it is no longer
referenced anywhere in the VM state.
Given that datum are reference counted, it is possible to make both shallow and
deep copies of a source datum (see instructions: link and dupl). Information on
whether a datum is a shallow or deep copy of another datum is not accessible at
runtime without custom trap functions. It is up to the programmer to track what
they themselves have created.
Best of luck, friend.
"""
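The shallow versus deep copy distinction described in the datum section can be sketched with the standard library's `Rc`. This is a minimal illustration, not the HyphaeVM implementation: the `Datum` enum below is a hypothetical single-variant stand-in, and the `link` and `dupl` functions only mirror what the instructions of the same name are described as doing.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a reference-counted datum (not the real HyphaeVM type).
#[derive(Debug, Clone, PartialEq)]
enum Datum {
    String(Vec<u8>),
}

/// Shallow copy: share the same allocation and bump the reference count.
fn link(src: &Rc<Datum>) -> Rc<Datum> {
    Rc::clone(src)
}

/// Deep copy: clone the underlying value into a fresh, independent allocation.
fn dupl(src: &Rc<Datum>) -> Rc<Datum> {
    Rc::new((**src).clone())
}

/// Returns the reference count of the original datum after a link,
/// after a dupl, and after the linked copy is dropped.
fn refcount_demo() -> (usize, usize, usize) {
    let original = Rc::new(Datum::String(b"hyphae".to_vec()));

    let shallow = link(&original);
    let after_link = Rc::strong_count(&original);

    let deep = dupl(&original);
    let after_dupl = Rc::strong_count(&original);

    // The deep copy is equal in value but lives in a different allocation.
    assert_eq!(*deep, *original);
    assert!(!Rc::ptr_eq(&deep, &original));

    // Dropping the shallow copy releases one reference.
    drop(shallow);
    let after_drop = Rc::strong_count(&original);

    (after_link, after_dupl, after_drop)
}
```

Only the shallow copy changes the original's reference count, which is why the value is deallocated only once every linked copy is gone.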
error_handling = """
The VM has fields for error_state and can store any given datum as an error.
Use the PANIC instruction to store an error, set the error state, and halt
HyphaeVM.
"""
sym_table = """
A symbol table is provided as part of HyphaeVM. It maps symbols to valid
addresses (see addressing modes). This is not provided for the implementation of
variables in languages. It is recommended that any {trans|com}piler implemented
for HyphaeVM reduce variables to Datum on the stack. However, the symbol table
is very useful for linking with library code or adding debug symbols to an
application.
"""
traps = """
HyphaeVM includes a trap vector. VM extenders can use this to store platform or
language specific functions that can then be called from bytecode.
"""
[[registers]]
name = "expr"
description = """
The expr register acts as a default return value store for instructions that
generate new data. Many instructions will set expr. Some instructions will even
use expr as an input.
The expr register provides mutable access.
"""
[[registers]]
name = "operand"
description = """
There are four operand registers. These each can be used as a type of scratch
space for operating on Datum without pushing to or popping from the stack.
The operand registers provide mutable access.
"""
[[registers]]
name = "error"
description = """
The error register is set by PANIC and is accessed by the VM to explain an
error state.
The error register does not provide mutable access.
"""
[[registers]]
name = "ictr"
description = """
The ictr register acts as the well known "pc" register in many CPUs... With the
caveat that the program is indexed per instruction and not per byte. This is
because the VM has its own logic to deserialize instructions from bytecode so
there is no reason not to rule out a whole class of errors where a bad offset
causes the instruction loader to start loading with some operand.
The ictr register does not hold a datum. Just an underlying native unsigned
integer (usize).
"""
[[data_types]]
name = "number"
description = """
The dynamic number type is defined in the 'Organelle' package. It is a number
built to enable implementation of the Scheme R7RS "small" specification. The
number type may be stored with any variety of underlying implementation.
NOTE: The number type is currently undergoing a redesign and will be
reimplemented as a more efficient and predictable type.
"""
[[data_types]]
name = "string"
description = """
The string type is implemented by a vector of bytes. It implements a superset
of the functionality that a bytevector implements.
"""
[[data_types]]
name = "bool"
description = """
The boolean type is implemented as whatever Rust chooses to represent it.
"""
[[data_types]]
name = "cons"
description = """
The cons cell is implemented as a pair of datum. This can contain any type in
either field. Data is referenced and not fully encapsulated within this type.
The cons cell can be used to create linked lists, or any other dynamic data type
that relies on heap allocated units.
"""
[[data_types]]
name = "char"
description = "a single byte"
[[data_types]]
name = "vector"
description = """
A vector is a list of Datum stored in a contiguous block of memory. It is
represented by the Rust Vector type.
"""
[[data_types]]
name = "ByteVector"
description = "A bytevector is a vector that only contains individual bytes"
[[data_types]]
name = "None"
description = """
The none datum is a null type. It is not checkable or creatable by any
instruction except clear.
It is requested that programmers refrain from implementing custom traps to use
this type. Doing so is incredibly bad form. If one is finding themselves
attempting to use None datums it is advised that they rethink their program
logic.
"""
[[addressing_modes]] [[addressing_modes]]
name = "expr" name = "expression"
mutable = true mutable = true
symbol = "$expr" symbol = "$expr"
example = "inc $expr" example = "inc $expr"
description = "The expression register is used as a default output, or input by many instructions." description = """
The expression register is used as a default output, or input by many
instructions (see registers).
"""
[[addressing_modes]] [[addressing_modes]]
name = "operand" name = "operand"
mutable = true mutable = true
symbol = "$oper<N>" symbol = "$oper<N>"
example = "add $oper1, $oper2" example = "add $oper1, $oper2"
description = "There are four operand registers N=(0, 1, 2, 3, and 4). They are for storing mutable data." description = """
There are four operand registers N=(0, 1, 2, 3, and 4) (see registers).
"""
[[addressing_modes]] [[addressing_modes]]
name = "stack" name = "stack"
mutable = false mutable = false
symbol = "%N" symbol = "%N"
example = "dupl %0, $expr" example = "dupl %0, $expr"
description = "Stack addressing mode takes an index in to the stack to read from." description = """
Stack addressing mode takes an index (N). This index is used to get the Nth
element from the top of the stack.
Keep in mind that any push instruction will then shift the element that a given
stack index refers to.
"""
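The index shift that the stack addressing description warns about can be sketched as follows. This assumes that index 0 refers to the top of the stack and that the stack is a `Vec` whose last element is the top; both are assumptions for illustration, and `from_top` is a hypothetical helper rather than a HyphaeVM function.

```rust
/// Hypothetical helper: fetch the Nth element counted from the top of a
/// stack stored as a slice whose last element is the top.
fn from_top<T>(stack: &[T], n: usize) -> Option<&T> {
    if n < stack.len() {
        Some(&stack[stack.len() - 1 - n])
    } else {
        None
    }
}

/// Returns what index 0 refers to before a push, what index 0 refers to
/// after the push, and what index 1 refers to after the push.
fn shift_demo() -> (i32, i32, i32) {
    let mut stack = vec![10, 20];
    let before = *from_top(&stack, 0).unwrap();

    stack.push(30);
    let after = *from_top(&stack, 0).unwrap();
    let shifted = *from_top(&stack, 1).unwrap();

    (before, after, shifted)
}
```

After the push, the element previously reachable at index 0 is reachable at index 1, so any stack index computed before a push needs to be adjusted.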
[[addressing_modes]] [[addressing_modes]]
name = "instruction" name = "instruction"
mutable = false mutable = false
symbol = "@N" symbol = "@N"
example = "jmp @100" example = "jmp @100"
description = "Instruction addressing mode indexes by instruction into the program." description = """
Instruction addressing takes an index (N). The index represents the Nth
instruction in the program. Given how deserialization works in HyphaeVM, this
index does not have to account for operands... just instructions.
"""
[[addressing_modes]] [[addressing_modes]]
name = "numeric" name = "numeric"
mutable = false mutable = false
symbol = "N" symbol = "N"
example = "const $expr, 100" example = "const $expr, 100"
description = "Numeric addressing mode provides read only integer constants to instructions" description = """
Numeric addressing mode accepts a single unsigned 8 bit integer as an argument.
Not many instructions will read constants. Most will require that you use the
CONST instruction to construct a real datum for use in the program.
"""
[[addressing_modes]] [[addressing_modes]]
name = "char" name = "character"
mutable = false mutable = false
symbol = "'N'" symbol = "'N'"
example = "const $expr, 'c'" example = "const $expr, 'c'"
description = "Char addressing mode provides read only character constants to instructions" description = """
Character addressing mode accepts a single character as an argument.
Not many instructions will read constants. Most will require that you use the
CONST instruction to construct a real datum for use in the program.
"""
[[addressing_modes]] [[addressing_modes]]
name = "boolean" name = "boolean"
mutable = false mutable = false
symbol = "{true|false}" symbol = "{true|false}"
example = "const $expr, true" example = "const $expr, true"
description = "Boolean addressing mode provides read only booleans to instructions" description = """
Boolean addressing mode accepts a single boolean (true or false) as an argument.
Not many instructions will read constants. Most will require that you use the
CONST instruction to construct a real datum for use in the program.
"""
[[instructions]] [[instructions]]
name = "trap" name = "trap"
args = ["index"] args = ["index"]
output = "result of function" output = "result of function"
description = "triggers callback in trap vector at index" description = """
The trap instruction will accept as its argument only a numeric constant.
This constant will be used as an index into the VM trap vector. Once accessed,
the VM triggers the corresponding callback, which may vastly mutate VM state.
"""
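A trap vector of the kind described for the trap instruction can be sketched as a list of callbacks indexed by a small integer. Everything named here is hypothetical: the `Vm` struct, its `expr` field as a plain integer, the `fn(&mut Vm)` callback signature, and the error string are assumptions for illustration and are not taken from the HyphaeVM source.

```rust
/// Hypothetical VM with a single integer "expr" register and a trap vector.
struct Vm {
    expr: i64,
    traps: Vec<fn(&mut Vm)>,
}

impl Vm {
    /// Look up the callback at `index` and run it against the VM state.
    fn trap(&mut self, index: u8) -> Result<(), String> {
        // Function pointers are Copy, so the borrow of `self.traps`
        // ends before the callback receives `&mut self`.
        let callback = *self
            .traps
            .get(index as usize)
            .ok_or_else(|| format!("no trap at index {}", index))?;
        callback(self);
        Ok(())
    }
}

/// Example platform-specific function: doubles the expr register.
fn double_expr(vm: &mut Vm) {
    vm.expr *= 2;
}
```

Because the callback receives mutable access to the whole VM, it can change any part of the state, which matches the warning that a trap may vastly mutate VM state.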
[[instructions]] [[instructions]]
name = "bind" name = "bind"
args = ["name", "operand"] args = ["name", "operand"]
output = "" output = ""
description = "map name to operand in sym table." description = """
The bind instruction will accept only a string datum as its name input. It
then maps the name, in the VM's symbol table, to whatever address the operand
input references.
"""
[[instructions]] [[instructions]]
name = "unbind" name = "unbind"
args = ["name"] args = ["name"]
output = "" output = ""
description = "remove name mapping from sym table." description = """
The unbind instruction will accept only a string datum as its name operand. It
then removes the mapping that corresponds to name from the VM's symbol table.
"""
[[instructions]] [[instructions]]
name = "bound" name = "bound"
args = ["name"] args = ["name"]
output = "expr = true if name is bound" output = "expr = true if name is bound"
description = "test if a name is already bound" description = """
The bound instruction will accept only a string datum as its name operand. It
will test if the name is already bound in the VM's symbol table. The expression
register will be set to a boolean datum representing whether or not the name is
bound.
"""
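The bind, unbind, and bound instructions can be pictured as three operations on a map from names to addresses. The sketch below is an assumption-laden illustration: `SymTable`, `Address`, and `resolve` are hypothetical names, names are plain Rust strings rather than string datum, and a `BTreeMap` is used only because it is a simple ordered map.

```rust
use std::collections::BTreeMap;

/// Hypothetical address type covering two of the addressing modes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Address {
    Stack(usize),
    Instruction(usize),
}

/// Hypothetical symbol table mapping names to addresses.
#[derive(Default)]
struct SymTable {
    map: BTreeMap<String, Address>,
}

impl SymTable {
    /// bind: map `name` to `addr`, replacing any earlier mapping.
    fn bind(&mut self, name: &str, addr: Address) {
        self.map.insert(name.to_string(), addr);
    }

    /// unbind: remove the mapping for `name`, if any.
    fn unbind(&mut self, name: &str) {
        self.map.remove(name);
    }

    /// bound: report whether `name` currently has a mapping.
    fn bound(&self, name: &str) -> bool {
        self.map.contains_key(name)
    }

    /// Look up the address that `name` is bound to.
    fn resolve(&self, name: &str) -> Option<Address> {
        self.map.get(name).copied()
    }
}
```

A linker-style use would bind a library entry point to an instruction address once and resolve it wherever a jump target is needed.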
[[instructions]] [[instructions]]
name = "push" name = "push"
args = ["operand"] args = ["operand"]
output = "" output = ""
description = "pushes deep copy of operand onto stack." description = """
The push instruction accepts one operand of any type. It will push a deep copy
of the input onto the VM's stack.
"""
[[instructions]] [[instructions]]
name = "pop" name = "pop"
args = [] args = []
output = "" output = "first datum on top of stack"
description = "removes element at top of stack." description = """
The pop instruction removes the first element at the top of the VM's stack. The
expression register is set to the element returned in this manner.
"""
[[instructions]] [[instructions]]
name = "enter" name = "enter"
args = [] args = []
output = "" output = ""
description = "create new stack frame" description = """
The enter instruction creates a new stack frame. Subsequent push instructions
apply new elements to a separate stack that corresponds to this frame. Stack
indexes will still access across all frames as if they were one unified stack.
"""
[[instructions]] [[instructions]]
name = "exit" name = "exit"
args = [] args = []
output = "" output = ""
description = "delete current stack frame" description = """
The exit instruction deletes the current stack frame. All information is simply
discarded. The stack fragment corresponding to the previous stack frame is then
subject to subsequent push or pop operations.
Together, enter and exit are useful for making sure that a dynamic routine that
makes use of the stack is properly cleaned up after.
"""
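The frame behaviour described for enter and exit can be sketched as a stack of stacks. This is a hypothetical model, not the HyphaeVM stack type: `FrameStack` holds integers instead of datum, index 0 is assumed to be the top, and refusing to delete the base frame is a choice made here that the source does not specify.

```rust
/// Hypothetical frame stack: each frame is its own Vec and the last frame
/// is the current one.
struct FrameStack {
    frames: Vec<Vec<i64>>,
}

impl FrameStack {
    fn new() -> Self {
        FrameStack { frames: vec![Vec::new()] }
    }

    /// enter: open a new frame that receives subsequent pushes.
    fn enter(&mut self) {
        self.frames.push(Vec::new());
    }

    /// exit: discard the current frame and everything in it.
    /// Assumption: the base frame is never removed.
    fn exit(&mut self) {
        if self.frames.len() > 1 {
            self.frames.pop();
        }
    }

    /// push: add a value to the current frame.
    fn push(&mut self, value: i64) {
        self.frames
            .last_mut()
            .expect("there is always at least one frame")
            .push(value);
    }

    /// Index from the top across all frames as if they were one stack.
    fn get(&self, n: usize) -> Option<i64> {
        self.frames
            .iter()
            .rev()
            .flat_map(|frame| frame.iter().rev())
            .nth(n)
            .copied()
    }
}

/// Returns the top inside a new frame, an element reached across the frame
/// boundary, and the top after the frame is exited.
fn frame_demo() -> (Option<i64>, Option<i64>, Option<i64>) {
    let mut stack = FrameStack::new();
    stack.push(1);
    stack.enter();
    stack.push(2);
    stack.push(3);
    let top_inside = stack.get(0);
    let across = stack.get(2);
    stack.exit();
    let top_after = stack.get(0);
    (top_inside, across, top_after)
}
```

Exiting the frame discards both values pushed inside it in one step, which is the cleanup property the description attributes to pairing enter with exit.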
[[instructions]] [[instructions]]
name = "link" name = "link"
args = ["src", "dest"] args = ["src", "dest"]
output = "" output = ""
description = "shallow copies src into dest" description = """
The link instruction shallow copies the src operand into the destination that
the dest operand specifies. A shallow copy of the source operand increases its
reference count.
Destination operand requires mutable access.
For more information on shallow vs deep copy see datum.
"""
[[instructions]] [[instructions]]
name = "dupl" name = "dupl"
args = ["src", "dest"] args = ["src", "dest"]
output = "" output = ""
description = "deep copies src into dest" description = """
The dupl instruction deep copies the src operand into the destination that the
dest operand specifies.
Destination operand requires mutable access.
For more information on shallow vs deep copy see datum.
"""
[[instructions]] [[instructions]]
name = "clear" name = "clear"
args = ["dest"] args = ["dest"]
output = "" output = ""
description = "clears dest" description = """
The clear instruction sets whatever destination is specified by its operand to
a None datum.
Destination operand requires mutable access.
Please do not use the clear instruction to try to work with None datum. It is
provided for cleanup/cleanliness purposes. This can be used to destroy a
shallow copy, decreasing its reference count.
"""
[[instructions]] [[instructions]]
name = "nop" name = "nop"
@ -133,61 +363,96 @@ description = "no operation"
name = "halt" name = "halt"
args = [] args = []
output = "" output = ""
description = "halts the VM" description = """
The halt instruction sets the VM running state to false. This halts the VM.
"""
[[instructions]] [[instructions]]
name = "panic" name = "panic"
args = ["error"] args = ["error"]
output = "" output = ""
description = "sets error state and halts VM" description = """
The panic instruction accepts an error operand and shallow copies it into the
error register. Then the error_state flag in the VM is set and the VM is halted.
"""
[[instructions]] [[instructions]]
name = "jmp" name = "jmp"
args = ["addr"] args = ["addr"]
output = "" output = ""
description = "sets ictr register to addr" description = """
The jump (jmp) instruction accepts only an instruction address (see addressing
modes). It sets the ictr register to the referenced instruction index.
"""
[[instructions]] [[instructions]]
name = "jmpif" name = "jmpif"
args = ["addr"] args = ["addr"]
output = "" output = ""
description = "if expr register holds true, sets ictr to addr" description = """
The jump-if (jmpif) instruction accepts only an instruction address (see addressing
modes). It sets the ictr register to the referenced instruction index if and
only if the expression register holds a boolean true value... So make sure to
set the expression register.
"""
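The interaction between jmp, jmpif, and the ictr register can be sketched with a tiny interpreter loop that indexes the program per instruction. This is a hypothetical model: the `Instr` enum, the `SetExpr` instruction, the boolean `expr` field, and the choice to advance ictr before executing each instruction are assumptions for illustration, not HyphaeVM's actual dispatch logic.

```rust
/// Hypothetical instruction set, just enough to show control flow.
#[derive(Clone, Copy)]
enum Instr {
    Jmp(usize),
    JmpIf(usize),
    SetExpr(bool),
    Halt,
}

/// Hypothetical VM state: an instruction counter, a boolean expression
/// register, and a running flag.
struct Vm {
    ictr: usize,
    expr: bool,
    running: bool,
}

/// Runs the program and returns the instruction indexes that were executed.
fn run(program: &[Instr]) -> Vec<usize> {
    let mut vm = Vm { ictr: 0, expr: false, running: true };
    let mut trace = Vec::new();

    while vm.running && vm.ictr < program.len() {
        let at = vm.ictr;
        trace.push(at);
        // Default is to fall through to the next instruction.
        vm.ictr += 1;

        match program[at] {
            Instr::Jmp(target) => vm.ictr = target,
            // Only jumps when the expression register holds true.
            Instr::JmpIf(target) => {
                if vm.expr {
                    vm.ictr = target;
                }
            }
            Instr::SetExpr(value) => vm.expr = value,
            Instr::Halt => vm.running = false,
        }
    }

    trace
}
```

Because targets are instruction indexes rather than byte offsets, a jump can never land in the middle of an instruction's operands.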
[[instructions]] [[instructions]]
name = "eq" name = "eq"
args = ["a", "b"] args = ["a", "b"]
output = "a == b" output = "a == b"
description = "equality test" description = """
The eq instruction performs an equality test and sets the expression register
to the resulting boolean value. In this case "equality" is set by the Rust
PartialEq trait logic as derived across the datum type (hyphae/src/heap.rs).
"""
[[instructions]] [[instructions]]
name = "lt" name = "lt"
args = ["a", "b"] args = ["a", "b"]
output = "a < b" output = "a < b"
description = "less than test" description = """
The lt instruction accepts two number datum and performs a numeric less than
test. The expression register is set to a boolean value based on whether the
first input is strictly less than the second input.
"""
[[instructions]] [[instructions]]
name = "gt" name = "gt"
args = ["a", "b"] args = ["a", "b"]
output = "a > b" output = "a > b"
description = "greater than test" description = """
The gt instruction accepts two number datum and performs a numeric greater than
test. The expression register is set to a boolean value based on whether the
first input is strictly greater than the second input.
"""
[[instructions]] [[instructions]]
name = "lte" name = "lte"
args = ["a", "b"] args = ["a", "b"]
output = "a <= b" output = "a <= b"
description = "less than equals test" description = """
The lte instruction accepts two number datum and performs a numeric less than
equals test. The expression register is set to a boolean value based on whether
the first input is less than or equal to the second input.
"""
[[instructions]] [[instructions]]
name = "gte" name = "gte"
args = ["a", "b"] args = ["a", "b"]
output = "a >= b" output = "a >= b"
description = "greater than equals test" description = """
The gte instruction accepts two number datum and performs a numeric greater
than equals test. The expression register is set to a boolean value based on if
the first input is greater than or equal to the second input.
"""
[[instructions]] [[instructions]]
name = "bool_not" name = "bool_not"
args = [] args = []
output = "expr = !expr" output = "expr = !expr"
description = "boolean not" description = """
The
"""
[[instructions]] [[instructions]]
name = "bool_and" name = "bool_and"


@ -22,7 +22,6 @@ use alloc::rc::Rc;
use alloc::vec::Vec; use alloc::vec::Vec;
use alloc::boxed::Box; use alloc::boxed::Box;
use alloc::fmt::Debug; use alloc::fmt::Debug;
use alloc::string::String;
use organelle::Number; use organelle::Number;
@ -147,7 +146,6 @@ pub enum Datum {
Number(Number), Number(Number),
Bool(bool), Bool(bool),
Cons(Cons), Cons(Cons),
Symbol(String),
Char(u8), Char(u8),
String(Vec<u8>), String(Vec<u8>),
Vector(Vec<Gc<Datum>>), Vector(Vec<Gc<Datum>>),
@ -162,7 +160,6 @@ impl Clone for Datum {
Datum::Number(n) => Datum::Number(n.clone()), Datum::Number(n) => Datum::Number(n.clone()),
Datum::Bool(n) => Datum::Bool(n.clone()), Datum::Bool(n) => Datum::Bool(n.clone()),
Datum::Cons(n) => Datum::Cons(n.deep_copy()), Datum::Cons(n) => Datum::Cons(n.deep_copy()),
Datum::Symbol(n) => Datum::Symbol(n.clone()),
Datum::Char(n) => Datum::Char(n.clone()), Datum::Char(n) => Datum::Char(n.clone()),
Datum::String(n) => Datum::String(n.clone()), Datum::String(n) => Datum::String(n.clone()),
Datum::Vector(n) => Datum::Vector(n) =>


@ -255,7 +255,7 @@ impl VM {
// stack ops // stack ops
i::PUSH => self.stack.push_current_stack( i::PUSH => self.stack.push_current_stack(
access!(&instr.1[0]).deep_copy()), access!(&instr.1[0]).deep_copy()),
i::POP => _ = self.stack.pop_current_stack(), i::POP => self.expr = self.stack.pop_current_stack(),
i::ENTER => self.stack.add_stack(), i::ENTER => self.stack.add_stack(),
i::EXIT => self.stack.destroy_top_stack(), i::EXIT => self.stack.destroy_top_stack(),
@ -326,7 +326,7 @@ impl VM {
}; };
let Datum::Number(ref r) = **access!(&instr.1[1]) else { let Datum::Number(ref r) = **access!(&instr.1[1]) else {
e!("illgal argument to IDIV instruction"); e!("illegal argument to IDIV instruction");
}; };
let Fraction(l, 1) = l.make_exact() else { let Fraction(l, 1) = l.make_exact() else {