The compiler does not write code in the target language directly; instead, it outputs an intermediate representation (IR) which can be translated into a variety of target languages. The IR is intended to be an abstract syntax tree for a “lowest common denominator” imperative language with the following features:
if
/else
statements and conditional expressions;break
statements which escape from the inner-most loop;switch
or match
statements, or a similar branching feature.yield
statements (see below);Translating via this IR also creates some opportunities for additional optimisations which may improve the performance of the program and/or reduce the size of the generated code.
The IR has its own types, not all of which are shared by the MJr language’s type system:
byte
: an integer between 0 (inclusive) and 128 (exclusive); used for alphabet symbols in grids or patterns.int
: a signed 32-bit integer.bool
: a Boolean.float
: a double-precision floating-point number.str
: a string.array.const
and array.mutable
: an array of integers, immutable or mutable respectively, which are all between zero (inclusive) and domainSize
(exclusive).grid
, pattern
, prng
, rewriteinfo
and sampler
: pre-defined struct types which can be declared in the MJr runtime library.void
: return type for functions with no return value.Note that the target language only needs to support integers of at least the required sizes; for example, nothing will go wrong if byte
and int
are mapped to the same type in the target language, or if an array is capable of holding integers much larger than its domain size, other than more memory being used. Likewise, the target language does not need to enforce immutability of arrays or dicts, since the generated code won’t attempt to mutate these anyway. These distinctions are only made to potentially allow more optimised output code.
Additionally, it’s OK for the mutable IR.INT32_ARRAY_TYPE
(which has a domain size of 2 ** 32
) to use signed 32-bit integers, since in practice these arrays are either used as bitmasks (so the signs don’t matter) or otherwise the elements will not exceed 2 ** 31 - 1
.
The IR
namespace has a number of factory functions for IR nodes; while most of these simply create a new node object with the given parameters and return it, there are some opportunities for simplifying or optimising the IR tree, in order to produce smaller and/or more efficient code output. For example, stmt.switch
statements will group multiple cases together when their blocks are equal.
This requires being able to tell when two IR nodes are equal. Currently this is implemented using the IR.key
function, which simply calls JSON.stringify
to transform an IR node into a primitive value, in a way which preserves “equality”. For this to work consistently,
The IR’s stmt.break
and stmt.continue
statements are only used in stmt.for.range
and stmt.while
loops, not (directly) in stmt.switch
statements. Additionally, break and continue statements are not used inside switch statements to control loops outside of the switch. This is to avoid the need for labelled breaks (which some languages don’t have), whether or not the language’s switch
construct interacts with break
statements.
The IR’s stmt.for.range
loop statements iterate over a range of consecutive integer values, either forwards or in reverse. The range is determined by two expressions, low
(inclusive) and high
(exclusive), both of which must remain constant for the duration of the loop. A forwards range is equivalent to the following C-style for
loop:
for(let i = low; i < high; ++i) { ... }
A reversed range is equivalent to the following C-style for
loop:
for(let i = high - 1; i >= low; --i) { ... }
The IR’s stmt.switch
statements are always used with an integer-valued variable in some range starting at zero, and never use fall-throughs between different cases. This is so that they can be compiled in different ways depending on what the target language supports:
switch
statements, in which case each ‘case’ block needs to have a break
statement appended to it.match
statements.switch
or match
, then they may be compiled to a chain of if
/else
statements, or something else.Some variable declarations in the IR do not have initialisers; this allows slightly more efficient code to be generated in some cases. However, a target language might not allow an uninitialised variable to be declared, particularly when the target language’s static checker cannot determine that the variable will not be used before it is assigned.
In this case, the code generator class may need to emit a default initialiser of the appropriate type. (For TypeScript output, the code generator instead emits a “definitely assigned” assertion.)
The animate
flag is currently implemented by emitting stmt.yield
statements; this means the target language must support generator function. In principle, it would be possible later to support target languages without generator functions, by outputting a class which encapsulates the program’s state and has a method to advance the program by one step.
The IR supports the bitwise operators int_and
, int_or
, int_xor
, int_not
, int_lshift
and int_rshift
. These are mostly self-explanatory, but have a few subtleties:
int_lshift
and int_rshift
are only used with a second operand in the range 0 (inclusive) to 32 (exclusive). This is because the shift amount is taken modulo 32 in some languages (e.g. JavaScript) but not others (e.g. Python).int_lshift
is “loose” in the sense that it does not need to ensure that the result fits in the signed 32-bit integer range; this will be ensured by the compiler when it matters.int_rshift
means a right arithmetic shift (i.e. sign-preserving), not a right logical shift.The integer operators defined by the ASG are required to have the correct behaviour for signed 32-bit integers, even when the target language does not natively have a signed 32-bit integer type. This means that, for example:
int_plus
is compiled to (a + b) | 0
, int_mult
is compiled to Math.imul(a, b)
, and so on.int_plus
is compiled to int32(a + b)
, where int32
is a function which performs a 32-bit signed integer overflow.Additionally, the int_floordiv
and int_mod
operators have different behaviour for negative numbers compared to the similar native operators in some target languages.
Therefore, the IR supports the additional “loose” integer operators loose_int_plus
, loose_int_minus
, loose_int_mult
, loose_int_floordiv
and loose_int_mod
. These operators are not required to have correct behaviour for overflows or negative operands, since the compiler must only emit them with operands that are non-negative and sufficiently small that the result cannot overflow. This allows for simpler code output in some cases.