The compiler does not write code in the target language directly; instead, it outputs an intermediate representation (IR) which can be translated into a variety of target languages. The IR is intended to be an abstract syntax tree for a “lowest common denominator” imperative language with the following features:
if/else statements and conditional expressions;break statements which escape from the inner-most loop;switch or match statements, or a similar branching feature.yield statements (see below);Translating via this IR also creates some opportunities for additional optimisations which may improve the performance of the program and/or reduce the size of the generated code.
The IR has its own types, not all of which are shared by the MJr language’s type system:
byte: an integer between 0 (inclusive) and 128 (exclusive); used for alphabet symbols in grids or patterns.int: a signed 32-bit integer.bool: a Boolean.float: a double-precision floating-point number.str: a string.array.const and array.mutable: an array of integers, immutable or mutable respectively, which are all between zero (inclusive) and domainSize (exclusive).grid, pattern, prng, rewriteinfo and sampler: pre-defined struct types which can be declared in the MJr runtime library.void: return type for functions with no return value.Note that the target language only needs to support integers of at least the required sizes; for example, nothing will go wrong if byte and int are mapped to the same type in the target language, or if an array is capable of holding integers much larger than its domain size, other than more memory being used. Likewise, the target language does not need to enforce immutability of arrays or dicts, since the generated code won’t attempt to mutate these anyway. These distinctions are only made to potentially allow more optimised output code.
Additionally, it’s OK for the mutable IR.INT32_ARRAY_TYPE (which has a domain size of 2 ** 32) to use signed 32-bit integers, since in practice these arrays are either used as bitmasks (so the signs don’t matter) or otherwise the elements will not exceed 2 ** 31 - 1.
The IR namespace has a number of factory functions for IR nodes; while most of these simply create a new node object with the given parameters and return it, there are some opportunities for simplifying or optimising the IR tree, in order to produce smaller and/or more efficient code output. For example, stmt.switch statements will group multiple cases together when their blocks are equal.
This requires being able to tell when two IR nodes are equal. Currently this is implemented using the IR.key function, which simply calls JSON.stringify to transform an IR node into a primitive value, in a way which preserves “equality”. For this to work consistently,
The IR’s stmt.break and stmt.continue statements are only used in stmt.for.range and stmt.while loops, not (directly) in stmt.switch statements. Additionally, break and continue statements are not used inside switch statements to control loops outside of the switch. This is to avoid the need for labelled breaks (which some languages don’t have), whether or not the language’s switch construct interacts with break statements.
The IR’s stmt.for.range loop statements iterate over a range of consecutive integer values, either forwards or in reverse. The range is determined by two expressions, low (inclusive) and high (exclusive), both of which must remain constant for the duration of the loop. A forwards range is equivalent to the following C-style for loop:
for(let i = low; i < high; ++i) { ... }
A reversed range is equivalent to the following C-style for loop:
for(let i = high - 1; i >= low; --i) { ... }
The IR’s stmt.switch statements are always used with an integer-valued variable in some range starting at zero, and never use fall-throughs between different cases. This is so that they can be compiled in different ways depending on what the target language supports:
switch statements, in which case each ‘case’ block needs to have a break statement appended to it.match statements.switch or match, then they may be compiled to a chain of if/else statements, or something else.Some variable declarations in the IR do not have initialisers; this allows slightly more efficient code to be generated in some cases. However, a target language might not allow an uninitialised variable to be declared, particularly when the target language’s static checker cannot determine that the variable will not be used before it is assigned.
In this case, the code generator class may need to emit a default initialiser of the appropriate type. (For TypeScript output, the code generator instead emits a “definitely assigned” assertion.)
The animate flag is currently implemented by emitting stmt.yield statements; this means the target language must support generator function. In principle, it would be possible later to support target languages without generator functions, by outputting a class which encapsulates the program’s state and has a method to advance the program by one step.
The IR supports the bitwise operators int_and, int_or, int_xor, int_not, int_lshift and int_rshift. These are mostly self-explanatory, but have a few subtleties:
int_lshift and int_rshift are only used with a second operand in the range 0 (inclusive) to 32 (exclusive). This is because the shift amount is taken modulo 32 in some languages (e.g. JavaScript) but not others (e.g. Python).int_lshift is “loose” in the sense that it does not need to ensure that the result fits in the signed 32-bit integer range; this will be ensured by the compiler when it matters.int_rshift means a right arithmetic shift (i.e. sign-preserving), not a right logical shift.The integer operators defined by the ASG are required to have the correct behaviour for signed 32-bit integers, even when the target language does not natively have a signed 32-bit integer type. This means that, for example:
int_plus is compiled to (a + b) | 0, int_mult is compiled to Math.imul(a, b), and so on.int_plus is compiled to int32(a + b), where int32 is a function which performs a 32-bit signed integer overflow.Additionally, the int_floordiv and int_mod operators have different behaviour for negative numbers compared to the similar native operators in some target languages.
Therefore, the IR supports the additional “loose” integer operators loose_int_plus, loose_int_minus, loose_int_mult, loose_int_floordiv and loose_int_mod. These operators are not required to have correct behaviour for overflows or negative operands, since the compiler must only emit them with operands that are non-negative and sufficiently small that the result cannot overflow. This allows for simpler code output in some cases.