SIMD-0173
SBPF instruction encoding improvements
Feature Gate Status
F6UVKh1ujTEFK3en2SyAL3cdVnqko1FVEXWhmdLRu6WP
Summary
Some instructions have questionable encodings that, when slightly adjusted, would significantly simplify the verification and execution of programs.
Motivation
The instruction `lddw dst, imm` is currently the only instruction that occupies two instruction slots. This proposal splits it into a sequence of two one-slot instructions: `mov32 dst, imm` and a newly introduced `hor64 dst, imm`. Every instruction is then exactly one slot long, which simplifies the following:

- Calculating the number of instructions in a program will no longer require a full linear scan. Dividing the length of the text section by the instruction slot size will suffice.
- The instruction meter will no longer have to skip one instruction slot when counting an `LDDW` instruction.
- Jump and call instructions will no longer have to verify that the destination is not the second half of an `LDDW` instruction.
- The verifier will no longer have to check that `LDDW` instructions are complete, that is, that neither half occurs without the other.

The `LE` instruction is essentially useless, as only `BE` performs a byte-swap. Its runtime behavior is close to a no-op and can be replicated by other instructions:

- `le dst, 16` behaves the same as `and32 dst, 0xFFFF`
- `le dst, 32` behaves the same as `and32 dst, 0xFFFFFFFF`
- `le dst, 64` behaves the same as `mov64 dst, dst`

The `CALLX` instruction encodes its source register in the immediate field. This makes the instruction decoder more complex, because it is the only case in which a register is encoded in the immediate field, and there is no reason for it.

With all of the above changes and the ones defined in SIMD-0174, the memory-related instructions can be moved into the ALU instruction classes. Doing so would free up 8 instruction classes completely, giving us back three bits of instruction encoding.
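The effect of the `lddw` split, the slot-count shortcut and the `LE` equivalences can be illustrated with a small model. This is a minimal sketch rather than the reference implementation. It assumes that `mov32 dst, imm` writes the immediate zero-extended to 64 bits, that `hor64 dst, imm` ORs `imm << 32` into `dst`, that an instruction slot is 8 bytes, and that `le` on a little-endian host only truncates to the given width. None of these details are spelled out in the text above, and all function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
SLOT_SIZE = 8  # assumption: bytes per instruction slot


def mov32_imm(imm: int) -> int:
    """Assumed semantics: dst = imm, zero-extended to 64 bits."""
    return imm & MASK32


def hor64_imm(dst: int, imm: int) -> int:
    """Assumed semantics: dst |= imm << 32 (sets the upper half)."""
    return (dst | ((imm & MASK32) << 32)) & MASK64


def load_imm64(value: int) -> int:
    """Model the former `lddw dst, value` as `mov32` followed by `hor64`."""
    dst = mov32_imm(value & MASK32)      # low 32 bits
    return hor64_imm(dst, value >> 32)   # high 32 bits


def instruction_count(text_len: int) -> int:
    """With one-slot instructions only, counting is a single division."""
    if text_len % SLOT_SIZE != 0:
        raise ValueError("text section length is not a multiple of the slot size")
    return text_len // SLOT_SIZE


def le(dst: int, width: int) -> int:
    """Assumed legacy `le dst, width` on a little-endian host: truncate only."""
    return dst & ((1 << width) - 1)


def and32(dst: int, imm: int) -> int:
    """32-bit AND whose result is zero-extended to 64 bits."""
    return (dst & imm) & MASK32
```

Under these assumptions, `load_imm64(0x1122334455667788)` reproduces the full 64-bit constant using two one-slot instructions, and a 24-byte text section yields `instruction_count(24) == 3` without scanning the bytecode.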
Key Changes
Removed opcodes:
- the LDDW instruction (opcodes 0x18 and 0x00)
- the LE instruction (opcode 0xD4)
- the moved opcodes:
  - 0x72, 0x71, 0x73 (STB, LDXB, STXB)
  - 0x6A, 0x69, 0x6B (STH, LDXH, STXH)
  - 0x62, 0x61, 0x63 (STW, LDXW, STXW)
  - 0x7A, 0x79, 0x7B (STDW, LDXDW, STXDW)

Added opcodes:
- the HOR64 instruction (opcode 0xF7)
- the moved opcodes:
  - 0x27, 0x2C, 0x2F (STB, LDXB, STXB)
  - 0x37, 0x3C, 0x3F (STH, LDXH, STXH)
  - 0x87, 0x8C, 0x8F (STW, LDXW, STXW)
  - 0x97, 0x9C, 0x9F (STDW, LDXDW, STXDW)

Moved opcodes (old => new):
- 0x72 => 0x27, 0x71 => 0x2C, 0x73 => 0x2F
- 0x6A => 0x37, 0x69 => 0x3C, 0x6B => 0x3F
- 0x62 => 0x87, 0x61 => 0x8C, 0x63 => 0x8F
- 0x7A => 0x97, 0x79 => 0x9C, 0x7B => 0x9F
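The old-to-new opcode pairs listed above can be written as a lookup table, which also makes it easy to check that every moved opcode lands in an ALU instruction class. The sketch below is illustrative only. It assumes the conventional eBPF layout in which the low three bits of the opcode select the class (LDX = 1, ST = 2, STX = 3, ALU32 = 4, ALU64 = 7); the helper names are hypothetical and not part of any toolchain.

```python
# Old opcode -> new opcode, copied from the list above.
OPCODE_REMAP = {
    0x72: 0x27, 0x71: 0x2C, 0x73: 0x2F,  # STB,  LDXB,  STXB
    0x6A: 0x37, 0x69: 0x3C, 0x6B: 0x3F,  # STH,  LDXH,  STXH
    0x62: 0x87, 0x61: 0x8C, 0x63: 0x8F,  # STW,  LDXW,  STXW
    0x7A: 0x97, 0x79: 0x9C, 0x7B: 0x9F,  # STDW, LDXDW, STXDW
}

# Assumption: class numbering follows the conventional eBPF encoding.
MEMORY_CLASSES = {0x01, 0x02, 0x03}  # LDX, ST, STX
ALU_CLASSES = {0x04, 0x07}           # ALU32, ALU64


def instruction_class(opcode: int) -> int:
    """Return the instruction class, taken from the low three opcode bits."""
    return opcode & 0x07


def remap_memory_opcode(old_opcode: int) -> int:
    """Translate an old memory opcode; raises KeyError for any other opcode."""
    return OPCODE_REMAP[old_opcode]


def check_remap() -> bool:
    """Check that the table is one-to-one and moves memory classes to ALU classes."""
    new_opcodes = list(OPCODE_REMAP.values())
    one_to_one = len(set(new_opcodes)) == len(new_opcodes)
    from_memory = all(instruction_class(o) in MEMORY_CLASSES for o in OPCODE_REMAP)
    to_alu = all(instruction_class(n) in ALU_CLASSES for n in new_opcodes)
    return one_to_one and from_memory and to_alu
```

For example, `remap_memory_opcode(0x79)` gives `0x9C` for `LDXDW`. The lookup deliberately rejects opcodes outside the table, because a new memory opcode such as 0x27 may previously have belonged to a different instruction.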
Impact
The toolchain will emit machine code according to the selected SBPF version. As most of the proposed changes affect only the encoding, and not the functionality, we expect no impact on dApp developers. The only exception is that 64-bit immediate loads will now cost 2 CU instead of 1 CU.
Security Considerations
None.