SIMD-0321
VM Register 2 Instruction Data Pointer
Feature Gate Status
5xXZc66h4UdB6Yq7FzdBxBiRAFMMScMLwHxk2QZDaNZL
Min Agave: v3.1.0 · Firedancer: v0.806.30102 · Jito: v3.1.0
TL;DR
Provide a pointer to instruction data in VM register 2 (`r2`) at program entrypoint, enabling direct access to instruction data without having to parse the accounts section of the serialized input region.
Motivation
Currently, sBPF programs must parse the accounts section of the serialized input region to locate instruction data. The serialization layout places accounts before instruction data, requiring programs to iterate through all accounts before reaching the instruction data section. This is inefficient for programs that primarily or exclusively need to access instruction data. By providing a direct pointer to instruction data in `r2`, programs can immediately access this data without any parsing overhead, resulting in improved performance and reduced compute unit consumption.
Key Changes
- Instruction data pointer: A 64-bit pointer (8 bytes) stored in VM register 2 that points directly to the start of the instruction data section in the input region.
- r1: Input region pointer (existing behavior)
- r2: Pointer to instruction data section (new)
- The pointer in r2 points to the first byte of the actual instruction data, NOT the length field.
- The pointer value in r2 is stored as a native 64-bit pointer (8 bytes) in little-endian format.
- When there is no instruction data (length = 0), `r2` still points to the offset immediately following the instruction data length field. In this case that is the first byte of the program ID, so `r2` always points to valid, readable memory within the bounds of the input region.
- The pointer must always point to valid memory within the input region bounds.
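The pointer placement described above can be sketched with a mock buffer. This is a minimal illustration, not the runtime's serializer: the helper `mock_input_tail` and both constants are hypothetical, the accounts section is omitted, and the sketch assumes the length field is a little-endian `u64` and the program ID is 32 bytes. Offsets into a `Vec<u8>` stand in for the real pointer held in `r2`.

```rust
/// Assumed size of the length field that precedes instruction data (u64).
const LEN_FIELD_SIZE: usize = 8;
/// Assumed size of the program ID that follows instruction data.
const PROGRAM_ID_SIZE: usize = 32;

/// Hypothetical helper: builds only the tail of the input region,
/// laid out as [len: u64 LE][instruction data][program ID].
/// Returns the buffer and the offset that `r2` would correspond to.
fn mock_input_tail(ix_data: &[u8], program_id: &[u8; PROGRAM_ID_SIZE]) -> (Vec<u8>, usize) {
    let mut buf = Vec::with_capacity(LEN_FIELD_SIZE + ix_data.len() + PROGRAM_ID_SIZE);
    buf.extend_from_slice(&(ix_data.len() as u64).to_le_bytes());
    // `r2` points at the first byte after the length field,
    // not at the length field itself.
    let r2_offset = buf.len();
    buf.extend_from_slice(ix_data);
    buf.extend_from_slice(program_id);
    (buf, r2_offset)
}
```

With empty instruction data, the returned offset lands on the first byte of the program ID, which is why the pointer stays within readable memory even when the length is 0.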
Impact
On-chain programs are positively impacted by this change. The new `r2` pointer lets programs read instruction data efficiently, further customize their control flow, and maximize compute unit efficiency. However, any program that currently depends on the uninitialized/garbage value in `r2` at entrypoint will break when this feature is activated. Core contributors must implement this feature; depending on the VM implementation, the change should be minimally invasive.
Backwards Compatibility
This feature is only backwards compatible for programs that currently do not read from `r2` at program entrypoint. This feature is NOT backwards compatible for any programs that depend on the uninitialized/garbage data previously in `r2`.
Security Considerations
Programs should read and validate the instruction data length (stored at `r2 - 8`) before accessing data via the `r2` pointer. Failing to check the length could result in reading unintended memory contents or out-of-bounds access attempts. Additionally, programs that currently rely on `r2` containing uninitialized or garbage data at entrypoint will experience breaking changes when this feature is activated. Although hand-written assembly could read `r2` before initializing it, no compiled code does so except in the case of `sol_log_64_`, which is not a direct security concern because logs are not enshrined by consensus.
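The length check recommended above might look like the following sketch. It works on a byte slice and an offset rather than a raw pointer, so it can run off-chain. The function `read_instruction_data` and the constant `IX_LEN_SIZE` are hypothetical names, and the length field is assumed to be a little-endian `u64` located at `r2 - 8`.

```rust
/// Assumed size of the instruction data length field (u64 at `r2 - 8`).
const IX_LEN_SIZE: usize = 8;

/// Hypothetical helper: returns the instruction data slice, or `None`
/// if the length field is missing or claims more bytes than the region holds.
/// `r2_offset` stands in for the pointer value provided in `r2`.
fn read_instruction_data(region: &[u8], r2_offset: usize) -> Option<&[u8]> {
    // Locate the length field immediately before the instruction data.
    let len_start = r2_offset.checked_sub(IX_LEN_SIZE)?;
    let mut len_bytes = [0u8; IX_LEN_SIZE];
    len_bytes.copy_from_slice(region.get(len_start..r2_offset)?);
    let len = u64::from_le_bytes(len_bytes);

    // Reject lengths that would run past the end of the region.
    let available = (region.len() - r2_offset) as u64;
    if len > available {
        return None;
    }
    Some(&region[r2_offset..r2_offset + len as usize])
}
```

The comparison is done in `u64` before any cast, so an oversized length is rejected instead of being truncated on targets where `usize` is narrower than 64 bits.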