Renesas RX
notes from writing yaxpeax-rx, largely from reading the rx v1/v2/v3 manuals:
rxv1: RX Family RXv1 Instruction Set Architecture (User’s Manual: Software), Rev. 1.30 (Dec 2019)
- retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv1-instruction-set-architecture-users-manual-software-rev130
- sha256: e659dd509141da6bb1cfabf26c9f9ab5996d02060acaad2b5702963116834415
rxv2: RX Family RXv2 Instruction Set Architecture (User’s Manual: Software), Rev. 1.00 (Nov 2013)
- retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv2-instruction-set-architecture-users-manual-software
- sha256: c12fc8d16adf1530f2cad3f75974d2a29062580a984a71fd9461417b66bba18a
rxv3: RX Family RXv3 Instruction Set Architecture (User’s Manual: Software), Rev. 1.00 (Nov 2018)
- retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv3-instruction-set-architecture-users-manual-software-rev100
- sha256: 829815515a57d077bdfa418e0e167b512f2a04b3db3613329a4d8980399cf74c
broadly: of all the instruction sets, this is definitely one of them. 16 general-purpose registers. some instructions have shorter-form encodings that use only three bits for register selection, rather than four. so i imagine a preference to use the low eight registers for code density reasons. i’m curious how that works out for real programs and compilers weighing register choice like that.
BMCnd
stands out as an interesting instruction; Conditional bit transfer
undersells it. it moves the state of a condition, 0
or 1
, to the specified bit in a destination. the destination can either be a register or memory, and otherwise leaves the destination value unmodified. SCCnd
is similar but behaves more like x86’s setcc
instructions: set the entire destination byte/register to 0
or 1
depending on the condition.
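to make the difference concrete, here is a minimal sketch of the two behaviors as described above. the function names and the plain `u32` destination are made up for illustration, and flags and memory operands are ignored entirely:

```rust
/// BMCnd-style: copy the condition (as 0 or 1) into one bit of `dest`,
/// leaving every other bit of `dest` untouched.
/// (hypothetical helper, not part of yaxpeax-rx)
fn bmcnd(dest: u32, bit: u32, cond: bool) -> u32 {
    let mask = 1u32 << (bit & 31);
    if cond {
        dest | mask
    } else {
        dest & !mask
    }
}

/// SCCnd-style: the entire destination becomes 0 or 1,
/// much like x86 setcc.
/// (hypothetical helper, not part of yaxpeax-rx)
fn sccnd(cond: bool) -> u32 {
    cond as u32
}
```

so `bmcnd(0xffff_0000, 3, true)` only flips bit 3 on, while `sccnd(true)` throws away whatever the destination held before.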
rx v2
v2 adds a smattering of new instructions, and architectural extensions - see section 3.2 List of RXv2 Extended Instruction Set
.
- a second accumulator register was added, bringing the set to a0 and a1.
- many instructions were extended to operate on either a0 or a1, in place of prior a0-only forms.
- fsqrt! new! and 3-operand forms of fadd, fmul, and fsub.
- and, accumulators are 72-bit now.
rx v3
v3 adds less, but also more. again, section 3.2 List of RXv3 Extended Instructions
for exact info.
- bfmov/bfmovz, which i talk a bit more about below, for bulk bit transfers between words
- a 3-operand form of xor, giving it parity with other instructions like add, sub, etc
- AND AN ENTIRE SET OF DOUBLE-PRECISION INSTRUCTIONS AND 16 NEW DOUBLE-PRECISION REGISTERS.
practically speaking, the summaries here are accurate to what i found when reading through the manuals’ contents. why did i have to read through the manuals meticulously?
decode table, or lack thereof
instruction encodings are listed in alphabetic order of instruction mnemonics. this is not amenable to writing a disassembler.. so i went through all three versions of the manual and transcribed encodings from the manual into a text file i could easily reorder. and so notes/encoding_table was born. reorder that to be approximately by bits, and notes/reordered_encodings. finally, i tried finding patterns across encodings and simplifying the total number of encodings across all instructions, and that left me with notes/grouped_encodings.
vendors! please do not make me write things like this!! i’m not good at it!!!
0 0 0 0 0 1 1 0 | mi [ opc ] ld | [ rs ] [ rd ] SUB src, dest (v1, v2, v3)
0 0 => B 0 0 => [Rs]
0 1 => W 0 1 => dsp:8[Rs]
1 0 => L 1 0 => dsp:16[Rs]
1 1 => UW 1 1 => Rs
opc={sub, cmp, add, mul, and, or, X, X, see below}
0 0 0 0 0 1 1 0 | mi 1 0 0 0 ld | 0 0 0 [ opc ] | [ rs ] [ rd ] SBB src, dest (v1, v2, v3)
1 0 => L
_ _ => invalid
00 => [Rs]
01 => dsp:8[Rs]
10 => dsp:16[Rs]
opc={
sbb(mi=10,ld!=11), X, adc(mi=10,ld!=11), X,
max, min, emul, emulu,
div, divu, X, X
tst, xor, X, X,
xchg, itof, X, X,
X, utof(v2, v3), X, X,
X, X, X, X,
X, X, X, X,
}
0 0 0 0 1 [dsp] BRA.S src (v1, v2, v3)
0 0 0 1 c [dsp] BCnd.S src (v1, v2, v3)
0 => beq/bz (src = if dsp > 2 { dsp } else { dsp + 8 })
1 => bne/bnz
0 0 1 0 [ cnd ] | [ pcdsp ] BCnd.B src (v1, v2, v3)
cnd => {eq, ne, geu, ltu, gtu, leu, pz, n, ge, lt, gt, le, o, no, bra.b, Reserved}
the disassembler itself is largely transcription of this table into source code. including, unfortunately, a massive chain of if/else from 0b00000000
stopping at dozens of points on the way to 0b11111111
. :’)
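as a flavor of what that transcription looks like, here is a small sketch covering only the three branch rows from the end of the table excerpt. this is not the actual yaxpeax-rx source; the `Branch` type and function names are invented for the example, and the `pcdsp` byte that follows `BCnd.B` is not read here:

```rust
/// condition names for BCnd.B, in encoding order, as listed in the table.
const CONDITIONS: [&str; 16] = [
    "eq", "ne", "geu", "ltu", "gtu", "leu", "pz", "n",
    "ge", "lt", "gt", "le", "o", "no", "bra.b", "reserved",
];

/// hypothetical decoded form, just for this sketch.
#[derive(Debug, PartialEq)]
enum Branch {
    BraS { disp: u8 },
    BCndS { ne: bool, disp: u8 },
    BCndB { cnd: &'static str },
}

/// the 3-bit displacement of BRA.S / BCnd.S: encoded values 3..=7 mean
/// themselves, encoded values 0..=2 mean 8..=10.
fn short_branch_disp(dsp: u8) -> u8 {
    let dsp = dsp & 0b111;
    if dsp > 2 { dsp } else { dsp + 8 }
}

/// decode only the first byte of the short branch forms.
fn decode_branch(byte: u8) -> Option<Branch> {
    match byte {
        // 0 0 0 0 1 [dsp]
        0b0000_1000..=0b0000_1111 => Some(Branch::BraS {
            disp: short_branch_disp(byte),
        }),
        // 0 0 0 1 c [dsp], c=0 is beq/bz and c=1 is bne/bnz
        0b0001_0000..=0b0001_1111 => Some(Branch::BCndS {
            ne: (byte & 0b1000) != 0,
            disp: short_branch_disp(byte),
        }),
        // 0 0 1 0 [ cnd ]
        0b0010_0000..=0b0010_1111 => Some(Branch::BCndB {
            cnd: CONDITIONS[(byte & 0b1111) as usize],
        }),
        _ => None,
    }
}
```

range patterns in a `match` keep this a bit tidier than an if/else chain, but it is the same idea: walk the opcode space in numeric order.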
encoding notes
operands…
instructions with ld
or ls
fields encode an operand that is either [Reg]
, disp[Reg]
, or Reg
(just the register, no memory access). some of these instructions, like the 06
encodings of sub
, cmp
, add
, … also have a mi
field that indicates how the memory operand is extended for use with the second operand - which may be used only as a second source, or sometimes used as a source+destination.
so, suppose ld
is 0b11
indicating a Reg
, and mi
indicates, for example, .B
meaning sign extension of a byte. but there is no indication in the manual that, for example, sub
would have an encoding that would mean sub.b r1, r5
. so what does mi = 0b00 = b
mean for these instructions? no idea! yaxpeax-rx
assumes the bits are ignored for direct register operands. someone please prove this wrong! or right. either is fine.
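a sketch of that assumption, using the `mi` and `ld` mappings from the table above. the types and names are invented for illustration (this is not the yaxpeax-rx API), and the "ignore `mi` for registers" branch is exactly the unverified guess described above:

```rust
/// how a memory operand is extended, per the `mi` field.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Extend {
    B,
    W,
    L,
    UW,
}

/// hypothetical operand type for this sketch.
#[derive(Debug, PartialEq)]
enum Source {
    Register(u8),
    /// `disp_bytes` is how many displacement bytes follow: 0, 1, or 2.
    Memory { base: u8, disp_bytes: u8, extend: Extend },
}

fn decode_source(mi: u8, ld: u8, rs: u8) -> Source {
    let ld = ld & 0b11;
    if ld == 0b11 {
        // ASSUMPTION: `mi` is ignored for direct register operands.
        // the manual does not say what it means here.
        return Source::Register(rs & 0b1111);
    }
    let extend = match mi & 0b11 {
        0b00 => Extend::B,
        0b01 => Extend::W,
        0b10 => Extend::L,
        _ => Extend::UW,
    };
    // ld=00 is [Rs], ld=01 is dsp:8[Rs], ld=10 is dsp:16[Rs]
    Source::Memory { base: rs & 0b1111, disp_bytes: ld, extend }
}
```

under this reading, `decode_source(0b00, 0b11, 1)` and `decode_source(0b10, 0b11, 1)` both come out as plain `r1`.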
stnz/stz v2+ encoding typo
encoding (2)
of both of these instructions is a new extension in RXv2
. unfortunately the manual has a typo: it says that stnz
encoding 2 looks like…
(2) STNZ src, dest
b7 b0 | b7 b0 | b7 b0a
1 1 1 1 1 1 0 0 | 0 1 0 0 1 0 1 1 | [ rs ] [ rd ]
^^^^^^^ relevant
while encoding 2 of stz
…
(2) STZ src, dest
b7 b0 | b7 b0 | b7 b0a
1 1 1 1 1 1 0 0 | 0 1 0 0 1 0 1 1 | [ rs ] [ rd ]
^^^^^^^ same as above!
are stz
and stnz
somehow encoded the same? confusion abounds. internet dog the6p4c had the good idea to check binutils to cross check with what Renesas themselves might have said on the matter. they found:
[PATCH v2][RX] Add RXv2 Instructions
+
+/** 1111 1100 0100 1011 rsrc rdst stz %1, %0 */
+ ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);
+
+/** 1111 1100 0100 1111 rsrc rdst stnz %1, %0 */
+ ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);
which pretty clearly says “stz
has the low bits of 1011
”, “stnz
has the low bits of 1111
”. confusion resolved. EXCEPT: this includes a different copy/paste error! both instructions here have S2cc(RXC_z)
. there’s a followup commit for this,
commit 239efab16429cad466591ccd1c57bba786171765
Author: Yoshinori Sato <ysato@users.sourceforge.jp>
Date: Thu Dec 17 01:42:34 2015 +0900
RXv2 support update
2015-12-22 Yoshinori Sato <ysato@users.sourceforge.jp>
opcodes/
* rx-decode.opc (movco): Use uniqe id.
(movli): Likewise.
(stnz): Condition fix.
[...snip...]
/** 1111 1100 0100 1111 rsrc rdst stnz %1, %0 */
- ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);
+ ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_nz);
[...snip...]
so eventually everything ended up in the right state. but it’s very funny to look through the history and realize there were two copy-paste errors in different directions about these two instructions. cursed additions!
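for reference, the end state can be sketched as a tiny decoder for just these two encodings, taking the bit patterns from the binutils comments quoted above. the function name and tuple return are made up for the example:

```rust
/// decode encoding (2) of stz/stnz from three bytes.
/// returns (mnemonic, rs, rd), or None for anything else.
/// (hypothetical helper, not the yaxpeax-rx API)
fn decode_stcc(bytes: [u8; 3]) -> Option<(&'static str, u8, u8)> {
    let rs = bytes[2] >> 4;
    let rd = bytes[2] & 0b1111;
    match (bytes[0], bytes[1]) {
        // 1111 1100 0100 1011 rsrc rdst
        (0b1111_1100, 0b0100_1011) => Some(("stz", rs, rd)),
        // 1111 1100 0100 1111 rsrc rdst
        (0b1111_1100, 0b0100_1111) => Some(("stnz", rs, rd)),
        _ => None,
    }
}
```

the only difference between the two is bit 2 of the second byte, which is easy to lose in a copy/paste.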
cmp…
cmp encoding (2), for cmp #uimm:8
could be read as the bit pattern
0 1 1 1 0 1 li | [ opc ] [ rs2 ]
like cmp
encoding (3)
, or similar encodings of mul
, and
, or
, but with opc=0b101
. it has the additional constraint of li=0b01
in such a reading, but this raises a question.. if opc=0b000
allows four immediate operand lengths - 8, 16, 24, and 32 bits, sign-extended to 32 bits - why not allow all operand lengths with zero-extension for opc=0b101
?? alas.
double-precision instructions…
also in the area of
0 1 1 1 0 1 li ...
instructions, in RXv3 a new set of double-precision and related instructions was added. this makes another pattern with this encoding clearer: li
picks the number of bytes to be read for operands, even though none of the operands are necessarily interpreted as an immediate.
li=0b01
usually represents a 32-bit immediate encoded as a sign-extended 8-bit value. so, read 0x7a
, read a byte for the opcode and destination register, then read one byte for the immediate. but for instructions like int
, the encoding works out as
0 1 1 1 0 1 0 1 | 0 1 1 0 0 0 0 0 | [ uimm:8 ]
li=01 opc=0110 rd=0000 ^ and read the 1-byte immediate of li=01
RXv3 extends this - where a 2-byte immediate might be involved in an instruction like
0 1 1 1 0 1 1 0 | 0 0 0 1 0 1 1 0 | 0 1 0 1 0 1 0 1 | 1 0 1 0 1 0 1 0
li=10 opc=0001 rs2=0110 imm=0x55AAi16
other new instructions, like dadd r6, r5, r4
, are encoded…. similarly
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | 0 1 0 1 0 0 0 0 | 0 1 1 0 0 1 0 0
"li=10" reserved? rs2=0101 opc=0000 rd=0110 rs=0100
li
still means “read two bytes”! they’re just not an immediate anymore. wild.
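the "li is a byte count" reading can be sketched like this. the 1-byte and 2-byte cases come from the examples above; mapping `0b11` to 3 bytes and `0b00` to 4 bytes is my reading of the usual 24-bit and 32-bit immediate lengths, so treat those two arms as an assumption. function names are invented:

```rust
/// number of bytes that follow the two opcode bytes, per `li`.
/// ASSUMPTION: 0b11 => 3 bytes and 0b00 => 4 bytes, matching the
/// 24-bit and 32-bit immediate lengths.
fn li_byte_count(li: u8) -> usize {
    match li & 0b11 {
        0b01 => 1,
        0b10 => 2,
        0b11 => 3,
        _ => 4,
    }
}

/// for a `0 1 1 1 0 1 li` instruction, return the bytes after the
/// first two, without deciding whether they are an immediate or
/// more opcode/register fields. None if the input is too short.
fn trailing_bytes(insn: &[u8]) -> Option<&[u8]> {
    let li = *insn.first()? & 0b11;
    insn.get(2..2 + li_byte_count(li))
}
```

for the `int` example this yields the one immediate byte; for `dadd r6, r5, r4` (`76 90 50 64`) it yields `50 64`, which then get split into register and opcode fields rather than read as a number.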
opcode selectors move around!
in RXv3, with the new double-precision instructions, there is an interesting consistency decision to note…
consider the {dadd,dsub,dmul}
encoding pattern of
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs2 ] [ opc ] | [ rd ] [ rs ]
for these instructions, the exact opcode is chosen by the four opc
bits in the low nibble of the third byte. sure, that’s fine! one of the possible opcodes here is dcmp
, whose condition is indicated by the value of rd
. this means that dcmp
is encoded like:
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs2 ] [ opc ] | [ rd ] [ rs ]
opc=0111 rd=cm={.., UN, EQ, ..}
or, an instruction like double-OP src, src2
and dest
repurposed otherwise.
this is in contrast to other two-operand instructions like dabs
, encoded like:
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs ] [ opc ] | [ rd ] [ opc2]
opc=1100 opc2=0001
where the instruction has a skeleton more like double-OP src, dest
, with rs
being the repurposed field. this follows! the instruction no longer has two source operands, but does have a destination operand.
i’m deeply curious why rs
is the repurposed field here, rather than rs2
. in that case, the “opcode” would be the third byte in its entirety, which seems like a nice property on its own. alternatively, maybe keeping the semantics of register selector bits the same simplifies decoder hardware…
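a sketch of pulling the four nibbles out of this encoding family, using the three-operand field names from the dadd pattern above. the struct and function are hypothetical, and which nibble is "really" an opcode is left to the caller:

```rust
/// the four nibbles of the third and fourth bytes, named as in the
/// three-operand {dadd,dsub,dmul} pattern.
/// (hypothetical type for this sketch)
#[derive(Debug, PartialEq)]
struct DoubleFields {
    rs2: u8,
    opc: u8,
    rd: u8,
    rs: u8,
}

/// split a `0111 0110 | 1001 0000 | ..` instruction into fields.
fn split_double(insn: [u8; 4]) -> Option<DoubleFields> {
    if insn[0] != 0b0111_0110 || insn[1] != 0b1001_0000 {
        return None;
    }
    Some(DoubleFields {
        rs2: insn[2] >> 4,
        opc: insn[2] & 0b1111,
        rd: insn[3] >> 4,
        rs: insn[3] & 0b1111,
    })
}
```

for a dabs-shaped instruction the same split applies, but the `rs2` slot holds the single source register and the `rs` slot holds `opc2`, so a decoder has to look at `opc` before deciding what the last nibble means.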
float instruction encodings
the three-operand forms of float instructions have similar mappings from bits to opcodes, compared to scalar operations.
| bits | scalar | float |
|---|---|---|
| 0000 | sub | fsub |
| 0001 | cmp | undef |
| 0010 | add | fadd |
| 0011 | mul | fmul |
this does not continue to be the case for double-precision instructions, unfortunately. for those instructions, 0001
tends to select dadd
, rather than leave space for a future fcmp
.
bitfields
bfmov
and bfmovz
include a triplet of immediates to describe “move N bits starting from bit A out of source and into dest at bit B”. the manual then goes on to say,
If (slsb + width) > 32 and (dlsb + width) > 32, then dest becomes undefined.
… but that implies that if only one of the two overflows, dest is well-defined somehow? i think the manual means or
in that sentence, alas.
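here is a rough model of the "move N bits from bit A of source to bit B of dest" operation with the `or` reading of that sentence. everything about it is a sketch: the function name is made up, `None` stands in for "dest becomes undefined", rejecting a zero width is my own guard, and keeping the untouched bits of dest as they were is an assumption about the non-`z` form:

```rust
/// move `width` bits starting at bit `slsb` of `src` into `dest`
/// starting at bit `dlsb`. returns None where this sketch treats the
/// result as undefined: if EITHER field runs past bit 31 (the "or"
/// reading), or if `width` is 0 (an extra guard, not from the manual).
fn bitfield_move(src: u32, dest: u32, slsb: u32, dlsb: u32, width: u32) -> Option<u32> {
    if width == 0 || slsb + width > 32 || dlsb + width > 32 {
        return None;
    }
    // width == 32 needs special care: `1 << 32` would overflow.
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    let field = (src >> slsb) & mask;
    // ASSUMPTION: bits of dest outside the field are preserved.
    Some((dest & !(mask << dlsb)) | (field << dlsb))
}
```

with this reading, `bitfield_move(0, 0, 30, 0, 4)` is rejected even though only the source side overflows, which is the case the manual's `and` wording leaves oddly well-defined.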