Renesas RX

notes from writing yaxpeax-rx, largely from reading the rx v1/v2/v3 manuals:

rxv1: RX Family RXv1 Instruction Set Architecture (User’s Manual: Software), Rev. 1.30 (Dec 2019)
- retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv1-instruction-set-architecture-users-manual-software-rev130
- sha256: e659dd509141da6bb1cfabf26c9f9ab5996d02060acaad2b5702963116834415
rxv2: RX Family RXv2 Instruction Set Architecture (User’s Manual: Software), Rev. 1.00 (Nov 2013)
- retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv2-instruction-set-architecture-users-manual-software
- sha256: c12fc8d16adf1530f2cad3f75974d2a29062580a984a71fd9461417b66bba18a
rxv3: RX Family RXv3 Instruction Set Architecture (User’s Manual: Software), Rev. 1.00 (Nov 2018)
- retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv3-instruction-set-architecture-users-manual-software-rev100
- sha256: 829815515a57d077bdfa418e0e167b512f2a04b3db3613329a4d8980399cf74c

broadly: of all the instruction sets, this is definitely one of them. 16 general-purpose registers. some instructions have shorter-form encodings that use only three bits for register selection, rather than four. so i imagine a preference to use the low eight registers for code density reasons. i’m curious how that works out for real programs and compilers weighing register choice like that.

BMCnd stands out as an interesting instruction; the manual’s name for it, “Conditional bit transfer”, undersells it. it moves the state of a condition, 0 or 1, to the specified bit in a destination. the destination can be either a register or memory, and its other bits are left unmodified. SCCnd is similar but behaves more like x86’s setcc instructions: set the entire destination byte/register to 0 or 1 depending on the condition.
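to make the difference concrete, here is a minimal sketch of the two behaviors on a 32-bit register value. the function names and signatures are made up for illustration (they are not from yaxpeax-rx or the manual), and the condition is assumed to be already evaluated to a bool.

```rust
/// BMCnd-style update (illustrative helper, not a real API): copy the
/// condition, 0 or 1, into one bit of `dest` and leave every other bit alone.
fn bmcnd(dest: u32, bit: u32, cond: bool) -> u32 {
    let mask = 1u32 << (bit & 31);
    if cond {
        dest | mask
    } else {
        dest & !mask
    }
}

/// SCCnd-style update (illustrative helper): the whole destination
/// becomes 0 or 1, like x86's setcc.
fn sccnd(cond: bool) -> u32 {
    cond as u32
}
```

so `bmcnd(0xffff_0000, 3, true)` only touches bit 3, while `sccnd(true)` throws away whatever the destination held before.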

rx v2

v2 adds a smattering of new instructions, and architectural extensions - see section 3.2 List of RXv2 Extended Instruction Set.

a second accumulator register was added, bringing the set to a0 and a1.
many instructions were extended to operate on either a0 or a1, in place of prior a0-only forms.
fsqrt! new! and 3-operand forms of fadd, fmul, and fsub.
and, accumulators are 72-bit now.

rx v3

v3 adds less, but also more. again, section 3.2 List of RXv3 Extended Instructions for exact info.

bfmov/bfmovz, which i talk a bit more about below, for bulk bit transfers between words
a 3-operand form of xor, giving it parity with other instructions like add, sub, etc
AND AN ENTIRE SET OF DOUBLE-PRECISION INSTRUCTIONS AND 16 NEW DOUBLE-PRECISION REGISTERS.

practically speaking, the summaries here are accurate to what i found when reading through the manuals’ contents. why did i have to read through the manuals meticulously?

decode table, or lack thereof

instruction encodings are listed in alphabetic order of instruction mnemonics. this is not amenable to writing a disassembler, so i went through all three versions of the manual and transcribed the encodings into a text file i could easily reorder. and so notes/encoding_table was born. reorder that to be approximately by bits, and you get notes/reordered_encodings. finally, i tried finding patterns across encodings to reduce the total number of distinct encodings across all instructions, and that left me with notes/grouped_encodings.

vendors! please do not make me write things like this!! i’m not good at it!!!

0 0 0 0 0 1 1 0 | mi  [ opc ] ld  | [ rs  ] [ rd  ]                   SUB src, dest (v1, v2, v3)
                  0 0 => B    0 0 => [Rs]                                                        
                  0 1 => W    0 1 => dsp:8[Rs]                                                   
                  1 0 => L    1 0 => dsp:16[Rs]                                                  
                  1 1 => UW   1 1 => Rs                                                          
    opc={sub, cmp, add, mul, and, or, X, X, see below}                                           
                                                                                                 
0 0 0 0 0 1 1 0 | mi  1 0 0 0 ld  | 0 0 0 [  opc  ] | [ rs  ] [ rd  ] SBB src, dest (v1, v2, v3) 
                  1 0 => L                                                                       
                  _ _ => invalid                                                                 
                              00 => [Rs]                                                         
                              01 => dsp:8[Rs]                                                    
                              10 => dsp:16[Rs]                                                   
    opc={                                                                                        
      sbb(mi=10,ld!=11), X, adc(mi=10,ld!=11), X,                                                
      max, min, emul, emulu,                                                                     
      div, divu, X, X                                                                            
      tst, xor, X, X,                                                                            
      xchg, itof, X, X,                                                                          
      X, utof(v2, v3), X, X,                                                                     
      X, X, X, X,                                                                                
      X, X, X, X,                                                                                
    }                                                                                            
                                                                                                 
0 0 0 0 1 [dsp]                                                       BRA.S src (v1, v2, v3)     
                                                                                                 
0 0 0 1 c [dsp]                                                       BCnd.S src (v1, v2, v3)    
        0 => beq/bz   (src = if dsp > 2 { dsp } else { dsp + 8 })                                
        1 => bne/bnz                                                                             
                                                                                                 
0 0 1 0 [ cnd ] | [    pcdsp    ]                                     BCnd.B src (v1, v2, v3)    
        cnd => {eq, ne, geu, ltu, gtu, leu, pz, n, ge, lt, gt, le, o, no, bra.b, Reserved}

the disassembler itself is largely transcription of this table into source code. including, unfortunately, a massive chain of if/else from 0b00000000 stopping at dozens of points on the way to 0b11111111. :’)
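as a flavor of what that transcription looks like, here is a small sketch covering only the three branch rows from the table excerpt above. the `Branch` enum and function names are invented for this example and are not the types yaxpeax-rx actually uses. two assumptions worth flagging: the 3-bit dsp remapping written next to BCnd.S is applied to BRA.S as well, and pcdsp:8 is treated as a signed byte.

```rust
#[derive(Debug, PartialEq)]
enum Branch {
    BraS { disp: u8 },
    BcndS { cond: &'static str, disp: u8 },
    BcndB { cond: &'static str, pcdsp: i8 },
    BraB { pcdsp: i8 },
}

// cnd values 0..=13 from the BCnd.B row; 14 is bra.b and 15 is reserved.
const CONDS: [&str; 14] = [
    "eq", "ne", "geu", "ltu", "gtu", "leu", "pz", "n", "ge", "lt", "gt", "le", "o", "no",
];

/// 3-bit short displacement: 3..=7 mean themselves, 0..=2 mean 8..=10.
fn short_disp(dsp: u8) -> u8 {
    if dsp > 2 { dsp } else { dsp + 8 }
}

fn decode_branch(bytes: &[u8]) -> Option<Branch> {
    let first = *bytes.first()?;
    if first & 0b1111_1000 == 0b0000_1000 {
        // 0 0 0 0 1 [dsp]
        Some(Branch::BraS { disp: short_disp(first & 0b111) })
    } else if first & 0b1111_0000 == 0b0001_0000 {
        // 0 0 0 1 c [dsp]
        let cond = if first & 0b1000 == 0 { "eq" } else { "ne" };
        Some(Branch::BcndS { cond, disp: short_disp(first & 0b111) })
    } else if first & 0b1111_0000 == 0b0010_0000 {
        // 0 0 1 0 [ cnd ] | [ pcdsp ]
        let pcdsp = *bytes.get(1)? as i8; // assumption: signed displacement
        match first & 0b1111 {
            0b1110 => Some(Branch::BraB { pcdsp }),
            0b1111 => None, // reserved
            cnd => Some(Branch::BcndB { cond: CONDS[cnd as usize], pcdsp }),
        }
    } else {
        None // everything else in the table is out of scope for this sketch
    }
}
```

now imagine that if/else chain continuing for the other couple hundred rows.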

encoding notes

operands…

instructions with ld or ls fields encode an operand that is either [Reg], disp[Reg], or Reg (just the register, no memory access). some of these instructions, like the 06 encodings of sub, cmp, add, … also have a mi field that indicates how the memory operand is extended for use with the second operand - which may be used only as a second source, or sometimes used as a source+destination.

so, suppose ld is 0b11, indicating a Reg, and mi indicates, for example, .B, meaning sign extension of a byte. there is no indication in the manual that, for example, sub would have an encoding that would mean sub.b r1, r5. so what does mi = 0b00 = b mean for these instructions? no idea! yaxpeax-rx assumes the bits are ignored for direct register operands. someone please prove this wrong! or right. either is fine.
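a sketch of that assumption, using the mi and ld values from the SUB row in the table excerpt earlier. the `Source` enum and `decode_source` are hypothetical names for this example, not the real yaxpeax-rx operand types, and the "ignore mi for registers" arm is exactly the unverified guess described above.

```rust
#[derive(Debug, PartialEq)]
enum Source {
    /// `Rs`: plain register, no memory access.
    Reg,
    /// `[Rs]`, `dsp:8[Rs]`, or `dsp:16[Rs]`, with the size suffix picked by `mi`.
    Mem { disp_bytes: u8, ext: &'static str },
}

fn decode_source(mi: u8, ld: u8) -> Source {
    // mi: 00 => B, 01 => W, 10 => L, 11 => UW
    let ext = ["B", "W", "L", "UW"][(mi & 0b11) as usize];
    match ld & 0b11 {
        0b00 => Source::Mem { disp_bytes: 0, ext },
        0b01 => Source::Mem { disp_bytes: 1, ext },
        0b10 => Source::Mem { disp_bytes: 2, ext },
        // ld == 0b11: assumption, the mi bits are simply ignored
        _ => Source::Reg,
    }
}
```

with this reading, `decode_source(0b00, 0b11)` and `decode_source(0b11, 0b11)` decode to the same operand. a stricter decoder could instead reject nonzero (or non-L) mi values here; which one matches hardware is the open question.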

stnz/stz v2+ encoding typo

encoding (2) of both of these instructions is a new extension in RXv2. unfortunately the manual has a typo: it says that stnz encoding 2 looks like…

(2) STNZ src, dest                                   
                                                     
b7           b0 | b7           b0 | b7           b0
1 1 1 1 1 1 0 0 | 0 1 0 0 1 0 1 1 | [ rs  ] [ rd  ]  
                          ^^^^^^^ relevant

while encoding 2 of stz…

(2) STZ src, dest                                    
                                                     
b7           b0 | b7           b0 | b7           b0
1 1 1 1 1 1 0 0 | 0 1 0 0 1 0 1 1 | [ rs  ] [ rd  ]  
                          ^^^^^^^ same as above!

are stz and stnz somehow encoded the same? confusion abounds. internet dog the6p4c had the good idea to check binutils to cross check with what Renesas themselves might have said on the matter. they found:

[PATCH v2][RX] Add RXv2 Instructions

+                                                   
+/** 1111 1100 0100 1011 rsrc rdst  stz %1, %0 */   
+  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);       
+                                                   
+/** 1111 1100 0100 1111 rsrc rdst  stnz  %1, %0 */ 
+  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);

which pretty clearly says “stz has the low bits of 1011”, “stnz has the low bits of 1111”. confusion resolved. EXCEPT: this includes a different copy/paste error! both instructions here have S2cc(RXC_z). there’s a followup commit for this,

commit 239efab16429cad466591ccd1c57bba786171765             
Author: Yoshinori Sato <ysato@users.sourceforge.jp>         
Date:   Thu Dec 17 01:42:34 2015 +0900                      
                                                            
    RXv2 support update                                     
                                                            
    2015-12-22  Yoshinori Sato <ysato@users.sourceforge.jp> 
                                                            
    opcodes/                                                
            * rx-decode.opc (movco): Use uniqe id.          
            (movli): Likewise.                              
            (stnz): Condition fix.                          
                                                            
[...snip...]                                                
                                                            
 /** 1111 1100 0100 1111 rsrc rdst      stnz    %1, %0 */   
-  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);               
+  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_nz);              
                                                            
[...snip...]

so eventually everything ended up in the right state. but it’s very funny to look through the history and realize there were two copy-paste errors in different directions about these two instructions. cursed additions!
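for reference, the resolved state as a tiny decoder sketch. it follows the bit patterns in the binutils comments quoted above (second byte 0100 1011 for stz, 0100 1111 for stnz); `decode_stcc` and its tuple return type are invented for this example.

```rust
/// Decode encoding (2) of stz/stnz from three bytes.
/// Returns (mnemonic, rs, rd), or None if the bytes are something else.
fn decode_stcc(bytes: [u8; 3]) -> Option<(&'static str, u8, u8)> {
    if bytes[0] != 0b1111_1100 {
        return None;
    }
    let mnemonic = match bytes[1] {
        0b0100_1011 => "stz",
        0b0100_1111 => "stnz",
        _ => return None,
    };
    // third byte: [ rs ] in the high nibble, [ rd ] in the low nibble
    Some((mnemonic, bytes[2] >> 4, bytes[2] & 0xf))
}
```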

cmp…

cmp encoding (2), for cmp #uimm:8 could be read as the bit pattern

0 1 1 1 0 1 li  | [ opc ] [ rs2 ]

like cmp encoding (3), or similar encodings of mul, and, or, but with opc=0b101. it has the additional constraint of li=0b01 in such a reading, but this raises a question.. if opc=0b000 allows four immediate operand lengths - 8, 16, 24, and 32 bits, sign-extended to 32 bits - why not allow all operand lengths with zero-extension for opc=0b101?? alas.
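the distinction matters because the same encoded bits produce different operands depending on the extension. a minimal sketch, assuming the immediate has already been assembled into an integer (byte order in the instruction stream is deliberately left out), with `sign_extend` as a made-up helper name:

```rust
/// Sign-extend the low `bits` bits of `value` to 32 bits.
/// `bits` is expected to be 8, 16, 24, or 32.
fn sign_extend(value: u32, bits: u32) -> i32 {
    debug_assert!((1..=32).contains(&bits));
    let shift = 32 - bits;
    // shift the field up to the top, then arithmetic-shift it back down
    ((value << shift) as i32) >> shift
}

/// What a uimm:8 operand does instead: zero-extend.
fn zero_extend_8(value: u8) -> u32 {
    value as u32
}
```

an encoded 0xff is -1 through the sign-extended forms and 255 through uimm:8, so a zero-extended 16- or 24-bit compare would have been a genuinely different instruction, not a redundant one.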

double-precision instructions…

also in the area of

0 1 1 1 0 1 li ...

instructions, in RXv3 a new set of double-precision and related instructions were added. this makes another pattern with this encoding clearer: li picks the number of bytes to be read for operands, even though none of the operands are necessarily interpreted as an immediate.

li=0b01 usually represents a 32-bit immediate encoded as a sign-extended 8-bit value. so, read 0x75, read a byte for the opcode and destination register, then read one byte for the immediate. but for instructions like int, the encoding works out as

0 1 1 1 0 1 0 1 | 0 1 1 0 0 0 0 0 | [ uimm:8 ]                              
            li=01 opc=0110 rd=0000  ^ and read the 1-byte immediate of li=01

RXv3 extends this - where a 2-byte immediate might be involved in an instruction like

0 1 1 1 0 1 1 0 | 0 0 0 1 0 1 1 0 | 0 1 0 1 0 1 0 1 | 1 0 1 0 1 0 1 0
            li=10 opc=0001 rs2=0110 imm=0x55AAi16

other new instructions, like dadd r6, r5, r4, are encoded…. similarly

0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | 0 1 0 1 0 0 0 0 | 0 1 1 0 0 1 0 0
            "li=10"  reserved?      rs2=0101 opc=0000 rd=0110 rs=0100

li still means “read two bytes”! they’re just not an immediate anymore. wild.
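put as code, li acts as a length field for everything after the second byte. this is a sketch with an invented function name. the 1- and 2-byte cases are the ones shown above; mapping li=11 to 3 bytes and li=00 to 4 bytes is my reading of the 8/16/24/32-bit immediate lengths, so treat those two arms as an assumption.

```rust
/// For a first byte matching `0 1 1 1 0 1 li`, return how many bytes
/// follow the second byte. Returns None for any other first byte.
fn li_trailing_bytes(first: u8) -> Option<usize> {
    if first & 0b1111_1100 != 0b0111_0100 {
        return None;
    }
    Some(match first & 0b11 {
        0b01 => 1, // e.g. int: 0x75 0x60 imm8
        0b10 => 2, // e.g. a 16-bit immediate, or the dadd-style register bytes
        0b11 => 3, // assumption: 24-bit immediate
        _ => 4,    // assumption: li=00 is the full 32-bit immediate
    })
}
```

a length-only pre-pass over this opcode group is then just `2 + li_trailing_bytes(first)?`, without caring whether the trailing bytes are an immediate or more opcode and register fields.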

opcode selectors move around!

in RXv3, with the new double-precision instructions, there is an interesting consistency decision to note…

consider the {dadd,dsub,dmul} encoding pattern of

0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs2 ] [ opc ] | [ rd  ] [ rs  ]

for these instructions, the exact opcode is chosen by the four opc bits in the low nibble of the third byte. sure, that’s fine! one of the possible opcodes here is dcmp, whose condition is indicated by the value of rd. this means that dcmp is encoded like:

0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs2 ] [ opc ] | [ rd  ] [ rs  ]       
                                            opc=0111  rd=cm={.., UN, EQ, ..}

or, an instruction shaped like double-OP src, src2, dest, with dest repurposed to hold the condition.

this is in contrast to other two-operand instructions like dabs, encoded like:

0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs  ] [ opc ] | [ rd  ] [ opc2]  
                                            opc=1100          opc2=0001

where the instruction has a skeleton more like double-OP src, dest, with rs being the repurposed field. this follows! the instruction no longer has two source operands, but does have a destination operand.

i’m deeply curious why rs is the repurposed field here, rather than rs2. in that case, the “opcode” would be the third byte in its entirety, which seems like a nice property on its own. alternatively, maybe keeping the semantics of register selector bits the same simplifies decoder hardware…
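here is roughly how the moving fields look in a decoder. it is a sketch only: `DoubleOp` and `decode_double` are invented names, the opc values (0111 for dcmp, 1100 for the dabs group) are the ones quoted in the diagrams above rather than re-checked against the manual, and no attempt is made to validate which other opc values are actually assigned.

```rust
#[derive(Debug, PartialEq)]
enum DoubleOp {
    /// double-OP src, src2, dest
    ThreeOperand { opc: u8, rs: u8, rs2: u8, rd: u8 },
    /// dcmp: the rd field carries the condition instead of a register
    Compare { cm: u8, rs: u8, rs2: u8 },
    /// double-OP src, dest: the low nibble of byte 4 is a second opcode
    TwoOperand { opc2: u8, rs: u8, rd: u8 },
}

fn decode_double(bytes: [u8; 4]) -> Option<DoubleOp> {
    // 0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | ...
    if bytes[0] != 0x76 || bytes[1] != 0x90 {
        return None;
    }
    let (b3_hi, opc) = (bytes[2] >> 4, bytes[2] & 0xf);
    let (rd, b4_lo) = (bytes[3] >> 4, bytes[3] & 0xf);
    Some(match opc {
        0b0111 => DoubleOp::Compare { cm: rd, rs: b4_lo, rs2: b3_hi },
        // here the source register moves into the "rs2" slot of byte 3
        0b1100 => DoubleOp::TwoOperand { opc2: b4_lo, rs: b3_hi, rd },
        _ => DoubleOp::ThreeOperand { opc, rs: b4_lo, rs2: b3_hi, rd },
    })
}
```

note how `rs` is read from two different nibbles depending on `opc`. that is the field shuffle in question.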

float instruction encodings

the three-operand forms of float instructions have similar mappings from bits to opcodes, compared to scalar operations.

bits	scalar	float
`0000`	`sub`	`fsub`
`0001`	`cmp`	`undef`
`0010`	`add`	`fadd`
`0011`	`mul`	`fmul`

this does not continue to be the case for double-precision instructions, unfortunately. for those instructions, 0001 tends to select dadd, rather than leave space for a future fcmp.

bitfields

bfmov and bfmovz include a triplet of immediates to describe “move N bits starting from bit A out of source and into dest at bit B”. the manual then goes on to say,

If (slsb + width) > 32 and (dlsb + width) > 32, then dest becomes undefined.

… but that implies that if only one of the two overflows, dest is well-defined somehow? i think the manual means “or” in that sentence, alas.
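a sketch of the bit movement under the “or” reading, where either range running past bit 31 makes the result undefined (modeled here as `None`). `bitfield_move` is a hypothetical helper, not the manual’s pseudocode. the `zero_rest` flag stands in for my understanding of bfmovz, which is that it clears the rest of dest where bfmov preserves it.

```rust
/// Move `width` bits starting at bit `slsb` of `src` into `dest` at bit `dlsb`.
/// Returns None where this sketch considers the result undefined.
fn bitfield_move(
    src: u32,
    dest: u32,
    slsb: u32,
    dlsb: u32,
    width: u32,
    zero_rest: bool,
) -> Option<u32> {
    // "or" reading: either field spilling past bit 31 is undefined.
    // (width == 0 is also rejected, purely to keep the shifts in range.)
    if width == 0 || slsb.saturating_add(width) > 32 || dlsb.saturating_add(width) > 32 {
        return None;
    }
    let field_mask = ((1u64 << width) - 1) as u32;
    let field = (src >> slsb) & field_mask;
    let base = if zero_rest {
        0
    } else {
        dest & !(field_mask << dlsb)
    };
    Some(base | (field << dlsb))
}
```

for example, moving 8 bits from bit 4 of 0x0a50 to bit 8 of 0xffff_ffff gives 0xffff_a5ff, or 0x0000_a500 in the zeroing variant. with slsb=30, width=4, dlsb=0 only the source side overflows, and this sketch still says undefined, which is the whole point of reading it as “or”.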