Building a RISC-V simulator in Rust - Part 1

As a way to gain some more experience with Rust I decided to implement a RISC-V instruction set simulator in Rust, which I have imaginatively called rrs, the Rust RISC-V Simulator. So far I’ve implemented the RV32IM instruction set (for those unfamiliar with RISC-V, this is all the basic instructions: ALU operations, branching, loads/stores, multiplication/division etc, without any of the OS/system level stuff like exceptions or memory management, and none of the more complex/interesting extensions like atomics).

The full code for rrs can be found on GitHub; this blog post was written against the c2bb07e commit.

If you’re not familiar with Rust, it’s a compiled language initially developed by Mozilla as a potential replacement for C or C++. It provides various memory and thread safety guarantees that eliminate whole classes of bugs (e.g. buffer overruns) whilst still delivering high performance. I won’t discuss Rust at length here, just a few aspects that are relevant to rrs. The Rust Book is a great way to learn the language if you want to find out more; Learn Rust With Entirely Too Many Linked Lists is another good read.

My initial aims with rrs were to produce something simple and modular without a big focus on performance (though I didn’t want it to be too slow). Plus, as this was a project to learn the language, I wanted to use a reasonable set of Rust features to implement it.

To aid modularity it is split into two separate parts (crates, in Rust terms): rrs-cli and rrs-lib. rrs-lib is a collection of components that can be used to implement an instruction set simulator (ISS) but offer flexibility in how they’re used. The idea is that someone who wants ISS components, rather than a monolithic ISS, could use rrs-lib to implement whatever it is they’re building. For example, a cycle accurate performance model of a CPU core needs to emulate RISC-V instructions but needs much more besides; it could use rrs-lib for the RISC-V emulation and build the model on top of it.

Another possible use would be co-simulation for validating a CPU design. A simulation of the CPU (e.g. written in a hardware description language like Verilog) runs alongside a RISC-V simulation (using rrs-lib), with the two continuously cross-checked to ensure instructions executed on the CPU do what the RISC-V simulation says they should.

The second part is rrs-cli. This is a command line application that uses rrs-lib to execute compiled RISC-V binaries. At the time of writing it has a very basic feature set: it can set up a simulated memory of any requested size at a given address, load a raw binary into it and then run it. It provides a couple of ‘devices’ that allow the program to output to stdout and terminate the simulation.

Processing Instructions

There are various strategies for implementing an instruction set simulator. A typical one is to use a giant switch statement at its core. An instruction is fetched from a simulated memory, the opcode is extracted, then the switch statement executes the instruction based upon its opcode. In pseudo-C:

uint32_t instruction = fetch_next_instruction();
uint32_t opcode = instruction & opcode_mask;

switch (opcode) {
  case OPCODE_X: execute_x(instruction); break;
  case OPCODE_Y: execute_y(instruction); break;
  // ...
}

The compiler can implement a switch like this using a structure called a ‘jump table’: instead of executing many comparisons to determine which opcode you have, a table of code addresses is constructed, the opcode extracted from the instruction is used as an index into this table, and execution jumps to the address found there. Where you have many opcodes this can give very good performance.
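
To illustrate the idea, here’s a minimal sketch in Rust using a table of function pointers (the handler functions are hypothetical, and a real table would have one entry per possible opcode):

// Hypothetical instruction handlers, one per opcode.
fn execute_x(_insn: u32) { /* handle opcode X */ }
fn execute_y(_insn: u32) { /* handle opcode Y */ }

// Table of function pointers indexed directly by opcode. A two-entry
// table keeps the sketch short; a real one covers the full opcode space.
const JUMP_TABLE: [fn(u32); 2] = [execute_x, execute_y];

fn dispatch(insn: u32) {
    let opcode = (insn & 0x1) as usize; // mask sized for this tiny table
    JUMP_TABLE[opcode](insn);
}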

This switch statement structure is what I’ll use in rrs-lib, though with a slight modification to aid modularity.

Rust doesn’t actually have a switch statement; instead it has match, which can do much the same thing but is far more powerful, as it allows pattern matching using Rust’s advanced type system. The instruction decoding doesn’t need this pattern matching, but I’ll show an example that uses it later.

To give the modularity I wanted I created a trait, which is a Rust concept similar to an interface in other languages such as Java or C#, or a pure virtual class in C++. I called the trait InstructionProcessor and it contains one function per RISC-V instruction implemented. Here’s part of the definition; I’ve removed much of it for compactness but you should get the idea:

pub trait InstructionProcessor {
    type InstructionResult;

    fn process_add(&mut self, dec_insn: instruction_formats::RType) -> Self::InstructionResult;
    fn process_or(&mut self, dec_insn: instruction_formats::RType) -> Self::InstructionResult;

    // ...

    fn process_addi(&mut self, dec_insn: instruction_formats::IType) -> Self::InstructionResult;
    fn process_ori(&mut self, dec_insn: instruction_formats::IType) -> Self::InstructionResult;

    // ...

    fn process_lui(&mut self, dec_insn: instruction_formats::UType) -> Self::InstructionResult;
    fn process_auipc(&mut self, dec_insn: instruction_formats::UType) -> Self::InstructionResult;

    // ...

    fn process_beq(&mut self, dec_insn: instruction_formats::BType) -> Self::InstructionResult;
    fn process_bne(&mut self, dec_insn: instruction_formats::BType) -> Self::InstructionResult;
    fn process_blt(&mut self, dec_insn: instruction_formats::BType) -> Self::InstructionResult;

    // ...

    fn process_lw(&mut self, dec_insn: instruction_formats::IType) -> Self::InstructionResult;
    fn process_sw(&mut self, dec_insn: instruction_formats::SType) -> Self::InstructionResult;

    // ...

    fn process_jal(&mut self, dec_insn: instruction_formats::JType) -> Self::InstructionResult;

    // ...

    fn process_mul(&mut self, dec_insn: instruction_formats::RType) -> Self::InstructionResult;
    fn process_div(&mut self, dec_insn: instruction_formats::RType) -> Self::InstructionResult;

    // ...
}

A trait is implemented on a struct, which is Rust’s way of describing structured data. You can implement functions for a struct (similar to methods on classes in C++/C#/Java, though Rust is not an object-oriented language so they’re not identical concepts). Each function in the InstructionProcessor trait takes a mutable reference to the struct it is implemented on and a decoded version of the instruction to be processed, and returns some result type (Self::InstructionResult). RISC-V has a variety of standard instruction formats (RType, IType etc) and I created a Rust struct for each of them. Each struct includes a function to create itself from an encoded instruction. Here’s the RType struct definition:

pub struct RType {
    pub funct7: u32,
    pub rs2: usize,
    pub rs1: usize,
    pub funct3: u32,
    pub rd: usize,
}

impl RType {
    pub fn new(insn: u32) -> RType {
        RType {
            funct7: (insn >> 25) & 0x7f,
            rs2: ((insn >> 20) & 0x1f) as usize,
            rs1: ((insn >> 15) & 0x1f) as usize,
            funct3: (insn >> 12) & 0x7,
            rd: ((insn >> 7) & 0x1f) as usize,
        }
    }
}
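
As a quick check of the bit slicing above, decoding add x1, x2, x3 (encoded as 0x003100b3) with RType::new pulls out the fields like this:

let dec = RType::new(0x003100b3);
// add x1, x2, x3
assert_eq!(dec.rd, 1);
assert_eq!(dec.rs1, 2);
assert_eq!(dec.rs2, 3);
assert_eq!(dec.funct3, 0b000);
assert_eq!(dec.funct7, 0b000_0000);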

This provides modularity, as you can implement many different kinds of InstructionProcessor that do different things with decoded instructions. Instruction execution is one thing we want to do but there are others. Disassembly is one example, which we can implement by building an InstructionProcessor that produces a string describing the instruction. Another example would be an instruction profiler: something that counts the number of times a particular instruction is used.
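
As a sketch of that last idea (this profiler is hypothetical, not part of rrs-lib; assume the trait and instruction_formats module are in scope):

use std::collections::HashMap;

// Hypothetical profiler: counts how many times each instruction executes.
struct InstructionProfiler {
    counts: HashMap<&'static str, u64>,
}

impl InstructionProcessor for InstructionProfiler {
    type InstructionResult = ();

    fn process_add(&mut self, _dec_insn: instruction_formats::RType) {
        *self.counts.entry("add").or_insert(0) += 1;
    }

    // ... and similarly for every other instruction in the trait ...
}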

A process_instruction function takes an encoded instruction and an InstructionProcessor and calls the appropriate process_X function. This is where we implement our big match statement, like the pseudo-C example above. As RISC-V decoding isn’t as simple as masking out a single set of bits to identify which instruction to execute, this is implemented over multiple functions. Here’s process_instruction in full:

pub fn process_instruction<T: InstructionProcessor>(
    processor: &mut T,
    insn_bits: u32,
) -> Option<T::InstructionResult> {
    let opcode: u32 = insn_bits & 0x7f;

    match opcode {
        instruction_formats::OPCODE_OP => process_opcode_op(processor, insn_bits),
        instruction_formats::OPCODE_OP_IMM => process_opcode_op_imm(processor, insn_bits),
        instruction_formats::OPCODE_LUI => {
            Some(processor.process_lui(instruction_formats::UType::new(insn_bits)))
        }
        instruction_formats::OPCODE_AUIPC => {
            Some(processor.process_auipc(instruction_formats::UType::new(insn_bits)))
        }
        instruction_formats::OPCODE_BRANCH => process_opcode_branch(processor, insn_bits),
        instruction_formats::OPCODE_LOAD => process_opcode_load(processor, insn_bits),
        instruction_formats::OPCODE_STORE => process_opcode_store(processor, insn_bits),
        instruction_formats::OPCODE_JAL => {
            Some(processor.process_jal(instruction_formats::JType::new(insn_bits)))
        }
        instruction_formats::OPCODE_JALR => {
            Some(processor.process_jalr(instruction_formats::IType::new(insn_bits)))
        }
        _ => None,
    }
}

At the top level it extracts the opcode and, depending on which opcode it is, calls into another function that deals with that opcode. For example, here’s part of process_opcode_op (which deals with the basic ALU instructions, e.g. add x1, x2, x3):

fn process_opcode_op<T: InstructionProcessor>(
    processor: &mut T,
    insn_bits: u32,
) -> Option<T::InstructionResult> {
    let dec_insn = instruction_formats::RType::new(insn_bits);

    match dec_insn.funct3 {
        0b000 => match dec_insn.funct7 {
            0b000_0000 => Some(processor.process_add(dec_insn)),
            0b000_0001 => Some(processor.process_mul(dec_insn)),
            0b010_0000 => Some(processor.process_sub(dec_insn)),
            _ => None,
        },
        // ...
        0b111 => match dec_insn.funct7 {
            0b000_0000 => Some(processor.process_and(dec_insn)),
            0b000_0001 => Some(processor.process_remu(dec_insn)),
            _ => None,
        },
        _ => None,
    }
}

process_opcode_op decodes the instruction into an RType structure, does another level of matching on the funct3 and funct7 fields to work out which instruction we have, and calls the relevant function in the supplied InstructionProcessor.

process_instruction (and in turn the process_X functions it calls) returns an Option type, which can be Some or None. It returns None where the instruction couldn’t be decoded (it’s an invalid instruction or from an extension rrs doesn’t yet implement). When it returns Some, this wraps the InstructionResult from the InstructionProcessor.

Notably I’ve used Rust generics here. If we implemented something like this in C++ we could use an InstructionProcessor class with all virtual functions. We’d pass a pointer to an instance of the class to process_instruction and it would call the relevant member function. This works but forces a virtual function call for each instruction processed, which is more expensive than a normal function call and cannot easily be inlined.

We could do something similar in Rust, but by using generics process_instruction knows at compile time what kind of InstructionProcessor it is being used with and can do a normal function call instead. This also gives the compiler the opportunity to inline the calls, so our nice nested structure with the modular InstructionProcessor could turn into a single monolithic function in machine code with all function call overhead removed. Whether it actually does this (and whether that would actually be the optimal thing to do) is anyone’s guess, but it offers the compiler good opportunities for optimisation. I may delve into what code is actually being generated at a later date.

(Note you could do the same thing in C++ with templates.)
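
To make the difference concrete, here’s a minimal sketch of the two dispatch styles (the Processor trait here is hypothetical, just for illustration):

trait Processor {
    fn process(&mut self, insn: u32);
}

// Dynamic dispatch: every call goes through a vtable, like a C++ virtual
// function call.
fn process_dyn(p: &mut dyn Processor, insn: u32) {
    p.process(insn);
}

// Generics: a separate copy is compiled (monomorphized) for each concrete
// T, so the call is static and can be inlined.
fn process_generic<T: Processor>(p: &mut T, insn: u32) {
    p.process(insn);
}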

Executing Instructions

With process_instruction in place we can implement an InstructionProcessor that executes instructions. First we need a struct to represent our CPU state. RISC-V uses the term ‘hart’, meaning hardware thread, for a set of state we execute instructions against, so that struct is called HartState:

pub struct HartState {
    /// x1 - x31 register values. The contents of index 0 (the x0 zero register) are ignored.
    pub registers: [u32; 32],
    /// Program counter
    pub pc: u32,
    /// Gives index of the last register written if one occurred in the previous instruction. Set
    /// to `None` if latest instruction did not write a register.
    pub last_register_write: Option<usize>,
}

In order to implement the zero register (which ignores writes and always reads 0) two functions are implemented on this struct for reading and writing registers:

impl HartState {
    /// Write a register in the hart state. Used by executing instructions for correct zero
    /// register handling
    fn write_register(&mut self, reg_index: usize, data: u32) {
        if reg_index == 0 {
            return;
        }

        self.registers[reg_index] = data;
        self.last_register_write = Some(reg_index)
    }

    /// Read a register from the hart state. Used by executing instructions for correct zero
    /// register handling
    fn read_register(&self, reg_index: usize) -> u32 {
        if reg_index == 0 {
            0
        } else {
            self.registers[reg_index]
        }
    }
}

A separate struct InstructionExecutor will implement InstructionProcessor. We could implement this directly on HartState but the separate struct keeps things decoupled.

pub struct InstructionExecutor<'a, M: Memory> {
    /// Memory used by load and store instructions
    pub mem: &'a mut M,
    pub hart_state: &'a mut HartState,
}

We’ll discuss the mem field (that uses the Memory trait) below.

Before we start implementing the various process_X functions we need to decide what return type will be used. We’ll use the following:

type InstructionResult = Result<bool, InstructionException>;

The Result can be an Ok or an Err. Where it is Ok a bool is provided: this is true when the instruction updates the PC (e.g. because it is a jump or a taken branch) and false otherwise. When the instruction execution results in an exception the result is an Err providing an InstructionException, which describes the exception.
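
A sketch of what InstructionException might contain, consistent with how it’s used in the step function later in this post (see the rrs-lib source for the real definition):

pub enum InstructionException {
    // PC of the illegal instruction and the instruction bits themselves
    IllegalInstruction(u32, u32),
    // PC at which the instruction fetch failed
    FetchError(u32),
}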

With this done we can start implementing the process_X functions, which is straightforward but very repetitive. For example, here’s an add implementation:

fn process_add(&mut self, dec_insn: instruction_formats::RType) -> Self::InstructionResult {
    let a = self.hart_state.read_register(dec_insn.rs1);
    let b = self.hart_state.read_register(dec_insn.rs2);
    let result = a + b;
    self.hart_state.write_register(dec_insn.rd, result);

    Ok(false)
}

This works fine but many other instructions will look almost identical, with just the let result line using a different operator. The addi instruction will also be very similar, this time using the same operation but taking b from an immediate rather than a register read. Can we minimise this repetition?

We’ll do this in two steps, the first is implementing functions on InstructionExecutor to do some of the repetitive work, such as:

fn execute_reg_reg_op<F>(&mut self, dec_insn: instruction_formats::RType, op: F)
where
    F: Fn(u32, u32) -> u32,
{
    let a = self.hart_state.read_register(dec_insn.rs1);
    let b = self.hart_state.read_register(dec_insn.rs2);
    let result = op(a, b);
    self.hart_state.write_register(dec_insn.rd, result);
}

This does the work of functions like process_add above but takes the operation to perform as an argument (you provide it with a ‘closure’, which effectively allows you to pass a function as an argument). process_add can then be implemented as:

fn process_add(&mut self, dec_insn: instruction_formats::RType) -> Self::InstructionResult {
    self.execute_reg_reg_op(dec_insn, |a, b| a + b);

    Ok(false)
}

As execute_reg_reg_op uses generics, the use of closures here shouldn’t cause a performance issue: the compiler can inline the closure rather than performing an extra function call to compute the operation.
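
The register/immediate equivalent follows the same pattern. Here’s a sketch (assuming the IType struct carries a sign-extended immediate in an imm field; see the rrs-lib source for the real version):

fn execute_reg_imm_op<F>(&mut self, dec_insn: instruction_formats::IType, op: F)
where
    F: Fn(u32, u32) -> u32,
{
    let a = self.hart_state.read_register(dec_insn.rs1);
    // The immediate is sign extended during decode, so just reinterpret
    // the bits as u32 here.
    let b = dec_insn.imm as u32;
    let result = op(a, b);
    self.hart_state.write_register(dec_insn.rd, result);
}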

It’d be nice to cut down the boilerplate further, so we’ll turn to macros. In C and other C-like languages macros are purely text based: arguments passed to a macro are textually inserted within it and whatever results is fed to the compiler. Rust macros work a bit differently [1]. They operate on things called ‘token trees’, which are part of the Rust compiler’s parsing process. When defining a macro we specify the kinds of token tree its arguments must match against (e.g. an expression or an identifier) and then an expansion for that match. We can define a macro that produces functions like process_add for us:

macro_rules! make_alu_op_reg_fn {
    ($name:ident, $op_fn:expr) => {
        paste! {
            fn [<process_ $name>](
                &mut self,
                dec_insn: instruction_formats::RType
            ) -> Self::InstructionResult {
                self.execute_reg_reg_op(dec_insn, $op_fn);

                Ok(false)
            }
        }
    };
}

To implement add we’d use it like this:

make_alu_op_reg_fn! {add, |a, b| a + b}

Much more compact, with a nice clear relation between the instruction name and the operation it performs. (The paste! macro used above comes from the paste crate; its [<process_ $name>] syntax concatenates tokens to build the function name, e.g. process_add.) We can go one step further and remove the repetition between instructions like add and addi: both implement the same operation, and given the name of one we can work out the name of the other. So we define another macro that implements the register/register version and the register/immediate version together:

macro_rules! make_alu_op_fns {
    ($name:ident, $op_fn:expr) => {
        make_alu_op_reg_fn! {$name, $op_fn}
        make_alu_op_imm_fn! {$name, $op_fn}
    };
}

make_alu_op_imm_fn is much like the make_alu_op_reg_fn above, just implementing the register/immediate version of the operation and adding an ‘i’ to the function name. With this all done we get a nice compact implementation of the various register/register and register/immediate operations:

make_alu_op_fns! {add, |a, b| a.wrapping_add(b)}
make_alu_op_reg_fn! {sub, |a, b| a.wrapping_sub(b)}
make_alu_op_fns! {slt, |a, b| if (a as i32) < (b as i32) {1} else {0}}
make_alu_op_fns! {sltu, |a, b| if a < b {1} else {0}}
make_alu_op_fns! {or, |a, b| a | b}
make_alu_op_fns! {and, |a, b| a & b}
make_alu_op_fns! {xor, |a, b| a ^ b}

Note the use of wrapping_add and wrapping_sub. This is down to Rust’s approach to safety: it considers an overflow in an addition or subtraction an error, which it checks for in debug builds, panicking (killing the program with an error message) if it detects one. wrapping_add and wrapping_sub tell Rust we don’t consider overflow an error and want to wrap around when one happens, which matches the semantics of the RISC-V add and sub instructions.
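
A quick illustration of the difference:

// Plain `+` on a u32 panics in a debug build when it overflows;
// wrapping_add/wrapping_sub make the wraparound explicit and never panic:
assert_eq!(u32::MAX.wrapping_add(1), 0);
assert_eq!(0u32.wrapping_sub(1), u32::MAX);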

I’ll skip the explanations for the other instructions so check out the source if you want to see the details for those.

Disassembling Instructions

To implement instruction disassembly we create a new InstructionProcessor with an InstructionResult type of String, implemented on a new struct, InstructionStringOutputter. As with InstructionExecutor, each individual instruction processing function is straightforward and repetitive:

fn process_add(&mut self, dec_insn: instruction_formats::RType) -> Self::InstructionResult {
  format!("add x{}, x{}, x{}", dec_insn.rd, dec_insn.rs1, dec_insn.rs2)
}

Once again we turn to macros to cut down on the repetition. This time we’ve got groups of functions where the only things changing are the function name and the mnemonic (add, sub, or etc). We can use another macro feature here: variable numbers of arguments:

macro_rules! string_out_for_alu_reg_op {
    ($name:ident) => {
        paste! {
            fn [<process_ $name>](
                &mut self,
                dec_insn: instruction_formats::RType
            ) -> Self::InstructionResult {
                format!("{} x{}, x{}, x{}", stringify!($name), dec_insn.rd, dec_insn.rs1,
                    dec_insn.rs2)
            }
        }
    };
}

// ...

macro_rules! string_out_for_alu_ops {
    ($($name:ident),*) => {
        $(
            string_out_for_alu_reg_op! {$name}
            string_out_for_alu_imm_op! {$name}
        )*
    }
}

string_out_for_alu_reg_op implements our disassembly function for a particular named instruction. string_out_for_alu_ops takes multiple instruction names and applies string_out_for_alu_reg_op and string_out_for_alu_imm_op to each of them in turn (string_out_for_alu_imm_op being the immediate version of string_out_for_alu_reg_op). This gives us a nice compact implementation of InstructionProcessor:

impl InstructionProcessor for InstructionStringOutputter {
    type InstructionResult = String;

    string_out_for_alu_ops! {add, slt, xor, or, and}

    //....
}

This gives us something that, given a 32-bit RISC-V instruction, can output the disassembly. The ISS can use this to output disassembled instructions to a log file as they are executed, like so:

// `insn_pc` used to disassemble branch instructions, not required in this
// example but the field must be set to something.
let mut outputter = InstructionStringOutputter { insn_pc: 0 };

// Result is "addi x17, x12, 123"
let disassembly = rrs_lib::process_instruction(&mut outputter, 0x07b60893);

Again I’ve skipped the implementations for the other instructions; check out the source to see how they work.

Memories

We need a way to implement a simulated memory. On top of ‘normal’ RAM, devices are also represented as memories; for instance you could have a memory that, when written to, outputs the byte written as a character to standard out. To allow different kinds of memory to be implemented we introduce a new trait, Memory.

pub trait Memory: Downcast {
    /// Read `size` bytes from `addr`.
    ///
    /// `addr` must be aligned to `size`.
    /// Returns `None` if `addr` doesn't exist in this memory.
    fn read_mem(&mut self, addr: u32, size: MemAccessSize) -> Option<u32>;

    /// Write `size` bytes of `store_data` to `addr`
    ///
    /// `addr` must be aligned to `size`.
    /// Returns `true` if write succeeds.
    fn write_mem(&mut self, addr: u32, size: MemAccessSize, store_data: u32) -> bool;
}

A straightforward interface with one function to read and another to write is all we need, though it does have one peculiarity that’s needed due to how Rust works: the Downcast bit.
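
As an illustration, a stdout output device like the one described above might look something like this sketch (rrs-cli’s actual device will differ in the details):

use std::io::Write;

// Writing a byte to this 'memory' prints it to stdout; reads always fail.
struct StdoutDevice;

impl Memory for StdoutDevice {
    fn read_mem(&mut self, _addr: u32, _size: MemAccessSize) -> Option<u32> {
        None // write-only device
    }

    fn write_mem(&mut self, _addr: u32, _size: MemAccessSize, store_data: u32) -> bool {
        print!("{}", (store_data & 0xff) as u8 as char);
        std::io::stdout().flush().is_ok()
    }
}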

Before explaining, let’s take a look at a Memory implementation: MemorySpace. The InstructionExecutor structure has a single reference to a Memory that is used for all load and store instructions, but we may want multiple memories (e.g. a ‘normal’ RAM along with some devices). MemorySpace allows us to do this by holding multiple memories, each with a base address and size. When it gets a read_mem or write_mem call it works out which memory the address belongs to and forwards the call on. It uses the following function to add memories:

pub fn add_memory(
    &mut self,
    base: u32,
    size: u32,
    memory: Box<dyn Memory>,
) -> Result<usize, MemorySpaceError> {
  //...
}

Box<dyn Memory> is a boxed trait object. Unlike when using generics, add_memory and MemorySpace itself don’t know the actual type being used, so a dynamic dispatch mechanism is used when calling read_mem and write_mem. This is useful here as it allows MemorySpace to hold many different kinds of memory.

Users of rrs-lib may want to retain a reference to any memory they add to a MemorySpace (if it’s a device there may be some other way it needs to be interacted with, or they may want ‘backdoor’ access to the memory rather than using the read_mem/write_mem functions). In many other languages this would be simple: just hold a copy of the reference elsewhere. It’s harder in Rust due to its safety properties.

Rust provides automatic memory management, without needing a garbage collector, as well as various safety properties. It does this with some strict rules around what can hold a reference to something and by ensuring any allocated memory has a single owner.

A reference to a structure can be handed out provided that reference doesn’t outlive the owner of the structure; when the owner goes out of scope Rust knows it can safely deallocate the memory. Multiple shared references to a structure can exist at once, but only a single mutable reference (which allows you to alter the structure) can exist at a time. Something called the ‘borrow checker’ enforces these rules, ensuring reference lifetimes are strictly shorter than or the same as owner lifetimes. The borrow checker runs at compile time, giving automatic memory management without run-time overhead, but it does restrict what you’re able to do.

The add_memory function takes ownership of whatever memory is added to it, and the borrow checker won’t allow whatever calls add_memory to retain a reference (mutable or otherwise) to that memory, as it can’t prove at compile time that the MemorySpace outlives the reference.
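
This is the kind of code the borrow checker rejects (a sketch with a hypothetical SomeMemory type, not real rrs-lib code):

let mut mem = SomeMemory::new();
let mem_ref = &mut mem; // borrow the memory...

// ...then try to move it into the MemorySpace. Error: cannot move out of
// `mem` because it is borrowed.
space.add_memory(0x2000_0000, 0x1000, Box::new(mem));

mem_ref.write_mem(0, MemAccessSize::Word, 0xdeadbeef); // borrow still live here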

The way around this is to add functionality to MemorySpace that allows you to get a reference, or a mutable reference, to a particular memory. add_memory returns an index (the usize in the Result) which can be used to get a reference to the memory back at a later point in the program.

We’re not quite done though; this is where the Downcast comes in [2]. As you might expect, Rust doesn’t just let you convert between types as you wish. In C++, if we had a Memory* we could cast it to some other SpecificMemory*. In Rust we cannot just convert our Box<dyn Memory> into a Box<SpecificMemory> or a &SpecificMemory. The Downcast bound on Memory allows us to do this, in a safe, checked way. Here’s the implementation of get_memory_ref, which gives you a reference to a previously added memory:

/// Get a reference to an inner memory
///
/// This performs downcasting to the provided `T`. `None` is returned where the downcast fails.
pub fn get_memory_ref<T: Memory>(&self, index: usize) -> Option<&T> {
    self.memory_regions.get(index)?.memory.downcast_ref::<T>()
}

This is a generic function: the caller specifies the type they expect the memory to be and it returns an Option, which is Some (providing the reference) if the memory is of that type, and None if the memory is of a different type or the index doesn’t exist.
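
Hypothetical usage might look like this (SomeDevice stands in for a real memory type):

// Add the device, keeping the index it's given within the memory space
let dev_idx = memory_space
    .add_memory(0x8000_0000, 0x1000, Box::new(SomeDevice::new()))
    .expect("adding memory failed");

// ... later, get a typed reference back for backdoor access
if let Some(dev) = memory_space.get_memory_ref::<SomeDevice>(dev_idx) {
    // use `dev` directly here
}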

Bringing it all together

We’ve now got all the bits and pieces we need to execute a RISC-V binary; we just need to bring them together. First let’s look at the step function of InstructionExecutor. This fetches an instruction from memory, executes it, and increments the PC to point to the next instruction as needed:

/// Execute instruction pointed to by `hart_state.pc`
///
/// Returns `Ok` where instruction execution was successful. `Err` with the relevant
/// [InstructionException] is returned when the instruction execution causes an exception.
pub fn step(&mut self) -> Result<(), InstructionException> {
    self.hart_state.last_register_write = None;

    // Fetch next instruction from memory
    if let Some(next_insn) = self.mem.read_mem(self.hart_state.pc, MemAccessSize::Word) {
        // Execute the instruction
        let step_result = process_instruction(self, next_insn);

        match step_result {
            Some(Ok(pc_updated)) => {
                if !pc_updated {
                    // Instruction didn't update PC so increment to next instruction
                    self.hart_state.pc += 4;
                }
                Ok(())
            }
            // Instruction produced an error so return it
            Some(Err(e)) => Err(e),
            // Instruction decode failed so return an IllegalInstruction as an error
            None => Err(InstructionException::IllegalInstruction(
                self.hart_state.pc,
                next_insn,
            )),
        }
    } else {
        // Return a FetchError as an error if instruction fetch fails
        Err(InstructionException::FetchError(self.hart_state.pc))
    }
}

This uses the pattern matching with match that I referenced above. process_instruction returns a nested data type, Option<Result<bool, InstructionException>>. The outer Option is None if the instruction can’t be decoded; if it is Some we have either an Ok or an Err in the inner Result. A single match covers all the possibilities by pattern matching on the inner structure.

That’s it for the functionality in rrs-lib. I won’t go over all the details of how rrs-cli works; let’s just take a look at the function that runs the simulation:

fn run_sim(sim_environment: &mut SimEnvironment) {
    let mut executor = InstructionExecutor {
        hart_state: &mut sim_environment.hart_state,
        mem: &mut sim_environment.memory_space,
    };

    let mut insn_count: u64 = 0;
    let start = Instant::now();

    loop {
        if let Some(log_file) = &mut sim_environment.log_file {
            // Output current instruction disassembly to log
            let insn_bits = executor
                .mem
                .read_mem(executor.hart_state.pc, MemAccessSize::Word)
                .unwrap_or_else(|| panic!("Could not read PC {:08x}", executor.hart_state.pc));

            let mut outputter = InstructionStringOutputter {
                insn_pc: executor.hart_state.pc,
            };

            writeln!(
                log_file,
                "{:x} {}",
                executor.hart_state.pc,
                rrs_lib::process_instruction(&mut outputter, insn_bits).unwrap()
            ).expect("Log file write failed");
        }

        // Execute instruction
        executor.step().expect("Exception during execution");

        insn_count += 1;

        // Stop if stop requested by emulated binary via SimulationCtrlDevice
        if executor
            .mem
            .get_memory_ref::<SimulationCtrlDevice>(sim_environment.sim_ctrl_dev_idx)
            .unwrap()
            .stop
        {
            break;
        }

        if let Some(log_file) = &mut sim_environment.log_file {
            if let Some(reg_index) = executor.hart_state.last_register_write {
                // Output register written by instruction to log if it wrote to one
                writeln!(
                    log_file,
                    "x{} = {:08x}",
                    reg_index, executor.hart_state.registers[reg_index]
                ).expect("Log file write failed");
            }
        }
    }

    let elapsed = start.elapsed();
    let mhz = (insn_count as f64) / (elapsed.as_micros() as f64);
    println!(
        "{} instructions executed in {} ms {} MHz",
        insn_count,
        elapsed.as_millis(),
        mhz
    );
}

It sets up an InstructionExecutor with a supplied HartState and MemorySpace, which are created by rrs-cli before it calls run_sim. It then enters a loop, stepping the executor and optionally writing disassembly (using InstructionStringOutputter) and register writes out to a log. There is a special SimulationCtrlDevice memory that the simulated binary can write to to stop the simulation; MemorySpace::get_memory_ref is used to access this and see if a stop has been requested, terminating the simulation if so.

Performance and Testing

Finally, a word on the performance of rrs and the testing methodology.

As a way to test both performance and correctness I ran the CoreMark benchmark on rrs. The actual benchmark result is meaningless; what is interesting is how many instructions per second rrs can execute. A run of 1000 CoreMark iterations on my Thinkpad 490s (with an Intel i7-8565U) executes 275063040 instructions in 5.28 s, giving 52.1 million instructions per second. My CPU has a turbo frequency of 4.6 GHz and a 5-wide decode, so could execute at most 23 billion instructions per second, giving a ratio of ~440 real CPU instructions per emulated RISC-V instruction. In reality we won’t be anywhere near that theoretical maximum, so the ratio is probably more like 200 or lower. A pretty decent result given I wasn’t trying hard to achieve high performance. There’s room for improvement though; at a later date I may take a look at how to improve it.

I ran the same CoreMark binary on Spike, using time to measure execution time. This gave a wall clock time of 8.852 s vs 5.376 s for rrs-cli (total run time including initialisation), so rrs is ~65% faster. Spike is a far more complete and featureful simulator so it’s not an entirely fair comparison, but rrs still stacks up well.

CoreMark also helps test rrs: it computes a checksum during the benchmark and checks it against an expected value, which would fail if the benchmark executed incorrectly. CoreMark gives a reasonable workout of the RV32IM instruction set, so this is a decent test of functionality.

On top of running CoreMark I’ve got a small set of tests in Rust. So far these test:

In particular I don’t have any Rust tests exhaustively testing the execution of all instructions. Instead I plan to use tests like CoreMark and the RISC-V Architectural Test Framework. I’d want to run these regardless of other tests (as demonstrating successful execution of binaries is clearly important), and separate per-instruction tests in Rust would mostly duplicate them, so I decided to spend the effort elsewhere.

What’s Next

There are a few things I’d like to work on next, in the rough order I’ll tackle them:

In doing some of these I’ll likely need to implement some more extensions (such as F for floating point and A for atomics).


  1. The Little Book of Rust Macros is a good guide to Rust macros if you want to learn more ↩︎

  2. This isn’t a native language feature; it comes from the external crate downcast-rs ↩︎