Unpacking CPU Cycles: A Closer Look at LW, SW, and BEQ Instructions

When we delve into the heart of how a computer processes instructions, we often encounter terms like 'single-cycle,' 'multicycle,' and 'pipelined' CPUs. These terms describe different architectural approaches to executing commands, and understanding them helps demystify the magic behind our digital devices. Today, let's zoom in on a few specific instructions – lw (load word), sw (store word), and beq (branch if equal) – and see how they fit into these execution models, particularly in the context of a single-cycle CPU.

Imagine a CPU as a highly organized workshop. In a single-cycle CPU, every instruction, no matter how simple or complex, must complete its entire journey from start to finish within one clock cycle. This means all the necessary steps (fetching the instruction, decoding it, reading operands, performing the operation, and writing back the result) have to finish before the next clock edge arrives. It's like a workshop that carries one item through every station and finishes it completely before the next item is even started.

Let's break down lw, sw, and beq in this single-cycle world.

The lw (Load Word) Instruction

When a lw instruction comes along, say lw rd, offset(rs1), it's asking the CPU to fetch a word from memory and put it into a specific register (rd). The memory address is calculated by adding the value in register rs1 (the base address) to the sign-extended offset encoded in the instruction. Some reference material writes the operand as base(rs1); either way, the number in front of the parentheses is the offset and the register inside them supplies the base.
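The address calculation can be sketched in a few lines of Python. This is only an illustration: the helper names sign_extend and effective_address are made up here, and the 12-bit default offset width is an assumption (RISC-V loads use 12 bits, MIPS loads use 16).

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit address space


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(base: int, offset_field: int, offset_bits: int = 12) -> int:
    """Add the sign-extended offset field to the base register value."""
    return (base + sign_extend(offset_field, offset_bits)) & MASK32


# A positive offset moves forward from the base; 0xFFC is -4 as a 12-bit value.
print(hex(effective_address(0x1000, 8)))      # 0x1008
print(hex(effective_address(0x1000, 0xFFC)))  # 0xffc
```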

In a single-cycle CPU, this means the CPU needs to:

  1. Fetch the instruction: Get the lw instruction itself from memory.
  2. Decode it: Figure out what lw means and identify rd, rs1, and the offset.
  3. Read rs1: Get the value from the rs1 register.
  4. Calculate the memory address: Add rs1's value to the offset.
  5. Access Data Memory: Go to the calculated address in memory and read the data.
  6. Write to Register File: Take the data read from memory and write it into the rd register.

Crucially, state elements such as the PC register and the Register File are typically updated on the rising edge of the clock, so every calculation that feeds them must settle before that edge. For lw, the PC needs to be updated to PC+4 (to point to the next instruction), and the Register File needs to be updated with the data fetched from memory. The data path must be designed so that the correct values are stable at the inputs of the PC Register and Register File just before the clock edge.
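As a rough software model of the lw steps and the clock-edge update, consider the sketch below. The dictionary layout of the state and the name lw_cycle are assumptions made for this example, the offset is taken as an already sign-extended integer, and fetch and decode are assumed to have produced rd, rs1, and offset. All values are computed first, and only then is a new state produced, which mimics the registers capturing their inputs at the rising edge.

```python
MASK32 = 0xFFFFFFFF


def lw_cycle(state, rd, rs1, offset):
    """Run one lw through a single cycle and return the next state."""
    # Combinational phase: nothing in `state` changes here.
    base = state["regs"][rs1]               # read rs1
    address = (base + offset) & MASK32      # ALU adds base and offset
    loaded = state["dmem"].get(address, 0)  # read Data Memory
    next_pc = (state["pc"] + 4) & MASK32    # PC + 4

    # Rising clock edge: the PC and Register File capture their inputs.
    regs = list(state["regs"])
    regs[rd] = loaded
    return {"pc": next_pc, "regs": regs, "dmem": state["dmem"]}


regs = [0] * 32
regs[5] = 0x2000
before = {"pc": 0x100, "regs": regs, "dmem": {0x2008: 0xDEADBEEF}}
after = lw_cycle(before, rd=6, rs1=5, offset=8)
print(hex(after["pc"]), hex(after["regs"][6]))  # 0x104 0xdeadbeef
```

Returning a fresh state instead of mutating the old one keeps the "compute, then commit" ordering visible: the old state is still intact after the call.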

The sw (Store Word) Instruction

Now, sw is the inverse of lw. An instruction like sw rt, offset(rs1) takes a value from register rt and stores it into memory at an address calculated by adding the offset to the value in rs1. The process is similar but ends differently:

  1. Fetch and Decode: Similar to lw.
  2. Read rs1 and rt: Get values from both registers.
  3. Calculate the memory address: Add rs1's value to the offset.
  4. Access Data Memory: Write the value from rt into memory at the calculated address.

Unlike lw, sw doesn't write a destination register, so the Register File's write enable stays off for this instruction. The state it changes is Data Memory, plus the PC, which still advances to PC+4. As with lw, both updates take effect at the clock's rising edge.
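The same kind of sketch works for sw. Again, the state dictionary and the name sw_cycle are assumptions for illustration, and the offset is taken as an already sign-extended integer. The point to notice is which part of the state is replaced at the end of the cycle.

```python
MASK32 = 0xFFFFFFFF


def sw_cycle(state, rt, rs1, offset):
    """Run one sw through a single cycle and return the next state."""
    # Combinational phase: read both registers and form the address.
    base = state["regs"][rs1]
    store_value = state["regs"][rt]
    address = (base + offset) & MASK32
    next_pc = (state["pc"] + 4) & MASK32

    # Rising clock edge: Data Memory and the PC change, the registers do not.
    dmem = dict(state["dmem"])
    dmem[address] = store_value
    return {"pc": next_pc, "regs": state["regs"], "dmem": dmem}


regs = [0] * 32
regs[5] = 0x2000  # base address in rs1
regs[7] = 42      # value to store, held in rt
before = {"pc": 0x200, "regs": regs, "dmem": {}}
after = sw_cycle(before, rt=7, rs1=5, offset=12)
print(hex(after["pc"]), after["dmem"])  # 0x204 {8204: 42}
```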

The beq (Branch if Equal) Instruction

beq introduces a conditional element. An instruction like beq rs1, rs2, offset checks if the values in rs1 and rs2 are equal. If they are, the program counter (PC) is updated to a new address (calculated using the offset); otherwise, it simply moves to PC+4.

In a single-cycle design, this requires:

  1. Fetch and Decode: Standard steps.
  2. Read rs1 and rs2: Get values from both registers.
  3. Compare values: An ALU (Arithmetic Logic Unit) performs a subtraction. If the result is zero, the registers were equal.
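  4. Calculate Branch Target Address: In parallel with the comparison, the candidate branch target is computed from the PC and the sign-extended offset, typically by a separate adder. The exact formula depends on the ISA: MIPS uses PC + 4 + (offset shifted left by 2), while RISC-V uses PC + offset.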
  5. Update PC: Based on the comparison result, the PC is updated to either PC+4 or the calculated branch target address.

This instruction highlights the need for multiplexers and control signals. The PC's next value depends on the outcome of the ALU operation and a control signal that tells it whether to branch or not. Again, all this computation must be finalized before the clock's rising edge to update the PC Register correctly.
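A small sketch can make the multiplexer explicit. The function names mux2 and beq_next_pc are invented for this example. The target follows the PC + 4 + offset form mentioned in the steps above, with the offset assumed to be already sign-extended and expressed in bytes; an ISA that defines the target differently would change only that one line.

```python
MASK32 = 0xFFFFFFFF


def mux2(select, when_false, when_true):
    """Two-input multiplexer: pick one of two values based on `select`."""
    return when_true if select else when_false


def beq_next_pc(pc, rs1_value, rs2_value, offset):
    """Return the value the PC register captures at the next clock edge."""
    alu_result = (rs1_value - rs2_value) & MASK32  # ALU subtracts
    zero = alu_result == 0                         # ALU zero flag
    pc_plus_4 = (pc + 4) & MASK32
    # The target is computed whether or not the branch ends up taken.
    branch_target = (pc_plus_4 + offset) & MASK32
    branch = True                                  # control signal set for beq
    pc_src = branch and zero                       # select line of the PC mux
    return mux2(pc_src, pc_plus_4, branch_target)


print(hex(beq_next_pc(0x100, 7, 7, 16)))  # 0x114, branch taken
print(hex(beq_next_pc(0x100, 7, 8, 16)))  # 0x104, falls through
```

Both candidate values exist before the decision is made, and the select signal simply picks one. That is how hardware differs from an if statement in software.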

The Bigger Picture

While a single-cycle CPU is conceptually straightforward, it's often not the most efficient. The clock cycle must be long enough to accommodate the slowest instruction, which is typically lw because it passes through instruction memory, the Register File, the ALU, Data Memory, and a register write in sequence. Faster instructions such as beq finish early and leave the rest of the cycle idle. This is where multicycle and pipelined architectures come in. A multicycle design breaks each instruction into shorter steps, so an instruction takes only as many short cycles as it needs. A pipelined design goes further and overlaps those steps across consecutive instructions, dramatically increasing throughput. But understanding the single-cycle model, especially for instructions like lw, sw, and beq, provides a fundamental building block for appreciating these more advanced designs.
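To see how the slowest instruction sets the clock period, here is a back-of-the-envelope calculation. The delay figures are purely illustrative numbers chosen for this sketch, not measurements of any real design, and the list of units each instruction passes through is a simplification.

```python
# Hypothetical delays in picoseconds for each functional unit.
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Units each instruction passes through, in order (simplified).
UNITS_USED = {
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "beq": ["imem", "reg_read", "alu"],
}


def critical_path_ps(instruction: str) -> int:
    """Sum the delays of every unit the instruction uses."""
    return sum(DELAY_PS[unit] for unit in UNITS_USED[instruction])


def clock_period_ps() -> int:
    """A single-cycle clock must cover the longest path."""
    return max(critical_path_ps(name) for name in UNITS_USED)


for name in UNITS_USED:
    idle = clock_period_ps() - critical_path_ps(name)
    print(f"{name}: needs {critical_path_ps(name)} ps, idles for {idle} ps")
```

Under these assumed delays, lw sets the period at 800 ps, so every beq spends 300 ps of each cycle doing nothing.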
