Computer Organization CH2 Instructions: Language of the Computer

2.1 Introduction

Instruction Set

Instructions

command a computer’s hardware
the words of a computer’s language

An Instruction set

vocabulary of the language
considered as the repertoire(劇目) of instructions of a computer

Different computers have diff. instruction set

with many aspect in common
a few basic operations that all computer must provide(有些基礎的運算是通用的)

Early computers had very simple instruction set

many modern computers also

The RISC-V instruction set

the example throughout this course
developed at UC Berkeley as open ISA in 2010

Typical of many modern ISAs
Similar ISAs have a large share of embedded core market
- In consumer electronics, network/storage equipment, cameras, printers

RV32 Instructions

32 registers

x0~x31, x0=0
32-bit words registers

Addressing words

byte addressable(位元組定位)
Sequential word accesses differ by 4 (4 byte = 32 bit)
E.g., Mem[0], Mem[4], Mem[8], …

Instruction can be divide into 6 group

arithmetic
data transfer
logical
shift
conditional branch
unconditional branch

2.2 Operation of the Computer Hardware

Arithmetic Operation

this instruction instructs a computer to add the two var. b and c to put their sum to a

add a, b, c

Instructions, add and substract, each has three operands

two source
one destination
- 兩個來源，一個目的地

Design Principle 1:Simplicity favors reqularity(簡單有利於規律性)

Regularity makes imple. simpler(規律簡化實作)
simplicity enables higher performance at lower cost(簡單能夠以低消耗，實現更高的表現)

2.3 Operand of the Computer Hardware

Compiling a C statement Example

C code with the five var. f, g, h, i and j

f = (g+h)-(i+j)

add t0, g, h    //temp t0 = g+h
add t1, i, j
sub f, t0, t1    //f = t0-t1

The single statement is braked into several instructions

1 operation is performed per RISC-V instruction
Temporary var. (t0 and t1) are used to keep the intermediate results

Register Opearnds

The operands of arithmetic instrction are kept in a limited number of special locations

called register

Registers are primitive(原語) used in hardware design

Register are also visible to the programmer when the computer is completed
think of register as the bricks of computer construction

A 32-bit RISC-V computer has 32 register

use for frequency accessed data
32-bit data is called a “word”
64-bit data is called a “double word”
- for a 64-bit machine, there could be 32 * 64-bit general purpose register x0 to x31
- 仍舊是32個暫存器，但每個暫存器的大小變成64 bit

Design Principle 2:Smaller is faster(小的速度更快)

main memory: million of location, but slower

Why 32 register?

31 register may not be faster than 32
- but designer have to balanc the clock cycle and more registers, and number of bits in the instruction format
A very large number of register may increase the clock cycle time
- takes electronic signals longer when they must travel farther
- 過ㄉㄚ會導致clock cycle time增加，因為需要電訊號需要跑更遠

RISC-V Register

The common arrangement for x0~x31 registers

x0: the constant value 0
x1: return address
x2: stack pointer
x3: global pointer
x4: thread pointer

x5-x7, x28-x31:temporaties (not reserved)
x8: frame pointer
x9, x18-x27: saved registers
x10-x11: function arguments/results (not reserved)
x12-x17: function arguments (not reserved)

Compiling a C statement Ex.

like previous part

f=(g+h)-(i+j)

In fact, a compiler has to associate program var. with registers

//f, g, h, i, j are in x19, x20, .., x23
add x5, x20, x21
add x6, x22, x23
sub x19, x5, x6 //x5 and x6 are for temporary data

Memory Operands

Main memory used for composite data(主記憶體被用於複雜的資料)

simple data ele.: int., float
complex data structure: array, dynamic data

Arithmetic operations occur only on register(運算操作只使用於暫存器上)

data transfer instruction are required to
- load values from memory into register
- store result from register to memory

memory is byte addressed

memory is considered as a large single dimentional array
- address is the index to this array
each memory address identifies an 8-bit byte (8個bit，其實就是一個byte)

RISC-V does not require words to be aligned in memory

unlike some ISAs
不需要對齊，也就是不用每個字都必須要從0, 4, 8, 12, …的位置開始
只需要每個字之間的間隔為4

RISV-C is Little Endian

least-significant byte at least address of a word
Big Endian
- most-significant byte at least address

Memory Operand Ex.

A is an array of 100 words

compiler associates the var. g and h with the registers x20 and x21
base address of A is in x22

g = h+A[8]

its RISC-V code

lw x9 8(x22)    //transfer A[8] to register x9
add x20, x21, x9

Memory address

Compiler helps

associate var. with registers
allocate data structure
- arrays and structures, to memory location

RISC-V is byte addressable

the address of a word matches the address of one of the 4 bytes within the word
address of sequential words differ by four(連續字之間的位置差了4 byte)
each array ele. is 4-byte data(其實就是一個word)

Alignment restriction

In many architectures, words musts start at addresses that are multiple of 4
- for better accessing efficiency
- e.g. MIPS

with the byte addressing concept
previous code become

lw x9, 32(x22) //32 = 8*4
add x20, x21, x9

A sw(store word) instruction is needed if the result should be stored into a memory location

sw x20, 48(x22) //48 = 12*4, A[12] = f

what is th offset when we handling the array in 32-bit machine?

each ele. is word

Registers vs. memory

Registers are faster tn access than memory

assuming 32-bit data
- register are roughly 200 times faster (0.25 vs. 50 nano ec.)
- 10,000 times more energy effcient (0.1 vs. 1000 picoJoules0)
- than DRAM in 2020

Operating on memory data requires loads and stores

more instruction to be executed

Many programs have more var. than computers registers

compiler must use register for var. as much as possible
“Spilling register” is putting less frequency used var. into memory
register optimization is important for execution performanc

Immediate Operands

Constant are often used in prog.

increment an index(constant) to point to the next ele. of an array
in fact, more than half of the RISC-V arithmetic instruction have a constant as an operand when running the SPEC CPU2006 benchmarks(一種CPU效能檢測的標準工具)

add immediate(addi) instruction can be used to avoid the load instruction

addi x22, x22, 4    //x22 = x22+4

The constant zero has a special role

which is to simplify instruction set by offering useful variations
- can negate value, by using the sub instruction with zero for the first operand
- RISC-V dedicates register x0 to be hard-wired(硬連線) to the value zero

2.4 Signed and Unsigned Numbers

Unsigned numbers(bits)

Two’s complement

Two’s complement signed numbers

Negation Shortcut

to negate a two’s complement number

,
simply invert every 0 to 1 and every 1 to 0
- add one to the result

sign extension shortcut

simply copied most significant bit to fill the new bits

copy sign bit

Examples: extends an 8-bit number to 16-bit number

+2: 0000 0010 => 0000 0000 0000 0010
–2: 1111 1110 => 1111 1111 1111 1110

In RISC-V

lb: sign-extend loaded byte
lbu: zero-extend loaded byte
- 補0

2.5 Representing Instructions in the Computer

Representing instruction

The way computers see instruction

Machine language

is the numeric version of instruction
binary representation used for communication within a computer system

Machine code

a sequenc of instruction encoded in binary numbers
- translated from assembly code
to avoid reading/writing long, tiresome binary string
- hexadecimal(base 16) numbers are often adopted
4 bit per hexadecimal(hex) digit
- Ex.

RISC-V instructions

encoded as 32-bit instruction words
Regularity
An instruction encodes
- operation code
- register numbers, etc.

RISC-V R-format instructions

Instruction fields

opcode: operaton code, denotating the operation and format of an instruction
rd: destination register number
funct3: 3-bit fun. code(additional opcode)
rs1: the first source register number
rs2: the second source register number
funct7: 7-bit function code(addition opcode)

I-format instructions

An instruction will need longer fields than those shown above

E.g., constant may be used to select ele. from arrays or data structure
often needs to be much larger than 31
- 5-bit field is too small to useful
for addi or lw inst.

Immediate arithmetic(addi) and load instructions(lw)

rs1: source register or base address number
immediate: constant operand, or offset added to base address(for load inst.)
- two’s-complement, sign extended numbers
- region of or 2048 bytes( or 512 words)

Design Priciple 3:Good design demands good compromises(好的設計需要好的妥協)

different formats compilcate decoding(不同的格式，會造成編碼上的困難)
- but allow 32-bit instruction uniformly
keep format as simplier as possible

S-format instructions

We also need a format for the store word inst.

sw, which needs two source registers
- the base addr.
- store data
- an immediate for the addr. offset
a little different from R-type inst.
- rd vs. imm
- same layout

format

rs1: base addr. register number
rs2: sourc operand register number
immediate: 12-bit immediate offset to base addr
- split so that rs1 and rs2 fields always in the same pos.

Summary of R-, I-, S-format inst.

The format are distinguished by the val. in the opcode field

each format is assigned a distinct set of opcode val. in the first field
hardware knows how to treat the rest of the inst.

The “Stored Program” concept

Instructions represented in binary, just like data
Instructions and data stored in memory

programs can operate on programs

E.g., compilers, linkers, …
Binary compatibility(二進位相容性) allows compiled programs to work on different computers
- Standardized ISAs

2.6 Logical Operations

Logical operations

It is useful to operate on fields of bits within a word or even on individual bits

examining characters within a word, each of which is stored as 8 bits(每個字符儲存為8位元，就是一個位元組)

These operations were added to programming languages
and instruction set architectures

simplify the packing & unpacking of bits into words
Useful for extracting and inserting groups of bits in a word

Shift Operations

Shifts move all the bits in a word to the left or right, filling the emptied bits with 0s(用0填滿空位)

immed: how many positions to shift

Shift left logical immediate

shift left
fill with 0

slli by i bits, multiplies by

slli x11, x19, 4 //reg x11 = reg 19 << 4 bits

Shift right logical immediate

shift right
fill with 0
srli by i bits divides by (unsigned only)

AND op.

To isolate fields via bit-by-bit operation

Useful to mask bits in a word
Select some bits, clear others to 0

OR and XOR op.

Useful to include bits in a word

Differencing operation

It can be used to emulate NOT operation
That is, it calculates a 1 only if the values are different in the two operands

2.7 Instructions for making Decisions

Conditional op.

Decision making is commonly represented in programming languages

using the if statement, sometimes combined with go to statements and labels

Two decision-making instructions (conditional
branches*) in RISC-V assembly language

beq (branch if equal)

beq rs1, rs2, L1(最大的跳躍範圍就是32bit)
This instruction means go to the statement labeled L1 if the value in register rs1 = the value in register rs2

bne (branch if not equal)

bne rs1, rs2, L1
It means go to the statement labeled L1 if the value in register rs1 does != the value in register rs2

*An instruction that tests a value and that allows for a subsequent transfer of control to a new address in the program based on the outcome of the test

A higher-level view of branches

Branch to a labeled instruction if a condition is true

Otherwise, continue sequentially

Decisions are important for if statements and for
loops (iterating a computation)

Compiling if state.

C code

if(i==j)
    f=g+h;
else
    f=g-h;
    
// note:f, g, ... in x19, x20, ...

Compiled RISC-V code

Exit is assembler calculates address

    bne x22, x23, Else
    add x19, x20, x21
    beq x0, 0, Exit
Else:
    sub x19, x20, x21
Exit:
    ...

While loop

C code

while(save[i] == k)
    i+=1;
// note: i in x22, k in x24, address of save in x25

Compiled RISC-V code

Loop:
    slli x10, x22, 2 //i代表index, 每個index是4個byte，所以i需要乘上4
    add x10, x10, x25
    lw x9, 0(x10)
    bne x9, x24, Exit
    addi x22, x22, 1
    beq x0, x0, Loop
Exit:
    ...

Basic Blocks

A basic block is a sequence of instructions with

No embedded branches (except at end)
No branch targets (except at beginning)

基本區塊(basic block)是編譯程式執行最佳化的最基本單位，其定義為沒有分支與沒有分支目的的指令，若有分支也只能在區塊的最後一個指令，若有分支目的也只能在區塊的第一個指令。
https://hackmd.io/@HsuChiChen/computer_organization

A compiler identifies basic blocks for optimization

An advanced processor can accelerate execution of basic blocks

2.8 Supporting Procedures in Computer Hardware

Procedures

A procedure (or function)

Is a stored subroutine that performs a specific task based on the parameters with which it is provided
Is one tool programmers use to structure programs, both to make them easier to understand and to allow code to be reused(使程式易懂，以及可重複利用)
Is one way to implement abstraction in software

Six steps required for invoking a procedural call

Place parameters in registers x10 to x17
- 把非保留的暫存器中的資料，儲存進記憶體中
Transfer control to procedure (with branch inst.)
- 把控制交給procedure
Acquire storage for procedure (spill registers)
- 獲取procedure的儲存空間
Perform procedure’s operations
Place result in register for caller (restore registers as well)
- 計算結果放進register
Return to place of call (return address in x1 and jump back)

Procedure Calls

RISC-V software follows the following procedure calling convention in allocating its 32 registers

x10–x17: eight parameter registers in which to pass parameters or return values
x1: one return address register to return to the point of origin

RISC assembly language has the two instructions for procedures

jal (jump-and-link): It branches to an address and simultaneously saves the address of the following instruction to the destination register rd (the “link” portion)
- 把接下來指令的位址存到rd
jalr (jump-and-link register): it helps return from a procedure; branches to the address stored in the return register x1, which is just what we want
- 跳回r1中的位址

Procedure Calls (Cont’d)

Procedure call w/ jump and link
jal x1, ProcedureLabel

Address of following instruction put in x1
Jumps to target address
Caller is the program that instigates a procedure and provides the necessary parameter values
Program counter (PC) is a register to hold the address of the current instruction being executed
- 掌握現在使用執行中的指令的register
- PC can be seen as instruction address register
The jal instruction actually saves PC + 4 in its designation register (usually x1) to link to the byte address of the following instruction to set up the procedure return
- 因為現在的指令在PC，下一個指令就要+4

Procedure return w/ jump and link register
jalr x0, 0(x1)

Like jal, but jumps to 0 plus the address in x1
Use x0 as rd (x0 cannot be changed)
Can also be used for computed jumps
- e.g., for case/switch statements
Callee is a procedure (ProcedureLabel) that executes a series of stored instructions based on parameters provided by the caller and then returns control to the caller

Compiling a C procdure

C code

int leaf_example(int g, int h, int i, int j) {
    int f;
    f = (g+h)-(i+j);
}

//argument g, ..., j in x10, ..., x13
//f in x20
//temp in x5, x6
//needs to save x5, x6, x20 on stack

RISC-V code

leaf_example:
    addi sp, sp, -12
    sw x5, 8(sp)
    sw x6, 4(sp)
    sw x20, 0(sp)
    add x5, x10, x11
    add x6, x12, x13
    sub x20, x5, x6
    addi x10, x20, 0
    lw x20, 0(sp)
    lw x6, 4(sp)
    lw x5, 8(sp)
    addi sp, sp, 12
    jalr x0, 0(x1)

Stack

It is a data structure organized as a last-in-first-out queue for spilling registers (to preserve the data contents)

Placing data onto the stack is called a push, and removing data from the stack is called a pop

Temporary and Preserved Register

RISC-V sw separates following 19 registers into two groups

to avoid saving/restoring a register whose value isn’t used
which might happen with a temporary register

x5−x7 and x28−x31: temporary registers that are not preserved by the callee (called procedure) on a procedure call
- 這些register不需要被callee保留起來
- The caller does not expect registers x5 and x6 to be preserved across a procedure call (these registers are maintained by caller)
- The example in p. 49 does not follow this arrangement
x8−x9 and x18−x27: saved registers that must be preserved on a procedure call
- 在callee中需要保護這些register
- if used, the callee saves and restores them
- These registers are maintained by callee

Nested Procdures: Leaf and Non-leaf Procedures

Leaf procedures are procedures that do not call other

Only be callee (e.g., leaf_example)

Non-leaf procedures are procedures that do call others

Be both caller and callee (e.g., non_leaf)

Without proper protection of registers, there will be conflicts on their contents

E.g., since non_leaf hasn’t finished its task yet, there is a conflict over the use of registers (e.g., x0, x1, x10)
上面的暫存器會被覆蓋

For a nested call, caller needs to save on the stack

Its return address (x1)
Any arguments and temporaries needed after the call
Restore from the stack after the call

void root() {
    ...
    non_leaf(10); //jal x1, non_leaf (x10=10)
    // jalr x0, 0(x1)
    ...
}

void non_leaf(int x){
    int a=0;
    a = leaf_example(4, x, 2, 1); //jal x1, leaf_example(x10=4, x11=x, x12=2, x13=1)
    //jalr x0, 0(x1)
}

int leaf_example(int g, int h, int i, int j) {
    int f;
    f=(g+h)-(i+j);
    return f;
}

A Non-leaf Example

C code

int fact(int n) {
    if(n<1)
        return f;
    else
        return n*fact(n-1);
}
//argument n in x10
//result in x10

RISC-V code

fact:
    addi sp, sp, -8
    sw x1, 4(sp)
    sw x10, 0(sp)
    
    //if statement
    addi x5, x10, -1    //x5 = n-1
    bge x5, x0, L1    //if n>=1, go to L1
    
    addi x10, x0, 1    //else set return value to 1
    addi sp, sp, 8    //pop stack
    jalr x0, 0(x1)    //return to caller
L1:
   addi x10, x10, -1    //n = n-1
   jal x1, fact    //call fact(n-1)
   
   addi x6, x10, 0    //move result of fact(n-1) to x6
   lw x10, 0(sp)    //restore caller's n
   lw x1, 4(sp)    //restore caller's address
   addi sp, sp, 8    //pop stack
   
   mul x10, x10, x6    //return n*fact(n-1)
   jalr x0, 0(x1)    //return to caller

Local data on the Stack

The stack is also used to store variables

that are local to the procedure (and maintained by callee)
but do not fit in registers
E.g., local arrays, structures, C automatic variables

The stack grows downwards at runtime

SP value might change during the procedure
so references to a local variable in memory might have different offsets depending on where they are in the procedure
making the procedure harder to understand
- 變動的sp會導致procedure變得很難懂

Procedure frame (activation record) (程式框架)

A stack segment contains a procedure’s saved registers and local variables

A frame pointer fp (x8) points to the first word of the frame of a procedure

Solution: fp offers a stable base register for procedure frame for local variable accesses
if there are no local var. on the stack within a procedure
- the compiler will save time by not setting and restoring the fp
if using fp, it is init. using the address in sp on a call
- and sp is restored by fp

Memory Layout

In addition to local variables, global data in a program is also kept in the memory

E.g., static variables and dynamic data structures in C programs

The memory layout shown in figure shows

the RISC-V convention for allocation of memory when running the Linux operating system

Text seg.

Starts at 0000 0000 0040 0000hex
The RISC-V program (machine) code

Static data seg.

Starts at 0000 0000 1000 0000hex
Constants and other static variables
Pointed by gp (global pointer)

Dynamic data (heap; grows upwards)

Structures grow/shrink during their lifetimes, e.g., linked lists
The C functions, malloc and free, manipulates the heap data

Stack (grows downwards)

Automatic storage

2.9 Communicating with People

Representing ASCII Char.

Byte-encoded character sets

American Standard Code for Information Interchange (ASCII)
ASCII: 128 characters (95 graphic, 33 control)
Latin-1: 256 characters (ASCII, +96 more graphic characters)

Unicode: 32-bit character set

Used in Java, C++ wide characters, …
Most of the world’s alphabets, plus symbols
UTF-8, UTF-16: variable-length encodings

Load/Store Data in Different widths

LOAD/STORE byte/halfword/word

Load byte/halfword/word: Sign extend to bits in rd
- lb rd, offset(rs1)
- lh rd, offset(rs1)
- lw rd, offset(rs1)
Load byte/halfword/word unsigned: Zero extend to bits in rd
- lbu rd, offset(rs1)
- lhu rd, offset(rs1)
- lwu rd, offset(rs1)
Store byte/halfword/word: Store rightmost 8/16/32 bits
- sb rs2, offset(rs1)
- sh rs2, offset(rs1)
- sw rs2, offset(rs1)

lbu x12, 0(x10) //read byte from source, placing it in the rightmost 8 bits in x12

sb x12, 0(x11) //takes a byte from the rightmost 8 bits of x12 and writes the byte to x11[0]

A string copy ex.

C code

void strcpy(char x[], char y[]) {
    size_t i;
    i = 0;
    while((x[i]=y[i])!='\0')
        i+1;
}

\\Null-terminated string
\\assume the base addr. for arrays x and y found in x10 and x11

RISC-V code

strcpy:
    addi sp, sp, -4
    sw x10, 0(sp)
    add x19, x0, x0
L1:
    add x5, x19, x10
    lbu x6, 0(x5)
    add x7, x9, x10
    sb x6, 0(x7)
    beq x6, x0, L2
    addi x19, x19, 1
    jal x0, L1
L2:
    lw x19, 0(sp)
    addi sp, sp, 4
    jalr x0, 0(x1)

2.10 RISC-V Addressing for Wide Immediates and Addresses

Most constants are small

12-bit immediate is sufficient
But, sometimes there are bigger immediates (constants)
In this case, lui instruction can be used to set up a 32-bit immediate into a register for further use

lui (load upper immediate)

lui rd, constant
- To copy a 20-bit constant to bits [31:12] of rd, and clear bits [11:0] of rd to 0
The example below loads a 32-bit immediate 0x003D 0500hex into x19 by using a lui and an addi instruction
- 如何製造表示32bit位置的register
  - 沒辦法直接add出來，slli也只能做出2的倍數
  - 透過lui設置[31:12]的bit，再使用addi即可
- For 64-bit, it can extend bit 31 to bits [63:32]

Branch Addressing

Branch instructions specify the contents

Opcode, two registers, offset (to target address)

Most branch targets are near branch

Forward or backward relative to PC
- → PC-relative addressing (in byte)
- Target address = PC + offset
- Target address = PC + immediate * 2(immediate代表half word)
branch instruction的offset的數字介於branch以及branch target是halfword，因為設計者希望支援2 bytes long

SB format

A 7-bit opcode, a 3-bit function code, two 5-bit register
operands (rs1 and rs2), and a 12-bit address immediate
bne x10, x11,
- If x10 != x11, go to location = 0111 1101 0000

Jump Addressing

jal(unconditional jump and link)

jal rd, immed
uses 20-bit immediated for larger range

UJ format

A 7-bit opcode, a 5-bit destination register operand (rd), and a 20-bit address immediate
The link address, which is the address of the instruction following the jal, is written to rd
- 跳躍回來的下一個指令放在rd
jal x0,
*Go to location = 0111 1101 0000

Addressing Mode Summary

Addressing mode

One of several addressing regimes delimited by their varied use of operands and/or addresses(多個尋址之一由其劃定的不同使用運算元和/或地址)

Immediate addressing, where the operand is a constant within the instruction itself
Register addressing, where the operand is a register
Base or displacement addressing, where the operand is at the memory location whose address is the sum of a register and a constant in the instruction PC = Reg + offset (in mem.)
PC-relative addressing, where the branch address is the sum of the PC and a constant in the instruction

Inst. Encoding

find out the exact op. of a given inst.

check the fields: opcode, func3, and (optional) funct7

2.11 Parallelism and Instructions: Synchronization

Parallel Execution and Synchronization

Parallel execution is hard when tasks need to cooperate

E.g., some tasks are writing new values that others must read

Synchronization is used to know when a task is finished writing so that it is safe for another to read

A data race would occur if they don’t synchronize
where the results of the program can change depending on how events happen to occur
同步，用於確認任務是否完成，接下來的讀取是否安全?

Example

Two processors sharing an area of memory
P1 writes, then P2 reads
A data race occurs if P1 and P2 don’t synchronize
- Result depends on order of accesses

Synchronization

Synchronization mechanisms are typically built with userlevel routines (e.g., lock and unlock)

that rely on hardware-supplied synchronization instructions
Lock/unlock can be used straightforwardly to create regions where only a single processor can operate, called a mutual exclusion(互斥),
as well as to implement more complex synchronization mechanisms

Hardware-supplied primitives for read/write memory atomically(原子記憶體的讀取)

Nothing else can interpose itself between the read or the write of the memory location(沒有其他東西可以在記憶體讀取之間插入)

Two possible implementation of the HW primitives

A single instruction → e.g., atomic swap of register and memory
An atomic pair of instructions → the second instruction returns a value showing whether the pair of instructions was executed as if the pair was atomic

Synchronization in RISC-V

In RISC-V, this special pair of instructions include a special load (lr.w) and a special store (sc.w)

if the contents of the memory location specified by the load-reserved are changed before the store-conditional to the same address occurs, then
the store-conditional fails and does not write the value to memory
Check Sec. 2.11, p. 128~131, in the textbook

lr.w (load-reserved word)
lr.w rd,(rs1)

Load from address in rs1 to rd
Place reservation on memory address

sc.w (store-conditional word)
sc.w rd,(rs1),rs2

Store from rs2 to address in rs1
- Succeeds if location not changed since the lr.w
  - Write 0 in rd
- Fails if location is changed
  - Write non-zero value in rd

Ex.

Example 1: Impl. atomic swap (to test/set lock variable)
Swap the contents between [x20] and x23

again:
    lr.w x10,(x20)
    sc.w x11,(x20),x23 // x11 = status (1 means failure)
    bne x11,x0,again // branch if store failed
    addi x23,x10,0 // X23 = loaded value

Example 2: Impl. lock acquire

    addi x12,x0,1 // copy locked value
again: 
    lr.w x10,(x20) // read lock
    bne x10,x0,again // check if it is 0 yet
    sc.w x11,(x20),x12 // attempt to store
    bne x11,x0,again // branch if fails
Unlock:
    sw x0,0(x20) // free lock

2.12 Translating and Starting a Program

C Program Execution Flow

Compiler and Assembler

Compiler translates a high-level language program (e.g., C) into machine instructions

Assembler further converts the instructions into machine code (binary)

Handle pseudoinstructions that are not implemented by the target hardware but found in machine instructions
li x9, 123 // load immediate value 123 into register x9
- RISC-V assembler converts it into the following code (li is acceptable by assembler, but is not implemented in RISC-V machine language)
- addi x9, x0, 123 // register x9 gets register x0 + 123

Assembler

In fact, the assembler turns the assembly language program into an object file

把組合語言的程式轉換成.o檔
which is a combination of code, data, and information needed to place instructions properly in memory
It must determine the addresses corresponding to all labels
A symbol table helps assemblers keep track of labels used in branches and data transfer instructions

An object file contains the six parts:

Header: described contents of object module
Text segment: translated instructions
Static data segment: data allocated for the life of the program
Relocation info: for contents that depend on absolute location of loaded program
Symbol table: global definitions and external references
Debug info: for associating with source code

Linker

It takes all the independently assembled machine
language programs and “stitches” them together

It produces an executable file running on a
computer

Merges segments
Resolve labels (determine their addresses)
Patch location-dependent and external references

NOTE: It could leave location dependencies to be fixed by a relocating loader later(留下定位資訊以方便後續追蹤位置)

Loader

Loader is a system software

that is called by the operating system
to place an object program in main memory so that it is ready to execute

In UNIX systems, the loader follows these steps:

Reads the executable file header to determine size of the text and data segments
Creates an address space large enough for the text and data
Copies the instructions and data from the executable file into memory (also initialize the data)
- Or set page table entries so they can be faulted in
Copies the parameters (if any) to the main program onto the stack
Initializes the processor registers (including sp, fp, gp) and sets the stack pointer to the first free location
Branches to a start-up routine that copies the parameters into the argument registers and calls the main routine of the program
- Copies arguments to x10, … and calls main
- When main returns, the start-up routine terminates the program with an exit syscall

Static vs. Dynamic Linking

Two ways to link to external object files:

Static link: Library routines that are linked to a program during linking

External references are resolved during the linking stage
Bad for upgrade and code size, since the library routines become part of
the executable code

Dynamic link: Library routines that are linked to a program
during execution

Only load and link library the procedure when it is called
Require procedure code to be relocatable
Avoid image bloat caused by static linking of all (transitively) referenced libraries (i.e., smaller size of the built executable)
Automatically pick up new library versions
Check Figure 2.21 for more information

2.13 A C Sort Example to Put It All Together

Instruction count and CPI are not good performance
indicators in isolation

Java/JIT compiled code is significantly faster than
JVM interpreted

2.22 fallacies and pitfall

Powerful instruction => higher performance

Fewer instructions required
But complex instructions are hard to implement
- May slow down all instructions, including simple ones
Compilers are good at making fast code fromsimple instructions

Use assembly code for high performance

But modern compilers are better at dealing with
modern processors
More lines of code  more errors and less
productivity

Backward compatibility(向下相容性) => instruction set doesn’t
change

But they do accrete more instructions

Sequential words are not at sequential addresses

Increment by 4 (for 32-bit architecture), not by 1 !!!

Keeping a pointer to an automatic variable after procedure returns

E.g., passing pointer back via an argument
Pointer becomes invalid when stack popped

2.23 concluding remark

Design principles

Simplicity favors regularity
Smaller is faster
Good design demands good compromises

Make the common case fast

Layers of software/hardware Compiler, assembler, hardware

RISC-V: typical of RISC ISAs ➢c.f. x86

Computer Organization

Computer Organization CH2 Instructions: Language of the Computer

https://z-hwa.github.io/webHome/[object Object]/Computer Organization/Computer-Organization-CH2-Instructions-Language-of-the-Computer/

作者

crown tako

發布於

2024年4月18日

許可協議

Computer Organization CH1 Abstraction 上一篇

Computer Organization CH3 Arithmetic for Computeres 下一篇