Kyushu University, Summer 2022: Computer Organization
Kyushu University 2022 Computer Organization: answers by 偷偷
[Q2] Let us consider an in-order microprocessor that has a 5-stage pipelined datapath. The implemented pipeline stages are IF, ID, EX, MEM, and WB. The operations in each stage for the “add” instruction are defined in the following table. Assume that the operations in each pipeline stage can be completed in one clock cycle. The data written in the WB stage of an instruction can be read in the ID stage of a subsequent instruction in the same clock cycle. All RAW (Read-After-Write) hazards are resolved by applying a pipeline stall mechanism. Answer the following questions.

(1) Consider the following assembly program. The words on the right of the ‘#’ symbol in each line are comments. Identify all flow dependencies by describing which instruction depends on which instructions through which register.
add $1, $3, $5 # <1>
add $9, $2, $3 # <2>
add $6, $3, $3 # <3>
add $3, $4, $3 # <4>
add $4, $7, $1 # <5>
add $5, $7, $4 # <6>
add $9, $3, $6 # <7>
add $2, $7, $6 # <8>
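Hint: A flow dependency, also called a Read-After-Write (RAW) dependency, occurs when an instruction reads a register value written by an earlier instruction.
Examining each written register in turn, from top to bottom, gives the following flow dependencies: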
- <5> depends on <1> through $1
- <7> depends on <3> through $6
- <8> depends on <3> through $6
- <7> depends on <4> through $3
- <6> depends on <5> through $4
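The register-by-register scan above can be mechanised. The following is a minimal sketch, not part of the original answer: it encodes each `add dst, src1, src2` as a tuple and remembers the most recent writer of every register. The names `PROGRAM` and `flow_dependencies` are illustrative assumptions.

```python
# Each instruction is (label, destination, source1, source2),
# mirroring "add dst, src1, src2" in the listing above.
PROGRAM = [
    (1, 1, 3, 5),
    (2, 9, 2, 3),
    (3, 6, 3, 3),
    (4, 3, 4, 3),
    (5, 4, 7, 1),
    (6, 5, 7, 4),
    (7, 9, 3, 6),
    (8, 2, 7, 6),
]


def flow_dependencies(program):
    """Return (consumer, producer, register) triples for every RAW dependency."""
    last_writer = {}  # register -> label of the most recent instruction writing it
    deps = []
    for label, dst, src1, src2 in program:
        # Check the sources before recording the destination, so that an
        # instruction such as <4> (reads and writes $3) does not depend on itself.
        for reg in sorted({src1, src2}):
            if reg in last_writer:
                deps.append((label, last_writer[reg], reg))
        last_writer[dst] = label
    return deps


for consumer, producer, reg in flow_dependencies(PROGRAM):
    print(f"<{consumer}> depends on <{producer}> through ${reg}")
```

The sketch reports the same five dependencies as the list above, though in program order of the consuming instruction rather than grouped by written register.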
(2) Assume the instruction issue width is one. Answer the number of clock cycles required for the execution of the assembly program.
Hint: Data written in the WB stage of an instruction can be read in the ID stage of a subsequent instruction in the same clock cycle.
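Tracing the pipeline stage by stage: <1> to <5> proceed without stalls, because <5> reads $1 in ID in cycle 6, after <1> has written it in WB in cycle 5. <6> needs $4, which <5> writes in WB in cycle 9, so <6> stalls for two cycles, reads its operands in ID in cycle 9, and reaches WB in cycle 12. <7> and <8> follow in order behind <6> and reach WB in cycles 13 and 14.
According to the pipeline stages, it requires 14 clock cycles for the execution of the assembly program.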
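The cycle count for question (2) can be checked with a small timing model. This is a sketch under the assumptions stated in the question (one cycle per stage, no forwarding, WB write and ID read allowed in the same cycle); `PROGRAM` and `single_issue_timing` are illustrative names, and the "ID cycle" here means the cycle in which the instruction actually reads its operands.

```python
# Each instruction is (label, destination, source1, source2).
PROGRAM = [
    (1, 1, 3, 5), (2, 9, 2, 3), (3, 6, 3, 3), (4, 3, 4, 3),
    (5, 4, 7, 1), (6, 5, 7, 4), (7, 9, 3, 6), (8, 2, 7, 6),
]


def single_issue_timing(program):
    """Return {label: (ID, EX, MEM, WB)} cycle numbers for a 1-wide in-order pipeline."""
    wb_of_writer = {}  # register -> WB cycle of the latest instruction writing it
    timing = {}
    prev_id = 1        # makes the first instruction decode in cycle 2 (IF in cycle 1)
    for label, dst, src1, src2 in program:
        # In-order, one instruction per cycle: decode after the previous one.
        earliest = prev_id + 1
        # RAW stall: operands are readable from the producer's WB cycle onwards.
        ready = max(wb_of_writer.get(src1, 0), wb_of_writer.get(src2, 0))
        id_cycle = max(earliest, ready)
        timing[label] = (id_cycle, id_cycle + 1, id_cycle + 2, id_cycle + 3)
        wb_of_writer[dst] = id_cycle + 3
        prev_id = id_cycle
    return timing


timing = single_issue_timing(PROGRAM)
for label, (id_c, ex_c, mem_c, wb_c) in timing.items():
    print(f"<{label}> ID={id_c} EX={ex_c} MEM={mem_c} WB={wb_c}")
print("total cycles:", max(stages[3] for stages in timing.values()))
```

In this model the only stall is on <6>, whose operand read is delayed to cycle 9, and the last WB falls in cycle 14.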
(3) We extend the datapath by increasing the instruction issue width from one to two that forms an in-order superscalar microprocessor, i.e., at most, two independent instructions can be executed in parallel. Answer the number of clock cycles required for the execution of the assembly program on the extended datapath.
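Tracing the pipeline with two instructions fetched per cycle: <1> and <2> issue together and reach WB in cycle 5; <3> and <4> issue together and reach WB in cycle 6. <5> must wait for $1 and reads it in ID in cycle 5, reaching WB in cycle 8. <6> must wait for $4 and reads it in ID in cycle 8, reaching WB in cycle 11. <7> and <8> have their operands ready but cannot overtake <6>, so the last of them reaches WB in cycle 12.
According to the pipeline stages, it requires 12 clock cycles for the execution of the assembly program.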
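The effect of the issue width can be explored with a parameterised timing model. The sketch below is an illustration rather than the original author's method. It assumes that `width` instructions are fetched per cycle, that any adjacent instructions may decode together if their operands are ready, and that instructions never overtake one another. `PROGRAM` and `cycles_needed` are hypothetical names.

```python
# Each instruction is (label, destination, source1, source2).
PROGRAM = [
    (1, 1, 3, 5), (2, 9, 2, 3), (3, 6, 3, 3), (4, 3, 4, 3),
    (5, 4, 7, 1), (6, 5, 7, 4), (7, 9, 3, 6), (8, 2, 7, 6),
]


def cycles_needed(program, width):
    """Return the cycle of the last WB for an in-order pipeline of the given issue width."""
    wb_of_writer = {}   # register -> WB cycle of the latest instruction writing it
    prev_id = 0         # cycle in which the previous instruction read its operands
    issued_in_prev = 0  # how many instructions already decode in cycle prev_id
    last_wb = 0
    for index, (label, dst, src1, src2) in enumerate(program):
        fetch_cycle = index // width + 1
        # Operands are readable from the producer's WB cycle onwards (no forwarding).
        ready = max(wb_of_writer.get(src1, 0), wb_of_writer.get(src2, 0))
        # In-order: never decode earlier than the previous instruction.
        id_cycle = max(fetch_cycle + 1, prev_id, ready)
        if id_cycle == prev_id and issued_in_prev == width:
            id_cycle += 1  # the issue slots of that cycle are all taken
        if id_cycle == prev_id:
            issued_in_prev += 1
        else:
            prev_id, issued_in_prev = id_cycle, 1
        wb_of_writer[dst] = id_cycle + 3  # ID -> EX -> MEM -> WB
        last_wb = id_cycle + 3
    return last_wb


print("width 1:", cycles_needed(PROGRAM, 1))
print("width 2:", cycles_needed(PROGRAM, 2))
```

Under these assumptions the model gives 14 cycles for width 1 and 12 cycles for width 2. The gain is limited because the chain <1> → <5> → <6> is serialised by stalls regardless of the issue width.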
(4) The data path extension presented in (3) causes a 5% decrease in its clock frequency. Answer the performance improvement rate achieved by implementing the extension.
We know that,
$$\text{CPU Time} = IC \times CPI \times \frac{1}{f}$$
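With the extended datapath, IC remains the same, the cycle count drops from 14 to 12 (so CPI improves by a factor of $\frac{14}{12}$), and the clock frequency falls to $0.95f$. The speedup $K$ is:
$$ K = \frac{\text{CPU Time}}{\text{CPU Time}'} = \frac{14 \times 0.95}{12} \approx 1.108 $$
That is, the extended datapath delivers about 111% of the original performance, an improvement of roughly 10.8%.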
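As a quick numerical check of the ratio, the arithmetic can be reproduced directly. This is a small sketch; the variable names are illustrative, and the cycle counts are the ones obtained in (2) and (3).

```python
cycles_before = 14       # single-issue datapath, from question (2)
cycles_after = 12        # dual-issue datapath, from question (3)
frequency_ratio = 0.95   # new clock frequency relative to the original

# Execution time = cycles / frequency; the original frequency cancels in the ratio.
time_before = cycles_before / 1.0
time_after = cycles_after / frequency_ratio
speedup = time_before / time_after

print(f"speedup = {speedup:.4f}")                   # speedup = 1.1083
print(f"improvement = {(speedup - 1) * 100:.1f}%")  # improvement = 10.8%
```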
[Q3] Explain what “compulsory misses”, “conflict misses”, and what “capacity misses” are in cache memory systems, respectively.
Hint: This refers to the 3C model of cache misses, illustrated here with a direct-mapped cache.
Compulsory Misses: Misses that occur when data is accessed for the first time.
Conflict Misses: Misses that occur when several data blocks map to the same cache line position and keep evicting one another, even though the cache as a whole has enough capacity.
Capacity Misses: Misses that occur because the cache is too small to hold all the blocks the program needs; even with every line in use, blocks are evicted and must be fetched again later.
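The three categories can be told apart by replaying an access trace against reference caches, a common textbook convention that the original answer does not spell out. In this sketch, a miss is compulsory if the block has never been referenced before, a capacity miss if a fully associative LRU cache of the same size would also miss, and a conflict miss otherwise. The function `classify_misses`, the one-block-per-line cache, and the traces are illustrative assumptions.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_lines):
    """Classify the misses of a direct-mapped cache with num_lines one-block lines."""
    seen = set()                 # blocks referenced at least once (infinite cache)
    fully_assoc = OrderedDict()  # fully associative LRU cache of the same capacity
    direct_mapped = {}           # line index -> block currently stored there
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Reference cache: fully associative with LRU replacement.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)       # mark as most recently used
        else:
            if len(fully_assoc) == num_lines:
                fully_assoc.popitem(last=False)  # evict the least recently used block
            fully_assoc[block] = True

        # Cache under study: direct-mapped, line chosen by block number modulo size.
        line = block % num_lines
        dm_hit = direct_mapped.get(line) == block
        direct_mapped[line] = block

        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(block)
    return counts


# Blocks 0 and 4 share line 0 of a 4-line cache and evict each other.
print(classify_misses([0, 4, 0, 4], num_lines=4))
# Five distinct blocks do not fit in four lines, so block 0 is lost to capacity.
print(classify_misses([0, 1, 2, 3, 4, 0], num_lines=4))
```

In the first trace the repeated accesses miss even though only two of the four lines' worth of data is needed, which is the conflict case. In the second trace the working set exceeds the cache, so the final miss would happen under any placement policy.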