Kyushu University, Summer 2021: Computer Organization
Kyushu University 2021 Computer Organization: answers by 偷偷
[Q2] Let us consider a microprocessor that has a 5-stage pipelined datapath. The implemented pipeline stages are IF (Instruction Fetch), ID (Instruction Decode), EX (EXecution), MEM (MEMory access), and WB (Write Back). Answer the following questions.
(1) The latencies of the pipeline stages IF, ID, EX, MEM, and WB are 240 ps, 400 ps, 200 ps, 250 ps, and 180 ps, respectively. Give the maximum clock frequency of this datapath (unit is GHz).
Hint: The clock period is determined by the slowest (longest latency) pipeline stage.
The slowest stage is ID with a latency of 400 ps. The clock period must be at least as long as this latency to ensure that the slowest stage completes in one cycle.
$$
\text{Clock Frequency} = \frac{1}{\text{Clock Period}} = \frac{1}{400 \ \text{ps}} = \frac{1}{400 \times 10^{-12} \ \text{s}} = 2.5 \times 10^{9} \ \text{Hz} = 2.5\ \text{GHz}
$$
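The calculation above can be checked with a minimal Python sketch. The dictionary `stage_latency_ps` and the helper `max_clock_ghz` are illustrative names, not part of the original problem; the latencies are the ones given in (1).

```python
# Stage latencies in picoseconds, taken from question (1).
stage_latency_ps = {"IF": 240, "ID": 400, "EX": 200, "MEM": 250, "WB": 180}


def max_clock_ghz(latencies_ps):
    """Return the maximum clock frequency in GHz.

    The clock period equals the latency of the slowest stage.
    A period of p picoseconds corresponds to 1000 / p GHz.
    """
    period_ps = max(latencies_ps.values())
    return 1000.0 / period_ps


slowest = max(stage_latency_ps, key=stage_latency_ps.get)
print(slowest, max_clock_ghz(stage_latency_ps))
```

The sketch assumes, as the problem does, that there is no pipeline-register or clock-skew overhead on top of the stage latency.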
(2) Consider increasing the number of pipeline stages from 5 to 6 by partitioning a single pipeline stage, IF, ID, EX, MEM, or WB, to two stages. Assume the latency of the partitioned pipeline stages is half of that of the original stage. Choose one pipeline stage that should be partitioned for maximising the clock frequency of the datapath, and answer the maximum clock frequency we can achieve by this design optimization (unit is GHz).
To maximize the clock frequency we must minimize the clock period, which is set by the slowest stage. ID (400 ps) is the slowest stage, so it is the only stage whose partitioning shortens the clock period; splitting any other stage leaves the period at 400 ps.
- Splitting ID yields two stages of 200 ps each.
- MEM then becomes the slowest stage (250 ps), so the clock period is now determined by MEM.
$$
\text{Clock Frequency} = \frac{1}{\text{Clock Period}} = \frac{1}{250 \ \text{ps}} = \frac{1}{250 \times 10^{-12} \ \text{s}} = 4 \times 10^{9} \ \text{Hz} = 4 \ \text{GHz}
$$
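As a cross-check, the choice of stage can be confirmed by trying every candidate split. This is a small sketch under the problem's assumption that a split stage becomes two stages of exactly half the latency; `split_stage` and `best_split` are hypothetical helper names introduced here.

```python
# Stage latencies in picoseconds, taken from question (1).
stage_latency_ps = {"IF": 240, "ID": 400, "EX": 200, "MEM": 250, "WB": 180}


def split_stage(latencies_ps, name):
    """Return the latencies after splitting `name` into two half-latency stages."""
    result = {k: v for k, v in latencies_ps.items() if k != name}
    half = latencies_ps[name] / 2
    result[name + "1"] = half
    result[name + "2"] = half
    return result


def best_split(latencies_ps):
    """Try splitting each stage; return (stage name, frequency in GHz) of the best."""
    best_name, best_freq = None, 0.0
    for name in latencies_ps:
        period_ps = max(split_stage(latencies_ps, name).values())
        freq_ghz = 1000.0 / period_ps  # p picoseconds -> 1000 / p GHz
        if freq_ghz > best_freq:
            best_name, best_freq = name, freq_ghz
    return best_name, best_freq


print(best_split(stage_latency_ps))
```

Splitting any stage other than ID leaves the 400 ps stage in place, so those candidates all stay at 2.5 GHz.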
(3) For the execution of a program, the pipeline stage partitioning presented in (2) caused a 10% increase in CPI (Clock cycles Per Instruction). Assume there are no negative effects caused by the pipeline stage partitioning except for the CPI increase. Answer the performance improvement rate achieved by applying the pipeline stage partitioning.
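- Before partitioning (IC is the instruction count, f the clock frequency):
$$\text{CPU Time} = \frac{IC \cdot CPI}{f} = \frac{IC \cdot CPI}{2.5 \ \text{GHz}}$$
- After partitioning:
$$ \text{CPU Time}' = \frac{IC \cdot CPI \cdot 1.1}{f'} = \frac{IC \cdot CPI \cdot 1.1}{4 \ \text{GHz}} $$
- The improvement rate K is the ratio of the two CPU times; IC and CPI cancel:
$$ K = \frac{\text{CPU Time}}{\text{CPU Time}'} = \frac{4 \ \text{GHz}}{2.5 \ \text{GHz} \times 1.1} = \frac{4}{2.75} \approx 1.45 = 145\% $$
That is, the partitioned design is about 1.45 times as fast as the original, a performance gain of roughly 45%.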
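The ratio can be verified numerically with a short sketch. The function name `speedup` and its parameters are illustrative; the instruction count is assumed unchanged, so it cancels and does not appear.

```python
def speedup(f_old_ghz, f_new_ghz, cpi_ratio):
    """Return old CPU time divided by new CPU time.

    CPU time = IC * CPI / f, so with the same instruction count the ratio
    is (CPI / f_old) / (CPI * cpi_ratio / f_new) = f_new / (f_old * cpi_ratio).
    """
    return f_new_ghz / (f_old_ghz * cpi_ratio)


# 2.5 GHz before, 4 GHz after, CPI grows by 10%.
k = speedup(2.5, 4.0, 1.10)
print(f"K = {k:.4f}")
```

The result is slightly below the raw frequency ratio of 1.6 because the 10% CPI increase offsets part of the clock gain.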
(4) Explain the advantages and disadvantages of increasing the number of pipeline stages.
Advantages:
- Higher Clock Frequency: Each stage performs a smaller part of the work for an instruction. This reduces the latency of each stage, allowing the processor to run at a higher clock frequency.
- Improved Instruction Throughput: With more pipeline stages, instructions can be processed more concurrently. This increases the potential instruction throughput because multiple instructions can be in different stages of execution at the same time.
Disadvantages:
- Increased Latency per Instruction and Higher CPI: Although each individual stage is shorter, an instruction needs more cycles to pass through all the stages. Hazard penalties, counted in cycles, also grow (for example, more cycles are lost on a mispredicted branch or a load-use stall), which increases the CPI, as in (3).
- More Complex Control Logic: With more instructions in flight, there are more opportunities for pipeline hazards, so the hazard detection, forwarding, and stall logic needed to manage the pipeline becomes more complex.
[Q3] Consider a direct-mapped cache memory that accepts a 32-bit memory address consisting of a cache tag field, a cache index field, and a cache block offset field. Suppose a byte addressing scheme, the word size is 4 bytes, the cache size is 16 Kilo-bytes, and the cache block size is 32 bytes. Answer the bit-width of the cache tag field.
$32 \ \text{bytes} = 2^5 \ \text{bytes}$, so the block offset is 5 bits.
$\text{number of cache blocks} = \frac{16 \ \text{KB}}{32 \ \text{B}} = \frac{2^{14}}{2^{5}} = 2^9$, so the cache index requires 9 bits.
$$ \text{Cache tag width} = 32 - \text{Index bits} - \text{Offset bits} = 32 - 9 - 5 = 18 \ \text{bits} $$
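The field widths can be checked with a minimal sketch. The function `cache_field_widths` is a hypothetical helper, and it assumes a direct-mapped cache whose cache size and block size are powers of two, with 16 Kilo-bytes taken as 16 × 1024 bytes.

```python
def cache_field_widths(address_bits, cache_bytes, block_bytes):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache."""
    num_blocks = cache_bytes // block_bytes
    # Both values must be powers of two for the fields to split cleanly.
    assert block_bytes & (block_bytes - 1) == 0
    assert num_blocks & (num_blocks - 1) == 0
    offset_bits = block_bytes.bit_length() - 1  # log2(block size in bytes)
    index_bits = num_blocks.bit_length() - 1    # log2(number of blocks)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 32-bit address, 16 KB cache, 32-byte blocks.
print(cache_field_widths(32, 16 * 1024, 32))
```

The 4-byte word size does not affect the tag width under byte addressing; it only determines how the 5 offset bits divide into a word-in-block part (3 bits) and a byte-in-word part (2 bits).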