Part A: Writing a Cache Simulator

Notes

• For this lab, we are interested only in data cache performance, so your simulator should ignore all
instruction cache accesses (lines starting with “I”).

• For this lab, you should assume that memory accesses are aligned properly, such that a single
memory access never crosses block boundaries. By making this assumption, you can ignore the
request sizes in the valgrind traces.
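
The two rules above can be sketched as a small line parser. This is only an illustration, assuming trace lines of the form `[space]operation address,size` with a hexadecimal address (for example ` L 10,1` or `I 0400d7d4,8`); the helper name `parse_trace_line` is hypothetical and not part of the lab's starter code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse one trace line such as " L 10,1".
 * Returns 1 and fills *op / *addr for a data access (L, S or M).
 * Returns 0 for lines the simulator should skip, including
 * instruction fetches ("I") and malformed lines. */
int parse_trace_line(const char *line, char *op, unsigned long long *addr)
{
    char c;
    unsigned long long a;

    assert(line != NULL && op != NULL && addr != NULL);

    /* The leading space in the format skips optional whitespace.
     * The ",size" suffix is deliberately left unread, because accesses
     * are assumed never to cross a block boundary. */
    if (sscanf(line, " %c %llx", &c, &a) != 2)
        return 0;

    /* Only data accesses matter; this drops 'I' lines. */
    if (c != 'L' && c != 'S' && c != 'M')
        return 0;

    *op = c;
    *addr = a;
    return 1;
}
```

A caller would typically read the trace with `fgets` and pass each line to this helper, skipping any line for which it returns 0.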

Algorithm

• Store
• Modify

• The data modify operation (M) is treated as a load followed by a store to the same address.

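• First pass over the set: compute the maximum LRU value. If $(\text{cache[index_][i].valid_bit} == 1) \&\& (\text{cache[index_][i].tag == tag})$, the access is a hit. In the same pass, record a position where $\text{cache[index_][i].valid_bit} == 0$.
• If there is no hit, output miss.
• If a position with $\text{cache[index_][i].valid_bit} == 0$ exists, store the result there directly.
• Otherwise, traverse the set again, find the position holding the maximum LRU value, and store the result there.
• Output evict (only in this replacement case).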
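
The steps above can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the `lru` field is assumed to be an age counter (larger means less recently used) that is reset to 0 on every use, and the names `line_t`, `access_set`, `HIT`, `MISS` and `MISS_EVICT` are made up for illustration. The function operates on one set, i.e. on `cache[index_]`.

```c
#include <assert.h>
#include <stddef.h>

/* One cache line. Assumption: lru is an age counter, larger = older. */
typedef struct {
    int valid_bit;
    unsigned long long tag;
    unsigned long lru;
} line_t;

enum { HIT, MISS, MISS_EVICT };

/* Simulate one access to a set of `assoc` lines. */
int access_set(line_t *set, int assoc, unsigned long long tag)
{
    int hit = -1, empty = -1;
    unsigned long max_lru = 0;

    assert(set != NULL && assoc > 0);

    /* First pass: age every valid line, look for a hit, track the
     * maximum LRU value, and remember the first invalid line. */
    for (int i = 0; i < assoc; i++) {
        if (set[i].valid_bit) {
            set[i].lru++;
            if (set[i].tag == tag)
                hit = i;
            if (set[i].lru > max_lru)
                max_lru = set[i].lru;
        } else if (empty < 0) {
            empty = i;
        }
    }
    if (hit >= 0) {
        set[hit].lru = 0;          /* most recently used */
        return HIT;
    }

    /* Miss: prefer an invalid line; otherwise evict the oldest one. */
    int victim = empty;
    int result = MISS;
    if (victim < 0) {
        /* Second pass: find the line holding the maximum LRU value. */
        for (int i = 0; i < assoc; i++) {
            if (set[i].lru == max_lru) {
                victim = i;
                break;
            }
        }
        result = MISS_EVICT;
    }
    set[victim].valid_bit = 1;
    set[victim].tag = tag;
    set[victim].lru = 0;
    return result;
}
```

With this helper, a load (L) or a store (S) is a single call to `access_set`, and a modify (M) is two consecutive calls with the same tag. The second call of an M always hits, because the first call has just brought the line into the set.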

Part B: Optimizing Matrix Transpose

$64\times 64$

