TY - GEN
T1 - A Study of Runtime Adaptive Prefetching for STTRAM L1 Caches
AU - Kuan, Kyle
AU - Adegbija, Tosiron
N1 - Funding Information: V. Conclusions and Future Work: In this paper, we studied prefetching in reduced retention STTRAM L1 caches. We showed that using expired unused prefetches, and practically, tracking changes in expired prefetches (expiredPF) with respect to total prefetches (allPF), we could provide an accurate description of the best retention with regard to energy consumption and derive insights into the best prefetch distance. Based on these insights, we proposed prefetch-aware retention time tuning (PART) and retention time based prefetch control (RPC) to predict the best retention time and the best prefetch distance during runtime. Experiments show that PART+RPC can reduce the average cache energy and latency by 22.24% and 24.59%, respectively, compared to a base architecture, and by 3.50% and 3.59%, respectively, compared to prior work, while reducing the implementation hardware overheads by 54.55%. For future work, we plan to explore the implications of PART on shared lower level caches and in the presence of workload variations. Acknowledgement: This work was supported in part by the National Science Foundation under grant CNS-1844952. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Publisher Copyright: © 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches because of its non-volatility, low leakage, high integration density, and CMOS compatibility. Prior studies have shown that relaxing and adapting the STTRAM retention time to runtime application needs can substantially reduce overall cache energy without significant latency overheads, due to the lower STTRAM write energy and latency at shorter retention times. In this paper, as a first step towards efficient prefetching across the STTRAM cache hierarchy, we study prefetching in reduced retention STTRAM L1 caches. Using SPEC CPU 2017 benchmarks, we analyze the energy and latency impact of different prefetch distances at different STTRAM cache retention times for different applications. We show that expired-unused-prefetches (the number of unused prefetches expired by the reduced retention time STTRAM cache) can accurately determine the best retention time for energy consumption and access latency. This new metric can also provide insights into the best prefetch distance for memory bandwidth consumption and prefetch accuracy. Based on our analysis and insights, we propose Prefetch-Aware Retention time Tuning (PART) and Retention time-based Prefetch Control (RPC). Compared to a base STTRAM cache, PART and RPC collectively reduced the average cache energy and latency by 22.24% and 24.59%, respectively. When the base architecture was augmented with the state-of-the-art near-side prefetch throttling (NST), PART+RPC reduced the average cache energy and latency by 3.50% and 3.59%, respectively, and reduced the hardware overhead by 54.55%.
AB - Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches because of its non-volatility, low leakage, high integration density, and CMOS compatibility. Prior studies have shown that relaxing and adapting the STTRAM retention time to runtime application needs can substantially reduce overall cache energy without significant latency overheads, due to the lower STTRAM write energy and latency at shorter retention times. In this paper, as a first step towards efficient prefetching across the STTRAM cache hierarchy, we study prefetching in reduced retention STTRAM L1 caches. Using SPEC CPU 2017 benchmarks, we analyze the energy and latency impact of different prefetch distances at different STTRAM cache retention times for different applications. We show that expired-unused-prefetches (the number of unused prefetches expired by the reduced retention time STTRAM cache) can accurately determine the best retention time for energy consumption and access latency. This new metric can also provide insights into the best prefetch distance for memory bandwidth consumption and prefetch accuracy. Based on our analysis and insights, we propose Prefetch-Aware Retention time Tuning (PART) and Retention time-based Prefetch Control (RPC). Compared to a base STTRAM cache, PART and RPC collectively reduced the average cache energy and latency by 22.24% and 24.59%, respectively. When the base architecture was augmented with the state-of-the-art near-side prefetch throttling (NST), PART+RPC reduced the average cache energy and latency by 3.50% and 3.59%, respectively, and reduced the hardware overhead by 54.55%.
KW - Spin Transfer Torque RAM
KW - STTRAM
KW - prefetcher
KW - stride prefetcher
KW - L1 cache
KW - prefetching
KW - GEM5
KW - SPEC CPU 2017
UR - http://www.scopus.com/inward/record.url?scp=85098890084&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098890084&partnerID=8YFLogxK
U2 - 10.1109/ICCD50377.2020.00051
DO - 10.1109/ICCD50377.2020.00051
M3 - Conference contribution
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 247
EP - 254
BT - Proceedings - 2020 IEEE 38th International Conference on Computer Design, ICCD 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE International Conference on Computer Design, ICCD 2020
Y2 - 18 October 2020 through 21 October 2020
ER -