Publication:
Analysis and Mitigation of Radiation Effects in SRAM-based Register Files
Date
2025
Proceedings Paper
Journal
2025 IEEE EUROPEAN TEST SYMPOSIUM, ETS
Abstract
High-performance computing (HPC) systems, including the graphics processing units (GPUs) used for artificial intelligence (AI) training and inference, increasingly rely on frequent access to register files to support efficient parallel processing. Static random-access memory (SRAM)-based register files, optimized for power, performance, and area (PPA), are fundamental to these architectures. However, their susceptibility to radiation-induced soft errors remains a critical reliability concern, especially in radiation-prone environments. While SRAM caches are primarily vulnerable during data retention, register files face a distinct challenge because of their frequent read operations: the read duration is comparable to the hold time, so both phases are susceptible to radiation-induced soft errors. This paper explores soft errors during SRAM-based register file reads, a relatively underexamined phenomenon. Key contributing factors such as particle energy are analyzed, and mitigation strategies are proposed at both the circuit and architecture levels. Our findings offer insights to enhance the reliability of widely used HPC systems in radiation-prone environments.