EXPERTISE: An Effective Software-level Redundant Multithreading Scheme against Hardware Faults

Hwisoo So, Moslem Didehban, Yohan Ko, Aviral Shrivastava, Kyoungwoo Lee

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

Error resilience is the primary design concern for safety- and mission-critical applications. Redundant MultiThreading (RMT) is one of the most promising soft and hard error resilience strategies because it does not require additional hardware modification. While the state-of-the-art software RMT scheme can achieve a high degree of error protection, our detailed investigation revealed that it suffers from performance overhead and insufficient fault coverage. This paper proposes EXPERTISE, a compiler-level RMT scheme that can detect the manifestation of hardware faults in all processor components. EXPERTISE transformation generates a checker-thread for the main execution thread. These redundant threads are executed simultaneously on two physically different cores of a multicore processor and perform almost the same computations. After each memory write operation is committed by the main-thread, the checker-thread loads back the written data from the memory and checks it against its own locally computed values. If they match, the execution continues. Otherwise, the error flag is raised. In order to evaluate the effectiveness of the proposed solution, we performed soft and hard error injection experiments on all the different hardware components of an ARM Cortex53-like μ-architecturally simulated microprocessor. Based on statistical fault injection campaigns, we have found that EXPERTISE provides 188× better fault coverage with 27% faster performance as compared to the state-of-the-art scheme.

Original languageEnglish (US)
Article number3546073
JournalACM Transactions on Architecture and Code Optimization
Volume19
Issue number4
DOIs
StatePublished - Sep 16 2022

Keywords

  • Soft error
  • redundant multithreading
  • transient fault

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'EXPERTISE: An Effective Software-level Redundant Multithreading Scheme against Hardware Faults'. Together they form a unique fingerprint.

Cite this