TY - GEN
T1 - ParaPIM: A parallel processing-in-memory accelerator for binary-weight deep neural networks
T2 - 24th Asia and South Pacific Design Automation Conference, ASPDAC 2019
AU - Angizi, Shaahin
AU - He, Zhezhi
AU - Fan, Deliang
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/1/21
Y1 - 2019/1/21
N2 - Recent algorithmic progress has achieved competitive classification accuracy despite constraining neural networks to binary weights (+1/-1). These findings open remarkable optimization opportunities by eliminating the need for computationally intensive multiplications and reducing memory access and storage. In this paper, we present the ParaPIM architecture, which transforms current Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) sub-arrays into massively parallel computational units capable of running inference for Binary-Weight Deep Neural Networks (BWNNs). ParaPIM's in-situ computing architecture can be leveraged to greatly reduce the energy consumption of convolutional layers, accelerate BWNN inference, eliminate unnecessary off-chip accesses, and provide ultra-high internal bandwidth. Device-to-architecture co-simulation results indicate ∼4× higher energy efficiency and 7.3× speedup over recent processing-in-DRAM acceleration, and roughly 5× higher energy efficiency and 20.5× speedup over recent ASIC approaches, while maintaining inference accuracy comparable to baseline designs.
AB - Recent algorithmic progress has achieved competitive classification accuracy despite constraining neural networks to binary weights (+1/-1). These findings open remarkable optimization opportunities by eliminating the need for computationally intensive multiplications and reducing memory access and storage. In this paper, we present the ParaPIM architecture, which transforms current Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) sub-arrays into massively parallel computational units capable of running inference for Binary-Weight Deep Neural Networks (BWNNs). ParaPIM's in-situ computing architecture can be leveraged to greatly reduce the energy consumption of convolutional layers, accelerate BWNN inference, eliminate unnecessary off-chip accesses, and provide ultra-high internal bandwidth. Device-to-architecture co-simulation results indicate ∼4× higher energy efficiency and 7.3× speedup over recent processing-in-DRAM acceleration, and roughly 5× higher energy efficiency and 20.5× speedup over recent ASIC approaches, while maintaining inference accuracy comparable to baseline designs.
UR - http://www.scopus.com/inward/record.url?scp=85061117233&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061117233&partnerID=8YFLogxK
U2 - 10.1145/3287624.3287644
DO - 10.1145/3287624.3287644
M3 - Conference contribution
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 127
EP - 132
BT - ASP-DAC 2019 - 24th Asia and South Pacific Design Automation Conference
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 January 2019 through 24 January 2019
ER -