In-Memory Transformer Self-Attention Mechanism Using Passive Memristor Crossbar

Cai, Jack, Kaleem, Muhammad Ahsan, Genov, Roman, Azghadi, Mostafa Rahimi, and Amirsoleimani, Amirali (2024) In-Memory Transformer Self-Attention Mechanism Using Passive Memristor Crossbar. In: Proceedings of the IEEE International Symposium on Circuits and Systems. From: ISCAS 2024: IEEE International Symposium on Circuits and Systems, 19-22 May 2024, Singapore.

PDF (Published Version): restricted to repository staff only

View at Publisher Website: https://doi.org/10.1109/ISCAS58744.2024....


Abstract

Transformers have emerged as the state-of-the-art architecture for natural language processing (NLP) and computer vision. However, they are inefficient in both conventional and in-memory computing architectures: because of the self-attention mechanism, doubling the sequence length quadruples their time and memory complexity. Traditional methods optimize self-attention with memory-efficient algorithms or approximations such as locality-sensitive hashing (LSH) attention, which reduces time and memory complexity from O(L²) to O(L log L). In this work, we propose a hardware-level solution that further improves the computational efficiency of LSH attention by utilizing in-memory computing with semi-passive memristor arrays. We demonstrate that LSH can be performed with low-resolution, energy-efficient 0T1R arrays performing stochastic memristive vector-matrix multiplication (VMM). Using circuit-level simulation, we show our proposed method is feasible as a drop-in approximation in Large Language Models (LLMs) with no degradation in evaluation metrics. Our results set the foundation for future works on computing the entire transformer architecture in-memory.
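To illustrate the algorithmic idea the abstract describes, the following is a minimal NumPy sketch of hyperplane-based LSH attention: queries and keys are hashed by the signs of random projections, and softmax attention is computed only within matching buckets. This is an assumption-laden illustration, not the paper's method; in the paper, the random projection (a vector-matrix multiplication) is what would be offloaded to the stochastic memristive 0T1R crossbar, and all function names here are hypothetical.

```python
import numpy as np

def lsh_attention(q, k, v, n_planes=4, seed=0):
    """Bucketed attention via random-hyperplane LSH (illustrative sketch).

    q, k, v: arrays of shape (L, d). The sign pattern of a random
    projection (a VMM, the operation mapped to the memristor crossbar
    in the paper) assigns each query/key to one of 2**n_planes buckets;
    attention is then computed only within each bucket.
    """
    rng = np.random.default_rng(seed)
    d = q.shape[-1]
    planes = rng.standard_normal((d, n_planes))      # random projection matrix
    weights = 1 << np.arange(n_planes)               # pack sign bits into bucket ids
    q_bucket = ((q @ planes) > 0) @ weights
    k_bucket = ((k @ planes) > 0) @ weights

    out = np.zeros_like(v)
    for b in np.unique(q_bucket):
        qi = np.where(q_bucket == b)[0]
        ki = np.where(k_bucket == b)[0]
        if len(ki) == 0:
            continue                                 # no keys hashed here; output stays zero
        scores = q[qi] @ k[ki].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)           # softmax within the bucket
        out[qi] = w @ v[ki]
    return out
```

Because nearby vectors tend to share sign patterns under random projection, each query attends only to a small bucket of candidate keys rather than all L of them, which is the source of the O(L log L) scaling cited above.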

Item ID: 87423
Item Type: Conference Item (Research - E1)
ISBN: 9798350330991
ISSN: 0271-4310
Keywords: Backpropagation, In-Memory, Memristor, Neural Network Training, Self-Attention, Transformer
Copyright Information: © 2024 IEEE
Date Deposited: 26 Nov 2025 23:49
FoR Codes: 40 ENGINEERING > 4008 Electrical engineering > 400801 Circuits and systems @ 70%
46 INFORMATION AND COMPUTING SCIENCES > 4611 Machine learning > 461104 Neural networks @ 30%
SEO Codes: 28 EXPANDING KNOWLEDGE > 2801 Expanding knowledge > 280110 Expanding knowledge in engineering @ 100%
