MOL-based In-Memory Computing of Binary Neural Networks

Abstract

Convolutional neural networks (CNNs) have proven very effective in a variety of practical applications involving Artificial Intelligence (AI). However, CNNs grow deeper as user applications become more sophisticated, resulting in a huge number of operations and an increased memory footprint. The massive amount of intermediate data produced leads to intensive data movement between memory and computing cores, creating a serious bottleneck. In-Memory Computing (IMC) aims to address this bottleneck by computing directly inside the memory, eliminating energy-intensive and time-consuming data movement. Meanwhile, emerging Binary Neural Networks (BNNs), a special case of CNNs, exhibit a number of hardware-friendly properties, including memory savings. In BNNs, the costly floating-point multiply-and-accumulate is replaced with lightweight bitwise XNOR and popcount operations. In this paper, we propose a programmable IMC architecture targeting efficient implementation of BNNs. Computational memories based on the recently introduced Memristor Overwrite Logic (MOL) design style are employed. The architecture, presented in semi-parallel and parallel models, efficiently executes the advanced quantization algorithm of the XNOR-Net BNN. Performance evaluation on the CIFAR-10 dataset demonstrates a 1.24× to 3× speedup and 49% to 99% energy savings compared to state-of-the-art implementations, with a throughput efficiency of up to 273 images/s/W.
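To illustrate the XNOR-and-popcount substitution mentioned in the abstract, below is a minimal Python sketch (not from the paper) of how a dot product over {-1, +1} vectors, as used in BNNs such as XNOR-Net, reduces to bitwise operations once values are packed as bits {0, 1}. The function name `binary_dot` and the packing convention are illustrative assumptions, not the paper's implementation.

```python
def binary_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed into integers.

    Encoding assumption: +1 -> bit 1, -1 -> bit 0, element 0 at the LSB.
    """
    mask = (1 << n) - 1
    xnor = ~(x_bits ^ w_bits) & mask   # bit i == 1 where the signs agree
    popcount = bin(xnor).count("1")    # number of agreeing positions
    return 2 * popcount - n            # map the bit count back to a +/-1 sum

# Example: x = [+1, -1, +1, +1], w = [+1, +1, -1, +1] -> dot product = 0
x_bits = 0b1101  # packs x (LSB is element 0)
w_bits = 0b1011  # packs w
assert binary_dot(x_bits, w_bits, 4) == 0
```

This replaces n floating-point multiplies and adds with one XNOR, one popcount, and a scalar correction, which is what makes BNN inference amenable to lightweight in-memory logic.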

Publication
IEEE Transactions on Very Large Scale Integration (VLSI) Systems