ED-1-4

Design of A Small Area Binary Neural Processing Unit Using Single Flux Quantum Circuits with Time-Domain Analog and Digital Mixed-Signal

11:15-11:30 28/11/2023

*Zeyu Han1, Zongyuan Li1, Yuki Yamanashi1,2, Nobuyuki Yoshikawa1,2
1. Department of Electrical and Computer Engineering, Yokohama National University, Japan.
2. Institute of Advanced Sciences, Yokohama National University, Japan.
Abstract Body

By binarizing both weights and activations, the binary convolutional neural network (BCNN) replaces matrix multiplication with XNOR and bit-count operations, which makes it more hardware friendly [1]. Single flux quantum (SFQ) circuits are known for their low power consumption and high-speed operation [2], which makes them a candidate for improving CNN performance. Although binary convolutional circuits have already been designed for SFQ-based BCNNs [3], the total chip area available to SFQ circuits is limited by the SFQ fabrication process. Even though a BCNN uses XNOR instead of multiplication, the cost of the accumulation operation remains very high, which makes it difficult to implement a complete SFQ neural network on a chip. For example, a 3x3 convolutional operation designed with conventional SFQ logic requires 3270 Josephson junctions (JJs) [3].
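The XNOR and bit-count substitution can be illustrated with a minimal Python sketch. The bit encoding (1 for +1, 0 for -1), the function name `binary_dot`, and the sample 3x3 values are assumptions made for this illustration; they are not taken from the referenced designs.

```python
def binary_dot(weight_bits, activation_bits):
    """Dot product of two {-1, +1} vectors encoded as bits (1 -> +1, 0 -> -1)."""
    n = len(weight_bits)
    # XNOR is 1 where the two bits agree; summing the XNOR outputs is the bit count.
    matches = sum(1 - (w ^ a) for w, a in zip(weight_bits, activation_bits))
    # Each match contributes +1 and each mismatch -1 to the dot product.
    return 2 * matches - n


# Hypothetical flattened 3x3 kernel and input window.
weights = [1, 0, 1, 1, 0, 0, 1, 0, 1]
activations = [1, 1, 1, 0, 0, 0, 1, 1, 1]
result = binary_dot(weights, activations)
```

The XNOR step is cheap per bit; the summation over all nine positions is the accumulation whose hardware cost the abstract refers to.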

To reduce the convolutional circuit area, we propose a novel binary neural processing unit based on time-domain logic. Whereas conventional SFQ logic uses the presence or absence of a pulse to represent logic 1 or 0, time-domain logic uses the arrival time of pulses to represent data, which makes it an analog and digital mixed signal [4]. In the proposed binary neural processing unit, logical transmission is carried out by two transmission lines. One transmission line carries the reference time signal with a fixed delay. The other transmission line has an adjustable delay: the result of the XNOR determines whether its signal is faster or slower than the reference time signal. When these two transmission lines are connected across multiple processing units, the XNOR result of each unit controls the delay-adjustable line, and the time difference between the signals on the delay-adjustable line and the reference line gives the accumulated XNOR result, which completes the convolutional operation. Because the accumulation must be completed in one clock cycle, the proposed unit is unsuitable for conventional neural network architectures such as the systolic array. Therefore, we propose a dedicated structure to reduce memory access in applications.
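The two-line accumulation described above can be sketched as a simple behavioral model in Python. This is not the authors' circuit: the per-unit delays (`base_ps`, `delta_ps`), the symmetric speed-up and slow-down, and the function names are all hypothetical, chosen only to show how a time difference can encode the accumulated XNOR result.

```python
def time_domain_accumulate(xnor_bits, base_ps=10, delta_ps=2):
    """Return how far (in ps) the adjustable line leads the reference line.

    Assumed model: every unit adds base_ps to the reference line, and
    base_ps - delta_ps (XNOR = 1) or base_ps + delta_ps (XNOR = 0) to the
    adjustable line. All delay values are hypothetical.
    """
    t_ref = 0
    t_adj = 0
    for bit in xnor_bits:
        t_ref += base_ps
        t_adj += base_ps - delta_ps if bit else base_ps + delta_ps
    return t_ref - t_adj


def decode_sum(lead_ps, delta_ps=2):
    """Convert the lead time back to (matches - mismatches)."""
    return lead_ps // delta_ps


def binarize(lead_ps):
    """Output 1 if the adjustable pulse arrives before the reference pulse."""
    return 1 if lead_ps > 0 else 0


# Hypothetical XNOR outputs of nine processing units (3x3 window).
xnor_bits = [1, 0, 1, 0, 1, 1, 1, 0, 1]
lead = time_domain_accumulate(xnor_bits)
```

In this model the final comparison between the two arrival times replaces an explicit adder tree, which is the area saving the time-domain approach aims for.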

The layout of a 3x3 convolutional circuit built from the proposed binary neural processing unit was designed for the AIST ADP2 process, which has a critical current density of 10 kA/cm², for simulation purposes. If the operating frequency in time-domain logic is too high, faster signals can catch up with slower signals in the transmission line. Therefore, the operating frequency is inversely proportional to the number of processing units on the transmission line. Simulation results show that the maximum operating frequency of the 3x3 convolutional circuit is above 10 GHz. Compared with conventional SFQ logic, the number of JJs is 56% lower and the area is about 60% smaller.
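The catch-up constraint can be made concrete with a small Python sketch under an assumed model. Suppose each unit shifts the adjustable pulse by plus or minus `delta_ps` around a common base delay (a hypothetical value, not a figure from the simulation). A fastest-case pulse launched one clock period after a slowest-case pulse stays behind it only if the period exceeds 2 * n * delta_ps, so the frequency bound scales as 1/n.

```python
import math


def max_frequency_ghz(num_units, delta_ps=2.0):
    """Upper bound on clock frequency in the assumed delay model.

    The slowest pulse accumulates +delta_ps per unit and the fastest
    -delta_ps per unit, so the period must exceed 2 * num_units * delta_ps.
    delta_ps is a hypothetical parameter for illustration only.
    """
    min_period_ps = 2.0 * num_units * delta_ps
    return 1000.0 / min_period_ps  # 1 / ps = 1000 GHz


# Doubling the number of units on the line halves the bound.
f_9 = max_frequency_ghz(9)
f_18 = max_frequency_ghz(18)
ratio = f_9 / f_18
```

The numbers produced by this model depend entirely on the assumed `delta_ps` and should not be read as the simulated performance reported above.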

References

[1] M. Courbariaux et al., arXiv preprint, arXiv:1602.02830, 2016.
[2] K. K. Likharev and V. K. Semenov, IEEE Trans. Appl. Supercond., vol. 1, no. 1, pp. 3–28, Mar. 1991.
[3] Z. Li et al., IEEE Trans. Appl. Supercond., vol. 32, no. 4, pp. 1–5, Jun. 2022.
[4] D. Miyashita et al., IEEE Journal of Solid-State Circuits, vol. 52, no. 10, pp. 2679–2689, Oct. 2017.

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number JP22H01542.