

This is the author-created version of the following work:

## **Lammie, Corey, Hamilton, Tara Julia, van Schaik, André, and Rahimi Azghadi, Mostafa (2019)** *Efficient FPGA implementations of pair and triplet-based STDP for neuromorphic architectures.* **IEEE Transactions on Circuits and Systems I: Regular Papers, 66 (4) pp. 1558-1570.**

Access to this file is available from: https://researchonline.jcu.edu.au/56547/

Copyright © 2018 IEEE.

Please refer to the original source for the final version of this work: https://doi.org/10.1109/TCSI.2018.2881753

# Efficient FPGA Implementations of Pair and Triplet-based STDP for Neuromorphic Architectures

Corey Lammie, *Student Member, IEEE*, Tara Julia Hamilton, *Member, IEEE*, André van Schaik, *Fellow, IEEE*, and Mostafa Rahimi Azghadi, *Member, IEEE*

*Abstract*—Synaptic plasticity is envisioned to bring about learning and memory in the brain. Various plasticity rules have been proposed, among which Spike-Timing Dependent Plasticity (STDP) has gained the highest interest across various neural disciplines, including neuromorphic engineering. Here, we propose highly efficient digital implementations of Pair-based STDP (PSTDP) and Triplet-based STDP (TSTDP) on Field Programmable Gate Arrays (FPGA) that do not require dedicated floating-point multipliers and hence need minimal hardware resources. The implementations are verified by using them to replicate a set of complex experimental data, including those from pair, triplet, quadruplet, and frequency-dependent pairing experiments, as well as Bienenstock-Cooper-Munro (BCM) experiments. We demonstrate that the proposed TSTDP design has a higher operating frequency that leads to 2.46 times faster weight adaptation (learning), and achieves an 11.55-fold improvement in resource usage, compared to a recent implementation of a calcium-based plasticity rule capable of exhibiting similar learning performance. In addition, we show that the proposed PSTDP and TSTDP designs respectively consume 2.38 and 1.78 times fewer resources than the most efficient PSTDP implementation in the literature. As a direct result of the efficiency and powerful synaptic capabilities of the proposed learning modules, they could be integrated in large-scale digital neuromorphic architectures to enable high-performance STDP learning.

*Index Terms*—STDP, Neuromorphic Engineering, Hebbian Learning, FPGA, Synaptic Plasticity.

#### I. INTRODUCTION

RESEARCH into the hardware realization of Spiking Neural Networks (SNNs) is becoming increasingly popular to inspire the development of efficient brain-inspired computing platforms [1]–[5]. Existing digital implementations of SNNs utilize various well developed digital models of neurons such as the Izhikevich model [6], [7], which are able to emulate biophysical neural behaviour accurately while consuming minimal physical resources. However, these implementations tend to be limited in resemblance to real biological systems as they use simplified variants of Hebbian learning such as pair-based STDP (PSTDP) [8], [9].

It has been demonstrated that PSTDP, which takes into account the timing of a pair of pre- and post-synaptic action potentials to induce plasticity, fails to reproduce the outcome of a number of essential plasticity experiments [10]. Furthermore, PSTDP is unable to induce pattern selectivity driven by third order correlations among input spike trains [11]. To address these shortcomings, the PSTDP rule was extended to account for interactions among triplets of spikes, giving rise to Triplet-based STDP (TSTDP). When implementing digital SNNs, the PSTDP learning approach may inhibit the learning performance of these networks. Therefore, the implementation and use of more powerful plasticity techniques such as TSTDP or calcium-based plasticity algorithms [12] will be beneficial.

TSTDP possesses the capability to replicate a group of biological experiments performed in both hippocampal cultures [13] and the visual cortex [14] regions of the brain [10]. These experiments involve higher order interactions between pre- and post-synaptic spikes and have been demonstrated to be linked to the brain's ability to induce pattern selectivity driven by third order correlations [11]. In addition, the TSTDP rule, which is a timing-based learning algorithm, is shown to elicit plasticity behaviours similar to the rate-based Bienenstock-Cooper-Munro (BCM) learning rule [11], [15], [16] as an emerging feature. This further enhances the synaptic capabilities of the triplet rule and makes it appealing for use in SNNs.

Fig. 1: STDP alters the weight of a synapse connecting a pre- to a post-synaptic neuron. [A] A pre-synaptic neuron is connected and sends a spike train to an excitatory synapse, which also receives a post-synaptic spike train and alters its efficacy depending on the spike timing of the pre and post neurons. [B] STDP modelling considers pre- and post-synaptic spikes leaving potential traces, $r_1$, $r_2$, $o_1$, and $o_2$. These traces and their time constants govern changes in synaptic weight according to PSTDP and TSTDP functions shown in Equations 3–4 and 7–8, respectively.

Corey Lammie and M. Rahimi Azghadi are with the College of Science and Engineering, James Cook University, Townsville, QLD 4814, Australia (e-mail: {corey.lammie, mostafa.rahimiazghadi}@jcu.edu.au).

T.J. Hamilton is with the School of Engineering at Macquarie University, NSW 2109, Australia (e-mail: tara.hamilton@mq.edu.au).

A. van Schaik is with The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, NSW 2751, Australia (e-mail: a.vanschaik@westernsydney.edu.au).

TSTDP learning has so far been realized in software using conventional computers [10], and in hardware using techniques such as Very Large Scale Integration (VLSI), floating gate, and memristive technologies [17]–[21]. Software implementations are useful for replicating the outcome of various plasticity experiments in large-scale neural architectures; however, they need to run on supercomputers with large size, weight, and power consumption. On the other hand, modular digital implementations could be used to replicate plasticity experiments in hardware-accelerated large-scale neuromorphic architectures with significantly lower power requirements and a smaller size, amenable to portable devices and systems.

While analog, mixed-signal, memristive and floating gate implementations of synaptic plasticity algorithms may occupy less silicon area, consume less power, and are designed to have very high performance [17]–[22], they are inflexible and cannot be reprogrammed to implement different learning algorithms. On the other hand, digital FPGA designs provide high flexibility and reprogrammability, which make them suitable for exploring various learning algorithms and network architectures. These ideal characteristics are conveniently facilitated by recent advances in high level design description languages and synthesis tools. Moreover, compared to other technologies, digital designs are robust against manufacturing and environmental variations, and do not require calibration post fabrication. Furthermore, digital designs, such as the work presented here, can be carefully optimized toward implementing a variety of learning algorithms to be comparable to ASIC and analog designs in terms of power, area, and performance.

Only one recent study has claimed the implementation of TSTDP using digital hardware adopting Piece Wise Linear (PWL) approximation techniques [23]. However, this implementation does not demonstrate the replication of any experimental data involving triplet or quadruplet sets of spikes, as presented in [10].

This paper introduces the first digital realization of TSTDP without dedicated floating-point multipliers, which are known to consume large area and require a long operation time when implemented on FPGAs. Replacing such dedicated floating-point multipliers with low resolution unsigned shift-and-add multipliers significantly reduces the resource consumption of the proposed design and makes it highly efficient.

The paper also demonstrates the capability of the proposed TSTDP implementation in successfully replicating many biological experiments involving high-order spike interactions, similar to the computational TSTDP learning algorithm proposed in [10]. Furthermore, the paper shows that the performance, speed, and resource usage of the developed synaptic module are significantly improved compared to state-of-the-art digital synaptic plasticity implementations [12], [23]–[25].

Not only does the proposed digital circuit show strong synaptic plasticity (learning) capabilities for reproducing complex experimental data [10], it surpasses its counterparts in terms of hardware usage and speed. These features make the implemented design useful for the development of large-scale neuromorphic architectures with extended synaptic plasticity capabilities.

This paper is structured as follows. The PSTDP and TSTDP learning algorithms are presented in Section II. The proposed software and digital hardware implementations are described in Sections III and IV. Section V presents the synaptic plasticity experimental results. Section VI compares the proposed hardware to previous works and discusses its significance. Finally, Section VII provides concluding remarks.

## II. STDP LEARNING RULES

STDP is a Hebbian learning rule, which induces synaptic plasticity as a result of the exact time difference between the spikes of pre- and post-synaptic neurons, as shown in Fig. 1. As a result of STDP, Long Term Potentiation (LTP) or Long Term Depression (LTD) of synapses can happen, which are believed to account for learning and memory. There exist several variants of the STDP rule in the literature, including Pair-based and Triplet-based STDP [26].

#### *A. Pair-based STDP*

Pair-based STDP expresses the change in synaptic weight, ∆*w*, within a given synapse by updating and considering the state of a pair of parameters,  $r_1$  and  $o_1$ . These parameters update iteratively at each computation step using a set of differential equations shown as Equations 1 and 2.

$$
\frac{dr_1(t)}{dt} = -\frac{r_1(t)}{\tau_+}, \text{ if } t = t_{pre}, \text{ then } r_1 \to 1 \tag{1}
$$

$$
\frac{do_1(t)}{dt} = -\frac{o_1(t)}{\tau_-}, \text{ if } t = t_{post}, \text{ then } o_1 \to 1. \tag{2}
$$

Here,  $r_1$  and  $o_1$  determine exponentially decaying potentiation and depression potentials starting at the time of a pre-synaptic or a post-synaptic event, respectively. The synaptic weight is then updated at the time of each pre- or post-synaptic spike based on Equations 3 and 4.

$$
w(t) \to w(t) - A_2^- o_1(t), \text{ if } t = t_{pre}, \tag{3}
$$

$$
w(t) \to w(t) + A_2^+ r_1(t), \text{ if } t = t_{post}. \tag{4}
$$

Here,  $A_2^-$  and  $A_2^+$  represent the maximum weight change amplitude, while  $\tau_+$  and  $\tau_-$  denote time constants for the decay of potentiation  $(r_1)$  or depression  $(o_1)$  potentials, respectively [10]. The change in synaptic weight as a result of the PSTDP algorithm and the effect of  $r_1$  and  $o_1$  is shown in Fig. 1, in blue.
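As an illustration, Equations 1 to 4 can be evaluated in an event-driven way, as in the minimal Python sketch below. It assumes exact exponential decay of $r_1$ and $o_1$ between spikes; the function name `pstdp_weight_change` and the amplitude values in the usage note are illustrative assumptions and are not taken from [10] or from the proposed hardware.

```python
import math


def pstdp_weight_change(pre_times, post_times, a2_plus, a2_minus,
                        tau_plus, tau_minus):
    """Total pair-based STDP weight change for two spike trains.

    Spike times and time constants share one (arbitrary) time unit.
    """
    # Merge both spike trains into one time-ordered event list.
    events = sorted([(t, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    r1 = o1 = 0.0                      # potentiation / depression traces
    last_t = events[0][0] if events else 0.0
    dw = 0.0
    for t, kind in events:
        elapsed = t - last_t
        # Equations 1 and 2: exponential decay since the previous event.
        r1 *= math.exp(-elapsed / tau_plus)
        o1 *= math.exp(-elapsed / tau_minus)
        if kind == "pre":
            dw -= a2_minus * o1        # Equation 3: depression on a pre spike
            r1 = 1.0                   # Equation 1: r1 -> 1
        else:
            dw += a2_plus * r1         # Equation 4: potentiation on a post spike
            o1 = 1.0                   # Equation 2: o1 -> 1
        last_t = t
    return dw
```

For example, `pstdp_weight_change([0.0], [10.0], 0.01, 0.005, 16.8, 33.7)` describes a single pre-post pair with $\Delta t = 10$ ms and evaluates to $A_2^+ e^{-10/\tau_+}$; the amplitudes 0.01 and 0.005 are placeholders.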

## *B. Triplet-based STDP*

Triplet-based STDP (TSTDP) is an extension of PSTDP, proposed by Pfister and Gerstner [10]. The total TSTDP weight change,  $\frac{dw}{dt}$ , within a synapse over a time period dt, is modeled using the two PSTDP parameters  $r_1$  and  $o_1$ , as well as two additional parameters  $r_2$  and  $o_2$ . These parameters are updated iteratively at each time step using a set of differential equations as shown in Equations 5 and 6.

$$
\frac{dr_2(t)}{dt} = -\frac{r_2(t)}{\tau_x}, \text{ if } t = t_{pre}, \text{ then } r_2 \to 1 \tag{5}
$$

$$
\frac{do_2(t)}{dt} = -\frac{o_2(t)}{\tau_y}, \text{ if } t = t_{post}, \text{ then } o_2 \to 1. \tag{6}
$$

Here,  $r_2$  and  $o_2$  are additional decaying potentials in response to the arrival of a pre- or post-synaptic event, respectively. Furthermore, parameters  $\tau_x$  and  $\tau_y$  are time constants associated with the decaying potential created after a pre- or post-synaptic spike [10]. By introducing these new potentials, Equations 3 and 4 are expanded to Equations 7 and 8.

$$
w(t) \to w(t) - o_1(t)[A_2^- + A_3^- r_2(t - \varepsilon)] \text{ if } t = t_{pre} \tag{7}
$$

$$
w(t) \to w(t) + r_1(t)[A_2^+ + A_3^+ o_2(t - \varepsilon)] \text{ if } t = t_{post}. \tag{8}
$$

Here,  $A_3^-$  and  $A_3^+$  are weight change amplitude constants (similar to  $A_2^-$  and  $A_2^+$ ), while  $\varepsilon$  is a small positive constant to ensure the weight change occurs before the update of the triplet synaptic potentials, i.e.  $r_2$  and  $o_2$ .
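The role of $\varepsilon$ is easiest to see in code: the triplet traces are read before they are reset by the current spike. The Python sketch below is a minimal event-driven reading of Equations 5 to 8 under the assumption of exact exponential decay between spikes; the function name `tstdp_weight_change` and all numerical values used with it are illustrative assumptions.

```python
import math


def tstdp_weight_change(pre_times, post_times, a2_plus, a2_minus,
                        a3_plus, a3_minus, tau_plus, tau_minus,
                        tau_x, tau_y):
    """Total triplet-based STDP weight change for two spike trains."""
    events = sorted([(t, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    r1 = r2 = o1 = o2 = 0.0
    last_t = events[0][0] if events else 0.0
    dw = 0.0
    for t, kind in events:
        elapsed = t - last_t
        # Decay all four traces since the previous event (Eqs. 1, 2, 5, 6).
        r1 *= math.exp(-elapsed / tau_plus)
        r2 *= math.exp(-elapsed / tau_x)
        o1 *= math.exp(-elapsed / tau_minus)
        o2 *= math.exp(-elapsed / tau_y)
        if kind == "pre":
            # Equation 7: r2 is read at t - epsilon, i.e. before its reset.
            dw -= o1 * (a2_minus + a3_minus * r2)
            r1 = r2 = 1.0
        else:
            # Equation 8: o2 is read at t - epsilon, i.e. before its reset.
            dw += r1 * (a2_plus + a3_plus * o2)
            o1 = o2 = 1.0
        last_t = t
    return dw
```

Setting `a3_plus` and `a3_minus` to zero reduces this sketch to the pair-based rule of Equations 3 and 4.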

As shown in [10], the TSTDP algorithm significantly improves the capability of a synaptic device to closely replicate weight changes measured in several experiments including those involving triplet, quadruplet, and frequency-dependent pairs of spikes. In contrast, [10] demonstrates that, due to its simplistic accumulative nature, the PSTDP rule is not able to capture the nonlinearity observed in the mentioned experiments. In addition, it is shown that, in contrast to PSTDP, TSTDP brings about sensitivity to high-order spatiotemporal correlations among natural stimuli as measured in the brain. This in turn gives rise to speed and direction selectivity, as demonstrated in the TSTDP computational modeling presented in [11]. Due to these significant features of TSTDP, here we investigate and propose an efficient implementation of this high-performance and biologically plausible learning algorithm.

#### III. SOFTWARE STDP IMPLEMENTATION

Prior to realizing our digital STDP implementations, all presented architectures were functionally verified using software emulations. In addition, software was used to perform boundary parameter optimization, and to investigate the accuracy degradation using N-Bit fixed point multipliers.

## *A. STDP Boundary Parameter Optimization*

The STDP boundary parameters were optimized by minimizing the Normalized Mean Square Error (NMSE) using *scipy.optimize.fmin()* from the *SciPy* Python library. This is described in Algorithm 1.

Please note that in order to minimize the NMSE using $2^N$ discretization, all remaining full resolution parameters were re-optimized after discretizing each singular parameter. Hence, the optimized $log_2$(Parameter Value) values do not explicitly correlate with their full resolution counterparts. As such, we only report and compare the NMSE values for the software emulations adopting both full resolution multipliers and four-bit fixed point multipliers using full resolution parameters to our FPGA implementations.

Algorithm 1

- 1: Optimize all Parameters and Minimize NMSE *LOOP Process*
- 2: for Parameter in P do
- 3: Discretize Parameter
- 4: Update the Discretized Parameter in P
- 5: Optimize all Parameters and Minimize NMSE
- 6: end for
- 7: return Optimized Discretized $2^N$ Parameters, P
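The loop of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration rather than the authors' script: the paper uses *scipy.optimize.fmin()* for the minimization, whereas the sketch substitutes a small dependency-free coordinate search (`coordinate_search`) so that it runs on its own. The helper names, the rounding of $log_2$ to the nearest integer, and the toy loss in the usage note are all assumptions.

```python
import math


def nearest_power_of_two(value):
    """Discretize a positive value to 2**N with integer N."""
    return 2.0 ** round(math.log2(value))


def coordinate_search(loss, params, free_names, step=0.25, iters=200):
    """Toy stand-in for scipy.optimize.fmin: multiplicative coordinate search
    over the parameters listed in free_names; all others stay fixed."""
    params = dict(params)
    best = loss(params)
    for _ in range(iters):
        improved = False
        for name in free_names:
            for factor in (1.0 + step, 1.0 / (1.0 + step)):
                trial = dict(params)
                trial[name] = params[name] * factor
                value = loss(trial)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step *= 0.5                # refine the search once it stalls
            if step < 1e-6:
                break
    return params


def optimize_discretized(loss, initial, minimize=coordinate_search):
    """Algorithm 1: discretize one parameter at a time, re-optimizing the
    remaining full resolution parameters after each discretization."""
    params = minimize(loss, initial, list(initial))          # step 1
    free = list(params)
    for name in list(params):                                # step 2
        params[name] = nearest_power_of_two(params[name])    # steps 3-4
        free.remove(name)                                    # now held fixed
        params = minimize(loss, params, free)                # step 5
    return params                                            # step 7
```

With a toy loss such as `lambda p: (math.log2(p["a"]) - 3.2) ** 2 + (math.log2(p["b"]) + 1.7) ** 2` and the starting point `{"a": 1.0, "b": 1.0}`, every returned value is an exact power of two. In practice `loss` would be the NMSE of the emulated synapse.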

*B. Accuracy Degradation using N-Bit Fixed Point Multipliers*



Fig. 2: Product resolution and NMSE achieved for the hippocampal data using the emulated Full TSTDP circuit, while using N-bit proposed multipliers with N ranging from 1 to 8.

Software emulations showed that increasing the number of bits in shift-and-add fixed point multipliers above four does not significantly improve the multiplication accuracy. Fig. 2 shows that the product resolution of an 8-bit multiplier increases only by a marginal amount (0.0293) compared to a 4-bit multiplier. In our software emulations, the use of the 8-bit multiplier resulted in only a slight improvement in the hippocampal data NMSE, i.e. from 2.530286657 achieved using the four-bit multiplier to 2.5281715. Therefore, for all FPGA implementations, a four-bit multiplier was used, which proved efficient and accurate when replicating experimental data.

## IV. DIGITAL STDP IMPLEMENTATION

The digital implementation of the STDP algorithms requires implementing Equations 1–2 and 5–6 to update the synaptic potentials $r_1$, $r_2$, $o_1$, and $o_2$ at each time step and at the time of each pre- or post-synaptic spike event. The proposed module should also be able to accurately update the weight at each spike arrival, as shown in Equations 3 and 4 for PSTDP and Equations 7 and 8 for TSTDP. Evidently, implementing these equations in hardware involves several multiplications, which are known to be hardware-hungry operations. Therefore, a straightforward implementation of TSTDP, or even PSTDP, would result in bulky and slow digital hardware, which is not useful for integration in the large-scale neuromorphic architectures needed for engineering applications. Reducing the complexity and increasing the performance of the STDP synapse is therefore critical.

In order to maximize the computational efficiency and achieve a minimal hardware cost, here we incorporated several approximation techniques to realize the STDP algorithms. First, we used fixed point arithmetic and number representation. The domain of the required numbers in the proposed implementation was determined, through software emulation, to be −1.9999 ≤ *valuesToRepresent* ≤ 1.9999, requiring signed representation. In the adopted fixed-point system, 16 bits were assigned to 'fraction bits', which resulted in a resolution of $2^{-16} = 0.00001525878$, while two bits were used for the sign and integer component. This fine resolution was used to account for the minute changes required in the values of exponential learning potentials, e.g. $o_2[n+1]$ and $r_1[n+1]$. Note that the range and accuracy of the number representation were first examined in software emulations performed in Python to optimize the synaptic potentials of the STDP rules to reach minimal errors comparable to those reported in the STDP computational modeling presented in [10].
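As a rough software model of this number format, the Python sketch below converts between real values and an 18-bit word with 16 fraction bits. It assumes a two's complement encoding with saturation at the range limits, which the text does not state explicitly; the helper names `to_fixed` and `from_fixed` are hypothetical.

```python
FRACTION_BITS = 16                 # fraction bits stated in the text
TOTAL_BITS = 18                    # 16 fraction bits + 2 sign/integer bits
SCALE = 1 << FRACTION_BITS         # 2**16 raw units per 1.0

RAW_MIN = -(1 << (TOTAL_BITS - 1))       # assumed two's complement limits
RAW_MAX = (1 << (TOTAL_BITS - 1)) - 1


def to_fixed(x):
    """Quantize a real value to the nearest raw fixed-point integer,
    saturating at the assumed range limits."""
    raw = int(round(x * SCALE))
    return max(RAW_MIN, min(RAW_MAX, raw))


def from_fixed(raw):
    """Convert a raw fixed-point integer back to a real value."""
    return raw / SCALE
```

In this model one least-significant bit corresponds to $2^{-16}$, and the largest representable value is $2 - 2^{-16}$, which covers the stated domain of ±1.9999.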

Next, the developed synapse was designed to eliminate all multiplications required in Equations 3–4 and 7–8. As the first elimination step, all the multiplications associated with the synaptic parameters, $A_2^+$, $A_2^-$, $A_3^+$, $A_3^-$, $\tau_+$, $\tau_-$, $\tau_x$ and $\tau_y$, were approximated by powers of 2, i.e. $2^N$, where $N \in \mathbb{Z}$. This allowed each multiplication involving one of the eight aforementioned synaptic parameters to be computed within a single clock cycle using a simple bit-shift operation.
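To make the power-of-two idea concrete, the Python sketch below models a trace decay and an amplitude scaling with shifts only. It assumes a forward-Euler update with a unit time step, $r[n+1] = r[n] - r[n]/\tau$ with $\tau = 2^k$, operating on non-negative raw values with 16 fraction bits; the function names are hypothetical and the sketch is not the authors' Verilog.

```python
FRACTION_BITS = 16
ONE = 1 << FRACTION_BITS           # raw representation of 1.0


def decay_step(trace_raw, log2_tau):
    """One decay step: trace - trace / 2**log2_tau, using shift and subtract."""
    return trace_raw - (trace_raw >> log2_tau)


def scale_by_power_of_two(value_raw, log2_gain):
    """Multiply a non-negative raw value by 2**log2_gain with a single shift."""
    if log2_gain >= 0:
        return value_raw << log2_gain
    return value_raw >> -log2_gain


def simulate_decay(steps, log2_tau, start_raw=ONE):
    """Apply decay_step repeatedly and return the trace as a real value."""
    raw = start_raw
    for _ in range(steps):
        raw = decay_step(raw, log2_tau)
    return raw / ONE
```

For instance, with $\tau = 2^6 = 64$ time steps, `simulate_decay(64, 6)` stays close to $e^{-1}$, and `scale_by_power_of_two(ONE, -8)` models a multiplication by an amplitude of $2^{-8}$.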

To eliminate the remaining multiplications in the TSTDP model, i.e. multiplications in $A_3^- o_1(t) r_2(t - \varepsilon)$ and $A_3^+ r_1(t) o_2(t - \varepsilon)$, conventional techniques such as bit-shifting and Piece-wise Linear (PWL) approximations were deemed unsuitable as these multiplications are multi-variable in nature. Therefore, a different approximation approach was required to efficiently implement the required multiplications without a need for dedicated floating-point multipliers.

Since the synaptic potential values, i.e. $r_1$, $r_2$, $o_1$, and $o_2$, are confined between 0 and 1, a special unsigned approximative multiplier could be devised for them. Here, an optimized four-bit unsigned shift-and-add multiplier was constructed in hardware to approximate high resolution unsigned multiplications for the values between 0 and 1 and to a degree of accuracy of $2^{-8} = 0.00390625$.



Fig. 3: Digital architecture and the flow diagram of the proposed STDP synaptic module. The top panel depicts a block diagram of the synaptic module. The bottom panel demonstrates digital implementation of each block shown in the top panel. This includes: [A] Realization of the decaying synaptic potential differential equations, i.e. Equations 1, 2, 5 and 6. [B] The proposed hardware for determining the change in synaptic weight when a pre-synaptic, or [C] a post-synaptic spike occurs. As shown here, with the dashed horizontal and vertical lines in the bottom panels, the whole weight update procedure is carried out in 6 clock cycles.

The proposed fixed-point multiplier, which realizes the required unsigned multiplication, is described in Algorithm 2. In the developed approach, the four most-significant fraction bits of the first operand, e.g. $o_1(t)$ or $r_1(t)$, are *'multiplied'* with the four most-significant fraction bits of the second operand, e.g. $r_2(t)$ or $o_2(t)$. The 8-bit result is then assigned to the 8 most-significant fraction bits of the resulting value, whose 2 integer bits and 8 least-significant fraction bits are zeroed. Software emulations of the proposed approximative multiplication demonstrate that increasing the number of bits beyond four does not significantly improve the multiplication accuracy for $o_1[n+1]r_2[n]$ and $r_1[n+1]o_2[n]$. The multiplication accuracy and its impact on the synaptic plasticity performance are discussed in more detail in Section VI.

Algorithm 2 Algorithm of the Four Bit Unsigned Multiplier.
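One possible software reading of this multiplier is sketched below in Python. It follows the prose description only: the four most-significant fraction bits of each operand are multiplied by shift-and-add, and the 8-bit product is placed in the 8 most-significant fraction bits of a 16-fraction-bit result. The function names are hypothetical, and the sketch assumes operands strictly below 1, since the treatment of an operand equal to exactly 1 is not described in the text.

```python
FRACTION_BITS = 16
FRACTION_MASK = (1 << FRACTION_BITS) - 1


def shift_add_mul4(a4, b4):
    """Multiply two 4-bit unsigned integers using only shifts and additions."""
    acc = 0
    for bit in range(4):
        if (b4 >> bit) & 1:        # add a shifted copy for every set bit of b4
            acc += a4 << bit
    return acc                     # at most 15 * 15 = 225, fits in 8 bits


def approx_mul(a_raw, b_raw):
    """Approximate the product of two values in [0, 1) given as raw integers
    with 16 fraction bits; the result uses the same raw format."""
    a4 = (a_raw & FRACTION_MASK) >> (FRACTION_BITS - 4)   # top 4 fraction bits
    b4 = (b_raw & FRACTION_MASK) >> (FRACTION_BITS - 4)
    product8 = shift_add_mul4(a4, b4)                     # 8-bit product
    # Place the product in the 8 most-significant fraction bits; the integer
    # bits and the 8 least-significant fraction bits remain zero.
    return product8 << (FRACTION_BITS - 8)
```

For example, `approx_mul(0x8000, 0x8000)` multiplies 0.5 by 0.5 and returns `0x4000`, i.e. 0.25, while operands that differ only below their fourth fraction bit give the same product, which is the source of the $2^{-8}$ accuracy limit.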

Using the above-mentioned techniques, the proposed design is able to update the current synaptic weight $W[n]$ to the new value $W[n+1]$ without the need for any dedicated floating-point multipliers, mainly by means of specifically designed hardware-friendly shift-add multipliers. The full hardware structure and the flow diagram of the implemented design are presented in Fig. 3. The synaptic weight update takes place in a mere six clock cycles, as shown by the dashed vertical and horizontal lines in the bottom panels of Fig. 3, making the proposed design fast and very hardware efficient. However, as each synapse is updated during every clock cycle, independent of the pre-synaptic and post-synaptic activity, the power efficiency of our design is not optimal. In future works, additional circuitry could be used to detect pre- or post-synaptic spike events, in addition to native parallelism and pipelining techniques, where several synapses can run in parallel. This is possible by interfacing one singular STDP module with all synapses in a given neuron by storing the synapse address of the pre-synaptic spike and then sending it out with the increment or decrement update signals [27]. By using the techniques discussed in [27], up to two synapses per post-synaptic event can be updated. By adding counters that flush all the addresses in the First In First Out (FIFO) registers, this can be extended to update an arbitrary number of synapses for each post-synaptic event.

As demonstrated in Fig. 3, at the first clock cycle, all four decaying synaptic potentials are updated using only shift and subtraction operations. At the second and third clock cycles, the developed four-bit multiplier circuits produce the 8-bit results required in the fourth step, where the bulk of Equations 7 and 8 are calculated in parallel. At clock cycle five, depending on whether a pre- or post-synaptic event has occurred, either no weight change ('0') or the value calculated in the previous clock cycles is selected. In the sixth clock cycle, the selected value is added to, or subtracted from, the current weight value $w[n]$ to update the weight.

The proposed design was described in Verilog HDL and synthesized using Xilinx ISE Design Suite to compare resource usage and frequency to previous works, which have all used Xilinx FPGAs. The high-level architecture<sup>1</sup> used to physically realize our implemented hardware is demonstrated in Fig. 4.

Fig. 4: Architecture of the hardware implementation. Here the DE1-SOC development board is interfaced with a host computer using an ATMEGA 2560 micro-controller over UART. Pre- and post-synaptic spikes are generated using C code on the Hardware Processor System (HPS) as per our previous work [28]. A photo of the used hardware is included in [29].

#### V. IMPLEMENTATION RESULTS

In order to validate the proposed STDP hardware and verify its synaptic plasticity capabilities, all the STDP experiments used in [10], which proposes the TSTDP model, were performed on the developed hardware synapse. These include replicating experimental data measured in hippocampal cultures and the visual cortex regions of the rat brain from [10], [13], [14], [30]. Furthermore, to measure the resemblance of the synaptic plasticity of the proposed hardware compared to the biological plasticity data, the NMSE as shown in Eq. 9 was calculated [10]. Here, the replicated weight change has the same range as in biological experiments.

$$
NMSE = \frac{1}{P} \sum_{i=1}^{P} \left( \frac{\Delta W_i^{exp} - \Delta W_i^{cir}}{\sigma_i} \right)^2, \tag{9}
$$

where, similar to [10], $P$ denotes the number of available data points in the targeted brain regions, i.e., 13 for hippocampal cultures [13] and 10 for the visual cortex [31], $\Delta W_i^{exp}$ and $\sigma_i$ represent the weight change values measured in the biological experiments and their respective standard error of the mean, and $\Delta W_i^{cir}$ is the actual synaptic weight change calculated by the developed digital synaptic circuit.
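Equation 9 translates directly into a few lines of Python. The sketch below is only an illustration; the function name `nmse` and the numbers in the usage note are made up and are not experimental data.

```python
def nmse(dw_exp, dw_cir, sem):
    """Normalized Mean Square Error of Equation 9.

    dw_exp: weight changes measured in the biological experiments
    dw_cir: weight changes produced by the circuit (or its emulation)
    sem:    standard error of the mean of each experimental data point
    """
    if not (len(dw_exp) == len(dw_cir) == len(sem)):
        raise ValueError("all three sequences must have the same length")
    p = len(dw_exp)
    return sum(((e - c) / s) ** 2
               for e, c, s in zip(dw_exp, dw_cir, sem)) / p
```

For example, `nmse([1.0, 2.0], [1.0, 1.0], [0.5, 0.5])` evaluates to 2.0: the first point matches exactly, and the second is off by two standard errors.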

Furthermore, it is shown in [10] that the full TSTDP model presented in Equations 7 and 8 could be minimized without affecting the accuracy of the triplet model in replicating experimental data. This minimization includes removing the synaptic potentials that proved negligible during the weight change updates, i.e., those with no or minute effect in decreasing the NMSE. Following the same approach, here we implement the minimal TSTDP models in hardware to replicate various experiments as performed in [10]. In the following subsections, these experiments, as well as the developed circuits, are explained and the generated results and the achieved NMSEs are reported.

<sup>1</sup>All HDL code and corresponding documentation is openly accessible at https://github.com/coreylammie/TCAS-STDP.

| Parameter | PSTDP $log_2$(Parameter Value) | PSTDP Parameter Value | Minimal TSTDP $log_2$(Parameter Value) | Minimal TSTDP Parameter Value | Full TSTDP $log_2$(Parameter Value) | Full TSTDP Parameter Value |
|---|---|---|---|---|---|---|
| $A_2^+$ | -8 | 0.00390625 | -8 | 0.00390625 | -8 | 0.00390625 |
| $A_2^-$ | -9 | 0.00195313 | -9 | 0.00195313 | -9 | 0.00195313 |
| $A_3^+$ | NA | NA | -9 | 0.00195313 | -8 | 0.00390625 |
| $A_3^-$ | NA | NA | NA | NA | -10 | 0.00097656 |
| $\tau_+$ | 6 | 64.0000000 | 6 | 64.0000000 | 6 | 64.0000000 |
| $\tau_-$ | 8 | 256.000000 | 8 | 256.000000 | 8 | 256.000000 |
| $\tau_x$ | NA | NA | NA | NA | 10 | 1024.00000 |
| $\tau_y$ | NA | NA | 5 | 32.0000000 | 5 | 32.0000000 |

TABLE I: Optimized synaptic parameters for the hippocampal culture experiments performed on FPGA. Here, $N \in \mathbb{Z}$. All parameter values are expressed using $log_2$(Parameter Value) and standard decimal notation.

Fig. 5: Digital architecture and the flow diagram of the proposed minimal synaptic modules. [A] Realization of the decaying synaptic potential differential equations, i.e. Equations 1, 2, 5 and 6. [B] The proposed hardware for determining the change in synaptic weight when a pre-synaptic spike occurs. [C.H] The proposed hardware for determining the change in synaptic weight when a post-synaptic spike occurs for the hippocampal module. [C.V] The proposed hardware for determining the change in synaptic weight when a given post-synaptic spike occurs for the visual cortex module.

#### *A. Hippocampal Culture Experiments*

Hippocampal culture experiments include measuring synaptic plasticity using conventional spike pairing, triplet, and quadruplet interactions [10], [13]. As previously demonstrated,

all the data from these experiments could be closely reproduced using the TSTDP model and one set of optimized synaptic parameters, i.e. $A_2^-$, $A_2^+$, $A_3^-$, $A_3^+$, $\tau_-$, $\tau_+$, $\tau_x$, and $\tau_y$, that results in reaching a minimal NMSE of 2.9, as reported in [10]. Note that this NMSE could be lowered to near zero if the parameters are optimized to generate the data from only one experiment, e.g. triplet. However, when optimizing the parameters to replicate the results of several distinct experiments, the minimum NMSE of the model is 2.9.

Following the same approach as in [10], the 8 synaptic parameters should also be optimized for the proposed TSTDP circuit shown in Fig. 3, to reach a minimal NMSE. Table I shows all these optimized synaptic parameters that result in reaching an NMSE of 2.53, which is lower than the NMSE achieved using the TSTDP computational model, where $\tau_+$ and $\tau_-$ were bound to 16.8 ms and 33.7 ms respectively.

It was also shown in [10] that, the full TSTDP model could be minimized by removing the triplet depression contribution, i.e. by neglecting  $r_2$ . In this case, there is no need for the optimization of  $\tau_x$  and the same minimal NMSE = 2.9 is achieved using the computational model. Therefore, the TSTDP model for hippocampal culture experiments could be minimized to Equations 10 and 11, hence, only 6 synaptic parameters, i.e.  $A_2^-$ ,  $A_2^+$ ,  $A_3^+$ ,  $\tau_-$ ,  $\tau_+$ , and  $\tau_y$ , need to be optimized to reach the best NMSE.

$$
w(t) \to w(t) - o_1(t)A_2^- \text{ if } t = t_{pre} \tag{10}
$$

$$
w(t) \to w(t) + r_1(t)[A_2^+ + A_3^+o_2(t - \varepsilon)] \text{ if } t = t_{post}. \tag{11}
$$

According to these minimal TSTDP equations, the circuit shown in Fig. 3 was simplified by removing the digital components that did not contribute to the synaptic weight update. Fig. 5 demonstrates the minimal TSTDP circuit needed for hippocampal culture experiments. As shown, the triplet depression synaptic potential  $r_2$ , as well as its required multiplication circuitry are removed to reach a very efficient implementation of TSTDP circuit. In order to reach the best NMSE using this minimal hardware, 6 synaptic parameters needed to be optimized. Our experiments show that after optimizing the parameters as shown in Table I, an NMSE of 2.67 is reached, which is close to the full TSTDP circuit case.

In addition, the PSTDP rule, shown in Equations 3 and 4, was also implemented by removing the triplet synaptic contribution parameters from the architecture shown in Fig. 3. In order to reach a minimal NMSE for this rule, 4 parameters needed to be optimized. As shown in [10] and confirmed in our experiments, even after optimizing the 4 PSTDP parameters for the hippocampal data set, the NMSE was still very large (9.16) and was not comparable to the TSTDP circuits. The four PSTDP optimized parameters are also shown in Table I. Note that, for all the circuits, the optimized values are confined to powers of two, so that a low-cost and high-performance synaptic device is reached.

Fig. 6: Pairing STDP experiments replicated by both the PSTDP and TSTDP circuits. Here, the two data points including the error bars are extracted from [13] and are used for calculating the NMSE as in [10]. The overall trend achieved using the circuits also resembles the STDP behaviour observed in the inset, which shows the Normalized EPSP Slope (correlated with synaptic weight change), extracted from [30].

Using the optimized values for the full and minimal triplet as well as the PSTDP circuits, we then performed all hippocampal culture experiments to mimic 13 experimental data points from [13], which are used for the TSTDP modeling in [10]. Following the approach used in [10], these data points include 2 pair-based, 8 triplet, and 3 quadruplet values, all shown in black in the following figures. For all these data points, error bars indicate the standard deviation in the biological experiments. Due to the strong similarity in the results from the minimal and full TSTDP circuits, in the following figures we only show the results for the minimal circuit. For the sake of performance comparison, all the figures demonstrate the synaptic weight changes generated using the optimized PSTDP circuit, as well.

*1) Pair STDP Experiments:* As the first step, the synaptic weight changes as a function of the time differences between a pair of pre- and post-synaptic spikes were obtained. The resulting weight changes that nicely resemble the well-known asymmetric STDP window are demonstrated in Fig. 6. As shown, both the PSTDP and TSTDP circuits generate similar weight changes for various  $\Delta t = t_{post} - t_{pre}$  values. This is in good agreement with the overall exponential trend of weight changes observed in the STDP biological experiments first reported in [30].

*2) Triplet STDP Experiments:* Next, two triplet experiments performed in hippocampal cultures [13] were replicated using both the TSTDP and PSTDP circuits. These experiments involve stimulating the circuit under test with i) a pre-post-pre spike triplet, where  $\Delta t_1 = t_{post} - t_{pre1}$  and  $\Delta t_2 = t_{post} - t_{pre2}$ , and ii) a post-pre-post spike sequence, where  $\Delta t_1 = t_{post1} - t_{pre}$  and  $\Delta t_2 = t_{post2} - t_{pre}$ .

Fig. 7 demonstrates the synaptic weight changes produced under the triplet protocols. As shown, the PSTDP circuit generates the same synaptic weight changes in both the post-pre-post (Fig. 7(a)) and pre-post-pre (Fig. 7(b)) cases. The PSTDP rule cannot distinguish between the two cases because it simply sums the pre-post and post-pre weight changes to calculate the weight change resulting from a spike triplet, which is not the case in biology. The proposed TSTDP circuit, in contrast, successfully distinguishes the two triplet combinations, showing higher potentiation for the post-pre-post case, while causing depression in the pre-post-pre triplet (see Fig. 7). This is in good agreement with the biological data and provides a close fit to the TSTDP model presented in [10].
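The reason a pair-based rule cannot separate the two triplets can be illustrated with a minimal sketch. It assumes the usual exponential pair window; the function names and parameter values below are hypothetical and are not the optimized values of Table I.

```python
import math

def pair_update(dt, a_plus, a_minus, tau_plus, tau_minus):
    """Weight change for one pre/post pair, with dt = t_post - t_pre."""
    if dt > 0:   # pre before post: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:   # post before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def pstdp_triplet(dt1, dt2, **params):
    """Pair-based response to a triplet: the two pair terms are simply summed."""
    return pair_update(dt1, **params) + pair_update(dt2, **params)

# Hypothetical parameters, chosen only for illustration.
params = dict(a_plus=2**-8, a_minus=2**-8, tau_plus=16.0, tau_minus=32.0)

pre_post_pre = pstdp_triplet(5.0, -5.0, **params)   # dt1 > 0, dt2 < 0
post_pre_post = pstdp_triplet(-5.0, 5.0, **params)  # dt1 < 0, dt2 > 0
print(pre_post_pre, post_pre_post)
```

Because the sum contains the same two pair terms in either order, both triplets receive the same update, whereas a triplet term that depends on the spike order would break this symmetry.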

*3) Quadruplet STDP Experiments:* Further experiments involving quadruplets of spikes [10], [13] were also performed using both the PSTDP and TSTDP circuits and the result is shown in Fig. 8. In the quadruplet experiments, a post-pre pair with  $\Delta t = -5$  ms arrives  $T > 0$  ms before a pre-post pair with  $\Delta t = 5$  ms, or a pre-post pair with  $\Delta t = 5$  ms arrives  $T < 0$  ms before a post-pre pair with  $\Delta t = -5$  ms. Here,  $T = (t_{pre2} + t_{post2})/2 - (t_{pre1} + t_{post1})/2$ . It was observed that while the PSTDP circuit does not closely fit the three targeted data points, the proposed TSTDP design provides a better fit to the data and better follows the weight change trend.

Considering the three experiments performed so far and the observed plasticity behaviours of the two circuits under test, it is obvious that the TSTDP circuit replicates the experimental data, especially the triplet experiments, significantly better than PSTDP. Furthermore, our circuit results are in total agreement with those reported in [10], which are calculated using the TSTDP computational model.

Table II summarizes the minimal NMSEs achieved using the three circuits under test, with their synaptic parameters optimized for the best fit to the 13 experimental data points of the hippocampal culture experiments shown in Figures 6 to 8. In Table II and all the following tables, Software Full Resolution denotes software implementations using floating-point multiplications, while Software 4-FP denotes software implementations using four-bit fixed-point multipliers. In addition, Digital Full Resolution is the FPGA design using floating-point multipliers, while Digital Optimized is the proposed optimized design on FPGA. These NMSEs are in very good agreement with the findings of the TSTDP model paper [10]. Here, it can be observed that our digital implementations perform similarly to their software counterparts. In the worst case, a performance degradation of 7.41% is observed between our digital implementation and our software implementation using full resolution



Fig. 7: Synaptic weight changes in result of [A] post-pre-post and [B] pre-post-pre spike combinations. As demonstrated, the PSTDP circuit fails to distinguish between two different triplet spike combinations and generates the same weight updates for both cases. The TSTDP circuit, on the other hand, successfully distinguishes between two different triplet spike combinations.

TABLE II: Minimal NMSEs achieved for hippocampal data.

| Model         | NMSE: Digital Full Resolution | NMSE: Digital Optimized | NMSE: Software Full Resolution | NMSE: Software 4-FP |
|---------------|-------------------------------|-------------------------|--------------------------------|---------------------|
| PSTDP         | 8.63338609                    | 9.16524002              | 8.48601482                     | 8.61875652          |
| Minimal TSTDP | 2.55450684                    | 2.67254576              | 2.54163893                     | 2.55450684          |
| Full TSTDP    | 2.52634531                    | 2.53028666              | 2.45102116                     | 2.48701814          |

floating point multipliers.

#### *B. Visual Cortex Experiments*

The synaptic modeling study in [10] demonstrated that, in addition to the hippocampal culture, the visual cortex experimental data can also be closely mimicked using the TSTDP rule, although a new set of optimized synaptic parameters is needed. It also showed that a minimized version of the TSTDP rule could replicate the visual cortex data with no significant degradation in the synaptic ability of the model. This minimized TSTDP rule eliminates the pairing potentiation and triplet depression contributions of the full model (Equations 7 and 8). The minimal rule is therefore given by Equations 12 and 13. Hence, only 5 synaptic parameters, i.e.  $A_2^-$ ,  $A_3^+$ ,  $\tau_-$ ,  $\tau_{+}$ , and  $\tau_{y}$ , need to be optimized to reach the best NMSE.

$$
w(t) \to w(t) - o_1(t)A_2^- \text{ if } t = t_{pre} \tag{12}
$$

$$
w(t) \to w(t) + r_1(t)[A_3^+o_2(t-\varepsilon)] \text{ if } t = t_{post}. \tag{13}
$$
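Equations 12 and 13 can be sketched in software as an event-driven update. This is only an illustrative model, not the FPGA circuit: it assumes that the traces $r_1$, $o_1$ and $o_2$ decay exponentially with time constants $\tau_+$, $\tau_-$ and $\tau_y$ and are incremented by one at each pre- or post-synaptic spike, as in the TSTDP model of [10]. The function name, the time unit (arbitrary time steps) and the power-of-two values in the usage lines are assumptions made for the example.

```python
import math

def minimal_tstdp(pre_times, post_times, a2_minus, a3_plus,
                  tau_plus, tau_minus, tau_y):
    """Total weight change under Eq. 12 and 13 for the given spike times."""
    # Merge both spike trains into one time-ordered event list.
    events = sorted([(t, "pre") for t in pre_times] +
                    [(t, "post") for t in post_times])
    r1 = o1 = o2 = 0.0   # pre trace, fast post trace, slow post trace
    t_last = None
    dw = 0.0
    for t, kind in events:
        if t_last is not None:
            gap = t - t_last
            # Assumed exponential decay of all traces between events.
            r1 *= math.exp(-gap / tau_plus)
            o1 *= math.exp(-gap / tau_minus)
            o2 *= math.exp(-gap / tau_y)
        t_last = t
        if kind == "pre":
            dw -= o1 * a2_minus        # Eq. 12: pair depression
            r1 += 1.0
        else:
            dw += r1 * a3_plus * o2    # Eq. 13: o2 is read before its own update
            o1 += 1.0
            o2 += 1.0
    return dw

# Illustrative power-of-two parameters (assumed, in arbitrary time steps).
params = dict(a2_minus=2**-9, a3_plus=2**-8,
              tau_plus=64.0, tau_minus=256.0, tau_y=32.0)

single_pair = minimal_tstdp([0.0], [10.0], **params)         # one pre-post pair
post_pre_post = minimal_tstdp([5.0], [0.0, 10.0], **params)  # post-pre-post triplet
print(single_pair, post_pre_post)
```

In this sketch an isolated pre-post pair produces no potentiation, because Equation 13 needs a preceding post-synaptic spike to make $o_2(t-\varepsilon)$ non-zero; the post-pre-post triplet supplies that spike.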

Equations 12 and 13 lead to a minimal TSTDP circuit, as shown in Fig. 5. Similar to the hippocampal minimal circuit, this one does not need the triplet depression contribution, therefore  $r_2$  and its related multiplications were removed. In addition, it does not require the pair-based potentiation, therefore, some extra additions and multiplications were also removed.



Fig. 8: Synaptic weight changes in result of quadruplet experiments generated using the proposed TSTDP circuit, as well as the PSTDP circuit. Here, the three biological data points including their error bars are extracted from [10] and the remaining biological data points are extracted from [13].

The visual cortex experimental data as presented in [10], [14] include 10 data points, which have been extracted by sweeping the frequency  $(\rho)$  of pairs of pre-post or post-pre spikes and measuring their respective weight changes. As observed in biological experiments, when increasing the frequency of pre-post pairs of spikes with a fixed  $\Delta t = 10$  ms, the synaptic weight shows constant potentiation until it saturates at a frequency around 40 Hz. On the other hand though, increasing the frequency of post-pre pairs with a fixed  $\Delta t = -10$  ms, first results in a depression, but it changes to

| Parameter | PSTDP: $\log_2$(Parameter Value) | PSTDP: Parameter Value | Minimal TSTDP: $\log_2$(Parameter Value) | Minimal TSTDP: Parameter Value | Full TSTDP: $\log_2$(Parameter Value) | Full TSTDP: Parameter Value |
|-----------|----------------------------------|------------------------|------------------------------------------|--------------------------------|---------------------------------------|-----------------------------|
| $A_2^+$   | $-33$                            | $1.1632e-10$           | NA                                       | NA                             | $-33$                                 | $1.1632e-10$                |
| $A_2^-$   | $-8$                             | 0.00390625             | $-9$                                     | 0.00195312                     | $-8$                                  | 0.00390625                  |
| $A_3^+$   | NA                               | NA                     | $-8$                                     | 0.00390625                     | $-9$                                  | 0.00195312                  |
| $A_3^-$   | NA                               | NA                     | NA                                       | NA                             | $-9$                                  | 0.00195312                  |
| $\tau_+$  | 6                                | 64.0000000             | 6                                        | 64.0000000                     | 6                                     | 64.0000000                  |
| $\tau_-$  | 8                                | 256.000000             | 8                                        | 256.000000                     | 8                                     | 256.000000                  |
| $\tau_x$  | NA                               | NA                     | NA                                       | NA                             | 10                                    | 1024.00000                  |
| $\tau_y$  | NA                               | NA                     | 5                                        | 32.0000000                     | 5                                     | 32.0000000                  |

TABLE III: Optimized synaptic parameters for the visual cortex experiments performed on FPGA. Here,  $N \in \mathbb{Z}$ . All parameter values are expressed using  $\log_2$(Parameter Value) and standard decimal notation.

TABLE IV: Minimal NMSEs achieved for visual cortex data.

| Model         | NMSE: Digital Full Resolution | NMSE: Digital Optimized | NMSE: Software Full Resolution | NMSE: Software 4-FP |
|---------------|-------------------------------|-------------------------|--------------------------------|---------------------|
| PSTDP         | 5.913764482                   | 6.12773109              | 5.6821826                      | 5.8428476           |
| Minimal TSTDP | 0.197023795                   | 0.21998375              | 0.1916552                      | 0.1956726           |
| Full TSTDP    | 0.175826965                   | 0.19583439              | 0.1710069                      | 0.1728316           |

potentiation after reaching a threshold of around 25 Hz, beyond which it increases rapidly, as shown in Fig. 9.

Fig. 9 shows that the PSTDP circuit completely fails to follow the weight changes as they happen in biology, while the proposed TSTDP circuit, which benefits from the intrinsic characteristics of the TSTDP algorithm, closely fits the curve and follows the trend of the experimental weight changes.

The optimized synaptic parameters used to replicate the data shown in Fig. 9 are listed in Table III. Using these parameters, the proposed full TSTDP circuit very closely mimics the data and achieves a minimal NMSE of 0.19, as shown in Table IV. In addition, our digital implementations perform similarly to their software counterparts. In the worst case, a performance degradation of 12.88% is observed between our digital implementation and our software implementation using full-resolution floating-point multipliers.

An NMSE of 0.22 was recorded for the minimal TSTDP circuit, which generates the output shown in Fig. 9, while an NMSE of 6.13 was reached for the PSTDP circuit, which completely fails to follow the experimental data as demonstrated in Fig. 9. These values are in good agreement with the NMSEs achieved using the TSTDP and PSTDP computational models [10].
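The worst-case degradation figures quoted for Tables II and IV can be checked with a short calculation. The sketch below assumes that degradation is the difference between the Digital Optimized and Software Full Resolution NMSEs, normalized by the Digital Optimized NMSE; this normalization is not stated in the text and is inferred here because it reproduces the quoted 7.41% and 12.88%.

```python
# (Digital Optimized NMSE, Software Full Resolution NMSE) from Tables II and IV.
nmse = {
    "hippocampal": {
        "PSTDP": (9.16524002, 8.48601482),
        "Minimal TSTDP": (2.67254576, 2.54163893),
        "Full TSTDP": (2.53028666, 2.45102116),
    },
    "visual cortex": {
        "PSTDP": (6.12773109, 5.6821826),
        "Minimal TSTDP": (0.21998375, 0.1916552),
        "Full TSTDP": (0.19583439, 0.1710069),
    },
}

def degradation(digital_optimized, software_full):
    """Relative NMSE increase in percent (assumed normalization, see lead-in)."""
    return 100.0 * (digital_optimized - software_full) / digital_optimized

worst = {
    dataset: max(degradation(*pair) for pair in models.values())
    for dataset, models in nmse.items()
}

for dataset, value in worst.items():
    print(f"{dataset}: worst-case degradation {value:.2f}%")
```

Under this assumption the worst case comes from the PSTDP circuit for the hippocampal data and from the minimal TSTDP circuit for the visual cortex data.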

#### *C. Rate-based BCM Experiments*

In addition to replicating the hippocampal cultures and visual cortex experimental data, our proposed spike-timing-based synapse is shown to be able to mimic rate-based synaptic plasticity experiments such as those performed in [16]. These experiments are reminiscent of the rate-based Bienenstock-Cooper-Munro (BCM) learning rule [11], [15], [16]. The BCM rule includes a threshold frequency at which synaptic depression changes to potentiation. This threshold is shown to slide (increase/decrease) for various pre- and post-synaptic spiking rates,  $\rho_x$  and  $\rho_y$ , respectively [10].



Fig. 9: Synaptic weight changes in result of visual cortex experiment generated using the proposed TSTDP circuit, as well as the PSTDP circuit. Here, solid lines represent weight changes for pre-post pairs with  $\Delta t = 10$  ms, whereas dashed lines are for post-pre pairs with  $\Delta t = -10$  ms.

Here, we replicate the sliding threshold behaviour of the BCM learning rule, using the proposed minimal TSTDP circuit that was used for the visual cortex experiments. We also use the same optimized parameters as for those experiments, therefore, there is no need for further modification or parameter tuning in the TSTDP hardware. The resulting BCM learning behaviour obtained using the minimal TSTDP rule of Equations 12 and 13 is shown in Fig. 10. For these experiments, the hardware synapse receives Poissonian spike trains with mean frequency of  $\rho_x$  from the pre-synaptic, and  $\rho_y$  from the post-synaptic side. For the performed experiment,  $\rho_y$  is swept between 0 and 50 Hz, while the pre-synaptic rate,  $\rho_x$ , is kept fixed. It is shown that for higher pre-synaptic spike rates, the BCM threshold slides toward higher frequencies, confirming the TSTDP model results shown in [10], while resembling the experimental data presented in [16].

### *D. Power Analysis*

Power consumption is a major factor for brain-inspired computing and hence we have characterised the power consumption of our proposed circuits in Table VI. Table VI

TABLE V: Performance and device utilization comparison for the implemented synapse module. Here NA means not available in the source paper. Abbreviations used in the table correspond to the circuit used for each plasticity rule. These include: Minimal TSTDP Hippocampal (TMH), Minimal TSTDP Visual Cortex (TMVC), Calcium Based (CAB), Serial PSTDP (Serial), and Cell-based PSTDP (Cell-based) [32]. All reported hardware utilization numbers for our current works have been obtained from re-synthesizing our original HDL designs using the Xilinx ISE Design Suite in order to provide a direct comparison to previous works. Here, the maximum synthesizable frequency is determined by 1/(Requirement-Slack), where requirement and slack values are taken from Xilinx *Max Delay Path* report.

| Implementation                                                                                             | <b>Plasticity Rule</b> | <b>Flip Flops</b> | <b>Slice Registers</b> | <b>LUTs</b> | <b>Targeted FPGA Device</b> | <b>Max Synthesizable</b><br>Freq [MHz] | <b>Visual Cortex</b><br><b>NMSE</b> | <b>Hippocampal Cultures</b><br><b>NMSE</b> |
|------------------------------------------------------------------------------------------------------------|------------------------|-------------------|------------------------|-------------|-----------------------------|----------------------------------------|-------------------------------------|--------------------------------------------|
| Our current digital implementations using the optimized architecture with four-bit fixed point multipliers |                        |                   |                        |             |                             |                                        |                                     |                                            |
| Digital Optimized                                                                                          | <b>PSTDP</b>           | 16                | 12                     | 8           | Xilinx Spartan-6 XC6SLX9    | 816                                    | 6.127731093                         | 9.165240019                                |
| Digital Optimized                                                                                          | <b>Full TSTDP</b>      | 34                | 27                     | 18          | Xilinx Spartan-6 XC6SLX9    | 816                                    | 0.195834388                         | 2.530286657                                |
| Digital Optimized                                                                                          | <b>TMH</b>             | 20                | 16                     | 15          | Xilinx Spartan-6 XC6SLX9    | 816                                    | <b>NA</b>                           | 2.672545759                                |
| Digital Optimized                                                                                          | <b>TMVC</b>            | 20                | 17                     | 12          | Xilinx Spartan-6 XC6SLX9    | 816                                    | 0.219983745                         | <b>NA</b>                                  |
| Our current digital implementations synthesized using full resolution [FR] dedicated multipliers           |                        |                   |                        |             |                             |                                        |                                     |                                            |
| Digital Full Resolution                                                                                    | <b>PSTDP</b>           | 671               | 642                    | 859         | Xilinx Spartan-6 XC6SLX9    | 362                                    | 5.913764482                         | 8.63338609                                 |
| Digital Full Resolution                                                                                    | <b>Full TSTDP</b>      | 1498              | 1370                   | 1943        | Xilinx Spartan-6 XC6SLX9    | 362                                    | 0.175826965                         | 2.52634531                                 |
| Digital Full Resolution                                                                                    | <b>TMH</b>             | 882               | 811                    | 1629        | Xilinx Spartan-6 XC6SLX9    | 362                                    | NA                                  | 2.55450684                                 |
| Digital Full Resolution                                                                                    | <b>TMVC</b>            | 882               | 826                    | 1593        | Xilinx Spartan-6 XC6SLX9    | 362                                    | 0.197023795                         | <b>NA</b>                                  |
| Previous Works                                                                                             |                        |                   |                        |             |                             |                                        |                                     |                                            |
| $[25]$                                                                                                     | <b>PSTDP</b>           | 39                | <b>NA</b>              | 18          | Xilinx Spartan-3 XC3S1500   | <b>NA</b>                              | NA                                  | <b>NA</b>                                  |
| $[12]$                                                                                                     | CAB                    | 292               | <b>NA</b>              | 309         | Xilinx Spartan-6 XC6SLX45T  | 332                                    | <b>NA</b>                           | <b>NA</b>                                  |
| $[23]$                                                                                                     | <b>PSTDP</b>           | <b>NA</b>         | 46                     | 36          | Xilinx Spartan-6 XC6SLX9    | 138                                    | <b>NA</b>                           | 0.7                                        |
| $[23]$                                                                                                     | <b>TMH</b>             | <b>NA</b>         | 54                     | 41          | Xilinx Spartan-6 XC6SLX9    | 192                                    | 0.18                                | 0.18                                       |
| $[23]$                                                                                                     | <b>TMVC</b>            | <b>NA</b>         | 47                     | 26          | Xilinx Spartan-6 XC6SLX9    | 192                                    | 0.18                                | 0.18                                       |
| $[32]$                                                                                                     | Serial                 | <b>NA</b>         | <b>NA</b>              | 47          | Xilinx Spartan-3 XC3S1500   | <b>NA</b>                              | NA                                  | <b>NA</b>                                  |
| [32]                                                                                                       | Cell-based             | <b>NA</b>         | <b>NA</b>              | 339         | Xilinx Spartan-3 XC3S1500   | <b>NA</b>                              | <b>NA</b>                           | <b>NA</b>                                  |
| [24]                                                                                                       | <b>PSTDP</b>           | 398               | <b>NA</b>              | 1430        | Xilinx Virtex-6 XC6VLX240T  | 200                                    | NA                                  | <b>NA</b>                                  |



Fig. 10: The proposed minimal TSTDP circuit for visual cortex experiments successfully demonstrates a BCM-like behaviour including a sliding threshold frequency, at which depression changes to potentiation. Here,  $\rho_x$  and  $\rho_y$ , denote the mean frequency of pre- and post-synaptic spike trains with Poisson statistics, respectively [10]. The inset shows the percentage change in synaptic weight with respect to stimulation frequency extracted from [16], which demonstrates a BCM-like behaviour including the sliding threshold.

reports the total on-chip FPGA power calculated using the Xilinx Power Estimator (XPE) after HDL synthesis. In order to measure the power consumption of one synaptic module, we added a second instance of the module to be able to calculate

| Model      | Total On-Chip Power [W]: Digital Full Resolution | Total On-Chip Power [W]: Digital Optimized | Synaptic Module Power [W]: Digital Full Resolution | Synaptic Module Power [W]: Digital Optimized |
|------------|--------------------------------------------------|--------------------------------------------|----------------------------------------------------|----------------------------------------------|
| PSTDP      | 0.185                                            | 0.11                                       | 0.128                                              | 0.085                                        |
| Full TSTDP | 0.407                                            | 0.244                                      | 0.222                                              | 0.132                                        |
| TMH        | 0.305                                            | 0.179                                      | 0.143                                              | 0.098                                        |
| TMVC       | 0.308                                            | 0.182                                      | 0.145                                              | 0.099                                        |

TABLE VI: Total On-Chip Power Consumption (W), and Synaptic Module Power (W) for all digitally implemented synapse modules.

the estimated power used only by the synapse and not other on-chip components.

It was observed that the power consumption has an almost linear relationship with the number of instantiated synapse modules. As such, we introduce a new parameter, Synaptic Module Power, which indicates the additional power draw of instantiating a new synapse module.

Table VI shows that our digitally optimized designs incorporating four-bit fixed point multipliers reduce the total on-chip power required by approximately 40%. We are not able to make a comparison to previous relevant FPGA works because none of them have reported their power consumption. However, compared to other hardware technologies such as ASIC, analog, and memristive solutions, our FPGA design consumes significantly more power. Nonetheless, the main reasons for adopting FPGAs instead of ASIC or other implementation technologies are greater design flexibility and a much shorter development period. Given the power savings that we have reported in Table VI, the presented circuits would be ideal as building blocks for large-scale neuromorphic processors or hardware-accelerated brain simulators, and optimization strategies such as pipelining and time multiplexing could easily be employed to further reduce the power overhead of the system.
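The power figures of Table VI can be explored with a short calculation. The sketch below computes the total on-chip power saving of each optimized design and, relying on the near-linear relationship reported above, extrapolates the total power for several synapse instances. The column assignment (full-resolution value first, optimized value second) and the assumption that the total on-chip figure corresponds to a single instantiated module are interpretations of the table, not statements from the source.

```python
# model: (total full-res, total optimized, module full-res, module optimized) in W
power = {
    "PSTDP":      (0.185, 0.110, 0.128, 0.085),
    "Full TSTDP": (0.407, 0.244, 0.222, 0.132),
    "TMH":        (0.305, 0.179, 0.143, 0.098),
    "TMVC":       (0.308, 0.182, 0.145, 0.099),
}

def saving_percent(full_resolution, optimized):
    """Relative power reduction of the optimized design in percent."""
    return 100.0 * (1.0 - optimized / full_resolution)

def estimated_total(total_one_module, module_power, n_modules):
    """Linear extrapolation: each extra module adds module_power watts."""
    return total_one_module + (n_modules - 1) * module_power

savings = {m: saving_percent(v[0], v[1]) for m, v in power.items()}
for model, value in savings.items():
    print(f"{model}: total on-chip power reduced by {value:.1f}%")

# Hypothetical example: ten optimized PSTDP synapses on one device.
ten_pstdp = estimated_total(power["PSTDP"][1], power["PSTDP"][3], 10)
print(f"Estimated power for 10 optimized PSTDP modules: {ten_pstdp:.3f} W")
```

The extrapolation ignores any effect of pipelining or time multiplexing and is only meant to show how the Synaptic Module Power parameter would be used.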

#### VI. DISCUSSION AND COMPARISON TO PREVIOUS WORK

This paper proposed an efficient digital implementation of the triplet STDP rule [10], and we demonstrated that our developed hardware is capable of replicating a series of biological experiments [13], [14], [16]. The proposed synaptic hardware was carefully optimized to utilize minimal FPGA resources and to gain the highest speed possible. The resource usage and speed proved to be significantly improved compared to previous implementations of both PSTDP and TSTDP in digital hardware as shown in Table V. This was achieved by devising specific circuits to realize lower resolution binary multiplications using shift and add operations to replace hardware-hungry complex floating-point multipliers. Also, to further improve the hardware resource usage of the proposed synapse, all the multiplications by fixed values required in the circuit were performed by rounding these values to powers of two, and therefore being able to replace them using hardware-friendly shift operations.
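The two arithmetic ideas described above can be sketched in a few lines. This is a behavioural illustration only: the function names are hypothetical, and the operand widths, fixed-point format, and rounding behaviour of the actual circuit are not specified here, so truncation on right shifts is an assumption.

```python
def shift_add_mul4(a, b):
    """Multiply unsigned integer a by a 4-bit unsigned b using shifts and adds."""
    if not 0 <= b < 16:
        raise ValueError("b must fit in 4 unsigned bits")
    acc = 0
    for bit in range(4):
        if (b >> bit) & 1:       # add a shifted copy of a for every set bit of b
            acc += a << bit
    return acc

def scale_pow2(x_fixed, log2_value):
    """Multiply a fixed-point integer by 2**log2_value with a single shift.

    Negative exponents shift right and truncate (assumed rounding policy).
    """
    if log2_value >= 0:
        return x_fixed << log2_value
    return x_fixed >> -log2_value

print(shift_add_mul4(13, 11))   # same value as 13 * 11
print(scale_pow2(1024, -8))     # 1024 * 2**-8
print(scale_pow2(3, 6))         # 3 * 2**6
```

A constant such as an amplitude of $2^{-8}$ therefore costs only wiring in hardware, while a 4-bit multiplication needs at most four conditional additions.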

Table V demonstrates that the PSTDP design that emerges from within the proposed TSTDP circuit outperforms the best previous FPGA PSTDP design [25] by 2.38 times in hardware resource usage, while it uses 4.10 times fewer hardware resources and operates 5.91 times faster than a recent PSTDP design based on a Piecewise Linear (PWL) approximation approach [23]. Also, Table V shows that the proposed PSTDP circuit consumes almost 6 and 42 times fewer resources than the two smaller designs presented in [32], respectively.

In addition, a minimal version of the presented TSTDP module is more efficient than the best previous PSTDP design [25] by 1.78 times in FPGA hardware usage. This is a significant improvement because, even though our synaptic hardware implements a more complex learning rule with improved capability in reproducing biological data, it consumes fewer resources and operates faster. Compared to [12], which proposes a digital PWL implementation of the calcium-based plasticity rule of [33], the proposed hardware achieves a significant improvement of 2.46 times in maximum FPGA update frequency and 11.55 times in FPGA resources.

The calcium-based rule, which is able to replicate high order plasticity experiments such as those performed by the TSTDP rule, needs 11 parameters to be approximated to replicate an asymmetric STDP learning curve [12] such as Fig. 6. In addition, it can show a BCM-like behaviour as shown in Fig. 10, however, this feature has not been investigated in [12]. Furthermore, none of the previous studies have shown other features such as the frequency-dependent pairing experiment shown in [33] and in Fig. 9.

Besides these, our developed digital synapse outperforms a recent implementation of STDP algorithms, which utilized PWL techniques to design an efficient synaptic circuit [23]. Our proposed synapse operates 5.91 times faster than the PSTDP and 4.25 times faster than the TSTDP circuits presented in [23]. In addition, the resource usage of our minimal TSTDP synapse is over 3.06 times lower than that of the minimal hippocampal, and 2.51 times lower than that of the minimal visual cortex TSTDP implementation of [23]. Note that [23] does not show any of the experimental data that we have reproduced in this paper.
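The improvement factors quoted in this section can be reproduced from Table V. The text does not spell out which resource counts enter each ratio, so the sums used below (flip flops plus LUTs, or slice registers plus LUTs, depending on what the compared work reports) are an inference that happens to match the quoted numbers to within rounding.

```python
ours = {
    "PSTDP":      dict(ff=16, reg=12, lut=8,  fmax=816),
    "Full TSTDP": dict(ff=34, reg=27, lut=18, fmax=816),
    "TMH":        dict(ff=20, reg=16, lut=15, fmax=816),
    "TMVC":       dict(ff=20, reg=17, lut=12, fmax=816),
}
previous = {
    "[25] PSTDP": dict(ff=39, lut=18),
    "[12] CAB":   dict(ff=292, lut=309, fmax=332),
    "[23] PSTDP": dict(reg=46, lut=36, fmax=138),
    "[23] TMH":   dict(reg=54, lut=41, fmax=192),
    "[23] TMVC":  dict(reg=47, lut=26, fmax=192),
}

def ratio(prev_entry, our_entry, keys):
    """Previous-work total over our total for the selected columns."""
    return sum(prev_entry[k] for k in keys) / sum(our_entry[k] for k in keys)

# (previous design, our design, columns compared, factor quoted in the text)
comparisons = [
    ("[25] PSTDP", "PSTDP",      ("ff", "lut"),  2.38),
    ("[25] PSTDP", "TMVC",       ("ff", "lut"),  1.78),
    ("[12] CAB",   "Full TSTDP", ("ff", "lut"),  11.55),
    ("[12] CAB",   "Full TSTDP", ("fmax",),      2.46),
    ("[23] PSTDP", "PSTDP",      ("reg", "lut"), 4.10),
    ("[23] PSTDP", "PSTDP",      ("fmax",),      5.91),
    ("[23] TMH",   "TMH",        ("reg", "lut"), 3.06),
    ("[23] TMVC",  "TMVC",       ("reg", "lut"), 2.51),
    ("[23] TMH",   "TMH",        ("fmax",),      4.25),
]

results = [(p, o, k, q, ratio(previous[p], ours[o], k))
           for p, o, k, q in comparisons]
for p, o, k, q, r in results:
    print(f"{p} vs our {o} on {'+'.join(k)}: {r:.3f} (quoted {q})")
```

Note that for the frequency rows the ratio is inverted by construction (our frequency is the larger one), so the helper is called with the single `fmax` column and the roles of numerator and denominator are swapped in the check below.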

Another work that proposed a generic synaptic plasticity circuit that can realize multiple STDP rules in both digital and analog circuitries is presented in [24]. Table V shows that the proposed design significantly improves the resource usage compared to this general design. Although [24] consumes more resources, it has been used in a time-multiplexing manner, to realize 1800 synapses. This time multiplexing strategy, which is to be investigated in our future research, could potentially result in similar performance. Furthermore, the TSTDP digital circuit implemented here is specifically implemented to realize TSTDP and PSTDP rules, while the design presented in [24] can be configured for multiple synaptic plasticity rules including Spike Timing Dependent Delay Plasticity (STDDP), without changing its structure.

In designs such as [34], many cores are replicated in hardware which are fixed, but can be programmed to implement learning algorithms. Our design, on the other hand, has the learning algorithm specifically implemented, which could be used in large-scale neuromorphic systems by having several instances of the hardware and then time-multiplexing them to extend the network size, leveraging the parallel processing feature of FPGAs.

All these comparisons show that the proposed devices in this paper are the most efficient digital implementations of STDP learning algorithms to date. Furthermore, this paper presents the first digital synaptic device that is shown to account for doublet, triplet, quadruplet, and frequency-dependent pairing experiments, as measured in a biological brain. It was also shown that not only does the proposed timing-based synaptic hardware replicate the outcomes of time-dependent plasticity experiments in two different regions of the brain, but a rate-based learning behaviour [15], [16] can also emerge from the developed hardware. These distinct features render the proposed design a powerful synaptic component that can bring complex learning and plasticity abilities to neuromorphic architectures, while having a high FPGA update frequency and consuming minimal resources. This is an essential requirement for developing large-scale neuromorphic digital architectures, where speed, minimal resource usage, and powerful synaptic abilities are of paramount importance.

#### VII. CONCLUSION

We designed and implemented two variants of the STDP learning algorithm on an FPGA without the use of any dedicated floating point multiplier blocks. The proposed TSTDP hardware and a few minimal variants of it were used to replicate experimental data from hippocampal cultures and visual cortex regions of the brain with minimal errors. We demonstrated that conventional designs employing the pair-based STDP learning algorithm are unable to accurately reproduce experimental data involving higher order interactions and hence result in very large errors compared to our proposed design, which utilizes the higher order TSTDP rule. Moreover, the implemented timing-based circuit was demonstrated to accurately replicate the outcome of a well known rate-based plasticity experiment as an emerging feature. In summary, our key contributions are:

- We present the first digital synaptic device that is shown to account for doublet, triplet, quadruplet, and frequency-dependent pairing experiments.
- We have innovatively used 4-bit unsigned multipliers in place of dedicated floating-point multipliers to reduce FPGA resource usage and increase synaptic circuit operation frequency.
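- We demonstrate that, in comparison with state-of-the-art digital synaptic plasticity circuits, our new designs are the most efficient: the TSTDP design achieves an 11.55-fold improvement in resource usage and 2.46 times faster weight adaptation than a recent calcium-based plasticity implementation, and the PSTDP and TSTDP designs consume 2.38 and 1.78 times fewer resources, respectively, than the most efficient PSTDP implementation in the literature.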

These contributions make our synaptic module a valuable design for the large-scale neuromorphic architectures currently being sought for various applications to make smarter machines, as well as for future digital ASIC designs produced via HDL synthesis.
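To illustrate why small unsigned multipliers can stand in for a wide multiplier, the following is a minimal sketch that assembles an 8-bit by 8-bit unsigned product from four 4-bit partial products by shift-and-add. The 8-bit word length, the Q0.8 fixed-point format, and all function names are assumptions made for illustration; they are not claimed to match the word lengths or the datapath of the proposed design.

```python
def mul4(a, b):
    """Model of a 4-bit x 4-bit unsigned multiplier (8-bit product)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def mul8_from_mul4(x, y):
    """8-bit x 8-bit unsigned product built from four 4-bit multiplies."""
    assert 0 <= x < 256 and 0 <= y < 256
    x_hi, x_lo = x >> 4, x & 0xF   # split each operand into two nibbles
    y_hi, y_lo = y >> 4, y & 0xF
    return ((mul4(x_hi, y_hi) << 8)
            + ((mul4(x_hi, y_lo) + mul4(x_lo, y_hi)) << 4)
            + mul4(x_lo, y_lo))


def fixed_mul_q08(x, y):
    """Multiply two unsigned Q0.8 fractions, truncating back to Q0.8."""
    return mul8_from_mul4(x, y) >> 8


if __name__ == "__main__":
    # 0.5 * 0.5 in Q0.8: 128 * 128 -> 64, i.e. 0.25
    print(fixed_mul_q08(128, 128))
```

The decomposition follows from writing each operand as 16·hi + lo, so only 4-bit multiplies, shifts, and additions are needed.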

#### **REFERENCES**

- [1] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, "A million spiking-neuron integrated circuit with a scalable communication network and interface," *Science*, vol. 345, no. 6197, pp. 668–673, 2014.
- [2] M. R. Azghadi, S. Moradi, D. B. Fasnacht, M. S. Ozdas, and G. Indiveri, "Programmable spike-timing-dependent plasticity learning circuits in neuromorphic VLSI architectures," *ACM Journal on Emerging Technologies in Computing Systems (JETC)*, vol. 12, no. 2, p. 17, 2015.
- [3] N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska, and G. Indiveri, "A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128k synapses," *Frontiers in neuroscience*, vol. 9, 2015.
- [4] G. Indiveri and S.-C. Liu, "Memory and information processing in neuromorphic systems," *Proceedings of the IEEE*, vol. 103, no. 8, pp. 1379–1397, 2015.
- [5] R. Wang, C. S. Thakur, G. Cohen, T. J. Hamilton, J. Tapson, and A. van Schaik, "Neuromorphic hardware architecture using the neural engineering framework for pattern recognition," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 11, no. 3, pp. 574–584, 2017.
- [6] K. L. Rice, M. A. Bhuiyan, T. M. Taha, C. N. Vutsinas, and M. C. Smith, "FPGA implementation of izhikevich spiking neural networks for character recognition," in *2009 International Conference on Reconfigurable Computing and FPGAs*, Dec 2009, pp. 451–456.
- [7] G. Indiveri, B. Linares-Barranco, T. Hamilton, A. van Schaik, R. Etienne-Cummings, T. Delbruck, S. Liu, P. Dudek, P. Häfliger, S. Renaud, J. Schemmel, G. Cauwenberghs, J. Arthur, K. Hynna, F. Folowosele, S. Saighi, T. Serrano-Gotarredona, J. Wijekoon, Y. Wang, and K. Boahen, "Neuromorphic silicon neuron circuits," *Frontiers in Neuroscience*, vol. 5, no. 73, 2011.
- [8] A. Yousefzadeh, T. Masquelier, T. Serrano-Gotarredona, and B. Linares-Barranco, "Hardware implementation of convolutional STDP for online visual feature learning," in *2017 IEEE International Symposium on Circuits and Systems (ISCAS)*, 2017.
- [9] R. Wang, T. J. Hamilton, J. Tapson, and A. van Schaik, "An FPGA design framework for large-scale spiking neural networks," in *2014 IEEE International Symposium on Circuits and Systems (ISCAS)*, 2014, pp. 457–460.
- [10] J. Pfister and W. Gerstner, "Triplets of spikes in a model of spike timing-dependent plasticity," *The Journal of Neuroscience*, vol. 26, no. 38, pp. 9673–9682, 2006.
- [11] J. Gjorgjieva, C. Clopath, J. Audet, and J. Pfister, "A triplet spike-timing–dependent plasticity model generalizes the bienenstock–cooper–munro rule to higher-order spatiotemporal correlations," *Proceedings of the National Academy of Sciences*, vol. 108, no. 48, pp. 19 383–19 388, 2011.
- [12] E. Jokar and H. Soleimani, "Digital multiplierless realization of a calcium-based plasticity model," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 64, no. 7, pp. 832–836, July 2017.
- [13] H. Wang, R. Gerkin, D. Nauen, and G. Bi, "Coactivation and timing-dependent integration of synaptic potentiation and depression," *Nature Neuroscience*, vol. 8, no. 2, pp. 187–193, 2005.
- [14] P. J. Sjöström, E. A. Rancz, A. Roth, and M. Häusser, "Dendritic excitability and synaptic plasticity," *Physiological Reviews*, vol. 88, no. 2, pp. 769–840, 2008.
- [15] E. Bienenstock, L. Cooper, and P. Munro, "Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex," *The Journal of Neuroscience*, vol. 2, no. 1, p. 32, 1982.
- [16] A. Kirkwood, M. Rioult, and M. Bear, "Experience-dependent modification of synaptic plasticity in visual cortex," *Nature*, vol. 381, no. 6582, pp. 526–528, 1996.
- [17] M. R. Azghadi, S. Al-Sarawi, D. Abbott, and N. Iannella, "A neuromorphic VLSI design for spike timing and rate based synaptic plasticity," *Neural Networks*, vol. 45, pp. 70–82, 2013.
- [18] M. R. Azghadi, S. Al-Sarawi, N. Iannella, G. Indiveri, and D. Abbott, "Spike-based synaptic plasticity in silicon: Design, implementation, application, and challenges," *Proceedings of the IEEE*, 2014.
- [19] W. Cai, F. Ellinger, and R. Tetzlaff, "Neuronal synapse as a memristor: Modeling pair-and triplet-based STDP rule," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 9, no. 1, pp. 87–95, 2015.
- [20] M. R. Azghadi, B. Linares-Barranco, D. Abbott, and P. H. Leong, "A hybrid CMOS-memristor neuromorphic synapse," *IEEE transactions on biomedical circuits and systems*, vol. 11, no. 2, pp. 434–445, 2017.
- [21] R. Gopalakrishnan and A. Basu, "Triplet spike time-dependent plasticity in a floating-gate synapse," *IEEE Transactions on Neural Networks and Learning Systems*, vol. 28, no. 4, pp. 778–790, April 2017.
- [22] Y. Babacan and F. Kaçar, "Memristor emulator with spike-timing-dependent-plasticity," *AEU - International Journal of Electronics and Communications*, vol. 73, pp. 16–22, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1434841116313292
- [23] M. Nouri, M. Jalilian, M. Hayati, and D. Abbott, "A digital neuromorphic realization of pair-based and triplet-based spike-timing-dependent synaptic plasticity," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. PP, no. 99, pp. 1–1, 2017.
- [24] R. M. Wang, T. J. Hamilton, J. C. Tapson, and A. van Schaik, "A neuromorphic implementation of multiple spike-timing synaptic plasticity rules for large-scale neural networks," *Frontiers in neuroscience*, vol. 9, 2015.
- [25] A. Cassidy, A. G. Andreou, and J. Georgiou, "A combinational digital logic approach to STDP," in *The 2011 International Symposium on Circuits and Systems*, 2011, pp. 673–676.
- [26] M. R. Azghadi, S. Al-Sarawi, N. Iannella, and D. Abbott, "Tunable low energy, compact and high performance neuromorphic circuit for spike-based synaptic plasticity," *PLoS ONE*, vol. 9, no. 2, p. art. no. e88326, 2014.
- [27] A. S. Cassidy, J. Georgiou, and A. G. Andreou, "Design of silicon brains in the nano-cmos era: Spiking neurons, learning synapses and neural architecture optimization," *Neural Networks*, vol. 45, pp. 4–26, 2013, Neuromorphic Engineering: From Neural Systems to Brain-Like Engineered Systems. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0893608013001597
- [28] C. Lammie, T. Hamilton, and M. R. Azghadi, "Unsupervised character recognition with a simplified fpga neuromorphic system," in *2018 IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2018.
- [29] C. Lammie, T. Hamilton, and M. R. Azghadi, "Live demonstration: Unsupervised character recognition with a fpga neuromorphic system," in *2018 IEEE International Symposium on Circuits and Systems (ISCAS)*, May 2018.
- [30] G. Bi and M. Poo, "Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type," *The Journal of Neuroscience*, vol. 18, no. 24, pp. 10 464–10 472, 1998.
- [31] P. Sjöström, G. Turrigiano, and S. Nelson, "Rate, timing, and cooperativity jointly determine cortical synaptic plasticity," *Neuron*, vol. 32, no. 6, pp. 1149–1164, 2001.
- [32] B. Belhadj, J. Tomas, Y. Bornat, A. Daouzli, O. Malot, and S. Renaud, "Digital mapping of a realistic spike timing plasticity model for real-time neural simulations," in *Proc. XXIV Conference on Design of Circuits and Integrated Systems (DCIS)*, 2009, pp. 326–331.
- [33] M. Graupner and N. Brunel, "Calcium-based plasticity model explains sensitivity of synaptic changes to spike pattern, rate, and dendritic location," *Proceedings of the National Academy of Sciences*, vol. 109, no. 10, pp. 3991–3996, 2012.

- [34] F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jackson, and D. S. Modha, "TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 34, no. 10, pp. 1537–1557, Oct 2015.



Corey Lammie (Student Member) started a bachelor of Electrical and Electronic Engineering, and a bachelor of Information Technology at James Cook University (JCU) in 2014, where he is currently a research assistant and full time student. Corey was the recipient of the 2017 Engineers Australia CN Barton Medal, awarded to the best undergraduate Engineering thesis at JCU. Corey's main research interests include brain-inspired computing, Spiking Neural Networks, and Digital Design.



Tara Julia Hamilton (S'97–M'00) received the B.E. degree (Hons. I) in Electrical Engineering and the B.Com. degree from the University of Sydney, Australia, in 2001, the M.Sc. degree in Biomedical Engineering from the University of New South Wales, Australia, in 2003, and the Ph.D. degree from the University of Sydney in 2009. She is currently an Associate Professor in the School of Engineering at Macquarie University, Australia. She has authored over 100 journal papers, conference papers, book chapters, and patents in integrated circuit design, neuromorphic systems, and biomedical engineering. Her current research interests include neuromorphic engineering, mixed-signal integrated circuit design, and biomedical devices.



André van Schaik (M'00–SM'02–F'14) received the M.Sc. degree in electrical engineering from the University of Twente, Enschede, The Netherlands, in 1990 and the Ph.D. degree in electrical engineering from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, in 1998. He is a Professor of Bioelectronics and Neuroscience in the MARCS Institute at Western Sydney University, where he leads the Biomedical Engineering and Neuromorphic Systems program. His research focuses on three main areas: neuromorphic engineering, bioelectronics, and neuroscience. He has authored more than 200 publications and is an inventor of more than 30 patents. He is a founder of three start-up companies.



Mostafa Rahimi Azghadi (S'07–M'14) completed his PhD in Electrical & Electronic Engineering at The University of Adelaide, Australia, earning the Doctoral Research Medal, as well as the Adelaide University Alumni Medal in 2014. From 2012-2014, he was a visiting PhD student in the Neuromorphic Cognitive System group, Institute of Neuroinformatics, University and Swiss Federal Institute of Technology (ETH) Zurich, Switzerland.

He is currently a lecturer at the College of Science and Engineering, James Cook University, Townsville, Australia, where he researches neuromorphic engineering and brain-inspired architectures. Dr. Rahimi was a recipient of several national and international awards and scholarships such as Queensland Young Tall Poppy Science Award in 2017 and South Australia Science Excellence Awards in 2015. He serves as an associate editor of *Frontiers in Neuromorphic Engineering* and *IEEE Access*.