Lightweight error-tolerant edge detection using memristor-enabled stochastic computing

Stochastic number encoders

SNEs are the units that encode data into stochastic numbers. They have conventionally been realised with electronic circuits (e.g. those based on linear feedback shift registers)15,16,17. However, these circuits are typically large and can incur considerable computational cost (Supplementary Table 1). As memristor technology advances, memristors show potential for developing SNEs.

Memristors tend to exhibit stochasticity in switching, originating from the underlying switching mechanisms. For example, due to the stochastic diffusion of the conductive elements, filamentary memristors switch with stochasticity18. This characteristic makes memristors promising for realising compact SNEs towards stochastic computing implementation19,20,21. Fig. 2a shows a compact circuit design of SNEs we propose, where each SNE consists of a memristor and a few comparators. By harnessing the switching stochasticity, the SNEs can encode the input data into stochastic numbers – when fed with pulsed inputs \({V}_{{{\rm{in}}}}\), the memristor is switched stochastically and the output carrying the stochasticity is then binarised by the comparators via the reference \({V}_{{{\rm{ref}}}}\) for stochastic number encoding with a probability. As such, the probability of the stochastic numbers is well-regulated by \({V}_{{{\rm{ref}}}}\). The SNEs via convenient circuit reconfiguration can also output stochastic numbers in varying positive and negative correlations, while two or more parallel SNEs can encode uncorrelated stochastic numbers.

Fig. 2: Stochastic number encoder (SNE).

a Schematic SNE, consisting of a memristor and a set of comparators. The output probability and correlation are regulated by both the input \({V}_{{{\rm{in}}}}\) and reference \({V}_{{{\rm{ref}}}}\). For negative correlation, a NOT gate is connected to the comparator, and the voltage supply of the NOT gate is synchronised with \({V}_{{{\rm{in}}}}\) to the memristors to avoid output during the pulse intervals. Independent parallel SNEs are integrated to yield uncorrelated stochastic numbers. See Supplementary Fig. 7 for the hardware realisation of SNE. b 12 × 12 memristor array in a crossbar configuration, with a fabrication yield of 100%. A typical device area is ~20 × 20 µm². Scale bar – 1 cm and 20 µm. c Schematic and cross-sectional transmission electron microscopic image of a typical memristor. Scale bar – 50 nm. d Current–voltage output from a typical memristor, showing 1000-cycle stochastic yet stable switching with a ratio of ~\({10}^{5}\). \({V}_{{{\rm{hold}}}}\) and \({V}_{{{\rm{th}}}}\) denote the hold voltage and threshold voltage. e Distributions of the measured \({V}_{{{\rm{hold}}}}\) (0.23 \(\pm \,\)0.18 V) and \({V}_{{{\rm{th}}}}\) (0.78 \(\pm\) 0.39 V), along with the corresponding Gaussian fittings. f \({P}_{{{\rm{uncorrelated}}}}\)-\({V}_{{{\rm{in}}}}\) relation of a typical SNE in uncorrelation, fitting sigmoid function \({P}_{{{\rm{uncorrelated}}}}=1/(1+\exp [-38.9({V}_{{{\rm{in}}}}-1.34)])\). The error bar representing the standard deviation at each data point is obtained from 100 repeated samplings, where each sampling consists of 100 consecutive pulsed signal cycles. g \({P}_{{{\rm{positive}}}}\)-\({V}_{{{\rm{in}}}}\) and \({P}_{{{\rm{negative}}}}\)-\({V}_{{{\rm{in}}}}\) relations of the SNE in positive and negative correlations, fitting sigmoid function \({P}_{{{\rm{negative}}}}=1/(1+\exp [-63.1({V}_{{{\rm{in}}}}-0.19)])\) and \({P}_{{{\rm{positive}}}}=1-{P}_{{{\rm{negative}}}}\).

To implement the SNEs, we prepare filamentary memristors from solution-processed hexagonal boron nitride (hBN), following our previous report22. Briefly, hBN is produced by liquid-phase exfoliation (Supplementary Fig. 1) and used to fabricate memristors in a Pt/Au/hBN/SiOx/Ag configuration (Fig. 2b, c, and Supplementary Fig. 2). This solution-based fabrication approach is scalable with high yield. As demonstrated in Supplementary Fig. 3, the success rate of an array of 12 × 12 memristors is 100% in the sampling test. In typical switching (Fig. 2d), the memristor switches to a low resistive state at the threshold voltage \({V}_{{{\rm{th}}}}\) as the silver ions diffuse and form conductive filaments, and spontaneously resets to a high resistive state once the bias drops below the hold voltage \({V}_{{{\rm{hold}}}}\). See also Supplementary Fig. 4 for the ultrafast volatile switching (switching time ~50 ns, and relaxation time ~1200 ns) and the ultralow energy consumption (~33 fJ per bit). Due to the stochastic diffusion of the silver ions, the switching exhibits stochasticity in both \({V}_{{{\rm{th}}}}\) and \({V}_{{{\rm{hold}}}}\). The volatile switching eliminates the need for any peripheral circuits or excessive resetting for SNE implementation and operation, while the switching stochasticity can be harnessed for stochastic number encoding, leading to compact circuit designs of SNEs.

To assess the stochasticity, we conduct a full sweeping cycling test. The measured current–voltage output exhibits a cycle-to-cycle stochasticity in the switching (Fig. 2d), with \({V}_{{{\rm{th}}}}\) (0.78 \(\pm\) 0.39 V) and \({V}_{{{\rm{hold}}}}\) (0.23 \(\pm\) 0.18 V) well fitting Gaussian distributions (Fig. 2e). This shows a stabilised cycle-to-cycle stochasticity. We further test the device-to-device stochasticity, and prove a high device-to-device uniformity, with variations of 6.6% in \({V}_{{{\rm{hold}}}}\) and 7.4% in \({V}_{{{\rm{th}}}}\) (Supplementary Fig. 3). The uniformity, along with the high fabrication yield, allows for SNE implementation without excess device calibrations or circuit reconfigurations. To evaluate the stochasticity further, we perform the Ornstein-Uhlenbeck process modelling of the measured \({V}_{{{\rm{th}}}}\) (Supplementary Fig. 5). As demonstrated, \({V}_{{{\rm{th}}}}\) renders a mean-reverting behaviour with random fluctuations, well-fitting an Ornstein-Uhlenbeck process, i.e. a stochastic process in a dynamical system19. This indicates the high-level stability of stochasticity of our memristors in prolonged switching operations, critical for SNE operations. Indeed, the endurance test for over 5\(\times {10}^{6}\) cycles proves a highly stable yet stochastic switching of our memristors (Supplementary Fig. 6), outperforming state-of-the-art reports23,24,25 and allowing for a reliable integration of our memristors into circuits for implementing stochastic computing.
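The mean-reverting behaviour described above can be illustrated with a small simulation. The following is a minimal sketch, not the fitting procedure used for Supplementary Fig. 5: it assumes a Gaussian Ornstein-Uhlenbeck process whose mean (0.78 V) and stationary standard deviation (0.39 V) are taken from the measured \({V}_{{{\rm{th}}}}\) statistics, while the reversion rate `theta` is a hypothetical value chosen only for illustration.

```python
import math
import random
import statistics

def simulate_ou(n_cycles, mu=0.78, sigma_stat=0.39, theta=0.5, seed=0):
    """Simulate a discretised Ornstein-Uhlenbeck process for V_th (in volts).

    mu and sigma_stat follow the V_th statistics quoted in the text;
    theta (mean-reversion rate per cycle) is an assumed, illustrative value.
    """
    rng = random.Random(seed)
    decay = math.exp(-theta)                          # memory kept after one cycle
    noise = sigma_stat * math.sqrt(1.0 - decay ** 2)  # keeps the stationary std fixed
    v = mu
    trace = []
    for _ in range(n_cycles):
        # Exact one-step update of an OU process sampled once per cycle.
        v = mu + (v - mu) * decay + noise * rng.gauss(0.0, 1.0)
        trace.append(v)
    return trace

trace = simulate_ou(20000)
ou_mean = statistics.fmean(trace)
ou_std = statistics.pstdev(trace)
print(f"mean V_th = {ou_mean:.3f} V, std = {ou_std:.3f} V")
```

Each cycle pulls the value back towards the mean while adding fresh noise, so the long-run statistics stay fixed even though individual cycles fluctuate.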

We integrate the memristors into the circuits to develop the SNEs (Fig. 2a). When in operation, signals in both digital and analogue forms are first encoded into pulsed inputs, \({V}_{{{\rm{in}}}}\), and then processed into stochastic numbers via the SNEs, as regulated by \({V}_{{{\rm{ref}}}}\). See Supplementary Fig. 7 for the hardware realisation of the SNEs. Here we present in Fig. 2f the probability of uncorrelated stochastic number \({P}_{{{\rm{uncorrelated}}}}\) with respect to \({V}_{{{\rm{in}}}}\). As \({V}_{{{\rm{in}}}}\) increases, \({P}_{{{\rm{uncorrelated}}}}\) increases, as the memristors are more likely to be switched on. This shows that the stochastic number occurring at a certain time is probabilistically 0 or 1, and \({P}_{{{\rm{uncorrelated}}}}\) is determined by \({V}_{{{\rm{in}}}}\). Particularly, \({P}_{{{\rm{uncorrelated}}}}\) follows a sigmoidal fitting \({P}_{{{\rm{uncorrelated}}}}=1/(1+\exp [-38.9({V}_{{{\rm{in}}}}-1.34)])\), proving that the SNEs can encode data into stochastic numbers with a well-regulated probability, thereby promising for stochastic computing implementation. In turn, the \({P}_{{{\rm{uncorrelated}}}}\)-\({V}_{{{\rm{in}}}}\) relation can be employed as a guidance to practically determine \({P}_{{{\rm{uncorrelated}}}}\) with \({V}_{{{\rm{in}}}}\). Similarly, we show in Fig. 2g the probabilities of positively and negatively correlated stochastic numbers \({P}_{{{\rm{positive}}}}\) and \({P}_{{{\rm{negative}}}}\) with respect to \({V}_{{{\rm{ref}}}}\). \({P}_{{{\rm{positive}}}}\) \(({P}_{{{\rm{negative}}}})\) decreases (increases) as \({V}_{{{\rm{ref}}}}\) increases in positive (negative) correlation, as \({V}_{{{\rm{ref}}}}\) serves as the threshold for binarization. Again, \({P}_{{{\rm{negative}}}}\) follows a sigmoidal fitting \({P}_{{{\rm{negative}}}}=1/(1+\exp [-63.1({V}_{{{\rm{ref}}}}-0.19)])\), and \({P}_{{{\rm{positive}}}}=1-{P}_{{{\rm{negative}}}}\). See Supplementary Fig. 8 for an example of positively correlated stochastic number encoding. Therefore, the memristor-enabled SNEs can encode data into stochastic numbers with regulated probabilities and correlations, facilitating subsequent stochastic logic development. Here we note the encoding frequency of \({V}_{{{\rm{in}}}}\) is typically configured as 100 kHz, far below the switching rate of the memristors (switching time of ~50 ns, or equivalently 20 MHz) and the clock frequency of the digital circuits (~GHz). This ensures that the SNEs can be applied in the implementation of stochastic computing hardware and applications.
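As an illustration of how the fitted \({P}_{{{\rm{uncorrelated}}}}\)-\({V}_{{{\rm{in}}}}\) relation can guide the choice of \({V}_{{{\rm{in}}}}\), the sketch below models one SNE purely behaviourally: each pulse yields a 1 with the probability given by the sigmoid fit of Fig. 2f. The function names (`p_uncorrelated`, `v_in_for`, `encode`) are hypothetical, and the model ignores the device physics and the comparator circuit.

```python
import math
import random

def p_uncorrelated(v_in):
    """Sigmoid fit quoted for Fig. 2f: probability of a 1 for a given V_in (volts)."""
    return 1.0 / (1.0 + math.exp(-38.9 * (v_in - 1.34)))

def v_in_for(p):
    """Invert the sigmoid to pick V_in for a target probability 0 < p < 1."""
    return 1.34 + math.log(p / (1.0 - p)) / 38.9

def encode(v_in, n_bits=100, seed=None):
    """Behavioural model of one SNE: each pulse gives 1 with probability P(V_in)."""
    rng = random.Random(seed)
    p = p_uncorrelated(v_in)
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]

target = 0.7
v_in = v_in_for(target)
bits = encode(v_in, n_bits=10000, seed=1)
estimate = sum(bits) / len(bits)
print(f"V_in = {v_in:.3f} V, measured P = {estimate:.3f}")
```

Because the fit is steep, the usable probability range maps onto a narrow voltage window around 1.34 V, in line with the 1.1–1.5 V range shown in Fig. 2f.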

Stochastic logics

We integrate the SNEs with compact logic gates to build lightweight stochastic logics in different correlations. Using stochastic AND logic in uncorrelation as an example, we connect two parallel SNEs to a typical AND gate (Fig. 3a). In this design, the uncorrelated stochastic outputs encoded by the SNEs serve as the inputs to the AND gate, enabling stochastic multiplication of the stochastic outputs. When in operation, based on the demonstrated \({P}_{{{\rm{uncorrelated}}}}\)-\({V}_{{{\rm{in}}}}\) relation in Fig. 2f, the SNEs are fed with pulsed signal cycles of the corresponding \({V}_{{{\rm{in}}}}\) to encode uncorrelated stochastic numbers, denoted as \(a\) and \(b\), with probabilities of \(P\left(a\right)\) and \(P\left(b\right)\), respectively. Then, \(a\) and \(b\) are bit-by-bit fed into the AND gate, yielding a stochastic number output, denoted as \(c\), with a probability of \(P\left(c\right)\). We show in Fig. 3a the corresponding stochastic numbers and probabilities from the experimental hardware test. The statistical relation between the probabilities, i.e., \(P\left(a\right)P\left(b\right)\approx P\left(c\right)\), proves that the stochastic AND logic functions as a stochastic multiplier for one-step multiplication of stochastic numbers. Importantly, compared to the binary multiplier in Fig. 1a, this stochastic multiplier significantly simplifies circuit design and reduces the computational cost. Besides, the SNEs can be configured to exhibit positive (negative) correlation, enabling positively (negatively) correlated stochastic AND logic operations (Fig. 3a). The output probability \(P\left(c\right)\) in the correlated cases is determined by the minimum (maximum) value of \(P(a)\) and \(P(b)\) instead. Similarly, we build stochastic OR logic in all three correlations, and it performs different logic operations as designed (Fig. 3b).
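The stochastic AND operation can be reproduced in software with a minimal sketch. It assumes idealised Bernoulli bit streams rather than measured SNE outputs, and it models positive correlation by reusing one random draw for both streams, which is a modelling shortcut rather than the circuit reconfiguration used in the hardware. The function name `stochastic_and` is hypothetical.

```python
import random

def stochastic_and(pa, pb, n_bits, correlated=False, seed=0):
    """Return measured (P(a), P(b), P(c)) for n_bits-long streams with c = a AND b."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n_bits):
        u1 = rng.random()
        # Sharing one draw gives maximally positively correlated streams;
        # independent draws model two parallel, uncorrelated SNEs.
        u2 = u1 if correlated else rng.random()
        a.append(u1 < pa)
        b.append(u2 < pb)
    c = [x and y for x, y in zip(a, b)]
    return sum(a) / n_bits, sum(b) / n_bits, sum(c) / n_bits

_, _, pc_uncorr = stochastic_and(0.6, 0.5, 20000)
_, _, pc_pos = stochastic_and(0.6, 0.5, 20000, correlated=True)
print(f"uncorrelated: P(c) = {pc_uncorr:.3f}, P(a)P(b) = {0.6 * 0.5:.3f}")
print(f"positive:     P(c) = {pc_pos:.3f}, min(P(a), P(b)) = {min(0.6, 0.5):.3f}")
```

In the uncorrelated case the output approaches the product of the input probabilities, whereas the positively correlated case approaches the smaller of the two.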

Fig. 3: Stochastic logics.

Schematic stochastic logics in uncorrelation implemented with two independent SNEs and a AND, b OR, c XOR, and d MUX, and the corresponding circuit tests of the stochastic logic operations. The stochastic logics can be reconfigured in the positive and negative correlations to yield the stochastic logic operations as respectively demonstrated. For stochastic MUX, the frequency of the select \(s\) is half of that of the inputs to ensure that both the inputs participate in the logic operations. \(P({{\rm{red\; square}}})\) represents the probability of the 1s in the sequences, i.e. the value of the stochastic numbers. The outputs of stochastic logics in uncorrelation, positive correlation, and negative correlation are consistent with the statistical formulas in Supplementary Table 2.

Edge detection involves matrix multiplication and gradient computation that normally require large-scale logic circuits and considerable computational operations3. In contrast, it is possible to perform absolute-valued subtraction for gradient computation with minimal computational cost using the stochastic logics. Here we propose in Fig. 3c the design of a stochastic XOR logic, consisting of only an SNE and an XOR gate, to perform the function. Specifically, the SNE is fed with pulsed signal cycles of the corresponding \({V}_{{{\rm{in}}}}\) according to the \({P}_{{{\rm{positive}}}}\)-\({V}_{{{\rm{in}}}}\) relation in Fig. 2g to encode positively correlated stochastic numbers, denoted as \(a\) and \(b\), with respective probabilities of \(P\left(a\right)\) and \(P\left(b\right)\). Then, \(a\) and \(b\) serve as the inputs to the XOR gate, and the resultant \(P\left(c\right)\) satisfies \(P(c)\approx \left|P\left(a\right)-P\left(b\right)\right|\). In this case, positively correlated stochastic numbers mean a maximum overlap of 0s and 1s, such that the probability for two 1s or two 0s is \(\min (P\left(a\right),P\left(b\right))\) or \(\min \left(1-P\left(a\right),1-P\left(b\right)\right)\). Since the XOR gate outputs 1 only when its two inputs differ, assuming \(P\left(a\right) \, > \, P\left(b\right)\), the stochastic XOR logic outputs \(P\left(c\right)=1-P\left(b\right)-\left(1-P\left(a\right)\right)=P\left(a\right)-P(b)\), and vice versa. This proves the capability of the stochastic XOR logic to perform the absolute-valued subtraction function in only one step. Besides gradient computation, denoising, smoothing, and down-sampling are also essential matrix operations in edge detection. A general approach in performing these functions is to use mean convolutional filters to process the pixels. Here we propose in Fig. 3d (see also Supplementary Fig. 9) the design of stochastic MUX logic to realise a mean convolutional filter.
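The absolute-valued subtraction performed by the stochastic XOR logic can be checked with a short sketch. It assumes idealised, maximally positively correlated streams generated from one shared random draw per bit, which is a software stand-in for the positively correlated SNE outputs. The function name `stochastic_xor` is hypothetical.

```python
import random

def stochastic_xor(pa, pb, n_bits, seed=0):
    """Estimate P(c) for c = a XOR b with positively correlated a and b."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n_bits):
        u = rng.random()   # one shared draw makes a and b overlap maximally
        a = u < pa
        b = u < pb
        ones += a != b     # XOR: count the bits where the two streams differ
    return ones / n_bits

pc = stochastic_xor(0.8, 0.3, 20000)
pc_swapped = stochastic_xor(0.3, 0.8, 20000)
print(f"P(c) = {pc:.3f}, |P(a) - P(b)| = {abs(0.8 - 0.3):.3f}")
```

With a shared draw the two streams differ only when the draw falls between the two probabilities, so the fraction of 1s at the output estimates \(\left|P\left(a\right)-P\left(b\right)\right|\) regardless of which input is larger.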

We present in Supplementary Fig. 10 the pairwise correlations between the inputs of the above stochastic logics in the uncorrelated, positively correlated, and negatively correlated conditions, and summarise the statistical relations between \(P(a)\), \(P(b)\) and \(P(c)\) in Supplementary Table 2. The pairwise correlations and the statistical relations confirm that our stochastic logics can work in the desired correlation conditions and conduct the corresponding logic operations for performing edge detection tasks. Pearson correlation is adopted here to quantify the correlations. Note that in the above demonstrations, the stochastic numbers are encoded in 100-bit for illustrative purposes. The bit length can be adjusted to accommodate the different computational precision requirements, given the trade-off between the computational cost and precision.
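For reference, the Pearson correlation between two bit streams can be computed as in the sketch below. The streams are idealised: a shared random draw stands in for positive correlation, a mirrored draw for negative correlation, and an independent draw for the uncorrelated case. These generation rules are assumptions for illustration, not the hardware configuration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
n_bits, pa, pb = 10000, 0.5, 0.3
u = [rng.random() for _ in range(n_bits)]
w = [rng.random() for _ in range(n_bits)]

a = [int(x < pa) for x in u]
b_pos = [int(x < pb) for x in u]          # shared draw: positive correlation
b_neg = [int(x >= 1.0 - pb) for x in u]   # mirrored draw: negative correlation
b_unc = [int(x < pb) for x in w]          # independent draw: uncorrelated

r_pos, r_neg, r_unc = pearson(a, b_pos), pearson(a, b_neg), pearson(a, b_unc)
print(f"positive: {r_pos:.2f}, negative: {r_neg:.2f}, uncorrelated: {r_unc:.2f}")
```

The coefficient only reaches ±1 when the two streams have matching probabilities, so for unequal \(P(a)\) and \(P(b)\) the sign and magnitude together indicate the correlation condition.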

Stochastic edge detection

As discussed, edge detection in the conventional binary computing approaches relies on the use of large-scale logic filters, such as Roberts cross and Sobel operators, leading to significant hardware and computational cost as well as latency3. In this context, we propose a hardware stochastic Roberts cross operator using the stochastic logics to address the challenges. Briefly, two SNEs, two XOR gates, and one MUX are integrated to build the stochastic Roberts cross operator. See Fig. 4 and Supplementary Fig. 11 for the design and hardware realisation of the operator.

Fig. 4: Stochastic edge detection.

a The example image, i.e. the first frame of The Horse in Motion, for edge detection demonstration. The region as marked is used to illustrate the edge detection process with the operator. The pixels in 0-255 grayscale are encoded into 100-bits. b Schematic stochastic Roberts cross operator, consisting of two SNEs, two XORs, and one MUX. See Supplementary Fig. 11 for the hardware realisation of the operator. c Gradient map yielded from scanning with the operator, showing successful edge detection. d Edge detection of the first frame with the operator, and e the corresponding structural similarity index measure (SSIM) maps and peak signal-to-noise ratios (PSNR). The pixels are encoded into 4, 16, 64, and 256-bits as the inputs. The SSIM and PSNR show that the operator using more bits gives higher edge detection precision. For comparison, the edge detection performed using the standard algorithmic method is presented as the ground truth.

We apply the stochastic Roberts cross operator in image processing to demonstrate the feasibility of stochastic edge detection. The image for illustrative purposes is captured from the artwork The Horse in Motion (Fig. 4a). Here each pixel in 0–255 grayscale is encoded in 100-bits. As shown in Fig. 4b, the stochastic Roberts cross operator is used to scan over the pixel map to yield a gradient map \((i,j)\) reconstructed from the output stochastic numbers. Specifically, one SNE and one XOR gate work consecutively to yield the \(x\) component of the output gradient \((i,j)\), denoted as \({|Gx|}\), while the other SNE and XOR gate yield the \(y\) component, denoted as \({|Gy|}\). The gradient \(G\left(i,j\right)\) is obtained by averaging \(\left|{Gx}\right|\) and \(\left|{Gy}\right|\) using the MUX logic, i.e. \(G\left(i,j\right)=0.5(\left|{Gx}\right|+\left|{Gy}\right|)\). The coefficient 0.5 scales the gradient within the original grayscale. As such, as demonstrated in Fig. 4c, scanning with stochastic Roberts cross operator over the marked image region of 5 × 5 pixels in Fig. 4a yields a 4 × 4 pixeled gradient map that evidently demonstrates successful edge detection, as outlined by the red dashed lines. This confirms the feasibility of the stochastic Roberts cross operator in performing edge detection.
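The scanning procedure can be sketched in software as follows. This is an idealised model, not the hardware operator: pixel streams are Bernoulli sequences, each XOR pair shares one random draw to emulate positive correlation, and the MUX select simply alternates between the two XOR outputs. The assignment of the two diagonals to \(\left|{Gx}\right|\) and \(\left|{Gy}\right|\) follows the usual Roberts cross convention and is an assumption here, as are the function names and the synthetic test image.

```python
import random

def roberts_reference(img):
    """Deterministic G(i,j) = 0.5(|Gx| + |Gy|) on a 0-255 grayscale image."""
    h, w = len(img), len(img[0])
    return [[0.5 * (abs(img[i][j] - img[i + 1][j + 1]) +
                    abs(img[i + 1][j] - img[i][j + 1]))
             for j in range(w - 1)] for i in range(h - 1)]

def roberts_stochastic(img, n_bits=256, seed=0):
    """Stochastic estimate of the same gradient map using XOR and MUX on bit streams."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 1) for _ in range(h - 1)]
    for i in range(h - 1):
        for j in range(w - 1):
            p00, p11 = img[i][j] / 255, img[i + 1][j + 1] / 255
            p10, p01 = img[i + 1][j] / 255, img[i][j + 1] / 255
            ones = 0
            for t in range(n_bits):
                ux, uy = rng.random(), rng.random()
                gx = (ux < p00) != (ux < p11)     # XOR on a positively correlated pair
                gy = (uy < p10) != (uy < p01)
                ones += gx if t % 2 == 0 else gy  # MUX with an alternating select
            out[i][j] = 255 * ones / n_bits       # map probability back to grayscale
    return out

# Synthetic 5 x 5 patch with a vertical edge (illustrative data, not from the paper).
image = [[20, 20, 20, 220, 220] for _ in range(5)]
reference = roberts_reference(image)
gradient = roberts_stochastic(image, n_bits=4096)
max_err = max(abs(g - r) for g_row, r_row in zip(gradient, reference)
              for g, r in zip(g_row, r_row))
print(f"map size = {len(gradient)} x {len(gradient[0])}, max error = {max_err:.2f}")
```

A 5 × 5 input yields a 4 × 4 gradient map, matching the dimensions described for Fig. 4c, and the stochastic estimate converges to the deterministic result as the bit length grows.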

As discussed, the bit length of the stochastic numbers can govern the computational precision. To investigate the impact of the bit length on the stochastic Roberts cross operator for edge detection, we encode the pixels of the image frame in Fig. 4a in 4, 16, 64, and 256-bits, respectively. The edge detection results (Fig. 4d) prove that the edges are successfully detected and recognised in all cases. However, as observed, a longer bit length yields better edge detection. To quantitatively evaluate the performance, we compare the edge detection results with those obtained from the standard algorithmic method. We consider the algorithmic result as the ground truth, and assess the fidelity of the stochastic edge detection using two metrics: the structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR). Here we visualise the loss in performance by the SSIM maps (Fig. 4e), where a brighter pixel indicates a higher similarity to the ground truth, i.e. a better edge detection performance. This thus reveals and confirms that a longer bit length indeed leads to an improved edge detection performance. For instance, the 256-bit achieves a near-ideal performance, with SSIM > 0.95 and PSNR > 30 dB. In contrast, the 4-bit exhibits relatively poor performance, as the limited precision in the 4-bit length fails to accurately encode the 0–255 grayscale. However, as evident in Fig. 4d, e, the 4-bit still successfully detects the edges.
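The dependence of precision on bit length can be illustrated with PSNR alone. The sketch below is a simplified stand-in for the evaluation above: it measures how faithfully randomly chosen pixel values survive stochastic encoding and decoding, rather than comparing full edge maps, and it omits SSIM, which needs an image-processing library. The pixel values are synthetic and the function names are hypothetical.

```python
import math
import random

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def encode_decode(pixels, n_bits, rng):
    """Encode each 0-255 pixel as an n_bits stochastic number and read it back."""
    decoded = []
    for p in pixels:
        ones = sum(rng.random() < p / 255 for _ in range(n_bits))
        decoded.append(255 * ones / n_bits)
    return decoded

rng = random.Random(0)
pixels = [rng.randrange(256) for _ in range(2000)]   # synthetic grayscale values
psnr_by_length = {n: psnr(pixels, encode_decode(pixels, n, rng))
                  for n in (4, 16, 64, 256)}
for n, value in psnr_by_length.items():
    print(f"{n:>3}-bit: PSNR = {value:.1f} dB")
```

In this model the encoding error is binomial sampling noise, whose variance falls in proportion to the bit length, so each fourfold increase in length adds roughly 6 dB.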

We further investigate the error-tolerance capacity of the stochastic Roberts cross operator against bit-flips. Specifically, as illustrated in Fig. 5a, bit-flips from 5% to 50% are injected into the stochastic numbers (in 256-bit encoding). Again, we adopt the SSIM and PSNR metrics to evaluate the edge detection performance. As evident in Fig. 5b, the stochastic Roberts cross operator demonstrates successful edge detection in all levels of bit-flip injections. Notably, the operator even retains an SSIM of >0.95 and a PSNR of >30 dB at a 50% bit-flip injection. In comparison, the performance from the standard algorithmic method substantially degrades at a bit-flip injection of only 5%, with the edges hardly recognised and the SSIM and PSNR significantly decreased (Fig. 5c, d, e). See Supplementary Fig. 12 for the SSIM maps from the standard algorithmic method, and the error-tolerance results at more bit-flip injection levels. The superior error-tolerance capacity of the stochastic Roberts cross operator originates from the fact that each bit in the stochastic numbers carries an equal weight, and thus the impact of pairs of bit-flips can be cancelled.
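The equal-weight argument can be made concrete with a small sketch comparing a 256-bit stochastic number with an 8-bit binary word. The example values, the injection routine `flip_bits`, and the variable names are illustrative assumptions and do not reproduce the test protocol of Fig. 5.

```python
import random

def flip_bits(bits, fraction, rng):
    """Flip a randomly chosen fraction of the bits in a sequence."""
    flipped = list(bits)
    for idx in rng.sample(range(len(bits)), round(fraction * len(bits))):
        flipped[idx] ^= 1
    return flipped

def stochastic_value(bits):
    """Grayscale value carried by a stochastic number: fraction of 1s times 255."""
    return 255 * sum(bits) / len(bits)

def binary_value(bits):
    """Value of a binary word given as a bit list, most significant bit first."""
    return int("".join(map(str, bits)), 2)

rng = random.Random(0)
stream = [1] * 128 + [0] * 128             # 256-bit stochastic number, P = 0.5
rng.shuffle(stream)
word = [int(c) for c in format(128, "08b")]  # 8-bit binary word for grey level 128

# Worst-case effect of one flipped bit, in grayscale levels.
worst_stochastic = 255 / len(stream)       # every bit weighs 1/256 of full scale
msb_flipped = [word[0] ^ 1] + word[1:]
worst_binary = abs(binary_value(msb_flipped) - binary_value(word))

noisy = flip_bits(stream, 0.05, rng)       # 5% bit-flip injection
shift = abs(stochastic_value(noisy) - stochastic_value(stream))
print(f"one flip: stochastic {worst_stochastic:.2f}, binary MSB {worst_binary}")
print(f"5% flips on the stochastic number shift the value by {shift:.2f}")
```

A single flip moves the stochastic value by about one grey level at most, while a flip of the most significant bit moves the binary value by 128 levels. Flips of a 1 and of a 0 in the same stream offset each other, which is the pairwise cancellation noted above.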

Fig. 5: Error-tolerance test.

a Error-tolerance test of the stochastic Roberts cross operator, showing bit-flips are injected into the original stochastic numbers for the tests. b Stochastic and (c) standard edge detection results and the corresponding structural similarity index measure (SSIM) maps of the first frame with bit-flip injection at a ratio of 0%, 5%, 10%, 20%, and 50%. For the stochastic edge detection, the high SSIM (>90%) and peak signal-to-noise ratios (PSNR) (>30 dB) prove that the bit-flip injection does not degrade the edge detection performance. In contrast, a low level of bit-flip injection significantly degrades the performance of the standard algorithmic edge detection. See Supplementary Fig. 12 for the SSIM map of the standard edge detection, and the error-tolerance results at more bit-flip injection levels. Performance comparison in d SSIM and e PSNR between the stochastic and standard edge detection results.

Hardware and computational cost

By exploiting the volatile switching and stochasticity of our hBN memristors, the circuits implementing stochastic computing become highly compact. For instance, the SNEs require down to three electrical components to realise, outperforming those based on the conventional electronic circuits and the other memristors (Supplementary Table 1). The compact circuits lead not only to a much lower hardware cost but also to a much lower computational cost. To discuss this further, here we compare the energy consumption of our stochastic computing approach with the counterpart in the binary computing domain. Note that the comparison is conducted under the same computational precision, i.e. each n-bit binary number is represented by \({2}^{n}\)-bit stochastic numbers.

For our stochastic edge detection operators, the energy consumption is mainly attributed to the memristors and the remaining comparators and logic gates. In terms of the memristors, the switching energy is estimated as ~33 fJ per bit (Supplementary Fig. 4). As such, encoding a \({2}^{n}\)-bit stochastic number consumes ~33\(\times {2}^{n}\) fJ. Specifically, this estimation assumes the worst-case scenario, where a sufficiently large \({V}_{{{\rm{in}}}}\) of 2 V is adopted. In fact, a \({V}_{{{\rm{in}}}}\) of 1.1–1.5 V is adequate to perform the stochastic number encoding (Fig. 2f). In terms of the remaining circuits, here we estimate the energy consumption based on the required counts of logic gates and clock cycles for the stochastic edge detection operators, following \(W=k\times {T}_{{{\rm{c}}}}\times P\), where \({T}_{{{\rm{c}}}}\) is the clock cycle, \(k\) is the required counts of \({T}_{{{\rm{c}}}}\), and \(P\) is the total power of the remaining electrical components, including the comparators and logic gates. For the stochastic Roberts cross operator (Fig. 6a), \({2}^{n}\)-bit stochastic numbers are processed serially. Therefore, it requires \(({2}^{n}+1){T}_{{{\rm{c}}}}\) and thus, \(({2}^{n}+1){T}_{{{\rm{c}}}}{P}_{{{\rm{stochastic}}}}\) energy. Given the input encoding frequency of 100 kHz, \({T}_{{{\rm{c}}}}\) is 10 µs. To calculate \({P}_{{{\rm{stochastic}}}}\), we refer to the product power datasheet of the logic gate chips used in our work (Supplementary Table 3).

The energy consumption to perform the edge detection in the binary computing domain is incurred by the conventional edge detection operators. Here we estimate the energy consumption of the conventional Roberts cross operator (Fig. 6b). As illustrated, the operator consists of two n-bit subtractors and one n-bit adder. Each n-bit subtractor and adder can be built using n full adders (FA) and several XOR gates (Fig. 6c). Considering parallel computation, each n-bit subtractor and adder requires \((2n+3){T}_{{{\rm{c}}}}\). Therefore, assuming two subtractors run in parallel, the conventional Roberts cross operator requires \(2(2n+3){T}_{{{\rm{c}}}}\) and thus, \(2(2n+3){T}_{{{\rm{c}}}}{P}_{{{\rm{conventional}}}}\) energy. Similarly, we assume \({T}_{{{\rm{c}}}}=10\) µs and refer to Supplementary Table 3 to calculate \({P}_{{{\rm{conventional}}}}\). Here we note that binary computing does not encode the n-bit binary number input into stochastic numbers, and additional circuits and computational operations in practical computing applications are often necessitated to deal with errors.
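The cycle counts above can be tabulated with a short sketch. It evaluates only the quantities stated in the text: the clock cycles of both operators, the time per operation at \({T}_{{{\rm{c}}}}=10\) µs, and the worst-case memristor switching energy. The powers \({P}_{{{\rm{stochastic}}}}\) and \({P}_{{{\rm{conventional}}}}\) depend on Supplementary Table 3, which is not reproduced here, so the full energy ratio is not computed. Function names are hypothetical.

```python
def stochastic_cycles(n):
    """Clock cycles of the stochastic operator on 2**n-bit stochastic numbers."""
    return 2 ** n + 1

def conventional_cycles(n):
    """Clock cycles of the conventional operator on n-bit binary numbers."""
    return 2 * (2 * n + 3)

def memristor_energy_fj(n, per_bit_fj=33.0):
    """Worst-case switching energy (fJ) to encode one 2**n-bit stochastic number."""
    return per_bit_fj * 2 ** n

T_C_US = 10.0  # clock period in microseconds at the 100 kHz encoding frequency

print(" n  bits  stoch_cycles  conv_cycles  stoch_time_us  memristor_fJ")
for n in range(2, 9):
    print(f"{n:2d}  {2 ** n:4d}  {stochastic_cycles(n):12d}  "
          f"{conventional_cycles(n):11d}  {stochastic_cycles(n) * T_C_US:13.0f}  "
          f"{memristor_energy_fj(n):12.0f}")
```

Under these formulas the stochastic operator needs fewer cycles up to n = 4 (16-bit stochastic numbers), after which the exponential growth of the serial bit stream dominates.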

Fig. 6: Performance and energy consumption comparison between the stochastic and conventional Roberts cross operators.

Circuit designs of a our stochastic and b the conventional Roberts cross operator. The \({2}^{n}\)-bit inputs to our stochastic Roberts operator are stochastic numbers. c Circuit design of an n-bit subtractor and adder in (b). Relation of required d clock cycles and e energy consumption with respect to the bit length of the binary numbers and stochastic numbers. f Energy consumption ratio in (e). When the length of the stochastic numbers is within 2048 bits, the energy consumption of the stochastic Roberts cross operator is lower. All comparisons are conducted at an input encoding frequency of 100 kHz.

We present in Fig. 6d–f the comparison between our stochastic and the conventional Roberts cross operators for edge detection. As shown in Fig. 6d, the stochastic operator requires fewer clock cycles to complete edge detection with short stochastic number inputs. As shown in Fig. 6e, the stochastic operator can maintain a lower energy consumption within 2048-bits. To provide a more intuitive representation of the energy efficiency, we plot the energy consumption ratio of the stochastic operator to the conventional counterpart in Fig. 6f. The results show that with 4-bit stochastic number inputs, the stochastic operator can consume ~95% less energy. Similarly, with 64-bit stochastic number inputs, it can still consume ~90% less energy.
