{"id":56148,"date":"2025-04-28T00:57:26","date_gmt":"2025-04-28T00:57:26","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/56148\/"},"modified":"2025-04-28T00:57:26","modified_gmt":"2025-04-28T00:57:26","slug":"a-bioinspired-in-materia-analog-photoelectronic-reservoir-computing-for-human-action-processing","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/56148\/","title":{"rendered":"A bioinspired in-materia analog photoelectronic reservoir computing for human action processing"},"content":{"rendered":"<p>An analog photoelectronic reservoir computing system<\/p>\n<p>Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig1\" target=\"_blank\" rel=\"noopener\">1a<\/a> shows the dynamic vision processing in the human visual system. Visual inputs are initially detected by the retina located at the bottom of the eyeballs, which would trigger the action potential by ganglion cells through the transmission of bipolar cells. The encoded information is then transmitted to the thalamus<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Demb, J. B. &amp; Singer, J. H. Functional circuitry of the retina. Annu Rev. Vis. Sci. 1, 263&#x2013;289 (2015).\" href=\"#ref-CR15\" id=\"ref-link-section-d388759675e698\">15<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Grill-Spector, K. &amp; Malach, R. The human visual cortex. Annu Rev. Neurosci. 27, 649&#x2013;677 (2004).\" href=\"#ref-CR16\" id=\"ref-link-section-d388759675e698_1\">16<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 17\" title=\"Balasubramanian, V. &amp; Sterling, P. Receptive fields and functional architecture in the retina. J. Physiol. 
587, 2753&#x2013;2767 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR17\" id=\"ref-link-section-d388759675e701\" target=\"_blank\" rel=\"noopener\">17<\/a>. Cells in thalamus with receptive fields are layered and in the form of populations, enabling the feature extraction of visual information. Features like edge, angle, orientation, and direction of motion can be extracted before entering the visual cortex. Through spike precoding by multilevel neuronal populations with receptive fields within the retina, intellectual activities such as recognition and decision-making can be finally realized in the human brain. The powerful and energy-efficient human visual system for dynamic vision processing has inspired us to conceptualize a neuromorphic visual system featuring receptive fields and a spike encoding scheme. Hence, a bioinspired in-materia analog photoelectronic reservoir computing (Alpho-RC) system was built for dynamic vision processing.<\/p>\n<p><b id=\"Fig1\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 1: A bioinspired in-materia analog photoelectronic reservoir computing (Alpho-RC) system.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41467-025-56899-3\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig1\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41467_2025_56899_Fig1_HTML.png\" alt=\"figure 1\" loading=\"lazy\" width=\"685\" height=\"530\"\/><\/a><\/p>\n<p><b>a<\/b> Conceptual diagram of dynamic vision processing in human visual system. The walking man and eye are reproduced with permission from Pixabay. <b>b<\/b> Schematic diagram of calculation processes in Alpho-RC system. 
<b>c<\/b> Photograph of EDL-coupled IGZO photoelectronic transistor array and schematic diagram of IGZO transistor structure. <b>d<\/b> The response and relaxation behaviors of EDL-coupled IGZO photoelectronic transistor in response to a light pulse (38\u2009ms, 2.95 nW\u00b7\u03bcm\u20132) under different gate bias conditions (\u22120.4, \u22120.2, and 0\u2009V, respectively). <b>e<\/b> A micrograph image of a 32\u2009\u00d7\u200932 1T1R array (scale bar: 200\u2009\u03bcm). <b>f<\/b> 100 cycles of I-V sweeps of a Pt\/Ta\/TaOx\/Pt memristor in the array.<\/p>\n<p>Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig1\" target=\"_blank\" rel=\"noopener\">1b<\/a> shows the schematic diagram of such Alpho-RC system. The data modes of human action are mainly divided into visual modes (RGB, Depth, Skeleton and so on) and non-visual modes (Inertial acceleration data, Wireless-transmission signal and so on)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Ji, S., Xu, W., Yang, M. &amp; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35, 221&#x2013;231 (2013).\" href=\"#ref-CR18\" id=\"ref-link-section-d388759675e751\">18<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chen, C., Liu, K. &amp; Kehtarnavaz, N. Real-time human action recognition based on depth motion maps. J. Real.-Time Image Process. 12, 155&#x2013;163 (2013).\" href=\"#ref-CR19\" id=\"ref-link-section-d388759675e751_1\">19<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Song, S., Lan, C., Xing, J., Zeng, W. &amp; Liu, J. 
An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In Proc. AAAI Conference on Artificial Intelligence Vol. 31 (AAAI Press, 2017).\" href=\"#ref-CR20\" id=\"ref-link-section-d388759675e751_2\">20<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Bruno, B., Mastrogiovanni, F., Sgorbissa, A., Vernazza, T. &amp; Zaccaria, R. Analysis of human behavior recognition algorithms based on acceleration data. In 2013 IEEE International Conference on Robotics and Automation 1602&#x2013;1607 (IEEE, 2013).\" href=\"#ref-CR21\" id=\"ref-link-section-d388759675e751_3\">21<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 22\" title=\"Wang, W., Liu, A. X., Shahzad, M., Ling, K. &amp; Lu, S. Understanding and modeling of wifi signal based human activity recognition. In Proc. 21st Annual International Conference on Mobile Computing and Networking 65&#x2013;1676 (Paris, France, 2015).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR22\" id=\"ref-link-section-d388759675e754\" target=\"_blank\" rel=\"noopener\">22<\/a>. Among them, the visual modes align more closely with human intuition. Compared with other visual modalities, a 3D skeleton frame, a topological representation of joints and bones, abstracts the human body into a three-dimensional coordinate space and tends to be more robust in complex environments. Therefore, 3D skeleton-based human action frames are selected as the visual input of the system.<\/p>\n<p>The core elements of the spike encoding scheme are neurons with Gaussian receptive fields (GRFs), which require no additional filtering process. 
In biology, stimuli carrying essential features elicit selective responses from neurons with overlapping, graded receptive fields, so features at varied levels can be extracted along different neural pathways<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 23\" title=\"Sharpee, T. O. Computational identification of receptive fields. Annu. Rev. Neurosci. 36, 103&#x2013;120 (2013).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR23\" id=\"ref-link-section-d388759675e761\" target=\"_blank\" rel=\"noopener\">23<\/a>. A stimulus can strongly fire the neurons with the proper receptive field while leaving other neurons silent. Such biological encoding mechanisms allow only a small amount of data to be processed by the central nervous system. In contrast, running machine learning algorithms in a digital system always requires complex feature extraction steps, which leads to a huge computational burden<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 24\" title=\"Sze, V., Chen, Y.-H., Emer, J., Suleiman, A. &amp; Zhang, Z. Hardware for machine learning: challenges and opportunities. In 2017 IEEE Custom Integrated Circuits Conference (CICC) 1&#x2013;8 (IEEE, 2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR24\" id=\"ref-link-section-d388759675e765\" target=\"_blank\" rel=\"noopener\">24<\/a>. In the Alpho-RC system, several GRF neurons are defined as one population encoder, which corresponds to a population transistor consisting of electric-double-layer (EDL) coupled IGZO photoelectronic transistors. 
EDL-coupled IGZO transistors have demonstrated gate-voltage-tunable photoelectronic synaptic plasticity based on persistent photoconductivity (PPC) effects and proton relaxation dynamics<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Yang, Y. et al. Reservoir computing based on electric-double-layer coupled InGaZnO artificial synapse. Appl. Phys. Lett. 122, 043508 (2023).\" href=\"#ref-CR25\" id=\"ref-link-section-d388759675e769\">25<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Lee, M. et al. Brain-inspired photonic neuromorphic devices using photodynamic amorphous oxide semiconductors and their persistent photoconductivity. Adv. Mater. 29, 1700951 (2017).\" href=\"#ref-CR26\" id=\"ref-link-section-d388759675e769_1\">26<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 27\" title=\"Ke, S. et al. Indium-gallium-zinc-oxide based photoelectric neuromorphic transistors for modulable photoexcited corneal nociceptor emulation. Adv. Electron. Mater. 2100487, 1&#x2013;9 (2021).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR27\" id=\"ref-link-section-d388759675e772\" target=\"_blank\" rel=\"noopener\">27<\/a>. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig1\" target=\"_blank\" rel=\"noopener\">1c<\/a> shows a photograph of the photoelectronic transistor array and a schematic diagram of the IGZO transistor structure. 
Details on the transistor array fabrication, basic properties and hardware operation system can be found in the \u201cMethods\u201d section and Supplementary Figs.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">1<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">9<\/a>. The response and relaxation behaviors of the EDL-coupled IGZO photoelectronic transistor in response to a light pulse (38\u2009ms, 2.95 nW\u00b7\u03bcm\u20132) under different gate bias conditions (\u20130.4, \u20130.2, and 0\u2009V, respectively) are shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig1\" target=\"_blank\" rel=\"noopener\">1d<\/a>. Electrons generated by the PPC effect transiently increase the channel conductance. When the light stimulation ends, the accumulated electrons gradually recombine with the triggered oxygen vacancies over a certain time. As a consequence, the channel conductance gradually decreases, exhibiting relaxation characteristics. A negative gate voltage decreases the channel conductance through EDL coupling, leading to a shorter decay time. The decrement is positively related to the absolute value of the negative gate voltage (detailed information can be found in Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">1<\/a>). 
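The response and relaxation behavior described above can be illustrated with a simple numerical sketch. The single-exponential rise and decay, the parameter values, and the mapping from a more negative gate bias to a shorter relaxation time constant are illustrative assumptions here, not the authors' fitted device model:

```python
import numpy as np

def ppc_response(t, t_on, g_dark, delta_g, tau):
    """Illustrative channel-conductance trace for one light pulse.

    While the light is on (t < t_on), photogenerated electrons raise the
    conductance; afterwards the conductance relaxes back toward the dark
    level as electrons recombine with oxygen vacancies.
    """
    g_peak = delta_g * (1.0 - np.exp(-t_on / tau))
    return np.where(
        t < t_on,
        g_dark + delta_g * (1.0 - np.exp(-t / tau)),  # rise under light
        g_dark + g_peak * np.exp(-(t - t_on) / tau),  # relaxation
    )

# A more negative gate bias depletes the channel via EDL coupling,
# modeled here (as an assumption) by a shorter time constant tau.
t = np.linspace(0.0, 0.4, 400)  # seconds; the pulse is 38 ms ON
for v_gs, tau in [(0.0, 0.12), (-0.2, 0.08), (-0.4, 0.05)]:
    g = ppc_response(t, t_on=0.038, g_dark=1.0, delta_g=0.5, tau=tau)
    print(f"VGS = {v_gs:+.1f} V: peak {g.max():.3f}, end {g[-1]:.4f}")
```

With these toy parameters, the trace with the most negative bias relaxes back to the dark conductance fastest, mirroring the decreased decay time reported for the device.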
By applying varied gate biases that correspond to GRF neurons with different distribution centers, the current states of the transistor are tuned in response to the encoding pulses from the GRF neurons, in turn mapping the visual input information into a high-dimensional space.<\/p>\n<p>Physical reservoir computing (PRC) based on material intrinsic dynamics has demonstrated excellent temporal signal processing capabilities and is therefore selected as the calculation method in the Alpho-RC system<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Zhong, Y. et al. Dynamic memristor-based reservoir computing for high-efficiency temporal signal processing. Nat. Commun. 12, 408 (2021).\" href=\"#ref-CR28\" id=\"ref-link-section-d388759675e797\">28<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Liang, X. et al. Rotating neurons for all-analog implementation of cyclic reservoir computing. Nat. Commun. 13, 1549 (2022).\" href=\"#ref-CR29\" id=\"ref-link-section-d388759675e797_1\">29<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Torrejon, J. et al. Neuromorphic computing with nanoscale spintronic oscillators. Nature 547, 428&#x2013;431 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR30\" id=\"ref-link-section-d388759675e800\" target=\"_blank\" rel=\"noopener\">30<\/a>. The response currents obtained from the population transistors are used as reservoir states for training the output weights. Compared with the single-device processing path of traditional PRC systems, such a receptive field-enhanced population encoding mechanism increases the information processing capacity by providing multiple parallel information processing ports. 
Meanwhile, this bioinspired in-materia reservoir computing framework requires no additional feature extraction on the skeleton sequences. By contrast, feature extraction is still required for skeleton frames in a vast number of reported algorithms. For example, when facing skeleton-based action recognition tasks, a feature set including spatial-domain features (relative positions, distances between joints, distances between joints and lines) and temporal-domain features (joint distance maps, joint trajectory maps) needs to be extracted prior to the training of neural networks<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 31\" title=\"Li, Chuankun. et al. Skeleton-based action recognition using LSTM and CNN. In 2017 IEEE International conference on multimedia &amp; expo workshops (ICMEW) 585-590 (IEEE, 2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR31\" id=\"ref-link-section-d388759675e804\" target=\"_blank\" rel=\"noopener\">31<\/a>.<\/p>\n<p>The output layer was also implemented in a 32\u2009\u00d7\u200932 1T1R array for a fully hardware implementation of the bioinspired in-materia reservoir computing system, as shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig1\" target=\"_blank\" rel=\"noopener\">1e<\/a> (the detailed device structure and fabrication can be found in Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">10<\/a> and the \u201cMethods\u201d section). 
TaOx-based memristors were integrated on top of a foundry-made complementary-metal-oxide-semiconductor (CMOS) chip with silicon-based selecting transistors and fan-outs (the detailed characterizations and the setup of the hardware operation system can be found in Supplementary Figs.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">11<\/a>\u2013<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">14<\/a>). The memristors demonstrated good switching uniformity and stability for reliable programming of offline-trained weights for human action processing (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig1\" target=\"_blank\" rel=\"noopener\">1f<\/a>). Our analog photoelectronic reservoir computing system is thus built on the IGZO transistors for reservoir-state collection and the TaOx-based memristors for matrix operations.<\/p>\n<p>The bioinspired in-materia reservoir computing framework<\/p>\n<p>The dimensionality enhancement process of the photoelectronic reservoir in the Alpho-RC system contains time multiplexing and GRF encoding. Time multiplexing with mask matrices is a common method in PRC systems, which solves the difficulty of interconnecting physical device nodes through the concept of delayed nodes<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 30\" title=\"Torrejon, J. et al. Neuromorphic computing with nanoscale spintronic oscillators. 
Nature 547, 428&#x2013;431 (2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR30\" id=\"ref-link-section-d388759675e838\" target=\"_blank\" rel=\"noopener\">30<\/a>. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2a<\/a> shows the calculation flow of time multiplexing on the original action skeleton information. Firstly, K body joint points with coordinate information (<b>x<\/b><b>i<\/b>, <b>y<\/b><b>i<\/b>, <b>z<\/b><b>i<\/b>) are collected from a subject, which maps the human body\u2019s action posture into a three-dimensional coordinate space. Therefore, temporal changes of the K joint coordinate positions from frame to frame can describe the dynamic change of an action. The 3D joint coordinates of each frame are written as a one-dimensional sequence vector [<b>x<\/b><b>1<\/b> <b>y<\/b><b>1<\/b> <b>z<\/b><b>1<\/b> <b>x<\/b><b>2<\/b> <b>y<\/b><b>2<\/b> <b>z<\/b><b>2<\/b> \u2026\u2026 <b>x<\/b><b>K<\/b> <b>y<\/b><b>K<\/b> <b>z<\/b><b>K<\/b>] or [<b>J<\/b><b>1<\/b> <b>J<\/b><b>2<\/b> \u2026\u2026 <b>J<\/b><b>K<\/b>] according to the order of the skeleton joints. 
In this case, the dynamic coordinates of a complete action (Act, with a size of 3\u00b7K\u00d7N) can be represented by:<\/p>\n<p>$${Act}=[ \\, \\, {{{{\\boldsymbol{J}}}}}_{{{{\\bf{11}}}}}\\,{{{{\\boldsymbol{J}}}}}_{{{{\\bf{12}}}}}\\ldots {{{{\\boldsymbol{J}}}}}_{{{{\\bf{1}}}}{{{\\boldsymbol{K}}}}};\\ldots \\ldots ;{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{n}}}}{{{\\bf{1}}}}} \\, {{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{n}}}}{{{\\bf{2}}}}}\\ldots {{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{nK}}}}};\\ldots \\ldots ;{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{N}}}}{{{\\bf{1}}}}} \\, {{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{N}}}}{{{\\bf{2}}}}}\\ldots {{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{NK}}}}}]$$<\/p>\n<p>\n                    (1)\n                <\/p>\n<p>where n (n\u2009=\u20091, 2, \u2026, N) is the frame order of the action and N is the total number of frames constituting the action. The dynamic change of the 3D joint coordinates (\u0394Act, with a size of 3\u00b7K \u00d7 N) can be represented by subtracting the first sequence of the action from the action matrix:<\/p>\n<p>$$\\varDelta {Act}=[{{{\\bf{0}}}}\\,{{{\\bf{0}}}}\\ldots {{{\\bf{0}}}};\\ldots \\ldots ;{{{\\boldsymbol{\\Delta }}}}{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{n}}}}{{{\\bf{1}}}}} \\, {{{\\boldsymbol{\\Delta }}}}{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{n}}}}{{{\\bf{2}}}}}\\ldots {{{\\boldsymbol{\\Delta }}}}{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{nK}}}}};\\ldots \\ldots ;{{{\\boldsymbol{\\Delta }}}}{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{N}}}}{{{\\bf{1}}}}} \\, {{{\\boldsymbol{\\Delta }}}}{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{N}}}}{{{\\bf{2}}}}}\\ldots {{{\\boldsymbol{\\Delta }}}}{{{{\\boldsymbol{J}}}}}_{{{{\\boldsymbol{NK}}}}}]$$<\/p>\n<p>\n                    (2)\n                <\/p>\n<p>where \u0394Jnj is the change of coordinates (\u0394Jnj=Jnj-J1j), and j is the joint index (j\u2009=\u20091, 2,\u2026, K). 
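The construction of Act and ΔAct (Eqs. 1 and 2) amounts to flattening each frame's joint coordinates and subtracting the first frame. A minimal sketch, with frames stored along the first array axis and toy joint values as assumptions:

```python
import numpy as np

def delta_act(act):
    """Subtract the first frame from every frame (Eq. 2).

    `act` has shape (N, 3*K): N frames, each a flattened
    [x1 y1 z1 ... xK yK zK] joint-coordinate vector (Eq. 1),
    so the first row of the result is all zeros.
    """
    return act - act[0]

# Toy example: K = 2 joints tracked over N = 3 frames.
act = np.array([
    [0.0, 1.0, 0.0, 0.5, 1.0, 0.0],  # frame 1
    [0.1, 1.1, 0.0, 0.6, 1.0, 0.1],  # frame 2
    [0.2, 1.2, 0.1, 0.7, 1.1, 0.2],  # frame 3
])
d = delta_act(act)
print(d[1])  # change of every coordinate relative to frame 1
```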
Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2b<\/a> shows examples of \u0394Actn (\u0394Actn\u2009=\u2009[<b>\u0394J<\/b><b>n<\/b><b>1<\/b> <b>\u0394J<\/b><b>n<\/b><b>2<\/b>\u2026 <b>\u0394J<\/b><b>nK<\/b>]) corresponding to the action \u201copenarm\u201d shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2a<\/a>. After that, \u0394Act is mapped into a high-dimensional space by multiplying mask matrices. The mask matrix is a randomly generated two-dimensional matrix consisting of 1 and -1, with a size of 3\u00b7K \u00d7 M (M is the mask length). At the nth frame of the action, \u0394Actn (1 \u00d7 3\u00b7K) is multiplied by the mask matrix, generating virtual nodes (Vn\u2009=\u2009[<b>V<\/b><b>n<\/b><b>1<\/b> <b>V<\/b><b>n<\/b><b>2<\/b> \u2026 <b>V<\/b><b>nM<\/b>]) with a size of 1\u00d7M. In this case, \u0394Act is mapped to a higher-dimensional matrix denoted as the PreIn matrix (1\u00d7M\u00b7N):<\/p>\n<p>$${PreIn}=[{{{{\\boldsymbol{V}}}}}_{{{{\\bf{1}}}}}{{{{\\boldsymbol{V}}}}}_{{{{\\bf{2}}}}}\\ldots {{{{\\boldsymbol{V}}}}}_{{{{\\boldsymbol{N}}}}}]$$<\/p>\n<p>\n                    (3)\n                <\/p>\n<p><b id=\"Fig2\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
2: The Gaussian receptive field based encoding and photoelectronic reservoir computing.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41467-025-56899-3\/figures\/2\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig2\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41467_2025_56899_Fig2_HTML.png\" alt=\"figure 2\" loading=\"lazy\" width=\"685\" height=\"507\"\/><\/a><\/p>\n<p><b>a<\/b> The calculation flow of time multiplexing on original action skeleton information in photoelectronic reservoir computing. <b>b<\/b> Examples of \u0394Actn corresponding to the action \u201copenarm\u201d in (<b>a<\/b>). <b>c<\/b> An example of PreIn corresponding to \u0394Actn in (<b>b<\/b>) with a mask matrix (M\u2009=\u200910). <b>d<\/b> Light pulse trains converted from the spike trains of Neurons #1, #2, and #3, applied to one population transistor consisting of three IGZO-based photoelectronic synaptic transistors with different modulated biases. <b>e<\/b> The response current of a transistor with a gate bias of \u22120.4\u2009V to four consecutive light pulses (38\u2009ms ON and 38\u2009ms OFF). <b>f<\/b> MC values with different numbers (1, 2, 3, and 4) of Gaussian receptive fields. 
The error bars represent the standard deviation of five independent tests.<\/p>\n<p>Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2c<\/a> shows one example of PreIn corresponding to \u0394Actn in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2b<\/a> with a mask matrix (M\u2009=\u200910).<\/p>\n<p>Then, the PreIn corresponding to a complete action is encoded into spikes through GRF neurons. The Gaussian receptive fields of the Alpho-RC system are set based on the intensity of PreIn, which is inspired by the encoding process of biological systems and spiking neural networks (SNNs)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Bohte, S. M. et al. Unsupervised clustering with spiking neurons by sparse temporal coding and multilayer RBF networks. IEEE Trans. neural Netw. 13, 426&#x2013;435 (2002).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR32\" id=\"ref-link-section-d388759675e1679\" target=\"_blank\" rel=\"noopener\">32<\/a>. For every input, the output through the ith receptive field follows a Gaussian distribution (Gi) with mean \u03bci and standard deviation \u03c3i. Thus, the output (Gi(A)) depends on the distance between the input intensity (A) and \u03bci:<\/p>\n<p>$${G}_{i}(A)=\\frac{1}{\\sqrt{2\\pi {\\sigma }_{i}^{2}}}\\exp \\left(-\\frac{{\\left(A-{\\mu }_{i}\\right)}^{2}}{2{\\sigma }_{i}^{2}}\\right)$$<\/p>\n<p>\n                    (4)\n                <\/p>\n<p>In this case, each input triggers an output through all receptive fields, but the outputs differ. 
A GRF neuron triggers a firing spike only if the output from its receptive field is the largest. Here, several neurons (Neuron #1, #2, \u2026 #i) corresponding to receptive fields (GRF #1, #2, \u2026 #i) are defined as one population encoder, which is used for spike encoding. The \u03c3 of the neurons is uniformly set to the standard value of 1, and the centers (\u03bc1, \u03bc2, \u2026 \u03bci) of the GRFs are set by:<\/p>\n<p>$${{{\\mu }}}_{i}={A}_{\\min }+\\frac{{A}_{\\max }-{A}_{\\min }}{m+1}\\times i$$<\/p>\n<p>\n                    (5)\n                <\/p>\n<p>where Amin and Amax represent the minimum and maximum values of the input intensity (A), and m is the number of GRF neurons. To facilitate data processing, we perform a normalization process so that the amplitude of PreIn is limited between -1 and 1. Accordingly, the Gaussian centers are set by:<\/p>\n<p>$${{{\\mu }}}_{i}=\\frac{2}{m+1}\\times i-1$$<\/p>\n<p>\n                    (6)\n                <\/p>\n<p>The normalized PreIn of a complete action is then converted into spike trains (Spike-input) through the aforementioned GRF-based population encoding scheme. The spike trains from Neuron #1, #2, \u2026 #i are converted into light pulse trains (405\u2009nm, 2.95 nW\u00b7\u03bcm\u20132) and applied to one population transistor consisting of EDL-coupled IGZO photoelectronic transistors under different gate bias conditions (as shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2d<\/a>). Each light pulse corresponding to a firing spike is set to 38\u2009ms ON, and the stationary state without a firing spike is set to 38\u2009ms OFF. Thus, the time step between each node of PreIn is fixed, and the triggered channel currents of the population transistor at each time step are collected as reservoir states. 
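The masking and GRF-based population encoding steps (Eqs. 3 to 6, with winner-take-all spike selection) can be sketched as follows; the function names, the random toy ΔAct, and the choice of m = 3 neurons are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def time_multiplex(d_act, M):
    """Multiply each frame of d_act (N x 3K) by a random +/-1 mask
    (3K x M) and flatten, giving a PreIn vector of length N*M (Eq. 3)."""
    mask = rng.choice([-1.0, 1.0], size=(d_act.shape[1], M))
    return (d_act @ mask).ravel()

def grf_encode(pre_in, m=3, sigma=1.0):
    """Winner-take-all GRF encoding (Eqs. 4-6).

    Inputs are assumed normalized to [-1, 1]; the m Gaussian centers
    are mu_i = 2*i/(m+1) - 1 (Eq. 6).  For each input value, the
    neuron whose Gaussian response G_i is largest fires a spike.
    """
    mu = 2.0 * np.arange(1, m + 1) / (m + 1) - 1.0
    g = np.exp(-((pre_in[:, None] - mu[None, :]) ** 2) / (2 * sigma**2))
    g /= np.sqrt(2 * np.pi * sigma**2)          # Eq. 4 prefactor
    winners = np.argmax(g, axis=1)
    spikes = np.zeros((pre_in.size, m), dtype=int)
    spikes[np.arange(pre_in.size), winners] = 1  # one spike per input
    return mu, spikes

d_act = rng.uniform(-0.2, 0.2, size=(4, 6))  # toy dAct: N=4 frames, K=2
pre_in = time_multiplex(d_act, M=10)
pre_in /= np.max(np.abs(pre_in))             # normalize to [-1, 1]
mu, spikes = grf_encode(pre_in)
print(mu)  # centers for m = 3: [-0.5, 0.0, 0.5]
```

Because all neurons share the same σ, the winner is simply the neuron whose center is nearest to the input value; each spike would then be converted into one 38 ms light pulse on the corresponding transistor of the population.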
Relaxation characteristic curves are used as the activation function. As mentioned before, these characteristics vary among transistors with different gate biases, which increases the richness of the reservoir states. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2e<\/a> shows the response current of the transistor with a gate bias of \u22120.4\u2009V to four consecutive light pulses (38\u2009ms ON and 38\u2009ms OFF). Supplementary Figs.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">15<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">16<\/a> show the device responses under different gate bias voltages to light pulse trains of different frequencies. Specific model fitting and calculation details can be found in Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">17<\/a>, Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">18<\/a>, Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">1<\/a> and Note 2. L population encoders are connected in parallel to build a large parallel RC system. 
Each population encoder is responsible for encoding a different PreIn, which is generated with L different mask matrices in parallel. Thus, L population transistors with 3\u00d7L devices in arrays are used to obtain parallel reservoir states, and 3\u00d7L\u00d7M reservoir states are recorded per frame. Supplementary Video\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM3\" target=\"_blank\" rel=\"noopener\">I<\/a> shows the aforementioned spike encoding and reservoir-state collection processes.<\/p>\n<p>Finally, the reservoir states are collected for training and testing. During the training process, only the output weights (Wout) connected to the output layer need to be trained. The collected reservoir states are subjected to a one-step linear regression along with the labels to obtain the desired weights<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Lukosevicius, M. &amp; Jaeger, H. Survey: reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3, 127&#x2013;149 (2009).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR33\" id=\"ref-link-section-d388759675e2142\" target=\"_blank\" rel=\"noopener\">33<\/a>. The resulting output weights are multiplied with the reservoir states collected during the testing process, and the output label is obtained by the winner-take-all method<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Appeltant, L. et al. Information processing using a single dynamical node as complex system. Nat. Commun. 
2, 468 (2011).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR34\" id=\"ref-link-section-d388759675e2146\" target=\"_blank\" rel=\"noopener\">34<\/a>. Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">19<\/a> shows the specific procedures, and detailed calculation steps can be found in \u201cMethods\u201d.<\/p>\n<p>Memory capacity (MC) is a task-independent evaluation index of reservoir computing, which represents the ability of reservoir states to preserve previous input information<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Liang, X. et al. Rotating neurons for all-analog implementation of cyclic reservoir computing. Nat. Commun. 13, 1549 (2022).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR29\" id=\"ref-link-section-d388759675e2156\" target=\"_blank\" rel=\"noopener\">29<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Lee, O. et al. Task-adaptive physical reservoir computing. Nat. Mater. 23, 79&#x2013;87 (2024).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR35\" id=\"ref-link-section-d388759675e2159\" target=\"_blank\" rel=\"noopener\">35<\/a>. We evaluated the MC metrics of the bioinspired reservoir with different numbers (1, 2, 3, and 4) of GRF neurons. During encoding, the Gaussian centers were also sequentially set according to Eq. (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"equation anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Equ6\" target=\"_blank\" rel=\"noopener\">6<\/a>). 
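The one-step readout training and winner-take-all classification described above can be sketched as follows; the ridge regularization term and the toy reservoir-state matrix are illustrative assumptions.

```python
import numpy as np

def train_readout(states, labels, n_classes, reg=1e-3):
    """One-step (closed-form) ridge regression of the output weights
    Wout against one-hot labels; only Wout is trained, the physical
    reservoir itself stays fixed. states: (n_samples, n_states)."""
    targets = np.eye(n_classes)[labels]
    gram = states.T @ states + reg * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)

def classify(states, w_out):
    """Winner-take-all readout: the output node with the largest
    response gives the predicted action label."""
    return np.argmax(states @ w_out, axis=1)

# Toy reservoir states: 60 samples, 12 states, 3 separable classes
rng = np.random.default_rng(0)
centers = 3.0 * rng.normal(size=(3, 12))
y = np.arange(60) % 3
X = centers[y] + 0.1 * rng.normal(size=(60, 12))
w_out = train_readout(X, y, n_classes=3)
pred = classify(X, w_out)
```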
Population encoders with different GRF neuron numbers (1, 2, 3, and 4) correspond to population transistors consisting of different numbers of EDL-coupled IGZO photoelectronic transistors with fixed VGS of 0\u2009V, \u20130.2\u2009V\/0\u2009V, \u20130.4\u2009V\/\u20130.2\u2009V\/0\u2009V, and \u20130.8\u2009V\/\u20130.4\u2009V\/\u20130.2\u2009V\/0\u2009V, respectively. The specific calculation process of MC can be found in Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">3<\/a>. As shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2f<\/a>, a high MC value (~6.9) is obtained by three receptive fields with filtering and encoding. This should be attributed to the enriched reservoir states rendered by multiple GRF neurons. When the encoding number exceeds three, the MC value remains almost unchanged. However, the greater the number of GRF neurons, the larger the required data scale, which increases the computational burden of the following steps. Therefore, each population encoder is fixed at three GRF neurons, and each population transistor consists of three EDL-coupled IGZO photoelectronic transistors with fixed VGS of \u20130.4\u2009V, \u20130.2\u2009V, and 0\u2009V, respectively, in the bioinspired reservoir.<\/p>\n<p>Human action recognition tasks based on standard datasets<\/p>\n<p>The UTD-MHAD dataset contains 27 classes of human actions collected from 8 subjects<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Chen, C., Jafari, R. &amp; Kehtarnavaz, N. 
Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In 2015 IEEE International conference on image processing (ICIP) 168&#x2013;172 (IEEE, 2015).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR36\" id=\"ref-link-section-d388759675e2185\" target=\"_blank\" rel=\"noopener\">36<\/a>. Each skeleton frame consists of dynamic coordinates of 20 body joints. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig3\" target=\"_blank\" rel=\"noopener\">3a<\/a> shows examples of color images and skeleton frames corresponding to a home-made action \u201cHigh throw\u201d (left panel) and an action \u201cBasketball shoot\u201d (right panel) in the UTD-MHAD dataset. Color images and skeleton frames of the home-made action were collected by the authors through a mobile phone camera and a Kinect camera, respectively. In the UTD-MHAD standard dataset test, we randomly selected 30 samples of each action class and divided them into training and testing sets at a ratio of 9:1. Specific encoding processes can be found in Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">20<\/a>. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig3\" target=\"_blank\" rel=\"noopener\">3b<\/a> shows partial time sections of reservoir states triggered by the light pulse trains and collected from the IGZO synaptic transistors. Different mask matrices were used, and the optimized performance was achieved with M\u2009=\u200930 and L\u2009=\u200930. 
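The parallel masking that yields 3×L×M reservoir states per frame can be sketched as below. The ±1 mask entries and the leaky virtual-node update standing in for the three VGS-dependent device responses are assumptions for illustration only.

```python
import numpy as np

def make_masks(L, M, n_features, seed=0):
    """Generate L random mask matrices; each maps one input frame
    (n_features values) to a length-M masked sequence (a PreIn)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(L, M, n_features))

def reservoir_states_per_frame(frame, masks, n_bias=3):
    """Each masked sequence drives one population transistor of
    n_bias = 3 devices; different leak rates stand in for the
    gate-bias-dependent relaxation, giving 3*L*M states per frame."""
    L, M, _ = masks.shape
    leaks = np.linspace(0.3, 0.9, n_bias)
    states = np.zeros((n_bias, L, M))
    for l in range(L):
        pre_in = masks[l] @ frame              # length-M virtual-node drive
        for b, a in enumerate(leaks):
            s = 0.0
            for m, u in enumerate(pre_in):     # leaky accumulation
                s = a * s + (1.0 - a) * u
                states[b, l, m] = s
    return states.ravel()

frame = np.random.default_rng(1).normal(size=20)   # e.g. 20 joint features
masks = make_masks(L=30, M=30, n_features=20)
states = reservoir_states_per_frame(frame, masks)  # 3*30*30 = 2700 states
```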
After the training and testing processes, the recognition rate was calculated by counting the recognition results over the entire testing set. The repeated random subsampling validation method was applied ten times to enhance the reliability of the results<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Pei, M. et al. Power-efficient multisensory reservoir computing based on Zr-Doped HfO2 memcapacitive synapse arrays. Adv. Mater. 35, 2305609 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR37\" id=\"ref-link-section-d388759675e2205\" target=\"_blank\" rel=\"noopener\">37<\/a>. The average of the ten results was used as the final system recognition result.<\/p>\n<p><b id=\"Fig3\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 3: Recognition results and analysis on standard dataset tests based on bioinspired reservoir computing paradigm.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41467-025-56899-3\/figures\/3\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig3\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41467_2025_56899_Fig3_HTML.png\" alt=\"figure 3\" loading=\"lazy\" width=\"685\" height=\"504\"\/><\/a><\/p>\n<p><b>a<\/b> Examples of digital images and skeleton frames corresponding to a home-made action \u201cHigh throw\u201d and an action \u201cBasketball shoot\u201d in the UTD-MHAD dataset, where the digital images and skeleton frames of the home-made action were collected by the authors through a mobile phone camera and a Kinect camera, respectively. 
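The repeated random subsampling validation used here can be written generically: re-split the data at random, retrain, and average. The nearest-centroid classifier in the usage example is only a placeholder for the reservoir readout, and the toy data are illustrative.

```python
import numpy as np

def repeated_random_subsampling(X, y, fit, score, n_repeats=10,
                                train_frac=0.9, seed=0):
    """Randomly re-split the dataset n_repeats times (here 9:1),
    retrain each time, and return the average test accuracy."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        tr, te = idx[:cut], idx[cut:]
        accs.append(score(fit(X[tr], y[tr]), X[te], y[te]))
    return float(np.mean(accs))

# Placeholder nearest-centroid classifier on separable toy data
rng = np.random.default_rng(2)
centers = 4.0 * rng.normal(size=(3, 8))
y = np.arange(60) % 3
X = centers[y] + 0.2 * rng.normal(size=(60, 8))
fit = lambda Xt, yt: np.stack([Xt[yt == c].mean(axis=0) for c in range(3)])
score = lambda C, Xt, yt: float(np.mean(
    np.argmin(((Xt[:, None, :] - C[None]) ** 2).sum(-1), axis=1) == yt))
acc = repeated_random_subsampling(X, y, fit, score, n_repeats=10)
```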
Digital images and skeleton data in the UTD-MHAD dataset are reprinted with permission<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Chen, C., Jafari, R. &amp; Kehtarnavaz, N. Utd-mhad: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In 2015 IEEE International conference on image processing (ICIP) 168&#x2013;172 (IEEE, 2015).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR36\" id=\"ref-link-section-d388759675e2223\" target=\"_blank\" rel=\"noopener\">36<\/a>, Copyright 2015, IEEE. <b>b<\/b> The partial time sections of reservoir states triggered by the light pulse trains and collected from the IGZO synaptic transistors. <b>c<\/b> Specific recognition results with bioinspired reservoir computing on the UTD-MHAD dataset. <b>d<\/b> The recognition accuracies of the UTD-MHAD dataset standard test with different numbers (2, 3, and 4) of Gaussian receptive fields. The error bars represent the standard deviation of five independent tests. <b>e<\/b> Recognition accuracies achieved on different validation datasets with bioinspired reservoir computing.<\/p>\n<p>Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig3\" target=\"_blank\" rel=\"noopener\">3c<\/a> shows the recognition results with respect to the UTD-MHAD dataset. The recognition rate based on all action classes reaches 93.58%. As can be seen, most actions can be well recognized, and only a few actions, such as \u201cWave\u201d, show relatively low recognition accuracy. Among all action classes, the system achieves an excellent recognition rate of 100% on 14 classes, and high accuracy (\u2265 90%) on 21 classes. 
This shows that our system can reliably distinguish multiple types of complex action processes.<\/p>\n<p>The recognition tasks were also verified with different numbers (2, 3, and 4) of Gaussian receptive fields, as shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig3\" target=\"_blank\" rel=\"noopener\">3d<\/a>. The Gaussian centers were also sequentially set according to Eq. (<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"equation anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Equ6\" target=\"_blank\" rel=\"noopener\">6<\/a>). As shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig3\" target=\"_blank\" rel=\"noopener\">3d<\/a>, high recognition accuracy (&gt;90%) on the UTD-MHAD dataset is obtained by three receptive fields. This is consistent with the results of the MC metrics in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig2\" target=\"_blank\" rel=\"noopener\">2f<\/a>. We further investigated the effect of the standard deviation of the Gaussian receptive fields on the recognition results. Our results indicate that \u03c3 has only a slight effect on recognition accuracy. By varying \u03c3 from 0.1 to 10, the maximum variation in recognition accuracy is only 1.11% (Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">21<\/a>). Computational complexity can be reduced based on such a GRF encoding scheme. 
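A Gaussian-receptive-field population encoder of the kind analyzed above can be sketched as follows; evenly spaced centers over a normalized input range stand in for the centers defined by Eq. (6), and σ is the width whose exact value, as noted, barely affects accuracy.

```python
import numpy as np

def grf_encode(x, n_neurons=3, sigma=0.5, lo=0.0, hi=1.0):
    """Population coding with Gaussian receptive fields: each scalar
    input excites n_neurons overlapping Gaussian tuning curves.
    Evenly spaced centers on [lo, hi] are an illustrative stand-in
    for Eq. (6) in the paper."""
    centers = np.linspace(lo, hi, n_neurons)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.exp(-((x[:, None] - centers[None, :]) ** 2)
                  / (2.0 * sigma ** 2))

# Three normalized joint coordinates encoded by three GRF neurons
resp = grf_encode([0.0, 0.5, 1.0], n_neurons=3, sigma=0.5)
```

The graded responses can then be converted into the light-pulse trains that drive the three population transistors.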
On the one hand, the scale of our network parameters is almost two orders of magnitude smaller than that of previous machine learning algorithms (e.g., ResNet18) regarding the UTD-MHAD dataset (a detailed illustration can be found in Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">22<\/a>)<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Tasnim, N., Islam, M. K. &amp; Baek, J. H. (2021). Deep learning based human activity recognition using spatio-temporal image formation of skeleton joints. Appl. Sci. 11, 2675 (2021).\" href=\"#ref-CR38\" id=\"ref-link-section-d388759675e2281\">38<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Usmani, A., Siddiqui, N. &amp; Islam, S. Skeleton joint trajectories based human activity recognition using deep RNN. Multimed. Tools Appl. 82, 46845&#x2013;46869 (2023).\" href=\"#ref-CR39\" id=\"ref-link-section-d388759675e2281_1\">39<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"Tasnim, N. &amp; Baek, J. H. Dynamic edge convolutional neural network for skeleton-based human action recognition. Sensors 23, 778 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR40\" id=\"ref-link-section-d388759675e2284\" target=\"_blank\" rel=\"noopener\">40<\/a>. On the other hand, multiple training iterations are not required in our training process, and only a one-step linear regression is needed, which also reduces the computational complexity. 
Our results indicate that GRF-based preprocessing is vitally important for implementing human action recognition by reservoir computing, and that even a very limited number of GRF neurons can significantly improve recognition accuracy.<\/p>\n<p>We further verified this bioinspired reservoir computing paradigm on human action recognition tasks with three datasets: MSR Action3D, Florence 3D, and MSR Action Pairs. The MSR Action3D dataset contains 20 classes of skeleton-based action frames, and 20 skeleton joints\u2019 coordinates are recorded per frame<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Li, W., Zhang, Z. &amp; Liu, Z. Action recognition based on a bag of 3D points. In 2010 IEEE Conference on Computer Vision and Pattern Recognition 9&#x2013;14 (IEEE, 2010).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR41\" id=\"ref-link-section-d388759675e2291\" target=\"_blank\" rel=\"noopener\">41<\/a>. The dataset was used for verification, where 90% of the samples were chosen randomly for training and the rest for testing. A high recognition rate of 90.50% is obtained (Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">23<\/a>). The Florence 3D dataset contains 9 action classes, and 15 skeleton joints\u2019 3D coordinates are recorded in each frame<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Seidenari, L., Varano, V., Berretti, S., Del Bimbo, A. &amp; Pala, P. Recognizing actions from depth cameras as weakly aligned multi-part bag-of-poses. 
In 2013 IEEE Conference on Computer Vision and Pattern Recognition 479-485 (IEEE, 2013).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR42\" id=\"ref-link-section-d388759675e2298\" target=\"_blank\" rel=\"noopener\">42<\/a>. Our system can successfully identify 91.11% of the testing samples (Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">24<\/a>). The actions in the MSR Action Pairs dataset are paired with similar action trajectories<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Oreifej, O. &amp; Liu, Z. Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. In 2013 IEEE Conference on Computer Vision and Pattern Recognition 716&#x2013;723 (IEEE, 2013).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR43\" id=\"ref-link-section-d388759675e2305\" target=\"_blank\" rel=\"noopener\">43<\/a>. This makes it difficult to recognize actions in a pairwise relationship. As shown in Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">25<\/a>, 12 action classes can be well distinguished with an average recognition accuracy of 90.67%. 
Specific parameters of the bioinspired reservoir on the three dataset standard tests can be found in Supplementary Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">2<\/a>. The repeated random subsampling validation method was applied five times in the three tasks to enhance the reliability of the results. In addition, such a bioinspired reservoir computing paradigm can also be applied to high-precision and multi-type action recognition tasks based on depth images in videos (achieving an accuracy of 92.35% on 27 labels; the calculation process can be found in Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">4<\/a>). As high recognition accuracies (&gt;90%) can be achieved on different validation datasets, as shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig3\" target=\"_blank\" rel=\"noopener\">3e<\/a> and Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">26<\/a>, our system exhibits remarkable versatility and fault tolerance for human action recognition.<\/p>\n<p>Recognition of falling behaviors based on analog system<\/p>\n<p>The conceptual diagram of our Alpho-RC system for real-world human action processing tasks is shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig4\" target=\"_blank\" rel=\"noopener\">4a<\/a>. 
Skeleton frames of daily activities collected by the Microsoft Kinect camera are the original input, and the Gaussian receptive field based encoding mechanism is implemented. During the training process, the currents obtained by the population transistors in response to the encoding pulse trains are used as reservoir states for training the output weights. During the testing process, the resulting output weights are mapped by the 1T1R array, and the output currents of the testing action sample are obtained. Video processing tasks mostly involve human action recognition and prediction. Human action recognition tasks usually focus on completed actions and achieve classification by learning the entire processes of actions. However, in actual scenarios, it is often necessary to achieve early classification and prediction before an action ends, such as predicting and alerting before falling behaviors are completed<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Kong, Y., Kit, D. &amp; Fu, Y. A discriminative model with multiple temporal scales for action prediction. In Computer Vision&#x2013;ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13 596-611 (Springer, 2014).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR44\" id=\"ref-link-section-d388759675e2339\" target=\"_blank\" rel=\"noopener\">44<\/a>. The purpose of the prediction task is to output the corresponding action label by using only the early frame sequences for inference.<\/p>\n<p><b id=\"Fig4\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Fig. 
4: Recognition of falling behaviors based on Alpho-RC system.<\/b><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41467-025-56899-3\/figures\/4\" rel=\"nofollow noopener\" target=\"_blank\"><img decoding=\"async\" aria-describedby=\"Fig4\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/04\/41467_2025_56899_Fig4_HTML.png\" alt=\"figure 4\" loading=\"lazy\" width=\"685\" height=\"712\"\/><\/a><\/p>\n<p><b>a<\/b> The computational architecture schematic diagram of Alpho-RC system for human action recognition and prediction. <b>b<\/b> Examples of five actions, including two normal actions (squat and stretch) and three falling actions (fall down, fall to the left, and fall to the right), recorded by a mobile phone camera<b>. c<\/b> Examples of action skeleton frames corresponding to the five actions. <b>d<\/b> Device conductance from 200\u22121300\u2009\u00b5S mapped from the numerical weight values. <b>e<\/b> The write error tolerance of \u00b11% during the mapping program. <b>f<\/b> Recognition accuracy of 96.67% obtained by the memristor-based output layer. <b>g<\/b> Comparison of energy consumption per action among the Alpho-RC system and algorithms based on various advanced processors.<\/p>\n<p>Falling behaviors, as common events in real life, often involve potential health risks. For example, elderly people living alone and children are more likely to suffer physical injuries if they fall, thus affecting their health. Effectively distinguishing falling behaviors from normal behaviors is crucial to the safety of both young and old<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Rubenstein, L. Z. Falls in older people: epidemiology, risk factors and strategies for prevention. 
Age Ageing 35, ii37&#x2013;ii41 (2006).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR45\" id=\"ref-link-section-d388759675e2387\" target=\"_blank\" rel=\"noopener\">45<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Haarbauer-Krupa, J. et al. Fall-related traumatic brain injury in children ages 0&#x2013;4 years. J. Saf. Res. 70, 127&#x2013;133 (2019).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR46\" id=\"ref-link-section-d388759675e2390\" target=\"_blank\" rel=\"noopener\">46<\/a>. A home-made 3D skeleton-based falling dataset was used to verify the efficacy of such a system in dealing with real-world tasks. This dataset, including two normal actions (squat and stretch) and three falling actions (fall down, fall to the left, and fall to the right), was collected and built with both a digital camera and a Kinect camera. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig4\" target=\"_blank\" rel=\"noopener\">4b<\/a> shows examples of the five actions recorded by a mobile phone camera. Figure\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig4\" target=\"_blank\" rel=\"noopener\">4c<\/a> shows examples of the corresponding action skeleton frames recorded by a Kinect camera. A specific demonstration of skeleton frames can be found in Supplementary Video\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM4\" target=\"_blank\" rel=\"noopener\">II<\/a>. The collection details can be found in the \u201cMethods\u201d. 
Here, the recognition on such a home-made falling dataset was implemented by our Alpho-RC system. 90% of the action samples in the dataset were randomly selected as the training set, and the output weights were obtained by a noise-aware linear regression training method (M\u2009=\u20096, L\u2009=\u20095). The detailed training process can be found in Supplementary Note\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">5<\/a>. The offline-trained weights were then programmed into the memristor cells in the 1T1R array. The numerical weight values were first mapped to device conductance from 200-1300 \u00b5S (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig4\" target=\"_blank\" rel=\"noopener\">4d<\/a>) and then programmed into the array using a write-and-verify programming scheme. Detailed reliability data of the hardware setup can be found in Supplementary Figs.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">27<\/a>, <a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">28<\/a>. Good programming accuracies were achieved with a write error tolerance of \u00b11% (Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig4\" target=\"_blank\" rel=\"noopener\">4e<\/a>). 
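The weight-to-conductance mapping and write-and-verify programming can be sketched as below; the linear mapping and the 2% Gaussian write noise are illustrative assumptions, not the measured 1T1R programming statistics.

```python
import numpy as np

def weight_to_conductance(w, g_min=200e-6, g_max=1300e-6):
    """Linearly map the trained weights onto the usable conductance
    window (200-1300 uS) of the 1T1R memristor cells."""
    w = np.asarray(w, dtype=float)
    return g_min + (w - w.min()) / (w.max() - w.min()) * (g_max - g_min)

def write_and_verify(target, tol=0.01, max_tries=100, seed=0):
    """Write-and-verify loop: re-program each cell until its read-back
    conductance lies within +/-1% of target. Gaussian write noise (2%)
    stands in for real device programming variability."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(target)
    for _ in range(max_tries):
        off = np.abs(g - target) > tol * target   # cells still out of tolerance
        if not off.any():
            break
        g[off] = target[off] * (1.0 + rng.normal(0.0, 0.02, off.sum()))
    return g

w_out = np.random.default_rng(3).normal(size=(6, 5))  # illustrative Wout
g_target = weight_to_conductance(w_out)
g_prog = write_and_verify(g_target)
```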
The memristor-based output layer successfully classified the five actions with a recognition accuracy of 96.67% (as shown in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig4\" target=\"_blank\" rel=\"noopener\">4f<\/a>), which is close to the software simulation of 98.33% (as shown in Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">29<\/a>). The successful demonstration of human action recognition in the hardware-based bioinspired in-materia reservoir computing paves the way for energy-efficient edge computing applications. Our Alpho-RC system was also applied to human action prediction. Only the observed frames were fed into the photoelectronic reservoir for prediction. Here, the ratio between the number of observed frames and the number of frames of the completed action is defined as the observation ratio. As shown in Supplementary Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">30<\/a>, the prediction accuracies are plotted as a function of the observation ratios with respect to the home-made falling dataset. The prediction accuracies are larger than 80% with an observation ratio higher than 50%, and larger than 90% with an observation ratio higher than 70%. These results indicate that our system can achieve excellent prediction performance, verifying its capability for both human action recognition and prediction.<\/p>\n<p>Finally, the energy efficiency of the system is evaluated by calculating the energy consumption corresponding to processing one completed action. 
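The observation ratio defined above amounts to truncating each action's frame sequence before it enters the reservoir; the frame count and feature dimension below are illustrative.

```python
import numpy as np

def truncate_by_observation_ratio(frames, ratio):
    """Keep only the earliest fraction of an action: the observation
    ratio is (#observed frames) / (#frames of the completed action)."""
    n_obs = max(1, int(np.ceil(ratio * len(frames))))
    return frames[:n_obs]

action = np.random.default_rng(4).normal(size=(40, 20))  # 40 frames x 20 features
early = truncate_by_observation_ratio(action, 0.5)       # first half only
```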
As mentioned before, no complex filtering process is required for the feature extraction of each action. Most of the energy consumption before the training of the weights connected to the output layer is attributed to the encoding process and the current flow through the transistors. The energy consumption for processing an action by our system is only ~45.78\u2009\u03bcJ (a detailed estimation can be found in the experimental section). However, feature extraction is required in most previous software-based methods, which dramatically increases the floating-point operations and, in turn, the energy consumption. The comparison of energy consumption is summarized in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#Fig4\" target=\"_blank\" rel=\"noopener\">4g<\/a>. The energy consumption of our Alpho-RC system is at least two orders of magnitude lower than that of CMOS-based processors, including CPU, FPGA, and GPU<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Ma, X., Borbon, J. R., Najjar, W. &amp; Roy-Chowdhury, A. K. Optimizing hardware design for human action recognition. In 2016 26th International Conference on Field Programmable Logic and Applications (FPL) 1-11 (IEEE, 2016).\" href=\"#ref-CR47\" id=\"ref-link-section-d388759675e2441\">47<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Zhang, B., Han, J., Huang, Z., Yang, J. &amp; Zeng, X. A real-time and hardware-efficient processor for skeleton-based action recognition with lightweight convolutional neural network. IEEE Trans. Circuits Syst. II: Express Briefs. 
66, 2052&#x2013;2056 (2019).\" href=\"#ref-CR48\" id=\"ref-link-section-d388759675e2441_1\">48<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 49\" title=\"Zhang, P. et al. View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In Proceedings of the IEEE International Conference on Computer Vision 2117&#x2013;2126 (IEEE, 2017).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR49\" id=\"ref-link-section-d388759675e2444\" target=\"_blank\" rel=\"noopener\">49<\/a>. We also compared the reported physical reservoir computing architectures for human action recognition (Supplementary Table\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"supplementary material anchor\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#MOESM1\" target=\"_blank\" rel=\"noopener\">3<\/a>); among them, our system achieves the largest number of human action classes<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 13\" title=\"Tan, H. &amp; van Dijken, S. Dynamic computer vision with retinomorphic photomemristor-reservoir computing. Nat. Commun. 14, 2169 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR13\" id=\"ref-link-section-d388759675e2451\" target=\"_blank\" rel=\"noopener\">13<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chen, R. et al. Thin-film transistor for temporal self-adaptive reservoir computing with closed-loop architecture. Sci. Adv. 10, eadl1299 (2024).\" href=\"#ref-CR50\" id=\"ref-link-section-d388759675e2454\">50<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Jang, Y. H. et al. 
A high-dimensional in-sensor reservoir computing system with optoelectronic memristors for high-performance neuromorphic machine vision. Mater. Horiz. 11, 499&#x2013;509 (2024).\" href=\"#ref-CR51\" id=\"ref-link-section-d388759675e2454_1\">51<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 52\" title=\"Sun, Y. et al. In-sensor reservoir computing based on optoelectronic synapse. Adv. Intell. Syst. 5, 2200196 (2023).\" href=\"http:\/\/www.nature.com\/articles\/s41467-025-56899-3#ref-CR52\" id=\"ref-link-section-d388759675e2457\" target=\"_blank\" rel=\"noopener\">52<\/a>.<\/p>