Stacked Intelligent Metasurfaces for
Wireless Sensing and Communication:
Applications and Challenges

Hao Liu, Jiancheng An, Xing Jia, Shining Lin, Xianghao Yao, Lu Gan,
Bruno Clerckx, Chau Yuen, Mehdi Bennis,
and Mérouane Debbah
H. Liu, X. Jia, S. Lin, X. Yao, and L. Gan are with the School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, Sichuan 611731, China. L. Gan is also with the Yibin Institute of UESTC, Yibin, Sichuan 644000, China (e-mail: liu.hao@std.uestc.edu.cn, xingjia1999@163.com, 202221011710@std.uestc.edu.cn, xianghao_yao@163.com, ganlu@uestc.edu.cn). J. An and C. Yuen are with the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore 639798 (e-mail: jiancheng.an@ntu.edu.sg, chau.yuen@ntu.edu.sg). B. Clerckx is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: b.clerckx@imperial.ac.uk). M. Bennis is with the Center for Wireless Communications, Oulu University, Oulu 90014, Finland (e-mail: mehdi.bennis@oulu.fi). M. Debbah is with KU 6G Research Center, Khalifa University of Science and Technology, P O Box 127788, Abu Dhabi, UAE and CentraleSupelec, University Paris-Saclay, 91192 Gif-sur-Yvette, France (e-mail: merouane.debbah@ku.ac.ae).
Abstract

The rapid advancement of wireless communication technologies has precipitated an unprecedented demand for high data rates, extremely low latency, and ubiquitous connectivity. To meet these demands, stacked intelligent metasurfaces (SIMs) have been developed as a novel solution that performs advanced signal processing tasks directly in the electromagnetic wave domain, thus achieving ultra-fast computing speed and reducing hardware complexity. This article provides an overview of SIM technology by discussing its hardware architectures, advantages, and potential applications for wireless sensing and communication. Specifically, we explore the utilization of SIMs in enabling wave-domain beamforming, as well as channel modeling and estimation in SIM-assisted communication systems. Furthermore, we elaborate on the potential of utilizing a SIM to build a hybrid optical-electronic neural network (HOENN) and demonstrate its efficacy by examining two case studies: disaster monitoring and direction-of-arrival estimation. Finally, we identify key implementation challenges, including practical hardware imperfections, efficient SIM configuration for realizing wave-domain signal processing, and performance analysis, to motivate future research on this important and far-reaching topic.

Index Terms:
Stacked intelligent metasurfaces (SIM), diffractive neural network, hybrid optical-electronic neural network (HOENN), wave-domain signal processing, MIMO transceiver.

I Introduction

The rapid development of wireless networks has led to an unprecedented demand for high data rates, low latency, and massive connectivity. To address these challenges, researchers have been exploring innovative transceiver architectures and communication technologies, aiming to enhance the performance of wireless networks in terms of capacity, efficiency, and reliability. Additionally, the deep convergence of sensing and communication capabilities has defined a major evolution of the next-generation wireless networks, which allows the utilization of wireless resources more efficiently [1].

Over the past decades, multi-antenna systems, known as multiple-input multiple-output (MIMO) systems, have significantly reshaped modern wireless communications and enhanced wireless sensing capabilities [2]. By employing multiple antennas at both the transmitter and receiver, MIMO communication systems can exploit multiplexing gain and spatial diversity to improve spectral efficiency and link reliability, respectively. In MIMO radar systems, leveraging multiple orthogonal channels can further improve the parameter estimation accuracy when sensing the targets of interest. However, as the number of antennas increases, the hardware complexity and energy consumption of MIMO systems grow significantly, posing challenges to their practical implementation [2, 3].

To address these issues, hybrid MIMO architectures have been developed. By combining a few radio frequency (RF) chains with a number of low-cost analog phase shifters, hybrid MIMO systems can strike flexible trade-offs between performance and complexity [2]. Furthermore, reconfigurable holographic surface (RHS) technology has recently garnered significant attention to efficiently implement hybrid MIMO. Specifically, an RHS is an artificial 2D metasurface composed of a large number of sub-wavelength elements, which can manipulate electromagnetic (EM) waves in a programmable manner [4]. By integrating RHSs into transceivers, it becomes possible to realize near-continuous apertures and generate highly directional beams. Nevertheless, the limited number of RF chains in hybrid MIMO systems may constrain their capability of harvesting the full spatial diversity and multiplexing gain offered by the large-scale antenna/metasurface array. Additionally, the single-layer RHS has restricted signal processing capabilities, which still rely on baseband beamforming to mitigate inter-stream interference.

Against this background, the concept of stacked intelligent metasurfaces (SIM) has emerged recently [1, 5, 6]. Generally speaking, a SIM comprises multiple programmable transmissive metasurfaces, each containing numerous low-cost meta-atoms. By appropriately configuring the EM response of these meta-atoms, SIMs can perform advanced signal processing tasks required in wireless sensing and communication applications, such as MIMO precoding and direction-of-arrival (DOA) estimation, directly in the wave domain [3, 7]. Thanks to wave-domain signal processing, deploying SIMs in wireless networks can significantly reduce the processing delay, hardware complexity, and energy consumption. Specifically, in communication systems, single-stream detection can be achieved by relying on low-resolution analog-to-digital converters and a small number of RF chains [3]. Additionally, sensing systems can benefit from replacing conventional power-intensive array receivers with low-cost energy detectors [7]. The real-time programmability of SIMs enables their EM responses to be tuned dynamically according to various propagation environments and specific computing tasks.

Benefiting from the programmable multi-layer architecture and the advanced analog computing paradigm, a SIM has the potential to realize deep neural network (DNN) functionalities as EM waves propagate through it [5]. Because the inference capability of a SIM is limited by the linearity of its transfer function, the hybrid optical-electronic neural network (HOENN) architecture has been developed [6], which inherits the advantages of both SIM-empowered optical neural networks (ONNs) and electronic neural networks (ENNs). By stacking multiple low-cost metasurfaces, ONNs extract the latent information by directly preprocessing the incident signals in the wave domain. Then, ENNs, operating on the received amplitude signals, are utilized to supplement the inference capabilities of SIMs. As a result, HOENNs are capable of striking flexible trade-offs between inference performance, complexity, and power consumption.

Figure 1: Illustration of three typical SIM HTs.

Despite the great potential of SIMs and HOENNs in realizing various analog computing tasks, new technological challenges arise accordingly. In this article, we provide an overview of SIM technology and discuss its potential applications in wireless sensing and communication systems. We commence by discussing its utilization in wave-domain beamforming as well as the challenges in modeling EM propagation and probing channels in SIM-assisted communication systems. Furthermore, we explore the applications of HOENNs in disaster monitoring and DOA estimation. Finally, we identify several research challenges that need to be addressed to unleash the full potential of SIMs, with the aim of motivating future research endeavors.

II Functionality, Hardware, and Deployment of SIMs

The most remarkable feature of SIMs is their ability to process complex signals directly in the wave domain, which significantly reduces the energy consumption and processing latency of matrix computations compared to conventional digital signal processors. Specifically, as EM waves propagate through each diffractive metasurface layer, the meta-atoms act as secondary radiation sources, illuminating the subsequent layer. Consequently, a SIM has a physical neural network architecture, where each meta-atom on a layer functions as a neuron, processing and conveying information to the next layer. This architecture enables the SIM to perform parallel computations with ultra-fast processing speed.

As shown in Fig. 1, the existing SIM prototypes can be categorized into three hardware types (HTs) based on whether they are programmable and whether they are integrated with active amplifiers. Next, we elaborate on the unique hardware characteristics and respective application scenarios of the three HTs.

Figure 2: Conventional transceiver vs. SIM-based transceiver.
  • HT-I: Non-programmable passive SIMs operate without control circuits, thus having fixed interconnection structures once manufactured. As expected, they can manipulate the EM behaviors of the waves propagating through them without consuming extra energy during the wave-domain computation process. Fig. 1(a) illustrates a SIM (HT-I) having multi-layer diffractive metasurfaces [6], where three diffractive layers and an image encoding layer are placed closely with a layer spacing of 0.03 m. Each diffractive layer is fabricated using 3D printed VeroBlackPlus RGD875. In [6], the SIM is integrated with an ENN to construct an HOENN for implementing image reconstruction. Due to their static property, SIMs (HT-I) are cost-efficient and well-suited for performing local tasks that are not sensitive to environmental variations, e.g., DOA estimation in suburban areas.

  • HT-II: Programmable passive SIMs can be dynamically reconfigured using a field programmable gate array (FPGA) according to the environment variations and specific computing tasks. Once the transmission coefficients are configured, SIMs (HT-II) operate passively, realizing the desired function without consuming additional energy. As shown in Fig. 1(b), a programmable passive SIM is fabricated in [1] and applied to an integrated sensing and communication (ISAC) system operating at 5.8 GHz. Specifically, each meta-atom on a metasurface comprises a radiating layer, a ground plane, a receiving layer, and a bias layer, with three substrate layers in between. Moreover, each meta-atom is capable of imposing a discrete phase shift with 1-bit quantization precision, i.e., 0 or π, for shaping the wavefronts. Thanks to the programmable properties, SIMs (HT-II) showcase great potential to realize beamforming functionalities in the wave domain, thereby simplifying MIMO transceiver architectures and reducing RF-related power consumption.

  • HT-III: Programmable active SIMs are capable of reconfiguring their amplitude and phase response simultaneously by using an FPGA, where each meta-atom is integrated with an amplifier to provide a wide dynamic range to adjust the amplitude of the incident signal. As depicted in Fig. 1(c), Liu et al. [5] developed a five-layer SIM (HT-III) that operates at 5.4 GHz, with a layer spacing of 0.1 m. Specifically, each diffractive layer contains 64 meta-atoms, and each meta-atom is composed of two amplifiers and three substrate layers: two F4B layers (with 1 mm thickness) and a prepreg layer (with 0.2 mm thickness) being sandwiched between the F4B layers. Moreover, the modulation of both the amplitude and phase of the transmissive wave passing through each meta-atom is intrinsically coupled when adjusting the amplifier’s voltage supply. Benefiting from their capability of realizing nonlinear power amplification, SIMs (HT-III) have the potential to fully implement DNN functionalities in the wave domain. This enables them to handle complex tasks, such as real-time image recognition.

Moreover, SIMs can be deployed in various locations (L) in wireless networks to perform specific functions. Specifically, there are three categories for deploying SIMs in practice:

  • L-T: SIMs deployed at the transmitter (T) are capable of performing encoding, modulation, and beamforming functions directly in the wave domain, substantially improving the signal processing efficiency.

  • L-E: SIMs deployed in the environment (E) can create favorable propagation conditions, thus enhancing signal strength in desired positions and nulling interference.

  • L-R: SIMs deployed at the receiver (R) could effectively decode the received signals and extract useful information for various sensing and communication applications.

III SIM-assisted Wireless Communications

In this section, we introduce the applications of SIMs in wireless communications, particularly focusing on wave-domain beamforming, as well as channel modeling and estimation. In Fig. 2, we compare conventional wireless transceivers with SIM-assisted counterparts, taking the transmitter as an example. Specifically, as shown in Fig. 2(a), conventional MIMO transmitters rely on digital precoding to suppress inter-stream interference, which requires a high-resolution digital-to-analog converter (DAC) and an active RF chain for each transmitting antenna. Thus, in extremely large antenna array (ELAA) systems, the hardware costs and computational complexity would become prohibitive [3]. By contrast, SIM-assisted transmitters realize parallel matrix computations in the wave domain, replacing conventional digital precoding. As shown in Fig. 2(b), this advanced transmitter brings two substantial changes:

  1. Reduction in Hardware Cost: Thanks to wave-domain precoding, SIMs are capable of creating multiple parallel subchannels that are interference-free in the physical space. As a result, when considering practical modulation (e.g., binary phase shift keying), each single data stream can be transmitted via a low-resolution DAC from the corresponding transmitting antenna, and a small number of active RF chains become sufficient to implement the multi-stream transmission, which significantly reduces the hardware cost [3].

  2. Expansion of Near-Field Region: Compared to conventional antenna arrays, it becomes possible to integrate a huge number of low-cost meta-atoms on a metasurface. Therefore, each metasurface layer of a SIM has a large array aperture, which significantly enlarges the near-field region. Specifically, for a metasurface aperture of 0.5 m operating at 28 GHz, the Rayleigh distance, which distinguishes between the near and far fields, extends to 47 m. As a result, more users are located within the radiating near-field region (see Fig. 2(b)), facilitating more precise beamfocusing.
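The 47 m figure quoted for the near-field boundary can be checked with a short calculation. The sketch below assumes the standard Rayleigh (Fraunhofer) distance formula, 2D²/λ, for an aperture of size D; the function name and constant are illustrative and not taken from the cited works.

```python
# Rayleigh (Fraunhofer) distance d = 2 * D^2 / wavelength for an aperture D.
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def rayleigh_distance(aperture_m: float, frequency_hz: float) -> float:
    """Boundary between the radiating near field and the far field."""
    wavelength_m = SPEED_OF_LIGHT / frequency_hz
    return 2.0 * aperture_m**2 / wavelength_m


# Values quoted in the text: 0.5 m aperture operating at 28 GHz.
d = rayleigh_distance(aperture_m=0.5, frequency_hz=28e9)
print(f"{d:.1f} m")  # about 46.7 m, i.e. roughly 47 m
```

Because the distance grows with the square of the aperture, doubling the metasurface size at the same frequency quadruples the near-field range.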

III-A Wave-domain Beamforming

Wave-domain beamforming, which takes advantage of the parallel computation capabilities and ultra-low processing latency afforded by the multi-layer diffractive structure of the SIM, has been shown to be a promising solution for enabling next-generation wireless networks [3]. However, achieving the desired wave-domain computation functionalities requires appropriately tuning the SIM transmission coefficients. To this end, Liu et al. [8] customized a deep reinforcement learning (DRL) approach to jointly optimize the SIM phase shifts and transmit power allocation. Considering practical modulation, Perovic and Tran [9] developed a projected gradient descent (PGD) method to successively optimize the phase shifts of SIMs at the transceivers layer-by-layer for minimizing the channel cutoff rate. In addition, Papazafeiropoulos et al. [10] proposed a hybrid digital and wave-domain MIMO transceiver architecture, where the SIM is utilized to enhance the performance of the conventional analog precoding and combining components relying on a single-layer metasurface. Specifically, a projected gradient ascent (PGA) method was proposed to simultaneously optimize the phase shifts of dual SIMs along with digital transmit precoding and receive combining to maximize the achievable rate.

Figure 3: SIMs for interference cancellation, where we consider four users in the near field.

Again, with the utilization of large-aperture metasurfaces, it is possible to realize near-field beamfocusing for serving multiple users in the same angular direction. Fig. 3 verifies the beamfocusing capability of a SIM having a layer spacing of 3 mm, where we consider 225 meta-atoms per layer operating at 10 GHz. Moreover, four users are deployed in the boresight direction of the SIM, with a spacing of 1.5 m. As shown in Fig. 3(a), the conventional digital zero-forcing (ZF) precoding scheme accurately directs multiple beams toward the users’ locations. Fig. 3(b) shows the capability of a SIM to realize beamfocusing in the wave domain. Specifically, a single-layer metasurface can only focus on approximately two users due to its limited tuning capability on the incident waves. By contrast, increasing the number of layers can achieve comparable performance with the ZF precoder, benefiting from the interference cancellation capabilities brought by the multi-layer SIM.
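For reference, the digital ZF benchmark that the multi-layer SIM is compared against can be sketched in a few lines. This is a minimal illustration only: it assumes a hypothetical i.i.d. complex Gaussian channel with made-up dimensions rather than the near-field geometry of Fig. 3, and it omits power normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_antennas = 4, 16  # hypothetical sizes, not those of Fig. 3

# Hypothetical i.i.d. complex Gaussian channel, one row per user.
H = (rng.standard_normal((num_users, num_antennas))
     + 1j * rng.standard_normal((num_users, num_antennas))) / np.sqrt(2)

# ZF precoder: right pseudo-inverse of H, so that H @ W is the identity
# and every user sees only its own data stream.
W = H.conj().T @ np.linalg.inv(H @ H.conj().T)

effective = H @ W
# Largest off-diagonal magnitude = residual inter-user interference.
interference = np.abs(effective - np.diag(np.diag(effective))).max()
print(interference < 1e-8)
```

A SIM-based transmitter aims to approximate the same interference-free effective channel, but with the matrix product realized by wave propagation through the layers instead of a digital multiplication.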

III-B Channel Modeling

Channel modeling is crucial for precisely characterizing the phase and amplitude variations of EM waves through the multi-layer architecture of the SIM, as well as their spatial distributions during wireless propagation. To this end, An et al. [3] employed Rayleigh-Sommerfeld diffraction theory to model the channels between adjacent layers, which depend on the interlayer spacing, radio frequency, and spatial distance between meta-atoms. Additionally, the wireless channel from the transmitter to the receiver was characterized using a spatially correlated Rayleigh fading model to account for the effects of densely arranging meta-atoms [3]. Furthermore, Nerini and Clerckx [11] utilized multiport network theory to develop a physically consistent model for describing SIM-assisted communications. Specifically, they modeled the EM coupling between transmitting antennas based on source voltages and impedances, while also accounting for the coupling between metasurface layers. By sequentially modeling and cascading the interaction of multiple metasurface layers on the propagating EM waves, the transfer function of the SIM could be accurately predicted.
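To make the cascaded structure concrete, the following sketch builds an inter-layer propagation matrix and multiplies it with per-layer phase-shift matrices. It assumes the Rayleigh-Sommerfeld coefficient in the form commonly used in the SIM literature, w = (A cos χ / r)(1/(2πr) − j/λ) exp(j2πr/λ), with A the meta-atom area, r the distance between two meta-atoms on adjacent layers, and χ the angle to the layer normal. The grid size, spacing, and function names are hypothetical, identical layers are assumed, and the propagation from the antennas to the first layer is omitted.

```python
import numpy as np


def layer_grid(side: int, pitch: float) -> np.ndarray:
    """(x, y) centres of a side-by-side square grid of meta-atoms."""
    idx = (np.arange(side) - (side - 1) / 2) * pitch
    xx, yy = np.meshgrid(idx, idx, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=1)


def rs_propagation(side, pitch, spacing, wavelength):
    """W[n, m]: coefficient from atom m on one layer to atom n on the next."""
    xy = layer_grid(side, pitch)
    diff = xy[:, None, :] - xy[None, :, :]
    r = np.sqrt(np.sum(diff**2, axis=-1) + spacing**2)  # atom-to-atom distance
    cos_chi = spacing / r                               # angle to layer normal
    area = pitch**2                                     # meta-atom area (assumed)
    return (area * cos_chi / r) * (1 / (2 * np.pi * r) - 1j / wavelength) \
        * np.exp(2j * np.pi * r / wavelength)


def sim_response(phases, W):
    """Cascade Phi_L W Phi_{L-1} ... W Phi_1 for phases of shape (L, N)."""
    G = np.diag(np.exp(1j * phases[0]))
    for theta in phases[1:]:
        G = np.diag(np.exp(1j * theta)) @ W @ G
    return G


wavelength = 0.03  # 10 GHz carrier (hypothetical example value)
W = rs_propagation(side=4, pitch=wavelength / 2,
                   spacing=wavelength / 2, wavelength=wavelength)
rng = np.random.default_rng(0)
phases = rng.uniform(0, 2 * np.pi, size=(3, 16))  # 3 layers, 16 atoms each
G = sim_response(phases, W)
print(G.shape)  # (16, 16)
```

The resulting matrix G is linear in the incident field, which is the property that later motivates pairing the SIM with an electronic network.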

In theory, precisely describing the EM propagation through the SIM requires solving Maxwell’s equations, which is computationally challenging. Numerical methods such as finite element analysis can be employed to model the energy distribution of EM waves. Additionally, the absorption and polarization losses as EM waves pass through the multiple layers should be considered in the channel modeling. In summary, developing a parametric channel model that is consistent with empirical measurements is of primary importance for designing appropriate SIM hardware structures to mitigate adverse diffractive behaviors and for optimizing the transmission coefficients to achieve desired analog computing functions.

III-C Channel Estimation

Channel estimation is essential for the efficient operation of SIM-assisted communication systems. However, the large number of meta-atoms on each layer and various inherent limitations in practical SIM hardware architectures present a significant challenge for channel estimation. As shown in Fig. 2(b), SIMs generally cannot probe the channel associated with each meta-atom due to the fact that the number of RF chains at the transmitter is smaller than that of meta-atoms in the output layer of the SIM [12, 13]. This would lead to an underdetermined channel estimation problem. Additionally, as the near-field region enlarges, it becomes urgent to design effective near-field channel estimation schemes.

To address the underdetermined problem, Yao et al. [12] suggested collecting multiple copies of the uplink pilot signals to probe the wireless channels. Additionally, the phase shifts of the SIM in each time slot were carefully configured to further improve the channel estimation accuracy [12, 13]. However, these channel estimation schemes may incur excessive pilot overhead, which is proportional to the ratio of the number of meta-atoms to the number of RF chains. A promising direction is to leverage channel sparsity in both the angular and polar domains and to apply compressed sensing techniques, so that high-dimensional channels in SIM-assisted systems can be estimated with moderate pilot overhead. In that case, the phase shifts of the SIM can be configured so that the resulting sensing matrix satisfies the restricted isometry property.
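The underdetermined problem and the multi-slot remedy can be illustrated with a toy least-squares example. This is a sketch under simplifying assumptions, not the estimators of [12, 13]: the per-slot SIM response is replaced by a hypothetical random matrix, the pilot symbol is one, and noise is ignored.

```python
import numpy as np

rng = np.random.default_rng(1)
num_atoms, num_rf = 16, 4           # hypothetical: 16 meta-atoms, 4 RF chains
num_slots = num_atoms // num_rf     # pilot slots needed for a determined system


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


h_true = crandn(num_atoms)          # channel seen at the SIM's meta-atoms

# Hypothetical stand-in for the SIM response under a different phase
# configuration in each pilot slot (num_rf observations per slot).
A_slots = [crandn(num_rf, num_atoms) for _ in range(num_slots)]

# A single slot gives only num_rf equations for num_atoms unknowns.
rank_single = np.linalg.matrix_rank(A_slots[0])

# Stacking the slots yields num_slots * num_rf = num_atoms equations.
A = np.vstack(A_slots)
y = A @ h_true                      # noiseless pilot observations
h_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(rank_single, np.allclose(h_hat, h_true, atol=1e-6))
```

Here the pilot overhead is num_atoms / num_rf = 4 slots, which mirrors the scaling noted above and shows why sparsity-aware methods are attractive when the number of meta-atoms is large.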

TABLE I: A survey of recent advances in SIM-assisted wireless sensing and communication.

| Ref. | Application | Scenario | Link direction | SIM’s location | HT | Objective function | Algorithms |
|---|---|---|---|---|---|---|---|
| [1] | Wave-domain beamforming | ISAC | Downlink | L-E | HT-II | Cramér-Rao bound | AO, SDR |
| [3] | Wave-domain beamforming | Single-user MIMO | Downlink | L-T, L-R | HT-II | Fitting error | GD |
| [8] | Wave-domain beamforming | Multi-user MISO | Downlink | L-T | HT-II | Sum rate | DRL |
| [9] | Wave-domain beamforming | Single-user MIMO | Downlink | L-T, L-R | HT-II | Channel cutoff rate | AO, PGD |
| [10] | Wave-domain beamforming | Single-user MIMO | Downlink | L-T, L-R | HT-II | Sum rate | PGA |
| [11] | Wave-domain beamforming | SISO | Downlink | L-T | HT-II | Channel gain | Analytical solution |
| [14] | Wave-domain beamforming | Cell-free network | Uplink | L-R | HT-II | Channel gain | AO, analytical solution |
| [15] | Wave-domain beamforming | Multi-user MISO | Uplink | L-R, L-E | HT-II | Spectral efficiency | PGA |
| [12] | Channel estimation | Multi-user MISO | Uplink | L-R | HT-II | Mean squared error | Codebook |
| [13] | Channel estimation | Multi-user MISO | Uplink | L-R | HT-II | Mean squared error | GD |
| [5] | Image recognition | Multiple probes | - | L-T | HT-III | Cross-entropy error | Reinforcement learning |
| [6] | Image reconstruction | Single probe | - | L-T | HT-I | Mean absolute error, reversed Huber loss | GD |
| [7] | DOA estimation | Single source | - | L-R | HT-II | Fitting error | GD |

MISO: Multiple-input single-output. SISO: Single-input single-output. SDR: Semidefinite relaxation. GD: Gradient descent.

III-D Application Scenarios

Indeed, SIMs can be deployed in various wireless networks by leveraging their wave-domain beamforming capabilities. This simplifies system architectures and improves network energy efficiency. For instance, Li et al. [14] explored the integration of SIMs with distributed access points in cell-free networks, which significantly reduces power consumption. Additionally, designing various analog signal processors in the wave domain for implementing specific functions is now possible. In [7], An et al. utilized a SIM deployed in front of the receiving array to perform 2D discrete Fourier transform (DFT) in the wave domain and generated the angular spectrum for efficient DOA estimation. Furthermore, Wang et al. [1] demonstrated the potential of SIMs in generating multiple beams toward multiple users and an extended target for dual functions. When deploying SIMs in the propagation environment, their non-diagonal response can enhance the capability of customizing wireless channels [15].

Additionally, thanks to their energy-efficient signal processing capabilities, SIMs are particularly suited for internet-of-things networks, where they can extend the battery life of connected devices. Moreover, the integration of SIMs into edge computing systems enables the efficient preprocessing of raw data at the edge, thereby relieving the burden on central processing units and minimizing processing latency. These diverse applications demonstrate the versatility and potential of SIMs in significantly reducing the computational burden of next-generation wireless networks. To elaborate, Table I summarizes the state-of-the-art research on SIMs.

IV Hybrid Optical-Electronic Neural Network

Although SIM-based ONNs have shown their capability in simple image recognition tasks at the speed of light [1, 5, 6], their inference capability remains limited due to the inherent linearity of the transfer function. To further enhance the inference capability, the authors in [6] cascaded an ENN with a SIM-based ONN to construct an HOENN, yielding two primary advantages. First, the power-efficient ONN preprocesses EM waves directly, effectively reducing the network scale requirements for the ENN and alleviating the system’s computation load. Second, by extracting the key information hidden in the phase characteristics of the incident EM wave and transforming it into the amplitude characteristics of the transmissive EM waves, power-efficient envelope detectors become adequate at the receiver to enable the operation of the ENN. As a result, HOENNs significantly decrease RF-related power consumption and hardware complexity while compensating for the limited inference capability of diffractive neural networks. Next, we examine two specific sensing tasks, disaster monitoring and DOA estimation, to illustrate the efficacy of the SIM-based HOENN.
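The division of labor inside an HOENN can be sketched as a three-stage forward pass: a linear complex-valued map for the SIM, envelope detection, and a small electronic layer. The example below is purely illustrative; the dimensions are hypothetical, the SIM is replaced by a random complex matrix, and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(2)
num_in, num_detectors, num_classes = 16, 8, 4  # hypothetical sizes

# ONN stage: a fixed complex linear map standing in for the SIM's
# transfer function (random here, not an optimized SIM).
G = (rng.standard_normal((num_detectors, num_in))
     + 1j * rng.standard_normal((num_detectors, num_in))) / np.sqrt(2 * num_in)

# ENN stage: a single fully-connected layer with untrained weights.
weights = rng.standard_normal((num_classes, num_detectors))
bias = np.zeros(num_classes)


def hoenn_forward(x: np.ndarray) -> np.ndarray:
    """Class probabilities for a complex input field x."""
    field = G @ x                    # linear wave-domain processing
    envelope = np.abs(field)         # envelope detection: the nonlinear step
    logits = weights @ envelope + bias
    logits = logits - logits.max()   # numerically stable softmax
    p = np.exp(logits)
    return p / p.sum()


x = rng.standard_normal(num_in) + 1j * rng.standard_normal(num_in)
probs = hoenn_forward(x)
print(probs.shape, np.isclose(probs.sum(), 1.0))
```

Since the detectors observe only magnitudes, any phase information useful for inference has to be converted into amplitude differences by the wave-domain stage before detection, which is the role of the SIM described above.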

Figure 4: (a) The probability of disaster identified by the HOENN; (b) Illustration of an HOENN.

IV-A Disaster Monitoring

As depicted in Fig. 4, we consider a scenario where a SIM is mounted on an unmanned aerial vehicle (UAV) for monitoring natural disasters. Specifically, three types of natural disasters (fire, landslide, and flood) and an ordinary landscape are considered in Fig. 4(a), while the schematic of the sensing system is shown in Fig. 4(b). At the UAV, the geomorphic photo is first encoded into the transmission coefficients of the input layer of the SIM. The remaining SIM layers construct an ONN to extract information from the captured geomorphic photo. Note that the SIM processes the information-carrying EM waves at light speed, substantially reducing the processing delay compared to conventional neural networks. As a result, the UAV is capable of transmitting signals with reduced hardware cost and power consumption, thus enhancing its endurance. At the ground receiving station (GRS), the signals are received by an energy detector and then fed into an ENN, which further processes the signals and produces the recognition results. The HOENN inherits the benefits of the power-efficient preprocessing capability of the SIM and the strong inference capabilities of conventional neural networks. As shown in Fig. 4(a), the HOENN can correctly recognize the geomorphic photos of natural disasters.

IV-B DOA Estimation

SIMs can generate the angular spectrum of the incident EM signals by performing DFT operations in the wave domain [7]. However, a large number of observational snapshots are required to improve the accuracy of the DOA estimation [7], which is unsuitable for high-speed mobility scenarios. By contrast, HOENNs with strong inference capabilities can circumvent this drawback. Specifically, the SIM-based ONN extracts spatial distribution characteristics of EM waves received by the large-aperture metasurface for mitigating the ambiguity brought by power measurements. Following this, the ENN is capable of generating a high-precision angular spectrum according to the magnitude signal observed at the receiving array. Next, we examine the rough DOA estimation of a single radiation source in the far field. The DOA estimation is modeled as a multi-classification problem and can be addressed by employing an HOENN. In this case study, the entire half space is uniformly divided into 8 × 8 subregions. A SIM deployed in front of the receiving array is used as the ONN, and a single fully-connected layer is used as the ENN component of the HOENN.

Figs. 5(a) and (b) display the angular spectra generated by the HOENN. It is evident that the peaks can be perfectly detected in the corresponding regions. Moreover, Fig. 5(c) illustrates the classification accuracy of the HOENN versus the received signal-to-noise ratio (SNR), where the ONN and ENN are also utilized for performance comparison. For the ONN system, each receiving antenna corresponds to a specific DOA region. The DOA is estimated by detecting the antenna with the highest received power. For the ENN system, all phase shifts of the SIM are configured randomly. The specific network parameters are detailed in Fig. 5(d). As shown in Fig. 5(c), HOENN exhibits a distinct enhancement in terms of classification accuracy compared to the benchmarks, indicating the gain of cascading these two complementary components.
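As background for the wave-domain DFT that underlies the ONN benchmark, the following sketch shows how a 2D DFT followed by power detection maps a single far-field source to one of 8 × 8 bins. It is a simplified digital stand-in under stated assumptions: a half-wavelength-spaced 8 × 8 array, a noiseless source placed exactly on the DFT grid, and electrical angles u and v defined here for illustration only (they are not the paper's angle convention or its subregion partition).

```python
import numpy as np

N = 8                      # 8 x 8 array, matching the 8 x 8 grid of regions
k0, l0 = 1, 2              # hypothetical on-grid source bin
u, v = 2 * k0 / N, 2 * l0 / N   # assumed electrical angles (direction cosines)

# Far-field response of a half-wavelength-spaced planar array:
# element (m, n) sees a phase of pi * (m * u + n * v).
m = np.arange(N)
signal = np.exp(1j * np.pi * (u * m[:, None] + v * m[None, :]))

# 2D DFT followed by power detection, as an energy detector would observe.
spectrum = np.abs(np.fft.fft2(signal)) ** 2

# The strongest bin identifies the DOA region.
peak = np.unravel_index(np.argmax(spectrum), spectrum.shape)
print(tuple(int(i) for i in peak))  # (1, 2)
```

With noise or off-grid sources the energy spreads across neighboring bins, which is one way to read the accuracy gap between the ONN-only benchmark and the HOENN in Fig. 5(c).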

In a nutshell, utilizing SIMs in HOENN can reduce the energy consumption of conventional neural networks in processing complex tasks, enabling the sensing system to quickly identify and extract critical features [5, 6]. Additionally, by introducing ENNs to complement the SIMs, the HOENN exhibits enhanced inference capabilities compared to using the SIM alone [6]. Furthermore, the SIM-based HOENN can be integrated into wireless networks to enable integrated artificial intelligence and communication as well as hyper-reliable low-latency communications, which are two typical usage scenarios identified by IMT-2030. For instance, in the autonomous driving scenario, the SIM can be installed on a car sunroof to implement high-precision positioning and speed measurement tasks in real time.

Figure 5: Simulation results of utilizing an HOENN for DOA estimation. (a) The azimuth angle is 120°, and the elevation angle is 60°. (b) The azimuth angle is 240°, and the elevation angle is 30°. (c) The classification accuracy versus received SNR using different networks. (d) Simulation setup.

V Implementation Challenges of SIMs

In this section, we present some theoretical and practical challenges of implementing SIMs in wireless sensing and communication systems.

V-A Practical Hardware Imperfections

Challenges: Efficient SIM design relies on accurate system models. However, several practical factors may cause hardware imperfections, such as meta-atom inconsistencies due to manufacturing or assembly deformations, bias voltage jitter of control circuits [5], and coupling effects between adjacent meta-atoms and metasurfaces. Additionally, the inter-layer propagation coefficients may also deviate from their ideal values. All of these hardware imperfections will lead to the performance degradation of SIM in implementing wave-domain signal processing and affect the inference capability of HOENNs.

Solutions: The inter-layer propagation coefficients can be calibrated by measuring the practical radiation patterns of SIMs and designing appropriate compensation techniques. Additionally, leveraging or mitigating the coupling effects requires establishing a more precise physically consistent model [11]. Moreover, it is crucial to enhance the robustness of wave-domain computation. For example, during the training of HOENNs, incorporating a set of training samples that take into account the distributions of various hardware imperfections may improve HOENNs’ robustness [6].

V-B Efficient SIM Configuration

Challenges: In SIM-assisted wireless sensing and communication systems, developing efficient SIM configuration algorithms according to dynamic environments and tasks constitutes a key challenge. Generally, achieving optimal performance requires jointly designing the SIM phase shifts and resource allocation strategies. Such complex non-convex optimization problems with multiple coupled variables are often decomposed into subproblems using alternating optimization (AO), which may lead to slow convergence, high computational complexity, and sensitivity to the initial point [10, 15]. Moreover, the practical constraints on the phase shift resolution of meta-atoms further increase the difficulty in training the HOENN with quantized neural network parameters.

Solutions: One straightforward approach to solving the non-convex discrete optimization problem is to relax it into a tractable convex problem. The obtained solution is then projected onto the feasible set that satisfies the practical hardware constraints. Moreover, to alleviate the computational complexity of optimizing a large number of coupled phase shifts, efficient machine learning (ML) techniques such as DRL can be utilized.
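
The projection step of the relax-then-project approach can be sketched as follows, assuming that the hardware constraint is a uniform b-bit phase quantizer. The function name `project_to_discrete_phases` and the uniform level set are hypothetical choices made for illustration only.

```python
import numpy as np


def project_to_discrete_phases(phases, num_bits):
    """Map continuous phases to the nearest level of a uniform phase set.

    The feasible set is assumed to be {0, step, ..., (2**num_bits - 1) * step}
    with step = 2*pi / 2**num_bits. Rounding to the nearest multiple of the
    step and wrapping the index keeps the result inside [0, 2*pi).
    """
    num_levels = 2 ** num_bits
    step = 2.0 * np.pi / num_levels
    indices = np.round(np.asarray(phases, dtype=float) / step).astype(int) % num_levels
    return indices * step
```

For instance, with `num_bits=1` the feasible phases are 0 and π, so a relaxed solution of 3.0 rad is mapped to π. The wrapped quantization error is at most half a step, which shrinks as the resolution grows.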

V-C Performance Analysis

Challenges: To characterize the fundamental performance limits of SIMs, it is necessary to establish a theoretical analysis framework for evaluating the trade-offs among computing capability, energy consumption, and complexity. The coupling of the transmission coefficients across multiple layers in the resulting optimization problems hinders the derivation of closed-form solutions. When evaluating HOENNs, it is also crucial to accurately model the nonlinear component and the errors caused by signal conversion.

Solutions: Matrix decomposition is essential for characterizing the capability of SIMs to perform matrix operations through the cascaded multiplication of wave propagation matrices and transmission coefficient matrices. Additionally, explainable HOENNs may help in understanding the extracted semantic information and in pursuing the fundamental performance limits of wireless sensing and communication applications.
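
One simple way to quantify how well a given cascaded SIM response realizes a target matrix operation is a scaled Frobenius-norm fitting error. The sketch below assumes that a single complex scaling factor may be absorbed elsewhere in the system. This metric and the name `fitting_error` are illustrative assumptions rather than the analysis framework of any cited work.

```python
import numpy as np


def fitting_error(g, target):
    """Relative error min_beta ||beta * g - target||_F / ||target||_F.

    g:      (M, N) complex end-to-end SIM response, i.e. the product of the
            propagation and transmission coefficient matrices.
    target: (M, N) complex matrix that the SIM is meant to realize.
    The optimal complex scale beta has the least-squares closed form below.
    """
    g = np.asarray(g, dtype=complex)
    target = np.asarray(target, dtype=complex)
    # np.vdot conjugates its first argument, giving the inner product <g, target>.
    beta = np.vdot(g, target) / np.vdot(g, g).real
    return np.linalg.norm(beta * g - target) / np.linalg.norm(target)
```

An error near zero indicates that the response matches the target up to a scalar, whereas an error of one means the response carries no component along the target.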

VI Conclusion

SIMs have shown great potential in wireless sensing and communication systems due to their capability of efficiently performing computation in the wave domain at the speed of light. In this article, we provided an overview of the prospective applications and challenges of SIMs. We elaborated on the operational principles and existing hardware architectures of SIMs. Furthermore, we discussed the applications of SIMs in wave-domain beamforming as well as in channel modeling and estimation. Then, the HOENN, an extended concept of SIMs, was developed and verified by examining two tasks: disaster monitoring and DOA estimation. Finally, we identified several implementation challenges, including hardware imperfections, efficient SIM configuration, and performance analysis. In a nutshell, SIM technology presents a promising paradigm for realizing low-latency, power-efficient computation, with extensive research opportunities ahead.

References

  • [1] Z. Wang, H. Liu, J. Zhang, R. Xiong, K. Wan, X. Qian, M. Di Renzo, and R. C. Qiu, “Multi-user ISAC through stacked intelligent metasurfaces: New algorithms and experiments,” arXiv preprint arXiv:2405.01104, 2024.
  • [2] X. Zhai, X. Chen, J. Xu, and D. W. K. Ng, “Hybrid beamforming for massive MIMO over-the-air computation,” IEEE Trans. Commun., vol. 69, no. 4, pp. 2737–2751, Apr. 2021.
  • [3] J. An, C. Xu, D. W. K. Ng, G. C. Alexandropoulos, C. Huang, C. Yuen, and L. Hanzo, “Stacked intelligent metasurfaces for efficient holographic MIMO communications in 6G,” IEEE J. Sel. Areas Commun., vol. 41, no. 8, pp. 2380–2396, Aug. 2023.
  • [4] R. Deng, B. Di, H. Zhang, D. Niyato, Z. Han, H. V. Poor, and L. Song, “Reconfigurable holographic surfaces for future wireless communications,” IEEE Wireless Commun., vol. 28, no. 6, pp. 126–131, Dec. 2021.
  • [5] C. Liu, Q. Ma, Z. J. Luo, Q. R. Hong, Q. Xiao, H. C. Zhang, L. Miao, W. M. Yu, Q. Cheng, L. Li et al., “A programmable diffractive deep neural network based on a digital-coding metasurface array,” Nat. Electron., vol. 5, no. 2, pp. 113–122, Feb. 2022.
  • [6] J. Li, D. Mengu, N. T. Yardimci, Y. Luo, X. Li, M. Veli, Y. Rivenson, M. Jarrahi, and A. Ozcan, “Spectrally encoded single-pixel machine vision using diffractive networks,” Sci. Adv., vol. 7, no. 13, p. eabd7690, Mar. 2021.
  • [7] J. An, C. Yuen, Y. L. Guan, M. Di Renzo, M. Debbah, H. V. Poor, and L. Hanzo, “Two-dimensional direction-of-arrival estimation using stacked intelligent metasurfaces,” IEEE J. Sel. Areas Commun., to appear.
  • [8] H. Liu, J. An, D. W. K. Ng, G. C. Alexandropoulos, and L. Gan, “DRL-based orchestration of multi-user MISO systems with stacked intelligent metasurfaces,” arXiv preprint arXiv:2402.09006, 2024.
  • [9] N. S. Perović and L.-N. Tran, “Mutual information optimization for SIM-based holographic MIMO systems,” arXiv preprint arXiv:2403.18307, 2024.
  • [10] A. Papazafeiropoulos, J. An, P. Kourtessis, T. Ratnarajah, and S. Chatzinotas, “Achievable rate optimization for stacked intelligent metasurface-assisted holographic MIMO communications,” IEEE Trans. Wireless Commun., pp. 1–14, May 2024, Early Access.
  • [11] M. Nerini and B. Clerckx, “Physically consistent modeling of stacked intelligent metasurfaces implemented with beyond diagonal RIS,” IEEE Commun. Lett., pp. 1–5, May 2024, Early Access.
  • [12] X. Yao, J. An, L. Gan, M. Di Renzo, and C. Yuen, “Channel estimation for stacked intelligent metasurface-assisted wireless networks,” IEEE Wireless Commun. Lett., vol. 13, no. 5, pp. 1349–1353, May 2024.
  • [13] Q.-U.-A. Nadeem, J. An, and A. Chaaban, “Hybrid digital-wave domain channel estimator for stacked intelligent metasurface enabled multi-user MISO systems,” arXiv preprint arXiv:2309.16204, 2023.
  • [14] Q. Li, M. El-Hajjar, C. Xu, J. An, C. Yuen, and L. Hanzo, “Stacked intelligent metasurfaces for holographic MIMO aided cell-free networks,” IEEE Trans. Commun., pp. 1–13, May 2024, Early Access.
  • [15] A. Papazafeiropoulos, P. Kourtessis, and S. Chatzinotas, “Performance of double-stacked intelligent metasurface-assisted multiuser massive MIMO communications in the wave domain,” arXiv preprint arXiv:2402.16405, 2024.