Gemini: Integrating Full-fledged Sensing upon Millimeter Wave Communications

Yilong Li¹, Zhe Chen², Jun Luo³, Suman Banerjee¹
¹University of Wisconsin-Madison, USA; ²Fudan University, China; ³Nanyang Technological University, Singapore
Abstract.

Integrating millimeter wave (mmWave) technology in both communication and sensing is promising, as it enables the reuse of existing spectrum and infrastructure without draining resources. Most existing systems piggyback sensing onto conventional communication modes and are therefore not full-fledged: they do not fully exploit the potential of integrated sensing and communication (ISAC) in mmWave radios. In this paper, we design and implement Gemini, a full-fledged mmWave ISAC system that delivers raw channel states to serve a broad category of sensing applications. We first propose a mmWave self-interference cancellation approach to extract the weak reflected signals for near-field sensing. Then, we develop a joint optimization scheduling framework that supports accurate radar sensing while maximizing the communication throughput. Finally, we design a unified fusion sensing algorithm that improves sensing performance by combining monostatic and bistatic modes. We evaluate our system in extensive experiments to demonstrate Gemini’s capability of simultaneously operating sensing and communication, enabling mmWave ISAC to perform better than the commercial off-the-shelf mmWave radar for 5G cellular networks.

Conference: The 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom’24), September 30 – October 4, 2023, Washington, D.C., USA

1. Introduction

The past decade has witnessed significant progress in millimeter wave (mmWave) technology, since it offers wider bandwidth and more antennas (e.g., 2 GHz bandwidths and 16-element antennas (mmFLEX-MobiSys20, )) compared with commodity sub-10 GHz technologies (e.g., Wi-Fi 6 (802_11ax, ; wifi6, )). On the one hand, these features enable mmWave communications to support a wide range of high-throughput applications, such as ultra HD video streaming and virtual and augmented reality (MoVR-NSDI17, ; TengVR-MobiCom17, ). On the other hand, the same features also endow mmWave sensing with high spatial resolution, making mmWave radars necessary components of smart vehicles and robots (mmV2X-MobiCom20, ; milliMap-MobiSys20, ; radatron-ECCV22, ). While progress in communication and sensing used to be fairly independent, both academia and industry have started exploring the promising integration of sensing and communication (ISAC) for the next generation of mmWave systems (SPARCS-IPSN22, ; liu2018toward, ), which enables the reuse of existing spectrum and infrastructure and avoids the cost of new hardware.

A number of theoretical proposals on mmWave ISAC systems for 6G and beyond have appeared (fan6GISAC-JSAC22, ), but they focus mainly on waveform design rather than on guiding practical systems that merge one function (e.g., sensing) into devices designed for another purpose (e.g., mmWave Wi-Fi (mmtrack-INFOCOM20, ; mmeye-IoTJ, )). Among the few existing mmWave ISAC systems, most are designed to piggyback sensing onto communication infrastructure, hence confining sensing to only the multi-static communication setting with transmitter (Tx) and receiver (Rx) physically separated (SPARCS-IPSN22, ). Only one SDR-based system so far has emulated a radar-like monostatic sensing capability (guan-TMTT2021, ), where the Tx and Rx are co-located to enable precise range estimation, yet it has no intention of realizing ISAC: its waveform is designed only for sensing, rendering it largely incompatible with existing commodity mmWave devices (intelwigig, ; qualcommwigig, ). In a nutshell, while theoretical studies on ISAC barely contribute to the development of real systems, existing system designs remain at an adaptation phase, far from full integration.

Figure 1. Vision of a full-fledged mmWave ISAC system.

Unlike conventional sub-10 GHz Wi-Fi devices, which often have only a couple of antennas, mmWave communication systems equip both their access point (AP) and user equipment (UE) with multiple phased arrays (a.k.a. hybrid beamforming for massive MIMO (RobertsFDmmWave, ; ISAChybridbeamforming-ICC22, )). As a result, the communication links in typical mmWave ISAC scenarios (Figure 1) are highly directional, making them sub-optimal for the multi-static sensing setting where sensing subjects are often off the link. Therefore, a full-fledged mmWave ISAC system would need to better leverage the beamforming capability to handle sensing and communication in a truly integrated manner. In particular, the beam patterns should be designed to support three functions: i) pure communication, ii) pure sensing under monostatic (reflection) sensing mode, and iii) simultaneous communication and sensing under monostatic and multi-static modes, as illustrated in Figure 1.

Implementing such a truly integrated mmWave ISAC system faces three major challenges. First, unlike long-range sensing, where direct Tx interference can be readily removed thanks to the fine-grained temporal resolution resulting from mmWave’s wide bandwidth, the short-range sensing commonly adopted for indoor scenarios has to cope with Tx interference that potentially overwhelms the reflected sensing signals (isacot, ). Second, though beam scheduling exists for mmWave communications, satisfying the three functions of mmWave ISAC demands a substantially enhanced fair scheduling algorithm that weighs all necessary beam patterns. Last but not least, both monostatic and multi-static modes may coexist and thus generate complementary sensing information concerning the same subject, incurring the need for a unified framework to fuse such diversified information in a constructive manner.

To this end, we build Gemini as a system that effectively integrates full-fledged sensing and communication at the mmWave band. Gemini employs both smart beamforming and a deep neural model to cancel the 2 GHz wideband Tx interference; this allows the weak reflected signals from sensing subjects to be effectively extracted for short-range sensing. Extending the idea of smart beamforming, Gemini further introduces a set-cover-inspired beam scheduling algorithm to satisfy both sensing accuracy and communication throughput. Finally, Gemini is equipped with a distributed fusion mechanism for unified estimation, leveraging diversified information gathered from both monostatic and multi-static sensing modes. All these are wrapped into a mmWave ISAC protocol largely compatible with existing 802.11ay (802_11ay, ), implemented upon the Sivers IMA EVK06003 (sivers, ), which operates in the 60 GHz band and is equipped with 16-element phased arrays. Our major contributions are summarized as follows:

  • To the best of our knowledge, Gemini is the first mmWave ISAC system with a full-fledged sensing capability integrated with the default communication function.

  • We propose an interference cancellation scheme driven by a deep neural model to combat the wideband and non-linear Tx interference to short-range sensing.

  • We invent an application-aware beam scheduling algorithm for jointly optimizing sensing accuracy and communication throughput.

  • We design a unified estimation framework to leverage the sensing diversity offered by both monostatic and multi-static modes.

  • We evaluate our Gemini prototype with extensive experiments. The results confirm that Gemini gains full-fledged mmWave ISAC capability under realistic scenarios, such as point cloud and human tracking.

Note that Gemini delivers raw channel states to enable various sensing applications; it is not meant for any specific sensing purpose. The rest of our paper is organized as follows. Section 2 provides background and motivations for mmWave ISAC systems. Section 3 presents the critical components of Gemini. Section 4 specifies how the prototype is implemented. Section 5 reports the evaluation results on different applications. We briefly discuss the general literature on mmWave platforms and on ISAC in Section 6, where we also explain limitations and future directions of Gemini. Finally, Section 7 concludes our paper.

2. Background and Motivations

We start with the background on mmWave communication, and we then provide motivating examples to concretely demonstrate the challenges faced by the mmWave ISAC system design.

2.1. Basics of mmWave Communication

The following procedure summarizes default mmWave communication of IEEE 802.11ay (802_11ay, ).

  • Carrier Sense: To avoid collisions, mmWave devices (including the AP and UEs) configure quasi-omni-directional patterns on their Rx phased arrays and follow the listen-before-talk principle.

  • Sector Sweep: This phase determines the alignment direction of the main beams between the AP and each UE. If the channel is idle, the AP broadcasts sector sweep (SSW) frames with different narrow beams to scan all bearings, and the UEs respond with signal strength indicators.

  • MIMO Setup & Training: The AP selects UEs intended for transmissions, and sends special frames to notify them. This is followed by the AP transmitting training (TRN) sequences to these UEs for channel state information (CSI) estimation, and the UEs report results back to the AP.

  • Beamforming: The optimal MIMO configurations are set by the AP, and beamforming is performed.

The implications of the above procedure are twofold: i) only one mmWave device is allowed to transmit at a time, due to carrier sensing and the later responses from a UE, and ii) the bearings of all UEs are obtained in the sector sweep phase. In fact, the ranges between the AP and all UEs can also be obtained via the two-way time-of-arrival ranging method (ScalingmmWave_IEEE, ). Therefore, it is reasonable to consider that all UE locations are known to the AP; this applies to all subjects too (see Section 3.1.2).
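The sector sweep phase can be illustrated with a minimal simulation. The sketch below is only an assumption-laden toy: it models a single line-of-sight path, a 16-element uniform linear array with half-wavelength spacing, and a hypothetical codebook of sectors every 5 degrees; none of these choices come from 802.11ay or from Gemini. The AP tries each sector in turn and keeps the one with the strongest received power, which also yields the UE bearing.

```python
import numpy as np

def steering_vector(num_elements, angle_deg, spacing=0.5):
    """Array response of a uniform linear array (element spacing in wavelengths)."""
    n = np.arange(num_elements)
    return np.exp(2j * np.pi * spacing * n * np.sin(np.deg2rad(angle_deg)))

def sector_sweep(num_elements, sector_angles_deg, ue_angle_deg, noise_std=0.0, rng=None):
    """Sweep all sectors; return (index of best sector, per-sector power in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    channel = steering_vector(num_elements, ue_angle_deg)  # assumed single LoS path
    powers = []
    for angle in sector_angles_deg:
        # Unit-norm beamforming weights steered towards this sector
        weights = steering_vector(num_elements, angle) / np.sqrt(num_elements)
        rx = np.vdot(weights, channel)  # conj(weights) . channel
        rx += noise_std * (rng.standard_normal() + 1j * rng.standard_normal())
        powers.append(10 * np.log10(np.abs(rx) ** 2 + 1e-12))
    powers = np.array(powers)
    return int(np.argmax(powers)), powers

sectors = np.arange(-60, 61, 5)          # hypothetical sector codebook (degrees)
best, powers = sector_sweep(16, sectors, ue_angle_deg=20.0)
print("estimated UE bearing:", sectors[best], "degrees")
```

In a real deployment the reported signal strength indicators would replace the simulated `powers` array; the selection step itself stays the same.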

2.2. Tx Interference on Short-range Sensing

To realize the radar-like monostatic sensing function, Rx needs to be co-located with Tx to receive the reflected signals incurred by transmissions (TImmWave, ), but this arrangement naturally leads to Tx interference that potentially overpowers the reflected sensing signals. For long-range monostatic sensing envisioned for 5/6G base stations (guan-TMTT2021, ) where subjects are hundreds or even thousands of meters away from the signal source, one may leverage the fine-grained range bins (TImmWave, ) offered by the wide bandwidth of mmWave to handle Tx interference. Unfortunately, short-range monostatic sensing, often used indoors, is challenged by strong transmission interference, which can spread across several bins and overwhelm the sensing signals.

Though the same challenge also exists for enabling ISAC upon sub-10 GHz IoT devices (isacot, ), the Tx interference of mmWave is more complicated due to the much higher carrier frequency and wider bandwidth. Such interference is also quite different in nature: because existing mmWave communication platforms (e.g., (mmFLEX-MobiSys20, ; MIMORPH-MobiSys21, )) often have separate Tx and Rx chains with their respective phased arrays, the Tx interference takes place in a cross-chain manner, instead of the intra-chain style for IoT devices (isacot, ). To better understand how Tx interference affects the short-range monostatic sensing performance, we take the “push-pull” hand gesture as an example and measure the signal magnitude variations caused by it, using Gemini, whose implementation will be introduced in Section 4.

Our experiments follow the 802.11ay standard (802_11ay, ): OFDM is adopted for transmission and the CSI is obtained from the TRN field of a packet. For sensing purposes, we apply the inverse FFT to each CSI to obtain the CIR (channel impulse response) as the fast-time dimension, and combine multiple CIRs as the slow-time dimension. For the sake of clarity, we specifically differentiate between Tx-beamforming and Rx-beamforming, where only the respective phased arrays are leveraged to achieve the “focusing” effect in this experiment.
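The CSI-to-CIR step can be sketched as follows. This is a minimal illustration rather than Gemini's processing code: the function name `csi_to_cir_matrix`, the 64-subcarrier layout, and the synthetic single-path channel are all assumptions made for the example. Each row of the output is one packet (slow time) and each column is one delay tap (fast time).

```python
import numpy as np

def csi_to_cir_matrix(csi_frames):
    """Convert CSI frames (num_packets x num_subcarriers) into a CIR matrix.

    Rows index slow time (packets); columns index fast time (delay taps).
    """
    csi_frames = np.asarray(csi_frames, dtype=complex)
    return np.fft.ifft(csi_frames, axis=1)

# Synthetic check: a single propagation path that falls on delay tap 5,
# observed over 4 consecutive packets.
num_subcarriers, tap = 64, 5
k = np.arange(num_subcarriers)
csi_one = np.exp(-2j * np.pi * k * tap / num_subcarriers)
csi = np.tile(csi_one, (4, 1))

cir = csi_to_cir_matrix(csi)
peak_taps = np.argmax(np.abs(cir), axis=1)
print(peak_taps)
```

Plotting `np.abs(cir)` as an image gives the kind of CIR heatmap discussed in this section, with motion showing up as variation along the slow-time axis.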

We first steer the main beams of Tx-beamforming and Rx-beamforming to the $0^{\circ}$ orientation to point at the hand, and show the heatmap of the CIR matrix in Figure 2(a). As expected, the heatmap exhibits random patterns that totally annihilate any features from the push-pull hand gesture, clearly demonstrating the damaging effect of the strong Tx interference, which saturates the Rx chain. We then use Tx and Rx beamforming to cancel the Tx interference as much as possible. As shown in Figure 2(b), although certain beamforming patterns may help “single out” the features of the push-pull hand gesture, minor residual interference still persists and affects the CIR matrix used for sensing. More importantly, as beamforming is commonly used by mmWave communication systems to enhance channel quality, borrowing the same technique for sensing purposes may cause a conflict between these two objectives, which leads to our next challenge.

Figure 2. Sensing results, in the form of CIR matrices, of the push-pull hand gesture: (a) without proper beamforming; (b) with proper beamforming.

2.3. Beam Scheduling Matters

Recall that most current mmWave research proposals treat communication and sensing independently: they either focus on improving network throughput (mmV2X-MobiCom20, ; openmili-MobiCom16, ; nullifi-NSDI21, ), or dedicate the mmWave devices to bistatic sensing (SPARCS-IPSN22, ). Only one recent proposal (SideLobe-UbiComp23, ) considers simultaneous communication and sensing, but the sensing is only piggybacked on communications by leveraging the side lobes to conduct less effective sensing: it trades latency for temporal diversity so as to enhance sensing quality, at the cost of handling only static sensing subjects. For full-fledged mmWave ISAC systems, we need to balance the needs of three functions, namely pure communication, pure sensing, and simultaneous communication and sensing. In the following, we conduct two experiments using the same setting as in Section 2.2 to demonstrate the inherent conflict among these functions.

Given a setup consisting of an AP, a subject, and a UE arranged in an isosceles triangle configuration, the performance evaluation is carried out in two distinct cases: Case 1 chooses the best alignment between the main beams of the AP and UE for communications, while Case 2 has the AP and UE beamforming towards the subject for sensing the hand gesture (only AP Tx and Rx beamforming is needed for monostatic sensing). The performance of these cases is depicted in Figure 3 for both monostatic and bistatic modes. It is evident that Case 1 gains much higher throughput than Case 2 in both sensing modes, but the situation is reversed for hand gesture sensing. Also, Figure 3(d) serves as a counterexample for the effectiveness of sidelobe sensing (SideLobe-UbiComp23, ). Apparently, the distinct beam patterns and the one-dimensional scheduling (only in bearings) are the key factors behind the performance in both cases, indicating the need for a new beam scheduling algorithm to serve the best interest of both communication and sensing under both sensing modes.

Figure 3. The communication throughput and sensing CIR heatmaps in monostatic and bistatic modes: (a) throughput (monostatic); (b) CIR matrix (monostatic); (c) throughput (bistatic); (d) CIR matrix (bistatic).

Figure 4. The reflection power and S-SNR in monostatic and bistatic modes, respectively: (a) reflection power; (b) S-SNR.

Figure 5. Architecture of Gemini with four software components and one hardware platform.

2.4. Complementary Sensing Modes

Whereas the monostatic mode has many advantages (mostly in terms of the synchrony between Tx and Rx) (isacot, ), there exist certain situations where the bistatic mode may provide complementary sensing information: this is the diversity gain achievable due to the different “viewpoints”. Given the same experimental setting as Section 2.3, we measure the reflection power and the signal-to-noise ratio for sensing (S-SNR, computed similarly to that for communication) in a setup where the subject stands sideways with the shoulder facing the AP, so as to minimize the impact of the hand gesture on the monostatic sensing signals (hence rendering the monostatic mode disadvantageous); the measurements for both monostatic and bistatic modes are shown in Figure 4. Clearly, the bistatic mode outperforms the monostatic mode in both reflection power and S-SNR, with a 3 dB gain in average power (nearly doubling the monostatic mode’s 3.3 dBm power), as shown in Figure 4(a), and more than 2 dB gain in average S-SNR over the monostatic mode’s 2.05 dB, as shown in Figure 4(b).
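As a quick consistency check on these numbers, and assuming the reported power gain of 3 is a ratio on the decibel scale relative to the monostatic average of 3.3 dBm, the "nearly doubling" statement follows from converting to linear units (the bistatic value below is derived from those two figures, not separately reported):

```latex
\[
10^{3/10} \approx 2.0, \qquad
P_{\mathrm{mono}} = 10^{3.3/10}\,\mathrm{mW} \approx 2.14\,\mathrm{mW}, \qquad
P_{\mathrm{bi}} \approx 10^{(3.3+3)/10}\,\mathrm{mW} \approx 4.27\,\mathrm{mW}.
\]
```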

These results can be explained by the relation between the motion direction of the hand and the direction of signal reflections. As motion sensing with mmWave signals relies on the signal magnitude variations caused by motion, the better the motion direction is aligned with that of reflection, the stronger the reflection variations are. In reality, the chances for an arbitrary motion direction to be aligned with either the monostatic or the bistatic sensing mode are surely higher than with only one of them, because combining distinct reflection directions improves the signal diversity. Since an ISAC system should leverage this sensing diversity to offer effective sensing capabilities, it is imperative to have a unified framework for merging monostatic and multi-static sensing modes in a constructive way.

3. Gemini: mmWave ISAC Design

Motivated by the observations made in Section 2, our Gemini design comprises five key components: i) a sensing-aware channel probing scheme, ii) a two-stage Tx interference cancellation, iii) a holistic beamforming and scheduling mechanism for both sensing and communications, iv) an algorithm to exploit the diversity in sensing modes, and v) a hardware platform to support previous components. Given the overall construction of Gemini shown in Figure 5, the first four components are held in the three leftmost blocks, and they control the remaining (mostly hardware) blocks to perform beamforming and cancel interference, aiming to satisfy the diversified ISAC requirements. In the following, we introduce the first four components respectively but postpone the platform implementation details to Section 4.

3.1. Channel Modeling and Probing

In this section, we design a probing scheme to obtain mmWave MIMO channel states for both sensing and communication, based on a properly defined channel model.

3.1.1. Modeling mmWave MIMO Channels

Compared with a conventional (communication) channel, the mmWave MIMO channels for ISAC can be far more complicated, as they include at least four components: Tx interference, monostatic sensing (reflection), communication, and multi-static sensing (reflection). In a typical indoor mmWave MIMO setting, the AP has $N$ Tx/Rx chains, each of them equipped with an $M$-element phased array. For ISAC-oriented temporal scheduling, the AP selects $N_{\mathrm{U}}$ UEs and $N_{\mathrm{S}}$ subjects (where $N = N_{\mathrm{U}} + N_{\mathrm{S}}$) from all UEs and subjects to serve at each time slot. Since the MU-MIMO via hybrid beamforming is adopted by IEEE 802.11ay (802_11ay, ), we consider the AP performing the MU-MIMO to the UEs.
We let $\bm{s}(t) = [s_{1}(t), \cdots, s_{N}(t)]$, $\bm{y}^{\mathrm{u}}(t) = [y^{\mathrm{u}}_{1}(t), \cdots, y^{\mathrm{u}}_{N_{\mathrm{U}}}(t)]$, and $\bm{y}^{\mathrm{s}}(t) = [y^{\mathrm{s}}_{1}(t), \cdots, y^{\mathrm{s}}_{N_{\mathrm{S}}}(t)]$ respectively represent the Tx signals from the AP, the Rx signals at the UEs, and the monostatic sensing signals received by the AP.
Note that $\bm{y}^{\mathrm{u}}(t)$ and $\bm{y}^{\mathrm{s}}(t)$, even when happening to the same device, are temporally separated by the MAC protocol (see Section 2.1), while $\bm{y}^{\mathrm{u}}(t)$ is the superposition of the multi-static sensing signals $\bm{y}^{\mathrm{us}}(t)$ and the pure communication signals $\bm{y}^{\mathrm{uc}}(t)$, i.e., $\bm{y}^{\mathrm{u}}(t) = \bm{y}^{\mathrm{us}}(t) + \bm{y}^{\mathrm{uc}}(t)$. In total, we can characterize the simultaneous ISAC (both AP and UE) Rx signals by:

\begin{align}
\bm{y}(t) &= [\bm{y}^{\mathrm{u}}(t), \bm{y}^{\mathrm{s}}(t)]' \tag{1} \\
&= \bm{W}_{\mathrm{DB}}^{\mathrm{Rx}} \bm{W}_{\mathrm{AB}}^{\mathrm{Rx}} \bm{H}(t) \bm{W}_{\mathrm{AB}}^{\mathrm{Tx}} \bm{W}_{\mathrm{DB}}^{\mathrm{Tx}} \bm{s}(t) + \bm{W}_{\mathrm{DB}}^{\mathrm{Rx}} \bm{W}_{\mathrm{AB}}^{\mathrm{Rx}} \bm{n}(t), \tag{2}
\end{align}

where $\bm{W}_{\mathrm{DB}}^{\mathrm{Rx}}$, $\bm{W}_{\mathrm{DB}}^{\mathrm{Tx}}$, $\bm{W}_{\mathrm{AB}}^{\mathrm{Rx}}$, and $\bm{W}_{\mathrm{AB}}^{\mathrm{Tx}}$ are the Rx digital beamformers, the Tx digital beamformers, the Rx analog beamformers, and the Tx analog beamformers, respectively, $[\cdot]'$ denotes matrix transpose, and $\bm{n}(t)$ is the additive Gaussian noise with variance $\sigma^{2}$. It is worth noting that $\bm{W}_{*}^{\mathrm{Rx}}$ is of block-diagonal form, as it involves the beamformer of one device (e.g., UE) for $\bm{y}^{\mathrm{u}}(t)$ and that of another (e.g., AP) for $\bm{y}^{\mathrm{s}}(t)$.
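To make the matrix chain in Eqn. (2) concrete, the following sketch evaluates it numerically. The matrix dimensions are not stated in the text, so everything here is an assumption chosen for illustration: `N = 2` chains, `M = 16` elements per array, analog beamformers stored as block-diagonal phase-only matrices, identity digital beamformers, a random placeholder channel, and all Rx antennas lumped into one array instead of being split between AP and UE.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 16   # RF chains and elements per phased array (illustrative values)

def block_diag_analog(weights):
    """Stack per-chain analog weight vectors (N x M) into an (N*M) x N block-diagonal matrix."""
    n_chains, n_elems = weights.shape
    out = np.zeros((n_chains * n_elems, n_chains), dtype=complex)
    for i in range(n_chains):
        out[i * n_elems:(i + 1) * n_elems, i] = weights[i]
    return out

def random_phases(shape):
    """Phase-only weights, normalised so that each weight vector has unit norm."""
    return np.exp(2j * np.pi * rng.random(shape)) / np.sqrt(shape[-1])

W_ab_tx = block_diag_analog(random_phases((N, M)))            # (N*M) x N
W_ab_rx = block_diag_analog(random_phases((N, M))).conj().T   # N x (N*M)
W_db_tx = np.eye(N, dtype=complex)                            # N x N digital precoder
W_db_rx = np.eye(N, dtype=complex)                            # N x N digital combiner

# Placeholder channel between all Tx and all Rx antenna elements
H = (rng.standard_normal((N * M, N * M)) + 1j * rng.standard_normal((N * M, N * M))) / np.sqrt(2)
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)      # one Tx symbol per chain
n = 0.01 * (rng.standard_normal(N * M) + 1j * rng.standard_normal(N * M))

# Eqn. (2): received signal after analog and digital combining
y = W_db_rx @ W_ab_rx @ H @ W_ab_tx @ W_db_tx @ s + W_db_rx @ W_ab_rx @ n
H_eff = W_ab_rx @ H @ W_ab_tx   # effective N x N channel seen by the digital stages
print(y.shape, H_eff.shape)
```

The point of the sketch is that the analog beamformers collapse the large antenna-domain channel into a small `N x N` effective channel, which is why changing beam patterns reshapes what both communication and sensing observe.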

The $\bm{H}(t)$ in Eqn. (1) represents the mmWave MIMO channel; it involves the following major components:

\begin{equation}
\bm{H}(t) = \bm{H}_{\mathrm{ti}}(t) + \bm{H}_{\mathrm{s}}(t) + \bm{H}_{\mathrm{c}}(t) + \bm{H}_{\mathrm{ms}}(t), \tag{3}
\end{equation}

where the 4 terms on the right-hand side denote the Tx interference channels, monostatic sensing channels, communication channels, and multi-static sensing channels, respectively. According to Section 2.2, $\bm{H}_{\mathrm{ti}}(t)$ may strongly affect $\bm{H}_{\mathrm{s}}(t)$, but the impact from $\bm{H}_{\mathrm{c}}(t)$ to $\bm{H}_{\mathrm{ms}}(t)$ can be readily handled (mmtrack-INFOCOM20, ). Moreover, Eqn. (1) confirms what we have observed in Section 2.3: the hybrid beamformers can be scheduled to largely reshape the channels so as to affect performance in both communication and sensing. Consequently, though the two groups of channels, i.e., $\{\bm{H}_{\mathrm{ti}}(t), \bm{H}_{\mathrm{s}}(t)\}$ vs $\{\bm{H}_{\mathrm{c}}(t), \bm{H}_{\mathrm{ms}}(t)\}$, are seemingly independent of each other as they are physically or temporally separated, they become correlated as they may share the same Tx beamformers $\bm{W}_{*}^{\mathrm{Tx}}$.
In a nutshell, the goal of Gemini is to optimize the hybrid Tx/Rx beamformers on all these channels in $\bm{H}(t)$ for balancing the performance between sensing and communication. We take a divide-and-conquer method to approach this optimization, by eliminating $\bm{H}_{\mathrm{ti}}(t)$ in Section 3.2 and jointly scheduling $\{\bm{H}_{\mathrm{s}}(t), \bm{H}_{\mathrm{c}}(t), \bm{H}_{\mathrm{ms}}(t)\}$ in Section 3.3. However, before executing this plan, a scheme to probe the channel states needs to be in place.

3.1.2. Probing mmWave ISAC Channels

Recall that a probing scheme is employed for mmWave communications to acquire $\bm{H}_{\mathrm{c}}$ in Section 2.1 under the sector sweep phase, which also yields $\bm{H}_{\mathrm{ms}}(t)$ and the bearings of subjects via static background removal (adib2015multi, ) and signal detection (richards2014fundamentals, ). Since $\{\bm{H}_{\mathrm{ti}}(t), \bm{H}_{\mathrm{s}}(t)\}$ cannot be directly acquired by this scheme, we piggyback an additional probing scheme on the existing one by emulating its behavior. Consequently, we mainly focus on this added scheme aiming to estimate the Tx interference channels $\bm{H}_{\mathrm{ti}}(t)$ for the follow-up cancellation that obtains monostatic sensing channel $\bm{H}_{\mathrm{s}}(t)$ in Section 3.2. A byproduct of this two-round probing and the later resulting $\bm{H}_{\mathrm{s}}(t)$ is the locations of all subjects, as subject ranges can be inferred from $\bm{H}_{\mathrm{s}}(t)$’s time-of-flight.
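The time-of-flight argument can be sketched numerically. The code below assumes a monostatic round trip, a CIR whose taps are spaced by the inverse of the bandwidth, and a hypothetical `skip_taps` parameter for ignoring residual leakage near zero delay; these are illustrative choices, not details taken from Gemini.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def range_resolution(bandwidth_hz):
    """Monostatic range spanned by one CIR tap, assuming tap spacing 1/B: c / (2B)."""
    return C / (2.0 * bandwidth_hz)

def monostatic_range(cir, bandwidth_hz, skip_taps=0):
    """Estimate range from the strongest CIR tap, ignoring the first skip_taps taps."""
    magnitudes = np.abs(np.asarray(cir))
    tap = skip_taps + int(np.argmax(magnitudes[skip_taps:]))
    return tap * range_resolution(bandwidth_hz)

cir = np.zeros(64, dtype=complex)
cir[0] = 5.0   # residual Tx leakage at zero delay (synthetic)
cir[8] = 1.0   # reflection from the subject (synthetic)

print(range_resolution(2e9))                    # roughly 0.075 m per tap at 2 GHz
print(monostatic_range(cir, 2e9, skip_taps=1))  # roughly 0.6 m
```

With a 2 GHz bandwidth each tap covers only a few centimetres, so an indoor subject sits within the first handful of taps, close to where any uncancelled Tx leakage is concentrated.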

Since the phased array equipped at the Tx chain has multiple beam patterns and each of them causes different Tx interference, a probing scheme needs to collect all kinds of corresponding Tx interference channels. Therefore, our added scheme involves an additional round of SSW with a substantially lower Tx power to create a “wireless shortcut” for probing $\bm{H}_{\mathrm{ti}}(t)$, in order to avoid the interference from surrounding environments. As shown in Figure 6, upon transmitting SSW frames with Tx sector ID sequentially from a device (the pink sectors), the phased arrays of its Rx chains are set to quasi-omni-directional patterns to receive SSW frames (the blue sectors). In this way, the channel estimation field with TRN of SSW is utilized to extract the Tx interference channels $\bm{H}_{\mathrm{ti}}(t)$ via a minimum mean-square-error estimation (speth1999optimum, ).
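A simplified version of this estimation step is sketched below. It is not the estimator of the cited work: it assumes a flat per-subcarrier model `rx = h * pilot + noise` with a known pilot, and applies a scalar linear MMSE estimate on each subcarrier independently. The names `estimate_channel`, `noise_var`, and `channel_var`, as well as the synthetic pilot and channel, are hypothetical.

```python
import numpy as np

def estimate_channel(rx_pilot, tx_pilot, noise_var=0.0, channel_var=1.0):
    """Per-subcarrier scalar linear MMSE channel estimate from a known pilot.

    With noise_var = 0 this reduces to the least-squares estimate rx / tx.
    """
    rx_pilot = np.asarray(rx_pilot, dtype=complex)
    tx_pilot = np.asarray(tx_pilot, dtype=complex)
    return channel_var * np.conj(tx_pilot) * rx_pilot / (
        channel_var * np.abs(tx_pilot) ** 2 + noise_var
    )

rng = np.random.default_rng(1)
num_sectors, num_subcarriers = 8, 32
tx_pilot = np.exp(2j * np.pi * rng.random(num_subcarriers))   # unit-modulus pilot
h_ti_true = (rng.standard_normal((num_sectors, num_subcarriers))
             + 1j * rng.standard_normal((num_sectors, num_subcarriers)))

# One received pilot per Tx sector, as collected during the extra sweep (noiseless here)
rx = h_ti_true * tx_pilot
h_ti_est = estimate_channel(rx, tx_pilot)
print(np.max(np.abs(h_ti_est - h_ti_true)))
```

Running one such estimate per Tx sector yields a table of interference channels indexed by sector, which is the input the later cancellation stage needs.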

Figure 6. Tx interference probing via extended SSW: “SSW sector” bars show the interference strength for each direction.
Figure 7. Beamforming heatmaps of (a-b) default probing for communications and multi-static sensing, (c) new probing scheme for Tx interference, with (d) statistics on interference stability. Panels: (a) Communication; (b) Bistatic; (c) Tx interference for monostatic; (d) Tx interference correlation.

Whereas the UE channel in Figure 7(a) is discovered by the 1st round of SSW, the two sensing channels in Figure 7(b) are probed during the same SSW round. Apparently, the beamforming for the UE may allow for sensing ‘Subject1’ at the same time, yet sensing ‘Subject2’ has to be scheduled in a different time slot. Moreover, the 2nd round of SSW obtains the Tx interference states shown in Figure 7(c): the impact appears to be more intensive at $0^{\circ}$ and is otherwise quasi-symmetric on the two sides. We also measure the correlation between any two Tx interference channels at different time slots, and the results in Figure 7(d) demonstrate correlation coefficients mostly above 0.8. Therefore, the information provided by one round of SSW can help train a cancellation process that remains valid for a long period.
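The stability check above can be illustrated with a minimal sketch. The text does not state how the correlation coefficient is computed, so the normalized inner-product magnitude used here is an assumption, and the channel snapshots are made-up values rather than measurements.

```python
import math

def correlation_coefficient(h_a, h_b):
    """Magnitude of the normalized inner product of two complex channel vectors.

    This definition is an assumption for illustration; it yields 1.0 for
    identical (or merely phase-rotated) vectors and 0.0 for orthogonal ones.
    """
    inner = sum(a * b.conjugate() for a, b in zip(h_a, h_b))
    norm_a = math.sqrt(sum(abs(a) ** 2 for a in h_a))
    norm_b = math.sqrt(sum(abs(b) ** 2 for b in h_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("channel vector has zero energy")
    return abs(inner) / (norm_a * norm_b)

# Hypothetical Tx interference snapshots of one sector at two time slots.
h_slot1 = [1.0 + 0.2j, 0.5 - 0.1j, 0.1 + 0.3j]
h_slot2 = [0.9 + 0.25j, 0.55 - 0.05j, 0.12 + 0.28j]
rho = correlation_coefficient(h_slot1, h_slot2)
```

A value of `rho` that stays above a chosen threshold (the text reports mostly above 0.8) would suggest that a previously trained cancellation can be reused without re-probing.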

3.2. Monostatic Sensing for mmWave ISAC

In this section, we introduce a two-stage Tx interference cancellation process to enable monostatic sensing: namely beam nulling and deep denoising.

3.2.1. Beam Nulling

We leverage hybrid beamforming to reduce the Tx interference at the first stage. Since advanced phased arrays allow for controlling both amplitude and phase (SiversBF01_2021, ) via antenna weight vectors (AWV), we leverage this ability to offer efficient solutions for future developments. To achieve beam nulling for the $i$-th sector containing at least one subject, we use the hybrid beamformers to minimize the Tx interference $\bm{H}^{i}_{\mathrm{ti}}(t)$:

(4) $\min_{\bm{W}_{\mathrm{DB}}^{\mathrm{Rx}},\,\bm{W}_{\mathrm{AB}}^{\mathrm{Rx}},\,\bm{W}_{\mathrm{AB}}^{\mathrm{Tx}},\,\bm{W}_{\mathrm{DB}}^{\mathrm{Tx}}} \; \left\| \bm{W}_{\mathrm{DB}}^{\mathrm{Rx}} \bm{W}_{\mathrm{AB}}^{\mathrm{Rx}} \bm{H}^{i}_{\mathrm{ti}}(t) \bm{W}_{\mathrm{AB}}^{\mathrm{Tx}} \bm{W}_{\mathrm{DB}}^{\mathrm{Tx}} \right\|^{2}_{F},$

where $\|\cdot\|_{F}$ is the Frobenius norm. Since the hybrid beamformers involve a large number of parameters to be optimized, searching for an optimal solution directly using conventional optimization methods can be highly inefficient. Moreover, since these parameters will also be needed for the joint scheduling in Section 3.3, the degree of freedom in fine-tuning them is limited.
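To make the objective in Eqn. (4) concrete, the following toy sketch evaluates it for a single Tx weight vector and a single Rx combiner, in which case the Frobenius norm reduces to the magnitude of one complex number. The 2×2 channel, the weights, and the closed-form null are illustrative assumptions; this is not the LAE-based solver used by Gemini, and it ignores the gain constraints toward the sensed subject.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def residual_power(w_rx, h_ti, w_tx):
    """|w_rx (H_ti w_tx)|^2: the Eqn. (4) objective for one Rx and one Tx beam."""
    leaked = matvec(h_ti, w_tx)
    out = sum(r * x for r, x in zip(w_rx, leaked))
    return abs(out) ** 2

def nulling_rx_weights(h_ti, w_tx):
    """For a 2-element Rx array, return weights orthogonal to the leaked signal."""
    v0, v1 = matvec(h_ti, w_tx)
    return [v1, -v0]  # [v1, -v0] . [v0, v1] = 0

# Hypothetical Tx interference channel and beam weights (made-up values).
h_ti = [[0.8 + 0.1j, 0.3 - 0.2j],
        [0.2 + 0.4j, 0.6 - 0.1j]]
w_tx = [1.0, 1j]
before = residual_power([1.0, 1.0], h_ti, w_tx)
after = residual_power(nulling_rx_weights(h_ti, w_tx), h_ti, w_tx)
```

With more antennas and both analog and digital stages, no such closed form is available in general, which is consistent with the search-efficiency concern raised above.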

Fortunately, we find that problem (4) can be approached as training a neural network, given the similarity between the data pipeline (see Figure 5) and a linear autoencoder (kunin2019loss, ). We reformulate Eqn. (4) into an inverse neural network shown in Figure 8, where the Tx interference channels and the Tx/Rx hybrid beamformers are respectively modeled as the bottleneck layer (weights frozen, as the channel $\bm{H}^{i}_{\mathrm{ti}}(t)$ is physically determined) and the decoder/encoder whose weights are beamformer parameters, while the Tx signals $\bm{s}$ become the TRN. A hypothetical training process should drive the output $\bm{x}$ to be $\bm{H}_{\mathrm{s}}(t)$ after canceling the overwhelming $\bm{H}^{i}_{\mathrm{ti}}(t)$. However, the network cannot be trained without a ground truth dataset for $\bm{H}_{\mathrm{s}}(t)$.

Figure 8. The two-stage Tx interference cancellation via linear autoencoder (LAE) and cGAN.

3.2.2. Deep Denoising

So far, we have reformulated the problem as reconstructing $\bm{H}_{\mathrm{s}}(t)$ from the observation $\bm{H}'(t) = \bm{H}^{i}_{\mathrm{ti}}(t) + g\left(\bm{H}_{\mathrm{s}}(t)\right)$, where $g(\cdot)$ is a deterministic mapping introduced by the beamforming operations. Since obtaining the ground truth for $\bm{H}_{\mathrm{s}}(t)$ is infeasible, we bypass this issue by using a mmWave radar operating at 60 GHz with 2 GHz bandwidth (TImmWave, ) to collect a dataset for $\bm{H}'_{\mathrm{s}}(t)$: since the radar is designed for sensing but with system parameters similar to Gemini, $\bm{H}_{\mathrm{s}}(t)$ and $\bm{H}'_{\mathrm{s}}(t)$ should share high-level yet intrinsic features. It should be stressed that the mmWave radar is only deployed during the training stage and is not required by Gemini at runtime.
Though we cannot directly use $\bm{H}'_{\mathrm{s}}(t)$ as ground truth for supervised learning, exploiting it to perform adversarial learning is feasible. To this end, we append a digital cancellation neural network $\bm{C}_{\mathrm{D}}(t)$ to the above model (see Figure 8), and we train the whole model as a cGAN (conditional Generative Adversarial Network) (mirza2014conditional, ). Our model is trained to learn how to cancel Tx interference regardless of environmental interference, so only one training is needed for each type of mmWave device. We denote by $\mathcal{G}$ the whole neural network constructed so far; it generates samples $\bm{x}_{g} = \mathcal{G}(\bm{z}|\bm{s})$, with $\bm{z}$ being the background Gaussian noise introduced by the signal processing/propagation pipeline (LAE). We further employ a discriminator network $\mathcal{D}(\bm{x})$ aiming to recognize whether $\bm{x}$ comes from $\{\bm{H}'_{\mathrm{s}}(t)\}$. According to (mirza2014conditional, ), $\mathcal{D}$ and $\mathcal{G}$ play a min-max game modeled by:

$\min_{\mathcal{G}} \max_{\mathcal{D}} \; \mathbb{E}_{\bm{x} \sim p_{\bm{H}'_{\mathrm{s}}}}\left[\log \mathcal{D}(\bm{x}|\bm{s})\right] + \mathbb{E}_{\bm{z} \sim \mathcal{N}}\left[\log\left(1 - \mathcal{D}(\mathcal{G}(\bm{z}|\bm{s}))\right)\right],$

where $\mathcal{N}$ denotes the Gaussian noise distribution. Essentially, we use the cGAN as a powerful non-linear filter to obtain the monostatic sensing channels by “generating” them out of the channel states contaminated by $\bm{H}^{i}_{\mathrm{ti}}(t)$.
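As a small numerical illustration of the min-max objective above, the sketch below evaluates its empirical value from discriminator outputs. The probability values are hypothetical, and the function only computes the objective; it does not train any network.

```python
import math

def cgan_value(d_real, d_fake):
    """Empirical estimate of E[log D(x|s)] + E[log(1 - D(G(z|s)))].

    d_real: discriminator outputs on radar-derived samples, each in (0, 1).
    d_fake: discriminator outputs on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell the two sources apart outputs 0.5 everywhere.
confused = cgan_value([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates them well (hypothetical outputs).
confident = cgan_value([0.9, 0.9], [0.1, 0.1])
```

The discriminator tries to increase this value while the generator tries to decrease it; the fully confused case gives $-2\log 2$, the well-known equilibrium value of the standard GAN objective.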

Figure 9. Beamforming heatmaps of our two-stage Tx interference cancellation (a-b) and the resulting respiration sensing outcome (c). Panels: (a) output (heatmap) of LAE at 1; (b) output (heatmap) of $\mathcal{G}$ at 2; (c) respiration sensing results at different stages.

To verify the effectiveness of our design, we let a subject stand in front of our platform at a $15^{\circ}$ bearing and extract the subject's respiration via the proposed pipeline. We first show the heatmaps of our two-stage cancellation in Figures 9(a) and 9(b); they clearly demonstrate the gradual “diminishing” of $\bm{H}^{i}_{\mathrm{ti}}(t)$ through the pipeline. Further plotting the respiration sensing results in Figure 9(c) confirms the largely suppressed Tx interference. [Footnote 2: While Gemini enables Tx interference cancellation, packet detection delays may introduce ranging errors. This is mitigated through an RF loopback calibration technique (guan-TMTT2021, ) in the RF front-end to correct the corresponding phase offsets; it processes Rx signals before ranging.]

3.3. Beam Scheduling for mmWave ISAC

Since 802.11ay has a default beam scheduling scheme for MU-MIMO, our new scheduling scheme simply piggybacks on the existing one, aiming to integrate sensing and communication beamforming in the most efficient manner, as briefly envisioned by Figure 10. In the following, we first study how sensing subject selection differs from UE selection, then a scheduling algorithm is designed for efficient coverage of both sensing and communications.

3.3.1. Subject and UE Selection

UE selection has a default implementation in 802.11ay as MU-MIMO. In particular, MU-MIMO starts with the AP using the SSW (see Section 2.1) to obtain channel states from UEs and grouping UEs with correlated channels (of similar bearing) (shen2015sieve, ) as a set to be covered by the same analog beam pattern. Then the bearing scheduling arranges UEs from the orthogonal (non-correlated) sets to communicate via distinct Tx chains simultaneously. For UEs covered by the same beam pattern or unable to be scheduled spatially, temporal scheduling operating in a round-robin fashion is adopted to serve them. In summary, the communication-oriented UE selection operates along a single dimension (bearing), leading to inefficiency in handling the conflict among pure communication, pure sensing, and simultaneous communication and sensing (see Section 2.3) in the ISAC context.

Unlike communication, sensing allows subjects with the same bearing but different ranges to be differentiated, given a sufficient range resolution. With the 2 ​GHz bandwidth of Gemini, the resulting centimeter-level resolution may enable many subjects to be simultaneously covered by one beam pattern, adding one more dimension for the ISAC scheduling. In addition, locations of UEs and subjects are known (see Sections 2.1 and 3.1.2). Consequently, we upgrade the correlated set definition to a two-dimensional Beam-Compatible Set (BC-Set) in the beamforming phase; it aims to group both UEs and subjects under a single beam pattern that may serve one UE and all the covered subjects simultaneously. We plot five BC-Sets in Figure 10; they are meant for illustrative purpose only, as realistic beam patterns (shown on the right part) can be far more irregular.
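The centimeter-level figure follows from the standard range resolution relation $\Delta R = c/(2B)$. The short sketch below evaluates it for the 2 GHz bandwidth stated above; the helper names and the two example ranges are illustrative assumptions.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def range_resolution_m(bandwidth_hz):
    """Standard radar range resolution: c / (2 * B)."""
    return SPEED_OF_LIGHT_M_S / (2.0 * bandwidth_hz)

def separable_in_range(range_a_m, range_b_m, bandwidth_hz):
    """True if two subjects at the same bearing fall at least one resolution cell apart."""
    return abs(range_a_m - range_b_m) >= range_resolution_m(bandwidth_hz)

resolution = range_resolution_m(2e9)  # about 0.075 m for 2 GHz
same_beam_ok = separable_in_range(2.0, 2.5, 2e9)
```

Under this relation, two subjects sharing a bearing but half a meter apart in range occupy different range bins, which is what allows one beam pattern to serve both.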

3.3.2. Communication and Sensing Scheduling

To determine the BC-Sets and their corresponding beam patterns, we start from the default communication scheduling that outputs the correlated communication sets $\mathcal{U} = \{U_{1}, \cdots\}$ (802_11ay, ). As illustrated in Algorithm 1, our algorithm essentially executes in a greedy manner: it first adds as many subjects as possible to the sets in $\mathcal{U}$ to form the initial BC-Sets, then it constructs new BC-Sets to cover the remaining subjects. It first gathers all positions covered by the beam patterns of individual sets in $\mathcal{U}$ (line 3). Then it performs a range-wise depth-first search (DFS) to gather subjects that can be covered by these sets (line 4), since beam patterns are initially often very narrow. If possible, it adjusts the original narrow beam pattern to trade its range for beam width, making it possible to conduct a bearing-wise breadth search for covering more subjects (line 5). Upon upgrading all sets in $\mathcal{U}$ to BC-Sets, the algorithm again proceeds in the greedy DFS manner in the next stage, aiming to construct new BC-Sets to cover the remaining subjects (line 8 till the end).

Input: $\mathcal{X}$: range-bearing matrix of all entities;
       $\mathcal{U}$: original communication sets for UEs;
       $r$: adjustable beam width.
Output: $\mathcal{B}$: BC-Sets.
2  for $U_{i} \in \mathcal{U}$ do
3      $\mathcal{C} \leftarrow \mathsf{beam\_pattern\_coverage}(U_{i})$
4      $B_{i} \leftarrow \mathsf{max\_depth\_range\_search}(\mathcal{C}, \mathcal{X})$
5      $B_{i} \leftarrow B_{i} \cup \mathsf{breadth\_bearing\_search}(\mathcal{C}, \mathcal{X}, r)$
7  $\mathcal{B} = \{B_{1}, \cdots, B_{i}, \cdots\}$
8  $\bar{\mathcal{X}} \leftarrow \mathsf{find\_non\_selected\_subjects}(B, \mathcal{X})$
9  while $\bar{\mathcal{X}} \neq \emptyset$ do
10     $[\mathcal{C}, B] \leftarrow \mathsf{max\_depth\_range\_search}(\bar{\mathcal{X}})$
11     $B \leftarrow B \cup \mathsf{breadth\_bearing\_search}(\mathcal{C}, \mathcal{X}, r)$
12     $\bar{\mathcal{X}} \leftarrow \mathsf{find\_non\_selected\_subjects}(B, \bar{\mathcal{X}})$
13     $\mathcal{B} \leftarrow \mathcal{B} \cup \{B\}$
Algorithm 1 BC-Sets construction.
Figure 10. BC-Sets for ISAC-oriented beam scheduling.

What Algorithm 1 outputs are only (logical) BC-Sets; they need to be “translated” into (physical) beam patterns. Essentially, the beam patterns should be shaped by both the distribution of UEs/subjects covered by the BC-Sets and the capability of the phased array (gu2021packaging, ). The procedure of shaping a beam pattern follows two basic principles: i) adopting wider beams to trade range for width coverage and ii) adjusting the direction of the main beam to balance the coverage among multiple entities. Since the resulting patterns have not taken into account the beam nulling requirements (see Section 3.2), the corresponding beamformers are used as the initial setting for problem (4) (hence the LAE) to derive proper cancellation schemes for individual BC-Sets. Finally, we analyze the complexity of Algorithm 1 to demonstrate its efficiency for real-time execution. Let the number of UEs be $K$ and the number of range (resp. bearing) bins be $K_{\mathrm{R}}$ (resp. $K_{\mathrm{D}}$); the complexity is then $O(K K_{\mathrm{D}} K_{\mathrm{R}})$, which is often on the scale of several hundred only.
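The two-stage greedy idea of Algorithm 1 can be sketched in a simplified form as follows. This is only an illustration under strong assumptions: beams are modeled as angular sectors with a maximum range, the narrow and wide beam parameters are made-up values, and the helper names (`covered`, `build_bc_sets`) are hypothetical rather than taken from Gemini's implementation.

```python
def covered(subject, center_deg, half_width_deg, max_range_m):
    """True if a (range_m, bearing_deg) subject lies inside a sector-shaped beam."""
    rng, bearing = subject
    return abs(bearing - center_deg) <= half_width_deg and rng <= max_range_m

def build_bc_sets(ue_beams, subjects, narrow=(5.0, 10.0), wide=(15.0, 5.0)):
    """Greedy BC-Set construction on simplified sector beams.

    ue_beams: beam center bearings (deg), one per original communication set.
    subjects: list of (range_m, bearing_deg) tuples.
    narrow / wide: (half_width_deg, max_range_m); the wide beam trades range for width.
    """
    remaining = list(subjects)
    bc_sets = []

    def grab(center):
        # Range-wise search along the narrow beam, then bearing-wise widening.
        depth = [s for s in remaining if covered(s, center, *narrow)]
        breadth = [s for s in remaining
                   if s not in depth and covered(s, center, *wide)]
        members = depth + breadth
        for s in members:
            remaining.remove(s)
        return members

    # Stage 1: upgrade every original UE set into a BC-Set.
    for center in ue_beams:
        bc_sets.append({"center": center, "has_ue": True, "subjects": grab(center)})

    # Stage 2: greedily open new BC-Sets for subjects still uncovered.
    def gain(center):
        return sum(covered(s, center, *narrow) or covered(s, center, *wide)
                   for s in remaining)

    while remaining:
        best = max((s[1] for s in remaining), key=gain)
        members = grab(best)
        if not members:
            raise ValueError("a remaining subject is beyond every beam's range")
        bc_sets.append({"center": best, "has_ue": False, "subjects": members})
    return bc_sets

# Hypothetical layout: two UE beams and six subjects as (range_m, bearing_deg).
example = build_bc_sets(
    [0.0, 40.0],
    [(3.0, 2.0), (8.0, -3.0), (4.0, 12.0), (2.0, 41.0), (6.0, -40.0), (4.0, -30.0)],
)
```

In this toy layout the two UE beams absorb four subjects, and a single new BC-Set centered at −40° covers the remaining two. Translating such logical sets into physical beam patterns is a separate step that this sketch does not attempt.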

Figure 11. Average time spans of round-robin (RR), BC-Sets (scheduling) and the optimal solution (Opt). Panels: (a) one Tx chain; (b) two Tx chains.

We may walk through the algorithm using the BC-Sets in Figure 10 as examples. During the 1st stage, the three UE sets (red, green, and blue) are sequentially upgraded to form BC-Sets by adding in more subjects. Then the remaining three subjects are covered by two new BC-Sets (yellow and cyan). While the red and yellow BC-Sets can be covered by the default (narrow) beam sectors, the others should be reshaped: in particular, both principles apply to the blue one. We also perform a trace-driven emulation to evaluate the advantages of BC-Sets. We place 6 UEs and 14 subjects, and randomize their positions in each trial. We run each of the three algorithms for 100 trials, and show the average time spans for serving all UEs and subjects in Figure 11. Apparently, our scheduling requires only half of RR’s time span and comes close to the optimal solution.

3.4. Exploiting Sensing Diversity

The scheduling algorithm presented in Section 3.3 actually has two implications. On the one hand, this scheduling takes place not only on the AP side but also on the UE side, albeit with less powerful beam fine-tuning ability. On the other hand, whereas covering monostatic sensing subjects can be determined in a unilateral manner, conducting bistatic sensing demands collaborative beamforming on both sides, similar to the default (bistatic) communication scheduling. As the locations of all subjects are estimated in advance (see Section 3.1.2), all these can be achieved naturally by Algorithm 1. Therefore, the question now is how to exploit the sensing diversity introduced by different sensing modalities, and our answer is the following unified estimation framework at the signal processing level.

One of the major reasons that motivates us to consider full-fledged ISAC is the deficiency of multi-static sensing caused by the lack of synchronization among different parties (isacot, ). As a result, multi-static sensing can only be used to sense motion-induced information that incurs channel variations, whereas sensing (quasi-)static subjects can be fully achieved by monostatic sensing: for example, locating a subject can be done by the AP estimating range/bearing and refined by the same estimations from UEs. Nevertheless, in motion tracking scenarios for both micro-motion (movifi, ) and macro-motion (jiang2018towards, ), joint estimation with diversified sensing results can only lead to improved accuracy (fleury1999channel, ). Instead of dwelling on the design of estimation algorithms, we simply leverage existing estimation frameworks for this purpose. On the one hand, micro-motion estimation is taken care of by SAGE (fleury1999channel, ), which requires a subject to remain quasi-static. On the other hand, macro-motion estimation (a.k.a. motion tracking) is handled by the well-known extended Kalman filter (kalman, ). We demonstrate how sensing diversity helps to improve the precision of subject location estimation (via SAGE) in Figure 12: from left to right, the “hot spot” shrinks as the number of sensing modalities (hence the number of beam intersections) increases, indicating a higher estimation precision.
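Why combining modalities tightens the estimate can be illustrated with a generic inverse-variance fusion of independent estimates. This is neither SAGE nor the extended Kalman filter used by Gemini; it is a textbook stand-in that assumes unbiased, independent estimates, and the numbers are invented.

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of independent, unbiased estimates.

    estimates: list of (value, variance) pairs.
    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused = sum(w * val for w, (val, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total

# Hypothetical range estimates (meters, variance in m^2) from two sensing modes.
monostatic = (3.02, 0.04)
bistatic = (2.95, 0.09)
fused_value, fused_variance = fuse_estimates([monostatic, bistatic])
```

The fused variance is always smaller than the smallest individual variance, mirroring the shrinking “hot spot” described above as more sensing modalities are combined.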

Figure 12. Improving estimation precision via joint monostatic and multi-static sensing.

4. Implementation

We hereby elaborate on the implementation of Gemini, particularly explaining how to configure Gemini for carrying out the design objectives outlined in Section 3. The implementation follows IEEE 802.11ay standard operating at the unlicensed 60 ​GHz frequency band, as demonstrated in Figure 13. We omit the illustration for the UE part, as it involves similar components but with only one mmWave frontend.

Figure 13. Part of the implementation and evaluation setup with mmWave frontend and baseband RFSoC.
Hardware Platform

Gemini has three hardware components: mmWave front-end module, baseband processing module, and high-performance processor module. For the RF front-end, Gemini adopts the EVK06003 development kits from Sivers Wireless (sivers, ); each kit has two 16-element phased arrays. [Footnote 3: Since many commercial transceiver chips of mmWave offer separated Tx and Rx phased arrays (sivers, ; qualcommwigig, ), Gemini is equipped with two such arrays.] The baseband processing and high-performance processor modules are both realized upon the Xilinx RFSoC ZCU208 development board (xilinxzcu208, ); it offers a multitude of advanced features and capabilities, including i) AD/DA converters for baseband sampling with a rate up to 4 GHz, ii) DDR memory providing ample storage space for processed data, and iii) multi-core ARM processors and high-end Ultrascale FPGA offering substantial computational power.

Software Components

The software of Gemini is responsible for a variety of tasks, including beamforming, interference cancellation (Section 3.2), beam scheduling (Section 3.3), and data streaming to and from a PC controller. We implement the above tasks using Verilog and C/C++, and compile them as firmware for the Xilinx ZYNQ RFSoC. We also implement a Matlab interface to pull the data from the RFSoC into the PC. The sensing algorithms, except for the deep neural module, are implemented in Matlab.

Deep Neural Network

Our deep neural network is built upon the PyTorch (pytorch, ) platform using Python 3.7, and an mmWave radar (TImmWave, ) is adopted to act as both a baseline and ground truth sensor. We synchronize the clocks of the radar and Gemini (both driven by a PC) via precision time protocol (ptp, ), hence aligning their starting time to the $\mu$s level. The encoder/decoder of the LAE are actually formed by beamformers, so we only need to construct $\bm{C}_{\mathrm{D}}(t)$ as a 5-layer perceptron and the discriminator with three CNN layers whose input size ($2000 \times 1$) matches the output data of the radar. The batch size is set to 64 for training, and the model is optimized using the Adam optimizer with a learning rate of 0.001. We further quantize the neural network weights (HAQ_CVPR2019, ) to control the 4-bit phase and amplitude via the AWV look-up table of the mmWave frontend.
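The last step, mapping learned complex weights onto 4-bit phase and amplitude codes, can be pictured with the sketch below. It assumes uniform quantization of both amplitude and phase; the actual AWV look-up table layout of the frontend and the quantization method of (HAQ_CVPR2019, ) are not described in this section, so the mapping and function names are hypothetical.

```python
import cmath
import math

BITS = 4
LEVELS = 2 ** BITS  # 16 codes each for amplitude and phase

def quantize_awv(weight, max_amplitude=1.0):
    """Map a complex weight to (amp_code, phase_code), each in 0..15.

    Uniform quantization is an assumption made for illustration only.
    """
    amp = min(abs(weight), max_amplitude)
    amp_code = round(amp / max_amplitude * (LEVELS - 1))
    phase = cmath.phase(weight) % (2 * math.pi)
    phase_code = round(phase / (2 * math.pi) * LEVELS) % LEVELS
    return amp_code, phase_code

def dequantize_awv(amp_code, phase_code, max_amplitude=1.0):
    """Reconstruct the complex weight that a pair of codes represents."""
    amp = amp_code / (LEVELS - 1) * max_amplitude
    phase = phase_code * 2 * math.pi / LEVELS
    return cmath.rect(amp, phase)

# A unit-amplitude weight with a 1 rad phase (made-up value).
w = cmath.rect(1.0, 1.0)
codes = quantize_awv(w)
error = abs(dequantize_awv(*codes) - w)
```

With 16 phase levels the phase step is 22.5°, so the worst-case phase error under this uniform scheme is 11.25°, which bounds how deep a null the quantized weights can realize.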

5. Experiment Evaluations

We now perform three sets of experiments to verify the major functions of Gemini, namely Tx interference cancellation for monostatic sensing, beam scheduling for joint sensing and communications, and sensing diversity exploitation. Part of the experiment setup is depicted in Figure 13, but other necessary details shall be provided later. Our experiments have strictly followed the IRB of our institutes.

Figure 14. Power spectrum of the received baseband signal after two-stage Tx interference cancellation.
Figure 15. Monostatic sensing performance. Panels: (a) ranging; (b) angle of arrival (AoA); (c) motion sensing. “TI” and “W/o-C” denote the TI mmWave cascaded radar and Gemini without Tx interference cancellation, respectively. TI disappears from (c) as it is used as the ground truth collector.

5.1. Monostatic Sensing

We report the outcome of Tx interference cancellation and the consequent monostatic sensing performance of Gemini. The experiments involve range, bearing, and speed estimations of a static or moving object (a metal block held by a person). The ground truths for range and bearing are measured by Intel RealSense LiDAR (realsense-L515, ), while that for speed is provided by the TI radar (TImmWave, ). We adopt absolute error as the evaluation metric.

5.1.1. Tx Interference Cancellation

We begin by quantifying the interference cancellation ability at different stages. We set the Tx power to 40 dBm and observe the signal strength at the different stages indicated in Figure 8. The results shown in Figure 14 confirm that LAE reduces Tx interference by about 39 dB, and deep denoising further suppresses it by 27 dB. Overall, the total cancellation is approximately 66 dB, and the remaining Tx-interference power comes very close to the noise floor. Apparently, the monostatic reflection channels $\bm{H}_{\mathrm{s}}(t)$ have been distilled from $\bm{H}(t)$ in Eqn. (3).
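The stage-wise figures add up in the decibel domain, which corresponds to multiplying the linear power ratios. A brief sketch of this arithmetic, using only the two values reported above:

```python
def db_to_power_ratio(db):
    """Convert a decibel value to a linear power ratio."""
    return 10.0 ** (db / 10.0)

lae_db = 39.0       # suppression attributed to the LAE (beam nulling) stage
denoise_db = 27.0   # additional suppression from deep denoising
total_db = lae_db + denoise_db             # 66 dB in total
total_ratio = db_to_power_ratio(total_db)  # roughly a 4-million-fold power reduction
```

In linear terms, 66 dB means the residual Tx interference power is about one four-millionth of its original level.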

5.1.2. Range

The ability to perform ranging is a fundamental and crucial sensing feature, but it can only be accomplished in the monostatic mode (isacot, ). We study the ranging performance with/without Tx interference cancellation, with the subject staying static at distances in the range of $[1, 5]$ m from Gemini (with a step size of 0.5 m), and Gemini performing ranging via CIR. We use the TI radar as a baseline by co-locating it with Gemini. The range errors shown in Figure 15(a) confirm that Gemini obtains performance comparable to the radar, whereas much higher median range errors of up to 30 cm are introduced without the Tx interference cancellation. The range errors of both the TI radar and Gemini grow slightly as the distance increases, potentially because the adopted beam pattern covers more background clutter at farther distances and multipath reflections from it affect the ranging accuracy. Moreover, reflected (sensing) signals attenuate with distance and hence also result in degraded performance.
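Ranging via CIR can be sketched as locating the dominant reflection tap and converting its round-trip delay to distance. The sketch assumes that tap 0 is aligned with the transmit instant (i.e., after calibration), that the complex sample rate equals the 2 GHz bandwidth, and that the first few taps are skipped to avoid residual leakage; none of these details are specified in this section, and the CIR values are synthetic.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def range_from_cir(cir, sample_rate_hz, skip_taps=0):
    """Estimate one-way range from the strongest CIR tap at or after skip_taps."""
    if len(cir) <= skip_taps:
        raise ValueError("CIR is shorter than the number of skipped taps")
    peak = max(range(skip_taps, len(cir)), key=lambda k: abs(cir[k]))
    round_trip_delay_s = peak / sample_rate_hz
    return SPEED_OF_LIGHT_M_S * round_trip_delay_s / 2.0

# Synthetic CIR: strong leakage at tap 0 and a weaker reflection at tap 40.
cir = [0j] * 64
cir[0] = 1.0
cir[40] = 0.3
with_skip = range_from_cir(cir, 2e9, skip_taps=4)  # about 3.0 m
without_skip = range_from_cir(cir, 2e9)            # 0.0 m: locked onto the leakage
```

The second call shows, in miniature, why uncancelled Tx interference corrupts ranging: the estimator locks onto the leakage rather than the reflection.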

5.1.3. Bearing

To study another fundamental sensing function, we leverage the phased arrays of both Gemini and the TI cascaded radar (TImmWave, ) to estimate the object bearing (or AoA). We let the object stay within a bearing of $[-45^{\circ}, 45^{\circ}]$ and vary it with a step size of $15^{\circ}$. As shown in Figure 15(b), Gemini achieves better performance than the radar because Gemini has a much more powerful phased array. Again, Tx interference significantly degrades the bearing estimation performance if not properly handled.

5.1.4. Speed

Motion sensing is also crucial to the capability of Gemini, so we conduct tests where the object moves at a speed varying from 0.5 m/s to 3.0 m/s within a $3 \times 20\,\text{m}^{2}$ corridor. Since the LiDAR cannot monitor motion, we have to change the role of the TI radar from baseline to ground truth sensor. The evaluation results shown in Figure 15(c) clearly demonstrate that Gemini achieves a much lower speed estimation error with the enhancement offered by the Tx interference cancellation.

Remark:

During all the above experiments, we have a communication session going on between the AP and UE. Since monostatic sensing takes place only on the AP side and simply piggybacks on the Tx signals, the communication throughput is not affected at all.

Figure 16. An illustration of the experiment setup.

5.2. Beam Scheduling for ISAC

One may expect a full-scale evaluation of our beam scheduling algorithm with many subjects and UEs involved simultaneously. However, our experimental setup, depicted in Figure 16, is straightforward, involving one subject and one UE; we additionally demonstrate the simultaneous tracking of two subjects with two UEs in the human tracking experiments to illustrate the potential for involving additional UEs. On the one hand, our scheduling algorithm excels in dispatching BC-Sets to cover subjects with compatible demands, and hence all BC-Sets (or all subjects covered by a BC-Set) are similarly served by the AP. Consequently, evaluating the performance with one subject is sufficiently representative. On the other hand, serving multiple UEs bears no difference from serving one UE, as multiple UEs would inevitably need to be served by distinct Tx chains or in different time slots (see the media access described in Section 2.1). Also, due to the extremely high cost of our high-performance hardware (each mmWave frontend (sivers, ) costs over $3,300 and the RFSoC (xilinxzcu208, ) costs $15,000), we cannot afford to support a large number of UEs. Therefore, we believe our evaluation scenario does produce results with practical significance. In the following, we evaluate our scheduling algorithm in three sensing applications, namely respiration monitoring, point cloud of human pose, and human tracking, along with a communication session. We consider three scheduling baselines: sensing only (SO), communication only (CO) (SideLobe-UbiComp23, ), and round-robin (RR), while also adopting the TI radar as a sensing baseline.

5.2.1. Respiration Monitoring

This experiment takes the subject’s respiration as the sensing target, aiming to accurately estimate the breath rate. The results plotted in Figure 17 show the obtained throughput against the corresponding sensing accuracy. We observe that, while Gemini achieves a sensing accuracy almost the same as that of SO and the TI radar (and much better than that of CO and RR), its throughput is only marginally lower than that of CO (but still much higher than that of SO or RR). In fact, for continuous yet spatially coarse-grained sensing applications, Gemini can always obtain nearly perfect sensing performance while barely sacrificing throughput.
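As a rough illustration of what estimating a breath rate involves, the sketch below picks the dominant spectral peak of a chest-motion time series within a typical respiration band. It is a generic textbook approach and not necessarily the estimator used by Gemini; the function name `breath_rate_bpm`, the 0.1 to 0.5 Hz band, and the input format (a real-valued, uniformly sampled signal such as unwrapped reflection phase) are all assumptions made for illustration.

```python
import cmath
import math
from typing import Sequence, Tuple


def breath_rate_bpm(
    signal: Sequence[float],
    sample_rate_hz: float,
    band_hz: Tuple[float, float] = (0.1, 0.5),  # assumed respiration band
) -> float:
    """Return the dominant in-band frequency of `signal`, in breaths per minute."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")

    # Remove the DC offset so that it does not leak into low-frequency bins.
    mean = sum(signal) / n
    centered = [s - mean for s in signal]

    best_freq, best_mag = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate_hz / n
        if not band_hz[0] <= freq <= band_hz[1]:
            continue
        # Plain DFT coefficient at bin k (no external FFT library required).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_freq, best_mag = freq, abs(coeff)

    if best_mag < 0.0:
        raise ValueError("no DFT bin falls inside the requested band")
    return best_freq * 60.0
```

The breath rate error reported in such an experiment would then be the absolute difference between this estimate and a reference rate; the frequency resolution of the sketch is limited to the reciprocal of the observation window.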

Figure 17. Scheduling for simultaneous communications and respiration monitoring. Panels: (a) Throughput; (b) Breath rate error.

Figure 18. Scheduling for simultaneous communications and point cloud generation (of a human figure). Panels: (a) Throughput; (b) Gemini; (c) Sensing only (SO); (d) Communication only (CO); (e) Round-robin (RR); (f) TI radar.

5.2.2. Point Cloud

Different from the continuous sensing application in Section 5.2.1, point cloud generation is a one-shot sensing application (e.g., 1 s duration in our experiment). In such applications, a much narrower beam is required to generate a dense point cloud, yet Gemini still needs to maintain its beam schedules. Our solution is to have asymmetric Tx and Rx beam patterns for the AP: while the Tx pattern matches that determined by the corresponding BC-Set, the Rx chain tunes its beam to the finest 1.5° beamwidth and quickly switches among distinct azimuth and elevation angles to “scan” the subject under the BC-Set coverage. Based on the reflection power at different angle pairs, we set a threshold to filter out reflection signals with very low power, and map the remaining signals from polar coordinates to Cartesian coordinates to generate the point cloud (gao2019experiments, ).
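The thresholding and coordinate mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Gemini's implementation: the sample layout (range, azimuth, elevation, power), the function name `scan_to_point_cloud`, the dB threshold, and the axis convention (y along the array boresight, z pointing up) are all hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# One scan sample (assumed layout):
# (range in metres, azimuth in degrees, elevation in degrees, reflected power in dB).
Sample = Tuple[float, float, float, float]
Point = Tuple[float, float, float]


def scan_to_point_cloud(
    samples: Iterable[Sample], power_threshold_db: float
) -> List[Point]:
    """Drop weak reflections, then map the rest from spherical to Cartesian."""
    points: List[Point] = []
    for rng, az_deg, el_deg, power_db in samples:
        if power_db < power_threshold_db:
            continue  # reflection too weak: treat it as noise

        az = math.radians(az_deg)
        el = math.radians(el_deg)

        # Assumed convention: x to the right, y along boresight, z up.
        x = rng * math.cos(el) * math.sin(az)
        y = rng * math.cos(el) * math.cos(az)
        z = rng * math.sin(el)
        points.append((x, y, z))
    return points
```

A lower threshold keeps more points at the cost of admitting more noise, so in practice the threshold would need to be tuned against the observed noise floor.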

We test Gemini and the baselines in sequence and report the results in Figure 18: while the throughput is aggregated over all trials, the one-shot point cloud results are arbitrarily chosen examples (a human figure) for demonstration purposes only. Results in Figure 18(a) suggest that the throughput gap between Gemini and CO gets slightly wider, but Gemini still outperforms the other two baselines: to generate a 3D point cloud, Gemini has to temporarily sacrifice a bit of throughput to capture fine-grained reflection signals. Though SO in Figure 18(c) performs marginally better than Gemini in Figure 18(b), the main beam of SO has to solely target the subject, hence resulting in the lowest throughput. RR sits between CO and SO for both throughput and point cloud performance, as shown in Figure 18(a) and Figure 18(e), due to its time-division nature. Finally, the TI cascaded radar, despite its large number of antennas (12 Tx × 16 Rx), performs poorly, as illustrated in Figure 18(f), because its bearing resolution is lower than that of Sivers’ phased array antennas (SiversBF01_2021, ).

Figure 19. Scheduling for simultaneous communications and human movement tracking in the test area. Orange lines show the ground truth (predefined paths), while blue points display tracking results. Panels: (a) Throughput (Gbps); (b) Tracking RMSE (m); (c) Gemini, single target (Gemini); (d) Sensing only (SO); (e) Comm. only (CO); (f) Round-robin (RR); (g) TI radar (TI); (h) Gemini, multiple targets (MT).

5.2.3. Human Movement

We captured human movements in a room of approximately 21 × 18 feet, slightly larger than a typical US family room, which accommodates scenarios with larger movements involving multiple UEs and several subjects. Human subjects follow predefined paths marked on the floor, specifically the 'Rectangular' and 'Straight Line' trajectories depicted in Figure 19. Throughput and tracking RMSE are reported in Figures 19(a) and 19(b). As for the sensing performance shown in Figures 19(c) to 19(g), human movement tracking exhibits a performance ranking similar to that of static point cloud generation, as it also produces point clouds, but in both temporal and spatial dimensions. For the case involving multiple human subjects and UEs, shown in Figure 19(h), we placed two frontends as UEs to receive data from the AP, with each subject following one of the two predefined paths. Despite the slight decreases in throughput and tracking accuracy observed with two UEs and two tracked targets (MT), Gemini still outperforms the baselines, even though they serve only one UE and one target. We attribute this to our beam scheduling algorithm, which effectively balances communication and sensing tasks.
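To make the tracking metric concrete, the sketch below computes an RMSE between estimated positions and a predefined path given as a polyline. The paper does not spell out how the tracking RMSE is computed, so the nearest-point-on-path error used here is an assumption, as are the helper names `point_to_segment_distance` and `tracking_rmse`; a time-aligned ground truth would call for a point-wise error instead.

```python
import math
from typing import Sequence, Tuple

Point2D = Tuple[float, float]


def point_to_segment_distance(p: Point2D, a: Point2D, b: Point2D) -> float:
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def tracking_rmse(estimates: Sequence[Point2D], path: Sequence[Point2D]) -> float:
    """RMSE of the estimates with respect to the closest point on the path."""
    if not estimates or len(path) < 2:
        raise ValueError("need at least one estimate and a path of two vertices")
    squared_error = 0.0
    for p in estimates:
        d = min(
            point_to_segment_distance(p, path[i], path[i + 1])
            for i in range(len(path) - 1)
        )
        squared_error += d * d
    return math.sqrt(squared_error / len(estimates))
```

A 'Rectangular' path would be passed as five vertices with the first repeated at the end, and a 'Straight Line' path as two vertices.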

In fact, ISAC is not meant to benefit any specific application; it strikes a balance between sensing and communication, with the benefit of having them co-exist on one piece of hardware. Meanwhile, Gemini, with its scheduling driven by BC-Sets, is superior to existing mmWave-ISAC solutions: our extensive experiments show that it achieves performance very close to that of SO (for sensing) and CO (for communications), even though both sensing and communication functions co-exist on the same hardware and operate simultaneously.

5.3. Sensing Diversity

We now revisit the point cloud application to evaluate our unified estimation framework, explained in Section 3.4, for fusing multiple sensing modalities. We start from the AP monostatic sensing already shown in Figure 18(b) and gradually fuse in more UE (multi-static) sensing results. As expected, the density of the point cloud increases with the number of UEs from Figure 20(a) to Figure 20(d), while the scattering of the 3D points shrinks quickly, rendering the overall human figure clearer step by step.

Figure 20. Sensing diversity with different UEs may substantially enhance the point cloud quality. Panels: (a) One UE; (b) Two UEs; (c) Three UEs; (d) Four UEs.

6. Related Works and Discussions

This section focuses on three main streams of related works, namely mmWave platforms, ISAC, and full duplex radio (FDR), with brief discussions on limitations of Gemini.

mmWave Platforms

Earlier mmWave platforms (openmili-MobiCom16, ; X60-WiNTECH17, ) offer only a relatively wide bandwidth, and their phased arrays cannot be precisely controlled for fast alignment, so they do not exactly meet the requirements of IEEE 802.11ad (802_11ad, ). M-Cube (MCube-MobiCom20, ) utilizes a commodity 802.11ad RF front-end with phased array antennas, but it was soon superseded by mm-FLEX (mmFLEX-MobiSys20, ), which adopts a mmWave front-end similar to that of Gemini and also matches the more advanced 802.11ay standard.

ISAC

Though theoretical works studying mmWave ISAC for 5G/6G cellular network scenarios are plentiful (liu2018toward, ; ZhangJCS-CommSurvey21, ), they focus only on waveform design without providing useful guidelines for system development. For ISAC on the mmWave band, a couple of proposals (SPARCS-IPSN22, ; SideLobe-UbiComp23, ) have leveraged existing communication traffic to enable only multi-static sensing, confined to subjects compatible with existing communication beam patterns. Though a monostatic sensing solution has been mentioned in (guan-TMTT2021, ; JUMP_TWC24), it is far from full integration into existing communication devices.

Nulling and FDR

Existing proposals on nulling for communications (RobertsFDmmWave, ; mmWaveFD-MobiCom20, ; nullifi-NSDI21, ) are only marginally related and hence omitted from our discussions. Certain Wi-Fi sensing developments (adib2013see, ; joshi2015wideo, ) have come very close to FDR (FDR-SIGCOMM13, ), as they adopt either nulling or FDR to cancel Tx interference for motion sensing. However, as pointed out by (isacot, ), monostatic sensing for ISAC is fundamentally different from FDR.

Gemini

Gemini cannot leverage FDR-like technologies to extract monostatic sensing signals (explained in Section 3.2), so we exploit a hybrid (hardware-software) deep learning model to remove Tx interference. We also innovate in designing a beam scheduling algorithm for ISAC (see Section 3.3) and in fusing multiple sensing modalities to improve estimation precision (see Section 3.4). In the meantime, we are considering fusing sub-6 GHz Wi-Fi and mmWave to enhance the capability of ISAC, as has been done for communication only (sur2017wifi, ). Also, endowing Wi-Fi with sensing capability may cause unexpected information leakage (li2016csi, ), which can be exacerbated by the powerful mmWave-ISAC, so security issues should be part of our future work.

7. Conclusion

In this paper, we have proposed, designed, and implemented Gemini, a full-fledged mmWave ISAC system. We have first given three concrete analyses to motivate our design. Then we have elaborated on all key components of Gemini, namely i) Tx interference cancellation driven by deep learning to enable monostatic sensing, ii) a beam scheduling algorithm for jointly optimizing sensing accuracy and communication throughput, and iii) a unified estimation framework for exploiting diversified sensing modalities. Finally, we have conducted extensive experiments to evaluate the performance of Gemini; our results have strongly demonstrated the advantages of Gemini in taking care of both communication and sensing under its ISAC framework. We believe that our initial trials in realizing Gemini signify a pivotal step towards more practical mmWave ISAC systems.

References

  • [1] Intel WiGig. https://www.intel.com/content/www/us/en/products/docs/wireless-products/wigig-overview.html.
  • [2] IWR6843 Intelligent mmWave Sensor Standard Antenna Plug-in Module. https://www.ti.com/tool/IWR6843ISK.
  • [3] PyTorch. https://pytorch.org/.
  • [4] Sivers Semiconductors mmWave 60Ghz Evaluation Kits (EVK). https://www.sivers-semiconductors.com/sivers-wireless/evaluation-kits/.
  • [5] Wi-Fi CERTIFIED 6. https://www.wi-fi.org/discover-wi-fi/wi-fi-certified-6.
  • [6] IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems–Local and Metropolitan Area Networks–Specific Requirements-Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 3: Enhancements for Very High Throughput in the 60 GHz Band. IEEE Std 802.11ad-2012 (Amendment to IEEE Std 802.11-2012, as amended by IEEE Std 802.11ae-2012 and IEEE Std 802.11aa-2012), pages 1–628, 2012.
  • [7] Intel RealSense LiDAR L515 Camera. https://www.intelrealsense.com/lidar-camera-l515/, 2020.
  • [8] Xilinx Zynq UltraScale+ RFSoC ZCU208 Evaluation Kit. https://www.xilinx.com/products/boards-and-kits/zcu208.html/, 2020.
  • [9] IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems Local and Metropolitan Area Networks–Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 1: Enhancements for High-Efficiency WLAN. IEEE Std 802.11ax-2021 (Amendment to IEEE Std 802.11-2020), pages 1–767, 2021.
  • [10] IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems Local and Metropolitan Area Networks–Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 2: Enhanced Throughput for Operation in License-exempt Bands above 45 GHz. IEEE Std 802.11ay-2021 (Amendment to IEEE Std 802.11-2020 as amendment by IEEE Std 802.11ax-2021), pages 1–768, 2021.
  • [11] O. Abari, D. Bharadia, A. Duffield, and D. Katabi. Enabling High-Quality Untethered Virtual Reality. In Proc. of the 14th USENIX NSDI, pages 531–544, 2017.
  • [12] F. Adib, Z. Kabelac, and D. Katabi. Multi-person Localization via RF Body Reflections. In Proc. of 12th USENIX NSDI, pages 279–292, 2015.
  • [13] F. Adib and D. Katabi. See Through Walls with WiFi! In Proc. of the ACM SIGCOMM, pages 75–86, 2013.
  • [14] D. Bharadia, E. McMilin, and S. Katti. Full Duplex Radios. In Proc. of the 27th ACM SIGCOMM, pages 375–386, 2013.
  • [15] Z. Chen, T. Zheng, C. Cai, and J. Luo. MoVi-Fi: Motion-robust Vital Signs Waveform Recovery via Deep Interpreted RF Sensing. In Proc. of the 27th ACM MobiCom, pages 392–405, 2021.
  • [16] Z. Chen, T. Zheng, C. Hu, H. Cao, Y. Yang, H. Jiang, and J. Luo. ISACoT: Integrating Sensing with Data Traffic for Ubiquitous IoT Devices. IEEE Communications Magazine, pages 1–7, 2023.
  • [17] C. Fiandrino, H. Assasa, P. Casari, and J. Widmer. Scaling Millimeter-Wave Networks to Dense Deployments and Dynamic Environments. Proc. of the IEEE, 2019.
  • [18] X. Gao, G. Xing, S. Roy, and H. Liu. Experiments with mmWave Automotive Radar Test-bed. In Proc. of the 53rd IEEE Asilomar, pages 1–6, 2019.
  • [19] X. Gu, D. Liu, and B. Sadhu. Packaging and Antenna Integration for Silicon-based Millimeter-wave Phased Arrays: 5G and Beyond. IEEE Journal of Microwaves, pages 123–134, 2021.
  • [20] J. Guan, A. Paidimarri, A. Valdes-Garcia, and B. Sadhu. 3-D Imaging using Millimeter-wave 5G Signal Reflections. IEEE Transactions on Microwave Theory and Techniques, pages 2936–2948, 2021.
  • [21] IEFT. Precision Time Protocol Version 2 (PTPv2), 2017. Accessed: 2023-03-12.
  • [22] M. A. Islam, G. C. Alexandropoulos, and B. Smida. Integrated Sensing and Communication with Millimeter Wave Full Duplex Hybrid Beamforming. In Proc. of the 31st IEEE ICC, pages 4673–4678, 2022.
  • [23] W. Jiang, C. Miao, F. Ma, S. Yao, Y. Wang, Y. Yuan, H. Xue, C. Song, X. Ma, D. Koutsonikolas, et al. Towards Environment Independent device Free Human Activity Recognition. In Proc. of the 24th ACM MobiCom, pages 289–304, 2018.
  • [24] K. Joshi, D. Bharadia, M. Kotaru, and S. Katti. Wideo: Fine-grained Device-free Motion Tracing using RF Backscatter. In Proc. of the 12th USENIX NSDI, pages 189–204, 2015.
  • [25] D. Kunin, J. Bloom, A. Goeva, and C. Seed. Loss Landscapes of Regularized Linear Autoencoders. In Proc. of the 36th ICML, pages 3560–3569, 2019.
  • [26] J. O. Lacruz, D. Garcia, P. J. Mateo, J. Palacios, and J. Widmer. mm-FLEX: An Open Platform for Millimeter-Wave Mobile Full-Bandwidth Experimentation. In Proc. of the 18th ACM MobiSys, pages 1–13, 2020.
  • [27] J. O. Lacruz, R. R. Ortiz, and J. Widmer. A Real-Time Experimentation Platform for Sub-6 GHz and Millimeter-Wave MIMO Systems. In Proc. of the 19th ACM MobiSys, pages 427–439, 2021.
  • [28] M. Li, Y. Meng, J. Liu, H. Zhu, X. Liang, Y. Liu, and N. Ruan. When CSI Meets Public WiFi: Inferring Your Mobile Phone Password via WiFi Signals. In Proc. of the 23rd ACM CCS, pages 1068–1079, 2016.
  • [29] F. Liu, Y. Cui, C. Masouros, J. Xu, T. X. Han, Y. C. Eldar, and S. Buzzi. Integrated Sensing and Communications: Toward Dual-Functional Wireless Networks for 6G and Beyond. IEEE Journal on Selected Areas in Communications, pages 1728–1767, 2022.
  • [30] F. Liu, L. Zhou, C. Masouros, A. Li, W. Luo, and A. Petropulu. Toward Dual-functional Radar-communication Systems: Optimal Waveform Design. IEEE Transactions on Signal Processing, pages 4264–4279, 2018.
  • [31] C. X. Lu, S. Rosa, P. Zhao, B. Wang, C. Chen, J. A. Stankovic, N. Trigoni, and A. Markham. See through Smoke: Robust Indoor Mapping with Low-Cost MmWave Radar. In Proc. of the 18th ACM MobiSys, pages 14–27, 2020.
  • [32] S. Madani, J. Guan, W. Ahmed, S. Gupta, and H. Hassanieh. Radatron: Accurate Detection Using Multi-resolution Cascaded MIMO Radar. In Proc. of the 17th ECCV, pages 160–178. Springer, 2022.
  • [33] S. Madani, S. Jog, J. O. Lacruz, J. Widmer, and H. Hassanieh. Practical Null Steering in Millimeter Wave Networks. In Proc. of the 18th USENIX NSDI, pages 903–921, 2021.
  • [34] M. Mirza and S. Osindero. Conditional Generative Adversarial Nets. arXiv preprint arXiv:1411.1784, 2014.
  • [35] K. I. Pedersen. Channel Parameter Estimation in Mobile Radio Environments using the SAGE Algorithm. IEEE Journal on Selected Areas in Communications, pages 434–450, 1999.
  • [36] J. Pegoraro, J. O. Lacruz, M. Rossi, and J. Widmer. SPARCS: A Sparse Recovery Approach for Integrated Communication and Human Sensing in mmWave Systems. In Proc. of the 21st ACM/IEEE IPSN, pages 79–91, 2022.
  • [37] Qualcomm. Qualcomm Wigig 60Ghz chipset QCA9500. https://www.qualcomm.com/products/technology/wi-fi/qca9500.
  • [38] M. A. Richards. Fundamentals of Radar Signal Processing. McGraw-Hill Education, 2014.
  • [39] I. P. Roberts, J. G. Andrews, H. B. Jain, and S. Vishwanath. Millimeter-Wave Full Duplex Radios: New Challenges and Techniques. IEEE Wireless Communications, pages 36–43, 2021.
  • [40] S. K. Saha, Y. Ghasempour, M. K. Haider, T. Siddiqui, P. De Melo, N. Somanchi, L. Zakrajsek, A. Singh, O. Torres, D. Uvaydov, J. M. Jornet, E. Knightly, D. Koutsonikolas, D. Pados, and Z. Sun. X60: A Programmable Testbed for Wideband 60 GHz WLANs with Phased Arrays. In Proc. of the 11th ACM WiNTECH, pages 75–82, 2017.
  • [41] M. A. Seifeldin, A. F. El-keyi, and M. A. Youssef. Kalman Filter-Based Tracking of a Device-Free Passive Entity in Wireless Environments. In Proc. of the 6th ACM WiNTECH, pages 43–50, 2011.
  • [42] W.-L. Shen, K. C.-J. Lin, M.-S. Chen, and K. Tan. SIEVE: Scalable User Grouping for Large MU-MIMO Systems. In Proc. of IEEE INFOCOM, pages 1975–1983, 2015.
  • [43] V. Singh, S. Mondal, A. Gadre, M. Srivastava, J. Paramesh, and S. Kumar. Millimeter-wave Full Duplex Radios. In Proc. of the 26th ACM MobiCom, pages 1–14, 2020.
  • [44] M. Speth, S. A. Fechtel, G. Fock, and H. Meyr. Optimum Receiver Design for Wireless Broad-band Systems using OFDM. I. IEEE Transactions on Communications, pages 1668–1677, 1999.
  • [45] S. Sur, I. Pefkianakis, X. Zhang, and K.-H. Kim. WiFi-assisted 60 GHz Wireless Networks. In Proc. of the 23rd ACM MobiCom, pages 28–41, 2017.
  • [46] K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han. HAQ: Hardware-Aware Automated Quantization With Mixed Precision. In Proc. of the 32nd IEEE/CVF CVPR, 2019.
  • [47] S. Wang, J. Huang, and X. Zhang. Demystifying Millimeter-Wave V2X: Towards Robust and Efficient Directional Connectivity under High Mobility. In Proc. of the 26th ACM MobiCom, 2020.
  • [48] T. Wei and X. Zhang. Pose Information Assisted 60 GHz Networks: Towards Seamless Coverage and Mobility Support. In Proc. of the 23rd ACM MobiCom, pages 42–55, 2017.
  • [49] Sivers Wireless. TRX BF/01 User Manual. Sivers Semiconductors, 2021.
  • [50] C. Wu, F. Zhang, B. Wang, and K. J. Ray Liu. mmTrack: Passive Multi-Person Localization Using Commodity Millimeter Wave Radio. In Proc. of the 39th IEEE INFOCOM, pages 2400–2409, 2020.
  • [51] Q. Yang, H. Wu, Q. Huang, J. Zhang, H. Chen, W. Li, X. Tao, and Q. Zhang. Side-Lobe Can Know More: Towards Simultaneous Communication and Sensing for MmWave. Proc. of ACM UbiComp, 2023.
  • [52] F. Zhang, C. Wu, B. Wang, and K. J. R. Liu. mmEye: Super-Resolution Millimeter Wave Imaging. IEEE Internet of Things Journal, pages 6995–7008, 2021.
  • [53] J. Zhang, X. Zhang, P. Kulkarni, and P. Ramanathan. OpenMili: A 60 GHz Software Radio Platform with a Reconfigurable Phased-Array Antenna. In Proc. of the 22nd ACM MobiCom, pages 162–175, 2016.
  • [54] J. A. Zhang, M. L. Rahman, K. Wu, X. Huang, Y. J. Guo, S. Chen, and J. Yuan. Enabling Joint Communication and Radar Sensing in Mobile Networks—A Survey. IEEE Communications Surveys & Tutorials, pages 306–345, 2021.
  • [55] R. Zhao, T. Woodford, T. Wei, K. Qian, and X. Zhang. M-Cube: A Millimeter-Wave Massive MIMO Software Radio. In Proc. of the 26th ACM MobiCom, pages 1–14, 2020.