Enhancement of Over-the-Air Federated Learning by Using AI-based Fluid Antenna System

IEEE Publication Technology    Mohsen Ahmadzadeh, Saeid Pakravan, Ghosheh Abed Hodtani, Ming Zeng, and Jean-Yves Chouinard M. Ahmadzadeh and G. Abed Hodtani are with the Department of Electric and Computer Engineering, Ferdowsi University, Mashhad, Iran. email: m.ahmadzadehbolghan@mail.um.ac.ir; ghodtani@gmail.com.S. Pakravan, M. Zeng, and J-Y. Chouinard are with the Department of Electrical and Computer Engineering, Laval University, Quebec City, QC, G1V 0A6, CA. email: saeid.pakravan.1@ulaval.ca; ming.zeng@gel.ulaval.ca; Jean-Yves.Chouinard@gel.ulaval.ca.
Abstract

This letter investigates an over-the-air federated learning (OTA-FL) system that employs fluid antennas (FAs) at the access point (AP) to enhance learning performance by leveraging the additional degrees of freedom provided by antenna mobility. First, we analyze the convergence of the OTA-FL system and derive the optimality gap to illustrate the influence of FAs on learning performance. Then, we formulate a nonconvex optimization problem to minimize the optimality gap by jointly optimizing the positions of the FAs and the beamforming vector. To address the dynamic environment, we cast this optimization problem as a Markov decision process (MDP) and propose the recurrent deep deterministic policy gradient (RDPG) algorithm. Finally, extensive simulations show that the FA-assisted OTA-FL system outperforms systems with fixed-position antennas and that the RDPG algorithm surpasses the existing methods.

Index Terms:
Optimality gap, Over-the-air federated learning, Fluid antenna, Deep reinforcement learning.

I Introduction

Federated learning (FL) has gained significant prominence in the field of machine learning in recent years due to its decentralized framework and robust privacy protection measures. FL leverages the computational capabilities of edge devices to collectively train a unified global model, all while ensuring that sensitive locally stored data remains confidential. This approach is particularly suitable for applications in various Internet of Mobile Things (IoMT) scenarios, including the Internet of Drones (IoD)[1] and mobile crowd sensing[2], among others. To mitigate communication latency and costs associated with FL implementation, over-the-air computation (AirComp) for model aggregation has emerged as a promising approach. It leverages the superposition property of wireless multiple access channels (MAC) [3]. However, the performance of model aggregation in over-the-air FL (OTA-FL) is severely affected by adverse wireless propagation channel conditions, especially in massive mobile IoT scenarios.

To address this challenge, previous research has extensively explored the integration of intelligent reflecting surfaces (IRSs) into OTA-FL to achieve reliable model aggregation. This is accomplished by reconfiguring wireless channels through adjusting the passive reflecting coefficients of the IRS [3]. Additionally, other studies have investigated beamforming techniques at the receiver server to enhance OTA-FL performance by leveraging the degrees of freedom (DoF) in the spatial domain [4]. However, the fixed positions of receiver antennas can impose limitations on the potential gains from beamforming. To overcome these limitations, we propose the use of fluid antennas (FAs) in OTA-FL to dynamically manipulate wireless channel conditions through adaptive movement of antennas. Unlike fixed-position antennas (FPA), FAs offer the capability to reshape channel environments, thereby introducing new DoFs to enhance OTA-FL performance [5]. Previous studies have demonstrated the superior performance of FAs over traditional FPAs in various communication systems, including AirComp systems [6], multi-user uplink communications [7, 5, 8], and mobile edge computing [9]. Moreover, FAs have been investigated to maximize network sum-rate in multiple access communication systems through deep reinforcement learning (DRL) [10]. However, the integration of FAs into OTA-FL systems has not yet been addressed in current academic literature to the best of our knowledge.

In this paper, we propose the integration of FA systems with OTA-FL to enhance convergence performance. Our objective is to minimize the optimality gap through joint optimization of the beamforming vector and the antenna position vector at the access point (AP) under practical dynamic conditions [11]. First, we derive the optimality gap between the actual loss and the optimal loss for OTA-FL to quantify the impact of the beamforming vector and antenna positioning. Subsequently, based on the convergence analysis, we formulate a non-convex optimization problem to enhance learning performance. We then reformulate it as a Markov decision process (MDP) and leverage deep reinforcement learning (DRL) techniques suitable for dynamic environments. We introduce the recurrent deep deterministic policy gradient (RDPG) algorithm, with actor and critic networks specifically designed to capture the temporal correlation of state features, enabling real-time decisions under dynamic conditions. Extensive simulations are undertaken to evaluate the performance of the integrated FAs in OTA-FL systems compared to FPA, and the performance of the proposed RDPG algorithm in comparison to standard DRL algorithms such as soft actor-critic (SAC) and deep deterministic policy gradient (DDPG). Simulation results indicate that the proposed RDPG algorithm outperforms both SAC and DDPG methods in terms of performance and stability in both scenarios. Furthermore, the FA-assisted OTA-FL system consistently outperforms the FPA scenarios.

Notations: Italicized letters represent scalars, while boldface letters denote vectors. ()Tsuperscript𝑇(\cdot)^{T}( ⋅ ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT is the transpose, ()Hsuperscript𝐻(\cdot)^{H}( ⋅ ) start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT the conjugate transpose, and 𝔼[]𝔼delimited-[]\mathbb{E}[\cdot]blackboard_E [ ⋅ ] the expectation operation. |||\cdot|| ⋅ | signifies the magnitude of a scalar or the cardinality of a set. The Euclidean norm of a vector is represented by .\left\|.\right\|∥ . ∥.

II System Model

We consider an OTA-FL system comprising K𝐾Kitalic_K single-antenna user equipment (UE) devices, denoted as UEksubscriptUE𝑘\text{UE}_{k}UE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, k𝒦{1,2,,K}for-all𝑘𝒦12𝐾\forall k\in\mathcal{K}\triangleq\{1,2,\ldots,K\}∀ italic_k ∈ caligraphic_K ≜ { 1 , 2 , … , italic_K }. These devices could be include mobile IoT devices, UAVs, drones, or similar entities. The UEs are randomly distributed and move dynamically within a designated area of interest, where they collect local data samples. These samples are collaboratively utilized to train a global model at an AP equipped with N𝑁Nitalic_N FAs.

II-A OTA-FL Model

We consider an OTA-FL framework with full participation that executes sequential actions at each iteration t𝑡titalic_t over T𝑇Titalic_T training rounds as follows:

  • Global model broadcast: The AP broadcasts the current global model 𝒘tdsubscript𝒘𝑡superscript𝑑\boldsymbol{w}_{t}\in\mathbb{R}^{d}bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT to all UEs, where d𝑑ditalic_d represents the dimensionality of the model parameter space.

  • Local model update: Each UE updates its local model using the gradient descent algorithm as 𝒘k,t=𝒘tγF(𝒘t,𝒟k)subscript𝒘𝑘𝑡subscript𝒘𝑡𝛾𝐹subscript𝒘𝑡subscript𝒟𝑘\boldsymbol{w}_{k,t}=\boldsymbol{w}_{t}-\gamma\nabla{F}(\boldsymbol{w}_{t},% \mathbf{\mathcal{D}}_{k})bold_italic_w start_POSTSUBSCRIPT italic_k , italic_t end_POSTSUBSCRIPT = bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_γ ∇ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), where γ𝛾\gammaitalic_γ is the learning rate, F(𝒘t,𝒟k)𝐹subscript𝒘𝑡subscript𝒟𝑘\nabla{F}(\boldsymbol{w}_{t},\mathbf{\mathcal{D}}_{k})∇ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) represents the gradient of the local loss function, and 𝒟ksubscript𝒟𝑘\mathbf{\mathcal{D}}_{k}caligraphic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is the local dataset for UEksubscriptUE𝑘\text{UE}_{k}UE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with a local dataset size denoted by |𝒟k|=Dsubscript𝒟𝑘𝐷|\mathbf{\mathcal{D}}_{k}|=D| caligraphic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | = italic_D.

  • Model aggregation: Each UE transmits its local model to the AP, which then performs aggregation by averaging to update the global model as:

    𝒘t+1subscript𝒘𝑡1\displaystyle\boldsymbol{w}_{t+1}bold_italic_w start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT =1Kk𝒦𝒘k,t.absent1𝐾subscript𝑘𝒦subscript𝒘𝑘𝑡\displaystyle=\frac{1}{K}\sum_{k\in\mathcal{K}}\boldsymbol{w}_{k,t}.= divide start_ARG 1 end_ARG start_ARG italic_K end_ARG ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT bold_italic_w start_POSTSUBSCRIPT italic_k , italic_t end_POSTSUBSCRIPT . (1)

The procedure continues iteratively until reaching the maximum specified number of outer iterations.

II-B Communication Model

We consider the uploading phase within the OTA-FL system, where each UE synchronously transmits its updated model parameters to the AP. The AP is equipped with an array of FAs, facilitating the adjustment of each FA along a one-dimensional line segment that spans a length of X𝑋Xitalic_X. Each FA’s position is constrained within the interval [0,X]0𝑋[0,X][ 0 , italic_X ] with a minimum distance X0subscript𝑋0X_{0}italic_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT between adjacent FAs to prevent antenna coupling. The collective locations of all N𝑁Nitalic_N FAs are represented as the vector 𝐱=[x1,,xN]T𝐱superscriptsubscript𝑥1subscript𝑥𝑁𝑇\mathbf{x}=[x_{1},\ldots,x_{N}]^{T}bold_x = [ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT, with their movement along one dimension restricted by x1<x2<<xNsubscript𝑥1subscript𝑥2subscript𝑥𝑁x_{1}<x_{2}<\ldots<x_{N}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT < … < italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT. For the purposes of brevity and clarity, time indices are omitted in this subsection. Assuming line-of-sight (LoS) propagation conditions, the channel between the UEksubscriptUE𝑘\text{UE}_{k}UE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and the AP, represented by 𝐡k[𝐱]N×1subscript𝐡𝑘delimited-[]𝐱superscript𝑁1\mathbf{h}_{k}[\mathbf{x}]\in\mathbb{C}^{N\times 1}bold_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_x ] ∈ blackboard_C start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, is given by

𝐡k[𝐱]=l0dkα[ej2πλx1cos(ϕk),,ej2πλxNcos(ϕk)]T,subscript𝐡𝑘delimited-[]𝐱subscript𝑙0subscriptsuperscript𝑑𝛼𝑘superscriptsuperscript𝑒𝑗2𝜋𝜆subscript𝑥1subscriptitalic-ϕ𝑘superscript𝑒𝑗2𝜋𝜆subscript𝑥𝑁subscriptitalic-ϕ𝑘𝑇\mathbf{h}_{k}[\mathbf{x}]=\sqrt{\frac{l_{0}}{{d^{\alpha}_{k}}}}\left[e^{j% \frac{2\pi}{\lambda}x_{1}\cos(\phi_{k})},\ldots,e^{j\frac{2\pi}{\lambda}x_{N}% \cos(\phi_{k})}\right]^{T},bold_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_x ] = square-root start_ARG divide start_ARG italic_l start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG start_ARG italic_d start_POSTSUPERSCRIPT italic_α end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG end_ARG [ italic_e start_POSTSUPERSCRIPT italic_j divide start_ARG 2 italic_π end_ARG start_ARG italic_λ end_ARG italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT roman_cos ( italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT , … , italic_e start_POSTSUPERSCRIPT italic_j divide start_ARG 2 italic_π end_ARG start_ARG italic_λ end_ARG italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT roman_cos ( italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , (2)

where l0subscript𝑙0l_{0}italic_l start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, λ𝜆\lambdaitalic_λ, and α𝛼\alphaitalic_α denote the path loss at the reference distance, the wavelength, and the path loss exponent, respectively. dksubscript𝑑𝑘d_{k}italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and ϕksubscriptitalic-ϕ𝑘\phi_{k}italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT represent the distance between the FAs and UEksubscriptUE𝑘\text{UE}_{k}UE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and the angle of arrival (AoA) of the LoS path, respectively, which are determined by the location of the UEs in each training round. In this system, we assume each UE moves within an designated area and then transmits model parameters from a stationary position [2]. Moreover, given that the signal path length significantly exceeds the FA movement area, the far field condition between the AP and UEs is assumed. Consequently, ϕksubscriptitalic-ϕ𝑘\phi_{k}italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and dksubscript𝑑𝑘d_{k}italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT are treated as constants during transmission, regardless of FA positional changes [9, 7].

The AP receives the local model parameters from all UEs in the t𝑡titalic_t-th training round as:

𝐲=k𝒦pk𝐡k[𝐱]𝒘k+𝐳,𝐲subscript𝑘𝒦subscript𝑝𝑘subscript𝐡𝑘delimited-[]𝐱subscript𝒘𝑘𝐳\mathbf{y}=\sum_{k\in\mathcal{K}}{p}_{k}\mathbf{{h}}_{k}[\mathbf{x}]% \boldsymbol{w}_{k}+\mathbf{z},bold_y = ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_x ] bold_italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + bold_z , (3)

where, pksubscript𝑝𝑘p_{k}italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT denotes the transmission power factor for the k𝑘kitalic_k-th UE, and 𝐳N×d𝐳superscript𝑁𝑑\mathbf{z}\in\mathbb{C}^{N\times d}bold_z ∈ blackboard_C start_POSTSUPERSCRIPT italic_N × italic_d end_POSTSUPERSCRIPT represents an additive white Gaussian noise (AWGN) matrix with elements following a complex normal distribution 𝒞𝒩(0,σ2)𝒞𝒩0superscript𝜎2\mathcal{CN}(0,\sigma^{2})caligraphic_C caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).

The aggregated model parameter vector, 𝒘^bold-^𝒘\boldsymbol{\hat{w}}overbold_^ start_ARG bold_italic_w end_ARG, in the t𝑡titalic_t-th training round is estimated by conducting post-processing on the received signal at the AP as follows:

𝒘^bold-^𝒘\displaystyle\boldsymbol{\hat{w}}overbold_^ start_ARG bold_italic_w end_ARG =1K(1η𝒎H𝐲)absent1𝐾1𝜂superscript𝒎𝐻𝐲\displaystyle=\frac{1}{K}(\frac{1}{\sqrt{\eta}}\boldsymbol{m}^{H}\mathbf{y})= divide start_ARG 1 end_ARG start_ARG italic_K end_ARG ( divide start_ARG 1 end_ARG start_ARG square-root start_ARG italic_η end_ARG end_ARG bold_italic_m start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_y ) (4)
=1K(k𝒦1η𝒎Hpk𝐡k[𝐱]𝒘k+𝒎H𝐳η),absent1𝐾subscript𝑘𝒦1𝜂superscript𝒎𝐻subscript𝑝𝑘subscript𝐡𝑘delimited-[]𝐱subscript𝒘𝑘superscript𝒎𝐻𝐳𝜂\displaystyle=\frac{1}{K}(\sum_{k\in\mathcal{K}}\frac{1}{\sqrt{\eta}}% \boldsymbol{m}^{H}{p}_{k}\mathbf{h}_{k}[\mathbf{x}]\boldsymbol{w}_{k}+\frac{% \boldsymbol{m}^{H}\mathbf{z}}{\sqrt{\eta}}),= divide start_ARG 1 end_ARG start_ARG italic_K end_ARG ( ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT divide start_ARG 1 end_ARG start_ARG square-root start_ARG italic_η end_ARG end_ARG bold_italic_m start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT bold_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_x ] bold_italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + divide start_ARG bold_italic_m start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_z end_ARG start_ARG square-root start_ARG italic_η end_ARG end_ARG ) ,

where, 𝒎N×1𝒎superscript𝑁1\boldsymbol{m}\in\mathbb{C}^{N\times 1}bold_italic_m ∈ blackboard_C start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT denotes the beamforming vector at the AP, and η𝜂\etaitalic_η represents the scaling factor for signal amplitude alignment.

According to [4, 3], in order to minimize the mean square error (MSE) in the OTA-FL system, the transmit power factor in each client is designed as follows:

pk=η(𝒎H𝒉k[𝒙])H|𝒎H𝒉k[𝒙]|2.subscript𝑝𝑘𝜂superscriptsuperscript𝒎𝐻subscript𝒉𝑘delimited-[]𝒙𝐻superscriptsuperscript𝒎𝐻subscript𝒉𝑘delimited-[]𝒙2\displaystyle{p}_{k}=\frac{\sqrt{\eta}\left(\boldsymbol{m}^{H}\boldsymbol{h}_{% k}[\boldsymbol{x}]\right)^{H}}{\left|\boldsymbol{m}^{H}\boldsymbol{h}_{k}[% \boldsymbol{x}]\right|^{2}}.italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = divide start_ARG square-root start_ARG italic_η end_ARG ( bold_italic_m start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_italic_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_italic_x ] ) start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT end_ARG start_ARG | bold_italic_m start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_italic_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_italic_x ] | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . (5)

We consider that the transmission power allocated to each UEksubscriptUE𝑘\text{UE}_{k}UE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT does not exceed the maximum transmission power limit pmaxsubscript𝑝maxp_{\text{max}}italic_p start_POSTSUBSCRIPT max end_POSTSUBSCRIPT, expressed as:

1dpk2𝔼[𝒘k2]pmax,k𝒦.formulae-sequence1𝑑superscriptsubscript𝑝𝑘2𝔼delimited-[]superscriptnormsubscript𝒘𝑘2subscript𝑝for-all𝑘𝒦\frac{1}{d}p_{k}^{2}\mathbb{E}\left[\|\boldsymbol{w}_{k}\|^{2}\right]\leq p_{% \max},\quad\forall k\in\mathcal{K}.divide start_ARG 1 end_ARG start_ARG italic_d end_ARG italic_p start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT blackboard_E [ ∥ bold_italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] ≤ italic_p start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT , ∀ italic_k ∈ caligraphic_K . (6)

Moreover, under the assumption of full participation in FL to adhere to the maximum power budget for each client, the upper bound of η𝜂\etaitalic_η must satisfy the following condition:

ηdpmax|𝒎H𝒉k[𝒙]|2𝔼[𝒘k2],k𝒦.formulae-sequence𝜂𝑑subscript𝑝maxsuperscriptsuperscript𝒎𝐻subscript𝒉𝑘delimited-[]𝒙2𝔼delimited-[]superscriptnormsubscript𝒘𝑘2for-all𝑘𝒦\displaystyle\eta\leq\frac{dp_{\text{max}}\left|\boldsymbol{m}^{H}\boldsymbol{% h}_{k}[\boldsymbol{x}]\right|^{2}}{\mathbb{E}\left[\|\boldsymbol{w}_{k}\|^{2}% \right]},\quad\forall k\in\mathcal{K}.italic_η ≤ divide start_ARG italic_d italic_p start_POSTSUBSCRIPT max end_POSTSUBSCRIPT | bold_italic_m start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_italic_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_italic_x ] | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG blackboard_E [ ∥ bold_italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] end_ARG , ∀ italic_k ∈ caligraphic_K . (7)

III Convergence Analysis

In order to facilitate our analysis of convergence, we incorporate the following commonly accepted assumptions as discussed in [3, 1, 12]:

Assumption 1: The global loss function F(𝒘)𝐹𝒘F(\boldsymbol{w})italic_F ( bold_italic_w ) is \ellroman_ℓ-smooth. Namely, for any given model parameters 𝒘,𝒗d𝒘𝒗superscript𝑑\boldsymbol{w},\boldsymbol{v}\in\mathbb{R}^{d}bold_italic_w , bold_italic_v ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT, there exists a nonnegative constant \ellroman_ℓ, such that

F(𝒘)F(𝒗)(𝒘𝒗)TF(𝒗)+2𝒘𝒗2.𝐹𝒘𝐹𝒗superscript𝒘𝒗𝑇𝐹𝒗2superscriptnorm𝒘𝒗2F(\boldsymbol{w})-F(\boldsymbol{v})\leq(\boldsymbol{w}-\boldsymbol{v})^{T}% \nabla F(\boldsymbol{v})+\frac{\ell}{2}\|\boldsymbol{w}-\boldsymbol{v}\|^{2}.italic_F ( bold_italic_w ) - italic_F ( bold_italic_v ) ≤ ( bold_italic_w - bold_italic_v ) start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∇ italic_F ( bold_italic_v ) + divide start_ARG roman_ℓ end_ARG start_ARG 2 end_ARG ∥ bold_italic_w - bold_italic_v ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (8)

Assumption 2: The loss function satisfies the Polyak-Lojasiewicz inequality, where F(𝒘)𝐹superscript𝒘F(\boldsymbol{\boldsymbol{w}}^{*})italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) denotes the optimal global loss value and μ>0𝜇0\mu>0italic_μ > 0, as follows:

F(𝒘)22μ[F(𝒘)F(𝒘)].superscriptnorm𝐹𝒘22𝜇delimited-[]𝐹𝒘𝐹superscript𝒘\|\nabla F(\boldsymbol{w})\|^{2}\geq 2\mu[F(\boldsymbol{w})-F(\boldsymbol{w}^{% *})].∥ ∇ italic_F ( bold_italic_w ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≥ 2 italic_μ [ italic_F ( bold_italic_w ) - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ] . (9)

Assumption 3: The upper limit of the model parameter for UEksubscriptUE𝑘\text{UE}_{k}UE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is denoted as Γ0Γ0\Gamma\geq 0roman_Γ ≥ 0. This ensures:

𝔼[𝒘k2]Γ,k𝒦.formulae-sequence𝔼delimited-[]superscriptnormsubscript𝒘𝑘2Γfor-all𝑘𝒦\mathbb{E}\left[\left\|\boldsymbol{w}_{k}\right\|^{2}\right]\leq\Gamma,\quad% \forall k\in\mathcal{K}.blackboard_E [ ∥ bold_italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] ≤ roman_Γ , ∀ italic_k ∈ caligraphic_K . (10)

Theorm 1: Under the conditions outlined in Assumptions III, III, and III, and setting the learning rate to 1/11/\ell1 / roman_ℓ, the optimality gap after T𝑇Titalic_T rounds of training is bounded as follows:

𝔼[F(𝒘T+1)]F(𝒘)𝔼delimited-[]𝐹subscript𝒘𝑇1𝐹superscript𝒘\displaystyle\mathbb{E}[F(\boldsymbol{w}_{T+1})]-F(\boldsymbol{w}^{*})blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_T + 1 end_POSTSUBSCRIPT ) ] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ψT(𝔼[F(𝒘1)]F(𝒘))absentsuperscript𝜓𝑇𝔼delimited-[]𝐹subscript𝒘1𝐹superscript𝒘\displaystyle\leq\psi^{T}\left(\mathbb{E}[F(\boldsymbol{w}_{1})]-F(\boldsymbol% {w}^{*})\right)≤ italic_ψ start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) (11)
+t=1TψTtΘt=ΦT(𝒎,𝒙),superscriptsubscript𝑡1𝑇superscript𝜓𝑇𝑡subscriptΘ𝑡subscriptΦ𝑇𝒎𝒙\displaystyle+\sum_{t=1}^{T}\psi^{T-t}\Theta_{t}=\Phi_{T}(\boldsymbol{m},% \boldsymbol{x}),+ ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_ψ start_POSTSUPERSCRIPT italic_T - italic_t end_POSTSUPERSCRIPT roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = roman_Φ start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ) ,

where, ΘtsubscriptΘ𝑡\Theta_{t}roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and ψ𝜓\psiitalic_ψ can be represented as follows:

Θt=lσ2Γ2K2pmaxmaxk𝒦𝒎tH2|𝒎tH𝒉k[𝒙]|2,subscriptΘ𝑡𝑙superscript𝜎2Γ2superscript𝐾2subscript𝑝subscript𝑘𝒦superscriptnormsuperscriptsubscript𝒎𝑡𝐻2superscriptsuperscriptsubscript𝒎𝑡𝐻subscript𝒉𝑘delimited-[]𝒙2\displaystyle\Theta_{t}=\frac{l\sigma^{2}\Gamma}{2K^{2}p_{\max}}\max_{k\in% \mathcal{K}}\frac{\|\boldsymbol{m}_{t}^{H}\|^{2}}{|\boldsymbol{m}_{t}^{H}% \boldsymbol{h}_{k}[\boldsymbol{x}]|^{2}},roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = divide start_ARG italic_l italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Γ end_ARG start_ARG 2 italic_K start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_ARG roman_max start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT divide start_ARG ∥ bold_italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG | bold_italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_italic_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_italic_x ] | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , (12)
ψ=1μl.𝜓1𝜇𝑙\displaystyle\psi=1-\frac{\mu}{l}.italic_ψ = 1 - divide start_ARG italic_μ end_ARG start_ARG italic_l end_ARG . (13)
Proof:

See Appendix. ∎

IV Problem Formulation

In this paper, our objective is to enhance learning performance in OTA-FL through the design of FA systems within dynamic environments. According to Theorem III, the optimality gap is influenced by the configuration of the beamforming vector and the FA locations in the training iterations. Thus, we formulate an optimization problem to jointly optimize 𝒎=[m1,,mN]T𝒎superscriptsubscript𝑚1subscript𝑚𝑁𝑇\boldsymbol{m}=[m_{1},\ldots,m_{N}]^{T}bold_italic_m = [ italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_m start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT and 𝒙=[x1,,xN]T𝒙superscriptsubscript𝑥1subscript𝑥𝑁𝑇\boldsymbol{x}=[x_{1},\ldots,x_{N}]^{T}bold_italic_x = [ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT for all t[1,,T]𝑡1𝑇t\in[1,\ldots,T]italic_t ∈ [ 1 , … , italic_T ], aiming to minimize the total optimality gap as follows:

𝒫1::subscript𝒫1absent\displaystyle\mathcal{P}_{1}:caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT : min𝒎,𝒙ΦT(𝒎,𝒙)subscript𝒎𝒙subscriptΦ𝑇𝒎𝒙\displaystyle\min_{\boldsymbol{m},\boldsymbol{x}}\hskip 5.69046pt\Phi_{T}(% \boldsymbol{m},\boldsymbol{x})roman_min start_POSTSUBSCRIPT bold_italic_m , bold_italic_x end_POSTSUBSCRIPT roman_Φ start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ) (14)
s.t.C1:0xnX,n[1,,N],\displaystyle\text{s.t.}\quad C_{1}:0\leq x_{n}\leq X,\ \ \ \ \ \ \forall n\in% [1,\ldots,N],s.t. italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT : 0 ≤ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ≤ italic_X , ∀ italic_n ∈ [ 1 , … , italic_N ] ,
C2:x1<x2<<xN,:subscript𝐶2subscript𝑥1subscript𝑥2subscript𝑥𝑁\displaystyle\ \ \ \ \ \ C_{2}:x_{1}<x_{2}<\ldots<x_{N},italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT : italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT < italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT < … < italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ,
C3:xnxn1>X0,n[2,,N],:subscript𝐶3formulae-sequencesubscript𝑥𝑛subscript𝑥𝑛1subscript𝑋0for-all𝑛2𝑁\displaystyle\ \ \ \ \ \ C_{3}:x_{n}-x_{n-1}>X_{0},\ \forall n\in[2,\ldots,N],italic_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT : italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_n - 1 end_POSTSUBSCRIPT > italic_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ∀ italic_n ∈ [ 2 , … , italic_N ] ,

where, C1subscript𝐶1C_{1}italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT constrains the permissible range for FA locations, C2subscript𝐶2C_{2}italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT determines the order of FA placements, and C3subscript𝐶3C_{3}italic_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT enforces a minimum separation distance between adjacent FAs.

Due to the non-convex nature of the objective function and the stochasticity inherent in the dynamic environment, particularly in massive mobile IoT scenarios, traditional optimization methods face challenges in solving the original intractable problem 𝒫1subscript𝒫1\mathcal{P}_{1}caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. In order to tackle this issue, we propose transforming problem 𝒫1subscript𝒫1\mathcal{P}_{1}caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT into an online optimization problem, subsequently reformulating it as an MDP. Based on Theorem III, the optimality gap at the t𝑡titalic_t-th training round, denoted as Φt(𝒎,𝒙)subscriptΦ𝑡𝒎𝒙\Phi_{t}(\boldsymbol{m},\boldsymbol{x})roman_Φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ), is bounded as follows:

Φt(𝒎,𝒙)subscriptΦ𝑡𝒎𝒙\displaystyle\Phi_{t}(\boldsymbol{m},\boldsymbol{x})roman_Φ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ) Φt1(𝒎,𝒙)absentsubscriptΦ𝑡1𝒎𝒙\displaystyle\leq\text{$\Phi_{t-1}(\boldsymbol{m},\boldsymbol{x})$}≤ roman_Φ start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ) (15)
+(ψtψt1)(𝔼[F(𝒘1)]F(𝒘))+Θt(𝒎,𝒙),superscript𝜓𝑡superscript𝜓𝑡1𝔼delimited-[]𝐹subscript𝒘1𝐹superscript𝒘subscriptΘ𝑡𝒎𝒙\displaystyle+(\psi^{t}-\psi^{t-1})(\mathbb{E}[F(\boldsymbol{w}_{1})]-F(% \boldsymbol{w}^{*}))+\Theta_{t}(\boldsymbol{m},\boldsymbol{x}),+ ( italic_ψ start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT - italic_ψ start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ) ( blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) + roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ) ,

where (15) indicates that when ψ𝜓\psiitalic_ψ and the initial optimality gap (𝔼[F(𝒘1)]F(𝒘))𝔼delimited-[]𝐹subscript𝒘1𝐹superscript𝒘(\mathbb{E}[F(\boldsymbol{w}_{1})]-F(\boldsymbol{w}^{*}))( blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) are known, the optimality gap is determined by ΘtsubscriptΘ𝑡\Theta_{t}roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and the previous optimality gap Φt1(𝒎,𝒙)subscriptΦ𝑡1𝒎𝒙\Phi_{t-1}(\boldsymbol{m},\boldsymbol{x})roman_Φ start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ). Thus, the problem 𝒫1subscript𝒫1\mathcal{P}_{1}caligraphic_P start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT of minimizing the optimality gap after T𝑇Titalic_T communication rounds can be transformed into minimizing Θt(𝒎,𝒙)subscriptΘ𝑡𝒎𝒙\Theta_{t}(\boldsymbol{m},\boldsymbol{x})roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ) in each round. This reformulation is expressed as follows:

𝒫2:min𝒎,𝒙Θt(𝒎,𝒙):subscript𝒫2subscript𝒎𝒙subscriptΘ𝑡𝒎𝒙\displaystyle\mathcal{P}_{2}:\min_{\boldsymbol{m},\boldsymbol{x}}\hskip 5.6904% 6pt\Theta_{t}(\boldsymbol{m},\boldsymbol{x})caligraphic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT : roman_min start_POSTSUBSCRIPT bold_italic_m , bold_italic_x end_POSTSUBSCRIPT roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ) (16)
s.t. C1,C2,C3.s.t. subscript𝐶1subscript𝐶2subscript𝐶3\displaystyle\ \ \ \ \text{s.t. }C_{1},C_{2},C_{3}.s.t. italic_C start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT .

V Proposed DRL Algorithm

To address Problem 𝒫2subscript𝒫2\mathcal{P}_{2}caligraphic_P start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, we deploy a DRL agent on the AP to learn an optimal decision policy that simultaneously optimizes the beamforming vector 𝒎𝒎\boldsymbol{m}bold_italic_m and the FA locations 𝒙𝒙\boldsymbol{x}bold_italic_x in each training round in order to minimize Θt(𝒎,𝒙)subscriptΘ𝑡𝒎𝒙\Theta_{t}(\boldsymbol{m},\boldsymbol{x})roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ). Details of the MDP are provided below:

  • State Space: The state space at time slot t𝑡titalic_t consists of the distances dksubscript𝑑𝑘d_{k}italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT between the FAs and the UEksubscriptUE𝑘\text{UE}_{k}UE start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and the AoA of the LoS paths ϕksubscriptitalic-ϕ𝑘\phi_{k}italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, k𝒦for-all𝑘𝒦\forall k\in\mathcal{K}∀ italic_k ∈ caligraphic_K. The state space can be expressed as: st=[[d1,,dK],[ϕ1,,ϕK]]subscript𝑠𝑡subscript𝑑1subscript𝑑𝐾subscriptitalic-ϕ1subscriptitalic-ϕ𝐾s_{t}=[[d_{1},\ldots,d_{K}],[\phi_{1},\ldots,\phi_{K}]]italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = [ [ italic_d start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_d start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ] , [ italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_ϕ start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ] ].

  • Action space: The action space at each time slot t𝑡titalic_t comprises the beamforming vector and the locations of the FAs. Consequently, the action space at time slot t𝑡titalic_t can be expressed as:

    at=[[m1,,mN],[x1,,xN]].subscript𝑎𝑡subscript𝑚1subscript𝑚𝑁subscript𝑥1subscript𝑥𝑁a_{t}=[[m_{1},\ldots,m_{N}],[x_{1},\ldots,x_{N}]].italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = [ [ italic_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_m start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ] , [ italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_x start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ] ] . (17)
  • Reward function: Based on (12), in order to minimize Θt(𝒎,𝒙)subscriptΘ𝑡𝒎𝒙\Theta_{t}(\boldsymbol{m},\boldsymbol{x})roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ), the reward function can be formulated as:

    r(st,at)={r1,𝒎=0,r2maxk𝒦(𝒎2|𝒎H𝐡k[𝒙]|2),otherwise,𝑟subscript𝑠𝑡subscript𝑎𝑡casessubscript𝑟1norm𝒎0otherwisesubscript𝑟2subscript𝑘𝒦superscriptnorm𝒎2superscriptsuperscript𝒎𝐻subscript𝐡𝑘delimited-[]𝒙2otherwiseotherwiser(s_{t},a_{t})=\begin{cases}r_{1},\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ % \ \ \ \ \ \ \ \ \ \ \ \|\boldsymbol{m}\|=0,\\ r_{2}\max_{k\in\mathcal{K}}\left(\frac{\|\boldsymbol{m}\|^{2}}{|\boldsymbol{m}% ^{H}\mathbf{h}_{k}[\boldsymbol{x}]|^{2}}\right),\ \ \ \ \ \text{otherwise},% \end{cases}italic_r ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = { start_ROW start_CELL italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , ∥ bold_italic_m ∥ = 0 , end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT roman_max start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT ( divide start_ARG ∥ bold_italic_m ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG | bold_italic_m start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_h start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT [ bold_italic_x ] | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) , otherwise , end_CELL start_CELL end_CELL end_ROW (18)

    where, the constants r1subscript𝑟1r_{1}italic_r start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and r2subscript𝑟2r_{2}italic_r start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are negative values that require tuning during the simulation process to achieve better convergence. Notably, the reward function is formulated as a negative value. Therefore, by maximizing this (negative) reward, the agent effectively minimizes Θt(𝒎,𝒙)subscriptΘ𝑡𝒎𝒙\Theta_{t}(\boldsymbol{m},\boldsymbol{x})roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_italic_m , bold_italic_x ).

Since the action space is continuous, we cannot use model-free value-based DRL algorithms such as deep Q-network (DQN), as they can only handle discrete action spaces. Instead, we utilize policy gradient-based reinforcement learning methods. The DDPG algorithm is a suitable off-policy actor-critic approach capable of managing continuous action spaces. However, the fully-connected deep neural networks (DNNs) employed in conventional DDPG are inadequate for capturing the temporal patterns of environmental dynamics, such as user mobility [13, 11]. Therefore, we adjust the RDPG approach by incorporating long short-term memory (LSTM) into the DDPG architecture to exploit temporal state patterns and adapt continuously to environmental dynamics.

The proposed RDPG algorithm utilizes four neural networks: an Actor Network (policy network) denoted by πϕsubscript𝜋italic-ϕ\pi_{\phi}italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT with parameter ϕitalic-ϕ\phiitalic_ϕ, which determines actions at=πϕ(st)+ξsubscript𝑎𝑡subscript𝜋italic-ϕsubscript𝑠𝑡𝜉a_{t}=\pi_{\phi}(s_{t})+\xiitalic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) + italic_ξ based on states stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, where ξ𝜉\xiitalic_ξ is a random process added to actions for exploration; a Critic Network (Q-network) with parameters θ𝜃\thetaitalic_θ that computes Q-values Qθ(st,at;θ)subscript𝑄𝜃subscript𝑠𝑡subscript𝑎𝑡𝜃Q_{\theta}(s_{t},a_{t};\theta)italic_Q start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ; italic_θ ) for state-action pairs; a Target Actor Network, which is an older version of the actor network; and a Target Critic Network, which is an older version of the critic network.

In our optimization problem, the objective is to minimize the optimality gap by maximizing the expected reward r(st,at)𝑟subscript𝑠𝑡subscript𝑎𝑡r(s_{t},a_{t})italic_r ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) in each training round. The goal of the RDPG, given the state stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and action atsubscript𝑎𝑡a_{t}italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, is to identify a policy that maximizes the expected cumulative reward, defined as:

π=argmaxπ𝔼st,at[t=0r(st,at)],superscript𝜋subscript𝜋subscript𝔼subscript𝑠𝑡subscript𝑎𝑡delimited-[]superscriptsubscript𝑡0𝑟subscript𝑠𝑡subscript𝑎𝑡\pi^{*}=\arg\max_{\pi}\mathbb{E}_{s_{t},a_{t}}\left[\sum_{t=0}^{\infty}r(s_{t}% ,a_{t})\right],italic_π start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = roman_arg roman_max start_POSTSUBSCRIPT italic_π end_POSTSUBSCRIPT blackboard_E start_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ ∑ start_POSTSUBSCRIPT italic_t = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_r ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] , (19)

to achieve this, the Actor Network is optimized based on the gradient of the objective function J(ϕ)𝐽italic-ϕJ(\phi)italic_J ( italic_ϕ ) as follows:

ϕJ(ϕ)=𝔼[atQθ1(st,at)|at=πϕ(st)ϕπϕ(st)],subscriptitalic-ϕ𝐽italic-ϕ𝔼delimited-[]evaluated-atsubscriptsubscript𝑎𝑡subscript𝑄subscript𝜃1subscript𝑠𝑡subscript𝑎𝑡subscript𝑎𝑡subscript𝜋italic-ϕsubscript𝑠𝑡subscriptitalic-ϕsubscript𝜋italic-ϕsubscript𝑠𝑡\nabla_{\phi}J(\phi)=\mathbb{E}\left[\nabla_{a_{t}}Q_{\theta_{1}}(s_{t},a_{t})% \bigg{|}_{a_{t}=\pi_{\phi}(s_{t})}\nabla_{\phi}\pi_{\phi}(s_{t})\right],∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT italic_J ( italic_ϕ ) = blackboard_E [ ∇ start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_Q start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) | start_POSTSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ∇ start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] , (20)

while the critic network is trained to minimize the loss function relative to the target value Ytsubscript𝑌𝑡Y_{t}italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, defined as:

Yt=rt+γQθi(st+1,πϕ(st+1)+ξ).subscript𝑌𝑡subscript𝑟𝑡𝛾subscript𝑄superscriptsubscript𝜃𝑖subscript𝑠���1subscript𝜋superscriptitalic-ϕsubscript𝑠𝑡1𝜉Y_{t}=r_{t}+\gamma Q_{\theta_{i}^{\prime}}(s_{t+1},\pi_{\phi^{\prime}}(s_{t+1}% )+{\xi}).italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_γ italic_Q start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT , italic_π start_POSTSUBSCRIPT italic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) + italic_ξ ) . (21)

The proposed RDPG method is described in Algorithm 1.

Initialize: experience replay memory M𝑀Mitalic_M, mini-batch size H𝐻Hitalic_H, the actor network πϕsubscript𝜋italic-ϕ\pi_{\phi}italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT, the critic network Qθsubscript𝑄𝜃Q_{\theta}italic_Q start_POSTSUBSCRIPT italic_θ end_POSTSUBSCRIPT with random values, and create the target networks by setting θθsuperscript𝜃𝜃\theta^{\prime}\leftarrow\thetaitalic_θ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ← italic_θ and ϕϕsuperscriptitalic-ϕitalic-ϕ\phi^{\prime}\leftarrow\phiitalic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ← italic_ϕ.
Set: Set E𝐸Eitalic_E and T𝑇Titalic_T as the maximum number of episodes and episode length, respectively.
for each episode e:E:𝑒𝐸e:E{}italic_e : italic_E do
      Initialize the environment state s0subscript𝑠0s_{0}italic_s start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, and the exploration noise ξ𝜉\xiitalic_ξ;
       for t=1:T:𝑡1𝑇t=1:Titalic_t = 1 : italic_T do
            Receive stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT from the environment;
             Obtain at=πϕ(st)+ξsubscript𝑎𝑡subscript𝜋italic-ϕsubscript𝑠𝑡𝜉a_{t}=\pi_{\phi}(s_{t})+\xiitalic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_π start_POSTSUBSCRIPT italic_ϕ end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) + italic_ξ from the actor network and re-shape it;
             Obtain rtsubscript𝑟𝑡r_{t}italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT based on equation (18);
             Observe the new state, st+1subscript𝑠𝑡1s_{t+1}italic_s start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT;
             Store transition (st,at,rt,st+1)subscript𝑠𝑡subscript𝑎𝑡subscript𝑟𝑡subscript𝑠𝑡1(s_{t},a_{t},r_{t},s_{t+1})( italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_s start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) into M𝑀Mitalic_M;
            
       end for
      Randomly sample a H𝐻Hitalic_H mini-batch of transitions from M𝑀Mitalic_M;
       Compute the target function Ytsubscript𝑌𝑡Y_{t}italic_Y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT according to (21);
       Update the actor and critic networks using the Adam optimizer.
       Soft update the target actor and target critics with τ[0,1]𝜏01\tau\in[0,1]italic_τ ∈ [ 0 , 1 ], as the soft update coefficient:
ϕτϕ+(1τ)ϕ,θτθ+(1τ)θformulae-sequencesuperscriptitalic-ϕ𝜏italic-ϕ1𝜏superscriptitalic-ϕsuperscript𝜃𝜏𝜃1𝜏superscript𝜃\phi^{\prime}\leftarrow\tau\phi+(1-\tau)\phi^{\prime},\quad\theta^{\prime}% \leftarrow\tau\theta+(1-\tau)\theta^{\prime}italic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ← italic_τ italic_ϕ + ( 1 - italic_τ ) italic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_θ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ← italic_τ italic_θ + ( 1 - italic_τ ) italic_θ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT
end for
Algorithm 1 The RDPG Algorithm

VI Simulation Results

In this section, we provide numerical results illustrating how combining FA arrays with the proposed RDPG algorithm can improve OTA-FL learning performance. We consider the distance between users and the AP to be uniformly distributed in the range [20,100]20100[20,100][ 20 , 100 ] m, and the AoAs to be uniformly distributed over [π/2,π/2]𝜋2𝜋2[-\pi/2,\pi/2][ - italic_π / 2 , italic_π / 2 ] rad. The parameters for the FA arrays are set with X0=0.5λsubscript𝑋00.5𝜆X_{0}=0.5\lambdaitalic_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 0.5 italic_λ and X=8λ𝑋8𝜆X=8\lambdaitalic_X = 8 italic_λ. The RDPG algorithm is configured with a learning rate of 0.0005, a replay buffer size of 104superscript10410^{4}10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT, a batch size of 64, a soft update parameter of 0.001, and a discount factor of 0.9. For performance evaluation, we consider two comparisons. Initially, we evaluate the performance of FA relative to FPA with a predetermined location vector 𝒙=[XN+1,,NXN+1]T𝒙superscript𝑋𝑁1𝑁𝑋𝑁1𝑇\boldsymbol{x}=\left[\frac{X}{N+1},\ldots,\frac{NX}{N+1}\right]^{T}bold_italic_x = [ divide start_ARG italic_X end_ARG start_ARG italic_N + 1 end_ARG , … , divide start_ARG italic_N italic_X end_ARG start_ARG italic_N + 1 end_ARG ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT. Subsequently, we compare the proposed RDPG algorithm with SAC [14] and DDPG [13], which are conventional DRL algorithms. Learning performance is evaluated by computing the average rewards over 100 episodes, which is determined at episode e𝑒eitalic_e by employing the formula Ravg(e)=1100i=e100eRisubscript𝑅avg𝑒1100superscriptsubscript𝑖𝑒100𝑒subscript𝑅𝑖{R_{\text{avg}}(e)}=\frac{1}{100}\sum_{i=e-100}^{e}R_{i}italic_R start_POSTSUBSCRIPT avg end_POSTSUBSCRIPT ( italic_e ) = divide start_ARG 1 end_ARG start_ARG 100 end_ARG ∑ start_POSTSUBSCRIPT italic_i = italic_e - 100 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_e end_POSTSUPERSCRIPT italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, where Risubscript𝑅𝑖R_{i}italic_R start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT signifies the mean reward of episode i𝑖iitalic_i.

Refer to caption
Figure 1: Comparing DRL Algorithms in FA and FPA systems across (a) training episodes; (b) antenna numbers; and (c) client numbers.

Fig. 1 (a) demonstrates the convergence characteristics of different DRL algorithms, depicting the average rewards with solid curves and showing the standard deviations as shaded regions. The RDPG algorithm exhibits higher average rewards and lower variance compared to standard DRLs, demonstrating superior performance and improved stability in dynamic environments.

To assess the performance of the proposed algorithms across varying numbers of antennas, we kept the number of clients fixed and varied the antenna count in both the FA and FPA scenarios. As depicted in Fig. 1 (b), the average reward performance of all DRL methods improves with increasing N𝑁Nitalic_N, although this improvement diminishes as N𝑁Nitalic_N continues to increase. Furthermore, due to the increased DoF provided by antenna adjustments in FA systems, FAs consistently outperform FPAs at all values N𝑁Nitalic_N. Moreover, RDPG demonstrating superior performance over other DRL algorithms.

Fig. 1(c) provides a detailed comparison of the performance of FAs and the RDPG algorithm across varying numbers of users. It is evident from the results that as the number of users increases, there is a noticeable decrease in performance for both FA and FPA scenarios. This decline is attributed to the increased challenge of optimizing the beamforming vector and the antenna position vector of the AP in the presence of more dynamic users. Despite these challenges, FAs consistently outperform FPAs in all tested scenarios, highlighting the efficacy of FAs in enhancing OTA-FL system performance. Moreover, the RDPG algorithm consistently exhibits superior performance compared to other optimization methods in mitigating the adverse effects of dynamic user dynamics on system performance.

VII Conclusion

In this study, we investigated the integration of FAs into AP to improve the performance of OTA-FL systems. Our convergence analysis highlighted the significant impact of FA positions and the beamforming vector on the optimality gap. We addressed this issue with a non-convex optimization problem and proposed the RDPG algorithm for real-time optimization. Through simulations, we demonstrated that the OTA-FL system enhanced by FAs outperformed conventional FPA systems. Moreover, RDPG demonstrates superior performance and stability compared to existing methods, validating its effectiveness in dynamic environments.

Proof of Theorem III

In the t𝑡titalic_t-th communication round, the global model update is formulated by integrating (5) into (4), as follows:

𝒘t+1subscript𝒘𝑡1\displaystyle\boldsymbol{w}_{t+1}bold_italic_w start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT =1Mm𝒘m,t+1Mηt𝒎tH𝐳tabsent1𝑀subscript𝑚subscript𝒘𝑚𝑡1𝑀subscript𝜂𝑡superscriptsubscript𝒎𝑡𝐻subscript𝐳𝑡\displaystyle=\frac{1}{M}\sum_{m\in\mathcal{M}}\boldsymbol{w}_{m,t}+\frac{1}{M% \sqrt{\eta_{t}}}\boldsymbol{m}_{t}^{H}\mathbf{z}_{t}= divide start_ARG 1 end_ARG start_ARG italic_M end_ARG ∑ start_POSTSUBSCRIPT italic_m ∈ caligraphic_M end_POSTSUBSCRIPT bold_italic_w start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT + divide start_ARG 1 end_ARG start_ARG italic_M square-root start_ARG italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG end_ARG bold_italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT (22)
=1Mm(𝒘tγF(𝒘t,𝒟m))+1Mηt𝒎tH𝐳tabsent1𝑀subscript𝑚subscript𝒘𝑡𝛾𝐹subscript𝒘𝑡subscript𝒟𝑚1𝑀subscript𝜂𝑡superscriptsubscript𝒎𝑡𝐻subscript𝐳𝑡\displaystyle=\frac{1}{M}\sum_{m\in\mathcal{M}}(\boldsymbol{w}_{t}-\gamma% \nabla{F}(\boldsymbol{w}_{t},\mathbf{\mathcal{D}}_{m}))+\frac{1}{M\sqrt{\eta_{% t}}}\boldsymbol{m}_{t}^{H}\mathbf{z}_{t}= divide start_ARG 1 end_ARG start_ARG italic_M end_ARG ∑ start_POSTSUBSCRIPT italic_m ∈ caligraphic_M end_POSTSUBSCRIPT ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_γ ∇ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) ) + divide start_ARG 1 end_ARG start_ARG italic_M square-root start_ARG italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG end_ARG bold_italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT
=𝒘tγ(F(𝒘t)𝐞t),absentsubscript𝒘𝑡𝛾𝐹subscript𝒘𝑡subscript𝐞𝑡\displaystyle=\boldsymbol{w}_{t}-\gamma(\nabla{F}(\boldsymbol{w}_{t})-\mathbf{% e}_{t}),= bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - italic_γ ( ∇ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) - bold_e start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ,

where, F(𝒘t)=1MmFm(𝒘t,𝒟m)𝐹subscript𝒘𝑡1𝑀subscript𝑚subscript𝐹𝑚subscript𝒘𝑡subscript𝒟𝑚\nabla{F}(\boldsymbol{w}_{t})=\frac{1}{M}\sum_{m\in\mathcal{M}}{\nabla{F}_{m}(% \boldsymbol{w}_{t},\mathbf{\mathcal{D}}_{m})}∇ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) = divide start_ARG 1 end_ARG start_ARG italic_M end_ARG ∑ start_POSTSUBSCRIPT italic_m ∈ caligraphic_M end_POSTSUBSCRIPT ∇ italic_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , caligraphic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) is the global gradient and 𝐞t=𝒎tH𝐳tγMηtsubscript𝐞𝑡superscriptsubscript𝒎𝑡𝐻subscript𝐳𝑡𝛾𝑀subscript𝜂𝑡\mathbf{e}_{t}=\frac{\boldsymbol{m}_{t}^{H}\mathbf{z}_{t}}{\gamma M\sqrt{\eta_% {t}}}bold_e start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = divide start_ARG bold_italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT bold_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_γ italic_M square-root start_ARG italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG end_ARG is the model aggregation error caused by wireless communication in the t𝑡titalic_t-th communication round. By applying (22) in Assumption III, and setting η=1l𝜂1𝑙\eta=\frac{1}{l}italic_η = divide start_ARG 1 end_ARG start_ARG italic_l end_ARG, after taking expectation, we have:

𝔼[F(𝒘t+1)]𝔼delimited-[]𝐹subscript𝒘𝑡1\displaystyle\mathbb{E}[{F}(\boldsymbol{w}_{t+1})]blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) ] 𝔼[F(𝒘t)]12lF(𝒘t)22+12l𝔼[𝐞t22]absent𝔼delimited-[]𝐹subscript𝒘𝑡12𝑙superscriptsubscriptnorm𝐹subscript𝒘𝑡2212𝑙𝔼delimited-[]superscriptsubscriptnormsubscript𝐞𝑡22\displaystyle\leq\mathbb{E}[{F}(\boldsymbol{w}_{t})]-\frac{1}{2l}\|\nabla{F}(% \boldsymbol{w}_{t})\|_{2}^{2}+\frac{1}{2l}\mathbb{E}[\|\mathbf{e}_{t}\|_{2}^{2}]≤ blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] - divide start_ARG 1 end_ARG start_ARG 2 italic_l end_ARG ∥ ∇ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 italic_l end_ARG blackboard_E [ ∥ bold_e start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] (23)
𝔼[F(𝒘t)]12lF(𝒘t)22+l2dσ2𝒎tH22ηM2.absent𝔼delimited-[]𝐹subscript𝒘𝑡12𝑙superscriptsubscriptnorm𝐹subscript𝒘𝑡22𝑙2𝑑superscript𝜎2superscriptsubscriptnormsuperscriptsubscript𝒎𝑡𝐻22𝜂superscript𝑀2\displaystyle\leq\mathbb{E}[{F}(\boldsymbol{w}_{t})]-\frac{1}{2l}\|\nabla{F}(% \boldsymbol{w}_{t})\|_{2}^{2}+\frac{l}{2}\frac{d\sigma^{2}\|\boldsymbol{m}_{t}% ^{H}\|_{2}^{2}}{\eta M^{2}}.≤ blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] - divide start_ARG 1 end_ARG start_ARG 2 italic_l end_ARG ∥ ∇ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG italic_l end_ARG start_ARG 2 end_ARG divide start_ARG italic_d italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ bold_italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_η italic_M start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG .

By applying the upper bound of ηtsubscript𝜂𝑡\eta_{t}italic_η start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT based on (LABEL:eta) and employing Assumption III and III, then subtracting F(𝒘)𝐹superscript𝒘F(\boldsymbol{w}^{*})italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) from both sides of (23), we have:

𝔼[F(wt+1)]F(w)(1μl)(𝔼[F(wt)]F(w))+lσ2Γ2M2pmaxmaxk𝒦mtH22|mtHhm[x]|2.𝔼delimited-[]𝐹subscript𝑤𝑡1𝐹superscript𝑤1𝜇𝑙𝔼delimited-[]𝐹subscript𝑤𝑡𝐹superscript𝑤𝑙superscript𝜎2Γ2superscript𝑀2subscript𝑝subscript𝑘𝒦superscriptsubscriptnormsuperscriptsubscript𝑚𝑡𝐻22superscriptsuperscriptsubscript𝑚𝑡𝐻subscript𝑚delimited-[]𝑥2\mathbb{E}[F(w_{t+1})]-F(w^{*})\leq(1-\frac{\mu}{l})(\mathbb{E}[F(w_{t})]-F(w^% {*}))\\ +\frac{l\sigma^{2}\Gamma}{2M^{2}p_{\max}}\max_{k\in\mathcal{K}}\frac{\|m_{t}^{% H}\|_{2}^{2}}{|m_{t}^{H}h_{m}[x]|^{2}}.start_ROW start_CELL blackboard_E [ italic_F ( italic_w start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) ] - italic_F ( italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ≤ ( 1 - divide start_ARG italic_μ end_ARG start_ARG italic_l end_ARG ) ( blackboard_E [ italic_F ( italic_w start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] - italic_F ( italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) end_CELL end_ROW start_ROW start_CELL + divide start_ARG italic_l italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_Γ end_ARG start_ARG 2 italic_M start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT roman_max end_POSTSUBSCRIPT end_ARG roman_max start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT divide start_ARG ∥ italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG | italic_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT [ italic_x ] | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG . end_CELL end_ROW (24)

By recursively operating on (24) and using the definitions of ΘtsubscriptΘ𝑡\Theta_{t}roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and ψ𝜓\psiitalic_ψ in (12) and (13), the cumulative optimality gap is calculated as follows:

𝔼[F(𝒘T+1)\displaystyle\mathbb{E}[F(\boldsymbol{w}_{T+1})blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_T + 1 end_POSTSUBSCRIPT ) ]F(𝒘)ψ(𝔼[F(𝒘T)]F(𝒘)+ΘT\displaystyle]-F(\boldsymbol{w}^{*})\leq\psi(\mathbb{E}[F(\boldsymbol{w}_{T})]% -F(\boldsymbol{w}^{*})+\Theta_{T}] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ≤ italic_ψ ( blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ) ] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) + roman_Θ start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT (25)
ψ(ψ(𝔼[F(𝒘T1)]F(𝒘))+ΘT1)+ΘT)\displaystyle\leq\psi(\psi(\mathbb{E}[F(\boldsymbol{w}_{T-1})]-F(\boldsymbol{w% }^{*}))+\Theta_{T-1})+\Theta_{T})≤ italic_ψ ( italic_ψ ( blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT italic_T - 1 end_POSTSUBSCRIPT ) ] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) + roman_Θ start_POSTSUBSCRIPT italic_T - 1 end_POSTSUBSCRIPT ) + roman_Θ start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT )
ψT(𝔼[F(𝒘1)]F(𝒘))+t=1TψTtΘt.absentsuperscript𝜓𝑇𝔼delimited-[]𝐹subscript𝒘1𝐹superscript𝒘superscriptsubscript𝑡1𝑇superscript𝜓𝑇𝑡subscriptΘ𝑡\displaystyle\leq\ldots\leq\psi^{T}(\mathbb{E}[F(\boldsymbol{w}_{1})]-F(% \boldsymbol{w}^{*}))+\sum_{t=1}^{T}\psi^{T-t}\Theta_{t}.≤ … ≤ italic_ψ start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( blackboard_E [ italic_F ( bold_italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ] - italic_F ( bold_italic_w start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ) + ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_ψ start_POSTSUPERSCRIPT italic_T - italic_t end_POSTSUPERSCRIPT roman_Θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT .

Proof of Theorem III is complete.

Appendix A References Section

References

  • [1] J. Yao and et al., “Secure federated learning by power control for internet of drones,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 4, pp. 1021–1031, Apr. 2021.
  • [2] Y. Wang and et al., “Learning in the air: Secure federated learning for UAV-assisted crowdsensing,” IEEE Transactions on Network Science and Engineering, vol. 8, no. 2, pp. 1055–1069, Aug. 2020.
  • [3] Z. Wang and et al., “Federated learning via intelligent reflecting surface,” IEEE Transactions on Wireless Communications, vol. 21, no. 2, pp. 808–822, Jul. 2021.
  • [4] C. Chen and et al., “Joint client selection and receive beamforming for over-the-air federated learning with energy harvesting,” IEEE Open Journal of the Communications Society, vol. 4, pp. 1127–1140, May. 2023.
  • [5] G. Hu and et al., “Fluid Antennas-Enabled Multiuser Uplink: A Low-Complexity Gradient Descent for Total Transmit Power Minimization,” IEEE Communications Letters, vol. 28, no. 3, pp. 602–606, Jan. 2024.
  • [6] D. Zhang and et al., “Fluid antenna array enhanced over-the-air computation,” IEEE Wireless Communications Letters, vol. 13, no. 6, pp. 1541–1545, Mar. 2024.
  • [7] H. Qin and et al., “Antenna positioning and beamforming design for fluid antenna-assisted multi-user downlink communications,” IEEE Wireless Communications Letters, vol. 13, pp. 1073–1077, Jan. 2024.
  • [8] Z. Cheng and et al., “Sum-rate maximization for fluid antenna enabled multiuser communications,” IEEE Communications Letters, vol. 28, no. 5, pp. 1206–1210, Mar. 2024.
  • [9] Y. Zuo and et al., “Fluid Antenna for Mobile Edge Computing,” IEEE Communications Letters, May. 2024.
  • [10] N. Waqar and et al., “Opportunistic fluid antenna multiple access via team-inspired reinforcement learning,” IEEE Transactions on Wireless Communications, Apr. 2024.
  • [11] X. Fan and et al., “UAV-Enabled Federated Learning in Dynamic Environments: Efficiency and Security Trade-off,” IEEE Transactions on Vehicular Technology, vol. 73, no. 5, pp. 6993–7006, Dec. 2023.
  • [12] X. Cao and et al., “Transmission power control for over-the-air federated averaging at network edge,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 5, pp. 1571–1586, Jan. 2022.
  • [13] S. Sheikhzadeh and et al., “AI-based secure NOMA and cognitive radio-enabled green communications: Channel state information and battery value uncertainties,” IEEE Transactions on Green Communications and Networking, vol. 6, no. 2, pp. 1037–1054, Dec. 2021.
  • [14] V. Konda and et al., “Actor-critic algorithms,” Advances in neural information processing systems, vol. 12, pp. 1008–1014, Dec. 1999.