Joint Beamforming Design and Bit Allocation in Massive MIMO with Resolution-Adaptive ADCs

Mengyuan Ma, Nhan Thanh Nguyen, Italo Atzeni, and Markku Juntti
The authors are with the Centre for Wireless Communications, University of Oulu, Finland (e-mail: {mengyuan.ma, nhan.nguyen, italo.atzeni, markku.juntti}@oulu.fi). This work was supported by the Research Council of Finland (318927 6G Flagship, 332362 EERA, 336449 Profi6, 348396 HIGH-6G, and 357504 EETCAMD).
Abstract

Low-resolution analog-to-digital converters (ADCs) have emerged as a promising technology for reducing power consumption and complexity in massive multiple-input multiple-output (MIMO) systems while maintaining satisfactory spectral and energy efficiencies (SE/EE). In this work, we first identify the essential properties of optimal quantization and leverage them to derive a closed-form approximation of the covariance matrix of the quantization distortion. This theoretical finding facilitates the system SE analysis in the presence of low-resolution ADCs. We then focus on the joint optimization of the transmit-receive beamforming and bit allocation to maximize the SE under constraints on the transmit power and the total number of active ADC bits. To solve the resulting mixed-integer problem, we first develop an efficient beamforming design for fixed ADC resolutions. Then, we propose a low-complexity heuristic algorithm to iteratively optimize the ADC resolutions and beamforming matrices. Numerical results for a 64 × 64 MIMO system demonstrate that the proposed design offers a 6% improvement in both SE and EE with 40% fewer active ADC bits compared with uniform bit allocation. Furthermore, we numerically show that receiving more data streams with low-resolution ADCs can achieve higher SE and EE than receiving fewer data streams with high-resolution ADCs.

Index Terms:
Beamforming, bit allocation, massive MIMO, low-resolution ADCs, spectral efficiency, energy efficiency

I Introduction

Massive multiple-input multiple-output (MIMO) is a crucial physical-layer technology for wireless communications at both sub-6 GHz and millimeter wave (mmWave) frequencies [heath2016overview], addressing the increasing demand for high data rates [jiang2021road]. The large number of antenna elements in massive MIMO significantly improves the spatial multiplexing gain through beamforming. Digital beamforming (DBF) architectures, which deploy a dedicated radio-frequency (RF) chain for each antenna element, can enable high spectral efficiency (SE) but incur substantial energy costs due to power-intensive RF components, especially analog-to-digital converters (ADCs). For instance, a high-speed ADC operating at 1 Gsample/s with high resolution (e.g., 8 to 12 bits) can consume several watts [li2017channel]. Furthermore, ADC power consumption increases linearly with the signal bandwidth and exponentially with the number of resolution bits [murmann2015race, Atz21_hw], posing a significant challenge to the system's energy efficiency (EE). Consequently, integrating low-resolution ADCs with DBF has emerged as an effective strategy to curtail power consumption without unduly compromising the SE [liu2019low].
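To make this scaling concrete, the sketch below uses the widely adopted Walden figure-of-merit model, $P_{\rm ADC} = \mathrm{FOM}\cdot f_{\rm s}\cdot 2^{b}$, which is linear in the sampling rate $f_{\rm s}$ (and hence in the bandwidth) and exponential in the number of bits $b$. Both the model choice and the FOM value of 500 fJ per conversion step are illustrative assumptions, not parameters taken from this paper.

```python
def adc_power_watts(bits: int, sampling_rate_hz: float, fom_j_per_step: float = 500e-15) -> float:
    """Walden-style ADC power model P = FOM * f_s * 2**bits.

    The default FOM (500 fJ per conversion step) is an illustrative assumption.
    """
    return fom_j_per_step * sampling_rate_hz * 2 ** bits


# Power of a single 1 Gsample/s ADC for several resolutions (assumed model)
for b in (1, 4, 8, 12):
    print(f"{b:2d} bits: {adc_power_watts(b, 1e9) * 1e3:8.1f} mW")
```

Under this model, each extra bit doubles the ADC power, so reducing a 12-bit ADC to 4 bits cuts its power by a factor of $2^8 = 256$.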

Another attractive solution in this regard is to utilize hybrid beamforming (HBF) architectures, where a small number of RF chains is connected to the antenna array through a network of phase shifters or switches [mendez2016hybrid, ma2021closed, ma2022switch]. However, HBF architectures have limited multiplexing capabilities and strongly depend on the calibration of the analog components [roth2018comparison]. Consequently, DBF requires less circuit cost to achieve an SE similar to that of HBF [yan2019performance], which makes the former more energy efficient, especially when using low-resolution ADCs [roth2018comparison, castaneda2021resolution]. Water-filling (WF) power allocation achieves the capacity of a full-resolution MIMO system with perfect channel state information (CSI) at both the transmitter and receiver [tse2005fundamentals]. However, it becomes suboptimal in the presence of quantization, which calls for a quantization-aware design. Furthermore, adopting resolution-adaptive ADCs can enable order-of-magnitude power savings for realistic mmWave channels [castaneda2021spawc]. These considerations motivate us to focus on the design and analysis of fully digital architectures with resolution-adaptive ADCs.
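For reference, the classical WF solution mentioned above can be sketched as follows, assuming an unquantized channel whose singular values $\sigma_k$ define parallel subchannels with gains $\sigma_k^2/\sigma_{\rm n}^2$. Each subchannel receives power $p_k=\max(0,\mu-\sigma_{\rm n}^2/\sigma_k^2)$, with the water level $\mu$ chosen so that $\sum_k p_k = P_{\rm t}$, and the precoder consists of the right singular vectors scaled by $\sqrt{p_k}$. The function name and the test channel are hypothetical; this is the textbook full-resolution baseline, not the quantization-aware design of this paper.

```python
import numpy as np


def water_filling(gains, total_power):
    """Classical water-filling over parallel subchannels.

    gains[k] = sigma_k^2 / noise_power (must be positive). Returns powers
    p_k = max(0, mu - 1/gains[k]) with sum(p_k) = total_power.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]                   # strongest subchannel first
    floors = 1.0 / gains[order]                       # "floor" heights 1/g_k, ascending
    powers_sorted = np.zeros_like(floors)
    for k in range(floors.size, 0, -1):               # try the k strongest subchannels
        mu = (total_power + floors[:k].sum()) / k     # water level if k subchannels are active
        if mu > floors[k - 1]:                        # weakest active subchannel gets p > 0
            powers_sorted[:k] = mu - floors[:k]
            break
    powers = np.empty_like(powers_sorted)
    powers[order] = powers_sorted                     # undo the sorting
    return powers


# Usage on a hypothetical 4x4 Rayleigh channel (unquantized baseline)
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
noise_power, P_t = 1.0, 10.0
_, s, Vh = np.linalg.svd(H)
p = water_filling(s ** 2 / noise_power, P_t)
F = Vh.conj().T[:, : s.size] * np.sqrt(p)            # right singular vectors scaled by sqrt(p_k)
se = np.sum(np.log2(1 + s ** 2 * p / noise_power))
print("powers:", np.round(p, 3), " SE [bit/s/Hz]:", round(float(se), 3))
```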

I-A Prior Works

Recent years have witnessed a proliferation of studies on low-resolution massive MIMO transceivers, exploring various quantization schemes, including one-bit, multi-bit, mixed-bit, and variable-bit quantization. One-bit quantized systems have been extensively investigated in the literature due to their simplicity and tractability [singh2009limits, mo2015capacity, mezghani2008analysis, mezghani2007ultra, li2017channel, atzeni2021channel]. Specifically, Mo et al. [mo2015capacity] derived the exact channel capacity of a multiple-input single-output system with perfect CSI at both the transmitter and receiver. It was shown in [mezghani2008analysis] that, with only receive CSI, quadrature phase-shift keying (QPSK) signaling is the capacity-achieving input distribution in single-input single-output systems, unlike in the full-resolution case where a Gaussian codebook is optimal. At low signal-to-noise ratio (SNR), the mutual information of a one-bit quantized MIMO system decreases by a factor of $2/\pi$ compared with a full-resolution one [mezghani2007ultra]. Although one-bit quantization has low power consumption and hardware cost, it significantly limits the SE [orhan2015low, li2017channel]. In this regard, it was shown in [li2017channel] that the channel estimator exhibits an error floor at high SNR due to coarse quantization, and that at least 2 to 3 times as many antennas are required to attain an SE comparable to that of a full-resolution system.
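The constant $2/\pi$ can be made tangible with a quick Monte Carlo check. One standard interpretation, which we assume here rather than take from [mezghani2007ultra], is that $2/\pi$ equals the squared correlation coefficient between a zero-mean real Gaussian input $y$ and its one-bit output $\mathrm{sign}(y)$, because $\mathbb{E}[|y|]=\sigma\sqrt{2/\pi}$; it is the fraction of the one-bit output power that remains linearly coherent with the input.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 3.0                                   # arbitrary input standard deviation (result is scale-free)
y = sigma * rng.standard_normal(1_000_000)    # zero-mean real Gaussian input
z = np.sign(y)                                # one-bit quantizer output in {-1, +1}

# Squared correlation coefficient between the input and the one-bit output
rho2 = np.mean(y * z) ** 2 / (np.mean(y ** 2) * np.mean(z ** 2))
print(f"empirical rho^2 = {rho2:.4f}, 2/pi = {2 / np.pi:.4f}")
```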

The limitations of one-bit quantization have sparked widespread interest in low-resolution systems with multi-bit (2 to 4 bits) quantization. It was shown in [singh2009limits, jacobsson2017throughput] that a system using very few bits can approach the performance of a full-resolution one. Mezghani et al. [mezghani2012capacity] derived a closed-form lower bound on the capacity of a point-to-point MIMO system. More recent works focused on beamforming design [mezghani2009transmit, jacobsson2017quantized, ling2019performance]. Furthermore, mixed-ADC systems, which simultaneously deploy one-bit and high-resolution ADCs, have been shown to outperform fixed-resolution architectures, especially at high SNR [zhang2016mixed, zhang2017performance, pirzadeh2018spectral]. On the other hand, variable-resolution ADCs have been studied in [bai2013optimization, ahmed2017joint, choi2017resolution, nguyen2020energy, prasad2020optimizing, castaneda2021resolution, castaneda2021spawc] to flexibly balance the SE-EE tradeoff of low-resolution systems. For instance, it was shown in [bai2013optimization, ahmed2017joint, choi2017resolution, nguyen2020energy] that efficient bit allocation strategies can offer higher EE than uniform-resolution architectures. Castañeda et al. [castaneda2021resolution] developed a resolution-adaptive fully digital receiver in an application-specific integrated circuit (ASIC). Furthermore, they demonstrated that a 256-antenna base station with resolution-adaptive ADCs serving 16 users reduces the power consumption by a factor of 6.7 compared with a traditional fixed-resolution design [castaneda2021spawc]. Additionally, SE gains can be achieved by jointly optimizing the transmit power and ADC resolutions [prasad2020optimizing].

Many of the aforementioned works utilize the arcsine law [jacovitti1994estimation] to facilitate the analysis and design of one-bit systems. For systems with a few bits, two primary methods are used to model quantization, i.e., the additive quantization noise model (AQNM) [gersho2012vector, fletcher2007robust, orhan2015low] and the Bussgang decomposition [bussgang1952crosscorrelation]. Both approaches approximate the (nonlinear) quantization function with a linear model. However, two distinct linear approximations are referred to as the AQNM in the literature. The first is [gersho2012vector]

$$Q(X) = X + q, \tag{1}$$

where $Q(\cdot)$ and $q$ denote the quantization function and the quantization error, respectively. The second is [fletcher2007robust]

$$Q(X) = \alpha X + \eta, \tag{2}$$

where $\alpha$ is a constant depending on the quantizer and on the distribution of $X$, and $\eta$ represents the quantization distortion (QD). Both (1) and (2) can be employed to analyze the worst-case system performance [diggavi2001worst, hassibi2003much] by assuming that $q$ or $\eta$ is a Gaussian variable uncorrelated with $X$. Model (2) was first derived in [fletcher2007robust] and later named AQNM in [orhan2015low]; it was also called the pseudo-quantization noise model in [zhang2016mixed]. Although (2) and the Bussgang decomposition were developed from separate technical lineages, it was shown in [demir2020bussgang] that the former is simply the latter specialized to quantization. Therefore, we call the model in (2) the Bussgang-based AQNM (BAQNM), while we refer to (1) as the AQNM for distinction. The AQNM is typically less accurate than the BAQNM because the assumption that $q$ is uncorrelated with $X$ is generally not satisfied. In contrast, $\eta$ is uncorrelated with $X$ by the properties of the Bussgang decomposition. Furthermore, the QD covariance is a key ingredient for performance analysis and optimization with the BAQNM. A diagonal approximation of the QD covariance matrix was derived in [mezghani2012capacity, bai2013optimization] and has since been widely used in the literature. However, the error arising from this diagonal approximation can be substantial in some cases, raising a major concern about the reliability of the corresponding results [demir2020bussgang, prasad2020optimizing].
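The difference between (1) and (2) is easy to see numerically. The sketch below assumes a unit-variance real Gaussian $X$ and a one-bit quantizer with output levels $\pm\sqrt{2/\pi}$ (the MSE-optimal levels for this input); it computes the AQNM error $q=Q(X)-X$ and the BAQNM distortion $\eta=Q(X)-\alpha X$ with $\alpha=\mathbb{E}[Q(X)X]/\mathbb{E}[X^2]$. For this quantizer, $\mathbb{E}[qX]=2/\pi-1\approx-0.36$, so $q$ is clearly correlated with $X$, whereas $\eta$ is uncorrelated with $X$ by the choice of $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)              # real Gaussian input X, unit variance
z = np.sqrt(2 / np.pi) * np.sign(x)           # one-bit quantizer with levels +-sqrt(2/pi)

q = z - x                                     # AQNM error in (1): Q(X) = X + q
alpha = np.mean(z * x) / np.mean(x ** 2)      # Bussgang gain in (2), close to 2/pi here
eta = z - alpha * x                           # BAQNM distortion in (2): Q(X) = alpha X + eta

print("E[qX]   =", np.mean(q * x))            # clearly nonzero: q is correlated with X
print("alpha   =", alpha)
print("E[etaX] =", np.mean(eta * x))          # zero up to rounding: eta is uncorrelated with X
```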

I-B Contributions

Previous works focus on either beamforming design [mezghani2009transmit, jacobsson2017quantized, ling2019performance] or bit allocation [bai2013optimization, choi2017resolution, nguyen2020energy]. Jointly optimizing the two is a promising way to achieve higher SE and provides deeper insight into the SE-EE tradeoff, as shown in [ahmed2017joint, prasad2020optimizing]. However, the transmitter design was not considered in [ahmed2017joint], whereas the receive beamforming was omitted in [prasad2020optimizing]. Unlike previous studies, this paper analyzes the BAQNM and the QD, and jointly designs the transmit-receive beamforming and bit allocation for point-to-point MIMO systems with resolution-adaptive ADCs. The specific contributions of this paper are summarized as follows:

  • We first identify the essential properties of optimal quantization. Leveraging these properties and the Bussgang decomposition, we reestablish the BAQNM and the diagonal approximation of the QD covariance matrix, offering a new perspective compared with [mezghani2012capacity, bai2013optimization]. The analysis shows that the BAQNM and the QD covariance approximation typically hold for Gaussian signals undergoing optimal quantization. Furthermore, we examine the connections between applying the BAQNM and the arcsine law to one-bit quantization. The consistency between the results obtained with these two methods validates our findings.

  • Building upon the above theoretical findings, we consider the joint transmit-receive beamforming design and bit allocation problem to maximize the SE subject to constraints on the transmit power budget and the total number of active ADC bits. This design problem is inherently challenging due to its mixed-integer nature. We address it by first designing the beamformers for fixed ADC resolutions. Subsequently, we propose a low-complexity algorithm to iteratively optimize the ADC resolutions and the beamforming matrices.

  • Extensive numerical simulations verify the superiority of the proposed schemes. Specifically, the results show that the proposed beamforming design significantly outperforms conventional WF solutions in low-resolution systems, especially with one-bit quantization and at high SNR. Furthermore, the benefit of bit allocation is clearly demonstrated. For example, in a 64 × 64 MIMO system, the proposed design offers a 6% improvement in both SE and EE while requiring 40% fewer active ADC bits than uniform bit allocation. When a total of 128 bits is used over the 64 RF chains, the proposed design achieves improvements of 49% in SE and 39% in EE over uniform bit allocation. Moreover, the SE-EE comparison shows that receiving more data streams with low-resolution ADCs can achieve higher SE and EE than receiving fewer data streams with high-resolution ADCs.

I-C Organization and Notations

The rest of this paper is organized as follows. In Section II, we present the signal model and the quantization model. The BAQNM and the approximation of the QD covariance are then derived in Section III. We delve into the joint transmit-receive beamforming and bit allocation design in Section IV. Finally, we provide simulation results and conclusions in Sections V and VI, respectively.

Scalars, vectors, and matrices are denoted by lowercase, boldface lowercase, and boldface uppercase letters, respectively. Furthermore, we use $(\cdot)^{*}$, $(\cdot)^{\mathsf{T}}$, $(\cdot)^{\mathsf{H}}$, and $(\cdot)^{-1}$ to represent the conjugate, transpose, conjugate transpose, and matrix inverse operators, respectively. $\|\cdot\|_{\mathcal{F}}$ denotes the Frobenius norm of a matrix. In addition, the expectation and trace operators are represented by $\mathbb{E}[\cdot]$ and $\mathrm{Tr}(\cdot)$, respectively. We use $|a|$ and $\det(\mathbf{A})$ to denote the absolute value of the scalar $a$ and the determinant of the matrix $\mathbf{A}$, respectively. The real and imaginary part operators are denoted by $\Re\{\cdot\}$ and $\Im\{\cdot\}$, respectively. Moreover, $\mathrm{diag}(\mathbf{a})$ yields a diagonal matrix whose diagonal entries are the elements of $\mathbf{a}$, while $\mathrm{diag}(\mathbf{A})$ returns a vector whose elements are the diagonal entries of $\mathbf{A}$. Finally, we use $\mathbf{C}_{xy}$ and $\mathbf{C}_{x}$ to represent the cross-covariance matrix between $\mathbf{x}$ and $\mathbf{y}$ and the auto-covariance matrix of $\mathbf{x}$, respectively.

II System Model

II-A System Model

We consider a point-to-point MIMO system where a transmitter (Tx) with $N_{\rm t}$ antennas communicates with a receiver (Rx) with $N_{\rm r}$ antennas. We assume that the Tx is equipped with high-resolution digital-to-analog converters, while low-resolution ADCs are deployed at the Rx. Let $\mathbf{s}\in\mathbb{C}^{N_{\rm s}}$, with $N_{\rm s}\le\min(N_{\rm t},N_{\rm r})$, be the transmitted signal vector. We assume that $\mathbf{s}$ is Gaussian distributed with $\mathbb{E}[\mathbf{s}\mathbf{s}^{\mathsf{H}}]=\mathbf{I}$. Furthermore, let $\mathbf{F}\in\mathbb{C}^{N_{\rm t}\times N_{\rm s}}$ be the precoding matrix subject to the power constraint $\|\mathbf{F}\|_{\mathcal{F}}^{2}\le P_{\rm t}$, where $P_{\rm t}$ denotes the transmit power budget of the Tx. The received signal (before quantization) at the Rx can be written as

$$\mathbf{y} = \mathbf{H}\mathbf{F}\mathbf{s} + \mathbf{n}, \tag{3}$$

where $\mathbf{H}\in\mathbb{C}^{N_{\rm r}\times N_{\rm t}}$ denotes the channel between the Tx and the Rx, and $\mathbf{n}\sim\mathcal{CN}(\mathbf{0},\sigma_{\rm n}^{2}\mathbf{I})$ denotes the additive white Gaussian noise (AWGN) vector, with $\sigma_{\rm n}^{2}$ being the noise power. Here, we assume that $\mathbf{H}$ is quasi-flat during each coherence time. Furthermore, to characterize the system performance bound, we assume that perfect CSI is available at both the Rx and the Tx [mo2015capacity, ling2019performance]. Channel estimation with adaptive-resolution ADCs was studied in [wang2022channel]. Moreover, an ASIC receiver integrating both resolution-adaptive ADCs and a channel estimation module was developed in [castaneda2021resolution].

II-B Signal Model with Quantization

We denote the codebook of a scalar quantizer with $b$ bits as $\mathcal{C}=\{c_0,\ldots,c_{N_{\rm q}-1}\}$, where $N_{\rm q}=2^{b}$ is the number of output levels of the quantizer. The set of quantization thresholds is $\mathcal{T}=\{t_0,\ldots,t_{N_{\rm q}}\}$, where $t_0=-\infty$ and $t_{N_{\rm q}}=\infty$ allow inputs with arbitrary power.¹ Let $Q(\cdot)$ denote the quantization function associated with $\mathcal{C}$ and $\mathcal{T}$.
For a complex signal $x$, we have $Q(x)=Q(\Re\{x\})+jQ(\Im\{x\})$, with $Q(\Re\{x\})=c_{I(\Re\{x\})}$, where $I(\Re\{x\})=i\in\{0,\ldots,N_{\rm q}-1\}$ for $\Re\{x\}\in(t_i,t_{i+1}]$. $Q(\Im\{x\})$ is obtained in a similar way.

¹In practice, ADC input signals outside the range $[t_1,t_{N_{\rm q}-1}]$ can be clipped to the range $[t_1-\delta,t_{N_{\rm q}-1}+\delta]$, where $\delta$ is an adjustable parameter that depends on the constraints of hardware components, e.g., the automatic gain control (AGC).
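As an illustration, the quantization function can be implemented with a sorted search over the interior thresholds $t_1,\ldots,t_{N_{\rm q}-1}$, using the half-open cells $(t_i,t_{i+1}]$ of Definition 1 in Section III; the outermost cells extend to $\pm\infty$, so out-of-range inputs are mapped to $c_0$ or $c_{N_{\rm q}-1}$. The function names and the 2-bit uniform codebook in the usage example are hypothetical choices for illustration only.

```python
import numpy as np


def quantize_real(x, codebook, thresholds):
    """Scalar quantizer Q(x) = c_{I(x)}, where I(x) = i for x in (t_i, t_{i+1}].

    `thresholds` holds the interior thresholds t_1 < ... < t_{Nq-1};
    t_0 = -inf and t_{Nq} = +inf are implicit.
    """
    # side="left" counts interior thresholds strictly below x, i.e., the cell index I(x)
    idx = np.searchsorted(thresholds, x, side="left")
    return np.asarray(codebook)[idx]


def quantize_complex(x, codebook, thresholds):
    """Q(x) = Q(Re{x}) + j Q(Im{x}), applying the same real quantizer to each part."""
    x = np.asarray(x)
    return (quantize_real(x.real, codebook, thresholds)
            + 1j * quantize_real(x.imag, codebook, thresholds))


# Hypothetical 2-bit uniform quantizer with unit step size
step = 1.0
codebook = step * np.array([-1.5, -0.5, 0.5, 1.5])
thresholds = step * np.array([-1.0, 0.0, 1.0])

print(quantize_real(np.array([-5.0, -0.2, 0.0, 0.9, 7.0]), codebook, thresholds))
print(quantize_complex(np.array([0.3 - 1.2j, 2.0 + 0.4j]), codebook, thresholds))
```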

The Bussgang decomposition for complex-valued random vectors is presented in [demir2020bussgang]. Specifically, let $\mathbf{Q}:\mathbb{C}^{N}\rightarrow\mathbb{C}^{N}$ denote an element-wise scalar quantization function and $\mathbf{z}$ be the quantized output of $\mathbf{y}$. We can write $\mathbf{z}=\mathbf{Q}(\mathbf{y})$ or, equivalently, $z_i=Q_i(y_i),\ \forall i$, where $z_i$ and $y_i$ denote the $i$-th elements of $\mathbf{z}$ and $\mathbf{y}$, respectively, and $Q_i(\cdot)$ represents the associated quantization function. For a circularly symmetric Gaussian random vector $\mathbf{y}$, the Bussgang decomposition implies

$$\mathbf{z} = \mathbf{Q}(\mathbf{y}) = \mathbf{G}\mathbf{y} + \boldsymbol{\eta}, \tag{4}$$

where $\mathbf{G}\triangleq\mathbf{C}_{zy}\mathbf{C}_{y}^{-1}$ denotes the Bussgang gain, and the distortion term $\boldsymbol{\eta}$ is uncorrelated with $\mathbf{y}$. In (4), $\boldsymbol{\eta}$ represents the QD vector, whose covariance matrix is given by

$$\mathbf{C}_{\eta} = \mathbb{E}\big[(\mathbf{z}-\mathbf{G}\mathbf{y})(\mathbf{z}-\mathbf{G}\mathbf{y})^{\mathsf{H}}\big] = \mathbf{C}_{z} - \mathbf{G}\mathbf{C}_{yz}. \tag{5}$$
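The quantities in (4) and (5) can be estimated from samples. The sketch below assumes a hypothetical correlated covariance, one-bit quantization of each real and imaginary part, and zero-mean sample covariances. It computes $\mathbf{G}=\mathbf{C}_{zy}\mathbf{C}_y^{-1}$ and $\mathbf{C}_\eta=\mathbf{C}_z-\mathbf{G}\mathbf{C}_{yz}$, and then checks that $\boldsymbol{\eta}=\mathbf{z}-\mathbf{G}\mathbf{y}$ is uncorrelated with $\mathbf{y}$ and that its sample covariance matches (5). With sample covariances, both identities hold exactly up to rounding, mirroring the algebra behind (5).

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 3, 200_000                                 # vector length and number of samples

# Hypothetical correlated covariance C_y = A A^H / N + I
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
C_y_true = A @ A.conj().T / N + np.eye(N)
w = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
y = np.linalg.cholesky(C_y_true) @ w              # circularly symmetric Gaussian samples

# Element-wise one-bit quantization of the real and imaginary parts
z = (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)


def cov(a, b):
    """Zero-mean sample (cross-)covariance E[a b^H] over the columns."""
    return a @ b.conj().T / a.shape[1]


C_y, C_z, C_zy = cov(y, y), cov(z, z), cov(z, y)
G = C_zy @ np.linalg.inv(C_y)                     # Bussgang gain G = C_zy C_y^{-1}
C_eta = C_z - G @ C_zy.conj().T                   # (5), using C_yz = C_zy^H

eta = z - G @ y                                   # QD vector from (4)
print(np.max(np.abs(cov(eta, y))))                # ~0: eta is uncorrelated with y
print(np.max(np.abs(cov(eta, eta) - C_eta)))      # ~0: sample covariance matches (5)
```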

Furthermore, under some mild assumptions, the Bussgang gain $\mathbf{G}$ is shown to be diagonal, as detailed in the following lemma.

Lemma 1 ([jacobsson2017quantized, bjornson2018hardware, demir2020bussgang])

Consider a jointly circularly symmetric Gaussian random vector $\mathbf{y}$ fed into scalar quantizers. With (4) modeling the quantization, we have $\mathbf{G}=\mathrm{diag}(\mathbf{g})$, where the $i$-th element of $\mathbf{g}$ is $g_i=\dfrac{\mathbb{E}[Q_i(y_i)\,y_i^{*}]}{\mathbb{E}[|y_i|^{2}]}$.
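Lemma 1 can be checked numerically. Assuming one-bit quantization of each real and imaginary part with output $(\mathrm{sign}(\Re\{y_i\})+j\,\mathrm{sign}(\Im\{y_i\}))/\sqrt{2}$, the element-wise formula in Lemma 1 evaluates to $g_i=\sqrt{2/\pi}/\sqrt{\mathbb{E}[|y_i|^2]}$, since each part has $\mathbb{E}[|\Re\{y_i\}|]=\sqrt{\mathbb{E}[|y_i|^2]/\pi}$. The sketch below, with a hypothetical correlated covariance, compares the full matrix $\mathbf{C}_{zy}\mathbf{C}_y^{-1}$, the element-wise formula, and this closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
N, n = 3, 400_000

# Hypothetical correlated covariance and circularly symmetric Gaussian samples
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
C_y_true = A @ A.conj().T / N + np.eye(N)
w = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
y = np.linalg.cholesky(C_y_true) @ w
z = (np.sign(y.real) + 1j * np.sign(y.imag)) / np.sqrt(2)   # one-bit quantizer per part

# Full Bussgang gain C_zy C_y^{-1} from sample covariances
G_full = (z @ y.conj().T / n) @ np.linalg.inv(y @ y.conj().T / n)

# Element-wise gain from Lemma 1: g_i = E[Q_i(y_i) y_i^*] / E[|y_i|^2]
g_lemma = np.mean(z * y.conj(), axis=1) / np.mean(np.abs(y) ** 2, axis=1)

# One-bit closed form: g_i = sqrt(2/pi) / sqrt(E[|y_i|^2])
g_closed = np.sqrt(2 / np.pi) / np.sqrt(np.real(np.diag(C_y_true)))

off_diag = G_full - np.diag(np.diag(G_full))
print("max |off-diagonal| of G:", np.max(np.abs(off_diag)))   # small: G is (nearly) diagonal
print("diag(G): ", np.round(np.diag(G_full).real, 3))
print("Lemma 1: ", np.round(g_lemma.real, 3))
print("closed:  ", np.round(g_closed, 3))
```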

Substituting (3) into (4), we obtain the quantized version of the signal received at the Rx, expressed as

$$\mathbf{z} = \mathbf{G}\mathbf{H}\mathbf{F}\mathbf{s} + \mathbf{e}, \tag{6}$$

where $\mathbf{e}=\mathbf{G}\mathbf{n}+\boldsymbol{\eta}$ represents the effective noise with covariance matrix $\mathbf{C}_e=\mathbb{E}[\mathbf{e}\mathbf{e}^{\mathsf{H}}]=\mathbf{C}_{\eta}+\sigma_{\rm n}^{2}\mathbf{G}^{2}$. The post-combining signal at the Rx is expressed as

$$\hat{\mathbf{s}} = \mathbf{U}^{\mathsf{H}}\mathbf{z} = \mathbf{U}^{\mathsf{H}}\mathbf{G}\mathbf{H}\mathbf{F}\mathbf{s} + \mathbf{U}^{\mathsf{H}}\mathbf{e}, \tag{7}$$

where $\mathbf{U}\in\mathbb{C}^{N_{\rm r}\times N_{\rm s}}$ denotes the combining matrix. Although $\mathbf{s}$ is Gaussian distributed, $\mathbf{e}$ is not Gaussian because of the nonlinear quantization distortion. However, treating the effective noise $\mathbf{e}$ as Gaussian, which is the worst case for a given covariance, yields a lower bound on the SE [hassibi2003much]:

$$R = \log\det\left(\mathbf{I} + (\mathbf{U}^{\mathsf{H}}\mathbf{C}_e\mathbf{U})^{-1}\mathbf{U}^{\mathsf{H}}\mathbf{G}\mathbf{H}\mathbf{F}\mathbf{F}^{\mathsf{H}}\mathbf{H}^{\mathsf{H}}\mathbf{G}\mathbf{U}\right). \tag{8}$$
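For numerical evaluation, (8) can be computed directly. The sketch below assumes a base-2 logarithm (SE in bit/s/Hz) and a real diagonal $\mathbf{G}$, as implied by the form of (8), and uses a log-determinant for numerical stability. As a sanity check under hypothetical parameters, setting $\mathbf{G}=\mathbf{I}$ and $\mathbf{C}_e=\sigma_{\rm n}^2\mathbf{I}$ (no quantization) with any invertible square combiner $\mathbf{U}$ should recover the unquantized MIMO rate $\log_2\det(\mathbf{I}+\sigma_{\rm n}^{-2}\mathbf{H}\mathbf{F}\mathbf{F}^{\mathsf{H}}\mathbf{H}^{\mathsf{H}})$, because an invertible $\mathbf{U}$ cancels inside the determinant.

```python
import numpy as np


def se_lower_bound(H, F, U, G, C_e):
    """SE lower bound R in (8), in bit/s/Hz (base-2 logarithm assumed).

    Assumes G is real diagonal, so G^H = G as written in (8).
    """
    S = U.conj().T @ G @ H @ F                    # U^H G H F
    N = U.conj().T @ C_e @ U                      # U^H C_e U
    M = np.eye(S.shape[0]) + np.linalg.solve(N, S @ S.conj().T)
    _, logabsdet = np.linalg.slogdet(M)           # det(M) is real and positive here
    return logabsdet / np.log(2)


# Sanity check with hypothetical parameters: no quantization (G = I, C_e = sigma2 I)
rng = np.random.default_rng(5)
Nr = Nt = Ns = 4
P_t, sigma2 = 10.0, 1.0
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
F = np.sqrt(P_t / Ns) * np.eye(Nt, Ns)           # equal power, ||F||_F^2 = P_t
G, C_e = np.eye(Nr), sigma2 * np.eye(Nr)

U_rand = rng.standard_normal((Nr, Ns)) + 1j * rng.standard_normal((Nr, Ns))  # invertible w.p. 1
R_identity = se_lower_bound(H, F, np.eye(Nr, Ns), G, C_e)
R_random = se_lower_bound(H, F, U_rand, G, C_e)
R_ref = np.linalg.slogdet(np.eye(Nr) + H @ F @ F.conj().T @ H.conj().T / sigma2)[1] / np.log(2)
print(R_identity, R_random, R_ref)
```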

Note that evaluating and optimizing the SE in (8) requires the Bussgang gain $\mathbf{G}$ and the QD covariance matrix $\mathbf{C}_{\eta}$. For one-bit quantization, closed-form expressions for $\mathbf{G}$ and $\mathbf{C}_{\eta}$ can be derived from the arcsine law [li2017channel]. However, obtaining them for multi-bit quantization is significantly more challenging. A closed-form expression of $\mathbf{G}$ and a diagonal approximation of $\mathbf{C}_{\eta}$ were developed in [mezghani2012capacity, bai2013optimization] under the assumption that the quantizer satisfies the following properties:

$$\mathbb{E}[z_i - y_i] = 0, \tag{9}$$
$$\mathbb{E}[(z_i - y_i)z_i] = 0, \tag{10}$$

where $z_i=Q_i(y_i)$. However, the validity of these assumptions remains unclear, and thus the applicability of these results to general signal distributions and quantizers is uncertain. In the next section, we derive the BAQNM and the diagonal approximation from a new perspective, aiming to resolve this uncertainty.
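Properties (9) and (10) do not hold for every quantizer. A minimal check, assuming a real unit-variance Gaussian input and a symmetric one-bit quantizer $z=c\,\mathrm{sign}(y)$: by symmetry, (9) holds for any $c$, whereas $\mathbb{E}[(z-y)z]=c^2-c\sqrt{2/\pi}$, so (10) holds only for $c=\sqrt{2/\pi}=\mathbb{E}[y\,|\,y>0]$, the MSE-optimal level. With $c=1$, (10) is violated by about 0.20.

```python
import numpy as np


def check_properties(level, y):
    """Sample estimates of (9) and (10) for the one-bit quantizer z = level * sign(y)."""
    z = level * np.sign(y)
    return np.mean(z - y), np.mean((z - y) * z)


rng = np.random.default_rng(6)
y = rng.standard_normal(1_000_000)                    # real Gaussian input, unit variance

for level in (1.0, np.sqrt(2 / np.pi)):               # arbitrary vs. MSE-optimal output level
    p9, p10 = check_properties(level, y)
    exact10 = level ** 2 - level * np.sqrt(2 / np.pi)  # exact E[(z - y) z]
    print(f"level={level:.4f}: (9) -> {p9:+.4f}, (10) -> {p10:+.4f} (exact {exact10:+.4f})")
```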

III BAQNM and Approximation of the QD Covariance

In this section, we first identify the fundamental properties of optimal quantizers in Lemma 2 and Lemma 3 and then leverage them to obtain the BAQNM and the approximation of the QD covariance. Furthermore, we elaborate on the nuances between applying the BAQNM and the arcsine law to one-bit quantization.

III-A Properties of Optimal Quantizers

We first recall the definition of the optimal quantizer [max1960quantizing] below.

Definition 1 ([max1960quantizing])

Consider a real-valued random variable $X$. Let $f_X(x)$ denote its probability density function (PDF), and let $Q(x)=c_{I(x)}$ be its quantized approximation, where $I(x)=i\in\{0,\ldots,N_{\rm q}-1\}$ satisfies $x\in(t_i,t_{i+1}]$. The mean square error (MSE) of the quantization can be expressed as

$$D = \mathbb{E}\left[(Q(X) - X)^2\right] = \sum_{i=0}^{N_{\rm q}-1} \int_{t_i}^{t_{i+1}} (x - c_i)^2 f_X(x)\,{\rm d}x. \tag{11}$$

The optimal quantizer is the one that minimizes $D$.
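As a worked instance of (11), consider a standard Gaussian input $X \sim \mathcal{N}(0,1)$ and a one-bit quantizer ($N_{\rm q} = 2$) with threshold $t_1 = 0$ and symmetric levels $c_1 = -c_0 = c > 0$, so that $Q(X) = c\,\mathrm{sgn}(X)$; this example is chosen for illustration. Using $\mathbb{E}[X^2] = 1$ and $\mathbb{E}[|X|] = \sqrt{2/\pi}$:

```latex
\begin{aligned}
D(c) &= \mathbb{E}\big[(c\,\mathrm{sgn}(X) - X)^2\big]
      = c^2 - 2c\,\mathbb{E}[|X|] + \mathbb{E}[X^2]
      = c^2 - 2c\sqrt{2/\pi} + 1,\\
\frac{{\rm d}D}{{\rm d}c} = 0 \;&\Rightarrow\; c^\star = \sqrt{2/\pi} \approx 0.798,
\qquad D(c^\star) = 1 - \frac{2}{\pi} \approx 0.363.
\end{aligned}
```

In this symmetric case the minimizer $c^\star$ equals $\mathbb{E}[X \mid X > 0]$, i.e., the mean of the input over the interval it represents.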

Setting the derivatives of $D$ with respect to $t_j$ and $c_j$ to zero yields

$$t_j = \frac{c_j + c_{j-1}}{2}, \tag{12}$$
$$c_j = \frac{\int_{t_j}^{t_{j+1}} x f_X(x)\,{\rm d}x}{\int_{t_j}^{t_{j+1}} f_X(x)\,{\rm d}x}, \tag{13}$$

which are referred to as the nearest-neighbor condition and the centroid condition, respectively [gersho2012vector, Chapter 6]. Both are necessary conditions for the optimal quantizer, also known as the Lloyd-Max quantizer [max1960quantizing] or the optimal non-uniform quantizer; the latter name reflects the fact that the optimal quantizer is generally non-uniform. The uniform quantizer that minimizes $D$ in (11) is referred to as the optimal uniform quantizer.
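The sketch below assumes a standard Gaussian input $X \sim \mathcal{N}(0,1)$. The function `lloyd_max` alternates the nearest-neighbor condition (12) and the centroid condition (13), i.e., the Lloyd-Max iteration, while `best_uniform` finds the optimal uniform quantizer by a plain grid search over the step size, a simple choice made here rather than a method from the text. The function names, initial levels, and grid are illustrative assumptions; the MSE follows (11), evaluated with closed-form Gaussian truncated moments.

```python
import math


def phi(x):
    """Standard normal PDF, with phi(+/-inf) = 0."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF; math.erf handles +/-inf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def x_phi(x):
    """x * phi(x), with its limit 0 at +/-inf."""
    return 0.0 if math.isinf(x) else x * phi(x)


def mse(t, c):
    """MSE D in (11) for X ~ N(0, 1), using closed-form truncated moments."""
    d = 0.0
    for i, ci in enumerate(c):
        a, b = t[i], t[i + 1]
        m0 = Phi(b) - Phi(a)            # integral of phi over (a, b]
        m1 = phi(a) - phi(b)            # integral of x * phi
        m2 = m0 + x_phi(a) - x_phi(b)   # integral of x^2 * phi
        d += m2 - 2.0 * ci * m1 + ci * ci * m0
    return d


def thresholds_from_levels(c):
    """Nearest-neighbor condition (12), with t_0 = -inf and t_N = +inf."""
    return [-math.inf] + [(c[j] + c[j - 1]) / 2.0 for j in range(1, len(c))] + [math.inf]


def lloyd_max(n_levels, iters=10_000, tol=1e-12):
    """Alternate (12) and (13) for a standard Gaussian input."""
    c = [-2.0 + 4.0 * (i + 0.5) / n_levels for i in range(n_levels)]  # initial guess
    for _ in range(iters):
        t = thresholds_from_levels(c)
        # Centroid condition (13): mean of X over each interval.
        new_c = [(phi(t[j]) - phi(t[j + 1])) / (Phi(t[j + 1]) - Phi(t[j]))
                 for j in range(n_levels)]
        done = max(abs(u - v) for u, v in zip(new_c, c)) < tol
        c = new_c
        if done:
            break
    return thresholds_from_levels(c), c


def uniform(n_levels, step):
    """Zero-centered uniform quantizer with the given step size."""
    c = [(i - (n_levels - 1) / 2.0) * step for i in range(n_levels)]
    return thresholds_from_levels(c), c


def best_uniform(n_levels, step_min=0.1, step_max=3.0, n_grid=2901):
    """Grid search for the step size of the optimal uniform quantizer."""
    best = (math.inf, None)
    for k in range(n_grid):
        step = step_min + (step_max - step_min) * k / (n_grid - 1)
        best = min(best, (mse(*uniform(n_levels, step)), step))
    return best  # (minimum MSE, step size)


for n in (2, 4, 8):
    t, c = lloyd_max(n)
    d_uni, step = best_uniform(n)
    print(f"N_q={n}: Lloyd-Max MSE={mse(t, c):.4f}, "
          f"optimal uniform MSE={d_uni:.4f} (step {step:.3f})")
```

For $N_{\rm q} = 2$, the two quantizers coincide up to the grid resolution; for more levels, the Lloyd-Max MSE is smaller, consistent with the optimal quantizer being generally non-uniform.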

Remark 1

The centroid condition requires that the quantizer output for each interval be the mean of the input over that interval. This condition can also be written as [gersho2012vector]

$$\mathbb{E}[X \mid Q(X)] = Q(X), \tag{14}$$

which was used in [fletcher2007robust] as a basic assumption for deriving the model (2). Therefore, the BAQNM is limited to the optimal quantizer.
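The link between (14) and the orthogonality assumption (10) follows in one line from the law of total expectation; the standard argument is written here for a real scalar input:

```latex
\mathbb{E}\big[(Q(X)-X)\,Q(X)\big]
  = \mathbb{E}\Big[Q(X)\big(Q(X)-\mathbb{E}[X \mid Q(X)]\big)\Big]
  \overset{(14)}{=} 0,
\qquad
\mathbb{E}[Q(X)] = \mathbb{E}\big[\mathbb{E}[X \mid Q(X)]\big] = \mathbb{E}[X].
```

Both identities rely only on the centroid condition.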

The Lloyd-Max algorithm [max1960quantizing] iteratively updates $\mathcal{T}$ and $\mathcal{C}$ according to (12) and (13) to find the optimal quantizer for a specific input signal. However, this iterative method can be time-consuming, especially for high-resolution quantization. In what follows, we show how to obtain the optimal quantizer without running the Lloyd-Max algorithm. To this end, we begin by identifying the fundamental properties of the optimal quantization of Gaussian signals in the following lemma.

Lemma 2

Let $X$ be a real-valued, zero-mean, and unit-variance random variable, and let $Y = \sigma_{\rm y} X$. Then, we have

$$Q_{\rm y}(Y) = \sigma_{\rm y} Q_{\rm x}(X) = \sigma_{\rm y} Q_{\rm x}\!\left(\frac{Y}{\sigma_{\rm y}}\right), \tag{15}$$
$$\gamma = \mathbb{E}\left[(Q_{\rm x}(X) - X)^2\right] = \frac{\mathbb{E}\left[(Q_{\rm y}(Y) - Y)^2\right]}{\sigma_{\rm y}^2}, \tag{16}$$

where $Q_{\rm y}(Y)$ and $Q_{\rm x}(X)$ denote the outputs of the optimal quantizers for $Y$ and $X$, respectively.

Proof:

See Appendix LABEL:prof:scaling_and_distortion_invariance. ∎

We refer to $\gamma$ as the distortion factor and to the properties in (15) and (16) as the scaling property and the distortion invariance, respectively. Using the scaling property together with the optimal quantizer for the standard Gaussian distribution [max1960quantizing], we can obtain the optimal quantizer for any zero-mean Gaussian signal with known variance. For example, we can obtain the optimal element-wise quantization of the received signal vector $\mathbf{y}$ in (3), whose covariance matrix is

$$\mathbf{C}_y \triangleq \mathbb{E}[\mathbf{y}\mathbf{y}^{\mathsf{H}}] = \mathbf{H}\mathbf{F}\mathbf{F}^{\mathsf{H}}\mathbf{H}^{\mathsf{H}} + \sigma_{\rm n}^2 \mathbf{I}. \tag{17}$$
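A minimal sketch of this recipe follows. It assumes that $\mathbf{y}$ is circularly symmetric complex Gaussian, that the real and imaginary parts of each entry $y_i$ are quantized separately, each with variance $[\mathbf{C}_y]_{i,i}/2$, and that every ADC uses the 2-bit Lloyd-Max quantizer for $\mathcal{N}(0,1)$ (rounded tabulated values [max1960quantizing]). The dimensions, channel $\mathbf{H}$, precoder $\mathbf{F}$, and noise variance are hypothetical and randomly generated, and the helpers `q_std` and `quantize_received` are illustrative names.

```python
import numpy as np

# Rounded 2-bit Lloyd-Max quantizer for N(0, 1) (tabulated values).
T_STD = np.array([-0.9816, 0.0, 0.9816])             # interior thresholds t_1..t_3
C_STD = np.array([-1.510, -0.4528, 0.4528, 1.510])   # levels c_0..c_3


def q_std(x):
    """Quantizer for unit-variance input: x in (t_i, t_{i+1}] -> c_i."""
    return C_STD[np.searchsorted(T_STD, x, side="left")]


def quantize_received(y, C_y):
    """Element-wise quantization of y (shape n_rx x n_samples) via the scaling property (15).

    The real and imaginary parts of y_i each have standard deviation sqrt([C_y]_ii / 2).
    """
    s = np.sqrt(np.real(np.diag(C_y)) / 2.0)[:, None]
    return s * (q_std(y.real / s) + 1j * q_std(y.imag / s))


# Hypothetical setup: 8 receive antennas, 4 transmit antennas, 2 streams.
rng = np.random.default_rng(0)
n_rx, n_tx, n_s, sigma_n2 = 8, 4, 2, 0.1
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
F = (rng.standard_normal((n_tx, n_s)) + 1j * rng.standard_normal((n_tx, n_s))) / np.sqrt(2 * n_tx)
C_y = H @ F @ F.conj().T @ H.conj().T + sigma_n2 * np.eye(n_rx)  # covariance as in (17)

# Draw y ~ CN(0, C_y) via a Cholesky factor and quantize it.
n_samples = 50_000
L = np.linalg.cholesky(C_y)
w = (rng.standard_normal((n_rx, n_samples))
     + 1j * rng.standard_normal((n_rx, n_samples))) / np.sqrt(2)
y = L @ w
z = quantize_received(y, C_y)

# Per-antenna normalized distortion; by (16) it should be close to the same gamma everywhere.
gamma_hat = np.mean(np.abs(z - y) ** 2, axis=1) / np.real(np.diag(C_y))
print(np.round(gamma_hat, 4))
```

Because each entry is normalized by its own standard deviation before quantization, a single unit-variance quantizer serves all antennas, and the normalized distortion comes out approximately equal across antennas with different received powers.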

Regarding the distortion factor, we note the following property.

Lemma 3

For a zero-mean complex random variable $X = \Re\{X\} + j\Im\{X\}$ with variance $\sigma_X^2$, assume that $\Re\{X\}$ and $\Im\{X\}$ are independent and identically distributed (i.i.d.), each with variance $\sigma_X^2/2$, and are quantized independently by two identical Lloyd-Max quantizers $Q(\cdot)$, i.e., $Q(X) = Q(\Re\{X\}) + jQ(\Im\{X\})$. With $\chi \triangleq Q(X) - X$, we obtain

$$\mathbb{E}[Q(X)] = \mathbb{E}[X], \tag{18}$$
$$\mathbb{E}[Q(X)\,\chi^{*}] = 0, \tag{19}$$
$$\gamma = \frac{\mathbb{E}[|\chi|^2]}{\mathbb{E}[|X|^2]} = \frac{\mathbb{E}[\Re\{\chi\}^2]}{\mathbb{E}[\Re\{X\}^2]} = \frac{\mathbb{E}[\Im\{\chi\}^2]}{\mathbb{E}[\Im\{X\}^2]}. \tag{20}$$
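The first equality in (20) can be checked directly. Since $|\chi|^2 = \Re\{\chi\}^2 + \Im\{\chi\}^2$, and each component has variance $\sigma_X^2/2$ and is quantized by its own Lloyd-Max quantizer, applying the distortion invariance (16) to each component gives

```latex
\mathbb{E}[|\chi|^2]
  = \mathbb{E}[\Re\{\chi\}^2] + \mathbb{E}[\Im\{\chi\}^2]
  = \gamma\,\frac{\sigma_X^2}{2} + \gamma\,\frac{\sigma_X^2}{2}
  = \gamma\,\sigma_X^2
  = \gamma\,\mathbb{E}[|X|^2].
```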