Spatio-Temporal Adaptive Diffusion Models for EEG Super-Resolution in Epilepsy Diagnosis

Tong Zhou, Shuqiang Wang

Corresponding author: Shuqiang Wang (email: sq.wang@siat.ac.cn). Tong Zhou and Shuqiang Wang are with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China.

Abstract

Electroencephalogram (EEG) technology, particularly high-density EEG (HD EEG) devices, is widely used in fields such as neuroscience. HD EEG devices improve the spatial resolution of EEG by placing more electrodes on the scalp, meeting the requirements of clinical diagnostic applications such as epilepsy focus localization. However, this technique faces challenges such as high acquisition costs and limited usage scenarios. In this paper, spatio-temporal adaptive diffusion models (STADMs) are proposed to pioneer the use of diffusion models for spatial super-resolution (SR) reconstruction from low-resolution (LR, 64 channels or fewer) EEG to high-resolution (HR, 256 channels) EEG. Specifically, a spatio-temporal condition module is designed to extract the spatio-temporal features of LR EEG, which then serve as conditional inputs to guide the reverse denoising process of the diffusion models. Additionally, a multi-scale Transformer denoising module is constructed that combines multi-scale convolution blocks with cross-attention-based diffusion Transformer blocks for conditional guidance, generating subject-adaptive SR EEG. Experimental results demonstrate that the proposed method effectively enhances the spatial resolution of LR EEG and quantitatively outperforms existing methods. Furthermore, STADMs demonstrate their value by applying synthetic SR EEG to classification and source localization tasks for epilepsy patients, indicating their potential to significantly improve the spatial resolution of LR EEG.

Index Terms

Spatio-temporal adaptive, diffusion models, super-resolution (SR), Transformer, source localization.

1 Introduction

Electroencephalogram (EEG) is a non-invasive, cost-effective neuroimaging technique with high temporal resolution, and it plays a crucial role in clinical diagnosis and cognitive neuroscience research. Unlike other neuroimaging techniques such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), EEG offers superior temporal resolution and can continuously monitor brain activity in natural environments. This capability provides substantial support for studying dynamic cognitive processes [1]. In recent years, the emergence of portable and wireless EEG devices has further improved the flexibility and usability of EEG, enabling its widespread application in emerging neural engineering scenarios such as sleep monitoring and brain-computer interface (BCI) control [2, 3, 4]. However, these devices typically have fewer than 64 channels, so the acquired EEG data have low spatial resolution, which severely limits their potential in clinical diagnosis and brain function imaging research [5]. High-density (HD) EEG devices improve the spatial resolution of EEG by using more electrodes, thus assisting doctors in performing evaluations or diagnoses in different clinical scenarios. For example, recent studies have shown that HD EEG devices can objectively, non-invasively, and accurately determine the epileptic focus area [6, 7, 8]. However, HD EEG devices greatly increase hardware and acquisition costs, and their bulkiness and complex wiring prevent subjects from moving their heads freely, restricting their use to controlled environments such as laboratories or hospitals. In addition, because they are uncomfortable to wear, subjects cannot be monitored for extended periods [9]. Together, these factors make HD EEG difficult to acquire and limit its application scenarios.

These observations highlight a significant dilemma in current EEG-based clinical diagnosis and treatment: high-resolution (HR) EEG is scarce because of high costs, wearing discomfort, and other factors, while the spatial resolution of low-resolution (LR) EEG is insufficient for clinical needs, particularly in the preoperative evaluation of patients with refractory epilepsy [10]. In such cases, EEG spatial super-resolution (SR) offers an effective remedy. This technique reconstructs HR EEG from LR EEG collected from subjects, thereby enhancing the spatial resolution of existing EEG acquisition systems. For instance, EEG SR can reconstruct 256-channel EEG from its 32-channel counterpart. EEG SR methods therefore hold substantial potential for clinical applications, providing a cost-effective and efficient way to acquire HR EEG data.

Current methods for enhancing the spatial resolution of EEG fall into two main categories: channel interpolation based on mathematical models, and EEG SR reconstruction using deep learning. Channel interpolation methods rely on predefined scalp surface models to reconstruct missing channel data from the signals of other channels, thereby improving spatial resolution. These methods calculate the second-order spatial derivatives of the interpolation function to explore the distribution across the electrode channels [11]. Subsequent studies have attempted more precise measurements of the spatial distances between EEG electrodes and more accurate predictions of missing channel data, for example using inverse distance weighting based on Euclidean distances [12] and multiquadric interpolation [13]. However, these methods exploit only the spatial correlations between adjacent electrodes and typically struggle to capture and reconstruct the high-frequency and transient components of brain electrical activity, so they usually yield limited results for non-stationary signals.

In recent years, deep learning methods have become mainstream for analyzing and enhancing medical images [14, 15, 16]. Deep learning-based EEG SR trains models to learn the implicit mapping between LR EEG and HR EEG and then uses the trained models to reconstruct SR EEG. Early work introduced EEG SR techniques based on convolutional neural networks (CNNs) [17], followed by studies proposing deep EEG SR frameworks that use graph convolutional networks (GCNs) [18] to correlate the structural and functional connections of participants’ brains [19]. Additionally, generative models offer a new learning paradigm for medical image reconstruction [20, 21]. Existing studies have used approaches such as autoencoders (AEs) [22] and generative adversarial networks (GANs) [23] to enhance the spatial resolution of EEG. Although existing deep learning-based EEG SR methods have significantly improved EEG spatial resolution, they remain limited in capturing both the temporal dynamics and the spatial features of EEG, which restricts the generalizability and practicality of the models. Moreover, the resolution these methods can reach is constrained: they reconstruct EEG data only up to the 64-channel level, which may not meet the requirements of specific clinical scenarios.

In clinical diagnostic and therapeutic settings, achieving high-fidelity reconstruction of HR (256-channel) ground truth EEG presents a substantial challenge. The key is to accurately model the dependencies between LR EEG and HR EEG and to address the significant channel-level disparity between them. Recently, diffusion models [24], a novel class of generative models, have demonstrated formidable generative capabilities in computer vision, sparking a surge in Artificial Intelligence Generated Content (AIGC) [25, 26]. Diffusion models have been shown to surpass GANs in image generation quality while also offering desirable characteristics such as distribution coverage, a fixed training objective, and scalability [27]. Latent Diffusion Models (LDMs) [28] extend diffusion models into latent spaces and use cross-attention mechanisms to achieve refined control in conditional generation tasks. Moreover, SynDiff [29] and DiffMDD [30] have both demonstrated that diffusion models can be applied to the generation of medical images, thereby improving the accuracy of disease diagnosis. Despite this, the application of diffusion models to enhancing the spatial resolution of EEG remains relatively unexplored. Therefore, this paper investigates the potential of diffusion models for EEG SR, particularly for enhancing spatial resolution. Through this generative model, we hope to provide a novel and effective method for processing and analyzing EEG data, supporting more accurate EEG studies and applications across various neuroscience contexts.

Figure 1: The main framework for training the proposed STADMs to synthesize SR EEG from LR EEG.

In this paper, a novel EEG SR method, called spatio-temporal adaptive diffusion models (STADMs), is proposed to achieve spatial SR reconstruction from LR EEG to HR EEG. This method enhances the spatial resolution of EEG by learning the latent mapping between LR EEG and HR EEG through a diffusion learning strategy. The main contributions of this paper are summarized as follows:

  1. A spatio-temporal adaptive framework based on diffusion models is proposed for EEG SR reconstruction. To the best of our knowledge, this is the first instance where diffusion models are employed to achieve spatial SR reconstruction from LR EEG to HR EEG. This provides an efficient and cost-effective method for obtaining HR EEG, enhancing the portability and usability of HR EEG, and further advancing fields such as epileptic focus localization and BCI technology.

  2. The spatio-temporal condition (STC) module is designed to extract the spatio-temporal features of LR EEG. It further uses the generated spatio-temporal conditions to constrain the reverse process of the diffusion model, guiding the model to generate SR EEG that conforms to the characteristics of the subjects.

  3. The multi-scale Transformer denoising (MTD) module is constructed to map LR EEG to HR EEG. It extracts temporal features at various scales and uses cross-attention-based diffusion Transformer blocks to adaptively modulate the denoising process based on spatio-temporal conditions, addressing the channel-level disparity between LR EEG and HR EEG.

2 Method

2.1 Preliminaries and Problem Statement

Before introducing our architecture, we briefly review some basic concepts needed to understand diffusion models [24]. Diffusion models assume a forward diffusion process that gradually adds noise to the input data $x_0$: $q(x_t \mid x_0) = \mathcal{N}(x_t \mid \sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I)$, where $\bar{\alpha}_t$ is determined by the variance schedule that controls the amount of Gaussian noise.
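The closed form of the forward process means that $x_t$ can be drawn directly from $x_0$ without simulating the intermediate steps. The following is a minimal NumPy sketch of that sampling step. It assumes a linear $\beta_t$ schedule with $\bar{\alpha}_t = \prod_{s \le t} (1 - \beta_s)$, as in the DDPM formulation [24]; the schedule endpoints, the toy array shape, and the helper names `make_alpha_bar` and `q_sample` are illustrative choices, not values taken from this paper.

```python
import numpy as np

def make_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) for a 1-indexed step t; also return the noise used."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(T=1000)
x0 = rng.standard_normal((256, 128))   # toy input: 256 channels x 128 samples

x_early, _ = q_sample(x0, 1, alpha_bar, rng)      # almost identical to x0
x_late, _ = q_sample(x0, 1000, alpha_bar, rng)    # close to pure Gaussian noise
```

With this schedule, $\bar{\alpha}_t$ decreases monotonically, so early states stay close to $x_0$ while late states are dominated by noise.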

In the reverse denoising process, the denoising model learns the distributions $p_\theta(x_{t-1} \mid x_t)$ in order to denoise each state $x_t$ during training. The diffusion model is trained with a variational training strategy [31]. Formally, the inference procedure can be described as a reverse Markov process that starts from Gaussian noise $x_T \sim \mathcal{N}(0, I)$ and progresses towards the target data $x_0$ as:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1} \mid \mu_\theta(x_t, t), \sigma_t^2 I\right) \tag{1}$$

Once $p_\theta$ is trained, new data can be sampled by initializing $x_T \sim \mathcal{N}(0, I)$ and sampling $x_{t-1} \sim p_\theta(x_{t-1} \mid x_t)$ via the reparameterization trick, where $T$ denotes the number of forward process steps.
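The sampling procedure can be sketched as an ancestral sampling loop. This is a minimal NumPy illustration under stated assumptions, not the implementation used in this paper: it adopts the noise-prediction form of $\mu_\theta$ from the DDPM formulation [24], $\mu_\theta = \frac{1}{\sqrt{\alpha_t}}\big(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\big)$ with $\alpha_t = 1 - \beta_t$, and sets $\sigma_t^2 = \beta_t$. The function `ddpm_sample` and the `oracle_eps` predictor are hypothetical; the oracle stands in for a trained network by assuming the data distribution is a single known point.

```python
import numpy as np

def ddpm_sample(eps_model, shape, betas, rng):
    """Ancestral sampling: start at x_T ~ N(0, I) and step down to x_0."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                  # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):              # t = T, ..., 1
        i = t - 1
        eps_hat = eps_model(x, t)
        mean = (x - betas[i] / np.sqrt(1.0 - alpha_bar[i]) * eps_hat) / np.sqrt(alphas[i])
        # Reparameterization: x_{t-1} = mean + sigma_t * noise, no noise at the last step.
        noise = rng.standard_normal(shape) if t > 1 else 0.0
        x = mean + np.sqrt(betas[i]) * noise
    return x

# Toy check: if the data distribution is a single point, the optimal noise
# predictor is known in closed form and sampling should return that point.
betas = np.linspace(1e-4, 0.02, 200)
alpha_bar = np.cumprod(1.0 - betas)
target = np.full((4, 16), 0.5)

def oracle_eps(x_t, t):
    a = alpha_bar[t - 1]
    return (x_t - np.sqrt(a) * target) / np.sqrt(1.0 - a)

sample = ddpm_sample(oracle_eps, target.shape, betas, np.random.default_rng(0))
```

In practice `eps_model` would be a trained network; the oracle is only there to make the loop verifiable.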

In this work, we are given an LR-HR EEG pair denoted as $(x, y)$, with $x \in \mathbb{R}^{C_l \times N}$ being a degraded version of $y \in \mathbb{R}^{C_h \times N}$, where $C_l$ and $C_h$ denote the number of LR and HR EEG channels, respectively, and $N$ is the length of each EEG epoch. In EEG SR tasks, the end-to-end mapping from the LR EEG to the corresponding SR EEG can therefore be expressed as

$$y \approx y_{sr} = F(x, \theta) \tag{2}$$

where $y_{sr}$ represents the synthetic SR EEG, $F$ denotes the mapping function, and $\theta$ represents the optimal parameters sought to reconstruct the HR EEG.
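To make the shapes in the problem statement concrete, the sketch below builds a toy LR-HR pair. It assumes, purely for illustration, that the degradation from $y$ to $x$ is a selection of a subset of the HR channels; the every-eighth-channel montage and the helper name `make_lr` are hypothetical and are not taken from this paper.

```python
import numpy as np

def make_lr(y, keep_idx):
    """Return the rows (channels) of the HR epoch y listed in keep_idx."""
    return y[np.asarray(keep_idx), :]

C_h, N = 256, 512                       # HR channels and epoch length (toy values)
rng = np.random.default_rng(0)
y = rng.standard_normal((C_h, N))       # stand-in for an HR EEG epoch

keep_idx = np.arange(0, C_h, 8)         # hypothetical montage: every 8th channel
x = make_lr(y, keep_idx)                # LR epoch with C_l = 32 channels
```

An SR model $F$ then has to map an array of shape `(32, 512)` back to one of shape `(256, 512)`.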

2.2 Basic Ideas

Learning the mapping from LR EEG to HR EEG is challenging. To ensure that the generated results conform to the subject’s characteristics, the proposed STADMs use the spatio-temporal features of LR EEG as conditions and employ a cross-attention-based conditional injection mechanism to guide the denoising model through progressive denoising. This mechanism embeds the spatio-temporal information of LR EEG into a high-dimensional latent space, indirectly guiding the model to learn the complex mapping between LR EEG and HR EEG. This keeps the synthetic SR EEG highly consistent with HR EEG in both the temporal and spatial domains, addressing the large channel-level disparity between LR EEG and HR EEG.

As Fig. 1 shows, the model framework consists of a pre-trained EEG autoencoder, the STC module, and the MTD module. During the training phase of STADMs, the HR EEG ground truth is first input into the pre-trained EEG encoder to obtain the corresponding latent vectors $z_0$, and the state $z_t$ at each time step $t$ is obtained through the forward process of the diffusion model. In the reverse process, the STC module first encodes the time series and the channel spatial distribution matrix of the LR EEG into a subject-specific spatio-temporal representation and concatenates it with the time embedding to form the conditioning vector. The MTD module takes the conditioning vectors and noise sampled from the standard normal distribution as input, implements the conditional guidance mechanism through the cross-attention layer, and predicts the noise at each step of the reverse process. The MTD module is trained by minimizing the mean squared error between the predicted noise and the actual noise, and SR EEG is finally generated through progressive denoising. During the sampling phase of STADMs, the deterministic MTD module generates SR EEG matching the input LR EEG by progressively sampling from Gaussian noise under the guidance of the spatio-temporal conditional information of the LR EEG.

2.3 Architectures

2.3.1 Pre-trained EEG Autoencoders

LDMs [28] have demonstrated that mapping input data into a latent space through a pre-trained encoder can significantly enhance the quality of high-resolution natural image generation and improve performance on other downstream tasks. Following the work of [32], we employ a Masked Autoencoder (MAE) for asymmetric latent space representation of HR EEG. Given that the input EEG consists of multi-channel time series signals, we utilize two-dimensional EEG data as input, where each row represents the time series data of a single channel, and each column represents the values of all channels at a specific time point. Initially, this module divides the EEG data of each channel into fixed-length windows and randomly masks them according to a specified percentage. The MAE then predicts the missing values based on the surrounding contextual cues. By reconstructing the masked signals, the pre-trained EEG encoder can learn the latent spatio-temporal representations of brain activity across different subjects. Detailed information on the architecture of the pre-trained EEG encoder can be found in [32].
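The windowing and random masking step can be sketched as follows. This is a simplified NumPy illustration rather than the pre-training code of [32]: the window length, the mask ratio, the use of zeros to mark masked windows, and the function name `mask_windows` are all assumptions made for the example.

```python
import numpy as np

def mask_windows(eeg, window, mask_ratio, rng):
    """Split each channel into fixed-length windows and zero out a random subset.

    Returns the windowed (and partly masked) data with shape
    (channels, n_windows, window) and a boolean mask of shape
    (channels, n_windows) that is True where a window was masked.
    """
    C, N = eeg.shape
    n_win = N // window
    patches = eeg[:, : n_win * window].reshape(C, n_win, window).copy()
    n_mask = int(round(mask_ratio * n_win))
    mask = np.zeros((C, n_win), dtype=bool)
    for ch in range(C):
        mask[ch, rng.choice(n_win, size=n_mask, replace=False)] = True
    patches[mask] = 0.0                  # masked windows carry no signal
    return patches, mask

rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 100))      # toy epoch: 4 channels x 100 samples
patches, mask = mask_windows(eeg, window=10, mask_ratio=0.5, rng=rng)
```

The autoencoder is then trained to reconstruct the windows where `mask` is `True` from the visible ones.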

2.3.2 Spatio-temporal Condition Module

In EEG signals, each channel represents the signal collected by an electrode placed at a specific location on the scalp, recording electrical activity from a different brain region [33, 34]. The Transformer-based spatio-temporal condition module $\tau_\theta$ is constructed to capture the temporal correlations between time points and the spatial correlations between EEG channels. This module takes the time series of the LR EEG and the spatial positions of the channels as inputs and outputs the corresponding encoding vectors $c = \tau_\theta(x)$ as conditional information for the denoising model, where $x \in \mathbb{R}^{C_l \times N}$ represents the LR EEG. By extracting the spatio-temporal features of LR EEG, this module provides richer and more precise contextual information for the denoising model.

As the yellow part of Fig. 1 shows, the STC module is mainly composed of a spatial position embedding layer, a 1D convolution block, and a Transformer block. More specifically, the spatial position embedding layer encodes the channel spatial coordinates of the LR EEG and outputs a set of spatial features. The input time series is then partitioned into patches, which are fed into the 1D convolution block to extract preliminary temporal features [35]. Finally, the Transformer block concatenates the temporal features with the position encoding and the channel spatial embeddings to output the final spatio-temporal condition $c$. The 1D convolution block consists of a 1D convolution layer, a ReLU activation function, and a batch normalization layer. The Transformer block consists of a multi-head self-attention (MSA) layer followed by a feed-forward network, both wrapped in residual connections and layer normalization.
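The token construction that precedes the Transformer block can be sketched in NumPy as below. This is only a shape-level illustration under several assumptions: electrode positions are taken to be 3-D coordinates, the spatial embedding is a single linear map, the 1D convolution uses a kernel and stride equal to the patch length (so it reduces to a linear map per patch), and all sizes and random weights are hypothetical. Batch normalization, the position encoding, and the Transformer block itself are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
C, N, patch, d_t, d_s = 32, 256, 32, 16, 8     # illustrative sizes only

x = rng.standard_normal((C, N))                # LR EEG: channels x time
coords = rng.standard_normal((C, 3))           # hypothetical 3-D electrode coordinates

# Channel spatial embedding: a linear map of the coordinates (assumed form).
W_s = rng.standard_normal((3, d_s)) * 0.1
spatial = coords @ W_s                         # (C, d_s)

# Patch partition, then a 1D convolution + ReLU. With kernel size == stride
# == patch length, the convolution is a matrix product on each patch.
patches = x.reshape(C, N // patch, patch)      # (C, P, patch)
W_t = rng.standard_normal((patch, d_t)) * 0.1
temporal = np.maximum(patches @ W_t, 0.0)      # (C, P, d_t)

# Concatenate every temporal token with the embedding of its channel.
P = patches.shape[1]
spatial_per_token = np.broadcast_to(spatial[:, None, :], (C, P, d_s))
tokens = np.concatenate([temporal, spatial_per_token], axis=-1)
tokens = tokens.reshape(C * P, d_t + d_s)      # token sequence for the Transformer
```

Each row of `tokens` pairs one temporal patch feature with the spatial embedding of the channel it came from, which is the kind of input a Transformer block could turn into the condition $c$.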

2.3.3 Multi-scale Transformer Denoising Module

EEG, limited by the number and placement of electrodes, struggles with spatial resolution. High-density EEG devices have shown promise in tasks like epileptic source localization and BCI. The multi-scale Transformer denoising module is designed to enhance the spatial resolution of existing EEG devices. Traditional diffusion models, which typically use UNet, are effective in image generation but limited in handling the long-sequence dependencies of time-series data like EEG. Inspired by [36, 37], the MTD module employs a Transformer backbone to more effectively extract spatio-temporal features. Additionally, multi-scale 1D convolution blocks are introduced to capture multi-scale temporal features in the reverse denoising process.

As shown in the blue part of Fig. 1, the MTD module mainly consists of position encoding, multi-scale 1D convolution blocks, and diffusion Transformer blocks. Each diffusion Transformer block includes a normalization layer, an MSA layer, a cross-attention layer, and a feed-forward network. The MTD module $\epsilon_\theta$ takes $z_t$, the timestep $t$, and the spatio-temporal condition $c$ as inputs and outputs the predicted noise $\epsilon_\theta(z_t, t, c)$ at timestep $t$. The multi-scale 1D convolution blocks extract temporal features at different scales from the latent vectors $z_t$ using 1D convolution layers with various kernel sizes and concatenate them along the feature dimension. The output of this convolution path can be represented as:

$$\begin{aligned} \widetilde{h}_i &= \mathrm{BN}(\mathrm{Conv}(z_t, k_i)), \\ \widetilde{H}_t &= \mathrm{Concat}(\widetilde{h}_1, \ldots, \widetilde{h}_n) \end{aligned} \tag{3}$$

where Conv denotes a 1D convolution applied to each channel sequence, BN represents batch normalization, $k_i$ is the kernel size of the $i$-th 1D convolution layer, $\widetilde{h}_i$ denotes the output of that layer, and $n$ denotes the number of 1D convolution layers. Subsequently, each row of $\widetilde{H}_t$ is treated as a token and input into the diffusion Transformer block. Through the MSA layer, information from different scales is integrated, and global spatio-temporal dependencies are captured. For the input $\widetilde{H}_t \in \mathbb{R}^{l \times d}$, where $l$ is the sequence length and $d$ is the feature dimension, the output $o_t$ of the MSA layer can be expressed as

$$o_t = \widetilde{H}_t + \mathrm{MSA}(\mathrm{LN}(\widetilde{H}_t)) \tag{4}$$

where LN denotes the layer normalization operation. Furthermore, the condition $c$ is incorporated into the Transformer through a cross-attention layer. The output of the cross-attention layer can be expressed as

$$\begin{aligned} Q_t &= W_t^{Q} \cdot o_t, \quad K_t = W_t^{K} \cdot c, \quad V_t = W_t^{V} \cdot c, \\ \mathrm{Attn} &= \mathrm{softmax}\left(\frac{\mathrm{LN}(Q_t)\,\mathrm{LN}(K_t^{T})}{\sqrt{d_k}}\right) V_t \end{aligned} \tag{5}$$

where $W_t^{Q} \in \mathbb{R}^{d \times d_o}$, $W_t^{K} \in \mathbb{R}^{d \times d_c}$, and $W_t^{V} \in \mathbb{R}^{d \times d_c}$ are projection matrices with learnable parameters, and $\sqrt{d_k}$ is the scaling factor. Finally, the output is processed by a feed-forward network to obtain the final fused feature representation, and a standard linear decoder outputs the predicted noise. The reparameterization trick is then used to obtain the denoised latent $z_{t-1}$ for the previous time step. This mechanism integrates the temporal and spatial features of EEG, providing rich context for subsequent denoising and reconstruction and thereby achieving the mapping from LR EEG to HR EEG.
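The three stages described in this subsection (multi-scale convolution, pre-normalized self-attention with a residual connection, and cross-attention on the condition $c$) can be traced in a short NumPy sketch. It is a simplified illustration under stated assumptions: a single attention head replaces MSA, batch normalization is replaced by a global standardization, layer normalization has no learnable parameters, tokens are stored as rows so projections are written as right-multiplications, LN is applied to the rows of $K_t$ before transposing, and the averaging kernels, sizes, and random weights are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_scale_conv(z, kernels):
    """Multi-scale path: one 1D convolution per kernel, normalized, concatenated."""
    feats = []
    for k in kernels:
        h = np.stack([np.convolve(row, k, mode="same") for row in z])
        h = (h - h.mean()) / (h.std() + 1e-5)      # stand-in for batch norm
        feats.append(h)
    return np.concatenate(feats, axis=-1)           # (l, n * width)

def self_attn_residual(H, Wq, Wk, Wv):
    """Residual self-attention on normalized tokens (single head)."""
    X = layer_norm(H)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return H + softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def cross_attention(o, c, Wq, Wk, Wv):
    """Cross-attention: queries from o_t, keys and values from the condition c."""
    Q, K, V = o @ Wq, c @ Wk, c @ Wv
    scores = layer_norm(Q) @ layer_norm(K).T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
l, width, m, d_c = 10, 16, 6, 12
z_t = rng.standard_normal((l, width))               # toy noisy latent
c = rng.standard_normal((m, d_c))                   # toy spatio-temporal condition

kernels = [np.ones(k) / k for k in (3, 5, 7)]       # toy averaging kernels
H = multi_scale_conv(z_t, kernels)                  # (10, 48)
d = H.shape[1]

Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
o = self_attn_residual(H, Wq, Wk, Wv)               # (10, 48)

Wq_c = rng.standard_normal((d, d)) * 0.1
Wk_c, Wv_c = (rng.standard_normal((d_c, d)) * 0.1 for _ in range(2))
out = cross_attention(o, c, Wq_c, Wk_c, Wv_c)       # (10, 48)
```

Because the softmax weights in each row sum to one, every output token of the cross-attention is a convex combination of the value vectors derived from $c$, which is how the condition steers the denoising features.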

Algorithm 1: Training Phase of STADMs

Input:
    $x$: preprocessed LR EEG data
    $y$: preprocessed HR EEG data
    $T$: number of diffusion steps
    $Epoch$: maximum number of iterations
Initialize parameters $\theta$ of the STADMs
for $s \leftarrow 1$ to $Epoch$ do
    Obtain latent vectors via the encoder of the MAE:
        $z_0 = \mathrm{Encoder}(y)$
    Extract spatio-temporal features of the LR EEG $x$ via the STC module:
        $c = \mathrm{STC}(x)$
    Forward diffusion process:
    for $t \leftarrow 1$ to $T$ do
        Sample $\epsilon_t \sim \mathcal{N}(0, I)$
        $z_t \leftarrow \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_t$
    end for
    Reverse denoising process via the MTD module $\epsilon_\theta$:
    for $t \leftarrow T$ to $1$ do
        $\hat{\epsilon}_t \leftarrow \epsilon_\theta(z_t, t, c)$
        $\hat{z}_{t-1} \leftarrow \frac{1}{\sqrt{\bar{\alpha}_t}} \left( z_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \hat{\epsilon}_t \right)$
    end for
    Obtain SR EEG via the decoder of the MAE:
        $\hat{y} = \mathrm{Decoder}(\hat{z}_0)$
    Loss calculation:
        $\mathcal{L}_{DM}(\theta) = \left\| \epsilon_t - \epsilon_\theta(z_t, t, c) \right\|_2^2$
    Update: optimize $\theta$ via backpropagation
end for
Result: the generated SR EEG $\hat{y}$.

2.4 Loss Function

In the forward process of the diffusion model, we gradually add Gaussian noise to the clean input $z_0$ to generate a series of noisy data $\{z_t\}_{t=1}^{T}$. In the reverse process, the traditional method is to train a denoising model $p_\theta(z_{t-1}|z_t)$ to predict the state $z_{t-1}$ of the previous time step. Diffusion models draw on the principles of variational methods [38], optimizing the model parameters by maximizing the evidence lower bound (ELBO):

$$\mathcal{L}_{ELBO}(\theta) = \mathbb{E}_{q(z_{t-1}|z_0,z_t)}\left[\log\frac{p_\theta(z_t|z_{t-1})\,p(z_{t-1})}{q(z_{t-1}|z_0,z_t)}\right] + \mathrm{KL}\left(q(z_{t-1}|z_0,z_t)\,\|\,p(z_{t-1})\right) \tag{6}$$

where KL denotes the Kullback-Leibler divergence. However, directly predicting $z_{t-1}$ may lead to a complex and unstable training process. To simplify training, we use the denoising model to predict the noise $\epsilon$ at the current time step $t$, and then minimize the mean squared error between the predicted noise $\epsilon_\theta(z_t, t, \tau_\theta(x))$ and the ground-truth noise $\epsilon$ at each time step $t$ to learn the model parameters, where $x$ denotes the input LR EEG. The final loss function can be expressed as:

$$\mathcal{L}_{DM}(\theta) = \mathbb{E}_{z,\,\epsilon\sim\mathcal{N}(0,1),\,t,\,x}\left[\left\lVert \epsilon - \epsilon_\theta(z_t, t, \tau_\theta(x)) \right\rVert_2^2\right] \tag{7}$$

where $\epsilon_\theta$ is the proposed multi-scale Transformer denoising model and $\tau_\theta$ represents the proposed STC module. Algorithm 1 displays the complete training procedure with the final loss function.
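The noise-prediction objective can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the latent shape, the value of $\bar{\alpha}_t$, and the zero-output stand-in denoiser are hypothetical choices made only to exercise the formulas. The sketch samples $z_t$ with the closed-form forward step, evaluates the squared-error loss of Eq. (7) for one batch, and shows that knowing the exact noise lets $z_0$ be recovered by inverting the forward step, which is why predicting $\epsilon$ is sufficient.

```python
import numpy as np

def forward_noise(z0, alpha_bar_t, rng):
    """Sample z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return z_t, eps

def noise_prediction_loss(eps_true, eps_pred):
    """Squared L2 error between true and predicted noise, averaged over the batch."""
    diff = (eps_true - eps_pred).reshape(eps_true.shape[0], -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def recover_z0(z_t, eps, alpha_bar_t):
    """Invert the forward step given the noise that was added."""
    return (z_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 16, 32))  # hypothetical latent: batch x tokens x features
alpha_bar_t = 0.5                      # hypothetical cumulative noise level at step t

z_t, eps = forward_noise(z0, alpha_bar_t, rng)

# Stand-in "denoiser" that always predicts zero noise (assumption, for illustration only).
eps_pred = np.zeros_like(eps)
loss = noise_prediction_loss(eps, eps_pred)

# With the exact noise, the clean latent is recovered up to floating-point error.
z0_rec = recover_z0(z_t, eps, alpha_bar_t)
```

In an actual training loop, `eps_pred` would come from the conditional denoiser and the loss would be backpropagated; that part is omitted here.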

3 Experiment

3.1 Dataset and Preprocessing

In this study, we used a publicly available dataset, Localize-MI [39], for model training and validation. The dataset consists of high-density EEG data collected from seven drug-resistant epilepsy patients (mean age = 35.1, SD = 5.4), comprising a total of 61 sessions (mean = 8.71 sessions per subject, SD = 2.65). The dataset also includes, for each subject, the spatial locations of the stimulating contacts in MRI space, MNI152 space [40], and FreeSurfer surface space [41], as well as the digitized positions of the 256 scalp EEG electrodes. The 256-channel EEG was recorded with an EGI NA-400 amplifier (Electrical Geodesics) at a sampling rate of 8000 Hz, using the Geodesic Sensor Net with the HydroCel CleanLead electrode distribution system. We preprocessed the raw data with the MNE software. Following [39], we applied high-pass filtering to the continuous EEG time series to remove irrelevant low-frequency components. To suppress power-line noise and its harmonics, we applied notch filters at 50, 100, 150, and 200 Hz. We then segmented the data into epochs of fixed duration (350 ms) based on the annotation events provided with the dataset.

Table 1: Electrode systems corresponding to each scaling factor.
Scaling factor    LR channels    Montage
2                 128            EGI-128 montage
4                 64             EGI-64 montage
8                 32             EGI-32 montage
16                16             EGI-16 montage

In the experiment, we downsampled the original HR EEG data along the channel dimension to obtain LR counterparts at different spatial resolutions, with scaling factors of approximately 2, 4, 8, and 16. To simulate real-world low-density EEG acquisition, we mapped each scaling factor to an EGI standard electrode system [42] with the corresponding number of channels, as shown in Table 1. For each electrode system, we kept the channels that belong to its electrode array and removed the remaining channels, which are the ones to be estimated, yielding LR EEG at the preset channel count. The resulting LR-HR data pairs at different spatial resolutions were used for model training.
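The construction of an LR-HR pair amounts to selecting a subset of rows from the HR epoch. The sketch below illustrates this under stated assumptions: the channel indices are a hypothetical evenly strided subset, not the real EGI-64 montage mapping, and the epoch length assumes 350 ms at 8000 Hz with no resampling.

```python
import numpy as np

def make_lr_hr_pair(hr_epoch, keep_idx):
    """Return (lr_epoch, hr_epoch); lr_epoch keeps only the channels in keep_idx."""
    keep_idx = np.asarray(keep_idx)
    lr_epoch = hr_epoch[keep_idx, :]
    return lr_epoch, hr_epoch

n_hr_channels = 256
n_samples = 2800  # 0.350 s * 8000 Hz, assuming the epochs are not resampled

rng = np.random.default_rng(0)
hr = rng.standard_normal((n_hr_channels, n_samples))  # synthetic stand-in for one HR epoch

# Hypothetical stand-in for the 64-channel montage: every 4th channel.
# The real mapping would follow the EGI electrode layout, which is not reproduced here.
keep_idx_64 = np.arange(0, n_hr_channels, 4)

lr, hr_target = make_lr_hr_pair(hr, keep_idx_64)
```

The removed channels are exactly those in `hr_target` whose indices are absent from `keep_idx_64`; they are what the model has to estimate.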

3.2 Experiment Settings

3.2.1 Implementation Details

The proposed STADMs were implemented in the PyTorch framework, and all experiments were conducted on an NVIDIA RTX A800 GPU. The model was trained with the Adam optimizer, a batch size of 32, a learning rate of $1\times10^{-4}$, and 300 training epochs. In the diffusion framework, the forward process used 1000 time steps with Gaussian noise, and the noise levels followed sine or cosine schedules. The multi-scale 1D convolution blocks used 1D convolution layers with kernel sizes of 3, 5, 7, and 9. In the diffusion Transformer block, the number of attention heads was set to 16 and the hidden dimension to 64.
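For reference, one widely used cosine noise schedule can be written in a few lines. The text does not specify the exact schedule, so the functional form and the offset `s = 0.008` below are assumptions borrowed from common practice, not the configuration used for STADMs.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal level alpha_bar_t for t = 0..T under a cosine schedule."""
    steps = np.arange(T + 1)
    f = np.cos((steps / T + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f / f[0]  # normalized so that alpha_bar_0 == 1

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """Per-step noise variances beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped."""
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

T = 1000  # number of forward time steps, as stated in the text
alpha_bar = cosine_alpha_bar(T)
betas = betas_from_alpha_bar(alpha_bar)
```

The clipping keeps the last few steps from producing a variance of exactly 1, which would otherwise make the corresponding reverse step numerically singular.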

3.2.2 Metrics

To evaluate the quality of the synthetic EEG signals, we employed four quantitative metrics: Pearson Correlation Coefficient (PCC), Mean Absolute Error (MAE), Normalized Mean Squared Error (NMSE), and Signal-to-Noise Ratio (SNR) [43]. These metrics provide comprehensive insights into the accuracy and reliability of the reconstructed signals compared to the real signals.
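The four signal-quality metrics can be sketched as follows. The exact definitions used in the paper follow [43] and are not reproduced in this section, so the formulas below are common conventions and should be read as assumptions: NMSE is normalized by the energy of the reference signal, and SNR is reported in decibels.

```python
import numpy as np

def pcc(y, y_hat):
    """Pearson correlation between reference and reconstruction (flattened)."""
    y, y_hat = y.ravel(), y_hat.ravel()
    yc, hc = y - y.mean(), y_hat - y_hat.mean()
    return float(np.sum(yc * hc) / np.sqrt(np.sum(yc ** 2) * np.sum(hc ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def nmse(y, y_hat):
    """Squared error normalized by the energy of the reference (assumed convention)."""
    return float(np.sum((y - y_hat) ** 2) / np.sum(y ** 2))

def snr_db(y, y_hat):
    """Ratio of reference energy to error energy, in dB (assumed convention)."""
    return float(10.0 * np.log10(np.sum(y ** 2) / np.sum((y - y_hat) ** 2)))

rng = np.random.default_rng(0)
y_true = rng.standard_normal((256, 100))                    # synthetic stand-in for HR EEG
y_rec = y_true + 0.1 * rng.standard_normal(y_true.shape)    # synthetic "reconstruction"

scores = {
    "PCC": pcc(y_true, y_rec),
    "MAE": mae(y_true, y_rec),
    "NMSE": nmse(y_true, y_rec),
    "SNR_dB": snr_db(y_true, y_rec),
}
```

Under these conventions SNR in dB equals $-10\log_{10}(\mathrm{NMSE})$, so the two metrics carry the same information on different scales.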

To quantitatively evaluate the performance improvement of SR EEG in downstream tasks, we used classification performance metrics including Accuracy (ACC), Precision, Recall, and F1 Score. To assess SR EEG in source localization, we used the localization error [7] as the quantitative measure.

3.3 Performance Evaluation on EEG SR Reconstruction

In this section, we evaluate the performance of the proposed STADMs for EEG spatial super-resolution reconstruction, primarily analyzing the generated results through various quantitative metrics and scalp topography maps.

3.3.1 Comparison With Other Existing EEG SR Methods

The proposed STADMs are compared with existing EEG SR methods to demonstrate the effectiveness of the synthetic HR EEG (256 channels). We quantitatively analyze the differences in reconstruction performance between the proposed STADMs and several other methods: a CNN-based method (Deep-CNN [17]), autoencoder-based methods (weighted gate layer autoencoders (WGLAE) [22] and deep autoencoder (DAE) [44]), and a GAN-based method (EEGSR-GAN [23]).

Fig. 2 shows the quantitative comparison between the methods. The proposed STADMs achieve the best performance on all four metrics. The highest PCC and SNR values indicate that the reconstructions of STADMs maintain high signal quality and strong correlation with the ground truth, while the lowest MAE and NMSE values indicate smaller reconstruction errors. The gaps between STADMs and the other methods are substantial across all metrics. These results suggest that, compared with existing EEG SR methods, the proposed STADMs more effectively reconstruct high-spatial-resolution (256-channel) EEG from LR EEG.

Figure 2: Quantitative comparison between the existing EEG SR methods and the proposed method on four metrics: (a) PCC, (b) MAE, (c) NMSE, (d) SNR. The synthetic results of STADMs have the best quality on all four metrics.

3.3.2 Comparison With Different Scaling Factors

To evaluate the performance of the proposed STADMs in generating synthetic SR EEG from LR EEG under different scaling factors, we employ four metrics for quantitative analysis of the model’s generative performance. Fig. 3 shows quantitative results for different scaling factors. As the scaling factor increases, PCC and SNR values decrease, indicating a weakening correlation between synthetic SR EEG and ground truth. Meanwhile, MAE and NMSE values increase, implying a growing difference. This suggests that the reconstruction performance of STADMs diminishes at larger scaling factors due to limited channel information, making HR EEG reconstruction challenging. However, even at larger scaling factors, the performance of STADMs remains comparable to that at smaller scaling factors. For example, with a scaling factor of 16, the PCC value is only 3.02% lower than at a scaling factor of 8, the MAE value differs by 1.68%, and the SNR value by 3.19%. This indicates that STADMs can still adequately reconstruct SR EEG signals despite limited channel information. Overall, STADMs effectively enhance LR EEG spatial resolution across different scaling factors, achieving better reconstruction at smaller scales while maintaining satisfactory performance at larger scales, demonstrating strong adaptability and robustness in handling various channel levels.

Figure 3: Quantitative comparison between different scaling factors (2, 4, 8, and 16) on four metrics: (a) PCC, (b) MAE, (c) NMSE, (d) SNR.

3.3.3 Qualitative Results of Synthetic EEG

Figure 4: Qualitative comparison of topomaps between synthetic SR EEG and LR EEG in different frequency bands.

To compare synthetic SR EEG with LR EEG at different channel levels (16, 32, 64, and 128) more intuitively, we analyze EEG scalp topomaps in different frequency bands. We first use the multitaper method to estimate the power spectral density (PSD) of the LR EEG and the corresponding synthetic SR EEG in five frequency bands: $\delta$ (0.5-4 Hz), $\theta$ (4-8 Hz), $\alpha$ (8-13 Hz), $\beta$ (13-30 Hz), and $\gamma$ (30-40 Hz or higher) [45]. We then use the MNE tool to obtain scalp topographies for each band. Fig. 4 compares scalp topographies across frequency bands between EEG at various channel levels and the model-generated 256-channel SR EEG. Each column represents a frequency band, and each row represents a type of EEG. As spatial resolution increases, the active regions on the scalp topographies become smaller and clearer. For instance, the 16-channel LR EEG shows large, blurry active regions in both the low-frequency ($\delta$, $\theta$, $\alpha$) and high-frequency ($\beta$, $\gamma$) bands, whereas the 256-channel SR EEG localizes active sources in the high-frequency bands much more precisely. The synthesized SR EEG retains the frequency characteristics of the original signal across all bands and reconstructs their spatial distribution. This suggests that the model generalizes well across frequency bands and reconstructs EEG signals stably. Overall, the proposed method generates SR EEG with richer spatial detail that describes and locates active regions on the scalp more accurately, whereas LR EEG yields coarse active regions that make it difficult to pinpoint active brain areas.
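A band-power computation of this kind can be sketched with a basic multitaper estimator built from SciPy's DPSS windows. This is a simplified illustration rather than the MNE routine used in the paper: the time-bandwidth product `nw`, the 250 Hz sampling rate, the 40 Hz upper edge of the $\gamma$ band, and the synthetic 10 Hz test signal are all assumptions.

```python
import numpy as np
from scipy.signal.windows import dpss

# Band edges in Hz; the gamma upper edge of 40 Hz is one reading of "30-40 Hz or higher".
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 40)}

def multitaper_psd(x, sfreq, nw=4.0):
    """x: (n_channels, n_samples). Returns freqs and PSD of shape (n_channels, n_freqs)."""
    n_samples = x.shape[-1]
    n_tapers = int(2 * nw) - 1
    tapers = dpss(n_samples, nw, Kmax=n_tapers)  # (n_tapers, n_samples)
    spectra = np.fft.rfft(tapers[None, :, :] * x[:, None, :], axis=-1)
    psd = np.mean(np.abs(spectra) ** 2, axis=1) / sfreq  # average over tapers
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
    return freqs, psd

def band_power(freqs, psd, bands=BANDS):
    """Mean PSD per channel inside each band (half-open intervals [lo, hi))."""
    return {name: psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
            for name, (lo, hi) in bands.items()}

sfreq = 250.0                      # hypothetical sampling rate (Hz)
t = np.arange(500) / sfreq         # 2 s of synthetic data
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10.0 * t)[None, :] + 0.1 * rng.standard_normal((4, t.size))

freqs, psd = multitaper_psd(x, sfreq)
powers = band_power(freqs, psd)
dominant = max(powers, key=lambda name: powers[name][0])  # strongest band on channel 0
```

The per-channel band powers in `powers` are the values that would be interpolated over electrode positions to draw a topomap for each band.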

3.4 Classification Evaluation on the Synthetic SR EEG

Enhancing the spatial resolution of EEG can improve the performance of EEG-based diagnostic systems. For epilepsy patients, HR EEG helps doctors identify abnormal waveforms, such as spikes and sharp waves during seizures. To evaluate the effectiveness of synthetic SR EEG in detecting abnormalities, we designed a binary classification experiment (normal vs. abnormal). In the Localize-MI dataset, EEG signals during stimulation are labeled abnormal, and those before stimulation are labeled normal. We trained EEG-Net [46] separately on the LR EEG and on the synthetic SR EEG and tested each model on the corresponding data, which allowed us to compare the classification performance of LR EEG at different spatial resolutions with that of synthetic SR EEG.

Figure 5: Comparison of classification performance between LR EEG at different channel levels and synthetic SR EEG, measured by accuracy, recall, precision, and F1 score. (a): LR EEG (16 channels) versus SR EEG (256 channels). (b): LR EEG (32 channels) versus SR EEG (256 channels). (c): LR EEG (64 channels) versus SR EEG (256 channels).

Fig. 5 shows the comparison results of four classification metrics between LR EEG and synthetic SR EEG. The SR EEG reconstructed by STADMs outperforms LR EEG in all metrics, showing that STADMs enhance classification performance of EEG-Net. As the channel count of LR EEG increases, the classification performance improvement by SR EEG also increases, indicating better detail reconstruction. This demonstrates that STADMs effectively enhance LR EEG spatial resolution, improving downstream classification tasks. Moreover, this provides a novel approach for other EEG signal enhancement and EEG-based clinical diagnosis research.

Table 2: Classification accuracy (AVG ± STD, %) of LR and SR EEG in different frequency bands.
Freq. band       EEG   Scaling factor 4   Scaling factor 8   Scaling factor 16
$\delta$ band    LR    69.81 ± 0.02       67.21 ± 0.00       63.87 ± 0.12
                 SR    74.42 ± 0.21       71.83 ± 0.33       69.42 ± 0.13
$\theta$ band    LR    71.12 ± 0.04       69.78 ± 0.12       64.15 ± 0.00
                 SR    75.36 ± 0.01       73.21 ± 0.32       72.13 ± 0.09
$\alpha$ band    LR    77.43 ± 0.01       74.37 ± 0.21       65.22 ± 0.39
                 SR    81.25 ± 0.04       79.54 ± 0.19       71.31 ± 0.04
$\beta$ band     LR    78.23 ± 0.11       77.89 ± 0.02       73.24 ± 0.23
                 SR    84.34 ± 0.16       81.47 ± 0.10       78.63 ± 0.12
$\gamma$ band    LR    76.58 ± 0.04       73.29 ± 0.18       70.04 ± 0.27
                 SR    80.15 ± 0.12       78.19 ± 0.07       76.14 ± 0.17
All bands        LR    83.92 ± 0.24       81.59 ± 0.24       77.85 ± 0.15
                 SR    87.61 ± 0.18       83.51 ± 0.28       79.52 ± 0.06

We also analyzed classification performance in the frequency domain by computing the PSD of LR and SR EEG in five frequency bands. Each sample yields $n \times 5$ features (where $n$ is the number of channels), and classification is based on these features. Table 2 shows classification results for LR and SR EEG across different frequency bands and scaling factors. SR EEG outperforms LR EEG in all frequency bands. As the scaling factor increases, classification accuracy decreases for both but remains higher for SR EEG. Low-frequency bands ($\delta$, $\theta$, $\alpha$) generally perform worse than high-frequency bands ($\beta$, $\gamma$), but the average accuracy improvement is higher in low-frequency bands (5.06%) than in high-frequency bands (4.95%). This suggests that the proposed method preserves low-frequency information well while also capturing high-frequency variations, improving classifier performance in identifying abnormal EEG features.
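The average accuracy gains can be rechecked directly from the mean accuracies in Table 2. The short computation below averages the SR-minus-LR differences (in percentage points) over the three scaling factors and the bands in each group. With the rounded table entries it gives roughly 5.06 for the low-frequency bands and roughly 4.94 for the high-frequency bands; the small gap to the reported 4.95 is presumably a rounding effect, which is an assumption here.

```python
import numpy as np

# Mean accuracy (%) from Table 2; columns are scaling factors 4, 8, and 16.
lr_acc = {
    "delta": [69.81, 67.21, 63.87],
    "theta": [71.12, 69.78, 64.15],
    "alpha": [77.43, 74.37, 65.22],
    "beta":  [78.23, 77.89, 73.24],
    "gamma": [76.58, 73.29, 70.04],
}
sr_acc = {
    "delta": [74.42, 71.83, 69.42],
    "theta": [75.36, 73.21, 72.13],
    "alpha": [81.25, 79.54, 71.31],
    "beta":  [84.34, 81.47, 78.63],
    "gamma": [80.15, 78.19, 76.14],
}

def mean_gain(bands):
    """Average SR-minus-LR accuracy difference over the given bands and all scaling factors."""
    gains = [np.array(sr_acc[b]) - np.array(lr_acc[b]) for b in bands]
    return float(np.mean(gains))

low_gain = mean_gain(["delta", "theta", "alpha"])
high_gain = mean_gain(["beta", "gamma"])
```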

3.5 Source Localization Evaluation on the Synthetic EEG

Figure 6: Localization errors across different EEG types for all subjects.

Source localization is a crucial task in neuroscience and clinical applications, as it reveals the active source locations within the brain [47]. To evaluate the effectiveness of the proposed method in enhancing the spatial resolution of EEG and improving downstream task performance, we assess it using the source localization task, incorporating both quantitative and qualitative analyses. For quantitative analysis, we use localization error [7] as the metric, quantified as the Euclidean distance between the source localization results and the true source positions, to compare the performance of LR EEG, SR EEG, and HR EEG. For qualitative analysis, we visualize the source localization results on brain maps to provide an intuitive display of the localization performance across different data sources. To ensure fairness, we consistently use eLORETA [48], an unconstrained linear inverse solution that infers the distribution of brain current sources from scalp-recorded EEG signals through precise regularization strategies [49]. By examining both quantitative results and visualization outcomes, we can evaluate whether the proposed method effectively enhances the spatial resolution of EEG.
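The localization error itself is a plain Euclidean distance. The sketch below assumes that the estimated source is taken as the position with the largest estimated source amplitude, which is a common choice but is not stated in the text; the coordinates, amplitudes, and the helper name `peak_source_position` are hypothetical.

```python
import numpy as np

def peak_source_position(source_amplitude, vertex_xyz):
    """Position of the vertex with the largest amplitude (assumed localization rule)."""
    return np.asarray(vertex_xyz)[int(np.argmax(source_amplitude))]

def localization_error(estimated_xyz, true_xyz):
    """Euclidean distance between estimated and true source positions (same units)."""
    est = np.asarray(estimated_xyz, dtype=float)
    true = np.asarray(true_xyz, dtype=float)
    return np.linalg.norm(est - true, axis=-1)

# Hypothetical source space with three candidate positions, in millimetres.
vertex_xyz = np.array([[0.0, 0.0, 0.0],
                       [13.0, 24.0, 30.0],
                       [40.0, 40.0, 40.0]])
source_amplitude = np.array([0.2, 0.9, 0.1])  # made-up inverse-solution amplitudes
true_xyz = np.array([10.0, 20.0, 30.0])       # made-up stimulation site

estimated_xyz = peak_source_position(source_amplitude, vertex_xyz)
error_mm = localization_error(estimated_xyz, true_xyz)
```

Applying the same function to the inverse solutions obtained from LR, SR, and HR EEG gives the per-type error distributions that are compared across subjects.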

Fig. 6 compares the localization errors of the different EEG types across all subjects in the dataset. The localization errors of LR EEG are higher and widely dispersed, indicating lower precision and greater variability. In contrast, SR EEG shows significantly lower and more concentrated localization errors, suggesting higher precision and stability. Although the localization error of SR EEG is slightly higher than that of HR EEG, it is markedly lower than that of LR EEG, which makes SR EEG a potential substitute for HR EEG and could reduce reliance on expensive high-density equipment. These results indicate that synthetic SR EEG substantially improves the source localization accuracy of LR EEG.

Figure 7: Visualization results of different EEG data types on the source localization task, panels (a) and (b).

Fig. 7 shows the visualization results of different EEG data types on source localization tasks for some subjects. In these visualizations, green circles represent the stimulus source location ground-truth, red circles indicate the original HR EEG results, yellow circles denote the synthesized SR EEG results, and blue circles show the results from the LR EEG. The LR EEG results exhibit significant deviations from the ground truth, whereas SR EEG results show minimal deviation, closely matching HR EEG performance. This demonstrates that the proposed method effectively enhances spatial resolution and signal quality, significantly improving source localization accuracy for LR EEG. Overall, the method not only enhances EEG spatial resolution but also supports high-precision EEG imaging analysis in clinical and research applications.

4 Discussion and Conclusion

In this work, STADMs are proposed to address the spatial resolution limitations of low-density EEG devices, particularly in clinical diagnostic applications, such as epilepsy focus localization. STADMs are the first to employ a diffusion model to achieve spatial SR reconstruction from LR EEG to HR EEG. To ensure that the generated results align with subject-specific characteristics, a spatio-temporal condition module is designed to extract the spatio-temporal features of LR EEG, which are then used as conditional inputs to guide the reverse denoising process. Additionally, a multi-scale Transformer denoising module is constructed to leverage multi-scale convolution blocks and cross-attention-based diffusion Transformer blocks for conditional guidance, bridging the gap between SR EEG and HR EEG. Qualitative and quantitative results indicate that STADMs effectively enhance the spatial resolution of EEG. Furthermore, classification and source localization experiments demonstrate their ability to improve EEG performance in practical scenarios.

Experimental results show that while the proposed STADMs outperform existing methods in reconstruction performance, larger scaling factors reveal a significant difference between the synthetic SR EEG and the ground truth, as illustrated in Figure 3. We identified several possible reasons for this: i) At larger scaling factors, the available spatial information for the model is limited, making it difficult to accurately capture the spatial relationships between channels; ii) The inherently low SNR of EEG increases the difficulty of reconstruction for the model; iii) Significant individual differences between subjects make it challenging for the model to uniformly learn the mapping between HR EEG and LR EEG across different subjects. To address the issue of individual differences, personalized strategies can be designed, which is one of our future research priorities. Despite some limitations, the proposed method has demonstrated its effectiveness in downstream tasks and offers a new approach for enhancing the spatial resolution of other physiological signals. In the future, we will further optimize this framework and apply it to practical clinical scenarios in epilepsy diagnosis, aiding doctors in the preoperative assessment for epilepsy focus resection surgery.

References

  • [1] J. M. Bernabei, A. Li, A. Y. Revell, R. J. Smith, K. M. Gunnarsdottir, I. Z. Ong, K. A. Davis, N. Sinha, S. Sarma, and B. Litt, “Quantitative approaches to guide epilepsy surgery from intracranial eeg,” Brain, vol. 146, no. 6, pp. 2248–2258, 2023.
  • [2] C. He, Y.-Y. Chen, C.-R. Phang, C. Stevenson, I.-P. Chen, T.-P. Jung, and L.-W. Ko, “Diversity and suitability of the state-of-the-art wearable and wireless eeg systems review,” IEEE Journal of Biomedical and Health Informatics, 2023.
  • [3] X. Zheng, X. Liu, Y. Zhang, L. Cui, and X. Yu, “A portable hci system-oriented eeg feature extraction and channel selection for emotion recognition,” International Journal of Intelligent Systems, vol. 36, no. 1, pp. 152–176, 2021.
  • [4] Y. Liao, C. Zhang, M. Zhang, Z. Wang, and X. Xie, “Lightsleepnet: Design of a personalized portable sleep staging system based on single-channel eeg,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 1, pp. 224–228, 2021.
  • [5] C. Plummer, S. J. Vogrin, W. P. Woods, M. A. Murphy, M. J. Cook, and D. T. Liley, “Interictal and ictal source localization for epilepsy surgery using high-density eeg with meg: a prospective long-term study,” Brain, vol. 142, no. 4, pp. 932–951, 2019.
  • [6] P. Nemtsas, G. Birot, F. Pittau, C. M. Michel, K. Schaller, S. Vulliemoz, V. K. Kimiskidis, and M. Seeck, “Source localization of ictal epileptic activity based on high-density scalp eeg data,” Epilepsia, vol. 58, no. 6, pp. 1027–1036, 2017.
  • [7] A. Sohrabpour, Z. Cai, S. Ye, B. Brinkmann, G. Worrell, and B. He, “Noninvasive electromagnetic source imaging of spatiotemporally distributed epileptogenic brain sources,” Nature communications, vol. 11, no. 1, p. 1946, 2020.
  • [8] S. Ye, L. Yang, Y. Lu, M. T. Kucewicz, B. Brinkmann, C. Nelson, A. Sohrabpour, G. A. Worrell, and B. He, “Contribution of ictal source imaging for localizing seizure onset zone in patients with focal epilepsy,” Neurology, vol. 96, no. 3, pp. e366–e375, 2021.
  • [9] P. Van Mierlo, B. J. Vorderwülbecke, W. Staljanssens, M. Seeck, and S. Vulliëmoz, “Ictal eeg source localization in focal epilepsy: Review and future perspectives,” Clinical Neurophysiology, vol. 131, no. 11, pp. 2600–2616, 2020.
  • [10] J. Zhao, C. Wang, W. Sun, and C. Li, “Tailoring materials for epilepsy imaging: from biomarkers to imaging probes,” Advanced Materials, vol. 34, no. 44, p. 2203667, 2022.
  • [11] H. S. Courellis, J. R. Iversen, H. Poizner, and G. Cauwenberghs, “Eeg channel interpolation using ellipsoid geodesic length,” in 2016 IEEE biomedical circuits and systems conference (BioCAS), pp. 540–543, IEEE, 2016.
  • [12] S. Petrichella, L. Vollere, F. Ferreri, A. Guerra, S. Määtta, M. Könönen, V. Di Lazzaro, and G. Iannello, “Channel interpolation in tms-eeg: a quantitative study towards an accurate topographical representation,” in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 989–992, IEEE, 2016.
  • [13] I. Nouira, A. B. Abdallah, and M. H. Bedoui, “Three-dimensional interpolation methods to spatiotemporal eeg mapping during various behavioral states,” Signal, Image and Video Processing, vol. 10, no. 5, pp. 943–949, 2016.
  • [14] S. You, B. Lei, S. Wang, C. K. Chui, A. C. Cheung, Y. Liu, M. Gan, G. Wu, and Y. Shen, “Fine perceptive gans for brain mr image super-resolution in wavelet domain,” IEEE transactions on neural networks and learning systems, vol. 34, no. 11, pp. 8802–8814, 2022.
  • [15] S. Wang, H. Wang, A. C. Cheung, Y. Shen, and M. Gan, “Ensemble of 3d densely connected convolutional network for diagnosis of mild cognitive impairment and alzheimer’s disease,” Deep learning applications, pp. 53–73, 2020.
  • [16] W. Yu, B. Lei, Y. Shen, S. Wang, Y. Liu, Z. Feng, Y. Hu, and M. K. Ng, “Morphological feature visualization of alzheimer’s disease via multidirectional perception gan,” IEEE Transactions on Neural Networks and Learning Systems, no. 0, p. 0, 2021.
  • [17] S. Han, M. Kwon, S. Lee, and S. C. Jun, “Feasibility study of eeg super-resolution using deep convolutional networks,” in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1033–1038, IEEE, 2018.
  • [18] Q. Zuo, B. Lei, Y. Shen, Y. Liu, Z. Feng, and S. Wang, “Multimodal representations learning and adversarial hypergraph fusion for early alzheimer’s disease prediction,” in PRCV2021, no. 13021, pp. 479–490, 2021.
  • [19] Y. Tang, D. Chen, H. Liu, C. Cai, and X. Li, “Deep eeg superresolution via correlating brain structural and functional connectivities,” IEEE Transactions on Cybernetics, 2022.
  • [20] S. Hu, B. Lei, S. Wang, Y. Wang, Z. Feng, and Y. Shen, “Bidirectional mapping generative adversarial networks for brain mr to pet synthesis,” IEEE Transactions on Medical Imaging, vol. 41, no. 1, pp. 145–157, 2021.
  • [21] J. Pan, Q. Zuo, B. Wang, C. P. Chen, B. Lei, and S. Wang, “Decgan: Decoupling generative adversarial network for detecting abnormal neural circuits in alzheimer’s disease,” IEEE Transactions on Artificial Intelligence, 2024.
  • [22] H. El-Fiqi, M. Wang, K. Kasmarik, A. Bezerianos, K. C. Tan, and H. A. Abbass, “Weighted gate layer autoencoders,” IEEE Transactions on Cybernetics, vol. 52, no. 8, pp. 7242–7253, 2021.
  • [23] I. A. Corley and Y. Huang, “Deep eeg super-resolution: Upsampling eeg spatial resolution with generative adversarial networks,” in 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 100–103, IEEE, 2018.
  • [24] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
  • [25] Y. Cao, S. Li, Y. Liu, Z. Yan, Y. Dai, P. S. Yu, and L. Sun, “A comprehensive survey of ai-generated content (aigc): A history of generative ai from gan to chatgpt,” arXiv preprint arXiv:2303.04226, 2023.
  • [26] C. Gong, C. Jing, X. Chen, C. M. Pun, G. Huang, A. Saha, M. Nieuwoudt, H.-X. Li, Y. Hu, and S. Wang, “Generative ai for brain image computing and brain network computing: a review,” Frontiers in Neuroscience, vol. 17, p. 1203104, 2023.
  • [27] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in neural information processing systems, vol. 34, pp. 8780–8794, 2021.
  • [28] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10684–10695, 2022.
  • [29] M. Özbey, O. Dalmaz, S. U. Dar, H. A. Bedel, Ş. Özturk, A. Güngör, and T. Çukur, “Unsupervised medical image translation with adversarial diffusion models,” IEEE Transactions on Medical Imaging, 2023.
  • [30] Y. Wang, S. Zhao, H. Jiang, S. Li, B. Luo, T. Li, and G. Pan, “DiffMDD: A diffusion-based deep learning framework for MDD diagnosis using EEG,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2024.
  • [31] L.-F. Mo and S.-Q. Wang, “A variational approach to nonlinear two-point boundary value problems,” Nonlinear Analysis: Theory, Methods & Applications, vol. 71, no. 12, pp. e834–e838, 2009.
  • [32] Y. Bai, X. Wang, Y.-p. Cao, Y. Ge, C. Yuan, and Y. Shan, “DreamDiffusion: Generating high-quality images from brain EEG signals,” arXiv preprint arXiv:2306.16934, 2023.
  • [33] W.-Y. Hsu and Y.-W. Cheng, “EEG-channel-temporal-spectral-attention correlation for motor imagery EEG classification,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 1659–1669, 2023.
  • [34] J. Xie, J. Zhang, J. Sun, Z. Ma, L. Qin, G. Li, H. Zhou, and Y. Zhan, “A transformer-based approach combining deep learning network and spatial-temporal information for raw EEG classification,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 30, pp. 2126–2136, 2022.
  • [35] K. Wu, Y. Shen, and S. Wang, “3D convolutional neural network for regional precipitation nowcasting,” Journal of Image and Signal Processing, vol. 7, no. 4, pp. 200–212, 2018.
  • [36] W. Peebles and S. Xie, “Scalable diffusion models with transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4195–4205, 2023.
  • [37] Y. Song, Q. Zheng, B. Liu, and X. Gao, “EEG Conformer: Convolutional transformer for EEG decoding and visualization,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 710–719, 2022.
  • [38] S.-Q. Wang and J.-H. He, “Variational iteration method for a nonlinear reaction-diffusion process,” International Journal of Chemical Reactor Engineering, vol. 6, no. 1, 2008.
  • [39] E. Mikulan, S. Russo, S. Parmigiani, S. Sarasso, F. M. Zauli, A. Rubino, P. Avanzini, A. Cattani, A. Sorrentino, S. Gibbs, et al., “Simultaneous human intracerebral stimulation and HD-EEG, ground-truth for source localization methods,” Scientific Data, vol. 7, no. 1, p. 127, 2020.
  • [40] J. Wu, G. H. Ngo, D. Greve, J. Li, T. He, B. Fischl, S. B. Eickhoff, and B. T. Yeo, “Accurate nonlinear mapping between MNI volumetric and FreeSurfer surface coordinate systems,” Human Brain Mapping, vol. 39, no. 9, pp. 3793–3808, 2018.
  • [41] B. Fischl, “FreeSurfer,” NeuroImage, vol. 62, no. 2, pp. 774–781, 2012.
  • [42] R. Debnath, G. A. Buzzell, S. Morales, M. E. Bowers, S. C. Leach, and N. A. Fox, “The Maryland analysis of developmental EEG (MADE) pipeline,” Psychophysiology, vol. 57, no. 6, p. e13580, 2020.
  • [43] L. Carmona, P. F. Diez, E. Laciar, and V. Mut, “Multisensory stimulation and EEG recording below the hair-line: a new paradigm on brain computer interfaces,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 28, no. 4, pp. 825–831, 2020.
  • [44] S. Saba-Sadiya, T. Alhanai, T. Liu, and M. M. Ghassemi, “EEG channel interpolation using deep encoder-decoder networks,” in 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2432–2439, IEEE, 2020.
  • [45] W. Zhao, E. J. Van Someren, C. Li, X. Chen, W. Gui, Y. Tian, Y. Liu, and X. Lei, “EEG spectral analysis in insomnia disorder: A systematic review and meta-analysis,” Sleep Medicine Reviews, vol. 59, p. 101457, 2021.
  • [46] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces,” Journal of Neural Engineering, vol. 15, no. 5, p. 056013, 2018.
  • [47] J. C. Bore, P. Li, L. Jiang, W. M. Ayedh, C. Chen, D. J. Harmah, D. Yao, Z. Cao, and P. Xu, “A long short-term memory network for sparse spatiotemporal EEG source imaging,” IEEE Transactions on Medical Imaging, vol. 40, no. 12, pp. 3787–3800, 2021.
  • [48] M. Hata, H. Kazui, T. Tanaka, R. Ishii, L. Canuet, R. D. Pascual-Marqui, Y. Aoki, S. Ikeda, H. Kanemoto, K. Yoshiyama, et al., “Functional connectivity assessed by resting state EEG correlates with cognitive decline of Alzheimer’s disease–an eLORETA study,” Clinical Neurophysiology, vol. 127, no. 2, pp. 1269–1278, 2016.
  • [49] S. Pancholi, A. Giri, A. Jain, L. Kumar, and S. Roy, “Source aware deep learning framework for hand kinematic reconstruction using EEG signal,” IEEE Transactions on Cybernetics, 2022.