Non-Cooperative Backdoor Attacks in Federated Learning: A New Threat Landscape

Tuan Nguyen1,2, Dung Thuy Nguyen3, Khoa D Doan1,2, Kok-Seng Wong1,2
1VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam
2College of Engineering & Computer Science, VinUniversity, Hanoi, Vietnam
3 Vanderbilt University, Nashville, TN 37235
{tuan.nm, khoa.dd, wong.ks}@vinuni.edu.vn,
dung.t.nguyen@Vanderbilt.Edu
Abstract

Despite the promise of Federated Learning (FL) for privacy-preserving model training on distributed data, it remains susceptible to backdoor attacks. These attacks manipulate models by embedding triggers (specific input patterns) in the training data, forcing misclassification as predefined classes during deployment. Traditional single-trigger attacks and recent work on cooperative multiple-trigger attacks, where clients collaborate, highlight limitations in attack realism due to coordination requirements. We investigate a more alarming scenario: non-cooperative multiple-trigger attacks. Here, independent adversaries introduce distinct triggers targeting unique classes. These parallel attacks exploit FL’s decentralized nature, making detection difficult. Our experiments demonstrate the alarming vulnerability of FL to such attacks, where individual backdoors can be successfully learned without impacting the main task. This research emphasizes the critical need for robust defenses against diverse backdoor attacks in the evolving FL landscape. While our focus is on empirical analysis, we believe it can guide backdoor research toward more realistic settings, highlighting the crucial role of FL in building robust defenses against diverse backdoor threats. The code is available at https://anonymous.4open.science/r/nba-980F/.

1 Introduction

Federated learning (FL) [16] is a distributed machine learning paradigm that enables multiple parties to train a shared model cooperatively without sharing their private data. In FL, each party trains a local model on its data and then shares the model parameters with a central server. The server aggregates the parameters from all parties and updates the global model. The updated global model is then sent back to each party for further training. This process is repeated until the global model converges. However, because the training data is distributed across multiple parties, FL is vulnerable to backdoor attacks [20], where the attacker poisons the model by injecting a backdoor trigger into the training data. When the model is deployed, a specific input pattern can activate the backdoor trigger to cause the model to output a specific target class.

Backdoor attacks on FL have been recently studied in [1, 24, 28, 6, 30, 2, 18]. Existing research categorizes these attacks into two main types: fixed-trigger attacks and optimized-trigger attacks. Fixed-trigger attacks, as described in [1], involve pre-selecting a trigger without leveraging information from the FL training process. Conversely, optimized-trigger attacks refine the trigger specifically to enhance the attack’s effectiveness by utilizing such information. Recent studies have explored various optimization techniques: maximizing the difference between clean and trigger-added sample representations (Fang et al., 2023) [4], jointly optimizing the trigger and local model with regularization to bypass defenses (Lyu et al., 2023) [15], and using autoencoders to generate the optimal trigger pattern (Nguyen et al., 2023) [18]. However, these works primarily focus on a cooperative attack scenario where malicious clients either coordinate with decomposed triggers and target labels (Xie et al., 2020) [28] or embed the same global trigger pattern across all attackers (Bagdasaryan et al., 2020) [1], ultimately compromising the global model.

Recent advancements in machine learning have led to new backdoor attack scenarios in FL. For instance, attackers can choose an arbitrary target class during inference (Doan et al., 2022) [3] or inject multiple triggers to poison the same dataset (Li et al., 2024) [14]. These advancements highlight the attacker’s growing sophistication, allowing independent attackers to inject their triggers and target classes without coordination. This approach is more realistic and poses a significant challenge for real-world FL systems, as any participant can potentially learn a backdoor task without compromising the main task’s performance. Independent attacks can be motivated by individual goals. Imagine competing companies participating in an FL system to develop a recommendation model. A malicious company could attempt to inject a backdoor into the model to promote its own products to users unfairly. Motivated by this emerging threat, we investigate a new backdoor attack scenario in FL: Non-Cooperative Backdoor Attacks (NBA). In this scenario, adversarial clients act independently, each with a unique backdoor trigger and target class. This scenario presents a significant practical challenge, as it allows any participant to introduce a backdoor without affecting the core functionality of the model. Fig. 1 provides a visual overview of the proposed NBA.

Our main contributions are summarized as follows:

  • We introduce a new attack scenario: Non-Cooperative Backdoor Attacks (NBA) in FL, where multiple malicious clients act independently by employing their specific trigger to backdoor their own targeted class. This scenario reflects a more realistic threat in real-world FL deployments.

  • We demonstrate the efficacy and increased risk of NBA attacks through extensive experiments on four datasets. We show successful backdoor insertion in single-shot, multiple-shot, and semi-multiple-shot settings, highlighting the growing danger as attackers exploit multiple communication rounds.

  • Our analysis investigates NBA with a large-scale attacker pool (up to 8 attackers). This reflects the potential for multiple parties to inject backdoors into a single model, a significant concern for practical FL systems.

  • We conduct in-depth analysis and ablation studies to understand how various factors like trigger patterns, scaling factors, and the number of attackers impact NBA success. This comprehensive analysis provides valuable insights for designing future defenses.

2 Background and related work

2.1 Federated learning

In FL, users with private data collaborate to train a global model ($G^t$). Each user iteratively updates a local model ($W_i^{t+1}$) based on their data ($D_i$) using Stochastic Gradient Descent:

$$W_i^{t+1} = G^t - lr \cdot \nabla \mathcal{L}_{task}(G^t, D_i) \tag{1}$$

where $\mathcal{L}_{task}$ is the loss function, $\nabla \mathcal{L}_{task}(G^t, D_i)$ denotes the gradient, and $lr$ is the local learning rate. User updates are then uploaded and aggregated by the central server using aggregation rules (e.g., FedAvg [16]) with a global learning rate ($\eta$) to create a new global model for the next round:

$$G^{t+1} = G^t + \frac{\eta}{n} \sum_{i=1}^{n} (W_i^{t+1} - G^t) \tag{2}$$
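The two update rules above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the two-parameter "model", and the gradient values are all made up for the example, and each client takes a single SGD step instead of several local epochs.

```python
import numpy as np

def local_sgd_step(global_w, grad, lr):
    """Eq. (1): one local SGD step that starts from the global model."""
    return global_w - lr * grad

def fedavg_aggregate(global_w, local_ws, eta):
    """Eq. (2): move the global model along the mean client update, scaled by eta."""
    n = len(local_ws)
    total_update = sum(w - global_w for w in local_ws)
    return global_w + (eta / n) * total_update

# Toy round: 3 clients, hypothetical gradients (not taken from the paper).
G = np.array([1.0, -1.0])
grads = [np.array([0.5, 0.0]), np.array([0.0, 0.5]), np.array([0.5, 0.5])]
local_models = [local_sgd_step(G, g, lr=0.1) for g in grads]
G_next = fedavg_aggregate(G, local_models, eta=1.0)
```

With `eta=1.0` the new global model is simply the average of the local models; smaller values of `eta` damp every client's contribution equally.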

2.2 Backdoor attacks and defenses in FL

Backdoor attack. This attack aims to make a model perform well on normal data (benign data) while also producing attacker-desired outputs for inputs with a hidden trigger (e.g., specific image pattern). Attackers participate in FL with backdoor data ($D_{backdoor}$) to poison the model. Eq. 3 shows how the attacker updates their local model ($W_{adv}^{t+1}$) using both normal data ($D_{normal}$) and backdoor data:

$$W_{adv}^{t+1} = G^t - lr \cdot \nabla \mathcal{L}_{task}(G^t, D_{normal} \cup D_{backdoor}) \tag{3}$$

Existing backdoor attacks in FL. It is widely recognized that a global model approaching convergence receives only small gradient updates. In light of this, attackers can employ a scaling factor $\gamma$ in the Model Replacement Attack [1] to replace the global model with an attacker-trained backdoor model within a single epoch. Leveraging the distributed nature of FL, the Distributed Backdoor Attack (DBA) [28] decomposes the global trigger pattern into multiple local triggers, assigning each compromised device a unique local trigger. Building upon DBA, Gong et al. [6] introduce a coordinated backdoor attack with model-dependent local triggers. The semantic backdoor, a variant of the FL backdoor attack, incorporates triggers related to inherent features in target images, such as the color of a car [1]. As highlighted by Bagdasaryan et al., the success and persistence of a semantic backdoor hinge on the frequency of trigger features in other clients’ datasets. To enhance backdoor effectiveness, Wang et al. [24] propose an edge-case backdoor, which is similar to the semantic backdoor but strategically positions the backdoor datasets at the tail of the global data distribution, making them less likely to appear in other clients’ data. For increased backdoor durability, Neurotoxin [30] identifies parameters infrequently updated by benign clients and inserts backdoors using these parameters. Chameleon [2] explores the relationship between the original label and the backdoor label before flipping the label to the target label to extend the backdoor duration. IBA [18] leverages the update history of adversaries with imperceptible triggers to enhance backdoor durability.

Existing backdoor defenses in FL. Existing defense methods primarily focus on distinguishing adversaries’ updates from benign clients’ updates, as adversaries strive to make their updates closely resemble other updates. Various outlier detection techniques have been proposed to counter backdoor attacks. Li et al. proposed a spectral anomaly detection framework based on low-dimensional embeddings, removing noisy and irrelevant features while retaining essential ones [13]. In this low-dimensional latent feature space, abnormal (malicious) model updates can be easily differentiated from normal updates. Deepsight [22] conducts deep model inspection for each model, analyzing Normalized Update Energies (NEUPs) and Division Differences (DDifs). FL-Detector [29] predicts the global model through model update consistency, detecting outliers based on the distance to the predicted model. RFLBAT [25] utilizes Principal Component Analysis (PCA) to reduce the dimension of gradient updates, effectively separating malicious models from benign models in a low-dimensional projection space. Foolsgold [5] examines historical updates for each client and penalizes those with high pairwise cosine similarities by employing a low learning rate. Another avenue of research focuses on robust defense against FL backdoor attacks by applying weak Differential Privacy (DP) [32] to the global model [31]. Weak DP, involving norm clipping and the addition of Gaussian noise to each gradient update, has proven effective in mitigating FL backdoor attacks [27]. Recognizing potential drawbacks, such as Gaussian noise deteriorating the global model’s main task accuracy and norm clipping requiring a clipping bound, FLAME adapts the DP method by introducing a noise boundary proof and a dynamic clipping bound. This adaptation has demonstrated its capability to alleviate backdoor attacks while still maintaining a high main task accuracy [19].

Knowledge of the adversary. While the attacker has white-box access to the global model weights and predictions, their knowledge of the training data is limited to the data distribution held by compromised clients, resulting in only partial knowledge of the overall training data.

Capabilities of the adversary. We assume a model-poisoning adversary with full access to the server and a fixed number of compromised clients, similar to the scenario presented in Xie et al. [28]. This powerful attacker can alter training hyperparameters, model weights, and training data on compromised clients.

3 Non-Cooperative Backdoor Attacks against federated learning

3.1 General framework

Figure 1: Non-Cooperative Backdoor Attacks (NBA) scenario in FL: the red color represents the malicious client with their own unique trigger and target class, aiming to inject the backdoor trigger. Here, $(T_i, C_i)$ denotes the trigger and target class of the $i$-th attacker.

Our proposed scenario, NBA, involves multiple FL clients acting independently, each with their own unique backdoor trigger and target class. As shown in Fig. 1, this attack differs from existing cooperative backdoor attacks, where clients coordinate with decomposed triggers and target labels, ultimately compromising the global model. This novel scenario presents a significant threat to FL systems, as individual backdoor tasks can be successfully learned without harming the main task performance. The triggers and target classes of the attackers are denoted as $(T_i, C_i)$, where $i$ is the index of the attacker. All the triggers (Fig. 2) are unique in shape and have a size of 24 pixels [28].

3.2 Factors in Non-Cooperative Backdoor Attacks

Figure 2: Eight trigger patterns used in our NBA experiments, all with fixed sizes of 24 pixels.

Trigger location and size. The trigger is located at the top left corner of the image, with a size of 24 pixels. We use eight fixed trigger patterns with the shapes $1 \times 24$, $2 \times 12$, $3 \times 8$, $4 \times 6$, $6 \times 4$, $8 \times 3$, $12 \times 2$, and $24 \times 1$ pixels, respectively, as shown in Fig. 2.
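A minimal sketch of how such rectangular triggers could be stamped onto images is given below. Several details are assumptions made for illustration and are not stated in the text: the shapes are read as rows by columns, the patch is filled with a constant pixel value of 1.0, and the helper names `stamp_trigger` and `make_backdoor_data` are hypothetical.

```python
import numpy as np

# Shapes listed in the text, read here as (rows, cols); each covers 24 pixels.
TRIGGER_SHAPES = [(1, 24), (2, 12), (3, 8), (4, 6), (6, 4), (8, 3), (12, 2), (24, 1)]

def stamp_trigger(image, shape, value=1.0):
    """Return a copy of `image` (H x W or H x W x C) with a solid top-left patch."""
    rows, cols = shape
    poisoned = image.copy()
    poisoned[:rows, :cols, ...] = value  # assumed fill value, not from the paper
    return poisoned

def make_backdoor_data(images, shape, target_class, value=1.0):
    """Stamp every image and relabel it with the attacker's target class."""
    poisoned = np.stack([stamp_trigger(img, shape, value) for img in images])
    labels = np.full(len(images), target_class, dtype=np.int64)
    return poisoned, labels

# Stand-in for MNIST-sized inputs; attacker 3 uses the 3 x 8 trigger and targets class 7.
images = np.zeros((4, 28, 28))
x_bd, y_bd = make_backdoor_data(images, TRIGGER_SHAPES[2], target_class=7)
```

In the NBA setting, each attacker would call such a routine with its own shape and target class, with no shared state between attackers.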

Scale $\gamma$. The scaling parameter $\gamma = \frac{n}{\eta}$ defined in Bagdasaryan et al. [1] is used by the attacker to scale up the malicious model weights. For instance, assume the $i$-th malicious local model is $X$. The new local model $L_i^{t+1}$ that will be submitted is calculated as $L_i^{t+1} = \gamma (X - G^t) + G^t$.
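The effect of this scaling can be checked numerically. The sketch below assumes an idealized round in which the benign clients submit the unchanged global model (their updates vanish near convergence); the values of `n`, `eta`, and the weight vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def scale_update(local_w, global_w, gamma):
    """Submitted model: gamma * (X - G) + G."""
    return gamma * (local_w - global_w) + global_w

n, eta = 10, 1.0                      # hypothetical number of clients and global learning rate
gamma = n / eta                       # scaling parameter gamma = n / eta
G = np.array([1.0, 2.0, 3.0])         # current global model
X = np.array([1.3, 1.8, 3.5])         # attacker's backdoored local model

# Idealized benign clients: zero update, i.e. they return G unchanged.
submitted = [G.copy() for _ in range(n - 1)] + [scale_update(X, G, gamma)]

# FedAvg-style aggregation: G + (eta / n) * sum of client updates.
G_next = G + (eta / n) * sum(w - G for w in submitted)
```

Under these assumptions `G_next` coincides with `X`, which is why a single scaled submission can replace the global model. With non-zero benign updates the replacement is only approximate.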

Data distribution $\alpha$. FL often presumes non-i.i.d. data distribution across parties. Here, we use a Dirichlet distribution [17] with hyperparameter $\alpha = 0.5$ to generate different data distributions.
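One common way to realize a Dirichlet-based non-i.i.d. split is to draw, for every class, a vector of client proportions from a symmetric Dirichlet distribution and divide that class's samples accordingly. The sketch below follows this recipe as an assumption; the exact partitioning procedure used in the experiments is not described here, and `dirichlet_partition` is a hypothetical helper.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class, with Dirichlet(alpha) proportions."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Cut points inside this class's index list; some clients may get no samples.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Toy label vector: 10 classes with 100 samples each, split over 100 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=100, alpha=0.5)
```

Smaller values of `alpha` concentrate each class on fewer clients, while large values approach a uniform split.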

4 Experiments

4.1 Datasets and experiment setup

Datasets. NBA is evaluated on four classification datasets with non-i.i.d. data distributions: Fashion-MNIST [26], MNIST [12], CIFAR-10 [10], and Tiny-ImageNet [11]. The data description and parameter setups are summarized in Appx. A.

Federated learning setup. Following the standard setup, we use FedAvg [16] as the global model optimization algorithm, and the global learning rate $\eta$ is set to 0.01. In each round, 10 of the 100 clients are selected for aggregation and each selected client trains for $E$ local epochs with a local learning rate $lr$. Our experiments utilize 8 triggers with fixed sizes of 24 pixels, as shown in Fig. 2.

Attack scenarios. We evaluate the performance of NBA in three distinct attack scenarios following the setup in [1, 28]:

  • Single-shot attack: Attackers participate in only one round, during which they scale the client’s model using the model replacement method [1] with a scaling factor of $\gamma = 100$.

  • Multiple-shot attack: Attackers are continuously selected to participate throughout the entire training process, without applying any scaling factor.

  • Semi-multiple-shot attack: Attackers are continuously selected for a fixed number of rounds (100 rounds in our experiments). They employ the model replacement method combined with varying scaling factors $\gamma$ ranging from 1 to 100, blending the continuous participation aspect of [28] with the scaling strategy of [1].

It is important to note that all attack settings are initiated after the global model has converged. Injecting backdoors from the first round, as observed in Xie et al. [28], can lead to low main accuracy and difficulty in model convergence.

Evaluation metrics. We use the following evaluation metrics to measure the performance of the proposed NBA in FL:

  • Main Task Accuracy ($MA$): Accuracy of the global model on the main task during testing.

  • Backdoor Task Accuracy ($BA$): Percentage of test inputs containing a specific trigger pattern that the global model classifies into the target class. For trigger $k$, this is $BA_k$.
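The two metrics above can be computed as sketched below. This is an illustrative version under stated assumptions: `predict` is any function mapping a batch of inputs to predicted class indices, `toy_predict` is a hypothetical stand-in model, and the text does not say whether test samples whose true class already equals the target are excluded from $BA$, so no such filtering is applied here.

```python
import numpy as np

def main_task_accuracy(predict, x_clean, y_clean):
    """MA: percentage of clean test inputs classified correctly."""
    return 100.0 * np.mean(predict(x_clean) == y_clean)

def backdoor_accuracy(predict, x_triggered, target_class):
    """BA_k: percentage of triggered test inputs classified as the target class."""
    return 100.0 * np.mean(predict(x_triggered) == target_class)

def toy_predict(x):
    # Hypothetical stand-in model: outputs class 7 whenever the top-left pixel is set.
    return np.where(x[:, 0, 0] > 0.5, 7, 0)

x_clean = np.zeros((5, 4, 4))
y_clean = np.array([0, 0, 0, 1, 0])
x_trig = x_clean.copy()
x_trig[:, 0, 0] = 1.0                 # stamp a one-pixel "trigger"

ma = main_task_accuracy(toy_predict, x_clean, y_clean)
ba = backdoor_accuracy(toy_predict, x_trig, target_class=7)
```

In the NBA setting, `backdoor_accuracy` would be evaluated once per attacker, each time with that attacker's own trigger and target class.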

4.2 Backdoor attacks with one adversary

4.2.1 Single-shot attack

We begin our evaluation by analyzing the performance of an attack in a single-shot setting with one adversary. In this setting, the attacker participates in only one round of training and modifies the strength of the backdoor model with a scaling factor $\gamma = 100$.

Figure 3: Backdoor accuracy of 8 triggers in single-shot attack with one adversary and $\gamma = 100$.

Performance of backdoor attacks with different triggers. The performance of backdoor attacks with different triggers shows consistency between datasets. With $\gamma = 1$, all datasets exhibit relatively low backdoor accuracies, with Fashion-MNIST and CIFAR-10 having slightly higher values compared to MNIST and Tiny-ImageNet. However, when $\gamma = 100$, the backdoor accuracy is consistently high across all triggers within each dataset, indicating that the specific trigger used has less impact on the success rate when the scaling factor is high. For instance, all four datasets maintain almost perfect backdoor accuracies, close to 100% for all triggers. This consistency suggests that the choice of trigger becomes less significant when the scaling factor is increased.

Table 1: Performance of backdoor attacks with one adversary in single-shot setting (backdoor accuracy in %)

| $\gamma$ | Dataset | $BA_1$ | $BA_2$ | $BA_3$ | $BA_4$ | $BA_5$ | $BA_6$ | $BA_7$ | $BA_8$ | $BA_{Avg}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Fashion-MNIST | 2.60 | 2.83 | 2.94 | 2.78 | 2.59 | 2.68 | 2.59 | 2.48 | 2.69 |
| 1 | MNIST | 0.33 | 0.34 | 0.33 | 0.33 | 0.37 | 0.37 | 0.38 | 0.37 | 0.35 |
| 1 | CIFAR-10 | 1.71 | 1.87 | 1.52 | 1.68 | 1.64 | 1.47 | 1.67 | 1.90 | 1.68 |
| 1 | Tiny-ImageNet | 0.09 | 0.08 | 0.09 | 0.07 | 0.06 | 0.05 | 0.07 | 0.08 | 0.07 |
| 100 | Fashion-MNIST | 99.88 | 100 | 100 | 100 | 100 | 100 | 100 | 98.67 | 99.82 |
| 100 | MNIST | 100 | 99.99 | 100 | 98.61 | 97.92 | 97.73 | 99.96 | 97.28 | 98.94 |
| 100 | CIFAR-10 | 99.88 | 100 | 100 | 100 | 100 | 100 | 100 | 98.67 | 99.82 |
| 100 | Tiny-ImageNet | 99.37 | 99.67 | 99.92 | 99.83 | 99.77 | 99.89 | 99.30 | 99.83 | 99.70 |

Impact of scaling factor $\gamma$ on backdoor task. The scaling factor $\gamma$ significantly affects the performance of backdoor attacks. When $\gamma = 1$, indicating no scaling, the backdoor accuracy across all datasets is relatively low. For instance, the backdoor accuracy for Fashion-MNIST ranges from 2.48% to 2.94%, with an average of 2.69%, while for MNIST, it ranges from 0.33% to 0.38%, with an average of 0.35%. CIFAR-10 shows backdoor accuracies from 1.47% to 1.90%, averaging 1.68%, and Tiny-ImageNet displays very low values between 0.05% and 0.09%, with an average of 0.07%. In contrast, when $\gamma = 100$, the backdoor accuracy dramatically increases, reaching nearly 100% across most triggers and datasets. For example, Fashion-MNIST and CIFAR-10 both achieve an average backdoor accuracy of 99.82%, while MNIST and Tiny-ImageNet also show high averages of 98.94% and 99.70%, respectively. This stark contrast highlights the critical role of the scaling factor in determining the success of backdoor attacks in single-shot settings.

Abnormality of backdoor performance on CIFAR-10 dataset. The graph in Fig. 3 depicts the backdoor accuracy trends over 100 rounds for the CIFAR-10 dataset compared to three other datasets. During attack rounds with $\gamma = 100$, the backdoor accuracy initially reaches nearly 100% across all four datasets. Following these rounds, in the absence of backdoor training injections, the backdoor accuracy generally decreases after 20 rounds. However, in the CIFAR-10 dataset, an unusual pattern emerges: the backdoor accuracy begins to increase after 20 rounds, eventually reaching nearly 70% even after 100 rounds. In contrast, the backdoor accuracy in the other datasets continues to decline, approaching 0% after 100 rounds. This observation suggests that the intrinsic characteristics of the CIFAR-10 dataset significantly influence the persistence of backdoor performance.

4.2.2 Multiple-shot attack

Figure 4: Backdoor accuracy in multiple-shot setting with one adversary ($\gamma = 1$)

Performance of backdoor task with different triggers. In the multiple-shot attack scenario with one adversary over 200 rounds, as shown in Fig. 4, the performance of backdoor attacks differs notably across datasets. For Fashion-MNIST, the backdoor accuracy remains consistently high, with an average backdoor accuracy ($BA_{Avg}$) of 99.38%, indicating that the backdoor attack is highly effective. MNIST is similarly vulnerable, with an average backdoor accuracy of 97.18%, slightly lower than Fashion-MNIST. For CIFAR-10, backdoor accuracy spans a wide range, from 74.06% to 91.04% with an average of 85.86%, suggesting variable success across different triggers. Tiny-ImageNet displays consistently high backdoor accuracy, with all values close to or exceeding 93.80% and an average of 94.29%, indicating effective backdoor insertion. These results demonstrate that the effectiveness of backdoor attacks in a multiple-shot scenario varies with both the dataset and the specific trigger used.

4.3 Non-Cooperative Backdoor Attacks

4.3.1 Single-shot attack

Figure 5: Backdoor accuracy in single-shot NBA with $\gamma = 100$ and gap 10 rounds.

Table 2: Performance of NBA in single-shot setting with $\gamma = 100$ (gap 10 rounds; accuracy in %)

| Dataset | $MA$ | $BA_1$ | $BA_2$ | $BA_3$ | $BA_4$ | $BA_5$ | $BA_6$ | $BA_7$ | $BA_8$ | $BA_{Avg}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Fashion-MNIST | 77.72 | 21.31 | 19.74 | 32.73 | 25.50 | 39.57 | 7.83 | 73.89 | 1.87 | 27.81 |
| MNIST | 77.84 | 3.99 | 0.28 | 6.74 | 2.36 | 7.30 | 0.97 | 3.96 | 1.29 | 3.36 |
| CIFAR-10 | 96.32 | 0.51 | 0.28 | 0.50 | 0.57 | 0.54 | 0.42 | 3.64 | 0.66 | 0.89 |
| Tiny-ImageNet | 0.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 99.30 | 12.41 |

Fig. 5 illustrates the backdoor accuracy trends in a single-shot NBA setting with an attack gap of 10 rounds, meaning the first attacker injects the backdoor in the first round, the next in the 11th round, and so on. In this scenario, each attacker participates in only one round, scaling the client’s model using the model replacement method with a scaling factor $\gamma = 100$. As shown in Fig. 5, the backdoor accuracy at the scaling round is nearly 100% for all triggers across all datasets. After the scaling round, the backdoor accuracy decreases significantly, reducing to nearly zero by the next scaling round. After 30 rounds without scaling, the backdoor accuracy patterns vary between datasets. For MNIST and CIFAR-10, while the main task accuracy remains high, the backdoor accuracy for all triggers drops to nearly zero. In the case of Fashion-MNIST, although the main task accuracy is moderate, the backdoor accuracy is varied, with some triggers maintaining higher accuracy than others. For Tiny-ImageNet, the main task accuracy is notably affected by the backdoor attack, showing a low main task accuracy alongside a high backdoor accuracy for the last trigger. These results highlight that dataset characteristics and backdoor trigger design are crucial factors influencing the success of backdoor attacks in FL.
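The attack timing described above can be written as a small schedule. This is only a sketch of the stated gap rule: the helper name `single_shot_schedule` is hypothetical, and rounds are counted from the start of the attack phase (after the global model has converged).

```python
def single_shot_schedule(num_attackers, gap, first_round=1):
    """Map attacker index k (1-based) to the single round in which it attacks."""
    return {k: first_round + (k - 1) * gap for k in range(1, num_attackers + 1)}

# Eight independent attackers, one attack every 10 rounds.
schedule = single_shot_schedule(num_attackers=8, gap=10)
attack_rounds = sorted(schedule.values())
```

Under this schedule no two attackers scale their updates in the same round, so each backdoor is injected into a model that already carries traces of the previous ones.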

4.3.2 Multiple-shot attack

Table 3: Performance of NBA in multiple-shot setting with $\gamma = 1$ (8 adversaries; accuracy in %)

| Dataset | $MA$ | $BA_1$ | $BA_2$ | $BA_3$ | $BA_4$ | $BA_5$ | $BA_6$ | $BA_7$ | $BA_8$ | $BA_{Avg}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Fashion-MNIST | 84.07 | 86.53 | 84.11 | 86.34 | 89.68 | 95.71 | 96.90 | 97.73 | 41.26 | 84.78 |
| MNIST | 97.96 | 99.40 | 86.61 | 89.47 | 87.52 | 79.24 | 89.43 | 97.01 | 49.30 | 84.75 |
| CIFAR-10 | 76.21 | 55.87 | 88.93 | 84.69 | 86.78 | 89.99 | 86.50 | 94.09 | 27.19 | 76.75 |
| Tiny-ImageNet | 38.32 | 0.01 | 0.00 | 0.05 | 0.03 | 0.00 | 0.00 | 0.00 | 0.04 | 0.02 |

Figure 6: Performance of NBA in multiple-shot setting with 8 adversaries ($\gamma = 1$)

Performance of backdoor task with different triggers. The effectiveness of backdoor attacks varies significantly with different triggers across the datasets. For Fashion-MNIST, triggers 6 and 7 are highly effective, achieving backdoor accuracies ($BA_6$ and $BA_7$) of 96.90% and 97.73% respectively, while trigger 8 is notably less effective, with $BA_8$ at 41.26%. In MNIST, trigger 1 achieves near-perfect backdoor accuracy ($BA_1$) at 99.40%, with most other triggers also showing high effectiveness except for trigger 8, which has a lower $BA_8$ of 49.30%. This could be due to the shape of trigger 8, which has dimensions $24 \times 1$ and follows the vertical direction of the image, making it harder to detect in the image. For CIFAR-10, trigger 7 is the most effective with a backdoor accuracy of 94.09%, while trigger 8 is the least effective at 27.19%. Interestingly, all triggers show negligible impact in Tiny-ImageNet, with backdoor accuracy close to zero. This suggests that the dataset’s complexity, potentially due to a smaller model size or a large number of classes, makes it challenging to learn both the main task and a backdoor task simultaneously. The presence of multiple backdoor tasks further complicates the attackers’ efforts to embed the backdoor effectively.

4.3.3 Semi-multiple-shot attack

Figure 7: Backdoor accuracy in semi-multiple-shot NBA with $\gamma = \frac{100}{\#Atk}$
Table 4: Performance of NBA in semi-multiple-shot scenario (8 adversaries) after 100 attack rounds
Accuracy $\rightarrow$, Dataset $\downarrow$ $MA$ $BA_1$ $BA_2$ $BA_3$ $BA_4$ $BA_5$ $BA_6$ $BA_7$ $BA_8$ $BA_{Avg}$
Fashion-MNIST 82.53 90.93 91.54 93.29 96.01 98.79 98.77 98.68 59.33 90.92
MNIST 97.04 99.76 88.38 96.10 93.55 85.32 90.15 97.99 69.74 90.12
CIFAR-10 19.71 0.01 0.00 35.74 0.00 35.94 0.00 0.00 0.00 8.96
Tiny-ImageNet 13.95 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

In multiple-shot attacks, adversaries continuously inject backdoor triggers, resulting in a gradual increase in backdoor accuracy. As illustrated in Fig. 6, attackers need to participate in at least 200 rounds to achieve high backdoor accuracy for most triggers across three datasets, with the exception of Tiny-ImageNet. In real-world scenarios, however, adversaries may not always participate in every round. To reduce the number of rounds needed for effective attacks, we propose a semi-multiple-shot attack where adversaries participate for 100 rounds and then stop injecting backdoor triggers. The key difference between multiple-shot and semi-multiple-shot attacks is the adjustment of the scaling factor $\gamma$ to $\frac{100}{\#Atk}$, where $\#Atk$ is the number of attackers. Results presented in Tab. 4 and Fig. 7 for a scenario with eight adversaries show that, although the backdoor accuracy fluctuates, it can reach high values for most triggers in three datasets at specific rounds.
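As a small worked example of the scaling rule, the sketch below computes $\gamma = \frac{100}{\#Atk}$ for several attacker counts and models the "attack for 100 rounds, then stop" schedule. The function names and the `start_round` parameter are hypothetical; the text does not state the round at which the adversaries begin injecting.

```python
def semi_multiple_shot_gamma(num_attackers: int, total_scale: float = 100.0) -> float:
    """Scaling factor gamma = 100 / #Atk used in the semi-multiple-shot setting."""
    if num_attackers < 1:
        raise ValueError("num_attackers must be at least 1")
    return total_scale / num_attackers


def is_attack_round(round_index: int, start_round: int = 0, attack_rounds: int = 100) -> bool:
    """True while the adversary still injects its trigger (a window of `attack_rounds` rounds)."""
    return start_round <= round_index < start_round + attack_rounds


for n_atk in (1, 2, 4, 8):
    print(f"#Atk={n_atk}: gamma={semi_multiple_shot_gamma(n_atk)}")
```

With eight adversaries, each update is scaled by 12.5, so the scaling factors of all adversaries sum to 100.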

These findings highlight a double-edged sword for backdoor attackers using the semi-multiple-shot attack. Although it requires fewer participation rounds than the multiple-shot attack (which lowers the risk of detection), it does not guarantee consistently high backdoor accuracy, and the main accuracy can drop significantly; in Tab. 4, CIFAR-10 and Tiny-ImageNet end at main accuracies of 19.71% and 13.95%. The inconsistency matters because the backdoor accuracy (BA) drops sharply once injection stops. The BAs then gradually recover, however, suggesting that the model does not entirely "forget" the triggers. This poses a challenge for detection at the central server, since any client can inject a backdoor trigger in any round and then stop participating. Because the effect of the triggers lingers after the attacker leaves, the server has difficulty distinguishing a temporary fluctuation from a true backdoor attack.

4.4 Robustness of Non-Cooperative Backdoor Attacks

Norm Clipping [23]. Participant updates undergo a clipping process to limit the impact of model adjustments, which involves multiplying them by $\min\left(1, \frac{S}{\|L_i^{t+1} - G_t\|_2}\right)$, where $S$ represents the clipping threshold. Tab. 3 and Tab. 5 illustrate the performance of the NBA algorithm in a multiple-shot scenario both before and after implementing the norm clipping defense with $S = 5$. The results reveal that neither the primary accuracy nor the backdoor accuracy is significantly influenced by the norm clipping defense, suggesting that adversaries can still effectively introduce backdoor triggers into the global model.
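The clipping rule can be sketched in a few lines. This is an illustrative implementation on flat parameter lists, not the code used in the experiments; the function name `clip_update` and the handling of a zero-norm update are assumptions.

```python
import math
from typing import List


def clip_update(local: List[float], global_model: List[float], threshold: float) -> List[float]:
    """Scale the update (local - global) by min(1, S / ||local - global||_2)."""
    delta = [l - g for l, g in zip(local, global_model)]
    norm = math.sqrt(sum(d * d for d in delta))
    # A zero update needs no clipping; this also avoids dividing by zero.
    scale = 1.0 if norm == 0.0 else min(1.0, threshold / norm)
    return [d * scale for d in delta]


# An update of norm 10 is halved when S = 5; an update of norm 5 passes unchanged.
print(clip_update([6.0, 8.0], [0.0, 0.0], threshold=5.0))
print(clip_update([3.0, 4.0], [0.0, 0.0], threshold=5.0))
```

Updates whose norm is already at most $S$ are left untouched, so clipping only constrains an adversary whose (scaled) update exceeds the threshold.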

Table 5: Performance of NBA in multiple-shot setting (8 adversaries) under norm clipping defense
Accuracy $\rightarrow$, Dataset $\downarrow$ $MA$ $BA_1$ $BA_2$ $BA_3$ $BA_4$ $BA_5$ $BA_6$ $BA_7$ $BA_8$ $BA_{Avg}$
Fashion-MNIST 84.09 86.56 84.14 86.36 89.67 95.73 96.91 97.77 41.28 84.80
MNIST 97.95 99.39 86.62 89.44 87.62 79.36 89.42 96.95 48.94 84.72
CIFAR-10 75.69 65.30 90.80 87.61 88.56 91.87 88.10 94.88 25.99 79.14
Tiny-ImageNet 42.53 0.03 0.02 0.08 0.06 0.05 0.04 0.02 0.13 0.05

Differential Privacy (DP) [1]. Gaussian noise $N(0, \sigma)$ is added to local updates to reduce the influence of backdoor attacks. As shown in Tab. 6, although the backdoor accuracy decreases, the DP defense also degrades the main accuracy, most visibly on CIFAR-10 (41.66%) and Tiny-ImageNet (9.38%). This indicates a trade-off between the protection offered by DP noise and preserving model efficacy.
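A minimal sketch of the noise step is shown below, assuming that $\sigma$ denotes the standard deviation of the Gaussian and that noise is drawn independently for every parameter. The function name and the `seed` argument (used only to make the example reproducible) are hypothetical.

```python
import random
from typing import List, Optional


def add_gaussian_noise(update: List[float], sigma: float,
                       seed: Optional[int] = None) -> List[float]:
    """Add independent N(0, sigma) noise to every coordinate of a local update."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    return [u + rng.gauss(0.0, sigma) for u in update]


noisy = add_gaussian_noise([0.5, -1.0, 2.0], sigma=0.01, seed=0)
print(noisy)
```

A larger $\sigma$ masks more of a malicious update but perturbs benign updates equally, which is consistent with the main-accuracy loss reported in Tab. 6.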

Table 6: Performance of NBA in multiple-shot setting (8 adversaries) under DP defense
Accuracy $\rightarrow$, Dataset $\downarrow$ $MA$ $BA_1$ $BA_2$ $BA_3$ $BA_4$ $BA_5$ $BA_6$ $BA_7$ $BA_8$ $BA_{Avg}$
Fashion-MNIST 82.67 85.00 78.69 83.23 91.92 89.19 94.67 97.91 33.36 81.75
MNIST 97.02 95.64 89.28 70.38 88.09 70.96 84.48 95.01 44.64 79.81
CIFAR-10 41.66 59.43 72.26 52.92 65.04 78.11 43.52 65.38 12.98 56.21
Tiny-ImageNet 9.38 0.02 0.00 0.35 0.00 0.08 0.09 0.04 0.53 0.14

Effectiveness of defense mechanisms. Mainstream defenses, such as Norm Clipping and Differential Privacy, were not designed for scenarios in which multiple independent attackers each inject a unique trigger with its own target class. Moreover, existing research has not thoroughly evaluated these defenses in the presence of multiple non-cooperative backdoor attackers (the NBA setting). This highlights a critical gap in the current understanding of defense mechanisms for FL security.

4.5 Limitations

Our work explores various facets of the NBA scenario in FL systems, focusing on trigger design, dataset characteristics, and model updates. The effectiveness of backdoor triggers varies with dataset characteristics, and the scaling factor $\gamma$ significantly impacts backdoor accuracy and main task performance. Multiple-shot attacks require numerous rounds to achieve high backdoor accuracy, which is impractical in real-world scenarios, while semi-multiple-shot attacks can reduce the number of rounds but need precise tuning of $\gamma$. Our controlled evaluation might not fully capture real-world FL complexities, and our study focuses on limited trigger designs and defense mechanisms like norm clipping and differential privacy. Future research should explore more sophisticated attack strategies and novel defenses tailored to the non-i.i.d. nature of FL systems.

5 Conclusion

This paper explores the emerging threat of the NBA scenario in FL systems, where multiple independent clients can compromise the global model by injecting unique backdoor triggers and target classes. Our findings highlight the potential of watermarking-based backdoor triggers, which can be useful in cross-silo FL scenarios to protect the copyright of participants. This study lays the groundwork for future research focused on developing robust strategies to counteract backdoor attacks. Furthermore, investigating incentive structures that discourage malicious behavior and encourage cooperative participation is essential. By advancing in these areas, we can significantly improve the security, privacy, and integrity of FL platforms, thereby contributing to the creation of secure and trustworthy FL systems.

References

  • [1] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. In International conference on artificial intelligence and statistics, pages 2938–2948. PMLR, 2020.
  • [2] Yanbo Dai and Songze Li. Chameleon: Adapting to peer images for planting durable backdoors in federated learning. arXiv preprint arXiv:2304.12961, 2023.
  • [3] Khoa D Doan, Yingjie Lao, and Ping Li. Marksman backdoor: Backdoor attacks with arbitrary target class. Advances in Neural Information Processing Systems, 35:38260–38273, 2022.
  • [4] Pei Fang and Jinghui Chen. On the vulnerability of backdoor defenses for federated learning. arXiv preprint arXiv:2301.08170, 2023.
  • [5] Clement Fung, Chris JM Yoon, and Ivan Beschastnikh. The limitations of federated learning in sybil settings. In 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), pages 301–316, 2020.
  • [6] Xueluan Gong, Yanjiao Chen, Huayang Huang, Yuqing Liao, Shuai Wang, and Qian Wang. Coordinated backdoor attacks against federated learning with model-dependent triggers. IEEE Network, 36(1):84–90, 2022.
  • [7] Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020.
  • [8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [9] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007.
  • [10] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  • [11] Ya Le and Xuan Yang. Tiny imagenet visual recognition challenge. CS 231N, 7(7):3, 2015.
  • [12] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [13] Suyi Li, Yong Cheng, Wei Wang, Yang Liu, and Tianjian Chen. Learning to detect malicious clients for robust federated learning. arXiv preprint arXiv:2002.00211, 2020.
  • [14] Yige Li, Xingjun Ma, Jiabo He, Hanxun Huang, and Yu-Gang Jiang. Multi-trigger backdoor attacks: More triggers, more threats. arXiv preprint arXiv:2401.15295, 2024.
  • [15] Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Bin Wang, Jiqiang Liu, and Xiangliang Zhang. Poisoning with cerberus: stealthy and colluded backdoor attack against federated learning. In Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023.
  • [16] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
  • [17] Thomas P. Minka. Estimating a dirichlet distribution. Technical report, 2000.
  • [18] Dung Thuy Nguyen, Tuan Minh Nguyen, Anh Tuan Tran, Khoa D Doan, and Kok-Seng Wong. IBA: Towards irreversible backdoor attacks in federated learning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [19] Thien Duc Nguyen, Phillip Rieger, Roberta De Viti, Huili Chen, Björn B Brandenburg, Hossein Yalame, Helen Möllering, Hossein Fereidooni, Samuel Marchal, Markus Miettinen, et al. FLAME: Taming backdoors in federated learning. In 31st USENIX Security Symposium (USENIX Security 22), pages 1415–1432, 2022.
  • [20] Thuy Dung Nguyen, Tuan Nguyen, Phi Le Nguyen, Hieu H Pham, Khoa D Doan, and Kok-Seng Wong. Backdoor attacks and defenses in federated learning: Survey, challenges and future research directions. Engineering Applications of Artificial Intelligence, 127:107166, 2024.
  • [21] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  • [22] Phillip Rieger, Thien Duc Nguyen, Markus Miettinen, and Ahmad-Reza Sadeghi. Deepsight: Mitigating backdoor attacks in federated learning through deep model inspection. arXiv preprint arXiv:2201.00763, 2022.
  • [23] Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H. B. McMahan. Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963, 2019.
  • [24] Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, and Dimitris Papailiopoulos. Attack of the tails: Yes, you really can backdoor federated learning. Advances in Neural Information Processing Systems, 33:16070–16084, 2020.
  • [25] Yongkang Wang, Dihua Zhai, Yufeng Zhan, and Yuanqing Xia. Rflbat: A robust federated learning algorithm against backdoor attack. arXiv preprint arXiv:2201.03772, 2022.
  • [26] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
  • [27] Chulin Xie, Minghao Chen, Pin-Yu Chen, and Bo Li. Crfl: Certifiably robust federated learning against backdoor attacks. In International Conference on Machine Learning, pages 11372–11382. PMLR, 2021.
  • [28] Chulin Xie, Keli Huang, Pin Yu Chen, and Bo Li. Dba: Distributed backdoor attacks against federated learning. In 8th International Conference on Learning Representations, ICLR 2020, 2020.
  • [29] Zaixi Zhang, Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. Fldetector: Defending federated learning against model poisoning attacks via detecting malicious clients. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2545–2555, 2022.
  • [30] Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael Mahoney, Prateek Mittal, Ramchandran Kannan, and Joseph Gonzalez. Neurotoxin: Durable backdoors in federated learning. In International Conference on Machine Learning, pages 26429–26446. PMLR, 2022.
  • [31] Huadi Zheng, Haibo Hu, and Ziyang Han. Preserving user privacy for machine learning: Local differential privacy or federated machine learning? IEEE Intelligent Systems, 35(4):5–14, 2020.
  • [32] Huadi Zheng, Qingqing Ye, Haibo Hu, Chengfang Fang, and Jie Shi. Protecting decision boundary of machine learning model with differentially private perturbation. IEEE Transactions on Dependable and Secure Computing, 19(3):2007–2022, 2020.

This appendix provides an extended exploration of our research, providing additional details on methods and results. Appx. A details the training procedures and experimental settings used in our NBA experiments. Appx. B presents additional results on NBA performance with a single attacker. We then explore the impact of multiple attackers on NBA performance in Appx. C, showcasing results with varying attacker counts. Appx. D evaluates the effectiveness of defense mechanisms against NBA attacks, offering additional results. Finally, Appx. E and Appx. F discuss the limitations of our work, explore potential social implications, and suggest future research directions.

Appendix A Training details and experimental settings in NBA

A.1 Experiment setup

Datasets and hyperparameters. We use four datasets in our experiments: Fashion-MNIST [26], MNIST [12], CIFAR-10 [10], and Tiny-ImageNet [11]. These datasets are preprocessed and divided among different participants in the FL system with heterogeneous data distributions, set at $\alpha = 0.5$. To simulate the NBA scenario, we use a total of $N = 100$ clients, with $K = 10$ clients selected for aggregation in each round. During each round, each client trains for $E$ local epochs with a local learning rate $lr$ and a batch size of 128. The global learning rate $\eta$ is set to 0.1. The details of the FL training setup are shown in Table 7.

Table 7: NBA training details
Dataset Model used #Client ($K/N$) Benign $lr/E$ Poison $lr/E$ $\eta$
Fashion-MNIST 2 conv and 2 fc 10 / 100 0.1 / 2 0.05 / 6 0.1
MNIST 2 conv and 2 fc 10 / 100 0.1 / 2 0.05 / 6 0.1
CIFAR-10 ResNet-18 [8] 10 / 100 0.1 / 2 0.05 / 6 0.1
Tiny-ImageNet ResNet-18 [8] 10 / 100 0.1 / 2 0.05 / 6 0.1
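For readers reproducing this setup, the values of Table 7 can be collected in a small configuration object, as sketched below. The class and field names are hypothetical, and uniform sampling without replacement is an assumption about how the $K$ clients are chosen in each round.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FLConfig:
    """Hyperparameters from Table 7 (identical for all four datasets)."""
    num_clients: int = 100        # N: total clients
    clients_per_round: int = 10   # K: clients aggregated per round
    benign_lr: float = 0.1        # local learning rate of benign clients
    benign_epochs: int = 2        # local epochs E of benign clients
    poison_lr: float = 0.05       # local learning rate of adversaries
    poison_epochs: int = 6        # local epochs E of adversaries
    global_lr: float = 0.1        # eta: global learning rate
    batch_size: int = 128
    alpha: float = 0.5            # heterogeneity parameter of the data split


def sample_clients(cfg: FLConfig, rng: random.Random) -> List[int]:
    """Pick K distinct client indices for one round (assumed uniform sampling)."""
    return rng.sample(range(cfg.num_clients), cfg.clients_per_round)


cfg = FLConfig()
print(sample_clients(cfg, random.Random(0)))
```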

Computational Resources. All experiments are conducted on a server with an Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, 256GB RAM, and an NVIDIA GeForce RTX 3090 GPU with 24GB memory. The code is implemented in PyTorch [21], and the experiments follow the backdoor attack setup of prior work [1, 28]. We also use NumPy [7] and Matplotlib [9] for data processing and visualization.

A.2 Model Replacement Attack

In the Model Replacement Attack [1], the adversary aims to completely overwrite the global model $G^{t+1}$ with their malicious model, denoted by $X$, using the following equation:

$$X = G^{t} + \frac{\eta}{n}\sum_{i=1}^{m}\left(W_i^{t+1} - G^{t}\right). \tag{4}$$

Due to the non-independent and identically distributed (non-IID) nature of the training data, local models ($W_i^{t+1}$) may deviate significantly from the current global model ($G^{t}$). However, as the global model converges, these deviations tend to cancel each other out, i.e., $\sum_{i=1}^{m-1}\left(W_i^{t+1} - G^{t}\right) \approx 0$, meaning the sum of these deviations approaches zero.

Leveraging this cancellation effect, the adversary can solve for the malicious model they need to submit ($\widetilde{W}_m^{t+1}$) by:

$$\widetilde{W}_m^{t+1} = \frac{n}{\eta}X - \left(\frac{n}{\eta} - 1\right)G^{t} - \sum_{i=1}^{m-1}\left(W_i^{t+1} - G^{t}\right) \approx \frac{n}{\eta}\left(X - G^{t}\right) + G^{t}. \tag{5}$$
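The relationship between Eq. (4) and Eq. (5) can be checked numerically. The sketch below is illustrative only: it uses flat parameter lists, hypothetical function names, and the idealized case in which the benign deviations are exactly zero. It also sets $m = n = 10$ and $\eta = 0.1$; reading $n$ as the $K = 10$ clients aggregated per round gives $\frac{n}{\eta} = 100$, the same value as the single-shot $\gamma = 100$, but that link is an assumption and is not stated in the text.

```python
from typing import List

Vector = List[float]


def replacement_update(target: Vector, global_model: Vector, n: int, eta: float) -> Vector:
    """Approximate submission from Eq. (5): (n / eta) * (X - G^t) + G^t."""
    scale = n / eta
    return [scale * (x - g) + g for x, g in zip(target, global_model)]


def aggregate(global_model: Vector, local_models: List[Vector], n: int, eta: float) -> Vector:
    """Aggregation rule from Eq. (4): G^t + (eta / n) * sum_i (W_i^{t+1} - G^t)."""
    new_model = []
    for j, g in enumerate(global_model):
        total = sum(w[j] - g for w in local_models)
        new_model.append(g + (eta / n) * total)
    return new_model


G = [0.5, -1.0]          # current global model G^t
X = [1.0, 2.0]           # adversary's desired global model
n, eta = 10, 0.1         # assumed values (see lead-in)

benign = [list(G) for _ in range(n - 1)]        # idealized: benign deviations are zero
malicious = replacement_update(X, G, n, eta)    # scaled submission of the adversary
print(aggregate(G, benign + [malicious], n, eta))  # approximately equal to X
```

When the benign deviations do not cancel exactly, the aggregated model only approximates $X$, which is why Eq. (5) is stated with $\approx$.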

Appendix B Performance of main task with one adversary

B.1 Single-shot attack

The introduction of backdoor attacks in single-shot scenarios caused a substantial drop in main accuracy across all datasets during the attack round, with the most pronounced impacts observed in Tiny-ImageNet and MNIST, as illustrated in Tab. 8 and Fig. 8, indicating their higher susceptibility. Post-attack, the main accuracy largely recovered in Fashion-MNIST, MNIST, and CIFAR-10, suggesting that these models can regain performance with continued training. However, Tiny-ImageNet’s main accuracy remained significantly lower than the initial stage even after 100 training rounds, indicating a more lasting impact or greater difficulty in overcoming the attack. The consistent patterns across different triggers, with similar recovery trends in main accuracy, highlight the varying resilience of datasets to backdoor attacks in FL and underscore the need for tailored defense mechanisms to mitigate long-term impacts effectively.

Table 8: Performance of the main task with one adversary in single-shot after 100 rounds ($\gamma = 100$)
Accuracy $\rightarrow$, Dataset $\downarrow$ $MA_1$ $MA_2$ $MA_3$ $MA_4$ $MA_5$ $MA_6$ $MA_7$ $MA_8$ $MA_{Avg}$
Fashion-MNIST 82.15 82.51 82.30 82.49 82.38 82.29 82.28 82.19 82.32
MNIST 97.12 97.23 97.25 97.15 97.12 97.13 97.07 97.00 97.13
CIFAR-10 78.50 78.67 79.03 79.02 79.10 78.95 78.91 78.84 78.88
Tiny-ImageNet 29.04 28.16 27.22 27.64 29.66 27.49 23.95 27.20 27.54
Figure 8: Main accuracy of single-shot attack with one adversary

B.2 Multiple-shot attack

Table 9: Performance of the main task with one adversary in multiple-shot setting
Accuracy $\rightarrow$, Dataset $\downarrow$ $MA_1$ $MA_2$ $MA_3$ $MA_4$ $MA_5$ $MA_6$ $MA_7$ $MA_8$ $MA_{Avg}$
Fashion-MNIST 87.50 87.34 87.46 87.56 87.45 87.45 87.54 87.58 87.48
MNIST 98.82 98.87 98.86 98.88 98.86 98.86 98.92 98.83 98.86
CIFAR-10 75.97 77.04 77.34 77.79 77.79 77.74 77.86 76.40 77.24
Tiny-ImageNet 52.03 52.85 53.09 53.20 52.99 53.40 52.94 52.40 52.86
Figure 9: Main task accuracy of multiple-shot setting with one adversary

In the multiple-shot attack scenario, the main accuracy, as shown in Tab. 9 and Fig. 9, generally improved or remained stable after the attack period compared to the initial accuracy. Both Fashion-MNIST and MNIST showed an increase in main accuracy, suggesting that the models could recover and even benefit from continued training despite the attacks. CIFAR-10 experienced a slight improvement, indicating minimal negative impact and marginal gains. However, Tiny-ImageNet saw a slight decrease in main accuracy, highlighting its continued susceptibility to backdoor attacks. Overall, most datasets could recover or improve following the attack, except for Tiny-ImageNet, which underscores the need for stronger defense mechanisms in FL to protect more complex datasets.

Appendix C Non-Cooperative Backdoor Attacks with different numbers of adversaries

Table 10: Performance of NBA in multiple-shot ($\gamma = 1$) with different numbers of adversaries
Accuracy $\rightarrow$, Dataset $\downarrow$ #Atk $MA$ $BA_1$ $BA_2$ $BA_3$ $BA_4$ $BA_5$ $BA_6$ $BA_7$ $BA_8$ $BA_{Avg}$
Fashion-MNIST 1 86.58 97.38 - - - - - - - 97.38
2 86.13 86.31 82.44 - - - - - - 84.38
3 85.78 88.83 83.23 79.68 - - - - - 83.91
4 84.92 88.19 84.74 78.34 74.68 - - - - 81.49
5 85.00 87.86 84.07 85.04 84.32 97.44 - - - 87.75
6 84.61 87.53 85.03 87.93 88.52 93.72 97.50 - - 90.04
7 84.30 86.63 82.82 86.21 89.48 95.18 95.62 97.22 - 90.45
8 84.07 86.53 84.11 86.34 89.68 95.71 96.90 97.73 41.26 84.78
MNIST 1 98.72 96.82 - - - - - - - 96.82
2 98.72 96.40 83.60 - - - - - - 90.00
3 98.65 99.49 87.07 86.63 - - - - - 91.06
4 98.49 99.30 88.82 78.19 52.97 - - - - 79.82
5 98.42 99.28 87.23 87.53 76.99 88.77 - - - 87.96
6 98.31 99.18 86.43 91.58 80.41 83.68 88.93 - - 88.37
7 98.07 99.07 86.46 89.75 84.65 75.97 89.20 91.40 - 88.07
8 97.96 99.40 86.61 89.47 87.52 79.24 89.43 97.01 49.30 84.75
CIFAR-10 1 76.04 61.33 - - - - - - - 61.33
2 75.04 45.36 84.32 - - - - - - 64.84
3 75.34 31.97 81.30 78.18 - - - - - 63.81
4 75.77 41.00 85.97 78.82 80.20 - - - - 71.50
5 76.29 48.38 86.89 81.66 82.53 88.99 - - - 77.69
6 75.63 48.69 87.52 81.61 84.81 89.81 85.58 - - 79.67
7 76.62 46.60 87.82 79.88 83.86 89.18 80.01 92.76 - 80.01
8 76.21 55.87 88.93 84.69 86.78 89.99 86.50 94.09 27.19 76.75
Tiny-ImageNet 1 51.54 93.55 - - - - - - - 93.55
2 47.35 89.29 86.29 - - - - - - 87.79
3 45.19 89.91 4.48 1.94 - - - - - 32.11
4 43.75 77.42 1.66 0.20 2.39 - - - - 20.42
5 42.35 54.23 6.13 1.64 8.68 2.52 - - - 14.64
6 41.16 24.55 2.66 0.98 2.35 0.32 1.71 - - 5.43
7 40.66 0.44 0.04 0.10 0.10 0.01 0.03 0.01 - 0.10
8 38.32 0.01 0.00 0.05 0.03 0.00 0.00 0.00 0.04 0.02
Note: #Atk denotes the number of adversaries.

In the multiple-shot attack scenario with varying numbers of adversaries (see Tab. 10), the main and backdoor accuracies follow distinct trends across datasets. Fashion-MNIST and MNIST keep relatively stable main accuracies as the number of adversaries grows from one to eight (86.58% to 84.07% and 98.72% to 97.96%, respectively), indicating that the main task is resilient, while the average backdoor accuracy falls below its single-adversary level (97.38% to 84.78% and 96.82% to 84.75%). In CIFAR-10, the main accuracy remains consistent, but the average backdoor accuracy trends upward with more adversaries (61.33% with one adversary, 76.75% with eight), suggesting increased susceptibility to backdoor attacks. In contrast, Tiny-ImageNet experiences significant drops in both main accuracy (51.54% to 38.32%) and average backdoor accuracy (93.55% to 0.02%) as the number of adversaries increases, indicating heightened sensitivity to the number of adversaries. These findings underscore the importance of considering the number of adversaries when designing defense mechanisms in federated learning systems, as different datasets exhibit varying levels of resilience to multiple adversaries.
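The $BA_{Avg}$ column of Tab. 10 can be reproduced from the per-trigger columns. The sketch below does so for the Tiny-ImageNet rows, assuming that $BA_{Avg}$ is the arithmetic mean of the backdoor accuracies of the active adversaries; the table values are consistent with this reading up to rounding.

```python
from typing import Dict, List

# Per-trigger backdoor accuracies (%) for Tiny-ImageNet, copied from Table 10.
TINY_IMAGENET_BA: Dict[int, List[float]] = {
    1: [93.55],
    2: [89.29, 86.29],
    3: [89.91, 4.48, 1.94],
    4: [77.42, 1.66, 0.20, 2.39],
    5: [54.23, 6.13, 1.64, 8.68, 2.52],
    6: [24.55, 2.66, 0.98, 2.35, 0.32, 1.71],
    7: [0.44, 0.04, 0.10, 0.10, 0.01, 0.03, 0.01],
    8: [0.01, 0.00, 0.05, 0.03, 0.00, 0.00, 0.00, 0.04],
}


def average_ba(per_trigger: List[float]) -> float:
    """Arithmetic mean of the per-trigger backdoor accuracies."""
    return sum(per_trigger) / len(per_trigger)


for num_attackers, values in TINY_IMAGENET_BA.items():
    print(f"#Atk={num_attackers}: BA_Avg={average_ba(values):.2f}")
```

The per-trigger view shows that the collapse is uneven: with three to six adversaries, the first trigger still reaches a much higher backdoor accuracy than the others, while from seven adversaries on every trigger is close to zero.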

Appendix D NBA with eight adversaries in multiple-shot setting under different defenses

Refer to caption
Figure 10: NBA with eight adversaries in multiple-shot setting without defense
Refer to caption
Figure 11: NBA with eight adversaries in multiple-shot setting under Norm Clipping defense
Refer to caption
Figure 12: NBA with eight adversaries in multiple-shot setting under Differential Privacy defense

In the multiple-shot setting with eight adversaries, the main and backdoor accuracies show discernible patterns across datasets. Without defense mechanisms (see Figure 10), Fashion-MNIST and MNIST exhibit higher main accuracies than CIFAR-10 and Tiny-ImageNet. All datasets except Tiny-ImageNet, whose backdoor accuracies stay close to zero (Tab. 3), show high backdoor accuracies, indicating vulnerability to backdoor attacks. When employing the Norm Clipping defense method (see Figure 11), there are slight improvements in main accuracies across all datasets, with Fashion-MNIST and MNIST maintaining similar backdoor accuracies while CIFAR-10 and Tiny-ImageNet experience slight decreases in backdoor accuracies. The Norm Clipping defense appears to be more effective in mitigating backdoor attacks in CIFAR-10 and Tiny-ImageNet compared to Fashion-MNIST and MNIST. On the other hand, employing the Differential Privacy defense method (see Figure 12) results in noticeable drops in both main and backdoor accuracies across all datasets. This indicates that while Differential Privacy may offer some protection against backdoor attacks, it also significantly impacts the model's overall performance, particularly in CIFAR-10 and Tiny-ImageNet, where main accuracies decrease considerably. Therefore, the choice of defense mechanism should consider the trade-off between backdoor protection and maintaining overall model performance.

Appendix E Discussion

Our work delves into multiple aspects of the NBA scenario within FL systems, particularly emphasizing trigger design, dataset characteristics, and model updates. The effectiveness of backdoor triggers can vary widely depending on the dataset; some datasets may render the trigger ineffective, while others may facilitate highly effective backdoor attacks. Additionally, the scaling factor $\gamma$ plays a critical role in the success rate of these attacks. Increasing the scale of model updates can significantly enhance backdoor accuracy, yet this comes at the cost of potentially degrading the main task accuracy, leading to suboptimal overall performance.

In the context of multiple-shot attacks, adversaries must engage in a substantial number of rounds to achieve high backdoor accuracy, which may be impractical in real-world scenarios. However, in cross-silo FL scenarios, each participant can introduce unique backdoor triggers as a form of watermarking to protect their intellectual property. Semi-multiple-shot attacks offer a more efficient alternative by reducing the number of rounds required to attain high backdoor accuracy, though they require careful selection of the scaling factor $\gamma$ to ensure effectiveness and may not consistently maintain high accuracy throughout the attack.

Appendix F Societal impacts

Our research highlights the potential of NBA to compromise the integrity of FL systems. We believe this work serves as a crucial stepping stone towards a more secure future for FL. Through our work, we highlight the efficacy of watermarking-based triggers, presenting avenues for enhancing secure communication and detecting tampering within FL frameworks. Additionally, we explore the design of incentive structures, aiming to cultivate cooperative engagement while mitigating malicious activities. These endeavors are pivotal in establishing trust and fostering a secure ecosystem within FL platforms. By advancing research in these areas, we contribute substantially to the establishment of robust and trustworthy FL systems. Such efforts are crucial for safeguarding data privacy and integrity, thereby unleashing the full potential of FL to benefit society at large.