
Exotic and physics-informed support vector machines for high energy physics

A. Ramirez-Morales andres.ramirez@fisica.uaz.edu.mx Facultad de Física, Universidad Autónoma de Zacatecas, Apartado Postal C-580, 98060 Zacatecas, México    A. Gutiérrez-Rodríguez alexgu@fisica.uaz.edu.mx Facultad de Física, Universidad Autónoma de Zacatecas, Apartado Postal C-580, 98060 Zacatecas, México    T. Cisneros-Pérez tzihue@gmail.com Unidad Académica de Ciencias Químicas, Universidad Autónoma de Zacatecas, Apartado Postal C-585, 98060 Zacatecas, México    H. Garcia-Tecocoatzi hugo.garcia.tecocoatzi@ge.infn.it INFN, Sezione di Genova, Via Dodecaneso 33, 16146 Genova, Italy    A. Dávila-Rivera alejandra.davila@fisica.uaz.edu.mx Facultad de Física, Universidad Autónoma de Zacatecas, Apartado Postal C-580, 98060 Zacatecas, México
Abstract

In this article, we explore machine learning techniques using support vector machines with two novel approaches: exotic and physics-informed support vector machines. Exotic support vector machines employ unconventional techniques such as genetic algorithms and boosting. Physics-informed support vector machines integrate the physics dynamics of a given high-energy physics process in a straightforward manner. The goal is to efficiently distinguish signal and background events in high-energy physics collision data. To test our algorithms, we perform computational experiments with simulated Drell-Yan events in proton-proton collisions. Our results highlight the superiority of the physics-informed support vector machines, emphasizing their potential in high-energy physics and promoting the inclusion of physics information in machine learning algorithms for future research.

I INTRODUCTION

Machine learning techniques have proven to be extremely powerful when applied to high energy physics phenomena, in both theoretical and experimental studies [1, 2, 3]. Several algorithms have been applied to distinguish signals in high energy collider data [4, 5]. For instance, the discovery of the Higgs boson was aided by the so-called boosted decision trees algorithm [6]. Other popular machine learning algorithms that have been successful in high energy physics include neural networks [7, 8, 9, 10], linear regressions [8, 11, 12], and deep learning [13, 14, 15, 16].

To continue exploiting the potential of machine learning techniques, the idea that physics insights can help design better machine learning algorithms has recently been used across several fields, yielding excellent results. This field is known as physics-informed machine learning [17]. The majority of physics-informed machine learning studies rely on advanced neural network architectures. Moreover, support vector machines (SVMs), which are based on kernel methods, have also benefited from these physics insights. The physics information is introduced into the SVMs via their kernels, which improves the SVMs' performance [18].

In the realm of high energy physics, physics-informed neural networks and deep learning techniques have been proposed to tackle the most challenging data analysis tasks in high energy physics experiments, ranging from searches for new physics phenomena to jet tagging [19, 20, 21, 22]. SVMs have also been helpful and interesting for the high energy physics community [23, 24, 25, 26, 1, 27]. However, there are no reports of physics-informed support vector machines applied to high energy physics phenomena. This invites the exploration of SVMs in the context of physics-informed machine learning.

This paper is focused on the application and interpretation of the SVMs in experimental high energy physics. The use of support vector machines is motivated by their relatively simple geometric interpretation, especially for binary discrimination of signal events against background events. First, we study what we call exotic support vector machines. These SVMs are exotic in the sense that we utilize unconventional techniques to build them. That is, we use genetic and boosting algorithms to construct more efficient classifiers. Moreover, we use somewhat unconventional kernels. The construction of the exotic SVMs is guided by our previous studies [28]. Second, we study physics-informed support vector machines. To include high energy physics information in our SVMs, we propose kernels that define the SVM and aim to capture the dynamical properties of the underlying theory that intends to describe the observed/expected data in high energy experiments.

We perform a case study: the Drell-Yan $Z$ boson production in high energy proton-proton collisions. In our studies, we simulate data for the process $q\bar{q}\rightarrow Z\rightarrow l^{+}l^{-}$, where $q$ and $\bar{q}$ are the quark and antiquark coming from the colliding protons and $l^{+}l^{-}$ are the oppositely charged final-state leptons. Using the kinematic variables of these final-state leptons, we construct the kernels that define the SVM in every case. We then perform formal statistical tests to compare the performance of each SVM. These tests help us assess the usefulness of introducing the dynamics into a support vector machine algorithm.

In Sect. II we summarize the formalism of support vector machines, the basic kernel theory, and the definition of the considered kernels. Furthermore, we describe the genetic algorithms and boosting techniques used, and how the physics dynamics of a given high energy physics process is introduced into a support vector machine algorithm. In Sect. III we present the computational experiments to train and test our proposed support vector machines. In Sect. IV we present our results and discussion. Finally, in Sect. V we present our conclusions.

II Methodology

We propose that if the theory underlying the dynamics of a physics process studied in high-energy experiments is included in the construction of the kernel that defines the support vector machine, then the discrimination capabilities of the support vector machine binary classifier will be significantly enhanced. We then compare the physics-informed SVMs with state-of-the-art SVMs. The following sections describe the ingredients of this proposal.

II.1 Support vector machines

In a binary SVM classifier, an optimal hyperplane separating two classes in the feature space is found [29]. Binary classification is important in experimental high energy physics, as it helps discriminate signals of interest from background. During optimization, the SVM model selects a subset of support vectors (SVs) from the training samples, $\mathbf{x}$, to establish the location of the decision surface. To simplify the search for SVs, the training samples are mapped into a high-dimensional space using kernel functions, $\kappa(\mathbf{x},\mathbf{z})$, which are expressed as inner products of the training samples or their mappings. In this feature space, a specific kernel produces a hyperplane that assigns a prediction $\mathbf{y}$ to each element of $\mathbf{x}$ based on the side of the hyperplane on which that element lies. The kernel functions solve the optimization problem without explicitly using the actual mappings, a technique known as the kernel trick. Since data may not be perfectly separable and some points may lie within the margin or be misclassified, SVM implementations allow for a certain degree of misclassification by introducing an adjustable penalty cost $C$ [29, 28]. An SVM classifier is defined by its kernel and the parameters that describe the kernel. Kernel theory in machine learning allows the construction of a broad diversity of kernels from elementary kernel properties.
Let $\kappa_{1}$ and $\kappa_{2}$ be kernels over $\mathbf{x}\otimes\mathbf{z}$, where $\mathbf{x,z}\subseteq\mathbb{R}^{n}$, let $a\in\mathbb{R}^{+}$, and let $\kappa_{3}$ be a kernel over $\mathbb{R}^{n}\otimes\mathbb{R}^{n}$. Then the following functions are kernels as well [30]:

\begin{align}
\kappa(\mathbf{x},\mathbf{z}) &= \kappa_{1}(\mathbf{x},\mathbf{z})+\kappa_{2}(\mathbf{x},\mathbf{z}) \tag{1}\\
\kappa(\mathbf{x},\mathbf{z}) &= a\,\kappa_{1}(\mathbf{x},\mathbf{z}) \notag\\
\kappa(\mathbf{x},\mathbf{z}) &= \kappa_{1}(\mathbf{x},\mathbf{z})\,\kappa_{2}(\mathbf{x},\mathbf{z}) \notag\\
\kappa(\mathbf{x},\mathbf{z}) &= f(\mathbf{x})\,f(\mathbf{z}) \notag\\
\kappa(\mathbf{x},\mathbf{z}) &= \kappa_{3}(\phi(\mathbf{x}),\phi(\mathbf{z})) \notag
\end{align}

where $f,\phi:\mathbf{x}\rightarrow\mathbb{R}^{n}$.
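The closure properties in Eq. (1) can be checked numerically on a small sample. The following is a minimal sketch, assuming NumPy; the helper names (`gram`, `is_psd`), the toy data, and the particular choices of $f$ and $\phi$ are illustrative assumptions and are not taken from this paper. Here $f$ is taken to be real-valued so that the product $f(\mathbf{x})f(\mathbf{z})$ is a scalar.

```python
import numpy as np

def linear(x, z):
    # kappa_1: plain inner product
    return float(np.dot(x, z))

def rbf(x, z, gamma=0.5):
    # kappa_2: Gaussian kernel (gamma chosen arbitrarily for this check)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

def gram(kernel, X):
    # Gram matrix G_ij = kernel(x_i, x_j)
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

def is_psd(G, tol=1e-8):
    # A valid kernel yields a symmetric, positive semidefinite Gram matrix
    # (up to floating-point round-off, hence the tolerance).
    scale = max(1.0, float(np.abs(G).max()))
    symmetric = np.allclose(G, G.T)
    return bool(symmetric and np.linalg.eigvalsh(G).min() >= -tol * scale)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # hypothetical toy sample

f = lambda x: float(np.sum(x))        # assumed real-valued function
phi = lambda x: np.tanh(x)            # assumed feature map

composed = {
    "sum": lambda x, z: linear(x, z) + rbf(x, z),
    "scaled": lambda x, z: 2.5 * linear(x, z),
    "product": lambda x, z: linear(x, z) * rbf(x, z),
    "f(x)f(z)": lambda x, z: f(x) * f(z),
    "kappa3(phi)": lambda x, z: rbf(phi(x), phi(z)),
}
checks = {name: is_psd(gram(k, X)) for name, k in composed.items()}
print(checks)
```

A numerical check of this kind does not prove that a function is a kernel; it only fails to find a counterexample on the sampled points.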

II.2 Basic kernels

In this context, a kernel is a Hermitian and positive semidefinite Gram matrix $G$ defined as $G=[\langle v_{j},v_{i}\rangle]_{i,j=1}^{n}$, where the vectors $v_{1},\ldots,v_{n}$ live in a vector space that contains an inner product $\langle\cdot,\cdot\rangle$ [31]. To make the notation more compact, we write $G=\kappa(\mathbf{x},\mathbf{z})=\langle\mathbf{x},\mathbf{z}\rangle$, with $\mathbf{x,z}\subseteq\mathbb{R}^{n}$. This paper considers the kernels:

  • Linear kernel

    \begin{equation}
    \kappa(\mathbf{x},\mathbf{z})=\langle\mathbf{x},\mathbf{z}\rangle, \tag{2}
    \end{equation}

    with no hyper-parameters.

  • Radial Basis Function (RBF) kernel

    \begin{equation}
    \kappa(\mathbf{x},\mathbf{z})=\exp(-\gamma\|\mathbf{x}-\mathbf{z}\|^{2}), \tag{3}
    \end{equation}

    with hyper-parameter $\gamma$.

  • Sigmoid kernel

    \begin{equation}
    \kappa(\mathbf{x},\mathbf{z})=\tanh(\gamma\langle\mathbf{x},\mathbf{z}\rangle+r), \tag{4}
    \end{equation}

    with hyper-parameters $\gamma$ and $r$.

  • Polynomial kernel

    \begin{equation}
    \kappa(\mathbf{x},\mathbf{z})=(\gamma\langle\mathbf{x},\mathbf{z}\rangle+r)^{d}, \tag{5}
    \end{equation}

    with hyper-parameters $\gamma$, $r$ and $d$.

For the sigmoid kernel, $r=-1$. For the polynomial kernel, $r=+1$ and $d=2$. Finally, we set a high $\gamma$ value, $\gamma=100$, so that each training vector has a non-negligible impact. The chosen values of the hyper-parameters $\gamma$, $r$, and $d$ ensure good behavior when fitting an SVM [33, 32].
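As an illustration, the four kernels of Eqs. (2)-(5) with the hyper-parameter values quoted above can be written as Gram-matrix functions. This is a minimal sketch, assuming NumPy and samples stored as rows of a two-dimensional array; the function and constant names are hypothetical and do not come from the code used for this paper.

```python
import numpy as np

# Hyper-parameter values quoted in the text
GAMMA = 100.0
R_SIGMOID = -1.0
R_POLY = 1.0
DEGREE = 2

def linear_kernel(X, Z):
    # Eq. (2): <x, z> for every pair of rows
    return X @ Z.T

def rbf_kernel(X, Z, gamma=GAMMA):
    # Eq. (3): exp(-gamma * ||x - z||^2), using the expanded squared norm
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * (X @ Z.T)
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def sigmoid_kernel(X, Z, gamma=GAMMA, r=R_SIGMOID):
    # Eq. (4): tanh(gamma * <x, z> + r)
    return np.tanh(gamma * (X @ Z.T) + r)

def polynomial_kernel(X, Z, gamma=GAMMA, r=R_POLY, d=DEGREE):
    # Eq. (5): (gamma * <x, z> + r)^d
    return (gamma * (X @ Z.T) + r) ** d
```

Each function returns the full matrix of kernel values between the rows of `X` and the rows of `Z`, which is the form expected by SVM libraries that accept precomputed or callable kernels.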

The kernels in Eqs. (2)-(5), together with the properties in Eq. (1), make it possible to define composite kernels of almost arbitrary shape and hence help include the properties of a given physics process.

II.3 Exotic support vector machines

To construct exotic support vector machines we use and combine three elements:

  • Unconventional kernels. We use the kernels of Eqs. (2)-(5) arbitrarily joined according to Eq. (1). These kernels inherently do not carry any physical information beforehand.

  • Ensembles of classifiers. An ensemble of classifiers is a collection of single weak classifiers that, when combined, provide a strong classifier [34, 35]. In this work, we use the AdaBoost algorithm [36] to construct ensembles. This adaptive method updates the vector\footnote{In this context, a vector is a point of the data sample.} weights based on the training error of a given binary classifier. These weights are used to train the next classifier to be added to the ensemble. Correctly classified vectors are assigned lower weights, whilst misclassified vectors are given higher weights. Thus, vectors that are harder to classify receive more focus from the algorithm. The AdaBoost algorithm is repeated $T$ times, $t=1,\ldots,T$. First, for the data true label $y_{i}$ and the base classifier prediction $h_{t}(\mathbf{x}_{i})$, the training error $\epsilon_{t}$ is calculated

    \begin{equation}
    \epsilon_{t}=\sum_{i=1}^{n}w_{i}^{t};\qquad y_{i}\neq h_{t}(\mathbf{x}_{i}), \tag{6}
    \end{equation}

    where $w_{i}^{t}$ are the weights of each vector $\mathbf{x}_{i}$ utilized to train the classifier, and the sum runs only over the misclassified vectors, $y_{i}\neq h_{t}(\mathbf{x}_{i})$. Then, the score $\alpha_{t}$ is defined as

    \begin{equation}
    \alpha_{t}=\frac{1}{2}\ln\frac{1-\epsilon_{t}}{\epsilon_{t}}\;. \tag{7}
    \end{equation}

    The weights are updated for the next iteration with

    \begin{equation}
    w_{i}^{t+1}=w_{i}^{t}e^{[-\alpha_{t}y_{i}h_{t}(\mathbf{x}_{i})]}\times A_{t}, \tag{8}
    \end{equation}

    where $A_{t}$ is a normalization factor. The weights in Eq. (8) are applied to train and add a new classifier to the ensemble. When $T$ iterations are completed, the predicted label of the total ensemble is the weighted sum of the predictions of the individual classifiers within the ensemble

    \begin{equation}
    H(\mathbf{x})=\sum_{t=1}^{T}\alpha_{t}h_{t}(\mathbf{x}). \tag{9}
    \end{equation}

  • Genetic algorithms. Genetic algorithms are optimization techniques inspired by the principles of biological evolution. Selections are performed using simple operators based on genetic recombinations and mutations. In this work, we use genetic algorithms to select a small subset of the training data that is likely to contain the support vectors needed to solve the binary classification problem [37]. To determine whether a subgroup of vectors is indeed likely to contain the support vectors, a fitness function is calculated to check how well a classifier built from this subgroup classifies data outside it. This is repeated for several subgroups of vectors, and a selection of subgroups is performed using the high-low method [38]. The selected subgroups are recombined and the previous steps are repeated until a given stop criterion is satisfied. For more details, see Ref. [28].
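The AdaBoost loop of Eqs. (6)-(9) can be sketched as follows. This is a minimal illustration, assuming NumPy, labels in $\{-1,+1\}$, and a weighted decision stump as a stand-in weak classifier; the ensembles in this paper are built from SVMs, so the stump, the toy data, and all function names are assumptions made only to keep the example self-contained.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: a simple stand-in for the weak classifier h_t."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= thr, sign, -sign)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda A: np.where(A[:, j] >= thr, sign, -sign)

def adaboost(X, y, T=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                       # uniform initial weights
    ensemble = []
    for _ in range(T):
        h = fit_stump(X, y, w)
        pred = h(X)
        eps = w[pred != y].sum()                  # Eq. (6): weighted training error
        eps = np.clip(eps, 1e-12, 1.0 - 1e-12)    # guard against log(0)
        alpha = 0.5 * np.log((1.0 - eps) / eps)   # Eq. (7): classifier score
        w = w * np.exp(-alpha * y * pred)         # Eq. (8): reweight the vectors
        w = w / w.sum()                           # normalization factor A_t
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, X):
    H = sum(alpha * h(X) for alpha, h in ensemble)  # Eq. (9): weighted vote
    return np.where(H >= 0, 1, -1)

# Hypothetical toy data: the label is the sign of x1 + x2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

ensemble = adaboost(X, y, T=20)
accuracy = float(np.mean(predict(ensemble, X) == y))
print(f"training accuracy: {accuracy:.3f}")
```

Replacing `fit_stump` with a routine that trains a weighted SVM would bring the sketch closer to the ensembles described above, provided the SVM implementation accepts per-sample weights.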

II.4 The Drell-Yan process

Based on the parton model and the quark-antiquark annihilation mechanism, Sidney D. Drell and Tung-Mow Yan [39] predicted the production of two oppositely charged leptons in hadron-hadron collisions. The neutral dilepton pair was predicted to appear with a large invariant mass. This production is the well-known neutral current Drell-Yan process. For proton-proton collisions, the partons participating in the Drell-Yan production are the quarks and antiquarks that constitute the protons. The tree-level, or leading-order, partonic cross-section of the $q\bar{q}\rightarrow Z$ process is found to be [40]

\begin{equation}
\hat{\sigma}^{q\bar{q}\rightarrow Z}=\frac{\pi}{3}\sqrt{2}\,G_{F}M_{Z}^{2}(v_{q}^{2}+a_{q}^{2})\,\delta(\hat{s}-M_{Z}^{2}), \tag{10}
\end{equation}

where $G_{F}$ is the Fermi weak coupling constant, $M_{Z}$ is the invariant mass of the $Z$ boson, $v_{q}$ ($a_{q}$) is the vector (axial vector) coupling of the $Z$ to the quarks, and $\hat{s}$ is the square of the center-of-mass energy of the quark-antiquark pair.

A quark with charge $Q_{k}$ inside a proton is described by a parton distribution function $q_{k}$. Considering all the proton parton distribution functions and with the aid of the QCD factorization theorem, it is found that the hadronic (proton-proton) cross-section for the Drell-Yan process is

\begin{align}
\frac{d\sigma^{pp\rightarrow Z}}{dM_{Z}^{2}} &= \frac{\hat{\sigma}^{q\bar{q}\rightarrow Z}}{N_{c}}\int_{0}^{1}dx_{1}\,dx_{2}\,\delta(x_{1}x_{2}s-M_{Z}^{2}) \notag\\
&\quad\times\Big[\sum_{k}Q_{k}^{2}\big(q_{k}(x_{1},M_{Z}^{2})\,\bar{q}_{k}(x_{2},M_{Z}^{2})+[1\leftrightarrow 2]\big)\Big], \tag{11}
\end{align}

where $1/N_{c}=1/3$ is the color factor. The momentum fractions $x_{1,2}$ are defined in terms of the four-momentum of each parton, with the two partons moving in opposite directions along the beam axis,

\begin{align}
p_{1}^{\mu} &= \frac{\sqrt{s}}{2}(x_{1},0,0,x_{1}), \tag{12}\\
p_{2}^{\mu} &= \frac{\sqrt{s}}{2}(x_{2},0,0,-x_{2}). \tag{13}
\end{align}

From Eqs. (12)-(13) it is found that $\hat{s}=x_{1}x_{2}s$, where $s$ is the square of the proton-proton center-of-mass energy. For the produced lepton pair, the rapidity is given by $y=\frac{1}{2}\ln(x_{1}/x_{2})$, and hence

\begin{equation}
x_{1}=\frac{M_{Z}}{\sqrt{s}}\,\exp(y),\qquad x_{2}=\frac{M_{Z}}{\sqrt{s}}\,\exp(-y). \tag{14}
\end{equation}

The cross-section of Eq. (11) is multiplied by the branching ratio for any particular hadronic or leptonic final state of interest, which for this paper is the dielectron final state, namely, $q\bar{q}\rightarrow Z\rightarrow e^{+}e^{-}$.
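The relations in Eq. (14) can be verified with a short numerical example. The sketch below assumes the $Z$ boson mass of 91.1876 GeV and a proton-proton center-of-mass energy of 14 TeV, which are the values used for the simulation in this paper; the rapidity value $y=1$ and the function name are arbitrary choices for illustration.

```python
import math

M_Z = 91.1876       # GeV, Z boson mass
SQRT_S = 14000.0    # GeV, proton-proton center-of-mass energy (14 TeV)

def momentum_fractions(y, m=M_Z, sqrt_s=SQRT_S):
    # Eq. (14): parton momentum fractions for a dilepton system of mass m and rapidity y
    x1 = m / sqrt_s * math.exp(y)
    x2 = m / sqrt_s * math.exp(-y)
    return x1, x2

x1, x2 = momentum_fractions(y=1.0)

# Consistency checks against the surrounding relations
s_hat = x1 * x2 * SQRT_S**2            # s_hat = x1 * x2 * s should equal M_Z^2
rapidity = 0.5 * math.log(x1 / x2)     # y = (1/2) ln(x1 / x2) should return 1.0

print(x1, x2, s_hat, rapidity)
```

Both fractions are small at this energy, which reflects that $Z$ production at 14 TeV probes partons carrying only a small share of the proton momentum.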

The proton-proton cross-section in Eq. (11) is a function of the kinematics of the outgoing leptons. The kernel for our support vector machines is therefore constructed in accordance with Eqs. (10)-(14) in the following way. First, we identify the matrix of the proton-proton collision data as the kernel. Then, we perform operations on this kernel according to the relevant kinematic variables of the final state leptons in the cross-section. With this information, the kernel is expected to discriminate Drell-Yan events from backgrounds. Taking into account the kernel properties in Eq. (1), we propose a physics-informed kernel

\begin{equation}
\kappa(\mathbf{x},\mathbf{z})=\gamma\big(\langle\mathbf{x},\mathbf{z}\rangle^{2}+\langle\mathbf{x},\mathbf{z}\rangle+\langle\mathbf{x},\mathbf{z}\rangle\cdot\exp(\langle\mathbf{x},\mathbf{z}\rangle)\big). \tag{15}
\end{equation}

The terms in Eq. (15) are intended to capture the physics in Eqs. (10)-(14) as

\begin{align}
\langle\mathbf{x},\mathbf{z}\rangle^{2} &\sim M_{Z}^{2}, \tag{16}\\
\langle\mathbf{x},\mathbf{z}\rangle &\sim M_{Z}, \tag{17}\\
\langle\mathbf{x},\mathbf{z}\rangle\cdot\exp(\langle\mathbf{x},\mathbf{z}\rangle) &\sim \frac{M_{Z}}{\sqrt{s}}\cdot\exp(\pm y). \tag{18}
\end{align}

In Eqs. (16)-(18), when the $Z$ boson decays to an electron-positron pair, $M_{Z}$ is calculated from the kinematics of this electron-positron pair.
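The physics-informed kernel of Eq. (15) can be written as a Gram-matrix function. This is a minimal sketch, assuming NumPy; the function name, the value $\gamma=1$, and the toy features are hypothetical. It also assumes that the features are scaled to order unity so that the exponential does not overflow, a preprocessing choice that is not specified in this section.

```python
import numpy as np

def phys_dy_kernel(X, Z, gamma=1.0):
    # Eq. (15): gamma * (<x,z>^2 + <x,z> + <x,z> * exp(<x,z>)),
    # evaluated for every pair of rows of X and Z.
    ip = X @ Z.T
    return gamma * (ip**2 + ip + ip * np.exp(ip))

# Hypothetical, roughly unit-scale feature vectors
rng = np.random.default_rng(1)
X = 0.5 * rng.normal(size=(15, 4))

K = phys_dy_kernel(X, X)
eig_min = float(np.linalg.eigvalsh(K).min())
print(K.shape, eig_min)
```

Each term is built from the linear kernel with the sum and product rules of Eq. (1), and the exponential expands into a power series with positive coefficients, so for $\gamma>0$ the Gram matrix is expected to be positive semidefinite up to round-off. A function of this form can be supplied to SVM libraries that accept a callable kernel, for example scikit-learn's `SVC`; whether that library was used here is not stated in this section.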

III Experiments

To test our proposed methodology, we perform computational experiments on a well-known Standard Model process, namely the production of a $Z$ boson decaying to an electron-positron pair (Drell-Yan production). We train, test, and compare several support vector machine binary classifiers to characterize their power to discriminate the Drell-Yan process from backgrounds.

III.1 Data simulation

In this work, we consider simulated Drell-Yan signal and background events. The simulated data are at the generator level, that is, no detector effects are taken into account. The simulation is carried out with PYTHIA8.3 [41]. The event generation uses the PYTHIA configuration for the production of weak single and double bosons in proton-proton collisions at a center-of-mass energy of $\sqrt{s}=14$ TeV. For the signal events, we require that the event contains a particle with the PDGid [42] corresponding to the $Z$ boson. We then require that this particle's invariant mass is consistent with the $Z$ boson mass (91.1876 GeV) within a width of 40 GeV. We also require two oppositely charged leptons in the final state whose mother particle is the selected $Z$. The kinematics of these charged leptons are the variables used to construct the kernels of the support vector machines. In this study, we consider the backgrounds that are most important for Drell-Yan $Z$ production, as reported by the ATLAS and CMS experiments at the Large Hadron Collider [43, 44]. The considered backgrounds are diboson ($WW$, $ZW$, $ZZ$), $t\bar{t}$, and single top production. These backgrounds are expected because their final states may mimic the charged leptons of the single $Z$ boson final state. The event selection for the backgrounds is similar to that for the single $Z$ boson. In this work, we do not consider multijet backgrounds, as they are expected to be negligible ($<0.1\%$) [43, 44]. Since the events are simulated with no detector effects, the samples have a high purity and there is no need to consider variables that are used to handle mismodelling, particle identification, lepton isolation, or acceptance effects.
Figure 1 shows the invariant mass of the $Z$ boson calculated from the kinematics of the final state electron-positron pair. Furthermore, we consider the following electron-positron kinematic quantities: energy, momentum, transverse momentum, rapidity, and azimuthal angle. These quantities are used to build the kernels for the SVMs.

Figure 1: Invariant mass of the $Z$ boson calculated from the kinematics of the final state electron-positron pair coming from the simulated Drell-Yan events. The simulation was carried out with the PYTHIA8.3 event generator [41].
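The invariant mass of the electron-positron pair follows from the sum of the two four-momenta, $m^{2}=(E_{1}+E_{2})^{2}-|\mathbf{p}_{1}+\mathbf{p}_{2}|^{2}$ in natural units. The sketch below assumes NumPy and four-momenta given as $(E, p_x, p_y, p_z)$ in GeV; the function name and the example momenta are hypothetical and chosen so that the pair reconstructs the $Z$ boson mass.

```python
import numpy as np

def invariant_mass(p4_a, p4_b):
    """Invariant mass (GeV) of two particles given as (E, px, py, pz) in GeV."""
    E, px, py, pz = np.asarray(p4_a, dtype=float) + np.asarray(p4_b, dtype=float)
    m2 = E**2 - (px**2 + py**2 + pz**2)
    # Clip tiny negative values caused by floating-point round-off
    return float(np.sqrt(max(m2, 0.0)))

# Hypothetical back-to-back, effectively massless electron and positron
electron = (45.5938, 45.5938, 0.0, 0.0)
positron = (45.5938, -45.5938, 0.0, 0.0)

m_ee = invariant_mass(electron, positron)
print(m_ee)
```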

III.2 Data splitting

In high energy physics, class imbalance in the data sample is a common challenge. Hence, in this study, we consider different levels of imbalance between the signal and background events. Conventionally, in binary classification tasks for high energy physics, a positive value labels a signal event and a negative value labels a background event, these values being $\pm 1$. We consider the case in which the data sample is fully balanced and the cases in which the signal:background ratio is 1:3, 1:10, 3:1, and 10:1. This is summarized in Table 1.

Table 1: Drell-Yan data sets for the experiments.
\begin{tabular}{lrrc}
Sample & $+1$ class & $-1$ class & Imbalance \\
\hline
half\_half & 5000 & 5000 & 1:1 \\
1quart\_3quart & 2500 & 7500 & 1:3 \\
3quart\_1quart & 7500 & 2500 & 3:1 \\
1dec\_10dec & 1000 & 10000 & 1:10 \\
10dec\_1dec & 10000 & 1000 & 10:1 \\
\end{tabular}
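The samples in Table 1 could be assembled along the following lines. This is only a sketch: the helper `build_sample`, the random placeholder event pools, and the choice of five features per event are hypothetical and stand in for the simulated lepton kinematics; only the event counts are copied from Table 1.

```python
import numpy as np

# (signal events, background events) for each sample, copied from Table 1.
SAMPLES = {
    "half_half": (5000, 5000),
    "1quart_3quart": (2500, 7500),
    "3quart_1quart": (7500, 2500),
    "1dec_10dec": (1000, 10000),
    "10dec_1dec": (10000, 1000),
}


def build_sample(signal, background, name, seed=0):
    """Draw the Table 1 event counts without replacement and attach labels."""
    n_sig, n_bkg = SAMPLES[name]
    rng = np.random.default_rng(seed)
    sig = signal[rng.choice(len(signal), size=n_sig, replace=False)]
    bkg = background[rng.choice(len(background), size=n_bkg, replace=False)]
    X = np.vstack([sig, bkg])
    y = np.concatenate([np.ones(n_sig, dtype=int), -np.ones(n_bkg, dtype=int)])
    order = rng.permutation(len(y))  # shuffle so the classes are interleaved
    return X[order], y[order]


# Placeholder event pools (random numbers, not physics events).
rng = np.random.default_rng(1)
signal_pool = rng.normal(size=(10000, 5))
background_pool = rng.normal(size=(10000, 5))
X, y = build_sample(signal_pool, background_pool, "1dec_10dec")
```

Signal events carry the label $+1$ and background events the label $-1$, following the convention stated in the text.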

III.3 Support vector machine models

The support vector machines we study in this paper are summarized in Table 2. The models listed in this table are based on the definitions in Sections II.1-II.4. The phys-DY model employs a kernel that incorporates the Drell-Yan dynamics, as detailed in Eqs. (10)-(14) and summarized in Eq. (15). Models with lin, rbf, pol, or sig in their names utilize the kernels specified in Eq. (2), Eq. (3), Eq. (5), and Eq. (4), respectively. Models featuring adaboost are ensembles constructed using the AdaBoost algorithm described in Sec. II.3, following Eqs. (6)-(9). Models marked with gen use genetic selection as discussed in Sec. II.3. Finally, single and sum indicate that the kernel consists of a single element or the sum of two kernels, respectively. In addition to the physics-informed support vector machine, the classifiers listed in this table are chosen for their outstanding performance in preliminary tests in agreement with our previous study in Ref. [28].

Table 2: SVM models considered in this paper. The first column gives the name of the model, and the second provides a brief description of the elements considered to construct it.
\begin{tabular}{ll}
Name & Description \\
\hline
phys-DY & Single with physics-informed kernel \\
adaboost-gen-rbf & AdaBoost ensemble with genetic selection and RBF kernel \\
adaboost-gen-pol & AdaBoost ensemble with genetic selection and polynomial kernel \\
adaboost-gen-sig & AdaBoost ensemble with genetic selection and sigmoid kernel \\
single-rbf & Single RBF kernel \\
single-lin & Single linear kernel \\
single-pol & Single polynomial kernel \\
single-sig & Single sigmoid kernel \\
single-sum-rbf-lin & Sum of RBF and linear kernels \\
single-sum-rbf-pol & Sum of RBF and polynomial kernels \\
adaboost-rbf & AdaBoost ensemble with RBF kernel \\
adaboost-pol & AdaBoost ensemble with polynomial kernel \\
adaboost-lin & AdaBoost ensemble with linear kernel \\
adaboost-sig & AdaBoost ensemble with sigmoid kernel \\
\end{tabular}
Figure 2: Performance metrics of the SVMs. The vertical axes show the average values of ACC, PREC$^{+}$, PREC$^{-}$, and AUC, as defined in Sec. III.4 [Eqs. (19)-(21)]. Error bars are not displayed. The horizontal axis indicates the SVM under consideration, as listed in Table 2. Each solid line corresponds to one of the samples listed in Table 1.
\begin{tabular}{lcccccccccccc}
Model/Sample & $\mu_{AUC}(\sigma)$ & $p$-val. & R.$H_0$ & $\mu_{PREC^{+}}(\sigma)$ & $p$-val. & R.$H_0$ & $\mu_{PREC^{-}}(\sigma)$ & $p$-val. & R.$H_0$ & $\mu_{ACC}(\sigma)$ & $p$-val. & R.$H_0$ \\
\hline
\multicolumn{13}{l}{half\_half} \\
phys-DY & 0.98 (0.01) & & & 0.86 (0.02) & & & 0.97 (0.01) & & & 0.91 (0.01) & & \\
adaboost-gen-rbf & 0.67 (0.01) & 0.0 & ✓✓ & 0.73 (0.04) & 0.0 & ✓✓ & 0.64 (0.02) & 0.0 & ✓✓ & 0.67 (0.01) & 0.0 & ✓✓ \\
single-rbf & 0.84 (0.01) & 0.0 & ✓✓ & 0.83 (0.02) & 0.0 & ✓✓ & 0.72 (0.03) & 0.0 & ✓✓ & 0.76 (0.02) & 0.0 & ✓✓ \\
single-sum-rbf-pol & 0.86 (0.01) & 0.0 & ✓✓ & 0.77 (0.03) & 0.0 & ✓✓ & 0.74 (0.03) & 0.0 & ✓✓ & 0.75 (0.02) & 0.0 & ✓✓ \\
single-sum-rbf-lin & 0.85 (0.01) & 0.0 & ✓✓ & 0.74 (0.03) & 0.0 & ✓✓ & 0.74 (0.03) & 0.0 & ✓✓ & 0.74 (0.02) & 0.0 & ✓✓ \\
\hline
\multicolumn{13}{l}{1quart\_3quart} \\
phys-DY & 0.96 (0.01) & & & 0.83 (0.04) & & & 0.91 (0.02) & & & 0.89 (0.01) & & \\
adaboost-gen-rbf & 0.55 (0.05) & 0.0 & ✓✓ & 0.75 (0.29) & 0.36224 & & 0.88 (0.02) & 0.0 & ✓✓ & 0.87 (0.01) & 0.0 & ✓✓ \\
single-rbf & 0.81 (0.02) & 0.0 & ✓✓ & 0.72 (0.06) & 0.0 & ✓✓ & 0.81 (0.02) & 0.0 & ✓✓ & 0.80 (0.02) & 0.0 & ✓✓ \\
single-sum-rbf-pol & 0.85 (0.02) & 0.0 & ✓✓ & 0.87 (0.05) & 2e-05 & & 0.82 (0.02) & 0.0 & ✓✓ & 0.83 (0.01) & 0.0 & ✓✓ \\
single-sum-rbf-lin & 0.87 (0.02) & 0.0 & ✓✓ & 0.86 (0.05) & 0.00025 & & 0.84 (0.02) & 0.0 & ✓✓ & 0.84 (0.01) & 0.0 & ✓✓ \\
\hline
\multicolumn{13}{l}{3quart\_1quart} \\
phys-DY & 0.98 (0.01) & & & 0.92 (0.01) & & & 0.99 (0.01) & & & 0.94 (0.01) & & \\
adaboost-gen-rbf & 0.57 (0.06) & 0.0 & ✓✓ & 0.85 (0.03) & 0.0 & ✓✓ & 0.29 (0.23) & 0.0 & ✓✓ & 0.77 (0.08) & 0.0 & ✓✓ \\
single-rbf & 0.83 (0.02) & 0.0 & ✓✓ & 0.77 (0.02) & 0.0 & ✓✓ & 0.64 (0.08) & 0.0 & ✓✓ & 0.77 (0.02) & 0.0 & ✓✓ \\
single-sum-rbf-pol & 0.84 (0.02) & 0.0 & ✓✓ & 0.82 (0.02) & 0.0 & ✓✓ & 0.85 (0.04) & 0.0 & ✓✓ & 0.83 (0.02) & 0.0 & ✓✓ \\
single-sum-rbf-lin & 0.83 (0.02) & 0.0 & ✓✓ & 0.83 (0.02) & 0.0 & ✓✓ & 0.91 (0.04) & 0.0 & ✓✓ & 0.84 (0.01) & 0.0 & ✓✓ \\
\hline
\multicolumn{13}{l}{1dec\_10dec} \\
phys-DY & 0.96 (0.01) & & & 0.85 (0.07) & & & 0.95 (0.01) & & & 0.94 (0.01) & & \\
adaboost-gen-rbf & 0.59 (0.06) & 0.0 & ✓✓ & 0.54 (0.33) & 0.0 & ✓✓ & 0.96 (0.01) & 0.0 & & 0.94 (0.04) & 0.59846 & \\
single-rbf & 0.78 (0.03) & 0.0 & ✓✓ & 0.58 (0.15) & 0.0 & ✓✓ & 0.91 (0.01) & 0.0 & ✓✓ & 0.90 (0.01) & 0.0 & ✓✓ \\
single-sum-rbf-pol & 0.86 (0.02) & 0.0 & ✓✓ & 0.95 (0.06) & 0.0 & & 0.93 (0.01) & 0.0 & ✓✓ & 0.93 (0.01) & 0.0 & ✓✓ \\
single-sum-rbf-lin & 0.86 (0.02) & 0.0 & ✓✓ & 0.96 (0.06) & 0.0 & & 0.92 (0.01) & 0.0 & ✓✓ & 0.92 (0.01) & 0.0 & ✓✓ \\
\hline
\multicolumn{13}{l}{10dec\_1dec} \\
phys-DY & 0.96 (0.02) & & & 0.96 (0.01) & & & 0.99 (0.01) & & & 0.96 (0.01) & & \\
adaboost-gen-rbf & 0.55 (0.05) & 0.0 & ✓✓ & 0.94 (0.01) & 0.0 & ✓✓ & 0.27 (0.32) & 0.0 & ✓✓ & 0.86 (0.09) & 0.0 & ✓✓ \\
single-rbf & 0.84 (0.02) & 0.0 & ✓✓ & 0.90 (0.01) & 0.0 & ✓✓ & 0.55 (0.26) & 0.0 & ✓✓ & 0.90 (0.01) & 0.0 & ✓✓ \\
single-sum-rbf-pol & 0.61 (0.06) & 0.0 & ✓✓ & 0.92 (0.01) & 0.0 & ✓✓ & 1.00 (0.01) & 0.17971 & & 0.92 (0.01) & 0.0 & ✓✓ \\
single-sum-rbf-lin & 0.61 (0.06) & 0.0 & ✓✓ & 0.93 (0.01) & 0.0 & ✓✓ & 0.98 (0.04) & 0.01646 & ✓✓ & 0.93 (0.01) & 0.0 & ✓✓ \\
\end{tabular}
Table 3: The first column indicates the data sample and the model used to describe the data. The second column group reports the AUC: the mean value $\mu_{AUC}$ with its uncertainty $\sigma$ in parentheses, the $p$-value from the Wilcoxon test, and the outcome of the test of the null hypothesis $H_0$ (R.$H_0$). A double ✓✓ indicates that $H_0$ is rejected and that the phys-DY model performs better than the compared classifier, while a single ✓ indicates that $H_0$ is rejected and that the phys-DY model performs worse than the compared classifier. An ✗ indicates that $H_0$ cannot be rejected. The third, fourth, and fifth column groups present the same information for PREC$^{+}$, PREC$^{-}$, and ACC, respectively.

III.4 Support vector machines training and testing

To evaluate the efficiency of the proposed support vector machines, we perform training and testing experiments using the data described in Sec. III.1. In the training phase, a subset of the data is used to fit the model. In the testing phase, the fitted model predicts, for each remaining data point, a label indicating whether it is signal or background. To obtain reliable performance metrics for each support vector machine, we implement a repeated $k$-fold cross-validation. We divide the data into $k$ folds, and each fold is used once as the test set while the remaining $k-1$ folds serve as the training set. The entire $k$-fold cross-validation is then repeated $N_{cv}$ times, with a different random split for each repetition. Overall, this results in $k \times N_{cv}$ training and testing cycles. The reported metrics are the mean values of the resulting distributions, with one standard deviation as the associated uncertainty [45, 28].
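The repeated $k$-fold procedure can be sketched with scikit-learn's `RepeatedKFold`. The snippet below is an illustration under stated assumptions, not the authors' framework: it uses synthetic data from `make_classification`, a placeholder RBF-kernel `SVC`, small illustrative values of $k$ and $N_{cv}$, and the sample standard deviation (`ddof=1`) as the uncertainty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RepeatedKFold
from sklearn.svm import SVC

K, N_CV = 5, 3  # illustrative values only

# Synthetic two-class data standing in for the Drell-Yan samples.
X, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y = 2 * y01 - 1  # map {0, 1} to the {-1, +1} labels used in the text

# Each repetition reshuffles the data before splitting it into K folds.
cv = RepeatedKFold(n_splits=K, n_repeats=N_CV, random_state=0)
acc = []
for train_idx, test_idx in cv.split(X):
    model = SVC(kernel="rbf", gamma="scale")  # placeholder model
    model.fit(X[train_idx], y[train_idx])
    acc.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

acc = np.asarray(acc)
mean_acc, std_acc = acc.mean(), acc.std(ddof=1)
print(f"ACC = {mean_acc:.2f} ({std_acc:.2f}) over {len(acc)} cycles")
```

The loop yields $k \times N_{cv}$ accuracy values, whose mean and standard deviation play the role of the reported metric and its uncertainty.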

The classifier metrics are defined in terms of the error matrix elements: $TP$ is the number of true positive values, $TN$ is the number of true negative values, $FP$ is the number of false positive values, and $FN$ is the number of false negative values [46]. The metrics considered in this paper are the accuracy ACC,

\begin{equation}
\text{ACC}=\frac{TP+TN}{TP+TN+FP+FN}, \tag{19}
\end{equation}

the positive precision $\text{PREC}^{+}$,

\begin{equation}
\text{PREC}^{+}=\frac{TP}{TP+FP}, \tag{20}
\end{equation}

the negative precision $\text{PREC}^{-}$,

\begin{equation}
\text{PREC}^{-}=\frac{TN}{TN+FN}, \tag{21}
\end{equation}

and the area under the receiver operating characteristic (ROC) curve, AUC. The ROC curve is the plot of the true positive rate against the false positive rate at different decision thresholds [47]. In SVMs, these thresholds are obtained by varying the offset of the hyperplane from the origin to produce different predictions. All of these metrics take values in the range $[0,1]$, where 0 corresponds to the worst performance and 1 to the best performance.
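The metric definitions above can be made concrete with a short example. This is a minimal sketch: the helper `classification_metrics`, the toy labels, and the made-up decision scores are assumptions for illustration, and the AUC is computed with scikit-learn's `roc_auc_score` applied to the continuous scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def classification_metrics(y_true, y_pred):
    """ACC and the two precisions for labels in {-1, +1}, Eqs. (19)-(21)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    # A precision is undefined (NaN) when its class is never predicted.
    prec_pos = tp / (tp + fp) if (tp + fp) else float("nan")
    prec_neg = tn / (tn + fn) if (tn + fn) else float("nan")
    acc = (tp + tn) / (tp + tn + fp + fn)
    return {"ACC": acc, "PREC+": prec_pos, "PREC-": prec_neg}


# Toy example: 3 signal and 7 background events with made-up SVM scores.
y_true = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, -1])
scores = np.array([0.9, 0.8, -0.2, 0.3, -0.5, -0.6, -0.7, -0.8, -0.9, -1.0])
y_pred = np.where(scores > 0, 1, -1)  # threshold at the hyperplane

metrics = classification_metrics(y_true, y_pred)
metrics["AUC"] = roc_auc_score(y_true, scores)  # scans all thresholds
```

In this toy case $TP=2$, $FN=1$, $FP=1$, and $TN=6$, so the accuracy is $8/10$ while the positive precision is only $2/3$, which illustrates how an imbalanced sample can hide a weak signal precision behind a high accuracy.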

III.5 Computing implementation

We implement our calculations in Python. In particular, we use NumPy [48] and the scikit-learn [50] interface to LIBSVM [49]. The computing framework for our experiments is publicly available on GitHub [51].
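As an example of how a sum-of-kernels model such as single-sum-rbf-lin could be expressed with these tools, scikit-learn's `SVC` accepts a callable that returns the kernel matrix between two sets of points. The sketch below is an assumption about one possible setup, not a description of the authors' framework [51]: the function `sum_rbf_lin`, the value of `GAMMA`, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVC

GAMMA = 0.1  # assumed RBF width, not a value taken from the paper


def sum_rbf_lin(A, B):
    """Kernel matrix of the sum of an RBF kernel and a linear kernel."""
    return rbf_kernel(A, B, gamma=GAMMA) + linear_kernel(A, B)


# Synthetic data standing in for the lepton kinematics.
X, y01 = make_classification(n_samples=200, n_features=5, random_state=0)
y = 2 * y01 - 1  # labels in {-1, +1}

# SVC calls the function on (training, training) and (test, training) sets.
model = SVC(kernel=sum_rbf_lin)
model.fit(X[:150], y[:150])
predictions = model.predict(X[150:])
gram = sum_rbf_lin(X[:150], X[:150])
```

The sum of two valid kernels is itself a valid kernel, so the resulting Gram matrix remains symmetric and positive semidefinite.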

IV Results

IV.1 Cross validation and statistical tests

In Fig. 2, we display the ACC, PREC$^{+}$, PREC$^{-}$, and AUC defined in Sec. III.4 [Eqs. (19)-(21)]. These metrics are calculated for the data samples listed in Table 1 and the support vector machines described in Table 2. Each point in these plots represents the mean value of the metric obtained with the cross-validation procedure described in Sec. III.4. Here we set $k=10$ and $N_{cv}=10$; that is, we run the training and testing phases 100 times for each sample and support vector machine, and obtain a distribution for each metric. The displayed ACC, PREC$^{+}$, PREC$^{-}$, and AUC are calculated using the predicted classes of the test samples, which are excluded from the training phase. In these plots, each colored line shows the behavior of all the proposed support vector machines for a specific data sample.

We compare the support vector machine containing the Drell-Yan kernel against the four support vector machines that showed the best behavior in Fig. 2 (we consider that the rest of the classifiers are evidently outperformed by our proposed physics-informed kernel). These are the single-sum-rbf-pol, single-sum-rbf-lin, single-rbf, and adaboost-gen-rbf support vector machines, whose kernels are described in Table 2. In this work, we use a paired Wilcoxon signed-rank test [52], a non-parametric counterpart of the paired Student's $t$-test that does not assume Gaussian-distributed data. This test determines whether the difference between the metrics of the physics-informed support vector machine and those of the other classifiers is statistically significant. Let $H_0$ be the null hypothesis that the metrics of the two classifiers are equal. The purpose is to decide whether to reject $H_0$ in light of the distributions of the metrics obtained in the cross-validation procedure. We reject $H_0$ at a significance level of $\alpha=0.05$; that is, we conclude that the ACC, PREC$^{+}$, PREC$^{-}$, or AUC of two classifiers are not equal if the $p$-value from the Wilcoxon test is below 0.05. Table 3 summarizes these tests for the samples described in Table 1. For each metric, we display the mean value of its distribution along with the associated uncertainty, given by the standard deviation of this distribution. Moreover, in the column named R.$H_0$ we display the results of the Wilcoxon test: check marks, ✓ or ✓✓, indicate that the test rejects the null hypothesis, and a cross mark, ✗, indicates that $H_0$ is not rejected.
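The decision rule described above can be sketched with `scipy.stats.wilcoxon`. This is a hedged illustration: the helper `compare_to_reference` is hypothetical, the AUC values are random numbers rather than the paper's results, and deciding which classifier is better by comparing the mean values is an assumption about how the metric values are inspected.

```python
import numpy as np
from scipy.stats import wilcoxon

ALPHA = 0.05  # significance level used in the text


def compare_to_reference(reference, other, alpha=ALPHA):
    """Paired Wilcoxon signed-rank test between two metric distributions.

    Returns the p-value and a mark: a double check if the reference is
    significantly better, a single check if it is significantly worse,
    and a cross if the null hypothesis cannot be rejected.
    """
    reference, other = np.asarray(reference), np.asarray(other)
    p_value = wilcoxon(reference, other).pvalue
    if p_value >= alpha:
        return p_value, "✗"
    mark = "✓✓" if reference.mean() > other.mean() else "✓"
    return p_value, mark


# Made-up AUC values for 100 cross-validation cycles (not the paper's data).
rng = np.random.default_rng(0)
auc_reference = 0.95 + 0.01 * rng.standard_normal(100)
auc_other = 0.85 + 0.01 * rng.standard_normal(100)

p_better, mark_better = compare_to_reference(auc_reference, auc_other)
p_worse, mark_worse = compare_to_reference(auc_other, auc_reference)
```

The test is paired, so both arrays must come from the same cross-validation splits, with the $i$-th entries corresponding to the same training and testing cycle.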

IV.2 Discussion

The first feature to note from Fig. 2 is that the values of ACC are stable across the different samples. The reason is that this metric averages over both the signal and background classification results; it is therefore most appropriate for describing a balanced data sample. A similar pattern is observed in the values of the AUC. Conversely, large fluctuations arise in the signal precision, PREC$^{+}$, and the background precision, PREC$^{-}$. In the plots in Fig. 2, these fluctuations are most noticeable in the lines corresponding to the samples with imbalances of 3:1 and 10:1. This poor behavior of most of the classifiers is expected: when there are not enough events of one class during the training phase, the support vector machine fails to describe both classes. Note that the assessment of a given classifier can be misleading, as the classifier can achieve high AUC, ACC, and PREC$^{+}$ while its PREC$^{-}$ is near zero. This suggests that the most important metrics are the positive and negative precisions, and that a good classifier should be robust against imbalances in the data samples, which are typical in high energy physics. Remarkably, our proposed physics-informed classifier phys-DY shows high values for all the metrics presented here. The reason could be that we have effectively captured the intrinsic properties of the data samples by incorporating physics information into the kernel of the support vector machine. Other classifiers also exhibit stable metrics across the samples, which may be explained by the similarity of their kernels to the one inspired by the Drell-Yan process.

From Table 3, we can quantitatively compare the physics-informed kernel against the best-performing kernels. The first notable feature is that, in most cases, we can reject $H_0$, as a ✓ or ✓✓ appears in almost every case. When $H_0$ is rejected, inspecting the metric values reveals two scenarios. In the first, the physics-informed kernel outperforms the exotic kernel, indicated by a double check mark (✓✓). In the second, the exotic kernel outperforms the physics-informed kernel, indicated by a single check mark (✓). For almost all of the metrics presented in this study, our proposed physics-informed kernel in Eq. (15) performs better than the other kernels. Specifically, for PREC$^{+}$, our physics-informed kernel performs well on the highly imbalanced sample 1dec_10dec, where it attains 0.85 (0.07). PREC$^{+}$ is the metric that quantifies how well a classifier finds signal events in the sample. Therefore, the PREC$^{+}$ attained by the physics-informed kernel indicates that this kernel is useful for high energy physics data. Moreover, the physics-informed kernel presents a stable PREC$^{-}$ across all the samples, demonstrating the robustness of this kernel against imbalance in the data samples. Two other kernels show competitive metrics, namely the single-sum-rbf-pol and single-sum-rbf-lin kernels. These kernels are the sums of the individual kernels defined in Eqs. (3) and (5), and in Eqs. (3) and (2), respectively. Comparing them with the physics-informed kernel in Eq. (15), we conclude that both kernels can capture the dynamical properties of the Drell-Yan cross-section.

V Conclusions

In this work, we analyze several types of kernels that define a support vector machine. A physics-informed kernel is proposed to describe simulated data of a simple and well-known Standard Model process. The physics of this process is introduced into the kernel in a simple and straightforward manner, by considering the functional form of the kinematic variables found in the cross-section and then transforming the matrix that represents the data according to these functional forms. To test the effectiveness of this method, we construct unconventional kernels that a priori can overcome the typical challenges of high energy physics. We carry out statistical tests to determine whether the physics-informed kernel is competitive with kernels constructed with sophisticated machine-learning algorithms. Remarkably, we find that our proposed physics-informed kernel outperforms these algorithms. This finding motivates further investigation into the improvement of machine learning algorithms for more complex high energy physics data using the proposed approach. This simple method of introducing physics insights into kernel methods proves effective for the process studied here, and since there is a connection between kernel methods and neural networks [53], the techniques we study in this paper can be extended to more modern machine learning algorithms based on kernel methods.

Acknowledgements.
This work was funded by the CONAHCYT project I1200/311/2023. T. C. P. thanks CONAHCYT for a postdoctoral fellowship. A. G. R. thanks SNII (México).

References

  • [1] S. Whiteson, D. Whiteson, Eng. Appl. Artif. Intell 22, 8 (2009)
  • [2] P.T. Komiske, E.M. Metodiev, J. Thaler , JHEP 01, 121 (2019).
  • [3] K.K. Sharma, MPLA. 36, 02 (2021).
  • [4] P. Baldi, P. Sadowski, D. Whiteson, Nat. Commun. 05, 4308 (2014).
  • [5] A. Alves, JINST 12, 05 (2017)
  • [6] T. Biswas, A. Datta, JHEP 05, 104 (2023).
  • [7] P.C. Bhat, R. Gilmartin, H.B. Prosper, Phys. Rev. D 62, 074022 (2000)
  • [8] P. Baldi, K. Cranmer, T. Faucett, P. Sadowski, D. Whiteson, Eur. Phys. J. C 76, 235 (2016).
  • [9] A. Aurisano et al, JINST 11, P09001 (2016).
  • [10] F. Bishara, A. Paul, J. Dy, Sci. Rep 14, 5294 (2024).
  • [11] C.W.Murphy, Phys. Rev. D 97, 015007 (2018).
  • [12] H.B. Prosper, Phys. Rev. 37, 1153 (1988)
  • [13] P. Baldi, P. Sadowski, and D. Whiteson, Phys. Rev. Lett. 114, 111801 (2015)
  • [14] G.C. Strong, Mach. Learn.: Sci. Technol. 1, 045006 (2020).
  • [15] E. Barberio, B. Le, E. Richter-Was, Z. Was, J. Zaremba, D. Zanzi, Phys. Rev. 96, 073002 (2017).
  • [16] J. Amacker, W. Balunas, L. Beresford, D. Bortoletto, J. Frost, C. Issever, J. Liu, J. McKee, A. Micheli, S.P. Saenz, M. Spannowsky, B. Stanislaus, JHEP 12, 115 (2020).
  • [17] G.E. Karniadakis, I.G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang. Nat. Rev. Phys. 03, 422-440 (2021).
  • [18] K. Mudunuru, S. Karra, Comput. Methods Appl. Mech. Eng. 374, 113560 (2021).
  • [19] V.S. Ngairangbam, M. Spannowsky, JHEP 05, 004 (2024).
  • [20] C. Li, H. Qu, S. Qian, Q. Meng, S. Gong, J. Zhang, TY. Liu, Q. Li, Phys. Rev. D 109, 056003 (2024).
  • [21] Z. Hao, R. Kansal, J. Duarte, N. Chernyavskaya, Eur. Phys. J. C 83, 485, (2023).
  • [22] O. Atkinson, A. Bhardwaj, C. Englert, P. Konar, V.S Ngairangbam, M. Spannowsky, Front. Artif. Intell. 5, 943135 (2022).
  • [23] M.Ö Sahin, D. Krücker, I.A. Melzer-Pellmann, Nucl. Instrum. Methods Phys. Res., Sect. A. 838, 137-146 (2016).
  • [24] M. Aaboud et al. (ATLAS Collaboration) Phys. Rev. D 108, 032014 (2023).
  • [25] A. Vaiciulis, Nucl. Instrum. Methods Phys. Res., Sect. A. 502, 2-3 (2003).
  • [26] F. Sforza, V. Lippi, Nucl. Instrum. Methods Phys. Res., Sect. A. 722, 11-19 (2013).
  • [27] S.. Wu, S. Sun, W. Guan, C. Zhou, J. Chan, C.L. Cheng, T. Pham, Y. Qian, A.Z. Wang, R. Zhang, M. Livny, J. Glick, P. Kl. Barkoutsos, S. Woerner, I. Tavernelli, F. Carminati, A.D. Meglio, A. C. Y. Li, J. Lykken, P. Spentzouris, S. Y. Chen, S.Yoo, T Wei, Phys. Rev. Research 3, 033221 (2021).
  • [28] A. Ramirez-Morales, J.U. Salmon-Gamboa, J Li, A.G. Sanchez-Reyna, A. Palli-Valappil, Appl. Intell 53, 4996–5012 (2023).
  • [29] C. Cortes, V. Vapnik, Mach. Learn. 20, 273–297 (1995).
  • [30] J. Shawe-Taylor, N. Cristianini, Cambridge University Press (2004).
  • [31] R.A. Horn, C.R. Johnson, Matrix Analysis, 2nd ed., Cambridge University Press (2012).
  • [32] H.T. Lin, C.J. Lin, Neural Comput. 3, 1-32 (2003).
  • [33] Y.W. Chang, C.J. Hsieh, K.W. Chang, M. Ringgaard, C.J. Lin, JMLR 11,4 (2010).
  • [34] C. Zhang, Y. Ma, Springer 144 (2012).
  • [35] O. Sagi, L. Rokach, WIRES DMKD 8, 1249 (2018).
  • [36] R.E. Schapire, Y. Singer, COLT 37 (3), 297-336 (1999).
  • [37] J.H. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press 2ed (1992).
  • [38] E.E.E. Ali, E. Elamin, King Saud Univ., Coll. of Comput. and Inf. Sci. In Proceedings of the 1st NITS (2006).
  • [39] S.D. Drell, T.M. Yan, Phys. Rev. Lett. 25, 316-320 (1970).
  • [40] J. M. Campbell et al, Rep. Prog. Phys. 89, 70 (2007).
  • [41] C. Bierlich, et al, SciPost Phys. Codebases, 8, (2022).
  • [42] R. L. Workman et al. [Particle Data Group], PTEP 2022, 083C01 (2022).
  • [43] The ATLAS collaboration, M. Aaboud, G. Aad, et al. J. High Energ. Phys. 12, 59 (2017).
  • [44] The CMS collaboration, A.M. Sirunyan, A. Tumasyan, et al. J. High Energ. Phys. 12, 59 (2019).
  • [45] M.Kuhn, K.Johnson, Applied Predictive Modeling, Springer 26 (2013).
  • [46] D.M.W. Powers, J. Mach. Learn. Technol. 2 (2008).
  • [47] A.P. Bradley, Pattern Recognition 30(7), 1145-1159 (1997).
  • [48] C.R. Harris, K.J.Millman, S.J. van der Walt, et al. Nature 585, 357–362 (2020).
  • [49] C.C. Chang, C.J. Lin, ACM TIST 2, 3 (2011).
  • [50] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion , O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, J. Mach. Learn. Res. 12, 2825–2830 (2011).
  • [51] A. Ramirez-Morales, A. Davila-Rivera, Github: SVM-physics code. https://github.com/andrex-naranjas/SVM-physics.
  • [52] F. Wilcoxon, Biometrics Bulletin 1 (6), 80–83 (1945).
  • [53] Wang, S., Yu, X. and Perdikaris, P., Preprint at arXiv https://arxiv.org/abs/2007.14527 (2020).