skip to main content
research-article
Open access

Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs

Published: 12 January 2024 Publication History
  • Get Citation Alerts
  • Abstract

    Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances, and detection of epidemic outbreaks. In this article, we focus on change point detection in dynamic graphs and address three main challenges associated with this problem: (i) how to compare graph snapshots across time, (ii) how to capture temporal dependencies, and (iii) how to combine different views of a temporal graph. To solve the above challenges, we first propose Laplacian Anomaly Detection (LAD) which uses the spectrum of graph Laplacian as the low dimensional embedding of the graph structure at each snapshot. LAD explicitly models short-term and long-term dependencies by applying two sliding windows. Next, we propose MultiLAD, a simple and effective generalization of LAD to multi-view graphs. MultiLAD provides the first change point detection method for multi-view dynamic graphs. It aggregates the singular values of the normalized graph Laplacian from different views through the scalar power mean operation. Through extensive synthetic experiments, we show that (i) LAD and MultiLAD are accurate and outperforms state-of-the-art baselines and their multi-view extensions by a large margin, (ii) MultiLAD’s advantage over contenders significantly increases when additional views are available, and (iii) MultiLAD is highly robust to noise from individual views. In five real-world dynamic graphs, we demonstrate that LAD and MultiLAD identify significant events as top anomalies such as the implementation of government COVID-19 interventions which impacted the population mobility in multi-view traffic networks.

    1 Introduction

    Real-world problems in various domains (e.g., political science, biology, chemistry, and sociology) can be modeled as evolving networks that capture temporal relations between nodes. With the increasing availability of dynamic network data in areas such as social media, public health and transportation, providing sophisticated methods that can identify anomalies over time is an important research direction. The task of anomaly detection in dynamic graphs aims at identifying different types of time-varying entities that significantly deviate from the “normal” or “expected” behavior of the underlying graph distribution.
    In this work, we focus on the change point detection problem which identifies time steps where the graph structure or many graph components deviate significantly from the normal behavior. As change point detection and event detection are closely related, we first explain the distinctions between two of them. Following [57]’s definition, a change point is a time point where there is a sudden change in the underlying network generative process and this new process continues beyond the current point. In contrast, an event is defined as a time point where the network deviates significantly from the expected behavior and resumes back to normal after this time point. Once the anomalous graph instance is found, potential causes can be identified through various static graph analysis techniques. In this work, we first propose a novel change point detection method, Laplacian Anomaly Detection (LAD).1 There are two major challenges for change point detection in dynamic graphs. First, how to compare graph snapshots across time. It is difficult to directly compare two graphs or their adjacency matrices. A common approach is to summarize the graph into a low dimensional embedding vector called the signature vector. LAD uses the singular values of the Laplacian matrix (Laplacian spectrum) as the signature vector for each snapshot because they are closely related to the graph connectivity and low rank approximation. Second, how to capture temporal dependencies. In practice, graphs can undergo abrupt changes at any given time step [30]. These sudden changes can be detected by examining the graph snapshots from a few timestamps in the past or a short time window. However, gradual changes in the graph structure is often shown over a long time horizon and possible accompanied by permanent changes in the underlying graph generative model [57]. Therefore, effective detection of both sudden and gradual changes requires the perspective from both short-term and long-term perspectives. We propose to use both short and long-term sliding windows to explicitly compare the current graph structure with those in the windows. The snapshots within a window are summarized by stacking the signature vectors from these snapshots into a matrix and then computing the principal left singular vector of this matrix as the normal behavior within this window. Then it is compared with the current signature vector via cosine similiarity.
    In practice, a number of networks can describe the same underlying relations. Multi-view (a.k.a. multiplex) networks are a subset of multi-layer networks, which describe relations between the same set of entities from different information sources. Each source can be modeled as an individual view. The interactions between the entities in each view often evolve over time thus forming multi-view dynamic networks. These networks arise naturally in numerous domains such as traffic/transportation networks [15, 18], mobile phone networks [32], friendship networks [13], social media networks [56], and citation networks [52]. Recent work has shown promising improvements by leveraging the additional information from multi-view data in tasks such as community detection [38, 43], anomalous node detection [48], and event forecasting [27]. To the best of our knowledge, existing methods studying change point detection only focused on single view dynamic networks [2, 29, 30, 45, 57]. Therefore, we propose MultiLAD as the first change point detection method for multi-view dynamic networks.
    To better show the motivation for MultiLAD, let us consider a real-world application. Given trip records of different taxi companies and for-hire vehicles, can we identify important occasions such as holidays, events, or unusual weather conditions that disrupt the overall traffic network? In general, there are two main motivations for leveraging multi-view information for change point detection in dynamic networks. First, data from additional sources may have different characteristics and values [43] thus contributing unique and complementary information. In this case, records from taxi companies and for-hire vehicle services (such as Uber and Lyft) should model the same traffic network but from potentially different age groups and demographics. Second, individual sources can have a high degree of noise and contain disruptive information due to the data collection process. Figure 1 shows the anomalous event on March 22nd 2020 in the New York City Transit network detected by our proposed method. This day signals the beginning of the work from home requirement for all non-essential workers. To better illustrate the graph structure, the nodes are colored by their detected community assignment and their sizes correspond to PageRank [41]. Here, we have four views which are explained in detail in Section 8.9. Given these four views with different graph structures and evolution, our method effectively reasons across views to find the important anomalies in the underlying traffic network. In contrast, it is often difficult or even infeasible to select one view and apply a single-view change point detection method such as LAD (LAD’s anomaly scores on each individual views are reported in Figure 1). As seen in the figure, each view can exhibit a different set of anomalies, leading to significant differences among anomalies scores. This disagreement problem is further exacerbated when more views are present. Therefore, naive aggregation strategies on anomaly scores (such as max or mean) are insufficient, while MultiLAD aggregates view-specific features in a meaningful way, preserving the essential information for anomaly detection. After view aggregation, MultiLAD utilizes the same anomaly detection procedures as LAD thus MultiLAD can be seen as a multi-view generalization of LAD.
    Fig. 1.
    Fig. 1. Our proposed MultiLAD accurately detects March 22nd 2020 as an anomalous day in the New York City Transit network. This day is the first day of NYS on Pause Program where all non-essential workers must stay home [31]. Per each view, we visualize the graph structure on the detected change point as well as a day before and after this point. The anomaly scores from the single-view method LAD are also reported for each view to demonstrate the challenge of combining these information when derived independently. On top, we show that MultiLAD’s anomaly score (which considers these four views jointly) can effectively reason across views.
    Our proposed Multi-view Laplacian Anomaly Detection (MultiLAD) uses the scalar power mean operation to meaningfully aggregate information from individual views and infers the anomaly score based on an aggregated vector. More specifically, MultiLAD first computes the singular values of the normalized Laplacian matrix from each view as the view-specific feature vector. Then, MultiLAD merges information across different views by aggregating these feature vectors through the scalar power mean operation. In this way, the effect of noise from individual views are lessened while key information about the overall generative process of the multi-view network is preserved. Lastly, MultiLAD compares the aggregated vector with past normal behavior and determines if a particular time step is anomalous or not.
    Summary of contributions:
    We introduce a novel change point detection method: LAD. LAD utilizes the singular values of the Laplacian matrix to obtain a low dimension embedding of graph snapshots. To the best of our knowledge, this is the first time that the Laplacian spectrum has been used for change point detection.
    LAD captures both the short-term and long-term temporal relations to detect both events and change points.
    We extensively evaluate LAD on three synthetic tasks and three real-world datasets. We show that LAD is more effective at identifying significant events than state-of-the-art methods.
    We extend LAD to the multi-view setting by proposing MultiLAD: the first change point detection method for multi-view dynamic networks. MultiLAD aggregates the singular values of the normalized Laplacian matrices through scalar power mean and identifies the most informative singular values from each view.
    We extensively evaluate MultiLAD on synthetic experiments and show that MultiLAD (1). gains increased performance from additional views, (2). is highly robust to noise, and (3). significantly outperforms state-of-the-art single view baselines and their multi-view extensions.
    We apply MultiLAD to two real-world multi-view traffic networks. We demonstrate that MultiLAD correctly detects major real-world traffic disruptions such as the implementation of stay-at-home order for COVID-19 and Christmas day.
    Reproducibility: All code is publicly available. The codes for LAD and MultiLAD are available publicly as follows:

    2 Related Work

    Many real-world systems and relations can be formulated as dynamic graphs [15, 18, 32]. The relations can also be captured by constructing similarity matrices using \(k\) Nearest Neighbors (KNNs) distance [60]. For dynamic graphs, the common strategy across most anomaly detection methods is to extract a low dimensional representation from graph snapshots and then apply an anomaly scoring function to compare these representations [46]. In this section, we discuss related literature from both single-view and multi-view anomaly detection. We compare features of LAD, MultiLAD, and other alternative methods in Table 1. Note that our proposed MultiLAD is the only approach that satisfies all the desired properties and is designed for multi-view dynamic graphs.
    Table 1.
    Table 1. Only MultiLAD Meets all Desirable Properties

    2.1 Event Detection

    An early work by Idé and Kashima [30] examines runtime anomalies at the application layer of multi-node computer systems. A web-based system is modeled as a weighted graph where each edge represents a dependency between services. Idé and Kashima aim at finding time points where the majority of the edge attributes in the network show significant deviation from the recent ones. The principal eigenvector corresponding to the maximum eigenvalue of the positive weighted adjacency matrix \(W\) is used as a low dimensional representation of the graph (called activity vector). Activity vectors from a short-term context window form a matrix upon which singular value decomposition (SVD) is then performed. The typical graph behavior within the context window is then represented as the principle left singular vector. The deviation of the current activity vector from the typical behavior is used as the anomaly score. Different from Idé and Kashima, we use the Laplacian spectrum to summarize graph structures and explicitly construct two sliding windows for long-term and short-term contexts.
    In ref [34], a dynamic graph is viewed as a high order tensor. The authors proposed to perform the PARAFAC decomposition [7] of this tensor to obtain the signature vector for anomaly detection. As a multi-view dynamic graph can be considered as a 4th order tensor, we generalize TENSORSPLAT to the multi-view setting, and include it as one of the baselines in our experiments in Section 8. SPOTLIGHT [15] was proposed to detect anomalies in dynamic graphs and is centered around monitoring the (dis)appearance of large or dense subgraphs in a dynamic graph. SPOTLIGHT focuses on the related but different anomalous subgraph detection task, while we detect anomalous graph snapshots. In this work, we also investigate the New York City Transit 2015–2016 dataset studied in [15].

    2.2 Change Point Detection

    Koutra et al. [35] formally stated the axioms and desired properties of graph similarity functions. Their proposed an anomaly detection method named DeltaCon, which first computes pairwise node affinities in the first graph and then measures the difference in node affinity score between two graphs. Therefore, DeltaCon is only well-defined for pairs of graphs thus lacking the ability to reason with a sequence of graph snapshots. This inherently limits its ability to detect gradual changes that involve multiple snapshots.
    Wang et al. [57] model network evolution as a first order Markov process and propose the EdgeMonitoring method based on Monte-Carlo sampling techniques. Their assumption is that there is some unknown underlying model that governs the generative process. Moreover, each graph snapshot is dependent on the current generative model as well as the previously observed snapshot. This method is often regarded as the current state-of-the-art for change point detection. However, EdgeMonitoring relies on consistent node orderings across all time steps and assumes constant number of nodes for each snapshot. This assumption is easily violated in large social networks where users accounts are added frequently. In contrast, LAD can manage varying number of nodes across time. Miller and Mokryn [39] proposed an online fast change point detection method called SACPD, based on the degree distribution of snapshots of networks in time. SACPD computes the cumulative distribution function of the network degrees and assumes no prior knowledge of the network thus being size agnostic. Our LAD also assumes no prior distribution on the network by utilizing the Laplacian spectrum.

    2.3 Multi-view Anomaly Detection

    Most multi-view anomaly detection methods follow a two step procedure: (1). extract view-specific representations and (2). aggregate such representations to identify anomalous data. While prior work [23, 48, 50, 53] focused on the anomalous node (or instance) detection task, we tackle the change point detection problem in the multi-view dynamic graph setting. In other words, instead of detecting anomalous nodes, we spot anomalous timestamps when the structure of the entire graph changes.
    We draw on the same idea used in node-level anomaly detection methods which assumes that abnormal events would create a disturbance of regularities across all views. By mining such irregular patterns, one can achieve stronger anomaly detection results than with just a single view. In this work, we use this idea to identify anomalous graph snapshots across different views to detect significant changes in the overall graph generative process for all views.

    2.4 Multi-view Clustering

    The goal of graph clustering is to obtain groups of nodes that are similar with regards to some structural or node attribute information [43, 51]. Graph clustering in the multi-view setting is a similar task to multi-view change point detection in which one needs to combine information from different sources on the same nodes to extract some underlying structure. However, multi-view graph clustering focuses on static graphs while in this work we consider dynamic graphs.
    Mercado et al. [38] propose to take the matrix power means of Laplacians (PML) to extend the spectral clustering algorithm to multilayer graphs. Inspired by the effectiveness of power mean operations in multi-view clustering, we propose to utilize the scalar power mean operation to merge information from different views for change point detection.
    Sohn et al. [50] proposed a Bayesian multilayer stochastic blockmodeling framework for multilayer clustering. They also extend their method to detect communities changes in multilayer temporal graphs. However, their approach is limited to networks with block structures. In comparison, MultiLAD is applicable to any graph including those with no known community structures (such as the Barabási-Albert model).

    3 Problem Definition

    In this section, we formally define the multi-view change point and event detection problems which encompasses the classical single-view definition of the same problems as a special case.
    Multi-view Dynamic Graph A multi-view dynamic graph \(\mathbb {G} = \lbrace \mathcal {G}_t \rbrace _{t=1}^{T}\) is given by a collection of graph snapshots \(\mathcal {G}_t = \lbrace \mathbf {G}_{t,r} \rbrace _{r=1}^m\) , where \(\mathbf {G}_{t,r}\) is the graph from the view (or relation) \(r\) at time step \(t\) . At each time step, each graph \(\mathbf {G}_{t,r} = (\mathbf {V}, \mathbf {E}_{t,r})\) consists of the set of nodes \(\mathbf {V}\) and a set of edges \(\mathbf {E}_{t,r} \subset \mathbf {V} \times \mathbf {V}\) . For clarity of exposition, we assume that the set of nodes is fixed over time and each view has the same number of nodes (though MultiLAD can handle settings where this assumption does not hold, see Section 6.1). Each edge \(e=(i,j,w) \in \mathbf {E}_{t,r}\) is then defined as the connection between node \(i\) and node \(j\) at timestamp \(t\) in the view \(l\) with weight \(w \in \mathbb {R}^{+}\) . Note that in practice both nodes and edges can appear or disappear from one-time step to another. We denote by \(\mathbf {A}_{t,r} \in \mathbb {R}^{n \times n}\) the adjacency matrix representing edges in \(\mathbf {E}_{t,r}\) where \(n = |\mathbf {V}|\) . When the number of views \(m=1\) , we fall back into the single-view setting similar to [30, 57].
    Multi-view Change Point Detection Based on the above definition, let \(\mathbb {G}\) be a multi-view dynamic graph and let \(\mathbf {H}_t\) be the underlying graph generative model at time \(t\) . \(\mathbf {H}_t\) is not observed but is assumed to control the graph behavior across all views. The goal of multi-view change point detection is to find time points after which \(\mathbf {H}_t\) significantly differs from the previous steps. More precisely, we want to find a set \(S \subseteq \lbrace 1, \ldots , T\rbrace\) such that for each \(t \in S\) , \(\cdots \simeq \mathbf {H}_{t-2} \simeq \mathbf {H}_{t-1} \not\simeq \mathbf {H}_{t} \simeq \mathbf {H}_{t+1} \simeq \cdots\) .
    Multi-view Event Detection In this work, we also consider events similar to [30]. Let \(\mathbb {G}\) be a multi-view dynamic graph and let \(\mathbf {H}_t\) be the underlying graph generative model at time \(t\) . The goal of event detection is to find a set of time points \(S \subseteq \lbrace 1, \ldots , T\rbrace\) such that for each \(t \in S\) , \(\mathbf {H}_{t-1} \not\simeq \mathbf {H}_{t}\) and \(\mathbf {H}_{t-1} \simeq \mathbf {H}_{t+1}\) . Events are one-time sudden changes in graph structure where the generative model \(\mathbf {H}_{t+1}\) reverts back to normal after the event at time \(t\) . For clarity, we include a table of notations in Table 2.
    Table 2.
    NotationDescription
    \(e\) edge between a source and a destination node
    \(r\) view (or relation) index
    \(t\) time index
    \(w\) edge weight
    \(\mathbb {G} = \lbrace \mathcal {G}_t\rbrace _{t=1}^T\) a multi-view dynamic graph
    \(\mathcal {G}_t\) multi-view dynamic graph at time \(t\)
    \(\mathbf {G}_{t,r}\) graph snapshot of view \(r\) at time step \(t\)
    \(\mathbf {V}\) set of vertices in a multi-view dynamic graph
    \(\mathbf {E}_{t,r}\) set of edges of view \(r\) at time step \(t\)
    \(\mathbf {A}_{t,r}\) adjacency matrix representing the edges of view \(r\) at time step \(t\) , a matrix representation of \(\mathbf {E}_{t,r}\)
    \(\mathbf {D}\) diagonal degree matrix
    \(\mathbf {L}\) unnormalized Laplacian matrix
    \(\mathbf {L}_{sym}\) symmetric normalized Laplacian matrix
    \(n\) number of nodes in a multi-view dynamic graph, \(n=|\mathbf {V}|\)
    \(m\) number of views in a multi-view dynamic graph
    \(\mathbf {\sigma }_t\) Laplacian spectrum of a graph snapshot (behavior vector)
    \(\mathbf {C}\) context matrix constructed from a sequence of consecutive behavior vectors
    \(\tilde{\sigma }_t\) normal behavior vector obtained from a context matrix
    \(Z\) raw anomaly score
    \(Z_t^*\) processed (final) anomaly score at time step \(t\)
    \(\Sigma\) matrix of left singular vectors
    \(\mathbf {\lambda }_{r,t}\) a singular value vector of the normalized Laplacian matrix of view \(r\) at time \(t\)
    Table 2. Notations used in this Work

    4 Laplacian Anomaly Detection

    We first propose a new spectral anomaly detection method for single-view dynamic graphs: LAD. The core idea of LAD is to detect high-level graph changes from low dimensional embeddings (called signature vectors). The “typical” or “normal” behavior of the graph can then be extracted from a stream of signature vectors based on both short-term and long-term dependencies. In this way, we can compare the deviation of the current signature vector from the normal behavior.

    4.1 Laplacian Spectrum

    We first define the (unnormalized) Laplacian matrix \(\mathbf {L}\) as \(\mathbf {L} = \mathbf {D} - \mathbf {A}\) where \(\mathbf {D}\) is the diagonal degree matrix and \(\mathbf {A}\) is the adjacency matrix of \(\mathbf {G}\) . Here we drop the subscripts for readability, \(\mathbf {L}\) is defined based on a given view at a given timestep. For LAD, we use the vector of singular values obtained through Singular Value Decomposition (SVD) [21] of \(\mathbf {L}\) as graph embedding for a snapshot \(\mathbf {G}\) . Figure 2 shows the visualization of the Laplacian spectrum and the corresponding anomaly scores detected by LAD for the Senate co-sponsorship network. This choice is motivated by the fact that singular values
    Fig. 2.
    Fig. 2. LAD captures changes in the graph spectrum. LAD scores (top) and the top 6 singular values (bottom) are aligned and show correspondence at each time step. This illustration is from the Senate co-sponsorship network. The warmer the color, the higher the intensity.
    are related to the Laplacian spectrum which is fundamentally related to structural properties of the graph such as community structure and degree distribution,
    encodes the compression loss of a low rank approximation of the Laplacian matrix,
    are node permutation invariant,
    can be efficiently computed for real-world sparse matrices.
    First, it is known that the singular values of a positive symmetric matrix coincides with its eigenvalues. The Laplacian matrix is symmetric and positive semi-definite for an undirected weighted graph [55]. The eigenvalues of the Laplacian matrix capture fundamental structural properties of the corresponding graph. These properties have been extensively utilized in many fields such as randomized algorithms, combinatorial optimization problems, and machine learning [61]. Notably, the field of spectral graph theory [9] extensively studies properties of the graph Laplacian matrix. As an illustration, the multiplicity \(k\) of the eigenvalue 0 of \(\mathbf {L}\) is equal to the number of connected components of the graph [55]. In addition, the Laplacian spectrum is related to many other structural properties of the graph such as the degree sequence, number of connected components, diameter, vertex connectivity, and more [61].
    Second, it is well known that the truncated SVD gives the best low rank approximation of a matrix with respect to both the Frobenius norm and the 2-norm (Eckart-Young theorem). More precisely, the \((k+1)\) th singular value \(\sigma _{k+1}\) of a matrix corresponds to the reconstruction error of the best rank \(k\) approximation measured in the 2-norm. From this insight, we know that the (ordered) singular spectrum \(\sigma _1, \sigma _2, \ldots , \sigma _r\) encodes rich information regarding the reconstruction loss that would occur for different levels of low rank approximations. Truncated SVD is often used as a powerful compression tool for images [47], videos [4], and audio [59]. Thus, capturing the singular spectrum can be seen as an alternative compression based anomaly detection technique as categorized by Ranshous et al. [46]. Intuitively, huge fluctuations in the singular values of the Laplacian matrix reflect drastic changes to the global graph structure.
    Third, being node permutation invariant is one of the desirable properties for a graph learning method [58]. LAD is permutation invariant as singular values of the Laplacian matrix is invariant to row and column permutations. Lastly, singular values can be efficiently obtained through sparse computations [1] and we discuss the computational complexity of LAD in Section 6.2.

    4.2 Characterizing Normal Behavior

    Identifying a normal or typical behavior from a sliding window is often an integral part of change point detection. Similar to [2, 30], we compute a “typical” or “normal” behavior vector from the Laplacian spectrum of the previous \(l\) time steps, where \(l\) is the window size. First, we perform L2 normalization on the Laplacian spectrum seen so far \(\mathbf {\sigma }_0, ... , \mathbf {\sigma }_t\in \mathbb {R}^n\) to obtain unit vectors. Next, a context matrix \(\mathbf {C}\) is constructed:
    \(\begin{equation} \mathbf {C} = \left(\begin{array}{cccc} |& | & &| \\ \mathbf {\sigma }_{t-l} & \mathbf {\sigma }_{t-l+1} & ... & \mathbf {\sigma }_{t-1} \\ |& | & & | \end{array} \right) \in \mathbb {R}^{n \times l}. \end{equation}\)
    (1)
    where \(n\) is the length of the signature vector (which corresponds to the number of nodes in the graph). We compute the left singular vector of \(\mathbf {C}\) with SVD to obtain the normal behavior vector \(\tilde{\sigma }_t\) . One can interpret this behavior vector as a weighted average of the sliding window spectrums [2].
    We propose to compare the current signature vector with the typical behavior from two independent sliding windows: a short-term window and a long-term window. The short-term window encodes information from the most recent trend and captures abrupt changes in the overall graph structure. Depending on the application, the length of the short-term window can be adjusted to best reflect an appropriate time scale. On the other hand, a long-term window is designed to capture larger scale and more gradual trends in the dynamic graph. For example, for the UCI Message dataset, the short-term context can be monitoring weekly change and the long-term context can be monitoring a biweekly change.

    4.3 Anomalous Score Computation

    After capturing the normal behavior, a scoring function is then defined to measure the difference between the current signature vector and the expected or normal one. In this work, we use the same anomaly score as introduced in [2, 30], namely the \(Z\) score. Let \(\tilde{\sigma }_t\) be the normal behavior vector and \(\mathbf {\sigma }_t\) be the Laplacian spectrum at current step. As mentioned in Section 4.2, both \(\tilde{\sigma }_t\) and \(\mathbf {\sigma }_t\) are normalized to unit vectors, then the \(Z\) score is computed as
    \(\begin{equation} Z = 1 - \frac{\mathbf {\sigma }_t^\top \tilde{\sigma }_t}{ \Vert \mathbf {\sigma }_t\Vert _2 \Vert \tilde{\sigma }_t\Vert _2} = 1 - \mathbf {\sigma }_t^\top \tilde{\sigma }_t = 1 - \cos {\theta }, \end{equation}\)
    (2)
    where \(\cos {\theta }\) is the cosine similarity between the vectors \(\tilde{\sigma }_t\) and \(\mathbf {\sigma }_t\) . Essentially the \(Z\) scores becomes closer to 1 when the current spectrum is very dissimilar to the normal thus signaling an anomalous point.
    Let \(w_s\) and \(w_l\) be the sizes of the short-term and long-term sliding windows respectively. Analogous to Section 4.2, one can obtain two different normal behavior vectors based on the two sliding windows. One can then compute the short-term and long-term anomaly scores \(Z_{s}\) and \(Z_l\) based on cosine similarity. To best aggregate these two perspectives, we take \(Z_t = max(Z_s, Z_l)\) to decide if the current graph is more anomalous in abrupt or gradual changes.
    \(Z_t\) scores can be used to detect events (one-time changes in the graph) while change points are often followed by a sequence of decaying anomaly scores. We aim at detecting both events and change points thus we compute the final anomaly score \(Z_t^* = min(Z_t - Z_{t-1},0)\) which emphasizes the time points with the largest increase in \(Z_t\) when compared to \(Z_{t-1}\) . Additional discussion and visualization are added in Appendix A.1.

    5 MultiLAD

    In this section, we present MultiLAD for multi-view change point detection in dynamic graphs. The overall procedure for MultiLAD is summarized in Algorithm 1.

    5.1 Extracting Signature Vectors Per View

    Using the unnormalized Laplacian matrix \(\mathbf {L}\) can be problematic in the multi-view setting where edge weights from different views may vary significantly in magnitude. Therefore, we use the symmetric normalized Laplacian [55] instead:
    \(\begin{equation} \mathbf {L}_{sym} = \mathbf {D}^{-\frac{1}{2}} \mathbf {L} \mathbf {D}^{-\frac{1}{2}} = \mathbf {I} - \mathbf {D}^{-\frac{1}{2}} \mathbf {A} \mathbf {D}^{-\frac{1}{2}}. \end{equation}\)
    (3)
    In this way, the Laplacian matrix of each view is normalized by node degree while also being symmetric (the eigenvalues are the same as singular values). Note that the singular values \(\lambda _i\) of \(\mathbf {L}_{sym}\) are now bounded in \([0,2]\) (in contrast with the singular values \(\sigma _i\) of the un-nromalized Laplacian which were unbounded). We consider the singular values vector \(\mathbf {\lambda }_{r,t} = [\lambda _{1}, \dots , \lambda _{n}]\) of the normalized Laplacian \(\mathbf {L}_{sym}\) as the signature vector for the graph \(\mathbf {G}_{t,r}\) of the \(r\) th view at time step \(t\) . Therefore, at each time step \(t\) , we obtain \(m\) signature vectors coming from each view.

    5.2 Aggregating Per View Signatures

    To aggregate the per-view signature vectors for multi-view change point detection, we propose to use the scalar power mean operation. The \(p\) th order power mean of a set of non-negative real numbers \(x_1, ... , x_m\) is defined by
    \(\begin{equation} s_p(x_1, ... , x_m) = \left(\frac{1}{m} \sum _{i=1}^{m} x_i^p\right)^{\frac{1}{p}}, \end{equation}\)
    (4)
    where \(p \in \mathbb {R}\) is a hyper-parameter. We denote the vector of singular values of the Normalized Laplacian matrix \(\mathbf {L}_{sym}\) in view \(r\) at time \(t\) by \(\mathbf {\lambda }_{r,t}\) as before. We now define the scalar power mean spectrum \(\Lambda _t\) of a multi-view dynamic graph with \(m\) views at time \(t\) as
    \(\begin{equation} \Lambda _t = s_p(\mathbf {\lambda }_{1,t}, \dots , \mathbf {\lambda }_{m,t}) \in \mathbb {R}^n, \end{equation}\)
    (5)
    where \(s_p\) is applied component-wise to the singular value vectors, i.e., \((\Lambda _{t})_i = s_p((\mathbf {\lambda }_{1,t})_i, \dots , (\lambda _{m,t})_i)\) for all \(0 \le i \le n\) . In this way, \(\Lambda _t \in \mathbb {R}^n\) is aggregated from the set of per-view signature vectors.
    Aggregating different views with the power mean operation is a natural choice as it simply smooths the spectrums across all views thus reflecting the geometry of the overall graph generative model. In addition, the choice of order \(p\) gives the flexibility to emphasize different frequency in the spectrum. In this work, we set \(p=-10\) for all experiments as allows the smaller eigenvalues to be emphasized. This is motivated by the fact that the smallest eigenvalues of the Laplacian corresponds to the lower frequency in the graph which is often considered to be more useful in practice. For example, the number of zero eigenvalues corresponds to the number of connected components in the graph [55]. Mercado et al. [38] also found the power mean operation to be a powerful tool in the multi-view spectral clustering task. Empirically, we also observe that \(p=-10\) results in the best overall performance for change point detection as shown in an ablation study in Appendix B.
    We also add a small diagonal shift to the normalized Laplacian \(\mathbf {L}_{sym}\) to ensure that it is positive definite. For all experiments, we set the shift \(\epsilon = log(1+|p|)\) for \(p\lt 0\) thus no 0 would be encountered in Equation (4).

    6 Method Properties

    6.1 Evolving Graph Size and Node Permutation

    For clarity, we assumed that the number of nodes in multi-view dynamic graphs is constant over time and the same across all views. In many real-world graphs, the number of nodes often change over time. In practice, LAD is able to adapt to the evolving size of dynamic graphs by computing the top \(k\) singular values as signature vectors instead of the full spectrum. \(k\) can be chosen to be the number of nodes of the smallest graph snapshot seen so far. Similarly, MultiLAD can be applied to multi-view dynamic graphs where the graph size changes by considering the top \(k\) singular values from each view before aggregation. In addition, MultiLAD can also adapt to multi-view graphs where the number of nodes in each view differs from each other, again by considering the top \(k\) singular values where \(k\) is the size of the view with the least number of nodes.
    Since the Laplacian spectrum is invariant to row/column permutations, both LAD and MultiLAD are invariant to node permutations. This implies that the nodes of a dynamic graph are not required to conserve the same ordering for each graph snapshot. This can be very relevant for privacy sensitive applications where the alignment of nodes across many time steps could cause information leakage. This property can also be very useful in the multi-view setting where aligning node order across views may be technically challenging. In this way, LAD and MultiLAD can be applied to a broader range of scenarios than methods relying on a consistent node ordering across snapshots.

    6.2 Computational Complexity

    The computational complexity of LAD is dominated by the singular value decomposition of the Laplacian, which in general has complexity \(\mathcal {O}(n^3)\) for the Laplacian matrix \(\mathbf {L} \in \mathbb {R}^{n \times n}\) . As many real-world networks are sparse, sparse SVD can be used to reduce the complexity to \(\mathcal {O}(k|\mathbf {E}|)\) , where \(k\) is the number of singular values and \(|\mathbf {E}|\) is the number of edges in the graph. In a multi-view dynamic graph with \(T\) time steps, \(m\) views and an average of \(|\mathbf {E}|\) edges per snapshot, MultiLAD has a complexity of \(\mathcal {O}(k \cdot m \cdot T \cdot |\mathbf {E}|)\) when computing \(k\) singular values. Note that the scalar power mean aggregation is very fast to compute thus computation cost of MultiLAD lies mainly in computing SVD for each view. In real-world networks with hundreds of nodes, MultiLAD is fast to compute, usually taking in the order of minutes on CPUs. For large graphs with millions of nodes where computing SVD is not feasible, efficient approximation techniques such as Network Density of States [12] can be used, as seen in [28], to approximate the spectrum of the Laplacian.

    7 Single-view Experiments

    In this section, we conduct extensive experiments on both synthetic and real-world single-view dynamic graphs to evaluate the performance of LAD.

    7.1 Measuring Performance

    To quantitatively evaluate the quality of detected change points, we choose the Hits at \(n\) ( \(H@n\) ) metric which reports the number of correctly identified anomalies out of the top \(n\) time points with the highest anomaly scores. In synthetic experiments, we use the planted anomalies in the generation process for evaluation. In real-world datasets, we treat the well-known anomalous time steps from the literature as ground truth anomalies.

    7.2 Contenders and Baselines

    We compare LAD with alternative single-view methods. We use the same sliding window sizes across all methods when applicable. Both the anomalies scores from the short-term and long-term sliding windows are shown in Figures 3, 4, 5, and 6. For EdgeMonitoring and SACPD, we use the implementation kindly provided by the original authors while other baselines are our implementation. Note that for all experiments, the startup period ( \(0,... ,l\) ) is set to have anomaly score of 0 because we assume these points are not change points.
    Fig. 3.
    Fig. 3. LAD perfectly recovers all events and change points defined in Table 3.
    Fig. 4.
    Fig. 4. LAD detects the end of the university spring term and one day before the start of the fall term in the UCI message dataset.
    Fig. 5.
    Fig. 5. LAD correctly detects the 100th and 104th congress as the top 2 most anomalous points.
    Fig. 6.
    Fig. 6. LAD predicts year 2013 and 2015 as anomalous years for the Canadian bill voting network.
    Activity vector. We refer to the method proposed by Idé et al. [30] as “activity vector” based methods. Idé et al. used the principal eigenvector of the adjacency matrix (namely the activity vector) instead of our proposed Laplacian spectrum. According to the original work, only a short-term context window is considered.
    TENSORSPLAT. Koutra et al. [34] proposed to view the temporal graph as a tensor and then perform the PARAFAC or CP decomposition to obtain low dimensional factors that group similar entities or timestamps together (we use CP rank of 30 for all experiments). The original article proposed to use clustering on the factors for change detection. However, the clustering algorithm is not specified. We use the well-known Local Outlier Factor (LOF) [6] approach along with the TENSORSPLAT framework.
    EdgeMonitoring. Wang et al. [57] proposed the EdgeMonitoring approach and used joint edge probabilities as the feature vector while modeling network evolution as a first order Markov process.
    SPOTLIGHT [16]: SPOTLIGHT takes a randomized sketching-based approach to detect the sudden appearance or disappearance of large dense subgraphs. We implemented SPOTLIGHT in python and use the default Robust Random Cut Forest ( RRCF) [22] for anomaly detection with the SPOTLIGHT embeddings. Following [16], we set \(p=q=0.2\) and \(k=50\) sketch dimensions.
    SACPD [39]: SACPD is an online and fast change point detection method utilizing the degree distribution of network snapshots. The method is size agnostic and the study focused on interaction social networks. The same window size with LAD is used.

    7.3 Synthetic Experiments

    To demonstrate the performance of LAD, we design three controlled experiments. For these synthetic experiments, we use data generated from the Stochastic Block Model (SBM) [26]. The number of communities \(N_c\) as well as the number of nodes within each community can be specified through a size vector \(\mathbf {c}\in \mathbb {R}^{k}\) . In addition, the inter-community and intra-community connectivity for each block can be directly encoded in a symmetric probability matrix \(P \in \mathbb {R}^{N_c \times N_c}\) . For all experiments, we set the short-term and the long-term window to be 5 and 10 time steps respectively and use the entire Laplacian spectrum for LAD. Table 4 summarizes the experimental results for LAD and all baselines in the synthetic experiments.
    Table 3.
    Table 3. Experiment Setting: The Changes in the Generative Model in (a) SBM Experiment (Section 7.3.1) when Only the Number of Blocks ( \(N_c\) ) Changes (Pure Setting)
    Table 4.
    MetricHits @ 7Hits @ 10Hits @ 2
    DatasetPureHybridResampleUCISenate
    LAD (ours)100%100%100%50%100%
    SACPD [39]100%100%57.1%0%50%
    SPOTLIGHT [16]14.2%28.5%57.1%0%100%
    EdgeMonitoring [57]100%100%0%0%100%
    Activity vector [30]71.4%0%0%50%50%
    TENSORSPLAT [34]28.5%14.2%57.1%0%50%
    Table 4. LAD Consistently Finds Significant Anomalies Across Different Datasets and Outperforms Alternative Approaches
    The hybrid experiment refers to the event and change point detection experiment.

    7.3.1 Pure Setting.

    Here, we only introduce change points, where the adjustments in community structure persists until the next change point is reached. We generate a temporal network with 151 time points where each snapshot is produced through SBM parametrized by \(\mathbf {c}\) and \(P\) . There are always 500 nodes per snapshot and the community changes are described in Table 3. Here \(N_{c}\) represents the number of equal sized communities in the snapshot, and \(p_{in}, p_{ex}\) denotes the internal and external community connectivity probability respectively. We set the continuity rate [57] to be 0 for change points and 1.0 elsewhere for the pure setting.
    The top 7 most anomalous points correspond to the 7 ground truth points in Table 3 while the other points have extremely low anomaly scores. In this experiment, LAD, EdgeMonitoring and SACPD all achieves perfect performance as they can reason with gradual changes over time. In contrast, both TENSORSPLAT and Activity vector can only recover some change points. As Activity vector uses the principal eigenvector of the adjacency matrix, it is unable to detect community changes as effectively as the Laplacian spectrum used in LAD.

    7.3.2 Hybrid Setting.

    To study the effectiveness of LAD in detecting both change points and events, we generate a synthetic experiment with SBM where these two types of anomalies both exist. We generate events by strengthening the inter-community connectivity for a given time point (subsequent points are not affected). These events can correspond to a sudden increase in collaborations between typically separated communities such as political parties. The details regarding the generative process can be seen in Table 3. We set the continuity rate [57] to be 0 for change points and 0.9 elsewhere for the hybrid setting. From Figure 3, we observe that LAD is able to perfectly recover all events and change points.

    7.3.3 Resampled Setting.

    In this setting, we use a constant continuity rate of 0 for all time steps, i.e., the graph is resampled from the generative process at each step. In real-world graphs, majority of the edges might not persist over consecutive time steps but rather determined by the underlying community structure. For the generation parameters and change points’ details, we use the same setup as the Hybrid Setting. Resampling from the SBM model can be considered as a node level permutation within each community. As discussed in Section 4, the eigenvalues of the Laplacian is node permutation invariant. Therefore, only LAD is able to detect all change points while all other baselines are missing at least three change points.

    7.4 Real-world Experiments

    Here, we evaluate the performance of LAD on two real-world benchmark datasets, as well as an original dataset. For UCI Message and Senate co-sponsorship experiments, LAD is able to achieve strong performance using only the top 6 eigenvalues. For Canadian bill voting network, we report the LAD performance with the top 338 eigenvalues where 338 is the number of seats in the House of Commons of Canada. More information regarding each dataset can be found in Appendix C.2. The results for UCI Message and the Senate co-sponsorship Network is shown in Table 4.

    7.4.1 UCI Message.

    In this dataset, we treat each day as an individual time point. There are 59,835 total messages sent across all time stamps and 20,296 unique messages. To ensure privacy protection, all individual identifiers such as usernames, email and IP addresses were removed thus we use the dataset description provided in [42] to find significant events in the dataset. We select the short-term window to be 7 days as suggested by Panzarasa et al. [42]. The long-term window is then selected to be 14 days or two weeks.
    Figure 4 shows the \(Z^*\) scores predicted from both the short-term and long-term window (with top 6 eigenvalues). Panzarasa et al. mentioned that day 652 is the end of spring term and day 158 is the start of the fall term. Day 65 is correctly predicted by LAD and activity vector while the top 10 predictions from EdgeMonitoring and TENSORSPLAT do not include either days. However, LAD predicts day 157 as anomalous which corresponds to the day prior to the start of the fall term. It is likely that anomalous message behaviors are exhibited before school starts. As the edge weight (number of characters in messages) shows important connections between users in social networks, the \(Z\) scores has strongest correlation to the average edge weight per snapshot in this dataset. Both SPOTLIGHT and SACPD are not able to detect any of the change points on this dataset.

    7.4.2 Senate co-sponsorship Network.

    In this dataset, an edge is formed between two congresspersons if they are cosponsors on a bill. Fowler [17] pointed out that the 104th Congress corresponds to “a Republican Revolution” which “caused a dramatic change in the partisan and seniority compositions.” Wang et al. also stated that the 104th Congress has the lowest clustering coefficient and thus can be seen as a low point in collaboration while the 100th Congress has the highest clustering coefficient which signals significant collaboration.
    From Figure 5, it is clear that LAD can easily detect the above change points (using only the top 6 singular values). Variations of LAD that only utilizes the short-term or the long-term window are able to identify both points too. SPOTLIGHT is also able to detect both change points as the anomalies here are sudden changes to the graph structure (events). SACPD is able to detect one of the events. Wang et al. mentioned that EdgeMonitoring [57] and LetoChange [45] are able to detect both change points while DeltaCon [35] only predicts the 104th Congress.
    The activity vector method requires full SVD which are computationally expensive and it is only able to detect the 100th Congress. The anomaly scores output by the activity vector method have similar magnitude thus making it difficult to identify change points. However, if we augment the activity vector method to use LAD pipeline and aggregates two sliding windows, it is able to correctly predict both change points. This shows that aggregating the output from two sliding windows can improve empirical performance with a different graph embedding technique.

    7.4.3 Canadian bill voting network.

    To understand the temporal change in Canadian Parliament environment, we extracted open information3 to form the Canadian Parliament bill voting network. The Canadian Parliament consists of 338 Members of Parliament (MPs), each representing an electoral district, who are elected for four years and can be re-elected. In the included time frame from 2006 to 2019, the increase in the number of electoral seats resulted in parliaments with different amounts of MPs; since 2015 the House of Commons has grown from 308 seats to 338. Naturally, the number of nodes in the network fluctuates from year to year. 2 year and 4 years are used as duration for the short-term and long-term windows respectively. We consider the MPs that voted yes for a bill to have a positive relation with the sponsor. In this way, a directed edge is then formed from a voter MP \(u\) to the sponsor MP \(v\) in a given graph snapshot. Each edge is then weighted by the number of times that \(u\) voted positively for \(v\) .
    LAD detects 2013 and 2015 as two significant change points. Figure 6 shows the \(Z^*\) scores predicted by LAD. 2013 is often considered as the year where cross party cohesion against Conservatives begins to decline. Prior to 2013, the Liberal and New Democratic Parties had more unity in voting patterns. As a new party leader (Justin Trudeau) is elected in the Liberal party in 2013, changes in voting patterns are observed [37]. More details regarding data mining are discussed in Appendix A.6. In 2015, the house of common increased the number of constituencies from 308 to 338. Equally, on October 19th 2015 an election took place and the Liberal party won an additional 148 seats, with a total of 184 seats forming a majority government led by Justin Trudeau. Prior to 2015, the Liberal party was divided and however, during this election things changed and literature shows the unified campaign at a local level from different members of parliament across the country [11].

    7.5 Summary of LAD Results

    Table 4 summarizes the empirical performances of LAD and its comparison to alternative methods. We observe that LAD has the best performance across all experiments.
    Synthetic experiment results suggest that LAD can be used in a hybrid environment where both change points or events can occur. Therefore, LAD is suitable to detect anomalous time points where it is possible for both one-time events or fundamental changes in the network generative process to occur thus requiring no prior knowledge on the type of anomaly that could occur later on. The node permutation invariant property of LAD helps it being the best performing method in the resampled setting where the edges in the network are all resampled. In the UCI message and senate co-sponsorship experiments, LAD achieves strong performance using only the top 6 singular values. It demonstrates that in real-world datasets, a low rank truncated SVD is often sufficient to capture rich graph information. Together with efficient SVD computation techniques, LAD can be scaled to large networks. In practice, the rank of the truncated SVD can be determined by the available computational resources.

    8 Multi-view Experiments

    8.1 Synthetic Experiments

    In this section, we empirically examine the change point detection problem for multi-view dynamic graphs and evaluate our proposed MultiLAD approach. In the synthetic experiments, we focus on answering the following questions: Q1. Performance Improvement: How accurately can MultiLAD detect synthetic anomalies by utilizing information from additional views as compared to state-of-the-art single view baselines and naive multi-view baselines? Q2. Noise Invariant: How robust is MultiLAD to noise at individual views and can it leverage multi-view data to reduce the effect of noise? Q3. Benefit of More Views: How will the performance of MultiLAD change as we increase the number of incorporated views? and how the performance gain compares to the baselines?

    8.2 Baselines

    Each data point in Figure 7, 8, and Table 7 is the average over 30 trials and the standard deviation is reflected as the shaded area with the same color. All experiments are run on a desktop with AMD Ryzen 5 1600 CPU and 16GB installed RAM. Each MultiLAD trial takes less than 5 minutes. We compare MultiLAD with the following methods:
    Table 5.
    Table 5. Experiment Setting: The Changes in the Generative Model in (a) Section 8.4 and 8.6 where the SBM Parameter \(c_{in} = 12, p_{in} = 0.024\) and \(p_{ex} = \frac{c_{out}}{\# nodes}\)
    Table 6.
    Table 6. Anomalies in Section 8.7 where the BA Model Parameter \(m\) is the Number of Edges to Attach from a New Node to Existing Nodes
    Table 7.
    MetricHits @ 7
    ExperimentSBM (3v)SBM (noisy)SBM (12v)BA (12v)
    MultiLAD.61 \(\pm\) .16.86 \(\pm\) .09.94 \(\pm\) .08.90 \(\pm\) .08
    LAD [29] \(.26 \pm .10\) \(.14 \pm .11\) \(.29 \pm .13\) \(.13 \pm .09\)
    maxLAD \(.26 \pm .12\) \(.11 \pm .09\) \(.29 \pm .09\) \(.13 \pm .06\)
    meanLAD \(.32 \pm .11\) \(.10 \pm .11\) \(.55 \pm .14\) \(.18 \pm .08\)
    NL LAD \(.39 \pm .16\) \(.70 \pm .14\) \(.37 \pm .15\) \(.71 \pm .11\)
    NL maxLAD \(.37 \pm .15\) \(.77 \pm .10\) \(.34 \pm .16\) \(.33 \pm .07\)
    NL meanLAD \(.52 \pm .16\) \(.84 \pm .11\) \(.82 \pm .10\) \(.70 \pm .07\)
    TENSORSPLAT [34]. \(6 \pm .9\\) N/AN/AN/A
    Table 7. MultiLAD Outperforms all Alternative Approaches on Different Datasets
    Fig. 7.
    Fig. 7. (a) MultiLAD significantly outperforms all baselines as the inter-community connectivity \(c_{out}\) approaches the intra-community connectivity \(c_{in}=12\) . We also indicate the community detection limit for spectral clustering methods with a dotted vertical line. (b) MultiLAD effectively leverages multi-view information to filter out noise injected into individual views while single view methods such as LAD is easily disrupted by high amount of noise.
    Fig. 8.
    Fig. 8. MultiLAD sees the most performance gain from additional views in the SBM model setting (a) and BA model setting (b) when compared to all baselines.
    LAD: We report the best average individual view LAD performance over multiple trials using the spectrum of unnormalized and normalized Laplacian ( \(\mathbf {L}\) and \(\mathbf {L}_{sym}\) ) for comparison (denoted by LAD and NL LAD, respectively). Note that in practice, this is overestimating the performance of such baseline as determining which view is most important is a difficult task.
    Naive Aggregation: We consider two naive aggregation strategies to extend LAD into the multi-view setting. First, the maximum of the LAD anomaly score from each view (per time step) is used as the aggregated anomaly score, named maxLAD. Second, the mean of the LAD anomaly scores is used instead and called meanLAD. Lastly, we apply the same naive aggregation approach to LAD scores obtained from the normalized Laplacian singular values and report them as “NL maxLAD” and “NL meanLAD” respectively.
    TENSORSPLAT: A multi-view dynamic graph can be represented as a 4th order tensor \(\mathbf {\boldsymbol {\mathcal {T}}} \in \mathbb {R}^{T \times M \times N \times N}\) where \(T, M, N\) are the total number of time steps, views and nodes respectively. After a rank \(R\) CP decomposition of \(\mathbf {\boldsymbol {\mathcal {T}}}\) , one can obtain a signature vector of size \(R\) for each time step \(t\) . To make this baseline more competitive, we then apply the same sliding window techniques used in LAD to identify the anomalies. Due to its high computational cost, we only report TENSORSPLAT [34] in Section 8.4 and average over 10 trials.

    8.3 Performance Evaluation

    Hits at \(n\) ( \(Hits@n\) ) metric reports the number of correctly identified anomalies out of the top \(n\) time steps with the highest anomaly score. For synthetic experiment, we use the ground truth change points or events for evaluation. We extensively examine the performance of MultiLAD as compared to baselines on two widely used graph generative model: the Stochastic Block Model (SBM) [26] and the Barabási-Albert (BA) [3] model. The accuracy reported in plots are the \(Hits@7\) score for the 7 planted anomalies. The number of views are set to be 3 except for Section 8.6 and 8.7. At each time step, the snapshot is re-sampled from its generative model. For all experiments, the sliding windows sizes are \(w_s=5\) and \(w_l=10\) respectively and the startup period ( \(0,... ,w_l\) ) is set to have an anomaly score of 0. For experiments involving the SBM model, change points correspond to splitting and merging communities and events are sudden increase in cross community connectivity.

    8.4 Case 1: SBM with increasing difficulty

    To answer if MultiLAD leverages multi-view information to improve change point detection, we examine a sparse SBM model with an increasing level of difficulty. In a SBM model with \(n\) nodes, \(p_{in} = \frac{c_{in}}{n}\) and \(p_{ex} = \frac{c_{out}}{n}\) are the edge probabilities for the within community and cross community edges. The constants \(c_{in}\) and \(c_{out}\) are used to adjust the strength of community connectivity. In this experiment, we set \(c_{in} = 12\) and consider different values for \(c_{out}\) ; note that as \(c_{in} - c_{out}\) gets smaller, the task increases in difficulty. As events would be easy to detect when the difference between \(c_{in}\) and \(c_{out}\) is small, we only consider change points for this experiment (see Table 5 (a)).
    Figure 7 (a) shows that MultiLAD significantly outperforms all baselines as the community structure becomes harder to detect. MultiLAD is able to aggregate over 3 views and significantly outperform the single view method LAD by at as much as \(48.3\%\) at \(c_{out}=4\) . MultiLAD also outperforms the best naive multi-view baseline NL meanLAD by \(9.5\%\) at \(c_{out}=7\) . Note that for \(c_{out} \gt 11\) , the community structure is almost non-existent as the graph closely resembles an Erdős-Rényi graph [14] where all edges are sampled with equal probability. As a reference for the strength of community structure, we indicate the community detection limit for spectral clustering methods on two equal sized communities as a dotted vertical line in the figure. The limit is reached when \(c_{in} - c_{out} = \sqrt {2(c_{in} + c_{out})}\) [40] and in this case, \(c_{in} = 12\) and \(c_{out}=6\) . We observe that after \(c_{out}=6\) , MultiLAD still detects the change points to a significant extent while outperforming other methods. Although being a multi-view baseline, TENSORSPLAT’s performance is much worse than MultiLAD and even single-view LAD. Designing an effective multi-view method is an important direction.

    8.5 Case 2: all views are perturbed by noise

    To examine if MultiLAD is robust to noise from individual views, we add noise to the SBM generative process at each view by examining each pair node pair \(i,j\) and flip the entry \(\mathcal {A}_{i,j}\) of the adjacency matrix to \(1-\mathcal {A}_{i,j}\) with probability \(p_n\) . In this experiment, we set \(p_{in}=0.024\) and \(p_{ex}=0.004\) and adjust \(p_n\) for the noise level. We also include both events and change points (see Table 5 (b)).
    Figure 7(b) demonstrates that MultiLAD is highly robust to noise and outperforms all baselines. Notably, single-view methods like LAD are highly susceptible to noise and their performance degrades quickly as higher ratios of noise are injected, In comparison, MultiLAD maintains strong performance despite each view having a high-level of noise. This shows that using the scalar power mean to aggregate singular values from individual views is a very effective way to filter out noise. When \(p_n = 0.30\) , MultiLAD outperforms single-view LAD by \(45.7\%\) and NL LAD by \(22.9\%\) . MultiLAD upper bounds all naive multi-view baselines and significantly outperforms NL maxLAD and NL meanLAD when the noise ratio is high (i.e., \(p_n \gt 0.15\) ).

    8.6 Case 3: SBM with additional views

    To understand how MultiLAD benefits from having additional views, we follow the same setting as in Section 8.4 and add more views. For this experiment, we fix \(p_{in}=0.024\) and \(p_{ex}=0.012\) , and compare the performances of all methods when the number of available views increases.
    Figure 8 (a) shows that MultiLAD benefits the most from having additional views. Single view methods such as LAD are unable to take advantage of the multi-view data thus remaining at the same performance. Another interesting observation is that max aggregation barely benefits from the additional views as it can be easily dominated by the anomaly score from one particular view. Both NL meanLAD and meanLAD improve with additional views as they compute the average of the anomaly score thus converging to the true expected anomaly score. The strong performance of MultiLAD can be attributed to the fact that MultiLAD directly aggregates the singular values while NL meanLAD and NL maxLAD are dependent on the anomaly score output from individual views. The biggest performance gap appears at 7 views where MultiLAD outperforms NL maxLAD by \(46.7\%\) and NL meanLAD by \(15.7\%\) .

    8.7 Case 4: all views are BA models

    Lastly, we further investigate MultiLAD performance in the BA model to show that MultiLAD is effective beyond the SBM model. In this experiment, the change points correspond to the densification of the network (parameter \(m\) , increased number of edges attached from a new node to an existing node). The details are described in Table 6 (c).
    Figure 8 (b) shows that MultiLAD benefits significantly from more views in the BA setting as well. In particular, with just 2 views, MultiLAD outperforms LAD by 61.9%, NL LAD by 9% and NL meanLAD by 6.7%. MultiLAD is able to efficiently utilize multi-view information as soon as additional views are available and benefits even more when there are more views. We also observe that NL maxLAD has decreasing performance when the number of views increased. This is likely due to naively aggregating the max anomaly score from all views and picking up the noise from additional views rather than useful information. This shows that naive aggregation strategies can have adverse effects. Also, NL meanLAD performs similarly to single view NL LAD thus is not able to effectively leverage the multi-view information.

    8.8 Summary of MultiLAD Results

    Table 7 summarizes the performance comparison between MultiLAD and other baselines across different experiments. We use k“v” to indicate the number of views in the network. For SBM (3v) and SBM (12v), we report the setting where \(p_{in} = 0.024\) and \(p_{ex}=0.012\) . For noisy SBM setting, we follow Section 8.5 and set noise to be 0.15. For BA, we follow the set up in Section 8.7.
    In all settings, MultiLAD significantly outperforms all baselines. By efficient use of multi-view information, MultiLAD improves upon LAD by as high as 62% on the SBM experiments and 77% on the BA experiment. MultiLAD also sees increased gain as additional views become available, achieving 33% gain when using 12 views instead of 3. This shows that MultiLAD effectively leverages the additional views to achieve stronger performance.
    MultiLAD outperforms alternative aggregation strategies including max and mean across all settings and is more robust to noise which shows the benefit of merging information at the signature vector level rather than the anomaly score level. Moreover, max and mean baselines exhibit inconsistent behaviors on different settings. On the BA model, the mean operation does not benefit from having additional views while it is able to outperform single-view baselines on the SBM model. The max operation is detrimental to performance with additional views on the BA model, but not on the SBM model. In contrast, MultiLAD always benefits from additional views.
    We can also see that it is better to use the normalized Laplacian (NL LAD) over the unnormalized one (LAD). It is known that on large dense graphs, the distribution of the eigenvalues of the Laplacian is close to the distribution of the degrees of the vertices [25, 54]. After normalization by node degree, the singular values of the normalized Laplacian can better reflect the graph structure, thus be more suited for change point detection.

    8.9 Real-world Experiments

    Next, we examine MultiLAD in the real-world setting. We used New York City (NYC) Taxi & Limousine Commission (NYC TLC)4 trip records to construct and analyze two dynamic multi-view graphs at different time periods. One of them is a novel multi-view dynamic graph dataset set in 2020, showcasing MultiLAD’s ability to detect key dates in the implementation of New York City’s COVID-19 response. Details on data processing are provided in Appendix C.2. We use window sizes of 3 and 7 days in our experiments to model both short-term events and long-term change points.
    NYC TLC 2015–2016: From NYC TLC records, we acquired the timestamped trip data from two taxi companies (green and yellow) covering November 2015 to January 2016 similar to [15]. We used these records to construct a dynamic two-view directed weighted graph over 264 nodes (representing taxi zones) in which each time point is a day.
    Figure 9 shows MultiLAD’s anomaly score curve and the corresponding Laplacian spectrum (aggregated using the scalar power mean approach). In the spectrum visualization, the time points with high anomaly scores are visually distinct from their surrounding points thus confirming that MultiLAD’s aggregated spectrum correctly retains meaningful information from individual views. Comparing MultiLAD’s results to those reported by [15] revealed that there are many high-scoring anomalies in common (including Christmas day, New Year’s Eve, New Year’s Day, and the blizzard on January 23rd.) There are some minor differences in the anomalies identified by MultiLAD and SPOTLIGHT which can be attributed to the fact that MultiLAD is analyzing the multi-view version of this dataset while SPOTLIGHT is operating on the single-view version (these differences are discussed further in Appendix C.3).
    Fig. 9.
    Fig. 9. Anomaly score (top, annotated with the 7 most anomalous time points) and MultiLAD’s aggregated spectrum (bottom) for (a). the NYC TLC 2015–2016 dataset and (b). NYC TLC 2020 dataset.
    NYC TLC 2020: We also extracted NYC TLC data from January 1st 2020 to April 30th 2020. There are 4 views in total with 2 additional views corresponding to for-hire vehicle (FHV; pre-arranged transportation services like community cars and limousines5) and high-volume for-hire vehicle (FHVHV; Uber, Lyft, and Via6) transportation services were available for this timecourse.
    We hypothesized that anomalies in the 2020 NYC transit dataset would align with key time points in the rollout of NYC’s COVID-19 response. Our results suggest this hypothesis was largely correct (see Figure 9 and Appendix C.4): MultiLAD is able to identify the adoption of the “NYS on Pause Program” (all non-essential workers must stay home, March 22nd), March 29th (the day after all non-essential construction sites were halted in NYS), and April 18th (the Saturday after the first extension of the “NYS on Pause Program”, April 16th). These results show that MultiLAD can indeed detect significant and meaningful real-world events from multiple views. In this case, considering the 4 views individually and determining the most informative one is unpractical.

    9 Conclusion

    In this work, we introduced two novel change point detection method : LAD and MultiLAD for single-view and multi-view dynamic graphs respectively. LAD utilizes the singular values of the Laplacian matrix to embed graph snapshots and explicitly captures both the short-term and the long-term temporal relations to detect events and change points. Through evaluations on both synthetic and real-world datasets, we showed that LAD is more effective at identifying significant events than previous state-of-the-art methods. Next, on multi-view dynamic graphs, MultiLAD aggregates the singular values of the normalized Laplacian matrices through the scalar power mean operation and identifies the most informative singular values from each view. On synthetic benchmarks, MultiLAD benefits from additional views, is robust to noise, and outperforms competitive single view and multi-view baselines. On real-world traffic networks, we showed that MultiLAD correctly identifies significant events which drastically disrupts the flow of traffic.

    Footnotes

    1
    our previous work “Laplacian Change Point Detection for Dynamic Graphs” [29] published at KDD 2020.
    2
    we set the index to start at 0.

    A Additional Materials for LAD Experiments

    This section contains additional material for LAD and single-view experiments in Section 7. We report the implementations used in our experiments. For TENSORSPLAT, we first compute the PARAFAC decomposition (using Tensorly [33] library in python) to obtain the temporal factors. Then the scikit-learn [44] python implementation of the Local Outlier Factor algorithm. For EdgeMonitoring method, we use the matlab code kindly provided by the original authors and keep the default parameters.

    A.1 The effect of Anomaly scores

    In Section 4, we defined the anomaly score as the difference in \(Z\) score when compared to the previous time step. The final anomaly score is defined as \(Z_t^* = min(Z_t - Z_{t-1},0)\) . The points with the largest \(Z^*\) are then selected as anomalies. Here we visualize both the raw \(Z\) score and \(Z^*\) score in Figure 10 for the pure setting in Section 7.3.1. Note that when using the raw \(Z\) scores, we observe declines in anomaly score after the change point. In a sliding window, which contains graph snapshots from the previous graph generative process and the current one, the normal behavior vector is computed based on a mix of graphs from two generative processes thus leading to the observed declining anomalous scores after the change point. In comparison, using \(Z^*\) scores makes LAD more robust under different choices of sliding windows.
    Fig. 10.
    Fig. 10. LAD perfectly recovers the injected change points of Table 3. We visualize the predicted anomaly scores. Both the raw \(Z\) score (top) and the difference in consecutive \(Z\) scores, the \(Z^*\) scores (bottom) are plotted.

    A.2 Spectral Properties and Their Connections

    The above table summarizes connections between different spectral properties in the graph and their connections in the literature. We use the same notation as [55]. \(\mathbf {L},\mathbf {L}_{rw},\mathbf {L}_{sym}, \mathbf {A}\) represent the unnormalized Laplacian matrix, the random walk Laplacian matrix, the symmetric Laplacian matrix, and the adjacency matrix respectively.

    A.3 Interpreting Results

    It is often difficult to interpret the anomaly score of a given change point detection method as the task inherently demands direct comparison between global graph structures over time. As network characteristics vary drastically across domains [8], it is important to design metrics that help us understand the correlation between anomaly scores and well-known graph properties. In this work,we identify temporal outliers in specific graph properties and compare them to the ones predicted by LAD. We choose the outlier score \(y\) as follows:
    \(\begin{equation} y = \frac{|\alpha _t - \alpha _{avg} |}{\alpha _{std}}, \end{equation}\)
    (6)
    where \(\alpha _{avg}\) , \(\alpha _{std}\) are the average and standard deviation of \(\alpha\) computed from a moving window. We select the moving window size to correspond with the short-term window size \(s\) . Then we compute the Spearman rank correlation [62] to understand the statistical dependence between two ranked variables. We observe that LAD is not relying on one particular graph statistics but rather on the most important aspects of the dynamic graph of interest.
    By examining Spearman rank correlations in Table 9, we observe that LAD predictions are most strongly correlated with the number of connected components while still having a positive correlations with other network properties such as transitivity. As LAD captures high-level graph structures, it is sensitive to the important properties in the dynamic graph of interest while not dependent on any single property.
    Table 8.
    Spectral PropertiesConnections
    \(\mathbf {L},\mathbf {L}_{rw},\mathbf {L}_{sym}\) eigenvaluesconnected components [55]
    \(\mathbf {L}\) eigenvectorsratio cut [24]
    \(\mathbf {L}_{rw},\mathbf {L}_{sym}\) eigenvectorsnormalized cut [49, 55]
    \(\mathbf {A}\) eigenvaluesKATZ centrality [20]
    \(\mathbf {A}\) eigenvectorseigenvector centrality [5]
    \(\mathbf {A}\) dominant eigenvectorstationary distrib., PageRank [41]
    Table 8. Spectral Properties and Their Connections
    Table 9.
    Graph PropertySpearman Rank Correlation
    # of connected components17.0%
    Transitivity11.8%
    # of edges7.5%
    Average degree per node15.9%
    Table 9. LAD Scores Correlate Well with the Number of Connected Components when Injected Points are Changing Number of Blocks in SBM (Table 3)
    Spearman rank correlation between LAD and other graph properties is also reported.
    We observe that LAD is not dependent on any one particular graph statistics. Empirically, LAD correlates with different graph properties depending on the network of interest. This coincide with our intuition that any type of significant change in the graph structure could disrupt the singular values. These experimental results suggest that LAD can capture changes of different nature in the graph structure as it tracks the compression loss of the Laplacian matrix to low rank approximations.

    A.4 UCI Message Network

    The UCI Message dataset is a directed and weighted network based on an online community of students at the University of California, Irvine. Each node represents a user and each edge encodes a message interaction from one user to another. The weight of each edge represents the number of characters sent in the message. When an user account is created, a self edge with unit weight is added. A total of 1,899 users was recorded. The network data covers the period from April to October 2004 and spans 196 days.

    A.5 Senate co-sponsorship Network

    Senate co-sponsorship network [17] examines social connections between legislators from their co-sponsorship relations on bills during the 93rd-108th Congress. Bills are grouped into temporal snapshots biannually (time frame for each graph) and co-sponsors on a bill form a clique. Similar to [57], we start from the 97th Congress as full amendments data is available from there onward.

    A.6 Canadian bill voting network

    The Canadian bill voting network was mined from the Open Parliament API (http://api.openparliament.ca/), a source for digitized data from the House of Commons in JSON format. We first extracted all MPs in the canadian parliament from 2006 to 2019. The network nodes for each snapshot only includes the MPs who actively participated in the parliament of that year (around 300 MPs depending on which year). We then extracted all bills sponsored by each MP and the corresponding votes for each bill. Lastly, we filtered out the yes votes for each ballot of the bills and which MPs voted yes. The data mining code is also available in the code repository of LAD.7
    We also provide more information on the political environment from 2006 to 2019. During this time period the government party in power has changed. In 2006, the government was minority Conservative until 2015 when the Liberal party won and formed a majority parliament [36]. Studies have shown that minority governments appear to be less productive in legislative activity as consensus is harder to obtain [10]. Cohesion in the House of Commons within a political party during voting sessions are often observed and dissent has been seen amongst MPs of the same party who are less influential [19]. While elected to the House of Commons, MPs can sponsor more than one bill which can also include bills that may have not passed in prior parliaments. Parliament sessions follows no regular pattern from one parliament to another.

    B Ablation Study On Choice of Power

    Figure 11 shows the impact of the choice of power \(p\) on the performance of MultiLAD. We observe that MultiLAD is largely robust to the choice of \(p\) while the negative powers show slight performance improvement over the positive ones. Therefore, in this work, we choose the power \(p=-10\) .
    Fig. 11.
    Fig. 11. The impact of power \(p\) on the performance of MultiLAD on the SBM experiment from Section 8.4 when \(c_{in} = 12\) and \(c_{out}=4\) .

    C Additional Materials for MultiLAD Experiments

    This section contains additional material for MultiLAD and multi-view experiments in Section 8.

    C.1 Additional Results

    Figure 13 and 12 shows the anomaly scores output by NL maxLAD for the NYC TLC 2020 and 2015-2016 case studies respectively. Figure 14, 15 shows the anomaly scores output by NL meanLAD on the same datasets.
    Fig. 12.
    Fig. 12. NL maxLAD anomaly curve on NYC TLC 2015–2016 dataset (annotated with the 7 most anomalous time points).
    Fig. 13.
    Fig. 13. NL maxLAD anomaly curve on NYC TLC 2020 dat4aset (annotated with the 7 most anomalous time points).
    Fig. 14.
    Fig. 14. NL meanLAD anomaly curve on NYC TLC 2015–2016 dataset (annotated with the 7 most anomalous time points)
    Fig. 15.
    Fig. 15. NL meanLAD anomaly score on NYC Transit 2020 dataset (annotated with the 7 most anomalous time points)

    C.2 Data Processing

    The cleaned and processed datasets are included with the submission. Gridlock days for 2016-2016 were obtained from The Official Website of the City of New York.8 The dates related to NYC’s COVID-19 response were obtained from a publicly available timeline.9

    C.3 NYC TLC 2015-2016

    We used the longitude and latitude coordinates of pickup and drop-off locations to map each trip to the nearest pair of taxi zones. Doing so on a day-by-day basis generated a dynamic two-view directed weighted graphs over 264 nodes (taxi zones) in the greater NYC region, with each time point corresponding to a different day. Edge weights in these graphs correspond to the number of trips between two taxi zones without consideration for the number of passengers. We also point out that Eswaran et al. [15] mapped longitude and latitude coordinates to a set of “57 geographically or conceptually distinguishable zones based on common knowledge” [15], which we were unable to replicate. They also chose to use hourly time points, though this does not drastically impact the comparison between MultiLAD and SPOTLIGHT’s results.
    Eswaran et al. [15] also examined NYC TLC ridership data for November 2015 - January 2016. As mentioned in our Experiments section, there was a sizeable overlap in the dates identified by SPOTLIGHT and MultiLAD. In terms of differences, MultiLAD ranked two NYC gridlock days (dates that the Department of Transportation expected to have unusually busy traffic (11th and 17th respectively, see Table 10) higher in its list of anomalous days. On the other hand, SPOTLIGHT ranked American Thanksgiving higher.
    Table 10.
    Table 10. Key Dates Related to the 2015-2016 NYC TLC Case Study
    This might be partly explained by the fact that we use multi-view data while SPOTLIGHT focuses on single-view graphs. The differences in data pre-processing (see the “NYC TLC 2015-2016” subsection above) could also be partly responsible. We were also unable to ascertain whether they used both the green and yellow taxi ridership data or focused on one view only. We believe the combination of these factors and the large methodological differences between SPOTLIGHT and MultiLAD go a long way in explaining the differences between the two sets of results.

    C.4 NYC TLC 2020

    The 2020 NYC TLC trip records already possessed taxi zone data for pickup and drop-off locations. We also note that two taxi zones had been added between 2015–2020, and that the daily directed weighted graphs spanned 266 nodes instead of 264. Key dates relating to the response of COVID-19 in New York State is detailed in Table 11.
    Table 11.
    Table 11. Key Dates Relating to the Response of COVID-19 in New York State

    References

    [1]
    Michal Aharon, Michael Elad, and Alfred Bruckstein. 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing 54, 11 (2006), 4311–4322.
    [2]
    Leman Akoglu and Christos Faloutsos. 2010. Event detection in time series of mobile communication graphs. In Proceedings of the Army Science Conference.
    [3]
    Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. science 286, 5439 (1999), 509–512.
    [4]
    N. Benjamin Erichson, Steven L. Brunton, and J. Nathan +Kutz. 2017. Compressed singular value decomposition for image and video processing. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 1880–1888.
    [5]
    Phillip Bonacich. 1987. Power and centrality: A family of measures. American journal of sociology 92, 5 (1987), 1170–1182.
    [6]
    Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. 2000. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 93–104.
    [7]
    Rasmus Bro. 1997. PARAFAC. Tutorial and applications. Chemometrics and Intelligent Laboratory Systems 38, 2 (1997), 149–171.
    [8]
    Anna D. Broido and Aaron Clauset. 2019. Scale-free networks are rare. Nature Communications 10, 1 (2019), 1017.
    [9]
    Fan RK Chung and Fan Chung Graham. 1997. Spectral Graph Theory. Number 92. American Mathematical Soc.
    [10]
    Richard S. Conley. 2011. Legislative activity in the canadian house of commons: Does majority or minority government matter? American Review of Canadian Studies 41, 4 (2011), 422–437.
    [11]
    William Cross. 2016. The importance of local party activity in understanding canadian politics: Winning from the ground up in the 2015 federal election: Presidential address to the canadian political science association calgary, 31 may 2016. Canadian Journal of Political Science/Revue Canadienne De Science Politique 49, 4 (2016), 601–620.
    [12]
    Kun Dong, Austin R. Benson, and David Bindel. 2019. Network density of states. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1152–1161.
    [13]
    Nathan Eagle, Alex Sandy Pentland, and David Lazer. 2009. Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences 106, 36 (2009), 15274–15278.
    [14]
    Paul Erdős and Alfréd Rényi. 1960. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci 5, 1 (1960), 17–60.
    [15]
    Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, and Nina Mishra. 2018. SpotLight: Detecting anomalies in streaming graphs. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Association for Computing Machinery, New York, NY, 1378–1386. DOI:
    [16]
    Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, and Nina Mishra. 2018. Spotlight: Detecting anomalies in streaming graphs. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
    [17]
    James H. Fowler. 2006. Legislative cosponsorship networks in the US House and Senate. Social Networks 28, 4 (2006), 454–465.
    [18]
    Riccardo Gallotti and Marc Barthelemy. 2015. The multilayer temporal network of public transport in Great Britain. Scientific Data 2, 1 (2015), 1–8.
    [19]
    Christopher Garner and Natalia Letki. 2005. Party structure and backbench dissent in the Canadian and British Parliaments. Canadian Journal of Political Science/Revue Canadienne De Science Politique 38, 2 (2005), 463–482.
    [20]
    K.-I. Goh, Byungnam Kahng, and Doochul Kim. 2001. Universal behavior of load distribution in scale-free networks. Physical Review Letters 87, 27 (2001), 278701.
    [21]
    Gene H. Golub and Christian Reinsch. 1971. Singular value decomposition and least squares solutions. In Proceedings of the Linear Algebra. Springer, 134–151.
    [22]
    Sudipto Guha, Nina Mishra, Gourav Roy, and Okke Schrijvers. 2016. Robust random cut forest based anomaly detection on streams. In Proceedings of the International Conference on Machine Learning. PMLR, 2712–2721.
    [23]
    Jun Guo and Wenwu Zhu. 2018. Partial multi-view outlier detection based on collective learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
    [24]
    Lars Hagen and Andrew B. Kahng. 1992. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 11, 9 (1992), 1074–1085.
    [25]
    Shigefumi Hata and Hiroya Nakao. 2017. Localization of laplacian eigenvectors on random networks. Scientific Reports 7, 1 (2017), 1–11.
    [26]
    Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. 1983. Stochastic blockmodels: First steps. Social Networks 5, 2 (1983), 109–137.
    [27]
    Chao Huang, Chuxu Zhang, Jiashu Zhao, Xian Wu, Dawei Yin, and Nitesh Chawla. 2019. Mist: A multiview and multimodal spatial-temporal learning framework for citywide abnormal event forecasting. In Proceedings of the World Wide Web Conference. 717–728.
    [28]
    Shenyang Huang, Jacob Danovitch, Guillaume Rabusseau, and Reihaneh Rabbany. 2023. Fast and attributed change detection on dynamic graphs with density of states. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 15–26.
    [29]
    Shenyang Huang, Yasmeen Hitti, Guillaume Rabusseau, and Reihaneh Rabbany. 2020. Laplacian change point detection for dynamic graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 349–358.
    [30]
    Tsuyoshi Idé and Hisashi Kashima. 2004. Eigenspace-based anomaly detection in computer systems. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 440–449.
    [31]
    Alexandra Kerr. 2020. A Historical Timeline of COVID-19 in New York City. (2020). Retrieved from https://www.investopedia.com/historical-timeline-of-covid-19-in-new-york-city-5071986. Accessed April 5th, 2021.
    [32]
    Niko Kiukkonen, Jan Blom, Olivier Dousse, Daniel Gatica-Perez, and Juha Laurila. 2010. Towards rich mobile phone datasets: Lausanne data collection campaign. Proc. ICPS, Berlin, Vol. 68.
    [33]
    Jean Kossaifi, Yannis Panagakis, Anima Anandkumar, and Maja Pantic. 2019. Tensorly: Tensor learning in python. The Journal of Machine Learning Research 20, 1 (2019), 925–930.
    [34]
    Danai Koutra, Evangelos E. Papalexakis, and Christos Faloutsos. 2012. Tensorsplat: Spotting latent anomalies in time. In Proceedings of the 2012 16th Panhellenic Conference on Informatics. IEEE, 144–149.
    [35]
    Danai Koutra, Neil Shah, Joshua T. Vogelstein, Brian Gallagher, and Christos Faloutsos. 2016. Deltacon: Principled massive-graph similarity function with attribution. ACM Transactions on Knowledge Discovery from Data 10, 3 (2016), 1–43.
    [36]
    Emmett Macfarlane. 2019. The Renewed Canadian Senate: Organizational Challenges and Relations with the Government. Institute for Research on Public Policy.
    [37]
    Alex Marland. 2013. What is a political brand?: Justin Trudeau and the theory of political branding. In Proceedings of the Annual Meeting of the Canadian Communication Association and the Canadian Political Science Association, University of Victoria, British Columbia.
    [38]
    Pedro Mercado, Antoine Gautier, Francesco Tudisco, and Matthias Hein. 2018. The power mean laplacian for multilayer graph clustering. In Proceedings of the International Conference on Artificial Intelligence and Statistics. 1828–1838.
    [39]
    Hadar Miller and Osnat Mokryn. 2020. Size agnostic change point detection framework for evolving networks. Plos One 15, 4 (2020), e0231035.
    [40]
    Mark EJ Newman and Aaron Clauset. 2016. Structure and inference in annotated networks. Nature Communications 7, 1 (2016), 1–11.
    [41]
    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The Pagerank Citation Ranking: Bringing Order to the Web.Technical Report. Stanford InfoLab.
    [42]
    Pietro Panzarasa, Tore Opsahl, and Kathleen M. Carley. 2009. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology 60, 5 (2009), 911–932.
    [43]
    Evangelos E. Papalexakis, Leman Akoglu, and Dino Ience. 2013. Do more views of a graph help? community detection and clustering in multi-graphs. In Proceedings of the 16th International Conference on Information Fusion. IEEE, 899–905.
    [44]
    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in python. Journal of Machine Learning Research 12 (2011), 2825–2830.
    [45]
    Leto Peel and Aaron Clauset. 2015. Detecting change points in the large-scale structure of evolving networks. In Proceedings of the 29th AAAI Conference on Artificial Intelligence.
    [46]
    Stephen Ranshous, Shitian Shen, Danai Koutra, Steve Harenberg, Christos Faloutsos, and Nagiza F. Samatova. 2015. Anomaly detection in dynamic networks: A survey. Wiley Interdisciplinary Reviews: Computational Statistics 7, 3 (2015), 223–247.
    [47]
    Awwal Mohammed Rufai, Gholamreza Anbarjafari, and Hasan Demirel. 2014. Lossy image compression using singular value decomposition and wavelet difference reduction. Digital Signal Processing 24, 7 (2014), 117–123.
    [48]
    Xiang-Rong Sheng, De-Chuan Zhan, Su Lu, and Yuan Jiang. 2019. Multi-view anomaly detection: Neighborhood in locality matters. In Proceedings of the AAAI Conference on Artificial Intelligence. 4894–4901.
    [49]
    Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 8 (2000), 888–905.
    [50]
    Yunkyu Sohn and Jonghee Park. 2017. Bayesian approach to multilayer stochastic blockmodel and network changepoint detection. Netw. Sci. 5, 2 (2017), 164–186.
    [51]
    Lei Tang, Xufei Wang, and Huan Liu. 2010. Community Detection in Multi-Dimensional Networks. Technical Report. Arizona State University, Department of Computer Science and Engineering.
    [52]
    Wei Tang, Zhengdong Lu, and Inderjit S. Dhillon. 2009. Clustering with multiple graphs. In Proceedings of the 2009 9th IEEE International Conference on Data Mining. IEEE, 1016–1021.
    [53]
    Xian Teng, Yu-Ru Lin, and Xidao Wen. 2017. Anomaly detection in dynamic networks using multi-view time-series hypersphere learning. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 827–836.
    [54]
    Renato Vizuete, Federica Garin, and Paolo Frasca. 2021. The Laplacian spectrum of large graphs sampled from graphons. IEEE Transactions on Network Science and Engineering 8, 2 (2021), 1711–1721.
    [55]
    Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and Computing 17, 4 (2007), 395–416.
    [56]
    Junhao Wang, Sacha Levy, Ren Wang, Aayushi Kulshrestha, and Reihaneh Rabbany. 2019. SCG: Spotting Coordinated Groups in Social Media. arXiv preprint arXiv:1910.07130.
    [57]
    Yu Wang, Aniket Chakrabarti, David Sivakoff, and Srinivasan Parthasarathy. 2017. Fast change point detection on dynamic social networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2992–2998.
    [58]
    Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? In International Conference on Learning Representations.
    [59]
    Sina Zamani, Tejaswi Nanjundaswamy, and Kenneth Rose. 2017. Frequency domain singular value decomposition for efficient spatial audio coding. In Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, 126–130.
    [60]
    Shichao Zhang, Xuelong Li, Ming Zong, Xiaofeng Zhu, and Ruili Wang. 2017. Efficient kNN classification with different numbers of nearest neighbors. IEEE Transactions on Neural Networks and Learning Systems 29, 5 (2017), 1774–1785.
    [61]
    Xiao-Dong Zhang. 2011. The Laplacian eigenvalues of graphs: A survey. arXiv preprint arXiv:1111.2897.
    [62]
    Eric R. Ziegel. 2001. Standard probability and statistics tables and formulae. Technometrics 43, 2 (2001), 249.

    Cited By

    View all
    • (2024)VGGM: Variational Graph Gaussian Mixture Model for Unsupervised Change Point Detection in Dynamic NetworksIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.337754819(4272-4284)Online publication date: 2024

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Knowledge Discovery from Data
    ACM Transactions on Knowledge Discovery from Data  Volume 18, Issue 3
    April 2024
    663 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3613567
    Issue’s Table of Contents
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 January 2024
    Online AM: 06 November 2023
    Accepted: 23 October 2023
    Revised: 25 September 2023
    Received: 31 January 2023
    Published in TKDD Volume 18, Issue 3

    Check for updates

    Author Tags

    1. Spectral methods
    2. Graph algorithms
    3. Graphs and networks
    4. Machine learning

    Qualifiers

    • Research-article

    Funding Sources

    • Canadian Institute for Advanced Research (CIFAR AI chair program)
    • Natural Sciences and Engineering Research Council of Canada (NSERC) Postgraduate Scholarship-Doctoral (PGS D) Award
    • Fonds de recherche du Québec – Nature et Technologies (FRQNT) Doctoral Award

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)478
    • Downloads (Last 6 weeks)71

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)VGGM: Variational Graph Gaussian Mixture Model for Unsupervised Change Point Detection in Dynamic NetworksIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.337754819(4272-4284)Online publication date: 2024

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Get Access

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media