research-article

Open access

Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs

Authors:

Shenyang Huang, Samy Coulombe, Yasmeen Hitti, Reihaneh Rabbany, and Guillaume RabusseauAuthors Info & Claims

ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 3

Article No.: 63, Pages 1 - 32

https://doi.org/10.1145/3631609

Published: 12 January 2024 Publication History

PDF eReader

Abstract

Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances, and detection of epidemic outbreaks. In this article, we focus on change point detection in dynamic graphs and address three main challenges associated with this problem: (i) how to compare graph snapshots across time, (ii) how to capture temporal dependencies, and (iii) how to combine different views of a temporal graph. To solve the above challenges, we first propose Laplacian Anomaly Detection (LAD) which uses the spectrum of graph Laplacian as the low dimensional embedding of the graph structure at each snapshot. LAD explicitly models short-term and long-term dependencies by applying two sliding windows. Next, we propose MultiLAD, a simple and effective generalization of LAD to multi-view graphs. MultiLAD provides the first change point detection method for multi-view dynamic graphs. It aggregates the singular values of the normalized graph Laplacian from different views through the scalar power mean operation. Through extensive synthetic experiments, we show that (i) LAD and MultiLAD are accurate and outperforms state-of-the-art baselines and their multi-view extensions by a large margin, (ii) MultiLAD’s advantage over contenders significantly increases when additional views are available, and (iii) MultiLAD is highly robust to noise from individual views. In five real-world dynamic graphs, we demonstrate that LAD and MultiLAD identify significant events as top anomalies such as the implementation of government COVID-19 interventions which impacted the population mobility in multi-view traffic networks.

1 Introduction

Real-world problems in various domains (e.g., political science, biology, chemistry, and sociology) can be modeled as evolving networks that capture temporal relations between nodes. With the increasing availability of dynamic network data in areas such as social media, public health and transportation, providing sophisticated methods that can identify anomalies over time is an important research direction. The task of anomaly detection in dynamic graphs aims at identifying different types of time-varying entities that significantly deviate from the “normal” or “expected” behavior of the underlying graph distribution.

In this work, we focus on the change point detection problem which identifies time steps where the graph structure or many graph components deviate significantly from the normal behavior. As change point detection and event detection are closely related, we first explain the distinctions between two of them. Following [57]’s definition, a change point is a time point where there is a sudden change in the underlying network generative process and this new process continues beyond the current point. In contrast, an event is defined as a time point where the network deviates significantly from the expected behavior and resumes back to normal after this time point. Once the anomalous graph instance is found, potential causes can be identified through various static graph analysis techniques. In this work, we first propose a novel change point detection method, Laplacian Anomaly Detection (LAD).¹ There are two major challenges for change point detection in dynamic graphs. First, how to compare graph snapshots across time. It is difficult to directly compare two graphs or their adjacency matrices. A common approach is to summarize the graph into a low dimensional embedding vector called the signature vector. LAD uses the singular values of the Laplacian matrix (Laplacian spectrum) as the signature vector for each snapshot because they are closely related to the graph connectivity and low rank approximation. Second, how to capture temporal dependencies. In practice, graphs can undergo abrupt changes at any given time step [30]. These sudden changes can be detected by examining the graph snapshots from a few timestamps in the past or a short time window. However, gradual changes in the graph structure is often shown over a long time horizon and possible accompanied by permanent changes in the underlying graph generative model [57]. Therefore, effective detection of both sudden and gradual changes requires the perspective from both short-term and long-term perspectives. We propose to use both short and long-term sliding windows to explicitly compare the current graph structure with those in the windows. The snapshots within a window are summarized by stacking the signature vectors from these snapshots into a matrix and then computing the principal left singular vector of this matrix as the normal behavior within this window. Then it is compared with the current signature vector via cosine similiarity.

In practice, a number of networks can describe the same underlying relations. Multi-view (a.k.a. multiplex) networks are a subset of multi-layer networks, which describe relations between the same set of entities from different information sources. Each source can be modeled as an individual view. The interactions between the entities in each view often evolve over time thus forming multi-view dynamic networks. These networks arise naturally in numerous domains such as traffic/transportation networks [15, 18], mobile phone networks [32], friendship networks [13], social media networks [56], and citation networks [52]. Recent work has shown promising improvements by leveraging the additional information from multi-view data in tasks such as community detection [38, 43], anomalous node detection [48], and event forecasting [27]. To the best of our knowledge, existing methods studying change point detection only focused on single view dynamic networks [2, 29, 30, 45, 57]. Therefore, we propose MultiLAD as the first change point detection method for multi-view dynamic networks.

To better show the motivation for MultiLAD, let us consider a real-world application. Given trip records of different taxi companies and for-hire vehicles, can we identify important occasions such as holidays, events, or unusual weather conditions that disrupt the overall traffic network? In general, there are two main motivations for leveraging multi-view information for change point detection in dynamic networks. First, data from additional sources may have different characteristics and values [43] thus contributing unique and complementary information. In this case, records from taxi companies and for-hire vehicle services (such as Uber and Lyft) should model the same traffic network but from potentially different age groups and demographics. Second, individual sources can have a high degree of noise and contain disruptive information due to the data collection process. Figure 1 shows the anomalous event on March 22nd 2020 in the New York City Transit network detected by our proposed method. This day signals the beginning of the work from home requirement for all non-essential workers. To better illustrate the graph structure, the nodes are colored by their detected community assignment and their sizes correspond to PageRank [41]. Here, we have four views which are explained in detail in Section 8.9. Given these four views with different graph structures and evolution, our method effectively reasons across views to find the important anomalies in the underlying traffic network. In contrast, it is often difficult or even infeasible to select one view and apply a single-view change point detection method such as LAD (LAD’s anomaly scores on each individual views are reported in Figure 1). As seen in the figure, each view can exhibit a different set of anomalies, leading to significant differences among anomalies scores. This disagreement problem is further exacerbated when more views are present. Therefore, naive aggregation strategies on anomaly scores (such as max or mean) are insufficient, while MultiLAD aggregates view-specific features in a meaningful way, preserving the essential information for anomaly detection. After view aggregation, MultiLAD utilizes the same anomaly detection procedures as LAD thus MultiLAD can be seen as a multi-view generalization of LAD.

Fig. 1.

Our proposed Multi-view Laplacian Anomaly Detection (MultiLAD) uses the scalar power mean operation to meaningfully aggregate information from individual views and infers the anomaly score based on an aggregated vector. More specifically, MultiLAD first computes the singular values of the normalized Laplacian matrix from each view as the view-specific feature vector. Then, MultiLAD merges information across different views by aggregating these feature vectors through the scalar power mean operation. In this way, the effect of noise from individual views are lessened while key information about the overall generative process of the multi-view network is preserved. Lastly, MultiLAD compares the aggregated vector with past normal behavior and determines if a particular time step is anomalous or not.

Summary of contributions:

–

We introduce a novel change point detection method: LAD. LAD utilizes the singular values of the Laplacian matrix to obtain a low dimension embedding of graph snapshots. To the best of our knowledge, this is the first time that the Laplacian spectrum has been used for change point detection.

–

LAD captures both the short-term and long-term temporal relations to detect both events and change points.

–

We extensively evaluate LAD on three synthetic tasks and three real-world datasets. We show that LAD is more effective at identifying significant events than state-of-the-art methods.

–

We extend LAD to the multi-view setting by proposing MultiLAD: the first change point detection method for multi-view dynamic networks. MultiLAD aggregates the singular values of the normalized Laplacian matrices through scalar power mean and identifies the most informative singular values from each view.

–

We extensively evaluate MultiLAD on synthetic experiments and show that MultiLAD (1). gains increased performance from additional views, (2). is highly robust to noise, and (3). significantly outperforms state-of-the-art single view baselines and their multi-view extensions.

–

We apply MultiLAD to two real-world multi-view traffic networks. We demonstrate that MultiLAD correctly detects major real-world traffic disruptions such as the implementation of stay-at-home order for COVID-19 and Christmas day.

Reproducibility: All code is publicly available. The codes for LAD and MultiLAD are available publicly as follows:

–

LAD Github: https://github.com/shenyangHuang/LAD.git

–

MultiLAD Github: https://github.com/shenyangHuang/multiLAD.git

2 Related Work

Many real-world systems and relations can be formulated as dynamic graphs [15, 18, 32]. The relations can also be captured by constructing similarity matrices using $k$ Nearest Neighbors (KNNs) distance [60]. For dynamic graphs, the common strategy across most anomaly detection methods is to extract a low dimensional representation from graph snapshots and then apply an anomaly scoring function to compare these representations [46]. In this section, we discuss related literature from both single-view and multi-view anomaly detection. We compare features of LAD, MultiLAD, and other alternative methods in Table 1. Note that our proposed MultiLAD is the only approach that satisfies all the desired properties and is designed for multi-view dynamic graphs.

Table 1.

2.1 Event Detection

An early work by Idé and Kashima [30] examines runtime anomalies at the application layer of multi-node computer systems. A web-based system is modeled as a weighted graph where each edge represents a dependency between services. Idé and Kashima aim at finding time points where the majority of the edge attributes in the network show significant deviation from the recent ones. The principal eigenvector corresponding to the maximum eigenvalue of the positive weighted adjacency matrix $W$ is used as a low dimensional representation of the graph (called activity vector). Activity vectors from a short-term context window form a matrix upon which singular value decomposition (SVD) is then performed. The typical graph behavior within the context window is then represented as the principle left singular vector. The deviation of the current activity vector from the typical behavior is used as the anomaly score. Different from Idé and Kashima, we use the Laplacian spectrum to summarize graph structures and explicitly construct two sliding windows for long-term and short-term contexts.

In ref [34], a dynamic graph is viewed as a high order tensor. The authors proposed to perform the PARAFAC decomposition [7] of this tensor to obtain the signature vector for anomaly detection. As a multi-view dynamic graph can be considered as a 4th order tensor, we generalize TENSORSPLAT to the multi-view setting, and include it as one of the baselines in our experiments in Section 8. SPOTLIGHT [15] was proposed to detect anomalies in dynamic graphs and is centered around monitoring the (dis)appearance of large or dense subgraphs in a dynamic graph. SPOTLIGHT focuses on the related but different anomalous subgraph detection task, while we detect anomalous graph snapshots. In this work, we also investigate the New York City Transit 2015–2016 dataset studied in [15].

2.2 Change Point Detection

Koutra et al. [35] formally stated the axioms and desired properties of graph similarity functions. Their proposed an anomaly detection method named DeltaCon, which first computes pairwise node affinities in the first graph and then measures the difference in node affinity score between two graphs. Therefore, DeltaCon is only well-defined for pairs of graphs thus lacking the ability to reason with a sequence of graph snapshots. This inherently limits its ability to detect gradual changes that involve multiple snapshots.

Wang et al. [57] model network evolution as a first order Markov process and propose the EdgeMonitoring method based on Monte-Carlo sampling techniques. Their assumption is that there is some unknown underlying model that governs the generative process. Moreover, each graph snapshot is dependent on the current generative model as well as the previously observed snapshot. This method is often regarded as the current state-of-the-art for change point detection. However, EdgeMonitoring relies on consistent node orderings across all time steps and assumes constant number of nodes for each snapshot. This assumption is easily violated in large social networks where users accounts are added frequently. In contrast, LAD can manage varying number of nodes across time. Miller and Mokryn [39] proposed an online fast change point detection method called SACPD, based on the degree distribution of snapshots of networks in time. SACPD computes the cumulative distribution function of the network degrees and assumes no prior knowledge of the network thus being size agnostic. Our LAD also assumes no prior distribution on the network by utilizing the Laplacian spectrum.

2.3 Multi-view Anomaly Detection

Most multi-view anomaly detection methods follow a two step procedure: (1). extract view-specific representations and (2). aggregate such representations to identify anomalous data. While prior work [23, 48, 50, 53] focused on the anomalous node (or instance) detection task, we tackle the change point detection problem in the multi-view dynamic graph setting. In other words, instead of detecting anomalous nodes, we spot anomalous timestamps when the structure of the entire graph changes.

We draw on the same idea used in node-level anomaly detection methods which assumes that abnormal events would create a disturbance of regularities across all views. By mining such irregular patterns, one can achieve stronger anomaly detection results than with just a single view. In this work, we use this idea to identify anomalous graph snapshots across different views to detect significant changes in the overall graph generative process for all views.

2.4 Multi-view Clustering

The goal of graph clustering is to obtain groups of nodes that are similar with regards to some structural or node attribute information [43, 51]. Graph clustering in the multi-view setting is a similar task to multi-view change point detection in which one needs to combine information from different sources on the same nodes to extract some underlying structure. However, multi-view graph clustering focuses on static graphs while in this work we consider dynamic graphs.

Mercado et al. [38] propose to take the matrix power means of Laplacians (PML) to extend the spectral clustering algorithm to multilayer graphs. Inspired by the effectiveness of power mean operations in multi-view clustering, we propose to utilize the scalar power mean operation to merge information from different views for change point detection.

Sohn et al. [50] proposed a Bayesian multilayer stochastic blockmodeling framework for multilayer clustering. They also extend their method to detect communities changes in multilayer temporal graphs. However, their approach is limited to networks with block structures. In comparison, MultiLAD is applicable to any graph including those with no known community structures (such as the Barabási-Albert model).

3 Problem Definition

In this section, we formally define the multi-view change point and event detection problems which encompasses the classical single-view definition of the same problems as a special case.

Multi-view Dynamic Graph A multi-view dynamic graph $\mathbb {G} = \lbrace \mathcal {G}_t \rbrace _{t=1}^{T}$ is given by a collection of graph snapshots $\mathcal {G}_t = \lbrace \mathbf {G}_{t,r} \rbrace _{r=1}^m$ , where $\mathbf {G}_{t,r}$ is the graph from the view (or relation) $r$ at time step $t$ . At each time step, each graph $\mathbf {G}_{t,r} = (\mathbf {V}, \mathbf {E}_{t,r})$ consists of the set of nodes $\mathbf {V}$ and a set of edges $\mathbf {E}_{t,r} \subset \mathbf {V} \times \mathbf {V}$ . For clarity of exposition, we assume that the set of nodes is fixed over time and each view has the same number of nodes (though MultiLAD can handle settings where this assumption does not hold, see Section 6.1). Each edge $e=(i,j,w) \in \mathbf {E}_{t,r}$ is then defined as the connection between node $i$ and node $j$ at timestamp $t$ in the view $l$ with weight $w \in \mathbb {R}^{+}$ . Note that in practice both nodes and edges can appear or disappear from one-time step to another. We denote by $\mathbf {A}_{t,r} \in \mathbb {R}^{n \times n}$ the adjacency matrix representing edges in $\mathbf {E}_{t,r}$ where $n = |\mathbf {V}|$ . When the number of views $m=1$ , we fall back into the single-view setting similar to [30, 57].

Multi-view Change Point Detection Based on the above definition, let $\mathbb {G}$ be a multi-view dynamic graph and let $\mathbf {H}_t$ be the underlying graph generative model at time $t$ . $\mathbf {H}_t$ is not observed but is assumed to control the graph behavior across all views. The goal of multi-view change point detection is to find time points after which $\mathbf {H}_t$ significantly differs from the previous steps. More precisely, we want to find a set $S \subseteq \lbrace 1, \ldots , T\rbrace$ such that for each $t \in S$ , $\cdots \simeq \mathbf {H}_{t-2} \simeq \mathbf {H}_{t-1} \not\simeq \mathbf {H}_{t} \simeq \mathbf {H}_{t+1} \simeq \cdots$ .

Multi-view Event Detection In this work, we also consider events similar to [30]. Let $\mathbb {G}$ be a multi-view dynamic graph and let $\mathbf {H}_t$ be the underlying graph generative model at time $t$ . The goal of event detection is to find a set of time points $S \subseteq \lbrace 1, \ldots , T\rbrace$ such that for each $t \in S$ , $\mathbf {H}_{t-1} \not\simeq \mathbf {H}_{t}$ and $\mathbf {H}_{t-1} \simeq \mathbf {H}_{t+1}$ . Events are one-time sudden changes in graph structure where the generative model $\mathbf {H}_{t+1}$ reverts back to normal after the event at time $t$ . For clarity, we include a table of notations in Table 2.

Table 2.

Notation	Description
$e$	edge between a source and a destination node
$r$	view (or relation) index
$t$	time index
$w$	edge weight
$\mathbb {G} = \lbrace \mathcal {G}_t\rbrace _{t=1}^T$	a multi-view dynamic graph
$\mathcal {G}_t$	multi-view dynamic graph at time $t$
$\mathbf {G}_{t,r}$	graph snapshot of view $r$ at time step $t$
$\mathbf {V}$	set of vertices in a multi-view dynamic graph
$\mathbf {E}_{t,r}$	set of edges of view $r$ at time step $t$
$\mathbf {A}_{t,r}$	adjacency matrix representing the edges of view $r$ at time step $t$ , a matrix representation of $\mathbf {E}_{t,r}$
$\mathbf {D}$	diagonal degree matrix
$\mathbf {L}$	unnormalized Laplacian matrix
$\mathbf {L}_{sym}$	symmetric normalized Laplacian matrix
$n$	number of nodes in a multi-view dynamic graph, $n=\|\mathbf {V}\|$
$m$	number of views in a multi-view dynamic graph
$\mathbf {\sigma }_t$	Laplacian spectrum of a graph snapshot (behavior vector)
$\mathbf {C}$	context matrix constructed from a sequence of consecutive behavior vectors
$\tilde{\sigma }_t$	normal behavior vector obtained from a context matrix
$Z$	raw anomaly score
$Z_t^*$	processed (final) anomaly score at time step $t$
$\Sigma$	matrix of left singular vectors
$\mathbf {\lambda }_{r,t}$	a singular value vector of the normalized Laplacian matrix of view $r$ at time $t$

Table 2. Notations used in this Work

4 Laplacian Anomaly Detection

We first propose a new spectral anomaly detection method for single-view dynamic graphs: LAD. The core idea of LAD is to detect high-level graph changes from low dimensional embeddings (called signature vectors). The “typical” or “normal” behavior of the graph can then be extracted from a stream of signature vectors based on both short-term and long-term dependencies. In this way, we can compare the deviation of the current signature vector from the normal behavior.

4.1 Laplacian Spectrum

We first define the (unnormalized) Laplacian matrix $\mathbf {L}$ as $\mathbf {L} = \mathbf {D} - \mathbf {A}$ where $\mathbf {D}$ is the diagonal degree matrix and $\mathbf {A}$ is the adjacency matrix of $\mathbf {G}$ . Here we drop the subscripts for readability, $\mathbf {L}$ is defined based on a given view at a given timestep. For LAD, we use the vector of singular values obtained through Singular Value Decomposition (SVD) [21] of $\mathbf {L}$ as graph embedding for a snapshot $\mathbf {G}$ . Figure 2 shows the visualization of the Laplacian spectrum and the corresponding anomaly scores detected by LAD for the Senate co-sponsorship network. This choice is motivated by the fact that singular values

Fig. 2.

–

are related to the Laplacian spectrum which is fundamentally related to structural properties of the graph such as community structure and degree distribution,

–

encodes the compression loss of a low rank approximation of the Laplacian matrix,

–

are node permutation invariant,

–

can be efficiently computed for real-world sparse matrices.

First, it is known that the singular values of a positive symmetric matrix coincides with its eigenvalues. The Laplacian matrix is symmetric and positive semi-definite for an undirected weighted graph [55]. The eigenvalues of the Laplacian matrix capture fundamental structural properties of the corresponding graph. These properties have been extensively utilized in many fields such as randomized algorithms, combinatorial optimization problems, and machine learning [61]. Notably, the field of spectral graph theory [9] extensively studies properties of the graph Laplacian matrix. As an illustration, the multiplicity $k$ of the eigenvalue 0 of $\mathbf {L}$ is equal to the number of connected components of the graph [55]. In addition, the Laplacian spectrum is related to many other structural properties of the graph such as the degree sequence, number of connected components, diameter, vertex connectivity, and more [61].

Second, it is well known that the truncated SVD gives the best low rank approximation of a matrix with respect to both the Frobenius norm and the 2-norm (Eckart-Young theorem). More precisely, the $(k+1)$ th singular value $\sigma _{k+1}$ of a matrix corresponds to the reconstruction error of the best rank $k$ approximation measured in the 2-norm. From this insight, we know that the (ordered) singular spectrum $\sigma _1, \sigma _2, \ldots , \sigma _r$ encodes rich information regarding the reconstruction loss that would occur for different levels of low rank approximations. Truncated SVD is often used as a powerful compression tool for images [47], videos [4], and audio [59]. Thus, capturing the singular spectrum can be seen as an alternative compression based anomaly detection technique as categorized by Ranshous et al. [46]. Intuitively, huge fluctuations in the singular values of the Laplacian matrix reflect drastic changes to the global graph structure.

Third, being node permutation invariant is one of the desirable properties for a graph learning method [58]. LAD is permutation invariant as singular values of the Laplacian matrix is invariant to row and column permutations. Lastly, singular values can be efficiently obtained through sparse computations [1] and we discuss the computational complexity of LAD in Section 6.2.

4.2 Characterizing Normal Behavior

Identifying a normal or typical behavior from a sliding window is often an integral part of change point detection. Similar to [2, 30], we compute a “typical” or “normal” behavior vector from the Laplacian spectrum of the previous $l$ time steps, where $l$ is the window size. First, we perform L2 normalization on the Laplacian spectrum seen so far $\mathbf {\sigma }_0, ... , \mathbf {\sigma }_t\in \mathbb {R}^n$ to obtain unit vectors. Next, a context matrix $\mathbf {C}$ is constructed:

$\begin{equation} \mathbf {C} = \left(\begin{array}{cccc} |& | & &| \\ \mathbf {\sigma }_{t-l} & \mathbf {\sigma }_{t-l+1} & ... & \mathbf {\sigma }_{t-1} \\ |& | & & | \end{array} \right) \in \mathbb {R}^{n \times l}. \end{equation}$

(1)

where $n$ is the length of the signature vector (which corresponds to the number of nodes in the graph). We compute the left singular vector of $\mathbf {C}$ with SVD to obtain the normal behavior vector $\tilde{\sigma }_t$ . One can interpret this behavior vector as a weighted average of the sliding window spectrums [2].

We propose to compare the current signature vector with the typical behavior from two independent sliding windows: a short-term window and a long-term window. The short-term window encodes information from the most recent trend and captures abrupt changes in the overall graph structure. Depending on the application, the length of the short-term window can be adjusted to best reflect an appropriate time scale. On the other hand, a long-term window is designed to capture larger scale and more gradual trends in the dynamic graph. For example, for the UCI Message dataset, the short-term context can be monitoring weekly change and the long-term context can be monitoring a biweekly change.

4.3 Anomalous Score Computation

After capturing the normal behavior, a scoring function is then defined to measure the difference between the current signature vector and the expected or normal one. In this work, we use the same anomaly score as introduced in [2, 30], namely the $Z$ score. Let $\tilde{\sigma }_t$ be the normal behavior vector and $\mathbf {\sigma }_t$ be the Laplacian spectrum at current step. As mentioned in Section 4.2, both $\tilde{\sigma }_t$ and $\mathbf {\sigma }_t$ are normalized to unit vectors, then the $Z$ score is computed as

$\begin{equation} Z = 1 - \frac{\mathbf {\sigma }_t^\top \tilde{\sigma }_t}{ \Vert \mathbf {\sigma }_t\Vert _2 \Vert \tilde{\sigma }_t\Vert _2} = 1 - \mathbf {\sigma }_t^\top \tilde{\sigma }_t = 1 - \cos {\theta }, \end{equation}$

(2)

where $\cos {\theta }$ is the cosine similarity between the vectors $\tilde{\sigma }_t$ and $\mathbf {\sigma }_t$ . Essentially the $Z$ scores becomes closer to 1 when the current spectrum is very dissimilar to the normal thus signaling an anomalous point.

Let $w_s$ and $w_l$ be the sizes of the short-term and long-term sliding windows respectively. Analogous to Section 4.2, one can obtain two different normal behavior vectors based on the two sliding windows. One can then compute the short-term and long-term anomaly scores $Z_{s}$ and $Z_l$ based on cosine similarity. To best aggregate these two perspectives, we take $Z_t = max(Z_s, Z_l)$ to decide if the current graph is more anomalous in abrupt or gradual changes.

$Z_t$ scores can be used to detect events (one-time changes in the graph) while change points are often followed by a sequence of decaying anomaly scores. We aim at detecting both events and change points thus we compute the final anomaly score $Z_t^* = min(Z_t - Z_{t-1},0)$ which emphasizes the time points with the largest increase in $Z_t$ when compared to $Z_{t-1}$ . Additional discussion and visualization are added in Appendix A.1.

5 MultiLAD

In this section, we present MultiLAD for multi-view change point detection in dynamic graphs. The overall procedure for MultiLAD is summarized in Algorithm 1.

5.1 Extracting Signature Vectors Per View

Using the unnormalized Laplacian matrix $\mathbf {L}$ can be problematic in the multi-view setting where edge weights from different views may vary significantly in magnitude. Therefore, we use the symmetric normalized Laplacian [55] instead:

$\begin{equation} \mathbf {L}_{sym} = \mathbf {D}^{-\frac{1}{2}} \mathbf {L} \mathbf {D}^{-\frac{1}{2}} = \mathbf {I} - \mathbf {D}^{-\frac{1}{2}} \mathbf {A} \mathbf {D}^{-\frac{1}{2}}. \end{equation}$

(3)

In this way, the Laplacian matrix of each view is normalized by node degree while also being symmetric (the eigenvalues are the same as singular values). Note that the singular values $\lambda _i$ of $\mathbf {L}_{sym}$ are now bounded in $[0,2]$ (in contrast with the singular values $\sigma _i$ of the un-nromalized Laplacian which were unbounded). We consider the singular values vector $\mathbf {\lambda }_{r,t} = [\lambda _{1}, \dots , \lambda _{n}]$ of the normalized Laplacian $\mathbf {L}_{sym}$ as the signature vector for the graph $\mathbf {G}_{t,r}$ of the $r$ th view at time step $t$ . Therefore, at each time step $t$ , we obtain $m$ signature vectors coming from each view.

5.2 Aggregating Per View Signatures

To aggregate the per-view signature vectors for multi-view change point detection, we propose to use the scalar power mean operation. The $p$ th order power mean of a set of non-negative real numbers $x_1, ... , x_m$ is defined by

$\begin{equation} s_p(x_1, ... , x_m) = \left(\frac{1}{m} \sum _{i=1}^{m} x_i^p\right)^{\frac{1}{p}}, \end{equation}$

(4)

where $p \in \mathbb {R}$ is a hyper-parameter. We denote the vector of singular values of the Normalized Laplacian matrix $\mathbf {L}_{sym}$ in view $r$ at time $t$ by $\mathbf {\lambda }_{r,t}$ as before. We now define the scalar power mean spectrum $\Lambda _t$ of a multi-view dynamic graph with $m$ views at time $t$ as

$\begin{equation} \Lambda _t = s_p(\mathbf {\lambda }_{1,t}, \dots , \mathbf {\lambda }_{m,t}) \in \mathbb {R}^n, \end{equation}$

(5)

where $s_p$ is applied component-wise to the singular value vectors, i.e., $(\Lambda _{t})_i = s_p((\mathbf {\lambda }_{1,t})_i, \dots , (\lambda _{m,t})_i)$ for all $0 \le i \le n$ . In this way, $\Lambda _t \in \mathbb {R}^n$ is aggregated from the set of per-view signature vectors.

Aggregating different views with the power mean operation is a natural choice as it simply smooths the spectrums across all views thus reflecting the geometry of the overall graph generative model. In addition, the choice of order $p$ gives the flexibility to emphasize different frequency in the spectrum. In this work, we set $p=-10$ for all experiments as allows the smaller eigenvalues to be emphasized. This is motivated by the fact that the smallest eigenvalues of the Laplacian corresponds to the lower frequency in the graph which is often considered to be more useful in practice. For example, the number of zero eigenvalues corresponds to the number of connected components in the graph [55]. Mercado et al. [38] also found the power mean operation to be a powerful tool in the multi-view spectral clustering task. Empirically, we also observe that $p=-10$ results in the best overall performance for change point detection as shown in an ablation study in Appendix B.

We also add a small diagonal shift to the normalized Laplacian $\mathbf {L}_{sym}$ to ensure that it is positive definite. For all experiments, we set the shift $\epsilon = log(1+|p|)$ for $p\lt 0$ thus no 0 would be encountered in Equation (4).

6 Method Properties

6.1 Evolving Graph Size and Node Permutation

For clarity, we assumed that the number of nodes in multi-view dynamic graphs is constant over time and the same across all views. In many real-world graphs, the number of nodes often change over time. In practice, LAD is able to adapt to the evolving size of dynamic graphs by computing the top $k$ singular values as signature vectors instead of the full spectrum. $k$ can be chosen to be the number of nodes of the smallest graph snapshot seen so far. Similarly, MultiLAD can be applied to multi-view dynamic graphs where the graph size changes by considering the top $k$ singular values from each view before aggregation. In addition, MultiLAD can also adapt to multi-view graphs where the number of nodes in each view differs from each other, again by considering the top $k$ singular values where $k$ is the size of the view with the least number of nodes.

Since the Laplacian spectrum is invariant to row/column permutations, both LAD and MultiLAD are invariant to node permutations. This implies that the nodes of a dynamic graph are not required to conserve the same ordering for each graph snapshot. This can be very relevant for privacy sensitive applications where the alignment of nodes across many time steps could cause information leakage. This property can also be very useful in the multi-view setting where aligning node order across views may be technically challenging. In this way, LAD and MultiLAD can be applied to a broader range of scenarios than methods relying on a consistent node ordering across snapshots.

6.2 Computational Complexity

The computational complexity of LAD is dominated by the singular value decomposition of the Laplacian, which in general has complexity $\mathcal {O}(n^3)$ for the Laplacian matrix $\mathbf {L} \in \mathbb {R}^{n \times n}$ . As many real-world networks are sparse, sparse SVD can be used to reduce the complexity to $\mathcal {O}(k|\mathbf {E}|)$ , where $k$ is the number of singular values and $|\mathbf {E}|$ is the number of edges in the graph. In a multi-view dynamic graph with $T$ time steps, $m$ views and an average of $|\mathbf {E}|$ edges per snapshot, MultiLAD has a complexity of $\mathcal {O}(k \cdot m \cdot T \cdot |\mathbf {E}|)$ when computing $k$ singular values. Note that the scalar power mean aggregation is very fast to compute thus computation cost of MultiLAD lies mainly in computing SVD for each view. In real-world networks with hundreds of nodes, MultiLAD is fast to compute, usually taking in the order of minutes on CPUs. For large graphs with millions of nodes where computing SVD is not feasible, efficient approximation techniques such as Network Density of States [12] can be used, as seen in [28], to approximate the spectrum of the Laplacian.

7 Single-view Experiments

In this section, we conduct extensive experiments on both synthetic and real-world single-view dynamic graphs to evaluate the performance of LAD.

7.1 Measuring Performance

To quantitatively evaluate the quality of detected change points, we choose the Hits at $n$ ( $H@n$ ) metric which reports the number of correctly identified anomalies out of the top $n$ time points with the highest anomaly scores. In synthetic experiments, we use the planted anomalies in the generation process for evaluation. In real-world datasets, we treat the well-known anomalous time steps from the literature as ground truth anomalies.

7.2 Contenders and Baselines

We compare LAD with alternative single-view methods. We use the same sliding window sizes across all methods when applicable. Both the anomalies scores from the short-term and long-term sliding windows are shown in Figures 3, 4, 5, and 6. For EdgeMonitoring and SACPD, we use the implementation kindly provided by the original authors while other baselines are our implementation. Note that for all experiments, the startup period ( $0,... ,l$ ) is set to have anomaly score of 0 because we assume these points are not change points.

Fig. 3.

Fig. 4.

Fig. 5.

Fig. 6.

–

Activity vector. We refer to the method proposed by Idé et al. [30] as “activity vector” based methods. Idé et al. used the principal eigenvector of the adjacency matrix (namely the activity vector) instead of our proposed Laplacian spectrum. According to the original work, only a short-term context window is considered.

–

TENSORSPLAT. Koutra et al. [34] proposed to view the temporal graph as a tensor and then perform the PARAFAC or CP decomposition to obtain low dimensional factors that group similar entities or timestamps together (we use CP rank of 30 for all experiments). The original article proposed to use clustering on the factors for change detection. However, the clustering algorithm is not specified. We use the well-known Local Outlier Factor (LOF) [6] approach along with the TENSORSPLAT framework.

–

EdgeMonitoring. Wang et al. [57] proposed the EdgeMonitoring approach and used joint edge probabilities as the feature vector while modeling network evolution as a first order Markov process.

–

SPOTLIGHT [16]: SPOTLIGHT takes a randomized sketching-based approach to detect the sudden appearance or disappearance of large dense subgraphs. We implemented SPOTLIGHT in python and use the default Robust Random Cut Forest ( RRCF) [22] for anomaly detection with the SPOTLIGHT embeddings. Following [16], we set $p=q=0.2$ and $k=50$ sketch dimensions.

–

SACPD [39]: SACPD is an online and fast change point detection method utilizing the degree distribution of network snapshots. The method is size agnostic and the study focused on interaction social networks. The same window size with LAD is used.

7.3 Synthetic Experiments

To demonstrate the performance of LAD, we design three controlled experiments. For these synthetic experiments, we use data generated from the Stochastic Block Model (SBM) [26]. The number of communities $N_c$ as well as the number of nodes within each community can be specified through a size vector $\mathbf {c}\in \mathbb {R}^{k}$ . In addition, the inter-community and intra-community connectivity for each block can be directly encoded in a symmetric probability matrix $P \in \mathbb {R}^{N_c \times N_c}$ . For all experiments, we set the short-term and the long-term window to be 5 and 10 time steps respectively and use the entire Laplacian spectrum for LAD. Table 4 summarizes the experimental results for LAD and all baselines in the synthetic experiments.

Table 3.

Table 4.

Metric	Hits @ 7			Hits @ 10	Hits @ 2
Dataset	Pure	Hybrid	Resample	UCI	Senate
LAD (ours)	100%	100%	100%	50%	100%
SACPD [39]	100%	100%	57.1%	0%	50%
SPOTLIGHT [16]	14.2%	28.5%	57.1%	0%	100%
EdgeMonitoring [57]	100%	100%	0%	0%	100%
Activity vector [30]	71.4%	0%	0%	50%	50%
TENSORSPLAT [34]	28.5%	14.2%	57.1%	0%	50%

Table 4. LAD Consistently Finds Significant Anomalies Across Different Datasets and Outperforms Alternative Approaches

The hybrid experiment refers to the event and change point detection experiment.

7.3.1 Pure Setting.

Here, we only introduce change points, where the adjustments in community structure persists until the next change point is reached. We generate a temporal network with 151 time points where each snapshot is produced through SBM parametrized by $\mathbf {c}$ and $P$ . There are always 500 nodes per snapshot and the community changes are described in Table 3. Here $N_{c}$ represents the number of equal sized communities in the snapshot, and $p_{in}, p_{ex}$ denotes the internal and external community connectivity probability respectively. We set the continuity rate [57] to be 0 for change points and 1.0 elsewhere for the pure setting.

The top 7 most anomalous points correspond to the 7 ground truth points in Table 3 while the other points have extremely low anomaly scores. In this experiment, LAD, EdgeMonitoring and SACPD all achieves perfect performance as they can reason with gradual changes over time. In contrast, both TENSORSPLAT and Activity vector can only recover some change points. As Activity vector uses the principal eigenvector of the adjacency matrix, it is unable to detect community changes as effectively as the Laplacian spectrum used in LAD.

7.3.2 Hybrid Setting.

To study the effectiveness of LAD in detecting both change points and events, we generate a synthetic experiment with SBM where these two types of anomalies both exist. We generate events by strengthening the inter-community connectivity for a given time point (subsequent points are not affected). These events can correspond to a sudden increase in collaborations between typically separated communities such as political parties. The details regarding the generative process can be seen in Table 3. We set the continuity rate [57] to be 0 for change points and 0.9 elsewhere for the hybrid setting. From Figure 3, we observe that LAD is able to perfectly recover all events and change points.

7.3.3 Resampled Setting.

In this setting, we use a constant continuity rate of 0 for all time steps, i.e., the graph is resampled from the generative process at each step. In real-world graphs, majority of the edges might not persist over consecutive time steps but rather determined by the underlying community structure. For the generation parameters and change points’ details, we use the same setup as the Hybrid Setting. Resampling from the SBM model can be considered as a node level permutation within each community. As discussed in Section 4, the eigenvalues of the Laplacian is node permutation invariant. Therefore, only LAD is able to detect all change points while all other baselines are missing at least three change points.

7.4 Real-world Experiments

Here, we evaluate the performance of LAD on two real-world benchmark datasets, as well as an original dataset. For UCI Message and Senate co-sponsorship experiments, LAD is able to achieve strong performance using only the top 6 eigenvalues. For Canadian bill voting network, we report the LAD performance with the top 338 eigenvalues where 338 is the number of seats in the House of Commons of Canada. More information regarding each dataset can be found in Appendix C.2. The results for UCI Message and the Senate co-sponsorship Network is shown in Table 4.

7.4.1 UCI Message.

In this dataset, we treat each day as an individual time point. There are 59,835 total messages sent across all time stamps and 20,296 unique messages. To ensure privacy protection, all individual identifiers such as usernames, email and IP addresses were removed thus we use the dataset description provided in [42] to find significant events in the dataset. We select the short-term window to be 7 days as suggested by Panzarasa et al. [42]. The long-term window is then selected to be 14 days or two weeks.

Figure 4 shows the $Z^*$ scores predicted from both the short-term and long-term window (with top 6 eigenvalues). Panzarasa et al. mentioned that day 65² is the end of spring term and day 158 is the start of the fall term. Day 65 is correctly predicted by LAD and activity vector while the top 10 predictions from EdgeMonitoring and TENSORSPLAT do not include either days. However, LAD predicts day 157 as anomalous which corresponds to the day prior to the start of the fall term. It is likely that anomalous message behaviors are exhibited before school starts. As the edge weight (number of characters in messages) shows important connections between users in social networks, the $Z$ scores has strongest correlation to the average edge weight per snapshot in this dataset. Both SPOTLIGHT and SACPD are not able to detect any of the change points on this dataset.

7.4.2 Senate co-sponsorship Network.

In this dataset, an edge is formed between two congresspersons if they are cosponsors on a bill. Fowler [17] pointed out that the 104th Congress corresponds to “a Republican Revolution” which “caused a dramatic change in the partisan and seniority compositions.” Wang et al. also stated that the 104th Congress has the lowest clustering coefficient and thus can be seen as a low point in collaboration while the 100th Congress has the highest clustering coefficient which signals significant collaboration.

From Figure 5, it is clear that LAD can easily detect the above change points (using only the top 6 singular values). Variations of LAD that only utilizes the short-term or the long-term window are able to identify both points too. SPOTLIGHT is also able to detect both change points as the anomalies here are sudden changes to the graph structure (events). SACPD is able to detect one of the events. Wang et al. mentioned that EdgeMonitoring [57] and LetoChange [45] are able to detect both change points while DeltaCon [35] only predicts the 104th Congress.

The activity vector method requires full SVD which are computationally expensive and it is only able to detect the 100th Congress. The anomaly scores output by the activity vector method have similar magnitude thus making it difficult to identify change points. However, if we augment the activity vector method to use LAD pipeline and aggregates two sliding windows, it is able to correctly predict both change points. This shows that aggregating the output from two sliding windows can improve empirical performance with a different graph embedding technique.

7.4.3 Canadian bill voting network.

To understand the temporal change in Canadian Parliament environment, we extracted open information³ to form the Canadian Parliament bill voting network. The Canadian Parliament consists of 338 Members of Parliament (MPs), each representing an electoral district, who are elected for four years and can be re-elected. In the included time frame from 2006 to 2019, the increase in the number of electoral seats resulted in parliaments with different amounts of MPs; since 2015 the House of Commons has grown from 308 seats to 338. Naturally, the number of nodes in the network fluctuates from year to year. 2 year and 4 years are used as duration for the short-term and long-term windows respectively. We consider the MPs that voted yes for a bill to have a positive relation with the sponsor. In this way, a directed edge is then formed from a voter MP $u$ to the sponsor MP $v$ in a given graph snapshot. Each edge is then weighted by the number of times that $u$ voted positively for $v$ .

LAD detects 2013 and 2015 as two significant change points. Figure 6 shows the $Z^*$ scores predicted by LAD. 2013 is often considered as the year where cross party cohesion against Conservatives begins to decline. Prior to 2013, the Liberal and New Democratic Parties had more unity in voting patterns. As a new party leader (Justin Trudeau) is elected in the Liberal party in 2013, changes in voting patterns are observed [37]. More details regarding data mining are discussed in Appendix A.6. In 2015, the house of common increased the number of constituencies from 308 to 338. Equally, on October 19th 2015 an election took place and the Liberal party won an additional 148 seats, with a total of 184 seats forming a majority government led by Justin Trudeau. Prior to 2015, the Liberal party was divided and however, during this election things changed and literature shows the unified campaign at a local level from different members of parliament across the country [11].

7.5 Summary of LAD Results

Table 4 summarizes the empirical performances of LAD and its comparison to alternative methods. We observe that LAD has the best performance across all experiments.

Synthetic experiment results suggest that LAD can be used in a hybrid environment where both change points or events can occur. Therefore, LAD is suitable to detect anomalous time points where it is possible for both one-time events or fundamental changes in the network generative process to occur thus requiring no prior knowledge on the type of anomaly that could occur later on. The node permutation invariant property of LAD helps it being the best performing method in the resampled setting where the edges in the network are all resampled. In the UCI message and senate co-sponsorship experiments, LAD achieves strong performance using only the top 6 singular values. It demonstrates that in real-world datasets, a low rank truncated SVD is often sufficient to capture rich graph information. Together with efficient SVD computation techniques, LAD can be scaled to large networks. In practice, the rank of the truncated SVD can be determined by the available computational resources.

8 Multi-view Experiments

8.1 Synthetic Experiments

In this section, we empirically examine the change point detection problem for multi-view dynamic graphs and evaluate our proposed MultiLAD approach. In the synthetic experiments, we focus on answering the following questions: Q1. Performance Improvement: How accurately can MultiLAD detect synthetic anomalies by utilizing information from additional views as compared to state-of-the-art single view baselines and naive multi-view baselines? Q2. Noise Invariant: How robust is MultiLAD to noise at individual views and can it leverage multi-view data to reduce the effect of noise? Q3. Benefit of More Views: How will the performance of MultiLAD change as we increase the number of incorporated views? and how the performance gain compares to the baselines?

8.2 Baselines

Each data point in Figure 7, 8, and Table 7 is the average over 30 trials and the standard deviation is reflected as the shaded area with the same color. All experiments are run on a desktop with AMD Ryzen 5 1600 CPU and 16GB installed RAM. Each MultiLAD trial takes less than 5 minutes. We compare MultiLAD with the following methods:

Table 5.

Table 6.

Table 7.

Metric	Hits @ 7
Experiment	SBM (3v)	SBM (noisy)	SBM (12v)	BA (12v)
MultiLAD	.61 $\pm$ .16	.86 $\pm$ .09	.94 $\pm$ .08	.90 $\pm$ .08
LAD [29]	$.26 \pm .10$	$.14 \pm .11$	$.29 \pm .13$	$.13 \pm .09$
maxLAD	$.26 \pm .12$	$.11 \pm .09$	$.29 \pm .09$	$.13 \pm .06$
meanLAD	$.32 \pm .11$	$.10 \pm .11$	$.55 \pm .14$	$.18 \pm .08$
NL LAD	$.39 \pm .16$	$.70 \pm .14$	$.37 \pm .15$	$.71 \pm .11$
NL maxLAD	$.37 \pm .15$	$.77 \pm .10$	$.34 \pm .16$	$.33 \pm .07$
NL meanLAD	$.52 \pm .16$	$.84 \pm .11$	$.82 \pm .10$	$.70 \pm .07$
TENSORSPLAT [34]	. $6 \pm .9\$	N/A	N/A	N/A

Table 7. MultiLAD Outperforms all Alternative Approaches on Different Datasets

Fig. 7.

Fig. 8.

LAD: We report the best average individual view LAD performance over multiple trials using the spectrum of unnormalized and normalized Laplacian ( $\mathbf {L}$ and $\mathbf {L}_{sym}$ ) for comparison (denoted by LAD and NL LAD, respectively). Note that in practice, this is overestimating the performance of such baseline as determining which view is most important is a difficult task.

Naive Aggregation: We consider two naive aggregation strategies to extend LAD into the multi-view setting. First, the maximum of the LAD anomaly score from each view (per time step) is used as the aggregated anomaly score, named maxLAD. Second, the mean of the LAD anomaly scores is used instead and called meanLAD. Lastly, we apply the same naive aggregation approach to LAD scores obtained from the normalized Laplacian singular values and report them as “NL maxLAD” and “NL meanLAD” respectively.

TENSORSPLAT: A multi-view dynamic graph can be represented as a 4th order tensor $\mathbf {\boldsymbol {\mathcal {T}}} \in \mathbb {R}^{T \times M \times N \times N}$ where $T, M, N$ are the total number of time steps, views and nodes respectively. After a rank $R$ CP decomposition of $\mathbf {\boldsymbol {\mathcal {T}}}$ , one can obtain a signature vector of size $R$ for each time step $t$ . To make this baseline more competitive, we then apply the same sliding window techniques used in LAD to identify the anomalies. Due to its high computational cost, we only report TENSORSPLAT [34] in Section 8.4 and average over 10 trials.

8.3 Performance Evaluation

Hits at $n$ ( $Hits@n$ ) metric reports the number of correctly identified anomalies out of the top $n$ time steps with the highest anomaly score. For synthetic experiment, we use the ground truth change points or events for evaluation. We extensively examine the performance of MultiLAD as compared to baselines on two widely used graph generative model: the Stochastic Block Model (SBM) [26] and the Barabási-Albert (BA) [3] model. The accuracy reported in plots are the $Hits@7$ score for the 7 planted anomalies. The number of views are set to be 3 except for Section 8.6 and 8.7. At each time step, the snapshot is re-sampled from its generative model. For all experiments, the sliding windows sizes are $w_s=5$ and $w_l=10$ respectively and the startup period ( $0,... ,w_l$ ) is set to have an anomaly score of 0. For experiments involving the SBM model, change points correspond to splitting and merging communities and events are sudden increase in cross community connectivity.

8.4 Case 1: SBM with increasing difficulty

To answer if MultiLAD leverages multi-view information to improve change point detection, we examine a sparse SBM model with an increasing level of difficulty. In a SBM model with $n$ nodes, $p_{in} = \frac{c_{in}}{n}$ and $p_{ex} = \frac{c_{out}}{n}$ are the edge probabilities for the within community and cross community edges. The constants $c_{in}$ and $c_{out}$ are used to adjust the strength of community connectivity. In this experiment, we set $c_{in} = 12$ and consider different values for $c_{out}$ ; note that as $c_{in} - c_{out}$ gets smaller, the task increases in difficulty. As events would be easy to detect when the difference between $c_{in}$ and $c_{out}$ is small, we only consider change points for this experiment (see Table 5 (a)).

Figure 7 (a) shows that MultiLAD significantly outperforms all baselines as the community structure becomes harder to detect. MultiLAD is able to aggregate over 3 views and significantly outperform the single view method LAD by at as much as $48.3\%$ at $c_{out}=4$ . MultiLAD also outperforms the best naive multi-view baseline NL meanLAD by $9.5\%$ at $c_{out}=7$ . Note that for $c_{out} \gt 11$ , the community structure is almost non-existent as the graph closely resembles an Erdős-Rényi graph [14] where all edges are sampled with equal probability. As a reference for the strength of community structure, we indicate the community detection limit for spectral clustering methods on two equal sized communities as a dotted vertical line in the figure. The limit is reached when $c_{in} - c_{out} = \sqrt {2(c_{in} + c_{out})}$ [40] and in this case, $c_{in} = 12$ and $c_{out}=6$ . We observe that after $c_{out}=6$ , MultiLAD still detects the change points to a significant extent while outperforming other methods. Although being a multi-view baseline, TENSORSPLAT’s performance is much worse than MultiLAD and even single-view LAD. Designing an effective multi-view method is an important direction.

8.5 Case 2: all views are perturbed by noise

To examine if MultiLAD is robust to noise from individual views, we add noise to the SBM generative process at each view by examining each pair node pair $i,j$ and flip the entry $\mathcal {A}_{i,j}$ of the adjacency matrix to $1-\mathcal {A}_{i,j}$ with probability $p_n$ . In this experiment, we set $p_{in}=0.024$ and $p_{ex}=0.004$ and adjust $p_n$ for the noise level. We also include both events and change points (see Table 5 (b)).

Figure 7(b) demonstrates that MultiLAD is highly robust to noise and outperforms all baselines. Notably, single-view methods like LAD are highly susceptible to noise and their performance degrades quickly as higher ratios of noise are injected, In comparison, MultiLAD maintains strong performance despite each view having a high-level of noise. This shows that using the scalar power mean to aggregate singular values from individual views is a very effective way to filter out noise. When $p_n = 0.30$ , MultiLAD outperforms single-view LAD by $45.7\%$ and NL LAD by $22.9\%$ . MultiLAD upper bounds all naive multi-view baselines and significantly outperforms NL maxLAD and NL meanLAD when the noise ratio is high (i.e., $p_n \gt 0.15$ ).

8.6 Case 3: SBM with additional views

To understand how MultiLAD benefits from having additional views, we follow the same setting as in Section 8.4 and add more views. For this experiment, we fix $p_{in}=0.024$ and $p_{ex}=0.012$ , and compare the performances of all methods when the number of available views increases.

Figure 8 (a) shows that MultiLAD benefits the most from having additional views. Single view methods such as LAD are unable to take advantage of the multi-view data thus remaining at the same performance. Another interesting observation is that max aggregation barely benefits from the additional views as it can be easily dominated by the anomaly score from one particular view. Both NL meanLAD and meanLAD improve with additional views as they compute the average of the anomaly score thus converging to the true expected anomaly score. The strong performance of MultiLAD can be attributed to the fact that MultiLAD directly aggregates the singular values while NL meanLAD and NL maxLAD are dependent on the anomaly score output from individual views. The biggest performance gap appears at 7 views where MultiLAD outperforms NL maxLAD by $46.7\%$ and NL meanLAD by $15.7\%$ .

8.7 Case 4: all views are BA models

Lastly, we further investigate MultiLAD performance in the BA model to show that MultiLAD is effective beyond the SBM model. In this experiment, the change points correspond to the densification of the network (parameter $m$ , increased number of edges attached from a new node to an existing node). The details are described in Table 6 (c).

Figure 8 (b) shows that MultiLAD benefits significantly from more views in the BA setting as well. In particular, with just 2 views, MultiLAD outperforms LAD by 61.9%, NL LAD by 9% and NL meanLAD by 6.7%. MultiLAD is able to efficiently utilize multi-view information as soon as additional views are available and benefits even more when there are more views. We also observe that NL maxLAD has decreasing performance when the number of views increased. This is likely due to naively aggregating the max anomaly score from all views and picking up the noise from additional views rather than useful information. This shows that naive aggregation strategies can have adverse effects. Also, NL meanLAD performs similarly to single view NL LAD thus is not able to effectively leverage the multi-view information.

8.8 Summary of MultiLAD Results

Table 7 summarizes the performance comparison between MultiLAD and other baselines across different experiments. We use k“v” to indicate the number of views in the network. For SBM (3v) and SBM (12v), we report the setting where $p_{in} = 0.024$ and $p_{ex}=0.012$ . For noisy SBM setting, we follow Section 8.5 and set noise to be 0.15. For BA, we follow the set up in Section 8.7.

In all settings, MultiLAD significantly outperforms all baselines. By efficient use of multi-view information, MultiLAD improves upon LAD by as high as 62% on the SBM experiments and 77% on the BA experiment. MultiLAD also sees increased gain as additional views become available, achieving 33% gain when using 12 views instead of 3. This shows that MultiLAD effectively leverages the additional views to achieve stronger performance.

MultiLAD outperforms alternative aggregation strategies including max and mean across all settings and is more robust to noise which shows the benefit of merging information at the signature vector level rather than the anomaly score level. Moreover, max and mean baselines exhibit inconsistent behaviors on different settings. On the BA model, the mean operation does not benefit from having additional views while it is able to outperform single-view baselines on the SBM model. The max operation is detrimental to performance with additional views on the BA model, but not on the SBM model. In contrast, MultiLAD always benefits from additional views.

We can also see that it is better to use the normalized Laplacian (NL LAD) over the unnormalized one (LAD). It is known that on large dense graphs, the distribution of the eigenvalues of the Laplacian is close to the distribution of the degrees of the vertices [25, 54]. After normalization by node degree, the singular values of the normalized Laplacian can better reflect the graph structure, thus be more suited for change point detection.

8.9 Real-world Experiments

Next, we examine MultiLAD in the real-world setting. We used New York City (NYC) Taxi & Limousine Commission (NYC TLC)⁴ trip records to construct and analyze two dynamic multi-view graphs at different time periods. One of them is a novel multi-view dynamic graph dataset set in 2020, showcasing MultiLAD’s ability to detect key dates in the implementation of New York City’s COVID-19 response. Details on data processing are provided in Appendix C.2. We use window sizes of 3 and 7 days in our experiments to model both short-term events and long-term change points.

NYC TLC 2015–2016: From NYC TLC records, we acquired the timestamped trip data from two taxi companies (green and yellow) covering November 2015 to January 2016 similar to [15]. We used these records to construct a dynamic two-view directed weighted graph over 264 nodes (representing taxi zones) in which each time point is a day.

Figure 9 shows MultiLAD’s anomaly score curve and the corresponding Laplacian spectrum (aggregated using the scalar power mean approach). In the spectrum visualization, the time points with high anomaly scores are visually distinct from their surrounding points thus confirming that MultiLAD’s aggregated spectrum correctly retains meaningful information from individual views. Comparing MultiLAD’s results to those reported by [15] revealed that there are many high-scoring anomalies in common (including Christmas day, New Year’s Eve, New Year’s Day, and the blizzard on January 23^rd.) There are some minor differences in the anomalies identified by MultiLAD and SPOTLIGHT which can be attributed to the fact that MultiLAD is analyzing the multi-view version of this dataset while SPOTLIGHT is operating on the single-view version (these differences are discussed further in Appendix C.3).

Fig. 9.

NYC TLC 2020: We also extracted NYC TLC data from January 1st 2020 to April 30th 2020. There are 4 views in total with 2 additional views corresponding to for-hire vehicle (FHV; pre-arranged transportation services like community cars and limousines⁵) and high-volume for-hire vehicle (FHVHV; Uber, Lyft, and Via⁶) transportation services were available for this timecourse.

We hypothesized that anomalies in the 2020 NYC transit dataset would align with key time points in the rollout of NYC’s COVID-19 response. Our results suggest this hypothesis was largely correct (see Figure 9 and Appendix C.4): MultiLAD is able to identify the adoption of the “NYS on Pause Program” (all non-essential workers must stay home, March 22^nd), March 29^th (the day after all non-essential construction sites were halted in NYS), and April 18^th (the Saturday after the first extension of the “NYS on Pause Program”, April 16^th). These results show that MultiLAD can indeed detect significant and meaningful real-world events from multiple views. In this case, considering the 4 views individually and determining the most informative one is unpractical.

9 Conclusion

In this work, we introduced two novel change point detection method : LAD and MultiLAD for single-view and multi-view dynamic graphs respectively. LAD utilizes the singular values of the Laplacian matrix to embed graph snapshots and explicitly captures both the short-term and the long-term temporal relations to detect events and change points. Through evaluations on both synthetic and real-world datasets, we showed that LAD is more effective at identifying significant events than previous state-of-the-art methods. Next, on multi-view dynamic graphs, MultiLAD aggregates the singular values of the normalized Laplacian matrices through the scalar power mean operation and identifies the most informative singular values from each view. On synthetic benchmarks, MultiLAD benefits from additional views, is robust to noise, and outperforms competitive single view and multi-view baselines. On real-world traffic networks, we showed that MultiLAD correctly identifies significant events which drastically disrupts the flow of traffic.

Footnotes

our previous work “Laplacian Change Point Detection for Dynamic Graphs” [29] published at KDD 2020.

we set the index to start at 0.

extracted from https://openparliament.ca/

⁴

https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

⁵

https://www1.nyc.gov/site/tlc/businesses/for-hire-vehicle-bases.page

⁶

https://www1.nyc.gov/site/tlc/businesses/high-volume-for-hire-services.page

⁷

https://github.com/shenyangHuang/LAD

⁸

https://www1.nyc.gov/events/gridlock-alert-day/13981/3

⁹

https://www.investopedia.com/historical-timeline-of-covid-19-in-new-york-city-5071986

A Additional Materials for LAD Experiments

This section contains additional material for LAD and single-view experiments in Section 7. We report the implementations used in our experiments. For TENSORSPLAT, we first compute the PARAFAC decomposition (using Tensorly [33] library in python) to obtain the temporal factors. Then the scikit-learn [44] python implementation of the Local Outlier Factor algorithm. For EdgeMonitoring method, we use the matlab code kindly provided by the original authors and keep the default parameters.

A.1 The effect of Anomaly scores

In Section 4, we defined the anomaly score as the difference in $Z$ score when compared to the previous time step. The final anomaly score is defined as $Z_t^* = min(Z_t - Z_{t-1},0)$ . The points with the largest $Z^*$ are then selected as anomalies. Here we visualize both the raw $Z$ score and $Z^*$ score in Figure 10 for the pure setting in Section 7.3.1. Note that when using the raw $Z$ scores, we observe declines in anomaly score after the change point. In a sliding window, which contains graph snapshots from the previous graph generative process and the current one, the normal behavior vector is computed based on a mix of graphs from two generative processes thus leading to the observed declining anomalous scores after the change point. In comparison, using $Z^*$ scores makes LAD more robust under different choices of sliding windows.

Fig. 10.

A.2 Spectral Properties and Their Connections

The above table summarizes connections between different spectral properties in the graph and their connections in the literature. We use the same notation as [55]. $\mathbf {L},\mathbf {L}_{rw},\mathbf {L}_{sym}, \mathbf {A}$ represent the unnormalized Laplacian matrix, the random walk Laplacian matrix, the symmetric Laplacian matrix, and the adjacency matrix respectively.

A.3 Interpreting Results

It is often difficult to interpret the anomaly score of a given change point detection method as the task inherently demands direct comparison between global graph structures over time. As network characteristics vary drastically across domains [8], it is important to design metrics that help us understand the correlation between anomaly scores and well-known graph properties. In this work,we identify temporal outliers in specific graph properties and compare them to the ones predicted by LAD. We choose the outlier score $y$ as follows:

$\begin{equation} y = \frac{|\alpha _t - \alpha _{avg} |}{\alpha _{std}}, \end{equation}$

(6)

where $\alpha _{avg}$ , $\alpha _{std}$ are the average and standard deviation of $\alpha$ computed from a moving window. We select the moving window size to correspond with the short-term window size $s$ . Then we compute the Spearman rank correlation [62] to understand the statistical dependence between two ranked variables. We observe that LAD is not relying on one particular graph statistics but rather on the most important aspects of the dynamic graph of interest.

By examining Spearman rank correlations in Table 9, we observe that LAD predictions are most strongly correlated with the number of connected components while still having a positive correlations with other network properties such as transitivity. As LAD captures high-level graph structures, it is sensitive to the important properties in the dynamic graph of interest while not dependent on any single property.

Table 8.

Spectral Properties	Connections
$\mathbf {L},\mathbf {L}_{rw},\mathbf {L}_{sym}$ eigenvalues	connected components [55]
$\mathbf {L}$ eigenvectors	ratio cut [24]
$\mathbf {L}_{rw},\mathbf {L}_{sym}$ eigenvectors	normalized cut [49, 55]
$\mathbf {A}$ eigenvalues	KATZ centrality [20]
$\mathbf {A}$ eigenvectors	eigenvector centrality [5]
$\mathbf {A}$ dominant eigenvector	stationary distrib., PageRank [41]

Table 8. Spectral Properties and Their Connections

Table 9.

Graph Property	Spearman Rank Correlation
# of connected components	17.0%
Transitivity	11.8%
# of edges	7.5%
Average degree per node	15.9%

Table 9. LAD Scores Correlate Well with the Number of Connected Components when Injected Points are Changing Number of Blocks in SBM (Table 3)

Spearman rank correlation between LAD and other graph properties is also reported.

We observe that LAD is not dependent on any one particular graph statistics. Empirically, LAD correlates with different graph properties depending on the network of interest. This coincide with our intuition that any type of significant change in the graph structure could disrupt the singular values. These experimental results suggest that LAD can capture changes of different nature in the graph structure as it tracks the compression loss of the Laplacian matrix to low rank approximations.

A.4 UCI Message Network

The UCI Message dataset is a directed and weighted network based on an online community of students at the University of California, Irvine. Each node represents a user and each edge encodes a message interaction from one user to another. The weight of each edge represents the number of characters sent in the message. When an user account is created, a self edge with unit weight is added. A total of 1,899 users was recorded. The network data covers the period from April to October 2004 and spans 196 days.

A.5 Senate co-sponsorship Network

Senate co-sponsorship network [17] examines social connections between legislators from their co-sponsorship relations on bills during the 93rd-108th Congress. Bills are grouped into temporal snapshots biannually (time frame for each graph) and co-sponsors on a bill form a clique. Similar to [57], we start from the 97th Congress as full amendments data is available from there onward.

A.6 Canadian bill voting network

The Canadian bill voting network was mined from the Open Parliament API (http://api.openparliament.ca/), a source for digitized data from the House of Commons in JSON format. We first extracted all MPs in the canadian parliament from 2006 to 2019. The network nodes for each snapshot only includes the MPs who actively participated in the parliament of that year (around 300 MPs depending on which year). We then extracted all bills sponsored by each MP and the corresponding votes for each bill. Lastly, we filtered out the yes votes for each ballot of the bills and which MPs voted yes. The data mining code is also available in the code repository of LAD.⁷

We also provide more information on the political environment from 2006 to 2019. During this time period the government party in power has changed. In 2006, the government was minority Conservative until 2015 when the Liberal party won and formed a majority parliament [36]. Studies have shown that minority governments appear to be less productive in legislative activity as consensus is harder to obtain [10]. Cohesion in the House of Commons within a political party during voting sessions are often observed and dissent has been seen amongst MPs of the same party who are less influential [19]. While elected to the House of Commons, MPs can sponsor more than one bill which can also include bills that may have not passed in prior parliaments. Parliament sessions follows no regular pattern from one parliament to another.

B Ablation Study On Choice of Power

Figure 11 shows the impact of the choice of power $p$ on the performance of MultiLAD. We observe that MultiLAD is largely robust to the choice of $p$ while the negative powers show slight performance improvement over the positive ones. Therefore, in this work, we choose the power $p=-10$ .

Fig. 11.

C Additional Materials for MultiLAD Experiments

This section contains additional material for MultiLAD and multi-view experiments in Section 8.

C.1 Additional Results

Figure 13 and 12 shows the anomaly scores output by NL maxLAD for the NYC TLC 2020 and 2015-2016 case studies respectively. Figure 14, 15 shows the anomaly scores output by NL meanLAD on the same datasets.

Fig. 12.

Fig. 13.

Fig. 14.

Fig. 15.

C.2 Data Processing

The cleaned and processed datasets are included with the submission. Gridlock days for 2016-2016 were obtained from The Official Website of the City of New York.⁸ The dates related to NYC’s COVID-19 response were obtained from a publicly available timeline.⁹

C.3 NYC TLC 2015-2016

We used the longitude and latitude coordinates of pickup and drop-off locations to map each trip to the nearest pair of taxi zones. Doing so on a day-by-day basis generated a dynamic two-view directed weighted graphs over 264 nodes (taxi zones) in the greater NYC region, with each time point corresponding to a different day. Edge weights in these graphs correspond to the number of trips between two taxi zones without consideration for the number of passengers. We also point out that Eswaran et al. [15] mapped longitude and latitude coordinates to a set of “57 geographically or conceptually distinguishable zones based on common knowledge” [15], which we were unable to replicate. They also chose to use hourly time points, though this does not drastically impact the comparison between MultiLAD and SPOTLIGHT’s results.

Eswaran et al. [15] also examined NYC TLC ridership data for November 2015 - January 2016. As mentioned in our Experiments section, there was a sizeable overlap in the dates identified by SPOTLIGHT and MultiLAD. In terms of differences, MultiLAD ranked two NYC gridlock days (dates that the Department of Transportation expected to have unusually busy traffic (11^th and 17^th respectively, see Table 10) higher in its list of anomalous days. On the other hand, SPOTLIGHT ranked American Thanksgiving higher.

Table 10.

This might be partly explained by the fact that we use multi-view data while SPOTLIGHT focuses on single-view graphs. The differences in data pre-processing (see the “NYC TLC 2015-2016” subsection above) could also be partly responsible. We were also unable to ascertain whether they used both the green and yellow taxi ridership data or focused on one view only. We believe the combination of these factors and the large methodological differences between SPOTLIGHT and MultiLAD go a long way in explaining the differences between the two sets of results.

C.4 NYC TLC 2020

The 2020 NYC TLC trip records already possessed taxi zone data for pickup and drop-off locations. We also note that two taxi zones had been added between 2015–2020, and that the daily directed weighted graphs spanned 266 nodes instead of 264. Key dates relating to the response of COVID-19 in New York State is detailed in Table 11.

Table 11.

References

[1]

Michal Aharon, Michael Elad, and Alfred Bruckstein. 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing 54, 11 (2006), 4311–4322.

Notation	Description
\(e\)	edge between a source and a destination node
\(r\)	view (or relation) index
\(t\)	time index
\(w\)	edge weight
\(\mathbb {G} = \lbrace \mathcal {G}_t\rbrace _{t=1}^T\)	a multi-view dynamic graph
\(\mathcal {G}_t\)	multi-view dynamic graph at time \(t\)
\(\mathbf {G}_{t,r}\)	graph snapshot of view \(r\) at time step \(t\)
\(\mathbf {V}\)	set of vertices in a multi-view dynamic graph
\(\mathbf {E}_{t,r}\)	set of edges of view \(r\) at time step \(t\)
\(\mathbf {A}_{t,r}\)	adjacency matrix representing the edges of view \(r\) at time step \(t\) , a matrix representation of \(\mathbf {E}_{t,r}\)
\(\mathbf {D}\)	diagonal degree matrix
\(\mathbf {L}\)	unnormalized Laplacian matrix
\(\mathbf {L}_{sym}\)	symmetric normalized Laplacian matrix
\(n\)	number of nodes in a multi-view dynamic graph, \(n=\|\mathbf {V}\|\)
\(m\)	number of views in a multi-view dynamic graph
\(\mathbf {\sigma }_t\)	Laplacian spectrum of a graph snapshot (behavior vector)
\(\mathbf {C}\)	context matrix constructed from a sequence of consecutive behavior vectors
\(\tilde{\sigma }_t\)	normal behavior vector obtained from a context matrix
\(Z\)	raw anomaly score
\(Z_t^*\)	processed (final) anomaly score at time step \(t\)
\(\Sigma\)	matrix of left singular vectors
\(\mathbf {\lambda }_{r,t}\)	a singular value vector of the normalized Laplacian matrix of view \(r\) at time \(t\)

Spectral Properties	Connections
\(\mathbf {L},\mathbf {L}_{rw},\mathbf {L}_{sym}\) eigenvalues	connected components [55]
\(\mathbf {L}\) eigenvectors	ratio cut [24]
\(\mathbf {L}_{rw},\mathbf {L}_{sym}\) eigenvectors	normalized cut [49, 55]
\(\mathbf {A}\) eigenvalues	KATZ centrality [20]
\(\mathbf {A}\) eigenvectors	eigenvector centrality [5]
\(\mathbf {A}\) dominant eigenvector	stationary distrib., PageRank [41]