1 Introduction
Real-world problems in various domains (e.g., political science, biology, chemistry, and sociology) can be modeled as evolving networks that capture temporal relations between nodes. With the increasing availability of dynamic network data in areas such as social media, public health and transportation, providing sophisticated methods that can identify anomalies over time is an important research direction. The task of anomaly detection in dynamic graphs aims at identifying different types of time-varying entities that significantly deviate from the “normal” or “expected” behavior of the underlying graph distribution.
In this work, we focus on the change point detection problem which identifies time steps where the graph structure or many graph components deviate significantly from the normal behavior. As change point detection and event detection are closely related, we first explain the distinctions between two of them. Following [
57]’s definition, a
change point is a time point where there is a sudden change in the underlying network generative process and this new process continues beyond the current point. In contrast, an
event is defined as a time point where the network deviates significantly from the expected behavior and resumes back to normal after this time point. Once the anomalous graph instance is found, potential causes can be identified through various static graph analysis techniques. In this work, we first propose a novel change point detection method,
Laplacian Anomaly Detection (
LAD).
1 There are two major challenges for change point detection in dynamic graphs. First,
how to compare graph snapshots across time. It is difficult to directly compare two graphs or their adjacency matrices. A common approach is to summarize the graph into a low dimensional embedding vector called the signature vector. LAD uses the singular values of the Laplacian matrix (Laplacian spectrum) as the signature vector for each snapshot because they are closely related to the graph connectivity and low rank approximation. Second,
how to capture temporal dependencies. In practice, graphs can undergo abrupt changes at any given time step [
30]. These sudden changes can be detected by examining the graph snapshots from a few timestamps in the past or a short time window. However, gradual changes in the graph structure is often shown over a long time horizon and possible accompanied by permanent changes in the underlying graph generative model [
57]. Therefore, effective detection of both sudden and gradual changes requires the perspective from both short-term and long-term perspectives. We propose to use both short and long-term sliding windows to explicitly compare the current graph structure with those in the windows. The snapshots within a window are summarized by stacking the signature vectors from these snapshots into a matrix and then computing the principal left singular vector of this matrix as the normal behavior within this window. Then it is compared with the current signature vector via cosine similiarity.
In practice, a number of networks can describe the same underlying relations. Multi-view (a.k.a. multiplex) networks are a subset of multi-layer networks, which describe relations between the same set of entities from different information sources. Each source can be modeled as an individual view. The interactions between the entities in each view often evolve over time thus forming multi-view dynamic networks. These networks arise naturally in numerous domains such as traffic/transportation networks [
15,
18], mobile phone networks [
32], friendship networks [
13], social media networks [
56], and citation networks [
52]. Recent work has shown promising improvements by leveraging the additional information from multi-view data in tasks such as community detection [
38,
43], anomalous node detection [
48], and event forecasting [
27]. To the best of our knowledge, existing methods studying change point detection only focused on single view dynamic networks [
2,
29,
30,
45,
57]. Therefore, we propose MultiLAD as the first change point detection method for multi-view dynamic networks.
To better show the motivation for MultiLAD, let us consider a real-world application. Given trip records of different taxi companies and for-hire vehicles, can we identify important occasions such as holidays, events, or unusual weather conditions that disrupt the overall traffic network? In general, there are two main motivations for leveraging multi-view information for change point detection in dynamic networks. First, data from additional sources may have different characteristics and values [
43] thus contributing unique and complementary information. In this case, records from taxi companies and for-hire vehicle services (such as Uber and Lyft) should model the same traffic network but from potentially different age groups and demographics. Second, individual sources can have a high degree of noise and contain disruptive information due to the data collection process. Figure
1 shows the anomalous event on March 22nd 2020 in the New York City Transit network detected by our proposed method. This day signals the beginning of the work from home requirement for all non-essential workers. To better illustrate the graph structure, the nodes are colored by their detected community assignment and their sizes correspond to PageRank [
41]. Here, we have four views which are explained in detail in Section
8.9. Given these four views with different graph structures and evolution, our method effectively reasons across views to find the important anomalies in the underlying traffic network. In contrast, it is often difficult or even infeasible to select one view and apply a single-view change point detection method such as LAD (LAD’s anomaly scores on each individual views are reported in Figure
1). As seen in the figure, each view can exhibit a different set of anomalies, leading to significant differences among anomalies scores. This disagreement problem is further exacerbated when more views are present. Therefore, naive aggregation strategies on anomaly scores (such as max or mean) are insufficient, while MultiLAD aggregates view-specific features in a meaningful way, preserving the essential information for anomaly detection. After view aggregation, MultiLAD utilizes the same anomaly detection procedures as LAD thus MultiLAD can be seen as a multi-view generalization of LAD.
Our proposed Multi-view Laplacian Anomaly Detection (MultiLAD) uses the scalar power mean operation to meaningfully aggregate information from individual views and infers the anomaly score based on an aggregated vector. More specifically, MultiLAD first computes the singular values of the normalized Laplacian matrix from each view as the view-specific feature vector. Then, MultiLAD merges information across different views by aggregating these feature vectors through the scalar power mean operation. In this way, the effect of noise from individual views are lessened while key information about the overall generative process of the multi-view network is preserved. Lastly, MultiLAD compares the aggregated vector with past normal behavior and determines if a particular time step is anomalous or not.
Summary of contributions:–
We introduce a novel change point detection method: LAD. LAD utilizes the singular values of the Laplacian matrix to obtain a low dimension embedding of graph snapshots. To the best of our knowledge, this is the first time that the Laplacian spectrum has been used for change point detection.
–
LAD captures both the short-term and long-term temporal relations to detect both events and change points.
–
We extensively evaluate LAD on three synthetic tasks and three real-world datasets. We show that LAD is more effective at identifying significant events than state-of-the-art methods.
–
We extend LAD to the multi-view setting by proposing MultiLAD: the first change point detection method for multi-view dynamic networks. MultiLAD aggregates the singular values of the normalized Laplacian matrices through scalar power mean and identifies the most informative singular values from each view.
–
We extensively evaluate MultiLAD on synthetic experiments and show that MultiLAD (1). gains increased performance from additional views, (2). is highly robust to noise, and (3). significantly outperforms state-of-the-art single view baselines and their multi-view extensions.
–
We apply MultiLAD to two real-world multi-view traffic networks. We demonstrate that MultiLAD correctly detects major real-world traffic disruptions such as the implementation of stay-at-home order for COVID-19 and Christmas day.
Reproducibility: All code is publicly available. The codes for LAD and MultiLAD are available publicly as follows:
3 Problem Definition
In this section, we formally define the multi-view change point and event detection problems which encompasses the classical single-view definition of the same problems as a special case.
Multi-view Dynamic Graph A multi-view dynamic graph
\(\mathbb {G} = \lbrace \mathcal {G}_t \rbrace _{t=1}^{T}\) is given by a collection of graph snapshots
\(\mathcal {G}_t = \lbrace \mathbf {G}_{t,r} \rbrace _{r=1}^m\) , where
\(\mathbf {G}_{t,r}\) is the graph from the view (or relation)
\(r\) at time step
\(t\) . At each time step, each graph
\(\mathbf {G}_{t,r} = (\mathbf {V}, \mathbf {E}_{t,r})\) consists of the set of nodes
\(\mathbf {V}\) and a set of edges
\(\mathbf {E}_{t,r} \subset \mathbf {V} \times \mathbf {V}\) . For clarity of exposition, we assume that the set of nodes is fixed over time and each view has the same number of nodes (though MultiLAD can handle settings where this assumption does not hold, see Section
6.1). Each edge
\(e=(i,j,w) \in \mathbf {E}_{t,r}\) is then defined as the connection between node
\(i\) and node
\(j\) at timestamp
\(t\) in the view
\(l\) with weight
\(w \in \mathbb {R}^{+}\) . Note that in practice both nodes and edges can appear or disappear from one-time step to another. We denote by
\(\mathbf {A}_{t,r} \in \mathbb {R}^{n \times n}\) the adjacency matrix representing edges in
\(\mathbf {E}_{t,r}\) where
\(n = |\mathbf {V}|\) . When the number of views
\(m=1\) , we fall back into the single-view setting similar to [
30,
57].
Multi-view Change Point Detection Based on the above definition, let \(\mathbb {G}\) be a multi-view dynamic graph and let \(\mathbf {H}_t\) be the underlying graph generative model at time \(t\) . \(\mathbf {H}_t\) is not observed but is assumed to control the graph behavior across all views. The goal of multi-view change point detection is to find time points after which \(\mathbf {H}_t\) significantly differs from the previous steps. More precisely, we want to find a set \(S \subseteq \lbrace 1, \ldots , T\rbrace\) such that for each \(t \in S\) , \(\cdots \simeq \mathbf {H}_{t-2} \simeq \mathbf {H}_{t-1} \not\simeq \mathbf {H}_{t} \simeq \mathbf {H}_{t+1} \simeq \cdots\) .
Multi-view Event Detection In this work, we also consider events similar to [
30]. Let
\(\mathbb {G}\) be a multi-view dynamic graph and let
\(\mathbf {H}_t\) be the underlying graph generative model at time
\(t\) . The goal of event detection is to find a set of time points
\(S \subseteq \lbrace 1, \ldots , T\rbrace\) such that for each
\(t \in S\) ,
\(\mathbf {H}_{t-1} \not\simeq \mathbf {H}_{t}\) and
\(\mathbf {H}_{t-1} \simeq \mathbf {H}_{t+1}\) . Events are one-time sudden changes in graph structure where the generative model
\(\mathbf {H}_{t+1}\) reverts back to normal after the event at time
\(t\) . For clarity, we include a table of notations in Table
2.
8 Multi-view Experiments
8.1 Synthetic Experiments
In this section, we empirically examine the change point detection problem for multi-view dynamic graphs and evaluate our proposed MultiLAD approach. In the synthetic experiments, we focus on answering the following questions: Q1. Performance Improvement: How accurately can MultiLAD detect synthetic anomalies by utilizing information from additional views as compared to state-of-the-art single view baselines and naive multi-view baselines? Q2. Noise Invariant: How robust is MultiLAD to noise at individual views and can it leverage multi-view data to reduce the effect of noise? Q3. Benefit of More Views: How will the performance of MultiLAD change as we increase the number of incorporated views? and how the performance gain compares to the baselines?
8.2 Baselines
Each data point in Figure
7,
8, and Table
7 is the average over 30 trials and the standard deviation is reflected as the shaded area with the same color. All experiments are run on a desktop with AMD Ryzen 5 1600 CPU and 16GB installed RAM. Each MultiLAD trial takes less than 5 minutes. We compare MultiLAD with the following methods:
LAD: We report the best average individual view LAD performance over multiple trials using the spectrum of unnormalized and normalized Laplacian ( \(\mathbf {L}\) and \(\mathbf {L}_{sym}\) ) for comparison (denoted by LAD and NL LAD, respectively). Note that in practice, this is overestimating the performance of such baseline as determining which view is most important is a difficult task.
Naive Aggregation: We consider two naive aggregation strategies to extend LAD into the multi-view setting. First, the maximum of the LAD anomaly score from each view (per time step) is used as the aggregated anomaly score, named maxLAD. Second, the mean of the LAD anomaly scores is used instead and called meanLAD. Lastly, we apply the same naive aggregation approach to LAD scores obtained from the normalized Laplacian singular values and report them as “NL maxLAD” and “NL meanLAD” respectively.
TENSORSPLAT: A multi-view dynamic graph can be represented as a 4th order tensor
\(\mathbf {\boldsymbol {\mathcal {T}}} \in \mathbb {R}^{T \times M \times N \times N}\) where
\(T, M, N\) are the total number of time steps, views and nodes respectively. After a rank
\(R\) CP decomposition of
\(\mathbf {\boldsymbol {\mathcal {T}}}\) , one can obtain a signature vector of size
\(R\) for each time step
\(t\) . To make this baseline more competitive, we then apply the same sliding window techniques used in LAD to identify the anomalies. Due to its high computational cost, we only report TENSORSPLAT [
34] in Section
8.4 and average over 10 trials.
8.3 Performance Evaluation
Hits at
\(n\) (
\(Hits@n\) ) metric reports the number of correctly identified anomalies out of the top
\(n\) time steps with the highest anomaly score. For synthetic experiment, we use the ground truth change points or events for evaluation. We extensively examine the performance of MultiLAD as compared to baselines on two widely used graph generative model: the
Stochastic Block Model (
SBM) [
26] and the
Barabási-Albert (
BA) [
3] model. The accuracy reported in plots are the
\(Hits@7\) score for the 7 planted anomalies. The number of views are set to be 3 except for Section
8.6 and
8.7. At each time step, the snapshot is re-sampled from its generative model. For all experiments, the sliding windows sizes are
\(w_s=5\) and
\(w_l=10\) respectively and the startup period (
\(0,... ,w_l\) ) is set to have an anomaly score of 0. For experiments involving the SBM model, change points correspond to splitting and merging communities and events are sudden increase in cross community connectivity.
8.4 Case 1: SBM with increasing difficulty
To answer if MultiLAD leverages multi-view information to improve change point detection, we examine a sparse SBM model with an increasing level of difficulty. In a SBM model with
\(n\) nodes,
\(p_{in} = \frac{c_{in}}{n}\) and
\(p_{ex} = \frac{c_{out}}{n}\) are the edge probabilities for the within community and cross community edges. The constants
\(c_{in}\) and
\(c_{out}\) are used to adjust the strength of community connectivity. In this experiment, we set
\(c_{in} = 12\) and consider different values for
\(c_{out}\) ; note that as
\(c_{in} - c_{out}\) gets smaller, the task increases in difficulty. As events would be easy to detect when the difference between
\(c_{in}\) and
\(c_{out}\) is small, we only consider change points for this experiment (see Table
5 (a)).
Figure
7 (a) shows that MultiLAD significantly outperforms all baselines as the community structure becomes harder to detect. MultiLAD is able to aggregate over 3 views and significantly outperform the single view method LAD by at as much as
\(48.3\%\) at
\(c_{out}=4\) . MultiLAD also outperforms the best naive multi-view baseline NL meanLAD by
\(9.5\%\) at
\(c_{out}=7\) . Note that for
\(c_{out} \gt 11\) , the community structure is almost non-existent as the graph closely resembles an Erdős-Rényi graph [
14] where all edges are sampled with equal probability. As a reference for the strength of community structure, we indicate the community detection limit for spectral clustering methods on two equal sized communities as a dotted vertical line in the figure. The limit is reached when
\(c_{in} - c_{out} = \sqrt {2(c_{in} + c_{out})}\) [
40] and in this case,
\(c_{in} = 12\) and
\(c_{out}=6\) . We observe that after
\(c_{out}=6\) , MultiLAD still detects the change points to a significant extent while outperforming other methods. Although being a multi-view baseline, TENSORSPLAT’s performance is much worse than MultiLAD and even single-view LAD. Designing an effective multi-view method is an important direction.
8.5 Case 2: all views are perturbed by noise
To examine if MultiLAD is robust to noise from individual views, we add noise to the SBM generative process at each view by examining each pair node pair
\(i,j\) and flip the entry
\(\mathcal {A}_{i,j}\) of the adjacency matrix to
\(1-\mathcal {A}_{i,j}\) with probability
\(p_n\) . In this experiment, we set
\(p_{in}=0.024\) and
\(p_{ex}=0.004\) and adjust
\(p_n\) for the noise level. We also include both events and change points (see Table
5 (b)).
Figure
7(b) demonstrates that MultiLAD is highly robust to noise and outperforms all baselines. Notably, single-view methods like LAD are highly susceptible to noise and their performance degrades quickly as higher ratios of noise are injected, In comparison, MultiLAD maintains strong performance despite each view having a high-level of noise. This shows that using the scalar power mean to aggregate singular values from individual views is a very effective way to filter out noise. When
\(p_n = 0.30\) , MultiLAD outperforms single-view LAD by
\(45.7\%\) and NL LAD by
\(22.9\%\) . MultiLAD upper bounds all naive multi-view baselines and significantly outperforms NL maxLAD and NL meanLAD when the noise ratio is high (i.e.,
\(p_n \gt 0.15\) ).
8.6 Case 3: SBM with additional views
To understand how MultiLAD benefits from having additional views, we follow the same setting as in Section
8.4 and add more views. For this experiment, we fix
\(p_{in}=0.024\) and
\(p_{ex}=0.012\) , and compare the performances of all methods when the number of available views increases.
Figure
8 (a) shows that MultiLAD benefits the most from having additional views. Single view methods such as LAD are unable to take advantage of the multi-view data thus remaining at the same performance. Another interesting observation is that max aggregation barely benefits from the additional views as it can be easily dominated by the anomaly score from one particular view. Both NL meanLAD and meanLAD improve with additional views as they compute the average of the anomaly score thus converging to the true expected anomaly score. The strong performance of MultiLAD can be attributed to the fact that MultiLAD directly aggregates the singular values while NL meanLAD and NL maxLAD are dependent on the anomaly score output from individual views. The biggest performance gap appears at 7 views where MultiLAD outperforms NL maxLAD by
\(46.7\%\) and NL meanLAD by
\(15.7\%\) .
8.7 Case 4: all views are BA models
Lastly, we further investigate MultiLAD performance in the BA model to show that MultiLAD is effective beyond the SBM model. In this experiment, the change points correspond to the densification of the network (parameter
\(m\) , increased number of edges attached from a new node to an existing node). The details are described in Table
6 (c).
Figure
8 (b) shows that MultiLAD benefits significantly from more views in the BA setting as well. In particular, with just 2 views, MultiLAD outperforms LAD by 61.9%, NL LAD by 9% and NL meanLAD by 6.7%. MultiLAD is able to efficiently utilize multi-view information as soon as additional views are available and benefits even more when there are more views. We also observe that NL maxLAD has decreasing performance when the number of views increased. This is likely due to naively aggregating the max anomaly score from all views and picking up the noise from additional views rather than useful information. This shows that naive aggregation strategies can have adverse effects. Also, NL meanLAD performs similarly to single view NL LAD thus is not able to effectively leverage the multi-view information.
8.8 Summary of MultiLAD Results
Table
7 summarizes the performance comparison between MultiLAD and other baselines across different experiments. We use k“v” to indicate the number of views in the network. For SBM (3v) and SBM (12v), we report the setting where
\(p_{in} = 0.024\) and
\(p_{ex}=0.012\) . For noisy SBM setting, we follow Section
8.5 and set noise to be 0.15. For BA, we follow the set up in Section
8.7.
In all settings, MultiLAD significantly outperforms all baselines. By efficient use of multi-view information, MultiLAD improves upon LAD by as high as 62% on the SBM experiments and 77% on the BA experiment. MultiLAD also sees increased gain as additional views become available, achieving 33% gain when using 12 views instead of 3. This shows that MultiLAD effectively leverages the additional views to achieve stronger performance.
MultiLAD outperforms alternative aggregation strategies including max and mean across all settings and is more robust to noise which shows the benefit of merging information at the signature vector level rather than the anomaly score level. Moreover, max and mean baselines exhibit inconsistent behaviors on different settings. On the BA model, the mean operation does not benefit from having additional views while it is able to outperform single-view baselines on the SBM model. The max operation is detrimental to performance with additional views on the BA model, but not on the SBM model. In contrast, MultiLAD always benefits from additional views.
We can also see that it is better to use the
normalized Laplacian (
NL LAD) over the unnormalized one (LAD). It is known that on large dense graphs, the distribution of the eigenvalues of the Laplacian is close to the distribution of the degrees of the vertices [
25,
54]. After normalization by node degree, the singular values of the normalized Laplacian can better reflect the graph structure, thus be more suited for change point detection.
8.9 Real-world Experiments
Next, we examine MultiLAD in the real-world setting. We used
New York City (
NYC)
Taxi & Limousine Commission (
NYC TLC)
4 trip records to construct and analyze two dynamic multi-view graphs at different time periods. One of them is a novel multi-view dynamic graph dataset set in 2020, showcasing MultiLAD’s ability to detect key dates in the implementation of New York City’s COVID-19 response. Details on data processing are provided in Appendix
C.2. We use window sizes of 3 and 7 days in our experiments to model both short-term events and long-term change points.
NYC TLC 2015–2016: From NYC TLC records, we acquired the timestamped trip data from two taxi companies (green and yellow) covering November 2015 to January 2016 similar to [
15]. We used these records to construct a dynamic two-view directed weighted graph over 264 nodes (representing taxi zones) in which each time point is a day.
Figure
9 shows MultiLAD’s anomaly score curve and the corresponding Laplacian spectrum (aggregated using the scalar power mean approach). In the spectrum visualization, the time points with high anomaly scores are visually distinct from their surrounding points thus confirming that MultiLAD’s aggregated spectrum correctly retains meaningful information from individual views. Comparing MultiLAD’s results to those reported by [
15] revealed that there are many high-scoring anomalies in common (including Christmas day, New Year’s Eve, New Year’s Day, and the blizzard on January 23
rd.) There are some minor differences in the anomalies identified by MultiLAD and SPOTLIGHT which can be attributed to the fact that MultiLAD is analyzing the multi-view version of this dataset while SPOTLIGHT is operating on the single-view version (these differences are discussed further in Appendix
C.3).
NYC TLC 2020: We also extracted NYC TLC data from January 1st 2020 to April 30th 2020. There are 4 views in total with 2 additional views corresponding to for-hire vehicle (FHV; pre-arranged transportation services like community cars and limousines
5) and high-volume for-hire vehicle (FHVHV; Uber, Lyft, and Via
6) transportation services were available for this timecourse.
We hypothesized that anomalies in the 2020 NYC transit dataset would align with key time points in the rollout of NYC’s COVID-19 response. Our results suggest this hypothesis was largely correct (see Figure
9 and Appendix
C.4): MultiLAD is able to identify the adoption of the “NYS on Pause Program” (all non-essential workers must stay home, March 22
nd), March 29
th (the day after all non-essential construction sites were halted in NYS), and April 18
th (the Saturday after the first extension of the “NYS on Pause Program”, April 16
th). These results show that MultiLAD can indeed detect significant and meaningful real-world events from multiple views. In this case, considering the 4 views individually and determining the most informative one is unpractical.