-
The Complexity of Data-Free Nfer
Authors:
Sean Kauffman,
Kim Guldstrand Larsen,
Martin Zimmermann
Abstract:
Nfer is a Runtime Verification language for the analysis of event traces that applies rules to create hierarchies of time intervals. This work examines the complexity of the evaluation and satisfiability problems for the data-free fragment of nfer. The evaluation problem asks whether a given interval is generated by applying rules to a known input, while the satisfiability problem asks if an input…
▽ More
Nfer is a Runtime Verification language for the analysis of event traces that applies rules to create hierarchies of time intervals. This work examines the complexity of the evaluation and satisfiability problems for the data-free fragment of nfer. The evaluation problem asks whether a given interval is generated by applying rules to a known input, while the satisfiability problem asks if an input exists that will generate a given interval. By excluding data from the language, we obtain polynomial-time algorithms for the evaluation problem and for satisfiability when only considering inclusive rules. Furthermore, we show decidability for the satisfiability problem for cycle-free specifications and undecidability for satisfiability of full data-free nfer.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Monitoring Real-Time Systems under Parametric Delay
Authors:
Martin Fränzle,
Thomas M. Grosen,
Kim G. Larsen,
Martin Zimmermann
Abstract:
Online monitoring of embedded real-time systems can be achieved by reduction of an adequate property language, like Metric Interval Temporal Logic, to timed automata and symbolic execution of the resulting automata on the trace observed from the system. This direct construction however only is faithful if observation of the trace is immediate in the sense that the monitor can assign exact time sta…
▽ More
Online monitoring of embedded real-time systems can be achieved by reduction of an adequate property language, like Metric Interval Temporal Logic, to timed automata and symbolic execution of the resulting automata on the trace observed from the system. This direct construction however only is faithful if observation of the trace is immediate in the sense that the monitor can assign exact time stamps to the actions it observes, which is rarely true in practice due to the substantial and fluctuating parametric delays introduced by the circuitry connecting the observed system to its monitoring device. We present a purely zone-based online monitoring procedure and its implementation which handle such parametric delays exactly without recurrence to costly verification procedures for parametric timed automata.
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
Majority-of-Three: The Simplest Optimal Learner?
Authors:
Ishaq Aden-Ali,
Mikael Møller Høgsgaard,
Kasper Green Larsen,
Nikita Zhivotovskiy
Abstract:
Developing an optimal PAC learning algorithm in the realizable setting, where empirical risk minimization (ERM) is suboptimal, was a major open problem in learning theory for decades. The problem was finally resolved by Hanneke a few years ago. Unfortunately, Hanneke's algorithm is quite complex as it returns the majority vote of many ERM classifiers that are trained on carefully selected subsets…
▽ More
Developing an optimal PAC learning algorithm in the realizable setting, where empirical risk minimization (ERM) is suboptimal, was a major open problem in learning theory for decades. The problem was finally resolved by Hanneke a few years ago. Unfortunately, Hanneke's algorithm is quite complex as it returns the majority vote of many ERM classifiers that are trained on carefully selected subsets of the data. It is thus a natural goal to determine the simplest algorithm that is optimal. In this work we study the arguably simplest algorithm that could be optimal: returning the majority vote of three ERM classifiers. We show that this algorithm achieves the optimal in-expectation bound on its error which is provably unattainable by a single ERM classifier. Furthermore, we prove a near-optimal high-probability bound on this algorithm's error. We conjecture that a better analysis will prove that this algorithm is in fact optimal in the high-probability regime.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Replicable Learning of Large-Margin Halfspaces
Authors:
Alkis Kalavasis,
Amin Karbasi,
Kasper Green Larsen,
Grigoris Velegkas,
Felix Zhou
Abstract:
We provide efficient replicable algorithms for the problem of learning large-margin halfspaces. Our results improve upon the algorithms provided by Impagliazzo, Lei, Pitassi, and Sorrell [STOC, 2022]. We design the first dimension-independent replicable algorithms for this task which runs in polynomial time, is proper, and has strictly improved sample complexity compared to the one achieved by Imp…
▽ More
We provide efficient replicable algorithms for the problem of learning large-margin halfspaces. Our results improve upon the algorithms provided by Impagliazzo, Lei, Pitassi, and Sorrell [STOC, 2022]. We design the first dimension-independent replicable algorithms for this task which runs in polynomial time, is proper, and has strictly improved sample complexity compared to the one achieved by Impagliazzo et al. [2022] with respect to all the relevant parameters. Moreover, our first algorithm has sample complexity that is optimal with respect to the accuracy parameter $ε$. We also design an SGD-based replicable algorithm that, in some parameters' regimes, achieves better sample and time complexity than our first algorithm. Departing from the requirement of polynomial time algorithms, using the DP-to-Replicability reduction of Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sorrell, and Sivakumar [STOC, 2023], we show how to obtain a replicable algorithm for large-margin halfspaces with improved sample complexity with respect to the margin parameter $τ$, but running time doubly exponential in $1/τ^2$ and worse sample complexity dependence on $ε$ than one of our previous algorithms. We then design an improved algorithm with better sample complexity than all three of our previous algorithms and running time exponential in $1/τ^{2}$.
△ Less
Submitted 1 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Boosting, Voting Classifiers and Randomized Sample Compression Schemes
Authors:
Arthur da Cunha,
Kasper Green Larsen,
Martin Ritzert
Abstract:
In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-…
▽ More
In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: the best known bounds on the number of training examples necessary for a voting classifier to obtain a given accuracy has so far always contained at least two logarithmic factors above what is known to be achievable by general weak-to-strong learners. In this work, we break this barrier by proposing a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size. We obtain this result by building a general framework that extends sample compression methods to support randomized learning algorithms based on sub-sampling.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
The NFA Acceptance Hypothesis: Non-Combinatorial and Dynamic Lower Bounds
Authors:
Karl Bringmann,
Allan Grønlund,
Marvin Künnemann,
Kasper Green Larsen
Abstract:
We pose the fine-grained hardness hypothesis that the textbook algorithm for the NFA Acceptance problem is optimal up to subpolynomial factors, even for dense NFAs and fixed alphabets.
We show that this barrier appears in many variations throughout the algorithmic literature by introducing a framework of Colored Walk problems. These yield fine-grained equivalent formulations of the NFA Acceptanc…
▽ More
We pose the fine-grained hardness hypothesis that the textbook algorithm for the NFA Acceptance problem is optimal up to subpolynomial factors, even for dense NFAs and fixed alphabets.
We show that this barrier appears in many variations throughout the algorithmic literature by introducing a framework of Colored Walk problems. These yield fine-grained equivalent formulations of the NFA Acceptance problem as problems concerning detection of an $s$-$t$-walk with a prescribed color sequence in a given edge- or node-colored graph. For NFA Acceptance on sparse NFAs (or equivalently, Colored Walk in sparse graphs), a tight lower bound under the Strong Exponential Time Hypothesis has been rediscovered several times in recent years. We show that our hardness hypothesis, which concerns dense NFAs, has several interesting implications:
- It gives a tight lower bound for Context-Free Language Reachability. This proves conditional optimality for the class of 2NPDA-complete problems, explaining the cubic bottleneck of interprocedural program analysis.
- It gives a tight $(n+nm^{1/3})^{1-o(1)}$ lower bound for the Word Break problem on strings of length $n$ and dictionaries of total size $m$.
- It implies the popular OMv hypothesis. Since the NFA acceptance problem is a static (i.e., non-dynamic) problem, this provides a static reason for the hardness of many dynamic problems.
Thus, a proof of the NFA Acceptance hypothesis would resolve several interesting barriers. Conversely, a refutation of the NFA Acceptance hypothesis may lead the way to attacking the current barriers observed for Context-Free Language Reachability, the Word Break problem and the growing list of dynamic problems proven hard under the OMv hypothesis.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
CGAAL: Distributed On-The-Fly ATL Model Checker with Heuristics
Authors:
Falke B. Ø. Carlsen,
Lars Bo P. Frydenskov,
Nicolaj Ø. Jensen,
Jener Rasmussen,
Mathias M. Sørensen,
Asger G. Weirsøe,
Mathias C. Jensen,
Kim G. Larsen
Abstract:
We present CGAAL, our efficient on-the-fly model checker for alternating-time temporal logic (ATL) on concurrent game structures (CGS). We present how our tool encodes ATL as extended dependency graphs with negation edges and employs the distributed on-the-fly algorithm by Dalsgaard et al. Our tool offers multiple novel search strategies for the algorithm, including DHS which is inspired by PageRa…
▽ More
We present CGAAL, our efficient on-the-fly model checker for alternating-time temporal logic (ATL) on concurrent game structures (CGS). We present how our tool encodes ATL as extended dependency graphs with negation edges and employs the distributed on-the-fly algorithm by Dalsgaard et al. Our tool offers multiple novel search strategies for the algorithm, including DHS which is inspired by PageRank and uses the in-degree of configurations as a heuristic, IHS which estimates instability of assignment values, and LPS which estimates the distance to a state satisfying the constituent property using linear programming. CGS are input using our modelling language LCGS, where composition and synchronisation are easily described. We prove the correctness of our encoding, and our experiments show that our tool CGAAL is often one to three orders of magnitude faster than the popular tool PRISM-games on case studies from PRISM's documentation and among case studies we have developed. In our evaluation, we also compare and evaluate our search strategies, and find that our custom search strategies are often significantly faster than the usual breadth-first and depth-first search strategies.
△ Less
Submitted 2 October, 2023;
originally announced October 2023.
-
Optimal Non-Adaptive Cell Probe Dictionaries and Hashing
Authors:
Kasper Green Larsen,
Rasmus Pagh,
Giuseppe Persiano,
Toniann Pitassi,
Kevin Yeo,
Or Zamir
Abstract:
We present a simple and provably optimal non-adaptive cell probe data structure for the static dictionary problem. Our data structure supports storing a set of n key-value pairs from [u]x[u] using s words of space and answering key lookup queries in t = O(lg(u/n)/ lg(s/n)) nonadaptive probes. This generalizes a solution to the membership problem (i.e., where no values are associated with keys) due…
▽ More
We present a simple and provably optimal non-adaptive cell probe data structure for the static dictionary problem. Our data structure supports storing a set of n key-value pairs from [u]x[u] using s words of space and answering key lookup queries in t = O(lg(u/n)/ lg(s/n)) nonadaptive probes. This generalizes a solution to the membership problem (i.e., where no values are associated with keys) due to Buhrman et al. We also present matching lower bounds for the non-adaptive static membership problem in the deterministic setting. Our lower bound implies that both our dictionary algorithm and the preceding membership algorithm are optimal, and in particular that there is an inherent complexity gap in these problems between no adaptivity and one round of adaptivity (with which hashing-based algorithms solve these problems in constant time). Using the ideas underlying our data structure, we also obtain the first implementation of a n-wise independent family of hash functions with optimal evaluation time in the cell probe model.
△ Less
Submitted 19 April, 2024; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Shielded Reinforcement Learning for Hybrid Systems
Authors:
Asger Horn Brorholt,
Peter Gjøl Jensen,
Kim Guldstrand Larsen,
Florian Lorber,
Christian Schilling
Abstract:
Safe and optimal controller synthesis for switched-controlled hybrid systems, which combine differential equations and discrete changes of the system's state, is known to be intricately hard. Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe, even when it is encouraged by reward engineering. One way of imposing safety t…
▽ More
Safe and optimal controller synthesis for switched-controlled hybrid systems, which combine differential equations and discrete changes of the system's state, is known to be intricately hard. Reinforcement learning has been leveraged to construct near-optimal controllers, but their behavior is not guaranteed to be safe, even when it is encouraged by reward engineering. One way of imposing safety to a learned controller is to use a shield, which is correct by design. However, obtaining a shield for non-linear and hybrid environments is itself intractable. In this paper, we propose the construction of a shield using the so-called barbaric method, where an approximate finite representation of an underlying partition-based two-player safety game is extracted via systematically picked samples of the true transition function. While hard safety guarantees are out of reach, we experimentally demonstrate strong statistical safety guarantees with a prototype implementation and UPPAAL STRATEGO. Furthermore, we study the impact of the synthesized shield when applied as either a pre-shield (applied before learning a controller) or a post-shield (only applied after learning a controller). We experimentally demonstrate superiority of the pre-shielding approach. We apply our technique on a range of case studies, including two industrial examples, and further study post-optimization of the post-shielding approach.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Forward and Backward Constrained Bisimulations for Quantum Circuits using Decision Diagrams
Authors:
Lukas Burgholzer,
Antonio Jiménez-Pastor,
Kim G. Larsen,
Mirco Tribastone,
Max Tschaikowski,
Robert Wille
Abstract:
Efficient methods for the simulation of quantum circuits on classic computers are crucial for their analysis due to the exponential growth of the problem size with the number of qubits. Here we study lumping methods based on bisimulation, an established class of techniques that has been proven successful for (classic) stochastic and deterministic systems such as Markov chains and ordinary differen…
▽ More
Efficient methods for the simulation of quantum circuits on classic computers are crucial for their analysis due to the exponential growth of the problem size with the number of qubits. Here we study lumping methods based on bisimulation, an established class of techniques that has been proven successful for (classic) stochastic and deterministic systems such as Markov chains and ordinary differential equations. Forward constrained bisimulation yields a lower-dimensional model which exactly preserves quantum measurements projected on a linear subspace of interest. Backward constrained bisimulation gives a reduction that is valid on a subspace containing the circuit input, from which the circuit result can be fully recovered. We provide an algorithm to compute the constraint bisimulations yielding coarsest reductions in both cases, using a duality result relating the two notions. As applications, we provide theoretical bounds on the size of the reduced state space for well-known quantum algorithms for search, optimization, and factorization. Using a prototype implementation, we report significant reductions on a set of benchmarks. In particular, we show that constrained bisimulation can boost decision-diagram-based quantum circuit simulation by several orders of magnitude, allowing thus for substantial synergy effects.
△ Less
Submitted 10 May, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Sublinear Time Shortest Path in Expander Graphs
Authors:
Noga Alon,
Allan Grønlund,
Søren Fuglede Jørgensen,
Kasper Green Larsen
Abstract:
Computing a shortest path between two nodes in an undirected unweighted graph is among the most basic algorithmic tasks. Breadth first search solves this problem in linear time, which is clearly also a lower bound in the worst case. However, several works have shown how to solve this problem in sublinear time in expectation when the input graph is drawn from one of several classes of random graphs…
▽ More
Computing a shortest path between two nodes in an undirected unweighted graph is among the most basic algorithmic tasks. Breadth first search solves this problem in linear time, which is clearly also a lower bound in the worst case. However, several works have shown how to solve this problem in sublinear time in expectation when the input graph is drawn from one of several classes of random graphs. In this work, we extend these results by giving sublinear time shortest path (and short path) algorithms for expander graphs. We thus identify a natural deterministic property of a graph (that is satisfied by typical random regular graphs) which suffices for sublinear time shortest paths. The algorithms are very simple, involving only bidirectional breadth first search and short random walks. We also complement our new algorithms by near-matching lower bounds.
△ Less
Submitted 31 July, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Invertible Bloom Lookup Tables with Less Memory and Randomness
Authors:
Nils Fleischhacker,
Kasper Green Larsen,
Maciej Obremski,
Mark Simkin
Abstract:
In this work we study Invertible Bloom Lookup Tables (IBLTs) with small failure probabilities. IBLTs are highly versatile data structures that have found applications in set reconciliation protocols, error-correcting codes, and even the design of advanced cryptographic primitives. For storing $n$ elements and ensuring correctness with probability at least $1 - δ$, existing IBLT constructions requi…
▽ More
In this work we study Invertible Bloom Lookup Tables (IBLTs) with small failure probabilities. IBLTs are highly versatile data structures that have found applications in set reconciliation protocols, error-correcting codes, and even the design of advanced cryptographic primitives. For storing $n$ elements and ensuring correctness with probability at least $1 - δ$, existing IBLT constructions require $Ω(n(\frac{\log(1/δ)}{\log(n)}+1))$ space and they crucially rely on fully random hash functions.
We present new constructions of IBLTs that are simultaneously more space efficient and require less randomness. For storing $n$ elements with a failure probability of at most $δ$, our data structure only requires $\mathcal{O}(n + \log(1/δ)\log\log(1/δ))$ space and $\mathcal{O}(\log(\log(n)/δ))$-wise independent hash functions.
As a key technical ingredient we show that hashing $n$ keys with any $k$-wise independent hash function $h:U \to [Cn]$ for some sufficiently large constant $C$ guarantees with probability $1 - 2^{-Ω(k)}$ that at least $n/2$ keys will have a unique hash value. Proving this is highly non-trivial as $k$ approaches $n$. We believe that the techniques used to prove this statement may be of independent interest.
△ Less
Submitted 14 June, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Super-Logarithmic Lower Bounds for Dynamic Graph Problems
Authors:
Kasper Green Larsen,
Huacheng Yu
Abstract:
In this work, we prove a $\tildeΩ(\lg^{3/2} n )$ unconditional lower bound on the maximum of the query time and update time for dynamic data structures supporting reachability queries in $n$-node directed acyclic graphs under edge insertions. This is the first super-logarithmic lower bound for any natural graph problem. In proving the lower bound, we also make novel contributions to the state-of-t…
▽ More
In this work, we prove a $\tildeΩ(\lg^{3/2} n )$ unconditional lower bound on the maximum of the query time and update time for dynamic data structures supporting reachability queries in $n$-node directed acyclic graphs under edge insertions. This is the first super-logarithmic lower bound for any natural graph problem. In proving the lower bound, we also make novel contributions to the state-of-the-art data structure lower bound techniques that we hope may lead to further progress in proving lower bounds.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
Energy Consumption Optimization in Radio Access Networks (ECO-RAN)
Authors:
Anders Mariegaard,
Kim G. Larsen,
Marco Muniz,
Thomas Dyhre Nielsen
Abstract:
In recent years, mobile network operators are showing interest in reducing energy consumption. Toward this goal, in cooperation with the Danish company 2Operate we have developed a stochastic simulation environment for mobile networks. Our simulator interacts with historical data from 2Operate and allow us to turn on and off network cells, replay traffic loads, etc. We have developed an optimizati…
▽ More
In recent years, mobile network operators are showing interest in reducing energy consumption. Toward this goal, in cooperation with the Danish company 2Operate we have developed a stochastic simulation environment for mobile networks. Our simulator interacts with historical data from 2Operate and allow us to turn on and off network cells, replay traffic loads, etc. We have developed an optimization tool which is based on stochastic and distributed controllers computed by \uppaal. We have conducted experiments in our simulation tool. Experiments show that there is a potential to save up to 10\% of energy. We observe that for larger networks, there exists a larger potential for saving energy. Our simulator and \uppaal controllers, have been constructed in accordance to the 2Operate data and infrastructure. However, a main difference is that current equipment do not support updating schedulers on hourly bases. Nevertheless, new equipment e.g. new Huawei equipment do support changing schedulers on hourly basis. Therefore, integrating our solution in the production server of 2Operate is possible. However, rigorous testing in the production system is required.
△ Less
Submitted 1 April, 2023;
originally announced April 2023.
-
MM Algorithms to Estimate Parameters in Continuous-time Markov Chains
Authors:
Giovanni Bacci,
Anna Ingólfsdóttir,
Kim G. Larsen,
Raphaël Reynouard
Abstract:
Continuous-time Markov chains (CTMCs) are popular modeling formalism that constitutes the underlying semantics for real-time probabilistic systems such as queuing networks, stochastic process algebras, and calculi for systems biology. Prism and Storm are popular model checking tools that provide a number of powerful analysis techniques for CTMCs. These tools accept models expressed as the parallel…
▽ More
Continuous-time Markov chains (CTMCs) are popular modeling formalism that constitutes the underlying semantics for real-time probabilistic systems such as queuing networks, stochastic process algebras, and calculi for systems biology. Prism and Storm are popular model checking tools that provide a number of powerful analysis techniques for CTMCs. These tools accept models expressed as the parallel composition of a number of modules interacting with each other. The outcome of the analysis is strongly dependent on the parameter values used in the model which govern the timing and probability of events of the resulting CTMC. However, for some applications, parameter values have to be empirically estimated from partially-observable executions. In this work, we address the problem of estimating parameter values of CTMCs expressed as Prism models from a number of partially-observable executions. We introduce the class parametric CTMCs -- CTMCs where transition rates are polynomial functions over a set of parameters -- as an abstraction of CTMCs covering a large class of Prism models. Then, building on a theory of algorithms known by the initials MM, for minorization-maximization, we present iterative maximum likelihood estimation algorithms for parametric CTMCs covering two learning scenarios: when both state-labels and dwell times are observable, or just state-labels are. We conclude by illustrating the use of our technique in a simple but non-trivial case study: the analysis of the spread of COVID-19 in presence of lockdown countermeasures.
△ Less
Submitted 16 February, 2023;
originally announced February 2023.
-
Sparse Dimensionality Reduction Revisited
Authors:
Mikael Møller Høgsgaard,
Lion Kamma,
Kasper Green Larsen,
Jelani Nelson,
Chris Schwiegelshohn
Abstract:
The sparse Johnson-Lindenstrauss transform is one of the central techniques in dimensionality reduction. It supports embedding a set of $n$ points in $\mathbb{R}^d$ into $m=O(\varepsilon^{-2} \lg n)$ dimensions while preserving all pairwise distances to within $1 \pm \varepsilon$. Each input point $x$ is embedded to $Ax$, where $A$ is an $m \times d$ matrix having $s$ non-zeros per column, allowin…
▽ More
The sparse Johnson-Lindenstrauss transform is one of the central techniques in dimensionality reduction. It supports embedding a set of $n$ points in $\mathbb{R}^d$ into $m=O(\varepsilon^{-2} \lg n)$ dimensions while preserving all pairwise distances to within $1 \pm \varepsilon$. Each input point $x$ is embedded to $Ax$, where $A$ is an $m \times d$ matrix having $s$ non-zeros per column, allowing for an embedding time of $O(s \|x\|_0)$.
Since the sparsity of $A$ governs the embedding time, much work has gone into improving the sparsity $s$. The current state-of-the-art by Kane and Nelson (JACM'14) shows that $s = O(\varepsilon ^{-1} \lg n)$ suffices. This is almost matched by a lower bound of $s = Ω(\varepsilon ^{-1} \lg n/\lg(1/\varepsilon))$ by Nelson and Nguyen (STOC'13). Previous work thus suggests that we have near-optimal embeddings.
In this work, we revisit sparse embeddings and identify a loophole in the lower bound. Concretely, it requires $d \geq n$, which in many applications is unrealistic. We exploit this loophole to give a sparser embedding when $d = o(n)$, achieving $s = O(\varepsilon^{-1}(\lg n/\lg(1/\varepsilon)+\lg^{2/3}n \lg^{1/3} d))$. We also complement our analysis by strengthening the lower bound of Nelson and Nguyen to hold also when $d \ll n$, thereby matching the first term in our new sparsity upper bound. Finally, we also improve the sparsity of the best oblivious subspace embeddings for optimal embedding dimensionality.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
Timed I/O Automata: It is never too late to complete your timed specification theory
Authors:
Martijn A. Goorden,
Kim G. Larsen,
Axel Legay,
Florian Lorber,
Ulrik Nyman,
Andrzej Wasowski
Abstract:
A specification theory combines notions of specifications and implementations with a satisfaction relation, a refinement relation and a set of operators supporting stepwise design. We develop a complete specification framework for real-time systems using Timed I/O Automata as the specification formalism, with the semantics expressed in terms of Timed I/O Transition Systems. We provide constructs f…
▽ More
A specification theory combines notions of specifications and implementations with a satisfaction relation, a refinement relation and a set of operators supporting stepwise design. We develop a complete specification framework for real-time systems using Timed I/O Automata as the specification formalism, with the semantics expressed in terms of Timed I/O Transition Systems. We provide constructs for refinement, consistency checking, logical and structural composition, and quotient of specifications -- all indispensable ingredients of a compositional design methodology. The theory is backed by rigorous proofs and is being implemented in the open-source tool ECDAR.
△ Less
Submitted 13 July, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
AdaBoost is not an Optimal Weak to Strong Learner
Authors:
Mikael Møller Høgsgaard,
Kasper Green Larsen,
Martin Ritzert
Abstract:
AdaBoost is a classic boosting algorithm for combining multiple inaccurate classifiers produced by a weak learner, to produce a strong learner with arbitrarily high accuracy when given enough training data. Determining the optimal number of samples necessary to obtain a given accuracy of the strong learner, is a basic learning theoretic question. Larsen and Ritzert (NeurIPS'22) recently presented…
▽ More
AdaBoost is a classic boosting algorithm for combining multiple inaccurate classifiers produced by a weak learner, to produce a strong learner with arbitrarily high accuracy when given enough training data. Determining the optimal number of samples necessary to obtain a given accuracy of the strong learner, is a basic learning theoretic question. Larsen and Ritzert (NeurIPS'22) recently presented the first provably optimal weak-to-strong learner. However, their algorithm is somewhat complicated and it remains an intriguing question whether the prototypical boosting algorithm AdaBoost also makes optimal use of training samples. In this work, we answer this question in the negative. Concretely, we show that the sample complexity of AdaBoost, and other classic variations thereof, are sub-optimal by at least one logarithmic factor in the desired accuracy of the strong learner.
△ Less
Submitted 27 January, 2023;
originally announced January 2023.
-
The Impossibility of Parallelizing Boosting
Authors:
Amin Karbasi,
Kasper Green Larsen
Abstract:
The aim of boosting is to convert a sequence of weak learners into a strong learner. At their heart, these methods are fully sequential. In this paper, we investigate the possibility of parallelizing boosting. Our main contribution is a strong negative result, implying that significant parallelization of boosting requires an exponential blow-up in the total computing resources needed for training.
The aim of boosting is to convert a sequence of weak learners into a strong learner. At their heart, these methods are fully sequential. In this paper, we investigate the possibility of parallelizing boosting. Our main contribution is a strong negative result, implying that significant parallelization of boosting requires an exponential blow-up in the total computing resources needed for training.
△ Less
Submitted 21 August, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Diagonalization Games
Authors:
Noga Alon,
Olivier Bousquet,
Kasper Green Larsen,
Shay Moran,
Shlomo Moran
Abstract:
We study several variants of a combinatorial game which is based on Cantor's diagonal argument.
The game is between two players called Kronecker and Cantor. The names of the players are motivated by the known fact that Leopold Kronecker did not appreciate Georg Cantor's arguments about the infinite, and even referred to him as a "scientific charlatan". In the game Kronecker maintains a list of m…
▽ More
We study several variants of a combinatorial game which is based on Cantor's diagonal argument.
The game is between two players called Kronecker and Cantor. The names of the players are motivated by the known fact that Leopold Kronecker did not appreciate Georg Cantor's arguments about the infinite, and even referred to him as a "scientific charlatan". In the game Kronecker maintains a list of m binary vectors, each of length n, and Cantor's goal is to produce a new binary vector which is different from each of Kronecker's vectors, or prove that no such vector exists. Cantor does not see Kronecker's vectors but he is allowed to ask queries of the form"What is bit number j of vector number i?" What is the minimal number of queries with which Cantor can achieve his goal? How much better can Cantor do if he is allowed to pick his queries \emph{adaptively}, based on Kronecker's previous replies? The case when m=n is solved by diagonalization using n (non-adaptive) queries. We study this game more generally, and prove an optimal bound in the adaptive case and nearly tight upper and lower bounds in the non-adaptive case.
△ Less
Submitted 22 January, 2023; v1 submitted 5 January, 2023;
originally announced January 2023.
-
Bagging is an Optimal PAC Learner
Authors:
Kasper Green Larsen
Abstract:
Determining the optimal sample complexity of PAC learning in the realizable setting was a central open problem in learning theory for decades. Finally, the seminal work by Hanneke (2016) gave an algorithm with a provably optimal sample complexity. His algorithm is based on a careful and structured sub-sampling of the training data and then returning a majority vote among hypotheses trained on each…
▽ More
Determining the optimal sample complexity of PAC learning in the realizable setting was a central open problem in learning theory for decades. Finally, the seminal work by Hanneke (2016) gave an algorithm with a provably optimal sample complexity. His algorithm is based on a careful and structured sub-sampling of the training data and then returning a majority vote among hypotheses trained on each of the sub-samples. While being a very exciting theoretical result, it has not had much impact in practice, in part due to inefficiency, since it constructs a polynomial number of sub-samples of the training data, each of linear size.
In this work, we prove the surprising result that the practical and classic heuristic bagging (a.k.a. bootstrap aggregation), due to Breiman (1996), is in fact also an optimal PAC learner. Bagging pre-dates Hanneke's algorithm by twenty years and is taught in most undergraduate machine learning courses. Moreover, we show that it only requires a logarithmic number of sub-samples to reach optimality.
△ Less
Submitted 16 June, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Improved Coresets for Euclidean $k$-Means
Authors:
Vincent Cohen-Addad,
Kasper Green Larsen,
David Saulpic,
Chris Schwiegelshohn,
Omar Ali Sheikh-Omar
Abstract:
Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. The arguably most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weigh…
▽ More
Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. The arguably most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weighted subset known as a coreset and then run any algorithm on this subset. The guarantee of the coreset is that for any candidate solution, the ratio between coreset cost and the cost of the original instance is less than a $(1\pm \varepsilon)$ factor. The current state of the art coreset size is $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for Euclidean $k$-means and $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for Euclidean $k$-median. The best known lower bound for both problems is $Ω(k \varepsilon^{-2})$. In this paper, we improve the upper bounds $\tilde O(\min(k^{3/2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for $k$-means and $\tilde O(\min(k^{4/3} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for $k$-median. In particular, ours is the first provable bound that breaks through the $k^2$ barrier while retaining an optimal dependency on $\varepsilon$.
△ Less
Submitted 16 November, 2022; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Hierarchical Categories in Colored Searching
Authors:
Peyman Afshani,
Rasmus Killman,
Kasper Green Larsen
Abstract:
In colored range counting (CRC), the input is a set of points where each point is assigned a ``color'' (or a ``category'') and the goal is to store them in a data structure such that the number of distinct categories inside a given query range can be counted efficiently. CRC has strong motivations as it allows data structure to deal with categorical data. However, colors (i.e., the categories) in…
▽ More
In colored range counting (CRC), the input is a set of points where each point is assigned a ``color'' (or a ``category'') and the goal is to store them in a data structure such that the number of distinct categories inside a given query range can be counted efficiently. CRC has strong motivations as it allows data structure to deal with categorical data. However, colors (i.e., the categories) in the CRC problem do not have any internal structure, whereas this is not the case for many datasets in practice where hierarchical categories exists or where a single input belongs to multiple categories. Motivated by these, we consider variants of the problem where such structures can be represented. We define two variants of the problem called hierarchical range counting (HCC) and sub-category colored range counting (SCRC) and consider hierarchical structures that can either be a DAG or a tree. We show that the two problems on some special trees are in fact equivalent to other well-known problems in the literature. Based on these, we also give efficient data structures when the underlying hierarchy can be represented as a tree. We show a conditional lower bound for the general case when the existing hierarchy can be any DAG, through reductions from the orthogonal vectors problem.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
Barriers for Faster Dimensionality Reduction
Authors:
Ora Nova Fandina,
Mikael Møller Høgsgaard,
Kasper Green Larsen
Abstract:
The Johnson-Lindenstrauss transform allows one to embed a dataset of $n$ points in $\mathbb{R}^d$ into $\mathbb{R}^m,$ while preserving the pairwise distance between any pair of points up to a factor $(1 \pm \varepsilon)$, provided that $m = Ω(\varepsilon^{-2} \lg n)$. The transform has found an overwhelming number of algorithmic applications, allowing to speed up algorithms and reducing memory co…
▽ More
The Johnson-Lindenstrauss transform allows one to embed a dataset of $n$ points in $\mathbb{R}^d$ into $\mathbb{R}^m,$ while preserving the pairwise distance between any pair of points up to a factor $(1 \pm \varepsilon)$, provided that $m = Ω(\varepsilon^{-2} \lg n)$. The transform has found an overwhelming number of algorithmic applications, allowing to speed up algorithms and reducing memory consumption at the price of a small loss in accuracy. A central line of research on such transforms, focus on developing fast embedding algorithms, with the classic example being the Fast JL transform by Ailon and Chazelle. All known such algorithms have an embedding time of $Ω(d \lg d)$, but no lower bounds rule out a clean $O(d)$ embedding time. In this work, we establish the first non-trivial lower bounds (of magnitude $Ω(m \lg m)$) for a large class of embedding algorithms, including in particular most known upper bounds.
△ Less
Submitted 7 July, 2022;
originally announced July 2022.
-
Fast Discrepancy Minimization with Hereditary Guarantees
Authors:
Kasper Green Larsen
Abstract:
Efficiently computing low discrepancy colorings of various set systems, has been studied extensively since the breakthrough work by Bansal (FOCS 2010), who gave the first polynomial time algorithms for several important settings, including for general set systems, sparse set systems and for set systems with bounded hereditary discrepancy. The hereditary discrepancy of a set system, is the maximum…
▽ More
Efficiently computing low discrepancy colorings of various set systems, has been studied extensively since the breakthrough work by Bansal (FOCS 2010), who gave the first polynomial time algorithms for several important settings, including for general set systems, sparse set systems and for set systems with bounded hereditary discrepancy. The hereditary discrepancy of a set system, is the maximum discrepancy over all set systems obtainable by deleting a subset of the ground elements. While being polynomial time, Bansal's algorithms were not practical, with e.g. his algorithm for the hereditary setup running in time $Ω(m n^{4.5})$ for set systems with $m$ sets over a ground set of $n$ elements. More efficient algorithms have since then been developed for general and sparse set systems, however, for the hereditary case, Bansal's algorithm remains state-of-the-art. In this work, we give a significantly faster algorithm with hereditary guarantees, running in $O(mn^2\lg(2 + m/n) + n^3)$ time. Our algorithm is based on new structural insights into set systems with bounded hereditary discrepancy. We also implement our algorithm and show experimentally that it computes colorings that are significantly better than random and finishes in a reasonable amount of time, even on set systems with thousands of sets over a ground set of thousands of elements.
△ Less
Submitted 24 November, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Monitoring Timed Properties (Revisited)
Authors:
Thomas Møller Grosen,
Sean Kauffman,
Kim Guldstrand Larsen,
Martin Zimmermann
Abstract:
In this paper we revisit monitoring real-time systems with respect to properties expressed either in Metric Interval Temporal Logic or as Timed Büchi Automata. We offer efficient symbolic online monitoring algorithms in a number of settings, exploiting so-called zones well-known from efficient model checking of Timed Automata. The settings considered include new, much simplified treatment of time…
▽ More
In this paper we revisit monitoring real-time systems with respect to properties expressed either in Metric Interval Temporal Logic or as Timed Büchi Automata. We offer efficient symbolic online monitoring algorithms in a number of settings, exploiting so-called zones well-known from efficient model checking of Timed Automata. The settings considered include new, much simplified treatment of time divergence, monitoring under timing uncertainty, and extension of monitoring to offer minimum time estimates before conclusive verdicts can be made.
△ Less
Submitted 5 September, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Optimal Weak to Strong Learning
Authors:
Kasper Green Larsen,
Martin Ritzert
Abstract:
The classic algorithm AdaBoost allows to convert a weak learner, that is an algorithm that produces a hypothesis which is slightly better than chance, into a strong learner, achieving arbitrarily high accuracy when given enough training data. We present a new algorithm that constructs a strong learner from a weak learner but uses less training data than AdaBoost and all other weak to strong learne…
▽ More
The classic algorithm AdaBoost allows to convert a weak learner, that is an algorithm that produces a hypothesis which is slightly better than chance, into a strong learner, achieving arbitrarily high accuracy when given enough training data. We present a new algorithm that constructs a strong learner from a weak learner but uses less training data than AdaBoost and all other weak to strong learners to achieve the same generalization bounds. A sample complexity lower bound shows that our new algorithm uses the minimum possible amount of training data and is thus optimal. Hence, this work settles the sample complexity of the classic problem of constructing a strong learner from a weak learner.
△ Less
Submitted 25 November, 2022; v1 submitted 3 June, 2022;
originally announced June 2022.
-
The Fast Johnson-Lindenstrauss Transform is Even Faster
Authors:
Ora Nova Fandina,
Mikael Møller Høgsgaard,
Kasper Green Larsen
Abstract:
The seminal Fast Johnson-Lindenstrauss (Fast JL) transform by Ailon and Chazelle (SICOMP'09) embeds a set of $n$ points in $d$-dimensional Euclidean space into optimal $k=O(\varepsilon^{-2} \ln n)$ dimensions, while preserving all pairwise distances to within a factor $(1 \pm \varepsilon)$. The Fast JL transform supports computing the embedding of a data point in $O(d \ln d +k \ln^2 n)$ time, wher…
▽ More
The seminal Fast Johnson-Lindenstrauss (Fast JL) transform by Ailon and Chazelle (SICOMP'09) embeds a set of $n$ points in $d$-dimensional Euclidean space into optimal $k=O(\varepsilon^{-2} \ln n)$ dimensions, while preserving all pairwise distances to within a factor $(1 \pm \varepsilon)$. The Fast JL transform supports computing the embedding of a data point in $O(d \ln d +k \ln^2 n)$ time, where the $d \ln d$ term comes from multiplication with a $d \times d$ Hadamard matrix and the $k \ln^2 n$ term comes from multiplication with a sparse $k \times d$ matrix. Despite the Fast JL transform being more than a decade old, it is one of the fastest dimensionality reduction techniques for many tradeoffs between $\varepsilon, d$ and $n$.
In this work, we give a surprising new analysis of the Fast JL transform, showing that the $k \ln^2 n$ term in the embedding time can be improved to $(k \ln^2 n)/α$ for an $α= Ω(\min\{\varepsilon^{-1}\ln(1/\varepsilon), \ln n\})$. The improvement follows by using an even sparser matrix. We also complement our improved analysis with a lower bound showing that our new analysis is in fact tight.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Stronger 3SUM-Indexing Lower Bounds
Authors:
Eldon Chung,
Kasper Green Larsen
Abstract:
The $3$SUM-Indexing problem was introduced as a data structure version of the $3$SUM problem, with the goal of proving strong conditional lower bounds for static data structures via reductions. Ideally, the conjectured hardness of $3$SUM-Indexing should be replaced by an unconditional lower bound. Unfortunately, we are far from proving this, with the strongest current lower bound being a logarithm…
▽ More
The $3$SUM-Indexing problem was introduced as a data structure version of the $3$SUM problem, with the goal of proving strong conditional lower bounds for static data structures via reductions. Ideally, the conjectured hardness of $3$SUM-Indexing should be replaced by an unconditional lower bound. Unfortunately, we are far from proving this, with the strongest current lower bound being a logarithmic query time lower bound by Golovnev et al. from STOC'20. Moreover, their lower bound holds only for non-adaptive data structures and they explicitly asked for a lower bound for adaptive data structures. Our main contribution is precisely such a lower bound against adaptive data structures. As a secondary result, we also strengthen the non-adaptive lower bound of Golovnev et al. and prove strong lower bounds for $2$-bit-probe non-adaptive $3$SUM-Indexing data structures via a completely new approach that we find interesting in its own right.
△ Less
Submitted 25 March, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Towards Optimal Lower Bounds for k-median and k-means Coresets
Authors:
Vincent Cohen-Addad,
Kasper Green Larsen,
David Saulpic,
Chris Schwiegelshohn
Abstract:
Given a set of points in a metric space, the $(k,z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized. Special cases include the famous k-median problem ($z = 1$) and k-means problem ($z = 2$). The $k$-median and $k$-means problems are at the heart of modern da…
▽ More
Given a set of points in a metric space, the $(k,z)$-clustering problem consists of finding a set of $k$ points called centers, such that the sum of distances raised to the power of $z$ of every data point to its closest center is minimized. Special cases include the famous k-median problem ($z = 1$) and k-means problem ($z = 2$). The $k$-median and $k$-means problems are at the heart of modern data analysis and massive data applications have given raise to the notion of coreset: a small (weighted) subset of the input point set preserving the cost of any solution to the problem up to a multiplicative $(1 \pm \varepsilon)$ factor, hence reducing from large to small scale the input to the problem.
In this paper, we present improved lower bounds for coresets in various metric spaces. In finite metrics consisting of $n$ points and doubling metrics with doubling constant $D$, we show that any coreset for $(k,z)$ clustering must consist of at least $Ω(k \varepsilon^{-2} \log n)$ and $Ω(k \varepsilon^{-2} D)$ points, respectively. Both bounds match previous upper bounds up to polylog factors. In Euclidean spaces, we show that any coreset for $(k,z)$ clustering must consists of at least $Ω(k\varepsilon^{-2})$ points. We complement these lower bounds with a coreset construction consisting of at most $\tilde{O}(k\varepsilon^{-2}\cdot \min(\varepsilon^{-z},k))$ points.
△ Less
Submitted 25 February, 2022;
originally announced February 2022.
-
Optimality of the Johnson-Lindenstrauss Dimensionality Reduction for Practical Measures
Authors:
Yair Bartal,
Ora Nova Fandina,
Kasper Green Larsen
Abstract:
It is well known that the Johnson-Lindenstrauss dimensionality reduction method is optimal for worst case distortion. While in practice many other methods and heuristics are used, not much is known in terms of bounds on their performance. The question of whether the JL method is optimal for practical measures of distortion was recently raised in BFN19 (NeurIPS'19). They provided upper bounds on it…
▽ More
It is well known that the Johnson-Lindenstrauss dimensionality reduction method is optimal for worst case distortion. While in practice many other methods and heuristics are used, not much is known in terms of bounds on their performance. The question of whether the JL method is optimal for practical measures of distortion was recently raised in BFN19 (NeurIPS'19). They provided upper bounds on its quality for a wide range of practical measures and showed that indeed these are best possible in many cases. Yet, some of the most important cases, including the fundamental case of average distortion were left open. In particular, they show that the JL transform has $1+ε$ average distortion for embedding into $k$-dimensional Euclidean space, where $k=O(1/ε^2)$, and for more general $q$-norms of distortion, $k = O(\max\{1/ε^2,q/ε\})$, whereas tight lower bounds were established only for large values of $q$ via reduction to the worst case.
In this paper we prove that these bounds are best possible for any dimensionality reduction method, for any $1 \leq q \leq O(\frac{\log (2ε^2 n)}ε)$ and $ε\geq \frac{1}{\sqrt{n}}$, where $n$ is the size of the subset of Euclidean space.
Our results imply that the JL method is optimal for various distortion measures commonly used in practice such as stress, energy and relative error. We prove that if any of these measures is bounded by $ε$ then $k=Ω(1/ε^2)$ for any $ε\geq \frac{1}{\sqrt{n}}$, matching the upper bounds of BFN19 and extending their tightness results for the full range moment analysis.
Our results may indicate that the JL dimensionality reduction method should be considered more often in practical applications, and the bounds we provide for its quality should be served as a measure for comparison when evaluating the performance of other methods and heuristics.
△ Less
Submitted 15 March, 2022; v1 submitted 14 July, 2021;
originally announced July 2021.
-
Compression Implies Generalization
Authors:
Allan Grønlund,
Mikael Høgsgaard,
Lior Kamma,
Kasper Green Larsen
Abstract:
Explaining the surprising generalization performance of deep neural networks is an active and important line of research in theoretical machine learning. Influential work by Arora et al. (ICML'18) showed that, noise stability properties of deep nets occurring in practice can be used to provably compress model representations. They then argued that the small representations of compressed networks i…
▽ More
Explaining the surprising generalization performance of deep neural networks is an active and important line of research in theoretical machine learning. Influential work by Arora et al. (ICML'18) showed that, noise stability properties of deep nets occurring in practice can be used to provably compress model representations. They then argued that the small representations of compressed networks imply good generalization performance albeit only of the compressed nets. Extending their compression framework to yield generalization bounds for the original uncompressed networks remains elusive.
Our main contribution is the establishment of a compression-based framework for proving generalization bounds. The framework is simple and powerful enough to extend the generalization bounds by Arora et al. to also hold for the original network. To demonstrate the flexibility of the framework, we also show that it allows us to give simple proofs of the strongest known generalization bounds for other popular machine learning models, namely Support Vector Machines and Boosting.
△ Less
Submitted 1 July, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Property-Preserving Hash Functions from Standard Assumptions
Authors:
Nils Fleischhacker,
Kasper Green Larsen,
and Mark Simkin
Abstract:
Property-preserving hash functions allow for compressing long inputs $x_0$ and $x_1$ into short hashes $h(x_0)$ and $h(x_1)$ in a manner that allows for computing a predicate $P(x_0, x_1)$ given only the two hash values without having access to the original data. Such hash functions are said to be adversarially robust if an adversary that gets to pick $x_0$ and $x_1$ after the hash function has be…
▽ More
Property-preserving hash functions allow for compressing long inputs $x_0$ and $x_1$ into short hashes $h(x_0)$ and $h(x_1)$ in a manner that allows for computing a predicate $P(x_0, x_1)$ given only the two hash values without having access to the original data. Such hash functions are said to be adversarially robust if an adversary that gets to pick $x_0$ and $x_1$ after the hash function has been sampled, cannot find inputs for which the predicate evaluated on the hash values outputs the incorrect result.
In this work we construct robust property-preserving hash functions for the hamming-distance predicate which distinguishes inputs with a hamming distance at least some threshold $t$ from those with distance less than $t$. The security of the construction is based on standard lattice hardness assumptions.
Our construction has several advantages over the best known previous construction by Fleischhacker and Simkin. Our construction relies on a single well-studied hardness assumption from lattice cryptography whereas the previous work relied on a newly introduced family of computational hardness assumptions. In terms of computational effort, our construction only requires a small number of modular additions per input bit, whereas previously several exponentiations per bit as well as the interpolation and evaluation of high-degree polynomials over large fields were required. An additional benefit of our construction is that the description of the hash function can be compressed to $λ$ bits assuming a random oracle. Previous work has descriptions of length $\mathcal{O}(\ell λ)$ bits for input bit-length $\ell$, which has a secret structure and thus cannot be compressed.
We prove a lower bound on the output size of any property-preserving hash function for the hamming distance predicate. The bound shows that the size of our hash value is not far from optimal.
△ Less
Submitted 11 June, 2021;
originally announced June 2021.
-
Efficient Local Computation of Differential Bisimulations via Coupling and Up-to Methods
Authors:
Giorgio Bacci,
Giovanni Bacci,
Kim G. Larsen,
Mirco Tribastone,
Max Tschaikowski,
Andrea Vandin
Abstract:
We introduce polynomial couplings, a generalization of probabilistic couplings, to develop an algorithm for the computation of equivalence relations which can be interpreted as a lifting of probabilistic bisimulation to polynomial differential equations, a ubiquitous model of dynamical systems across science and engineering. The algorithm enjoys polynomial time complexity and complements classical…
▽ More
We introduce polynomial couplings, a generalization of probabilistic couplings, to develop an algorithm for the computation of equivalence relations which can be interpreted as a lifting of probabilistic bisimulation to polynomial differential equations, a ubiquitous model of dynamical systems across science and engineering. The algorithm enjoys polynomial time complexity and complements classical partition-refinement approaches because: (a) it implements a local exploration of the system, possibly yielding equivalences that do not necessarily involve the inspection of the whole system of differential equations; (b) it can be enhanced by up-to techniques; and (c) it allows the specification of pairs which ought not to be included in the output. Using a prototype, these advantages are demonstrated on case studies from systems biology for applications to model reduction and comparison. Notably, we report four orders of magnitude smaller runtimes than partition-refinement approaches when disproving equivalences between Markov chains.
△ Less
Submitted 27 April, 2021;
originally announced April 2021.
-
CountSketches, Feature Hashing and the Median of Three
Authors:
Kasper Green Larsen,
Rasmus Pagh,
Jakub Tětek
Abstract:
In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1) s$, where $t, s > 0$ are integer parameters. It is known that even for $t=1$, a CountSketch allows estimating coordinates of $v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes the median o…
▽ More
In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1) s$, where $t, s > 0$ are integer parameters. It is known that even for $t=1$, a CountSketch allows estimating coordinates of $v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes the median of $2t-1$ independent estimates, and the probability that the estimate is off by more than $2 \|v\|_2/\sqrt{s}$ is exponentially small in $t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant $t$. Previous work only predicts a constant factor improvement in this setting.
Our main contribution is a new analysis of Count-Sketch, showing an improvement in variance to $O(\min\{\|v\|_1^2/s^2,\|v\|_2^2/s\})$ when $t > 1$. That is, the variance decreases proportionally to $s^{-2}$, asymptotically for large enough $s$. We also study the variance in the setting where an inner product is to be estimated from two CountSketches. This finding suggests that the Feature Hashing method, which is essentially identical to CountSketch but does not make use of the median estimator, can be made more reliable at a small cost in settings where using a median estimator is possible.
We confirm our theoretical findings in experiments and thereby help justify why a small constant number of estimates often suffice in practice. Our improved variance bounds are based on new general theorems about the variance and higher moments of the median of i.i.d. random variables that may be of independent interest.
△ Less
Submitted 3 February, 2021;
originally announced February 2021.
-
Margins are Insufficient for Explaining Gradient Boosting
Authors:
Allan Grønlund,
Lior Kamma,
Kasper Green Larsen
Abstract:
Boosting is one of the most successful ideas in machine learning, achieving great practical performance with little fine-tuning. The success of boosted classifiers is most often attributed to improvements in margins. The focus on margin explanations was pioneered in the seminal work by Schapire et al. (1998) and has culminated in the $k$'th margin generalization bound by Gao and Zhou (2013), which…
▽ More
Boosting is one of the most successful ideas in machine learning, achieving great practical performance with little fine-tuning. The success of boosted classifiers is most often attributed to improvements in margins. The focus on margin explanations was pioneered in the seminal work by Schapire et al. (1998) and has culminated in the $k$'th margin generalization bound by Gao and Zhou (2013), which was recently proved to be near-tight for some data distributions (Gronlund et al. 2019). In this work, we first demonstrate that the $k$'th margin bound is inadequate in explaining the performance of state-of-the-art gradient boosters. We then explain the short comings of the $k$'th margin bound and prove a stronger and more refined margin-based generalization bound for boosted classifiers that indeed succeeds in explaining the performance of modern gradient boosters. Finally, we improve upon the recent generalization lower bound by Grønlund et al. (2019).
△ Less
Submitted 10 November, 2020;
originally announced November 2020.
-
Further Unifying the Landscape of Cell Probe Lower Bounds
Authors:
Kasper Green Larsen,
Jonathan Lindegaard Starup,
Jesper Steensgaard
Abstract:
In a landmark paper, Pǎtraşcu demonstrated how a single lower bound for the static data structure problem of reachability in the butterfly graph, could be used to derive a wealth of new and previous lower bounds via reductions. These lower bounds are tight for numerous static data structure problems. Moreover, he also showed that reachability in the butterfly graph reduces to dynamic marked ancest…
▽ More
In a landmark paper, Pǎtraşcu demonstrated how a single lower bound for the static data structure problem of reachability in the butterfly graph, could be used to derive a wealth of new and previous lower bounds via reductions. These lower bounds are tight for numerous static data structure problems. Moreover, he also showed that reachability in the butterfly graph reduces to dynamic marked ancestor, a classic problem used to prove lower bounds for dynamic data structures. Unfortunately, Pǎtraşcu's reduction to marked ancestor loses a $\lg \lg n$ factor and therefore falls short of fully recovering all the previous dynamic data structure lower bounds that follow from marked ancestor. In this paper, we revisit Pǎtraşcu's work and give a new lossless reduction to dynamic marked ancestor, thereby establishing reachability in the butterfly graph as a single seed problem from which a range of tight static and dynamic data structure lower bounds follow.
△ Less
Submitted 2 November, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.
-
Verification and Parameter Synthesis for Real-Time Programs using Refinement of Trace Abstraction
Authors:
Franck Cassez,
Peter Gjøl Jensen,
Kim Guldstrand Larsen
Abstract:
We address the safety verification and synthesis problems for real-time systems. We introduce real-time programs that are made of instructions that can perform assignments to discrete and real-valued variables. They are general enough to capture interesting classes of timed systems such as timed automata, stopwatch automata, time(d) Petri nets and hybrid automata.
We propose a semi-algorithm usi…
▽ More
We address the safety verification and synthesis problems for real-time systems. We introduce real-time programs that are made of instructions that can perform assignments to discrete and real-valued variables. They are general enough to capture interesting classes of timed systems such as timed automata, stopwatch automata, time(d) Petri nets and hybrid automata.
We propose a semi-algorithm using refinement of trace abstractions to solve both the reachability verification problem and the parameter synthesis problem for real-time programs.
All of the algorithms proposed have been implemented and we have conducted a series of experiments, comparing the performance of our new approach to state-of-the-art tools in classical reachability, robustness analysis and parameter synthesis for timed systems. We show that our new method provides solutions to problems which are unsolvable by the current state-of-the-art tools.
△ Less
Submitted 20 July, 2020;
originally announced July 2020.
-
It's Time to Play Safe: Shield Synthesis for Timed Systems
Authors:
Roderick Bloem,
Peter Gjøl Jensen,
Bettina Könighofer,
Kim Guldstrand Larsen,
Florian Lorber,
Alexander Palmisano
Abstract:
Erroneous behaviour in safety critical real-time systems may inflict serious consequences. In this paper, we show how to synthesize timed shields from timed safety properties given as timed automata. A timed shield enforces the safety of a running system while interfering with the system as little as possible. We present timed post-shields and timed pre-shields. A timed pre-shield is placed before…
▽ More
Erroneous behaviour in safety critical real-time systems may inflict serious consequences. In this paper, we show how to synthesize timed shields from timed safety properties given as timed automata. A timed shield enforces the safety of a running system while interfering with the system as little as possible. We present timed post-shields and timed pre-shields. A timed pre-shield is placed before the system and provides a set of safe outputs. This set restricts the choices of the system. A timed post-shield is implemented after the system. It monitors the system and corrects the system's output only if necessary. We further extend the timed post-shield construction to provide a guarantee on the recovery phase, i.e., the time between a specification violation and the point at which full control can be handed back to the system. In our experimental results, we use timed post-shields to ensure the safety in a reinforcement learning setting for controlling a platoon of cars, during the learning and execution phase, and study the effect.
△ Less
Submitted 30 June, 2020;
originally announced June 2020.
-
Approximating Euclidean by Imprecise Markov Decision Processes
Authors:
Manfred Jaeger,
Giorgio Bacci,
Giovanni Bacci,
Kim Guldstrand Larsen,
Peter Gjøl Jensen
Abstract:
Euclidean Markov decision processes are a powerful tool for modeling control problems under uncertainty over continuous domains. Finite state imprecise, Markov decision processes can be used to approximate the behavior of these infinite models. In this paper we address two questions: first, we investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated…
▽ More
Euclidean Markov decision processes are a powerful tool for modeling control problems under uncertainty over continuous domains. Finite state imprecise, Markov decision processes can be used to approximate the behavior of these infinite models. In this paper we address two questions: first, we investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite state approximations induced by increasingly fine partitions of the continuous state space. We show that for cost functions over finite time horizons the approximations become arbitrarily precise. Second, we use imprecise Markov decision process approximations as a tool to analyse and validate cost functions and strategies obtained by reinforcement learning. We find that, on the one hand, our new theoretical results validate basic design choices of a previously proposed reinforcement learning approach. On the other hand, the imprecise Markov decision process approximations reveal some inaccuracies in the learned cost functions.
△ Less
Submitted 26 June, 2020;
originally announced June 2020.
-
A general framework for defining and optimizing robustness
Authors:
Alessandro Tibo,
Manfred Jaeger,
Kim G. Larsen
Abstract:
Robustness of neural networks has recently attracted a great amount of interest. The many investigations in this area lack a precise common foundation of robustness concepts. Therefore, in this paper, we propose a rigorous and flexible framework for defining different types of robustness properties for classifiers. Our robustness concept is based on postulates that robustness of a classifier shoul…
▽ More
Robustness of neural networks has recently attracted a great amount of interest. The many investigations in this area lack a precise common foundation of robustness concepts. Therefore, in this paper, we propose a rigorous and flexible framework for defining different types of robustness properties for classifiers. Our robustness concept is based on postulates that robustness of a classifier should be considered as a property that is independent of accuracy, and that it should be defined in purely mathematical terms without reliance on algorithmic procedures for its measurement. We develop a very general robustness framework that is applicable to any type of classification model, and that encompasses relevant robustness concepts for investigations ranging from safety against adversarial attacks to transferability of models to new domains. For two prototypical, distinct robustness objectives we then propose new learning approaches based on neural network co-training strategies for obtaining image classifiers optimized for these respective objectives.
△ Less
Submitted 29 May, 2021; v1 submitted 19 June, 2020;
originally announced June 2020.
-
Near-Tight Margin-Based Generalization Bounds for Support Vector Machines
Authors:
Allan Grønlund,
Lior Kamma,
Kasper Green Larsen
Abstract:
Support Vector Machines (SVMs) are among the most fundamental tools for binary classification. In its simplest formulation, an SVM produces a hyperplane separating two classes of data using the largest possible margin to the data. The focus on maximizing the margin has been well motivated through numerous generalization bounds. In this paper, we revisit and improve the classic generalization bound…
▽ More
Support Vector Machines (SVMs) are among the most fundamental tools for binary classification. In its simplest formulation, an SVM produces a hyperplane separating two classes of data using the largest possible margin to the data. The focus on maximizing the margin has been well motivated through numerous generalization bounds. In this paper, we revisit and improve the classic generalization bounds in terms of margins. Furthermore, we complement our new generalization bound by a nearly matching lower bound, thus almost settling the generalization performance of SVMs in terms of margins.
△ Less
Submitted 3 June, 2020;
originally announced June 2020.
-
Stubborn Set Reduction for Two-Player Reachability Games
Authors:
Frederik Meyer Bønneland,
Peter Gjøl Jensen,
Kim Guldstrand Larsen,
Marco Muñiz,
Jiří Srba
Abstract:
Partial order reductions have been successfully applied to model checking of concurrent systems and practical applications of the technique show nontrivial reduction in the size of the explored state space. We present a theory of partial order reduction based on stubborn sets in the game-theoretical setting of 2-player games with reachability objectives. Our stubborn reduction allows us to prune t…
▽ More
Partial order reductions have been successfully applied to model checking of concurrent systems and practical applications of the technique show nontrivial reduction in the size of the explored state space. We present a theory of partial order reduction based on stubborn sets in the game-theoretical setting of 2-player games with reachability objectives. Our stubborn reduction allows us to prune the interleaving behaviour of both players in the game, and we formally prove its correctness on the class of games played on general labelled transition systems. We then instantiate the framework to the class of weighted Petri net games with inhibitor arcs and provide its efficient implementation in the model checker TAPAAL. Finally, we evaluate our stubborn reduction on several case studies and demonstrate its efficiency.
△ Less
Submitted 17 March, 2021; v1 submitted 20 December, 2019;
originally announced December 2019.
-
Margin-Based Generalization Lower Bounds for Boosted Classifiers
Authors:
Allan Grønlund,
Lior Kamma,
Kasper Green Larsen,
Alexander Mathiasen,
Jelani Nelson
Abstract:
Boosting is one of the most successful ideas in machine learning. The most well-accepted explanations for the low generalization error of boosting algorithms such as AdaBoost stem from margin theory. The study of margins in the context of boosting algorithms was initiated by Schapire, Freund, Bartlett and Lee (1998) and has inspired numerous boosting algorithms and generalization bounds. To date,…
▽ More
Boosting is one of the most successful ideas in machine learning. The most well-accepted explanations for the low generalization error of boosting algorithms such as AdaBoost stem from margin theory. The study of margins in the context of boosting algorithms was initiated by Schapire, Freund, Bartlett and Lee (1998) and has inspired numerous boosting algorithms and generalization bounds. To date, the strongest known generalization (upper bound) is the $k$th margin bound of Gao and Zhou (2013). Despite the numerous generalization upper bounds that have been proved over the last two decades, nothing is known about the tightness of these bounds. In this paper, we give the first margin-based lower bounds on the generalization error of boosted classifiers. Our lower bounds nearly match the $k$th margin bound and thus almost settle the generalization performance of boosted classifiers in terms of margins.
△ Less
Submitted 7 May, 2020; v1 submitted 27 September, 2019;
originally announced September 2019.
-
Optimal Learning of Joint Alignments with a Faulty Oracle
Authors:
Kasper Green Larsen,
Michael Mitzenmacher,
Charalampos E. Tsourakakis
Abstract:
We consider the following problem, which is useful in applications such as joint image and shape alignment. The goal is to recover $n$ discrete variables $g_i \in \{0, \ldots, k-1\}$ (up to some global offset) given noisy observations of a set of their pairwise differences $\{(g_i - g_j) \bmod k\}$; specifically, with probability $\frac{1}{k}+δ$ for some $δ> 0$ one obtains the correct answer, and…
▽ More
We consider the following problem, which is useful in applications such as joint image and shape alignment. The goal is to recover $n$ discrete variables $g_i \in \{0, \ldots, k-1\}$ (up to some global offset) given noisy observations of a set of their pairwise differences $\{(g_i - g_j) \bmod k\}$; specifically, with probability $\frac{1}{k}+δ$ for some $δ> 0$ one obtains the correct answer, and with the remaining probability one obtains a uniformly random incorrect answer. We consider a learning-based formulation where one can perform a query to observe a pairwise difference, and the goal is to perform as few queries as possible while obtaining the exact joint alignment. We provide an easy-to-implement, time efficient algorithm that performs $O\big(\frac{n \lg n}{k δ^2}\big)$ queries, and recovers the joint alignment with high probability. We also show that our algorithm is optimal by proving a general lower bound that holds for all non-adaptive algorithms. Our work improves significantly recent work by Chen and Candés \cite{chen2016projected}, who view the problem as a constrained principal components analysis problem that can be solved using the power method. Specifically, our approach is simpler both in the algorithm and the analysis, and provides additional insights into the problem structure.
△ Less
Submitted 21 September, 2019;
originally announced September 2019.
-
Computing Probabilistic Bisimilarity Distances for Probabilistic Automata
Authors:
Giorgio Bacci,
Giovanni Bacci,
Kim G. Larsen,
Radu Mardare,
Qiyi Tang,
Franck van Breugel
Abstract:
The probabilistic bisimilarity distance of Deng et al. has been proposed as a robust quantitative generalization of Segala and Lynch's probabilistic bisimilarity for probabilistic automata. In this paper, we present a characterization of the bisimilarity distance as the solution of a simple stochastic game. The characterization gives us an algorithm to compute the distances by applying Condon's si…
▽ More
The probabilistic bisimilarity distance of Deng et al. has been proposed as a robust quantitative generalization of Segala and Lynch's probabilistic bisimilarity for probabilistic automata. In this paper, we present a characterization of the bisimilarity distance as the solution of a simple stochastic game. The characterization gives us an algorithm to compute the distances by applying Condon's simple policy iteration on these games. The correctness of Condon's approach, however, relies on the assumption that the games are stopping. Our games may be non-stopping in general, yet we are able to prove termination for this extended class of games. Already other algorithms have been proposed in the literature to compute these distances, with complexity in $\textbf{UP} \cap \textbf{coUP}$ and \textbf{PPAD}. Despite the theoretical relevance, these algorithms are inefficient in practice. To the best of our knowledge, our algorithm is the first practical solution.
The characterization of the probabilistic bisimilarity distance mentioned above crucially uses a dual presentation of the Hausdorff distance due to Mémoli. As an additional contribution, in this paper we show that Mémoli's result can be used also to prove that the bisimilarity distance bounds the difference in the maximal (or minimal) probability of two states to satisfying arbitrary $ω$-regular properties, expressed, eg., as LTL formulas.
△ Less
Submitted 1 February, 2021; v1 submitted 3 July, 2019;
originally announced July 2019.
-
L*-Based Learning of Markov Decision Processes (Extended Version)
Authors:
Martin Tappler,
Bernhard K. Aichernig,
Giovanni Bacci,
Maria Eichlseder,
Kim G. Larsen
Abstract:
Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient.
An influential active learning technique is Angluin's L* algo…
▽ More
Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient.
An influential active learning technique is Angluin's L* algorithm for regular languages which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study L*-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data. Unlike existing learning algorithms with predefined states, our algorithm learns the complete model structure including the states.
△ Less
Submitted 28 June, 2019;
originally announced June 2019.
-
SOS: Safe, Optimal and Small Strategies for Hybrid Markov Decision Processes
Authors:
Pranav Ashok,
Jan Křetínský,
Kim Guldstrand Larsen,
Adrien Le Coënt,
Jakob Haahr Taankvist,
Maximilian Weininger
Abstract:
For hybrid Markov decision processes, UPPAAL Stratego can compute strategies that are safe for a given safety property and (in the limit) optimal for a given cost function. Unfortunately, these strategies cannot be exported easily since they are computed as a very long list. In this paper, we demonstrate methods to learn compact representations of the strategies in the form of decision trees. Thes…
▽ More
For hybrid Markov decision processes, UPPAAL Stratego can compute strategies that are safe for a given safety property and (in the limit) optimal for a given cost function. Unfortunately, these strategies cannot be exported easily since they are computed as a very long list. In this paper, we demonstrate methods to learn compact representations of the strategies in the form of decision trees. These decision trees are much smaller, more understandable, and can easily be exported as code that can be loaded into embedded systems. Despite the size compression and actual differences to the original strategy, we provide guarantees on both safety and optimality of the decision-tree strategy. On the top, we show how to obtain yet smaller representations, which are still guaranteed safe, but achieve a desired trade-off between size and optimality.
△ Less
Submitted 25 June, 2019;
originally announced June 2019.
-
Lower Bounds for Oblivious Near-Neighbor Search
Authors:
Kasper Green Larsen,
Tal Malkin,
Omri Weinstein,
Kevin Yeo
Abstract:
We prove an $Ω(d \lg n/ (\lg\lg n)^2)$ lower bound on the dynamic cell-probe complexity of statistically $\mathit{oblivious}$ approximate-near-neighbor search ($\mathsf{ANN}$) over the $d$-dimensional Hamming cube. For the natural setting of $d = Θ(\log n)$, our result implies an $\tildeΩ(\lg^2 n)$ lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower boun…
▽ More
We prove an $Ω(d \lg n/ (\lg\lg n)^2)$ lower bound on the dynamic cell-probe complexity of statistically $\mathit{oblivious}$ approximate-near-neighbor search ($\mathsf{ANN}$) over the $d$-dimensional Hamming cube. For the natural setting of $d = Θ(\log n)$, our result implies an $\tildeΩ(\lg^2 n)$ lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for $\mathsf{ANN}$. This is the first super-logarithmic $\mathit{unconditional}$ lower bound for $\mathsf{ANN}$ against general (non black-box) data structures. We also show that any oblivious $\mathit{static}$ data structure for decomposable search problems (like $\mathsf{ANN}$) can be obliviously dynamized with $O(\log n)$ overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980).
△ Less
Submitted 9 April, 2019;
originally announced April 2019.
-
Lower Bounds for Multiplication via Network Coding
Authors:
Peyman Afshani,
Casper Benjamin Freksen,
Lior Kamma,
Kasper Green Larsen
Abstract:
Multiplication is one of the most fundamental computational problems, yet its true complexity remains elusive. The best known upper bound, by Fürer, shows that two $n$-bit numbers can be multiplied via a boolean circuit of size $O(n \lg n \cdot 4^{\lg^*n})$, where $\lg^*n$ is the very slowly growing iterated logarithm. In this work, we prove that if a central conjecture in the area of network codi…
▽ More
Multiplication is one of the most fundamental computational problems, yet its true complexity remains elusive. The best known upper bound, by Fürer, shows that two $n$-bit numbers can be multiplied via a boolean circuit of size $O(n \lg n \cdot 4^{\lg^*n})$, where $\lg^*n$ is the very slowly growing iterated logarithm. In this work, we prove that if a central conjecture in the area of network coding is true, then any constant degree boolean circuit for multiplication must have size $Ω(n \lg n)$, thus almost completely settling the complexity of multiplication circuits. We additionally revisit classic conjectures in circuit complexity, due to Valiant, and show that the network coding conjecture also implies one of Valiant's conjectures.
△ Less
Submitted 28 February, 2019;
originally announced February 2019.