
Comparing Differentiable Logics for Learning with Logical Constraints

Authors: Thomas Flinkow, Barak A. Pearlmutter, Rosemary Monahan

Abstract: Extensive research on formal verification of machine learning systems indicates that learning from data alone often fails to capture underlying background knowledge, such as specifications implicitly available in the data. Various neural network verifiers have been developed to ensure that a machine-learnt model satisfies correctness and safety properties; however, they typically assume a trained network with fixed weights. A promising approach for creating machine learning models that inherently satisfy constraints after training is to encode background knowledge as explicit logical constraints that guide the learning process via so-called differentiable logics. In this paper, we experimentally compare and evaluate various logics from the literature, presenting our findings and highlighting open problems for future work.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 20 pages, 8 figures. Submitted to Science of Computer Programming
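As a rough illustration of what a differentiable logic does, the sketch below turns a logical constraint into a loss using a few fuzzy-logic connectives that commonly appear in this literature (Gödel, product, and Łukasiewicz conjunction, and Reichenbach implication). The choice of connectives, the example constraint "A implies B", and the names `p_a`/`p_b` are assumptions for illustration; they are not claimed to be the exact set of logics or the translation evaluated in the paper.

```python
# Minimal sketch: fuzzy-logic connectives on truth values in [0, 1].
# The selection of logics below is illustrative, not the paper's exact set.

def goedel_and(a: float, b: float) -> float:
    # Gödel t-norm: minimum of the two truth values.
    return min(a, b)

def product_and(a: float, b: float) -> float:
    # Product t-norm.
    return a * b

def lukasiewicz_and(a: float, b: float) -> float:
    # Łukasiewicz t-norm: bounded difference.
    return max(0.0, a + b - 1.0)

def reichenbach_implies(a: float, b: float) -> float:
    # Reichenbach implication: 1 - a + a*b (smooth in both arguments).
    return 1.0 - a + a * b

def constraint_loss(truth: float) -> float:
    # A fully satisfied constraint (truth 1.0) costs nothing.
    return 1.0 - truth

# Hypothetical constraint "A implies B", where p_a and p_b are model
# outputs interpreted as truth values.
p_a, p_b = 0.9, 0.2
loss = constraint_loss(reichenbach_implies(p_a, p_b))
# d(loss)/d(p_b) = -p_a, so gradient descent pushes p_b upward
# more strongly the more confident the model is in A.
print(round(loss, 6))  # 0.72
```

Gödel's `min` sends gradient to only one argument at a time, and Łukasiewicz conjunction has zero gradient whenever a + b < 1, whereas the product and Reichenbach forms are smooth in both arguments; differences of this kind are one reason the logics can behave differently during training.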

arXiv:2402.06751 [pdf, other]

Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse

Authors: Bradley T. Baker, Barak A. Pearlmutter, Robyn Miller, Vince D. Calhoun, Sergey M. Plis

Abstract: Our understanding of the learning dynamics of deep neural networks (DNNs) remains incomplete. Recent research has begun to uncover the mathematical principles underlying these networks, including the phenomenon of "Neural Collapse", where linear classifiers within DNNs converge to specific geometrical structures during late-stage training. However, the role of geometric constraints in learning extends beyond this terminal phase. For instance, gradients in fully-connected layers naturally develop a low-rank structure due to the accumulation of rank-one outer products over a training batch. Despite the attention given to methods that exploit this structure for memory saving or regularization, the emergence of low-rank learning as an inherent aspect of certain DNN architectures has been under-explored. In this paper, we conduct a comprehensive study of gradient rank in DNNs, examining how architectural choices and the structure of the data affect gradient rank bounds. Our theoretical analysis provides these bounds for training fully-connected, recurrent, and convolutional neural networks. We also demonstrate, both theoretically and empirically, how design choices such as activation function linearity, bottleneck layer introduction, convolutional stride, and sequence truncation influence these bounds. Our findings not only contribute to the understanding of learning dynamics in DNNs, but also provide practical guidance for deep learning engineers to make informed design decisions.

Submitted 9 February, 2024; originally announced February 2024.
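The rank-one argument can be checked directly for a single fully-connected layer y = W x: the weight gradient for a batch is a sum of one outer product per example, so its rank cannot exceed the batch size. The sketch below assumes a plain linear layer with hypothetical random integer inputs and upstream gradients, and computes the rank exactly with rational arithmetic to avoid floating-point tolerance issues.

```python
from fractions import Fraction
import random

def outer(u, v):
    # Rank-one matrix u v^T.
    return [[ui * vj for vj in v] for ui in u]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def exact_rank(m):
    # Gaussian elimination over the rationals (no rounding error).
    rows = [[Fraction(x) for x in row] for row in m]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank

def linear_layer_weight_grad(deltas, inputs):
    # For y_b = W x_b: dL/dW = sum_b delta_b x_b^T, with delta_b = dL/dy_b.
    n_out, n_in = len(deltas[0]), len(inputs[0])
    grad = [[0] * n_in for _ in range(n_out)]
    for delta, x in zip(deltas, inputs):
        grad = add(grad, outer(delta, x))
    return grad

rng = random.Random(0)
batch, n_in, n_out = 3, 8, 6
inputs = [[rng.randint(-5, 5) for _ in range(n_in)] for _ in range(batch)]
deltas = [[rng.randint(-5, 5) for _ in range(n_out)] for _ in range(batch)]
grad = linear_layer_weight_grad(deltas, inputs)
print(exact_rank(grad), "<=", batch)
```

The 6 x 8 gradient could have rank up to 6, but with a batch of 3 it is capped at 3, which is the kind of architecture- and batch-dependent bound the abstract refers to.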

arXiv:2311.09809 [pdf, other]

doi 10.4204/EPTCS.395.3

Comparing Differentiable Logics for Learning Systems: A Research Preview

Authors: Thomas Flinkow, Barak A. Pearlmutter, Rosemary Monahan

Abstract: Extensive research on formal verification of machine learning (ML) systems indicates that learning from data alone often fails to capture underlying background knowledge. A variety of verifiers have been developed to ensure that a machine-learnt model satisfies correctness and safety properties; however, these verifiers typically assume a trained network with fixed weights. ML-enabled autonomous systems are required not only to detect incorrect predictions but also to self-correct, continuously improving and adapting. A promising approach for creating ML models that inherently satisfy constraints is to encode background knowledge as logical constraints that guide the learning process via so-called differentiable logics. In this research preview, we compare and evaluate various logics from the literature in weakly-supervised contexts, presenting our findings and highlighting open problems for future work. Our experimental results are broadly consistent with results previously reported in the literature; however, learning with differentiable logics introduces a new hyperparameter that is difficult to tune and has significant influence on the effectiveness of the logics.

Submitted 16 November, 2023; originally announced November 2023.

Comments: In Proceedings FMAS 2023, arXiv:2311.08987

Journal ref: EPTCS 395, 2023, pp. 17-29

arXiv:2306.15545 [pdf, ps, other]

doi 10.14569/IJACSA.2023.0140805

Visualization of AI Systems in Virtual Reality: A Comprehensive Review

Authors: Medet Inkarbekov, Rosemary Monahan, Barak A. Pearlmutter

Abstract: This study provides a comprehensive review of the use of Virtual Reality (VR) for visualizing Artificial Intelligence (AI) systems, drawing on 18 selected studies. The results reveal a complex interplay of tools, methods, and approaches, notably the prominence of VR engines such as Unreal Engine and Unity. However, despite these tools, a universal solution for effective AI visualization remains elusive, reflecting the unique strengths and limitations of each technique. We observed the application of VR for AI visualization across multiple domains, despite challenges such as high data complexity and cognitive load. The review also briefly discusses the emerging ethical considerations pertaining to the broad integration of these technologies. Despite these challenges, the field shows significant promise, emphasizing the need for dedicated research efforts to unlock the full potential of these immersive technologies. This review therefore outlines a roadmap for future research, encouraging innovation in visualization techniques, addressing identified challenges, and considering the ethical implications of VR and AI convergence.

Submitted 27 June, 2023; originally announced June 2023.

Comments: 19 pages

arXiv:2202.08587 [pdf, other]

Gradients without Backpropagation

Authors: Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr

Abstract: Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases.

Submitted 17 February, 2022; originally announced February 2022.

Comments: 10 pages, 6 figures

MSC Class: 68T07

ACM Class: I.2.6; I.2.5
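The forward gradient can be illustrated without any AD library: a dual number carries a tangent alongside each value, so one forward evaluation yields the directional derivative (grad f . v), and scaling the random direction v by that scalar gives the estimate (grad f . v) v. This sketch assumes v is drawn from a standard normal distribution and uses a hypothetical two-variable test function `f`; averaging many estimates should approach the true gradient, which is what unbiasedness means here.

```python
import random

class Dual:
    """Dual number: primal value plus a tangent (directional derivative)."""

    def __init__(self, primal, tangent=0.0):
        self.primal = primal
        self.tangent = tangent

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'e)(b + b'e) = ab + (ab' + a'b)e
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__

def jvp(f, x, v):
    """One forward pass returns f(x) and the directional derivative grad f(x) . v."""
    out = f([Dual(xi, vi) for xi, vi in zip(x, v)])
    return out.primal, out.tangent

def forward_gradient(f, x, rng):
    """Forward-gradient estimate (grad f . v) v, with v ~ N(0, I) assumed."""
    v = [rng.gauss(0.0, 1.0) for _ in x]
    _, directional = jvp(f, x, v)
    return [directional * vi for vi in v]

def f(x):
    # Hypothetical test function; its true gradient at (1, 2) is (8, 3).
    return x[0] * x[0] + 3 * x[0] * x[1]

rng = random.Random(0)
n = 20_000
est = [0.0, 0.0]
for _ in range(n):
    g = forward_gradient(f, [1.0, 2.0], rng)
    est = [e + gi / n for e, gi in zip(est, g)]
print(est)  # should be close to [8.0, 3.0]
```

Each individual estimate costs a single forward pass but is noisy; only its expectation equals the gradient, which is why the average over many samples is shown.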

arXiv:2111.00343 [pdf, other]

Continuous Convolutional Neural Networks: Coupled Neural PDE and ODE

Authors: Mansura Habiba, Barak A. Pearlmutter

Abstract: Recent work in deep learning focuses on solving physical systems described by Ordinary Differential Equations (ODEs) or Partial Differential Equations (PDEs). This work proposes a variant of Convolutional Neural Networks (CNNs) that can learn the hidden dynamics of a physical system using systems of ODEs and PDEs. Instead of treating a physical system, such as an image or a time series, as a system of multiple layers, this new technique models it in the form of Differential Equations (DEs). The proposed method has been assessed by solving several steady-state PDEs on irregular domains, including the heat equation and the Navier-Stokes equations.

Submitted 30 October, 2021; originally announced November 2021.

Comments: Proc. of the International Conference on Electrical, Computer and Energy Technologies (ICECET)

arXiv:2111.00326 [pdf, other]

Neural Network based on Automatic Differentiation Transformation of Numeric Iterate-to-Fixedpoint

Authors: Mansura Habiba, Barak A. Pearlmutter

Abstract: This work proposes a Neural Network model that can control its depth using an iterate-to-fixed-point operator. The architecture starts with a standard layered network but adds connections from later layers to earlier ones, along with a gate that keeps them inactive under most circumstances. These "temporal wormhole" connections create a shortcut that allows the network to use the information available at deeper layers and redo earlier computations with modulated inputs. End-to-end training is accomplished by using appropriate calculations for a numeric iterate-to-fixed-point operator. In the typical case, where the "wormhole" connections are inactive, this is inexpensive; but when they are active, the network takes longer to settle, and the gradient calculation is also more laborious, with an effect similar to making the network deeper. In contrast to the existing skip-connection concept, this technique enables information to flow both up and down the network. Furthermore, this flow of information seems analogous to the afferent and efferent flow of information through layers of processing in the brain. We evaluate models that use this novel mechanism on different long-term dependency tasks. The results are competitive with other studies, showing that the proposed model contributes significantly to overcoming the vanishing gradient problem of traditional deep learning models. At the same time, the training time is significantly reduced, as "easy" input cases are processed more quickly than "difficult" ones.

Submitted 30 October, 2021; originally announced November 2021.

Comments: Proc. of the International Conference on Electrical, Computer and Energy Technologies (ICECET)
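One standard way to differentiate through an iterate-to-fixed-point operator, without storing the iterations, is the implicit function theorem: if z* = g(z*, x), then in the scalar case dz*/dx = (dg/dx) / (1 - dg/dz), evaluated at the fixed point. The abstract does not spell out which calculation the authors use, so treat this as an assumed, generic technique; the contraction `g` below is a hypothetical scalar example.

```python
import math

def fixed_point(g, x, z0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate z <- g(z, x) until successive iterates differ by < tol."""
    z = z0
    for it in range(1, max_iter + 1):
        z_next = g(z, x)
        if abs(z_next - z) < tol:
            return z_next, it
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical contraction in z (|dg/dz| <= 0.5), so the iteration converges.
def g(z, x):
    return 0.5 * math.cos(z) + x

def dg_dz(z, x):
    return -0.5 * math.sin(z)

def dg_dx(z, x):
    return 1.0

def dzstar_dx(x):
    # Implicit function theorem: differentiate z* = g(z*, x), solve for dz*/dx.
    z_star, _ = fixed_point(g, x)
    return dg_dx(z_star, x) / (1.0 - dg_dz(z_star, x))

z_star, iters = fixed_point(g, 0.3)
print(z_star, iters, dzstar_dx(0.3))
```

The gradient costs one extra scalar solve at the fixed point, regardless of how many iterations the forward pass needed; in a vector setting the division becomes a linear solve.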

arXiv:2111.00314 [pdf, other]

ECG synthesis with Neural ODE and GAN models

Authors: Mansura Habiba, Eoin Brophy, Barak A. Pearlmutter, Tomas Ward

Abstract: Continuous medical time series such as ECG are among the most complex time series due to their dynamic and high-dimensional characteristics. In addition, because of their sensitive nature, privacy concerns, and legal restrictions, it is often difficult to use real data for medical research. As a result, generating continuous medical time series is a critical research area. Several studies have already shown that generative adversarial networks (GANs) are promising for continuous medical time series generation. Most medical data generation work, such as ECG synthesis, is mainly driven by the GAN model and its variants. On the other hand, some recent work on Neural Ordinary Differential Equations (Neural ODEs) demonstrates their strength in handling informative missingness, high dimensionality, and the dynamic nature of continuous time series. Instead of treating a continuous time series as a discrete-time sequence, a Neural ODE can be trained on it continuously in time. In this work, we used Neural ODE-based models to generate synthetic sine waves and synthetic ECG. We introduced a new technique to design a generative adversarial network with a Neural ODE-based generator and discriminator, and developed three new models to synthesise continuous medical data. Different evaluation metrics are then used to quantitatively assess the quality of the generated synthetic data for real-world applications and data analysis. Another goal of this work is to combine the strengths of GANs and Neural ODEs to generate synthetic continuous medical time series data such as ECG. We also evaluated both the GAN model and the Neural ODE model to understand the comparative efficiency of models from the GAN and Neural ODE families in medical data synthesis.

Submitted 6 June, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

Comments: Proc. of the International Conference on Electrical, Computer and Energy Technologies (ICECET), 9-10 December 2021, Cape Town-South Africa

arXiv:2105.06168 [pdf, other]

HeunNet: Extending ResNet using Heun's Methods

Authors: Mehrdad Maleki, Mansura Habiba, Barak A. Pearlmutter

Abstract: There is an analogy between the ResNet (Residual Network) architecture for deep neural networks and an Euler solver for an ODE: the transformation performed by each layer resembles an Euler step in solving an ODE. We consider Heun's method, which involves a single predictor-corrector cycle, and complete the analogy, building a predictor-corrector variant of ResNet, which we call a HeunNet. Just as Heun's method is more accurate than Euler's, experiments show that HeunNet achieves high accuracy with low computational time (both training and test) compared to both vanilla recurrent neural networks and other ResNet variants.

Submitted 14 May, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

Comments: Irish Signals & Systems Conference 2021
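The ResNet/Euler analogy can be made concrete: a residual block computes x + h*f(x), which is one explicit Euler step, while Heun's method adds a corrector that averages the slope at the start point and at the Euler-predicted point. The sketch below compares the two update rules on the test ODE dx/dt = -x, an assumed example in which `f` stands in for a residual branch; it illustrates the numerical-accuracy argument only, not the HeunNet architecture or its training.

```python
import math

def euler_step(f, x, h):
    # ResNet-style update: x_{k+1} = x_k + h * f(x_k).
    return x + h * f(x)

def heun_step(f, x, h):
    # Predictor: an ordinary Euler step.
    x_pred = x + h * f(x)
    # Corrector: average the slopes at x and at the predicted point.
    return x + 0.5 * h * (f(x) + f(x_pred))

def integrate(step, f, x0, h, n_steps):
    x = x0
    for _ in range(n_steps):
        x = step(f, x, h)
    return x

def f(x):
    # dx/dt = -x, whose exact solution is x0 * exp(-t).
    return -x

exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, f, 1.0, 0.1, 10) - exact)
heun_err = abs(integrate(heun_step, f, 1.0, 0.1, 10) - exact)
print(f"Euler error: {euler_err:.2e}, Heun error: {heun_err:.2e}")
```

Each Heun step evaluates `f` twice instead of once, so the comparison is between a cheaper first-order update and a costlier second-order one.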

arXiv:2005.10693 [pdf, other]

Neural ODEs for Informative Missingness in Multivariate Time Series

Authors: Mansura Habiba, Barak A. Pearlmutter

Abstract: Informative missingness is unavoidable in the digital processing of continuous time series, where the values of one or more observations at different time points are missing. Such missing observations are one of the major limitations of time series processing using deep learning. Practical applications, e.g., sensor data, healthcare, and weather, generate data that is in truth continuous in time, and informative missingness is a common phenomenon in these datasets. These datasets often consist of multiple variables, and there are often missing values for one or many of these variables. This characteristic makes time series prediction more challenging, and the impact of missing input observations on the accuracy of the final output can be significant. A recent deep learning model called GRU-D is one early attempt to address informative missingness in time series data. On the other hand, a new family of neural networks called Neural ODEs (Ordinary Differential Equations) is natural and efficient for processing time series data that is continuous in time. In this paper, we propose a deep learning model that leverages the effective imputation of GRU-D and the temporal continuity of Neural ODEs. A time series classification task performed on the PhysioNet dataset demonstrates the performance of this architecture.

Submitted 19 May, 2020; originally announced May 2020.
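GRU-D's handling of missing inputs, which the proposed model builds on, combines a masking indicator with a decay from the last observed value toward the empirical mean as the time since that observation grows. The scalar sketch below follows the decay-and-impute form published for GRU-D, gamma = exp(-max(0, w*delta + b)) and x_hat = m*x + (1 - m)*(gamma*x_last + (1 - gamma)*x_mean); the fixed values of `w` and `b` are assumptions standing in for learned parameters, and the Neural ODE part of the proposed model is not shown.

```python
import math

def decay(delta, w=1.0, b=0.0):
    # GRU-D-style decay rate; w and b are fixed here but learned in GRU-D.
    return math.exp(-max(0.0, w * delta + b))

def impute(x, observed, x_last, x_mean, delta, w=1.0, b=0.0):
    """Return x if observed; otherwise blend the last observation toward the mean."""
    if observed:
        return x
    gamma = decay(delta, w, b)
    return gamma * x_last + (1.0 - gamma) * x_mean

# Hypothetical single-variable series: last seen value 2.0, empirical mean 0.0.
for gap in (0.0, 1.0, 5.0):
    print(gap, impute(None, False, x_last=2.0, x_mean=0.0, delta=gap))
```

A short gap keeps the imputed value near the last observation, while a long gap falls back toward the mean, which is how the time since the last observation itself becomes informative.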

arXiv:2005.09807 [pdf, other]

Neural Ordinary Differential Equation based Recurrent Neural Network Model

Authors: Mansura Habiba, Barak A. Pearlmutter

Abstract: Neural differential equations are a promising new member of the neural network family. They show the potential of differential equations for time series data analysis. In this paper, the strength of the ordinary differential equation (ODE) is explored with a new extension. The main goal of this work is to answer the following questions: (i) can ODEs be used to redefine existing neural network models? (ii) can Neural ODEs solve the irregular sampling rate challenge that existing neural network models face for continuous time series, i.e., their length and dynamic nature? (iii) how can the training and evaluation time of existing Neural ODE systems be reduced? This work leverages the mathematical foundation of ODEs to redesign traditional RNNs such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The main contribution of this paper is the design of two new ODE-based RNN models (GRU-ODE and LSTM-ODE) that can compute the hidden state and cell state at any point in time using an ODE solver. These models substantially reduce the computational overhead of computing the hidden state and cell state. The performance of these two new models for learning continuous time series with irregular sampling rates is then demonstrated. Experiments show that these new ODE-based RNN models require less training time than Latent ODEs and conventional Neural ODEs. They can achieve higher accuracy quickly, and their design is simpler than that of previous Neural ODE systems.

Submitted 19 May, 2020; originally announced May 2020.

arXiv:1911.03028 [pdf, other]

Lock-Free Hopscotch Hashing

Authors: Robert Kelly, Barak A. Pearlmutter, Phil Maguire

Abstract: In this paper we present a lock-free version of Hopscotch Hashing. Hopscotch Hashing is an open addressing algorithm originally proposed by Herlihy, Shavit, and Tzafrir, which is known for fast performance and excellent cache locality. The algorithm allows users of the table to skip or jump over irrelevant entries, allowing quick search, insertion, and removal of entries. Unlike traditional linear probing, Hopscotch Hashing is capable of operating under a high load factor, as probe counts remain small. Our lock-free version improves on the speed, cache locality, and progress guarantees of the original, being a chimera of two concurrent hash tables. We compare our data structure to various other lock-free and blocking hashing algorithms and show that its performance is in many cases superior to existing strategies. The proposed lock-free version overcomes some of the drawbacks associated with the original blocking version, leading to a substantial boost in scalability while maintaining attractive features like physical deletion or probe-chain compression.

Submitted 7 November, 2019; originally announced November 2019.

Comments: 15 pages, to appear in APOCS20
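For readers unfamiliar with the base algorithm, the sketch below shows the sequential (single-threaded) hopscotch idea: every key is kept within a neighbourhood of H slots starting at its home bucket, so a lookup inspects at most H slots, and insertion moves a free slot back toward the home bucket by displacing keys that can legally move. It is not the lock-free algorithm of the paper; it omits deletion, resizing, and the per-bucket hop-information bitmaps of the original design (it simply scans the neighbourhood), and the class and parameter names are hypothetical.

```python
class HopscotchTable:
    """Sequential hopscotch hashing sketch: each key stays within
    H slots of its home bucket. Keys must not be None."""

    def __init__(self, capacity=64, hop_range=8):
        assert capacity > 2 * hop_range
        self.capacity = capacity
        self.H = hop_range
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def _home(self, key):
        return hash(key) % self.capacity

    def _slot(self, base, offset):
        return (base + offset) % self.capacity

    def get(self, key):
        home = self._home(key)
        for i in range(self.H):                 # at most H probes
            idx = self._slot(home, i)
            if self.keys[idx] == key:
                return self.values[idx]
        return None

    def put(self, key, value):
        home = self._home(key)
        for i in range(self.H):                 # update in place if present
            idx = self._slot(home, i)
            if self.keys[idx] == key:
                self.values[idx] = value
                return
        dist = 0                                # linear probe for a free slot
        while dist < self.capacity and self.keys[self._slot(home, dist)] is not None:
            dist += 1
        if dist == self.capacity:
            raise RuntimeError("table full")
        while dist >= self.H:                   # hop the free slot toward home
            free = self._slot(home, dist)
            for back in range(self.H - 1, 0, -1):
                cand = (free - back) % self.capacity
                # The candidate may move to `free` only if it stays
                # within H slots of its own home bucket.
                if (free - self._home(self.keys[cand])) % self.capacity < self.H:
                    self.keys[free], self.values[free] = self.keys[cand], self.values[cand]
                    self.keys[cand], self.values[cand] = None, None
                    dist -= back
                    break
            else:
                raise RuntimeError("no displacement possible; a resize would be needed")
        idx = self._slot(home, dist)
        self.keys[idx], self.values[idx] = key, value

table = HopscotchTable(capacity=64, hop_range=8)
for k in range(1, 9):        # small ints hash to themselves: slots 1..8
    table.put(k, str(k))
table.put(0, "0")            # slot 0
table.put(64, "64")          # home 0, but slots 0..8 are taken: key 2 hops to slot 9
```

In the usage above, key 2 is moved from slot 2 to slot 9 (still within 8 slots of its home), freeing slot 2 for key 64, so every lookup still touches at most H slots.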

arXiv:1809.04339 [pdf, other]

Concurrent Robin Hood Hashing

Authors: Robert Kelly, Barak A. Pearlmutter, Phil Maguire

Abstract: In this paper we examine the issues involved in adding concurrency to the Robin Hood hash table algorithm. We present a non-blocking, obstruction-free K-CAS Robin Hood algorithm that requires only a single-word compare-and-swap primitive, thus making it highly portable. The implementation maintains the attractive properties of the original Robin Hood structure, such as a low expected probe length, the capability to operate effectively under a high load factor, and good cache locality, all of which are essential for high performance on modern computer architectures. We compare our data structures to various other lock-free and concurrent algorithms, as well as a simple hardware transactional variant, and show that our implementation performs better across a number of contexts.

Submitted 14 November, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

Comments: 16 pages, 12 figures
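The low expected probe length that the concurrent version preserves comes from the sequential Robin Hood insertion rule: while probing, an incoming entry that is farther from its home bucket than the resident entry takes that slot, and the displaced resident continues probing. The sketch below is single-threaded and omits deletion and resizing; it does not reflect the paper's K-CAS design, and the class name is hypothetical.

```python
class RobinHoodTable:
    """Sequential Robin Hood hashing sketch (no deletion, no resizing).
    Each slot holds (key, value, distance_from_home) or None."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.size = 0

    def _home(self, key):
        return hash(key) % self.capacity

    def _find(self, key):
        idx, dist = self._home(key), 0
        while dist < self.capacity:
            slot = self.slots[idx]
            # Early exit: a resident closer to its home than we are to ours
            # means the key would have displaced it if it were present.
            if slot is None or slot[2] < dist:
                return -1
            if slot[0] == key:
                return idx
            idx = (idx + 1) % self.capacity
            dist += 1
        return -1

    def get(self, key):
        idx = self._find(key)
        return None if idx < 0 else self.slots[idx][1]

    def put(self, key, value):
        found = self._find(key)
        if found >= 0:                        # key present: update in place
            k, _, d = self.slots[found]
            self.slots[found] = (k, value, d)
            return
        if self.size == self.capacity:
            raise RuntimeError("table full")
        idx, dist = self._home(key), 0
        cur_key, cur_val = key, value
        while True:                           # terminates: a free slot exists
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = (cur_key, cur_val, dist)
                self.size += 1
                return
            k, v, d = slot
            if d < dist:
                # The resident is "richer" (closer to home): swap, and the
                # displaced resident continues probing.
                self.slots[idx] = (cur_key, cur_val, dist)
                cur_key, cur_val, dist = k, v, d
            idx = (idx + 1) % self.capacity
            dist += 1

table = RobinHoodTable(capacity=8)
for k in (0, 8, 1, 16):       # small ints hash to themselves; 0, 8, 16 share home 0
    table.put(k, f"v{k}")
print(table.slots)
```

Inserting 16 displaces key 1 (distance 1) from slot 2 because 16 has already probed two slots; the result is that probe distances are evened out across keys, and unsuccessful lookups can stop early.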

arXiv:1708.06799 [pdf, ps, other]

doi 10.1080/10556788.2018.1459621

Divide-and-Conquer Checkpointing for Arbitrary Programs with No User Annotation

Authors: Jeffrey Mark Siskind, Barak A. Pearlmutter

Abstract: Classical reverse-mode automatic differentiation (AD) imposes only a small constant-factor overhead in operation count over the original computation, but has storage requirements that grow, in the worst case, in proportion to the time consumed by the original computation. This storage blowup can be ameliorated by checkpointing, a process that reorders application of classical reverse-mode AD over an execution interval to trade off space vs. time. Application of checkpointing in a divide-and-conquer fashion to strategically chosen nested execution intervals can break classical reverse-mode AD into stages, which can reduce the worst-case growth in storage from linear to sublinear. Doing this has been fully automated only for computations of particularly simple form, with checkpoints spanning execution intervals resulting from a limited set of program constructs. Here we show how the technique can be automated for arbitrary computations. The essential innovation is to apply the technique at the level of the language implementation itself, thus allowing checkpoints to span any execution interval.

Submitted 29 March, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

MSC Class: 68N20; 68N18; 65F50; 65D25; 46G05; 58C20

Journal ref: Optimization Methods and Software 33(04-06):1288-1330, 2018
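The divide-and-conquer idea can be seen on a computation of the particularly simple form the abstract mentions: a straight-line chain of n identical steps. Instead of taping all n intermediate states, the reverse sweep recursively recomputes the midpoint state from a checkpoint, processes the second half first, then the first half. Live storage is then one state per recursion level, O(log n), at the cost of O(n log n) recomputation. The step function and the bisection schedule are assumptions for illustration; the paper's contribution, automating this for arbitrary programs inside the language implementation, is not attempted here.

```python
import math

def step(x):
    # Hypothetical step: a small Euler step of dx/dt = sin(x).
    return x + 0.01 * math.sin(x)

def dstep(x):
    # Derivative of step, used in the reverse (adjoint) sweep.
    return 1.0 + 0.01 * math.cos(x)

def grad_with_full_tape(x0, n):
    """Classical reverse mode: store all n states (O(n) storage)."""
    tape, x = [], x0
    for _ in range(n):
        tape.append(x)
        x = step(x)
    adj = 1.0
    for x in reversed(tape):
        adj *= dstep(x)
    return adj

def grad_with_bisection(x0, n):
    """Reverse mode with recursive bisection checkpointing.
    Returns d(x_n)/d(x_0) and the maximum recursion depth reached."""
    max_depth = 0

    def reverse(x_lo, lo, hi, adj, depth):
        nonlocal max_depth
        max_depth = max(max_depth, depth)
        if hi - lo == 1:
            return adj * dstep(x_lo)
        mid = (lo + hi) // 2
        x_mid = x_lo
        for _ in range(mid - lo):                      # recompute from checkpoint
            x_mid = step(x_mid)
        adj = reverse(x_mid, mid, hi, adj, depth + 1)  # second half first
        return reverse(x_lo, lo, mid, adj, depth + 1)

    if n == 0:
        return 1.0, 0
    return reverse(x0, 0, n, 1.0, 0), max_depth

g_tape = grad_with_full_tape(0.7, 1000)
g_ckpt, depth = grad_with_bisection(0.7, 1000)
print(g_tape, g_ckpt, depth)
```

Both functions perform the same multiplications in the same order, so they agree, but the checkpointed version never holds more than about log2(n) saved states at once.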

arXiv:1611.03777 [pdf, ps, other]

Tricks from Deep Learning

Authors: Atılım Güneş Baydin, Barak A. Pearlmutter, Jeffrey Mark Siskind

Abstract: The deep learning community has devised a diverse set of methods that make it practical to apply gradient optimization, using large datasets, to large and highly complex models with deeply cascaded nonlinearities. Taken as a whole, these methods constitute a breakthrough, allowing computational structures that are quite wide, very deep, and have an enormous number and variety of free parameters to be effectively optimized. The result now dominates much of practical machine learning, with applications in machine translation, computer vision, and speech recognition. Many of these methods, viewed through the lens of algorithmic differentiation (AD), can be seen as either addressing issues with the gradient itself, or finding ways of achieving increased efficiency using tricks that are AD-related but not provided by current AD systems. The goal of this paper is to explain not just those methods of most relevance to AD, but also the technical constraints and mindset that led to their discovery. After explaining this context, we present a "laundry list" of methods developed by the deep learning community. Two of these are discussed in further mathematical detail: a way to dramatically reduce the size of the tape when performing reverse-mode AD on a (theoretically) time-reversible process like an ODE integrator; and a new mathematical insight that allows for the implementation of a stochastic Newton's method.

Submitted 10 November, 2016; originally announced November 2016.
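The tape-size trick for time-reversible processes rests on the fact that a reversible integrator can be run backwards to regenerate earlier states, so those states need not be stored for the reverse sweep. The sketch below shows only that prerequisite, on an assumed example: a leapfrog (kick-drift-kick) integrator for the harmonic oscillator q'' = -q, run forward and then backward with a negated time step. Recovery is exact in real arithmetic but only approximate in floating point because of rounding error; the adjoint computation itself is omitted.

```python
def leapfrog(q, p, dt, n_steps):
    """Kick-drift-kick leapfrog for q'' = -q (unit mass and stiffness)."""
    for _ in range(n_steps):
        p = p - 0.5 * dt * q      # half kick
        q = q + dt * p            # drift
        p = p - 0.5 * dt * q      # half kick
    return q, p

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, dt=0.01, n_steps=1000)
# Running with -dt retraces the trajectory step by step, regenerating the
# start state without having stored any intermediate states.
q_back, p_back = leapfrog(q1, p1, dt=-0.01, n_steps=1000)
print(abs(q_back - q0), abs(p_back - p0))
```

Each backward step algebraically undoes the corresponding forward step (the half kicks and the drift cancel in reverse order), which is what lets a reverse sweep reconstruct states on demand instead of reading them from a tape.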