doi 10.1088/2632-2153/ad5926

Mixed Noise and Posterior Estimation with Conditional DeepGEM

Authors: Paul Hagemann, Johannes Hertrich, Maren Casfor, Sebastian Heidenreich, Gabriele Steidl

Abstract: Motivated by indirect measurements and applications from nanometrology with a mixed noise model, we develop a novel algorithm for jointly estimating the posterior and the noise parameters in Bayesian inverse problems. We propose to solve the problem by an expectation maximization (EM) algorithm. Based on the current noise parameters, we learn in the E-step a conditional normalizing flow that approximates the posterior. In the M-step, we propose to find the noise parameter updates again by an EM algorithm, which has analytical formulas. We compare the training of the conditional normalizing flow with the forward and reverse KL, and show that our model is able to incorporate information from many measurements, unlike previous approaches.

Submitted 5 July, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Published in Machine Learning: Science and Technology

Journal ref: Machine Learning: Science and Technology, Volume 5, Number 3, 2024

arXiv:2401.08260 [pdf, other]

Fast Kernel Summation in High Dimensions via Slicing and Fourier Transforms

Authors: Johannes Hertrich

Abstract: Kernel-based methods are heavily used in machine learning. However, they suffer from $O(N^2)$ complexity in the number $N$ of considered data points. In this paper, we propose an approximation procedure, which reduces this complexity to $O(N)$. Our approach is based on two ideas. First, we prove that any radial kernel with analytic basis function can be represented as a sliced version of some one-dimensional kernel and derive an analytic formula for the one-dimensional counterpart. It turns out that the relation between one- and $d$-dimensional kernels is given by a generalized Riemann-Liouville fractional integral. Hence, we can reduce the $d$-dimensional kernel summation to a one-dimensional setting. Second, for solving these one-dimensional problems efficiently, we apply fast Fourier summations on non-equispaced data, a sorting algorithm or a combination of both. Due to its practical importance, we pay special attention to the Gaussian kernel, where we show a dimension-independent error bound and represent its one-dimensional counterpart via a closed-form Fourier transform. We provide a run time comparison and error estimate of our fast kernel summations.

Submitted 12 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.16611 [pdf, other]

Learning from small data sets: Patch-based regularizers in inverse problems for image reconstruction

Authors: Moritz Piening, Fabian Altekrüger, Johannes Hertrich, Paul Hagemann, Andrea Walther, Gabriele Steidl

Abstract: The solution of inverse problems is of fundamental interest in medical and astronomical imaging, geophysics as well as engineering and life sciences. Recent advances were made by using methods from machine learning, in particular deep neural networks. Most of these methods require a huge amount of (paired) data and computer capacity to train the networks, which often may not be available. Our paper addresses the issue of learning from small data sets by taking patches of very few images into account. We focus on the combination of model-based and data-driven methods by approximating just the image prior, also known as regularizer in the variational model. We review two methodically different approaches, namely optimizing the maximum log-likelihood of the patch distribution, and penalizing Wasserstein-like discrepancies of whole empirical patch distributions. From the point of view of Bayesian inverse problems, we show how we can achieve uncertainty quantification by approximating the posterior using Langevin Monte Carlo methods. We demonstrate the power of the methods in computed tomography, image super-resolution, and inpainting. Indeed, the approach also provides high-quality results in zero-shot super-resolution, where only a low-resolution image is available. The paper is accompanied by a GitHub repository containing implementations of all methods as well as data examples so that the reader can get their own insight into the performance.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2310.03054 [pdf, other]

Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel

Authors: Paul Hagemann, Johannes Hertrich, Fabian Altekrüger, Robert Beinert, Jannis Chemseddine, Gabriele Steidl

Abstract: We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.

Submitted 21 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Published as a conference paper at ICLR 2024
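The "slicing" mentioned in the abstract above can be illustrated with a standard identity for the Euclidean norm: for a direction $\xi$ drawn uniformly from the unit sphere in $\mathbb{R}^d$, $\|x\| = c_d \, \mathbb{E}_\xi |\langle \xi, x \rangle|$ with $c_d = \sqrt{\pi}\,\Gamma((d+1)/2)/\Gamma(d/2)$. The sketch below is not taken from the paper; it assumes NumPy, uses the hypothetical helper names `slicing_constant` and `sliced_norm_estimate`, and only checks this identity by Monte Carlo sampling over a finite number of random directions.

```python
import math

import numpy as np


def slicing_constant(d: int) -> float:
    """Constant c_d with ||x|| = c_d * E|<xi, x>| for xi uniform on the unit sphere in R^d."""
    return math.sqrt(math.pi) * math.gamma((d + 1) / 2) / math.gamma(d / 2)


def sliced_norm_estimate(x: np.ndarray, num_slices: int, rng: np.random.Generator) -> float:
    """Monte Carlo estimate of ||x|| from one-dimensional projections only."""
    d = x.shape[0]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    xi = rng.normal(size=(num_slices, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    # Average the absolute values of the 1D projections and rescale by c_d.
    return slicing_constant(d) * float(np.mean(np.abs(xi @ x)))
```

Applied to differences $x - y$, the same identity expresses the negative distance kernel as an average of one-dimensional kernels, so each slice can be handled by a one-dimensional routine. The estimate is random, and its error shrinks as the number of slices grows.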

arXiv:2305.11463 [pdf, other]

Generative Sliced MMD Flows with Riesz Kernels

Authors: Johannes Hertrich, Christian Wald, Fabian Altekrüger, Paul Hagemann

Abstract: Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \|x-y\|^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels, which is also known as energy distance, coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.

Submitted 20 February, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Published as a conference paper at ICLR 2024
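The sorting argument for $r=1$ in the abstract above rests on a simple one-dimensional fact: once the points are sorted, the sum of all pairwise absolute differences is a weighted sum of the sorted values. The following is a minimal sketch of that fact only, not the authors' implementation. It assumes NumPy, and the function names `pairwise_abs_sum_sorted` and `pairwise_abs_sum_naive` are hypothetical.

```python
import numpy as np


def pairwise_abs_sum_sorted(x: np.ndarray) -> float:
    """Return sum_{i<j} |x_i - x_j| for a 1D array in O(N log N) via one sort."""
    xs = np.sort(x)
    n = xs.shape[0]
    # In sorted order, xs[k] is added k times (as the larger element)
    # and subtracted (n - 1 - k) times (as the smaller element).
    coeffs = 2.0 * np.arange(n) - (n - 1)
    return float(np.dot(coeffs, xs))


def pairwise_abs_sum_naive(x: np.ndarray) -> float:
    """Reference O(N^2) computation of the same quantity."""
    diffs = np.abs(x[:, None] - x[None, :])
    return float(diffs.sum() / 2.0)  # each unordered pair appears twice
```

Both functions return the same value up to floating-point rounding; only the sorted version avoids forming the full $N \times N$ difference matrix. The gradient computation described in the abstract is more involved than this interaction sum and is not reproduced here.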

arXiv:2303.15244 [pdf, other]

Manifold Learning by Mixture Models of VAEs for Inverse Problems

Authors: Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria, Silvia Sciutto

Abstract: Representing a manifold of very high-dimensional data with generative models has been shown to be computationally efficient in practice. However, this requires that the data manifold admits a global parameterization. In order to represent manifolds of arbitrary topology, we propose to learn a mixture model of variational autoencoders. Here, every encoder-decoder pair represents one chart of a manifold. We propose a loss function for maximum likelihood estimation of the model weights and choose an architecture that provides us the analytical expression of the charts and of their inverses. Once the manifold is learned, we use it for solving inverse problems by minimizing a data fidelity term restricted to the learned manifold. To solve the arising minimization problem we propose a Riemannian gradient descent algorithm on the learned manifold. We demonstrate the performance of our method for low-dimensional toy examples as well as for deblurring and electrical impedance tomography on certain image manifolds.

Submitted 12 June, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2301.11624 [pdf, other]

Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels

Authors: Fabian Altekrüger, Johannes Hertrich, Gabriele Steidl

Abstract: Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows as well as a forward scheme for so-called Wasserstein steepest descent flows by neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by generative NNs which are learned with respect to appropriate loss functions. In order to evaluate the quality of both neural schemes, we benchmark them on the interaction energy. Here we provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero. Finally, we illustrate our neural MMD flows by numerical examples.

Submitted 21 March, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

Comments: Accepted at ICML 2023

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:664-690, 2023

arXiv:2211.17158 [pdf, other]

doi 10.1007/978-3-031-31975-4_16

Proximal Residual Flows for Bayesian Inverse Problems

Authors: Johannes Hertrich

Abstract: Normalizing flows are a powerful tool for generative modelling, density estimation and posterior reconstruction in Bayesian inverse problems. In this paper, we introduce proximal residual flows, a new architecture of normalizing flows. Based on the fact that proximal neural networks are by definition averaged operators, we ensure invertibility of certain residual blocks. Moreover, we extend the architecture to conditional proximal residual flows for posterior reconstruction within Bayesian inverse problems. We demonstrate the performance of proximal residual flows on numerical examples.

Submitted 30 November, 2022; originally announced November 2022.

arXiv:2205.12021 [pdf, other]

doi 10.1088/1361-6420/acce5e

PatchNR: Learning from Very Few Images by Patch Normalizing Flow Regularization

Authors: Fabian Altekrüger, Alexander Denker, Paul Hagemann, Johannes Hertrich, Peter Maass, Gabriele Steidl

Abstract: Learning neural networks using only little available information is an important ongoing research topic with tremendous potential for applications. In this paper, we introduce a powerful regularizer for the variational modeling of inverse problems in imaging. Our regularizer, called patch normalizing flow regularizer (patchNR), involves a normalizing flow learned on small patches of very few images. In particular, the training is independent of the considered inverse problem such that the same regularizer can be applied for different forward operators acting on the same class of images. By investigating the distribution of patches versus those of the whole image class, we prove that our model is indeed a MAP approach. Numerical examples for low-dose and limited-angle computed tomography (CT) as well as superresolution of material images demonstrate that our method provides very high quality results. The training set consists of just six images for CT and one image for superresolution. Finally, we combine our patchNR with ideas from internal learning for performing superresolution of natural images directly from the low-resolution observation without knowledge of any high-resolution image.

Submitted 21 November, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Journal ref: Inverse Problems, Volume 39, Number 6, 2023

arXiv:2201.08157 [pdf, other]

doi 10.1137/22M1496542

WPPNets and WPPFlows: The Power of Wasserstein Patch Priors for Superresolution

Authors: Fabian Altekrüger, Johannes Hertrich

Abstract: Exploiting image patches instead of whole images has proved to be a powerful approach to tackle various problems in image processing. Recently, Wasserstein patch priors (WPP), which are based on the comparison of the patch distributions of the unknown image and a reference image, were successfully used as data-driven regularizers in the variational formulation of superresolution. However, for each input image, this approach requires the solution of a non-convex minimization problem which is computationally costly. In this paper, we propose to learn two kinds of neural networks in an unsupervised way based on WPP loss functions. First, we show how convolutional neural networks (CNNs) can be incorporated. Once the network, called WPPNet, is learned, it can be very efficiently applied to any input image. Second, we incorporate conditional normalizing flows to provide a tool for uncertainty quantification. Numerical examples demonstrate the very good performance of WPPNets for superresolution in various image classes even if the forward operator is known only approximately.

Submitted 5 January, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

Journal ref: SIAM Journal on Imaging Sciences, vol. 16(3), pp. 1033-1067, 2023

arXiv:2111.12506 [pdf, other]

doi 10.1017/9781009331012

Generalized Normalizing Flows via Markov Chains

Authors: Paul Hagemann, Johannes Hertrich, Gabriele Steidl

Abstract: Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. This chapter provides a unified framework to handle these approaches via Markov chains. We consider stochastic normalizing flows as a pair of Markov chains fulfilling some properties and show how many state-of-the-art models for data generation fit into this framework. Indeed, numerical simulations show that including stochastic layers improves the expressivity of the network and allows for generating multimodal distributions from unimodal ones. The Markov chains point of view enables us to couple both deterministic layers as invertible neural networks and stochastic layers as Metropolis-Hastings layers, Langevin layers, variational autoencoders and diffusion normalizing flows in a mathematically sound way. Our framework establishes a useful mathematical tool to combine the various approaches.

Submitted 20 July, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: arXiv admin note: text overlap with arXiv:2109.11375

arXiv:2109.12880 [pdf, other]

doi 10.1109/TCI.2022.3199600

Wasserstein Patch Prior for Image Superresolution

Authors: Johannes Hertrich, Antoine Houdard, Claudia Redenbach

Abstract: In this paper, we introduce a Wasserstein patch prior for superresolution of two- and three-dimensional images. Here, we assume that we are given (in addition to the low-resolution observation) a reference image which has a similar patch distribution as the ground truth of the reconstruction. This assumption is, e.g., fulfilled when working with texture images or material data. Then, the proposed regularizer penalizes the $W_2$-distance of the patch distribution of the reconstruction to the patch distribution of some reference image at different scales. We demonstrate the performance of the proposed regularizer by two- and three-dimensional numerical examples.

Submitted 17 December, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

Journal ref: IEEE Transactions on Computational Imaging, vol. 8, pp. 693-704, 2022

arXiv:2109.11375 [pdf, other]

doi 10.1137/21M1450604

Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint

Authors: Paul Hagemann, Johannes Hertrich, Gabriele Steidl

Abstract: To overcome topological constraints and improve the expressiveness of normalizing flow architectures, Wu, Köhler and Noé introduced stochastic normalizing flows which combine deterministic, learnable flow transformations with stochastic sampling methods. In this paper, we consider stochastic normalizing flows from a Markov chain point of view. In particular, we replace transition densities by general Markov kernels and establish proofs via Radon-Nikodym derivatives, which allows us to incorporate distributions without densities in a sound way. Further, we generalize the results for sampling from posterior distributions as required in inverse problems. The performance of the proposed conditional stochastic normalizing flow is demonstrated by numerical examples.

Submitted 7 February, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

Journal ref: SIAM/ASA Journal on Uncertainty Quantification, vol. 10 (3), pp. 1162-1190, 2022

arXiv:2011.02281 [pdf, other]

Convolutional Proximal Neural Networks and Plug-and-Play Algorithms

Authors: Johannes Hertrich, Sebastian Neumayer, Gabriele Steidl

Abstract: In this paper, we introduce convolutional proximal neural networks (cPNNs), which are by construction averaged operators. For filters of full length, we propose a stochastic gradient descent algorithm on a submanifold of the Stiefel manifold to train cPNNs. In case of filters with limited length, we design algorithms for minimizing functionals that approximate the orthogonality constraints imposed on the operators by penalizing the least squares distance to the identity operator. Then, we investigate how scaled cPNNs with a prescribed Lipschitz constant can be used for denoising signals and images, where the achieved quality depends on the Lipschitz constant. Finally, we apply cPNN based denoisers within a Plug-and-Play (PnP) framework and provide convergence results for the corresponding PnP forward-backward splitting algorithm based on an oracle construction.

Submitted 4 November, 2020; originally announced November 2020.

arXiv:2009.07520 [pdf, other]

doi 10.3934/ipi.2021053

PCA Reduced Gaussian Mixture Models with Applications in Superresolution

Authors: Johannes Hertrich, Dang Phuong Lan Nguyen, Jean-Francois Aujol, Dominique Bernard, Yannick Berthoumieu, Abdellatif Saadaldin, Gabriele Steidl

Abstract: Despite the rapid development of computational hardware, the treatment of large and high dimensional data sets is still a challenging problem. This paper provides a twofold contribution to the topic. First, we propose a Gaussian Mixture Model in conjunction with a reduction of the dimensionality of the data in each component of the model by principal component analysis, called PCA-GMM. To learn the (low dimensional) parameters of the mixture model we propose an EM algorithm whose M-step requires the solution of constrained optimization problems. Fortunately, these constrained problems do not depend on the usually large number of samples and can be solved efficiently by an (inertial) proximal alternating linearized minimization algorithm. Second, we apply our PCA-GMM for the superresolution of 2D and 3D material images based on the approach of Sandeep and Jacob. Numerical results confirm the moderate influence of the dimensionality reduction on the overall superresolution result.

Submitted 6 May, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

Journal ref: Inverse Problems and Imaging, vol. 16, pp. 341-366, 2022

Showing 1–15 of 15 results for author: Hertrich, J