$\begingroup$

I am working on a low-rank decomposition technique that is robust to different types of noise (Gaussian, salt-and-pepper, Poisson). As a first step, I simulated low-rank matrices and demonstrated that my technique works fairly well on them. I would now like to test the method on more relevant matrices: as part of validation, I need to show how it performs on some commonly used low-rank matrices. Is there a repository of such matrices? Here are the matrices I have tried so far:

  1. Hyperspectral images (each column of the matrix is an image at a narrow band of wavelengths)
  2. Eigenfaces (PCA of a covariance matrix constructed from images of faces)
  3. Videos (background subtraction)

Thanks in advance for the help!

$\endgroup$
  • $\begingroup$ Note: I've changed the tags with the hopes of improving the visibility of this question $\endgroup$ Commented Aug 16, 2021 at 16:18
  • $\begingroup$ @BenGrossmann, thanks a lot! $\endgroup$
    – Vishwanath
    Commented Aug 16, 2021 at 18:59

1 Answer

$\begingroup$

Unfortunately, I am not aware of any large test data sets of real-world low-rank matrices. However, I can point you to other low-rank matrices that might be interesting for testing the performance of your algorithm.


Low-rank matrices from sampling functions

You can generate low-rank or numerically low-rank matrices by sampling bivariate functions on a grid, i.e. $M_{ij} = f(x_i,y_j)$ for some sets of points $\{x_1,\dots,x_n\}$ and $\{y_1,\dots,y_n\}$.
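A minimal NumPy sketch of this construction follows. The function $f(x,y) = 1/(1+x^2+y^2)$, the grid on $[-1,1]$, and the helper names `sample_function_matrix` and `numerical_rank` are illustrative assumptions, not taken from the Chebfun2 test battery. The numerical rank is taken here as the number of singular values above a relative tolerance.

```python
import numpy as np

def sample_function_matrix(f, x, y):
    """Return the matrix M with M[i, j] = f(x[i], y[j])."""
    X, Y = np.meshgrid(x, y, indexing="ij")
    return f(X, Y)

def numerical_rank(M, tol=1e-10):
    """Count singular values larger than tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def f(x, y):
    # Illustrative smooth function (an assumption, not from the source).
    return 1.0 / (1.0 + x**2 + y**2)

n = 200
x = np.linspace(-1.0, 1.0, n)
y = np.linspace(-1.0, 1.0, n)

M = sample_function_matrix(f, x, y)
r = numerical_rank(M)
print(f"matrix size: {M.shape}, numerical rank: {r}")
```

For a smooth function like this one, the numerical rank should come out far smaller than $n$, which is what makes such matrices useful as noise-free test cases.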

A list of potentially interesting functions is contained in the test battery of Chebfun2 (https://github.com/chebfun/chebfun/blob/master/tests/chebfun2/test_battery.m). They also have a test battery for trivariate functions, in case you are also interested in low-rank approximations of tensors of order 3.

Such matrices/tensors do not contain real-world data, but their low-rank approximations are still crucial for algorithms in numerical analysis. One big difference from real-world data matrices, however, is that you can evaluate the entries of these function-based matrices with almost no noise.

Remark. Note that you can generalize the concept of low-rank matrices to low-rank functions. A function is of rank at most $r$ if it can be written as $f(x,y) = \sum_{i=1}^r g_i(x) h_i(y)$ for some univariate functions $g_1,\dots,g_r$ and $h_1,\dots,h_r$. If you sample from such a function, the resulting matrix has rank at most $r$.
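The remark can be checked numerically with a short sketch. The factor functions `g` and `h` below are hypothetical choices made only for illustration. The sampled matrix is $M = G H^T$ with $G_{ik} = g_k(x_i)$ and $H_{jk} = h_k(y_j)$, so its rank cannot exceed $r = 3$.

```python
import numpy as np

# Hypothetical univariate factors g_1..g_3 and h_1..h_3 (illustrative only).
g = [np.sin, np.cos, lambda t: t**2]
h = [np.cos, np.exp, lambda t: np.ones_like(t)]

def rank_r_function(x, y):
    """Evaluate f(x, y) = sum_i g_i(x) * h_i(y), with broadcasting."""
    return sum(gi(x) * hi(y) for gi, hi in zip(g, h))

x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 2.0, 80)

# Column vector times row vector broadcasts to a 50 x 80 sample matrix.
M = rank_r_function(x[:, None], y[None, :])

rank = np.linalg.matrix_rank(M)
print(f"matrix size: {M.shape}, rank: {rank} (at most {len(g)})")
```

The grids do not have to be square or equally sized; the rank bound depends only on the number of terms in the sum.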


Sampling supposedly real-world-like matrices

If you are interested in generating matrices that supposedly behave like real-world data matrices, you might want to have a look at the paper "Why Are Big Data Matrices Approximately Low Rank?" (https://epubs.siam.org/doi/pdf/10.1137/18M1183480). The authors model real-world data matrices using a special stochastic model, for which the generated matrices tend to be numerically low-rank.

Note, however, that the assumptions in that paper might not match the data matrices you get in certain applications.
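A rough sketch of a generator in the spirit of that paper, assuming its model is a latent variable model in which each row $i$ and column $j$ carries a bounded latent vector $\alpha_i$, $\beta_j$ and the entries are $X_{ij} = f(\alpha_i, \beta_j)$ for a smooth $f$. The Gaussian-kernel choice of $f$, the sizes, the latent dimension, and the uniform distribution are all assumptions made here for illustration. Note also that the sketch counts singular values above a relative threshold, which is a simpler notion of numerical rank than an entrywise error criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and latent dimension (illustrative assumptions).
m, n, d = 300, 200, 2

# Bounded latent variables, one vector per row and one per column.
alpha = rng.uniform(-1.0, 1.0, size=(m, d))
beta = rng.uniform(-1.0, 1.0, size=(n, d))

# Illustrative smooth f: a Gaussian kernel of the latent variables.
sq_dist = ((alpha[:, None, :] - beta[None, :, :]) ** 2).sum(axis=2)
X = np.exp(-sq_dist / 2.0)

# Numerical rank: singular values above a relative threshold eps.
s = np.linalg.svd(X, compute_uv=False)
eps = 1e-2
num_rank = int(np.sum(s > eps * s[0]))
print(f"matrix size: {X.shape}, numerical rank at eps={eps}: {num_rank}")
```

Noise (Gaussian, salt-and-pepper, Poisson) can then be added to `X` to obtain a test matrix whose noise-free counterpart is known.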

$\endgroup$
