Sparse algebra support with OOP API #760
base: master
Conversation
Wouldn't it be useful if …
This could be done even shorter with a …
Agreed, …
Is it a good idea to have matvec products that do not initialize the result vector? I have to say that I spent a good hour debugging a code, and the source of divergence in the solver was the fact that I was not initializing the result vector before passing it to the subroutine. While I understand that this is a great way to perform an Axpy operation, I would advise having separate matvec (with initialization) and Axpy (without) routines.
Hi @ftucciarone, thanks for checking out the current draft and sorry for the inconvenience. Yes, I have intentionally designed the API to not initialize the vector; in the current version within FSPARSE https://github.com/jalvesz/FSPARSE?tab=readme-ov-file#sparse-matrix-vector-product I updated the README to be clear about it but forgot to update here. The reason for that choice is that it allows maximum flexibility for preprocessing linear systems needing operations with the same matrix operator that will be used for the solver. So it basically places the "responsibility" on the user to initialize the result vector.
This is of course totally open to debate; I just proposed it following my experience reworking solvers. My feeling is that doing so can avoid having to manage different implementations or optionals, when the solution can be to explicitly state that the interface is updating (y = y + M * x) instead of overwriting (y = M * x).
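To make the two conventions concrete, here is a minimal sketch (Python, hypothetical names, not the stdlib API) of a CSR product with update semantics, y = y + M * x. The caller zeroes y first to recover the overwriting behaviour; calling it again accumulates.

```python
# Hypothetical sketch of "update" matvec semantics: y <- y + A*x on a CSR
# matrix with 0-based indices. Not the stdlib interface, just an illustration.

def csr_matvec_update(data, col, rowptr, x, y):
    """y <- y + A*x for a CSR matrix (0-based indices)."""
    for i in range(len(rowptr) - 1):
        for j in range(rowptr[i], rowptr[i + 1]):
            y[i] += data[j] * x[col[j]]

# 2x2 matrix [[1, 2], [0, 3]] in CSR form
data, col, rowptr = [1.0, 2.0, 3.0], [0, 1, 1], [0, 2, 3]
x = [1.0, 1.0]

y = [0.0, 0.0]                               # caller initializes: acts as y = A*x
csr_matvec_update(data, col, rowptr, x, y)   # y is now [3.0, 3.0]
csr_matvec_update(data, col, rowptr, x, y)   # called again, y accumulates to [6.0, 6.0]
```

Forgetting the initialization is exactly the failure mode described above: the second call silently doubles the result.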
Hi @jalvesz, it was actually a pleasure to look into this draft; I am learning a lot and I look forward to discussing it with you if possible. I have to say that I have particularly strong feelings about the "non-initialization" problem, mostly twofold. However, I think that separating matvec and Axpy could be feasible at this stage (I can take care of that with the correct guidance) and potentially doable by preprocessing as well (I'm thinking of an fypp if). Cheers, Francesco
Edit: example of splitting matvec and matAxpy with fypp
#:include "../include/common.fypp"
#:set RANKS = range(1, 2+1)
#:set KINDS_TYPES = REAL_KINDS_TYPES
#! define the operation names
#:set OPERATIONS = ['matvec', 'matAxpy']
#! define ranks without parentheses
#:def rksfx2(rank)
#{if rank > 0}#${":," + ":," * (rank - 1)}$#{endif}#
#:enddef
!! matvec_csr
#:for k1, t1 in (KINDS_TYPES)
#:for rank in RANKS
#! iterate over the operation names
#:for idx, opname in enumerate(OPERATIONS)
subroutine ${opname}$_csr_${rank}$d_${k1}$(vec_y,matrix,vec_x)
type(CSR_${k1}$), intent(in) :: matrix
${t1}$, intent(in) :: vec_x${ranksuffix(rank)}$
${t1}$, intent(inout) :: vec_y${ranksuffix(rank)}$
integer :: i, j
#:if rank == 1
${t1}$ :: aux
#:else
${t1}$ :: aux(size(vec_x,dim=1))
#:endif
associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
& nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
if( sym == k_NOSYMMETRY) then
do concurrent(i=1:nrows)
#! matvec: overwrite (zero the entry first, then accumulate)
#:if idx==0
vec_y(${rksfx2(rank-1)}$i) = 0._${k1}$
#:endif
do j = rowptr(i), rowptr(i+1)-1
vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
end do
end do
else if( sym == k_SYMTRIINF )then
do i = 1 , nrows
aux = 0._${k1}$
do j = rowptr(i), rowptr(i+1)-2
aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
end do
aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$i)
#! matvec: overwrite
#:if idx==0
vec_y(${rksfx2(rank-1)}$i) = aux
#! Axpy: update
#:else
vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
#:endif
end do
else if( sym == k_SYMTRISUP )then
do i = 1 , nrows
aux = vec_x(${rksfx2(rank-1)}$i) * data(rowptr(i))
do j = rowptr(i)+1, rowptr(i+1)-1
aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
end do
#! matvec: overwrite
#:if idx==0
vec_y(${rksfx2(rank-1)}$i) = aux
#! Axpy: update
#:else
vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
#:endif
end do
end if
end associate
end subroutine
#:endfor
#:endfor
#:endfor
@ftucciarone thanks for the idea! I will also bring parts of a discussion I had with @ivan-pi on exactly this topic, where he brought to my attention the following:
https://psctoolkit.github.io/psblasguide/userhtmlse4.html#x18-550004
Taking all those ideas together, I could imagine using optionals to cover:
Or the default could be to overwrite, with an optional for addition, or simply with beta = 1. Let me know your thoughts on that.
Added both examples @perazz, let me know what you think.
#:endfor

#:for k1, t1, s1 in (KINDS_TYPES)
recursive subroutine quicksort_i_${s1}$(a, b, first, last)
would it be an idea to add such a procedure in stdlib_sorting, based on sort_index? sort_index is clearly limited for such simpler cases, and making the argument index intent(inout) would already facilitate such cases.
@jvdp1 I'm not sure if I got it right: do you mean to add a quicksort within sort_index, or to add it as a different method within the sort module? I took a look at sort_index, but my first impression is that changing the index array to intent(inout) won't be enough. The internals seem to be designed for handling one array to sort plus the work index arrays. Here, we need to sort one array and have a secondary array sorted together with the first, simultaneously. From what I read in the specs, none of the current sorting interfaces enable this directly, but I only had a quick glance, so I might be missing something. I wouldn't mind trying to generalize the quicksort, or, if you help me figure out how to adapt any other of the sorting procedures to sort two arrays simultaneously (avoiding extra allocations), I could give it a try.
I was thinking to extend sort_index such that a provided vector (other than just integer(int64)) is sorted in the same way as the sorted vector (if I am right, that is what you need). The kind of the array index is already generated by fypp. So I think that it should be possible (but I may be wrong). I will have a look.
@jalvesz Could you maybe test the sort_index in this branch? I did some dirty modifications such that the array index can be of any integer or real type. If it works, we will need to discuss names and approaches before getting a nice implementation.
Here is an example I tested in my branch:
program example_sort_index_real
use stdlib_sorting, only: sort_index
implicit none
integer, allocatable :: array(:)
real, allocatable :: index(:)
array = [5, 4, 3, 1, 10, 4, 9]
index = real(array)
call sort_index(array, index)
print *, array !print [1, 3, 4, 4, 5, 9, 10]
print *, index !print [1., 3., 4., 4., 5., 9., 10.]
end program example_sort_index_real
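For illustration only, the behaviour this extended sort_index provides — sorting one array while permuting a companion array by the same ordering — corresponds to the following Python sketch (a stable argsort applied to both arrays):

```python
# Co-sorting sketch: sort `array` and carry `index` along with the same
# permutation, mirroring the example_sort_index_real program above.

array = [5, 4, 3, 1, 10, 4, 9]
index = [float(v) for v in array]

# stable argsort of `array`, then apply the permutation to both arrays
perm = sorted(range(len(array)), key=lambda k: array[k])
array = [array[k] for k in perm]
index = [index[k] for k in perm]

print(array)  # [1, 3, 4, 4, 5, 9, 10]
print(index)  # [1.0, 3.0, 4.0, 4.0, 5.0, 9.0, 10.0]
```

The point of doing it inside one sorting routine, as proposed, is to avoid materializing the permutation and the extra temporaries.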
Wow thanks! I'll test this out and let you know.
@jvdp1 I tested your modifications by sorting a COO matrix representing a 2-million-node mesh, with 14 million NNZ before sorting/removing duplicates:
Sorting index + data using quicksort (3 runs in release profile):
elapsed time (cpu_time) [s]: 1.4375559 / 1.4128339 / 1.1456219
Sorting index + data using sort_index:
elapsed time (cpu_time) [s]: 1.4756519 / 1.5304509 / 1.1852899
If I do not sort the data (only the index(1:2,1:nnz) array), then using stdlib sort is about 9% faster than using quicksort.
I would say these are more than satisfying results!
Now, I included support for complex sparse matrices, and the sorting interfaces do not support them. So we should discuss what to do here to extend for complex data (?)
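The COO preprocessing being benchmarked here (sort the index pairs lexicographically, carry the data along, then merge duplicates) can be sketched as follows. Names, and the choice to sum duplicate entries, are assumptions for illustration, not the stdlib implementation:

```python
# Sketch of COO clean-up: lexicographic sort of (row, col) pairs with the
# data permuted alongside, then duplicates merged by summation.

def coo_sort_dedup(rows, cols, vals):
    order = sorted(range(len(vals)), key=lambda k: (rows[k], cols[k]))
    out = {}  # dicts preserve insertion order, which is the sorted order here
    for k in order:
        key = (rows[k], cols[k])
        out[key] = out.get(key, 0.0) + vals[k]
    r = [k[0] for k in out]
    c = [k[1] for k in out]
    v = list(out.values())
    return r, c, v

rows, cols, vals = [2, 1, 1, 2], [1, 1, 1, 0], [1.0, 2.0, 3.0, 4.0]
r, c, v = coo_sort_dedup(rows, cols, vals)
# the duplicate (1,1) entries 2.0 and 3.0 merge into a single 5.0 entry
```

In the Fortran version the sort is done in place on the index(1:2,1:nnz) array, which is why co-sorting support in the sorting routines matters for performance.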
I would say these are more than satisfying results!
Nice. Thank you for testing it.
Now, I included support for complex sparse matrices and the sorting interfaces do not support them. So we should discuss what to do here to extend for complex data ( ? )
It seems that complex data is not supported at all by the stdlib sorting procedures. It should be quite easy to add to all of them by extending the fypp variables to the complex definitions.
Note that my changes broke the original stdlib_index. Now that I know that it works, I will have a closer look for a proper implementation.
So I'm guessing this would deserve a nice PR on its own. Should we wait for that before this PR, or can we keep reviewing this knowing that this particular point can be improved down the road?
You are right, it deserves its own PR. I suggest that you continue with your internal quicksort subroutine. We can review it later when the PR related to the sort procedures is ready.
See #849 for such an implementation
Thanks for the PR! I think it looks in the right direction. In order for me to start using these routines (for parallel sparse matmul, various conversions, etc.), I would need a version that does not use stdlib's custom derived type, since I have my own derived type in my code. Is there a way to do that? The only way I know is to split the API into two parts:
Maybe there is another way, but the above way seems robust. Then I can start using the low-level API, and just give it the CSR arrays that I have in my code from my own derived type. Why not structure this PR in this low-level / high-level style? It's almost there, but not quite.
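The two-level layout suggested here might look like the following sketch (Python, hypothetical names): a low-level kernel that takes only raw CSR arrays, plus a thin wrapper type that owns the buffers and forwards to it, so users with their own derived types can call the kernel directly.

```python
# Two-level API sketch: a raw-array kernel plus a thin high-level wrapper.
# Names are illustrative only, not the stdlib interface.

def spmv_csr(data, col, rowptr, x, y):
    """Low-level kernel: y <- y + A*x on raw CSR arrays (0-based)."""
    for i in range(len(rowptr) - 1):
        for j in range(rowptr[i], rowptr[i + 1]):
            y[i] += data[j] * x[col[j]]

class CSR:
    """High-level wrapper holding the buffers; methods delegate to the kernel."""
    def __init__(self, data, col, rowptr):
        self.data, self.col, self.rowptr = data, col, rowptr

    def matvec(self, x, y):
        spmv_csr(self.data, self.col, self.rowptr, x, y)

A = CSR([1.0, 2.0, 3.0], [0, 1, 1], [0, 2, 3])  # [[1, 2], [0, 3]]
y = [0.0, 0.0]
A.matvec([1.0, 1.0], y)
```

A user with an external derived type would call spmv_csr directly with their own arrays, bypassing the wrapper entirely.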
@certik Thank you for your comment. I agree with you that these features should be split into 2 parts (I also have my own sparse library, and I might be interested in calling some of the stdlib features):
However, @jalvesz started with the OO approach, and I think it is quite close to being merged as is, pending a few "minor" changes. Restructuring this PR might be a huge task. Therefore, I advise to review and merge this PR as is, maybe after implementing some suggestions where we think the API level could already be lowered a bit. I am afraid that if we go for the 2-API-level approach in one go, we will have nothing in …
Here are some comments for the specs only (note: I did not look at the code yet, so some comments might just be dumb).
doc/specs/stdlib_sparse.md
Outdated
Experimental

### Description
Type-bound procedures to enable adding or requesting data in/from a sparse matrix.
Does it add data to existing entries, or does it also create entries if they do not exist in the array?
It just adds values into pre-existing entries in a given sparse pattern.
doc/specs/stdlib_sparse.md
Outdated

`COO`, `intent(inout)`: Shall be any `COO` type. The same object will be returned with the arrays reallocated to the correct size after removing duplicates.

`sort_data`, `logical(in)`, `optional`: Shall be an optional `logical` argument to determine whether data in the COO graph should be sorted while sorting the index array, default `.false.`.
this is not mentioned in the previous syntax section.
here are some additional quick comments.
src/stdlib_sparse_kinds.fypp
Outdated
end type

#:for k1, t1, s1 in (KINDS_TYPES)
type, public, extends(COO_type) :: COO_${s1}$
Just to be consistent with the rest of stdlib:

type, public, extends(COO_type) :: COO_${s1}$        ! current
type, public, extends(COO_type) :: COO_${s1}$_type   ! suggested

However, I will agree that it will become a bit long.
src/stdlib_sparse_kinds.fypp
Outdated
end type
#:endfor

!> Conversion from dense to coo
I suggest to move all these conversion procedures to its own module.
These are just module interfaces; the procedures are defined in their own submodule.
Could these module interfaces be removed from stdlib_sparse_kinds and just be in their own module (and not submodule)? What is the advantage of having them in a submodule?
rolled back on the use of submodules in c8d94a3 ... I used the submodules to share the constants; they are now in a separate module.
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
doc/specs/stdlib_sparse.md
Outdated
type, public, abstract :: sparse_type
integer :: nrows !! number of rows
integer :: ncols !> number of columns
integer :: nnz !> number of non-zero values
integer :: storage !> assumed storage symmetry
end type
Just a quick thought: it might be of interest to have two DTs with different kinds for the integers, e.g.,
type,...... :: sparse_int32_type
....
integer(int32) :: nnz
...
end type
type,...... :: sparse_int64_type
....
integer(int64) :: nnz
...
end type
I'm not sure how hard it would be to accomplish this. Maybe in the current PR it would be enough to generalize the index kind from the default integer to integer(ik), so that later this extension will be easier.
What you're suggesting is probably that there should be two abstract sparse types, one per integer kind:
type, public, abstract :: sparse_type
end type
#:for ik, it in INT_KINDS_TYPES
type, public, abstract, extends(sparse_type) :: sparse_${ik}$_type
${it}$ :: nrows !! number of rows
${it}$ :: ncols !> number of columns
${it}$ :: nnz !> number of non-zero values
${it}$ :: storage !> assumed storage symmetry
end type sparse_${ik}$_type
#:endfor
Indeed, I had something similar to your proposition in mind. However, I agree that it should be for a following PR.
I get the feeling that adding this COO_${s1}$_type plus the different integer sizes would end up with declarations such as:
type(CSR_int32_dp_type) :: mymatrix
which I find very verbose. The target design I was striving for is to have these two things syntactically close from a hierarchical point of view:
real(dp), allocatable :: dense(:,:)
type(CSR_dp) :: sparse
Is there a possibility to find a middle ground?
In 6ae038b I added an ilp parameter following the linalg module to enable recompiling using int64. Right now it is set to int32; we could add a preprocessing macro to change it.
possibility to find a middle ground
I agree with you @jalvesz. @jvdp1's idea is correct, but maybe, as a tradeoff, we don't need two separate types when you can just use the largest one internally? In other words, could we just require that the index type for sparse matrices always be a 64-bit one and define
integer, parameter :: index_t = int64
To support 32-bit index types in the user API, one idea could be to just define the interface on the abstract type:
type, abstract :: sparse
...
contains
procedure(get_elem), deferred, abstract :: get_int64
procedure :: get_int32 ! implemented, just calls get_int64
generic :: get => get_int32, get_int64
end type
...
elemental real(dp) function get_int32(this,i,j)
class(sparse), intent(in) :: this
integer(int32), intent(in) :: i,j
get_int32 = this%get(int(i,int64),int(j,int64))
end function
In other words, could we just require that the index type for sparse matrices be always a 64-bit
I'm very hesitant about this. Given that with int32 one can have nnz = 2,147,483,647, a COO matrix of double-precision values of that size would take 2147483647 * (2 * 4 + 8) / 1024^3 = 32 GiB of memory. Switching the index arrays to int64 gives 2147483647 * (2 * 8 + 8) / 1024^3 = 48 GiB ... And all the loops would be done by searching over larger index arrays, thus hurting performance (I do not know to which extent). Also, MKL uses int32 for the kind of its index arrays; making int64 the default would make it more complicated to eventually bind against it.
Thinking about this makes me more inclined to either allow recompilation by changing the ilp parameter, or indeed add support for both sizes as separate kinds. In that case I would keep int32 with the simplified syntax and mark the int64 one with a suffix like long.
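As a quick sanity check of the COO memory footprint at the int32 limit (two index arrays plus one double-precision data array per nonzero):

```python
# Memory footprint of COO storage at the largest int32-addressable nnz.
nnz = 2_147_483_647        # largest int32 value
GiB = 1024 ** 3

mem_int32 = nnz * (2 * 4 + 8) / GiB   # two int32 index arrays + float64 data
mem_int64 = nnz * (2 * 8 + 8) / GiB   # two int64 index arrays + float64 data

print(f"int32 indices: {mem_int32:.0f} GiB, int64 indices: {mem_int64:.0f} GiB")
```

So the int64 default would cost roughly 50% more memory for the same matrix, before counting any workspace.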
I agree with you @jalvesz: at this stage, probably the best thing to do is just to generalize the integer kind from the default integer to integer(ik), so that the implementation already supports any integer kind, while a more thoughtful decision about what should be used, and how flexible it should be, is deferred.
could we just require that the index type for sparse matrices be always a 64-bit one
The default kind should be integer(int32), at least for linking with or using other libraries (like MKL Sparse BLAS). At this stage, using integer(ik) is fine. There is enough WIP without this feature.
@certik I understand and agree that a low-level API is also desirable. Ideally we should eventually be able to link against MKL, whose interfaces are quite similar to, but not exactly, the Sparse BLAS syntax: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2024-0/sparse-blas-level-2-and-level-3-routines-002.html
Now, this is a huge amount of work. My feeling is that it is doable by changing the low-level back-ends without having to modify the high-level API (or with minimal breaking changes). I brought this proposal to kick off the interest and see if that could happen eventually and progressively once the DTs are in shape. Also, the idea of the DTs is to have sparse objects at the same level as a dense array, such that higher-level interfaces can then be designed to work on either dense or sparse matrices.
PR related to:
#38
#749
#189
Here I propose the OOP API upfront as a means to centralize the data-containment formats within stdlib, which I feel would be a (the) big added value. I'm migrating the proposal started in FSPARSE, adapting it to stdlib.
Current proposal design:
- sparse_type (abstract base type)
- COO_type, CSR_type, CSC_type, ELL_type, SELLC_type (DTs with no data buffer)
- COO_?p, CSR_?p, CSC_?p, ELL_?p, SELLC_?p (DTs with data buffer, where ? denotes the kind precision, including complex values)