Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse algebra support with OOP API #760

Open
wants to merge 86 commits into
base: master
Choose a base branch
from

Conversation

jalvesz
Copy link
Contributor

@jalvesz jalvesz commented Jan 10, 2024

PR related to:
#38
#749
#189

Here I try to propose the OOP API upfront as a means to centralize data-containment format within stdlib, which I feel would be a (the) big added value. I'm migrating the proposal started here FSPARSE to adapt it to stdlib.

Current proposal design:

  • Derived Types:
    sparse_type (abstract base type)
    COO_type, CSR_type, CSC_type, ELL_type, SELLC_type (DTs with no data buffer)
    COO_?p, CSR_?p, CSC_?p, ELL_?p, SELLC_?p (DTs with data buffer, where ? kind precision including complex values)
  • matrix-vector product:
call spmv( matrix, x, y [,alpha,beta] ) !> y = beta * y + alpha * matrix * x
  • Data accessors:
val = matrix%at( i , j ) !> to get the value at (i,j) position, if (i,j) not in the sparse pattern, value = 0, if out-of-bounds val = NaN
call matrix%add( i , j, val ) !> to set the value at (i,j) position, if (i,j) not in the current structure, skip
!> OR add a data block
call matrix%add( dofs_i(:), dofs_j(:), mat(:,:) ) !>
  • Conversions :
call coo2ordered(COO [, sort_data]) !> sort and remove duplicates from the COO data structure.
 
call dense2coo( dense , COO )  !> dense is a plain 2d array
call coo2dense( COO  , dense )

call coo2csr( COO , CSR ) !> assumes ordered and unique indexes for COO ... to implement a sorting algorithms before transferring data
call csr2coo( CSR , COO ) !> assumes ordered and unique indexes for CSR 

call csr2sellc( CSR , SELLC [, chunk ]  ) !> chunk sizes for spmv [4, 8, 16]
@zerothi
Copy link

zerothi commented Apr 4, 2024

Wouldn't it be useful if sparse_t has the interface for mat%to_coo|csr|... etc instead of using a function call. Since this is going OOP, lets keep it like that.

@jalvesz
Copy link
Contributor Author

jalvesz commented Apr 4, 2024

This could be done even shorter with a %to(...) interface as the types are declared before calling the conversion. I've done that before but did not propose it here upfront as I wanted to hear some opinions on what the OOP API in stdlib should look like.

@zerothi
Copy link

zerothi commented Apr 4, 2024

Agreed, %to would be even better, or %convert or clarity? hmm...

@jalvesz jalvesz marked this pull request as draft April 20, 2024 09:14
@ftucciarone
Copy link

ftucciarone commented May 17, 2024

Is it a good idea to have matvec product that do not initialize the result vector? I have to say that I have spent a good hour debugging a code and the source of divergence in the solver was the fact that I was not initializing the result vector before passing it to the subroutine. While I understand that this is a great way to perform Axpy operation, I would advise to have separate matvec (with initialization) and Axpy (without) routines.

    subroutine matvec_csr_1d_sp(vec_y,matrix,vec_x)
        type(CSR_sp), intent(in) :: matrix
        real(sp), intent(in)    :: vec_x(:)
        real(sp), intent(inout) :: vec_y(:)
        integer :: i, j
        real(sp) :: aux

        associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
            & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
            if( sym == k_NOSYMMETRY) then
                do concurrent(i=1:nrows)
                    do j = rowptr(i), rowptr(i+1)-1
                        vec_y(i) = vec_y(i) + data(j) * vec_x(col(j))
                    end do
                end do

            else if( sym == k_SYMTRIINF )then
                do i = 1 , nrows
                    aux  = 0._sp
                    do j = rowptr(i), rowptr(i+1)-2
                        aux = aux + data(j) * vec_x(col(j))
                        vec_y(col(j)) = vec_y(col(j)) + data(j) * vec_x(i)
                    end do
                    aux = aux + data(j) * vec_x(i)
                    vec_y(i) = vec_y(i) + aux
                end do

            else if( sym == k_SYMTRISUP )then
                do i = 1 , nrows
                    aux  = vec_x(i) * data(rowptr(i))
                    do j = rowptr(i)+1, rowptr(i+1)-1
                        aux = aux + data(j) * vec_x(col(j))
                        vec_y(col(j)) = vec_y(col(j)) + data(j) * vec_x(i)
                    end do
                    vec_y(i) = vec_y(i) + aux
                end do

            end if
        end associate
    end subroutine
@jalvesz
Copy link
Contributor Author

jalvesz commented May 18, 2024

Hi @ftucciarone, thanks for checking out the current draft and sorry for the inconvenience.

Yes, I have intentionally designed the API to not initialize the vector, in the current version within FSPARSE https://github.com/jalvesz/FSPARSE?tab=readme-ov-file#sparse-matrix-vector-product I updated the README to be clear about it but forgot to update here.

The reason for that choice is because it allows maximum flexibility for preprocessing linear systems needing operations with the same matrix operator that will be used for the solver. So it basically places the "responsability" on the user to place a y=0 just before if one does indeed need a clean vector.

This is of course totally open to debate, I just proposed following my experience reworking solvers. My feeling is that doing that can avoid having to manage different implementations or optionals, when the solution can be to explicitly state that the interface is updating (y = y + M * x) instead of overwriting (y = M * x).

@ftucciarone
Copy link

ftucciarone commented May 18, 2024

Hi @jalvesz, it was actually a pleasure to look into this draft, I am learning a lot and I look forward to discuss it with you if possible.

I have to say that I have particularly strong feelings about the "non-initialization" problem, mostly twofold.
First, if y = y + M*x is the chosen way to go, then it must be written very large, at the very beginning of the documentation, to make sure everyone sees that. Test example should also take care of that, showing that the result is wrong if not initialized before.
Second, I have the feeling that doing first y=0_dp and then call matvec(y, A, x) might add an overhead due to the initialization of y for very large problems, while initializing the component y(i) while doing the matvec might be faster, but I have to test because I'm not 100% sure about this.

However, I think that separating matvec and Axpy could be feasible at this stage (I can take care of that with the correct guidance) and potentially doable by preprocessing as well (I'm thinking at a fypp if).

Cheers, Francesco

Edit: example of splitting matvec and matAxpy with fypp

#:include "../include/common.fypp"
#:set RANKS = range(1, 2+1)
#:set KINDS_TYPES = REAL_KINDS_TYPES
#:set OPERATIONS = ['matvec', 'matAxpy'] <--- DEFINE THE NAMES
#! define ranks without parentheses
#:def rksfx2(rank)
#{if rank > 0}#${":," + ":," * (rank - 1)}$#{endif}#
#:enddef

!! matvec_csr
    #:for k1, t1 in (KINDS_TYPES)
    #:for rank in RANKS
    #:for idx, opname in enumerate(OPERATIONS) <--- ITERATE OVER THE NAMES
    subroutine ${opname}$_csr_${rank}$d_${k1}$(vec_y,matrix,vec_x)
        type(CSR_${k1}$), intent(in) :: matrix
        ${t1}$, intent(in)    :: vec_x${ranksuffix(rank)}$
        ${t1}$, intent(inout) :: vec_y${ranksuffix(rank)}$
        integer :: i, j
        #:if rank == 1
        ${t1}$ :: aux
        #:else
        ${t1}$ :: aux(size(vec_x,dim=1))
        #:endif

        associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
            & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
            if( sym == k_NOSYMMETRY) then
                do concurrent(i=1:nrows)
                    do j = rowptr(i), rowptr(i+1)-1
                        #:if idx==0  <--- IF MATVEC 
                        vec_y(${rksfx2(rank-1)}$i) = data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                        #:else  <--- IF AXPY
                        vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                        #:endif
                    end do
                end do

            else if( sym == k_SYMTRIINF )then
                do i = 1 , nrows
                    aux  = 0._${k1}$
                    do j = rowptr(i), rowptr(i+1)-2
                        aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                        vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
                    end do
                    aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$i)
                    #:if idx==0   <--- IF MATVEC 
                    vec_y(${rksfx2(rank-1)}$i) = aux
                    #:else  <--- IF AXPY
                    vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
                    #:endif                      
                end do

            else if( sym == k_SYMTRISUP )then
                do i = 1 , nrows
                    aux  = vec_x(${rksfx2(rank-1)}$i) * data(rowptr(i))
                    do j = rowptr(i)+1, rowptr(i+1)-1
                        aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                        vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
                    end do
                    #:if idx==0   <--- IF MATVEC 
                    vec_y(${rksfx2(rank-1)}$i) = aux
                    #:else  <--- IF AXPY
                    vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
                    #:endif                    
                 end do

            end if
        end associate
    end subroutine
    
    #:endfor
    #:endfor
    #:endfor
@jalvesz
Copy link
Contributor Author

jalvesz commented May 18, 2024

@ftucciarone thanks for the idea! I will also bring parts of a discussion I had with @ivan-pi on exactly this topic, where he brought to my attention the following:

The MKL library offers, y := beta y + alpha A x, and then you need to pick either beta = 0, or beta = 1, depending on what you want.

In Apple's Accelerate Framework on the other hand, they offer two functions:

SparseMultiply(A,x,y)           // y = A x
SparseMultiplyAdd(A,x,y)     // y += A x

In PSBLAS, spmm covers both vectors and matrices (rank-2 dense arrays):

call psb_spmm(alpha, a, x, beta, y, desc_a, info)
call psb_spmm(alpha, a, x, beta, y, desc_a, info, trans, work)

https://psctoolkit.github.io/psblasguide/userhtmlse4.html#x18-550004

Taking all those ideas together, I could imagine using optionals to cover:

call matvec( A , vec_x, vec_y ) !> vec_y = vec_y + A * y
call matvec( A , vec_x, vec_y , overwrite = .true. ) !> vec_y = A * y
call matvec( A , vec_x, vec_y , alpha=value, beta=value ) !> vec_y = beta * vec_y + alpha * A * y

Or the default to be overwrite and an optional for addition, or simply with beta =1 if(present(beta)) would automatically determine that.

Let me know your thoughts on that

@jalvesz
Copy link
Contributor Author

jalvesz commented Jun 29, 2024

Added both examples @perazz, let me know what do you think.

#:endfor

#:for k1, t1, s1 in (KINDS_TYPES)
recursive subroutine quicksort_i_${s1}$(a, b, first, last)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be an idea to add such a procedure in stdlib_sorting, based on sort_index? sort_index is clearly limited for such simpler cases, and making the argument index as intent(inout) would already facilitate such cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvdp1 I'm not sure if I got it right, you mean to say to add a quick sort within sort_index? Or to add it as a different method within the sort module?... I took a look at sort index but my first impression is that changing the index array to inout won't be enough. The internals seems to be thought for handling an array to sort and the work index arrays. Here, we need to sort one array and have a secondary array be sorted together with the first, simultaneously. From what I read in the specs, non of the current sorting interfaces enable this directly, but I just had quick glance so I might be missing something. I wouldn't mind trying to generalize the quicksort, or if you help me out figuring out how to adapt any other of the sorting procedures to enable sorting two arrays simultaneously (avoiding extra allocations), I could give it a try.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to extend sort_index such that a provided vector (other than just integer(int64)) is sorted in the same way as the sorted vector (if I am right, it is what you need). The kind of the array index is already generated by fypp. So I think that it should be possible (but I may be wrong). I will have a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jalvesz Could you maye test the sort_index in this branch? I did some dirty modifications such that the array index van be of any integer or real type. If it works, we will need to discuss names, and approaches, before getting a nice implementation.
Here is an example I tested in my branch:

program example_sort_index_real
  use stdlib_sorting, only: sort_index
  implicit none
  integer, allocatable :: array(:)
  real, allocatable :: index(:)

  array = [5, 4, 3, 1, 10, 4, 9]
  index = real(array)

  call sort_index(array, index)

  print *, array   !print [1, 3, 4, 4, 5, 9, 10]
  print *, index   !print [1., 3., 4., 4., 5., 9., 10.]

end program example_sort_index_real
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow thanks! I'll test this out and let you know.

Copy link
Contributor Author

@jalvesz jalvesz Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvdp1 I tested your modifications to sort a COO matrix representing a 2million nodes mesh, having 14millions of NNZ before sorting/removing duplicates:

Sorting index + data using quicksort (3 runs in release profile)
elapsed time (cpu_time) [s]: 1.4375559 / 1.4128339 / 1.1456219
Sorting index + data using sort_index:
elapsed time (cpu_time) [s]: 1.4756519 / 1.5304509 / 1.1852899

If I do not sort the data (only the index(1:2,1:nnz) array) then using stdlib sort is about 9% faster than using quicksort.

I would say these are more than satisfying results!

Now, I included support for complex sparse matrices and the sorting interfaces do not support them. So we should discuss what to do here to extend for complex data ( ? )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say these are more than satisfying results!

Nice. Thank you for testing it.

Now, I included support for complex sparse matrices and the sorting interfaces do not support them. So we should discuss what to do here to extend for complex data ( ? )

It seems that complex data is not supported at all by the stdlib sorting procedures. It should be quite easy to add to all of them by extending the fypp variables to the complex definitions.
Note that my changes broke the original stdlib_index. Now that I know that it works, I will have a closer look for a proper implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm guessing this would deserve a nice PR on its own. Should we wait for that before this PR? or can we keep reviewing this knowing that this particular point can be improved down the road?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it deserves its own PR. I suggest that you continue with your internal quicksort subroutine. We can review it later when the PR related to the sort procedures will be ready.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #849 for such an implementation

@certik
Copy link
Member

certik commented Jul 2, 2024

Thanks for the PR! I think it looks in the right direction.

In order for me to start using these routines (for parallel sparse matmul, various conversions, etc.), I would need a version that does not use the stdlib's custom derived type, since I have my own derived type on my code. Is there a way to do that?

The only way I know is to split the API into two parts:

  • low level that does not use derived types
  • high level, OO style (this PR)

Maybe there is another way, but the above way seems robust. Then I can start using the low level API, and just give it the CSR arrays that I have in my code from my own derived type.

Why not structure this PR in this low level / high level style? It's almost there, but not quite.

@jvdp1
Copy link
Member

jvdp1 commented Jul 8, 2024

Thanks for the PR! I think it looks in the right direction.

In order for me to start using these routines (for parallel sparse matmul, various conversions, etc.), I would need a version that does not use the stdlib's custom derived type, since I have my own derived type on my code. Is there a way to do that?

The only way I know is to split the API into two parts:

  • low level that does not use derived types
  • high level, OO style (this PR)

Maybe there is another way, but the above way seems robust. Then I can start using the low level API, and just give it the CSR arrays that I have in my code from my own derived type.

Why not structure this PR in this low level / high level style? It's almost there, but not quite.

@certik Thank you for your comment. I agree with you that these features should be splitted into 2 parts (I have also my own sparse library, and I might be interested to call some of the stdlib features) :

  • low level procedures (e.g, spmv)
  • high level for OO

However, @jalvesz started with the OO approach, and I think it is quite close to be merged as is, pending a few "minor" changes. Restructuring this PR might be a huge task.

Therefore, I advice to review and merge this PR as is, maybe after implementing some suggestions where we think the API level could be already lowered a bit.
The merged procedures could then be reviewed in a second stage based on users' feedbacks and with having in mind this 2-level approach. The high-level OO approach would not be modified for the user. Only the low-level approach would be added based on waht is already existing.

I am afraid that if we go for the 2-API level approach in one go, we will have nothing in stdlib at the end
due to too many discussions, lack of time, @jalvesz motivation,.... This topic of sparse matrices is on the table for a long time, with many unsuccessful attempts. I think @jalvesz 's implementation is the closest one to be merged. At this stage, I prefer to have something usable in stdlib, even if it only includes OO-API. Users may start to use it, provide feedbacks, and hopefully contribute to stdlib.
Such an approach has been used for other stdlib features (e.g., sorting, hash maps, loggers) with some success.
I will be happy to hear your feedback.

Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments for the specs only (note: I did not look to the code yet; so some comments might just be dum).

doc/specs/stdlib_sparse.md Outdated Show resolved Hide resolved
doc/specs/stdlib_sparse.md Outdated Show resolved Hide resolved
doc/specs/stdlib_sparse.md Outdated Show resolved Hide resolved
doc/specs/stdlib_sparse.md Outdated Show resolved Hide resolved
Experimental

### Description
Type-bound procedures to enable adding or requesting data in/from a sparse matrix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it add data to existing entries, or does it create also entries if they do not exist in the array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just adds values into pre-existing entries in a given sparse pattern.

doc/specs/stdlib_sparse.md Show resolved Hide resolved

`COO`, `intent(inout)`: Shall be any `COO` type. The same object will be returned with the arrays reallocated to the correct size after removing duplicates.

`sort_data`, `logical(in)`, `optional`:: Shall be an optional `logical` argument to determine whether data in the COO graph should be sorted while sorting the index array, default `.false.`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not mentioned in the previous syntax section.

doc/specs/stdlib_sparse.md Outdated Show resolved Hide resolved
doc/specs/stdlib_sparse.md Outdated Show resolved Hide resolved
doc/specs/stdlib_sparse.md Show resolved Hide resolved
Copy link
Member

@jvdp1 jvdp1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here are some additional quick comments.

src/stdlib_sparse_kinds.fypp Outdated Show resolved Hide resolved
src/stdlib_sparse_kinds.fypp Outdated Show resolved Hide resolved
src/stdlib_sparse_kinds.fypp Outdated Show resolved Hide resolved
src/stdlib_sparse_kinds.fypp Outdated Show resolved Hide resolved
src/stdlib_sparse_kinds.fypp Outdated Show resolved Hide resolved
end type

#:for k1, t1, s1 in (KINDS_TYPES)
type, public, extends(COO_type) :: COO_${s1}$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be consistent with rest of stdlib:

Suggested change
type, public, extends(COO_type) :: COO_${s1}$
type, public, extends(COO_type) :: COO_${s1}$_type

However, I will agree that it will become a bit long.

src/stdlib_sparse_kinds.fypp Outdated Show resolved Hide resolved
end type
#:endfor

!> Conversion from dense to coo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to move all these conversion procedures to its own module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just module interfaces, the procedures are defined in their own submodule

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these module interfaces be removed from stdlib_sparse_kinds, and just be in their own module (and not submodule)? What is the advantage of having them in a submodule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rolled back on the use of submodules c8d94a3 ... I used the submodules to share the constants, they are now in a separated module.

jalvesz and others added 9 commits July 9, 2024 07:09
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Co-authored-by: Jeremie Vandenplas <jeremie.vandenplas@gmail.com>
Comment on lines 25 to 30
type, public, abstract :: sparse_type
integer :: nrows !! number of rows
integer :: ncols !> number of columns
integer :: nnz !> number of non-zero values
integer :: storage !> assumed storage symmetry
end type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick thought: it might be of interest to have two DTs with different kinds for the integers, e.g.,

type,...... :: sparse_int32_type
 ....
 integer(int32) :: nnz
 ...
end type
type,...... :: sparse_int64_type
 ....
 integer(int64) :: nnz
 ...
end type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how hard it would be to accomplish this. Maybe in the current PR it would be enough to generalize the index kind from default integer to integer(ik), so that later this extension will be easier.

What you're suggesting is probably that there should be two abstract sparse types, one per length kind:

type, public, abstract :: sparse_type

end type
#:for ik,it in INT_KINDS_TYPES
type, public, abstract, extends(sparse_type) :: sparse_${ik}$_type
    ${it}$ :: nrows   !! number of rows
    ${it}$ :: ncols   !> number of columns
    ${it}$ :: nnz     !> number of non-zero values
    ${it}$ :: storage !> assumed storage symmetry
end type sparse_${ik}$_type
!:endfor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I had something similar to your proposition in mind. However, I agree that it should be for a following PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the feeling that adding this COO_${s1}$_type plus then the different integer sizes would end up with declarations such as:

type(CSR_int32_dp_type) :: mymatrix

wich I find very verbose. The target design I was striving for is to have these two things being syntactically close from a hierarchical point of view:

real(dp), allocatable :: dense(:,:)
type(CSR_dp) :: sparse

Is there a possibility to find a middle ground?

Copy link
Contributor Author

@jalvesz jalvesz Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 6ae038b I added an ilp parameter following the linalg module to enable recompiling using int64. Rightnow it is set to int32, we could add a preprocessing macro to change it.

Copy link
Contributor

@perazz perazz Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibility to find a middle ground

I agree with you @jalvesz. @jvdp1's idea is correct, but maybe as a tradeoff, we don't need two separate types when you can just use the largest one internally? In other words, could we just require that the index type for sparse matrices be always a 64-bit one and define

integer, parameter :: index_t = int64

To support 32-bit index types as a user API, one idea could be to just define the interface on the abstract type:

type, abstract :: sparse
 ...
contains
   procedure(get_elem), deferred, abstract :: get_int64
   procedure :: get_int32 ! implemented, just calls get_int64
   generic :: get => get_int32, get_int64
end type
...
elemental real(dp) function get_int32(this,i,j)
    class(sparse), intent(in) :: this
    integer(int32), intent(in) :: i,j
    get_int32 = this%get(int(i,int64),int(j,int64))
end function
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, could we just require that the index type for sparse matrices be always a 64-bit

I'm very hesitant about this. given that with int32 one can have nnz = 2,147,483,647 this means that, for a COO matrix of double precision values of such size, one would have 2147483647 *( 2 * 4 + 8 ) / (1024)^3 = 23 Gb of memory. Transforming the index arrays to int64 => 2147483647 *( 2 * 8 + 8 ) / (1024)^3 = 39 Gb ... And all the loops will be done by searching on larger sized indexing arrays, thus plummetting performance (do not know to which extent). Also, MKL uses int32 for the kind definition of index arrays. Shall we set a default to int64 this would make it more complicated to eventually bind against it.

Thinking about this makes me more inclined for either allowing recompilation by changing the ilp parameter, or indeed adding support for both sizes with separate kinds. I would in that case let the int32 with a simplified syntax and the int64 one with something like long added as a suffix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you @jalvesz: at this stage, probably the best thing to do is just to generalize the integer kind:

from default integer to integer(ik)

so that the implementation already supports any integer kinds, but a more thoughtful decision about what should be used and how flexible should that be, is deferred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just require that the index type for sparse matrices be always a 64-bit one

The default kind should be integer(int32), at least for linking them or using other libraries (like MKL Sparse BLAS).

At this stage, using integer(ik) is fine. There is enough WIP without this feature.

@jalvesz
Copy link
Contributor Author

jalvesz commented Jul 10, 2024

@certik I understand and agree that a low level API is also desirable. Ideally we should eventually be able to link agains MKL following their interfaces which are quite similar but not exactly to Sparse BLAS syntax: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2024-0/sparse-blas-level-2-and-level-3-routines-002.html

Now, this is a huge work. My feeling is that it is doable by changing the low-level back-ends without having to modify the high-level API (or with minimal breaking changes). I tried to bring this proposal to kickoff the interest and see if that could happen eventually and progressively once the DTs are in shape. Also, the idea of the DTs is to have sparse objects at the same level as a dense array such that higher-level interfaces can then be designed to work on either dense or sparse matrices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants