11

A couple of weeks ago I asked a question about the performance of matrix multiplication.

I was told that in order to enhance the performance of my program I should use some specialised matrix classes rather than my own class.

StackOverflow users recommended:

  • uBLAS
  • Eigen
  • BLAS

At first I wanted to use uBLAS; however, after reading the documentation I concluded that this library doesn't support matrix-matrix multiplication.

In the end I decided to use the Eigen library, so I replaced my matrix class with Eigen::MatrixXd. However, it turned out that my application now runs even slower than before: 68 seconds with my own matrix class versus 87 seconds with Eigen matrices.

The parts of the program that take the most time look like this:

TemplateClusterBase* TemplateClusterBase::TransformTemplateOne( const vector<Eigen::MatrixXd*>& pointVector, const Eigen::MatrixXd& rotation, const Eigen::MatrixXd& scale, const Eigen::MatrixXd& translation )
{
    // Transform every template point: outcome = (R * S) * p + t
    for (std::size_t i = 0; i < pointVector.size(); i++)
    {
        Eigen::MatrixXd outcome = (rotation*scale) * (*pointVector[i]) + translation;
        // Copy the transformed coordinates back into the prototype point
        MatrixHelper::SetX(*prototypePointVector[i], MatrixHelper::GetX(outcome));
        MatrixHelper::SetY(*prototypePointVector[i], MatrixHelper::GetY(outcome));
    }

    return this;
}

and

Eigen::MatrixXd AlgorithmPointBased::UpdateTranslationMatrix( int clusterIndex )
{
    double membershipSum = 0;
    Eigen::MatrixXd outcomePoint = Eigen::MatrixXd::Zero(2,1);   // 2x1 accumulator
    for (std::size_t i = 0; i < imageDataVector.size(); i++)
    {
        // Weight of image point i in this cluster: its membership raised to the power m
        double currentPower = pow(membershipMatrix[clusterIndex][i], m);
        membershipSum += currentPower;
        outcomePoint.noalias() += (*imageDataVector[i] - (prototypeVector[clusterIndex]->rotationMatrix*prototypeVector[clusterIndex]->scalingMatrix* ( *templateCluster->templatePointVector[prototypeVector[clusterIndex]->assosiatedPointIndexVector[i]]) ))*currentPower;
    }

    outcomePoint /= membershipSum;   // weighted mean
    return outcomePoint;
}

As you can see, these functions perform a lot of matrix operations, which is why I thought using Eigen would speed up my application. Unfortunately, as mentioned above, the program runs slower.

Is there any way to speed up these functions?

Would I get better performance if I used DirectX matrix operations? (I have a laptop with an integrated graphics card, though.)

  • "At first I wanted to use uBLAS however reading documentation it turned out that this library doesn't support matrix-matrix multiplication." Huh? The exact page you linked to shows the following as valid product operations (where A, B, and C are matrix types): C = prod(A, B); C = prec_prod(A, B); C = element_prod(A, B); axpy_prod(A, B, C, true); axpy_prod(A, B, C, false). So what is it that you want that it doesn't do?
    – ildjarn
    Commented May 31, 2011 at 21:13
  • boost.org/doc/libs/1_46_1/libs/numeric/ublas/doc/blas.htm: the gmm template is general matrix-matrix multiplication.
    – talonmies
    Commented May 31, 2011 at 21:23
  • I also found Eigen slow. I used it for a calculation on every pixel of an image, in VS2010 Express with debugging turned on, and it was unusably slow (more than 50 times slower than my custom code). I realize it might be better in a release build, but I can't even debug with those load times. Think twice before using it for anything remotely performance-critical! Commented Jan 10, 2013 at 11:11
  • Eigen's performance is MUCH better in release builds than in debug builds. You are right that debug builds are abysmally slow.
    – Joe
    Commented Oct 19, 2013 at 18:36
  • @ChristianAichinger Visual Studio's C++ performance in general is abysmally slow in debug builds: all kinds of standard containers do bounds checking (and the runtime itself inserts some bounds checking even for raw arrays). If that's a problem, you can define NDEBUG, which turns off those checks; of course, that will make your code harder to debug (though still easier than a release build, since less gets inlined). Commented Jun 15, 2014 at 10:38

6 Answers

14

Make sure to have compiler optimization switched on (e.g. at least -O2 on gcc). Eigen is heavily templated and will not perform very well if you don't turn on optimization.
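One way to avoid accidentally timing an unoptimized build is a small compile-time check. This is a sketch with hypothetical helper names; it relies on __OPTIMIZE__, which GCC and Clang predefine at -O1 and above (MSVC does not define it, so there it always reports false), and on Eigen compiling out its internal assertions when NDEBUG or EIGEN_NO_DEBUG is defined.

```cpp
#include <cstdio>

// True when this translation unit was compiled with optimization (GCC/Clang only).
constexpr bool builtWithOptimization() {
#if defined(__OPTIMIZE__)
    return true;
#else
    return false;
#endif
}

// True when Eigen's internal bounds/size assertions are compiled out.
constexpr bool eigenAssertionsDisabled() {
#if defined(NDEBUG) || defined(EIGEN_NO_DEBUG)
    return true;
#else
    return false;
#endif
}

// Prints a warning to stderr if timings from this build are likely misleading.
inline void warnIfDebugBuild() {
    if (!builtWithOptimization() || !eigenAssertionsDisabled())
        std::fprintf(stderr, "warning: unoptimized or assertion-enabled build; "
                             "Eigen timings will not be representative\n");
}
```

Calling warnIfDebugBuild() at the start of a benchmark makes a debug-mode measurement obvious before anyone compares numbers.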

  • Also, Eigen does a lot of bounds and alignment checking unless NDEBUG or EIGEN_NO_DEBUG is defined.
    – Hannah S.
    Commented May 7, 2012 at 16:46
  • Note that -o2 means "write the output to a file named 2". For optimization with GCC, use -O2 and the like (note the capital 'O', not a lowercase 'o').
    – Ruslan
    Commented Sep 8, 2016 at 12:23
  • It apparently doesn't perform well even when you turn on optimization (at least for small matrices): stackoverflow.com/questions/58071344/…
    – Mark
    Commented Dec 14, 2019 at 3:08
13

If you're using Eigen's MatrixXd types, those are dynamically sized. You should get much better results from using the fixed-size types, e.g. Matrix4d, Vector4d.

Also, make sure you're compiling such that the code can get vectorized; see the relevant Eigen documentation.

Re your thought on using the Direct3D extensions library (D3DXMATRIX etc.): it's OK (if a bit old-fashioned) for graphics geometry (4x4 transforms etc.), but it's certainly not GPU-accelerated (just good old SSE, I think). Also, note that it's single-precision (float) only, whereas you seem to be set on using doubles. Personally I'd much prefer to use Eigen unless I was actually coding a Direct3D app.

  • It will be hard to use fixed-size types because almost all my matrices have size [2,1]: two rows and one column. So far I have only found 2x2 and 3x3 fixed sizes.
    – george
    Commented May 31, 2011 at 23:35
  • Eigen would call that a Vector2d rather than a matrix. I'd be surprised if it weren't already defined (since Vector2d is mentioned in the vectorization docs above as being 16 bytes and SSE-compatible). If you have to define your own, all it takes is typedef Matrix<double, 2, 1> Vector2d;
    – timday
    Commented Jun 1, 2011 at 7:43
  • Changing MatrixXd to Vector2d and Matrix2d only gained me 2 s: it now takes 85 s instead of 87 s. That is still slower than with my own matrix class. Strange :(
    – george
    Commented Jun 2, 2011 at 20:26
  • You can create fixed-size matrices of any dimension using Eigen::Matrix<double, n_rows, n_cols>. Commented Jan 10, 2013 at 11:07
9

Which version of Eigen are you using? They recently released 3.0.1, which is supposed to be faster than 2.x. Also, make sure you play a bit with the compiler options. For example, make sure SSE is being used in Visual Studio:

C/C++ --> Code Generation --> Enable Enhanced Instruction Set

  • +1, sound advice. Also, in general, all compiler optimizations should be turned on for this kind of test. Commented May 31, 2011 at 21:40
  • I use Eigen 3.0.1, but I didn't turn on "Enable Enhanced Instruction Set". I'll try this.
    – george
    Commented May 31, 2011 at 22:15
9

You should profile, then optimize first the algorithm and then the implementation. In particular, the posted code is quite inefficient:

for (int i=0;i<pointVector.size();i++ )
{
   Eigen::MatrixXd outcome = (rotation*scale)* (*pointVector[i])  + translation;
   // ...
}

I don't know the library, so I won't even try to guess the number of unnecessary temporaries that you are creating, but a simple refactor:

Eigen::MatrixXd tmp = rotation*scale;
for (int i=0;i<pointVector.size();i++ )
{
   Eigen::MatrixXd outcome = tmp*(*pointVector[i])  + translation;
   // ...
}

This can save you a good number of expensive multiplications (and again, probably new temporary matrices that get discarded right away).

2

A few points:

  1. Why are you multiplying rotation*scale inside of the loop when that product will have the same value each iteration? That is a lot of wasted effort.

  2. You are using dynamically sized matrices rather than fixed sized matrices. Someone else mentioned this already, and you said you shaved off 2 sec.

  3. You are passing arguments as a vector of pointers to matrices. This adds an extra pointer indirection and destroys any guarantee of data locality, which will give poor cache performance.

  4. I hope this isn't insulting, but are you compiling in Release or Debug? Eigen is very slow in debug builds, because it relies on lots of trivial templated functions that get inlined away in release builds but remain as real function calls in debug builds.

Looking at your code, I am hesitant to blame Eigen for the performance problems. However, most linear algebra libraries (including Eigen) are not really designed for your use case of lots of tiny matrices; in general, Eigen will be better optimized for matrices of 100x100 or larger. You may very well be better off using your own matrix class or the DirectX math helper classes. The DirectX math classes are completely independent of your video card.
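If you do go back to your own matrix class, a minimal version for this 2D case can be plain structs with inline operators: no heap allocation and nothing the optimizer cannot see through. Vec2 and Mat2 are hypothetical names; this is a sketch, not a replacement for a full library.

```cpp
// Plain fixed-size 2D types; the compiler can usually inline these completely.
struct Vec2 { double x, y; };
struct Mat2 { double a, b, c, d; };  // row-major: [a b; c d]

// 2x2 * 2x2 matrix product.
inline Mat2 operator*(const Mat2& m, const Mat2& n) {
    return {m.a * n.a + m.b * n.c, m.a * n.b + m.b * n.d,
            m.c * n.a + m.d * n.c, m.c * n.b + m.d * n.d};
}

// 2x2 matrix times 2x1 vector.
inline Vec2 operator*(const Mat2& m, const Vec2& v) {
    return {m.a * v.x + m.b * v.y, m.c * v.x + m.d * v.y};
}

// Vector addition (used for the translation).
inline Vec2 operator+(const Vec2& u, const Vec2& v) {
    return {u.x + v.x, u.y + v.y};
}
```

With these, the question's transform reads (r * s) * p + t, and hoisting r * s out of the loop works exactly as with Eigen.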

0

Looking back at your previous post and the code in there, my suggestion would be to use your old code, but improve its efficiency by moving things around. I'm posting on that previous question to keep the answers separate.
