5
\$\begingroup\$

I have written a simple implementation of Sieve of Erasthotenes (in C, MSVC flavour), which I believe is correct (source below). Is there a way to make it faster and more readable?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <intrin.h>

typedef unsigned long long bitmap_entry_t;
#define NUMBERS_TO_CALCULATE (1024*1024*1024)
#define BITMAP_ENTRY_BITS (sizeof(bitmap_entry_t)*8)

inline void bitmap_set(bitmap_entry_t *bitmap, unsigned long long index)
{
    _bittestandset64(&bitmap[index / BITMAP_ENTRY_BITS], index%BITMAP_ENTRY_BITS);
}

inline void bitmap_unset(bitmap_entry_t *bitmap, unsigned long long index)
{
    _bittestandreset64(&bitmap[index / BITMAP_ENTRY_BITS], index%BITMAP_ENTRY_BITS);
}

inline bitmap_entry_t bitmap_get(bitmap_entry_t *bitmap, unsigned long long index)
{
    return _bittest64(&bitmap[index / BITMAP_ENTRY_BITS], index%BITMAP_ENTRY_BITS);
}

int main(void)
{
    bitmap_entry_t *bitmap = malloc((1 + NUMBERS_TO_CALCULATE / BITMAP_ENTRY_BITS) * sizeof(bitmap_entry_t));
    for (int i = 0; i < (1 + NUMBERS_TO_CALCULATE / BITMAP_ENTRY_BITS); i++)
    {
        bitmap[i] = 0xFFFFFFFFFFFFFFFFLL;
    }

    for (int i = 2; i < sqrt(NUMBERS_TO_CALCULATE); i++)
    {
        if (bitmap_get(bitmap, i - 2))
        {
            for (int j = 2 * i; j < NUMBERS_TO_CALCULATE; j += i)
            {
                bitmap_unset(bitmap, j - 2);
            }
        }
    }

    for (int i = 2; i < NUMBERS_TO_CALCULATE; i++)
    {
        if (bitmap_get(bitmap, i - 2))
        {
            printf("%d ", i);
        }
    }

    getchar();
}
\$\endgroup\$

2 Answers 2

2
\$\begingroup\$

I see a few things that might help you improve your code.

Declare local routines static

If you declare bitmap_get and the other routines as static, the compiler will know that they can never be called outside this one file. With that information, much more agressive optimization can be done and the inline keyword will probably not even be needed.

Eliminate unused routine

The code defines a routine bitmap_set but it is not actually used and can be eliminated from the code. Smaller code typically runs faster, so there's a small chance that this could improve speed as well.

Use calloc and reverse the sense of the bits

You could eliminate the initialization of all of the bits to 1 by using calloc instead of malloc and reversing the meaning of the bits in the bitmap. On my Linux box, the times were about the same, but of course the calloc version was shorter.

Skip all even numbers

Your bitmap could be half the current size by simply eliminating the storage of even numbers. The only even prime is 2, so your program doesn't really need to allocate bits for even numbers.

Optimize the inner loop

If we only test odd numbers, as per the last suggestion, we can also optimize the inner loop. Once we know that i is an odd number, it's obvious that 2*i must be an even number, so there's no reason to mark it as non-prime since we already know that all even numbers except 2 are non-prime. Using both suggestions, we can double the speed of both inner and outer loops by writing them like this:

for (int i = 3; i < sqrt(NUMBERS_TO_CALCULATE); i+=2)
{
    if (bitmap_get(bitmap, i - 2))
    {
        for (int j = 3 * i; j < NUMBERS_TO_CALCULATE; j += 2*i)
        {
            bitmap_unset(bitmap, j - 2);
        }
    }
}

Prefer portable constants

If you elect not to use calloc, you could still revise the way the constant is written. At the moment, the code uses this:

bitmap[i] = 0xFFFFFFFFFFFFFFFFLL;

It would be shorter, less potentially error prone (quick -- how many F's should there be?) and easier to read like this:

bitmap[i] = ~0;

On my machine, using gcc as the compiler, they compile to the same thing.

Consider using pointers

On some compilers, depending on optimizations used, the use of pointers can be faster than using indexing. You could test this on your machine with your compiler. For example, the initialization loop could be written like this:

for (bitmap_entry_t *p = bitmap; 
        p < &bitmap[(1 + NUMBERS_TO_CALCULATE / BITMAP_ENTRY_BITS)]; ++p)
{
    *p = ~0;
}

Optimize printing

As you may already know, it's the number conversion and printing of the results that takes most of the time for this program. If that's of interest, you could write your own special purpose output formatter. By keeping a static string array in memory and using that to increment through the numbers, you may be able to speed the entire program.

\$\endgroup\$
1
\$\begingroup\$

'Faster and more readable' is a bit of a tall order! One often has to sacrifice one for the other..

Some minor points: The second argument to bitmap_(get|set|test) is always in fact an int but the functions take an unsigned long long. I'd day they should take an int

While one would hope that the compiler will ensure that the constant expression sqrt(NUMBERS_TO_CALCULATE) is evaluated just once, personally I prefer to ensure its evaluated just once by assigning it to a local variable, just once. Moreover the compiler may well be evaluating the comparison by converting i to a double. This would also be avoided by assigning the limit to an integer local variable.

I don't think you gain anything with all the -2's in the second argument to the bitmap_(get|set|test) calls. They could go.

\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.