Project Euler #7 10001st prime

Question

I decided to start working on the Euler project exercises. I'm doing fine for now but the seventh problem is running quite slow and I can't think of anything to make it work faster. It takes around 1300-1500 ms to finish which doesn't satisfy me while I know it can be done in maybe 20-30 milliseconds.

int index = 1;
List<int> primes = new List<int> { 1 };
int i = primes[index - 1];
Stopwatch sw = new Stopwatch();
sw.Start();
while (primes.Count<10001)
{
    i += 2;
    bool isPrime = true;
    for (int j = 2; j < i / 2; j++)
    {
        if (i%j != 0) continue;
        isPrime = false;
        break;
    }
    if (!isPrime) continue;
    primes.Add(i);
    index++;
}
sw.Stop();
Console.WriteLine(primes[10000]);
Console.WriteLine("Time to calculate in milliseconds : {0}", sw.ElapsedMilliseconds);
Console.ReadKey();

Simplest improvement: test for divisors j <= Math.Sqrt(i)rather than j < i/2. Can you explain why this works? — Colonel Panic, Commented Apr 4, 2016 at 16:48
A concern: your list of primes includes 1 and omits 2. This cancels out though in your calculation of the nth. — Colonel Panic, Commented Apr 4, 2016 at 16:51
I also wondered this for a while but take a look at 9. 3^2 = 9 that's basically the lowest divisor of the number. 25 = 5^2 same here nothing smaller than 5 is a divisor of 25. If we have 12 we take sqrt(12) which is 3(~3.4) and we are still going to reach a divisor of 12 (3) . Where we can conclude that the smallest divisor of a number aside from 1 is the sqrt of a number. Correct me if I am wrong — Denis, Commented Apr 4, 2016 at 16:53
@denis: Your statement is completely wrong. The smallest divisor of 12 that is bigger than 1 is 2, not 3. The true statement that you are casting about trying to find here is if a number is composite then there exists a prime divisor that is less than or equal to the square root of that number. For example, take 391. Its square root is a little over 19.7, it is composite, and it has a prime divisor less than 19.7, namely 17. — Eric Lippert, Commented Apr 4, 2016 at 17:24

nhgrif · Accepted Answer · 2016-04-04 21:51:48Z

bool isPrime = true;
for (int j = 2; j < i / 2; j++)
{
    if (i%j != 0) continue;
    isPrime = false;
    break;
}
if (!isPrime) continue;

This is one of the most primitive and least efficient ways to calculate whether or not a value is prime.

First and foremost, we deserve a method which encapsulates this logic and can be called repeatedly:

bool isPrime(int value)
{
    // return true/false
}

This will allow us to write a set of unit tests around this method to make sure it's correct. It also allows us to very easily measure exactly how much this part of our total algorithm is costing us in terms of time. And in the end, it just makes our code more readable, which for me is usually enough of a reason to do anything.

Now, as for the logic here... I've got a few comments.

j < i / 2

Neverminding the fact that we need better variable names, we're calculating far too many numbers. Consider the prime number 97. Your algorithm will test every number up to 48, but we could stop at the closest thing to 97's square root, 10. If no number 10 or smaller divides evenly into 97 whose square root is 9.8 something, then it's prime.

j++

We're checking every divisor. We could eliminate the even numbers early and just see if our test value is divisible by odd numbers.

if (i%j != 0) continue;

First of all, omitting braces is a capital sin as far as I'm concerned. You shouldn't consider them optional. But perhaps more importantly, why the inverse logic? Your loop body could simply be written as:

if (i % j == 0)
{
    isPrime = false;
    break;
}

But... there's a far more efficient pattern, so let's get back to that isPrime method I was talking about...

Let's start with this logic to deal with a few special cases:

if (value < 2) { return false; } // no number less than 2 is prime
if (value % 2 == 0) { return value == 2; } // no multiple of 2 is prime
if (value % 3 == 0) { return value == 3; } // no multiple of 3 is prime
if (value % 5 == 0) { return value == 5; } // no multiple of 5 is prime

We're also going to go ahead and filter out 7, which is a prime.

if (value == 7) { return true; }

We're taken out a lot of the primes now. The first line takes out a little over half of all valid numbers (all negatives, 0, and 1).

The second line takes out another half of the remaining by eliminated all the even numbers.

The third line takes out a third of the remaining by eliminating all of the odd multiples of three (even multiples of three were eliminated in previous line).

The fourth line takes out about twenty percent of the remaining by eliminated the multiples of five which were not already knocked out by being even or a multiple of three (so for example, 10, 15, 20 already knocked out, but here we knocked out 25).

The final line deals with 7 so that all of our special cases are handled before we enter this pattern:

for (int divisor = 7; divisor * divisor <= value; divisor += 30)
{
    if (value % divisor == 0) { return false; }
    if (value % (divisor + 4) == 0) { return false; }
    if (value % (divisor + 6) == 0) { return false; }
    if (value % (divisor + 10) == 0) { return false; }
    if (value % (divisor + 12) == 0) { return false; }
    if (value % (divisor + 16) == 0) { return false; }
    if (value % (divisor + 22) == 0) { return false; }
    if (value % (divisor + 24) == 0) { return false; }
}

This loop starts at 7 and increments by 30 on each iteration. Each line of this loop tests a different off set which makes sure we're not double checking our special cases. With each line we're testing against a smaller portion of possible values, because the line above it eliminated more.

To be clear, our special cases eliminated...

roughly 50%
half of remaining
1/3rd of remaining
1/5th of remaining

Through the first iteration of this loop, we eliminated by line

1/7th of remaining
1/11th of remaining
1/13th of remaining
1/17th of remaining
1/19th of remaining
1/23rd of remaining
1/29th of remaining
1/31st of remaining

Notice those denominators? They're primes.

The loop becomes less efficient per iteration (and as you get into requiring more and more iterations of the loop, a sieve starts to become faster at the expense of memory).

But by the end of the first iteration of the loop, we have tested the value against the eleven primes. And that's the most efficient way to determine if a number is a prime, by testing if it is divisible by primes (which is what sieves allow you to do).

There's no point in checking if 97 is divisible by 4, because if it was going to be divisible by 4, it would have been divisible by 2, and we already checked 2.
There's no point in checking if 97 is divisible by 9, because if it was going to be divisible by 9, it'd be divisibly by 3, which we already checked.
There's no point in checking if 97 is divisible by 25, because if it was going to be divisible by 25, it'd be divisible by 5, which we already checked.

Also, we know that if 97 were divisible by 25, we would have already found the other number that divides in to it. And to be more clear on that part, let's look at a similar number that isn't prime: 98.

So, 98 is divisible by 49, right? 49 will go into 98 twice. So naïvely, we might consider in order to determine if 98 is prime, we need to check all the way up to 98 % 49, right? Wrong. Numbers factor into pairs. For every pair, we only need to test the smallest pair against the value.

So for the factors of 98, we have these pairs:

(2, 49)
(7, 14)

For any number, if we break it into pairs like this and look at the smaller number in the pair, the largest smaller number will be less than or equal to the square root of the number we are factoring.

Consider a number that is a perfect square, like 100. It's factor pairs look like this:

( 2,50)
( 4,25)
( 5,20)
(10,10)

Again, we only need to check the left column. Here, the largest value out of the left column is 10. This is the exact square root of 100, and we can stop our checking here. No need to go all the way up to halfway.

Anyway, long story short, the final isPrime function looks about like this:

bool isPrime(int value)
{
    if (value < 2) { return false; }
    if (value % 2 == 0) { return value == 2; }
    if (value % 3 == 0) { return value == 3; }
    if (value % 5 == 0) { return value == 5; }
    if (value == 7) { return true; }

    for (int divisor = 7; divisor * divisor <= value; divisor += 30)
    {
        if (value % divisor == 0) { return false; }
        if (value % (divisor + 4) == 0) { return false; }
        if (value % (divisor + 6) == 0) { return false; }
        if (value % (divisor + 10) == 0) { return false; }
        if (value % (divisor + 12) == 0) { return false; }
        if (value % (divisor + 16) == 0) { return false; }
        if (value % (divisor + 22) == 0) { return false; }
        if (value % (divisor + 24) == 0) { return false; }
    }

    return true;
}

Woah, thank you a lot for the in-depth answer I will make sure to inspect it in-depth as well and take notes from it . While I was writing the code I didn't check which numbers guarantee that they are not multiple's of prime's, and some other stuff concerning mathematics, for example the square roots. Once again thank you a lot. — Denis, Commented Apr 3, 2016 at 19:10
@nhgrif: Using sieves is a good idea ina general algorithm, however now we have list primes why not just loop through them: for (int divisor = 1; primes[divisor] * primes[divisor] <= value; divisor++)? That would really test only prime divisors. — Máté Juhász, Commented Apr 3, 2016 at 19:52
I'm not a .NET user, and I'm not on Windows, so I can't really test, but I think it's important to note here @MátéJuhász that acceptance criteria only specify we need to know the 10,001st prime. We can throw away the first 10,000. We don't need to spend execution time waiting for the OS to find a massive block of memory to allocate our sieve in. — nhgrif, Commented Apr 3, 2016 at 23:15
@MátéJuhász sqrt is computationally expensive whereas multiplication is much less expensive so the value of that optimisation depends on how much it will be used. If you are testing whether large values are prime then that optimisation may have value but if you are doing lots of tests on smaller values then it may lead to worse performance. I would suggest profiling that optimisation to see whether it has any value before implementing it globally as it may not give the improvement you expect - especially if you plan on re-using this as a prime generator in later Project Euler questions. — MT0, Commented Apr 4, 2016 at 7:35

Eric Lippert · Accepted Answer · 2016-04-05 14:57:46Z

You've got some excellent responses so far on how you can make your IsPrime logic faster and more general-purpose. I would add a few additional notes.

First, move to long or BigInteger now. Very soon the numbers in Project Euler will exceed the limits of an int, which can only represent numbers up to about 2 billion. If you are going to keep using ints, turn checked arithmetic on, so that you'll know when you exceed the range of an integer.

Think about the level of abstraction of your program. The problem is "What is the 10001st prime number?" What could we do to solve that? How about: generate a sequence of all prime numbers, and then take the 10001st member of that sequence? Here's how I would write your program:

static bool IsPrime(long x) { /* You implement this */ }

static IEnumerable<long> PositiveIntegers()
{
  for (long i = 1; i < long.MaxValue; ++i)
    yield return i;
}

static IEnumerable<long> Primes()
{
  return PositiveIntegers().Where(IsPrime);
}

static long Problem7() 
{
  return Primes().Skip(10000).First();
}

Look at how much nicer that program is compared to yours! Look at how readable and clear and obviously correct every method is! A method that is only one or two lines long is very likely to be correct. Your program is a mess of arrays and loops and continues and breaks and no wonder it has a bunch of bugs.

Is this solution slightly slower than a solution that is tightly optimized to just have those loops and breaks and continues? Sure. But who cares? This program is clear, it is easy to understand, it is fast enough, and it will be easy to take the gear from this program and use it to solve more problems in the future.

Consider problem 10, for example, which is "Find the sum of all the primes below two million." It is pretty easy to modify this solution of problem 7 to solve problem 10!

When you're writing programs, whether they are for Project Euler or for line-of-business applications, try to make every method clearly implement the desired specification. What do you want? The 10001st prime. How do you get it? Skip the first 10000 primes and then take the first one you find. What does the code look like? Exactly that.

UPDATE: A commenter notes that I could be using an expression-bodied function in C# 6.0. I thought that could be clarified. This is legal in C# 6.0:

static IEnumerable<long> Primes() => PositiveIntegers().Where(IsPrime);

static long Problem7() => Primes().Skip(10000).First();

There's no difference between this version and the original version; C# 6.0 just allows these short methods to be written with more concision.

Thank you for the neat sample code Eric ! I will make sure to implement my functions clearer next time. — Denis, Commented Apr 4, 2016 at 18:31
I think it's the C#6 feature I use most, although string interpolation is a strong contender. Was concerned for a moment that you were using them because there was something wrong with them... glad that's not the case. — Corey, Commented Apr 5, 2016 at 23:23

nhgrif · Accepted Answer · 2016-04-04 20:49:42Z

While my other answer focuses primarily on your algorithm itself, there's something else in your code I'd like to address:

primes.Add(i);

Let's look at the MSDN remarks on this Add method:

If Count already equals Capacity, the capacity of the List is increased by automatically reallocating the internal array, and the existing elements are copied to the new array before the new element is added.

If Count is less than Capacity, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.

The added emphasis is mine.

I'm not familiar with the .NET implementation here, but generally speaking the way these tend to work is there's some default capacity that these collections are initialized to, then each time the capacity needs to be expanded, the capacity is doubled.

So, if we started with capacity of 10, when we tried adding the 11th item, we'd move all of the contents, and resize to capacity of 20. When trying to add the 21st, we'd resize to capacity of 40. This becomes really expensive. Each time you resize, a lot of memory has to be moved in order to make space.

We can cut down on this by either using a fixed size array of exactly the size we need, or by setting the capacity of our List to exactly what we need (or more).

We can also arguably take this memory allocation out of the time measurement by doing it before we set the start time, but even if we do measure it, it's a single allocation rather than multiple allocations, and it doesn't involve moving any memory so it becomes more efficient.

It turns out you don't even need to store this list. You can discard the first 10000 primes and return the 10001st one when it is found. — Nayuki, Commented Apr 4, 2016 at 2:39
I have never heard of a "square when full" list. The point of a double-when-full list is that an analysis of the amortized cost shows that the cost per add remains constant on average no matter how many times the list is doubled. Yes, some small number of expensive adds become linear in cost, but so few of them that on average, it works out to a constant. — Eric Lippert, Commented Apr 4, 2016 at 17:21
It isn't being "squared when full" in this implementation. If it were, then the complexity would be O(n^2) when it needed to resize. — Lacklub, Commented Apr 4, 2016 at 19:40

Tamoghna Chowdhury · Accepted Answer · 2016-04-04 02:40:36Z

Performance

I can't think of anything to make it work faster

You can use a Sieve of Eratosthenes to make it faster - here is some sample code (sieving only odd numbers) that I had written in Java and adapted to C# for this question which gives an example of such a sieve:

IDEONE

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public class PrimeList
{
    // A list of prime values.
    private static readonly List<int> Primes        = new List<int>( 10001 );

    // An array of bits indicating whether the odd numbers (from 3 upwards)
    // are prime (true) or non-prime (false).
    private static readonly BitArray  OddPrimeSieve = new BitArray( 100000,
                                                                    true );

    // The maximum prime number that has currently been found from the sieve.
    private static          int       MaxSieved     = 3;

    static PrimeList()
    {
        Primes.Add( 2 );
        Primes.Add( 3 );
    }

    public static int GetNthPrime( int index )
    {
        // Caclulate the sieve index of the currently maximum sieved prime.
        int sieveIndex = ( MaxSieved - 3 ) / 2;
        int multiple;

        while( Primes.Count < index )
        {
            // Set all higher multiples of the current prime to be non-prime.
            for ( multiple = sieveIndex + MaxSieved;
                    multiple < OddPrimeSieve.Count; 
                    multiple += MaxSieved )
            {
                OddPrimeSieve.Set( multiple, false );
            }

            // Find the sieve index of the next prime number.
            do
            {
                ++sieveIndex;
            }
            while ( sieveIndex < OddPrimeSieve.Count
                    && !OddPrimeSieve[sieveIndex] );

            // Reverse the calculation of the sieve index to get the next prime.
            MaxSieved = 2*sieveIndex + 3;

            // Add the prime to the list.
            Primes.Add( MaxSieved );
        }
        return Primes[index-1];
    }

    public static void Main()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        int p = GetNthPrime(10001);
        sw.Stop();
        Console.WriteLine( p );
        Console.WriteLine( "Time to calculate in milliseconds : {0}",
                            sw.ElapsedMilliseconds );
    }
}

Output:

104743
Time to calculate in milliseconds : 5

Explanation:

The OddPrimeSieve BitSet stores whether the odd numbers (from 3 upwards) are primes or not. Initially all odd values are set to be prime and whenever a prime is found then all higher multiples are set to be non-prime (the for loop). The next prime is then found by iterating over the sieve until a value is found where the bit is still set to true.

The do ... while loop can be written much more succinctly in Java (using BitSet) as primeIndex = OddPrimeSieve.nextSetBit( primeIndex + 1 );

Note:

I made the class so that it would maintain a list of primes that could be used across repeated requests. This is not necessary to solve Project Euler's question - so if you wanted to optimise for that then you could remove that part of the code and just use a simple counter to get the 10001st value.

@TamoghnaChowdhury The OP stated in the question "I can't think of anything to make it work faster" and this addresses that specific request. — MT0, Commented Apr 4, 2016 at 2:36
Also, the BitArray comment - why don't you use multiline comments? — Tamoghna Chowdhury, Commented Apr 4, 2016 at 2:41

gnasher729 · Accepted Answer · 2016-04-05 08:51:08Z

A big problem is that your algorithm identifies 4 as a prime :-(

The problem was "find the 10,001st prime". You solve this by checking whether numbers are primes, and adding the primes to a list, until there are 10,001 primes in the list (actually, the list contains the number 1 and 10,000 primes but not the prime 2). Note that creating the list is not needed at all, but if you create that list, you might as well use it and only check whether a potential prime p is divisible by another prime, not be every number.

You solve the problem "is p a prime" by checking "does p have no divisor >= 2 and < p/2" - which is incorrect when p = 4. This check is very inefficient: If p is not prime then not only must it have a divisor <= p/2, but it must have a divisor <= sqrt (p). That would give a huge improvement.

But with this kind of problem, you usually need to attack the problem as a whole. Not by checking individual numbers, but by checking a huge list of numbers, like the odd numbers from 3 to 199,999 all simultaneously. (Google for "Sieve of Eratosthenes" or just Eratosthenes. But in general, lots of the Project Euler problems require that you attack the problem as a whole).

Thanks for the note, how would I check a huge list of numbers, like the odd numbers from 3 to 199,999 all simultaneously. And also it only misses the 2 it doesn't identify the 4 as prime.. — Denis, Commented Apr 3, 2016 at 22:21
In your last paragraph, are you trying to recommend some sort of multithreading solution? I'd be interested for hearing the plan of attack on deciding how many threads to spawn on and how to divide up the work and then put it back together again. — nhgrif, Commented Apr 3, 2016 at 23:06

RedDragonWebDesign · Accepted Answer · 2016-04-04 07:21:47Z

I was working on the Project Euler problems today in C#. Here is my code for this one. nhgrif's technique is excellent, but you can shave your code execution time down to an extremely fast 50ms without any advanced techniques. The only thing I did to optimize the code was iterate from 2 to Math.sqrt(num) instead of 2 to (num-1). I didn't even implement code to skip even numbers, which could make it even faster.

Perhaps your technique of storing every prime, instead of just the current one, is why your code is slow? Your technique is to store every prime in a List advanced data type. Instead, you could just store the current prime and current prime count in int or long primitive data types.

private void button7_Click(object sender, EventArgs e)
{
    Stopwatch sw = StartProblem("By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.\r\n\r\nWhat is the 10 001st prime number?");

    FinishProblem(FindNthPrime(10001), sw);
}

private long FindNthPrime(long n)
{
    if (n < 1)
    {
        throw new ArgumentException("The number of the nth prime to find must be positive!");
    }

    long numToTry = 2;
    long currentPrime = 0;
    long currentPrimeCount = 0;

    // TODO: no need to check even numbers, they are not prime
    while (currentPrimeCount < n)
    {
        if (IsPrime(numToTry))
        {
            currentPrime = numToTry;
            currentPrimeCount++;
        }

        numToTry++;
    }

    return currentPrime;
}

private bool IsPrime(long NumToCheck)
{
    // A prime number (or a prime) is a natural number greater than 1 that has no positive divisors
    // other than 1 and itself.

    // By definition, numbers 1 or less are not primes.
    if (NumToCheck <= 1)
    {
        return false;
    }
    // 2 is prime, but the loop below won't think so. We need to define it up here.
    else if (NumToCheck == 2)
    {
        return true;
    }

    long SquareRoot = SquareRootRoundedUp(NumToCheck);

    for (long i = 2; i <= SquareRoot; i++)
    {
        if ( NumToCheck % i == 0 )
        {
            return false;
        }
    }

    return true;
}

private long SquareRootRoundedUp(long NumToSquareRoot)
{
    return Convert.ToInt64(Math.Ceiling(Convert.ToDecimal(Math.Sqrt(NumToSquareRoot))));
}

\$\begingroup\$ lists are fast. the add is not in the inner loop. \$\endgroup\$
– Aleksandr Dubinsky
Commented Apr 5, 2016 at 20:36 — Aleksandr Dubinsky, Commented Apr 5, 2016 at 20:36

gnasher729 · Accepted Answer · 2016-04-05 08:59:41Z

Just a remark on how to attack problems: The problem here is quite easy. Whether you have a fast or a slow algorithm, it should find the 10,001st prime in reasonable time. Now imagine they had asked for the 1,000,000,000,000,000,001st prime. Which is totally beyond reach if you examine which numbers are primes, and still out of reach if you use a highly optimised sieve.

If you read the question again, you find that nobody actually wanted all the primes, they are just asking for the one specific prime! So you would ask yourself: Can I find the (one billion billion + first) prime without finding all the others?

The answer is YES: There is an algorithm that counts how many primes there are <= n in O (n ^ (2/3)). There are formulas that tell you approximately how big that prime would be. You use such a formula to find "the prime we are looking for is about X". Then you use the prime counting function to find how many primes there are which are <= X. If that number is close to 10^18 + 1, you create a sieve for the numbers around X. Otherwise you use it to find a better approximation for the 10^18+1st prime and repeat.

This will still take your computer some considerable time, but it changes the problem from unsolvable to hard work.

Stack Exchange Network

Project Euler #7 10001st prime

7 Answers 7

Performance

Not the answer you're looking for? Browse other questions tagged
c#
performance
beginner
programming-challenge
primes
or ask your own question.

Linked

Hot Network Questions

Project Euler #7 10001st prime

7 Answers 7

Performance

Not the answer you're looking for? Browse other questions tagged c#performancebeginnerprogramming-challengeprimes or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
c#
performance
beginner
programming-challenge
primes
or ask your own question.