bool isPrime = true;
for (int j = 2; j < i / 2; j++)
{
if (i%j != 0) continue;
isPrime = false;
break;
}
if (!isPrime) continue;
This is one of the most primitive and least efficient ways to calculate whether or not a value is prime.
First and foremost, we deserve a method which encapsulates this logic and can be called repeatedly:
bool isPrime(int value)
{
// return true/false
}
This will allow us to write a set of unit tests around this method to make sure it's correct. It also allows us to very easily measure exactly how much this part of our total algorithm is costing us in terms of time. And in the end, it just makes our code more readable, which for me is usually enough of a reason to do anything.
Now, as for the logic here... I've got a few comments.
j < i / 2
Neverminding the fact that we need better variable names, we're calculating far too many numbers. Consider the prime number 97. Your algorithm will test every number up to 48, but we could stop at the closest thing to 97's square root, 10. If no number 10 or smaller divides evenly into 97 whose square root is 9.8 something, then it's prime.
j++
We're checking every divisor. We could eliminate the even numbers early and just see if our test value is divisible by odd numbers.
if (i%j != 0) continue;
First of all, omitting braces is a capital sin as far as I'm concerned. You shouldn't consider them optional. But perhaps more importantly, why the inverse logic? Your loop body could simply be written as:
if (i % j == 0)
{
isPrime = false;
break;
}
But... there's a far more efficient pattern, so let's get back to that isPrime
method I was talking about...
Let's start with this logic to deal with a few special cases:
if (value < 2) { return false; } // no number less than 2 is prime
if (value % 2 == 0) { return value == 2; } // no multiple of 2 is prime
if (value % 3 == 0) { return value == 3; } // no multiple of 3 is prime
if (value % 5 == 0) { return value == 5; } // no multiple of 5 is prime
We're also going to go ahead and filter out 7, which is a prime.
if (value == 7) { return true; }
We're taken out a lot of the primes now. The first line takes out a little over half of all valid numbers (all negatives, 0, and 1).
The second line takes out another half of the remaining by eliminated all the even numbers.
The third line takes out a third of the remaining by eliminating all of the odd multiples of three (even multiples of three were eliminated in previous line).
The fourth line takes out about twenty percent of the remaining by eliminated the multiples of five which were not already knocked out by being even or a multiple of three (so for example, 10, 15, 20 already knocked out, but here we knocked out 25).
The final line deals with 7 so that all of our special cases are handled before we enter this pattern:
for (int divisor = 7; divisor * divisor <= value; divisor += 30)
{
if (value % divisor == 0) { return false; }
if (value % (divisor + 4) == 0) { return false; }
if (value % (divisor + 6) == 0) { return false; }
if (value % (divisor + 10) == 0) { return false; }
if (value % (divisor + 12) == 0) { return false; }
if (value % (divisor + 16) == 0) { return false; }
if (value % (divisor + 22) == 0) { return false; }
if (value % (divisor + 24) == 0) { return false; }
}
This loop starts at 7 and increments by 30 on each iteration. Each line of this loop tests a different off set which makes sure we're not double checking our special cases. With each line we're testing against a smaller portion of possible values, because the line above it eliminated more.
To be clear, our special cases eliminated...
- roughly 50%
- half of remaining
- 1/3rd of remaining
- 1/5th of remaining
Through the first iteration of this loop, we eliminated by line
- 1/7th of remaining
- 1/11th of remaining
- 1/13th of remaining
- 1/17th of remaining
- 1/19th of remaining
- 1/23rd of remaining
- 1/29th of remaining
- 1/31st of remaining
Notice those denominators? They're primes.
The loop becomes less efficient per iteration (and as you get into requiring more and more iterations of the loop, a sieve starts to become faster at the expense of memory).
But by the end of the first iteration of the loop, we have tested the value
against the eleven primes. And that's the most efficient way to determine if a number is a prime, by testing if it is divisible by primes (which is what sieves allow you to do).
- There's no point in checking if 97 is divisible by 4, because if it was going to be divisible by 4, it would have been divisible by 2, and we already checked 2.
- There's no point in checking if 97 is divisible by 9, because if it was going to be divisible by 9, it'd be divisibly by 3, which we already checked.
- There's no point in checking if 97 is divisible by 25, because if it was going to be divisible by 25, it'd be divisible by 5, which we already checked.
Also, we know that if 97 were divisible by 25, we would have already found the other number that divides in to it. And to be more clear on that part, let's look at a similar number that isn't prime: 98.
So, 98 is divisible by 49, right? 49 will go into 98 twice. So naïvely, we might consider in order to determine if 98 is prime, we need to check all the way up to 98 % 49
, right? Wrong. Numbers factor into pairs. For every pair, we only need to test the smallest pair against the value.
So for the factors of 98, we have these pairs:
(2, 49)
(7, 14)
For any number, if we break it into pairs like this and look at the smaller number in the pair, the largest smaller number will be less than or equal to the square root of the number we are factoring.
Consider a number that is a perfect square, like 100. It's factor pairs look like this:
( 2,50)
( 4,25)
( 5,20)
(10,10)
Again, we only need to check the left column. Here, the largest value out of the left column is 10. This is the exact square root of 100, and we can stop our checking here. No need to go all the way up to halfway.
Anyway, long story short, the final isPrime
function looks about like this:
bool isPrime(int value)
{
if (value < 2) { return false; }
if (value % 2 == 0) { return value == 2; }
if (value % 3 == 0) { return value == 3; }
if (value % 5 == 0) { return value == 5; }
if (value == 7) { return true; }
for (int divisor = 7; divisor * divisor <= value; divisor += 30)
{
if (value % divisor == 0) { return false; }
if (value % (divisor + 4) == 0) { return false; }
if (value % (divisor + 6) == 0) { return false; }
if (value % (divisor + 10) == 0) { return false; }
if (value % (divisor + 12) == 0) { return false; }
if (value % (divisor + 16) == 0) { return false; }
if (value % (divisor + 22) == 0) { return false; }
if (value % (divisor + 24) == 0) { return false; }
}
return true;
}
j <= Math.Sqrt(i)
rather thanj < i/2
. Can you explain why this works? \$\endgroup\$