Efficient prime number testing

In nearly every lower-level CS class, students are invariably asked to create a function that can recognize a prime number. There are very good reasons for this. For one thing, the study of prime numbers is one of the longest-running fields in mathematics; the ancient Greeks were the first to study them in depth, but even the ancient Egyptians distinguished between prime numbers and composites. For another, the study of prime numbers has applications in a wide variety of fields. The one that will probably interest CS students the most is cryptography.

But the reason I’m going to talk about them is because recognizing prime numbers is a good starting point for talking about algorithms. By starting with the most basic, inefficient algorithm, we can learn some techniques to improve efficiency and eliminate redundancy. These same techniques can be applied to almost any algorithm, and will be useful (I hope) throughout our entire careers.

First, some ground rules.

I’m going to assume that most people reading this have a basic understanding of big-O notation. (If not, I plan to write an introduction to the topic; I’ll link it here when I do.) To simplify analysis, I’m going to only count integer divisions (including modulo) as operations, and not variable assignments or counter increments. Yes, I know it’s a cheat, but it’ll do for our purposes. For similar reasons, when analysing memory space efficiency, I’m only considering spaces that grow in proportion to input size.

The big-O analysis I am going to do is of each function call, not of the program as a whole. The worst case in this scenario is when the function is passed a prime number.

I am not going to assume that the reader is a math major, so no foreknowledge of prime number theory is required. I will introduce it (in a very basic way) when it comes up; as long as you know what a prime number is, you'll understand this article. Those of you who are math majors will immediately recognize that I'm not.

I’m going to write my example code in C++, since it seems to be the language that is taught in most introductory CS courses. But I’m avoiding custom objects and (most) library functions, so the code should be easily adaptable to any programming language.

We also need a formal definition of a prime number. For this exercise, here it is:

A prime number is an integer that is evenly divisible by exactly two distinct positive integers: one, and itself.

From this definition, we can see that neither zero nor one is a prime number, and that a negative number cannot be a prime number.

Now, on to the task at hand. You have been given this code:

#include <iostream>
using namespace std;

// Function prototype
bool isPrime(int);

// The upper boundary
const int PRIMES_UP_TO = 100;

int main()
{
    cout << "Prime numbers from 1 to " << PRIMES_UP_TO << ":" << endl;
    for (int i = 1; i <= PRIMES_UP_TO; i++)
    {
        if (isPrime(i))
            cout << i << " ";
    }
    cout << endl;
    return 0;
}

Your task is to implement the isPrime function.

Trial Division

The most straightforward algorithm is to simply divide the target number by every number below it, starting with two. If any of those numbers evenly divides the target number, then the target number is not prime, and we return false. If none of those numbers divide it, it is prime, and we return true. From our formal definition, we return false if it is less than two.

Here is the code for that algorithm:

bool isPrime(int target)
{
    if (target < 2)
        return false;
    for (int i = 2; i < target; i++) {
        if (target % i == 0)
            return false;
    }
    return true;
}

As is the case with most simple algorithms, it is very inefficient. For each prime target number n that we test (above 2), we divide by every number from two up to n - 1, so the efficiency is T(n) = n - 2 = O(n).

Changing the upper bound

We can improve this algorithm with a little thought. When the target number n fails to be divisible by a specific number, i, we know that it can't be divisible by any number above n / i. Let's say that i = 2. No number strictly between n / 2 and n can evenly divide n: if one did, dividing n by it would give us an integer strictly between 1 and 2, of which, of course, there are none.

This gives us our first optimization. With each iteration, we can decrease the upper bound by setting it to the target number divided by the iteration number. Here is the new code:

bool isPrime(int target)
{
    if (target < 2)
        return false;
    for (int i = 2, bound = target; i < bound; bound = target / i, i++)
    {
        if (target % i == 0)
            return false;
    }
    return true;
}

(Note: remember to divide bound before you increment i in the for-loop. The first time I wrote this code, I forgot this.)

Now, the astute among you (which I'm hoping is most of you) will recognize something. When the target number is a prime number, the upper bound will be the integer square root of the target number when the loop finishes. (The integer square root is the greatest integer less than or equal to the square root.) To see this, let's step through the algorithm when target = 29. This is a prime number, and its integer square root is 5.

i = 2: bound = 29
i = 3: bound = 29 / 2 = 14
i = 4: bound = 29 / 3 = 9
i = 5: bound = 29 / 4 = 7
i = 6: bound = 29 / 5 = 5; since 6 > 5, the loop terminates

“But wait,” I hear you asking, “wouldn’t it be more efficient to just set the upper bound to the square root, rather than doing all of these integer divisions?”

Excellent question. And the answer would be yes, if taking the square root was an O(1) operation. As it turns out, it is usually not. To get a square root, most math library implementations use some variation of Newton’s method to find a floating-point approximation, accurate to the number of digits that can be expressed by a double type. While this is a perfectly fine algorithm, it’s complete overkill for our purposes. I do not know of any way to compute an integer square root that is any more efficient than the one we’re already using.

Even with the extra integer division in each iteration, it is a vast improvement. Our worst-case time is now T(n) = 2√n = O(√n).

With this improvement, we are now using what is called the trial division algorithm. But we can still do better.

Odd Iteration

At the moment, we are doing trial division with every number between 2 and √n. But we don’t need to do this. If we already know the target number is not divisible by 2, we also know that it is not divisible by 4, 6, 8, or any other even number. So we can do away with dividing by any even number once we have already tried 2.

We do this by explicitly testing whether the target number is divisible by 2, and returning false if it is (after handling 2 itself, which is prime). To test only odd numbers, the loop iterator starts at 3 instead of 2, and increments by 2 each time.

Here’s the new code:

bool isPrime(int target)
{
    if (target < 2)
        return false;
    if (target == 2)
        return true;
    if (target % 2 == 0)
        return false;
    for (int i = 3, bound = target; i < bound; bound = target / i, i += 2)
    {
        if (target % i == 0)
            return false;
    }
    return true;
}

Can we do any better if we increment by a greater amount? That is, is there some magic formula that lets us skip, say, every third number? After all, the same logic applies to numbers that are divisible by 3 – if we know the target number is not divisible by 3, then we know it’s also not divisible by 6, 9, 12, and so forth.

If we are attempting to do this in the update expression of the for-loop (instead of i += 2), then the answer is no. (You’ll see why in a moment.) We will need some other way to accomplish this task.

The Sieve of Eratosthenes

The earliest known algorithm that does what we need was created by Eratosthenes of Cyrene, the Greek mathematician who lived from (roughly) 276 BC to 195 BC. His algorithm creates a sieve: an enumeration of all the numbers between 2 and n, which are incrementally marked as multiples of some other number. The numbers that remain unmarked (“pass through the sieve”) are the ones we use for our trial division.

Here is the exact algorithm:

  1. Create a list of unmarked numbers between 2 and n.
  2. Start with i = 2.
  3. Mark each integer multiple of i in the sieve. Do not mark i itself.
  4. Increment i.
  5. If i is greater than n, then we are done; go to Step 7.
  6. If i is unmarked in the sieve, go to Step 4 (to skip it). Otherwise, go to Step 3 (to mark its multiples).
  7. If n is unmarked in the sieve, it is prime. Otherwise, it is a composite.

A little reflection about this algorithm will help clarify the trial division algorithm we used previously. We want to skip trial division by any number that is a multiple of some earlier number; such a number would be marked in Step 6. Conversely, Step 7 tells us that any unmarked number is a prime number. Therefore, we should only do trial division by prime numbers.

This is why we can’t simply increment i by more than two. There are many pairs of prime numbers – even very large prime numbers – whose difference is two. The mathematical term for them is twin primes, and there is a conjecture (so far unsolved) that there are infinitely many of them.

Back to the sieve. Implementing the sieve algorithm requires some decision making, but not too much. Since we’re only using the sieve to mark numbers, we can use an array of Booleans to represent the sieve. The index of the array is the number; its Boolean value represents whether that number is marked or unmarked.

This array is dynamically allocated so that its physical size matches the target number. Normally, when dynamic arrays of primitive types are created in C++, their elements are not initialized to any particular value. If we want to initialize the array, we need to loop through all of its elements, and explicitly assign the initial value to each one. But there is a C++ trick we can use: if we append an empty pair of parentheses to the new-expression, each element is value-initialized to "zero" (in the case of Boolean arrays, to false). This means a Boolean value of false represents the unmarked state; to mark a number, we set its Boolean value to true.

There is no such technique to initialize values to true, rather than false. So, I named the array “comp” (for “composite”), as this is actually what the Boolean value represents. That is, by the time we’re done, comp[4] will be true, since 4 is a composite number.

Once we’re done with the sieve, we need a temporary variable to hold the value of comp[target]. If we simply returned the value of comp[target] directly, we could not delete the array from memory, and we would get memory leaks. Conversely, we cannot access the array after we delete it. Since we are going to return the negation of the sieve’s Boolean value (we’re returning true if it’s prime, not if it’s a composite number), it’s convenient to do this when we’re initializing the temporary variable.

With all that out of the way, we’re now ready to write our code:

bool isPrime(int target)
{
    if (target < 2)
        return false;
    // Allocate a new sieve, initialized to false
    bool* comp = new bool[target + 1]();
    // Loop over the sieve
    for (int i = 2, bound = target; i < bound; bound = target / i, i++)
    {
        if (!comp[i])
        {
            for (int j = i + i; j <= target; j += i)
                comp[j] = true; // Mark the number
        }
    }
    // If the number is not a composite, it is prime
    bool prime = !comp[target];
    delete[] comp;
    return prime;
}

The Sieve of Eratosthenes algorithm is very efficient. Not only does the outer loop still stop near √n, but the trial divisions themselves (the modulo operations) are gone, replaced by array lookups; the only divisions left are the ones that update the bound. If we ignore my "cheat" and count operations that are not integer divisions, it's still very efficient, because the number of iterations in the inner loop (the "j loop") decreases each time.

But we can do a little better.

Improving the Sieve

The first improvement is rather obvious. If we mark our target number, we know immediately that it is not prime. If this happens, we can return false immediately (or almost immediately; we need to delete the array from memory first). On the other hand, if we have completed all iterations without returning, we know the target number is prime, so we return true (after deleting the array). This also eliminates the need for a temporary variable.

The second improvement requires a little thought. If we reach an unmarked number, i, we know it is prime. Any multiple of i with a smaller factor (2 * i, 3 * i, and so on) was already marked when we processed 2, 3, etc. So the first number we actually need to cross off is the square of i.

There is one more slight improvement we can make. In our original code, we did an integer division (for the upper bound) on each iteration. This is unnecessary; we can instead do that division only if we are dividing by a prime number, and simply increment i otherwise. An inner while-loop achieves this easily.

Here’s our improved code:

bool isPrime(int target)
{
    if (target < 2)
        return false;
    // Allocate a new sieve, initialized to false
    bool* comp = new bool[target + 1]();
    // Loop over the sieve
    for (int i = 2, bound = target; i < bound; bound = target / i, i++)
    {
        // Skip numbers that are composites
        while (comp[i])
            i++;
        // Start at i squared, increment by i
        for (int j = i * i; j <= target; j += i)
        {
            // If target will be set to true, return now
            if (j == target)
            {
                delete[] comp;
                return false;
            }
            comp[j] = true;
        }
    }
    // If we made it this far, the number is prime
    delete[] comp;
    return true;
}

This does make our algorithm more efficient. But I’m betting that many of you don’t think it’s all that great. “Yeah, OK,” you’re saying, “it saves us a couple of operations. But each time you’re calling the function, you’re creating this huge array. It seems like a total waste of memory.”

Indeed it is. While this algorithm is efficient in the time domain, it is not at all efficient in the space domain. There are certain things we can do to use up less memory (e.g. using bits or flags instead of Booleans), but no matter what, we have a linear relationship between memory space and input size, or O(n).

All of this is to avoid trial division of numbers that aren’t primes. It seems like there should be a better way to do it. If only there were some sort of function that could tell us whether a number is prime or not…

Oh. Right.

Using Recursion

The solution seems easy enough. On each iteration, we test to see whether i is a prime number, by recursively calling the isPrime function itself. Forget the business with the sieve, we can just use the improved version of our trial division algorithm.

If we do this, there is one “gotcha” that we have to keep in mind. Without modification, we get an infinite recursion when we call isPrime(3). Since we are testing for primeness before we test for divisibility, we never test whether 3 is evenly divisible by 3, and instead just keep recursively calling isPrime(3) until we get a stack overflow.

But this is easily fixed by making 3 a base case (and keeping 2 as an explicit base case, as in our earlier version). Here is the code:

bool isPrime(int target)
{
    if (target < 2)
        return false;
    if (target == 2)
        return true;
    if (target == 3) // Avoids infinite recursion when i == 3
        return true;
    if (target % 2 == 0)
        return false;
    for (int i = 3, bound = target; i < bound; bound = target / i, i += 2)
    {
        if (isPrime(i) && (target % i == 0))
            return false;
    }
    return true;
}

So, is this code more efficient than our other algorithms?

No. In fact, not only is it inefficient, it is astronomically inefficient. To see why, let’s go through the recursive calls when we invoke isPrime(41). It does a recursive call with 3; then a recursive call with 5, which also does a recursive call with 3; then a recursive call with 7, which again does a recursive call with 3. Those are a lot of duplicate recursive calls. And the larger the target number, the worse it gets.

The square-root upper bound lessens the problem, but does not eliminate it. The duplicated calls compound at every level of recursion, making the running time exponential: O(2^n). It is even worse than the algorithm that we started with.

But there is a solution to this. That solution is (or borrows from) a technique called dynamic programming.

Dynamic(ish) programming

With dynamic programming, we avoid redundancy by storing the results of previous recursive calls in an external data structure. The function first looks in that data structure to see if a solution has already been found. If so, it returns the solution. If not, it recursively calculates the solution, and stores that solution in the data structure before returning it.

Dynamic programming is incredibly useful. It is used in graph traversal, game AI, matrix multiplication, finding substrings, and more; it should be used any time there’s a possibility of doing more than one recursive call with the same argument.

The algorithm we will use is not quite dynamic programming. Instead, what we need is simply an array of prime numbers below n that we can use for our trial division. It works like this: we iterate through the already-calculated array of prime numbers. When we reach the last value that is already calculated, we recursively try increasing odd numbers until we find one that is prime. At that point, we add it to the array, and continue.

(As an aside: Given our main function, we could take a different approach. We would not explicitly check to see whether the test value is the last in the array of primes. Instead, we test only the primes currently in the array. We would only add to the array if we determine that the current target number is prime. This approach works in this particular program, but only because we know we are testing all the integers in increasing order. It would not work in the general case, where you may be asked to calculate the primeness of (say) 12,345 when the function is first called.)

Because the state of the array is independent of the function, I created a global array, and a global variable that holds the index of the last prime number stored in it. (Yeah, I know globals are terrible, but doing otherwise would over-complicate the code.) The first prime should be initialized to 2; and if we want to increment by two (to skip even numbers) when searching for the next prime, we need to also initialize the second prime to 3. If we are using static arrays in C++, we can specify the first two values, and the rest will then be initialized to “zero” (false).

Here is the code:

int primes[PRIMES_UP_TO] = {2, 3};
int last = 1;

bool isPrime(int target)
{
    if (target < 2)
        return false;
    for (int i = 0, bound = target; primes[i] < bound; bound = target / primes[i], i++)
    {
        if (i == last)
        {
            // Add the next prime
            int next = primes[last] + 2;
            while(!isPrime(next))
                next += 2;
            primes[++last] = next;
        }
        if (target % primes[i] == 0)
            return false;
    }
    return true;
}

We could make this a true dynamic programming implementation by first checking to see whether target is already in the primes array before testing any numbers at all. Since the array holds prime numbers in increasing order, the most efficient way to check this is with a binary search through the array. But this is rather complicated for our purposes, so I will leave this to you to implement if you are really bored.

Not only are we still restricted to divisors below √n, we are now dividing only by the primes below it, skipping every composite. The exact savings depend upon the ratio of prime numbers to composite numbers between 2 and n, and that ratio depends upon n itself. According to the prime number theorem, the chance of a random number between 0 and n being prime is roughly 1/ln(n) (one over the natural log of n). This is an asymptotic distribution; the actual ratio of primes to composites will not be exactly 1/ln(n), but it will approach it for very large values of n. (For example, there are actually 25 primes below 100, not 21 as the formula suggests.)

Recall that big-O notation does not distinguish between logarithmic bases. So, the asymptotic worst-case time complexity of this algorithm is O(√n / log(n)).

This algorithm also saves a huge amount of memory space relative to the sieve algorithm. We are only storing the primes up to √n, so our memory space efficiency is also O(√n / log(n)). To show how much this saves, here's an example. The 10,000th prime is 104,729. The sieve algorithm uses an array with over a hundred thousand elements. But with our "dynamic programming" algorithm, the array only needs 66 elements: the primes up to √104729 ≈ 323.

But maybe we can do better. To find out, let’s go back to the sieve algorithm.

The Sieve of Eratosthenes (memoized version)

Our main concern with the original sieve algorithm is that it was constantly allocating and destroying a large amount of memory with each function call. But we can also make the sieve global. This means that we can do away with dynamically allocating the array, and just create one array that holds enough memory for every number we’re going to test throughout the entire program (here, 100).

The array remains uninitialized until we call the function for the first time. At that point, we fill in the entire array using Eratosthenes' algorithm. Once the array is filled, we simply look up the target number.

This seems like an example of dynamic programming, but technically it’s not: the term “dynamic programming” is used when we store the solutions to overlapping, recursively calculated sub-problems. Here, we are storing all the calculated values in one go, and doing it without any recursion. The specific term for this is memoization.

In our implementation, we need some way to tell if the array has been filled. We could create a Boolean variable for this, but since we never use the array index zero, we can use that instead.

Here is the code:

bool composite[PRIMES_UP_TO + 1] = {false};
bool isPrime(int target)
{
    if (target < 2)
        return false;
    // Is composite[0] still false? If so, we haven't filled the sieve yet
    if (!composite[0])
    {
        composite[0] = true;
        // Loop over the sieve
        for (int i = 2, bound = PRIMES_UP_TO; i < bound; bound = PRIMES_UP_TO / i, i++)
        {
            if (!composite[i])
            {
                for (int j = i * i; j <= PRIMES_UP_TO; j += i)
                    composite[j] = true;
            }
        }
    }
    // If the number is not a composite, it is prime
    return !composite[target];
}

In terms of time efficiency, this is the best algorithm of all. Filling up a 100-element array only requires that we mark the indexes that are multiples of 2, 3, 5, and 7. Furthermore, once the array is filled, each subsequent function call is an O(1) operation.

Of course, our space efficiency hasn’t changed: it is O(n). We are, at least, spared the overhead of allocating and de-allocating memory with each function call.

So, the dynamic programming algorithm is better in space efficiency, but worse in time efficiency. Deciding which algorithm to use depends upon whether processing speed or memory usage is more important. It’s the programmer’s version of that old chestnut: “Fast, good or cheap – pick two.”

Overall efficiency

So far, we have only been examining the efficiency of our algorithms by examining a single call to the isPrime function when passed a prime number. But that tells us nothing about the efficiency of the entire program. We don’t only call the function with prime numbers; instead, main calls the function with every number up to 100. So, what are the big-O efficiencies of our algorithms?

I can’t tell you. I don’t mean I won’t tell you, or that the answers don’t exist. I mean I simply can’t do the math.

But I can at least tell you what the math is.

Let’s formalize the problem. We use the isPrime function to check every integer between 0 and some sufficiently large number, which we’ll call N. Each individual number in that range that we check for primeness, we’ll call n. As above, the function uses a trial division algorithm that divides only by prime numbers. It halts either when an even divisor is found (if n is composite), or when the square root of the argument is reached (if n is prime). We consider only the number of trial divisions, and assume each trial division is an O(1) operation.

We know the efficiency when the function is passed a prime number, but these cases are only a small number of the total cases (vanishingly small, by the prime number theorem) for large values of N. In fact, fully half of our numbers (the even numbers) are evenly divisible by 2, and the function halts after a single operation.

But what about the rest? Any number that halts after a trial division by 3 has halted after two operations. But we can't say there are N / 3 such numbers, because many of them (half, in fact) will have already halted after the trial division by 2. Similarly, if we count the function calls that halt after three operations, we can only count the numbers that are divisible by 5, but not by 2 or 3.

This shows us how to proceed. For the function to halt after i trials when passed n, the i-th prime number needs to be the lowest prime divisor of n. In mathematical notation, "the i-th prime number" is denoted pᵢ.

Now the question is this: for each i, how many numbers up to N have pᵢ as their lowest prime divisor?

Our thoughts about 2 and 3 give us the answer. For i = 1, where pᵢ = 2, the lowest prime divisor is two for every second number: half of the range. For i = 2, where pᵢ = 3, we want the numbers divisible by 3 but not by 2 — every other multiple of 3, so the lowest prime divisor is three when n = 3, 9, 15, 21, ...; that is one number out of every 6 = 3 * 2. For i = 3, where pᵢ = 5, we want the numbers divisible by 5 but not by 2 or 3. Out of every 30 = 5 * 3 * 2 consecutive numbers, exactly two qualify (those leaving remainder 5 or 25 when divided by 30), so the lowest prime divisor is five when n = 5, 25, 35, 55, 65, 85, 95, .... And so on.

In mathematical terms, the number thirty is the primorial of 5. A primorial is similar to a factorial (hence the name), but you only include prime numbers. For example, the primorial of thirteen is 13 * 11 * 7 * 5 * 3 * 2 = 30030. The notation for the primorial of the i-th prime number is pᵢ#.

This means that for any large N, the number of values of n for which pᵢ is the lowest prime divisor is roughly N * (p₁ - 1)(p₂ - 1) ··· (pᵢ₋₁ - 1) / pᵢ#: the numerator counts the qualifying remainders in each block of pᵢ# consecutive numbers. (Check it against the cases above: N / 2 for i = 1, N / 6 for i = 2, N * 2 / 30 for i = 3.) For each of those values, the function performs i trial divisions before halting.

So, for example, when N = 100 and i = 3, there are roughly 100 * 2 / 30 ≈ 7 numbers that have five as the lowest prime divisor: 5, 25, 35, 55, 65, 85, 95. We do not count 5 itself, since it is a prime number; for the other six, the function returns after three trial divisions.

This gives us a formula for our algorithm's efficiency. We sum up i * N * (p₁ - 1)(p₂ - 1) ··· (pᵢ₋₁ - 1) / pᵢ# for i > 0, stopping when pᵢ > √N. Here it is in mathematical notation, to the degree that I can format it in HTML:

   Σ     i · N · (p₁ - 1)(p₂ - 1) ··· (pᵢ₋₁ - 1) / pᵢ#
pᵢ ≤ √N

…And that is where my math skills fail me. I’m sure there is a way to calculate this sum, or at least approximate it using integration, but I have no idea what it is.

Hopefully the mathematicians among you will be able to figure it out.

About Karl

I live in the Boston area, and am currently studying for a BS in Computer Science at UMass Boston. I graduated with honors with an AS in Computer Science (Transfer Option) from BHCC, acquiring a certificate in OOP along the way. I also perform experimental electronic music as Karlheinz.