Sieve of Eratosthenese

Efficiently find primes below a given number. An often required task when solving programming puzzles and challenges. Runtime is discussed. Example algorithms are provided.

As a programmer who likes math and puzzles, I’ve spent a lot of time answering questions on Project Euler. While writing solutions to the first fifty problems, I often needed to find all primes below a given number. The go-lang code for this article can be found here.

My First Solution

Here’s the first algorithm I wrote to find primes less than n:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
func isPrime(n int) bool {

    if n <= 1 { return false }
    if n == 2 { return true }

    // numbers with divisors are not prime
    for i := 2; i < n; i++ {
        if n % i == 0 {
            return false
        }
    }

    return true
}

Ok, you got me, that wasn’t actually the first one, but it was the first one that worked, yet it didn’t work for long. As soon as the input numbers got large, its runtime grew in, what I later learnt was, linear time.

Linear time, in my own words, is when the runtime of an algorithm grows at the same rate as the size of the input. Let’s take a look at what kinds of numbers a linear time algorithm can handle:

n time
10,000 56 ms
100,000 4.5 sec
1,000,000 5min 40sec

The table above shows how long it takes to generate all the primes under n using the above algorithm.

The algorithm works for numbers up to about ten thousand; after that, we see the algorithm takes a few seconds for numbers around one-hundred thousand and several minutes for numbers of size one million or larger.

Improved Algorithm

My original solution didn’t run anywhere close to fast enough when questions involved numbers around one million. Luckily, after some research, I came across the following shortcut:

To determine if a number p is prime, one only has to check if it has any divisors from 2 to sqrt(n) inclusive. If a and b are factors of p and a <= b then a must be less than or equal to sqrt(p).

Since it’s only necessary to check numbers below sqrt(p), I wrote a new algorithm that ran much faster:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
func isPrimeImproved(n int) bool {

    if n <= 1 { return false }
    if n == 2 { return true }

    // a number is prime if it has no divisors
    // between 2 and sqrt(n) inclusive
    for i := 2; i <= int(math.Sqrt(float64(n))); i++ {
        if n % i == 0 {
            return false
        }
    }

    return true
}

I mentioned linear time previously. This new algorithm bounded by sqrt(n) runs in sqrt(n) time. And, if we want to use this improved algorithm to find all the primes under n, we’ll have to call it n times making the total runtime n * sqrt(n).

n Original Improved
10,000 56 ms 1.3 ms
100,000 4.5 sec 36 ms
1,000,000 5min 40sec 680 ms

Things get Interesting

Now, here’s where things get interesting. The above sqrt(n) shortcut made my code fast enough that I could successfully use it whenever generating primes was required. It was fast enough that for a long time I never searched for anything faster. Yet, in form posts on the topic, people always mentioned something called the Sieve of Eratosthenese.

The Sieve of Eratosthenese

The Sieve strategy for finding primes less than a given number is slightly different then what I’ve done so far in this article. Instead of looping over numbers from 0 to n and checking for divisors from 2 to sqrt(n), we start at the number 2 and remove all multiples of 2 from 4 to n. Then, we remove all multiples of 3 and so on. Once all multiples have been removed, the numbers left over are primes.

In terms of speed, I’d heard a Sieve was faster, but I never considered how much faster it really was until writing this post.

How Much Faster is a Sieve

Here’s the code for a prime number Sieve:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
func primeSieve(n int) []bool {

    // i-th number is prime if i-th element is true
    rtn := make([]bool, n)

    for i := range rtn {
        rtn[i] = true
    }

    for i := range rtn {

        // zero and one are non prime and
        // numbers with divisors are non-prime
        if i <= 1 {
            rtn[i] = false

        } else {

            // flag all non-prime values (multiples of k)
            var k int = i
            for {
                k += i
                if k >= n { break }
                rtn[k] = false

            }
        }
    }

    return rtn
}

Let’s look at the runtimes in a table compared to the Improved algorithm from before:

n Improved Sieve
10,000 56 ms 0.1 ms
100,000 4.5 sec 1.5 ms
1,000,000 5min 40sec 18 ms

The runtimes show that a sieve is significantly faster. Let’s think about why that might be. My original non-sieve algorithm in the worst case scenario would have to check sqrt(n) numbers n times. If we say that n is one million, that’s one billion operations in the worst case.

Number of Operations

Now lets think about how many operations the sieve method does. If our number n is once again one million, we would loop over (1 000 000 / 2) multiples of 2, (1 000 000 / 3) multiples of 3, and so on. This forms the following sequence: 1m/2 + 1m/3 + … + 1m/1m or 1m * (1/2 + 1/3 + … + 1/1m).

Here’s a python program to find that sum:

sum([1/x for x in range(2, 1000000+1)])

The answer is ~13.4 million. That’s only 1.34% of the operations required by the non-sieve method. Based on the measurements above the sieve method is about 30 times faster when n is one million and does only 1.34% of the work.

Finding Large Primes

Now, we might ask, in the time it takes the non-sieve method to generate all the primes under one million (~680ms) how many primes can the sieve method find.

Let’s use a binary search technique to find this number. In the table below, I start by asking my code how long it takes to find all the primes under 100 million. I’ll determine the next value I search based on how much time it took for the previous code to run. With a search space of 100million, it would take log2(100m) = 26 queries to find a final value.

Step n Time high/low
1 100 million 10 sec Too high
2 50 million 3 sec Too high
3 25 million 1.6 sec Too high
4 12.5 million 702 ms Too high
5 6 million 227 ms Too low
6 9.25 million 409 ms Too low
7 10.8 million 522 ms Too low
8 11.65 622 ms Too low
9 12 million ~680 ms -

We can find all primes under 12 million while a non-sieve algorithm finds only those under one million! There’s 788060 primes under 12 million, and, if we wanted to let our algorithm run for 10 sec with n = 100 million we could find the more than Five million primes! (specifically 5761455 primes)

Python Code for a Sieve

A Sieve can be implemented in only a few lines of python. So, a Sieve is just as quick to implement as the other methods.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def primeSieve(n):
    # Returns a boolean array where
    # index 'i' is true if 'i' is prime

    p = [True for x in range(n+1)]
    p[0] = False
    p[1] = False

    for i in range(2, n+1):
        for k in range(2*i, n+1, i):
            p[k] = False

    return p

Conclusion

In this article, we found that a loop-based prime finding algorithm works well enough for low numbers (under one million). However, a Sieve can be equally quick to implement, runs an order of magnitude faster, and can find primes as large as five million in ~10sec.