Efficiently find primes below a given number. An often required task when solving programming puzzles and challenges. Runtime is discussed. Example algorithms are provided.
As a programmer who likes math and puzzles, I’ve spent a lot of time answering questions on Project Euler. While writing solutions to the first fifty problems, I often needed to find all primes below a given number. The
go-lang code for this article can be found here.
Here’s the first algorithm I wrote to find primes less than
Ok, you got me, that wasn’t actually the first one, but it was the first one that worked, yet it didn’t work for long. As soon as the input numbers got large, its runtime grew in, what I later learnt was, linear time.
Linear time, in my own words, is when the runtime of an algorithm grows at the same rate as the size of the input. Let’s take a look at what kinds of numbers a linear time algorithm can handle:
The table above shows how long it takes to generate all the primes under
n using the above algorithm.
The algorithm works for numbers up to about ten thousand; after that, we see the algorithm takes a few seconds for numbers around one-hundred thousand and several minutes for numbers of size one million or larger.
My original solution didn’t run anywhere close to fast enough when questions involved numbers around
one million. Luckily, after some research, I came across the following shortcut:
To determine if a number
pis prime, one only has to check if it has any divisors from 2 to sqrt(n) inclusive. If
bare factors of
a <= bthen
amust be less than or equal to sqrt(p).
Since it’s only necessary to check numbers below
sqrt(p), I wrote a new algorithm that ran much faster:
I mentioned linear time previously. This new algorithm bounded by
sqrt(n) runs in
sqrt(n) time. And, if we want to use this improved algorithm to find all the primes under
n, we’ll have to call it
n times making the total runtime
n * sqrt(n).
|10,000||56 ms||1.3 ms|
|100,000||4.5 sec||36 ms|
|1,000,000||5min 40sec||680 ms|
Now, here’s where things get interesting. The above
sqrt(n) shortcut made my code fast enough that I could successfully use it whenever generating primes was required. It was fast enough that for a long time I never searched for anything faster. Yet, in form posts on the topic, people always mentioned something called the
Sieve of Eratosthenese.
The Sieve strategy for finding primes less than a given number is slightly different then what I’ve done so far in this article. Instead of looping over numbers from
0 to n and checking for divisors from
2 to sqrt(n), we start at the number
2 and remove all multiples of
4 to n. Then, we remove all multiples of
3 and so on. Once all multiples have been removed, the numbers left over are primes.
In terms of speed, I’d heard a Sieve was faster, but I never considered how much faster it really was until writing this post.
Here’s the code for a prime number Sieve:
Let’s look at the runtimes in a table compared to the
Improved algorithm from before:
|10,000||56 ms||0.1 ms|
|100,000||4.5 sec||1.5 ms|
|1,000,000||5min 40sec||18 ms|
The runtimes show that a sieve is significantly faster. Let’s think about why that might be. My original non-sieve algorithm in the worst case scenario would have to check
n times. If we say that
n is one million, that’s
one billion operations in the worst case.
Now lets think about how many operations the sieve method does. If our number
n is once again one million, we would loop over
(1 000 000 / 2) multiples of 2,
(1 000 000 / 3) multiples of 3, and so on. This forms the following sequence:
1m/2 + 1m/3 + … + 1m/1m or
1m * (1/2 + 1/3 + … + 1/1m).
Here’s a python program to find that sum:
sum([1/x for x in range(2, 1000000+1)])
The answer is
~13.4 million. That’s only
1.34% of the operations required by the non-sieve method. Based on the measurements above the sieve method is about
30 times faster when
n is one million and does only
1.34% of the work.
Now, we might ask, in the time it takes the non-sieve method to generate all the primes under one million (~680ms) how many primes can the sieve method find.
Let’s use a binary search technique to find this number. In the table below, I start by asking my code how long it takes to find all the primes under 100 million. I’ll determine the next value I search based on how much time it took for the previous code to run. With a search space of 100million, it would take log2(100m) = 26 queries to find a final value.
|1||100 million||10 sec||Too high|
|2||50 million||3 sec||Too high|
|3||25 million||1.6 sec||Too high|
|4||12.5 million||702 ms||Too high|
|5||6 million||227 ms||Too low|
|6||9.25 million||409 ms||Too low|
|7||10.8 million||522 ms||Too low|
|8||11.65||622 ms||Too low|
|9||12 million||~680 ms||-|
We can find all primes under 12 million while a non-sieve algorithm finds only those under one million! There’s
788060 primes under 12 million, and, if we wanted to let our algorithm run for 10 sec with
n = 100 million we could find the more than Five million primes! (specifically
A Sieve can be implemented in only a few lines of python. So, a Sieve is just as quick to implement as the other methods.
In this article, we found that a loop-based prime finding algorithm works well enough for low numbers (under one million). However, a Sieve can be equally quick to implement, runs an order of magnitude faster, and can find primes as large as five million in ~10sec.