Understanding Basic Ideas Behind Optimised Code and Erastosthenes Sieve

As I have stated before, I'm not programmer by any means – I just don't have the ability, but I do appreciate the skills required to be one.

I also appreciate there has been a lot of maths, engineering, systems analysis and other technical brainpower that has, historically, gone into the theoretical and practical aspects of computer science, that decipher what factors make a program run well; much depending on what trade offs have to be made between importance of outcome over timescales, equipment requirements, and costs thereof in terms of man hours and money, for example.

I only got thinking about these aspects of program efficiency when playing around with the Java and C language prime number programme examples I have used in prior posts, and finding similar versions for Python, and was astounded by the performance differences in how much faster the Python programmes "apparently" ran for such small programmes that are only running 1 or 2 line "for" or "while" loops of some sort – mistakes by me possibly accounting for this behaviour…

This made me look at a good WikiP page on the many factors that contribute to a programmes performance, which I found quite an easy and interesting read:


If there had been a few seconds between all 3 language programmes, I would not have thought more of it, but the Python programmes are counting primes up to the 100,000 range in 1-5 seconds, compared to the C and Java versions taking 45-60 seconds.

That seems a considerable time difference, even to me – a complete programming "know-nothing", though I DO know that each program is not exactly the same, but are the differences that great – even for programs of such few lines?

All are run from the command line on the same linux Mint laptop. Should there really be that much difference? Yes, of course! A BIG difference.

If one line in a program is a loop of some kind then immediately, you have all sorts of time consuming iteration related possibilities. If there are 2 loops, and maybe interdependent, you can imagine where that may lead.

First of course, how do you make different programmes as "equivalent" to each other as possible, in terms of available syntax, functions or other script based features – before even thinking about behind the scene actions like available libraries etc. to get all text script versions approximately equal, before you can "time" them running the same operation, once compiled?

Are they then even equivalent? Is there a method to be able to define this concept?

I thought at least it would be reasonably easy to translate almost a line for line equivalent for each programme, but, not being a programmer, this is not even an easy start.

First, I have to choose a reference language version of a chosen programme, that is written on as few lines as possible, that can do a reasonably complex iteration, such as a calculation to generate prime numbers, then try to get at least "textual" equivalents for all 3 languages – C, Java and Python.

One definite design requirement is to be able to input manually from the keyboard, the maximum range of numbers to be run, as in the C programme version that contains the printf and scan functions for input, and extra an printf for understanding operation below:

#include <stdio.h>
int main()
int i,j;
int a;
int input;
printf("Input prime limit eg < 1000..");
scanf("%d", &input);
for( i=2; i<=input; i=i+1 )
for(j=2; j<i; j=j+1)
if (i%j == 0)
if (a == 0)
printf("%d ", i);

return 0;

This outputs:

Input prime limit eg < 1000..11
Prime 2
Prime 3
Prime 5
Prime 7
Prime 11

With an extra printf's in the for 2nd loop to show the divisor j function:

printf("i is %d\n; j is %d\n ", i, j);

i is 3 j is 2
i is 4 j is 2
i is 4 j is 3
i is 5 j is 2
i is 5 j is 3
i is 5 j is 4
i is 6 j is 2
i is 6 j is 3
i is 6 j is 4
i is 6 j is 5
i is 7 j is 2
i is 7 j is 3
i is 7 j is 4
i is 7 j is 5
i is 7 j is 6

It can be seen that this is very repetitive design, using a relatively "inefficient" algorithm, as each test prime is checked by the divisor starting back at 2 for each increment in the test number so will get progressively "exponentially slower" the larger each test number becomes.

Putting printf's strategically shows values at each loop cycle aiding understanding of a programmes operation.

Rather than spend time trying to modify my own bad examples, I thought it better to search the web for various examples that have what I want already.

Now the notion of "inefficient" programme design is seen, in the form of my old C example, (written by my lecturer at the time to try to point out some aspect that was lost on me anyway), compared to this similar version below, that I found here:


That programme can actually output up to 10000 range primes in a few seconds on my 2007 dual core laptop, similar to Python, whereas my program takes about 45 seconds!

99839 99859 99871 99877 99881 99901 99907 99923 99929 99961 99971 99989 99991
real 0m7.077s
user 0m1.404s (AMD A8 quad core)
sys 0m0.007s

The Python version here is from the same site, so the guy has done the language conversion work for me, but this program takes ages!! It more like it just printing out consecutives to 100,000!


# Python program to ask the user for a range and display all the prime numbers in that interval

# take input from the user

lower = int(input("Enter lower range: "))

upper = int(input("Enter upper range: "))

for num in range(lower,upper + 1):

# prime numbers are greater than 1

if num > 1:

for i in range(2,num):

if (num % i) == 0:





So, even at this point it's about the programming tools AND overall method used to find primes in the first place, because there are more ways to do the same thing, and from the WikiP article, some more "elegant", more logical, faster, easier to understand, more resource intensive…etc.

First I'll see if I can trim my C program down from unwanted lines, etc. comparing it to the Java version and make them as "textually" and functionally equivalent as I'm able.

it is possible to automatically "time" a program by using "time" before the command, compiled prog or BASH script e.g.:

time python primes1.py

time \./optsievebash.sh 999999


#include <stdio.h>

int main()


int i,j,a,input;

printf("Input prime limit eg < 1000..");

scanf("%d", &input);

for( i=2; i<input; i=i+1 )



for(j=2; j<i; j=j+1)

if (i%j == 0)


if (a == 0)

printf("%d ", i);


printf(" Done\n");

return 0;



99907 99923 99929 99961 99971 99989 99991
real 0m54.334s
user 0m47.852s (on Acer dual core T550 CPU, 2G ram)
sys 0m0.284s
stevee@Mint5630 ~/Documents/BashScripts $ time \./prime.sh
Input prime limit eg < 1000..100000

user 0m28.256s (on HP AMD8 quad core CPU, 8G ram) 


apt-get install openjdk-7-jdk

javac primes.java

time java primes


//** primes.java

import java.util.Scanner;

public class primes


public static void main ( String args[ ] )


Scanner stdin = new Scanner ( System.in) ;

int i,j,a;

System.out.println("Input prime upper limit eg < 1000..");

int limit = stdin.nextInt();

for( i = 2 ; i <= limit ; i = i + 1 )



for( j = 2 ; j < i ; j = j+1 )

if ( i % j == 0)


if (a == 0)

System.out.print( "," + i );


System.out.print( " Done\n" );




99907,99923,99929,99961,99971,99989,99991 Done

real 1m3.083s
user 0m49.248s
sys 0m0.436s
stevee@Mint5630 ~/Documents/BashScripts $ time java primes
Input prime upper limit eg < 1000…100000

user 0m31.637s (AMD A8)

So, not a lot in it between those two above – 47.852s and 49.248s in real time resp.

What was interesting was watching how each displayed the output. The C prog seemed to freeze periodically, where it stored then displayed chunks of output in stages, but the Java prog seemed faster to watch, as the numbers were output continuously, looking like it was running faster, but it wasn't overall.

Now, to find how to write the " equivalent in Python. I found this for now:


primes.py (131 secs)



99907 99923 99929 99961 99971 99989 99991 Done
real 2m23.075s
user 2m17.060s
sys 0m0.560s
stevee@Mint5630 ~/Documents/BashScripts $ time python primes.py
What number should I go up to? 100000

user 4m19.020s (AMD A8) ??? Can't python utilise 4 cores?

(Ah, I'm running an i386 kernel from an external drive on a quad core! Duh…)

Now it's surprising! This python script may not be as exactly "equivalent" as the other two, but it takes over two mins and ten secs to print the same numbers, using what "appears" (to me) the same basic loops and method, but is slower by half, at least. (And even worse with the quad core?)

My initial assumption about Python that got me into this, now seems completely wrong, or did I mistakenly run a "different" type of python prime generator when tired last night? It seems so, as you will see.

Of course, there may be a lot more to it than such a crude comparison – if that it what this is – because I have no technical knowledge how the different languages are implemented in linux – memory space usage, default niceness values etc.

Although the scripts look like they are doing the same job, in terms of calculation and iterations, and all have the two nested "for" loops – are they really? Is the output identical in terms of producing correct primes?

Judging by the last few line of numbers in the 999xx range, the programs are outputting the same numbers, so without further checks I'll assume they are doing the same tasks.

Even so, how did I think python was MUCH faster last night? Well, some algorithms are much faster than others, so it's also about the mathematical method or calculations used to produce primes in the first place, before it is even translated into a script.

One of the oldest known methods – and fastest for programs it seems – is called the Sieve of Erastosthenes, and you want to understand it's logic first before understanding the translation to code:



"First, strike out ("cross out") 1, because 1 is not a prime. Then, strike out all the even numbers except for 2 itself … because all even numbers greater than 2 divide by two, so cannot be prime. Next, find the smallest number which has not been struck out, i.e., 3.  Now strike out every multiple of 3 (except 3 itself), i.e. every third number. Moving on, we find that 4 has been eliminated, but 5 has not. So we strike out all multiples of 5 (except 5 itself), and move on again, repeating in this fashion. Continue until you have eliminated all the composite numbers.

The numbers that remain are the primes!  

You may have a grid that looks something like this:

The Sieve of Erathosthenes on the Number Square

This grid shows that 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 are prime.

You may have noticed that all composite numbers in the grid above (the crossed-out numbers) are divisible by 2, 3, 5 or 7. These are the primes which are less than the square root of 100, i.e. 10. In other words, it was only necessary to strike out the powers of 2, 3, 5 and 7 to complete the task for N=100. Can you see why this is? (Hint: The smallest composite number that is not divisible by 2, 3, 5 or 7 is 11 x 11 = 121).

Writing an algorithm

Erathosthenes' sieve works by repeating a set of simple steps: it is one of the very first examples of an algorithm. Computers are excellent at implementing algorithms, as they work extremely fast and do not seem to get bored! Let's consider how we could make the Sieve of Erathosthenes work in a computer code. First, we could write the steps in the algorithm out in words:

Set the size of the sieve, n

Make a list of all the numbers, 0 up to n. (We will use the numbers 0 to n, rather than 2 to n, so that the list index is equal to the number, for convenience).

Strike out the multiples of 2 (except 2 itself)

Find the next smallest number which is not struck out (i.e. p = 3)

Strike out the multiples of p (except p itself)

Repeat from step 4 until p is larger than the square root of n.

Build a list of primes from the remaining numbers.

Now, we can try coding these steps in Python.

Let's start with steps 1 to 3. We want to strike out the multiples of 2 by setting them to zero. But how can we keep track of whether a number has been struck out or not? One way is to use a list (a, say) with n+1 elements, representing the status of numbers 0 to n. We will strike out a number (i, say) by setting a[i] equal to 0. Let's try writing some lines of code to do this:

n = 10000 # Set the size of the sieve

a = [1]*(n+1) # list of length n+1, every element equal to 1 (not crossed out)

p = 2 # 2 is the first prime

j = 2*p # This is the first multiple of p

while j <= n: # a loop which continues until we exceed the size of the sieve

a[j] = 0 # "cross out" the multiple by setting a[j] equal to zero

j += p # move on to the next multiple of p

print a[:100] # check the output"




I downloaded the code for sieve1.py from the Sheffield site and amended it for a 100,000 range first, then 1M later:


# A simple Python code to implement the Sieve of Eratosthenes

def apply_sieve(n):

# n is the size of the sieve

# Key Idea:

# If a[i] == 0, then number i has been "crossed out",

# if a[i] == 1, then the number i is not (yet) crossed out.

a = [1]*(n+1) # Start with a list of 1s, of length (n+1).

a[0] = 0 # set to zero, as

a[1] = 0 # neither 0 nor 1 are primes

p = 2 # 2 is the first prime

pmax = int(round(n**0.5)) + 1 # we only need to sieve up to square root of n.

while p < pmax:

while a[p] == 0:

p += 1

j = 2*p

while j < n:

a[j] = 0

j += p

p += 1

return [p for p in range(n+1) if a[p] == 1] # return the list of primes, which are the numbers we have NOT crossed out.

N = 100000 # Look for primes in the first hundred thousand numbers.

primes = apply_sieve(N)

print "The first 100000 primes:"

print primes[:100000]


This spits out primes to 1,000,000, (1M!) using 2 nested "while" loops, in under 2 seconds real time! Superfast!

999863, 999883, 999907, 999917, 999931, 999953, 999959, 999961, 999979, 999983, 1000000]

real 0m1.658s
user 0m0.600s
sys 0m0.056s
stevee@Mint5630 ~/Documents/BashScripts $ time python sieve1.py

real 0m22.551s (AMD A8 to 10M)

real 3m46.300s (AMD A8 to 100M)

This seems to be the fastest algorithm for this problem by far. Equivalents for C and Java should be even faster.

The C and Java equivalents now have to be found/written, and their range extended to 1M also, to get a measurable difference in performance for each using this sieve method.


That Java code link does 1M in about 3 secs.


import java.util.Scanner;

class SieveOfEratosthenes {

public void primes(int n) {

boolean[] primes = new boolean[n + 1];

for (int i = 2; i < primes.length; i++) {

primes[i] = true;


int num = 2;

while (true) {

for (int i = 2;; i++) {

int multiple = num * i;

if (multiple > n) {


} else {

primes[multiple] = false;



boolean nextNumFound = false;

for (int i = num + 1; i < n + 1; i++) {

if (primes[i]) {

num = i;

nextNumFound = true;




if (!nextNumFound) {




for (int i = 0; i < primes.length; i++) {

if (primes[i]) {





public static void main(String[] args) {

Scanner scanner = new Scanner(System.in);

System.out.print("Enter an upper limit, say 10000: ");

int n = scanner.nextInt();

SieveOfEratosthenes sieve = new SieveOfEratosthenes();






real 0m8.424s
user 0m0.940s
sys 0m0.664s
stevee@Mint5630 ~/Documents/BashScripts $ time java SieveOfEratosthenes


Hmm…I just noticed the times indicated are completely wrong here – either user or real values are way out from what it actually took – about 3 secs…maybe the linux clock has gone bonkers…(no – the java prog makes the clock go wrong! I recompiled on the AMD A8 and it takes  2 secs to 1M.


real 0m12.550s
user 0m1.299s
sys 0m0.641s

So, a similar one for C anywhere? (That actually compiles?!! 7 different web scripts tried! Programmers not checking – AGAIN!! Jeez! How many times with you people…?)

Here's one working, amended for 100k range

(about 12 secs to 100,000)


/* Sieve Of Erathosthenes by Denis Sureau */

#include <stdlib.h>

#include <stdio.h>

void eratosthenes(int top)


int all[1000000];

int idx = 0;

int prime = 3;

int x, j;

printf("1 ");

while(prime <= top)


for(x = 0; x < top; x++)


if(all[x] == prime) goto skip;


printf("%d ", prime);

j = prime;

while(j <= (top / prime))


all[idx++] = prime * j;

j += 1;








int main(int argc, char **argv)


if(argc == 2) eratosthenes(atoi(argv[1]));

else eratosthenes(100000);

return 0;



99973 99989 99991

real 0m12.753s
user 0m12.648s
sys 0m0.072s
stevee@Mint5630 ~/Documents/BashScripts $ time ./SieveOfEratosthenes.sh

real 0m11.440s (AMD A8)

The last one is not in the same league as the prior Java and Python scripts as its range is only 100K, not 1M, and it takes 14 secs to do it. Slow.

And just for completeness, but a REALLY bad performance, a BASH script:


Invoke this script without any arguments to start off the whole chain; it invokes itself with a dummy argument to create each child process. The sequence of primes is sent, one to a line, to standard output. When you've seen enough, hit CTRL/C, and the whole chain of processes should shut itself gracefully down. Note the use of bc(1) to handle all the arithmetic; this is arguably a little optimistic. (yeah! Really! Steve…)

This optimised BASH below, does 1M range in 43 secs – the range is appended to the chmod 755 file: ./optsievebash.sh 1000000

If you want to time this one, you have to escape to dot char – the results for the 1M range BASH script below:

999907 999917 999931 999953 999959 999961 999979 999983
real 0m45.148s
user 0m43.196s
sys 0m0.292
stevee@Mint5630 ~/Documents/BashScripts $ time \./optsievebash.sh 999999

real 0m49.753s


# Optimized Sieve of Eratosthenes
# Script by Jared Martin, with very minor changes by ABS Guide author.
# Used in ABS Guide with permission (thanks!).

# Based on script in Advanced Bash Scripting Guide.
# https://tldp.org/LDP/abs/html/arrays.html#PRIMES0 (ex68.sh).

# https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf (reference)
# Check results against https://primes.utm.edu/lists/small/1000.txt

# Necessary but not sufficient would be, e.g.,
# (($(sieve 7919 | wc -w) == 1000)) && echo "7919 is the 1000th prime"

UPPER_LIMIT=${1:?"Need an upper limit of primes to search."}

Primes=( " $(seq ${UPPER_LIMIT}) )

typeset -i i t
Primes[i=1]=" # 1 is not a prime.
until (( ( i += 1 ) > (${UPPER_LIMIT}/i) )) # Need check only ith-way.
do # Why?
if ((${Primes[t=i*(i-1), i]}))
# Obscure, but instructive, use of arithmetic expansion in subscript.
until (( ( t += i ) > ${UPPER_LIMIT} ))
do Primes[t]=; done

# echo ${Primes[*]}
echo # Change to original script for pretty-printing (80-col. display).
printf "%8d" ${Primes[*]}
echo; echo

exit $?


here's another (slow) BASH script:

9871 9883 9887 9901 9907 9923 9929 9931 9941 9949 9967 9973

Total 1229 Primes generated
real 0m15.878s
user 0m2.164s
sys 0m0.712s
Mint5630 stevee # time \./sievebash.bash 0 9999



Amended printf %5 for %7 chars for larger range runs:

99871 99877 99881 99901 99907 99923 99929 99961 99971 99989 99991

Total 9592 Primes generated
real 4m1.031s

Conclusions? Not many…research and draw your own – too many factors involved for an accurate summation, but the main thing from the above examples, is what math algorithm/method is used primarily first to generate the numbers, then how that is scripted – whether while or for loops are used etc, then what idiosyncrasies hide in how the chosen language handles the code within it's own and the OS framework and what "nice" runtime level priorities the defaults for each are….etc..etc…a Computer Systems degree/masters  level investigation to know/do there! BASH is seemingly out of the running outside of academia for running "useful" programmes compared to dedicated languages.

Interesting though – imagine the possible optimisation potential that exists in programs millions of lines of code long – aircraft fly by wire systems etc? Mind boggling eh…?

Addend: I have since found 3 very similar sieve progs for python, java and C and run them to the number limits shown with the time taken for each – all almost the same:

python 10000019

real 0m22.628s
user 0m8.167s
sys 0m0.491s

java 10000019

real 0m22.362s
user 0m4.273s
sys 0m4.884s

C 10000019

real 0m22.664s
user 0m6.513s
sys 0m4.233s

/* sieve.c – It prompts the user to enter an integer N. It prints out
* It prints out all the primes up to N included.
* It uses a sieve method. As primes are found they are
* stored in an array PRIMES and used as possible factors
* for the next potential prime.

#include <stdio.h>

#define NPRIMES 1000
#define FALSE 0
#define TRUE 1
int main(void) {
int n;
int i,j;
int flag;
int primes[NPRIMES]; /*It will contain the primes smaller than n
*that we have already encountered*/
int level; /*1+Number of primes currently in PRIMES*/

printf("Enter value of N > ");
level = 0;

/*Main body*/
for(i=2;i<=n;i++) {
for(j = 0, flag = TRUE; j<level && flag; j++)
flag = (i%primes[j]);
if (flag) { /*I is a prime */
printf("%d\n", i);
if (level < NPRIMES)
primes[level++] = i;