## Finding Prime Numbers

I have no intention of competing with GIMPS. To me this is a fun scripting exercise that may produce useful results beyond this initial application. Our goal is to take a bunch of integers and pick out the prime numbers. There are a couple of simple ways to spot a prime number from shell. One is to use perl (or python, or java, etc.) with the awesome “/^1?$|^(11+?)\1+$/” regex (Abigail, 1998).

perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)+$/' [number]

Another way is to use the “factor” utility (part of the coreutils package) that decomposes integers into prime factors. You can wrap that command in a small script called “primefind”:

#!/bin/bash if [ ! "" ] ; then exit 1 ; else n="" ; fi if [ `BC_LINE_LENGTH=0 bc <<<"${n}" | factor | awk '{print NF}'` -eq 2 ] ; then echo "${n}" ; fi

Here are examples of using both methods to find all prime number between 1 and 2^4:

# for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do echo $i | primefind $i; done 2 3 5 7 11 13 # for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done 2 3 5 7 11 13

As far as performance is concerned, the second method is considerably (about three times) faster. The next step is to increase the range, but here we may run into our first unexpected result: a Perl bug with the regex iteration limit. The bug is known, but seems to re-appear in various Perl versions, including some of the latest:

for i in $(seq 67867333 67867967) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done Complex regular subexpression recursion limit (32766) exceeded at -e line 1.

The Perl bug aside, this elegant regex approach will eventually grind to a halt as it tries every possible factorization as a pattern match. The regex makes

*n*attempts to match a string of length

*n*against each given pattern. And, as unary expansion, this gives us 2^(2n).

Consider the following shell script (geirha, 2008):

#!/bin/bash n=$1 function sqrt() { local i=1 j=1 n=$1 while (( i*i <= n )) do ((j=i, i++)) done echo "$j" } function isprime() { local i=3 n=$1 h=0 (( n >= 2 )) || return (( n == 2 || n == 3 )) && return (( n % 2 == 0 )) && return 1 h=$(( $(sqrt $n) + 1 )) while (( i < h )) do (( n % i == 0 )) && return 1 (( i += 2 )) done return 0 } if isprime $n ; then echo $n ; fi

While a bit more complex than the Perl regex, this approach is a lot faster for large integers. We can couple it with xargs to take advantage of multi-core CPUs:

time seq 67857333 67867967 | xargs -n1 -P`grep -c processor /proc/cpuinfo` /var/adm/bin/prime3.sh ... real 0m20.389s user 10m14.600s sys 0m23.247s

[spoiler title=”Uncork your CPUs”]

To get the most out of your CPUs, you may want to disable automatic CPU frequency scaling (*kondeman* daemon):

for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do [ -f $i ] || continue; echo -n performance > $i; done

[/spoiler]

Going back to the *factor* utility, here’s a much faster implementation:

time seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | xargs -n100 -P`grep -c processor /proc/cpuinfo` factor | awk '{gsub(":","")}; NF==2{print $1}{}' ... real 0m0.238s user 0m0.460s sys 0m1.035s

You may get even better time using

*parallel*and passing multiple arguments per line (option

*-N*). The advantage of

*parallel*over

*xargs*in this case is the ability to distribute load across multiple compute nodes on the network.

seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | parallel --gnu --will-cite -j10% -P10% -N256 factor | awk '{gsub(":","")}; NF==2{print $1}{}'

With very large numbers,

*seq*becomes useless:

seq `BC_LINE_LENGTH=0 bc <<<"2^64"` `BC_LINE_LENGTH=0 bc <<<"2^110"`|more 18446744073709551616 18446744073709551616 18446744073709551618 18446744073709551620 18446744073709551620 18446744073709551620 ...

Brace expansion will not work either:

echo {1..10} 1 2 3 4 5 6 7 8 9 10 echo {18446744073709551644..18446744073709551656} {18446744073709551644..18446744073709551656}

Shell arithmetic is out of the question as well (and, contrary to popular belief, the answer to life, the universe and everything is not 42, but it’s close.):

i=18446744073709551656 echo $i 18446744073709551656 ((i++)) echo $i 41

These are all unfortunate consequences of the range limitation of a 64-bit signed long int. To a point, the answer is to stick with

*bc*, until you hit this limit:

BC_LINE_LENGTH=0 bc <<<10^10^10 Runtime error (func=(main), adr=14): exponent too large in raise

The

*dc*doesn’t work either:

dc <<< '10 10000000000^pq' Runtime error: exponent too large in raise 1

And we can’t use Bash exponentiation:

var=$((10**10**10)) echo $var 0

Don’t be angry with Unix: considering that the observable universe consists of about 10

^{11}galaxies, 10

^{21}stars, 10

^{78}atoms, and 10

^{88}photons, the 10^10^10 limit of Bash seems rather generous. Luckily, here is Hypercalc, either Perl or JavaScript version. The large-number limit of Hypercalc in Knuth up-arrow notation is 10↑↑(10

^{10}):

yum -y install perl-Time-HiRes ; mkdir -p /var/adm/bin ; wget -O /var/adm/bin/hypercalc.pl "http://mrob.com/pub/comp/hypercalc/hypercalc.txt" ; chmod 755 /var/adm/bin/hypercalc.pl ; ln -s /var/adm/bin/hypercalc.pl /usr/bin/hypercalc

But what to do with it is a subject for later.

According to Wikipedia, the largest known prime number is 2^{74,207,281} − 1 found by GIMPS in January of 2016. This is still well within *bc* and *factor* limitations, so, theoretically at least, running the following command will eventually confirm this finding:

BC_LINE_LENGTH=0 bc -l <<< 2^74207281-1 | factor | awk '{gsub(":","")}; NF==2{print "prime"}{}'

However, how long would this take? Well, 2

^{74,207,281}− 1 contains 22,338,618 digits, which puts it between 10^10^7 and 10^10^8 in the following attempt to estimate approximate time required to run the command above:

for i in 1 2 3 4 5 6 7 8 9 ; do echo "------------------" ; echo "10^10^$i" ; time -p BC_LINE_LENGTH=0 bc -l <<< 10^10^$i | factor >/dev/null 2>&1 ; done

The time I got for each iteration on a 2.9GHz Xeon E5-2690 and 96GB of RAM is as follows:

x seconds 10^10^1 0.01 10^10^2 0.01 10^10^3 0.01 10^10^4 0.03 10^10^5 1.11 10^10^6 101.66 10^10^7 yawn... 10^10^8 yawn... 10^10^9 yawn...

As you can see, I had no patience to get past 10^10^6. Still, based on those few data points and assuming exponential growth (a good assumption in this case), I got y = 1.151806818·10

^{-8}e

^{3.815837901 x}, where x is 10^10^x and y is estimated time to execute “bc <<< 10^10^x | factor”. Using this equation, we can see about how long it would’ve taken my computer to confirm that 2

^{74,207,281}− 1 is in fact a prime number:

x seconds 10^10^7 4589.45 10^10^8 208429.00 10^10^9 9465770.00

So, I would have needed somewhere between 1.3 hours to 2.4 days, but closer to hours. And that’s just to confirm a known prime number. As you may imagine, actually finding one in that range would require lots of CPUs or an equivalent amount of luck.

So what about the elusive 10^10^10 – how long would it take me to confirm a prime number in that range, when one is inevitably found? Well, the Wolfram Alpha I’ve been using to solve the above equation also fails at that point and probably for the very same reason. The aforementioned Hypercalc, however, has no such problem:

hypercalc "1.151806818*10^(-8)*e^(3.815837901*10)/60/60/24/365" 13.631588907193

So, just over thirteen-and-a-half years.