Networking

Unix and Linux network configuration. Multiple network interfaces. Bridged NICs. High-availability network configurations.

Applications

Reviews of latest Unix and Linux software. Helpful tips for application support admins. Automating application support.

Data

Disk partitioning, filesystems, directories, and files. Volume management, logical volumes, HA filesystems. Backups and disaster recovery.

Monitoring

Distributed server monitoring. Server performance and capacity planning. Monitoring applications, network status and user activity.

Commands & Shells

Cool Unix shell commands and options. Command-line tools and application. Things every Unix sysadmin needs to know.

Home » Commands & Shells, Featured

Finding Prime Numbers

Submitted by on April 22, 2016 – 2:09 pm

I have no intention of competing with GIMPS. To me this is a fun scripting exercise that may produce useful results beyond this initial application. Our goal is to take a bunch of integers and pick out the prime numbers. There are a couple of simple ways to spot a prime number from shell. One is to use perl (or python, or java, etc.) with the awesome “/^1?$|^(11+?)\1+$/” regex (Abigail, 1998).

perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)+$/' [number]

Another way is to use the “factor” utility (part of the coreutils package) that decomposes integers into prime factors. You can wrap that command in a small script called “primefind”:
#!/bin/bash
if [ ! "" ] ; then exit 1 ; else n="" ; fi
if [ `BC_LINE_LENGTH=0 bc <<<"${n}" | factor | awk '{print NF}'` -eq 2 ] ; then echo "${n}" ; fi

Here are examples of using both methods to find all prime number between 1 and 2^4:
# for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do echo $i | primefind $i; done
2
3
5
7
11
13

# for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done
2
3
5
7
11
13

As far as performance is concerned, the second method is considerably (about three times) faster. The next step is to increase the range, but here we may run into our first unexpected result: a Perl bug with the regex iteration limit. The bug is known, but seems to re-appear in various Perl versions, including some of the latest:
for i in $(seq 67867333 67867967) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done

Complex regular subexpression recursion limit (32766) exceeded at -e line 1.

The Perl bug aside, this elegant regex approach will eventually grind to a halt as it tries every possible factorization as a pattern match. The regex makes n attempts to match a string of length n against each given pattern. And, as unary expansion, this gives us 2^(2n).

Consider the following shell script (geirha, 2008):

#!/bin/bash
n=$1
function sqrt() 
{
	local i=1 j=1 n=$1
	while (( i*i <= n ))
	do
		((j=i, i++))
	done
	echo "$j"
}

function isprime() 
{
    local i=3 n=$1 h=0
    (( n >= 2 )) || return
    (( n == 2 || n == 3 )) && return
    (( n % 2 == 0 )) && return 1
    h=$(( $(sqrt $n) + 1 ))
    while (( i < h ))
	do
        (( n % i == 0 )) && return 1
        (( i += 2 ))
    done
    return 0
}

if isprime $n ; then echo $n ; fi

While a bit more complex than the Perl regex, this approach is a lot faster for large integers. We can couple it with xargs to take advantage of multi-core CPUs:
time seq 67857333 67867967 | xargs -n1 -P`grep -c processor /proc/cpuinfo` /var/adm/bin/prime3.sh

...

real    0m20.389s
user    10m14.600s
sys     0m23.247s

[spoiler title=”Uncork your CPUs”]

To get the most out of your CPUs, you may want to disable automatic CPU frequency scaling (kondeman daemon):

for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor;
do [ -f $i ] || continue; echo -n performance > $i; done

[/spoiler]

Going back to the factor utility, here’s a much faster implementation:

time seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | xargs -n100 -P`grep -c processor /proc/cpuinfo` factor | awk '{gsub(":","")}; NF==2{print $1}{}'

...

real    0m0.238s
user    0m0.460s
sys     0m1.035s

You may get even better time using parallel and passing multiple arguments per line (option -N). The advantage of parallel over xargs in this case is the ability to distribute load across multiple compute nodes on the network.
seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | parallel --gnu --will-cite -j10% -P10% -N256 factor | awk '{gsub(":","")}; NF==2{print $1}{}'

With very large numbers, seq becomes useless:
seq `BC_LINE_LENGTH=0 bc <<<"2^64"` `BC_LINE_LENGTH=0 bc <<<"2^110"`|more

18446744073709551616
18446744073709551616
18446744073709551618
18446744073709551620
18446744073709551620
18446744073709551620
...

Brace expansion will not work either:
echo {1..10}
1 2 3 4 5 6 7 8 9 10

echo {18446744073709551644..18446744073709551656}
{18446744073709551644..18446744073709551656}

Shell arithmetic is out of the question as well (and, contrary to popular belief, the answer to life, the universe and everything is not 42, but it’s close.):
i=18446744073709551656
echo $i
18446744073709551656

((i++))
echo $i
41

These are all unfortunate consequences of the range limitation of a 64-bit signed long int. To a point, the answer is to stick with bc, until you hit this limit:
BC_LINE_LENGTH=0 bc <<<10^10^10
Runtime error (func=(main), adr=14): exponent too large in raise

The dc doesn’t work either:
dc <<< '10 10000000000^pq'
Runtime error: exponent too large in raise
1

And we can’t use Bash exponentiation:
var=$((10**10**10))
echo $var
0

Don’t be angry with Unix: considering that the observable universe consists of about 1011 galaxies, 1021 stars, 1078 atoms, and 1088 photons, the 10^10^10 limit of Bash seems rather generous. Luckily, here is Hypercalc, either Perl or JavaScript version. The large-number limit of Hypercalc in Knuth up-arrow notation is 10↑↑(1010):
yum -y install perl-Time-HiRes ; mkdir -p /var/adm/bin ; wget -O /var/adm/bin/hypercalc.pl "http://mrob.com/pub/comp/hypercalc/hypercalc.txt" ; chmod 755 /var/adm/bin/hypercalc.pl ; ln -s /var/adm/bin/hypercalc.pl /usr/bin/hypercalc

But what to do with it is a subject for later.

According to Wikipedia, the largest known prime number is 274,207,281 − 1 found by GIMPS in January of 2016. This is still well within bc and factor limitations, so, theoretically at least, running the following command will eventually confirm this finding:

BC_LINE_LENGTH=0 bc -l <<< 2^74207281-1 | factor | awk '{gsub(":","")}; NF==2{print "prime"}{}'

However, how long would this take? Well, 274,207,281 − 1 contains 22,338,618 digits, which puts it between 10^10^7 and 10^10^8 in the following attempt to estimate approximate time required to run the command above:
for i in 1 2 3 4 5 6 7 8 9 ; do echo "------------------" ; echo "10^10^$i" ; time -p BC_LINE_LENGTH=0 bc -l <<< 10^10^$i | factor >/dev/null 2>&1 ; done

The time I got for each iteration on a 2.9GHz Xeon E5-2690 and 96GB of RAM is as follows:
x       seconds
10^10^1	0.01
10^10^2	0.01
10^10^3	0.01
10^10^4	0.03
10^10^5	1.11
10^10^6	101.66
10^10^7 yawn...
10^10^8 yawn...
10^10^9 yawn...

As you can see, I had no patience to get past 10^10^6. Still, based on those few data points and assuming exponential growth (a good assumption in this case), I got y = 1.151806818·10-8 e3.815837901 x , where x is 10^10^x and y is estimated time to execute “bc <<< 10^10^x | factor”. Using this equation, we can see about how long it would’ve taken my computer to confirm that 274,207,281 − 1 is in fact a prime number:
x       seconds
10^10^7	4589.45
10^10^8	208429.00
10^10^9	9465770.00

So, I would have needed somewhere between 1.3 hours to 2.4 days, but closer to hours. And that’s just to confirm a known prime number. As you may imagine, actually finding one in that range would require lots of CPUs or an equivalent amount of luck.

So what about the elusive 10^10^10 – how long would it take me to confirm a prime number in that range, when one is inevitably found? Well, the Wolfram Alpha I’ve been using to solve the above equation also fails at that point and probably for the very same reason. The aforementioned Hypercalc, however, has no such problem:

hypercalc "1.151806818*10^(-8)*e^(3.815837901*10)/60/60/24/365"
13.631588907193

So, just over thirteen-and-a-half years.

Print Friendly, PDF & Email

Leave a Reply