Computing prime numbers using parallel operations
A classic method for finding prime numbers is the Sieve of Eratosthenes, which works by marking off the multiples of each prime. Here we work in the spirit of the sieve: for each candidate number, we check whether it fits the bill for a prime, and a number that meets all the criteria "filters through" the sieve.
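For reference, a minimal sketch of the classic sequential sieve looks like this (the function name `sieve` is illustrative, not from the recipe):

```python
def sieve(limit):
    # classic Sieve of Eratosthenes: start by assuming every number
    # from 2 upward is prime, then mark multiples as composite
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # any multiple of p below p*p was already marked
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Note that this form is inherently sequential (each pass depends on the primes found so far), which is why the per-number test below is the better fit for parallel execution.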
The same series of tests is run on every number we check, which makes this a great fit for parallel operations. Spark has the built-in ability to split a task among the available threads or machines; the parallelism is configured through the SparkContext (as we see in every example).
In our case, we split the workload among the available threads, each taking a set of numbers to check, and collect the results afterward.
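As a plain-Python sketch of that split-and-collect idea (no Spark required; the trial-division `isprime` here is a stand-in for the test developed below, and the pool size is an arbitrary choice):

```python
from concurrent.futures import ThreadPoolExecutor

def isprime(n):
    # trial division: test odd divisors up to sqrt(n)
    n = abs(int(n))
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

# split the candidates among worker threads and collect the results;
# pool.map preserves the input order, so the results line up with range(50)
with ThreadPoolExecutor(max_workers=4) as pool:
    flags = pool.map(isprime, range(50))
    primes = [n for n, flag in zip(range(50), flags) if flag]

print(primes)  # primes below 50
```

Spark's `parallelize`/`filter` pattern does the same thing, but the executor may be a thread on another machine rather than a local one.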
How to do it...
We can use a script like this:
import pyspark
if 'sc' not in globals():
    sc = pyspark.SparkContext()
# check if a number is prime
def isprime(n):
    # must be positive
    n = abs(int(n))
    # primes are 2 or greater
    if n < 2:
        return False
    # 2 is the only even prime number
    if...