You're reading from Julia Cookbook
In this chapter, you will learn about performing parallel computing and using it to handle big data. Concepts such as data movement, shared arrays, and the Map-Reduce framework are important to know in order to process large amounts of data across multiple CPUs working in parallel. The concepts discussed in this chapter will help you build solid parallel computing and multiprocessing basics, including efficient data handling and code optimization.
Parallel computing is a way of splitting work across several processors so that computations run simultaneously. This can be done by connecting multiple computers as a cluster and using their CPUs to carry out the computations.
This style of computation is used when handling large amounts of data, and also when running complex algorithms over significantly large datasets. The computations execute faster because multiple CPUs run them in parallel and each CPU has direct access to its own RAM.
Julia has built-in support for parallel computing and multiprocessing, so these computations rarely require any external libraries.
Julia can be started on your local computer using multiple cores of your CPU, which gives the process multiple workers. This is how you can fire up Julia in multiprocessing mode in your terminal. This creates two worker processes on the machine, which means it uses two CPU cores for the purpose...
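As a concrete illustration, the launch command would look like the following. This is a sketch assuming a standard Julia installation on your `PATH`; the `-p N` flag starts Julia with `N` worker processes in addition to the master process:

```shell
# Start the Julia REPL with two worker processes (two CPU cores)
julia -p 2
```

From an already-running REPL, the same effect can be had with `using Distributed; addprocs(2)`.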
In parallel computing, data movements are quite common and should be minimized due to the time and the network overhead as a result of the movements. In this recipe, we will see how that can be optimized to avoid latency as much as we can.
To get ready for this recipe, you need to have the Julia REPL started in multiprocessing mode. This is explained in the Getting ready section in the preceding recipe.
Firstly, we will see how to do a matrix computation using the @spawn macro, which helps with data movement. We construct a matrix of shape 200 x 200 and then square it using the @spawn macro. This can be done as follows:

```julia
mat = rand(200, 200)
exec_mat = @spawn mat^2
fetch(exec_mat)
```
The preceding command returns the squared 200 x 200 matrix as its output.
Now, we will look at another way to achieve the same result. This time, we will use the @spawn macro directly instead of the initialization step. We will discuss the advantages and drawbacks of each method in the...
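The direct style can be sketched as follows. Note that on current Julia versions the Distributed form of @spawn is spelled `@spawnat :any`, which is used here; the worker count of 2 and the matrix size are just the values from this recipe:

```julia
using Distributed

# Add two worker processes if Julia was not already started with -p 2
nprocs() == 1 && addprocs(2)

mat = rand(200, 200)

# Spawn the squaring directly on any available worker, without a
# separate remote-initialization step; fetch blocks until the result arrives
result = fetch(@spawnat :any mat^2)
```

Because `mat` lives on the master process, it is serialized and shipped to the worker; this is exactly the data movement cost the recipe asks you to keep in mind.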
In this recipe, you will learn a bit about the famous Map-Reduce framework and why it is one of the most important ideas in the domains of big data and parallel computing. You will learn how to parallelize loops and apply reducing functions to them across several CPUs and machines, further exploring the concept of parallel computing that you learned about in the previous recipes.
As in the previous sections, Julia needs to be running in multiprocessing mode to work through the following examples. This can be done by following the instructions given in the first section.
Firstly, we will write a function that generates and adds n random bits. Writing this function has nothing to do with multiprocessing; it uses only simple Julia functions and loops. The function can be written as follows:
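A minimal version of such a function, in the spirit of the coin-flipping example from the Julia manual, could look like this:

```julia
# count_heads flips n fair coins (random Bools) and returns the
# number of heads; plain serial Julia, no multiprocessing involved
function count_heads(n)
    c = 0
    for i in 1:n
        c += rand(Bool)   # true counts as 1, false as 0
    end
    return c
end

count_heads(10)  # some integer between 0 and 10
```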
Now, we will use the @spawn macro, which we learned about previously, to run the count_heads() function as separate processes...
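A sketch of that map-and-reduce pattern follows. The function must first be defined on every worker with `@everywhere`; the iteration counts here are illustrative, and `@spawnat :any` is the current spelling of the older `@spawn` macro:

```julia
using Distributed
nprocs() == 1 && addprocs(2)

# Define count_heads on every worker so it can be spawned remotely
@everywhere function count_heads(n)
    c = 0
    for i in 1:n
        c += rand(Bool)
    end
    return c
end

# Map: launch two independent counts on any available workers
a = @spawnat :any count_heads(100_000)
b = @spawnat :any count_heads(100_000)

# Reduce: combine the two partial results
total = fetch(a) + fetch(b)
```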
Channels are like background plumbing for parallel computing in Julia. They are the reservoirs from which the individual processes access their data.
The requirements are similar to those of the previous sections. This is mostly a theoretical section, so you only need to run the experiments on your own; for that, run your Julia REPL in multiprocessing mode.
Channels are shared queues with a fixed length. They are common data reservoirs for the running processes: multiple readers or workers can access the data through the fetch() function, which we already discussed in the previous sections.
The workers can also write to the channel through the put!() function. This means that workers can add more data to the resource, which all the workers running a particular computation can then access.
Closing a channel after use is a good practice to avoid data corruption and unnecessary...