The Problems of Scale
At a certain point, a single computer is no longer powerful enough to solve a problem, and one must expand to multiple computers working together as a cluster. Scaling up computation brings a new set of problems to solve. Managing the work across the individual nodes is one part, and distributing data among them is another. Most importantly, one must control how many workers can access a shared resource at a time, or face degraded performance or, worse, incorrect results.
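To make "limiting how many workers can access a shared resource at a time" concrete, here is a minimal thread-level sketch. It assumes Python's standard `threading` module. The limit of three, the ten workers, and the function name `use_shared_resource` are hypothetical choices for illustration. A `BoundedSemaphore` admits at most a fixed number of threads into the guarded section at once, and a separate `Lock` protects the counters used to check that the limit holds.

```python
import threading
import time

# Hypothetical limit: at most 3 workers may use the shared resource at once.
MAX_CONCURRENT = 3
slots = threading.BoundedSemaphore(MAX_CONCURRENT)

state_lock = threading.Lock()  # protects the two counters below
active = 0        # workers currently inside the guarded section
peak_active = 0   # highest concurrency observed during the run


def use_shared_resource(worker_id):
    """Hypothetical worker: waits for a free slot, then 'uses' the resource."""
    global active, peak_active
    with slots:  # blocks while MAX_CONCURRENT workers are already inside
        with state_lock:
            active += 1
            peak_active = max(peak_active, active)
        time.sleep(0.05)  # stand-in for real work on the shared resource
        with state_lock:
            active -= 1


threads = [threading.Thread(target=use_shared_resource, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"peak concurrent workers: {peak_active}")
```

The reported peak should never exceed `MAX_CONCURRENT`. A semaphore caps concurrency instead of fully serializing access: a plain lock is the special case with a limit of one, which is safe but may leave a resource that can serve several clients underused.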
In this chapter, we examine some of the technical challenges of scaling up our code to solve larger and more complex problems, along with some of the technologies needed to overcome them. We start with mechanisms for controlling data access at the thread level, which give us insight into the kind of mechanisms we need at even larger scales. Next, we see how to use the Message Passing Interface (MPI) to coordinate data and work across many nodes in a...